Character code - Mojikodo (English notation) character code

Japanese: 文字コード - もじこーど（英語表記）character code

On a computer, characters are represented by digital codes. These codes are called character codes (moji kōdo in Japanese; the loanword kyarakutā kōdo is also used). When a script has only a few characters, as with the alphabet, few bits are needed for each code, and one byte (8 bits) is assigned to each character; languages such as Japanese, which use a large number of kanji, need several bytes per character. In the United States, ASCII (a character code established by the American Standards Association in 1962) was adopted as the standard early on and became widespread. For Japanese, codes were defined in JIS (Japanese Industrial Standards), but these proved insufficient, so various variants appeared and the standard fell into disarray. As a result, e-mail and web pages came to suffer from the phenomenon known as garbled characters (mojibake), in which the characters displayed differ from the characters that were entered. The Japanese codes currently in use include JIS7, Shift-JIS, EUC, and Unicode. However, the encoding of tens of thousands of kanji and of the scripts of many other languages, such as Arabic, is progressing, and a global standard is being put in place. Once this standard is widely adopted, garbled characters should become a thing of the past, and web pages in any language will display correctly.
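The two points above, that kanji text needs several bytes per character and that reading text with the wrong character code produces garbled characters, can be illustrated with a minimal Python sketch. The codec names (`shift_jis`, `euc_jp`, `utf-8`) are Python's standard ones; the helper names `byte_lengths` and `garble` are made up for this illustration and do not come from the article.

```python
# Sketch: bytes per character under several encodings, and how mojibake arises.
# Codec names are Python's standard ones; the helper names are illustrative only.

ENCODINGS = ("shift_jis", "euc_jp", "utf-8")


def byte_lengths(text, encodings=ENCODINGS):
    """Return {encoding: number of bytes needed to store text}."""
    return {enc: len(text.encode(enc)) for enc in encodings}


def garble(text, written_as, read_as):
    """Encode with one character code, then decode with a different one."""
    # errors="replace" substitutes a marker for byte sequences that are
    # invalid in read_as instead of raising an exception.
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    print(byte_lengths("A"))            # a single alphabetic letter
    print(byte_lengths("文字コード"))    # five Japanese characters
    # Text written as UTF-8 but read as Shift-JIS comes out garbled.
    print(garble("文字コード", written_as="utf-8", read_as="shift_jis"))
```

With these codecs, the letter "A" takes one byte in every case, while each of the five characters in 文字コード takes two bytes in Shift-JIS and EUC and three bytes in UTF-8. Decoding with the same code that was used for encoding returns the original text; only a mismatch produces garbling.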

The sheer number of kanji has long been a major reason why they were difficult to handle on computers. However, it is said that, of the kanji currently in circulation in Japan, the roughly 2,000 most frequently used characters are enough to write 90% of all text, and extending the set to 5,000 characters raises this to 99%. Even if every kanji used in the past is included, the total is probably 100,000 at most. A set of this size no longer poses any problem for current computer technology in terms of either processing or storage. What complicates the encoding of kanji is a tangle of issues, such as the distinction between character types and typefaces (fonts), and whether a given form is an erroneous character or a variant. A character code is meaningful because it corresponds one-to-one with a character type. Yet some kanji with the same meaning have slightly different shapes in Japan, China, Taiwan, and Korea. Should these be treated as different character types? In personal names and the like there are also many invented characters (variant characters), for example with a dot added to the standard form so that the stroke count is auspicious. Should these, too, be distinguished as different character types? Some names from the past even contain erroneous characters that arose when they were entered in family registers. Character codes are the basic data for classifying, searching, and matching information, and given the exponential improvement in computer performance, the approach of simply representing each character type as a single digital code may be reaching its limits.
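The effect of variant characters on searching and matching can be sketched in Python. The example assumes Unicode, in which the name variants shown happen to have code points separate from the standard forms. The table `VARIANT_TO_STANDARD` and the function `fold_variants` are hypothetical, hand-made for this illustration; they are not part of any standard or of the article, and which variants should be folded together is exactly the open question raised above.

```python
# Sketch: variant kanji in names are distinct code points, so exact
# matching fails unless the variants are folded onto a standard form.
# The table below is a hypothetical, hand-made example, not a standard.

VARIANT_TO_STANDARD = {
    "髙": "高",  # U+9AD9 -> U+9AD8
    "齊": "斉",  # U+9F4A -> U+6589
}


def code_point(ch):
    """Format a single character's code point as U+XXXX."""
    return f"U+{ord(ch):04X}"


def fold_variants(text):
    """Replace each listed variant with its standard form."""
    return "".join(VARIANT_TO_STANDARD.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(code_point("高"), code_point("髙"))
    print("髙橋" == "高橋")                  # exact match on the raw strings
    print(fold_variants("髙橋") == "高橋")   # match after folding variants
```

Folding makes a search for the standard spelling find the variant spelling too, at the cost of erasing a distinction that the person named may consider essential. A single code per character cannot serve both needs at once.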

[Tamura Koichiro]

Source: Shogakukan Encyclopedia Nipponica

Japanese:

コンピュータ上では、文字はデジタル符号で表される。この符号を文字コードという。キャラクターコードともいう。アルファベットのように字種の数が少ない場合は符号として必要なビット数が少なくてすみ、1バイト（8ビット）が1文字に割り当てられているが、字種の多い漢字を使う日本語などでは1文字に数バイト必要である。アメリカでは早くからASCII（アスキー、アメリカ規格協会が1962年に制定した文字コード）が標準として用いられ、普及したが、日本語の場合、JIS（ジス）で定められてはいたものの不十分であったことからさまざまな変形が現れ、標準に乱れが生じた。その結果、電子メールやホームページの表示に、入力した文字と出力した文字が違う、いわゆる文字化け現象を引き起こすこととなった。現在使用されている日本語コードには、JIS7、Shift-JIS、EUC、UNICODEなどがある。しかし数万の漢字やアラビア語など多くの言語の文字コード化が進み、世界標準も整備されつつある。この整備が行き渡れば、文字化けから解放され、いずれの言語のホームページも正しい表示を見ることができるようになる。

　漢字は字種が多いことが、これまではコンピュータ化されにくい大きな理由になっていた。しかし、日本で現在流通している漢字では、使用頻度の高い2000字種くらいまでで90％、さらに5000字種まで広げるとあらゆる文章の99％まで表記可能であるといわれている。過去に使用された漢字すべてを入れても、たかだか10万字であろう。この程度の数ならば、もはや現在のコンピュータ技術では処理の面でも記憶量の面でも問題とならない。漢字の文字コード化問題を複雑にしているのは、字種と書体（フォント）の違い、また、誤字か異体字かなどの問題が錯綜(さくそう)していることである。文字コードは字種との一対一の関係をもつことに意味がある。ところが同一の意味をもつ漢字でも日本、中国、台湾、韓国で微妙に字形が異なる字がある。これを異なる字種として扱うべきかどうか。また人名などでは画数の縁起を担いで標準字形に点を増やすなどした、つくられた字（異体字）も多く、これらも異なる字種として区別するべきかどうか。過去の人名には戸籍の届け出に際しての誤字さえある。文字コードは情報の分類や検索、照合の基礎データとなるものであり、コンピュータの幾何級数的性能向上を考慮するならば、単純に字種を一つのデジタル符号として表す文字コード化の方法には限界がきているのかもしれない。

［田村浩一郎］

出典　小学館　日本大百科全書(ニッポニカ)
