Alphabets & fonts
All versions of Windows can handle Western alphabets out of the box, but if you intend to read or type in some non-Western scripts on your system, you will need to take a few extra steps. These may include installing fonts, adjusting a few Windows settings, or both. This page gives background information on how to handle non-Western scripts in Windows.
There are languages that use alphabets (e.g. English, Spanish), languages that use alphabets with "context-sensitive" glyphs (e.g. Arabic, Hebrew), languages whose alphabet is written in syllable blocks (e.g. Korean), and languages that use more complex scripts (e.g. Chinese, Japanese).
Introduction

One-byte fonts map up to 256 characters and are usually designed for use with a given script or alphabet, i.e. they are language-specific. For this reason, their characters are usually mapped according to some standard chart. However, non-standard fonts containing unusual characters are sometimes very useful for language learning. For example, you may need the accented Russian vowels or the vowel-pointed Arabic consonants used in dictionaries.

Technical information about one-byte fonts

The "byte" (abbreviated B) is a unit of data that is 8 "bits" (abbreviated b) long. Since each bit is either 0 or 1, a byte can take 2^8 = 256 different values, so one byte can identify up to 256 characters. Most language scripts are alphabet-based, so 256 characters are usually enough to represent all the symbols of a given language. This is why early versions of Windows relied on single-byte fonts. Back then, different language-specific "codepages" were developed. A codepage is a chart in which each of the 256 entries is allocated a character:

In short: every one-byte font complies with some codepage, i.e. it is language-specific. Some examples of codepages:
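The byte arithmetic and the codepage idea can be checked with a minimal Python 3 sketch. It assumes Python's standard codecs for three Windows codepages (cp1252 Western, cp1251 Cyrillic, cp1255 Hebrew), chosen here only as examples; the helper name `decode_byte` is hypothetical. The same byte value, 0xE0, turns into a different letter depending on which codepage is used to read it.

```python
# One byte = 8 bits, each bit is 0 or 1, so a byte has 2**8 possible values.
BYTE_VALUES = 2 ** 8
assert BYTE_VALUES == 256


def decode_byte(value: int, codepage: str) -> str:
    """Return the character that a single byte value maps to in a codepage."""
    if not 0 <= value < BYTE_VALUES:
        raise ValueError("a single byte must be in the range 0-255")
    return bytes([value]).decode(codepage)


if __name__ == "__main__":
    # The same byte, read through three different Windows codepages.
    for codepage in ("cp1252", "cp1251", "cp1255"):
        char = decode_byte(0xE0, codepage)
        print(f"0xE0 in {codepage}: {char} (U+{ord(char):04X})")
```

Run as a script, this prints the Western à (U+00E0), the Cyrillic а (U+0430) and the Hebrew alef א (U+05D0): one byte, three different characters.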
More information: The ISO 8859 character soup shows charts of the 8-bit (single-byte) Latin, Cyrillic, Arabic, Greek and Hebrew codepages in use today. See also XenCraft's page and alis.com's page.

Limitations of one-byte fonts

One important disadvantage of one-byte fonts is that multilingual editing is not possible in plain text format. A plain text file stores only byte values, not the codepage they belong to, so it cannot mix scripts from different codepages (say, Russian and Greek), and text read with the wrong codepage turns into gibberish. (Remember the garbled text in much of the spam email we receive?) To deal with this problem, most modern fonts comply with the Unicode standard.

A separate problem is that languages such as Chinese, Japanese and Korean (Hanja) have too many characters to fit in a chart with only 256 entries. For this reason, so-called double-byte character sets (DBCS) have been in use for decades.
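Both limitations can be reproduced with a minimal Python 3 sketch. It assumes Russian text saved with the Cyrillic codepage cp1251 and then opened as Western cp1252, plus a line mixing Cyrillic and Greek; UTF-8 (a Unicode encoding) is included only for contrast. The helpers `misread` and `fits_codepage` are hypothetical names.

```python
def fits_codepage(text: str, codepage: str) -> bool:
    """True if every character of text exists in the given codepage."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codepage, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Russian saved as Windows-1251 but opened as Windows-1252 (Western).
    print(misread("Привет", "cp1251", "cp1252"))  # Ïðèâåò
    mixed = "Привет / Γειά"  # Cyrillic and Greek on one line
    for codepage in ("cp1251", "cp1253", "utf-8"):
        print(codepage, fits_codepage(mixed, codepage))
```

Neither single-byte codepage can hold the mixed line (cp1251 has no Greek, cp1253 has no Cyrillic), while UTF-8 can.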
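In principle, two bytes can encode up to 65536 (256 squared) distinct values. So-called DBCS (double-byte character sets) were developed to let computers handle complex scripts such as Chinese, Japanese and Korean. In practice, DBCS codepages mix single-byte and double-byte codes: plain ASCII characters still take one byte, while each CJK character takes two. To type in a DBCS font, you have to upgrade your Windows system with a CJK input method editor (IME); to display such text, the corresponding East Asian language support (including fonts) must be installed.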
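The mixed single/double-byte layout can be seen with a minimal Python 3.9+ sketch. It assumes Python's codecs for the Windows CJK codepages cp932 (Japanese), cp936 (Simplified Chinese, an alias of GBK) and cp949 (Korean); the helper `bytes_per_char` and the sample strings are hypothetical.

```python
TWO_BYTE_VALUES = 256 * 256
assert TWO_BYTE_VALUES == 65536


def bytes_per_char(text: str, codepage: str) -> list[int]:
    """Number of bytes each character of text occupies in the given codepage."""
    return [len(ch.encode(codepage)) for ch in text]


if __name__ == "__main__":
    samples = {
        "cp932": "Tokyo 東京",    # Japanese (Shift JIS family)
        "cp936": "Beijing 北京",  # Simplified Chinese (GBK)
        "cp949": "Seoul 서울",    # Korean (Unified Hangul Code)
    }
    for codepage, text in samples.items():
        print(codepage, text, bytes_per_char(text, codepage))
```

Latin letters and the space stay at one byte each, while every CJK character takes two, which is why these codepages never actually use all 65536 two-byte combinations.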
Introduction

Unicode fonts can contain thousands of characters, thus potentially covering all the languages of the world.

Note for Windows users: Unicode fonts and input methods work properly only on Windows 2000 / XP and later. On older operating systems, trying to use non-Western Unicode characters is bound to cause you trouble. Nevertheless, you can still use the good old one-byte fonts on Windows 2000 / XP. In fact, certain one-byte fonts with a non-standard codepage prove very useful for language learning; for example, you may need accented characters to learn Russian.

Technical information about the Unicode standard

Unicode is a character encoding system aimed at covering all the languages in the world. Unlike double-byte character sets, which are limited to at most 65536 codes, the Unicode standard can be extended progressively by assigning new characters. Its code space is divided into 17 "planes" of 65536 code points each (1,114,112 code points in all), and most modern scripts are covered by Plane 0, the "Basic Multilingual Plane". Moreover, most Unicode fonts do not map every entry ("code point"), partly to avoid wasting system resources (memory) and partly because designing thousands of extra glyphs takes a great deal of work. The characters in a Unicode font are grouped into "subsets" (Latin, Greek, Braille, mathematical symbols, etc.), which, by the way, do not generally match the codepages of one-byte fonts. Check out the Free Online Unicode Character Map at Oxford University to view the characters contained in Unicode fonts. See also A Unicode Test Page to check whether your browser is Unicode-compliant.

Now, how are Unicode characters actually stored in memory? For this purpose, there are several systems called "encodings". The most important encodings are:
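Code points, planes and encoded sizes can be explored with a minimal Python 3.9+ sketch. It uses Python's built-in UTF-8, UTF-16 and UTF-32 codecs merely as examples of encodings; the helper `describe` and the sample characters are hypothetical choices. The plane of a character is simply its code point divided by 65536.

```python
import sys

PLANE_SIZE = 0x10000  # 65536 code points per plane
NUM_PLANES = 17
# 17 planes x 65536 = 0x110000 code points, i.e. U+0000 to U+10FFFF.
assert NUM_PLANES * PLANE_SIZE == sys.maxunicode + 1 == 0x110000


def describe(ch: str) -> dict:
    """Code point, plane and encoded size (in bytes) of one character."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "plane": cp // PLANE_SIZE,
        "utf8": len(ch.encode("utf-8")),
        "utf16": len(ch.encode("utf-16-le")),  # "-le" variants add no byte-order mark
        "utf32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # Latin A, Cyrillic Zhe, Chinese zhong, and the musical G clef (Plane 1).
    for ch in ("A", "Ж", "中", "\U0001D11E"):
        print(describe(ch))
```

Characters in Plane 0 take two bytes in UTF-16, while the G clef from Plane 1 needs four.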
Unfortunately, due to the issues explained below, we must advise you against using Unicode fonts for non-Western scripts. Use a language-specific, single-byte font instead.

Controversies

It has been argued that Unicode will be too small for future East Asian online digital libraries, which will need to support old character sets. Asian historical texts and personal names use many rare characters, which together far exceed the 65536 code points of the original 16-bit Unicode design (today's Basic Multilingual Plane). Moreover, Unicode ignores the fact that many characters are written with several different glyphs, folding these variants into single unified characters (Han unification, or "Unihan"). Meanwhile, the Government of the People's Republic of China has made compatibility with the GB18030 character set, whose encoding uses one, two or four bytes per character, compulsory for software sold in that country.

More links

For more information about Unicode, see:
- Unicode.org: the worldwide consortium defining the Unicode standard.
- Alan Wood's Unicode resources: comments on Unicode-related utilities.
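The 65536 limit and the GB18030 remark in the Controversies paragraph can be made concrete with a minimal Python 3.9+ sketch. It picks U+20000, the first character of CJK Unified Ideographs Extension B, as an example of a Han character outside the 16-bit range, and relies on Python's built-in `gb18030` codec; the helper `gb18030_sizes` is hypothetical.

```python
def gb18030_sizes(text: str) -> list[int]:
    """Bytes used by each character of text in the GB18030 encoding."""
    return [len(ch.encode("gb18030")) for ch in text]


if __name__ == "__main__":
    rare = "\U00020000"  # first character of CJK Unified Ideographs Extension B
    # This code point lies beyond 0xFFFF, outside a 16-bit code space...
    print(hex(ord(rare)), ord(rare) > 0xFFFF)
    # ...so UTF-16 stores it as a surrogate pair (two 16-bit units).
    print(rare.encode("utf-16-be").hex())
    # GB18030: 1 byte for ASCII, 2 for common Hanzi, 4 for characters like this one.
    print(gb18030_sizes("A中" + rare))
```

The last line shows the variable width of GB18030: one byte for "A", two for 中 and four for the Extension B character.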
Updated: 2023 January 16. Legal notice.
Copyright © 1999-2024 by The authors.
All rights reserved. Our homepage is http://www.vtrain.net