ANSI vs Unicode

Traditional ANSI text files store each character as a single byte (8 bits).
Unicode, in the 16-bit form common on PCs, stores each character in two bytes
(16 bits). This allows support for larger and more complex character sets, such
as many foreign alphabets, but it creates a programming problem: software that
reads raw bytes has no simple way to tell whether a given character is stored in
one byte or two, so searching Unicode text is more complicated than searching
ANSI text.
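
The storage difference, and one way a search can go wrong, can be sketched in a
few lines of Python. This is only an illustration under stated assumptions: the
Windows-1252 code page ("cp1252") stands in for "ANSI", "utf-16-le" stands in
for 16-bit Unicode, and the sample strings are made up for the example.

```python
# Assumption: "cp1252" (Windows-1252) represents ANSI and "utf-16-le"
# represents 16-bit Unicode. The sample text is invented for illustration.
text = "Price: 5 cents"

ansi_bytes = text.encode("cp1252")      # one byte per character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character here

print(len(ansi_bytes))   # 14
print(len(utf16_bytes))  # 28

# A byte-oriented search written for ANSI can misfire on 16-bit data.
# The character U+4100 is stored as the bytes 00 41 in UTF-16 little-endian,
# and 0x41 is also the single-byte code for the letter "A".
ideograph = "\u4100"
raw = ideograph.encode("utf-16-le")

false_hit = b"A" in raw      # True: the byte 0x41 occurs in the data
real_hit = "A" in ideograph  # False: the text itself contains no "A"
print(false_hit, real_hit)
```

A program that knows which encoding it is reading avoids the false match by
decoding first and searching characters rather than bytes.
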

Given the encoding scheme's name, ANSI (American National Standards Institute),
it's no surprise that most of the symbols it defines are those used in English:
the letters, plus common punctuation marks, some math symbols such as plus and
minus, and U.S. currency symbols for dollars and cents, making a total of 223
symbols. Unicode, on the other hand, has room for over 65,000 different symbols
in its 16-bit form, enough for many, if not all, foreign letters and diacritic
characters.
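
The difference in coverage can be checked directly. The sketch below again
assumes that Windows-1252 ("cp1252") stands in for ANSI; fits_in_ansi is a
hypothetical helper written for this example, and the sample characters are
arbitrary.

```python
def fits_in_ansi(ch: str) -> bool:
    """Return True if ch has a code in Windows-1252 (assumed ANSI code page)."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

# English letters, currency signs, and one accented letter, followed by
# Greek, Cyrillic, and Japanese characters.
samples = ["A", "$", "¢", "é", "Ω", "Ж", "日"]
results = {ch: fits_in_ansi(ch) for ch in samples}

for ch, ok in results.items():
    print(ch, "ANSI" if ok else "Unicode only")
```

Every character in the sample list, including the ones ANSI cannot represent,
encodes to two bytes in 16-bit Unicode.
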

Over the last few years Unicode has quietly been replacing ANSI as the encoding
method of choice inside our PCs. Application programs have been slower to adopt
it, because searching Unicode text is complicated. Sorry to say, it will take
2-4 years for this journal to switch all of its operations from ANSI to Unicode.
May 2003