This is a tabular overview of the similarities and differences of the following text encodings:
ISO/IEC 8859 (without hyphen), as defined by ISO/IEC, does not assign control codes to the 0x00-0x1f and 0x7f-0x9f ranges. This is done by its superset ISO-8859 (with hyphen), as defined by IANA, which assigns the C0 and C1 control codes to these code points, as given below.
The "Latin-x" naming of the various ISO-8859 variants is non-continuous. Note the "holes" (8859-5 to 8859-8 and 8859-11 are not "Latin-x").
Standardization of ISO 8859-12 (Devanagari) was officially abandoned in 1997.
IBM CP858 differs from CP850 in only one character: 0xD5 (LATIN SMALL LETTER DOTLESS I), which was replaced with the Euro currency symbol.
Several devices from the IBM codepage era interpret code points 0x01 - 0x1F and 0x7F as graphic characters, but the official encoding tables list the same C0 control codes as given for the Windows and ISO/IEC codepages. The graphic characters are not given here.
Historically, CP1252 was based on an ANSI draft, and calling this encoding "ANSI" is still common in the Microsoft universe despite being a misnomer.
Unicode code points are given as 16-bit hexadecimal values on this page. This is enough to cover the Basic Multilingual Plane and therefore all characters presented here. However, Unicode also specifies several Supplementary Planes (e.g. historic scripts, extended CJK ideographs, emoji), which lie outside the 16-bit range. If you need to hold every possible Unicode code point, use a 32-bit integer.
There are several characters that, from a casual look at their glyphs, seem to be identical, but are not. Special attention is advised at code point 0xd0 (Unicode characters 0x00d0 'Ð' and 0x0110 'Đ'). Other "close" characters are found at 0xd5, 0xd9, 0xe3, 0xf1, 0xf5, 0xfb and 0xf0.
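To make the 0xd0 case concrete, the following Python sketch decodes the same byte under ISO-8859-1 and ISO-8859-2 and reports the official Unicode names. It assumes Python's standard codec names `iso8859_1` and `iso8859_2`; the helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(byte_value, encoding):
    """Decode one byte and return (character, code point, Unicode name)."""
    ch = bytes([byte_value]).decode(encoding)
    return ch, f"{ord(ch):04x}", unicodedata.name(ch)

# The same byte 0xd0 yields two different, similar-looking letters.
for enc in ("iso8859_1", "iso8859_2"):
    print(enc, describe(0xD0, enc))
```

The glyphs look nearly identical in many fonts, but the code points and names differ, so a byte-for-byte comparison after conversion to Unicode will not treat them as equal.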
The top row in each table cell is the character (or the acronym of a control code / special character, see note above).
The bottom row is the Unicode code point as 16-bit hexadecimal (but see note above).
On mouse-over, a tooltip appears with three lines of information:
The octal encoding is presented for use in C/C++ string literals (\ooo is well-defined, whereas \xhh fails to work as expected if the next character is also a valid hex digit).
Control codes and special characters are given as their two- or three-letter acronym, with a bold-line box around their table cell. (Character 0x20 is the standard space).
A code point not assigned a character in a given encoding is given a black box.
Code points that are assigned a character in the given encoding which differs from ISO-8859-1 are given a grey box. (In case of Windows-1250, grey boxes indicate differences from ISO-8859-2.)
Combining characters are given light grey boxes.
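The remark about octal escapes can be illustrated from the generating side. In C and C++, a `\x` escape consumes every hexadecimal digit that follows it, so `"Gr\xf6\xdfe"` is read as `\xf6` followed by `\xdfe`, which no longer fits into a byte. The Python sketch below writes a byte string as a C literal using three-digit octal escapes, which cannot run into the next character. The helper name `c_octal_literal` is an assumption for this example, not part of any library.

```python
def c_octal_literal(data):
    """Render bytes as a C string literal, using \\ooo for anything unsafe."""
    parts = []
    for b in data:
        if 0x20 <= b < 0x7F and chr(b) not in '"\\':
            parts.append(chr(b))          # printable ASCII stays readable
        else:
            parts.append(f"\\{b:03o}")    # always exactly three octal digits
    return '"' + "".join(parts) + '"'

# "Größe" in ISO-8859-1: byte 0xdf is directly followed by the hex digit 'e'.
print(c_octal_literal("Gr\u00f6\u00dfe".encode("iso8859_1")))   # "Gr\366\337e"
```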
The first 128 code points (0x00 to 0x7f) are identical for all encodings, including UTF-8 (which was the whole point behind the latter).
Code | ...0 | ...1 | ...2 | ...3 | ...4 | ...5 | ...6 | ...7 | ...8 | ...9 | ...A | ...B | ...C | ...D | ...E | ...F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0... | NUL 0000 | SOH 0001 | STX 0002 | ETX 0003 | EOT 0004 | ENQ 0005 | ACK 0006 | BEL 0007 | BS 0008 | HT 0009 | LF 000a | VT 000b | FF 000c | CR 000d | SO 000e | SI 000f |
1... | DLE 0010 | DC1 0011 | DC2 0012 | DC3 0013 | DC4 0014 | NAK 0015 | SYN 0016 | ETB 0017 | CAN 0018 | EM 0019 | SUB 001a | ESC 001b | FS 001c | GS 001d | RS 001e | US 001f |
2... | SP 0020 | ! 0021 | " 0022 | # 0023 | $ 0024 | % 0025 | & 0026 | ' 0027 | ( 0028 | ) 0029 | * 002a | + 002b | , 002c | - 002d | . 002e | / 002f |
3... | 0 0030 | 1 0031 | 2 0032 | 3 0033 | 4 0034 | 5 0035 | 6 0036 | 7 0037 | 8 0038 | 9 0039 | : 003a | ; 003b | < 003c | = 003d | > 003e | ? 003f |
4... | @ 0040 | A 0041 | B 0042 | C 0043 | D 0044 | E 0045 | F 0046 | G 0047 | H 0048 | I 0049 | J 004a | K 004b | L 004c | M 004d | N 004e | O 004f |
5... | P 0050 | Q 0051 | R 0052 | S 0053 | T 0054 | U 0055 | V 0056 | W 0057 | X 0058 | Y 0059 | Z 005a | [ 005b | \ 005c | ] 005d | ^ 005e | _ 005f |
6... | ` 0060 | a 0061 | b 0062 | c 0063 | d 0064 | e 0065 | f 0066 | g 0067 | h 0068 | i 0069 | j 006a | k 006b | l 006c | m 006d | n 006e | o 006f |
7... | p 0070 | q 0071 | r 0072 | s 0073 | t 0074 | u 0075 | v 0076 | w 0077 | x 0078 | y 0079 | z 007a | { 007b | \| 007c | } 007d | ~ 007e | DEL 007f |
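The statement that the 0x00 to 0x7f range is shared by all encodings on this page can be checked mechanically. The following Python sketch decodes those 128 bytes with each codec and lists any codec whose result differs from plain ASCII. It assumes Python's built-in codec names (`iso8859_N`, `cp1250`, `cp1252`, `cp437`, `cp850`, `utf_8`); the function name `differing` is illustrative only.

```python
ISO_PARTS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16)
ENCODINGS = [f"iso8859_{n}" for n in ISO_PARTS]
ENCODINGS += ["cp1250", "cp1252", "cp437", "cp850", "utf_8"]

ascii_bytes = bytes(range(0x80))           # 0x00 .. 0x7f
reference = ascii_bytes.decode("ascii")

def differing(encodings):
    """Return the encodings that decode the ASCII range differently."""
    return [enc for enc in encodings if ascii_bytes.decode(enc) != reference]

print(differing(ENCODINGS))                # expected to be an empty list
```

Note that Python's IBM codecs map 0x01 to 0x1f to the C0 control codes, not to the graphic characters some devices display for them.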
Having a text with codes in this area is a good indication that you are looking at an IBM or Windows encoding, not ISO/IEC (as the C1 control characters are hardly ever used in "normal" text).
Code | ISO-8859 (all parts) | Windows-1250 | Windows-1252 | CP437 | CP850 |
---|---|---|---|---|---|
0x80 | PAD 0080 | € 20ac | € 20ac | Ç 00c7 | Ç 00c7 |
0x81 | HOP 0081 | | | ü 00fc | ü 00fc |
0x82 | BPH 0082 | ‚ 201a | ‚ 201a | é 00e9 | é 00e9 |
0x83 | NBH 0083 | | ƒ 0192 | â 00e2 | â 00e2 |
0x84 | IND 0084 | „ 201e | „ 201e | ä 00e4 | ä 00e4 |
0x85 | NEL 0085 | … 2026 | … 2026 | à 00e0 | à 00e0 |
0x86 | SSA 0086 | † 2020 | † 2020 | å 00e5 | å 00e5 |
0x87 | ESA 0087 | ‡ 2021 | ‡ 2021 | ç 00e7 | ç 00e7 |
0x88 | HTS 0088 | | ˆ 02c6 | ê 00ea | ê 00ea |
0x89 | HTJ 0089 | ‰ 2030 | ‰ 2030 | ë 00eb | ë 00eb |
0x8A | VTS 008a | Š 0160 | Š 0160 | è 00e8 | è 00e8 |
0x8B | PLD 008b | ‹ 2039 | ‹ 2039 | ï 00ef | ï 00ef |
0x8C | PLU 008c | Ś 015a | Œ 0152 | î 00ee | î 00ee |
0x8D | RI 008d | Ť 0164 | | ì 00ec | ì 00ec |
0x8E | SS2 008e | Ž 017d | Ž 017d | Ä 00c4 | Ä 00c4 |
0x8F | SS3 008f | Ź 0179 | | Å 00c5 | Å 00c5 |
0x90 | DCS 0090 | | | É 00c9 | É 00c9 |
0x91 | PU1 0091 | ‘ 2018 | ‘ 2018 | æ 00e6 | æ 00e6 |
0x92 | PU2 0092 | ’ 2019 | ’ 2019 | Æ 00c6 | Æ 00c6 |
0x93 | STS 0093 | “ 201c | “ 201c | ô 00f4 | ô 00f4 |
0x94 | CCH 0094 | ” 201d | ” 201d | ö 00f6 | ö 00f6 |
0x95 | MW 0095 | • 2022 | • 2022 | ò 00f2 | ò 00f2 |
0x96 | SPA 0096 | – 2013 | – 2013 | û 00fb | û 00fb |
0x97 | EPA 0097 | — 2014 | — 2014 | ù 00f9 | ù 00f9 |
0x98 | SOS 0098 | | ˜ 02dc | ÿ 00ff | ÿ 00ff |
0x99 | SGCI 0099 | ™ 2122 | ™ 2122 | Ö 00d6 | Ö 00d6 |
0x9A | SCI 009a | š 0161 | š 0161 | Ü 00dc | Ü 00dc |
0x9B | CSI 009b | › 203a | › 203a | ¢ 00a2 | ø 00f8 |
0x9C | ST 009c | ś 015b | œ 0153 | £ 00a3 | £ 00a3 |
0x9D | OSC 009d | ť 0165 | | ¥ 00a5 | Ø 00d8 |
0x9E | PM 009e | ž 017e | ž 017e | ₧ 20a7 | × 00d7 |
0x9F | APC 009f | ź 017a | Ÿ 0178 | ƒ 0192 | ƒ 0192 |
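The hint about bytes in the 0x80 to 0x9f range can be turned into a quick check. The sketch below is only a heuristic under stated assumptions: it assumes the data has already been ruled out as UTF-8, because UTF-8 continuation bytes also fall into this range. The function name `c1_range_offsets` is hypothetical.

```python
def c1_range_offsets(data):
    """Return the offsets of all bytes in the 0x80-0x9f range."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]

# Typographic quotes saved as Windows-1252 end up at 0x93 and 0x94.
sample = "\u201cquoted\u201d text".encode("cp1252")
hits = c1_range_offsets(sample)
if hits:
    print("bytes in 0x80-0x9f at offsets", hits, "-> probably Windows or IBM")
else:
    print("no bytes in 0x80-0x9f -> no hint from this range")
```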
This is where all encodings have their respective special characters. If you have text with codes from this area (and none in the 0x80 - 0x9f area above), you can only guess as to which encoding you are looking at. Try to find out which characters would make sense in their respective places, and strike off those encodings that would not give meaningful results. (Always keeping in mind that there might be typos in the text and perhaps not all characters do make sense.)
Code | ISO-8859-1 | ISO-8859-2 | ISO-8859-3 | ISO-8859-4 | ISO-8859-5 | ISO-8859-6 | ISO-8859-7 | ISO-8859-8 | ISO-8859-9 | ISO-8859-10 | ISO-8859-11 | ISO-8859-13 | ISO-8859-14 | ISO-8859-15 | ISO-8859-16 | Windows-1250 | Windows-1252 | CP437 | CP850 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0xA0 | NBSP 00a0 |
á 00e1 |
|||||||||||||||||
0xA1 | ¡ 00a1 |
Ą 0104 |
Ħ 0126 |
Ą 0104 |
Ё 0401 |
‘ 2018 |
¡ 00a1 |
Ą 0104 |
ก 0e01 |
” 201d |
Ḃ 1e02 |
¡ 00a1 |
Ą 0104 |
ˇ 02c7 |
¡ 00a1 |
í 00ed |
|||
0xA2 | ¢ 00a2 |
˘ 02d8 |
ĸ 0138 |
Ђ 0402 |
’ 2019 |
¢ 00a2 |
Ē 0112 |
ข 0e02 |
¢ 00a2 |
ḃ 1e03 |
¢ 00a2 |
ą 0105 |
˘ 02d8 |
¢ 00a2 |
ó 00f3 |
||||
0xA3 | £ 00a3 |
Ł 0141 |
£ 00a3 |
Ŗ 0156 |
Ѓ 0403 |
£ 00a3 |
Ģ 0122 |
ฃ 0e03 |
£ 00a3 |
Ł 0141 |
Ł 0141 |
£ 00a3 |
ú 00fa |
||||||
0xA4 | ¤ 00a4 |
Є 0404 |
¤ 00a4 |
€ 20ac |
¤ 00a4 |
Ī 012a |
ค 0e04 |
¤ 00a4 |
Ċ 010a |
€ 20ac |
¤ 00a4 |
ñ 00f1 |
|||||||
0xA5 | ¥ 00a5 |
Ľ 013d |
Ĩ 0128 |
Ѕ 0405 |
₯ 20af |
¥ 00a5 |
Ĩ 0128 |
ฅ 0e05 |
„ 201e |
ċ 010b |
¥ 00a5 |
„ 201e |
Ą 0104 |
¥ 00a5 |
Ñ 00d1 |
||||
0xA6 | ¦ 00a6 |
Ś 015a |
Ĥ 0124 |
Ļ 013b |
І 0406 |
¦ 00a6 |
Ķ 0136 |
ฆ 0e06 |
¦ 00a6 |
Ḋ 1e0a |
Š 0160 |
¦ 00a6 |
¦ 00a6 |
ª 00aa |
|||||
0xA7 | § 00a7 |
Ї 0407 |
§ 00a7 |
ง 0e07 |
§ 00a7 |
º 00ba |
|||||||||||||
0xA8 | ¨ 00a8 |
Ј 0408 |
¨ 00a8 |
Ļ 013b |
จ 0e08 |
Ø 00d8 |
Ẁ 1e80 |
š 0161 |
¨ 00a8 |
¿ 00bf |
|||||||||
0xA9 | © 00a9 |
Š 0160 |
İ 0130 |
Š 0160 |
Љ 0409 |
© 00a9 |
Đ 0110 |
ฉ 0e09 |
© 00a9 |
⌐ 2310 |
® 00ae |
||||||||
0xAA | ª 00aa |
Ş 015e |
Ē 0112 |
Њ 040a |
ͺ 037a |
× 00d7 |
ª 00aa |
Š 0160 |
ช 0e0a |
Ŗ 0156 |
Ẃ 1e82 |
ª 00aa |
Ș 0218 |
Ş 015e |
ª 00aa |
¬ 00ac |
|||
0xAB | « 00ab |
Ť 0164 |
Ğ 011e |
Ģ 0122 |
Ћ 040b |
« 00ab |
Ŧ 0166 |
ซ 0e0b |
« 00ab |
ḋ 1e0b |
« 00ab |
« 00ab |
« 00ab |
½ 00bd |
|||||
0xAC | ¬ 00ac |
Ź 0179 |
Ĵ 0134 |
Ŧ 0166 |
Ќ 040c |
، 060c |
¬ 00ac |
Ž 017d |
ฌ 0e0c |
¬ 00ac |
Ỳ 1ef2 |
¬ 00ac |
Ź 0179 |
¬ 00ac |
¬ 00ac |
¼ 00bc |
|||
0xAD | SHY 00ad |
ญ 0e0d |
SHY 00ad |
¡ 00a1 |
|||||||||||||||
0xAE | ® 00ae |
Ž 017d |
Ž 017d |
Ў 040e |
® 00ae |
Ū 016a |
ฎ 0e0e |
® 00ae |
ź 017a |
® 00ae |
® 00ae |
« 00ab |
|||||||
0xAF | ¯ 00af |
Ż 017b |
¯ 00af |
Џ 040f |
― 2015 |
¯ 00af |
Ŋ 014a |
ฏ 0e0f |
Æ 00c6 |
Ÿ 0178 |
¯ 00af |
Ż 017b |
Ż 017b |
¯ 00af |
» 00bb |
||||
0xB0 | ° 00b0 |
А 0410 |
° 00b0 |
ฐ 0e10 |
° 00b0 |
Ḟ 1e1e |
° 00b0 |
░ 2591 |
|||||||||||
0xB1 | ± 00b1 |
ą 0105 |
ħ 0127 |
ą 0105 |
Б 0411 |
± 00b1 |
ą 0105 |
ฑ 0e11 |
± 00b1 |
ḟ 1e1f |
± 00b1 |
± 00b1 |
± 00b1 |
▒ 2592 |
|||||
0xB2 | ² 00b2 |
˛ 02db |
² 00b2 |
˛ 02db |
В 0412 |
² 00b2 |
ē 0113 |
ฒ 0e12 |
² 00b2 |
Ġ 0120 |
² 00b2 |
Č 010c |
˛ 02db |
² 00b2 |
▓ 2593 |
||||
0xB3 | ³ 00b3 |
ł 0142 |
³ 00b3 |
ŗ 0157 |
Г 0413 |
³ 00b3 |
ģ 0123 |
ณ 0e13 |
³ 00b3 |
ġ 0121 |
³ 00b3 |
ł 0142 |
ł 0142 |
³ 00b3 |
│ 2502 |
||||
0xB4 | ´ 00b4 |
Д 0414 |
΄ 0384 |
ī 012b |
ด 0e14 |
“ 201c |
Ṁ 1e40 |
Ž 017d |
´ 00b4 |
┤ 2524 |
|||||||||
0xB5 | µ 00b5 |
ľ 013e |
µ 00b5 |
ĩ 0129 |
Е 0415 |
΅ 0385 |
µ 00b5 |
ĩ 0129 |
ต 0e15 |
µ 00b5 |
ṁ 1e41 |
µ 00b5 |
” 201d |
µ 00b5 |
µ 00b5 |
╡ 2561 |
Á 00c1 |
||
0xB6 | ¶ 00b6 |
ś 015b |
ĥ 0125 |
ļ 013c |
Ж 0416 |
Ά 0386 |
¶ 00b6 |
ķ 0137 |
ถ 0e16 |
¶ 00b6 |
¶ 00b6 |
¶ 00b6 |
╢ 2562 |
 00c2 |
|||||
0xB7 | · 00b7 |
ˇ 02c7 |
· 00b7 |
ˇ 02c7 |
З 0417 |
· 00b7 |
ท 0e17 |
· 00b7 |
Ṗ 1e56 |
· 00b7 |
· 00b7 |
· 00b7 |
╖ 2556 |
À 00c0 |
|||||
0xB8 | ¸ 00b8 |
И 0418 |
Έ 0388 |
¸ 00b8 |
ļ 013c |
ธ 0e18 |
ø 00f8 |
ẁ 1e81 |
ž 017e |
¸ 00b8 |
╕ 2555 |
© 00a9 |
|||||||
0xB9 | ¹ 00b9 |
š 0161 |
ı 0131 |
š 0161 |
Й 0419 |
Ή 0389 |
¹ 00b9 |
đ 0111 |
น 0e19 |
¹ 00b9 |
ṗ 1e57 |
¹ 00b9 |
č 010d |
ą 0105 |
¹ 00b9 |
╣ 2563 |
|||
0xBA | º 00ba |
ş 015f |
ē 0113 |
К 041a |
Ί 038a |
÷ 00f7 |
º 00ba |
š 0161 |
บ 0e1a |
ŗ 0157 |
ẃ 1e83 |
º 00ba |
ș 0219 |
ş 015f |
º 00ba |
║ 2551 |
|||
0xBB | » 00bb |
ť 0165 |
ğ 011f |
ģ 0123 |
Л 041b |
؛ 061b |
» 00bb |
ŧ 0167 |
ป 0e1b |
» 00bb |
Ṡ 1e60 |
» 00bb |
» 00bb |
» 00bb |
╗ 2557 |
||||
0xBC | ¼ 00bc |
ź 017a |
ĵ 0135 |
ŧ 0167 |
М 041c |
Ό 038c |
¼ 00bc |
ž 017e |
ผ 0e1c |
¼ 00bc |
ỳ 1ef3 |
Œ 0152 |
Ľ 013d |
¼ 00bc |
╝ 255d |
||||
0xBD | ½ 00bd |
˝ 02dd |
½ 00bd |
Ŋ 014a |
Н 041d |
½ 00bd |
― 2015 |
ฝ 0e1d |
½ 00bd |
Ẅ 1e84 |
œ 0153 |
˝ 02dd |
½ 00bd |
╜ 255c |
¢ 00a2 |
||||
0xBE | ¾ 00be |
ž 017e |
ž 017e |
О 041e |
Ύ 038e |
¾ 00be |
ū 016b |
พ 0e1e |
¾ 00be |
ẅ 1e85 |
Ÿ 0178 |
ľ 013e |
¾ 00be |
╛ 255b |
¥ 00a5 |
||||
0xBF | ¿ 00bf |
ż 017c |
ŋ 014b |
П 041f |
؟ 061f |
Ώ 038f |
¿ 00bf |
ŋ 014b |
ฟ 0e1f |
æ 00e6 |
ṡ 1e61 |
¿ 00bf |
ż 017c |
ż 017c |
¿ 00bf |
┐ 2510 |
|||
0xC0 | À 00c0 |
ş 015f |
À 00c0 |
Ā 0100 |
Р 0420 |
ΐ 0390 |
À 00c0 |
Ā 0100 |
ภ 0e20 |
Ą 0104 |
À 00c0 |
ş 015f |
À 00c0 |
└ 2514 |
|||||
0xC1 | Á 00c1 |
С 0421 |
ء 0621 |
Α 0391 |
Á 00c1 |
ม 0e21 |
Į 012e |
Á 00c1 |
┴ 2534 |
||||||||||
0xC2 | Â 00c2 |
Т 0422 |
آ 0622 |
Β 0392 |
 00c2 |
ย 0e22 |
Ā 0100 |
 00c2 |
┬ 252c |
||||||||||
0xC3 | Ã 00c3 |
Ă 0102 |
à 00c3 |
У 0423 |
أ 0623 |
Γ 0393 |
à 00c3 |
ร 0e23 |
Ć 0106 |
à 00c3 |
Ă 0102 |
Ă 0102 |
├ 251c |
||||||
0xC4 | Ä 00c4 |
Ф 0424 |
ؤ 0624 |
Δ 0394 |
Ä 00c4 |
ฤ 0e24 |
Ä 00c4 |
─ 2500 |
|||||||||||
0xC5 | Å 00c5 |
Ĺ 0139 |
Ċ 010a |
Å 00c5 |
Х 0425 |
إ 0625 |
Ε 0395 |
Å 00c5 |
ล 0e25 |
Å 00c5 |
Ć 0106 |
Ĺ 0139 |
Å 00c5 |
┼ 253c |
|||||
0xC6 | Æ 00c6 |
Ć 0106 |
Ĉ 0108 |
Æ 00c6 |
Ц 0426 |
ئ 0626 |
Ζ 0396 |
Æ 00c6 |
ฦ 0e26 |
Ę 0118 |
Æ 00c6 |
Ć 0106 |
Æ 00c6 |
╞ 255e |
ã 00e3 |
||||
0xC7 | Ç 00c7 |
Į 012e |
Ч 0427 |
ا 0627 |
Η 0397 |
Ç 00c7 |
Į 012e |
ว 0e27 |
Ē 0112 |
Ç 00c7 |
╟ 255f |
à 00c3 |
|||||||
0xC8 | È 00c8 |
Č 010c |
È 00c8 |
Č 010c |
Ш 0428 |
ب 0628 |
Θ 0398 |
È 00c8 |
Č 010c |
ศ 0e28 |
Č 010c |
È 00c8 |
Č 010c |
È 00c8 |
╚ 255a |
||||
0xC9 | É 00c9 |
Щ 0429 |
ة 0629 |
Ι 0399 |
É 00c9 |
ษ 0e29 |
É 00c9 |
╔ 2554 |
|||||||||||
0xCA | Ê 00ca |
Ę 0118 |
Ê 00ca |
Ę 0118 |
Ъ 042a |
ت 062a |
Κ 039a |
Ê 00ca |
Ę 0118 |
ส 0e2a |
Ź 0179 |
Ê 00ca |
Ę 0118 |
Ê 00ca |
╩ 2569 |
||||
0xCB | Ë 00cb |
Ы 042b |
ث 062b |
Λ 039b |
Ë 00cb |
ห 0e2b |
Ė 0116 |
Ë 00cb |
╦ 2566 |
||||||||||
0xCC | Ì 00cc |
Ě 011a |
Ì 00cc |
Ė 0116 |
Ь 042c |
ج 062c |
Μ 039c |
Ì 00cc |
Ė 0116 |
ฬ 0e2c |
Ģ 0122 |
Ì 00cc |
Ě 011a |
Ì 00cc |
╠ 2560 |
||||
0xCD | Í 00cd |
Э 042d |
ح 062d |
Ν 039d |
Í 00cd |
อ 0e2d |
Ķ 0136 |
Í 00cd |
═ 2550 |
||||||||||
0xCE | Î 00ce |
Ю 042e |
خ 062e |
Ξ 039e |
Î 00ce |
ฮ 0e2e |
Ī 012a |
Î 00ce |
╬ 256c |
||||||||||
0xCF | Ï 00cf |
Ď 010e |
Ï 00cf |
Ī 012a |
Я 042f |
د 062f |
Ο 039f |
Ï 00cf |
ฯ 0e2f |
Ļ 013b |
Ï 00cf |
Ď 010e |
Ï 00cf |
╧ 2567 |
¤ 00a4 |
||||
0xD0 | Ð 00d0 |
Đ 0110 |
Đ 0110 |
а 0430 |
ذ 0630 |
Π 03a0 |
Ğ 011e |
Ð 00d0 |
ะ 0e30 |
Š 0160 |
Ŵ 0174 |
Ð 00d0 |
Đ 0110 |
Đ 0110 |
Ð 00d0 |
╨ 2568 |
ð 00f0 |
||
0xD1 | Ñ 00d1 |
Ń 0143 |
Ñ 00d1 |
Ņ 0145 |
б 0431 |
ر 0631 |
Ρ 03a1 |
Ñ 00d1 |
Ņ 0145 |
ั 0e31 |
Ń 0143 |
Ñ 00d1 |
Ń 0143 |
Ń 0143 |
Ñ 00d1 |
╤ 2564 |
Ð 00d0 |
||
0xD2 | Ò 00d2 |
Ň 0147 |
Ò 00d2 |
Ō 014c |
в 0432 |
ز 0632 |
Ò 00d2 |
Ō 014c |
า 0e32 |
Ņ 0145 |
Ò 00d2 |
Ň 0147 |
Ò 00d2 |
╥ 2565 |
Ê 00ca |
||||
0xD3 | Ó 00d3 |
Ķ 0136 |
г 0433 |
س 0633 |
Σ 03a3 |
Ó 00d3 |
ำ 0e33 |
Ó 00d3 |
╙ 2559 |
Ë 00cb |
|||||||||
0xD4 | Ô 00d4 |
д 0434 |
ش 0634 |
Τ 03a4 |
Ô 00d4 |
ิ 0e34 |
Ō 014c |
Ô 00d4 |
╘ 2558 |
È 00c8 |
|||||||||
0xD5 | Õ 00d5 |
Ő 0150 |
Ġ 0120 |
Õ 00d5 |
е 0435 |
ص 0635 |
Υ 03a5 |
Õ 00d5 |
ี 0e35 |
Õ 00d5 |
Ő 0150 |
Ő 0150 |
Õ 00d5 |
╒ 2552 |
ı 0131 |
||||
0xD6 | Ö 00d6 |
ж 0436 |
ض 0636 |
Φ 03a6 |
Ö 00d6 |
ึ 0e36 |
Ö 00d6 |
╓ 2553 |
Í 00cd |
||||||||||
0xD7 | × 00d7 |
з 0437 |
ط 0637 |
Χ 03a7 |
× 00d7 |
Ũ 0168 |
ื 0e37 |
× 00d7 |
Ṫ 1e6a |
× 00d7 |
Ś 015a |
× 00d7 |
╫ 256b |
Î 00ce |
|||||
0xD8 | Ø 00d8 |
Ř 0158 |
Ĝ 011c |
Ø 00d8 |
и 0438 |
ظ 0638 |
Ψ 03a8 |
Ø 00d8 |
ุ 0e38 |
Ų 0172 |
Ø 00d8 |
Ű 0170 |
Ř 0158 |
Ø 00d8 |
╪ 256a |
Ï 00cf |
|||
0xD9 | Ù 00d9 |
Ů 016e |
Ù 00d9 |
Ų 0172 |
й 0439 |
ع 0639 |
Ω 03a9 |
Ù 00d9 |
Ų 0172 |
ู 0e39 |
Ł 0141 |
Ù 00d9 |
┘ 2518 |
||||||
0xDA | Ú 00da |
к 043a |
غ 063a |
Ϊ 03aa |
Ú 00da |
ฺ 0e3a |
Ś 015a |
Ú 00da |
┌ 250c |
||||||||||
0xDB | Û 00db |
Ű 0170 |
Û 00db |
л 043b |
Ϋ 03ab |
Û 00db |
Ū 016a |
Û 00db |
█ 2588 |
||||||||||
0xDC | Ü 00dc |
м 043c |
ά 03ac |
Ü 00dc |
Ü 00dc |
▄ 2584 |
|||||||||||||
0xDD | Ý 00dd |
Ŭ 016c |
Ũ 0168 |
н 043d |
έ 03ad |
İ 0130 |
Ý 00dd |
Ż 017b |
Ý 00dd |
Ę 0118 |
Ý 00dd |
▌ 258c |
¦ 00a6 |
||||||
0xDE | Þ 00de |
Ţ 0162 |
Ŝ 015c |
Ū 016a |
о 043e |
ή 03ae |
Ş 015e |
Þ 00de |
Ž 017d |
Ŷ 0176 |
Þ 00de |
Ț 021a |
Ţ 0162 |
Þ 00de |
▐ 2590 |
Ì 00cc |
|||
0xDF | ß 00df |
п 043f |
ί 03af |
‗ 2017 |
ß 00df |
฿ 0e3f |
ß 00df |
▀ 2580 |
|||||||||||
0xE0 | à 00e0 |
ŕ 0155 |
à 00e0 |
ā 0101 |
р 0440 |
ـ 0640 |
ΰ 03b0 |
א 05d0 |
à 00e0 |
ā 0101 |
เ 0e40 |
ą 0105 |
à 00e0 |
ŕ 0155 |
à 00e0 |
α 03b1 |
Ó 00d3 |
||
0xE1 | á 00e1 |
с 0441 |
ف 0641 |
α 03b1 |
ב 05d1 |
á 00e1 |
แ 0e41 |
į 012f |
á 00e1 |
ß 00df |
|||||||||
0xE2 | â 00e2 |
т 0442 |
ق 0642 |
β 03b2 |
ג 05d2 |
â 00e2 |
โ 0e42 |
ā 0101 |
â 00e2 |
Γ 0393 |
Ô 00d4 |
||||||||
0xE3 | ã 00e3 |
ă 0103 |
ã 00e3 |
у 0443 |
ك 0643 |
γ 03b3 |
ד 05d3 |
ã 00e3 |
ใ 0e43 |
ć 0107 |
ã 00e3 |
ă 0103 |
ă 0103 |
ã 00e3 |
π 03c0 |
Ò 00d2 |
|||
0xE4 | ä 00e4 |
ф 0444 |
ل 0644 |
δ 03b4 |
ה 05d4 |
ä 00e4 |
ไ 0e44 |
ä 00e4 |
Σ 03a3 |
õ 00f5 |
|||||||||
0xE5 | å 00e5 |
ĺ 013a |
ċ 010b |
å 00e5 |
х 0445 |
م 0645 |
ε 03b5 |
ו 05d5 |
å 00e5 |
ๅ 0e45 |
å 00e5 |
ć 0107 |
ĺ 013a |
å 00e5 |
σ 03c3 |
Õ 00d5 |
|||
0xE6 | æ 00e6 |
ć 0107 |
ĉ 0109 |
æ 00e6 |
ц 0446 |
ن 0646 |
ζ 03b6 |
ז 05d6 |
æ 00e6 |
ๆ 0e46 |
ę 0119 |
æ 00e6 |
ć 0107 |
æ 00e6 |
µ 00b5 |
||||
0xE7 | ç 00e7 |
į 012f |
ч 0447 |
ه 0647 |
η 03b7 |
ח 05d7 |
ç 00e7 |
į 012f |
็ 0e47 |
ē 0113 |
ç 00e7 |
τ 03c4 |
þ 00fe |
||||||
0xE8 | è 00e8 |
č 010d |
è 00e8 |
č 010d |
ш 0448 |
و 0648 |
θ 03b8 |
ט 05d8 |
è 00e8 |
č 010d |
่ 0e48 |
č 010d |
è 00e8 |
č 010d |
è 00e8 |
Φ 03a6 |
Þ 00de |
||
0xE9 | é 00e9 |
щ 0449 |
ى 0649 |
ι 03b9 |
י 05d9 |
é 00e9 |
้ 0e49 |
é 00e9 |
Θ 0398 |
Ú 00da |
|||||||||
0xEA | ê 00ea |
ę 0119 |
ê 00ea |
ę 0119 |
ъ 044a |
ي 064a |
κ 03ba |
ך 05da |
ê 00ea |
ę 0119 |
๊ 0e4a |
ź 017a |
ê 00ea |
ę 0119 |
ê 00ea |
Ω 03a9 |
Û 00db |
||
0xEB | ë 00eb |
ы 044b |
ً 064b |
λ 03bb |
כ 05db |
ë 00eb |
๋ 0e4b |
ė 0117 |
ë 00eb |
δ 03b4 |
Ù 00d9 |
||||||||
0xEC | ì 00ec |
ě 011b |
ì 00ec |
ė 0117 |
ь 044c |
ٌ 064c |
μ 03bc |
ל 05dc |
ì 00ec |
ė 0117 |
์ 0e4c |
ģ 0123 |
ì 00ec |
ě 011b |
ì 00ec |
∞ 221e |
ý 00fd |
||
0xED | í 00ed |
э 044d |
ٍ 064d |
ν 03bd |
ם 05dd |
í 00ed |
ํ 0e4d |
ķ 0137 |
í 00ed |
φ 03c6 |
Ý 00dd |
||||||||
0xEE | î 00ee |
ю 044e |
َ 064e |
ξ 03be |
מ 05de |
î 00ee |
๎ 0e4e |
ī 012b |
î 00ee |
ε 03b5 |
¯ 00af |
||||||||
0xEF | ï 00ef |
ď 010f |
ï 00ef |
ī 012b |
я 044f |
ُ 064f |
ο 03bf |
ן 05df |
ï 00ef |
๏ 0e4f |
ļ 013c |
ï 00ef |
ď 010f |
ï 00ef |
∩ 2229 |
´ 00b4 |
|||
0xF0 | ð 00f0 |
đ 0111 |
đ 0111 |
№ 2116 |
ِ 0650 |
π 03c0 |
נ 05e0 |
ğ 011f |
ð 00f0 |
๐ 0e50 |
š 0161 |
ŵ 0175 |
ð 00f0 |
đ 0111 |
đ 0111 |
ð 00f0 |
≡ 2261 |
SHY 00ad |
|
0xF1 | ñ 00f1 |
ń 0144 |
ñ 00f1 |
ņ 0146 |
ё 0451 |
ّ 0651 |
ρ 03c1 |
ס 05e1 |
ñ 00f1 |
ņ 0146 |
๑ 0e51 |
ń 0144 |
ñ 00f1 |
ń 0144 |
ń 0144 |
ñ 00f1 |
± 00b1 |
||
0xF2 | ò 00f2 |
ň 0148 |
ò 00f2 |
ō 014d |
ђ 0452 |
ْ 0652 |
ς 03c2 |
ע 05e2 |
ò 00f2 |
ō 014d |
๒ 0e52 |
ņ 0146 |
ò 00f2 |
ň 0148 |
ò 00f2 |
≥ 2265 |
‗ 2017 |
||
0xF3 | ó 00f3 |
ķ 0137 |
ѓ 0453 |
σ 03c3 |
ף 05e3 |
ó 00f3 |
๓ 0e53 |
ó 00f3 |
≤ 2264 |
¾ 00be |
|||||||||
0xF4 | ô 00f4 |
є 0454 |
τ 03c4 |
פ 05e4 |
ô 00f4 |
๔ 0e54 |
ō 014d |
ô 00f4 |
⌠ 2320 |
¶ 00b6 |
|||||||||
0xF5 | õ 00f5 |
ő 0151 |
ġ 0121 |
õ 00f5 |
ѕ 0455 |
υ 03c5 |
ץ 05e5 |
õ 00f5 |
๕ 0e55 |
õ 00f5 |
ő 0151 |
ő 0151 |
õ 00f5 |
⌡ 2321 |
§ 00a7 |
||||
0xF6 | ö 00f6 |
і 0456 |
φ 03c6 |
צ 05e6 |
ö 00f6 |
๖ 0e56 |
ö 00f6 |
÷ 00f7 |
|||||||||||
0xF7 | ÷ 00f7 |
ї 0457 |
χ 03c7 |
ק 05e7 |
÷ 00f7 |
ũ 0169 |
๗ 0e57 |
÷ 00f7 |
ṫ 1e6b |
÷ 00f7 |
ś 015b |
÷ 00f7 |
≈ 2248 |
¸ 00b8 |
|||||
0xF8 | ø 00f8 |
ř 0159 |
ĝ 011d |
ø 00f8 |
ј 0458 |
ψ 03c8 |
ר 05e8 |
ø 00f8 |
๘ 0e58 |
ų 0173 |
ø 00f8 |
ű 0171 |
ř 0159 |
ø 00f8 |
° 00b0 |
||||
0xF9 | ù 00f9 |
ů 016f |
ù 00f9 |
ų 0173 |
љ 0459 |
ω 03c9 |
ש 05e9 |
ù 00f9 |
ų 0173 |
๙ 0e59 |
ł 0142 |
ù 00f9 |
ů 016f |
ù 00f9 |
∙ 2219 |
¨ 00a8 |
|||
0xFA | ú 00fa |
њ 045a |
ϊ 03ca |
ת 05ea |
ú 00fa |
๚ 0e5a |
ś 015b |
ú 00fa |
· 00b7 |
||||||||||
0xFB | û 00fb |
ű 0171 |
û 00fb |
ћ 045b |
ϋ 03cb |
û 00fb |
๛ 0e5b |
ū 016b |
û 00fb |
ű 0171 |
û 00fb |
√ 221a |
¹ 00b9 |
||||||
0xFC | ü 00fc |
ќ 045c |
ό 03cc |
ü 00fc |
ü 00fc |
ⁿ 207f |
³ 00b3 |
||||||||||||
0xFD | ý 00fd |
ŭ 016d |
ũ 0169 |
§ 00a7 |
ύ 03cd |
LRM 200e |
ı 0131 |
ý 00fd |
ż 017c |
ý 00fd |
ę 0119 |
ý 00fd |
² 00b2 |
||||||
0xFE | þ 00fe |
ţ 0163 |
ŝ 015d |
ū 016b |
ў 045e |
ώ 03ce |
RLM 200f |
ş 015f |
þ 00fe |
ž 017e |
ŷ 0177 |
þ 00fe |
ț 021b |
ţ 0163 |
þ 00fe |
■ 25a0 |
|||
0xFF | ÿ 00ff |
˙ 02d9 |
џ 045f |
ÿ 00ff |
ĸ 0138 |
’ 2019 |
ÿ 00ff |
˙ 02d9 |
ÿ 00ff |
NBSP 00a0 |
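The elimination approach suggested for the 0xa0 to 0xff range can be sketched as a trial decode: decode the same bytes with each candidate and look at which results are plausible. This is an illustration under assumptions, not a detection algorithm; the function name `trial_decode` and the candidate list are made up for the example, and judging plausibility is still left to the reader.

```python
def trial_decode(data, encodings):
    """Decode data with each candidate; None marks an undecodable result."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None   # some byte is unassigned in this encoding
    return results

sample = b"Gr\xf6\xdfe"   # 0xf6 and 0xdf both lie in the ambiguous range
candidates = ["iso8859_1", "iso8859_5", "cp437", "cp850"]
for enc, text in trial_decode(sample, candidates).items():
    print(f"{enc:10} {text!r}")
```

Only the ISO-8859-1 reading gives a meaningful German word here; the Cyrillic and IBM readings mix Latin letters with unrelated symbols and can be struck off.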
The Latin-1, Latin-9 and Windows-1252 encodings are very difficult to tell apart, especially if the file in question has only a few characters in the "differing" range. The following table therefore lists the characters in which Latin-1 and Latin-9 differ, together with their encodings in Windows-1252, Unicode and UTF-8.
Character | € | Š | š | Ž | ž | Œ | œ | Ÿ | ¤ | ¦ | ¨ | ´ | ¸ | ¼ | ½ | ¾ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ISO 8859-1 | --- | --- | --- | --- | --- | --- | --- | --- | 0xA4 | 0xA6 | 0xA8 | 0xB4 | 0xB8 | 0xBC | 0xBD | 0xBE |
ISO 8859-15 | 0xA4 | 0xA6 | 0xA8 | 0xB4 | 0xB8 | 0xBC | 0xBD | 0xBE | --- | --- | --- | --- | --- | --- | --- | --- |
Windows-1252 | 0x80 | 0x8A | 0x9A | 0x8E | 0x9E | 0x8C | 0x9C | 0x9F | 0xA4 | 0xA6 | 0xA8 | 0xB4 | 0xB8 | 0xBC | 0xBD | 0xBE |
Unicode | U+20ac | U+0160 | U+0161 | U+017d | U+017e | U+0152 | U+0153 | U+0178 | U+00a4 | U+00a6 | U+00a8 | U+00b4 | U+00b8 | U+00bc | U+00bd | U+00be |
UTF-8 | e2 82 ac | c5 a0 | c5 a1 | c5 bd | c5 be | c5 92 | c5 93 | c5 b8 | c2 a4 | c2 a6 | c2 a8 | c2 b4 | c2 b8 | c2 bc | c2 bd | c2 be |
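The eight differing byte positions in the table can be derived by comparing the two codecs byte by byte. The sketch below assumes Python's `iso8859_1`, `iso8859_15` and `cp1252` codecs; the function name `differing_bytes` is illustrative, and it only works for encodings that assign a character to every byte value.

```python
def differing_bytes(enc_a, enc_b):
    """List byte values that decode differently in two 8-bit encodings."""
    return [b for b in range(256)
            if bytes([b]).decode(enc_a) != bytes([b]).decode(enc_b)]

for b in differing_bytes("iso8859_1", "iso8859_15"):
    raw = bytes([b])
    # Columns: byte, Latin-1 reading, Latin-9 reading, Windows-1252 reading
    print(f"0x{b:02X}", raw.decode("iso8859_1"),
          raw.decode("iso8859_15"), raw.decode("cp1252"))
```

In these eight positions Windows-1252 agrees with Latin-1, so a file that uses them for € or Š is Latin-9, while the same characters at 0x80 or 0x8A point to Windows-1252.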
If your environment forces you to state an encoding using codepage numbers, but you want to use ISO/IEC Latin encodings, here is a list of how Windows refers to those standard encodings:
ISO/IEC Latin | Windows Codepage |
---|---|
ISO 8859-1 (Latin-1 Western European) | Windows-28591 |
ISO 8859-2 (Latin-2 Central European) | Windows-28592 |
ISO 8859-3 (Latin-3 South European) | Windows-28593 |
ISO 8859-4 (Latin-4 North European) | Windows-28594 |
ISO 8859-5 (Latin / Cyrillic) | Windows-28595 |
ISO 8859-6 (Latin / Arabic) | Windows-28596 |
ISO 8859-7 (Latin / Greek) | Windows-28597 |
ISO 8859-8 (Latin / Hebrew) | Windows-28598 |
ISO 8859-9 (Latin-5 Turkish) | Windows-28599 |
ISO 8859-10 (Latin-6 Nordic) | Windows-28600 |
ISO 8859-11 (Latin / Thai) | Windows-874 comes close but is not identical... |
ISO 8859-13 (Latin-7 Baltic Rim) | Windows-28603 |
ISO 8859-14 (Latin-8 Celtic) | Windows-28604 |
ISO 8859-15 (Latin-9) | Windows-28605 |
ISO 8859-16 (Latin-10 South-Eastern European) | Windows-28606 (?) |
Many people have trouble understanding that UTF-16 is not a fixed-width ("wide") encoding. Just as with UTF-8, a single code point can require more than one code unit (up to four 8-bit code units in UTF-8, up to two 16-bit code units in UTF-16). The only truly fixed-width encoding is UTF-32, which takes 32 bits for every code point.
To showcase this, here are some example characters and their respective encodings. All code units are given in hexadecimal.
Character | a | ä | Š | € | U+2F929 |
---|---|---|---|---|---|
ISO-8859-15 | 61 | e4 | a6 | a4 | --- |
UTF-32 | 00000061 | 000000e4 | 00000160 | 000020ac | 0002f929 |
UTF-16 | 0061 | 00e4 | 0160 | 20ac | d87e dd29 |
UTF-8 | 61 | c3 a4 | c5 a0 | e2 82 ac | f0 af a4 a9 |
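The rows of this table can be reproduced with a few lines of Python. The sketch uses the big-endian codec variants (`utf-32-be`, `utf-16-be`) so that no byte order mark is added to the output; the helper name `hex_units` is an assumption for this example.

```python
def hex_units(ch, encoding):
    """Encode one character and return the resulting bytes as a hex string."""
    return ch.encode(encoding).hex()

# a, ä, Š, € and the supplementary-plane code point U+2F929
for ch in ("a", "\u00e4", "\u0160", "\u20ac", "\U0002f929"):
    print(f"U+{ord(ch):04x}",
          hex_units(ch, "utf-32-be"),
          hex_units(ch, "utf-16-be"),
          hex_units(ch, "utf-8"))
```

The last character is the only one that needs two 16-bit code units in UTF-16 and four bytes in UTF-8.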
And then you still have to consider that a code point does not necessarily equal a character, and that a given character does not imply a specific code point.
The character Ü, for example, can be encoded as either U+00dc (LATIN CAPITAL LETTER U WITH DIAERESIS), or as the sequence of the two code points U+0055 U+0308 (LATIN CAPITAL LETTER U, COMBINING DIAERESIS).
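The two spellings of Ü compare as unequal unless they are normalized first. The following sketch uses `unicodedata.normalize` from the Python standard library; the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00dc"     # LATIN CAPITAL LETTER U WITH DIAERESIS
decomposed = "U\u0308"     # LATIN CAPITAL LETTER U + COMBINING DIAERESIS

print(precomposed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
print(len(precomposed), len(decomposed))                         # 1 2
```

Normalizing both sides to the same form (NFC or NFD) before comparing or searching avoids treating the same visible character as two different strings.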
With UTF-16 and UTF-32, you also have to consider endianness. To enable parsers to determine the endianness of a text automatically, a special character can be put at the beginning: the Byte Order Mark (BOM, code point U+feff). If no BOM exists, big-endian should be assumed.
UTF-8 does not have this issue, and a BOM is not required.
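A BOM check can be sketched with the constants from Python's `codecs` module. The function name `sniff_bom` and the table `_BOMS` are assumptions for this example. The UTF-32 marks are tested first because the little-endian UTF-32 BOM begins with the same two bytes as the little-endian UTF-16 BOM.

```python
import codecs

# Longer BOMs first: the UTF-32 LE mark starts with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def sniff_bom(data):
    """Return (encoding, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(sniff_bom(b"\xfe\xff\x00a"))   # ('utf-16-be', 2)
print(sniff_bom(b"abc"))             # (None, 0)
```

One ambiguity remains: little-endian UTF-16 text whose first character after the BOM is U+0000 looks exactly like a UTF-32 BOM, so the result is a strong hint, not a proof.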
Rule of thumb when looking at UTF-8: Any byte value of 0x80 or higher is part of a multi-byte sequence.
Rule of thumb when looking at UTF-16: Any 16-bit value between 0xd800 and 0xdfff is part of a surrogate pair.
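Both rules of thumb translate into simple range checks. The sketch below classifies UTF-8 bytes and combines a UTF-16 surrogate pair into its code point. The function names are hypothetical, and the byte classification is deliberately coarse: it does not reject values such as 0xc0, 0xc1 or 0xf5 to 0xff, which never occur in valid UTF-8.

```python
def utf8_byte_kind(b):
    """Classify one byte of a UTF-8 stream (coarse, no validity check)."""
    if b < 0x80:
        return "ascii"
    if b < 0xC0:
        return "continuation"   # 0x80-0xbf: second or later byte of a sequence
    return "lead"               # 0xc0 and above: first byte of a sequence

def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print([utf8_byte_kind(b) for b in "\u20ac".encode("utf-8")])
print(hex(combine_surrogates(0xD87E, 0xDD29)))   # 0x2f929
```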