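Besides the Unicode and ASCII (American Standard Code for Information Interchange) character encodings, you will sometimes run into EBCDIC when exchanging data with mainframe computers:
EBCDIC (Extended Binary Coded Decimal Interchange Code)
For letters, digits, symbols, and control characters, Unicode is identical to ASCII, so no special conversion is needed. EBCDIC is a different story:
Today we tackle EBCDIC To ASCII first.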
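Character use | Encoding system (mainframe) | Encoding system |
English (Latin) letters, Arabic numerals, symbols, control characters | EBCDIC | ASCII |
Characters of other languages | NHC | BIG-5, Unicode |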
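The statement that Unicode matches ASCII for letters, digits, symbols, and control characters can be checked in a few lines. This is a minimal sketch, assuming UTF-8 as the Unicode encoding form (UTF-16 shares the same code points but not the same bytes); the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiUnicodeCheck
{
    // Returns true when the ASCII and UTF-8 encodings of the text
    // are byte-for-byte identical.
    public static bool SameBytes(string text)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return ascii.SequenceEqual(utf8);
    }
}
```

For text made only of ASCII-range characters, such as "AB ab 12 :#@", SameBytes returns true; for a character outside that range it returns false.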
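- Following the VB version in this KB article, I rewrote it as a .NET C# version.
- Someone on Stack Overflow also offers a simple Encoding.Convert approach, so let's compare the two.
2016/7/22 Update: I had pasted the wrong conversion table here; it should be EBCDIC To ASCII.
1. First, following the VB KB article (which declares both an ASCII To EBCDIC and an EBCDIC To ASCII table), declare the EBCDIC To ASCII conversion table. Encoding.Convert is built into .NET, so for that approach we only need to specify the encodings later.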
static byte[] EbcdicToAscii = new byte[]{
0X00,0X01,0X02,0X03,0X9C,0X09,0X86,0X7F,0X97,0X8D,0X8E,0X0B,0X0C,0X0D,0X0E,0X0F,
0X10,0X11,0X12,0X13,0X9D,0X85,0X08,0X87,0X18,0X19,0X92,0X8F,0X1C,0X1D,0X1E,0X1F,
0X80,0X81,0X82,0X83,0X84,0X0A,0X17,0X1B,0X88,0X89,0X8A,0X8B,0X8C,0X05,0X06,0X07,
0X90,0X91,0X16,0X93,0X94,0X95,0X96,0X04,0X98,0X99,0X9A,0X9B,0X14,0X15,0X9E,0X1A,
0X20,0XA0,0XA1,0XA2,0XA3,0XA4,0XA5,0XA6,0XA7,0XA8,0XA2,0X2E,0X3C,0X28,0X2B,0X7C,
0X26,0XA9,0XAA,0XAB,0XAC,0XAD,0XAE,0XAF,0XB0,0XB1,0X21,0X24,0X2A,0X29,0X3B,0X5E,
0X2D,0X2F,0XB2,0XB3,0XB4,0XB5,0XB6,0XB7,0XB8,0XB9,0X7C,0X2C,0X25,0X5F,0X3E,0X3F,
0XBA,0XBB,0XBC,0XBD,0XBE,0XBF,0XC0,0XC1,0XC2,0X60,0X3A,0X23,0X40,0X27,0X3D,0X22,
0XC3,0X61,0X62,0X63,0X64,0X65,0X66,0X67,0X68,0X69,0XC4,0XC5,0XC6,0XC7,0XC8,0XC9,
0XCA,0X6A,0X6B,0X6C,0X6D,0X6E,0X6F,0X70,0X71,0X72,0XCB,0XCC,0XCD,0XCE,0XCF,0XD0,
0XD1,0X7E,0X73,0X74,0X75,0X76,0X77,0X78,0X79,0X7A,0XD2,0XD3,0XD4,0X5B,0XD6,0XD7,
0XD8,0XD9,0XDA,0XDB,0XDC,0XDD,0XDE,0XDF,0XE0,0XE1,0X5B,0X5D,0XE4,0X5D,0XE6,0XE7,
0X7B,0X41,0X42,0X43,0X44,0X45,0X46,0X47,0X48,0X49,0XE8,0XE9,0XEA,0XEB,0XEC,0XED,
0X7D,0X4A,0X4B,0X4C,0X4D,0X4E,0X4F,0X50,0X51,0X52,0XEE,0XEF,0XF0,0XF1,0XF2,0XF3,
0X5C,0X9F,0X53,0X54,0X55,0X56,0X57,0X58,0X59,0X5A,0XF4,0XF5,0XF6,0XF7,0XF8,0XF9,
0X30,0X31,0X32,0X33,0X34,0X35,0X36,0X37,0X38,0X39,0XFA,0XFB,0XFC,0XFD,0XFE,0XFF
};
2. Create conversion method 1 (table lookup)
public static byte[] ConvertEbcdicToAscii(byte[] ebcdicData)
{
//Declare the output byte array
Byte[] OutByte = new byte[ebcdicData.Length];
//Convert byte by byte using the mapping table
for (int i = 0; i < ebcdicData.Length; i++)
{
OutByte[i] = EbcdicToAscii[(int)ebcdicData[i]];
}
return OutByte;
}
3. Create conversion method 2 (yes, it really is this simple)
public static byte[] ConvertEbcdicToAscii2(byte[] ebcdicData)
{
//Target encoding: ASCII
Encoding ascii = Encoding.ASCII;
//Source encoding: IBM037 (an EBCDIC code page)
Encoding ebcdic = Encoding.GetEncoding("IBM037");
//Return the ASCII data
return Encoding.Convert(ebcdic, ascii, ebcdicData);
}
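One caveat for method 2: on .NET Core and .NET 5 or later, code page encodings such as IBM037 are not available until the code pages provider has been registered, and Encoding.GetEncoding("IBM037") fails without it. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and on .NET Framework the registration step should not be needed.

```csharp
using System;
using System.Text;

public static class Ibm037Sketch
{
    // Assumption: running on .NET Core / .NET 5+, where IBM037 only becomes
    // available after registering CodePagesEncodingProvider
    // (older targets may need the System.Text.Encoding.CodePages package).
    public static Encoding GetIbm037()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("IBM037");
    }

    // Same idea as ConvertEbcdicToAscii2, with the provider registered first.
    public static byte[] EbcdicToAscii(byte[] ebcdicData)
    {
        return Encoding.Convert(GetIbm037(), Encoding.ASCII, ebcdicData);
    }
}
```

In a real application the registration would normally be done once at startup rather than on every call.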
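4. First, test simple upper- and lowercase English (Latin) letters, Arabic numerals, and a few simple symbols
According to the EBCDIC table:
According to the ASCII table (the Mars mission movie uses it too!)
Expected conversion results:
Character | EBCDIC | ASCII |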
A | 0xC1 | 0x41 |
B | 0xC2 | 0x42 |
a | 0x81 | 0x61 |
b | 0x82 | 0x62 |
1 | 0xF1 | 0x31 |
2 | 0xF2 | 0x32 |
: | 0x7A | 0x3A |
# | 0x7B | 0x23 |
@ | 0x7C | 0x40 |
Test code
//EBCDIC C1:[A] C2:[B]
//EBCDIC 81:[a] 82:[b]
//EBCDIC F0:[0] F1:[1] F2:[2]
//EBCDIC 7A:[:] 7B:[#] 7C:[@]
byte[] MyEBCDICbyte = new byte[] { 0xC1, 0xC2, 0x81, 0x82, 0xF0, 0xF1, 0xF2, 0x7A, 0x7B, 0x7C };
//Conversion method (1): table lookup, modeled on the VB version
byte[] MyASCIIByte = ConvertEbcdicToAscii(MyEBCDICbyte);
//Conversion method (2): Encoding.Convert
byte[] MyASCIIByte2 = ConvertEbcdicToAscii2(MyEBCDICbyte);
Console.WriteLine("EBCDIC: {0}", MyEBCDICbyte.BToHex());
Console.WriteLine("ASCII 1: {0}", MyASCIIByte.BToHex());
Console.WriteLine("ASCII 2: {0}", MyASCIIByte2.BToHex());
Assert.AreEqual(MyASCIIByte.BToHex(), MyASCIIByte2.BToHex());
Add a Byte To Hex extension method to display the content as a hexadecimal (hex) string:
public static string BToHex(this byte[] Bdata)
{
return BitConverter.ToString(Bdata).Replace("-", "");
}
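C# only accepts an extension method when it is declared inside a non-nested, non-generic static class, so the method above needs a host class. A minimal sketch, where the class name ByteArrayExtensions is an assumption for illustration:

```csharp
using System;

public static class ByteArrayExtensions
{
    // BitConverter.ToString yields "C1-C2-81"; stripping the dashes
    // leaves a compact hex string such as "C1C281".
    public static string BToHex(this byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", "");
    }
}
```

With this in scope, new byte[] { 0xC1, 0xC2 }.BToHex() returns "C1C2".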
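Test result: both methods convert correctly
5. A developer who doubles as a test engineer should also test the full code range:
First build the full range: 00-FF (256 byte values)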
byte[] MyEBCDICbyte = new byte[]{
0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0A,0x0B,0x0C,0x0D,0x0E,0x0F,
0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1A,0x1B,0x1C,0x1D,0x1E,0x1F,
0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3A,0x3B,0x3C,0x3D,0x3E,0x3F,
0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,
0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5A,0x5B,0x5C,0x5D,0x5E,0x5F,
0x60,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6A,0x6B,0x6C,0x6D,0x6E,0x6F,
0x70,0x71,0x72,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7A,0x7B,0x7C,0x7D,0x7E,0x7F,
0x80,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x8A,0x8B,0x8C,0x8D,0x8E,0x8F,
0x90,0x91,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9A,0x9B,0x9C,0x9D,0x9E,0x9F,
0xA0,0xA1,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC,0xAD,0xAE,0xAF,
0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC,0xBD,0xBE,0xBF,
0xC0,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0xCE,0xCF,
0xD0,0xD1,0xD2,0xD3,0xD4,0xD5,0xD6,0xD7,0xD8,0xD9,0xDA,0xDB,0xDC,0xDD,0xDE,0xDF,
0xE0,0xE1,0xE2,0xE3,0xE4,0xE5,0xE6,0xE7,0xE8,0xE9,0xEA,0xEB,0xEC,0xED,0xEE,0xEF,
0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xF6,0xF7,0xF8,0xF9,0xFA,0xFB,0xFC,0xFD,0xFE,0xFF
};
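The same 256-byte array can also be generated instead of typed out, which rules out a typo in the literal. A minimal sketch using LINQ; the class and method names are hypothetical:

```csharp
using System;
using System.Linq;

public static class ByteRange
{
    // Builds the array { 0x00, 0x01, ..., 0xFF }.
    public static byte[] AllByteValues()
    {
        return Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
    }
}
```

Because element i of the result equals i, the index into any converted output is also the original EBCDIC code.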
Test program
//Conversion method (1): table lookup, modeled on the VB version
byte[] MyASCIIByte = ConvertEbcdicToAscii(MyEBCDICbyte);
//Conversion method (2): Encoding.Convert
byte[] MyASCIIByte2 = ConvertEbcdicToAscii2(MyEBCDICbyte);
for (int i = 0; i < MyASCIIByte.Length; i++)
{
byte[] MyASCII8Byte = new byte[1] { MyASCIIByte[i] };
int j = i + 1;
string HexString = MyASCII8Byte.BToHex().Replace("3F", "__");
if (j % 16 == 0)
{
Console.WriteLine("0x{0}", HexString);
}
else
{
Console.Write("0x{0},", HexString);
}
}
Console.WriteLine("");
for (int i = 0; i < MyASCIIByte2.Length; i++)
{
byte[] MyASCII8Byte = new byte[1] { MyASCIIByte2[i] };
string HexString = MyASCII8Byte.BToHex().Replace("3F", "__");
int j = i + 1;
if (j % 16 == 0)
{
Console.WriteLine("0x{0}", HexString);
}
else
{
Console.Write("0x{0},", HexString);
}
}
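Test result: to line up the two dumps so the differences are easy to spot by eye, paste them into Notepad
ASCII only defines 128 characters, so when Encoding.Convert meets a character that ASCII cannot represent, it substitutes ? (0x3F). The test program above prints every 3F as underscores to make the comparison easier. Note that the genuine question mark (EBCDIC 0x6F) also converts to 0x3F, so it shows up as underscores in both dumps.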
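Instead of comparing the two dumps by eye, the differing positions can be listed programmatically. This is a minimal sketch; the class and method names are made up for illustration and are not part of the original test program.

```csharp
using System;
using System.Collections.Generic;

public static class ByteDiff
{
    // Returns the indexes at which the two arrays hold different bytes.
    public static List<int> DifferingIndexes(byte[] first, byte[] second)
    {
        if (first.Length != second.Length)
        {
            throw new ArgumentException("Arrays must have the same length.");
        }

        var result = new List<int>();
        for (int i = 0; i < first.Length; i++)
        {
            if (first[i] != second[i])
            {
                result.Add(i);
            }
        }
        return result;
    }
}
```

Called as ByteDiff.DifferingIndexes(MyASCIIByte, MyASCIIByte2) on the full-range test, each returned index is also the EBCDIC code whose conversion differs between the two methods, because the input covers 00-FF in order.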
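Summary:
- Both methods convert letters and digits (Latin letters plus Arabic numerals) as well as the common symbols.
- A few specific characters differ slightly between mainframe vendors and need attention (example: the exclamation mark ! is 0x4F on HP, but 0x5A on IBM and AT&T).
- If the mainframe sends characters outside the 128-character ASCII range (such as Chinese), it is still advisable to write your own conversion program, so that characters are not lost (turned into question marks ?) during conversion.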
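To detect lost characters instead of silently receiving question marks, .NET lets you request an ASCII encoding that throws on characters it cannot represent. The following is a minimal sketch of that idea, not the author's method; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class StrictAscii
{
    // Returns true if every character of the text can be encoded as ASCII.
    public static bool IsLossless(string text)
    {
        // An ASCII encoding that throws instead of substituting '?'.
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

The default Encoding.ASCII uses a replacement fallback, which is why unrepresentable characters come out as 0x3F; with the exception fallback the problem surfaces as an EncoderFallbackException that the caller can handle.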
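Stay strong, Tainan!
References:
ISO8859-1 (parts 1 to n; extended ASCII that adds accented letters for various European languages)
How to convert between ASCII and EBCDIC character codes (Microsoft KB)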
How to convert from EBCDIC to ASCII in C#.net