[.NET][C#] Converting EBCDIC to ASCII Character Encoding

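Besides Unicode and ASCII (American Standard Code for Information Interchange), you will sometimes run into EBCDIC when exchanging data with mainframe computers:

EBCDIC (Extended Binary Coded Decimal Interchange Code)
For letters, digits, symbols and control characters, Unicode is identical to ASCII, so no special conversion is needed. EBCDIC is a different story.

Today we tackle EBCDIC to ASCII first.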

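Character use | Encoding (mainframe) | Encoding (other systems)
English (Latin) letters, Arabic digits, symbols, control characters | EBCDIC | ASCII
Characters of other languages | NHC | BIG-5, Unicode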

 

2016/7/22 Update: I originally pasted the wrong conversion table here. The EBCDIC To ASCII table is the one to use.

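1. First, declare the EBCDIC To ASCII conversion table, following the VB version in the Microsoft KB article. Encoding.Convert is built into .NET, so for that approach we only need to specify the encodings later.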

static byte[] EbcdicToAscii = new byte[]{
0X00,0X01,0X02,0X03,0X9C,0X09,0X86,0X7F,0X97,0X8D,0X8E,0X0B,0X0C,0X0D,0X0E,0X0F,
0X10,0X11,0X12,0X13,0X9D,0X85,0X08,0X87,0X18,0X19,0X92,0X8F,0X1C,0X1D,0X1E,0X1F,
0X80,0X81,0X82,0X83,0X84,0X0A,0X17,0X1B,0X88,0X89,0X8A,0X8B,0X8C,0X05,0X06,0X07,
0X90,0X91,0X16,0X93,0X94,0X95,0X96,0X04,0X98,0X99,0X9A,0X9B,0X14,0X15,0X9E,0X1A,
0X20,0XA0,0XA1,0XA2,0XA3,0XA4,0XA5,0XA6,0XA7,0XA8,0XA2,0X2E,0X3C,0X28,0X2B,0X7C,
0X26,0XA9,0XAA,0XAB,0XAC,0XAD,0XAE,0XAF,0XB0,0XB1,0X21,0X24,0X2A,0X29,0X3B,0X5E,
0X2D,0X2F,0XB2,0XB3,0XB4,0XB5,0XB6,0XB7,0XB8,0XB9,0X7C,0X2C,0X25,0X5F,0X3E,0X3F,
0XBA,0XBB,0XBC,0XBD,0XBE,0XBF,0XC0,0XC1,0XC2,0X60,0X3A,0X23,0X40,0X27,0X3D,0X22,
0XC3,0X61,0X62,0X63,0X64,0X65,0X66,0X67,0X68,0X69,0XC4,0XC5,0XC6,0XC7,0XC8,0XC9,
0XCA,0X6A,0X6B,0X6C,0X6D,0X6E,0X6F,0X70,0X71,0X72,0XCB,0XCC,0XCD,0XCE,0XCF,0XD0,
0XD1,0X7E,0X73,0X74,0X75,0X76,0X77,0X78,0X79,0X7A,0XD2,0XD3,0XD4,0X5B,0XD6,0XD7,
0XD8,0XD9,0XDA,0XDB,0XDC,0XDD,0XDE,0XDF,0XE0,0XE1,0X5B,0X5D,0XE4,0X5D,0XE6,0XE7,
0X7B,0X41,0X42,0X43,0X44,0X45,0X46,0X47,0X48,0X49,0XE8,0XE9,0XEA,0XEB,0XEC,0XED,
0X7D,0X4A,0X4B,0X4C,0X4D,0X4E,0X4F,0X50,0X51,0X52,0XEE,0XEF,0XF0,0XF1,0XF2,0XF3,
0X5C,0X9F,0X53,0X54,0X55,0X56,0X57,0X58,0X59,0X5A,0XF4,0XF5,0XF6,0XF7,0XF8,0XF9,
0X30,0X31,0X32,0X33,0X34,0X35,0X36,0X37,0X38,0X39,0XFA,0XFB,0XFC,0XFD,0XFE,0XFF
};

2. Create conversion method 1 (table lookup)

public static byte[] ConvertEbcdicToAscii(byte[] ebcdicData)
{
    // Declare the output byte array
    byte[] OutByte = new byte[ebcdicData.Length];
    // Convert byte by byte using the mapping table
    for (int i = 0; i < ebcdicData.Length; i++)
    {
        OutByte[i] = EbcdicToAscii[ebcdicData[i]];
    }
    return OutByte;
}

3. Create conversion method 2 (yes, it really is this simple)

public static byte[] ConvertEbcdicToAscii2(byte[] ebcdicData)
{
    // Target encoding: ASCII
    Encoding ascii = Encoding.ASCII;
    // Source encoding: IBM037 (EBCDIC)
    Encoding ebcdic = Encoding.GetEncoding("IBM037");
    // Return the ASCII data
    return Encoding.Convert(ebcdic, ascii, ebcdicData);
}
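One caveat for method 2: on .NET Core and .NET 5+, only a handful of encodings are available by default, and Encoding.GetEncoding("IBM037") throws unless the code pages provider has been registered. The sketch below assumes such a runtime (on .NET Framework the original method works as written). The class name EbcdicCodePage and the method ToAscii are hypothetical names used for illustration only.

```csharp
using System;
using System.Text;

// Hypothetical helper class, not part of the original post.
public static class EbcdicCodePage
{
    // Assumption: running on .NET Core 3.0+ / .NET 5+, where
    // CodePagesEncodingProvider ships with the runtime.
    static EbcdicCodePage()
    {
        // Makes legacy code pages such as IBM037 available to Encoding.GetEncoding.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Same idea as ConvertEbcdicToAscii2: let the framework do the mapping.
    public static byte[] ToAscii(byte[] ebcdicData)
    {
        Encoding ebcdic = Encoding.GetEncoding("IBM037");
        return Encoding.Convert(ebcdic, Encoding.ASCII, ebcdicData);
    }
}
```

Registering the provider in a static constructor keeps the setup in one place, so callers do not have to remember to do it before the first conversion.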

4. First, test simple upper- and lowercase English (Latin) letters, Arabic digits and a few simple symbols

According to the EBCDIC table:

According to the ASCII table (it even shows up in the movie 火星任務!)

Expected conversion results:

Character | EBCDIC | ASCII
A | 0xC1 | 0x41
B | 0xC2 | 0x42
a | 0x81 | 0x61
b | 0x82 | 0x62
1 | 0xF1 | 0x31
2 | 0xF2 | 0x32
: | 0x7A | 0x3A
# | 0x7B | 0x23
@ | 0x7C | 0x40

Test code

//EBCDIC C1:[A] C2:[B] 
//EBCDIC 81:[a] 82:[b]
//EBCDIC F0:[0] F1:[1] F2:[2] 
//EBCDIC 7A:[:] 7B:[#] 7C:[@]
byte[] MyEBCDICbyte = new byte[] { 0xC1, 0xC2, 0x81, 0x82, 0xF0, 0xF1, 0xF2, 0x7A, 0x7B, 0x7C };
// Conversion method (1): table lookup, modeled on the VB version
byte[] MyASCIIByte = ConvertEbcdicToAscii(MyEBCDICbyte);
// Conversion method (2): Encoding.Convert
byte[] MyASCIIByte2 = ConvertEbcdicToAscii2(MyEBCDICbyte);
Console.WriteLine("EBCDIC: {0}", MyEBCDICbyte.BToHex());
Console.WriteLine("ASCII 1: {0}", MyASCIIByte.BToHex());
Console.WriteLine("ASCII 2: {0}", MyASCIIByte2.BToHex());

Assert.AreEqual(MyASCIIByte.BToHex(), MyASCIIByte2.BToHex());

Add a byte-to-hex extension method to display the content as a hexadecimal (hex) string:

public static string BToHex(this byte[] Bdata)
{
    return BitConverter.ToString(Bdata).Replace("-", "");
}
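C# requires extension methods to be declared inside a non-generic static class, so the method above will not compile on its own. A minimal sketch of the wrapper follows; the class name ByteArrayExtensions is an assumption, since the original post does not show the enclosing class.

```csharp
using System;

// Hypothetical container class; the original post shows only the method.
public static class ByteArrayExtensions
{
    // Formats a byte array as a contiguous uppercase hex string.
    public static string BToHex(this byte[] Bdata)
    {
        // BitConverter.ToString yields "C1-0A"; dropping the dashes gives "C10A".
        return BitConverter.ToString(Bdata).Replace("-", "");
    }
}
```

With this in scope, a call such as `new byte[] { 0xC1, 0x0A }.BToHex()` returns "C10A".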

Test result: both methods convert correctly

5. A developer who doubles as a test engineer should also test the complete code range:

First build the full range: 00-FF (256 byte values)


byte[] MyEBCDICbyte = new byte[]{
              0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0A,0x0B,0x0C,0x0D,0x0E,0x0F,
              0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1A,0x1B,0x1C,0x1D,0x1E,0x1F,
              0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,
              0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3A,0x3B,0x3C,0x3D,0x3E,0x3F,
              0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,
              0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5A,0x5B,0x5C,0x5D,0x5E,0x5F,
              0x60,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6A,0x6B,0x6C,0x6D,0x6E,0x6F,
              0x70,0x71,0x72,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7A,0x7B,0x7C,0x7D,0x7E,0x7F,
              0x80,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x8A,0x8B,0x8C,0x8D,0x8E,0x8F,
              0x90,0x91,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9A,0x9B,0x9C,0x9D,0x9E,0x9F,
              0xA0,0xA1,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC,0xAD,0xAE,0xAF,
              0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC,0xBD,0xBE,0xBF,
              0xC0,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0xCE,0xCF,
              0xD0,0xD1,0xD2,0xD3,0xD4,0xD5,0xD6,0xD7,0xD8,0xD9,0xDA,0xDB,0xDC,0xDD,0xDE,0xDF,
              0xE0,0xE1,0xE2,0xE3,0xE4,0xE5,0xE6,0xE7,0xE8,0xE9,0xEA,0xEB,0xEC,0xED,0xEE,0xEF,
              0xF0,0xF1,0xF2,0xF3,0xF4,0xF5,0xF6,0xF7,0xF8,0xF9,0xFA,0xFB,0xFC,0xFD,0xFE,0xFF
          };
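Typing out 256 literals invites typos, so the same 00-FF range could also be generated with a loop. This is only a sketch; ByteRange and AllValues are hypothetical names not used in the original post.

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class ByteRange
{
    // Returns the 256 byte values 0x00..0xFF in ascending order.
    public static byte[] AllValues()
    {
        var all = new byte[256];
        for (int i = 0; i < all.Length; i++)
        {
            all[i] = (byte)i;
        }
        return all;
    }
}
```

With this helper, the declaration above could be replaced by `byte[] MyEBCDICbyte = ByteRange.AllValues();`.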

Test program

// Conversion method (1): table lookup, modeled on the VB version
byte[] MyASCIIByte = ConvertEbcdicToAscii(MyEBCDICbyte);
// Conversion method (2): Encoding.Convert
byte[] MyASCIIByte2 = ConvertEbcdicToAscii2(MyEBCDICbyte);


for (int i = 0; i < MyASCIIByte.Length; i++)
{
    byte[] MyASCII8Byte = new byte[1] { MyASCIIByte[i] };
    int j = i + 1;
    string HexString = MyASCII8Byte.BToHex().Replace("3F", "__");
    if (j % 16 == 0)
    {
        Console.WriteLine("0x{0}", HexString);
    }
    else
    {
        Console.Write("0x{0},", HexString);
    }
}
Console.WriteLine("");
for (int i = 0; i < MyASCIIByte2.Length; i++)
{
    byte[] MyASCII8Byte = new byte[1] { MyASCIIByte2[i] };
    string HexString = MyASCII8Byte.BToHex().Replace("3F", "__");

    int j = i + 1;
    if (j % 16 == 0)
    {
        Console.WriteLine("0x{0}", HexString);
    }
    else
    {
        Console.Write("0x{0},", HexString);
    }
}

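Test result: pasted into Notepad so the columns line up and the differences are easy to spot by eye

Because ASCII defines only 128 characters, Encoding.Convert cannot map every code point. Any character it does not recognize is automatically replaced with ? (0x3F), which the code above prints as underscores to make the comparison easier.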
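The comparison can also be done in code rather than by eye. The following is a minimal sketch that lists the positions where two equally long byte arrays differ; ByteArrayDiff and DifferingIndexes are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original post.
public static class ByteArrayDiff
{
    // Returns every index at which the two arrays hold different bytes.
    public static List<int> DifferingIndexes(byte[] first, byte[] second)
    {
        if (first.Length != second.Length)
        {
            throw new ArgumentException("Arrays must have the same length.");
        }

        var indexes = new List<int>();
        for (int i = 0; i < first.Length; i++)
        {
            if (first[i] != second[i])
            {
                indexes.Add(i);
            }
        }
        return indexes;
    }
}
```

Because the input in step 5 is the full 00-FF range in order, each index returned by `ByteArrayDiff.DifferingIndexes(MyASCIIByte, MyASCIIByte2)` is also the EBCDIC code point on which the two methods disagree.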

 

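Summary:

  • Both methods convert letters and digits (Latin letters and Arabic digits) as well as symbols.
  • Some specific characters differ slightly between mainframe systems and need attention (example: the exclamation mark ! is 0x4F on HP but 0x5A on IBM and AT&T).
  • If the mainframe sends characters outside the 128-character ASCII range (such as Chinese), you should still write your own conversion in the application, so that characters are not lost along the way (turned into question marks ?).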
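To make the character-loss point concrete, here is a hedged sketch of two options: decoding EBCDIC straight into a .NET string (which is Unicode, so nothing collapses to ?), and encoding to ASCII with an exception fallback so that unmappable characters fail loudly instead of silently becoming 0x3F. It assumes .NET Core 3.0+ / .NET 5+ for the provider registration, and EbcdicText, Decode and ToAsciiStrict are hypothetical names. IBM037 is a single-byte code page, so this does not cover Chinese host data, which would still need its own conversion.

```csharp
using System;
using System.Text;

// Hypothetical helper class, not part of the original post.
public static class EbcdicText
{
    static EbcdicText()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where IBM037 needs this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes IBM037 bytes into a Unicode string without going through ASCII.
    public static string Decode(byte[] ebcdicData)
    {
        return Encoding.GetEncoding("IBM037").GetString(ebcdicData);
    }

    // Encodes to ASCII, throwing EncoderFallbackException for any character
    // outside the ASCII range instead of substituting '?'.
    public static byte[] ToAsciiStrict(string text)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        return strictAscii.GetBytes(text);
    }
}
```

For instance, IBM037 byte 0x4A decodes to the cent sign (U+00A2), which survives in the string returned by Decode but makes ToAsciiStrict throw rather than emit a question mark.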

 

Stay strong, Tainan!

 

References:

ISO8859-1 (parts 1 to n; extended ASCII that adds accented letters for various European languages)

ASCII

How to convert between ASCII and EBCDIC character codes (Microsoft KB)

Encoding.Convert()

How to convert from EBCDIC to ASCII in C#.net