| Property | Value |
|---|---|
| ClassPath: | MEng.System.Runtime.TextXCoder |
| Parent ClassPath: | MEng.Object |
| Copyable: | No |
| Final: | Yes |
MEng.System.Runtime.TextXCoder provides text transcoding services. Text (characters and strings of characters) is a fairly messy subject in the computer world. Computers, which were invented in the US, initially used a single byte or less to store each character. That limited them to 256 possible values per character, and the original ASCII standard used only 7 bits (128 values). This is far too few for the characters of complex languages such as Japanese or Chinese. For simpler languages it meant that only one language could be represented at a time.

So various 'encodings' were invented to store the characters required by various languages. An encoding maps a language's character set onto the available byte values. Encodings are often called 'code pages', since each is a page of codes that represents the characters of a particular language. Commonly used code pages include Latin1 (ISO-8859-1), Windows-1252, and Shift-JIS. Some of them use more than one byte per character, such as Shift-JIS, which serves Japanese. They may also use a variable number of bytes depending on which character is being stored.

As long as you stay within a single language this isn't so bad, since you always deal with the same code page, the one used by your language version of the operating system. In a world of interconnected computers, though, there must be a way to convert text from one format into another, in other words to transcode it, so that the receiving computer can understand it. One attempt to make this easier is Unicode. Unicode assigns a standard numeric value, called a code point, to each of a very large number of characters. Its code space (over a million code points) leaves ample room for specialized purposes.
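The difference between a fixed single-byte code page and a variable-width one is easy to see by counting bytes per character. The sketch below uses Python's built-in codecs purely as an illustration. The helper name `encoded_sizes` is hypothetical and has nothing to do with the CML API.

```python
# Illustration with Python's standard codecs; this is not CML code.
def encoded_sizes(text, encoding):
    """Return how many bytes each character of `text` occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]

# Latin1 (ISO-8859-1) uses exactly one byte for every character it can represent.
print(encoded_sizes("Aé", "latin-1"))      # [1, 1]

# Shift-JIS mixes single-byte ASCII with double-byte Japanese characters.
print(encoded_sizes("A日本", "shift_jis"))  # [1, 2, 2]
```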
Windows NT based platforms (NT, Windows 2000, Windows XP) use Unicode as their native character set. This makes it fairly easy for them to take in text from many different code pages and represent it internally in a single, comprehensive form. Note that Unicode itself is not an 'encoding' in the sense used above, that is, a physical byte representation of a set of characters. It is a standard that assigns numeric values to a large set of characters from languages around the world, and those values can still be encoded in various ways.

The encoding used by the 32 bit versions of Windows is UTF-16, which encodes Unicode in 16 bit values. Most characters in common use fit in a single 16 bit value, but characters outside the Basic Multilingual Plane require a pair of 16 bit values (a surrogate pair). CML uses the native Unicode format, so it uses UTF-16 on the 32 bit versions of Windows. This means that any text read into a CML string from the outside world must be transcoded into this UTF-16 format. It must then be transcoded to the desired external format when it is written back out. This class provides those services.

The table below lists the encodings currently supported and the alias names recognized for each. These aliases are in wide use, so as a practical matter they must be supported. LE and BE stand for little endian and big endian, which indicate the order in which the bytes of a multi-byte value are stored. Little endian stores the least significant byte first, and big endian stores the most significant byte first. Windows, at least on the Intel platform, is little endian, so the internal format is UTF16-LE.
| Encoding | Recognized aliases |
|---|---|
| UTF8 | UTF-8, UTF_8 |
| USASCII | US-ASCII, US_ASCII, ASCII |
| UTF16LE | UTF16LE, UTF16-LE, UTF16_LE, UTF-16LE, UTF16L, UTF16-L, UTF-16L |
| UTF16BE | UTF16BE, UTF16-BE, UTF16_BE, UTF-16BE, UTF16B, UTF16-B, UTF-16B |
| UCS4BE | UCS-4B, UCS-4BE, UCS4-B, UCS4-BE |
| UCS4LE | UCS-4L, UCS-4LE, UCS4-L, UCS4-LE |
| Latin1 | Latin1, Latin-1, ISO-8859-1, 8859-1, CP819 |
| Latin2 | Latin2, Latin-2, ISO-8859-2, 8859-2, CP912, IBM912 |
| Latin3 | Latin3, Latin-3, ISO-8859-3, 8859-3, CP913, IBM913 |
| Latin5 | Latin4, Latin-4, ISO-8859-4, 8859-4, CP914, IBM914, CYRILLIC |
| Latin6 | Latin6, Latin-6, ISO-8859-6, 8859-6, CP1089, IBM1089, ARABIC |
| EBCDIC-CP-US | EBCDIC-CP-US, CP037, CPIBM037, IBM037, EBCDIC-CP-CA, EBCDIC-CP-WT, EBCDIC-CP-NL |
| CP437 | CP437, IBM437 |
| CP850 | CP850 |
| Windows-1252 | CP1252, WINDOWS-1252, CP1004 |
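The LE/BE distinction, and the surrogate pairs mentioned earlier, can be made concrete with a few bytes. This sketch uses Python's standard `utf-16-le` and `utf-16-be` codecs as stand-ins for the UTF16LE and UTF16BE encodings in the table. It illustrates the byte layouts only and does not exercise TextXCoder.

```python
# Illustration with Python's standard codecs; not the CML TextXCoder API.
text = "Aé"  # U+0041 and U+00E9

le = text.encode("utf-16-le")  # least significant byte of each 16 bit value first
be = text.encode("utf-16-be")  # most significant byte of each 16 bit value first
print(le.hex(" "))  # 41 00 e9 00
print(be.hex(" "))  # 00 41 00 e9

# A character outside the Basic Multilingual Plane needs a surrogate pair:
# two 16 bit values, so 4 bytes even in UTF-16.
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))  # 4
```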
The following encodings do not specify a byte order, so if you use them they are assumed to use the local host's byte order. On a little endian machine they will be little endian, and on a big endian machine they will be big endian. Because the same text can then produce different bytes on different machines, you should generally avoid these in favor of the explicit LE or BE variants.

| Encoding | Recognized aliases |
|---|---|
| UTF-16 | UTF-16, UTF_16, UCS-2, UCS_2, UCS2, ISO-10646-UCS-2, CP1200 |
| UCS-4 | UCS-4, UCS_4, UCS4 |
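As a rough sketch of why these names are host dependent, the snippet below maps an unmarked "UTF-16" request to the host's byte order. It uses Python's `sys.byteorder` and the explicit `utf-16-le`/`utf-16-be` codecs. Python's own plain `utf-16` codec is avoided because it prepends a byte order mark, which the description above does not mention. The function name `host_utf16_codec` is hypothetical.

```python
import sys

# Hypothetical sketch: resolve an endian-neutral UTF-16 request to the host's
# byte order, as described above. Uses Python codecs, not CML.
def host_utf16_codec():
    return "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

data = "A".encode(host_utf16_codec())
# "41 00" on a little endian host, "00 41" on a big endian host.
print(data.hex(" "))
```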
Other encodings will be added as required.

Nested Classes:
Enum=XCoderErrors
BadEncoding : "Text encoding '%(1)' is not supported";
BadSrcData : "The source data was badly formed for this ...
BadSrcRange : "The source index/count %(1)/%(2) is beyond the ...
BadCharCount : "The requested source count is %(1) but the ...
BufferSz : "The target buffer is too small (Max=%(1))to ...
ConvertFromErr : "";
ConvertToErr : "";
Unrep : "The source contains unrepresentable chars";
EndEnum;

This enumerated type defines the transcoding specific exceptions that this class might throw. Other exceptions might also be thrown by other classes used by this class, or by objects passed into its methods, and some common exceptions from MEng.Object might be thrown as well. Some values (ConvertFromErr and ConvertToErr) have no associated text, because the text actually reported comes from the underlying C++ error that occurred.
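Two of these errors correspond to situations that any transcoder has to detect. One is source bytes that are malformed for the encoding (compare BadSrcData). The other is characters that the target encoding cannot represent (compare Unrep). The sketch below shows both conditions using Python's codecs. Python raises its own `UnicodeDecodeError`/`UnicodeEncodeError` rather than these CML values, and the helper names are hypothetical.

```python
# Illustration with Python's codecs; the exception types are Python's, not XCoderErrors.
def is_well_formed(data, encoding):
    """False when the bytes cannot be decoded (roughly the BadSrcData case)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

def is_representable(text, encoding):
    """False when some character has no code in the encoding (roughly the Unrep case)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(is_well_formed(b"\xff\xfe", "utf-8"))  # False: 0xFF never appears in UTF-8
print(is_representable("日本", "ascii"))      # False: no ASCII codes for these
```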
Constructors:
Constructor();
Constructor([In] MEng.String EncodingName);

The default constructor creates a US-ASCII transcoder, since that is probably the most commonly used encoding. The second constructor lets you specify the encoding, using any encoding name or alias from the tables above. If the encoding is not supported, a BadEncoding exception is thrown.
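The constructor rules (US-ASCII by default, names or aliases accepted, BadEncoding otherwise) can be modelled in a few lines. The class `TextXCoderModel` below is a hypothetical Python model, not the CML class. It includes only a small subset of the alias tables. Exact, case-sensitive matching and reporting the canonical name from `get_encoding` are assumptions the documentation does not confirm.

```python
class BadEncoding(Exception):
    """Stands in for the XCoderErrors.BadEncoding value."""

# A few rows of the alias tables above: alias -> canonical name.
# Exact, case-sensitive matching is an assumption of this sketch.
ALIASES = {
    "US-ASCII": "USASCII", "US_ASCII": "USASCII", "ASCII": "USASCII",
    "UTF-8": "UTF8", "UTF_8": "UTF8",
    "Latin-1": "Latin1", "ISO-8859-1": "Latin1", "8859-1": "Latin1", "CP819": "Latin1",
}
CANONICAL = {"USASCII", "UTF8", "Latin1"}

class TextXCoderModel:
    def __init__(self, encoding_name="US-ASCII"):
        # With no argument, mirror the documented US-ASCII default.
        self.set_encoding(encoding_name)

    def set_encoding(self, to_set):
        # Accept a canonical name or any known alias; reject everything else.
        name = to_set if to_set in CANONICAL else ALIASES.get(to_set)
        if name is None:
            raise BadEncoding(f"Text encoding '{to_set}' is not supported")
        self._encoding = name

    def get_encoding(self):
        # Assumption: the canonical name is reported, not the alias passed in.
        return self._encoding

print(TextXCoderModel().get_encoding())              # USASCII
print(TextXCoderModel("ISO-8859-1").get_encoding())  # Latin1
```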
Final, Const Methods:
GetEncoding() Returns MEng.String;

Returns the encoding currently set on this transcoder object.
Final, Non-Const Methods:
ConvertFrom
(
[In] MEng.System.Runtime.MemBuf SrcBuf
, [In] MEng.Card4 SrcBytes
, [Out] MEng.String ToFill
) Returns MEng.Card4;

Converts up to SrcBytes bytes from the source buffer into the target string. If SrcBytes is larger than the allocation size of the buffer, a BadSrcRange exception is thrown. If the source data is not valid for the encoding, one of the other format exceptions, such as BadSrcData, may be thrown. Returns the number of bytes consumed from the source buffer.
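The return value matters when the buffer ends partway through a multi-byte character. The sketch below models ConvertFrom with Python's incremental decoders. It assumes, as the "bytes eaten" return value suggests but the text does not state, that an incomplete trailing sequence is left unconsumed for the next call. The Python `bytes` length stands in for the MemBuf allocation size, and `convert_from`/`BadSrcRange` are hypothetical names.

```python
import codecs

class BadSrcRange(Exception):
    """Stands in for the XCoderErrors.BadSrcRange value."""

def convert_from(src_buf, src_bytes, encoding):
    """Decode up to src_bytes bytes of src_buf; return (text, bytes_eaten)."""
    if src_bytes > len(src_buf):
        raise BadSrcRange(f"requested {src_bytes} bytes, buffer holds {len(src_buf)}")
    decoder = codecs.getincrementaldecoder(encoding)()
    # final=False keeps an incomplete trailing sequence buffered instead of
    # raising, so those bytes are not counted as consumed.
    text = decoder.decode(src_buf[:src_bytes], final=False)
    pending, _ = decoder.getstate()
    return text, src_bytes - len(pending)

# "h", "é" (2 bytes), then the first 2 of the 3 bytes of "€".
text, eaten = convert_from(b"h\xc3\xa9\xe2\x82", 5, "utf-8")
print(text, eaten)  # hé 3
```

A caller would keep the unconsumed bytes and prepend them to the next chunk it reads.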
ConvertFromRange
(
[In] MEng.System.Runtime.MemBuf SrcBuf
, [In] MEng.Card4 StartInd
, [In] MEng.Card4 Count
, [Out] MEng.String ToFill
) Returns MEng.Card4;

Converts up to Count bytes from the source buffer, starting at index StartInd, into the target string. If the start index, or the start index plus the count, is larger than the allocation size of the buffer, a BadSrcRange exception is thrown. If the source data is not valid for the encoding, one of the other format exceptions, such as BadSrcData, may be thrown. Returns the number of bytes consumed from the source buffer.
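ConvertFromRange is useful when the text sits inside a larger buffer, for example after a binary header. This hypothetical Python model applies the documented range check and decodes only the selected slice. As in the previous sketch, it assumes an incomplete trailing character is left unconsumed. It is not the CML implementation.

```python
import codecs

class BadSrcRange(Exception):
    """Stands in for the XCoderErrors.BadSrcRange value."""

def convert_from_range(src_buf, start_ind, count, encoding):
    """Decode up to count bytes starting at start_ind; return (text, bytes_eaten)."""
    if start_ind > len(src_buf) or start_ind + count > len(src_buf):
        raise BadSrcRange(f"index/count {start_ind}/{count} out of range")
    decoder = codecs.getincrementaldecoder(encoding)()
    text = decoder.decode(src_buf[start_ind:start_ind + count], final=False)
    pending, _ = decoder.getstate()
    return text, count - len(pending)

# A 2 byte binary header followed by the UTF-16LE text "Hi".
buf = b"\x01\x02" + "Hi".encode("utf-16-le")
print(convert_from_range(buf, 2, 4, "utf-16-le"))  # ('Hi', 4)
# An odd count stops at the last complete 16 bit value.
print(convert_from_range(buf, 2, 3, "utf-16-le"))  # ('H', 2)
```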
ConvertTo
(
[In] MEng.String ToConvert
, [Out] MEng.System.Runtime.MemBuf ToFill
, [Out] MEng.Card4 BytesWritten
) Returns MEng.Card4;

Converts the characters in ToConvert to the current encoding and puts the resulting bytes into the buffer ToFill. BytesWritten is set to the number of bytes written to the buffer. The return value is the number of characters consumed from the source string. If the buffer's maximum size cannot hold the converted data, a BufferSz exception is thrown.
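The distinction between characters consumed and bytes written is the key point of ConvertTo's outputs. In a multi-byte encoding the two counts differ. This hypothetical Python model returns both counts and checks a maximum size. It assumes the whole string is converted or an exception is raised, which the text implies but does not state outright.

```python
class BufferSz(Exception):
    """Stands in for the XCoderErrors.BufferSz value."""

def convert_to(to_convert, encoding, max_size):
    """Encode to_convert; return (data, chars_eaten, bytes_written)."""
    data = to_convert.encode(encoding)
    if len(data) > max_size:
        raise BufferSz(f"encoded size {len(data)} exceeds maximum {max_size}")
    return data, len(to_convert), len(data)

data, eaten, written = convert_to("héllo", "utf-8", 16)
print(eaten, written)  # 5 6  ("é" takes two bytes in UTF-8)
```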
ConvertNTo
(
[In] MEng.String ToConvert
, [In] MEng.Card4 Count
, [Out] MEng.System.Runtime.MemBuf ToFill
, [Out] MEng.Card4 BytesWritten
) Returns MEng.Card4;

Converts Count characters from ToConvert to the current encoding and puts the resulting bytes into the buffer ToFill. BytesWritten is set to the number of bytes written to the buffer. The return value is the number of characters consumed from the source string. If the buffer's maximum size cannot hold the converted data, a BufferSz exception is thrown.
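A short model of ConvertNTo shows that Count limits characters, not bytes. The byte count therefore depends on the encoding. This Python sketch assumes the Count characters are taken from the start of the string. Also note that Python indexes strings by code point, whereas CML strings are UTF-16, so counts can differ for characters that need surrogate pairs. The function name is hypothetical.

```python
class BufferSz(Exception):
    """Stands in for the XCoderErrors.BufferSz value."""

def convert_n_to(to_convert, count, encoding, max_size):
    """Encode the first count characters; return (data, chars_eaten, bytes_written)."""
    piece = to_convert[:count]  # assumption: characters are taken from the start
    data = piece.encode(encoding)
    if len(data) > max_size:
        raise BufferSz(f"encoded size {len(data)} exceeds maximum {max_size}")
    return data, len(piece), len(data)

print(convert_n_to("héllo", 2, "utf-8", 16))  # (b'h\xc3\xa9', 2, 3)
```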
SetEncoding([In] MEng.String ToSet);

Sets the encoding for this object. ToSet must be one of the encoding names from the tables above, or one of their alias names. If the encoding is not supported, a BadEncoding exception is thrown.