This overview groups the database vendors into three categories by chronological availability, and then by physical data storage characteristics. The physical storage type was chosen because it directly affects how the data is processed (double-byte or multi-byte) and the default behavior of the Data Definition Language (DDL). A "CHAR(30) Unicode" statement with UCS-2 will store 30 Unicode characters. Because every UCS-2 character occupies exactly two bytes, processing is more efficient, but Latin-1 data takes twice the space it would need in a single-byte character set.
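The space trade-off can be illustrated with a short Python sketch. It relies only on the standard codecs: UTF-16LE without a byte-order mark is byte-identical to UCS-2 for characters in the Basic Multilingual Plane, so it stands in for UCS-2 here. The helper names `latin1_bytes` and `ucs2_bytes` are hypothetical, chosen for this illustration, and say nothing about how any particular database lays out its columns.

```python
# Sketch: bytes needed for the same text in Latin-1 versus UCS-2.
# Assumption: UTF-16LE (no BOM) equals UCS-2 for BMP characters.

def latin1_bytes(text: str) -> int:
    """Bytes needed in a single-byte Latin-1 column (one byte per character)."""
    return len(text.encode("latin-1"))

def ucs2_bytes(text: str) -> int:
    """Bytes needed in a UCS-2 column (two bytes per character)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return len(text.encode("utf-16-le"))

sample = "Grüße aus Köln"  # 14 characters, all within Latin-1
print(len(sample), latin1_bytes(sample), ucs2_bytes(sample))  # 14 14 28

# A 30-character UCS-2 column always needs 60 bytes, whatever the text.
print(ucs2_bytes("x" * 30))  # 60
```

The fixed two bytes per character is what makes UCS-2 simple to index and measure; the cost is the doubled footprint for text that would fit in one byte per character.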
A "CHAR(30)" statement with UTF-8 will store between 10 and 30 Unicode characters, since each character occupies one to three bytes. This carries a minor performance penalty due to "byte counting", but it does not significantly increase the space needed for Latin-1 data and can be implemented using existing multibyte schemes. Asian data will grow by 50% with UTF-8 compared with UCS-2, because most Asian characters need three bytes in UTF-8 instead of two.
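The "byte counting" that a UTF-8 column implies can also be sketched in Python. This assumes the column length is measured in bytes, as in the CHAR(30) example, and that oversized values are truncated rather than rejected; `utf8_char_count` and `fit_utf8` are hypothetical helpers, not any vendor's API. Counting relies on a property of UTF-8: continuation bytes always have the bit pattern `10xxxxxx`, so every other byte begins a character.

```python
# Sketch of UTF-8 "byte counting" for a column whose length is in bytes.
# Assumption: oversized values are truncated at a character boundary.

def utf8_char_count(buf: bytes) -> int:
    """Count characters: each byte that is not a continuation byte
    (bit pattern 10xxxxxx) starts a new character."""
    return sum(1 for b in buf if (b & 0xC0) != 0x80)

def fit_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8, truncated to at most max_bytes bytes
    without splitting a multibyte character."""
    buf = text.encode("utf-8")
    if len(buf) <= max_bytes:
        return buf
    end = max_bytes
    # buf[end] is the first byte that would be cut off; if it is a
    # continuation byte, the cut falls inside a character, so back up.
    while end > 0 and (buf[end] & 0xC0) == 0x80:
        end -= 1
    return buf[:end]

ascii_text = "A" * 40
cjk_text = "漢字" * 20  # 40 CJK characters, 3 bytes each in UTF-8

# A 30-byte column holds 30 ASCII characters but only 10 CJK characters.
print(utf8_char_count(fit_utf8(ascii_text, 30)))  # 30
print(utf8_char_count(fit_utf8(cjk_text, 30)))    # 10

# CJK text: 3 bytes per character in UTF-8 versus 2 in UCS-2 (+50%).
print(len("漢字".encode("utf-8")), len("漢字".encode("utf-16-le")))  # 6 4
```

Backing up over continuation bytes keeps a truncated value decodable; the price is that a length in characters can only be found by walking the bytes, which is where the minor performance penalty comes from.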
Note that you may not know how a database processes the data internally, and that a UCS-2 datatype does not guarantee true Unicode processing internally.
Available now, March 1997
UCS-2 can be manipulated and stored today with the IBM DB2 database using the GRAPHIC datatype with the CCSID set to UCS-2 [18]. ADABAS D [16] and Teradata [27] offer separate Unicode datatypes alongside the standard CHAR types. ADABAS D can also store data in the UTF-7 and UTF-8 encodings.
UTF-8 is offered today as an alternate default character set on a server-wide (Sybase), database-wide (Oracle), or per-column (Interbase? and ADABAS D) basis.
Note: at the time of writing, it was unclear whether Interbase [26] used UCS-2 or UTF-8. An outside source indicated that it used UTF-8, but the author was unable to confirm this from Borland sources, so it is listed here with a question mark.