Oracle started supporting Unicode-based character sets in Oracle 7. Below is a summary of the Unicode character sets supported in Oracle:
AL24UTFFSS was the first Unicode character set supported by Oracle. It was introduced in Oracle 7.2. The AL24UTFFSS encoding scheme was based on the Unicode 1.1 standard, which is now obsolete. AL24UTFFSS has been desupported as of Oracle 9i. The migration path for existing AL24UTFFSS databases is to upgrade the database to 8.0 or 8.1, then change the character set to UTF8 before upgrading the database further to 9i or 10g.
UTF8 was the UTF-8 encoded character set introduced in Oracle 8 and 8i. It followed the Unicode 2.1 standard between Oracle 8.0 and 8.1.6, and was upgraded to Unicode version 3.0 for versions 8.1.7, 9i, 10g and 11g. To maintain compatibility with existing installations, this character set will remain at Unicode 3.0 in future Oracle releases. Although specific supplementary characters were not assigned in Unicode until version 3.1, the code point allocation for these characters was already defined in 3.0. So if supplementary characters are inserted into a UTF8 database, this will not corrupt the actual data inside the database. Each supplementary character will be treated as 2 separate undefined characters (its two surrogates, 3 bytes each), occupying 6 bytes in storage. Oracle recommends that customers switch to AL32UTF8 for full supplementary character support.
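The 6-byte storage described above can be reproduced outside the database. The following is a minimal Python sketch, not Oracle code: the helper name `encode_like_oracle_utf8` is hypothetical, and it assumes that the UTF8 character set stores a supplementary character as its two UTF-16 surrogates, each encoded as a separate 3-byte sequence, as the paragraph above describes.

```python
def encode_like_oracle_utf8(text: str) -> bytes:
    """Hypothetical helper: encode text so that each supplementary character
    becomes two 3-byte sequences (one per UTF-16 surrogate)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the code point into a UTF-16 surrogate pair.
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            for surrogate in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form of a surrogate.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


supplementary = "\U00010400"  # a character outside the Basic Multilingual Plane

six_byte_form = encode_like_oracle_utf8(supplementary)
four_byte_form = supplementary.encode("utf-8")  # standard UTF-8

print(len(six_byte_form), six_byte_form.hex())    # 6 eda081edb080
print(len(four_byte_form), four_byte_form.hex())  # 4 f0909080
```

The same character takes 6 bytes in the surrogate-based form and 4 bytes in standard UTF-8, which is the difference between how UTF8 and AL32UTF8 store supplementary characters.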
UTFE is the UTF8 database character set for EBCDIC platforms. It has the same properties as UTF8 on ASCII-based platforms. The EBCDIC Unicode transformation format is documented in Unicode Technical Report #16, UTF-EBCDIC, which can be found at http://www.unicode.org/unicode/reports/tr16/
AL32UTF8 is the UTF-8 encoded character set introduced in Oracle 9i. It is the database character set that supports the latest version (5.0 in Oracle 11.1) of the Unicode standard. It also provides support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes. AL32UTF8 was introduced because when UTF8 was designed (in the time of Oracle 8) there was no concept of supplementary characters, and therefore UTF8 has a maximum of 3 bytes per character. Changing the design of UTF8 would break backward compatibility, so a new character set was introduced. The introduction of surrogate pairs should mean that no significant architecture changes are needed in future versions of the Unicode standard, so the current plan is to keep enhancing AL32UTF8 as necessary to support future versions of the Unicode standard. For example, in Oracle 10.1 this character set implemented the Unicode 3.2 standard, in Oracle 10.2 it was updated to support the Unicode 4.0.1 standard, and in Oracle 11.1 to the Unicode 5.0 standard.
Please note that pre-Oracle 9 software can have serious problems connecting to an AL32UTF8 database.
AL16UTF16 is the first UTF-16 encoded character set in Oracle. It was introduced in Oracle 9i as the default national character set (NLS_NCHAR_CHARACTERSET). AL16UTF16 supports the latest version (5.0 in Oracle 11.1) of the Unicode standard. It also provides support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes. As with AL32UTF8, the plan is to keep enhancing AL16UTF16 as necessary to support future versions of the Unicode standard. AL16UTF16 cannot be used as a database character set (NLS_CHARACTERSET), only as the national character set (NLS_NCHAR_CHARACTERSET). The database character set is used to identify and to hold SQL, SQL metadata and PL/SQL source code. It must have either single-byte 7-bit ASCII or single-byte EBCDIC as a subset, whichever is native to the deployment platform. Therefore, it is not possible to use a fixed-width, multi-byte character set (such as AL16UTF16) as the database character set. Trying to create a database with AL16UTF16 as the database character set in 9i and up will give “ORA-12706: THIS CREATE DATABASE CHARACTER SET IS NOT ALLOWED”. AL16UTF16 is always in big-endian byte order, regardless of the processor endianness.
There are only a few circumstances in which using the national character set offers an actual advantage. In 99% of cases, simply use a UTF8 or AL32UTF8 database.
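The byte-level properties of AL16UTF16 can be illustrated with Python's built-in `utf-16-be` codec. This is only a sketch: it assumes that codec produces the same byte layout as AL16UTF16 (UTF-16, big-endian, no byte order mark), which matches the description above but has not been checked against an Oracle database.

```python
# Assumption: Python's "utf-16-be" codec mirrors the AL16UTF16 byte layout.
ascii_letter = "A"
supplementary = "\U00010400"  # a character outside the Basic Multilingual Plane

letter_bytes = ascii_letter.encode("utf-16-be")
pair_bytes = supplementary.encode("utf-16-be")

print(letter_bytes.hex())  # 0041 -> two bytes, not the single ASCII byte 41
print(pair_bytes.hex())    # d801dc00 -> one surrogate pair, 4 bytes in total

# ASCII is not a byte-level subset of UTF-16, which is why such an encoding
# cannot hold SQL and PL/SQL source as the database character set.
print(letter_bytes == ascii_letter.encode("ascii"))  # False

# The big-endian form is the same on every platform; a little-endian
# encoding of the same character would have its bytes swapped.
print(supplementary.encode("utf-16-le").hex())  # 01d800dc
```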
The following URLs contain a complete list of hex values and character descriptions for every Unicode character:
Unicode Version 5.0: http://www.unicode.org/Public/5.0.0/ucd/UnicodeData.txt
Unicode Version 4.0: http://www.unicode.org/Public/4.0-Update1/UnicodeData-4.0.1.txt
Unicode Version 3.2: http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt
Unicode Version 3.1: http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt
Unicode Version 3.0: http://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt
Unicode Versions 2.x: http://www.unicode.org/unicode/standard/versions/enumeratedVersions.html
Unicode Version 1.1: http://www.unicode.org/Public/1.1-Update/UnicodeData-1.1.5.txt
A description of the file format can be found at: http://www.unicode.org/Public/UNIDATA/UnicodeData.html
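As a rough illustration of that file format, the sketch below parses one UnicodeData.txt record in Python. The function name `parse_unicode_data_line` and the dictionary keys are hypothetical labels chosen for this example. The sample record is the entry for U+0041, and the sketch assumes the 15 semicolon-separated fields used by the UnicodeData.txt versions listed above.

```python
# Illustrative labels for the 15 semicolon-separated fields of a record.
FIELD_NAMES = [
    "code_value", "name", "general_category", "combining_class",
    "bidi_category", "decomposition", "decimal_digit", "digit",
    "numeric", "mirrored", "unicode_1_name", "comment",
    "uppercase", "lowercase", "titlecase",
]


def parse_unicode_data_line(line: str) -> dict:
    """Split one UnicodeData.txt record into a dictionary of named fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    record = dict(zip(FIELD_NAMES, fields))
    # The first field is the code point written in hexadecimal.
    record["code_point"] = int(record["code_value"], 16)
    return record


sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_unicode_data_line(sample)

print(record["code_point"], record["name"], record["general_category"])
print("lowercase mapping:", record["lowercase"])
```

To process a whole file, the same function can be applied to each line of a downloaded copy.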
For a glossary of Unicode terms, see: http://www.unicode.org/glossary
The locations above provide the Unicode standard data; every character listed there is referenced by its UCS-2 code point.
Oracle currently has no plans to desupport UTF8; it simply encourages everyone to use AL32UTF8. All code points defined in UTF8 are also valid in AL32UTF8, so there is never an issue with going from UTF8 to AL32UTF8.