PeopleSoft

Oracle Unicode Character Sets

May 16, 2008 · 2 Comments

Oracle started supporting Unicode based character sets in Oracle 7. Below is a summary of the Unicode character sets supported in Oracle:

Oracle Unicode Supported Character Sets

AL24UTFFSS

AL24UTFFSS was the first Unicode character set supported by Oracle. It was introduced in Oracle 7.2. The AL24UTFFSS encoding scheme was based on the Unicode 1.1 standard, which is now obsolete. AL24UTFFSS as been de-supported from Oracle 9i. The migration path for existing AL24UTFFSS databases is to upgrade the database to 8.0 or 8.1, then upgrade the character set to UTF8 before upgrading the database further to 9i or 10g.

UTF8

UTF8 was the UTF-8 encoded character set introduced in Oracle 8 and 8i. It followed the Unicode 2.1 standard between Oracle 8.0 and 8.1.6, and was upgraded to Unicode version 3.0 for versions 8.1.7, 9i, 10g and 11g. To maintain compatibility with existing installations this character set will remain at Unicode 3.0 in future Oracle releases. Although specific supplementary characters were not assigned to Unicode until version 3.1, the allocation for these characters were already defined in 3.0 So if supplementary characters are inserted in a UTF8 database, it will not corrupt the actual data inside the database. They will be treated as 2 separate undefined characters, occupying 6 bytes in storage. Oracle recommends that customers switch to AL32UTF8 for full supplementary character support.

UTFE

This is the UTF8 database character set for the EBCDIC platforms. It ahs the same properties as UTF8 on ASCII based platforms. The EBCDIC Unicode transformation format is documented in Unicode Technical Report #1 UTF-EBCDIC. Which can be found at http://www.unicode.org/unicode/reports/tr16/

AL32UTF8

This is the UTF-8 encoded character set introduced in Oracle 9i. AL32UTF8 is the database character set that supports the latest version (5.0 in Oracle 11.1) of the Unicode standard. It also provides support for the newly defined supplementary characters. All supplementary characters are stored as 4 bytes. AL32UTF8 was introduced because when UTF8 was designed (in the time of Oracle 8) there wasn’t a concept of supplementary characters, there UTF8 has a maximum of 3 bytes per character. Changing the design of UTF8 wold break backward compatibility, so a new character set was introduced. The introduction of surrogate pairs should mean that no significant architecture changes are needed in future versions of the Unicode standard, so currently the plan is to keep enhancing AL32UTF8 as necessary to support future versions of the Unicode standard. For example, in Oracle 10.1 this character set was implemented the Unicode 3.2 standard, in Oracle 10.2 that has been updated to support the Unicode 4.01 standard and in Oracle 11.1 to the Unicode 5.0 standard.

Please note that pre-Oracle 9 software can have some serious problems connecting to a AL32UTF8 database.

AL16UTF16

This is the first UTF-16 encoded character set in Oracle. It was introduced in Oracle 9i as the default national character set (NLS_NCHAR_CHARACTERSET). AL16UTF16 supports the latest version (5.0 in Oracle 11.1) of the Unicode standard. It also provides support for the newly define supplementary characters. All supplementary characters are stored as 4 bytes. As with AL32UTF8, the plan is to keep enhancing AL16UTF16 as necessary to support future versions of the Unicode standard. AL16UTF16 cannot be used as a database character set (NLS_CHARACTERSET), only as the national character set (NLS_NCHAR_CHARACTERSET). The database character set is used to identify and to hold SQL, SQL metadata and PL/SQL source code. It must have either single byte 7-bit ASCII or single byte EBCDIC as a subset, whichever is native to the deployment platform. Therefore, it is not possible to use a fixed-width, multi-byte character set (such as AL16UTF16) as the database character set. Trying to create a database with AL16UTF16 as the database character set in 9i and up will give “ORA-12706: THIS CREATE DATABASE CHARACTER SET IS NOT ALLOWED”. AL16UTF16 is always in Big Endian byte order, regardless of the processor endianess.

There are only a few circumstances where you actually have an advantage of using the national characterset. In 99% of the cases simply use a UTF8 or AL32UTF8 database.

The following URLs contain a complete list of hex values and character descriptions for every Unicode character:

Unicode Version 5.0: http://www.unicode.org/Public/5.0.0/ucd/UnicodeData.txt
Unicode Version 4.0 http://www.unicode.org/Public/4.0-Update1/UnicodeData-4.0.1.txt
Unicode Version 3.2 http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt
Unicode Version 3.1 http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt
Unicode Version 3.0 http://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt
Unicode Versions 2.x http://www.unicode.org/unicode/standard/versions/enumeratedVersions.html
Unicode Version 1.1 http://www.unicode.org/Public/1.1-Update/UnicodeData-1.1.5.txt

A description of the file format can be found at: http://www.unicode.org/Public/UNIDATA/UnicodeData.html

For a glossary of Unicode terms, see: http://www.unicode.org/glossary

On above locations you can find the unicode standard, all characters that are there are referenced with their UCS-2 codepoint.

Oracle currently has no plans to desupport UTF8, they simple encourage everyone to use AL32UTF8. All codepoints defined in UTF8 are also valid in AL32UTF8. So there is never an issue with going from UTF8 to AL32UTF8.

Categories: Oracle

Import/Export – NLS Considerations for a Unicode Database

May 16, 2008 · Leave a Comment

How does NLS affect Import/Export

Import and Export are client products, in the same way as SQL*Plus or Oracle Forms, and will therefore translate characters from the database character set to that defined by NLS_LANG. The character set used for the export will be stored in the export file and when the file is imported, the import will check the character set that was used. If it is different than that defined by NLS_LANG at the import site, the characters will be translated to the import character set and then, if necessary to the database character set.

How Should NLS_LANG Be Set When Using Export?

Oracle recommends setting the character set part of the NLS_LANG environment variable to the same character set as the character set of the database you are using.

select value from nls_database_parameters where parameter=’NLS_CHARACTERSET’;

That way no conversion will take place and the export file will be created in the same character set as the original database and contain ALL data from the original database (even incorrectly stored data if that would be the case). Even if the plan is to import this data into a database with a different character set later the conversion can be postponed until the import.

Note that this has no relation to the Operating System. If you have a WE8MSWIN1252 database on a Unix Server (which is totally supported) then you should set NLS_LANG to AMERICAN_AMERICA.WE8MSWIN1252 before export.

During the interaction with the database (SQL*Plus for example) you need to configure your Unix client properly and that can not be 1252 as Unix does not have a 1252 character set.

How Should NLS_LANG Be Set When Using Import?

If the source and target database have the same character set, the character set part of the NLS_LANG should be set to that same character set on both the export and the import.

Even if the character sets of the exporting and importing databases are not the same the best (preferred) value to use for the character set part of NLS_LANG on both export and import is still the character set of the source database.

select value from nls_database_parameters where parameter=’NLS_CHARACTERSET’;

But setting the NLS_LANG to the character set of the target database during import is also correct. That way conversion only takes place once, either on export or on import!

However, the preferred place to do the conversion is between the import executable and the target database.

Note that this has no relation with the Operating System. If your source database is a WE8MSWIN1252 database then you simply should set NLS_LANG to AMERICAN_AMERICA.WE8MSWIN1252 before import.

During the interaction with the database (SQL*Plus) you need to configure your Unix client properly and that can nt be 1252 as Unix does not have a 1252 character set.

Example: You want to go from an WE8MSWIN1252 to an UTF8 database:

Note that this is only the exp/imp example, if you want to migrate to UTF8 you need to use another method.

  • Double check the NLS_CHARACTERSET on the SOURCE database: select * from nls_database_parameters where parameter=’NLS_CHARACTERSET’; and export with the NLS_LANG set to AMERICAN_AMERICA.<NLS_CHARACTERSET>. In this case we want to create an export file containing WE8MSWIN1252 data. (This is also the setting you want to use if you take an export as a backup)
  • On Unix this would be: export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; exp ….
  • On Windows this would be: C:\>set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; C:\>exp …
  • Import with the NLS_LANG set to AMERICAN_AMERICA.WE8MSWIN1252 (=source NLS_CHARACTERSET) into the new UTF8 database.
  • On Unix this would be: export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; imp ….
  • On Windows this would be: C:\>set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; C:\> imp …
  • The conversion to UTF8 is done while inserting the data in the UTF8 database.

Oracle recommends setting the NLS_LANG explicit in the current shell for UNIX or in the DOS box used for the exp or imp tool on Windows.

How is Import affected by the NLS_LANGUAGE and NLS_TERRITORY?

Not. Normally you use the AMERICAN_AMERICA default, but if you imported with the NLS_LANG set to FRENCH_FRANCE for example then you will not have a problem, even if the originating environment used GERMAN_GERMANY or so on.

What about the Message (possible ncharset conversion) during Import?

If you see messages similar to the above in the import log read further down.

How To Know in What Characterset a dump (export) file is created?

To determine this information simply type in: imp system/oracle@database show=yes file=test.dmp

The output looks similar to the following:

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – 64bit Production
With the Partitioning, Real Application Clusters and Data Mining Scoring Engine options

Export file created by EXPORT:V09.02.00 via conventional path

Warning: the objects were exported by SYS, not by you

import done in WE8ISO8859P15 character set and UTF8 NCHAR character set
-> this is the current NLS_LANG value set in the environment and the NCHAR characterset of the target database
import server uses AL32UTF8 character set (possible charset conversion)
-> this is only shown if the NLS_LANG during the import session is different from the target database characterset, so if you see 3 lines you might have problems
export client uses WE8ISO8859P1 character set (possible charset conversion
-> this is the characterset used during the export session and the characterset used in the dump file
How does NLS affect datapump (expdp/impdp)?

Datapump does not use the NLS_LANG to do conversion between databases. Conversion between 2 database charactersets is done purely based on the NLS_CHARACTERSET (or NLS_NCHAR_CHARACTERSET for NCHAR, NVARCHAR and NCLOB datatypes) of the source and target database.

However, if you specify a parameter file then the NLS_LANG is used. This is only important if you use non-English characters (e.g. for the QUERY parameter) in the parameter file.

If you use non-English characters in the parameter file then the NLS_LANG environment variables should be set (in the session where the Data Pump job is started) to the correct encoding of the parameter file.

Note: There is a chance to have data corruption when going from a 8 bit characterset to UTF8 or another multibyte characterset on ALL 10g versions (including 10.1.0.5 and 10.2.0.3) and 11.1.0.6. Impdp may provoke data corruption unless patch 5874989 is applied. This is caused by Bug 5874989. The “old” exp/imp works fine. This problem is fixed in upcoming 10.2.0.4 and 11.1.0.7 patchsets.

All existing patches for this bug are found here: http://updates.oracle.com/download/5874989.html

For 10.1.0.5, 10.2.0.2, 10.2.0.3 and 11.1.0.6 you can ask for a backport for Bug 5874989 if there isn’t a patch yet for your version/platform.

The patch is technically only needed on the impdp side, but if you use expdp/impdp between different charactersets Oracle suggests you patch all of your systems.

What Causes ORA-01401 or ORA-12899 during import (imp and impdp)?

9i and lower gives ORA-01401: inserted value too large for column, 10g and up give ORA-12899: value too large for column.

This is seen when exporting from a database with a 8 bit NLS_CHARACTERSET (like WE8ISO8859P1, WE8MSWIN1252, WE8DEC …) or 16-bit NLS_CHARACTERSET (like JA16SJIS, ZHS16GBK, KO16MSWIN949) to a database with a NLS_CHARACTERSET set to AL32UTF8 or UTF8.

(AL32)UTF8 users more BYTES to store a character then a 8 and 16 bit charactersets. This is explained in Note: 119119.1 AL32UTF8 (Unicode) Database Character Set Implications.

CHAR and VARCHAR2 column sizes are defined by default in BYTES not characterset.

The best solution is to precreate the tables using char semantics (9i and up) Note 144808.1 Examples and limits of BYTE and CHAR semantics usage.

Categories: Oracle · PeopleTools

National Character Set Conversion in Oracle 9i and 10g

May 16, 2008 · 1 Comment

What is the National Character Set?

The National Character set (NLS_NCHAR_CHARACTERSET) is a character set which is defined in addition to the (normal) database character set and is used for data stored in NCHAR, NVARCHAR2 and NCLOB columns. You can find your current value for the NLS_NCHAR_CHARACTERSET with the following select statement:

SQL> select value from NLS_DATABASE_PARAMETERS where parameter = ‘NLS_NCHAR_CHARACTERSET’;

You can not have more than 2 charactersets defined in Oracle: The NLS_CHARACTERSET is used for CHAR, VARCHAR2, and CLOB columns; the NLS_NCHAR_CHARACTERSET is used for NCHAR, NVARCHAR2 and NCLOB columns.

NLS_NCHAR_CHARACTERSETis defined when the database is created and specified with the CREATE DATABASE command. The NLS_NCHAR_CHARACTERSET defaults to AL16UTF16 if nothing is specified. From 9i onwards the NLS_NCHAR_CHARACTERSET can only have 2 values: UTF8 or AL16UTF16 which are the Unicode character sets in the Oracle database.

A lot of people think they need to use the NLS_NCHAR_CHARACTERSET to have UNICODE support in Oracle but this is not true, NLS_NCHAR_CHARCTERSET (NCHAR, NVARCHAR2) in 9i is always Unicode but you can perfectly use the “normal” CHAR and VARCHAR2 columns for storing unicode in a database that has a AL32UTF8 / UTF8 NLS_CHARACTERSET.

When trying to use another NATIONAL characterset the CREATE DATABASE command will fail with ORA-12714: invalid national character set specified – If you try to create an Oracle 9i or 10g database with an invalid database character set.

Which Datatypes Use the National Character Set?

There are three datatypes which can store data in the national character set:

NCHAR – is a fixed-length national character set character string. The length of the column is ALWAYS defined in characters (it always uses CHAR semantics)

NVARCHAR2 – is a variable-length national character set character string. The length of the column is ALWAYS defined in characters (it always uses CHAR semantics)

NCLOB – stores national character set data of up to four gigabytes in size. Data is always stored in UCS2 or AL16UTF16, even if the NLS_NCHAR_CHARACTERSET is UTF8. If you use N-types, DO use the (N’…’) syntax when coding it so that the literals are denoted as being in the national character set by preprending the letter ‘N’, for example: create table test values(N’this is a NLS_NCHAR_CHARACTERSET string’);

How Do I Know if I Use N-Type Columns?

This select statement will list all tables containing a N-type column:

select distinct owner, table_name from dba_tab_columns where data_type in (‘NCHAR’,'NVARCHAR2′,’NCLOB’);

On a 9i database created without (!) the “sample” schema you will see these rows (or less) returned:

OWNER TABLE_NAME
—————————
SYS ALL_REPRIORITY
SYS DBA_FGA_AUDIT_TRAIL
SYS DBA_REPRIORITY
SYS DEFLOB
SYS STREAMS$_DEF_PROC
SYS USER_REPRIORITY
SYSTEM DEF$_LOB
SYSTEM DEF$_TEMP$LOB
SYSTEM REPCAT$_PRIORITY
9 Rows selected.
These SYS and SYSTEM tables may contain data if you are using:
  • Fine Grained Auditing -> DBA_FGA_AUDIT_TRAIL
  • Advanced Replication -> ALL_REPRIORITY, DBA_REPRIORITY, USER_REPRIORITY, DEF$_TEMP$LOB, REPCAT$_PRIORITY
  • Advanced Replication or Deferred Transactions Functionality -> DEFLOB
  • Oracle Streams -> STREAMS$_DEF_PROC

If you created the database with DBCA (database assistant) and included the sample schema then you will typically see:

OWNER TABLE_NAME
——————————–
OE BOMBAY_INVENTORY
OE PRODUCTS
OE PRODUCT_DESCRIPTIONS
OE SYDNEY_INVENTORY
OE TORONTO_INVENTORY
PM PRINT_MEDIA
SYS ALL_REPRIORITY
SYS DBA_FGA_AUDIT_TRAIL
SYS DBA_REPRIORITY
SYS DEFLOB
SYS STREAMS$_DEF_PROC
SYS USER_REPRIORITY
SYSTEM DEF$_LOB
SYSTEM DEF$_TEMP$LOB
SYSTEM REPCAT$_PRIORITY
15 Rows selected.
The OE and PM tables contain just sample data and be dropped if need be.

Should I Worry When I Upgrade from 8i or lower to 9i or 10g?

  • When upgrading from Version 7 – The National Character Set did not exist in Oracle Version 7, so you can not have N-type columns. Your database will just have the -default- AL16UTF16 NLS_NCHAR_CHARACTERSET declaration and the standing sys/system tables. So there is no need to worry about anything when upgrading 7.
  • When upgrading from Version 8 and 8i – If you only have the SYS / SYSTEM tables listed above then you don’t have USER data using N-type columns. Your database will just have the -default- AL16UTF16 NLS_NCHAR_CHARACTERSET declaration AL16UTF16 NLS_NCHAR_CHARACTERSET declarations after the upgrade and the standard sys/system tables. So there is nothing to worry about…
  • If you have more tables than SYS / SYSTEM listed above (and they are also not the “sample” tables) then there are 3 possible cases (again the next to points are only relevant when you DO have n-type USER data)
    • Your current 8 / 8i NLS_NCHAR_CHARACTERSET is in this list: JA16SJISFIXED, JA16EUCIXED, ZHT32TRISFIXED, KO16KSC5601FIXED, KO16DBCSFIXED, S16TSFIXED, ZHS16CBG231280FIXED, ZHS16GBKFIXED, ZHS16DBCSFIXED, ZHT16DBCSFIXED, ZHT16BIG5FIXED, ZHT32EUCFIXED – Then the new NLS_NCHAR_CHARACTERSET will be AL16UTF16 and your data will be converted to AL16UTF16 during the upgrade.
    • Your current 8 / 8i NLS_NCHAR_CHARACTERSET is UTF8: Then the new NLS_NCHAR_CHARACTERSET will be UTF8 and your data will not be touched during the upgrade.
    • Your current 8 / 8i NLS_NCHAR_CHARACTERSET is NOT in the list of points above and IS NOT UTF8: Then you will need to export your data and drop it before upgrading.

The NLS_NCHAR_CHARACTERSET is NOT Changed to UTF8 or AL16UTF8 after Upgrading to 9i:

This may happen if you have not set the ORA_NLS33 environment variable correctly to the 9i Oracle Home during the upgrade. Strongly consider restoring your backup and performing the migration again or create a case with Oracle.

Can You Change the AL16UTF16 to UTF8/ People Hear that there are Problems with AL16UTF16?

  • If you do not use N-types then there is NO problem at all with AL16UTF16 because you are simply not using it and Oracle strongly advises you to stay with the default AL16UTF16 NLS_NCHAR_CHARACTERSET.
  • If you do use N-types then there will be a problem with the 8i clients and lower accessing the N-type columns (note that you will NOT have a problem selecting from the “Normal” non-N-types (columns). If this is a situation you find yourself in Oracle recommends that you use UTF8 as NLS_NCHAR_CHARACTERSET or create a second 9i database using UTF8 as NCHAR and use this as “inbetween or staging” database between the 8i and the 9i database so you can create views in this new database that do a select from the AL16UTF16 9i database, the data will then be converted from AL16UTF16 to UTF8 in the “inbetween” database and that can be read by Oracle 8i. This is one of the two reasons why you should use UTF8 as NLS_NCHAR_CHARACTERSET, if you are NOT using N-type columns with pre-9i clients then there is NO reason to go to UTF8.
  • If you want to change to UTF8 because you are using transportable tablespaces from 8i database then you want to check if you are using N-types in the 8i database that are included in the tablespace that you are transporting. Use the following select statement: select distinct owner, table_name from dba_tab_columns where data_type in (‘NCHAR’,'NVARCHAR2′,’NCLOB’); If yes, then you have the second reason to use UTF8 for NLS_NCHAR_CHARACTERSET. If not, then leave it to AL16UTF16 and create a TAR with Oracle for the solution for any ORA-19736 error(s).
  • You are in one of the 2 situations where it’s really needed to change from AL16UTF16 to UTF8 then the correct steps to go from AL16UTF16 to UTF8 are:
    • install csscan
    • for 10g use the csscan 2.x version found in $ORACLE_HOME/bin, no need to install a new version (skip the next step)
    • For 9.2 and lower: Please DO install version 1.2 or higher from OTN for your OS version. Copy all of the scripts and executables found in the zip file you downloaded to your $ORACLE_HOME directory and overwrite (replace) the older versions. Note: Do NOT use the csscan of a 10g Installation on a 9i/8i database!
    • Execute csminst.sql using the following commands and executables:
cd $ORACLE_HOME/rdbms/admin
export ORACLE_SID=<Oracle SID>
sqlplus “/ as sysdba”
set termout on
set echo on
spool csminst.log
START csminst.sql
Check the csminst.log for errors. If you get the error “Character set migrate utility schema not compatible” then check to be sure you replace all of the older csscan files and executables with the ones in the download zip file (check the file permissions in the target directory, make sure you are starting the executable from the right directory, check your $PATH settings, make sure you are not starting csscan from an inappropriate Oracle Home or you have not executed a cmsinst.sql from a different location or wrong version).
For 9i export all of the user N-data, drop/truncate all the user N-data (if you do not drop all N-data then you will run into ORA-12717: Cannot issue ALTER DATABASE NATIONAL CHARACTERSET when NCLOB, NCHAR or NVARCHAR2 data exists, execute csscan to check if everything is OK by:
csscan FULL=Y TOCHAR=UTF8 LOG=check CAPTURE=Y ARRAY=10000000 PROCESS=2
Always execute csscan as sysdba. Review the check.txt file very carefully to verify that ALL character type data in the data dictionary remains the same in the new character set and ALL character type application data remains the same in the new character set. If this is true you can then execute an ALTER DATABASE NATIONAL CHARACTERSET UTF8; command – next shutdown the listener and any applications that connects locally to the database. There should only be ONE connection to the database during the WHOLE time and that’s the SQL*Plus session that you are using to perform the change.
  1. Make sure the parallel server parameter in init.ora is set to false or it is not set at all. If you are using RAC see Oracle’s MetaLink note for changing the Character Set for a RAC database.
  2. Execute the following commands in a SQL*Plus session connected as “/ as sysdba”:
spool Nswitch.log
shutdown immediate;
startup mount;
alter system enable restricted session;
alter system set job_queue_processes=0;
alter system set aq_tm_processes=0;
alter database open;
alter database national character set UTF8;
shutdown immediate;
Restore the parallel_server parameter in init.ora, if necessary:
Import the user N-data again.
For 10g and up:
  • export any user N-data
  • drop/truncate any user N-data
  • truncate these 2 xdb tables if they return 7 rows
SQL>select LOCALNAME from XDB.XDB$QNAME_ID;
SQL>select NMSPCURI from XDB.XDB$NMSPC_ID;
sql> truncate table xdb.xdb$qname_id;
sql> truncate table xdb.xdb$nmspc_id;

If you have more rows than 7 open a SR with Oracle. Otherwise, execute csscan to verify if everything is ok.

csscan FULL=Y TONCHAR=UTF8 LOG=check CAPTURE=Y ARRAY=10000000 PROCESS=2

always execute csscan as “/ as sysdba”

Review the results in the check.txt file. You should see the following:

All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set
Next run csalter (using alter database is in a 10g system for the national characterset is not really a problem, however, ….
Shutdown the listener and any application that connects locally to the database. There should only be ONE connection to the database during the WHOLE time and that’s the SQL*Plus session you are using to execute the commands. Once you’ve verified that your session is the only current session connect “/ as sysdba” and execute the following set of commands:
– Make sure that the parallel_server and CLUSTER_DATABASE parameters are set to FALSE or they are not set
– at all. If you are using RAC you will need to start the database in single instance mode by setting the
– initialization parameter CUSTER_DATABASE=FALSE
show parameter CLUSTER_DATABASE
show parameter PARALLEL_SERVER
– check if you are using an spfile
show parameter spfile
– if this displays the path to an SPFILE then you are using spfile and in that case you need to do:
show parameter job_queue_processes
show parameter aq_tm_processes
– then do
shutdown immediate;
startup restrict;
spool Nswitch.log
@?/rdbms/admin/csalter.plb
– if you are using an spfile then you need to also do this
alter system set job_queue_processes=<original value> scope=both;
alter system set aq_tm_processes=<original value> scope=both;
shutdown;
startup;

After the above you will need to update the XDB tables with the following inserts:

(these are located in the $ORACLE_HOME/rdbms/admin/catxdbtm.sql script)

/************ Insert reserved values into tables *********************/

insert into xdb.xdb$nmspc_id values
(‘http://www.w3.org/XML/1998/namespace’,          HEXTORAW(‘01′));
insert into xdb.xdb$nmspc_id values
(‘http://www.w3.org/XML/2000/xmlns’,              HEXTORAW(‘02′));
insert into xdb.xdb$nmspc_id values
(‘http://www.w3.org/2001/XMLSchema-instance’,     HEXTORAW(‘03′));
insert into xdb.xdb$nmspc_id values
(‘http://www.w3.org/2001/XMLSchema’,              HEXTORAW(‘04′));
insert into xdb.xdb$nmspc_id values
(‘http://xmlns.oracle.com/2004/csx’,              HEXTORAW(‘05′));
insert into xdb.xdb$nmspc_id values
(‘http://xmlns.oracle.com/xdb’,                   HEXTORAW(‘06′));
insert into xdb.xdb$nmspc_id values
(‘http://xmlns.oracle.com/xdb/nonamespace’,       HEXTORAW(‘07′));

insert into xdb.xdb$qname_id values (HEXTORAW(‘01′), ’space’, HEXTORAW(‘01′), HEXTORAW(‘10′));
insert into xdb.xdb$qname_id values (HEXTORAW(‘01′), ‘lang’, HEXTORAW(‘01′), HEXTORAW(‘11′));
insert into xdb.xdb$qname_id values (HEXTORAW(‘03′), ‘type’, HEXTORAW(‘01′), HEXTORAW(‘12′));
insert into xdb.xdb$qname_id values (HEXTORAW(‘03′), ‘nil’, HEXTORAW(‘01′), HEXTORAW(‘13′));
insert into xdb.xdb$qname_id values
(HEXTORAW(‘03′), ’schemaLocation’, HEXTORAW(‘01′), HEXTORAW(‘14′));
insert into xdb.xdb$qname_id values
(HEXTORAW(‘03′), ‘noNamespaceSchemaLocation’, HEXTORAW(‘01′), HEXTORAW(‘15′));
insert into xdb.xdb$qname_id values (HEXTORAW(‘02′), ‘xmlns’, HEXTORAW(‘01′), HEXTORAW(‘16′));

commit;

Import any user N-data.

Important: Do NOT use the N_SWITCH.SQL script any more AFTER migrating from Oracle 8i to 9i, using this script a second time will corrupt existing NCHAR data!!!!!!!

Is the AL32UTF8 Problem the Same as the AL16UTF16  / Do I need the same Patches?

No, they may look similar but they are different problems. This refers to the problems connecting to AL32UTF8 databases with older client software.

But I Still Want <characterset> as NLS_NCHAR_CHARACTERSET, like I had in 8i:

This is simply not possible. From 9i onwards the NLS_NCHAR_CHARACTERSET can have only 2 values: UTF8 or AL16UTF16.  Both UTF8 and AL16UTF16 are unicode charactersets, so they can store whatever <characterset> you had as NLS_NCHAR_CHARACTERSET in 8(i). If you are not using N-types then keep the default AL16UTF16 (or use UTF8 if you really want), it doesn’t matter if you don’t use N-types. There is one condition in which this “limitation” can have an undesired affect, when you are importing an Oracle 8i Transportable Tablespace into Oracle 9i you can run into ORA-19736 (as well with AL16UTF16 as with UTF8). Simply send Oracle the output of: select distinct owner, table_name from dba_tab_columns where data_type in (‘NCHAR’,NVARCHAR2′,’NCLOB’); In that case open a TAR with Oracle refer to the Metalink note ans ask them to assign the TAR to the NLS/globalization team. That team can then assist you with a work around. Do NOT try to change the national characterset in 8i to AL16UTF16 or update system tables like you find sometimes on the Oracle DBA Sites or Blogs on the Internet at least if you don’t want to jeopardize your data integrity as well as possible destroying you Oracle instance.

Do You Need to set NLS_LANG to AL16UTF16 when Creating / Using the NLS_NCHAR_CHARACTERSET?

NLS_LANG is used to let Oracle know what characterset your client’s OS is using so that Oracle can perform (if needed) the conversion from the client’s characterset to the Server’s database characterset.  NLS_LANG is a CLIENT parameter and has no influence on the server (database side) whatsoever.

AL32UTF8 fails with NLS_NCHAR_CHARACTERSET with ORA-12714:

From 9i onwards the NLS_NCHAR_CHARACTERSET can have only 2 values: UTF8 or AL16UTF16. UTF8 is possible so that you can use it (when needed) for 8.x backwards compatibility. In all other conditions AL16UTF16 is the preferred and best value. AL16UTF16 has the same unicode revision as AL32UTF8 so there is no need for AL32UTF8 as NLS_NCHAR_CHARACTERSET.

You have messages like “possible ncharset conversion)” During Import:

This is normal and is not an error condition. If you do not use N-types than this is a purely informative message. But even in the case that you use N-types like NCHAR and NCLOB then this is not a problem. The database will convert from the “old” NCHAR characterset to the new one automatically (and – unlike the “normal” characterset – the NLS_LANG has no impact on this conversion – during the exp/imp). AL16UTF16 or UTF8 (the only 2 possible values in 9i) are unicode charactersets and so you can store any character…So no data loss is to be expected.

Can You use AL16UTF16 as NLS_CHARACTERSET?

No, AL16UTF16 can only be used as NLS_NCHAR_CHARACTERSET in 9i and above. Trying to create a database with  a AL16UTF16 NLS_CHARACTERSET will fail.

Do You Need to Change the NLS_NCHAR_CHARACTERSET in 8i to UTF8 BEFORE Upgrading to 9i/10g?

No, see previous statements.

Having a UTF8 NLS_CHARACTERSET Database is There an Advantage to use AL16UTF16 N-Types?

There might be 2 reasons: One possible advantage is storage (disk space). UTF8 uses 1 to 3 bytes, AL16UTF16 always uses 2 bytes. If you have a lot of non-Western data (Cyrillic, Chinese, Portugese, etc….) UTF8 will use in most cases less disk space than AL16UTF16. This is NOT true for NCLOB and CLOB, they are both encoded with an internal fixed-width Unicode character set so they will both use the same amount of disk space. The other possible advantage is extending the limits of CHAR semantics. For a single-byte character set encoding, the character and byte length are the same. However, multi-byte character set encodings do not correspond to the bytes, making sizing the column more difficult. Hence the reason why CHAR semantics was introduced. However, we still have some physical underlying byte based limits and development has chosen to allow full usage of CHARacters occupying the MAX datalength that can be stored for a certain datatype in 9i and up.

The MAX column is the MAXIMUM amount of CHARACTERS that can be stored occupying the MAXIMUM data length. Seen that UTF8 and AL32UTF8 are VARYING charactersets this means that a string of X chars can be X to X*3 (or X*4 for AL32) bytes.

The MIN column is the maximum size that you can define and that Oracle can store if all data is the MINIMUM datalength (1 byte for AL32UTF8 and UTF8) for that character.

N-types (NVARCHAR2, NCHAR) are always defined in CHAR semantics, you cannot define them in BYTE.

All numbers are CHAR definitions.

Maximum Unicode Column Values

This means that if you try to store more than 666 characters that occupy 3 bytes in UTF8 column you still will get a ORA-01401: inserted value too large for column (or from 10g onwards: ORA-12899: value too large for column) error, even if you have defined the column as CHAR (2000 CHAR) so here it might be a good idea to define that column as NCHAR that will raise the MAX to 100 char’s …

Disadvantages of using N-types:

  • You might have some problems with older clients if using AL16UTF16
  • Be sure that you use (AL32)UTF8 as NLS_CHARACTERSET, otherwise you will run into one of the above issues.
  • Do not expet a higher performance by using AL16UTF16, it might be faster on some systems, but that has more to do with I/O then with the database kernel.
  • Forms 6i/9i does not support NCHAR / NVARCHAR2
  • If you use N-types, Do use the (N’…’) syntax when coding it so that literals are denoted as being in the national character set by prepending letter ‘N’

Categories: Oracle