Posted by: repettas | May 16, 2008

National Character Set Conversion in Oracle 9i and 10g

What is the National Character Set?

The National Character set (NLS_NCHAR_CHARACTERSET) is a character set which is defined in addition to the (normal) database character set and is used for data stored in NCHAR, NVARCHAR2 and NCLOB columns. You can find your current value for the NLS_NCHAR_CHARACTERSET with the following select statement:

SQL> select value from NLS_DATABASE_PARAMETERS where parameter = 'NLS_NCHAR_CHARACTERSET';

You cannot have more than 2 character sets defined in Oracle: the NLS_CHARACTERSET is used for CHAR, VARCHAR2, and CLOB columns; the NLS_NCHAR_CHARACTERSET is used for NCHAR, NVARCHAR2 and NCLOB columns.

NLS_NCHAR_CHARACTERSET is defined when the database is created and specified with the CREATE DATABASE command. The NLS_NCHAR_CHARACTERSET defaults to AL16UTF16 if nothing is specified. From 9i onwards the NLS_NCHAR_CHARACTERSET can only have 2 values: UTF8 or AL16UTF16, which are the Unicode character sets in the Oracle database.
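For illustration, this is roughly where the national character set is specified at creation time (a minimal sketch only; the database name, file path and size are hypothetical placeholders, not a complete production CREATE DATABASE statement):

CREATE DATABASE mydb
   CHARACTER SET AL32UTF8             -- becomes NLS_CHARACTERSET
   NATIONAL CHARACTER SET AL16UTF16   -- becomes NLS_NCHAR_CHARACTERSET
   DATAFILE '/u01/oradata/mydb/system01.dbf' SIZE 500M;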

A lot of people think they need to use the NLS_NCHAR_CHARACTERSET to have Unicode support in Oracle, but this is not true. The NLS_NCHAR_CHARACTERSET (NCHAR, NVARCHAR2) in 9i is always Unicode, but you can perfectly well use the "normal" CHAR and VARCHAR2 columns for storing Unicode in a database that has an AL32UTF8 / UTF8 NLS_CHARACTERSET.
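To see both character set declarations side by side, query the same dictionary view as above:

SQL> select parameter, value from NLS_DATABASE_PARAMETERS
     where parameter in ('NLS_CHARACTERSET','NLS_NCHAR_CHARACTERSET');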

If you try to create an Oracle 9i or 10g database with any other national character set, the CREATE DATABASE command will fail with ORA-12714: invalid national character set specified.

Which Datatypes Use the National Character Set?

There are three datatypes which can store data in the national character set:

NCHAR – is a fixed-length national character set character string. The length of the column is ALWAYS defined in characters (it always uses CHAR semantics)

NVARCHAR2 – is a variable-length national character set character string. The length of the column is ALWAYS defined in characters (it always uses CHAR semantics)

NCLOB – stores national character set data of up to four gigabytes in size. Data is always stored in UCS2 or AL16UTF16, even if the NLS_NCHAR_CHARACTERSET is UTF8. If you use N-types, DO use the N'…' syntax when coding so that literals are denoted as being in the national character set by prepending the letter N, for example: insert into test values (N'this is a NLS_NCHAR_CHARACTERSET string');
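As a small illustration (the table TEST and its column are hypothetical), you can check what is actually stored with DUMP; the 1016 format prints the bytes in hex together with the character set of the column:

create table test (col1 nvarchar2(50));
insert into test values (N'this is a NLS_NCHAR_CHARACTERSET string');
select dump(col1, 1016) from test;   -- shows the column's character set name plus hex bytes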

How Do I Know if I Use N-Type Columns?

This select statement will list all tables containing a N-type column:

select distinct owner, table_name from dba_tab_columns where data_type in ('NCHAR','NVARCHAR2','NCLOB');

On a 9i database created without (!) the "sample" schema you will see these rows (or fewer) returned:

OWNER TABLE_NAME
—————————
SYS ALL_REPPRIORITY
SYS DBA_FGA_AUDIT_TRAIL
SYS DBA_REPPRIORITY
SYS DEFLOB
SYS STREAMS$_DEF_PROC
SYS USER_REPPRIORITY
SYSTEM DEF$_LOB
SYSTEM DEF$_TEMP$LOB
SYSTEM REPCAT$_PRIORITY
9 Rows selected.
These SYS and SYSTEM tables may contain data if you are using:
  • Fine Grained Auditing -> DBA_FGA_AUDIT_TRAIL
  • Advanced Replication -> ALL_REPPRIORITY, DBA_REPPRIORITY, USER_REPPRIORITY, DEF$_TEMP$LOB, REPCAT$_PRIORITY
  • Advanced Replication or Deferred Transactions Functionality -> DEFLOB
  • Oracle Streams -> STREAMS$_DEF_PROC

If you created the database with DBCA (database assistant) and included the sample schema then you will typically see:

OWNER TABLE_NAME
——————————–
OE BOMBAY_INVENTORY
OE PRODUCTS
OE PRODUCT_DESCRIPTIONS
OE SYDNEY_INVENTORY
OE TORONTO_INVENTORY
PM PRINT_MEDIA
SYS ALL_REPPRIORITY
SYS DBA_FGA_AUDIT_TRAIL
SYS DBA_REPPRIORITY
SYS DEFLOB
SYS STREAMS$_DEF_PROC
SYS USER_REPPRIORITY
SYSTEM DEF$_LOB
SYSTEM DEF$_TEMP$LOB
SYSTEM REPCAT$_PRIORITY
15 Rows selected.
The OE and PM tables contain just sample data and can be dropped if need be.
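If you want to remove the sample schemas altogether, that can be done as follows (make sure nothing of your own depends on them first):

drop user OE cascade;
drop user PM cascade;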

Should I Worry When I Upgrade from 8i or lower to 9i or 10g?

  • When upgrading from Version 7 – The National Character Set did not exist in Oracle Version 7, so you cannot have N-type columns. Your database will just have the -default- AL16UTF16 NLS_NCHAR_CHARACTERSET declaration and the standard SYS/SYSTEM tables. So there is no need to worry about anything when upgrading from 7.
  • When upgrading from Version 8 and 8i – If you only have the SYS / SYSTEM tables listed above then you don't have USER data using N-type columns. Your database will just have the -default- AL16UTF16 NLS_NCHAR_CHARACTERSET declaration after the upgrade and the standard SYS/SYSTEM tables. So there is nothing to worry about.
  • If you have more tables than the SYS / SYSTEM ones listed above (and they are also not the "sample" tables) then there are 3 possible cases (the next points are only relevant when you DO have N-type USER data):
    • Your current 8 / 8i NLS_NCHAR_CHARACTERSET is in this list: JA16SJISFIXED, JA16EUCFIXED, ZHT32TRISFIXED, KO16KSC5601FIXED, KO16DBCSFIXED, US16TSTFIXED, ZHS16CGB231280FIXED, ZHS16GBKFIXED, ZHS16DBCSFIXED, ZHT16DBCSFIXED, ZHT16BIG5FIXED, ZHT32EUCFIXED – Then the new NLS_NCHAR_CHARACTERSET will be AL16UTF16 and your data will be converted to AL16UTF16 during the upgrade.
    • Your current 8 / 8i NLS_NCHAR_CHARACTERSET is UTF8: Then the new NLS_NCHAR_CHARACTERSET will be UTF8 and your data will not be touched during the upgrade.
    • Your current 8 / 8i NLS_NCHAR_CHARACTERSET is NOT in the list above and IS NOT UTF8: Then you will need to export your N-type data and drop it before upgrading.
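To check quickly whether you have USER data in N-type columns (i.e., anything beyond the dictionary tables listed earlier), filter out the SYS and SYSTEM rows:

select distinct owner, table_name
from dba_tab_columns
where data_type in ('NCHAR','NVARCHAR2','NCLOB')
and owner not in ('SYS','SYSTEM');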

The NLS_NCHAR_CHARACTERSET is NOT Changed to UTF8 or AL16UTF16 after Upgrading to 9i:

This may happen if you have not set the ORA_NLS33 environment variable correctly to the 9i Oracle Home during the upgrade. Strongly consider restoring your backup and performing the migration again, or open a case with Oracle.

Can You Change AL16UTF16 to UTF8? / Are There Problems with AL16UTF16?

  • If you do not use N-types then there is NO problem at all with AL16UTF16 because you are simply not using it and Oracle strongly advises you to stay with the default AL16UTF16 NLS_NCHAR_CHARACTERSET.
  • If you do use N-types then there will be a problem with 8i clients and lower accessing the N-type columns (note that you will NOT have a problem selecting from the "normal" non-N-type columns). If this is a situation you find yourself in, Oracle recommends that you use UTF8 as NLS_NCHAR_CHARACTERSET, or that you create a second 9i database using UTF8 as the national character set and use it as an "in-between" or staging database between the 8i and the 9i database: you create views in this new database that select from the AL16UTF16 9i database, the data is then converted from AL16UTF16 to UTF8 in the in-between database, and that can be read by Oracle 8i (a sketch of this staging approach appears at the end of this section). This is one of the two reasons why you would use UTF8 as NLS_NCHAR_CHARACTERSET; if you are NOT using N-type columns with pre-9i clients then there is NO reason to go to UTF8.
  • If you want to change to UTF8 because you are using transportable tablespaces from an 8i database, then check whether you are using N-types in the 8i database that are included in the tablespace you are transporting, using the following select statement: select distinct owner, table_name from dba_tab_columns where data_type in ('NCHAR','NVARCHAR2','NCLOB'); If yes, then you have the second reason to use UTF8 for NLS_NCHAR_CHARACTERSET. If not, then leave it at AL16UTF16 and create a TAR with Oracle for the solution to any ORA-19736 error(s).
  • If you are in one of the 2 situations where it is really needed to change from AL16UTF16 to UTF8, then the correct steps are:
    • install csscan
    • for 10g use the csscan 2.x version found in $ORACLE_HOME/bin, no need to install a new version (skip the next step)
    • For 9.2 and lower: Please DO install version 1.2 or higher from OTN for your OS version. Copy all of the scripts and executables found in the zip file you downloaded to your $ORACLE_HOME directory and overwrite (replace) the older versions. Note: Do NOT use the csscan of a 10g Installation on a 9i/8i database!
    • Execute csminst.sql using the following commands and executables:
cd $ORACLE_HOME/rdbms/admin
export ORACLE_SID=<Oracle SID>
sqlplus "/ as sysdba"
set termout on
set echo on
spool csminst.log
START csminst.sql
Check the csminst.log for errors. If you get the error "Character set migrate utility schema not compatible", check that you replaced all of the older csscan files and executables with the ones in the download zip file: check the file permissions in the target directory, make sure you are starting the executable from the right directory, check your $PATH settings, and make sure you are not starting csscan from an inappropriate Oracle Home and have not executed a csminst.sql from a different location or wrong version.
For 9i: export all of the user N-data, then drop or truncate all of the user N-data (if you do not drop all N-data you will run into ORA-12717: Cannot issue ALTER DATABASE NATIONAL CHARACTERSET when NCLOB, NCHAR or NVARCHAR2 data exists). Then execute csscan to check that everything is OK:
csscan FULL=Y TONCHAR=UTF8 LOG=check CAPTURE=Y ARRAY=10000000 PROCESS=2
Always execute csscan as sysdba. Review the check.txt file very carefully to verify that ALL character type data in the data dictionary remains the same in the new character set and ALL character type application data remains the same in the new character set. If this is true you can then execute an ALTER DATABASE NATIONAL CHARACTER SET UTF8; command. Next, shut down the listener and any applications that connect locally to the database. There should be only ONE connection to the database during the WHOLE time, and that is the SQL*Plus session you are using to perform the change.
  1. Make sure the PARALLEL_SERVER parameter in init.ora is set to false or is not set at all. If you are using RAC, see Oracle's MetaLink note on changing the character set of a RAC database.
  2. Execute the following commands in a SQL*Plus session connected as "/ as sysdba":
spool Nswitch.log
shutdown immediate;
startup mount;
alter system enable restricted session;
alter system set job_queue_processes=0;
alter system set aq_tm_processes=0;
alter database open;
alter database national character set UTF8;
shutdown immediate;
Restore the parallel_server parameter in init.ora, if necessary.
Import the user N-data again (a minimal sketch follows).
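As an illustration only, a minimal export/import of the user N-data could look like this (SCOTT.NTEST is a hypothetical owner and table; repeat for every table found by the DBA_TAB_COLUMNS query):

exp system/<password> FILE=ndata.dmp TABLES=(SCOTT.NTEST) LOG=ndata_exp.log
# ... drop/truncate the N-data and change the national character set as described above ...
imp system/<password> FILE=ndata.dmp FROMUSER=SCOTT TOUSER=SCOTT LOG=ndata_imp.log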
For 10g and up:
  • export any user N-data
  • drop/truncate any user N-data
  • truncate these 2 XDB tables if they each return 7 rows:
SQL> select LOCALNAME from XDB.XDB$QNAME_ID;
SQL> select NMSPCURI from XDB.XDB$NMSPC_ID;
SQL> truncate table xdb.xdb$qname_id;
SQL> truncate table xdb.xdb$nmspc_id;

If either table has more than 7 rows, open an SR with Oracle. Otherwise, execute csscan to verify that everything is OK:

csscan FULL=Y TONCHAR=UTF8 LOG=check CAPTURE=Y ARRAY=10000000 PROCESS=2

Always execute csscan as "/ as sysdba".

Review the results in the check.txt file. You should see the following:

All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set
Next run csalter (using ALTER DATABASE for the national character set on a 10g system is not really a problem, but csalter is the supported method). Shut down the listener and any application that connects locally to the database. There should be only ONE connection to the database during the WHOLE time, and that is the SQL*Plus session you are using to execute the commands. Once you have verified that your session is the only current session, connect "/ as sysdba" and execute the following set of commands:
-- Make sure that the PARALLEL_SERVER and CLUSTER_DATABASE parameters are set to FALSE
-- or are not set at all. If you are using RAC you will need to start the database in
-- single instance mode by setting the initialization parameter CLUSTER_DATABASE=FALSE.
show parameter CLUSTER_DATABASE
show parameter PARALLEL_SERVER
-- Check if you are using an spfile:
show parameter spfile
-- If this displays the path to an spfile then you are using one; in that case note the
-- current values of the following parameters so you can restore them afterwards:
show parameter job_queue_processes
show parameter aq_tm_processes
-- Then do:
shutdown immediate;
startup restrict;
spool Nswitch.log
@?/rdbms/admin/csalter.plb
-- If you are using an spfile then you also need to do this:
alter system set job_queue_processes=<original value> scope=both;
alter system set aq_tm_processes=<original value> scope=both;
shutdown;
startup;

After the above you will need to update the XDB tables with the following inserts:

(these are located in the $ORACLE_HOME/rdbms/admin/catxdbtm.sql script)

/************ Insert reserved values into tables *********************/

insert into xdb.xdb$nmspc_id values
('http://www.w3.org/XML/1998/namespace',          HEXTORAW('01'));
insert into xdb.xdb$nmspc_id values
('http://www.w3.org/XML/2000/xmlns',              HEXTORAW('02'));
insert into xdb.xdb$nmspc_id values
('http://www.w3.org/2001/XMLSchema-instance',     HEXTORAW('03'));
insert into xdb.xdb$nmspc_id values
('http://www.w3.org/2001/XMLSchema',              HEXTORAW('04'));
insert into xdb.xdb$nmspc_id values
('http://xmlns.oracle.com/2004/csx',              HEXTORAW('05'));
insert into xdb.xdb$nmspc_id values
('http://xmlns.oracle.com/xdb',                   HEXTORAW('06'));
insert into xdb.xdb$nmspc_id values
('http://xmlns.oracle.com/xdb/nonamespace',       HEXTORAW('07'));

insert into xdb.xdb$qname_id values (HEXTORAW('01'), 'space', HEXTORAW('01'), HEXTORAW('10'));
insert into xdb.xdb$qname_id values (HEXTORAW('01'), 'lang', HEXTORAW('01'), HEXTORAW('11'));
insert into xdb.xdb$qname_id values (HEXTORAW('03'), 'type', HEXTORAW('01'), HEXTORAW('12'));
insert into xdb.xdb$qname_id values (HEXTORAW('03'), 'nil', HEXTORAW('01'), HEXTORAW('13'));
insert into xdb.xdb$qname_id values
(HEXTORAW('03'), 'schemaLocation', HEXTORAW('01'), HEXTORAW('14'));
insert into xdb.xdb$qname_id values
(HEXTORAW('03'), 'noNamespaceSchemaLocation', HEXTORAW('01'), HEXTORAW('15'));
insert into xdb.xdb$qname_id values (HEXTORAW('02'), 'xmlns', HEXTORAW('01'), HEXTORAW('16'));

commit;

Import any user N-data.

Important: Do NOT use the N_SWITCH.SQL script any more AFTER migrating from Oracle 8i to 9i; using this script a second time will corrupt existing NCHAR data!
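For completeness, here is a rough sketch of the "in-between" staging-database approach mentioned earlier (the database link name AL16_DB, the TNS alias, the SCOTT credentials and the NTEST table/view names are all hypothetical). It is executed in the second 9i database that was created with UTF8 as NLS_NCHAR_CHARACTERSET:

-- Link from the UTF8 staging database to the AL16UTF16 production database:
create database link al16_db connect to scott identified by tiger using 'AL16DB';

-- N-type data selected over the link is converted to this database's national
-- character set (UTF8), which pre-9i clients can read:
create view ntest_v as select * from scott.ntest@al16_db;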

Is the AL32UTF8 Problem the Same as the AL16UTF16 One / Do I Need the Same Patches?

No, they may look similar but they are different problems. This refers to the problems connecting to AL32UTF8 databases with older client software.

But I Still Want <characterset> as NLS_NCHAR_CHARACTERSET, like I had in 8i:

This is simply not possible. From 9i onwards the NLS_NCHAR_CHARACTERSET can have only 2 values: UTF8 or AL16UTF16. Both UTF8 and AL16UTF16 are Unicode character sets, so they can store whatever <characterset> you had as NLS_NCHAR_CHARACTERSET in 8(i). If you are not using N-types then keep the default AL16UTF16 (or use UTF8 if you really want); it does not matter if you do not use N-types. There is one condition in which this "limitation" can have an undesired effect: when you are importing an Oracle 8i transportable tablespace into Oracle 9i you can run into ORA-19736 (with AL16UTF16 as well as with UTF8). In that case open a TAR with Oracle, send them the output of: select distinct owner, table_name from dba_tab_columns where data_type in ('NCHAR','NVARCHAR2','NCLOB'); refer to the MetaLink note, and ask them to assign the TAR to the NLS/globalization team. That team can then assist you with a workaround. Do NOT try to change the national character set in 8i to AL16UTF16 or update system tables, as you sometimes find suggested on Oracle DBA sites or blogs on the Internet, at least not if you want to avoid jeopardizing your data integrity and possibly destroying your Oracle instance.

Do You Need to set NLS_LANG to AL16UTF16 when Creating / Using the NLS_NCHAR_CHARACTERSET?

NLS_LANG is used to let Oracle know what character set your client's OS is using, so that Oracle can perform (if needed) the conversion from the client's character set to the server's database character set. NLS_LANG is a CLIENT parameter and has no influence on the server (database) side whatsoever.
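As an illustration, a client on a Western European Unix terminal might set something like this before starting SQL*Plus (the connect string is a hypothetical placeholder; the value describes the CLIENT environment, not the database):

$ export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
$ sqlplus scott/tiger@mydb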

Specifying AL32UTF8 as the NLS_NCHAR_CHARACTERSET Fails with ORA-12714:

From 9i onwards the NLS_NCHAR_CHARACTERSET can have only 2 values: UTF8 or AL16UTF16. UTF8 is possible so that you can use it (when needed) for 8.x backwards compatibility. In all other conditions AL16UTF16 is the preferred and best value. AL16UTF16 has the same Unicode revision as AL32UTF8, so there is no need for AL32UTF8 as NLS_NCHAR_CHARACTERSET.

You Have Messages like "(possible ncharset conversion)" During Import:

This is normal and is not an error condition. If you do not use N-types then this is a purely informative message. But even in the case that you use N-types like NCHAR and NCLOB, this is not a problem. The database will convert from the "old" NCHAR character set to the new one automatically (and, unlike with the "normal" character set, NLS_LANG has no impact on this conversion during exp/imp). AL16UTF16 and UTF8 (the only 2 possible values in 9i) are Unicode character sets, so you can store any character and no data loss is to be expected.

Can You use AL16UTF16 as NLS_CHARACTERSET?

No, AL16UTF16 can only be used as NLS_NCHAR_CHARACTERSET in 9i and above. Trying to create a database with an AL16UTF16 NLS_CHARACTERSET will fail.

Do You Need to Change the NLS_NCHAR_CHARACTERSET in 8i to UTF8 BEFORE Upgrading to 9i/10g?

No, see previous statements.

Having a UTF8 NLS_CHARACTERSET Database, Is There an Advantage to Using AL16UTF16 N-Types?

There might be 2 reasons. One possible advantage is storage (disk space): UTF8 uses 1 to 3 bytes per character, while AL16UTF16 always uses 2 bytes. If most of your data is Western (ASCII/Latin-based), UTF8 will in most cases use less disk space than AL16UTF16; for data such as Cyrillic or Chinese, AL16UTF16 uses the same or less space than UTF8. This is NOT true for NCLOB and CLOB; they are both encoded with an internal fixed-width Unicode character set, so they will both use the same amount of disk space. The other possible advantage is extending the limits of CHAR semantics. For a single-byte character set encoding, the character and byte lengths are the same. However, multi-byte character set encodings do not correspond one-to-one to bytes, making sizing the column more difficult; hence the reason why CHAR semantics was introduced. However, there are still physical underlying byte-based limits, and development has chosen to allow full usage of characters occupying the MAX data length that can be stored for a certain datatype in 9i and up.
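A quick way to see the storage difference (the table and data are hypothetical) is to compare byte lengths of the same string stored in both types:

create table len_demo (v varchar2(20 char), nv nvarchar2(20));
insert into len_demo values ('abc', N'abc');
-- In a UTF8/AL32UTF8 database 'abc' occupies 3 bytes in VARCHAR2,
-- while the same 3 characters always occupy 6 bytes in AL16UTF16:
select lengthb(v) as v_bytes, lengthb(nv) as nv_bytes from len_demo;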

The MAX column is the maximum number of CHARACTERS that can be stored if every character occupies the maximum data length. Given that UTF8 and AL32UTF8 are varying-width character sets, this means that a string of X characters can be X to X*3 bytes (or X*4 for AL32UTF8).

The MIN column is the maximum size that you can define, and that Oracle can store, if all data is of the minimum data length (1 byte per character for AL32UTF8 and UTF8).

N-types (NVARCHAR2, NCHAR) are always defined in CHAR semantics, you cannot define them in BYTE.

All numbers are CHAR definitions.

Maximum Unicode Column Values

The values below are reconstructed from the fixed internal limits of 2000 bytes for CHAR/NCHAR and 4000 bytes for VARCHAR2/NVARCHAR2:

Datatype    UTF8 (MIN/MAX)    AL32UTF8 (MIN/MAX)    AL16UTF16 (MIN/MAX)
CHAR        2000 / 666        2000 / 500            (not possible)
VARCHAR2    4000 / 1333       4000 / 1000           (not possible)
NCHAR       2000 / 666        (not possible)        1000 / 1000
NVARCHAR2   4000 / 1333       (not possible)        2000 / 2000

This means that if you try to store more than 666 characters that each occupy 3 bytes in a UTF8 column, you will still get an ORA-01401: inserted value too large for column (or from 10g onwards: ORA-12899: value too large for column) error, even if you have defined the column as CHAR (2000 CHAR). So here it might be a good idea to define that column as NCHAR, which raises the MAX to 1000 characters.

Disadvantages of using N-types:

  • You might have some problems with older clients if using AL16UTF16
  • Be sure that you use (AL32)UTF8 as NLS_CHARACTERSET, otherwise you will run into one of the above issues.
  • Do not expect higher performance by using AL16UTF16; it might be faster on some systems, but that has more to do with I/O than with the database kernel.
  • Forms 6i/9i does not support NCHAR / NVARCHAR2
  • If you use N-types, DO use the N'…' syntax when coding so that literals are denoted as being in the national character set by prepending the letter N

Responses

  1. I am going to migrate a Windows 32-bit Oracle 10g database (character set WE8MSWIN1252) to Windows 64-bit Oracle 10g (character set AL32UTF8), but I am getting the error below when importing the data into the new DB. Some rows' data is missing from this table.
    Error is:
    IMP-00019: row rejected due to ORACLE error 12899
    IMP-00003: ORACLE error 12899 encountered
    ORA-12899: value too large for column "DBNAME"."Tablename"."ColumnName" (actual: 77, maximum: 50)

    Please suggest on above. Thanks.
