Posted by: repettas | May 16, 2008

Import/Export – NLS Considerations for a Unicode Database

How does NLS affect Import/Export

Import and Export are client products, in the same way as SQL*Plus or Oracle Forms, and will therefore translate characters from the database character set to that defined by NLS_LANG. The character set used for the export will be stored in the export file and when the file is imported, the import will check the character set that was used. If it is different than that defined by NLS_LANG at the import site, the characters will be translated to the import character set and then, if necessary to the database character set.

How Should NLS_LANG Be Set When Using Export?

Oracle recommends setting the character set part of the NLS_LANG environment variable to the same character set as the character set of the database you are using.

select value from nls_database_parameters where parameter=’NLS_CHARACTERSET’;

That way no conversion will take place and the export file will be created in the same character set as the original database and contain ALL data from the original database (even incorrectly stored data if that would be the case). Even if the plan is to import this data into a database with a different character set later the conversion can be postponed until the import.

Note that this has no relation to the Operating System. If you have a WE8MSWIN1252 database on a Unix Server (which is totally supported) then you should set NLS_LANG to AMERICAN_AMERICA.WE8MSWIN1252 before export.

During the interaction with the database (SQL*Plus for example) you need to configure your Unix client properly and that can not be 1252 as Unix does not have a 1252 character set.

How Should NLS_LANG Be Set When Using Import?

If the source and target database have the same character set, the character set part of the NLS_LANG should be set to that same character set on both the export and the import.

Even if the character sets of the exporting and importing databases are not the same the best (preferred) value to use for the character set part of NLS_LANG on both export and import is still the character set of the source database.

select value from nls_database_parameters where parameter=’NLS_CHARACTERSET’;

But setting the NLS_LANG to the character set of the target database during import is also correct. That way conversion only takes place once, either on export or on import!

However, the preferred place to do the conversion is between the import executable and the target database.

Note that this has no relation with the Operating System. If your source database is a WE8MSWIN1252 database then you simply should set NLS_LANG to AMERICAN_AMERICA.WE8MSWIN1252 before import.

During the interaction with the database (SQL*Plus) you need to configure your Unix client properly and that can nt be 1252 as Unix does not have a 1252 character set.

Example: You want to go from an WE8MSWIN1252 to an UTF8 database:

Note that this is only the exp/imp example, if you want to migrate to UTF8 you need to use another method.

  • Double check the NLS_CHARACTERSET on the SOURCE database: select * from nls_database_parameters where parameter=’NLS_CHARACTERSET’; and export with the NLS_LANG set to AMERICAN_AMERICA.<NLS_CHARACTERSET>. In this case we want to create an export file containing WE8MSWIN1252 data. (This is also the setting you want to use if you take an export as a backup)
  • On Unix this would be: export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; exp ….
  • On Windows this would be: C:\>set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; C:\>exp …
  • Import with the NLS_LANG set to AMERICAN_AMERICA.WE8MSWIN1252 (=source NLS_CHARACTERSET) into the new UTF8 database.
  • On Unix this would be: export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; imp ….
  • On Windows this would be: C:\>set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252; C:\> imp …
  • The conversion to UTF8 is done while inserting the data in the UTF8 database.

Oracle recommends setting the NLS_LANG explicit in the current shell for UNIX or in the DOS box used for the exp or imp tool on Windows.

How is Import affected by the NLS_LANGUAGE and NLS_TERRITORY?

Not. Normally you use the AMERICAN_AMERICA default, but if you imported with the NLS_LANG set to FRENCH_FRANCE for example then you will not have a problem, even if the originating environment used GERMAN_GERMANY or so on.

What about the Message (possible ncharset conversion) during Import?

If you see messages similar to the above in the import log read further down.

How To Know in What Characterset a dump (export) file is created?

To determine this information simply type in: imp system/oracle@database show=yes file=test.dmp

The output looks similar to the following:

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – 64bit Production
With the Partitioning, Real Application Clusters and Data Mining Scoring Engine options

Export file created by EXPORT:V09.02.00 via conventional path

Warning: the objects were exported by SYS, not by you

import done in WE8ISO8859P15 character set and UTF8 NCHAR character set
-> this is the current NLS_LANG value set in the environment and the NCHAR characterset of the target database
import server uses AL32UTF8 character set (possible charset conversion)
-> this is only shown if the NLS_LANG during the import session is different from the target database characterset, so if you see 3 lines you might have problems
export client uses WE8ISO8859P1 character set (possible charset conversion
-> this is the characterset used during the export session and the characterset used in the dump file
How does NLS affect datapump (expdp/impdp)?

Datapump does not use the NLS_LANG to do conversion between databases. Conversion between 2 database charactersets is done purely based on the NLS_CHARACTERSET (or NLS_NCHAR_CHARACTERSET for NCHAR, NVARCHAR and NCLOB datatypes) of the source and target database.

However, if you specify a parameter file then the NLS_LANG is used. This is only important if you use non-English characters (e.g. for the QUERY parameter) in the parameter file.

If you use non-English characters in the parameter file then the NLS_LANG environment variables should be set (in the session where the Data Pump job is started) to the correct encoding of the parameter file.

Note: There is a chance to have data corruption when going from a 8 bit characterset to UTF8 or another multibyte characterset on ALL 10g versions (including 10.1.0.5 and 10.2.0.3) and 11.1.0.6. Impdp may provoke data corruption unless patch 5874989 is applied. This is caused by Bug 5874989. The “old” exp/imp works fine. This problem is fixed in upcoming 10.2.0.4 and 11.1.0.7 patchsets.

All existing patches for this bug are found here: http://updates.oracle.com/download/5874989.html

For 10.1.0.5, 10.2.0.2, 10.2.0.3 and 11.1.0.6 you can ask for a backport for Bug 5874989 if there isn’t a patch yet for your version/platform.

The patch is technically only needed on the impdp side, but if you use expdp/impdp between different charactersets Oracle suggests you patch all of your systems.

What Causes ORA-01401 or ORA-12899 during import (imp and impdp)?

9i and lower gives ORA-01401: inserted value too large for column, 10g and up give ORA-12899: value too large for column.

This is seen when exporting from a database with a 8 bit NLS_CHARACTERSET (like WE8ISO8859P1, WE8MSWIN1252, WE8DEC …) or 16-bit NLS_CHARACTERSET (like JA16SJIS, ZHS16GBK, KO16MSWIN949) to a database with a NLS_CHARACTERSET set to AL32UTF8 or UTF8.

(AL32)UTF8 users more BYTES to store a character then a 8 and 16 bit charactersets. This is explained in Note: 119119.1 AL32UTF8 (Unicode) Database Character Set Implications.

CHAR and VARCHAR2 column sizes are defined by default in BYTES not characterset.

The best solution is to precreate the tables using char semantics (9i and up) Note 144808.1 Examples and limits of BYTE and CHAR semantics usage.

About these ads

Responses

  1. Dear Sir,
    I have a problem with importing a data pump using impdp and I search this forum but I didn’t found any solution to my problem.
    I extracted a data pump files from an oracle 10g database using expdp command, but when I try to restore this dump on another machine running the same oracle version and the same OS, it imports the tables and indexes correctly but the Arabic data stored at this tables appears on undefined characters.
    the database characterset is AL32UTF8 and the source database uses the same characterset, also I used NLS_LANG but It didn’t solve any thing!
    How can I solve this issue because I deleted the source database and I only have this dump file.
    Please help me and thanks in advance.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: