1753781 Members
7439 Online
108799 Solutions
New Discussion юеВ

Transfert DB to utf 8

 
SOLVED
Go to solution
Fall_2
Occasional Advisor

Transfert DB to utf 8

Hello,
I would like to know if somebody have an idea on how to transfert an existing DB into utf 8, and avoiding strings getting too long for the data type.
Regards,
Ibra
ib
6 REPLIES 6
Brian Crabtree
Honored Contributor

Re: Transfert DB to utf 8

Export/import is one way to do this. You will have to export the database, recreate the instance with the new character set, and import it.

The strings should not get to long except if you are transfering from one of the Japanese charactersets with Kanji characters. For these columns, you will want to increase the length by 50% (Kanji in JA character sets take 2 characters, while it takes 3 with UTF8). Otherwise, you should be ok.

If your database is in US7ASCII, and you are on 8i, you can also change the character set without having to recreate the database. You can do this with "alter database character set utf8;" and "alter database national character set utf8;".

Hope this helps,

Brian
Indira Aramandla
Honored Contributor
Solution

Re: Transfert DB to utf 8

Hi,

This is what Metalink says for converting a US7ASCII database to UTF8.

There two ways for this.
1. Using the ALTER DATABASE CHARACTER SET command
2. Export the data , then truncation, convert DB and then Import

You can convert a databases to UTF8 using the ALTER DATABASE CHARACTER SET command, but remember the eight bit data won't be converted using this command.

Prior to using the above command you have to (must) use csscan utility correctly and check to see that you have no CLOBs in your database. CLOB storage method differs between single- and multi-byte character sets. You should handle all tables containing CLOBs in the same way as tables containing 8bit characters (export / truncation/ DB convert/ import).

Remember to restart the database twice after using ALTER DATABASE CHARACTER SET (see note 66320.1). Also, (just in case) restart SQL*Plus/svrmgrl session used to start, restart or change the database. This should help avoid NLS initialization bugs that can sometimes be found.

If you your database contains eight bit data then use the export/import utilities and the ALTER DATABASE CHARACTER SET INTERNAL_USE command.

1. Run the database character set scanner utility to identify all tables containing eight bit Windows characters in our US7ASII database
2. Perform full backup of the database
3. Convert the Oracle 8i database to WE8MSWIN1252 character set using ALTER DATABASE CHARACTER SET command
4. Export the tables containing 8 bit data using WE8MSWIN1252 as NLS_LANG setting for the export utility
5. Truncate all data from tables containing 8 bit data, database should not contain only ASCII database
6. Convert the database to UTF8 using ALTER DATABASE CHARACTER SET INTERNAL_USE command
7. Import the export from step 4 into the converted UTF8 database using WE8MSWIN1252 as NLS_LANG setting for the import utility.

***************************************************************************
The issue is that for CLOB processing to be effective the characters in CLOBs must have the same byte length. Otherwise calculating character offsets would require reading everything from the beginning of a CLOB to the requested offset and this would be very inefficient. Single-byte database character sets are fixed-width and are compatible with CLOBs. Multibyte database character sets are variable-width and are incompatible.

NCLOBs have the same issue with the national character set. LONGs do not have the issue. They are stored in the database character set. LONG RAW, BLOB and BFILE are binary types. They are not NLS-dependent.

I hope this gives some idea.
Never give up, Keep Trying
Yogeeraj_1
Honored Contributor

Re: Transfert DB to utf 8

hi,

to add to the above excellent replies by Brian and Indira.

Set your NLS_LANG environment variable to AMERICAN_AMERICA.US7ASCII before doing the import and that will work.

There is a documented (albeit confusingly documented) restriction. Page 2.53 of the utilities guide has:


Character Set Conversion

The following sections describe character conversion for CHAR and NCHAR data.

CHAR Data

Up to three character set conversions may be required for character data during an export/import operation:

1. Export writes export files using the character set specified in the NLS_LANG environment variable for the user session. A character set conversion is performed if the value of NLS_LANG differs from the database character set.

2. If the character set in the export file is different than the Import user session character set, Import performs a character set conversion to its user session character set. Import can perform this conversion only if the ratio of
the width of the widest character in its user session character set to the width of the smallest character in the export file character set is 1.


3. A final character set conversion may be performed if the target database's character set is different from Import's user session character set.



I think you will be hitting the issue in #2, the sessions character set (UTF8) is larger than 1. The export data exported with US7ASCII -- that width is 1. The ratio of these numbers is not 1 -- hence IMP cannot do the conversion (and that's what it is trying to do) -- we have to let the database do the conversion to UTF8.

Setting the NLS_LANG should clear that right up. This has been tested!!

hope this helps too!

Regards
Yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)
Christian Gebhardt
Honored Contributor

Re: Transfert DB to utf 8

Caution !!!

At an Oracle-Workshop last week hey adviced me not to use UTF8 (if not absolutely necessary, e.g. you have contact to the asian countries)

Reason:
UTF8 is a dynamic Characterset up to 6 Bytes a character. If you have more than one database (we have about 160) you will have problems for example with Data-transfer between databases.

Example:
Database A: WE8ISO8859P15
Database B: UTF8

In database A you have a table A with column varchar2(1) filled with 1 character ??, ??, ??, ?? (in germany we call it "Umlaut"=special german sign) These characters needs two bytes in utf8.

In database B :
create table B as select * from A@A
will not work because the table is created with varchar2(1) but you need a column length 2 to store the characters.

Workaround??? I don't know

Chris
Yogeeraj_1
Honored Contributor

Re: Transfert DB to utf 8

hi,

a few more clarifications.

There are inefficiencies in UTF8. It is a variable length encoding scheme. You will find that stored procedures and other string intensive operations may be adversely impacted by the use of UTF8. It is not a "blanket recommend" using UTF8 -- only when you have a need.

As for convertion from US7ASCII to UTF8, should be no problem as long as the data in db1 is all 7bit. If some 8bit data is stored there, it will be converted to 7bit data first.

These are not "problems" just considerations. If I export from an 8bit database and import into a 7bit -- I'll undergo a characterset conversion. My 7bit database cannot store any 8bit characters, they will be mapped into the 7bit characterset (eg: the data will be DIFFERENT, I'll lose some characters that have the high bit set, they'll become some other character based on the rules for converting 8bit to 7bit).

Going the other way, from 7bit to 8bit won't encounter any issues since the 7bit US7ASCII is a subset of the 8bit WE8ISO8859P1.

You must consider this when using multiple charactersets. The strings will be converted from one to the other and you may "lose" some characters.

hope this helps too!

regards
Yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)
Fall_2
Occasional Advisor

Re: Transfert DB to utf 8

Thanks all for your in put and I will let you know how my transfert went.
Brian I don't know what happen I gave you a 10 for your response.
Regards,
Ibra
ib