1847994 Members
6961 Online
104022 Solutions
New Discussion

script help

 
Edgar_8
Regular Advisor

script help

Hi All,

We have comma separated files(approx.5000 per day) which we use to load into an Oracle 9i DW table
which has a primary key on a unique column. Currently the load is extremely slow & we have exhausted all our
options. Causes of slow load is due to conventional sqlldr load method & duplicate checking using the primary
key. Going forward we want to use direct sqlldr load method without any primary key on the table, but to do that
we have to ensure that the CSV files have no duplicates. Can someone assist with an hp-ux 11i script that will
read through the CSV file, filer on field 4 for duplicate checking & only output clean data/rows?

Your assistance is most appreciated!
10 REPLIES 10
John Poff
Honored Contributor

Re: script help

Hi,

It shouldn't be too hard. Can you post a small sample of your CSV file?

JP
harry d brown jr
Honored Contributor

Re: script help

sort -u -t"," -k4 FILENAME > newFILENAME

of course it might be important to determine what was duplicated.

live free or die
harry
Live Free or Die
Michael Schulte zur Sur
Honored Contributor

Re: script help

Hi,

I think, you would want to cut out field four, sort it and run uniq -d over it, something like:
cut -d"," -f4 inputfile | sort | uniq -d

hth,

Michael
Manish Srivastava
Trusted Contributor

Re: script help

Hi,

I think this will do:

sort -t, -mu -k4,4

man sort will also help.

Regards,
Manish.
Edgar_8
Regular Advisor

Re: script help

Hi JP,

See below for example:
6,00,30,40A8CF9E16B59940010106070E06,,195,2832,1816,,,,,,,,20040517164342,,,,,,,,,,,,4,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,18,8,0,0,4,1,18,8,0,0,4,1,,2,460,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20040517164342,20040517164342,0,,,,,,,,,,,,,655012500098699,,,,,,,,,,20040517173842,120,,,,,,42,,,,,,,,,,,,END,,
6,00,30,40A8CF9E16B59940010106070E06,,,,,1816,,,,,,,20040517164342,20040517164349,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,212,27,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20040517164342,20040517164349,0,,,,,,,,,,,,,655012500098699,,,,,,,,,,,,,,,,,,,,,,,,,,,,,END,,
6,00,30,40A8CF9E16B59940010106070E06,,195,2832,1816,,,,,,,,20040517164342,,,,,,,,,,,,4,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,18,8,0,0,4,1,18,8,0,0,4,1,,2,460,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20040517164342,20040517164342,0,,,,,,,,,,,,,655012500098699,,,,,,,,,,20040517173842,120,,,,,,42,,,,,,,,,,,,END,,
6,00,30,40A8CF9E16B59940010106070E06,,,,,1816,,,,,,,20040517164342,20040517164349,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,212,27,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20040517164342,20040517164349,0,,,,,,,,,,,,,655012500098699,,,,,,,,,,,,,,,,,,,,,,,,,,,,,END,,

Regards!
Thierry Poels_1
Honored Contributor

Re: script help

hi,

Oracle SQL*Loader is quite fast, so I doubt preprocessing of the input data would give much benefit.

- check the redo logs (check alert file for "checkpoint not complete cannot allocate new log")
- check if primary key is properly indexed / partitioned
- use direct path
- play with bindsize & readsize


good luck,
Thierry.
All unix flavours are exactly the same . . . . . . . . . . for end users anyway.
Edgar_8
Regular Advisor

Re: script help

Thiery,

Believe when I say that we have attempted all options, & this is the remaining one. As you know direct path does
not allow indexes unless of course there are no duplicates in the dat being loaded. So if we can extract the
duplicates before hand then YES we will use direct path.

Regards!
Eric Antunes
Honored Contributor

Re: script help

Hi Edgar,

The simple the solution, the best it works!

I think you should disable the constraint and do it via direct path. After having all the rows on your table delete the duplicate rows loaded today and enable the constraint!
Each and every day is a good day to learn.
Eric Antunes
Honored Contributor

Re: script help

Better yet:

If you can't disable the constraint you create a temporary table (DIRECT_PATH_TMP) with the same columns than the datawarehouse table (DATAWAREHOUSE_TABLE). Then you load the rows via direct path into this table. After that you do something like this:

insert into DATAWAREHOUSE_TABLE A
(
select * from DIRECT_PATH_TMP
where not exists (select 1 from DATAWAREHOUSE_TABLE B where B.column4 = A.column4)
);

commit;
Each and every day is a good day to learn.
Eric Antunes
Honored Contributor

Re: script help

The script has a little error on the aliases. The A alias should be at DIRECT_PATH_TMP table:

insert into DATAWAREHOUSE_TABLE
(
select * from DIRECT_PATH_TMP A
where not exists (select 1 from DATAWAREHOUSE_TABLE B where B.column4 = A.column4)
);

commit;

I hope it helped,

Antunes
Each and every day is a good day to learn.