02-03-2004 05:38 PM
duplicate record extraction before sqlldr
We have comma-separated files with thousands of records that need to be loaded into an Oracle 9i table
with a primary key column. Is there a way we could search through the file at the Unix level against the
primary key value, deleting duplicate records, before the file is loaded into the database with sqlldr? Each record
in the CSV file has 200 fields, with the 49th field being the primary key, so the Unix search should remove duplicate records
but also allow us to extract fields 1-50. Any ideas?
Thanks in advance!
02-03-2004 07:05 PM
Re: duplicate record extraction before sqlldr
sqlldr will not only reject the duplicates, it will also write them to a failure (bad) file.
If you must pre-process the file at the Unix level, you could try identifying the duplicates with awk,
though you may run into awk's line-length limit.
Something like (not tested):
awk -F"," '
{ if ($49 in alreadyfound) {
      printf ("Rejecting line %d: %s\n", NR, $0) >> "DUPFILE"
      next
  }
  alreadyfound[$49] = 1
  print                          # first record seen with this key: keep it
} ' YOURINPUTFILE > YOURUNIQUEFILE
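To also trim each surviving record down to the first 50 fields, as asked for in the original post, a follow-on cut would do it (YOURUNIQUEFILE comes from the awk step above; LOAD_READY_FILE is just a placeholder name):
cut -d, -f1-50 YOURUNIQUEFILE > LOAD_READY_FILE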
-- Graham
02-03-2004 08:15 PM
Re: duplicate record extraction before sqlldr
By definition a primary key is "NOT NULL" and "UNIQUE", so when you try to load the file using sqlldr it will reject all the duplicates, and the log will show you clearly how many records were loaded and how many were rejected. There will be a bad file containing all the duplicate records.
You do not need to manually edit the CSV file to eliminate the duplicate records when the sqlldr utility does it for you automatically.
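As a rough illustration only (table, column, and file names below are made up, and a real control file would list all ~50 columns positionally), a conventional-path load with an explicit bad file might look like this:
cat > load.ctl <<'EOF'
-- Records rejected during the load (e.g. duplicate primary keys)
-- are written to the BADFILE; load/reject counts appear in the log.
LOAD DATA
INFILE 'input.csv'
BADFILE 'input.bad'
APPEND
INTO TABLE target_table
FIELDS TERMINATED BY ','
(col1, col2, col3)
EOF
sqlldr userid=user/password control=load.ctl log=load.log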
IA
02-03-2004 08:41 PM
Re: duplicate record extraction before sqlldr
We are having performance issues with the current sqlldr load; it is simply too slow, so we are looking at doing
as much work up front as possible. The CSV data is being loaded via conventional path into a table that has
a primary key and 3 indexes.
Any ideas on how we could improve performance and identify what is affecting it?
Thanks in advance!
02-03-2004 09:02 PM
Re: duplicate record extraction before sqlldr
Some guidelines are:
- Preallocate the space based on the expected data volume, to prevent dynamic allocation of extents during the load.
- When direct loads are used, temporary segments are used to build the indexes for the new data; these indexes are merged with the existing indexes at the end of the load.
- Sorting the input data on the key fields of the largest index can help improve performance (see the sketch below).
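A minimal sketch of the pre-sort step at the shell level, assuming (as elsewhere in this thread) that field 49 of the comma-separated file is the key; file names are placeholders:
# Pre-sort on the key field so a direct-path load can merge index entries
# in order; with direct path, the control file's SORTED INDEXES clause can
# tell sqlldr the data is already sorted for that index.
sort -t, -k49,49 -o input_sorted.csv input.csv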
sks
02-03-2004 10:47 PM
Re: duplicate record extraction before sqlldr
The fastest way to load the data with sqlldr is direct mode.
As you may have duplicates, you must get rid of them before the load itself.
If you have a sort tool such as syncsort, you can easily filter the data and remove the duplicate records.
Furthermore, you can pre-sort the data on the primary key and speed up the load.
Also, the UNRECOVERABLE option will not generate redo information.
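Putting those pieces together with the standard sort utility (instead of a licensed tool such as syncsort) might look like this; the file and control-file names are illustrative only:
# De-duplicate on the key field (49) and pre-sort in one pass, then run a
# direct-path load. The control file would begin with "UNRECOVERABLE LOAD DATA"
# so that no redo is generated.
sort -u -t, -k49,49 -o input_sorted.csv input.csv
sqlldr userid=user/password control=load.ctl log=load.log direct=true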
Regards,
Jean-Luc
02-04-2004 12:22 AM
Re: duplicate record extraction before sqlldr
If the total record length is short, you could try it in HP-UX using this tested script:
cut -d, -f1-50 unlfile.csv | \
sort -t, -k 49,49 | \
awk -F, '{if($49!=prev)print;prev=$49}' > dupfree.csv
02-04-2004 12:42 AM
Re: duplicate record extraction before sqlldr
02-04-2004 11:25 PM
Re: duplicate record extraction before sqlldr
sort -u -t, -k49,49 -o output_file input_file
HTH
Rgds,
Art