03-29-2007 01:25 PM
Compare 2 files and remove duplicates- PERL
I have two files, a and b. I need to compare them and remove from file 'b' every entry that also appears in file 'a', using a Perl script.
Can anyone please help me do this?
Thanks
Anand
03-29-2007 01:49 PM
Re: Compare 2 files and remove duplicates- PERL
03-29-2007 01:55 PM
Re: Compare 2 files and remove duplicates- PERL
You really want to ask yourself lots of questions about the quality and quantity of the data:
- Megabytes or gigabytes?
- Identifiable key field?
- Sorted?
- Any performance considerations?
- Once only, or repeatable and thus in need of serious error handling?
Anyway, with the terse question provided I believe the answer is:
$ cat > a
aap
noot
mies
teun
$ cat > b
noot
vuur
kees
mies
$ grep -v -f b a
aap
teun
$ grep -v -f a b
vuur
kees
$ perl -e 'open A,shift; foreach (<A>){$a{$_}++}; open B,shift; foreach (<B>){print unless $a{$_}}' b a
aap
teun
$ perl -e 'open A,shift; foreach (<A>){$a{$_}++}; open B,shift; foreach (<B>){print unless $a{$_}}' a b
vuur
kees
$
Perl, formatted:
open A,shift; # open file A, using the first element from @ARGV
foreach (<A>) { # loop over file A
$a{$_}++; # use each record as key in associative array, incrementing the element (making it true)
} # end loop
open B,shift; # open next file (needs an 'or die')
foreach (<B>) { # loop over next file
print unless $a{$_} # print... unless the array element keyed by this line is true (the line was seen in A)
} # done
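For anyone who would rather stay in the shell, the same lookup-table idea can be sketched with awk. This is only an illustration under assumptions: awk stands in for the Perl hash, the temporary directory and the use of mktemp are mine, and the sample data is the a/b pair from above.

```shell
#!/bin/sh
# Sketch only: awk stand-in for the Perl hash lookup shown in this post.
# The temporary directory and its file names are made up for the demo.
workdir=$(mktemp -d)
printf '%s\n' aap noot mies teun > "$workdir/a"
printf '%s\n' noot vuur kees mies > "$workdir/b"

# While reading the first file (NR==FNR) store each line as an array key;
# while reading the second file print only lines that were never stored.
only_in_b=$(awk 'NR==FNR { seen[$0] = 1; next } !($0 in seen)' "$workdir/a" "$workdir/b")
printf '%s\n' "$only_in_b"

rm -rf "$workdir"
```

With the sample data this prints vuur and kees, matching the 'a b' run of the Perl one-liner. Like the Perl version, it keeps all of the first file in memory and compares whole lines.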
Cheers,
Hein van den Heuvel
HvdH Performance Consulting
03-29-2007 03:47 PM
Re: Compare 2 files and remove duplicates- PERL
$ comm -13 a b
I just realized: if "b" has two identical lines that match a single line in "a", the above command removes only one of them.
If "a" is small, "grep -vxf a b" would work.
(Hein should have used -x to match the whole line, when excluding.)
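A small sketch of the duplicate-line difference between the two commands, on made-up, already sorted sample files (the file contents, the temporary directory and the LC_ALL=C setting are assumptions for the demo, not from the original poster's data):

```shell
#!/bin/sh
# Sketch only: sample data is invented; both files are already in sorted order.
workdir=$(mktemp -d)
printf '%s\n' mies noot > "$workdir/a"
printf '%s\n' kees noot noot vuur > "$workdir/b"   # "noot" appears twice

# comm pairs lines one-to-one, so the second "noot" still counts as unique to b.
comm_result=$(LC_ALL=C comm -13 "$workdir/a" "$workdir/b")

# grep -v drops every line that matches a pattern; -x forces whole-line matches.
grep_result=$(grep -vxf "$workdir/a" "$workdir/b")

printf 'comm -13:\n%s\n' "$comm_result"
printf 'grep -vxf:\n%s\n' "$grep_result"

rm -rf "$workdir"
```

Here comm -13 leaves kees, noot, vuur, while grep -vxf leaves only kees and vuur.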
03-29-2007 04:00 PM
Re: Compare 2 files and remove duplicates- PERL
Wait a minute. You were the one chiding people to be careful here with grep. :-)
I just copied that logic.
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1113590
03-29-2007 11:49 PM
Re: Compare 2 files and remove duplicates- PERL
In my defense, although the data is not described at all, there is a suggestion that this is table-ish data, or something like a profile, where full lines of data can be expected. But indeed, without -x any bad input in file a, let's say a line holding just the letter 'a', can wipe out whole sections of file b.
I was thinking in perl terms where $_ has the data with terminator, forcing whole-line compares.
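To make the hazard concrete, here is a minimal sketch with invented data: file a holds a single stray line 'a', and file b holds three ordinary words. The file contents and the temporary directory are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: demonstrates substring matching versus whole-line matching.
workdir=$(mktemp -d)
printf '%s\n' a > "$workdir/a"               # one stray single-letter line
printf '%s\n' aap noot kees > "$workdir/b"

# Without -x the pattern "a" matches anywhere in a line, so "aap" is dropped too.
loose=$(grep -v -f "$workdir/a" "$workdir/b")

# With -x only a line that is exactly "a" would be dropped; nothing here is.
strict=$(grep -vx -f "$workdir/a" "$workdir/b")

printf 'without -x:\n%s\n' "$loose"
printf 'with -x:\n%s\n' "$strict"

rm -rf "$workdir"
```

Without -x the output loses aap; with -x all three lines of b survive.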
You are also correct on the BNAME/SUFFIX confusion in the other topic you point to.
Cheers,
Hein.
03-30-2007 05:44 AM
Re: Compare 2 files and remove duplicates- PERL
The two files are actually very large, each containing about 80 MB of data. Both files contain some IDs. The data is not sorted.
I want to know the IDs that are in the first file but not present in the second file.
Thanks,
Anand
03-30-2007 06:17 AM
Re: Compare 2 files and remove duplicates- PERL
[The -x is optional in this case. Right, Dennis? :-) :-]
80MB would be a little more than I'd like to feed Perl to remember, but it should work.
The best solution is probably to simply sort each file and use 'comm'. See 'man comm'.
#comm -23 a.sorted b.sorted
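Spelled out as a full sequence, the sort-then-comm approach might look like the sketch below. The sample data is the small a/b pair from earlier in the thread; the temporary directory and the LC_ALL=C setting (so that sort and comm agree on collation order) are my assumptions.

```shell
#!/bin/sh
# Sketch only: sort both files, then let comm report lines found only in the first.
workdir=$(mktemp -d)
printf '%s\n' teun aap noot mies > "$workdir/a"     # deliberately unsorted
printf '%s\n' noot vuur kees mies > "$workdir/b"

LC_ALL=C sort "$workdir/a" > "$workdir/a.sorted"
LC_ALL=C sort "$workdir/b" > "$workdir/b.sorted"

# -2 suppresses lines unique to the second file, -3 suppresses common lines.
only_in_a=$(LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$only_in_a"

rm -rf "$workdir"
```

For the sample data this prints aap and teun. Because sort works through temporary files, this route does not need to hold either 80 MB file in memory.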
Hein.
03-30-2007 06:19 AM
Re: Compare 2 files and remove duplicates- PERL
moral:
- read the man page carefully
- trust but verify on a small file set
Hein.
03-30-2007 07:53 AM
Re: Compare 2 files and remove duplicates- PERL
that's good:
>>
Both the files contain some IDs.
<<
You forgot to describe:
- how to identify an ID
- if it's sufficient to extract the IDs of one file only
- if an ID is only part of a line or a whole line
- if checking vice versa is required
- what to do with lines NOT containing an ID
Assuming an ID is a string
IDnnn (n=0..9)
you can extract lines containing such an ID via
grep 'ID[0-9][0-9][0-9]' file_a
This is just a start - but do the five parts above of your homework first.
mfG Peter
03-31-2007 09:44 PM
Re: Compare 2 files and remove duplicates- PERL
No, Anand changed what he wanted and it is now "comm -23".
>I want to know the IDs that are in the first file but not present in the second file.
As Peter asked: if your two files don't have exactly the same record content for the same "key", you can't use the grep or comm solutions directly.
You must extract the keys from the two files with awk and sort the two key files. Then you can use "comm -23" to give you all of the keys that are in file 1 but not in file 2.
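A rough sketch of that extract, sort and compare sequence follows. The record layout is not described anywhere in the thread, so this assumes, purely for illustration, that the ID is the first whitespace-separated field; the sample records, the file names and the use of sort -u (to collapse repeated keys) are likewise my own choices.

```shell
#!/bin/sh
# Sketch only: assumes the key is field 1; adjust the awk program to the real layout.
workdir=$(mktemp -d)
printf '%s\n' 'ID103 gamma' 'ID101 alpha' 'ID102 beta' > "$workdir/file1"
printf '%s\n' 'ID102 BETA changed' 'ID104 delta' > "$workdir/file2"

# Pull out the key column from each file and sort it, dropping repeated keys.
awk '{ print $1 }' "$workdir/file1" | LC_ALL=C sort -u > "$workdir/keys1"
awk '{ print $1 }' "$workdir/file2" | LC_ALL=C sort -u > "$workdir/keys2"

# Keys present in file1 but absent from file2.
missing=$(LC_ALL=C comm -23 "$workdir/keys1" "$workdir/keys2")
printf '%s\n' "$missing"

rm -rf "$workdir"
```

With these sample records the result is ID101 and ID103. ID102 is treated as present in both files even though the rest of its record differs.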