HPE GreenLake Administration
- Community Home
- >
- Servers and Operating Systems
- >
- Operating Systems
- >
- Operating System - HP-UX
- >
- Need a perl script for below mentioned requirement...
Operating System - HP-UX
1834459
Members
2782
Online
110067
Solutions
Forums
Categories
Company
Local Language
back
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Forums
Discussions
Discussions
Discussions
Forums
Discussions
back
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Blogs
Information
Community
Resources
Community Language
Language
Forums
Blogs
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-08-2006 01:41 AM
04-08-2006 01:41 AM
Need a perl script for below mentioned requirement.
there are two ascii files say with same header (column values)
old ascii file:
column1||column2|column3|column4
new ascii file:
column1|column2|column3|column4
both the ascii files contains millions of records.
here i need to compare both the files ..
onething to say that old ascii file fourth column contains all the junk data values and in the new ascii file fourth column contains the proper data.
Here i need to compare both the files and give the comparison report saying that all of the other three columns values in both are matching or not. excluding the fourth column values as cannot compare the junk data with the proper data anyway in this scenario all of them will fail.
can we exclude the fourt column and test with the remaining three column values to compare both new and old ascii file.
Thanks in advance
Swetha
old ascii file:
column1||column2|column3|column4
new ascii file:
column1|column2|column3|column4
both the ascii files contains millions of records.
here i need to compare both the files ..
onething to say that old ascii file fourth column contains all the junk data values and in the new ascii file fourth column contains the proper data.
Here i need to compare both the files and give the comparison report saying that all of the other three columns values in both are matching or not. excluding the fourth column values as cannot compare the junk data with the proper data anyway in this scenario all of them will fail.
can we exclude the fourt column and test with the remaining three column values to compare both new and old ascii file.
Thanks in advance
Swetha
3 REPLIES 3
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-08-2006 03:09 AM
04-08-2006 03:09 AM
Re: Need a perl script for below mentioned requirement.
q1) is the field seperator indeed a pipe "|"
q2) are the files pre-sorted.
q3) can one column, or column combination be treated as a (unique) key and the other(s) as data values?
q4) any chance one file would fit in memory?
If so, just read one file entirely and store into a hash, either keyed by the unique key columns or by the whole record and count occurences. Next read the other file and lookup in first. If found, subtrace count or delete hash entry. If not found report. At end of second file report remaining (non-zero) entries in hash from first file.
q5) ever checked out perl and (DBM) databases?
Read this for example, and google for more: http://www.unix.org.ua/orelly/perl/perlnut/ch12_01.htm
Here is what I would do though...
For each file, cut first 3 columns, pipe into sort and save.
Now 'diff', sdiff, comm or uniq the 3-columns-and-sorted work files to your hearts content.
Expirement with 100 or so records in each first of course.
Enjoy!
Hein.
q2) are the files pre-sorted.
q3) can one column, or column combination be treated as a (unique) key and the other(s) as data values?
q4) any chance one file would fit in memory?
If so, just read one file entirely and store into a hash, either keyed by the unique key columns or by the whole record and count occurences. Next read the other file and lookup in first. If found, subtrace count or delete hash entry. If not found report. At end of second file report remaining (non-zero) entries in hash from first file.
q5) ever checked out perl and (DBM) databases?
Read this for example, and google for more: http://www.unix.org.ua/orelly/perl/perlnut/ch12_01.htm
Here is what I would do though...
For each file, cut first 3 columns, pipe into sort and save.
Now 'diff', sdiff, comm or uniq the 3-columns-and-sorted work files to your hearts content.
Expirement with 100 or so records in each first of course.
Enjoy!
Hein.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-08-2006 07:41 PM
04-08-2006 07:41 PM
Re: Need a perl script for below mentioned requirement.
q1) is the field seperator indeed a pipe "|"
yes
q2) are the files pre-sorted.
yes
For each file, cut first 3 columns, pipe into sort and save.
Now 'diff', sdiff, comm or uniq the 3-columns-and-sorted work files to your hearts content.
Expirement with 100 or so records in each first of course.
if it is like 2 column. means 1,3 and 4 i need comparision.
example:
first file:
col1|col2|col3|col4
2|3|ad|2
2|2|addad|2
2|23|dsdsfd|3
second file :
col1|col2|col3|col4
2|3|3|2
2|2|3|2
2|23|3|3
on comparison it should not check the column 3 values. as the first file 3rd column value contains all the junk data and second one contains the real data. in this case if compare all of the rows get failed. we need to aviod this column.
Output should be like this:
Total number of records: XXX
Total number of matched records :xxx
Total number of unmatched records: xxx
yes
q2) are the files pre-sorted.
yes
For each file, cut first 3 columns, pipe into sort and save.
Now 'diff', sdiff, comm or uniq the 3-columns-and-sorted work files to your hearts content.
Expirement with 100 or so records in each first of course.
if it is like 2 column. means 1,3 and 4 i need comparision.
example:
first file:
col1|col2|col3|col4
2|3|ad|2
2|2|addad|2
2|23|dsdsfd|3
second file :
col1|col2|col3|col4
2|3|3|2
2|2|3|2
2|23|3|3
on comparison it should not check the column 3 values. as the first file 3rd column value contains all the junk data and second one contains the real data. in this case if compare all of the rows get failed. we need to aviod this column.
Output should be like this:
Total number of records: XXX
Total number of matched records :xxx
Total number of unmatched records: xxx
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-09-2006 10:04 AM
04-09-2006 10:04 AM
Re: Need a perl script for below mentioned requirement.
It's a good thing you showed that sample data.
For starters... that data is NOT sorted.
"2|3" does not come before "2|2" best I know.
Next, even if it were sorted, it is not 'obvious' that you intended numeric comparisons.
So the script not only has to pick apart the records, but compare based on field type, or normalize, and then compare.
I had a sample compare script floating around which I adapted to your needs.
See below, and attached.
Be sure to try it on small files first.
Good luck,
Hein.
use strict;
my ($f1, $f2, $key_1, $key_2, $common, $file_1, $file_2, $read_1, $read_2);
$f1 = @ARGV[0];
$f2 = @ARGV[1] or die "Must provide two files to compare";
open (F1, $f1) or die "Error open file 1: $f1";
open (F2, $f2) or die "Error open file 2: $f2";
my $high = "999999999|999999999|x|999999999";
; ; # skip headers
$common = $file_1 = $file_2 = 0;
while (1) {
# compare keys, and flag which files to read.
if ($key_1 eq $key_2) {
$common++;
$read_1 = $read_2 = 1;
} else {
if ($key_1 lt $key_2) {
print "F1: $key_1\n";
$file_1++;
$read_1++;
} else {
print "F2: $key_2\n";
$file_2++;
$read_2++;
}
}
# read from file with lower key value, or both if equal.
if ($read_1) {
$read_1 = 0;
$_ =;
if (eof) {
last if eof F2;
$_ = $high;
}
# $key_1 = join (':',(split /\|/)[0,1,3]);
$key_1 = sprintf ("%09d:%09d:%09d",(split /\|/)[0,1,3]);
# print "1: $key_1\n";
}
if ($read_2) {
$read_2 = 0;
$_ =;
if (eof) {
last if eof F1;
$_ = $high;
}
# $key_2 = join (':',(split /\|/)[0,1,3]);
$key_2 = sprintf ("%09d:%09d:%09d",(split /\|/)[0,1,3]);
# print "2: $key_2\n";
}
}
print "\n$common common records.\n";
print "$file_1 only in $f1\n" if $file_1;
print "$file_2 only in $f2\n" if $file_2;
print "No differences found\n" unless ($file_1 or $file_2);
For starters... that data is NOT sorted.
"2|3" does not come before "2|2" best I know.
Next, even if it were sorted, it is not 'obvious' that you intended numeric comparisons.
So the script not only has to pick apart the records, but compare based on field type, or normalize, and then compare.
I had a sample compare script floating around which I adapted to your needs.
See below, and attached.
Be sure to try it on small files first.
Good luck,
Hein.
use strict;
my ($f1, $f2, $key_1, $key_2, $common, $file_1, $file_2, $read_1, $read_2);
$f1 = @ARGV[0];
$f2 = @ARGV[1] or die "Must provide two files to compare";
open (F1, $f1) or die "Error open file 1: $f1";
open (F2, $f2) or die "Error open file 2: $f2";
my $high = "999999999|999999999|x|999999999";
$common = $file_1 = $file_2 = 0;
while (1) {
# compare keys, and flag which files to read.
if ($key_1 eq $key_2) {
$common++;
$read_1 = $read_2 = 1;
} else {
if ($key_1 lt $key_2) {
print "F1: $key_1\n";
$file_1++;
$read_1++;
} else {
print "F2: $key_2\n";
$file_2++;
$read_2++;
}
}
# read from file with lower key value, or both if equal.
if ($read_1) {
$read_1 = 0;
$_ =
if (eof) {
last if eof F2;
$_ = $high;
}
# $key_1 = join (':',(split /\|/)[0,1,3]);
$key_1 = sprintf ("%09d:%09d:%09d",(split /\|/)[0,1,3]);
# print "1: $key_1\n";
}
if ($read_2) {
$read_2 = 0;
$_ =
if (eof) {
last if eof F1;
$_ = $high;
}
# $key_2 = join (':',(split /\|/)[0,1,3]);
$key_2 = sprintf ("%09d:%09d:%09d",(split /\|/)[0,1,3]);
# print "2: $key_2\n";
}
}
print "\n$common common records.\n";
print "$file_1 only in $f1\n" if $file_1;
print "$file_2 only in $f2\n" if $file_2;
print "No differences found\n" unless ($file_1 or $file_2);
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
Company
Events and news
Customer resources
© Copyright 2025 Hewlett Packard Enterprise Development LP