Re: Need help

 
Hein van den Heuvel
Honored Contributor

Re: Need help

Hmmm, Muthu... I fail to see how you can solve the problem described with the simple array comparison you suggest. It seems clear to me that any solution needs to focus on the first, 'key' field.
How else can one decide whether a new record appeared in the same place where an old record was deleted?

Anyway...

For a large file, it probably needs to be pre-sorted, with the two files read simultaneously and their key values compared to keep them in sync.
I presented one example of this in:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=999120
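
To illustrate the pre-sorted, read-in-lockstep idea, here is a minimal sketch. It is not the code from the thread linked above; the sub names next_record and merge_compare are made up for this example. It assumes both inputs are already sorted on the key in plain string order (the order Perl's lt operator uses) and contain no duplicate keys. In-memory filehandles stand in for the real files so the example runs on its own.

```perl
use strict;
use warnings;

# Read the next "key|rest" record from a filehandle.
# Returns (key, rest), or an empty list at end of file.
sub next_record {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my ($key, $rest) = split /\|/, $line, 2;
        next unless defined $key && length $key;    # skip blank lines
        return ($key, defined $rest ? $rest : '');
    }
    return;
}

# Walk two key-sorted inputs in lockstep and return the report lines.
# ASSUMPTION: string-sorted, unique keys in each input.
sub merge_compare {
    my ($fh1, $fh2) = @_;
    my @report;
    my ($k1, $r1) = next_record($fh1);
    my ($k2, $r2) = next_record($fh2);
    while (defined $k1 || defined $k2) {
        if (!defined $k2 || (defined $k1 && $k1 lt $k2)) {
            push @report, "$k1|$r1| d";             # key only in the old file
            ($k1, $r1) = next_record($fh1);
        } elsif (!defined $k1 || $k2 lt $k1) {
            push @report, "$k2|$r2| a";             # key only in the new file
            ($k2, $r2) = next_record($fh2);
        } else {                                    # same key in both files
            push @report, "$k2|$r2| " . ($r1 eq $r2 ? "=" : "c");
            ($k1, $r1) = next_record($fh1);
            ($k2, $r2) = next_record($fh2);
        }
    }
    return @report;
}

# Demo data: the same records as the usage example further down.
my $old_text = "p1|y|500\np2|n|500\np5|y|500\n";
my $new_text = "p1|n|500\np3|y|501\np5|y|500\n";
open my $old_fh, '<', \$old_text or die "Cannot open in-memory file: $!";
open my $new_fh, '<', \$new_text or die "Cannot open in-memory file: $!";
print "$_\n" for merge_compare($old_fh, $new_fh);
# prints:
# p1|n|500| c
# p2|n|500| d
# p3|y|501| a
# p5|y|500| =
```

Only one record per file is held in memory at a time, so this should cope with files too large to slurp. Unlike the hash-based scripts below, deleted records come out in key order rather than at the end.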


For small files you can just 'slurp' them into a perl associative array, and report based on keys.

Here is an example that only compares the first non-key field. It is easily adapted to compare other fields, or just everything except the key.

------ compare.pl ---------
$file = shift;
open (FILE, "<$file") or die "Failed to open first file: $file.";
while (<FILE>) {
chomp;
($key,$flag,$num) = split (/\|/, $_);
# print "$. $key,$flag,$num\n"; # debug trace, disabled so the output matches the usage example
$f1_flag{$key} = $flag;
$f1_num{$key} = $num;
}

$file = shift;
open (FILE, "<$file") or die "Failed to open second file: $file.";
while (<FILE>) {
chomp;
($key,$flag,$num) = split (/\|/, $_);
# print "$. $key,$flag,$num\n"; # debug trace, disabled so the output matches the usage example
$f2_flag{$key} = $flag;
$f2_num{$key} = $num;
}

for $key (sort keys %f2_flag) {
$change = "=";
if (defined $f1_flag{$key}) {
$change = "c" if ($f1_flag{$key} ne $f2_flag{$key});
delete $f1_flag{$key};
} else {
$change = "a";
}
print "$key|$f2_flag{$key}|$f2_num{$key}| $change\n"
}

for $key (sort keys %f1_flag) {
print "$key|$f1_flag{$key}|$f1_num{$key}| d\n"
}

---- usage example ----

C:\Temp>type file1.tmp
p1|y|500
p2|n|500
p5|y|500
C:\Temp>type file2.tmp
p1|n|500
p3|y|501
p5|y|500
C:\Temp>perl compare.pl file1.tmp file2.tmp
p1|n|500 | c
p3|y|501 | a
p5|y|500 | =
p2|n|500 | d
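
As a rough sketch of the "adapted to compare other fields" remark, the variant below keeps every field after the key and lets the caller pick which positions to compare. The names load_records and compare_fields are hypothetical, and it assumes Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Load "key|f1|f2|..." lines into a hash of array refs keyed on the first field.
sub load_records {
    my ($fh) = @_;
    my %rec;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;                   # skip blank lines
        my ($key, @fields) = split /\|/, $line, -1;
        $rec{$key} = \@fields;
    }
    return \%rec;
}

# Compare two loaded files on the given 0-based field positions
# (counted after the key). Returns a hash ref: key => a, c, d or =.
sub compare_fields {
    my ($old, $new, @positions) = @_;
    my %status;
    for my $key (keys %$new) {
        if (!exists $old->{$key}) { $status{$key} = 'a'; next; }
        # Count the selected positions whose values differ.
        my $changed = grep {
            ($old->{$key}[$_] // '') ne ($new->{$key}[$_] // '')
        } @positions;
        $status{$key} = $changed ? 'c' : '=';
    }
    for my $key (keys %$old) {
        $status{$key} = 'd' unless exists $new->{$key};
    }
    return \%status;
}

# Demo: compare only the number field (position 1 after the key).
my $old_text = "p1|y|500\np2|n|500\n";
my $new_text = "p1|y|501\np3|y|500\n";
open my $old_fh, '<', \$old_text or die "Cannot open in-memory file: $!";
open my $new_fh, '<', \$new_text or die "Cannot open in-memory file: $!";
my $by_num = compare_fields(load_records($old_fh), load_records($new_fh), 1);
print "$_ $by_num->{$_}\n" for sort keys %$by_num;
# prints:
# p1 c
# p2 d
# p3 a
```

Passing 0 instead of 1 would compare only the flag field, and passing 0, 1 would compare both.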


If each input line is treated as just a 'key' plus everything else, the script simplifies somewhat:

------- compare_2.pl ----------

$file = shift;
open (FILE, "<$file") or die "Failed to open first file: $file.";
while (<FILE>) {
chomp;
$f1{$`} = $' if /\|/;
}

$file = shift;
open (FILE, "<$file") or die "Failed to open second file: $file.";
while (<FILE>) {
chomp;
$f2{$`} = $' if /\|/;
}

for $key (sort keys %f2) {
$change = "=";
if (defined $f1{$key}) {
$change = "c" if ($f1{$key} ne $f2{$key});
delete $f1{$key};
} else {
$change = "a";
}
print "$key|$f2{$key}| $change\n"
}

for $key (sort keys %f1) {
print "$key|$f1{$key}| d\n"
}
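
For readers unfamiliar with them: $` and $' hold the text before and after the most recent successful match, so the two loops above store everything before the first vertical bar as the key and everything after it as the value. A sketch of the same key/rest split written with split and a limit of 2 follows; the sub name key_and_rest is invented for this example.

```perl
use strict;
use warnings;

# Split "key|rest" at the first vertical bar only.
# Returns an empty list when the line contains no vertical bar,
# mirroring the "if /\|/" guard in the script above.
sub key_and_rest {
    my ($line) = @_;
    return unless $line =~ /\|/;
    return split /\|/, $line, 2;
}

my ($key, $rest) = key_and_rest("p1|y|500");
print "$key => $rest\n";    # prints "p1 => y|500"
```

Inside the read loops this would be used as: my ($key, $rest) = key_and_rest($_); followed by $f1{$key} = $rest if defined $key;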

Hein van den Heuvel
Honored Contributor

Re: Need help

Mind you, Muthu's script may be great IF the data follows strict patterns. But I do not think it works as requested. For example with a single deleted line:

C:\Temp>type file1.tmp
p1|y|500
p2|n|500
p3|y|500
p4|y|500
p5|y|500
C:\Temp>type file2.tmp
p1|n|500
p3|y|500
p4|y|500
p5|y|500
C:\Temp>perl test.pl
p2|n|500 | d b'cas p2 is deleted in 2nd file
p3|y|500 | a b'cas p2 is added in 2nd file
p3|y|500 | d b'cas p3 is deleted in 2nd file
p4|y|500 | a b'cas p3 is added in 2nd file
p4|y|500 | d b'cas p4 is deleted in 2nd file
p5|y|500 | a b'cas p4 is added in 2nd file
p5|y|500 | d b'cas p5 is deleted in 2nd file
| a b'cas p5 is added in 2nd file

Using the script I suggest:

p1|n|500 | c
p3|y|500 | =
p4|y|500 | =
p5|y|500 | =
p2|n|500 | d

Of course, my method will fail if it is the order of the lines that matters rather than the column 1 value.

If the "=" lines are not desirable, then change the code to make the print conditional:
print "$key|$f2{$key}| $change\n" unless $change eq "=";

or re-arrange the core loop somewhat. For example:

for $key (sort keys %f2) {
$change = "c";
if (defined($x = $f1{$key})) {
delete $f1{$key};
next if ($x eq $f2{$key});
} else {
$change = "a";
}
print "$key|$f2{$key}| $change\n"
}


Result for last input example:

C:\Temp>perl compare.pl file1.tmp file2.tmp
p1|n|500 | c
p2|n|500 | d


Ok, lunch break over, back to real work...
:-)

Hein.



Sandman!
Honored Contributor

Re: Need help

Hi Suchitra,

I've pasted a shell script below that satisfies the requirements for parsing and filtering input files according to your criteria:

========================myparser.sh========================
#!/bin/sh

set -a

InFile1=f1
InFile2=f2
#
SortFile1=f1s
SortFile2=f2s
#
OutFile=outfile

# Zero out the outputfile
# before input processing
cat /dev/null > $OutFile

# Sort both file 1 and 2 on the first field
# using the vertical bar as field separator
sort -t"|" -k1,1 $InFile1 > $SortFile1
sort -t"|" -k1,1 $InFile2 > $SortFile2

# Filter out lines common to both files
# and print out those that have changed
join -t"|" $SortFile1 $SortFile2 | awk -F"|" '
BEGIN {OFS="|"}
{if($2!=$4)print $1,$4,$NF,"c b'\''cas "$1" changed from "$2" to "$4" in 2nd file"}' >>$OutFile

# Print out all the unmatched lines in
# sorted file 1 and flag them as deleted
join -t"|" -v1 $SortFile1 $SortFile2 | awk -F"|" '
BEGIN{OFS="|"} {print $0,"d b'\''cas "$1" is deleted in 2nd file"}' >>$OutFile

# Print out all the unmatched lines in
# sorted file 2 and flag them as added
join -t"|" -v2 $SortFile1 $SortFile2 | awk -F"|" '
BEGIN{OFS="|"} {print $0,"a b'\''cas "$1" is added in 2nd file"}' >>$OutFile
===========================================================

Copy the code into a file name of your choice, customize the environment (InFile, SortFile etc.) to your system, make it executable and run it at the command line.

hope it helps!
Sandman!
Honored Contributor

Re: Need help

Matter of fact the code might be easier to understand (as well as copy 'n paste) if attached instead of pasted...so click on the attachment for the shell script.

cheers!
suchitra
Occasional Contributor

Re: Need help

Thanks a lot guys ... I got the solution I wanted. It was a great help from all of you guys.
Thanks a lot.