
Re: Need help

 
suchitra
Occasional Contributor

Need help

I have a problem comparing the columns of 2 files and writing a status such as added, modified, or deleted to the output file, along with the records from these 2 files.
For example:

Consider that my 1st file has, say:
file1
==========
p1|y|500
p2|n|500

file2
======
p1| n| 500
p3 | y|501

Now my output file should look like this:

output file
=========
p1 | n| 500 |c b'cas it has been changed from y to n in the 2nd file
p2| n |500 | d b'cas p2 is deleted in the 2nd file
p3 | y| 501 | a b'cas p3 is added in the 2nd file

Could you please help me out? I want either a shell script or an awk command.


14 REPLIES
Muthukumar_5
Honored Contributor

Re: Need help

Use this:

awk -F"|" '{
  # Remember the current file1 record before getline overwrites $0
  var = $0; var1 = $1; var2 = $2; var3 = $3
  getline < "file2"      # $0 now holds the line at the same position in file2
  split($0, a, "|")
  if (a[1] == var1) {
    if (a[2] != var2) { print var "## c bcas " var2 " is changed to " a[2] }
    if (a[3] != var3) { print var "## c bcas " var3 " is changed in 2nd file" }
  } else {
    print var "## d bcas " var1 " is deleted in 2nd file"
    print $0 " ## a bcas " a[1] " is added in 2nd file"
  }
}' file1

--
Muthu
Easy to suggest when don't know about the problem!
Steve Steel
Honored Contributor

Re: Need help

Hi

Look at the comm command

comm - select or reject lines common to two sorted files

And www.shelldorado.com
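
As a rough sketch of how comm could be applied to this problem (this is not Steve's own code): pull the key column out of each file, sort it, and let comm report the keys that appear on one side only. The function name key_diff and the temporary file names are made up for illustration, and the sketch assumes there are no stray spaces around the key field.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): report keys, taken from
# field 1, that exist in only one of two '|'-delimited files.
key_diff() {
    k1=/tmp/keys1.$$
    k2=/tmp/keys2.$$
    # comm needs sorted input, so sort the extracted keys first
    cut -d'|' -f1 "$1" | sort -u > "$k1"
    cut -d'|' -f1 "$2" | sort -u > "$k2"
    comm -23 "$k1" "$k2" | sed 's/^/deleted: /'   # only in the first file
    comm -13 "$k1" "$k2" | sed 's/^/added: /'     # only in the second file
    rm -f "$k1" "$k2"
}
# Usage: key_diff file1 file2
```

This only finds added and deleted keys; detecting changed records still needs a field-by-field comparison of the keys that comm reports as common.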


Steve Steel
If you want truly to understand something, try to change it. (Kurt Lewin)
Peter Godron
Honored Contributor

Re: Need help

Suchitra,
my initial solution:

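#!/usr/bin/sh
# Both input files must be sorted on the first (key) field.
echo "Changed"
# Join on the key; print file2's values only when field 2 or 3 differs
join -t '|' -o 1.1,1.2,1.3,2.2,2.3 file1 file2 |
awk -F'|' '$2 != $4 || $3 != $5 { print $1 "|" $4 "|" $5 "|c" }'
echo "Deleted"
cut -f1 -d '|' file1 > file1.bck
cut -f1 -d '|' file2 > file2.bck
# Keys present only in file1; loop so that several keys are handled
comm -23 file1.bck file2.bck | while read key
do
grep "^$key|" file1
done | sed 's/$/|d/'
echo "Added"
# Keys present only in file2
comm -13 file1.bck file2.bck | while read key
do
grep "^$key|" file2
done | sed 's/$/|a/'
rm file1.bck file2.bck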
Peter Godron
Honored Contributor

Re: Need help

Muthukumar,
very smooth script!
The first "print var" could be replaced by:
print a[1]"|"a[2]"|"a[3]
to pick up the file2 values, rather than file1.
Senthil Kumar .A_1
Honored Contributor

Re: Need help

Hi,

I have attached a script that uses "comm" command for your situation.

Regards,
Senthil Kumar .A
Let your effort be such, the very words to define it, by a layman - would sound like a "POETRY" ;)
Muthukumar_5
Honored Contributor

Re: Need help

Peter,

To avoid printing the whole line as a[1]"|"a[2]"|"a[3], I stored it in a separate variable. It helps to simplify the script ;)

--
Muthu
Easy to suggest when don't know about the problem!
Peter Godron
Honored Contributor

Re: Need help

Suchitra,
something else to keep in mind is that any script using comm would only work on sorted files.

Also, are there any duplicate keys like:
p1|y|500
p1|n|300
.
.
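
Both points can be checked up front. A small sketch, assuming '|'-delimited files with the key in field 1 (the function names dup_keys and keys_sorted are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative) for sanity-checking input.

# Print each key (field 1) that occurs more than once in the file.
dup_keys() {
    cut -d'|' -f1 "$1" | sort | uniq -d
}

# Succeed only if the keys already appear in sorted order.
keys_sorted() {
    cut -d'|' -f1 "$1" | sort -c 2>/dev/null
}
# Usage: dup_keys file1
#        keys_sorted file1 || echo "file1 needs sorting"
```

If dup_keys prints anything, a key-based comparison has to decide which of the duplicate records counts.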

Muthukumar_5
Honored Contributor

Re: Need help

Using perl:

#!/usr/bin/perl

# Use "or" rather than "||" so that die fires when open fails
open(FD1, "file1") or die "Open Error: $!";
open(FD2, "file2") or die "Open Error: $!";

@arr1=<FD1>;
@arr2=<FD2>;

for ($i=0;$i<@arr1;$i++)
{
# Strip the newline before splitting so the last field compares cleanly
chomp($arr1[$i]);
chomp($arr2[$i]);

@pat1=split (/\|/,$arr1[$i]);
@pat2=split (/\|/,$arr2[$i]);

if ( $pat1[0] eq $pat2[0] )
{
if ( $pat1[1] ne $pat2[1] )
{

print "$arr1[$i] | c b'cas it has been changed from $pat1[1] to $pat2[1] in field 2 in 2nd file\n";
}
if ( $pat1[2] ne $pat2[2] )
{
print "$arr1[$i] | c b'cas it has been changed from $pat1[2] to $pat2[2] in field 3 in 2nd file\n";
}

}
else
{
print "$arr1[$i] | d b'cas $pat1[0] is deleted in 2nd file\n";
print "$arr2[$i] | a b'cas $pat1[0] is added in 2nd file\n";
}

}

# END

--
Muthu
Easy to suggest when don't know about the problem!
Peter Godron
Honored Contributor

Re: Need help

Suchitra,
do these answers solve your problem?
Can you please have a look at:
http://forums1.itrc.hp.com/service/forums/helptips.do?#28
and then update the record.
Hein van den Heuvel
Honored Contributor

Re: Need help

Hmmm, Muthu... I fail to see how you can solve the problem described with the simple array comparison you suggest. It seems clear to me that any solution needs to focus on the first, 'key' field.
How else can one decide whether a new record appeared in the same place where an old record was deleted?

Anyway...

For large files, the input probably needs to be pre-sorted and the two files read simultaneously, comparing key values to keep them in sync.
I presented one example of this in:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=999120
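
A minimal awk sketch of that read-both-files-in-step idea follows. It is not the code from the linked thread, which is not reproduced here. It assumes both files are already sorted on the key in plain byte order (for example with LC_ALL=C sort -t'|' -k1,1), that keys are unique, and that a record counts as changed when anything after the key differs. The function name merge_compare is made up.

```shell
#!/bin/sh
# Hypothetical sketch: walk two key-sorted files in step, like a merge.
# $1 = old file, $2 = new file; both sorted on field 1, unique keys.
merge_compare() {
    awk -F'|' -v f2="$2" '
    # Read the next record of the new file into line2 and its key into k2
    function next2() {
        have2 = ((getline line2 < f2) > 0)
        if (have2) { split(line2, b, "|"); k2 = b[1] "" }
    }
    BEGIN { next2() }
    {
        k1 = $1 ""                      # force string comparison
        # New-file keys that sort before the current old key were added
        while (have2 && k2 < k1) { print line2 "|a"; next2() }
        if (have2 && k2 == k1) {
            if (line2 != $0) { print line2 "|c" }
            next2()
        } else {
            print $0 "|d"               # old key has no partner
        }
    }
    END { while (have2) { print line2 "|a"; next2() } }
    ' "$1"
}
# Usage: merge_compare file1 file2
```

Only one record from each file is held in memory at a time, which is the point of pre-sorting for large inputs.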


For small files you can just 'slurp' them into a perl associative array, and report based on keys.

Here is an example that compares only the first non-key field. It is easily adapted to compare other fields, or everything except the key.

------ compare.pl ---------
$file = shift;
open (FILE, "<$file") or die "Failed to open first file: $file.";
while (<FILE>) {
chomp;
($key,$flag,$num) = split (/\|/, $_);
print "$. $key,$flag,$num\n";
$f1_flag{$key} = $flag;
$f1_num{$key} = $num;
}

$file = shift;
open (FILE, "<$file") or die "Failed to open second file: $file.";
while (<FILE>) {
chomp;
($key,$flag,$num) = split (/\|/, $_);
print "$. $key,$flag,$num\n";
$f2_flag{$key} = $flag;
$f2_num{$key} = $num;
}

for $key (sort keys %f2_flag) {
$change = "=";
if (defined $f1_flag{$key}) {
$change = "c" if ($f1_flag{$key} ne $f2_flag{$key});
delete $f1_flag{$key};
} else {
$change = "a";
}
print "$key|$f2_flag{$key}|$f2_num{$key}| $change\n"
}

for $key (sort keys %f1_flag) {
print "$key|$f1_flag{$key}|$f1_num{$key}| d\n"
}

---- usage example ----

C:\Temp>type file1.tmp
p1|y|500
p2|n|500
p5|y|500
C:\Temp>type file2.tmp
p1|n|500
p3|y|501
p5|y|500
C:\Temp>perl tmp.pl file1.tmp file2.tmp
p1|n|500 | c
p3|y|501 | a
p5|y|500 | =
p2|n|500 | d


If the input is treated as a 'key' plus 'everything else', then it simplifies somewhat:

------- compare_2.pl ----------

$file = shift;
open (FILE, "<$file") or die "Failed to open first file: $file.";
while (<FILE>) {
chomp;
$f1{$`} = $' if /\|/;
}

$file = shift;
open (FILE, "<$file") or die "Failed to open second file: $file.";
while (<FILE>) {
chomp;
$f2{$`} = $' if /\|/;
}

for $key (sort keys %f2) {
$change = "=";
if (defined $f1{$key}) {
$change = "c" if ($f1{$key} ne $f2{$key});
delete $f1{$key};
} else {
$change = "a";
}
print "$key|$f2{$key}| $change\n"
}

for $key (sort keys %f1) {
print "$key|$f1{$key}| d\n"
}
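
Since the original question asked for shell or awk, here is a rough awk sketch of the same slurp-into-an-array idea. It is an adaptation, not a translation of the perl above: it compares whole records, prints nothing for unchanged lines, and the function name hash_compare is made up. It assumes unique keys and a non-empty first file (with an empty first file the NR == FNR test would misfire).

```shell
#!/bin/sh
# Hypothetical awk take on the slurp-and-compare idea.
# $1 = old file, $2 = new file; neither needs to be sorted.
hash_compare() {
    awk -F'|' '
    # First file: remember each whole record under its key
    NR == FNR { old[$1] = $0; next }
    # Second file: classify each record, then forget the key
    {
        if (!($1 in old))       { print $0 "|a" }
        else if (old[$1] != $0) { print $0 "|c" }
        delete old[$1]
    }
    # Whatever is left was never seen in the second file
    END { for (k in old) print old[k] "|d" }
    ' "$1" "$2"
}
# Usage: hash_compare file1 file2 | sort
```

The order of the deleted lines is not defined by awk, hence the sort in the usage line.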

Hein van den Heuvel
Honored Contributor

Re: Need help

Mind you, Muthu's script may be great IF the data follows strict patterns. But I do not think it works as requested. For example with a single deleted line:

C:\Temp>type file1.tmp
p1|y|500
p2|n|500
p3|y|500
p4|y|500
p5|y|500
C:\Temp>type file2.tmp
p1|n|500
p3|y|500
p4|y|500
p5|y|500
C:\Temp>perl test.pl
p2|n|500 | d b'cas p2 is deleted in 2nd file
p3|y|500 | a b'cas p2 is added in 2nd file
p3|y|500 | d b'cas p3 is deleted in 2nd file
p4|y|500 | a b'cas p3 is added in 2nd file
p4|y|500 | d b'cas p4 is deleted in 2nd file
p5|y|500 | a b'cas p4 is added in 2nd file
p5|y|500 | d b'cas p5 is deleted in 2nd file
| a b'cas p5 is added in 2nd file

Using the script I suggest:

p1|n|500 | c
p3|y|500 | =
p4|y|500 | =
p5|y|500 | =
p2|n|500 | d

Of course, my method will fail if it is the order of the records that matters, and not the column 1 value.

If the "=" lines are not desirable, then change the code to make the print conditional:
print "$key|$f2{$key}| $change\n" unless $change eq "=";

or rearrange the core loop a little. For example:

for $key (sort keys %f2) {
$change = "c";
if (defined($x = $f1{$key})) {
delete $f1{$key};
next if ($x eq $f2{$key});
} else {
$change = "a";
}
print "$key|$f2{$key}| $change\n"
}


Result for last input example:

C:\Temp>perl compare.pl file1.tmp file2.tmp
p1|n|500 | c
p2|n|500 | d


Ok, lunch break over, back to real work...
:-)

Hein.



Sandman!
Honored Contributor

Re: Need help

Hi Suchitra,

I've pasted a shell script below that satisfies the requirements for parsing and filtering input files according to your criteria:

========================myparser.sh========================
#!/bin/sh

set -a

InFile1=f1
InFile2=f2
#
SortFile1=f1s
SortFile2=f2s
#
OutFile=outfile

# Zero out the outputfile
# before input processing
cat /dev/null > $OutFile

# Sort both file 1 and 2 on the first field
# using the vertical bar as field separator
sort -t"|" -k1,1 $InFile1 > $SortFile1
sort -t"|" -k1,1 $InFile2 > $SortFile2

# Filter out lines common to both files
# and print out those that have changed
join -t"|" $SortFile1 $SortFile2 | awk -F"|" '
BEGIN {OFS="|"}
{if($2!=$4)print $1,$4,$NF,"c b'\''cas "$1" changed from "$2" to "$4" in 2nd file"}' >>$OutFile

# Print out all the unmatched lines in
# sorted file 1 and flag them as deleted
join -t"|" -v1 $SortFile1 $SortFile2 | awk -F"|" '
BEGIN{OFS="|"} {print $0,"d b'\''cas "$1" is deleted in 2nd file"}' >>$OutFile

# Print out all the unmatched lines in
# sorted file 2 and flag them as added
join -t"|" -v2 $SortFile1 $SortFile2 | awk -F"|" '
BEGIN{OFS="|"} {print $0,"a b'\''cas "$1" is added in 2nd file"}' >>$OutFile
===========================================================

Copy the code into a file name of your choice, customize the environment (InFile, SortFile etc.) to your system, make it executable and run it at the command line.

hope it helps!
Sandman!
Honored Contributor

Re: Need help

Matter of fact the code might be easier to understand (as well as copy 'n paste) if attached instead of pasted...so click on the attachment for the shell script.

cheers!
suchitra
Occasional Contributor

Re: Need help

Thanks a lot guys ... I got the solution I wanted. It was a great help from all of you guys.
Thanks a lot.