Operating System - HP-UX

Re: Check input file rows present or not present in output file

 
sathis kumar
Frequent Advisor

Thanks for all your help.

Please find below the exact requirement that we have:

My i/p file looks like:
B L1983A B1N 20090701 HUECDP QBLH
B L1983A B1N 20090701 HUHFDP QBL1

My o/p file looks like:
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBL1
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1

1) I need to compare each line of the i/p file (characters 1-38) with the o/p file. Where a line matches, I need to replace the last field of that o/p line with the corresponding value from the i/p file.

i.e. the output above should change to:
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBLH
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1

Note that the last field in the first line changed from QBL1 to QBLH (the same as the value in the i/p file).

2) If some lines present in the i/p file are missing from the o/p file, those lines need to be captured in a new file.

Note: We may need to test with 5,000, 10,000, 20,000 and even 50,000 lines. Hence we need to check the performance of the script execution also.
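
For timing runs at those sizes, synthetic sorted files can be generated along the following lines. This is only a sketch: the function name make_test_files, the numeric 6-character code field, and the choice to drop every 10th line from the o/p file are all assumptions made for illustration, not part of the real data.

```shell
# Hypothetical helper: make_test_files N IN_FILE OUT_FILE
# Writes N sorted i/p lines and a matching o/p file with every 10th line left out.
make_test_files() {
    awk -v n="$1" -v in_file="$2" -v out_file="$3" 'BEGIN {
        for (i = 1; i <= n; i++) {
            # 28-character key, same layout as the sample lines (assumed)
            key = sprintf("B L1983A B1N 20090701 %06d", i)
            print key " QBLH" > in_file
            # leave out every 10th line to simulate rows missing from the o/p file
            if (i % 10 != 0)
                print key " +0000000000000000.00 F 20090701 01 A QBL1" > out_file
        }
    }'
}
```

For example, "make_test_files 50000 a.txt b.txt" followed by "time awk -f your_script.awk ..." gives a rough idea of how the run time grows with the line count.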
Dennis Handly
Acclaimed Contributor

>Hence need to check the performance of the script execution also.

Are your files sorted? If not, do you care if the output is sorted?

A close upper bound on the time would be to sort both files.
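
If the files did need sorting, something like the following would do it. It is a sketch only: sort_c is a made-up helper name, and it sorts on the whole line, which also orders the lines by their leading key characters. The C locale is forced because string comparison in awk may follow the current locale's collation, so it is safest to sort and compare under the same setting.

```shell
# Hypothetical helper: sort_c SRC DEST
# Sorts SRC by plain byte order into DEST.
sort_c() {
    LC_ALL=C sort "$1" > "$2"
}
```

Usage would be "sort_c i_file i_file.sorted" and likewise for the o/p file, then running the awk merge with LC_ALL=C set as well.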

sathis kumar
Frequent Advisor

Yes, the files are sorted
Dennis Handly
Acclaimed Contributor

>the files are sorted

Then this is a simple no-brainer and the performance is linear. Just do a "simple merge" and compare the records.
It is probably easy to do in C or perl, and only a little harder in awk, since there are two input and two output files.
Hein van den Heuvel
Honored Contributor

If the records are sorted and in the same order, there is not even a need to do the compare. You could just use:

$ awk '{new = $NF; getline < "b.txt"; regexp = $NF "$"; sub(regexp,new); print}' a.txt
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBLH
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1

That remembers the last field of the line from the first file, reads the corresponding line from the other file, replaces its last field with the remembered one, and prints the result.


If the files are sorted but the lines may NOT all match, then you will need to add some code to read ahead in whichever file has fallen behind until it has caught up.

if file a is
10
12

and file b is
10
11
12

then the program has to skip that line 11 from b.

if file a is
10
11
13
and file b is
10
12
13
then the program needs to skip a line from each file before processing the next match.
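
The two cases above can be tried out with a stripped-down merge that compares whole lines and only reports where each key was found. This is a rough sketch, not the full solution: merge_keys and the "both", "a-only" and "b-only" labels are names made up for the illustration.

```shell
# Hypothetical helper: merge_keys A_FILE B_FILE
# Both files must be sorted; each line is treated as the key.
merge_keys() {
    LC_ALL=C awk -v bfile="$2" '
    # read the next line of file b; b_ok becomes 0 at end of file
    function next_b() { b_ok = ((getline b < bfile) > 0); b = b "" }
    BEGIN { next_b() }
    {
        a = $0 ""                       # force string comparison
        # file b has fallen behind: report and skip its lines
        while (b_ok && b < a) { print "b-only " b; next_b() }
        if (b_ok && b == a) { print "both " a; next_b() }
        else print "a-only " a          # file a has fallen behind
    }
    END { while (b_ok) { print "b-only " b; next_b() } }' "$1"
}
```

With the second pair of files (10, 11, 13 and 10, 12, 13) this prints "both 10", "a-only 11", "b-only 12" and "both 13", one per line.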

Below is an example of how to solve such a problem in awk.

Note, I used 28 instead of 38 in the example, because that's how the data showed up in the forum, and while you indicated 4 fields, you actually showed 5, so that's not to be trusted either.

Also please note how you wasted James's time by being imprecise initially.
You did NOT just need to find matching lines, for which GREP is perfect; you also needed data from EACH of the provided files, for which GREP is useless.

hope this helps,
Hein.

-------------- update.awk ----------------
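BEGIN { a_skip = b_skip = c_lines = 0 }
{
    # Key and last field of the current line from a.txt (stdin).
    a_match = substr($0, 1, 28)
    a_last = $NF
    # Advance whichever file is behind until both keys are equal.
    while (a_match != b_match) {
        if (a_match > b_match) {
            # b.txt is behind: read its next line (this overwrites $0).
            b_skip++
            if ((getline < "b.txt") != 1) { exit }
            b_match = substr($0, 1, 28)
            b_last = $NF
            c = $0
        }
        if (a_match < b_match) {
            # a.txt is behind: move on to its next line.
            a_skip++
            if (getline != 1) { exit }
            a_match = substr($0, 1, 28)
            a_last = $NF
        }
    }
    # Keys match: replace the last field of the b.txt line with the one from a.txt.
    regexp = b_last "$"
    sub(regexp, a_last, c)
    print c
    c_lines++
    b_skip--    # the matched line was read, not skipped
}
END {
    print c_lines " printed to C. " a_skip, " skipped from a, ", b_skip " from b." > "/dev/stderr"
}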
-------------- sample execution ----------

/cygdrive/c/temp
$ awk -f update.awk < a.txt > c.txt
2 printed to C. 1 skipped from a, 0 from b.

/cygdrive/c/temp
$ cat c.txt
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBLH
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1
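
Note that update.awk only counts the lines it skips from a. For the second requirement (capturing i/p lines that are missing from the o/p file), a variant along these lines might serve. It is a sketch under assumptions: merge_update, keylen and the missing-file argument are invented names, the key is taken as the first 28 characters as above, and o/p lines without a partner are silently dropped just as in update.awk.

```shell
# Hypothetical wrapper: merge_update IN_FILE OUT_FILE MISSING_FILE
# Updated o/p lines go to stdout; i/p lines with no partner go to MISSING_FILE.
merge_update() {
    : > "$3"    # make sure the missing-lines file exists even if nothing is missing
    LC_ALL=C awk -v ofile="$2" -v missing="$3" -v keylen=28 '
    # advance the o/p file by one line; b_eof is set at end of file
    function read_b() {
        if ((getline b_line < ofile) > 0) b_key = substr(b_line, 1, keylen)
        else b_eof = 1
    }
    BEGIN { read_b() }
    {
        a_key = substr($0, 1, keylen)
        # o/p lines that sort before this key have no i/p partner: skip them
        while (!b_eof && b_key < a_key) read_b()
        if (b_eof || b_key != a_key) {  # i/p line is absent from the o/p file
            print > missing
            next
        }
        # keys match: strip the last field of the o/p line, append the i/p one
        sub(/[^ \t]+[ \t]*$/, "", b_line)
        print b_line $NF
        read_b()
    }' "$1"
}
```

It would be called as "merge_update a.txt b.txt missing.txt > c.txt". Stripping the last field with a fixed pattern, instead of building a regular expression from the field itself, avoids surprises if that field ever contains a character that is special in a regular expression.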
Dennis Handly
Acclaimed Contributor

>Hein: and in the same order

Sathis said lines could be missing.

>for which GREP is perfect

grep might be terrible for 50 K records.

Here is my awk merge example with checking:

awk -v file=i_file -v err_file=err.out '
BEGIN { save = ""; EOF = 0 }
{
    # Make sure a line from the I file is pending in "save".
    if (save == "") {
        if (EOF || (getline save < file) <= 0) {
            print "Missing in I file:", $0 > err_file
            EOF = 1
            save = ""
            next
        }
    }
    # I file lines that sort before the current O line have no match.
    while (substr(save, 1, 28) < substr($0, 1, 28)) {
        print "Missing in O file:", save > err_file
        if ((getline save < file) <= 0) {
            print "Missing in I file:", $0 > err_file
            EOF = 1
            save = ""
            next
        }
    }

    if (substr(save, 1, 28) == substr($0, 1, 28)) {
        # Keys match: take the last field from the I line (column 30 onward).
        $NF = substr(save, 30)
        print $0
        save = ""
        next
    }
    print "Missing in I file:", $0 > err_file
}
END {
    # Whatever is left in the I file is missing from the O file.
    if (save != "")
        print "Missing in O file:", save > err_file
    while ((getline save < file) > 0) {
        print "Missing in O file:", save > err_file
    }
} ' o_file