topic compare 2 files using AWK based on conditions in Operating System - HP-UX

compare 2 files using AWK based on conditions

chinnaga — Sat, 06 Nov 2010 18:13:42 GMT

Hi,

I am stuck in the middle of a problem.

I have a control file that tells me which fields of two files I need to compare and what to print for each one. If key = Y and output is Y, I need to print the exact value. If output is Y/N, I need to print only Y if the values match or N if they do not.

For ex:

my control file

key|compare_field|output
Y|Field_1|Y
N|Filed_2|Y/N
Y|Field_3|Y
N|Field_4|Y/N
N|Field_5|N
N|Field_6|Y/N

file1

field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|jfjd|djla|uopp|678|jyh|jkl

file2

field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|juio|djla|uopu|678|jyh|jkl

my output should be

field_1|feld_2|field_3|field_4|field_6
000|Y|edfr|Y|Y
678|N|djfr|N|Y

I can do each part separately, but I need your help to combine the logic.

# to copy the field names as the header in the report file.
nawk -F'|' '$NF == "Y" || $NF == "Y/N" { printf "%s%s", sep, $2 >> "report_file"; sep = FS } END { print "" >> "report_file" }' control_file

To compare the 2 files and print the output as Y or N

nawk -F'|' '{ getline x < f; split(x, F) } NR > 1 { for (i = 2; i <= NF; i++) $i = (F[i] == $i) ? "Y" : "N" } 1' OFS="|" f=file2 file1

I can do them separately, but I am not able to read the control file and compare the files based on it.

Please help me.

Thanks in Advance
Rashmi

Re: compare 2 files using AWK based on conditions

Hein van den Heuvel — Sat, 06 Nov 2010 18:48:16 GMT

For starters, you are not going to be able to do this in a one-liner.

You'll need a BEGIN {} to read the control file, and store the results in an array indexed by the compare-field name.
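A minimal sketch of that BEGIN step, assuming POSIX awk (use nawk where that is the newer awk) and the control file layout from the original post. The names load_control, ctl, key, out and order are only illustrative, and lower-casing the field names is an assumption about how they should be matched.

```shell
# Sample control file, copied from the original post.
dir=${TMPDIR:-/tmp}/awk_ctl.$$; mkdir -p "$dir"
cat > "$dir/control_file" <<'EOF'
key|compare_field|output
Y|Field_1|Y
N|Filed_2|Y/N
Y|Field_3|Y
N|Field_4|Y/N
N|Field_5|N
N|Field_6|Y/N
EOF

load_control() {   # $1 = control file
  awk -F'|' -v ctl="$1" '
    BEGIN {
      getline line < ctl                 # skip the header line
      while ((getline line < ctl) > 0) {
        split(line, c, FS)
        name = tolower(c[2])             # assumption: match names case-insensitively
        key[name] = c[1]
        out[name] = c[3]
        order[++n] = name                # remember the control-file order
      }
      close(ctl)
      # Print the stored rules back so they can be checked by eye.
      for (i = 1; i <= n; i++)
        print order[i], key[order[i]], out[order[i]]
    }'
}
load_control "$dir/control_file"
```

The function only echoes the rules it stored; a real script would carry on and use the key and out arrays while reading the data files.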

Unless the field names in the control file and the column headers in the data file are guaranteed to be in exactly the same order, you need to MAP those names to column numbers: 'field_3' = column 3. To do that, as you read the first line of the data file you will want to create another array for that mapping, or re-index the first array by column number instead of by name. And set a flag that this is done, or just use NR as you already do.
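The mapping could look roughly like this. It is a sketch for POSIX awk that reads the control file as the first operand (the NR == FNR idiom) rather than in BEGIN. The function name map_columns and the arrays want and col are made up for the example, and the case-insensitive match is an assumption. The sample lines are the ones from the original post, spelling included.

```shell
dir=${TMPDIR:-/tmp}/awk_map.$$; mkdir -p "$dir"
cat > "$dir/control_file" <<'EOF'
key|compare_field|output
Y|Field_1|Y
N|Filed_2|Y/N
Y|Field_3|Y
N|Field_4|Y/N
N|Field_5|N
N|Field_6|Y/N
EOF
printf '%s\n' 'field_1|feld_2|field_3|field_4|field_5|field_6' > "$dir/file1"

map_columns() {   # $1 = control file, $2 = data file
  awk -F'|' '
    NR == FNR {                        # first operand: the control file
      if (FNR > 1) want[++n] = tolower($2)
      next
    }
    {                                  # first line of the data file: the header
      for (i = 1; i <= NF; i++) col[tolower($i)] = i
      for (j = 1; j <= n; j++)
        print want[j], ((want[j] in col) ? col[want[j]] : "MISSING")
      exit                             # only the header is needed here
    }
  ' "$1" "$2"
}
map_columns "$dir/control_file" "$dir/file1"
```

With the post's data, every name maps to its column number except filed_2, which is reported as MISSING because the header spells it feld_2.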

With all that in place, in the main loop (with the flag set, or NR>1) you then have to loop over each field of each record and do what needs to be done based on the control array.

Q> where does "djfr" in the output come from?

Q> how closely can the column names be expected to match: field_1 =? Field_1, feld_2 =? Filed_2, ...

Q> report on missing/excessive fields or ignore?

Good luck!
Hein

Re: compare 2 files using AWK based on conditions

chinnaga — Sun, 07 Nov 2010 03:14:21 GMT

Thanks for the hint.. I will try based on your input.

Thanks a lot.

Re: compare 2 files using AWK based on conditions

Dennis Handly — Mon, 08 Nov 2010 03:22:40 GMT

>Hein: You'll need a BEGIN {} to read the control file, and store the results in an array indexed by the compare-field name.

It's worse than that: there are three input files, so you also have to pair up the records of file1 and file2, unless the two files are in the same order and have the same number of records.
If they are, then as you cycle through file1 you can do a getline on file2.
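A small sketch of that lock-step getline, assuming POSIX awk. The function name lockstep, the variable f2 and the three-line sample files are invented for the illustration. It compares whole records only, and reports when one file runs out before the other.

```shell
dir=${TMPDIR:-/tmp}/awk_lock.$$; mkdir -p "$dir"
printf '%s\n' 'a|1' 'b|2' 'c|3' > "$dir/file1"   # hypothetical data
printf '%s\n' 'a|1' 'b|9'       > "$dir/file2"   # one record short

lockstep() {   # $1 = file read as normal input, $2 = file read with getline
  awk -v f2="$2" '
    {
      # getline returns 1 on success, 0 at end of file, -1 on a read error.
      if ((getline other < f2) <= 0) {
        print "line " NR ": file2 has no matching record"
        next
      }
      # Appending "" forces a string comparison rather than a numeric one.
      print "line " NR ": " ((($0 "") == (other "")) ? "same" : "differs")
    }
    END {
      # Anything still unread in the second file means it is the longer one.
      if ((getline other < f2) > 0) print "file2 has extra records"
    }
  ' "$1"
}
lockstep "$dir/file1" "$dir/file2"
```

This only works if record N of one file really corresponds to record N of the other. If that is not guaranteed, the files would have to be sorted or matched on a key first.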

Re: compare 2 files using AWK based on conditions

Hein van den Heuvel — Mon, 08 Nov 2010 04:10:43 GMT

Dennis>> Unless file1 and file2 are in order and have the same number of records.

Right. That was another question I meant to ask. Are the input files guaranteed to be in lock-step? Are they sorted, or can they be sorted? What should happen if there are 'missing' or 'duplicate' records? Is one of the inputs 'dominant'?

Q> can a single input line generate more than 1 output line ?

And I suppose that in the BEGIN block the script might as well slurp the first input line from each data file and do the column mapping right there, to keep the main processing as clean as possible.
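Putting those pieces together, one possible shape for the whole script is sketched below for POSIX awk. It rests on several assumptions that the thread leaves open: the two files are in lock-step, names match case-insensitively, output N means the column is dropped, output Y prints the value from file1, and the key column is ignored. The sample data is a cleaned-up version of the post's example (consistent spelling, six fields per row), and compare_files and all variable names are illustrative.

```shell
dir=${TMPDIR:-/tmp}/awk_cmp.$$; mkdir -p "$dir"
printf '%s\n' 'key|compare_field|output' 'Y|field_1|Y' 'N|field_2|Y/N' \
  'Y|field_3|Y' 'N|field_4|Y/N' 'N|field_5|N' 'N|field_6|Y/N' > "$dir/control_file"
hdr='field_1|field_2|field_3|field_4|field_5|field_6'
printf '%s\n' "$hdr" '000|adbc|edfr|hjkl|890|jlk' '678|jfjd|djla|uopp|678|jyh' > "$dir/file1"
printf '%s\n' "$hdr" '000|adbc|edfr|hjkl|890|jlk' '678|juio|djla|uopu|678|jyh' > "$dir/file2"

compare_files() {   # $1 = control file, $2 = file1, $3 = file2
  awk -F'|' -v OFS='|' -v ctl="$1" -v f2="$3" '
    BEGIN {
      # 1. Control file: remember the output rule for each field name.
      getline line < ctl                       # skip the control header
      while ((getline line < ctl) > 0) {
        split(line, c, FS)
        rule[tolower(c[2])] = c[3]
      }
      close(ctl)

      # 2. Slurp both headers. The file2 stream stays open for the main
      #    loop; file1 is closed because awk reopens it as normal input.
      getline h2 < f2
      getline h1 < ARGV[1]; close(ARGV[1])
      if (h1 != h2) { print "headers differ" | "cat 1>&2"; exit 2 }

      # 3. Keep every column whose rule is Y or Y/N and print the header.
      nh = split(h1, h, FS)
      for (i = 1; i <= nh; i++) {
        r = rule[tolower(h[i])]
        if (r == "Y" || r == "Y/N") {
          keep[++nk] = i; how[nk] = r
          head = head (nk > 1 ? OFS : "") h[i]
        }
      }
      print head
    }
    FNR == 1 { next }                          # header was handled in BEGIN
    {
      if ((getline other < f2) <= 0) {
        print "file2 is shorter than file1" | "cat 1>&2"; exit 2
      }
      split(other, b, FS)
      line = ""
      for (k = 1; k <= nk; k++) {
        i = keep[k]
        # Y: print the value; Y/N: print Y or N after a string comparison.
        v = (how[k] == "Y") ? $i : ((($i "") == (b[i] "")) ? "Y" : "N")
        line = line (k > 1 ? OFS : "") v
      }
      print line
    }
  ' "$2"
}
compare_files "$dir/control_file" "$dir/file1" "$dir/file2"
```

On the cleaned-up sample this prints the header field_1|field_2|field_3|field_4|field_6 followed by 000|Y|edfr|Y|Y and 678|N|djla|N|Y. It does not report extra records at the end of file2, and it makes no attempt to handle missing or duplicate records.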

If the example had been correct/plausible then I might have tried, but with all those variables and variants it seemed like 'real work'.

Cheers,
Hein