topic Re: Easy points awk question in Operating System - HP-UX

Easy points awk question

Belinda Dermody — Tue, 18 Feb 2003 14:55:33 GMT

One liner or maybe two

Input file with one field (50,000+ lines)
Compare each line to another file: if $1 matches the first field of a line in the 2nd input file (like a grep, but awk is faster), write that 2nd-file line to another file (outfile); otherwise skip to the next record in file 1.

Re: Easy points awk question

James R. Ferguson — Tue, 18 Feb 2003 15:05:23 GMT

Hi James:

How about 'comm':

# comm -3 file1 file2 > newfile

Regards!

...JRF...

Re: Easy points awk question

James R. Ferguson — Tue, 18 Feb 2003 15:06:44 GMT

Hi (again):

Ooops, I think you wanted the inverse:

# comm -12 file1 file2 > newfile

Regards!

...JRF...

Re: Easy points awk question

H.Merijn Brand (procura) — Tue, 18 Feb 2003 15:22:03 GMT

You need 'join'. Both files need to be sorted. Join is made for this purpose.

Enjoy, have FUN! H.Merijn

Re: Easy points awk question

Belinda Dermody — Tue, 18 Feb 2003 15:32:51 GMT

Jim, neither one gives me the correct results

comm -3 > file3 gives me all the lines of file1 and file2 in file3.
comm -2 > file3 gives me nothing.
File one will always have one field, and that one field will either be in file2 or not; if it is in file2, put it in file3.

sample file1
abc@abc.com
def@def.com # will not be in file2
xyz@xyz.com
etc..
etc..

sample file2
abc@abc.com abc123.xyz.net
xyz@xyz.com axz999.yahoo.com
etc..... etc.....

expected file3
abc@abc.com
xyz@xyz.com
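
A note on why comm returns nothing here: comm compares complete lines of sorted input, so a one-field line in file1 never equals a two-field line in file2. A minimal sketch of one workaround, assuming the sample data above and that mktemp is available (the temporary directory and the .sorted/.keys file names are illustrative, not from the thread):

```shell
# Sketch only: sample data and intermediate file names are assumptions.
tmp=$(mktemp -d)
printf '%s\n' 'abc@abc.com' 'def@def.com' 'xyz@xyz.com' > "$tmp/file1"
printf '%s\n' 'abc@abc.com abc123.xyz.net' 'xyz@xyz.com axz999.yahoo.com' > "$tmp/file2"

# comm needs sorted input and compares whole lines, so reduce file2
# to its first field before comparing.
sort "$tmp/file1" > "$tmp/file1.sorted"
awk '{ print $1 }' "$tmp/file2" | sort > "$tmp/file2.keys"

# -12 suppresses columns 1 and 2, leaving only lines common to both inputs.
comm -12 "$tmp/file1.sorted" "$tmp/file2.keys" > "$tmp/file3"
cat "$tmp/file3"
```

This only recovers the matching addresses, not the full file2 lines.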

Re: Easy points awk question

Leif Halvarsson_2 — Tue, 18 Feb 2003 15:37:47 GMT

Hi,
Try:
join -1 1 -2 1 -o 2.1,2.2,2.3

Re: Easy points awk question

Sridhar Bhaskarla — Tue, 18 Feb 2003 15:53:46 GMT

Hi James,

first file = file1
second file = file2

# For each entry in file1, print the file2 lines whose first field matches it
while read entry
do
    /usr/bin/awk -v value="$entry" '$1 == value { print $0 }' file2
done < file1

-Sri

Re: Easy points awk question

Belinda Dermody — Tue, 18 Feb 2003 15:54:22 GMT

I thought this would be easy, but nothing is correct yet.

I thought Leif had the answer. But as an example:

abc@abc.com from file1 (master listing) was in file1 85 times, but using Leif's command it gave me a count of only four.

And Jim, this is the first time that you have been unable to provide the correct result in the first response.

Re: Easy points awk question

James R. Ferguson — Tue, 18 Feb 2003 16:00:28 GMT

Hi (again) James:

OK, when you said "line" I took you literally. Merijn (Procura) offered the better command. Try this:

# join -a2 file1 file2 > newfile

Regards!

...JRF...

Re: Easy points awk question

Dietmar Konermann — Tue, 18 Feb 2003 16:06:21 GMT

Pure (somewhat ugly) awk... slow, but should work with unsorted files.

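awk '
{
    # Re-read file2 from the start for every line of file1.
    # Testing "> 0" stops the loop on end-of-file and on a read error.
    while ((getline line < "file2") > 0) {
        split (line, l);
        if ($1 == l[1])
            print ($1);
    }
    close ("file2");
}' < file1 > file3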

If you need the complete "2nd file line" in the output, replace print ($1) with print (line).
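
For files of this size, a common alternative is to read file1 once into an awk array and then make a single pass over file2. The following is a minimal sketch, assuming whitespace-separated fields and the sample data from earlier in the thread; the temporary directory, the extra nobody@none.com line, and the array name wanted are illustrative assumptions:

```shell
# Sketch only: sample data is an assumption based on the thread.
tmp=$(mktemp -d)
printf '%s\n' 'abc@abc.com' 'def@def.com' 'xyz@xyz.com' > "$tmp/file1"
printf '%s\n' 'abc@abc.com abc123.xyz.net' \
              'nobody@none.com mail.none.net' \
              'xyz@xyz.com axz999.yahoo.com' > "$tmp/file2"

# NR == FNR holds only while the first file argument is being read:
# remember each address from file1, then skip to the next record.
# For file2, print the line when its first field was remembered.
awk 'NR == FNR { wanted[$1]; next } $1 in wanted' \
    "$tmp/file1" "$tmp/file2" > "$tmp/file3"
cat "$tmp/file3"
```

Neither file needs to be sorted, and output follows the order of file2. Note that the NR == FNR test misbehaves if file1 is empty.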

Best regards...
Dietmar.

Re: Easy points awk question

Rodney Hills — Tue, 18 Feb 2003 16:20:24 GMT

If the second file has unique entries, then here is a short perl program (substitute your 2nd file name for "lookupfile")-

open(INP,"<lookupfile");
while(<INP>) { chomp; ($key,$rest)=split(" ",$_,2); $lu{$key}=$_; }
close(INP);
while(<>) { chomp; print $lu{$key},"\n" if $lu{$key}; }

Run by entering-
perl aboveprogram.pl firstfile

HTH

-- Rod Hills

Re: Easy points awk question

Rodney Hills — Tue, 18 Feb 2003 16:27:56 GMT

Whoops-

On the last line of the perl program use "$_" instead of "$key".

-- Rod Hills

Re: Easy points awk question

Leif Halvarsson_2 — Tue, 18 Feb 2003 16:28:15 GMT

Hi,
The two files have to be sorted on the join field before using "join".

Re: Easy points awk question

Stanimir — Tue, 18 Feb 2003 16:52:22 GMT

Hi!
Try:

for unm in `awk -F: '{print $0}' ;
do
awk -v v1=$unm 'match($0,v1){print substr($0,RSTART,RLENGTH)}' >
done

- input file
- 2nd input file
- result

Regards.

Re: Easy points awk question

Belinda Dermody — Tue, 18 Feb 2003 19:07:54 GMT

Thanks for all the tries, but nothing has come close to giving the correct results. The awk statements keep bailing out on line 1, with the good old helpful error msg.

The join doesn't come even close to separating the files correctly, even though I sorted the two input files. I guess I will have to try the Perl option next.

But once again, thanks for all the responses and suggestions. I will assign points as soon as I double-check my work and make sure I have not had any finger checks when typing it in.

Re: Easy points awk question

Dietmar Konermann — Tue, 18 Feb 2003 19:24:52 GMT

Your awk error is not caused by the posted scripts... just checked some of them. Maybe a copy/paste problem?

Re: Easy points awk question

Belinda Dermody — Tue, 18 Feb 2003 20:25:13 GMT

Thanks Dietmar; found a missing closing ', but still no output
#!/bin/sh -xv

awk '
{
while (getline line < tmpusertable) {
split (line,l);
if($1 == l[1])
print ($1);
}
close (tmpusertable);
}' < work6 > work7

This way it runs 5 seconds and work7 is empty.

work6 has 24,000 lines and tmpusertable has 250,000 lines and I expect work7 to have about 18,000 lines.

If I put " " around tmpusertable as you have in the example, it runs, but work7 is nothing but empty lines; I killed it after 32,000.
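
For context on the quoting: inside an awk program a bare word such as tmpusertable is a variable name, not a string, and an unset variable is the empty string, so the unquoted form does not name the intended file. One way to avoid hard-coding the name is to pass it in with -v. A minimal sketch, assuming small sample files in a temporary directory (the data and the variable name lookup are illustrative assumptions):

```shell
# Sketch only: sample data and the variable name "lookup" are assumptions.
tmp=$(mktemp -d)
printf '%s\n' 'abc@abc.com' 'def@def.com' 'xyz@xyz.com' > "$tmp/work6"
printf '%s\n' 'abc@abc.com abc123.xyz.net' 'xyz@xyz.com axz999.yahoo.com' > "$tmp/tmpusertable"

awk -v lookup="$tmp/tmpusertable" '
{
    # Re-read the lookup file for every input record.
    # "> 0" ends the loop on end-of-file and on a read error.
    while ((getline line < lookup) > 0) {
        split(line, l)
        if ($1 == l[1])
            print $1
    }
    close(lookup)
}' "$tmp/work6" > "$tmp/work7"
cat "$tmp/work7"
```

With 24,000 and 250,000 lines this nested read will still be slow, since the lookup file is scanned once per input line.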

Re: Easy points awk question

john korterman — Tue, 18 Feb 2003 20:28:42 GMT

Hi James,
please try the attached semi-awk script, but first replace and with the paths to your input files.

regards,
John K.

Re: Easy points awk question

Leif Halvarsson_2 — Tue, 18 Feb 2003 20:28:50 GMT

Hi,
I am not sure I have understood you correctly, but I used your example and added some lines to file1 (xxx in my test).

# cat xxx
abc@abc.com
def@def.com
xyz@xyz.com
abc@abc.com
abc@abc.com
abc@abc.com
abc@abc.com
#
# cat yyy
abc@abc.com abc123.xyz.net
xyz@xyz.com axz999.yahoo.com

# sort xxx >zzz
# join -1 1 -2 1 -o 2.1 zzz yyy
abc@abc.com
abc@abc.com
abc@abc.com
abc@abc.com
abc@abc.com
xyz@xyz.com

Of course yyy needs to be sorted too, but in the example this was already done.

Re: Easy points awk question

Belinda Dermody — Tue, 18 Feb 2003 20:34:39 GMT

The problem is that there might be addresses in file xxx that do not match either one of the addresses on a line in file yyy, and if so I do not want them in the report.
File xxx is addresses coming in. File yyy has the incoming address and a possible forwarding address.

Your lines of xxx always match something in yyy
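
One possible reading of this requirement is that an address from xxx should be reported only when it appears in either column of yyy. A minimal sketch under that assumption, which the thread does not confirm; the sample data, the report file name, and the array name known are illustrative:

```shell
# Sketch only: assumes "match" means equality with either field of a yyy line.
tmp=$(mktemp -d)
printf '%s\n' 'abc@abc.com' 'def@def.com' 'axz999.yahoo.com' 'abc@abc.com' > "$tmp/xxx"
printf '%s\n' 'abc@abc.com abc123.xyz.net' 'xyz@xyz.com axz999.yahoo.com' > "$tmp/yyy"

# While reading yyy (the first file argument), remember every field on each line.
# While reading xxx, print the line only if its address was remembered.
awk 'NR == FNR { for (i = 1; i <= NF; i++) known[$i]; next } $1 in known' \
    "$tmp/yyy" "$tmp/xxx" > "$tmp/report"
cat "$tmp/report"
```

Repeated addresses in xxx are kept, so counts in the report reflect how often each matching address came in.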