Operating System - HP-UX

Belinda Dermody
Super Advisor

Easy points awk question

One liner or maybe two

Input file with one field (50,000+ lines)
Compare each line against a second file: if $1 matches the first field of a line in the 2nd input file (like a grep, but awk is faster), write that 2nd-file line to another file (outfile); otherwise skip to the next record in file 1.
48 REPLIES
James R. Ferguson
Acclaimed Contributor

Re: Easy points awk question

Hi James:

How about 'comm':

# comm -3 file1 file2 > newfile

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: Easy points awk question

Hi (again):

Ooops, I think you wanted the inverse:

# comm -12 file1 file2 > newfile
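
Note that comm compares entire lines, so the command above only finds lines that are identical in both files. A minimal sketch, assuming the fields in file2 are separated by a single space and using made-up sample data, first reduces file2 to its first field (keys1 and keys2 are hypothetical scratch file names):

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data.
printf '%s\n' 'abc@abc.com' 'def@def.com' 'xyz@xyz.com' > file1
printf '%s\n' 'abc@abc.com abc123.xyz.net' 'xyz@xyz.com axz999.yahoo.com' > file2

# comm needs sorted input; sort both key lists the same way.
LC_ALL=C sort -u file1 > keys1
cut -d ' ' -f 1 file2 | LC_ALL=C sort -u > keys2

# -12 suppresses columns 1 and 2, leaving only the keys present in both lists.
LC_ALL=C comm -12 keys1 keys2 > newfile
cat newfile
```

This yields only the matching keys, not the full lines of file2.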

Regards!

...JRF...
H.Merijn Brand (procura
Honored Contributor

Re: Easy points awk question

You need 'join'. Both files need to be sorted. Join is made for this purpose.

Enjoy, have FUN! H.Merijn
Belinda Dermody
Super Advisor

Re: Easy points awk question

Jim, neither one gives me the correct results.

comm -3 > file3 gives me all the lines of file1 and file2 in file3.
comm -2 > file3 gives me nothing.
File one will always have one field, and that one field will either be in file2 or not; if it is in file2, put it in file3.

sample file1
abc@abc.com
def@def.com # will not be in file2
xyz@xyz.com
etc..
etc..


sample file2
abc@abc.com abc123.xyz.net
xyz@xyz.com axz999.yahoo.com
etc..... etc.....

expected file3
abc@abc.com
xyz@xyz.com

Leif Halvarsson_2
Honored Contributor

Re: Easy points awk question

Hi,
Try:
join -1 1 -2 1 -o 2.1,2.2,2.3
Sridhar Bhaskarla
Honored Contributor

Re: Easy points awk question

Hi James,

first file = file1
second file = file2

while read entry
do
/usr/bin/awk -v value="$entry" '
$1==value {print $0}' file2
done < file1

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Belinda Dermody
Super Advisor

Re: Easy points awk question

I thought this would be easy, but nothing is correct yet.

I thought Leif had the answer. But as an example:

abc@abc.com from file1 (master listing) was in file1 85 times, but using Leif's command it gave me a count of only four.

And Jim, this is the first time that you have been unable to provide the correct result in the first response.
James R. Ferguson
Acclaimed Contributor

Re: Easy points awk question

Hi (again) James:

OK, when you said "line" I took you literally. Merijn (Procura) offered the better command. Try this:

# join -a2 file1 file2 > newfile

Regards!

...JRF...
Dietmar Konermann
Honored Contributor

Re: Easy points awk question

Pure (somewhat ugly) awk... slow, but should work with unsorted files.

awk '
{
    # Reread file2 for every record of file1; "> 0" ends the loop on EOF or error.
    while ((getline line < "file2") > 0) {
        split (line, l);
        if ($1 == l[1])
            print ($1);
    }
    close ("file2");
}' < file1 > file3


If you need the complete "2nd file line" in the output, replace print ($1) with print (line).
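
The nested loop above rereads file2 once for every line of file1. A common alternative, not taken from any of the posts here, loads the file1 keys into an awk array first and then scans file2 a single time. A minimal sketch, assuming whitespace-separated fields, a non-empty file1, and made-up sample data (wanted is a hypothetical array name):

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data; neither file needs to be sorted.
printf '%s\n' 'abc@abc.com' 'def@def.com' 'xyz@xyz.com' 'abc@abc.com' > file1
printf '%s\n' 'xyz@xyz.com axz999.yahoo.com' 'abc@abc.com abc123.xyz.net' \
    'ghi@ghi.com ghi777.example.net' > file2

# NR == FNR holds only while the first file is being read:
# remember each key from file1, then print file2 lines whose $1 was seen.
awk 'NR == FNR { wanted[$1]; next }
     $1 in wanted' file1 file2 > file3
cat file3
```

Each matching file2 line is printed once, in file2 order, no matter how often its key is repeated in file1.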

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Rodney Hills
Honored Contributor

Re: Easy points awk question

If the second file has unique entries, then here is a short perl program (substitute your 2nd file name for "lookupfile")-

open(INP,"<lookupfile");
while(<INP>) { chomp; ($key,$rest)=split(" ",$_,2); $lu{$key}=$_; }
close(INP);
while(<>) { chomp; print $lu{$key},"\n" if $lu{$key}; }

Run by entering-
perl aboveprogram.pl firstfile

HTH

-- Rod Hills
There be dragons...
Rodney Hills
Honored Contributor

Re: Easy points awk question

Whoops-

On the last line of the perl program use "$_" instead of "$key".

-- Rod Hills
There be dragons...
Leif Halvarsson_2
Honored Contributor

Re: Easy points awk question

Hi,
The two files have to be sorted on the join field before using "join".
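
A minimal sketch of that sort-then-join sequence, assuming whitespace-separated fields and made-up sample data (file1.sorted and file2.sorted are hypothetical scratch file names):

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data, deliberately unsorted.
printf '%s\n' 'xyz@xyz.com' 'def@def.com' 'abc@abc.com' > file1
printf '%s\n' 'xyz@xyz.com axz999.yahoo.com' 'abc@abc.com abc123.xyz.net' > file2

# join expects both inputs sorted on the join field in the same collation.
LC_ALL=C sort -u file1 > file1.sorted
LC_ALL=C sort -k 1,1 file2 > file2.sorted

# Print fields 1 and 2 of every file2 line whose first field is also in file1.
LC_ALL=C join -o 2.1,2.2 file1.sorted file2.sorted > file3
cat file3
```

The sort -u on file1 is a design choice: without it, a key repeated in file1 produces one output line per repetition.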
Stanimir
Trusted Contributor

Re: Easy points awk question

Hi!
Try:

for unm in `awk -F: '{print $0}' ;
do
awk -v v1=$unm 'match($0,v1){print substr($0,RSTART,RLENGTH)}' >
done

- input file
- 2nd input file
- result

Regards.



Belinda Dermody
Super Advisor

Re: Easy points awk question

Thanks for all the tries, nothing has come close to giving the correct results. The awk statements keep on bailing out on line 1; the good old helpful error msg.

The join doesn't come even close to separating the files correctly, even though I sorted the two input files. I guess I will have to try the Perl option next.

But once again, thanks for all the responses and suggestions. I will assign points as soon as I double-check my work and make sure I have not had any finger checks in typing things in.
Dietmar Konermann
Honored Contributor

Re: Easy points awk question

Your awk error is not caused by the posted scripts... just checked some of them. Maybe a copy/paste problem?
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Belinda Dermody
Super Advisor

Re: Easy points awk question

Thanks Dietmar; found a missing closing ', but still no output
#!/bin/sh -xv


awk '
{
while (getline line < tmpusertable) {
split (line,l);
if($1 == l[1])
print ($1);
}
close (tmpusertable);
}' < work6 > work7

This way it runs 5 seconds and work7 is empty.

work6 has 24,000 lines and tmpusertable has 250,000 lines and I expect work7 to have about 18,000 lines.

If I put " " around tmpusertable like you have in the example, it runs, but work7 is nothing but empty lines; I killed it after 32,000.
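
Inside an awk program an unquoted tmpusertable is a variable name (empty unless assigned), not a file name. A minimal sketch, assuming the intent is to keep the file name out of the awk text, passes it in with -v; lookup is a hypothetical variable name and the sample data is made up:

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data.
printf '%s\n' 'xxx@aol.com' 'bbb@aol.com' > work6
printf '%s\n' 'xxx@aol.com yyy@yahoo.com' > tmpusertable

awk -v lookup=tmpusertable '
{
    # "> 0" stops the loop on end of file (0) and on read errors (-1).
    while ((getline line < lookup) > 0) {
        split(line, l)
        if ($1 == l[1])
            print $1
    }
    close(lookup)   # so the next work6 record rereads the file from the top
}' < work6 > work7
cat work7
```

This does not by itself explain the empty lines; blank records in the input files would be one thing to check.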
john korterman
Honored Contributor

Re: Easy points awk question

Hi James,
please try the attached semi-awk script, but first replace and with the paths to your input files.

regards,
John K.
it would be nice if you always got a second chance
Leif Halvarsson_2
Honored Contributor

Re: Easy points awk question

Hi,
I am not sure I have understood you correctly, but I used your example and added some lines to file1 (xxx in my test).

# cat xxx
abc@abc.com
def@def.com
xyz@xyz.com
abc@abc.com
abc@abc.com
abc@abc.com
abc@abc.com
#
# cat yyy
abc@abc.com abc123.xyz.net
xyz@xyz.com axz999.yahoo.com


# sort xxx >zzz
# join -1 1 -2 1 -o 2.1 zzz yyy
abc@abc.com
abc@abc.com
abc@abc.com
abc@abc.com
abc@abc.com
xyz@xyz.com

Of course yyy needs to be sorted too, but in the example this was already done.
Belinda Dermody
Super Advisor

Re: Easy points awk question

The problem is that there might be an address in file xxx that does not match either of the addresses on a line in file yyy, and if so I do not want it in the report.
File xxx is addresses coming in. File yyy has the incoming address and a possible forwarding address.

Your lines of xxx always match something in yyy


Sridhar Bhaskarla
Honored Contributor

Re: Easy points awk question

Hi James,

first file = file1 or index
second file = data

If file1 has multiple entries and if you do not want multiple entries to be printed out from file2, then you can just do one more step before running this script.

$ sort file1 | uniq > index


#!/usr/bin/ksh
while read entry
do
/usr/bin/awk -v value="$entry" '
$1==value {print $0}' data
done < index

If not, replace index in the above script with file1. The above script works for me. Simply copy and paste the script. Replace only data and index with your files.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
H.Merijn Brand (procura
Honored Contributor

Re: Easy points awk question

Shot in the dark, not tested

# perl -e '@ARGV=("file1");while(<>){chomp;$p{$_}++};@ARGV=("file2");while(<>){m/^(\S+)/&&exists$p{$1}and print}'

Enjoy, have FUN! H.Merijn
Rodney Hills
Honored Contributor

Re: Easy points awk question

I believe my perl script will take care of not printing any entry if no match is found...

-- Rod Hills
There be dragons...
Belinda Dermody
Super Advisor

Re: Easy points awk question

OK Leif, I cannot see any difference between yours and mine (I cut and pasted yours), and I get a bailout on line 1
#!/bin/ksh -xv

# tmpusertable has 2 fields: an incoming address
# and a forwarding address
#
# work6 has one field, either an incoming address
# or a forwarding address.

# The result that I want is an output file that
# will only have incoming
# addresses that have matched up
# Sample work6 file next 2 lines
# xxx@aol.com # will generate an output line
# bbb@aol.com # will not generate an output line
#
# Sample tmpusertable next line
# xxx@aol.com yyy@yahoo.com
#


while read line
do
awk -v item=$line '{if ($1 != item ) continue; else print $1}' tmpusertable
done < work6
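
Two things in the loop above could plausibly trigger the bailout, though neither is confirmed here: awk defines continue only inside a loop, and an unquoted $line containing blanks is split into several arguments. A minimal sketch that sidesteps both, assuming the sample data from the comments and a hypothetical output file work7:

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data as described in the comments above.
printf '%s\n' 'xxx@aol.com' 'bbb@aol.com' > work6
printf '%s\n' 'xxx@aol.com yyy@yahoo.com' > tmpusertable

while read -r line
do
    # Quoting "$line" keeps a value with blanks as one argument;
    # a pattern-action rule replaces the if/continue/else.
    awk -v item="$line" '$1 == item { print $1 }' tmpusertable
done < work6 > work7
cat work7
```

This still starts one awk process per line of work6, so it will be slow on 24,000 lines.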
Belinda Dermody
Super Advisor

Re: Easy points awk question

Rodney, I tried your Perl program and ran it through a debugger; it runs, but no output is printed to either the screen or a file. Look at my last reply, where I used t5.ksh, for examples of what I am looking for.