Operating System - HP-UX

Re: Easy points awk question

 
Leif Halvarsson_2
Honored Contributor

Re: Easy points awk question

Hi,
Again, sorry if I misunderstand you.

There is one line in the file xxx (or in the sorted zzz) that does not match anything in yyy:

def@def.com

But this line is not in the output from the join command in the example.
Belinda Dermody
Super Advisor

Re: Easy points awk question

Leif, at least I am getting an output file from your join, but the numbers are not correct.

# The result that I want is an output file that
# contains only the incoming
# addresses that have matched up
# Sample work6 file, next 3 lines
# xxx@aol.com # will generate an output line
# bbb@aol.com # will not generate an output line
# yyy@yahoo.com # will also generate an output line
#
# Sample tmpusertable, next line
# xxx@aol.com yyy@yahoo.com

I ran your join again; I had one line that I know has 15 entries, but your solution put only 5 lines in the output file.

Sridhar Bhaskarla
Honored Contributor

Re: Easy points awk question

James,

Here is the attached session file using my awk script.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Leif Halvarsson_2
Honored Contributor

Re: Easy points awk question

Hi,
Perhaps I understand you better now.

You want to match the lines in file1 against both fields in file2.
If you use join, you have to do this in two steps.

The sorted lines of file2 must be unique.

Example:

sort file1 >xxx
sort file2 |uniq >yyy
join -1 1 -2 1 -o 2.1 xxx yyy
sort file2 -k 2 |uniq >yyy
join -1 1 -2 2 -o 2.2 xxx yyy
Belinda Dermody
Super Advisor

Re: Easy points awk question

To everyone, thank you so much. I am a hardheaded old Irishman and have trouble once in a while, and that's putting it mildly, especially with this one.

For you brave hearts who want to continue, I have added an attachment that tries to explain how infile1 is compared against a MasterDB file, and the resulting output file.
Rodney Hills
Honored Contributor

Re: Easy points awk question

This modified version I believe will work-

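# NOTE: the file name was lost when this post was rendered; "masterdb" is assumed
open(INP,"<masterdb");
while(<INP>) {
chomp;
($addr1,$addr2)=split(" ",$_);
$lu{$addr1}=$addr1; $lu{$addr2}=$addr1;
}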
close(INP);
while(<>) { chomp; print $lu{$_},"\n" if $lu{$_}; }

HTH

-- Rod Hills
There be dragons...
Sridhar Bhaskarla
Honored Contributor

Re: Easy points awk question

OK,

With the latest input you provided, here is the modified script again.

while read entry
do
/usr/bin/awk -v value="$entry" '
$1 == value || $2 == value {print $1}' data
done < index


where

data = your database
index = inputfile1

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Leif Halvarsson_2
Honored Contributor

Re: Easy points awk question

Hi,
I checked your example. I assume (in the example) that
"5.mop@yahoo.com" should be "mop@yahoo.com"
and
"devio.excite.com" should be "devio@excite.com".

file2 uses a mix of space and tab characters as delimiters, so I need to do some preprocessing first. I create temporary files so the original is unchanged. It seems to make no difference whether the match is on field 1 or field 2, so I create a temporary file with only one field and have to match only once.

Script:

cat file2 |tr "\040" "\011" |tr -s "\011" >yyy
cut -f 1 yyy >zzz
cut -f 2 yyy >>zzz
sort file1 >xxx
sort zzz |uniq >yyy
join -1 1 -2 1 xxx yyy

Result:

abc12@excite.com
abc@yahoo.com
abc@yahoo.com
def@excite.com
devio@excite.com
jaim@yalei.net
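
The delimiter clean-up in the first line of the script above can be tried in isolation. This is only a sketch on made-up addresses (a@x.com and so on are not from the thread); the awk variant is an alternative that assumes every line has exactly two fields.

```shell
# Illustrative input: two spaces, a tab and a space between the fields of line 1.
sample=$(printf 'a@x.com  \t b@y.com\nc@x.com\td@y.com\n')

# Same idea as the tr pipeline in the post: turn every space (octal 040) into a
# tab (octal 011), then squeeze runs of tabs down to a single tab.
normalized=$(printf '%s\n' "$sample" | tr '\040' '\011' | tr -s '\011')

# Alternative (assumption: exactly two fields per line): awk's default field
# splitting already treats any run of spaces and tabs as one separator.
normalized_awk=$(printf '%s\n' "$sample" | awk '{ print $1 "\t" $2 }')

printf '%s\n' "$normalized"
```

Either form leaves exactly one tab between the two addresses, which is what the later cut -f calls rely on.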


Dietmar Konermann
Honored Contributor

Re: Easy points awk question

OK, last try... :-)

awk '
{
while (getline line < "masterdb") {
if ($1 != "" && line != "" && match (line, $1) && split (line,l))
print (l[1]);
}
close ("masterdb");
}' < infile > outfile

BTW, the double quotes (") are important.

$cat infile
abc@yahoo.com
def@excite.com
999@nosite.net
jaim@yalei.net
abc12@excite.com
devio@excite.com
abc@yahoo.com
texas_tiger.net

$cat masterdb
abc@yahoo.com def@hotmail.com
def@excite.com 12345@yahoo.edu
dyd@hotmail.com jaim@yalei.net
abc12@excite.com pepso@usa.gov
mop@yahoo.com devio@excite.com
rxyte@yahoo.com zyzzz@cox.net

$cat outfile
abc@yahoo.com
def@excite.com
dyd@hotmail.com
abc12@excite.com
mop@yahoo.com
abc@yahoo.com

Hopefully this works now. At any rate, it is supposed to work with unsorted files without doing hundreds of thousands of fork/execs.

Regards...
Dietmar, learning a lot during these threads. :-)
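
One caveat about the script above, offered as a hedged side note: awk's match() treats its second argument as a regular expression, so the dots in an address match any character, and the address can also match as a substring of a longer one. The sketch below uses a made-up master line (devioXexcite.com is not from the thread) to contrast that with an exact comparison of the two fields.

```shell
# Hypothetical master line whose first address differs from the one we look for.
line='devioXexcite.com other@site.net'

# Regular-expression match: "." matches the "X", so this reports a hit.
regex_hit=$(printf '%s\n' "$line" |
    awk -v addr='devio.excite.com' '{ print (match($0, addr) ? "match" : "no match") }')

# Exact comparison of each field: no hit, because neither field equals addr.
exact_hit=$(printf '%s\n' "$line" |
    awk -v addr='devio.excite.com' '{ print (($1 == addr || $2 == addr) ? "match" : "no match") }')

echo "regex: $regex_hit, exact: $exact_hit"
```

Whether this matters depends on the real data; with 250K master lines a few false hits are at least possible.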
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Andreas Voss
Honored Contributor

Re: Easy points awk question

Hi,

here are my 2ct.:

awk -vmaster=MASTERFILE 'BEGIN{n=0;
while((getline < master) > 0)
{
master1[n]=$1
master2[n++]=$2
}
close(master);
}
{
for(i=0;i<n;i++) {
if($1 == master1[i] || $1 == master2[i])
{
print master1[i];
break;
}
}
}' FILE

Regards
john korterman
Honored Contributor

Re: Easy points awk question

Hi James,
please try the attached script, using the nightly logfile as par1 and the database file as par2.

regards,
John K.
it would be nice if you always got a second chance
Carlos Fernandez Riera
Honored Contributor

Re: Easy points awk question

Or maybe one more....

awk ' { for ( i=1; i<= NF; i++ ) print $i }' master > all_masters
awk ' { for ( i=2; i<= NF; i++ ) print "s/"$i"/"$1"/" }' master > sed_masters
awk '{ print NR, $1 }' in > numbered_in

grep -f all_masters numbered_in > founds
sed -f sed_masters founds > thats_all_folks
unsupported
Belinda Dermody
Super Advisor

Re: Easy points awk question

I was working on other priorities this morning and just got back in. What a thread.

John Korterman, I appreciate all your time and effort, but I have cut and pasted your script twice, saved it, shuttled it to my system, and run dos2unix on it, and I still get a syntax error bailout on line 1. I have tried it on a Sun platform and an HP platform with the same results. I have substituted my files for $1 and $2.
Belinda Dermody
Super Advisor

Re: Easy points awk question

Andreas Voss, thank you for your response as well. I do not get an error, but it runs for about 10 seconds and then ends without any output. It should take a lot longer than that anyway: the MASTERFILE has 250K lines and the infile has about 18K lines.
H.Merijn Brand (procura
Honored Contributor

Re: Easy points awk question

Did you try my line of perl?

Enjoy, have FUN! H.Merijn
Carlos Fernandez Riera
Honored Contributor

Re: Easy points awk question

With those numbers sed will fail (it only accepts up to 99 lines), but that can be solved by splitting the sed file into several smaller ones.

Maybe tomorrow I will write a better script.
unsupported
Carlos Fernandez Riera
Honored Contributor

Re: Easy points awk question

BTW: Did I read 'easy' awk....?
unsupported
Sridhar Bhaskarla
Honored Contributor

Re: Easy points awk question

Last message from me.

Did you try my latest little orthodox awk script? I used the sample data and index you gave, and it prints exactly what you mentioned.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
john korterman
Honored Contributor

Re: Easy points awk question

Hi again James,
I'm sorry to hear that it bails out on line one. I just cut and pasted my most recently attached script and ran it like this:
# ./copied.sh ./fil1 ./fil2

and it produced this output:

abc@yahoo.com
def@excite.com
abc12@excite.com
abc@yahoo.com
dyd@hotmail.com
mop@yahoo.com

My script is made with the crudest of tools, hand-crafted in vi; the bailout error from awk indicates that something has gone wrong in the copy/paste/dos2unix operations. Perhaps you could try typing it in manually in vi, or just rewrite the first awk line to see if the error message "moves".
Or attach your input and let me run the script; that could be arranged for a few beers.

regards,
John K.
it would be nice if you always got a second chance
Belinda Dermody
Super Advisor

Re: Easy points awk question

I want to thank you all for your patience and time with this situation. I finally got the awk script by Sridhar to run on the HP system. The files are created on a Sun platform, so I had to scp them over to the HP system and run it there. But watching it run, checking 1 line about every 30 seconds, I have to figure out another approach to get the data. I have an average of 20K rejects per day and the Masterfile has 250K line items. It would probably take all day to get the daily report, and then I would have to start over again. I am assigning points to the die-hards even though I couldn't get their stuff to work. Once again, thank you very much.

For Carlos: I do have three awk books and I did read them, but I am not a programmer. I have written over 75 scripts that make my daily jobs easier, and they contain a lot of basic awk statements, but this situation was different and grepping was really too slow.

So once again thank you all for your time.
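
For readers following along: the slowness described above comes from starting one awk process, and rescanning the whole master file, for every input line. A minimal sketch of the alternative, assuming the master file fits in memory, is to load it once into an awk associative array and then look each input address up directly. This is the awk counterpart of the hash lookup in the perl script posted in this thread, not anyone's posted solution; the file names and the shortened sample data are illustrative. On older Solaris the `in` operator needs nawk rather than /usr/bin/awk.

```shell
# Scratch directory and shortened sample files (names are illustrative).
tmp=${TMPDIR:-/tmp}/lookup.$$
mkdir "$tmp"
cat > "$tmp/masterdb" <<'EOF'
abc@yahoo.com def@hotmail.com
dyd@hotmail.com jaim@yalei.net
mop@yahoo.com devio@excite.com
EOF
cat > "$tmp/infile" <<'EOF'
abc@yahoo.com
999@nosite.net
jaim@yalei.net
devio@excite.com
EOF

# While reading the first file (NR == FNR), map both addresses on each master
# line to that line's first address. While reading the second file, print the
# mapped first address for every input address that is known.
result=$(awk 'NR == FNR { lu[$1] = $1; if (NF > 1) lu[$2] = $1; next }
              ($1 in lu) { print lu[$1] }' "$tmp/masterdb" "$tmp/infile")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Each file is read exactly once, so the run time grows with the combined size of the two files rather than with their product. Neither file needs to be sorted, and the output keeps the order of the input file.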
Leif Halvarsson_2
Honored Contributor
Solution

Re: Easy points awk question

Hi,
If you have problems running awk scripts on Sun (particularly on an older version of Solaris), try nawk instead; it should be compatible with awk on HP-UX.

In my example I missed that you wanted the first field output when the match was on the second field. The example below should work better.


cat file2 |tr "\040" "\011" |tr -s "\011" >yyy
sort file1 >xxx
sort -k 1 yyy >zzz
join -1 1 -2 1 -o 1.1 xxx zzz
sort -k 2 yyy >zzz
join -1 1 -2 2 -o 2.1 xxx zzz
#
# ./testj
abc12@excite.com
abc@yahoo.com
abc@yahoo.com
def@excite.com
mop@yahoo.com
dyd@hotmail.com
Rodney Hills
Honored Contributor

Re: Easy points awk question

I tried my perl script on the sample data you supplied, and got the results you are looking for.

Here is the result of my session-


$ cat masterdb
abc@yahoo.com def@hotmail.com
def@excite.com 12345@yahoo.edu
dyd@hotmail.com jaim@yalei.net
abc12@excite.com pepso@usa.gov
mop@yahoo.com devio.excite.com
rxyte@yahoo.com zyzzz.cox.net
$
$ cat infile1
abc@yahoo.com
def@excite.com
999@nosite.net
jaim@yalei.net
abc12@excite.com
devio@excite.com
abc@yahoo.com
texas_tiger.net
$
$ cat myreport.pl
# (file name assumed; the open() line was garbled when this post was rendered)
open(INP,"<masterdb");
while(<INP>) {
chomp;
($addr1,$addr2)=split(" ",$_);
$lu{$addr1}=$addr1; $lu{$addr2}=$addr1;
}
close(INP);
while(<>) { chomp; print $lu{$_},"\n" if $lu{$_}; }
$
$ perl myreport.pl infile1
abc@yahoo.com
def@excite.com
dyd@hotmail.com
abc12@excite.com
abc@yahoo.com


The results are what you stated in your sample. This perl script should also be a lot more efficient than any of the awk or sed scripts.

HTH

-- Rod Hills

There be dragons...
Belinda Dermody
Super Advisor

Re: Easy points awk question

Congrats to Leif. I just finished taking counts on sample email addresses and his output file agrees. It also takes only about 4 minutes to process all the records.

Thank you so very much -- plus all you other guys who put up with this stubborn old man.....
Belinda Dermody
Super Advisor

Re: Easy points awk question

Rodney, congrats, yours works also. I have to do some number crunching and comparing between the two. Time-wise they are about the same, but yours has 30 more lines for some reason; I do not know if they are extra or if the join was dropping them.

Thanks all once again....
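
One way to find out which lines differ between the two reports is to sort both outputs and compare them with comm. This is a sketch under assumptions: join.out and perl.out are hypothetical names for the two result files, and the contents here are made-up stand-ins for the real reports.

```shell
# Scratch directory with two small stand-in result files.
tmp=${TMPDIR:-/tmp}/cmp.$$
mkdir "$tmp"
printf '%s\n' abc@yahoo.com def@excite.com mop@yahoo.com > "$tmp/join.out"
printf '%s\n' abc@yahoo.com def@excite.com dyd@hotmail.com mop@yahoo.com > "$tmp/perl.out"

# comm needs both inputs sorted the same way.
sort "$tmp/join.out" > "$tmp/join.sorted"
sort "$tmp/perl.out" > "$tmp/perl.sorted"

# -13 suppresses columns 1 and 3, leaving lines found only in the second file;
# -23 leaves lines found only in the first file.
only_in_perl=$(comm -13 "$tmp/join.sorted" "$tmp/perl.sorted")
only_in_join=$(comm -23 "$tmp/join.sorted" "$tmp/perl.sorted")

echo "only in perl output: $only_in_perl"
echo "only in join output: $only_in_join"
rm -rf "$tmp"
```

Checking a few of the addresses that appear in only one report against the master file by hand should show whether they are genuine matches or extras.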