Operating System - HP-UX

Filter records from Input File

 
SOLVED
uform
Frequent Advisor

Filter records from Input File

Hi,

File 1
Record1: AAAERTD778 Q 100 DFGD 5676
Record2: AAAERTD558 Q 100 DFGD 323
Record3: AAAERTD448 Q 100 DFGD 778
Record4: AAAERTD178 Q 100 DFGD 5626
Record5: AAAERTD323 Q 100 DFGD 736

File 2
778
323

My requirement is to open File 2, read it line by line, and search for each value in File 1. The pattern should match only in the first column.

For example, if I take 778 from File 2 and do a plain grep in File 1, it returns Records 1 and 3, but I need it to return only Record 1.

The problems are:

1) If I just grep for 778 and redirect the output to another file, it writes Records 1 and 3 into the new file.

2) If I cut the first column and grep in that, then redirect the output to another file, it writes only the first column, but I need to write the whole record.

So I followed the second method and got the line numbers using grep -n, then looped over those physical line numbers to pull out the full records and got the output I needed. But I feel there should be an easier way to do it.
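Roughly, what I did looks like this (just a sketch of the idea with assumed names file1/file2, not my exact script):

while read NUM
do
    # line numbers where NUM appears in column 1 only
    for LNO in $(cut -d' ' -f1 file1 | grep -n "$NUM" | cut -d: -f1)
    do
        sed -n "${LNO}p" file1     # print the whole record at that physical line
    done
done < file2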

PLS HELP !! ONLY Shell script. NO PERL PLS.

Thanks



18 REPLIES
uform
Frequent Advisor

Re: Filter records from Input File

Also, File 2 may not always contain a 3-digit number.

AND

the first column of File 1 may not always have the same length... it may be AAAERTD778, XYHAERT123778, or HAERT99978.
Jeff_Traigle
Honored Contributor

Re: Filter records from Input File

There are a variety of ways to attack this. The easiest, if the data in column 1 is always prefixed by AAAERTD, is adjusting your grep as follows; that should get you what you want:

for NUMBER in $(cat file2)
do
    grep "AAAERTD${NUMBER}" file1 > "${NUMBER}.output"
done
--
Jeff Traigle
James R. Ferguson
Acclaimed Contributor

Re: Filter records from Input File

Hi:

No Perl, too bad. OK, so create a read loop to read 'file2' and use its tokens as arguments to 'awk'. Something like:

...
while read TOKEN
do
    awk -v TOKEN="${TOKEN}" '$2 ~ TOKEN {print}' file1
done < file2

Note that 'awk' counts fields starting at one; in your sample, the 'Record1:' label is field $1, so $2 is the whitespace-delimited field that holds your key.

Regards!

...JRF...
Jeff_Traigle
Honored Contributor

Re: Filter records from Input File

OK, so you complicate it a bit with that requirement. :) Change the grep to the following:


grep -E "[A-Z]*${NUMBER} " file1 > ${NUMBER}.output
--
Jeff Traigle
uform
Frequent Advisor

Re: Filter records from Input File

grep -E "[A-Z]*${NUMBER} " file1 > ${NUMBER}.output

Will this not pick up Records 1 and 3 if the search string is 778?
Hein van den Heuvel
Honored Contributor

Re: Filter records from Input File


Easy... just have a first awk command generate the right awk program, then execute:

$ cat x.1
AAAERTD778 Q 100 DFGD 5676
AAAERTD558 Q 100 DFGD 323
AAAERTD448 Q 100 DFGD 778
AAAERTD178 Q 100 DFGD 5626
AAAERTD323 Q 100 DFGD 736

$ cat x.2
778
323
$ awk '{print "/^[A-Z]*" $1 " /"}' x.2 > /tmp/tmp_$$.awk
$ awk -f /tmp/tmp_$$.awk x.1
AAAERTD778 Q 100 DFGD 5676
AAAERTD323 Q 100 DFGD 736
$ rm /tmp/tmp_$$.awk

For the sake of completeness, the helper program looks like:
$ awk '{print "/^[A-Z]*" $1 " /"}' x.2
/^[A-Z]*778 /
/^[A-Z]*323 /

So it looks for lines starting with a run of characters from A to Z, followed by a specific number from the second file, followed by a space. If a line matches, awk takes the unspecified default action, which is to print the current line.

Hein
HvdH Performance Consulting
Sandman!
Honored Contributor

Re: Filter records from Input File

Here's a short one-line awk construct that does what you're looking for:

f1 contains...
AAAERTD778 Q 100 DFGD 5676
AAAERTD558 Q 100 DFGD 323
AAAERTD448 Q 100 DFGD 778
AAAERTD178 Q 100 DFGD 5626
AAAERTD323 Q 100 DFGD 736

f2 contains...
778
323

awk '{if($1~/^[A-Z]/){n=z[split($1,z,"[A-Z]")];x[n]=$0}for(i in x) if(i==$1)print x[i]}' f1 f2

~cheers
Sandman!
Honored Contributor

Re: Filter records from Input File

...and here's the indented version for clarity:

awk '{
    if ($1 ~ /^[A-Z]/) {
        n = z[split($1, z, "[A-Z]")]
        x[n] = $0
    }
    for (i in x)
        if (i == $1)
            print x[i]
}' f1 f2
Arturo Galbiati
Esteemed Contributor

Re: Filter records from Input File

Hi,
Here's a short one-line ksh construct that does what you're looking for:

f1 contains...
AAAERTD778 Q 100 DFGD 5676
AAAERTD558 Q 100 DFGD 323
AAAERTD448 Q 100 DFGD 778
AAAERTD178 Q 100 DFGD 5626
AAAERTD323 Q 100 DFGD 736

f2 contains...
778
323

xargs ...

Record1: AAAERTD778 Q 100 DFGD 5676
Record5: AAAERTD323 Q 100 DFGD 736

This looks for each key in file f2 that appears in file f1 at the beginning of the record, preceded by any characters and terminated by a blank.
HTH,
Art

Hein van den Heuvel
Honored Contributor

Re: Filter records from Input File

I think we have beaten this horse to death by now, but I want to add a quick comment...

Sandman,

Your solution builds a large array holding all of file1, does it not? And it will find only one match, even if there were two lines in file1 for a given file2 entry.

I would suggest flipping the files around:

awk "/^[0-9]*/{x[$1]=1} /^[A-Z]/ {n=z[split($1,z,\"[A-Z]\")]; if (x[n]) print}" 2.tmp 1.tmp

1) If the line starts with a number, set a flag for that number in an array.
2) If the line starts with a letter, split off the trailing number part. If that number is flagged in the array, print the line.
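
Written out over a few lines (the same program, just easier to read):

awk '
/^[0-9]/ { x[$1] = 1 }                       # a file-2 line: flag the key number
/^[A-Z]/ { n = z[split($1, z, "[A-Z]")]      # a file-1 line: take the digits after the letter prefix
           if (x[n]) print }                 # print the record if its number was flagged
' 2.tmp 1.tmp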

Hein.


uform
Frequent Advisor

Re: Filter records from Input File

Sorry folks for responding so late... I forgot the user ID and password for this site and had a tough time finding them. :)

Anyway, regarding this question:

I would like to reiterate the requirement one more time...

1) I want to open File 2, read it line by line, and then search for that string in File 1's first column. I see many replies mentioning A-Z, 0-9, etc.; I don't want to depend on that. Whatever the first column starts or ends with (even special characters), all I need is for the search string to be somewhere in the FIRST column of File 1. So maybe it would be good to just cut out column one and search in it (see the sketch after point 2 below).

2) grep -E and grep -e are not working for me:
@lib>grep -e
grep: illegal option -- e
Usage: grep -hblcnsviw pattern file . . .
@lib>grep -E
grep: illegal option -- E
Usage: grep -hblcnsviw pattern file . . .
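
Something like this is what I have in mind for the cut-and-search idea (just an untested illustration; file names file1/file2 assumed):

awk 'NR==FNR { keys[$1] = 1; next }              # first file: remember every key from File 2
     { for (k in keys)                           # then, for each File 1 record,
           if (index($1, k)) { print; break }    # print it if any key occurs in column 1
     }' file2 file1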

Thanks for all your replies and patience.
Sandman!
Honored Contributor

Re: Filter records from Input File

Based on your last post...here's a shell script that does what you're looking for:

while read i
do
    while read j
    do
        str=$(echo $j | awk '{print z[split($1,z,"[A-Z]")]}')
        if [[ $i = $str ]]; then
            echo $j
        fi
    done < f1
done < f2
Sandman!
Honored Contributor

Re: Filter records from Input File

Script attached, as part of it got cut off in my last post due to copy 'n' paste.

cheers!
Hein van den Heuvel
Honored Contributor
Solution

Re: Filter records from Input File

Ah, we read too much into too little description.

I still kinda like the two-stage approach.
Short and sweet. Try this:

# cat x.1
AAAERTD778 Q 100 DFGD 5676
AAAERTD558 Q 100 DFGD 323
AAAERTD448 Q 100 DFGD 778
AAAERTD178 Q 100 DFGD 5626
AAAERTD323 Q 100 DFGD 736
# cat x.2
TD1
558
#
# awk '{print "$1 ~ /" $1 "/"}' x.2 > tmp.awk
# awk -f tmp.awk x.1
AAAERTD558 Q 100 DFGD 323
AAAERTD178 Q 100 DFGD 5626

For the sake of completeness, the helper file was:
# cat tmp.awk
$1 ~ /TD1/
$1 ~ /558/

Hein.
uform
Frequent Advisor

Re: Filter records from Input File

Hi Sandman,

Your script is working, except that the final echo $j removes the extra spaces within every column of the record.

If a column is 80 bytes wide but has values in only the first 40 bytes, the rest should be blanks/spaces, but your code echoes the 40 bytes, then one space, then the next columns, so the length of the whole record is shrunk. PLS HELP !!!

while read i
do
    while read j
    do
        str=$(echo $j | awk '{print z[split($1,z,"[A-Z]")]}')
        if [[ $i = $str ]]; then
            echo $j
        fi
    done < f1
done < f2
Sandman!
Honored Contributor

Re: Filter records from Input File

>Your script is working, except that the final echo $j removes the extra spaces within every column of the record.

>If a column is 80 bytes wide but has values in only the first 40 bytes, the rest should be blanks/spaces, but your code echoes the 40 bytes, then one space, then the next columns, so the length of the whole record is shrunk.

If you need to preserve the trailing spaces in every record, you need to set the internal field separator (IFS) to something other than a space. That way read keeps the spacing, and the echo of $j will print everything, including the trailing spaces. See the script attached to this post:
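
In case the attachment does not come through, here is a minimal sketch of that idea (it assumes ':' never occurs in your data):

while read i
do
    while IFS=":" read -r j                # ':' is not in the data, so the spaces in $j survive
    do
        str=$(echo "$j" | awk '{print z[split($1,z,"[A-Z]")]}')
        if [[ $i = $str ]]; then
            echo "$j"                      # quoted, so the record is printed unchanged
        fi
    done < f1
done < f2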

~cheers
Hein van den Heuvel
Honored Contributor

Re: Filter records from Input File

>>> PLS HELP !!!

Did you try my last suggested solution?

It would retain spaces, and I suspect it will be relatively fast, which may be important for large data volumes.

Hein.
uform
Frequent Advisor

Re: Filter records from Input File

Sandman,

Your code worked with that IFS=":" change... but for large data it just hung.


Hein,

Sorry, I was concentrating on Sandman's code.
YOUR CODE WORKED !! Even for large data. Cool.

Thanks to ALL