roger_114
Occasional Contributor

search two files

Hi

I need to read a file (allfiles.txt) and for each record in the file, search another file (arc.txt) to see if there is a match. If there is no match then echo the record from allfiles.txt to the screen.

There are approx 80,000 records in each file.

The script below works but is slow.

Is there a faster way?

Thanks


for i in `cat /tmp/allfiles.txt`
do
# Look for the record anywhere in arc.txt
grep -q "$i" /tmp/arc.txt
exit_status=$?
# grep exits non-zero when it finds no match
if [ $exit_status -ne 0 ]
then
echo "$i"
fi
done
Rodney Hills
Honored Contributor
Solution

Assuming the records are unique within each file, you could use Perl:

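#!/usr/bin/perl
use strict;
use warnings;

# Records are assumed to be unique within each file.
my @files = ("/tmp/allfiles.txt", "/tmp/arc.txt");
my %hold;

# Add 1 for each record read from allfiles.txt and 2 for each from arc.txt,
# so a total of 1, 2 or 3 tells which file(s) the record came from.
for my $i (0..1) {
    open(my $inp, "<", $files[$i]) or die "Cannot open $files[$i]: $!\n";
    while (<$inp>) {
        chomp;
        $hold{$_} += $i + 1;
    }
    close($inp);
}

foreach my $rec (sort keys %hold) {
    my $cnt = $hold{$rec};
    next if $cnt == 3;    # present in both files
    print "Only in allfiles: ", $rec, "\n" if $cnt == 1;
    print "Only in arc : ", $rec, "\n" if $cnt == 2;
}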

HTH

-- Rod Hills
There be dragons...
roger_114
Occasional Contributor

Is there a way to do it without Perl?
A. Clay Stephenson
Acclaimed Contributor

Well, you are making this way too hard because grep with the -f option should do what you are trying to do.

grep -f /tmp/allfiles.txt /tmp/arc.txt will produce a line of output for each "hit" found.
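With the arguments in this order, grep prints the lines of arc.txt that match a record, which is the reverse of what the original question asks for. A minimal sketch of the inverted form, assuming a record should only count as matched when an identical line exists in arc.txt (the function name `missing_grep` is hypothetical):

```sh
# Hypothetical helper: print the lines of file $1 that do not appear,
# as a complete line, anywhere in file $2.
#   -f "$2"  read the patterns, one per line, from the second file
#   -F       treat each pattern as literal text, not a regular expression
#   -x       a pattern must match the whole line, not just part of it
#   -v       print only the lines that match none of the patterns
missing_grep() {
    grep -v -x -F -f "$2" "$1"
}
```

For example: `missing_grep /tmp/allfiles.txt /tmp/arc.txt`. GNU grep usually copes well with a large list of -F patterns, but other grep implementations may be much slower with 80,000 patterns, so it is worth timing on the target system.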
If it ain't broke, I can fix that.
harry d brown jr
Honored Contributor

make sure both files are sorted

man diff
man sort

diff file1 file2

or

sort -u file1 file2

live free or die
harry d brown jr
Live Free or Die
harry d brown jr
Honored Contributor

sorry,

the sort should read:

sort fileA fileB|uniq -u
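As written, this prints every record that appears in exactly one of the two files, so records found only in fileB are mixed in with the ones the question asks about. One way to keep only the fileA side, assuming no record is repeated within fileA, is to feed fileB to sort twice (the function name `missing_uniq` is hypothetical):

```sh
# Hypothetical helper: print the records of file $1 that are not in file $2.
# Assumes no record occurs more than once in $1.
# Every record of $2 is fed to sort twice, so it occurs at least twice in the
# sorted stream; "uniq -u" drops repeated lines and keeps the ones that occur
# exactly once, which can only be records found in $1 but not in $2.
missing_uniq() {
    sort "$1" "$2" "$2" | uniq -u
}
```

The output comes out in sorted order rather than in the original order of the first file.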

live free or die
harry
roger_114
Occasional Contributor

Well, I'm getting closer, but I think I am making this too hard. What I need to do is read fileA and check whether each record has a match in fileB. If it does not, echo it to the screen (or to a file). If it does have a match, keep processing...

harry d brown jr
Honored Contributor

If they are not sorted, then you are left with WALKING the files. There are no other options, period.
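Loading one of the files into a hash table is another approach that works on unsorted input, since each record then costs one lookup instead of a scan of the other file. A minimal sketch in awk (no Perl required), assuming the lines of arc.txt fit in memory; the function name `missing_awk` is hypothetical:

```sh
# Hypothetical helper: print the lines of file $1 that do not occur,
# as a complete line, in file $2. Neither file needs to be sorted.
missing_awk() {
    awk -v ref="$2" '
        # Before reading $1, store every line of $2 as a key of the array "seen".
        BEGIN { while ((getline line < ref) > 0) seen[line] = 1 }
        # For each line of $1, print it if it was never stored.
        !($0 in seen)
    ' "$1"
}
```

For example: `missing_awk /tmp/allfiles.txt /tmp/arc.txt`. The output keeps the original order of allfiles.txt.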

live free or die
harry d brown jr
Biswajit Tripathy
Honored Contributor

Roger wrote:
> Well I'm getting closer, but I think I am making
> this too hard..

The hard part is not writing a script that does what you want (the script in your original post does that perfectly); the hard part is writing one that is *significantly faster*.

If your script runs only a few times, you should keep using it. Otherwise, you could write a C or C++ program to do what you want.

If you decide to do this, write a C/C++ program that reads the entire contents of both files into memory as two string arrays (e.g. char allfiles[][] and char arc[][]), sorts them, and starts searching from the top. This makes searching significantly faster because:

1) you can stop searching for a string as soon as strncmp() of the current arc[][] entry against it returns a value greater than 0, since every later entry sorts after it as well;

2) the search for each string in arc[][] can start where the last search ended (i.e. there is no need to search the entire array).
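The sort-then-walk idea can also be sketched without C. After sorting, a single pass keeps a position in arc that only ever moves forward, so each file is read once. A minimal sketch in awk, assuming both files have already been sorted in byte order (for example with `LC_ALL=C sort`); the function name `missing_merge` is hypothetical:

```sh
# Hypothetical helper: print the lines of file $1 that are missing from file $2
# using one merge walk. Both files must already be sorted in byte order.
missing_merge() {
    LC_ALL=C awk -v ref="$2" '
        # Read the next line of $2 into b; "have" becomes 0 once $2 is exhausted.
        function next_ref() { have = ((getline b < ref) > 0) }
        BEGIN { next_ref() }
        {
            # Advance past $2 lines that sort before this record. Appending ""
            # forces a string comparison even when both values look numeric.
            while (have && (b "") < ($0 "")) next_ref()
            # If the walk did not stop on an identical line, there is no match.
            if (!have || (b "") != ($0 "")) print
        }
    ' "$1"
}
```

This is essentially the same merge that comm(1) performs on two sorted inputs.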

- Biswajit
:-)
B. Hulst
Trusted Contributor

Hi,

If you want speed with lots of records, write a C program. That always works.

Regards,
Bob
Rodney Hills
Honored Contributor

I'm surprised the shell could handle the
for i in `cat /tmp/allfiles.txt`

since allfiles.txt has 80000 records.

Maybe you could try changing the "for" to a "while".

IFS=""
exec <4/tmp/allfiles.txt
while read -ru4 i ; do
grep -q "$i" /tmp/arc.txt
...
done
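A complete form of the while loop, sketched with a plain redirection on the loop instead of a numbered file descriptor. It assumes a record counts as matched only when an identical line exists in arc.txt (grep -x -F), and it still starts one grep per record, so it avoids the word splitting of the for loop rather than the slowness. The function name `missing_loop` is hypothetical:

```sh
# Hypothetical helper: print the records of file $1 that have no identical line in $2.
missing_loop() {
    # IFS= keeps leading and trailing blanks; -r keeps backslashes literal.
    while IFS= read -r rec; do
        # -x: whole-line match, -F: literal text, --: rec may begin with "-".
        if ! grep -q -x -F -- "$rec" "$2"; then
            printf '%s\n' "$rec"
        fi
    done < "$1"
}
```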

HTH

Rodney Hills
Honored Contributor

Oops, that line should be:
exec 4</tmp/allfiles.txt
c_51
Trusted Contributor

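How about:

mkfifo fifo1
mkfifo fifo2

# Writing to a FIFO blocks until a reader opens it, so run both sorts in the background
sort allfiles.txt > fifo1 &
sort arc.txt > fifo2 &

# -2 hides lines only in arc.txt, -3 hides lines in both: what is left has no match
comm -23 fifo1 fifo2

rm fifo1 fifo2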