
shell script problem..

 
amonamon
Regular Advisor


Hello, can anyone give a fast and good solution to this:
I have 2 files:
fileA:

629122
629145
629111
629000
629999
629787
629444

fileB:
123569
789999
629122
629145
777777
629111
000000
629999
629787
629444
989898
777777


and result should be :


629000

because it is the only line that is in fileA and not in fileB.

Any help? I tried something with comm, but no use.
Oviwan
Honored Contributor

Re: shell script problem..

hey

check the diff command:
$ diff fileA fileB

Regards
MarkSyder
Honored Contributor

Re: shell script problem..

One of those "scrap of paper" moments - I haven't tried this but think it will work:

for i in `cat fileA`
do
    # discard grep's own output; only its exit status matters here
    if grep "$i" fileB > /dev/null
    then
        :
    else
        echo "$i"
    fi
done

Mark Syder (like the drink but spelt different)
The triumph of evil requires only that good men do nothing
john korterman
Honored Contributor

Re: shell script problem..

Hi,

try this first:
$ sort fileA >./fileA.s
$ sort fileB >./fileB.s

$ comm -23 fileA.s fileB.s
629000

regards,
John K.
it would be nice if you always got a second chance
amonamon
Regular Advisor

Re: shell script problem..

noo...
The problem is that fileB is very big, and so is fileA.

The files are not sorted.
Most of the content of fileA is in fileB, but there are 10-20 lines in fileA that fileB does not have.

I think I got it..
sort -b fileA > fileA.sort
sort -b fileB > fileB.sort

comm -23 fileA.sort fileB.sort
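
A minimal sketch of that sort, sort, comm sequence as one script, using the sample data from the first post. The scratch directory, the LC_ALL=C setting (so sort and comm agree on ordering) and the -u flag (to drop duplicate lines) are assumptions added here, not part of the commands above.

```shell
# Scratch directory with the sample data from the thread (assumed layout).
tmp="${TMPDIR:-/tmp}/comm_demo.$$"
mkdir -p "$tmp"
printf '%s\n' 629122 629145 629111 629000 629999 629787 629444 > "$tmp/fileA"
printf '%s\n' 123569 789999 629122 629145 777777 629111 000000 \
              629999 629787 629444 989898 777777 > "$tmp/fileB"

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort -u "$tmp/fileA" > "$tmp/fileA.sort"
LC_ALL=C sort -u "$tmp/fileB" > "$tmp/fileB.sort"

# -2 drops lines only in fileB, -3 drops lines common to both,
# so only the lines unique to fileA are left.
only_in_a=$(LC_ALL=C comm -23 "$tmp/fileA.sort" "$tmp/fileB.sort")
printf '%s\n' "$only_in_a"

rm -rf "$tmp"
```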
Ian Lochray
Respected Contributor

Re: shell script problem..

grep -v -f fileb filea
MarkSyder
Honored Contributor

Re: shell script problem..

Ian,

Your solution will show every line that doesn't match. If one line matches and the other (say) 3000 don't, it will show him those 3000 lines.

Mark
The triumph of evil requires only that good men do nothing
Dennis Handly
Acclaimed Contributor

Re: shell script problem..

>can anyone give fast and good solution on this:

Fast to run? Or fast to implement?

>Mark: I haven't tried this but think it will work:

Running grep once per line will have terrible performance for large files. Using that `cat fileA` will also tokenize each line, so the loop would fail if a line has more than one field.
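
To illustrate the tokenizing point only: a hedged sketch of the same loop reading whole lines with read instead of `cat`. It still runs one grep per line, so it is just as slow on big files. The scratch directory and the -F -x -q options (fixed string, whole-line match, quiet) are my additions, assuming a POSIX grep.

```shell
# Scratch directory with the sample data from the thread (assumed layout).
tmp="${TMPDIR:-/tmp}/loop_demo.$$"
mkdir -p "$tmp"
printf '%s\n' 629122 629145 629111 629000 629999 629787 629444 > "$tmp/fileA"
printf '%s\n' 123569 789999 629122 629145 777777 629111 000000 \
              629999 629787 629444 989898 777777 > "$tmp/fileB"

# IFS= and -r keep each line intact, even with spaces or backslashes.
missing=$(
  while IFS= read -r line
  do
    # Print the line only when no whole line of fileB equals it.
    grep -F -x -q -e "$line" "$tmp/fileB" || printf '%s\n' "$line"
  done < "$tmp/fileA"
)
printf '%s\n' "$missing"

rm -rf "$tmp"
```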

John has the right solution with sort and comm. If there are duplicates, you may or may not want to use sort -u.

>noo... Problem is because file B is very big and also fileA

Were you complaining about the sort, sort, comm solution?

>Mark: Ian, Your solution will show every line that doesn't match.

Isn't that what amonamon wants? It gives 629000. It will have all the lines of fileA that are not in fileB.
Dennis Handly
Acclaimed Contributor

Re: shell script problem..

If fileA and fileB are about 2000 records, I get:
grep:
real 0m0.02s user 0m0.03s sys 0m0.01s

sort, sort, comm:
real 0m0.07s user 0m0.01s sys 0m0.01s

With 30,000 records they break even, and comm is better beyond that.
MarkSyder
Honored Contributor

Re: shell script problem..

Dennis,

I am assuming that Ian's solution would involve a loop like my earlier suggestion. The first iteration of the loop would do grep -v 123569 fileA. This would output every line in fileA that did not contain 123569. Similarly, the second iteration would output every line that did not contain 789999 etc.

As Ian has written it, amonamon would actually be searching for the string fileB in fileA.

Mark
The triumph of evil requires only that good men do nothing
Dennis Handly
Acclaimed Contributor

Re: shell script problem..

>Mark: I am assuming that Ian's solution would involve a loop like my earlier suggestion.

Any loop would be IN grep. That's what grep -f does. Looks at the file fileB for each string.

>The first iteration of the loop would do grep -v 123569 fileA.

No. grep looks at each record of fileA and if it matches any line of fileB, it isn't printed. Perhaps you have your gedanken for-loops interchanged. :-)

>As Ian has written it, amonamon would actually be searching for the string fileB in fileA.

That's not what "grep -v -f file" does. Did you try it?
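
For anyone who wants to try it, a small sketch on the sample data from the first post. The -F and -x options are an addition of mine, not part of Ian's command: without them each line of fileB is treated as a regular expression that may match anywhere in a line of fileA. Also note that an empty line in fileB may act as a pattern that matches everything, depending on the grep.

```shell
# Scratch directory with the sample data from the thread (assumed layout).
tmp="${TMPDIR:-/tmp}/grepf_demo.$$"
mkdir -p "$tmp"
printf '%s\n' 629122 629145 629111 629000 629999 629787 629444 > "$tmp/fileA"
printf '%s\n' 123569 789999 629122 629145 777777 629111 000000 \
              629999 629787 629444 989898 777777 > "$tmp/fileB"

# -f fileB : take the patterns from fileB, one per line
# -F       : patterns are fixed strings, not regular expressions
# -x       : a pattern must match the whole line
# -v       : print the lines of fileA that match none of them
not_in_b=$(grep -F -x -v -f "$tmp/fileB" "$tmp/fileA")
printf '%s\n' "$not_in_b"

rm -rf "$tmp"
```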
Ernesto Cappello
Trusted Contributor

Re: shell script problem..

Hi Amonamon, this is the command:

diff A B | grep "<" | awk '{print $2}'

to satisfy your request.

Best regards.
Ernesto
MarkSyder
Honored Contributor

Re: shell script problem..

Dennis,

No, I missed the -f!

I've never used it and have not tried it but will bear it in mind for future use.

Mark
The triumph of evil requires only that good men do nothing
Sandman!
Honored Contributor

Re: shell script problem..

> any help..I tried something with comm..but..no use..

Did you sort the files before filtering your criteria thru comm(1) i.e.

# sort fileA > sfA
# sort fileB > sfB
# comm -23 sfA sfB
629000
blah2blah
Frequent Advisor

Re: shell script problem..

A faster algorithm would be to use awk or perl to read in file b and put everything in an array or hash, then read file a and check whether each entry is in the hash; if it isn't, print it.

Something like this. Maybe someone who knows perl can correct my mistakes.


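# Load every line of fileB into a hash.
open(FILEHANDLE, "<", "fileB") or die "fileB: $!";
while (<FILEHANDLE>) {
    chomp;
    $seen{$_} = 1;
}
close(FILEHANDLE);

# Print each line of fileA that is not in the hash.
open(FILEHANDLE, "<", "fileA") or die "fileA: $!";
while (<FILEHANDLE>) {
    chomp;
    print "$_\n" unless $seen{$_};
}
close(FILEHANDLE);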
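
The same hash idea can also be sketched in awk, run here on the sample data from the first post. The array name seen and the scratch directory are made up for the example. One caveat I am sure of: the NR == FNR test assumes fileB is not empty, otherwise fileA would be read as the lookup file.

```shell
# Scratch directory with the sample data from the thread (assumed layout).
tmp="${TMPDIR:-/tmp}/awk_demo.$$"
mkdir -p "$tmp"
printf '%s\n' 629122 629145 629111 629000 629999 629787 629444 > "$tmp/fileA"
printf '%s\n' 123569 789999 629122 629145 777777 629111 000000 \
              629999 629787 629444 989898 777777 > "$tmp/fileB"

# First file (fileB): NR == FNR holds, so remember each line and move on.
# Second file (fileA): print the line only if it was never remembered.
result=$(awk 'NR == FNR { seen[$0] = 1; next } !($0 in seen)' \
             "$tmp/fileB" "$tmp/fileA")
printf '%s\n' "$result"

rm -rf "$tmp"
```

No sorting is needed and the output keeps the order of fileA, at the cost of holding all of fileB in memory.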

Sandman!
Honored Contributor

Re: shell script problem..

never mind my post...John Korterman beat me to it by a mile :)