
comm command

 
amonamon
Regular Advisor

comm command

Well, I am not sure whether comm is the only convenient command for this problem, but I tried with it.

fileA -

11
33
44
55
77


fileB -

22
34
45
55
56
66
77
778
888
999

output:

11
33
44

So print a line from fileA to the output if it is not in fileB.

Thanks a lot.
9 REPLIES
john korterman
Honored Contributor

Re: comm command

Hi,

try:
$ comm -23 fileA fileB

regards,
John K.
it would be nice if you always got a second chance
Dennis Handly
Acclaimed Contributor

Re: comm command

Both fileA and fileB need to be sorted for comm(1).
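
A minimal sketch of the whole pipeline on the sample data from the original post, combining John's `comm -23` with the sorting requirement. The scratch directory, the `.sorted` file names, and the use of `mktemp -d` are assumptions made for illustration; `LC_ALL=C` is set so that sort and comm agree on the ordering.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the original post.
printf '%s\n' 11 33 44 55 77 > fileA
printf '%s\n' 22 34 45 55 56 66 77 778 888 999 > fileB

# Sort both inputs in the same locale that comm will run in.
LC_ALL=C sort fileA > fileA.sorted
LC_ALL=C sort fileB > fileB.sorted

# -2 suppresses lines only in fileB, -3 suppresses lines common to both,
# so only the lines unique to fileA remain.
only_in_a=$(LC_ALL=C comm -23 fileA.sorted fileB.sorted)
printf '%s\n' "$only_in_a"
```

On the sample data this prints 11, 33 and 44, which is the output requested in the question.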
Peter Nikitka
Honored Contributor

Re: comm command

Hi,

for comm, a sorted file means one sorted in collation order. If you have numerically sorted input like
9
88
100
it is completely out of order in the lexical sense that comm uses.
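
A small sketch of that difference, using the three numbers above. The file name `numeric.txt` and the scratch directory are assumptions for illustration; the C locale is forced so the lexical order is plain byte order.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The numerically sorted fragment from the post.
printf '%s\n' 9 88 100 > numeric.txt

# sort -c only checks the order: exit status 0 means "already sorted".
if LC_ALL=C sort -c numeric.txt 2>/dev/null; then lexical_ok=yes; else lexical_ok=no; fi
if LC_ALL=C sort -n -c numeric.txt 2>/dev/null; then numeric_ok=yes; else numeric_ok=no; fi
echo "lexically sorted: $lexical_ok, numerically sorted: $numeric_ok"

# The order comm expects: compared character by character, "100" < "88" < "9".
resorted=$(LC_ALL=C sort numeric.txt)
printf '%s\n' "$resorted"
```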

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
amonamon
Regular Advisor

Re: comm command

It works; I also tried comm -23. That is fine for a small file, but my files have 1,000,000 lines and it has some problems. I will test it again and let you know.

Thanks a lot!!!
Peter Godron
Honored Contributor

Re: comm command

Hi,
seems you have a duplicate at:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1110742

Have you tried:
grep -vf fileB fileA

How big are both of your files ?
Dennis Handly
Acclaimed Contributor

Re: comm command

>Peter N: a sorted file in the sight of 'comm' is a collated sort.

Yes. The above fragments were sorted by ASCII. To check you can use:
$ sort -c fileA; echo $?
$ sort -c fileB; echo $?

>Peter G: Have you tried: grep -vf fileB fileA
>How big are both of your files?

With a million lines, that may be too big.
Whether to use sort and comm instead may depend on how often these files change.
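
If the files change rarely, one way to avoid needless work is to re-sort only when `sort -c` reports disorder. This is a sketch only: `sort_if_needed` and `sample.txt` are hypothetical names, not something from the thread, and the C locale is assumed for both the check and the sort.

```shell
# Hypothetical helper: sort a file in place only if it is not already sorted.
sort_if_needed() {
    if LC_ALL=C sort -c "$1" 2>/dev/null; then
        echo "$1: already sorted"
    else
        # POSIX allows the -o output file to be the same as the input file.
        LC_ALL=C sort -o "$1" "$1"
        echo "$1: re-sorted"
    fi
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 9 88 100 > sample.txt

first=$(sort_if_needed sample.txt)    # file is out of lexical order here
second=$(sort_if_needed sample.txt)   # file was fixed by the first call
printf '%s\n' "$first" "$second"
```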
Bob E Campbell
Honored Contributor

Re: comm command

I am not sure what the performance of this solution would be, but I have to believe it would beat the double sorted comm(1) solution:

# grep -v -f fileB fileA
11
33
44
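
One caveat with the plain `-f` form: each line of fileB is treated as a regular expression and may match anywhere in a line of fileA, so a pattern such as 77 would also suppress 778. A hedged sketch of the difference, using made-up two-line data rather than the files from the post, and assuming a grep that supports the POSIX `-F` (fixed strings) and `-x` (whole-line match) options:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Illustrative data only: 778 is NOT in fileB, but it contains 77.
printf '%s\n' 11 778 > fileA
printf '%s\n' 77 > fileB

# Pattern 77 matches inside 778, so 778 is wrongly dropped.
plain=$(grep -v -f fileB fileA)

# Fixed strings, whole lines only: 778 survives.
exact=$(grep -v -x -F -f fileB fileA)

printf 'plain:\n%s\nexact:\n%s\n' "$plain" "$exact"
```

This does not address the memory problem with very large pattern files; it only makes the comparison mean "the same line" rather than "contains a match".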
Bob E Campbell
Honored Contributor

Re: comm command

Ooops, sorry Peter and Dennis. I need to get new bifocals. I generated a file with numbers from 1 to 4000000 and then copied it and deleted a few lines:

# time comm -13 sortedListA sortedListB
123456
15
3999966

real 0.8
user 0.7
sys 0.0

# time comm -13 bigListA bigListB
15
123456
3999966

real 0.7
user 0.7
sys 0.0

# time grep -v -f bigListA bigListB
grep: not enough memory

real 9:57.2
user 9:53.2
sys 0.6

Looks to me as if the two files need to have the same sort order, that grep can't stomach comparing two really big files, and 8 gig of RAM just isn't what it used to be.
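
A scaled-down sketch of how such a test could be set up. The size (100000 rather than 4000000), the three deleted values, and the use of awk to generate the list are assumptions for illustration, not the commands actually used for the timings above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

n=100000   # assumption: far smaller than the 4000000 lines in the post

# Full list of numbers 1..n, in numeric order.
awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print i }' > bigListB

# Copy the list with a few whole lines deleted.
grep -v -x -e 15 -e 12345 -e 99966 bigListB > bigListA

# Lexically sorted copies, in the order comm expects.
LC_ALL=C sort bigListA > sortedListA
LC_ALL=C sort bigListB > sortedListB

# -13 keeps only the lines unique to the second file: the deleted ones.
missing=$(LC_ALL=C comm -13 sortedListA sortedListB)
printf '%s\n' "$missing"
```

The deleted lines come back in lexical order (12345 before 15). Where the shell provides a `time` keyword, prefixing the comm line with it gives timings comparable in kind to those above.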

Bob
Dennis Handly
Acclaimed Contributor

Re: comm command

>Bob: Looks to me as if the two files need to have the same sort order, that grep can't stomach comparing two really big files

sort doesn't use a bubble sort. ;-)
So sorting maybe takes n1 * log(n1) + n2 * log(n2) comparisons.
Doing a grep requires comparing every record in fileA with every one in fileB (perhaps 50% of them if it matches, 100% if it doesn't).

comm only needs to compare with records up to the match, since both files are in the same order.
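
To put rough numbers on that model for two files of a million lines each, here is a back-of-the-envelope sketch. It takes the estimates above at face value, assumes base-2 logarithms, and ignores constant factors and whatever grep really does internally, so the figures are orders of magnitude only.

```shell
n=1000000   # lines per file, as in the original question

# Sorting both files: n1*log2(n1) + n2*log2(n2) comparisons, with n1 = n2 = n.
sort_cost=$(awk -v n="$n" 'BEGIN { printf "%.0f", 2 * n * log(n) / log(2) }')

# Worst case for the pairwise model: every record against every record.
pair_cost=$(awk -v n="$n" 'BEGIN { printf "%.0f", n * n }')

echo "sort both files:     about $sort_cost comparisons"
echo "pairwise worst case: about $pair_cost comparisons"
```

Under these assumptions the two sorts cost on the order of tens of millions of comparisons, against about a million million for the pairwise worst case.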