comm command

amonamon · ‎03-20-2007

I am not sure whether comm is the right command for this problem, but I tried it.

fileA -

11
33
44
55
77

fileB -

22
34
45
55
56
66
77
778
888
999

desired output:

11
33
44

So a line from fileA should be printed only if it is not in fileB.

Thanks a lot.

john korterman · ‎03-20-2007

Hi,

Try:
$ comm -23 fileA fileB

regards,
John K.

it would be nice if you always got a second chance

Dennis Handly · ‎03-20-2007

Both fileA and fileB need to be sorted for comm(1).
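
Putting the two replies above together, a minimal sketch using the sample data from the first post might look like this. The working directory and the `.sorted` / `onlyInA` file names are illustrative assumptions, and `LC_ALL=C` is set only so that sort and comm agree on the collation order:

```shell
# Work in a scratch directory (assumption: any writable directory will do).
work=$(mktemp -d)

# Recreate the sample files from the first post.
printf '%s\n' 11 33 44 55 77 > "$work/fileA"
printf '%s\n' 22 34 45 55 56 66 77 778 888 999 > "$work/fileB"

# Sort both inputs in the same locale that comm will run in.
LC_ALL=C sort "$work/fileA" > "$work/fileA.sorted"
LC_ALL=C sort "$work/fileB" > "$work/fileB.sorted"

# -2 drops lines found only in fileB, -3 drops lines common to both,
# leaving the lines that appear only in fileA.
LC_ALL=C comm -23 "$work/fileA.sorted" "$work/fileB.sorted" > "$work/onlyInA"
cat "$work/onlyInA"
```

With the sample data this should leave 11, 33 and 44 in `onlyInA`.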

Peter Nikitka · ‎03-21-2007

Hi,

for 'comm', a sorted file means one sorted in collation (lexical) order. Numerically sorted input such as
9
88
100
is completely out of order in the lexical (= comm) sense.

Kind regards, Peter

The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
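
To illustrate the point about numeric versus lexical order, here is a small sketch that checks the three-line example both ways. The file name `numeric` and the variable names are assumptions made for the example; `sort -c` only reports whether the input is already in order:

```shell
work=$(mktemp -d)
printf '%s\n' 9 88 100 > "$work/numeric"

# sort -c exits 0 when the input is already sorted, non-zero otherwise.
if LC_ALL=C sort -c "$work/numeric" 2>/dev/null; then
    lexical_ok=yes
else
    lexical_ok=no
fi

# The same check with -n compares the lines as numbers instead.
if LC_ALL=C sort -n -c "$work/numeric" 2>/dev/null; then
    numeric_ok=yes
else
    numeric_ok=no
fi

echo "lexically sorted: $lexical_ok, numerically sorted: $numeric_ok"

# The order comm expects in the C locale:
lexical_order=$(LC_ALL=C sort "$work/numeric")
printf '%s\n' "$lexical_order"
```

In the C locale the lexical order of these lines is 100, 88, 9, because the comparison goes character by character.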

amonamon · ‎03-21-2007

I also tried comm -23 and it works for small files, but my files have 1,000,000 lines and there it has some problems. I will test it again and let you know.

Thanks a lot!!!

Peter Godron · ‎03-21-2007

Hi,
It seems you have a duplicate thread at:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1110742

Have you tried:
grep -vf fileB fileA

How big are your two files?

Dennis Handly · ‎03-21-2007

>Peter N: a sorted file in the sight of 'comm' is a collated sort.

Yes. The above fragments were sorted by ASCII. To check you can use:
$ sort -c fileA; echo $?
$ sort -c fileB; echo $?

>Peter G: Have you tried: grep -vf fileB fileA
>How big are both of your files?

With a million lines, that may be too big.
Whether sort and comm is the better choice may depend on how often these files change.
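
One way to combine the `sort -c` check with comm is to sort a file only when the check fails, which may help if the files rarely change. This is a sketch, not a tested production script; the function names `sorted_copy` and `only_in_first` are hypothetical:

```shell
# Hypothetical helper: write a C-locale sorted copy of $1 to $2,
# skipping the sort when $1 is already in order.
sorted_copy() {
    if LC_ALL=C sort -c "$1" 2>/dev/null; then
        cp "$1" "$2"
    else
        LC_ALL=C sort "$1" > "$2"
    fi
}

# Hypothetical helper: print the lines of $1 that do not occur in $2.
only_in_first() {
    tmp=$(mktemp -d) || return 1
    sorted_copy "$1" "$tmp/a" &&
        sorted_copy "$2" "$tmp/b" &&
        LC_ALL=C comm -23 "$tmp/a" "$tmp/b"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Usage would be `only_in_first fileA fileB`. The output comes out in lexical order, not in the original order of fileA.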

Bob E Campbell · ‎03-21-2007

I am not sure what the performance of this solution would be, but I have to believe it would beat the double sorted comm(1) solution:

# grep -v -f fileB fileA
11
33
44
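
A caveat worth sketching: without further options, grep treats each line of the pattern file as a regular expression that may match anywhere in a line. The POSIX options `-F` (fixed strings) and `-x` (whole-line match) restrict it to exact line comparison. The two-line test files below are made up for the illustration and are not from the thread:

```shell
work=$(mktemp -d)
printf '%s\n' 11 778 > "$work/fileA"
printf '%s\n' 77 > "$work/fileB"

# Pattern "77" matches inside "778", so 778 is dropped as well.
loose=$(grep -v -f "$work/fileB" "$work/fileA")

# With -x -F only a line that is exactly "77" counts as a match.
strict=$(grep -v -x -F -f "$work/fileB" "$work/fileA")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

Here the loose form prints only 11, while the strict form prints 11 and 778. On the sample data from the first post both forms happen to give the same result.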

Bob E Campbell · ‎03-21-2007

Ooops, sorry Peter and Dennis. I need to get new bifocals. I generated a file with numbers from 1 to 4000000 and then copied it and deleted a few lines:

# time comm -13 sortedListA sortedListB
123456
15
3999966

real 0.8
user 0.7
sys 0.0

# time comm -13 bigListA bigListB
15
123456
3999966

real 0.7
user 0.7
sys 0.0

# time grep -v -f bigListA bigListB
grep: not enough memory

real 9:57.2
user 9:53.2
sys 0.6

Looks to me as if the two files need to have the same sort order, that grep can't stomach comparing two really big files, and 8 gig of RAM just isn't what it used to be.

Bob

Dennis Handly · ‎03-21-2007

>Bob: Looks to me as if the two files need to have the same sort order, that grep can't stomach comparing two really big files

sort doesn't use a bubble sort. ;-)
So it takes maybe n1 * log(n1) + n2 * log(n2) comparisons.
Doing a grep requires comparing every record in fileA with every record in fileB (perhaps 50% of them if it matches, 100% if it doesn't).

comm only needs to compare with records up to the match.
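
To put rough numbers on the estimate above, the following sketch evaluates both formulas for two files of one million lines each. It assumes base-2 logarithms and a purely pairwise comparison model for the grep approach; actual grep and sort implementations may behave quite differently, so treat the result as an order-of-magnitude guide only:

```shell
n1=1000000   # lines in fileA (figure from the thread)
n2=1000000   # lines in fileB (assumed equal for the estimate)

estimate=$(awk -v n1="$n1" -v n2="$n2" 'BEGIN {
    ln2 = log(2)
    # Two sorts, then one linear merge pass by comm.
    sort_cost  = n1 * log(n1) / ln2 + n2 * log(n2) / ln2
    merge_cost = n1 + n2
    # Worst case for comparing every record with every other record.
    pair_cost  = n1 * n2
    printf "sort + comm: about %.0f comparisons\n", sort_cost + merge_cost
    printf "pairwise:    about %.0f comparisons\n", pair_cost
}')
printf '%s\n' "$estimate"
```

Under these assumptions the sort and comm route comes to roughly 4 * 10^7 comparisons, against 10^12 for the pairwise worst case.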
