scripts in shell

 
Hello friends, I have a server running HP-UX B11.11.
Could you send me a small script for the following problem? I hope you can ... :)

I have 2 files, file A and file B, each with 2 columns, and both are unsorted. However, the 2 files have some numbers in common in the first column. I would like to get a new file (file C) containing the first-column numbers common to both files, each followed by the corresponding second-column numbers. For example:

file A:

12345000001 987230001
12345000006 987230002
12345000003 876450001
12394000012 789123456

File B:

12345000001 987230005
12345000004 987231202
12345000003 876450014
12394000012 789123466

File C:

12345000001 987230001
12345000003 876450001 876450014
12394000012 789123456 789123466

I hope you can help me.

Best regards,

Christian
Christian Aguilar
Hein van den Heuvel
Honored Contributor
Solution

Re: scripts in shell

Please clarify your example.

Why does 12345000001 not show in C? Should it?

Will the output be sorted? Could it be?
What should happen to records that have no match? Drop them?

Is there one dominant/driver file with the other secondary/slave, or do both weigh equally?

How much data? Less than a megabyte? More than a gigabyte?

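If the output may be sorted, then just use join. Both inputs must be sorted on the first column; on the unsorted files join misses a match, as the first run below shows: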

join A B > C

$ join A B
12345000001 987230001 987230005
12394000012 789123456 789123466

$ sort A > AA
$ sort B > BB
$ join AA BB > C
$ cat C
12345000001 987230001 987230005
12345000003 876450001 876450014
12394000012 789123456 789123466

$ perl -e 'open B,"<B"; while (<B>) {($k,$v)=split; $b{$k}=$v}; open A,"<A"; while (<A>) {($k,$v)=split; print qq($k $v $b{$k}\n)}'
12345000001 987230001 987230005
12345000006 987230002
12345000003 876450001 876450014
12394000012 789123456 789123466

$ perl -e 'open B,"<B"; while (<B>) {($k,$v)=split; $b{$k}=$v}; open A,"<A"; while (<A>) {($k,$v)=split; print qq($k $v $b{$k}\n) if $b{$k}}'
12345000001 987230001 987230005
12345000003 876450001 876450014
12394000012 789123456 789123466
$
hth,
Hein

Re: scripts in shell

Hello,
yep, you are right .. it should be like this:

file A:

12345000001 987230001
12345000006 987230002
12345000003 876450001
12394000012 789123456

File B:

12345000001 987230005
12345000004 987231202
12345000003 876450014
12394000012 789123466

File C:

12345000001 987230001 987230005
12345000003 876450001 876450014
12394000012 789123456 789123466
Christian Aguilar

Re: scripts in shell

And the rows without a common number in the first column should be dropped. The size of the files doesn't matter.
Christian Aguilar
Hein van den Heuvel
Honored Contributor

Re: scripts in shell

>> The size of the files doesn't matter.

Believe me, it does matter.

If both files are small, any solution will do.

If one file is much (10x) smaller than the other, and less than, say, 100 MB, then you want to load that one into memory first. Next, read the bigger file and look for matches in memory (the second perl solution).
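
For what it's worth, the same load-the-small-file-then-scan idea can also be sketched in awk rather than perl. This is only an illustration, not a tuned solution: it assumes B is the smaller file and A the bigger one (as in the example), that fields are whitespace-separated, that keys in B are unique, and that B is not empty. The scratch directory name is made up for the demo.

```shell
#!/bin/sh
# Sketch only: the scratch directory is an assumption for the demo;
# the sample data is copied from the example in this thread.
dir="${TMPDIR:-/tmp}/lookup_demo.$$"
mkdir -p "$dir" && cd "$dir" || exit 1

printf '%s\n' '12345000001 987230001' '12345000006 987230002' \
              '12345000003 876450001' '12394000012 789123456' > A
printf '%s\n' '12345000001 987230005' '12345000004 987231202' \
              '12345000003 876450014' '12394000012 789123466' > B

# Pass 1: NR == FNR holds only while the first file (B) is being read,
#         so each key of B is stored in the in-memory array b.
# Pass 2: for file A, print only the rows whose key was stored from B.
awk 'NR == FNR { b[$1] = $2; next }
     $1 in b   { print $1, $2, b[$1] }' B A > C

cat C
```

The output keeps the row order of A, so no sort is needed; only the smaller file has to fit in memory.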

If both files are large (> 1 GB) then you probably want to stick to sorting first, then joining.
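
As a rough sketch of that sort-then-join route, written as a small script: the scratch directory and the A.sorted / B.sorted names are assumptions for the demo, and the sample data is the example from this thread. Pinning LC_ALL=C is my own precaution so that sort and join agree on the ordering; it is not something the commands above required.

```shell
#!/bin/sh
# Sketch only: the scratch directory and the *.sorted file names are
# assumptions for the demo; the sample data comes from this thread.
dir="${TMPDIR:-/tmp}/join_demo.$$"
mkdir -p "$dir" && cd "$dir" || exit 1

printf '%s\n' '12345000001 987230001' '12345000006 987230002' \
              '12345000003 876450001' '12394000012 789123456' > A
printf '%s\n' '12345000001 987230005' '12345000004 987231202' \
              '12345000003 876450014' '12394000012 789123466' > B

# sort and join must use the same collation order, so pin the locale.
LC_ALL=C
export LC_ALL

# Sort both files on the first field, which is the join key.
sort -k1,1 A > A.sorted
sort -k1,1 B > B.sorted

# join prints one line per key found in both inputs and drops the rest.
join A.sorted B.sorted > C

cat C
```

Neither file has to fit in memory here, at the cost of the extra sort passes and the disk space for the sorted copies.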

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting