scripts in shell

Christian Aguilar Varga · ‎04-09-2008

Hello friend, I have a server with HP-UX B11.11.
Could you send me a small program for the following problem? I hope you can ... :)

I have 2 files, file A and file B, each with 2 columns, and neither is sorted. However, the 2 files have some numbers in common in the first column. I would like to get a new file (file C) containing the first-column numbers that are in common, each followed by the corresponding second-column numbers from both files. For example:

file A:

12345000001 987230001
12345000006 987230002
12345000003 876450001
12394000012 789123456

File B:

12345000001 987230005
12345000004 987231202
12345000003 876450014
12394000012 789123466

File C:

12345000001 987230001
12345000003 876450001 876450014
12394000012 789123456 789123466

I hope you can help me.

Best regards,

Christian

Christian Aguilar

Hein van den Heuvel · ‎04-09-2008

Please clarify your example.

Why does 12345000001 not show in C? Should it?

Will the output be sorted? Could it be?
What to do with records which have no correspondence? Drop?

Is there one 'dominant/driver' file and the other a 'secondary/slave', or do both weigh equally?

How much data? Less than a megabyte? More than a gigabyte?

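If the output may be sorted, then just use join. It needs both inputs sorted on the first column, so on the unsorted files it misses a match; sort them first: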

join A B > C

$ join A B
12345000001 987230001 987230005
12394000012 789123456 789123466

$ sort A > AA
$ sort B > BB
$ join AA BB > C
$ cat C
12345000001 987230001 987230005
12345000003 876450001 876450014
12394000012 789123456 789123466

$ perl -e 'open B,"<B"; while (<B>) {($k,$v)=split; $b{$k}=$v}; open A,"<A"; while (<A>){($k,$v)=split; print qq($k $v $b{$k}\n)}'
12345000001 987230001 987230005
12345000006 987230002
12345000003 876450001 876450014
12394000012 789123456 789123466

$ perl -e 'open B,"<B"; while (<B>) {($k,$v)=split; $b{$k}=$v}; open A,"<A"; while (<A>){($k,$v)=split; print qq($k $v $b{$k}\n) if $b{$k}}'
12345000001 987230001 987230005
12345000003 876450001 876450014
12394000012 789123456 789123466
$
hth,
Hein

Christian Aguilar Varga · ‎04-09-2008

Hello,
yep, you are right .. it should be like this:

file A:

12345000001 987230001
12345000006 987230002
12345000003 876450001
12394000012 789123456

File B:

12345000001 987230005
12345000004 987231202
12345000003 876450014
12394000012 789123466

File C:

12345000001 987230001 987230005
12345000003 876450001 876450014
12394000012 789123456 789123466

Christian Aguilar

Christian Aguilar Varga · ‎04-09-2008

And the rows whose first-column number is not common to both files should be dropped. The size of the file doesn't matter.

Christian Aguilar

Hein van den Heuvel · ‎04-09-2008

>> The size of the file doesn't matter.

Believe me, it does matter.

If both files are small, any solution will do.

If one file is much (10x) smaller than the other, and less than say 100MB, then you want to load that one into memory first. Next, read the bigger file and look for matches in memory (the second perl solution).
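
The same in-memory lookup can also be sketched in awk, in case perl is not at hand. This is only a minimal sketch: it assumes B is the smaller file, that each key appears at most once in B, and it recreates the sample files from this thread as stand-ins for the real data.

```shell
# Recreate the sample files from the thread (stand-ins for the real data).
printf '%s\n' '12345000001 987230001' '12345000006 987230002' \
              '12345000003 876450001' '12394000012 789123456' > A
printf '%s\n' '12345000001 987230005' '12345000004 987231202' \
              '12345000003 876450014' '12394000012 789123466' > B

# NR == FNR is true only while awk reads the first file named (B):
# remember B's second column under its key, then skip to the next line.
# For the second file (A), print only rows whose key was seen in B.
awk 'NR == FNR { b[$1] = $2; next }
     $1 in b   { print $1, $2, b[$1] }' B A > C

cat C
```

The output keeps the row order of A, and rows without a partner in B are dropped, like the second perl one-liner.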

If both files are large ( > 1 GB ) then you probably want to stick to the sort first, then join.
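
For the sort-then-join route, one thing worth guarding against is sort and join disagreeing on the collation order. A minimal sketch, assuming the key is the first whitespace-separated field; the names A.sorted and B.sorted are only illustrative, and forcing LC_ALL=C is my own assumption about a safe default, not something required by the example data:

```shell
# Recreate the sample files from the thread (stand-ins for the real data).
printf '%s\n' '12345000001 987230001' '12345000006 987230002' \
              '12345000003 876450001' '12394000012 789123456' > A
printf '%s\n' '12345000001 987230005' '12345000004 987231202' \
              '12345000003 876450014' '12394000012 789123466' > B

# Use the same byte-wise collation for sort and join.
LC_ALL=C
export LC_ALL

# Sort each file on the first field only, then join on that field.
sort -k1,1 A > A.sorted
sort -k1,1 B > B.sorted
join A.sorted B.sorted > C

# Remove the intermediate files (illustrative names).
rm -f A.sorted B.sorted

cat C
```

Only the sort steps need to hold large amounts of data at once; join itself streams through both sorted files, which is why this route scales to large inputs.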

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
