1838608 Members
3908 Online
110128 Solutions
New Discussion

Help needed in scripting

 
SOLVED
Go to solution
Kenneth_18
Frequent Advisor

Help needed in scripting

Here is what I want to accomplish. I've been thinking hard but I just can't get an idea on how to start the script so maybe a sample script or hint will send me on my way.

Here is the problem.

One file contains an ASCII table as shown below:

file1
xx dd xx xx
11 aa 22 33
xx cc xx xx
77 bb 88 99
xx gg xx xx

another file contains another ASCII table as shown below:

file2

xx xx cc xx
xx xx jj xx
xx xx bb xx
xx xx dd xx
44 55 aa 66

What I want to achieve is to combine bothfiles into one table with each lines in each table combined that has the same values in the 2nd column and third columns of file1 and file2 respectively. The unmatch data are written to another separate error logfile. Thus the result should be:

aa 11 22 33 44 55 66
bb 77 88 99 xx xx xx
cc xx xx xx xx xx xx

the error log file should contain something like below:

gg xx xx xx
jj xx xx xx

Thanks in advance for the help!
7 REPLIES 7
H.Merijn Brand (procura
Honored Contributor

Re: Help needed in scripting

# man join

Enjoy, Have FUN! H.Merijn
Enjoy, Have FUN! H.Merijn
Hein van den Heuvel
Honored Contributor
Solution

Re: Help needed in scripting


If you cat SORT the input first, then this becomes a one liner with JOIN:


$ sort -k 2 x > a
$ sort -k 3 xx > aa
$ cat a
11 aa 22 33
77 bb 88 99
xx cc xx xx
xx dd xx xx
xx gg xx xx
$ cat aa
44 55 aa 66
xx xx bb xx
xx xx cc xx
xx xx dd xx
xx xx jj xx
$ join -j1 2 -j2 3 -o 1.2 1.1 1.3 1.4 2.1 2.2 2.4 a aa
aa 11 22 33 44 55 66
bb 77 88 99 xx xx xx
cc xx xx xx xx xx xx
dd xx xx xx xx xx xx


$ join -j1 2 -j2 3 -v 1 a aa
gg xx xx xx
$ join -j1 2 -j2 3 -v 2 a aa
jj xx xx xx

Hein.

Muthukumar_5
Honored Contributor

Re: Help needed in scripting

We can do this with sort / paste / awk as,

sort to sort based on field 2 on file1 and filed 3 on file2

# sort -k 2,2 file1 > file1.sort
# sort -k 3,3 file2 > file2.sort

# paste -d " " file1.sort file2.sort | awk '{ print $2" "$1" "$3" "$4" "$5" "$6" "$8 }'

aa 11 22 33 44 55 66
bb 77 88 99 xx xx xx
cc xx xx xx xx xx xx
dd xx xx xx xx xx xx
gg xx xx xx xx xx xx

HTH.

Easy to suggest when don't know about the problem!
harry d brown jr
Honored Contributor

Re: Help needed in scripting


I like the sort and join commands myself, but maybe you need to learn some perl:

#!/usr/bin/perl
#
$collected = "";
$errs_collected = "";
#
open(firstFILEPTR, "< ./testa") ||
die "can not open ./testa for input? check your current directory permissions and space availability!";
while () {
chomp;
($part1,$part2,$part3,$part4) = split(/ /, $_);
$collected{$part2}{"data"} = sprintf("%s %s %s %s",$part2,$part1,$part3,$part4);
$collected{$part2}{"cnt"} = 4;
}
#
close (firstFILEPTR);
#
open(secondFILEPTR, "< ./testb") ||
die "can not open ./testb for input? check your current directory permissions and space availability!";
while () {
chomp;
($part1,$part2,$part3,$part4) = split(/ /, $_);
if ($collected{$part3} eq "") {
$errs_collected{$part3} = sprintf("%s %s %s %s",$part3,$part1,$part2,$part4);
} else {
$collected{$part3}{"data"} = sprintf("%s %s %s %s %s",$collected{$part3}{"data"},$part1,$part2,$part4);
$collected{$part3}{"cnt"} += 3;
}
}
#
close (secondFILEPTR);
#
foreach $partname ( keys %collected ) {
if ($collected{$partname}{"cnt"} eq 7) {
printf("%s\n",$collected{$partname}{"data"});
}
}
printf("Errors to follow:\n");
foreach $partname ( keys %collected ) {
if ($collected{$partname}{"cnt"} ne 7) {
printf("%s\n",$collected{$partname}{"data"});
}
}
foreach $partname ( keys %errs_collected ) {
printf("%s\n",$errs_collected{$partname});
}


live free or die
harry
Live Free or Die
Hein van den Heuvel
Honored Contributor

Re: Help needed in scripting


Ooops,
Forgot to include my perl alternative.
This does not require pre-sorting and as written outputs be order of the second file argument.
As written it allows only 1 match per pair.
A perl script gives you full control over tweaks like that.
caveat: It sucks in the whole first file, so may become too slow for large files.

See below.
Cheers,
Hein.

$ perl x.p x xx
cc xx xx xx xx xx xx
bb 77 88 99 xx xx xx
dd xx xx xx xx xx xx
aa 11 22 33 44 55 66
excess from file 1
gg xx xx xx
excess from file 2
jj xx xx xx
$ cat x.p
$f = shift @ARGV;
open (F,"<$f") or die "Failed to open $f";
while () {
chop;
@w= split;
$x1{$w[1]} = join(' ',$w[0], $w[2], $w[3]);
}
close (F);
$f = shift @ARGV;
open (F,"<$f") or die "Failed to open $f";
while () {
chop;
@w= split;
$j= $w[2];
$f2 = join(' ',$w[0], $w[1], $w[3]);
if ($f1 = $x1{$j}) {
print join(' ',$j, $f1, $f2)."\n";
undef $x1{$j};
} else {
$x2{$j} = $f2;
}
}
print "excess from file 1\n";

foreach $k (keys %x1) {
print "$k $x1{$k}\n" if $x1{$k};
}
print "excess from file 2\n";
foreach $k (keys %x2) {
print "$k $x2{$k}\n";
}

Hein van den Heuvel
Honored Contributor

Re: Help needed in scripting

Muthukumar, Sorry but I believe your solution will only work in the example given where the 'exception' record happen to sort after the matching records. If one adds a record with key 'bc' to one of the input files the paste command will get 'out of sync'.

Cheers,
Hein.
Kenneth_18
Frequent Advisor

Re: Help needed in scripting

Sorry guys but I've been to busy to responed immediately. I will try out the suggestions and let you know the results.