Operating System - HP-UX
Re: Script question - comm limits

 
Hein van den Heuvel
Honored Contributor

I had an existing Perl script that compares lines by a key value first and by the whole line next, printing the lines that match.

It can easily be adapted to use a different key function, or to produce different output (for example, the non-matching lines).

Here it is, using the leading run of digits on each line as the key.

--------------- comm_12_numeric.pl ---------
#
# look for matching lines based on a key value
#
# Open files
#
$name = shift @ARGV or die "Must provide first filename";
open F1, "<$name" or die "Could not read file $name";
$name = shift @ARGV or die "Must provide second filename";
open F2, "<$name" or die "Could not read file $name";


my ($f1, $f2, $k1, $k2);

# Read a line from F1 into global $f1, and return its key value.
# The script exits as soon as either file is exhausted.
sub k1() {
    $f1 = <F1>;
    exit unless defined ($f1);
    $f1 =~ m/^(\d+)/;
    return $1;
}

# Read a line from F2 into global $f2, and return its key value.
sub k2() {
    $f2 = <F2>;
    exit unless defined ($f2);
    $f2 =~ m/^(\d+)/;
    return $1;
}

# Prime both keys, then merge: advance whichever file has the smaller key.
$k1 = &k1;
$k2 = &k2;

while ( 1 ) {
    if ($k1 == $k2) {
        print $f1 if ($f1 eq $f2);
        $k1 = &k1;
        $k2 = &k2;
    } else {
        if ($k1 > $k2) {
            $k2 = &k2 while $k1 > $k2
        } else {
            $k1 = &k1 while $k2 > $k1
        }
    }
}
-----------------
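
Because the loop above only ever advances the file with the smaller key, both inputs presumably have to be in ascending numeric order of that key, which is what `sort -n` produces. A minimal shell sketch of preparing and checking such an input follows. The file names and sample lines are made up for illustration, and `mktemp -d` is assumed to be available on your system.

```shell
# Hypothetical sample data; the real files from this thread are not shown.
tmp=$(mktemp -d)
printf '%s\n' '100 c.doc' '9 a.doc' '21 b.doc' > "$tmp/raw1"

# Order by the leading number, which is what the numeric script compares.
sort -n "$tmp/raw1" > "$tmp/file1"

# sort -c only checks the order; it exits 0 when the file is already sorted.
if sort -n -c "$tmp/file1"; then
    numeric_ok=yes
else
    numeric_ok=no
fi

first_line=$(head -n 1 "$tmp/file1")
printf 'sorted numerically: %s, first line: %s\n' "$numeric_ok" "$first_line"
rm -rf "$tmp"
```

With both files prepared this way, the script would be run as `perl comm_12_numeric.pl file1 file2`.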


For the sake of completeness, here is a Perl equivalent of 'comm -12', which prints the matching lines of two files that are ordered by the whole line. Note that it needs only 2 string variables and 2 file variables.

---------------- comm_12_text.pl ---------
#
# Open files
#
$name = shift @ARGV or die "Must provide first filename";
open F1, "<$name" or die "Could not read file $name";
$name = shift @ARGV or die "Must provide second filename";
open F2, "<$name" or die "Could not read file $name";

my $f1 = <F1>;
my $f2 = <F2>;
# Stop as soon as either file is exhausted.
while (defined ($f1) && defined ($f2)) {
    if ($f1 eq $f2) {
        print $f1;
        $f1 = <F1>;
        $f2 = <F2>;
    } else {
        if ($f1 gt $f2) {
            $f2 = <F2> while defined ($f2) && $f1 gt $f2;
        } else {
            $f1 = <F1> while defined ($f1) && $f2 gt $f1;
        }
    }
}
----------

Cheers,
Hein.

Raynald Boucher
Super Advisor

Hello all,

The solution is to sort the source files with plain sort (no options) rather than sort -n, because comm compares lines as text.

To confirm,
1- I re-sorted the files with plain sort and comm worked properly.
2- I took a subset of my original files (first 2000 lines presorted with sort -n) to test with reduced numbers; comm failed.

comm had succeeded with my earlier test data because every line started with the same string.
I had extracted all entries matching '211[0-9]*.doc' to reduce the volume, and that made the numeric sort order coincide with the alphabetical one.
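
The effect described above can be reproduced on a small scale. The following is only a sketch: the keys are invented for illustration (they are not from the original files), `mktemp -d` is assumed to exist, and the C locale is forced so the text ordering is predictable. Some comm implementations also print a "not in sorted order" warning, which is discarded here.

```shell
# Hypothetical keys chosen for illustration; not taken from the original files.
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf '%s\n' 9 10 100  > "$tmp/a"
printf '%s\n' 10 21 100 > "$tmp/b"

# Numeric order: comm still compares lines as strings, so it loses track
# of the common lines ("9" sorts after "10" and "100" as text).
sort -n "$tmp/a" > "$tmp/a.num"
sort -n "$tmp/b" > "$tmp/b.num"
bad=$(comm -12 "$tmp/a.num" "$tmp/b.num" 2>/dev/null || true)

# Plain sort: the files are in the text order that comm expects.
sort "$tmp/a" > "$tmp/a.txt"
sort "$tmp/b" > "$tmp/b.txt"
good=$(comm -12 "$tmp/a.txt" "$tmp/b.txt")

printf 'after sort -n: [%s]\n' "$bad"
printf 'after sort:    [%s]\n' "$good"
rm -rf "$tmp"
```

With these sample keys the numerically sorted pair should yield no common lines at all, while the plainly sorted pair yields 10 and 100.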

Thanks all

RayB