Operating System - Linux
TDW
New Member

Pull data from file with script tool?

If I have a file like this:

123 789
234 500
999 662
373 881
474 662
611 200
515 789
809 424
234 500

I want to return each row that has a value in field two that appears with one or more different values in field one. So from this list, I want:

999 662
474 662
123 789
515 789

I do not want:

234 500
234 500

because those two rows are identical: the field-one value is the same. I want to see when a value repeats in field two with a different value in field one. Make sense?

I'm sure some awk or perl expert can knock this one out quick. Thanks for the help in advance!
Hein van den Heuvel
Honored Contributor

What should happen with a third value pair?
- when the first two pairs were the same, e.g. another "432 500"
- when they were not, e.g. another "456 789"


Here is one way:

perl -n x.pl x
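--- x.pl ---
# perl -n wraps this in a loop over the lines of x; each line is in $_
($left,$right)=split;
$prior = $seen{$right};
if (defined $prior) {
    if ($left ne $prior) {
        print "$prior $right\n$left $right\n";
        delete $seen{$right};     # forget the pair once it has been printed
    }
} else {
    $seen{$right} = $left;        # first field-one value for this key
}
-------------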

So this ignores exact repeats and forgets a pair once it has been printed. Output is in order of appearance.

Or this....
Sort by field two first, then compare each line with the one before it.
This breaks on a third pair as written.

$ sort -k 2 x | perl -ne '($a,$b)=split; print "$x $y\n$a $b\n" if $a ne $x and $b eq $y; $x=$a; $y=$b'
474 662
999 662
123 789
515 789


I think you want this....

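----------- x.pl ----------
($left,$right)=split;
$prior = $seen{$right};
if (defined $prior) {
    if ($left ne $prior) {
        # print the first pair for this key only once
        print "$prior $right\n" unless $header{$right}++;
        # print every other combination only once, even if it repeats
        print "$left $right\n" unless $done{"$left $right"}++;
    }
} else {
    $seen{$right} = $left;    # remember the first field-one value
}
----------------------------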

Hein.

TDW
New Member

I want every line returned where the same value in field two has multiple values in field one. So there could be two or more lines returned for each value in field two. Does that explain it? Thanks for your assistance!
TDW
New Member

After reading your response again, I think I need to explain a little more.

I just need to see every combination once where the second field has more than one value in field one.

So from this list:

999 662
474 662
333 662
123 457
474 662
811 363
474 662

return:

999 662
474 662
333 662

That help?
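A minimal two-pass sketch of the clarified requirement (not from this thread), assuming whitespace-separated fields in a regular file, since awk reads the file twice. The first pass counts distinct field-one values per field-two key; the second pass prints, in input order, each distinct line whose key has two or more of them. The name multi_pairs is hypothetical.

```shell
#!/bin/sh
# Two-pass sketch: the file is read twice, so it must be a regular file.
# multi_pairs is a hypothetical helper name, not something from the thread.
multi_pairs() {
    awk '
        NR == FNR {                          # pass 1: first read of the file
            if (!(($1, $2) in pair)) {       # a combination not seen before
                pair[$1, $2] = 1
                distinct[$2]++               # count distinct field-one values per key
            }
            next
        }
        distinct[$2] > 1 && !(($1, $2) in printed) {   # pass 2
            printed[$1, $2] = 1              # print each combination only once
            print
        }
    ' "$1" "$1"
}

# Demo on the sample above
tmp=$(mktemp)
printf '%s\n' '999 662' '474 662' '333 662' '123 457' '474 662' '811 363' '474 662' > "$tmp"
multi_pairs "$tmp"      # prints 999 662, 474 662 and 333 662, in input order
rm -f "$tmp"
```

Reading the file twice avoids holding lines in memory until the end, at the cost of not working on a pipe.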
Hein van den Heuvel
Honored Contributor

That's what my final piece of code will do when put into a file and executed as...

perl -n x.pl x

Did you try?

Hein.



James R. Ferguson
Acclaimed Contributor

Hi:

Here's another approach:

# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
my ( $first, $second );
while (<>) {
    ( $first, $second ) = split;
    push( @{ $seen{$second} }, $first );
}
for $second ( sort keys %seen ) {
    next if scalar( @{ $seen{$second} } ) == 1;
    for $first ( sort @{ $seen{$second} } ) {
        print "$first $second\n";
    }
}
1;

...run as:

# sort -u file | ./filter
333 662
474 662
999 662

The code begins by using 'sort -u' to eliminate duplicate lines. The Perl script then splits each line into two pieces. The second element is used as a hash key, and each first element is pushed onto an array associated with that key.

When all the data has been assimilated, keys whose arrays hold only one element are skipped. The elements of each remaining array are then printed alongside their hash key.
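The same "group values under their key, skip single-member groups" idea can be sketched in awk without the external sort -u, by filtering repeated combinations inside awk. This is an illustration only; group_pairs is a hypothetical name, and unlike the Perl script it prints groups in order of first appearance rather than sorted.

```shell
#!/bin/sh
# group_pairs is a hypothetical name. Reads files named as arguments, or stdin.
group_pairs() {
    awk '
        seen[$1, $2]++ { next }              # skip repeated combinations
        {
            if (!($2 in count))
                keys[++nkeys] = $2           # remember the order keys first appear in
            count[$2]++
            # append this line to the group for its field-two value
            group[$2] = (count[$2] > 1) ? (group[$2] "\n" $0) : $0
        }
        END {
            for (k = 1; k <= nkeys; k++)     # skip groups with a single member
                if (count[keys[k]] > 1)
                    print group[keys[k]]
        }
    ' "$@"
}

printf '%s\n' '999 662' '474 662' '333 662' '123 457' '474 662' '811 363' '474 662' | group_pairs
```

Keeping an explicit keys list avoids relying on the iteration order of "for (k in array)", which awk leaves unspecified.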

Regards!

...JRF...
Sandman!
Honored Contributor

Try the awk construct below. It does what you are looking for:

awk '
{
    x[$2]++
    if (f[$2]) {
        if ($0 != f[$2]) f[$2] = f[$2] "\n" $0
        else delete f[$2]
    } else f[$2] = $0
}
END { for (i in f) if (x[i] > 1) print f[i] }' file
James R. Ferguson
Acclaimed Contributor

Hi (again):

Instead of having to externally sort and pipe the sorted file to the Perl script, you can use this version:

# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
my ( $first, $second );
# run "sort -u" on the named files and read its output
open( FH, "-|", "sort", "-u", @ARGV ) or die "Cannot run sort: $!\n";
while (<FH>) {
    ( $first, $second ) = split;
    push( @{ $seen{$second} }, $first );
}
close(FH);
for $second ( sort keys %seen ) {
    next if scalar( @{ $seen{$second} } ) == 1;
    for $first ( sort @{ $seen{$second} } ) {
        print "$first $second\n";
    }
}
1;

...now, simply do:

# ./filter file
333 662
474 662
999 662

Regards!

...JRF...
Sandman!
Honored Contributor

Based on the sample input provided, viz.:

999 662
474 662
333 662
123 457
474 662
811 363
474 662

Disregard my last post and instead use the script posted below...

# sort -u file | awk '{i=$2;x[i]++;f[i]=f[i]?f[i]"\n"$0:$0}END{for(j in f) if(x[j]>1) print f[j]}'
333 662
474 662
999 662

~hope it helps
Hein van den Heuvel
Honored Contributor

(Is this horse dead yet? :-)

JRF, Sandman,...

Once you decide to sort anyway, sort on both fields and there is no longer any need to remember anything but the last value pair! So no END processing is needed.

sort -k 2 -k 1 -u file | awk 's2!=$2 {s1=$1;s2=$2;next} s1!="" {print s1,s2; s1=""} {print}'

explanation:

- sort by the second column first, then the first column, and eliminate duplicates.
- feed the result into awk (or perl, or ...).
- if the saved second field differs from the current second field, save the current value pair and issue a 'next' to move on to the next line.
- (otherwise the saved second field equals the current one)
- if the saved first field is not empty, print the saved value pair and mark it as printed by clearing the saved first field.
- print the current line.
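The steps above can be sketched as a small shell function. This is an illustrative variant, not Hein's exact command: stream_pairs is a hypothetical name, the sort keys are limited to single fields (-k2,2 -k1,1), and field two is compared as a string with an explicit first-line check, so a key such as "0" is not mistaken for the initial empty value.

```shell
#!/bin/sh
# stream_pairs is a hypothetical name. sort -u drops repeated combinations;
# each sort key is limited to a single field.
stream_pairs() {
    sort -k2,2 -k1,1 -u "$@" | awk '
        { key = $2 "" }                            # compare field two as a string
        NR == 1 || key != s2 { s1 = $1; s2 = key; next }   # new key: save pair, print nothing
        s1 != "" { print s1, s2; s1 = "" }         # second member: flush the saved pair once
        { print }                                  # print the current line
    '
}

printf '%s\n' '999 662' '474 662' '333 662' '123 457' '474 662' '811 363' '474 662' | stream_pairs
```

Only one saved pair is kept at any time, so memory use does not grow with the input; the price is the sort.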


I like the way JRF makes the 'seen' hash entries into lists.
Here is a solution using that technique but not requiring a sort, so the output will be in order of input, with the requested restrictions:

#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
    my ($left,$right)=split;
    if (defined $seen{$right}) {                   # seen before?
        my @prior = @{$seen{$right}};              # grab current list
        next if grep { $_ eq $left } @prior;       # eliminate dups (exact match)
        print "$prior[0] $right\n" if (1==@prior); # first time?
        print;
    }
    push ( @{$seen{$right}}, $left);               # remember this one
}
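For comparison, here is a rough awk equivalent of the order-preserving Perl script above, assuming whitespace-separated fields. It is a sketch, not Hein's code: ordered_pairs is a hypothetical name, and instead of storing a list per key it keeps a count of distinct field-one values plus the first line seen for each key.

```shell
#!/bin/sh
# ordered_pairs is a hypothetical name. Single pass, no sort: output follows
# input order, and each combination is printed at most once.
ordered_pairs() {
    awk '
        (($1, $2) in seen) { next }          # combination already handled
        {
            seen[$1, $2] = 1
            n = ++count[$2]                  # distinct field-one values for this key
            if (n == 1) {
                first[$2] = $0               # hold the first line; it may never print
            } else {
                if (n == 2)
                    print first[$2]          # second distinct value: release the held line
                print
            }
        }
    ' "$@"
}

printf '%s\n' '123 789' '234 500' '999 662' '373 881' '474 662' '611 200' '515 789' '809 424' '234 500' | ordered_pairs
```

On the first sample in this thread it prints 999 662, 474 662, 123 789 and 515 789, the four lines TDW listed, in that order.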


Regards,
Hein.