Pull data from file with script tool?

TDW · ‎10-19-2007

If I have a file like this:

123 789
234 500
999 662
373 881
474 662
611 200
515 789
809 424
234 500

I want to return each row that has a value in field two that appears with one or more different values in field one. So from this list, I want:

999 662
474 662
123 789
515 789

I do not want:

234 500
234 500

because that value matches. I want to see when a value repeats in field two with a different value in field one. Make sense?

I'm sure some awk or perl expert can knock this one out quick. Thanks for the help in advance!

Hein van den Heuvel · ‎10-19-2007

What needs to happen with a 3rd value pair?
- when the first pairs were the same: "432 500"
- when they were not: "456 789"

Here is one way:

perl -n x.pl x
--- x.pl ---
($left,$right)=split;
$prior = $seen{$right};
if (defined $prior) {
    if ($left ne $prior) {
        print "$prior $right\n$left $right\n";
        delete $seen{$right}
    }
} else {
    $seen{$right} = $left;
}
-------------

So this ignores repeats and tosses a pair once printed. Output is in order of appearance.

Or this....
Sort first by key 2, then look and remember.
This breaks on a 3rd pair as written.

$ sort -k 2 x | perl -ne '($a,$b)=split; print "$x $y\n$a $b\n" if $a ne $x and $b eq $y; $x=$a; $y=$b'
474 662
999 662
123 789
515 789

I think you want this....

----------- x.pl ----------
($left,$right)=split;
$prior = $seen{$right};
if (defined $prior) {
    if ($left ne $prior) {
        print "$prior $right\n" unless $header{$right}++;
        print "$left $right\n";
    }
} else {
    $seen{$right} = $left;
}
----------------------------

Hein.

TDW · ‎10-19-2007

I want every line returned where the same value in field two has multiple values in field one. So there could be two or more lines returned for each value in field two. Does that explain it? Thanks for your assistance!

TDW · ‎10-19-2007

After reading your response again, I think I need to explain a little more.

I just need to see every combination once where the second field has more than one value in field one.

So from this list:

999 662
474 662
333 662
123 457
474 662
811 363
474 662

return:

999 662
474 662
333 662

That help?

Hein van den Heuvel · ‎10-19-2007

That's what my final piece of code will do when put into a file and executed as...

perl -n x.pl x

Did you try?

Hein.

James R. Ferguson · ‎10-20-2007

Hi:

Here's another approach:

# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
my ( $first, $second );
while (<>) {
    ( $first, $second ) = split;
    push( @{ $seen{$second} }, $first );
}
for $second ( sort keys %seen ) {
    next if scalar( @{ $seen{$second} } ) == 1;
    for $first ( sort @{ $seen{$second} } ) {
        print "$first $second\n";
    }
}
1;

...run as:

# sort -u file | ./filter
333 662
474 662
999 662

The code begins by using a 'sort' to eliminate duplicate lines. The Perl script splits the lines into two pieces. The second element is used as a hash key and each first element pushed into an array associated with that key.

When all the data has been assimilated, arrays with only one element are skipped. The remaining elements of each array are then printed as associated with their hash key.

Regards!

...JRF...

Sandman! · ‎10-20-2007

Try the awk construct below. It does what you are looking for:

awk '
{
x[$2]++
if (f[$2]) {
if ($0!=f[$2]) f[$2]=f[$2]"\n"$0
else delete f[$2]
} else f[$2]=$0
} END {for (i in f) if (x[i]>1) print f[i]}' file

James R. Ferguson · ‎10-20-2007

Hi (again):

Instead of having to externally sort and pipe the sorted file to the Perl script, you can use this version:

# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
my ( $first, $second );
# read the de-duplicated lines from a "sort -u" child process
open( FH, "-|", "sort", "-u", @ARGV ) or die;
while (<FH>) {
    ( $first, $second ) = split;
    push( @{ $seen{$second} }, $first );
}
for $second ( sort keys %seen ) {
    next if scalar( @{ $seen{$second} } ) == 1;
    for $first ( sort @{ $seen{$second} } ) {
        print "$first $second\n";
    }
}
1;

...now, simply do:

# ./filter file
333 662
474 662
999 662

Regards!

...JRF...

Sandman! · ‎10-20-2007

Based on the sample input provided, viz...

999 662
474 662
333 662
123 457
474 662
811 363
474 662

Disregard my last post and instead use the script posted below...

# sort -u file | awk '{i=$2;x[i]++;f[i]=f[i]?f[i]"\n"$0:$0}END{for(j in f) if(x[j]>1) print f[j]}'
333 662
474 662
999 662

~hope it helps

Hein van den Heuvel · ‎10-21-2007

(Is this horse dead yet? :-)

JRF, Sandman,...

Once one decides on sorting anyway, then sort it properly and there is no longer a need to remember anything but the last value pair! So no 'end' processing is needed.

sort -k 2 -k 1 -u file | awk 's2!=$2 {s1=$1;s2=$2;next} s1!="" {print s1,s2; s1=""} {print}'

explanation:

- sort by second column first, first column next, eliminate dups.
- feed into awk (or perl or...)
- if saved-second is not equal to the current second, then save the current value pair and be done, issuing a 'next', ready for the next match.
- (else saved-second was equal to current second)
- if saved-first is not empty, then print the saved value pair and mark it as printed by clearing saved-first.
- print current line

I like the way JRF makes the 'seen' array entries lists.
Here is a solution using that technique, but not requiring a sort, though the output will be in order of input, with the requested restrictions:

#!/usr/bin/perl
use strict;
my %seen;
while (<>) {
    my ($left,$right)=split;
    if (defined $seen{$right}) {               # seen before?
        my @prior = @{$seen{$right}};          # grab current list
        next if grep { $_ eq $left } @prior;   # eliminate dups (exact match, not prefix)
        print "$prior[0] $right\n" if (1==@prior); # first time?
        print;
    }
    push ( @{$seen{$right}}, $left);           # remember this one
}
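
For comparison, the same single-pass, input-order idea can be sketched in awk. This is only an illustration under the thread's assumptions (two whitespace-separated fields per line); the function name multi_first and the array names are made up for this example and are not from any of the posts above.

```shell
# Sketch only: print each distinct "first second" pair whose second field
# occurs with more than one distinct first field, in order of appearance.
# The name multi_first is hypothetical.
multi_first() {
  awk '
    seen[$1, $2]++ { next }        # exact pair already handled: skip the repeat
    {
      n = ++count[$2]              # distinct first-field values so far for this key
      if (n == 1) {                # first pair for this key: hold it, print nothing yet
        first[$2] = $0
        next
      }
      if (n == 2) print first[$2]  # second distinct value: release the held pair
      print                        # print the current pair
    }
  ' "$@"
}

# TDW's second sample list, fed on stdin
printf '%s\n' '999 662' '474 662' '333 662' '123 457' \
  '474 662' '811 363' '474 662' | multi_first
```

On that sample it should print 999 662, 474 662 and 333 662, one per line. It can also be called with a file name, as in "multi_first file".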

Regards,
Hein.

TDW · ‎10-21-2007

Thanks for the replies guys! All three of you came up with a viable solution via different methods. They all return these values from my original test data:

474 662
999 662
123 789
515 789

Hein, the last code you gave seems to start running and then hang, like this:

perl -n perlHein.pl /tmp/testdata
999 662
474 662

Not sure why that is happening.

Thanks much for sharing your skill!

Hein van den Heuvel · ‎10-21-2007

In my first solutions I used "perl -n x.pl x".

The x.pl is the perl script text.
The x was the data file name.
The -n tells perl to create an implied loop to process each input line.

The last solution was presented as a full program, invoking perl itself, and with the loop explicitly coded.
Loop: while (<>) {

That solution should be invoked with:
./script_name file_name
Sorry for not making that clear.
And I forgot to mark the 'retain formatting' option while posting, so the loop was not visible with the indenting either!

The 'hang' is reading from STDIN for more data to process.
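
A small demonstration of that double loop, as a sketch only: the function name demo_double_loop, the temporary file names and the "got:" prefix are all made up here. STDIN is redirected from /dev/null so that the extra read ends at once instead of waiting at the terminal.

```shell
# Hypothetical demo of running a script that already loops with while (<>)
# under "perl -n", which wraps a second while (<>) loop around it.
demo_double_loop() {
  dir=$(mktemp -d) || return 1
  printf '%s\n' 'while (<>) { print "got: $_"; }' > "$dir/loop.pl"
  printf '%s\n' one two three > "$dir/data"

  printf '%s\n' '--- perl -n loop.pl data'
  # The outer (implied) loop consumes line one, the inner loop prints the
  # rest, then the outer loop asks for more input and falls back to STDIN.
  perl -n "$dir/loop.pl" "$dir/data" </dev/null

  printf '%s\n' '--- perl loop.pl data'
  # Without -n only the explicit loop runs, so every line is seen.
  perl "$dir/loop.pl" "$dir/data"

  rm -rf "$dir"
}

demo_double_loop
```

The first run should show only "got: two" and "got: three"; the second shows all three lines. That matches what TDW saw: the first data line went missing and perl then sat waiting on STDIN.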

Sorry 'bout that confusion!
Hein.

Sandman! · ‎10-21-2007

Agree with JRF that there is no need to first sort the file and then pipe it to an external program when the entire procedure can be scripted within awk. That said here is an improved version of the awk construct posted earlier:

# cat file
123 789
234 500
999 662
373 881
474 662
611 200
515 789
809 424
234 500

# awk '{i=$2;if(x[i] && x[i]!=$1) print x[i],i"\n"$0;x[i]=$1}' file
999 662
474 662
123 789
515 789

Hein van den Heuvel · ‎10-21-2007

Sandman!

Two rules to practice this week...

1) As simple as possible, but no simpler
2) leave well enough alone

Your new example only works for certain data sets. It is much like my first solution and fails for much the same reason: it remembers only one left value for a given right value, where there might be a list.
And on each 'fresh' value it reprints the saved value, potentially making fresh dups as it goes. Try this data set...
999 662
474 662
333 662
123 457
474 662
811 363
474 662
123 789
234 500
999 662
373 881
474 662
611 200
515 789
809 424
234 500
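
To see the effect, the one-liner can be run against that data set and its output counted. This is a sketch: the function names hein_data and last_only are made up for the example, and the awk program is the one-liner from the post above, changed only to read stdin.

```shell
# The 16-line data set from this post, verbatim.
hein_data() {
  printf '%s\n' '999 662' '474 662' '333 662' '123 457' '474 662' '811 363' \
    '474 662' '123 789' '234 500' '999 662' '373 881' '474 662' '611 200' \
    '515 789' '809 424' '234 500'
}

# The one-liner under test: it keeps only the last first-field value per key.
last_only() {
  awk '{i=$2;if(x[i] && x[i]!=$1) print x[i],i"\n"$0;x[i]=$1}'
}

hein_data | last_only | wc -l             # how many lines are printed
hein_data | last_only | sort -u | wc -l   # how many of those are distinct
```

Tracing it by hand, the first count comes to 12 and the second to 5, so more than half of the printed lines are repeats of pairs already shown.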

Cheers,
Hein.

Sandman! · ‎10-22-2007

Good point Hein as I found out on closer inspection and with the dataset you supplied. And yes your Perl code is algorithmically similar to the one I wrote in awk.

Each record in the input file is replaced by the succeeding one and since the file is not sorted the dups fall thru the cracks. But this horse ain't dead yet ;) so here is the final version of the awk construct.

awk '{
if (x[$0]!=$0) {
y[$2]=(y[$2]?y[$2]"\n"$0:$0); z[$2]++
} x[$0]=$0
}END{
for (i in y) if (z[i]>1) print y[i]
}' file
