D. Jackson_1
Honored Contributor

Help with script to extract lines from file.

Hi,

Having an issue trying to figure out the following.
I have a file with lots of columns in it.
I need to capture columns 130, 169, 3, 191, and 200.

My output looks similar to this when I capture the above.

12345678 +27.50 2008-08-15UFSY705226
12345678 -13.25 2008-08-15UFSY705226
12345678 +4.15 2008-08-15UFSY705226
12345678 +30.00 2008-08-15UFSY705226
87654321 +120.55 2008-08-15UFSY710415
87654321 -50.50 2008-08-15UFSY710415
87654321 +10.95 2008-08-15UFSY710415
43215678 +11.50 2008-08-15UFSY710026
43215678 -5.35 2008-08-15UFSY710026
43215678 +10.00 2008-08-15UFSY710026

Final output is supposed to look like this.
Adding up column 2 for each value of column 1 is where I am stuck.

12345678 +48.40 2008-08-15UFSY705226
87654321 +81.00 2008-08-15UFSY710415
43215678 +16.15 2008-08-15UFSY710026

TIA
D. Jackson
Patrick Wallek
Honored Contributor

Re: Help with script to extract lines from file.

What defines your "columns"? Are they space delimited? Delimited by some other character?

If space delimited, something like:

# awk '{print $130, $169, $3, $191, $200}' somefilename

should work.
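
If the fields turn out to be separated by something other than whitespace, awk's -F option names the separator. A minimal sketch, assuming a hypothetical pipe-delimited input and using fields 5, 3 and 1 as stand-ins for the real column numbers (the function name extract_fields is made up for illustration):

```shell
# Hypothetical helper: print fields 5, 3 and 1 of pipe-delimited input.
# Swap in the real separator and field numbers for the actual data.
extract_fields() {
    # -F sets the input field separator; the commas in print emit
    # the output separator, which is a single space by default.
    awk -F'|' '{print $5, $3, $1}' "$@"
}

# Made-up two-line sample, read from standard input.
printf '%s\n' 'a|b|c|d|e' '1|2|3|4|5' | extract_fields
```

With no file argument the function reads standard input; with one, it reads that file.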
James R. Ferguson
Acclaimed Contributor
Solution

Re: Help with script to extract lines from file.

Hi:

# cat ./mypl
#!/usr/bin/perl
use strict;
use warnings;
my @F;
my @prev;
my ( $sum, $first ) = ( 0, 1 );
while (<>) {
    @F = split;
    if ( $first == 1 ) {
        $first = 0;
        (@prev) = (@F);
    }
    if ( $F[0] eq $prev[0] ) {
        if ( $F[1] =~ m{\-(\d+\.\d+)} ) {
            $sum -= ($1);
        }
        elsif ( $F[1] =~ m{\+?(\d+\.\d+)} ) {
            $sum += ($1);
        }
    }
    else {
        $prev[1] = $sum;
        print "@prev\n";
        (@prev) = (@F);
        $sum = $F[1];
    }
}
END {
    $prev[1] = $sum;
    print "@prev\n";
}
1;

...run as:

# ./mypl file

...using your sample data, this yields:

# ./mypl file
12345678 48.4 2008-08-15UFSY705226
87654321 81 2008-08-15UFSY710415
43215678 16.15 2008-08-15UFSY710026

Regards!

...JRF...
Hein van den Heuvel
Honored Contributor

Re: Help with script to extract lines from file.

Here is what I have, using your data pasted into a file x and then doubled up side by side with # paste x x > y

$ perl -lane '$k=$F[3]; $v{$k} += $F[1]; $r{$k}=$F[2].$F[5]}{ for (sort keys %v) {print "$_ $v{$_} $r{$_}"}' y
12345678 48.4 2008-08-15UFSY7052262008-08-15UFSY705226
43215678 16.15 2008-08-15UFSY7100262008-08-15UFSY710026
87654321 81 2008-08-15UFSY7104152008-08-15UFSY710415

For the real file, where @F counts from zero (so column 130 is $F[129]), that should be
$ perl -lane '$k=$F[129]; $v{$k} += $F[168]; $r{$k}=$F[2].$F[190].$F[199]}{ for (sort keys %v) {print "$_ $v{$_} $r{$_}"}' file

$k = key
%v = values
%r = rest of columns

Many questions/suggestions though...

1) How should the rest of the columns be put together? Just glued, or 'join'ed with a delimiter?

2) What if the other columns change? Ignore? Take the first values? Take the last values (as in the example)? Report as an issue?

3) If the last column in the input is selected, then add a 'chomp' to the input loop

4) How should the output be ordered? Ignore? Order of arrival? Order of key (as in the example)? Order of summed value?

5) The work perl has to do can be reduced by making the storing of 'the rest' conditional on the key already being present in the value hash.
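
One possible set of answers to the questions above, sketched in awk rather than perl: keep the first 'rest' seen for each key, report in order of arrival, and store the rest only when the key is new. It assumes the input has already been reduced to three whitespace-separated columns (key, signed amount, rest) like the sample in the question; the function name sum_by_key is made up for illustration.

```shell
# Hypothetical helper: sum column 2 per key, input need not be sorted.
sum_by_key() {
    awk '
    !($1 in sum) {          # first time this key is seen
        order[++n] = $1     # remember order of arrival
        rest[$1] = $3       # keep only the first "rest" value
    }
    { sum[$1] += $2 }       # "+27.50" and "-13.25" convert numerically
    END {
        for (i = 1; i <= n; i++)
            printf "%s %+.2f %s\n", order[i], sum[order[i]], rest[order[i]]
    }' "$@"
}

# Made-up unsorted sample: key A appears before and after key B.
printf '%s\n' 'A +1.50 x' 'B -2.00 y' 'A +2.25 x' | sum_by_key
```

The %+.2f format prints an explicit sign, which matches the layout asked for in the question.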

Enjoy,
Hein.
Hein van den Heuvel
Honored Contributor

Re: Help with script to extract lines from file.


Note... in case that was not clear, JRF's example reports in order of arrival. It typically requires the input to be pre-sorted; otherwise it could report multiple lines with the same key. For now it does not handle selected columns for the 'rest', but that is just SMOP.

The reason I make this comment is to highlight the particular output order.

btw... Unless I am missing something really subtle, it can also be simplified a great deal once you realize that perl will automagically handle the + versus - when adding a split field.


use strict;
use warnings;
my ( @F, $sum, $key, $previous_text );
my $previous_key = '';
my $format = qq(%s %6.2f %s\n);
while (<>) {
    chomp;
    @F = split;
    $key = $F[3];
    if ($key eq $previous_key) {
        $sum += $F[1];    # perl handles the leading + or - itself
    } else {
        # key changed: report the finished group, then start a new one
        printf($format, $previous_key, $sum, $previous_text) if $previous_key ne '';
        $sum = $F[1];
        $previous_key = $key;
        $previous_text = $F[2].$F[5];    # Alter!
    }
}
# report the last group, unless the input was empty
printf($format, $previous_key, $sum, $previous_text) if $previous_key ne '';



For non-sorted input, while still retaining the order of arrival, you might consider something very similar to my prior one-liner:

#!/usr/bin/perl
use strict;
use warnings;
my (@F, $key, @keys, %value, %text);
while (<>) {
    chomp;
    @F = split;
    my $key = $F[3];
    if (!defined $value{$key}) {
        push @keys, $key;
        $value{$key} = $F[1];
        $text{$key} = $F[2].$F[5]; # Alter!
    } else {
        $value{$key} += $F[1];
    }
}
for $key (@keys) {
    printf ("%s %6.2f %s\n", $key, $value{$key}, $text{$key});
}

Note... the field numbers XXX in $F[XXX] need to be adjusted to the actual data.


hth,
Hein.
Dennis Handly
Acclaimed Contributor

Re: Help with script to extract lines from file.

Keeping Hein's comments in mind, you can use this awk script. Here I have assumed you have already rearranged columns 130, 169, 3, 191, and 200. Also, I assume that 3, 191 and 200 are the same if 130 is the same. So it just adds up the new column 2, if the first column is the same:
awk '
BEGIN {
    getline    # first
    S1 = $1
    S2 = $2
    S3 = $3
}
{
    if ($1 == S1) {
        S2 += $2    # add
    } else {
        printf "%s %.2f %s\n", S1, S2, S3
        S1 = $1    # save
        S2 = $2
        S3 = $3
    }
}
END { printf "%s %.2f %s\n", S1, S2, S3 } ' file
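
The output requested in the question carries an explicit sign (+48.40), which %.2f drops. A variant sketch under the same assumptions as the script above (three columns, lines with the same key adjacent), using %+.2f and avoiding getline; the function name sum_sorted is made up for illustration:

```shell
# Hypothetical helper: sum column 2 over runs of adjacent lines
# that share the same key in column 1.
sum_sorted() {
    awk '
    NR == 1 { key = $1; rest = $3 }     # start the first group
    $1 != key {                         # key changed: flush and restart
        printf "%s %+.2f %s\n", key, total, rest
        key = $1; rest = $3; total = 0
    }
    { total += $2 }                     # signed strings convert numerically
    END { if (NR) printf "%s %+.2f %s\n", key, total, rest }
    ' "$@"
}

# Sample data from the question.
sum_sorted <<'EOF'
12345678 +27.50 2008-08-15UFSY705226
12345678 -13.25 2008-08-15UFSY705226
12345678 +4.15 2008-08-15UFSY705226
12345678 +30.00 2008-08-15UFSY705226
87654321 +120.55 2008-08-15UFSY710415
87654321 -50.50 2008-08-15UFSY710415
87654321 +10.95 2008-08-15UFSY710415
43215678 +11.50 2008-08-15UFSY710026
43215678 -5.35 2008-08-15UFSY710026
43215678 +10.00 2008-08-15UFSY710026
EOF
```

The END guard on NR keeps an empty input from printing a stray line.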
D. Jackson_1
Honored Contributor

Re: Help with script to extract lines from file.

Thanks to all who replied. Your input helped me in resolving my issues.

Very much appreciated.

D. Jackson
D. Jackson_1
Honored Contributor

Re: Help with script to extract lines from file.

Thread Closed...