Re: Help with script to extract lines from file.

D. Jackson_1 · ‎08-25-2008

Hi,

Having an issue trying to figure out the following.
I have a file with lots of columns in it.
I need to capture column 130, 169, 3, 191, and 200.

My output looks similar to this when I capture the above.

12345678 +27.50 2008-08-15UFSY705226
12345678 -13.25 2008-08-15UFSY705226
12345678 +4.15 2008-08-15UFSY705226
12345678 +30.00 2008-08-15UFSY705226
87654321 +120.55 2008-08-15UFSY710415
87654321 -50.50 2008-08-15UFSY710415
87654321 +10.95 2008-08-15UFSY710415
43215678 +11.50 2008-08-15UFSY710026
43215678 -5.35 2008-08-15UFSY710026
43215678 +10.00 2008-08-15UFSY710026

Final output is supposed to look like this.
Adding up the column 2 values for each distinct value in column 1 is where I am stuck.

12345678 +48.40 2008-08-15UFSY705226
87654321 +81.00 2008-08-15UFSY710415
43215678 +16.15 2008-08-15UFSY710026

TIA
D. Jackson

Patrick Wallek · ‎08-25-2008

What defines your "columns"? Are they space delimited? Delimited by some other character?

If space delimited, something like:

# awk '{print $130, $169, $3, $191, $200}' somefilename

should work.
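
If the columns turn out to be separated by some other character, awk's -F option sets the field separator. A minimal sketch, assuming a comma-delimited file; the function name pick_columns and the three-column demo are made up for illustration, and the real field list would be $130, $169, $3, $191, $200:

```shell
# Hypothetical helper: print columns 3, 1 and 2 of comma-delimited input.
# -F sets the input field separator; replace ',' with the real delimiter.
pick_columns() {
    awk -F',' '{ print $3, $1, $2 }' "$@"
}

# Small demo on two made-up lines.
printf 'a,b,c\nd,e,f\n' | pick_columns
```

Fields come out separated by a single space, because print joins its arguments with awk's default output separator.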

James R. Ferguson · ‎08-25-2008

Hi:

# cat ./mypl
#!/usr/bin/perl
use strict;
use warnings;
my @F;
my @prev;
my ( $sum, $first ) = ( 0, 1 );
while (<>) {
    @F = split;
    if ( $first == 1 ) {
        $first = 0;
        (@prev) = (@F);
    }
    if ( $F[0] eq $prev[0] ) {
        if ( $F[1] =~ m{\-(\d+\.\d+)} ) {
            $sum -= ($1);
        }
        elsif ( $F[1] =~ m{\+?(\d+\.\d+)} ) {
            $sum += ($1);
        }
    }
    else {
        $prev[1] = $sum;
        print "@prev\n";
        (@prev) = (@F);
        $sum = $F[1];
    }
}
END {
    $prev[1] = $sum;
    print "@prev\n";
}
1;

...run as:

# ./mypl file

...using your sample data, this yields:

# ./mypl file
12345678 48.4 2008-08-15UFSY705226
87654321 81 2008-08-15UFSY710415
43215678 16.15 2008-08-15UFSY710026

Regards!

...JRF...

Hein van den Heuvel · ‎08-25-2008

Here is what I have, using your data pasted into a file x and then doubled up with # paste x x > y

$ perl -lane '$k=$F[3]; $v{$k} += $F[1]; $r{$k}=$F[2].$F[5]}{ for (sort keys %v) {print "$_ $v{$_} $r{$_}"}' y
12345678 48.4 2008-08-15UFSY7052262008-08-15UFSY705226
43215678 16.15 2008-08-15UFSY7100262008-08-15UFSY710026
87654321 81 2008-08-15UFSY7104152008-08-15UFSY710415

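For the real file that should be something like this, assuming whitespace-delimited columns (@F is zero-based, so column 130 is $F[129]):
$ perl -lane '$k=$F[129]; $v{$k} += $F[168]; $r{$k}=$F[2].$F[190].$F[199]}{ for (sort keys %v) {print "$_ $v{$_} $r{$_}"}' file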

$k = key
%v = values
%r = rest of columns

Many questions/suggestions though...

1) How should the rest of the columns be put together? Just glued, or joined with a delimiter?

2) What if the other columns change? Ignore? Take the first values? Take the last values (as in the example)? Report as an issue?

3) If the last column in the input is selected, then add a 'chomp' to the input loop.

4) How should the output be ordered? Ignore? Order of arrival? Order of key (as in the example)? Order of summed value?

5) The work perl has to do can be reduced by storing 'the rest' only when the key is not yet present in the value hash.
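
For question 2, the "report as issue" option could be sketched roughly as below. This is only an illustration in awk, assuming the input has already been reduced to three whitespace-delimited fields (key, amount, text); the function name check_rest is made up.

```shell
# Hypothetical check: report lines whose text column (3) differs from the
# first value seen for the same key (column 1). Assumes: key amount text
check_rest() {
    awk '
    !($1 in first) { first[$1] = $3; next }      # remember first text per key
    $3 != first[$1] {                            # a later line disagrees
        printf "line %d: key %s: %s differs from %s\n", NR, $1, $3, first[$1]
    }' "$@"
}
```

Run as check_rest file; no output means every key carries the same text on all of its lines.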

Enjoy,
Hein.

Hein van den Heuvel · ‎08-25-2008

Note, in case that was not clear: JRF's example reports in order of arrival and typically requires the input to be pre-sorted; otherwise it could report multiple lines with the same key. For now it does not handle selected columns for the 'rest', but that is just a SMOP.

The reason I make this comment is to highlight the particular output order.

By the way, unless I am missing something really subtle, it can also be simplified a great deal once you realize that perl will automagically handle the + versus - when adding a split field.

use strict;
use warnings;
my ( @F, $sum, $key, $previous_text );
my $previous_key = '';
my $format = qq(%s %6.2f %s\n);
while (<>) {
    chomp;
    @F = split;
    $key = $F[3];
    if ($key eq $previous_key) {
        $sum += $F[1];    # "+27.50" and "-13.25" convert to numbers as-is
    } else {
        # Key changed: report the finished group, then start a new one.
        printf($format, $previous_key, $sum, $previous_text) if $previous_key;
        $sum = $F[1];
        $previous_key = $key;
        $previous_text = $F[2].$F[5];    # Alter!
    }
}
# Report the last group.
printf($format, $previous_key, $sum, $previous_text) if $previous_key;

For non-sorted input, while still retaining the order of arrival, you might consider something quite similar to my prior one-liner:

#!/usr/bin/perl
use strict;
use warnings;
my (@F, $key, @keys, %value, %text);
while (<>) {
    chomp;
    @F = split;
    $key = $F[3];
    if (!defined $value{$key}) {
        push @keys, $key;    # remember the order of arrival
        $value{$key} = $F[1];
        $text{$key} = $F[2].$F[5]; # Alter!
    } else {
        $value{$key} += $F[1];
    }
}
for $key (@keys) {
    printf("%s %6.2f %s\n", $key, $value{$key}, $text{$key});
}

Note... the field numbers XXX in $F[XXX] need to be adjusted to the actual data.
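
One way to keep those field numbers in a single place is to pass them in as parameters. The sketch below does that in awk rather than Perl (awk fields are 1-based, so column 130 is $130). It assumes whitespace-delimited input, glues the 'rest' columns together without a delimiter, and keeps the first 'rest' seen per key; the function name sum_columns and the variables k, a and r are made up for illustration.

```shell
# Hypothetical helper: sum_columns KEYCOL AMOUNTCOL RESTCOLS [file...]
# RESTCOLS is a comma-separated list of column numbers, e.g. 3,191,200.
sum_columns() {
    key=$1; amt=$2; rest=$3; shift 3
    awk -v k="$key" -v a="$amt" -v r="$rest" '
    BEGIN { nrest = split(r, rcol, ",") }        # e.g. r="3,191,200"
    !($k in sum) {                               # first line for this key
        order[++n] = $k                          # remember order of arrival
        t = ""
        for (i = 1; i <= nrest; i++) t = t $(rcol[i])
        text[$k] = t                             # glued together, no delimiter
    }
    { sum[$k] += $a }                            # signed strings add as numbers
    END {
        for (i = 1; i <= n; i++)
            printf "%s %+.2f %s\n", order[i], sum[order[i]], text[order[i]]
    }' "$@"
}
```

For the file in the original question this would be called as sum_columns 130 169 3,191,200 file. The %+.2f format prints an explicit sign, as in the requested output.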

hth,
Hein.

Dennis Handly · ‎08-25-2008

Keeping Hein's comments in mind, you can use this awk script. Here I have assumed you have already rearranged columns 130, 169, 3, 191, and 200. Also, I assume that 3, 191 and 200 are the same if 130 is the same. So it just adds up the new column 2, if the first column is the same:
awk '
BEGIN {
    getline # first
    S1 = $1
    S2 = $2
    S3 = $3
}
{
    if ($1 == S1) {
        S2 += $2 # add
    } else {
        printf "%s %.2f %s\n", S1, S2, S3
        S1 = $1 # save
        S2 = $2
        S3 = $3
    }
}
END { printf "%s %.2f %s\n", S1, S2, S3 } ' file
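
The script above relies on all lines for a given key being adjacent. A rough pre-check for that assumption might look like this; the function name is_grouped is invented here, and it looks only at column 1:

```shell
# Hypothetical pre-check: exit non-zero if a key (column 1) reappears after
# a different key, i.e. the input is not grouped the way the script assumes.
is_grouped() {
    awk '
    $1 != prev {
        if ($1 in seen) { bad = 1; exit }        # key came back after a gap
        seen[$1] = 1
        prev = $1
    }
    END { exit bad }' "$@"
}
```

If is_grouped file fails, running the data through sort -k1,1 first would group the keys, at the cost of losing the original order of arrival.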

D. Jackson_1 · ‎08-26-2008

Thanks to all who replied. Your input helped me in resolving my issues.

Very much appreciated.

D. Jackson

D. Jackson_1 · ‎08-26-2008

Thread Closed...
