Operating System - HP-UX

Re: extraction of information

 
lublinsky
New Member

extraction of information

Would you be so kind as to help me? I need to extract some strings from a txt file (for example, numbers 10, 16 and 19). Each string contains only 4 numbers:
3 5 7 8
3 5 3 8
2 6 8 9


How can I do that?
Can I then calculate the average of each column?
Thank you in advance.

3 REPLIES
James R. Ferguson
Acclaimed Contributor

Re: extraction of information

Hi:

OK, let's leverage Perl. We'll skip any record that does not consist of exactly four fields. We'll also skip any record whose four fields are not digits only. Then we'll compute and print the average of each column, as requested.

# cat ./average
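#!/usr/bin/perl
use strict;
use warnings;
my (@nums, @sum);
my ($recs, $skip, $i) = (0, 0, 0);
while (<>) {
    @nums = split;
    $skip = 0;
    next unless $#nums == 3;          # need exactly four fields
    foreach (@nums) {
        $skip++ unless m/^\d+\z/;     # every field must be digits only
    }
    next if $skip;
    $recs++;
    for ($i = 0; $i < 4; $i++) {
        $sum[$i] += $nums[$i];        # running total per column
    }
}
die "no valid records found\n" unless $recs;   # avoid dividing by zero
for ($i = 0; $i < 4; $i++) {
    print "col-", $i+1, " = ", $sum[$i] / $recs, "\n";
}
1;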

...Run as:

# ./average inputfile

(or):

# ./average

[ and enter your data on standard input; Ctrl-D to end (usually) ]

For example:

# ./average
1 2 3 4
skip this line
1 2 3 4
10 10 10 10 will not be processed
0 0 0 0
col-1 = 0.666666666666667
col-2 = 1.33333333333333
col-3 = 2
col-4 = 2.66666666666667
#

Regards!

...JRF...
Hein van den Heuvel
Honored Contributor

Re: extraction of information


Lublinsky, welcome to the HP ITRC forum for HP-UX.

I'm afraid your question is not clear enough for me.
I don't see how the numbers 10, 16 and 19 relate to the table provided.

Extracting strings is readily done with awk, cut, and my favorite: perl.

For example, in awk, for each line read, each column becomes a field $1, $2 ... $NF, where NF is the number of fields, and the field numbers can be variables (as in $i).
So to calculate the averages you could use an awk one-liner:

# cat x
3 5 7 8
3 5 3 8
2 6 8 9
# awk '{lines++; for (i=1;i<=NF;i++) {c[i]+=$i}} END {for(i=1;i<=4;i++) {print c[i]/lines}}' x

c[i] is an array c indexed by i, where i runs from 1 to the number of fields.
lines maintains a running count of the lines read.
END {block} tells awk to execute 'block' at the end of input.

hth,
Hein.


The same line with debug data:
awk '{lines++; for (i=1;i<=NF;i++) {c[i]+=$i; print lines, i, $i, c[i]}} END {for(i=1;i<=4;i++) {print c[i]/lines}}' x
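
The original question was never pinned down, so the following is only a sketch under one possible reading: that "strings number 10, 16 and 19" means lines 10, 16 and 19 of the file. It reuses the c[i] accumulation from the one-liner above, but only for the selected lines. The file name sample.txt, its contents, and the helper name avg_selected are hypothetical and were made up for illustration; they do not come from the thread.

```shell
# Hypothetical sample data: 20 lines of four numbers each (not from the thread).
awk 'BEGIN { for (n = 1; n <= 20; n++) print n, n + 1, n + 2, n + 3 }' > sample.txt

# avg_selected FILE: hypothetical helper that prints lines 10, 16 and 19 of FILE,
# followed by the per-column average of just those lines.
avg_selected() {
    awk 'NR == 10 || NR == 16 || NR == 19 {
             print                          # show the extracted line
             kept++                         # count the lines we kept
             for (i = 1; i <= NF; i++) c[i] += $i
             if (NF > nf) nf = NF           # remember the widest line seen
         }
         END {
             if (kept == 0) exit            # nothing selected: avoid dividing by zero
             for (i = 1; i <= nf; i++) print "col-" i " = " (c[i] / kept)
         }' "$1"
}

avg_selected sample.txt
```

NR is awk's count of lines read so far, so the pattern in front of the first block selects lines by number. If the lines only need to be extracted, with no averaging, awk 'NR == 10 || NR == 16 || NR == 19' file is enough on its own.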




Sandman!
Honored Contributor

Re: extraction of information

Could you elaborate on your requirements? Your post does not make it clear what you are looking for.

thanks!