
Re: Doing Sum of Colums with awk

 
SOLVED
Chris Frangandonis
Regular Advisor

Doing Sum of Colums with awk

Hi All,

I am stuck trying to sum certain columns with awk. What I am trying to achieve is the following:

First row
=========
If column 2 is 7 (Data ID = 7), the next byte will be the length of the field (198). Thereafter
the columns start (9 bytes long each). The following 6 bytes will be 6 (Data ID = 6), and following that will
be the next length (1656) with its columns.

Second row
===========
Column 2 is 0 (Data ID = 0) and the next byte is the length, 198 (do nothing, as there are no columns).
Following that, 6 bytes will be 1 (Data ID = 1) and the next 6 bytes are the length (282).
Thereafter come the columns up to the length (282), max 19 columns. The following 6 bytes are 5
(Data ID = 5) and the following bytes are the length (270), max 25 columns.

Third row
==========
Column 2 is 7 (Data ID = 7) and the next byte is the length, 198. Now add what was in column 1 of the first row
to column 1 of the third row (358316 + 280236), and so on for all other columns. When we get to Data ID = 6, do the same:
add all columns from row 1 (Data ID = 6) to the third row where Data ID = 6, column by column.


Example File
See Attachment

In other words, try to add up the columns for each Data ID:
06-11-2600:151 7 198 358316 20172 116524 200888 2124 16585 117573 189039 8310 0 20 0 92 24 48 2932 2664 0 3692 0 3172
06-11-2600:301 7 198 280236 15376 112812 131512 2144 12743 131538 102357 1122 0 100 0 32 28 8 3792 3420 0 3196 32 2576
06-11-2600:451 7 198 295100 12696 113328 146524 3412 8619 115907 114759 1493 0 280 0 8 0 32 3696 3260 0 2764 0 2300
Sum Data ID 7 933652 48244 342664 sum sum .......

Output

Data ID = 7 933652 48244 342664 $4 $5 $6 $7 $8 etc.................................
Data ID = 6 0 0 0 284 0 0 0 27000 .....
Data ID = 0 (No Columns)
Data ID = 1 0 0 0 0 0 0 0 0 0 71 0
Data ID = 5 0 518 0 0 0 etc

Hope this is clear enough

Many Thanks
Chris
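
For the simple case where every value is separated by whitespace and each row carries a single series, the summing asked for above can be sketched as follows. This is only a minimal sketch under those assumptions: the function name sum_by_id is made up for illustration, column 2 is taken to be the Data ID, column 3 the length, and everything from column 4 onwards a value to be added.

```sh
sum_by_id() {
  awk '
    NF >= 3 {
      id = $2                                       # Data ID
      if (NF - 3 > ncols[id]) ncols[id] = NF - 3    # widest row seen for this ID
      for (c = 4; c <= NF; c++) sum[id, c - 3] += $c
    }
    END {
      for (id in ncols) {                           # order of IDs is unspecified
        printf "Data ID = %s", id
        for (c = 1; c <= ncols[id]; c++) printf " %d", sum[id, c]
        printf "\n"
      }
    }
  ' "$@"
}
```

Fed the three Data ID 7 sample rows (for example via `sum_by_id sample.txt`, where sample.txt is a hypothetical file holding those rows), it should print a single "Data ID = 7" line starting 933652 48244 342664. It does not cope with rows that contain more than one series.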
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Colums with awk

Here is a solution using the awk auto-split to fields.
I only tried this on Windows, where awk seems to be limited to 100 fields, which is not enough for your needs. So you may have to use 'substr' to select a chunk from the input line and split that. (Save the line, assign a chunk to $0, use $1 .. $NF, then move on to the next chunk.)
I'll leave that part of the exercise for you :-)
I left the debug print statements in there.
I assume the length is always the same for each data-id, or at least the last one counts (sic).

Enjoy!
Hein.

-------------- test.awk ---------

{ if (NF > 4) {
    print "nf=", NF;                  # debug
    i = 2;
    while (i < NF) {
      type = $(i++);                  # Data ID
      if (type > max) { max = type };
      size = $(i++);                  # length of this series
      val = $i;
      print "type = ", type, "size = ", size, "value = ", val;   # debug
      fields[type] = size / 9;
      for (j=1; j < fields[type]; j++) {
        a[type,j] += $(i++);
        if (i > NF) { j = 999 };      # ran out of fields: leave the loop
      }
    }
  }
}
END {
  for (i=0; i<=max; i++) {
    printf ("\nDATA ID = %d", i);
    for (j=1; j < fields[i]; j++) {
      printf (" %d", a[i,j]);
    }
  }
}
----------------------------------
C:\Temp>awk -f test.awk test.txt
nf= 100
type = 7 size = 198 value = 358316
type = 6 size = 1656 value = 0
nf= 51
type = 0 size = 198 value = 1
type = 5 size = 270 value = 0
nf= 100
type = 7 size = 198 value = 280236
type = 6 size = 1656 value = 0
nf= 51
type = 0 size = 198 value = 1
type = 5 size = 270 value = 0
nf= 100
type = 7 size = 198 value = 295100
type = 6 size = 1656 value = 0
nf= 51
type = 0 size = 198 value = 1
type = 5 size = 270 value = 0

DATA ID = 0 3 846 0 0 0 0 0 0 0 0 0 71 0 0 0 0 0 0 0 0 0
DATA ID = 1
DATA ID = 2
DATA ID = 3
DATA ID = 4
DATA ID = 5 0 518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DATA ID = 6 0 0 0 284 0 0 0 27000 0 0 0 0 0 13139 9468 36802 265 1569 30758 71 203 924 4793 64 0 5747 0 109 0 17 0 0 0 0 1 0 0 0 0 1
0 0 27 0 11 2 15 0 0 0 0 0 0 0 26 0 4 2 10 0 0 0 0 0 0 0 7 0 0 5712 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DATA ID = 7 933652 48244 342664 478924 7680 37947 365018 406155 10925 0 400 0 132 52 88 10420 9344 0 9652 32 8048
C:\Temp>
Peter Godron
Honored Contributor

Re: Doing Sum of Colums with awk

Chris,
awk has a limit of 199 fields per record, you may have to work with gawk !
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hi Hein,

Many thanks. I think you understood what I was getting at. I don't understand your statement "(Save line, assign chunk to $0, use $1.. $NF, next chunk)". Could you please explain?
Would it be possible to do the following: e.g. when the Data ID is 7, use the 198 (length) as substr($3,0,length($3)), and the same for 6 1656, 0 198, etc.?

Thanks again
Chris
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Colums with awk


Why would I help you further when you reward the first effort with a slap in the face: '0 points' !

Let's assume that was an accident.

Here is an example using the re-assignment to $0, causing awk to re-evaluate the split.

It 'almost' works. There is some inconsistency in the data, or a tweak needed in the offset calculation.
Basically the data format stinks. It is almost fixed column, but not exactly.

Check this out.
(my time is up for this exercise)



{ if (NR > 1) {
    $0 = substr ($0, 15);             # skip first column
    print "nf=", NF;
    while (NF > 1) {
      type = $1;
      if (type > max) { max = type };
      size = $2;
      print "*" substr ($0,1,20) "* type = ", type, "size = ", size ;
      fields[type] = size / 9;
      if (size > 882) { fields[type] = 97 }
      for (j=3; j < fields[type] + 2 ; j++) {
        a[type,j] += $j;
      }
      $0 = substr ($0, index($0,$2) + length($2) + size);   # re-split on the rest of the line
    }
  }
}
END {
  for (i=0; i<=max; i++) {
    printf ("\nDATA ID = %d", i);
    for (j=3; j < fields[i] + 2; j++) {
      printf (" %d", a[i,j]);
    }
  }
}
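
The re-assignment of $0 used in the script above can also be shown in isolation. The following is a minimal sketch, assuming strictly fixed-width columns (which the real data does not quite have); sum_fixed_width and its two parameters are hypothetical names. It saves the line, assigns one chunk at a time to $0 so that awk re-splits only that chunk, and adds up $1 .. $NF.

```sh
sum_fixed_width() {
  # $1 = column width in bytes, $2 = columns per chunk (both hypothetical parameters)
  awk -v width="$1" -v per_chunk="${2:-50}" '
    {
      line = $0                          # save the full record
      total = 0
      step = width * per_chunk           # chunk size: a whole number of columns
      for (pos = 1; pos <= length(line); pos += step) {
        $0 = substr(line, pos, step)     # awk re-splits; $1..$NF now cover this chunk only
        for (f = 1; f <= NF; f++) total += $f
      }
      print total                        # one grand total per input line
    }'
}
```

Because each chunk holds at most per_chunk columns, NF never gets larger than that number, whatever the length of the original line.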

Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hi Hein,

I did not realise that I had allocated "0" points to you. I initially gave 7 points, but it seems my mouse wheel rolled the value over to "0". Can we rectify this, and how?
My apologies once again.

Thanks
Chris
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hi Hein,

Thanks a lot. It is 90% there, with some small questions. If it is possible for you to spend a bit more time on this, I have a couple of points that could help:
1) I get the following error: "cannot have more than 199 fields". I was thinking of setting FS=" " and then using substr. I have only awk installed on the servers, not nawk or gawk.
2) For the first series (Data ID = 7) the columns are 9 bytes wide, but for the others (Data ID = 5, 6, 61, 62, 1) they are 6 bytes wide.

Thanks Again
Chris
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Colums with awk

A little closer, but not exactly there.

>>> 1) I get the following error: "cannot have more than 199 fields". I was thinking of setting FS=" " and then using substr. I have only awk installed on the servers, not nawk or gawk.

Yeah, you pretty much have to give up on the automatic field separators and just go by position. Except... that the positions are not entirely predictable in the provided data.

>>> 2) For the first series (Data ID = 7) the columns are 9 bytes wide, but for the others (Data ID = 5, 6, 61, 62, 1) they are 6 bytes wide.

That was NOT clear from the question, but it was clear from the data... now that you mention it. Using the $x fields hides the length.

It also seems that the size for the first series does not include the type + size fields themselves, but for subsequent series it does !?!

Here is a rewrite using offsets and a lot of fudges and adders to try to make it fit.

Good luck!
---------------------------------
{ if (NR > 1) {
    line = $0
    len = length($0)
    print "== " NR " : " len
    pos = 15                          # skip first column
    adder = 10                        # first size is exclusive of type/size
    while (len - pos > 10) {
      $0 = substr (line,pos,20)
      fudge = index($0,$2) + length($2)
      type = $1
      size = $2
      seen[type] = 1

      print "*" $0 "* type = ", type, "size = ", size
      width = (type == 7) ? 9 : 6
      fields[type] = int((size - fudge) / width)
      for (j=0; j < fields[type]; j++) {
        a[type,j] += substr(line, pos + fudge + j*width, width)
      }
      pos = pos + size + adder + fudge - 10
      adder = 0                       # after first time just add size
    }
  }
}
END {
  for (i in seen) {
    printf ("\nDATA ID=%d, Fields=%d : ", i, fields[i] )
    for (j=0; j < fields[i]; j++) {
      printf (" %d", a[i,j])
    }
  }
}


--------------


C:\Temp>awk -f test.awk tmp.txt | more
== 2 : 1878
* 7 198 358316 * type = 7 size = 198
* 6 1656 0 * type = 6 size = 1656
== 3 : 774
* 0 198 * type = 0 size = 198
* 1 282 0 * type = 1 size = 282
* 5 270 0 22* type = 5 size = 270
== 4 : 1878
* 7 198 280236 * type = 7 size = 198
* 6 1656 0 * type = 6 size = 1656
== 5 : 777
* 0 198 * type = 0 size = 198
* 1 282 0 * type = 1 size = 282
* 5 270 0 1* type = 5 size = 270
== 6 : 1878
* 7 198 295100 * type = 7 size = 198
* 6 1656 0 * type = 6 size = 1656
== 7 : 775
* 0 198 * type = 0 size = 198
* 1 282 0 * type = 1 size = 282
* 5 270 0 1* type = 5 size = 270

DATA ID=0, Fields=31 : 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
DATA ID=1, Fields=45 : 0 0 0 0 0 0 0 0 0 71 0 0 0...
DATA ID=5, Fields=43 : 0 518 0 0 0 0 0 0 0 0 0 0 0...
DATA ID=6, Fields=274 : 0 0 0 284 0 0 0 27000 0 0...
0 1 0 0 0 0 1 0 0 27 0 11 2 15 0 0 0 0 0 0 0 26 0 4...
6 0 0 0 0 0 0 0 0 0 469 0 0 0 0 0 1091 0 0 0 0 10 1...
0 13944 0 21 261 0 0 8 0 0 0 0 0 13944 0 0 0 0 0 0...
4 1559 71 0 0 0 0 0 1 6 23 172 5137 4 0 7 65 0 2 10...
0 0 0 0 0 0 0 0 0 24 0 0 0 0 0 0 0 0 0
DATA ID=7, Fields=20 : 933652 48244 342664 478924...
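
The core step of the script above is reading fixed-width columns with substr rather than through $1 .. $NF. A stripped-down sketch of just that step follows, assuming every row consists only of columns of one width, with no timestamp or type/size header in front. The name sum_fixed_columns and its arguments are illustrative and not part of the original script.

```sh
sum_fixed_columns() {
  # $1 = column width in bytes, $2 = number of columns; rows are read from stdin
  awk -v width="$1" -v ncols="$2" '
    {
      for (c = 0; c < ncols; c++)
        sum[c] += substr($0, 1 + c * width, width)   # leading blanks are ignored in the conversion
    }
    END {
      for (c = 0; c < ncols; c++) printf "%s%d", (c ? " " : ""), sum[c]
      printf "\n"
    }'
}
```

Called as `sum_fixed_columns 9 21` it would cover 21 columns of 9 bytes each; how many header bytes to skip first is exactly the part that needed the fudge factors in the real data.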
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hi Hein,

Firstly, I would like to thank you for your time and your assistance with my problem. You are 99% there, but I still get the "cannot have more than 199 fields" error. I was thinking of zipping the file and attaching it so that you could see that I am not changing any columns. If I reduce the number of columns, it works well.
I am using TextPad to read the file and not Notepad, as Notepad is a bit different.

What am I doing wrong?
Please help.


Many Thanks
Chris


Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Colums with awk


Hello Chris,

>> I still get the "cannot have more than 199 fields"

I don't have access to an HPUX box to try this just now. Is that just a warning and it continues or a hard failure?
Awk on windows does not fail, just does not give access to fields over $99

>> I was thinking of zipping the file and attaching it so that you could see that I am not changing any columns.

And that looks much cleaner!

>> I am using textpad to read the file and not notepad as notepad is a bit different.

I'm also using TextPad (and sometimes Crimson).
Since you are looking at the data on Windows, how about processing it on Windows!?

>> What am I doing wrong.

You are using standard awk which is primitive. You need to learn awk better.. if you are going to use it. You probably should be using perl, the data is 'ugly'. Other than that all is well ! :-).

The main part of the script now uses 'substr' to select fields. It still uses $1 and $2 for free-format input. Replace those with substr as well, and then you can call awk with -Fx, giving a non-whitespace separator, so that it no longer sees all those fields. It's up to you to do the column counting!


Cheers,
Hein.
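
A sketch of that last suggestion, under the assumption that the character | never occurs in the data: with -F'|' awk sees each record as a single field, and the type and size are picked out of a 20-byte slice with split() instead of $1 and $2. The function name first_series_header is hypothetical, and only the first series of each line is looked at.

```sh
first_series_header() {
  awk -F'|' '
    {
      n = split(substr($0, 15, 20), part, " ")   # whitespace split of a short slice only
      if (n >= 2) print NF, part[1], part[2]     # NF stays 1: the record itself was not split
    }'
}
```

For a line such as the first sample row this should print NF followed by the type and size, i.e. "1 7 198".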
Peter Godron
Honored Contributor

Re: Doing Sum of Colums with awk

Hi,

>> I still get the "cannot have more than 199 fields"

Have you forgotten my initial post?
From "man awk":
" DIAGNOSTICS
awk supports up to 199 fields ($1, $2, ..., $199) per record."

That's on my HP-UX 11.11 box.
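
To see which records would run into such a limit without asking awk to split them, the words can be counted with gsub() while the field separator is set to a character assumed not to occur in the data (| here). This is a hedged sketch only: wide_records is a made-up name, the limit is passed in as an argument, and it has not been tried on the HP-UX awk discussed here.

```sh
wide_records() {
  # $1 = field limit to check against (199 for the awk quoted above); input on stdin
  awk -F'|' -v limit="$1" '
    {
      s = $0
      n = gsub(/[^ \t]+/, "&", s)    # count runs of non-blank characters, text unchanged
      if (n > limit) print NR ": " n " fields"
    }'
}
```

Running `wide_records 199 < datafile` (datafile being a placeholder name) would list the line numbers that need chunking or a different awk.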
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hi Gordon,

Firstly, thank you, and yes, you are right. Two questions on gawk:
1) Which gawk version should I use, and is http://ftp.gnu.org/gnu/gawk/ the right site to get it from?
2) Can Hein's awk script be adapted to run under gawk? I am not too familiar with gawk.

Many Thanks
Chris
Peter Godron
Honored Contributor

Re: Doing Sum of Colums with awk

Chris,
http://hpux.connect.org.uk/hppd/hpux/Gnu/gawk-3.1.5/
or
http://mirrors.develooper.com/hpux/

The script should pretty much run as is, but I am not an awk/gawk expert.

Perhaps one of the experts has found a way around the 199 field limit ...?
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hi Hein,

Are you familiar with gawk, and can your script run with gawk?

Thanks
Chris
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Colums with awk

Sorry, can not try anything on hpux just now.
I'm out at a customer site and did not set up my rx2620 for remote access this time.
The 199 error *might* be coming from the debug statement 'print NF'. If so you are in luck. Just delete that line.

Hein.
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hi Hein,Peter,

I installed gawk on the HP-UX box and that did the trick. So far 100% done. Thanks, Peter.

Hein,
One last and final question:
I need to include the digit before Data ID i.e.
06-11-2600:45"1" 7 198
{ if (NR > 1) {
line = $0
len = length($0)
pos = 14 # Changed from 15 to 14
adder = 10 # first size is exclusive of type/size
while (len - pos > 10) {
$0 = substr (line,pos+1,20)
etc......
reason being :
DATA_Quality
=============
0 DEFAULT default value
1 SECURE data of this integration interval are secure
2 TIME CORR during this cycle the system time was changed by the MML command
3 NO INIT counter could not become initialized
If the value (DATA_Q) is 0 or 1, then continue adding; otherwise skip the line (do not take it into consideration, no adding is necessary).

I must admit Hein, your idea is brilliant, great stuff. I could learn a lot from you.

Thanks Again
Chris



Hein van den Heuvel
Honored Contributor
Solution

Re: Doing Sum of Colums with awk

You could change that first line to explicitly implement the formula in little steps:

{ q = substr ($0,14,1)
if ((NR > 1) && ((q == "1") || (q == "0"))) {
:

But I would fix it by using the standard awk regular expression filter technique. Change the first line with the IF to read:
/^..-..-....:..[01]/ {

and remove one of the closing "}" lines before the END section, since the new first line opens only one brace.

This makes the main section conditional on the line starting with (^) that date pattern, the (:), two anythings and a 0 or 1 ([01]).
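
As a small illustration of the pattern form, assuming the timestamp layout of the sample rows (06-11-2600:151, with the quality digit in position 14), the following hypothetical helpers keep only the lines whose quality digit is 0 or 1. The first uses the regular expression, the second the explicit substr test; on data in that layout both should select the same lines.

```sh
keep_good_quality() {
  # pattern form: the action runs only for lines matching the regular expression
  awk '/^..-..-....:..[01]/ { print }'
}

keep_good_quality_substr() {
  # explicit form: test the 14th character of every line
  awk '{ q = substr($0, 14, 1); if (q == "0" || q == "1") print }'
}
```

The pattern form also skips a heading line that does not look like a timestamp, which is why no separate NR > 1 test is needed there.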


>> I must admit Hein, your idea is brilliant, great stuff. I could learn a lot from you.

Thank you.
(For mere money I'll come over and teach some tricks and methods! I'm a gun for hire, consulting primarily in software/database/application performance space :-)

Regards,
Hein van den Heuvel
HvdH Performance Consulting
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Colums with awk

Hein,

Brilliant stuff. I used the search "/..-[[01]/ and it works. Magic.
One last favor: if I ever need some help, which I know I will, would it be possible to mail you my questions? It will be once in a blue moon (no nagging), and if so, please post or mail me your address.

Many Thanks and keep up the good stuff
Chris
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Colums with awk

It's in my profile, or google for hein hvdh
[0 points for this please :-]

Hein.