
Chris Frangandonis
Regular Advisor

Doing Sum of Columns with awk

Hi All,

I am stuck trying to calculate sums of certain columns with awk. What I am trying to achieve is the following:

First row
=========
If column 2 is 7 (Data ID = 7), the next byte will be the length of the field (198). Thereafter
the columns start (9 bytes in length each). The following 6 bytes will be 6 (Data ID = 6), and following that will
be the next length (1656) with its columns.

Second row
===========
Column 2 is 0 (Data ID = 0) and the next byte is the length, 198 (do nothing, as there are no columns).
Following that, the next 6 bytes will be 1 (Data ID = 1) and the next 6 bytes are the length (282).
Thereafter come the columns up to that length (282), max 19 columns. The following 6 bytes are 5
(Data ID = 5) and the following bytes are the length (270), max 25 columns.

Third row
==========
Column 2 is 7 (Data ID = 7) and the next byte is the length, 198. Now add what was in column 1 of the first row
to column 1 of the third row (358316 + 280236), and so on for all the other columns. When we get to Data ID = 6, do the same:
add all columns from row 1 (Data ID = 6) to the third row where Data ID = 6, column 1, and so on.


Example File
See Attachment

In other words, try to add up the columns for each Data ID:
06-11-2600:151 7 198 358316 20172 116524 200888 2124 16585 117573 189039 8310 0 20 0 92 24 48 2932 2664 0 3692 0 3172
06-11-2600:301 7 198 280236 15376 112812 131512 2144 12743 131538 102357 1122 0 100 0 32 28 8 3792 3420 0 3196 32 2576
06-11-2600:451 7 198 295100 12696 113328 146524 3412 8619 115907 114759 1493 0 280 0 8 0 32 3696 3260 0 2764 0 2300
Sum Data ID 7 933652 48244 342664 sum sum .......
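(For reference, the Data ID 7 sums above work out as: 358316 + 280236 + 295100 = 933652, 20172 + 15376 + 12696 = 48244, and 116524 + 112812 + 113328 = 342664.)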

Output

Data ID = 7 933652 48244 342664 $4 $5 $6 $7 $8 etc.................................
Data ID = 6 0 0 0 284 0 0 0 27000 .....
Data ID = 0 (No Columns)
Data ID = 1 0 0 0 0 0 0 0 0 0 71 0
Data ID = 5 0 518 0 0 0 etc

Hope this is clear enough

Many Thanks
Chris
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Columns with awk

Here is a solution using the awk auto-split to fields.
I only tried this on Windows, where awk seems to be limited to 100 fields, which is not enough for your need. So you may have to use 'substr' to select a chunk from the input line and split that. (Save the line, assign a chunk to $0, use $1..$NF, move to the next chunk.)
I'll leave that part of the exercise for you :-)
I left the debug print statements in there.
I assume the length is always the same for each data-id, or at least the last one counts (sic).

Enjoy!
Hein.

-------------- test.awk ---------

{ if (NF > 4) {
    print "nf=", NF;
    i = 2;
    while (i < NF) {
      type = $i++;
      if (type > max) { max = type };
      size = $i++;
      val = $i;
      print "type = ", type, "size = ", size, "value = ", val;
      fields[type] = size / 9;
      for (j = 1; j < fields[type]; j++) {
        a[type,j] += $i++;
        if (i > NF) { j = 999 };
      }
    }
  }
}
END {
  for (i = 0; i <= max; i++) {
    printf ("\nDATA ID = %d", i);
    for (j = 1; j < fields[i]; j++) {
      printf (" %d", a[i,j]);
    }
  }
}
----------------------------------
C:\Temp>awk -f test.awk test.txt
nf= 100
type = 7 size = 198 value = 358316
type = 6 size = 1656 value = 0
nf= 51
type = 0 size = 198 value = 1
type = 5 size = 270 value = 0
nf= 100
type = 7 size = 198 value = 280236
type = 6 size = 1656 value = 0
nf= 51
type = 0 size = 198 value = 1
type = 5 size = 270 value = 0
nf= 100
type = 7 size = 198 value = 295100
type = 6 size = 1656 value = 0
nf= 51
type = 0 size = 198 value = 1
type = 5 size = 270 value = 0

DATA ID = 0 3 846 0 0 0 0 0 0 0 0 0 71 0 0 0 0 0 0 0 0 0
DATA ID = 1
DATA ID = 2
DATA ID = 3
DATA ID = 4
DATA ID = 5 0 518 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DATA ID = 6 0 0 0 284 0 0 0 27000 0 0 0 0 0 13139 9468 36802 265 1569 30758 71 203 924 4793 64 0 5747 0 109 0 17 0 0 0 0 1 0 0 0 0 1
0 0 27 0 11 2 15 0 0 0 0 0 0 0 26 0 4 2 10 0 0 0 0 0 0 0 7 0 0 5712 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DATA ID = 7 933652 48244 342664 478924 7680 37947 365018 406155 10925 0 400 0 132 52 88 10420 9344 0 9652 32 8048
C:\Temp>
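
A minimal, untested sketch of the "save the line, assign a chunk to $0" idea mentioned above, assuming blank-separated values and an arbitrary chunk width of about 70 characters. It just sums every number on each line to show the mechanism; it is not the Data ID parser, and on an awk that already rejects the long record on input, FS would also need to be set to a character that never occurs in the data (see the note at the end of the thread).

------- chunking sketch (untested) -------
{
    line = $0
    total = 0
    while (line != "") {
        chunk = substr(line, 1, 72)
        # do not cut a number in half: trim the chunk back to the last blank
        while (length(chunk) > 1 && length(chunk) < length(line) && substr(chunk, length(chunk), 1) != " ")
            chunk = substr(chunk, 1, length(chunk) - 1)
        $0 = chunk                          # assigning to $0 makes awk re-split
        for (i = 1; i <= NF; i++)           # $1..$NF now cover only this chunk,
            total += $i                     # so the field limit is never reached
        line = substr(line, length(chunk) + 1)
    }
    print "sum of all numbers on this line:", total
}
------------------------------------------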
Peter Godron
Honored Contributor

Re: Doing Sum of Columns with awk

Chris,
awk has a limit of 199 fields per record; you may have to work with gawk!
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Columns with awk

Hi Hein,

Many thanks. I think you understood what I was getting at. I don't understand your statement "(Save the line, assign a chunk to $0, use $1..$NF, move to the next chunk)". Could you please explain?
Would it be possible to do the following: e.g. when the Data ID is 7, use the 198 (length) as substr($3,0,length($3)), and do the same for 6/1656 and 0/198, etc.?

Thanks Again
Chris
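
(One note on the substr() idea above: awk's substr(s, start, len) counts positions from 1, not 0, so substr($3, 1, length($3)) just gives back $3 itself. Picking the data apart by length needs substr() applied to $0, or to a saved copy of the whole line, with computed offsets, which is what the scripts further down do.)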
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Columns with awk


Why would I help you further when you reward the first effort with a slap in the face: '0 points'!

Let's assume that was an accident.

Here is an example using the re-assignment to $0, causing awk to re-evaluate the split.

It 'almost' works. There is some inconsistency in the data, or a tweak needed in the offset calculation.
Basically the data format stinks. It is almost fixed column, but not exactly.

Check this out.
(my time is up for this exercise)



{ if (NR > 1) {
    $0 = substr ($0, 15);    # skip first column
    print "nf=", NF;
    while (NF > 1) {
      type = $1;
      if (type > max) { max = type };
      size = $2;
      print "*" substr ($0,1,20) "* type = ", type, "size = ", size;
      fields[type] = size / 9;
      if (size > 882) { fields[type] = 97 }
      for (j = 3; j < fields[type] + 2; j++) {
        a[type,j] += $j;
      }
      $0 = substr ($0, index($0,$2) + length($2) + size);
    }
  }
}
END {
  for (i = 0; i <= max; i++) {
    printf ("\nDATA ID = %d", i);
    for (j = 3; j < fields[i] + 2; j++) {
      printf (" %d", a[i,j]);
    }
  }
}

Chris Frangandonis
Regular Advisor

Re: Doing Sum of Columns with awk

Hi Hein,

I did not realise that I had allocated "0" points to you. I initially gave 7 points, but it seems my mouse rolled over to "0". Can we rectify this, and how?
My apologies once again.

Thanks
Chris
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Columns with awk

Hi Hein,

Thanks a lot. We are 90% there, but I have a couple of small questions. If possible, could you spend a bit more time on this? The answers would really help:
1) I get the following error: "cannot have more than 199 fields". I was thinking of setting FS=" " and then using substr. I only have awk installed on the servers, not nawk or gawk.
2) With the first field, i.e. Data ID = 7, the columns are 9 bytes long, but for the others (Data ID = 5, 6, 61, 62, 1) they are 6 bytes long.

Thanks Again
Chris
Hein van den Heuvel
Honored Contributor

Re: Doing Sum of Columns with awk

A little closer, but not exactly there.

>>> 1) I get the following error: "cannot have more than 199 fields". I was thinking of setting FS=" " and then using substr. I only have awk installed on the servers, not nawk or gawk.

Yeah, you pretty much have to give up on the automatic field separators, and just go by position. Except... that the positions are not entirely predictable in the provided data.

>>> 2) With the first field, i.e. Data ID = 7, the columns are 9 bytes long, but for the others (Data ID = 5, 6, 61, 62, 1) they are 6 bytes long.

That was NOT clear from the question, but it was clear from the data... now that you mention it. Using the $x fields hides the length.

It also seems that the size for the first series does not include the type + size fields themselves, but for subsequent series it does!?

Here is a rewrite using offsets and a lot of fudges and adders to try to make it fit.

Good luck!
---------------------------------
{ if (NR > 1) {
    line = $0
    len = length($0)
    print "== " NR " : " len
    pos = 15      # skip first column
    adder = 10    # first size is exclusive of type/size
    while (len - pos > 10) {
      $0 = substr (line, pos, 20)
      fudge = index($0, $2) + length($2)
      type = $1
      size = $2
      seen[type] = 1

      print "*" $0 "* type = ", type, "size = ", size
      width = (type == 7) ? 9 : 6
      fields[type] = int((size - fudge) / width)
      for (j = 0; j < fields[type]; j++) {
        a[type,j] += substr(line, pos + fudge + j*width, width)
      }
      pos = pos + size + adder + fudge - 10
      adder = 0   # after the first time, just add the size
    }
  }
}
END {
  for (i in seen) {
    printf ("\nDATA ID=%d, Fields=%d : ", i, fields[i])
    for (j = 0; j < fields[i]; j++) {
      printf (" %d", a[i,j])
    }
  }
}


--------------


C:\Temp>awk -f test.awk tmp.txt | more
== 2 : 1878
* 7 198 358316 * type = 7 size = 198
* 6 1656 0 * type = 6 size = 1656
== 3 : 774
* 0 198 * type = 0 size = 198
* 1 282 0 * type = 1 size = 282
* 5 270 0 22* type = 5 size = 270
== 4 : 1878
* 7 198 280236 * type = 7 size = 198
* 6 1656 0 * type = 6 size = 1656
== 5 : 777
* 0 198 * type = 0 size = 198
* 1 282 0 * type = 1 size = 282
* 5 270 0 1* type = 5 size = 270
== 6 : 1878
* 7 198 295100 * type = 7 size = 198
* 6 1656 0 * type = 6 size = 1656
== 7 : 775
* 0 198 * type = 0 size = 198
* 1 282 0 * type = 1 size = 282
* 5 270 0 1* type = 5 size = 270

DATA ID=0, Fields=31 : 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
DATA ID=1, Fields=45 : 0 0 0 0 0 0 0 0 0 71 0 0 0...
DATA ID=5, Fields=43 : 0 518 0 0 0 0 0 0 0 0 0 0 0...
DATA ID=6, Fields=274 : 0 0 0 284 0 0 0 27000 0 0...
0 1 0 0 0 0 1 0 0 27 0 11 2 15 0 0 0 0 0 0 0 26 0 4...
6 0 0 0 0 0 0 0 0 0 469 0 0 0 0 0 1091 0 0 0 0 10 1...
0 13944 0 21 261 0 0 8 0 0 0 0 0 13944 0 0 0 0 0 0...
4 1559 71 0 0 0 0 0 1 6 23 172 5137 4 0 7 65 0 2 10...
0 0 0 0 0 0 0 0 0 24 0 0 0 0 0 0 0 0 0
DATA ID=7, Fields=20 : 933652 48244 342664 478924...
Chris Frangandonis
Regular Advisor

Re: Doing Sum of Columns with awk

Hi Hein,

Firstly, I would like to thank you for your time and your assistance with my problem. You are 99% there, but I still get the "cannot have more than 199 fields" error. I was thinking of zipping the file and attaching it so that you could see that I am not changing any columns. If I reduce the number of columns, it works well.
I am using TextPad to read the file and not Notepad, as Notepad is a bit different.

What am I doing wrong?
Please help.


Many Thanks
Chris
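
Since the thread ends with the 199-field error still in the way, here is one more untested sketch of a possible way around it with the old awk: set FS to a character that never occurs in the data (a newline here), so the record is never split into more than one field, and pull the type/size pairs out of small substr() chunks instead. The 15-byte skip, the 9/6 byte widths and the position stepping follow Hein's last script and would very likely need the same fudge adjustments he describes.

------- no-field-split sketch (untested) -------
BEGIN { FS = "\n" }                 # whole line stays one field, so the
                                    # "cannot have more than 199 fields"
                                    # limit is never hit
NR > 1 {
    line = $0
    pos  = 15                       # skip the leading timestamp column
    while (length(line) - pos > 10) {
        n = split(substr(line, pos, 20), hdr, " ")   # tiny chunk: type and size
        type = hdr[1]; size = hdr[2]
        seen[type] = 1
        width = (type == 7) ? 9 : 6
        # first data column starts right after the size value
        datapos = pos + index(substr(line, pos), hdr[2]) + length(hdr[2])
        nf = int(size / width)
        if (nf > fields[type]) fields[type] = nf
        for (j = 0; j < nf; j++)
            a[type, j] += substr(line, datapos + j * width, width)
        pos = datapos + size        # stepping may need the adder/fudge
                                    # corrections from the script above
    }
}
END {
    for (t in seen) {
        printf("Data ID = %d :", t)
        for (j = 0; j < fields[t]; j++) printf(" %d", a[t, j])
        printf("\n")
    }
}
------------------------------------------------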