SKSingh_1
Frequent Advisor

AWK problem

I have the following input:
Date time utilization
10/27/2007 00:00 13.82
10/27/2007 00:01 20.95
10/27/2007 00:02 10.79
10/27/2007 00:03 11.78
10/27/2007 00:04 11.77
10/27/2007 00:05 9.96
10/27/2007 00:06 12.69
10/27/2007 00:07 18.94
10/27/2007 00:08 17.91
10/27/2007 00:09 18.00

The desired output should show utilization per 5-minute interval:
10/27/2007 00:00
10/27/2007 00:05
11 REPLIES
Hasan  Atasoy
Honored Contributor

Re: AWK problem

If you put that data in a file named "file", the following shell script will help you.

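#!/bin/sh
# Average every 5 consecutive records. Shell arithmetic is integer-only,
# so the floating-point sums are delegated to awk.
cnt=0
sum_util=0
while read date time util
do
    case $date in [0-9]*) ;; *) continue ;; esac   # skip the header line
    sum_util=$(echo "$sum_util $util" | awk '{ print $1 + $2 }')
    cnt=$((cnt + 1))
    if [ "$cnt" -eq 5 ]; then
        gen_avg=$(echo "$sum_util" | awk '{ printf "%.2f", $1 / 5 }')
        printf '%s\t%s\t%s\n' "$date" "$time" "$gen_avg"
        cnt=0
        sum_util=0   # start the next group from zero
    fi
done < file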

my 10 cents please :)
SKSingh_1
Frequent Advisor

Re: AWK problem

The problem is relatively complex.

The requirement is to take both date and time into consideration.

So the solution should work even when the input is:
10/27/2006 00:00 13.82
10/27/2006 00:01 20.95
10/28/2006 00:01 10.79
10/28/2006 00:03 11.78
10/28/2006 00:04 11.77
10/30/2006 00:08 9.96
10/30/2006 00:09 12.69
10/30/2006 00:10 18.94

In this case the output should be:
10/27/2006 00:00
10/28/2006 00:00
10/30/2006 00:05
James R. Ferguson
Acclaimed Contributor

Re: AWK problem

Hi:

Your question/request has so little definition as not to merit a suggestion until you better define your objective.

Your first post said "...per 5 minute utilization" but your second post showed data with only three, very long hour periods per day.

Do you want the average of up to five intervals if they are confined to a discrete day? Is the average simply the sum of the observed values divided by the number of samples? Or is the average the sum of the observed values divided by the total time elapsed for the (up to five) samples?

Don't make the reader guess what you think you want. Post some real input and some manually calculated output to match it.

Lastly, unlike your previous post closures, don't say "thanks, I got it" without indicating _which_ solution satisfied your original question (and why). Doing so enhances the quality of the thread for those who follow.

Regards!

...JRF...
SKSingh_1
Frequent Advisor

Re: AWK problem

Hi James, thank you for the suggestion; I will take care of it.

In the given problem, the process data collected may vary over time. Sometimes the process executes for just 1-2 minutes, and sometimes for many minutes.

The requirement now is to get the average utilization for each completed instance.

For example, see the data collection and the desired output that I posted above.
Hein van den Heuvel
Honored Contributor
Solution

Re: AWK problem

The challenge in awk is to recognize a time range to correspond with a measurement.
Normally you'd like to use something like Perl's timelocal() function to map to seconds and chop into a range.
With that it would be easy to see with integer math whether a new record is a certain number (5) of minutes beyond the range or not.
We cannot 'subtract' the string times to see whether 5 minutes have elapsed.

If we may assume that the records are nicely in order, then we may have a string workaround. For each input line we can find the string which corresponds to the beginning of a range.
Now remember the current range.
If a record no longer has the same range (doesn't matter whether it is the same day or hour still!) then report what was accumulated so far.
And of course report the last range after the last record read.
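
The range-start computation described above can be tried on its own. This is only a minimal sketch: range_start is a hypothetical helper name that does not appear in the thread, and it assumes the time arrives as HH:MM.

```shell
# Hypothetical helper (not from the thread): print the start of the
# 5-minute range that a given HH:MM time falls into.
range_start() {
    echo "$1" | awk '{
        split($1, t, ":")                        # t[1] = hours, t[2] = minutes
        printf "%02d:%02d\n", t[1], 5 * int(t[2] / 5)
    }'
}

range_start 00:07   # prints 00:05
range_start 08:28   # prints 08:25
```

Because the result is a plain string, two records belong to the same range exactly when their strings compare equal, which is all the script needs.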

For example, check out a potential solution below. (Remove debug information for production version).

$ cat x.awk
function range_report() {
printf ("-- %s %d %5.2f\n", old_range, range_count, range_value/range_count);
}
/^[0-9]+\// { # Starts with a date?
split ( $2, x, ":" ); # hours and minutes
range = sprintf ("%s %02d:%02d", $1, x[1], 5*int(x[2]/5))
# print range, x[1], x[2], $0;
if (range == old_range) {
range_count++;
range_value += $3;
} else {
if (range_count) range_report();
range_count = 1;
range_value = $3;
old_range = range;
}
}
END { if (range_count) range_report() } # avoid dividing by zero when no record matched
$ cat x
10/27/2007 00:00 13.82
10/27/2007 00:01 20.95
10/27/2007 00:02 10.79
10/27/2007 00:03 11.78
10/27/2007 00:04 11.77
10/27/2007 00:05 9.96
10/27/2007 00:06 12.69
10/27/2007 00:07 18.94
10/27/2007 08:28 17.91
10/27/2007 08:29 18.00
10/28/2006 00:01 10.79
10/28/2006 00:03 11.78
10/28/2006 00:04 11.77
10/30/2006 00:08 9.96
10/30/2006 00:09 12.69
10/30/2006 00:10 18.94
$ awk -f x.awk x
-- 10/27/2007 00:00 5 13.82
-- 10/27/2007 00:05 3 13.86
-- 10/27/2007 08:25 2 17.95
-- 10/28/2006 00:00 3 11.45
-- 10/30/2006 00:05 2 11.32
-- 10/30/2006 00:10 1 18.94

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
SKSingh_1
Frequent Advisor

Re: AWK problem

"PERFECT" solution hein.
This is what I was looking for.

Thank you
Sandman!
Honored Contributor

Re: AWK problem

Try the awk construct below. It breaks up the 2nd field into an hour and minute field and it accrues the sum of the last field until a multiple of five comes across in the minute field after which the process is repeated again.

awk '{
m=z[split($2,z,":")]
h=z[1]
v=int(m/5)*5
i=(v<10?$1" "h":"0v:$1" "h":"v)
p[i]+=$NF
}END{for(i in p) print i,p[i]/5}' file
Hein van den Heuvel
Honored Contributor

Re: AWK problem

Not bad Sandman! Not bad.

I think the request was for the average, which means dividing not by 5 (minutes) but by the number of occurrences. Another array is needed for that
(or a single array storing a more complex value with sum and count).

Minor comment: I believe you can not exactly control the order of output.

Also, very minor, I would replace the hard-to-read conditional assign of 'i' with a sprintf.

awk '{
m=z[split($2,z,":")]
h=z[1]
i=sprintf ("%s %s:%02d",$1,h,int(m/5)*5)
p[i]+=$NF
count[i]++
}END{for(i in p) print i,p[i]/count[i]}' file
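
On the output-order comment: a possible way to keep the ranges in the order they first appear in the input, without an external sort, is to record each new key in a numerically indexed array. The following is only a sketch under that assumption; avg_by_range and the array names order, sum and cnt are hypothetical and not from the thread.

```shell
# Hypothetical wrapper (not from the thread): average utilization per
# 5-minute range, reported in the order the ranges first appear.
avg_by_range() {
    awk '/^[0-9]+\// {
        split($2, t, ":")
        key = sprintf("%s %s:%02d", $1, t[1], 5 * int(t[2] / 5))
        if (!(key in sum)) order[++n] = key     # remember first appearance
        sum[key] += $3
        cnt[key]++
    }
    END {
        for (k = 1; k <= n; k++)
            printf "%s %.2f\n", order[k], sum[order[k]] / cnt[order[k]]
    }' "$@"
}
```

It could be called as avg_by_range file, or fed from a pipe. The numeric loop in END replaces for (i in p), whose traversal order awk does not guarantee.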

Hein.
Arturo Galbiati
Esteemed Contributor

Re: AWK problem

Hi Hein,
you're right. To fix the issues with the input and the order of the output:

awk ' /^[0-9]+\// {
m=z[split($2,z,":")]
h=z[1]
i=sprintf ("%s %s:%02d",$1,h,int(m/5)*5)
p[i]+=$NF
count[i]++
}
END {for(i in p) print i,p[i]/count[i]}' f1|\
sort -t "/" -k3 -k1 -k2
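
One thing to watch when "/" is the separator: the third field then holds the year together with the time and the value, so the time may be compared before the month and day. A possible alternative, sketched here with a hypothetical helper name sort_by_date, is to keep whitespace as the separator and address character positions inside the date field. It assumes zero-padded MM/DD/YYYY dates at the start of each line.

```shell
# Hypothetical helper (not from the thread): sort lines of the form
# "MM/DD/YYYY HH:MM value" chronologically.
sort_by_date() {
    # Field 1 is the date: characters 7-10 are the year, 1-2 the month,
    # 4-5 the day. Field 2 is the zero-padded time, which sorts as text.
    sort -k1.7,1.10 -k1.1,1.2 -k1.4,1.5 -k2,2 "$@"
}
```

The awk output can be piped into it, for example awk '...' f1 | sort_by_date.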

Just my .2$
HTH,
Art
Sandman!
Honored Contributor

Re: AWK problem

Well Hein and Art thanks for pointing out the subtleties that I overlooked. And yes the sprintf function makes the code legible instead of the ternary operator.

>http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1158441<

BTW solution in above thread is truly "the work of Art" ;)

And here's the improved version of the awk construct I posted earlier.

awk '{
m=z[split($2,z,":")]
h=z[1]
i=sprintf ("%s %s:%.2d",$1,h,int(m/5)*5)
p[i]+=$NF
c[i]++
}END{for(i in p) print i,p[i]/c[i] | "sort -t/ -k3,3.4"}' file
SKSingh_1
Frequent Advisor

Re: AWK problem

Great people great solution.

I have used the first solution from Hein and it's working.

I haven't checked the other solutions. Points were given to the other solutions on the basis of their improved comments.