Re: AWK problem

SKSingh_1 · ‎10-27-2007

I have the following input:
Date time utilization
10/27/2007 00:00 13.82
10/27/2007 00:01 20.95
10/27/2007 00:02 10.79
10/27/2007 00:03 11.78
10/27/2007 00:04 11.77
10/27/2007 00:05 9.96
10/27/2007 00:06 12.69
10/27/2007 00:07 18.94
10/27/2007 00:08 17.91
10/27/2007 00:09 18.00

Desired output, showing utilization per 5-minute interval:
10/27/2007 00:00
10/27/2007 00:05

Hasan Atasoy · ‎10-27-2007

If you put that data in a file named "file", the following shell script may help you.

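#!/bin/sh
# Average every 5 consecutive samples. Shell arithmetic is integer-only,
# so bc does the floating-point math for values such as 13.82.
cnt=0
sum_util=0
while read date time util
do
  case $date in [0-9]*) ;; *) continue ;; esac  # skip the header line
  sum_util=$(echo "$sum_util + $util" | bc)
  cnt=$((cnt + 1))
  if [ "$cnt" -eq 5 ]; then
    gen_avg=$(echo "scale=2; $sum_util / 5" | bc)
    # date and time printed are those of the fifth sample in the group
    printf '%s\t%s\t%s\n' "$date" "$time" "$gen_avg"
    cnt=0
    sum_util=0  # reset the running sum for the next group
  fi
done < file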

my 10 cents please :)

SKSingh_1 · ‎10-27-2007

The problem is relatively complex.

The requirement is to take both date and time into consideration,

so the solution should work even when the input is
10/27/2006 00:00 13.82
10/27/2006 00:01 20.95
10/28/2006 00:01 10.79
10/28/2006 00:03 11.78
10/28/2006 00:04 11.77
10/30/2006 00:08 9.96
10/30/2006 00:09 12.69
10/30/2006 00:10 18.94

In this case the output should be
10/27/2006 00:00
10/28/2006 00:00
10/30/2006 00:05

James R. Ferguson · ‎10-28-2007

Hi:

Your question/request has so little definition as not to merit a suggestion until you better define your objective.

Your first post said "...per 5 minute utilization" but your second post showed data with only three, very long hour periods per day.

Do you want the average of up to five intervals if they are confined to a discrete day? Is the average simply the sum of the observed values divided by the number of samples? Or is the average the sum of the observed values divided by the total time elapsed for the (up to five) samples?

Don't make the reader guess what you think you want. Post some real input and some manually calculated output to match it.

Lastly, unlike your previous post closures, don't say "thanks, I got it" without indicating _which_ solution satisfied your original question (and why). Doing so enhances the quality of the thread for those who follow.

Regards!

...JRF...
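
The two readings of "average" raised above only differ when a window holds fewer than five samples. Here is a minimal sketch of the difference, using the three 10/28/2006 samples from the second post. The helper name two_averages is hypothetical, and dividing by the fixed 5-minute window length is only one possible reading of the time-based definition; it is an assumption, not something stated in the thread.

```shell
# two_averages: hypothetical helper, not taken from the thread.
# Reads "date time value" lines that all fall in one 5-minute window and
# prints the sum divided by the sample count and by the window length (5).
two_averages() {
  awk '{ sum += $3; n++ }
       END {
         if (n)   # avoid dividing by zero on empty input
           printf "per-sample %.2f, per-window %.2f\n", sum / n, sum / 5
       }'
}

printf '%s\n' \
  '10/28/2006 00:01 10.79' \
  '10/28/2006 00:03 11.78' \
  '10/28/2006 00:04 11.77' | two_averages
# prints: per-sample 11.45, per-window 6.87
```

Which of the two figures is wanted depends on whether a missing minute means "no data" or "zero utilization".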

SKSingh_1 · ‎10-28-2007

Hi James, thank you for the suggestion; I will take care of it.

In the given problem, the process data collected may vary over time. Sometimes the process executes for just 1-2 minutes, and sometimes for a larger number of minutes.

Now the requirement is to get the average utilization for each completed instance.

For example: I have posted the data collection and the desired output of the solution.

Hein van den Heuvel · ‎10-28-2007

The challenge in awk is to recognize which time range a measurement belongs to.
Normally you'd like to use something like Perl's timelocal() function to map each timestamp to seconds and chop that into ranges.
With that it would be easy to see with integer math whether a new record is a certain number of seconds (300, for 5 minutes) beyond the start of the range or not.
We cannot simply 'subtract' the time strings to see whether 5 minutes have elapsed.

If we may assume that the records are nicely in order, then we may have a string workaround. For each input line we can find the string which corresponds to the beginning of its range.
Now remember the current range.
If a record no longer has the same range (doesn't matter whether it is the same day or hour still!) then report what was accumulated so far.
And of course report the last range after the last record read.

For example, check out a potential solution below. (Remove debug information for production version).

$ cat x.awk
function range_report() {
    printf ("-- %s %d %5.2f\n", old_range, range_count, range_value / range_count);
}
/^[0-9]+\// {                            # Starts with a date?
    split ($2, x, ":");                  # x[1] = hours, x[2] = minutes
    range = sprintf ("%s %02d:%02d", $1, x[1], 5 * int(x[2] / 5))
    # print range, x[1], x[2], $0;      # debug
    if (range == old_range) {
        range_count++;
        range_value += $3;
    } else {
        if (range_count) range_report(); # range changed: report the previous one
        range_count = 1;
        range_value = $3;
        old_range = range;
    }
}
END { if (range_count) range_report() }  # last range; skipped if no records matched
$ cat x
10/27/2007 00:00 13.82
10/27/2007 00:01 20.95
10/27/2007 00:02 10.79
10/27/2007 00:03 11.78
10/27/2007 00:04 11.77
10/27/2007 00:05 9.96
10/27/2007 00:06 12.69
10/27/2007 00:07 18.94
10/27/2007 08:28 17.91
10/27/2007 08:29 18.00
10/28/2006 00:01 10.79
10/28/2006 00:03 11.78
10/28/2006 00:04 11.77
10/30/2006 00:08 9.96
10/30/2006 00:09 12.69
10/30/2006 00:10 18.94
$ awk -f x.awk x
-- 10/27/2007 00:00 5 13.82
-- 10/27/2007 00:05 3 13.86
-- 10/27/2007 08:25 2 17.95
-- 10/28/2006 00:00 3 11.45
-- 10/30/2006 00:05 2 11.32
-- 10/30/2006 00:10 1 18.94

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting

SKSingh_1 · ‎10-28-2007

"PERFECT" solution, Hein.
This is what I was looking for.

Thank you

Sandman! · ‎10-28-2007

Try the awk construct below. It splits the 2nd field into hour and minute, rounds the minute down to a multiple of five, and accumulates the sum of the last field for each resulting 5-minute interval.

awk '{
m=z[split($2,z,":")]
h=z[1]
v=int(m/5)*5
i=(v<10?$1" "h":"0v:$1" "h":"v)
p[i]+=$NF
}END{for(i in p) print i,p[i]/5}' file

Hein van den Heuvel · ‎10-28-2007

Not bad Sandman! Not bad.

I think the request was for the average to divide not by 5 (minutes) but by the number of occurrences. Another array is needed for that.
(or a single array storing a more complex value with sum and count)

Minor comment: I believe you cannot exactly control the order of output.

Also, very minor, I would replace the hard-to-read conditional assignment of 'i' with a sprintf.

awk '{
m=z[split($2,z,":")]
h=z[1]
i=sprintf ("%s %s:%02d",$1,h,int(m/5)*5)
p[i]+=$NF
count[i]++
}END{for(i in p) print i,p[i]/count[i]}' file

Hein.
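
The single-array alternative mentioned in this post could look like the sketch below. It is an illustration only: the function name bucket_avg_single_array and the "sum count" string layout are assumptions, not code from the thread. The sum is written back with sprintf because awk's default number-to-string conversion (CONVFMT, normally %.6g) would otherwise truncate larger sums.

```shell
# bucket_avg_single_array: hypothetical name; uses POSIX awk features only.
# Each array element holds "sum count" for one 5-minute interval.
bucket_avg_single_array() {
  awk '/^[0-9]+\// {
    split($2, t, ":")                          # t[1] = hour, t[2] = minute
    key = sprintf("%s %s:%02d", $1, t[1], int(t[2] / 5) * 5)
    split(acc[key], sc, " ")                   # sc[1] = sum, sc[2] = count; both empty on first use
    acc[key] = sprintf("%.10g %d", sc[1] + $3, sc[2] + 1)
  }
  END {
    for (key in acc) {                         # iteration order is unspecified
      split(acc[key], sc, " ")
      printf "%s %.2f\n", key, sc[1] / sc[2]
    }
  }' "$@"
}

# Usage, with the data in a file named "file" as elsewhere in the thread:
#   bucket_avg_single_array file | sort
```

Two parallel arrays, as in the code above, are simpler to read; the packed form mainly saves a second array name.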

Arturo Galbiati · ‎10-28-2007

Hi Hein,
you're right. To fix the issues with the input and the order of the output:

awk ' /^[0-9]+\// {
m=z[split($2,z,":")]
h=z[1]
i=sprintf ("%s %s:%02d",$1,h,int(m/5)*5)
p[i]+=$NF
count[i]++
}
END {for(i in p) print i,p[i]/count[i]}' f1|\
sort -t "/" -k3,3.4 -k1,1 -k2,2 -k3.6

Just my .2$
HTH,
Art

Sandman! · ‎10-29-2007

Well, Hein and Art, thanks for pointing out the subtleties that I overlooked. And yes, the sprintf function makes the code more legible than the ternary operator.

>http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1158441<

BTW solution in above thread is truly "the work of Art" ;)

And here's the improved version of the awk construct I posted earlier.

awk '{
m=z[split($2,z,":")]
h=z[1]
i=sprintf ("%s %s:%.2d",$1,h,int(m/5)*5)
p[i]+=$NF
c[i]++
}END{for(i in p) print i,p[i]/c[i] | "sort -t/ -k3,3.4"}' file
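
If the input is already in the order the report should follow, another option is to remember the order in which intervals first appear instead of sorting afterwards. A minimal sketch under that assumption follows; the function name bucket_avg_in_order and the order[] array are hypothetical additions, not part of any posted solution.

```shell
# bucket_avg_in_order: hypothetical name; uses POSIX awk features only.
# Prints one average per 5-minute interval, in order of first appearance.
bucket_avg_in_order() {
  awk '/^[0-9]+\// {
    split($2, t, ":")                          # t[1] = hour, t[2] = minute
    key = sprintf("%s %s:%02d", $1, t[1], int(t[2] / 5) * 5)
    if (!(key in sum)) order[++n] = key        # record each interval the first time it is seen
    sum[key] += $3
    cnt[key]++
  }
  END {
    for (i = 1; i <= n; i++)                   # numeric loop, so the order is deterministic
      printf "%s %.2f\n", order[i], sum[order[i]] / cnt[order[i]]
  }' "$@"
}

printf '%s\n' \
  '10/28/2006 00:01 10.79' \
  '10/28/2006 00:03 11.78' \
  '10/28/2006 00:04 11.77' \
  '10/30/2006 00:10 18.94' | bucket_avg_in_order
# prints:
# 10/28/2006 00:00 11.45
# 10/30/2006 00:10 18.94
```

Unlike the sorted variants, this keeps whatever order the input had, so it does not repair input that is out of sequence.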

SKSingh_1 · ‎10-29-2007

Great people, great solutions.

I have used the first solution from Hein and it's working.

I haven't checked the other solutions. Points were given to the other solutions on the basis of the improvements suggested in the comments.
