
data manipulation

 
lawrenzo_1
Super Advisor

data manipulation

Hello all,

Please could you provide some help / ideas:

I have a file that is updated every time a job starts or finishes against an Informix database. I would like to run some code against the file to determine the start and stop times; later I will extend the script to calculate the average run times.

the logfile entries are as follows:

start/end PID date time program user script option

S|1161480|20070720|1205|createWebOrdFile.4ge|cronlog|.|/usr/cs3/scripts/JDE/createWebOrdFile||
E|1161480|20070720|1205|createWebOrdFile.4ge|cronlog|.|/usr/cs3/scripts/JDE/createWebOrdFile||


What I am looking for is a script, run against the previous day's 24-hour period, that details every job that ran, with its start and stop times and the option the script was run with:

scriptname:4ge program name:start:stop:duration:option

any help will be much appreciated.

Thanks

Chris.

hello
Sandman!
Honored Contributor
Solution

Re: data manipulation

Try the awk script below:

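awk -F\| '{
# key the start and stop times on the PID ($2) so each job keeps its own values
if ($1=="S") {s[$2]=$8":"$5":"$3" "$4;str[$2]=$4}
else if ($1=="E") {e[$2]=$3" "$4;stp[$2]=$4}
# print only jobs that have both a start and an end record; duration is stop minus start (HHMM)
}END{for(i in s) if (i in e) print s[i]":"e[i]":"stp[i]-str[i]}' file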
Peter Nikitka
Honored Contributor

Re: data manipulation

Hi,

I'm sorting the description of all of the fields of the logfile you gave:
1 start(S) end(E)
2 PID (1161480)
3 date (20070720)
4 time (1205) => is that HHMM ?
5 program (createWebOrdFile.4ge)
6 user (cronlog)
7 script (.)
8 option (/usr/cs3/....)
9,10 ignored

Please form your requested output format like this:
"script:"(7) "prog:"(5) ...

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Sandman!
Honored Contributor

Re: data manipulation

Actually Peter is spot-on. The sample input provided is sketchy, as there isn't a one-to-one correspondence between the field names and the data. The data seems to be off by a few fields. The same goes for the output. Please provide a clearer example of the input and output.

~thanks
James R. Ferguson
Acclaimed Contributor

Re: data manipulation

Hi Chris:

> ...previous day for a 24hr period...

Would you define what you mean, please? Will that definition include records that start on one day and end on another?

Regards!

...JRF...
lawrenzo_1
Super Advisor

Re: data manipulation

Yes, this will include jobs that run from one day into another; however, I will output to a daily file and add a condition so that if a PID is not found, the current audit log is checked.

unless there are any other suggestions?
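As a rough sketch of that check, assuming the pipe-delimited layout from my first post: list every PID that has a start record on a given day but no end record in the files supplied, so those PIDs can then be looked up in the current audit log. The function name unfinished_pids and the array name pending are made up for this illustration.

```shell
# Hypothetical helper: unfinished_pids YYYYMMDD [logfile ...]
# Prints PID:HHMM for each job started on that day with no end record.
# Reads standard input when no log file is given.
unfinished_pids() {
    day=$1
    shift
    awk -F'|' -v day="$day" '
    # remember the start time of every job started on the requested day
    $1 == "S" && $3 == day { pending[$2] = $4 }
    # an end record for the PID, on any date, closes the job
    $1 == "E"              { delete pending[$2] }
    END { for (p in pending) print p ":" pending[p] }' "$@" | sort
}
```

Called as "unfinished_pids 20070720 logfile", it prints one line per job still open at the end of the file.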

Thanks

Chris
lawrenzo_1
Super Advisor

Re: data manipulation

guys:

1 start(S) end(E)
2 PID (1161480)
3 date (20070720)
4 time (1205) => is that HHMM ?
5 program (createWebOrdFile.4ge)
6 user (cronlog)
7 is where the script is run from (.)
8 script
9,option ie <script> - T1 being store identifier.

Field 7 can be ignored.

Thanks
Peter Nikitka
Honored Contributor

Re: data manipulation

Ok,

My solution does not use any values of the end entry except PID, date and time; other fields are not checked. Incomplete lines (with missing or additional fields) are ignored.
The PID is used as the unique identifier. Jobs whose start and end entries fall on different days are IGNORED (you said that this is okay!), so the runtime is based solely on 'time'. If necessary, this could be handled more gracefully.
The empty field 9 (of your example) is 'option'.
If you need any headers/prefixes, modify them at output in 'END'.

So let's try this:

awk -F'|' 'NF!=10 {next}
{idx=$2""}
$1=="S" {day[idx]=$3;stim[idx]=$4; scr[idx]=$8; prog[idx]=$5; opt[idx]=$9}
$1=="E" {if($3!=day[idx]) next
if(idx in stim) etim[idx]=$4 }
END {for (r in etim) printf("script:%s prog:%s rt:%d-%d=%d opt=%s\n", scr[r],prog[r], etim[r], stim[r],etim[r]-stim[r], opt[r])}' YOURFILE
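If jobs that cross midnight do need a proper runtime, one possible approach is to turn date and time into a single minute count before subtracting. The following is only a sketch under assumptions: the date is YYYYMMDD, the time is HHMM, and the names job_durations, daynum and stamp are made up for this illustration. It avoids gawk-only functions such as mktime, so it should run on a POSIX awk.

```shell
# Hypothetical helper: job_durations [logfile ...]
# Prints script:program:minutes for every job that has a start and an end record.
job_durations() {
    awk -F'|' '
    # day number of a YYYYMMDD date (Julian day number, Gregorian calendar)
    function daynum(ymd,    y, m, d, a) {
        y = int(ymd / 10000); m = int(ymd / 100) % 100; d = ymd % 100
        a = int((14 - m) / 12); y = y + 4800 - a; m = m + 12 * a - 3
        return d + int((153 * m + 2) / 5) + 365 * y + int(y / 4) - int(y / 100) + int(y / 400) - 32045
    }
    # minutes since a fixed origin for a date plus an HHMM time
    function stamp(ymd, hhmm) {
        return daynum(ymd) * 1440 + int(hhmm / 100) * 60 + hhmm % 100
    }
    $1 == "S" { start[$2] = stamp($3, $4) }
    $1 == "E" && ($2 in start) {
        printf("%s:%s:%d\n", $8, $5, stamp($3, $4) - start[$2])
        delete start[$2]    # the PID may be reused by a later job
    }' "$@"
}
```

Called as "job_durations YOURFILE", it prints one line per finished job, in the order the end records appear.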

mfG Peter
Hein van den Heuvel
Honored Contributor

Re: data manipulation

Hmmm,

The prior solutions seem to subtract two HHMM values to get a duration.
That will result in 41 minutes going from 1259 to 1300 instead of the 1 minute one might reasonably expect.
Also, the prior solutions specifically remember stuff from the start record which is also available in the end record.

I would suggest:

$ awk -f test.awk test.txt

where
----------- test.awk ----------
BEGIN { FS="|"; OFS=":" }
function minutes (time) { h = int(time/100); m = time - h*100; return h*60 + m }
/^S/ {s[$2]=$4}
/^E/ {print $8 " " $5,s[$2],$4,minutes($4)-minutes(s[$2])}

Sample data
---------------- test.txt -----------
S|1161480|20070720|1259|createWebOrdFile.4ge|cronlog|.|test||
E|1161480|20070720|1300|createWebOrdFile.4ge|cronlog|.|test||
S|1161483|20070720|1205|createWebOrdFile.4ge|cronlog|.|xxxxxxxxxxxxx||
S|1161484|20070720|1205|createWebOrdFile.4ge|cronlog|.|yyyyyyyyyyyyy||
E|1161484|20070720|1255|createWebOrdFile.4ge|cronlog|.|aaaaaaaaaaaaa||
E|1161483|20070720|1310|createWebOrdFile.4ge|cronlog|.|bbbbbbbbbbbbb||


Results in:
--------------------------
test createWebOrdFile.4ge:1259:1300:1
aaaaaaaaaaaaa createWebOrdFile.4ge:1205:1255:50
bbbbbbbbbbbbb createWebOrdFile.4ge:1205:1310:65
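
For the average run times mentioned in the original question, the same minutes conversion could feed a per-script total. This is just a sketch, assuming every job starts and ends on the same day and that the script name and program on the end record identify the job; the function name avg_runtime and the output layout (script program:runs:average minutes) are made up for this illustration.

```shell
# Hypothetical helper: avg_runtime [logfile ...]
# Prints "script program:runs:average minutes", sorted by script name.
avg_runtime() {
    awk -F'|' '
    # convert an HHMM value to minutes since midnight
    function minutes(t) { return int(t / 100) * 60 + t % 100 }
    $1 == "S" { s[$2] = $4 }
    $1 == "E" && ($2 in s) {
        key = $8 " " $5
        total[key] += minutes($4) - minutes(s[$2])
        runs[key]++
        delete s[$2]    # the PID may be reused by a later job
    }
    END { for (k in total) printf("%s:%d:%.1f\n", k, runs[k], total[k] / runs[k]) }' "$@" | sort
}
```

Called as "avg_runtime test.txt", it prints one line per script and program pair.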


Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
lawrenzo_1
Super Advisor

Re: data manipulation

thanks guys,

some good solutions here which I will use.

Chris