Operating System - HP-UX

SOLVED
Ferdie Castro
Advisor

data processing script

I have a file, filenamea, whose data looks like this:
2004-01-29 01:27:09,999171234567,303
2004-01-29 01:27:09,119171234567,562
I have a second file, filenameb, whose data looks like this:
101 200
201 300
301 400
401 500
501 600
I need a script that replaces the last column of filenamea with the upper bound of the filenameb range the value falls in. For example, 303 is between 301 and 400, so 303 becomes 400; the next value, 562, is between 501 and 600, so it becomes 600. Is this possible with awk?
The output for filenamea would then be:
2004-01-29 01:27:09,999171234567,400
2004-01-29 01:27:09,119171234567,600
Please note that the last-column values were changed based on filenameb.
I hope you can help me out here. I've written simple programs, but I can't manage this one.
Thanks.
Ferdie


Steven E. Protter
Exalted Contributor

Re: data processing script

If you are willing to do some hunting, this has been done on this forum before:

Search for awk scripts, or see:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=51050

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x836cc1c4ceddd61190050090279cd0f9,00.html

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x026250011d20d6118ff40090279cd0f9,00.html

I don't have the patience to write it for you right now, sorry.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Hein van den Heuvel
Honored Contributor

Re: data processing script

SMOP (a Simple Matter Of Programming).

If all you need are ranges of 100 that can be calculated arithmetically, that is, if file b does not contain arbitrary ranges, then you can use an awk one-liner like:

awk -F, 'BEGIN { OFS = "," } { print $1, $2, 100*int(($3+99)/100) }' < a
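Adding 99 and truncating with int() rounds a value up to the next multiple of 100: 303 gives 100*int(402/100) = 400, and 562 gives 100*int(661/100) = 600. The sketch below wraps the one-liner in a shell function (round_up_100 is a hypothetical name) so it can be tried on the sample lines from the question; it assumes the third field is always a positive integer.

```shell
#!/bin/sh
# round_up_100: hypothetical helper. Reads comma-separated lines on stdin and
# rounds field 3 up to the next multiple of 100 (a ceiling for integers).
# OFS is set to a comma so the output keeps the input's separators.
round_up_100() {
    awk 'BEGIN { FS = OFS = "," }
         { print $1, $2, 100 * int(($3 + 99) / 100) }'
}

printf '%s\n' '2004-01-29 01:27:09,999171234567,303' \
              '2004-01-29 01:27:09,119171234567,562' | round_up_100
```

This prints the two records with 400 and 600 as the last field. Boundary values behave like the ranges in file b: 200 stays 200 and 201 becomes 300.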


If you need the ranges to come from a file, you'd have to think about what to do when a value is not in any known range. In the example below I ignored that by using only the high values and adding a 'very high' sentinel value in the script.

The script begins by reading b and remembering the high values in an array with m elements. It then sets the field separator to a comma for convenient field splitting.
Then, for each line read from the main file (a), it compares the last field ($3) with the entries in the array, starting at zero. The loop stops one element too far, so the index is adjusted back by one before printing.

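BEGIN { m = 0
        while ((getline < "b") > 0)   # read b until end of file (0) or error (-1)
            high[m++] = $2            # remember the high end of each range
        close("b")
        high[m] = 999999              # sentinel: very high value
        FS = ","; OFS = ","           # split a on commas and keep commas on output
}
{ i = 0
  while ($3 > high[i++]) ;            # first high value >= field 3 (overshoots by one)
  i--
  print $1, $2, high[i]
}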
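The caveat about values outside the known ranges is easy to see with a quick run. The sketch below is an illustration under stated assumptions: the scratch directory, the file name hein.awk, and the out-of-range sample values 50 and 650 are not from the thread. It uses a copy of the script with the getline loop stopping at end of file and OFS set so commas are kept. Because only the high values are checked, 50 maps to the first range's 200, and 650 falls through to the 999999 sentinel.

```shell
#!/bin/sh
# Scratch directory so the hard-coded file name "b" resolves (assumption).
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Sample data; 50 and 650 deliberately fall outside every range in b.
printf '%s\n' 'd,x,303' 'd,x,562' 'd,x,50' 'd,x,650' > a
printf '%s\n' '101 200' '201 300' '301 400' '401 500' '501 600' > b

# Hypothetical file name for the script.
cat > hein.awk <<'EOF'
BEGIN {
    m = 0
    while ((getline < "b") > 0)   # stop at end of file or on error
        high[m++] = $2
    close("b")
    high[m] = 999999              # sentinel above every expected value
    FS = ","; OFS = ","
}
{
    i = 0
    while ($3 > high[i++]) ;      # overshoots by one
    i--
    print $1, $2, high[i]
}
EOF

awk -f hein.awk a
```

The output is d,x,400, d,x,600, d,x,200 and d,x,999999. The last two lines show why out-of-range values need a decision of their own.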


Cheers,
Hein.
Michael Schulte zur Sur
Honored Contributor
Solution

Re: data processing script

Hi,

Use: awk -f t.awk filenamea

Michael

Hein van den Heuvel
Honored Contributor

Re: data processing script

FWIW, a nitpick...

Before you put Michael's solution into production, you should add 'VAL="?????";' at the start of the main loop (near FS=",";).

This ensures that if, somehow, a value is outside every established range, the script prints something obviously odd instead of repeating the last-used range.
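Michael's t.awk is not shown in the thread, so the script below is a hypothetical stand-in that only illustrates Hein's point. The names ranges.awk and the scratch directory, and the out-of-range sample value 50, are assumptions. Because VAL is reset to "?????" for every record, a value outside all ranges is flagged instead of inheriting the previous line's result.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Sample data from the question plus one out-of-range value (50).
printf '%s\n' '2004-01-29 01:27:09,999171234567,303' \
              '2004-01-29 01:27:09,119171234567,50' \
              '2004-01-29 01:27:09,119171234567,562' > filenamea
printf '%s\n' '101 200' '201 300' '301 400' '401 500' '501 600' > filenameb

# Hypothetical stand-in for t.awk: the point is the per-record reset of VAL.
cat > ranges.awk <<'EOF'
BEGIN {
    n = 0
    while ((getline < "filenameb") > 0) { lo[n] = $1; hi[n] = $2; n++ }
    close("filenameb")
    FS = ","; OFS = ","
}
{
    VAL = "?????"                 # reset for every record
    for (i = 0; i < n; i++)
        if ($3 >= lo[i] && $3 <= hi[i]) { VAL = hi[i]; break }
    print $1, $2, VAL
}
EOF

awk -f ranges.awk filenamea
```

The 303 and 562 records map to 400 and 600, while the 50 record comes out with ????? as its last field.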

Hein.
Elmar P. Kolkman
Honored Contributor

Re: data processing script

I've done this in the past. One approach is to concatenate both files, separated by a distinctive marker line.

You could do it like this:

( cat filenameb ; echo "---split---" ; cat filenamea ) | awk '
BEGIN { gotsplit=0; idx=0; OFS="," }          # OFS keeps the commas when $NF is rewritten
/---split---/ { gotsplit=1; FS=","; next }     # after the marker, split on commas
gotsplit==0 { minval[idx]=$1; maxval[idx]=$2; idx++ }
gotsplit==1 { found=-1
              for (i=0; i<idx; i++) if ($NF >= minval[i] && $NF <= maxval[i]) found=i
              if (found!=-1) $NF=maxval[found]
              print }
'
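The marker line can also be avoided with the common awk idiom NR==FNR, which is true only while the first file named on the command line is being read. This is a swapped-in technique, not Elmar's code: a minimal sketch assuming the file names from the question and a hypothetical wrapper function remap_last_field. The operand FS=, between the two file names is a POSIX awk command-line assignment that takes effect before the second file is read, so the range file is still split on whitespace.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' '2004-01-29 01:27:09,999171234567,303' \
              '2004-01-29 01:27:09,119171234567,562' > filenamea
printf '%s\n' '101 200' '201 300' '301 400' '401 500' '501 600' > filenameb

# remap_last_field RANGES DATA: hypothetical wrapper.
# Caveat: if RANGES is empty, NR==FNR is also true for DATA.
remap_last_field() {
    awk 'BEGIN { OFS = ","; n = 0 }
         NR == FNR { lo[n] = $1; hi[n] = $2; n++; next }
         { for (i = 0; i < n; i++)
               if ($NF >= lo[i] && $NF <= hi[i]) { $NF = hi[i]; break }
           print }' "$1" FS=, "$2"
}

remap_last_field filenameb filenamea
```

As in Elmar's version, values that fall in no range are printed unchanged.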
Every problem has at least one solution. Only some solutions are harder to find.