Operating System - HP-UX

amonamon
Regular Advisor

parsing problem..shell script

Hello, I am using ksh and would appreciate it if someone could help me get started with this step:


Among other things, my file contains blocks of lines like the one below (the values change).
A block does not always have 22 lines; sometimes it has 10, sometimes 30 or more.

...
...
...
..
2008-01-16 09:10:39,103 INFO [http-8080-Processor11235] (PaymentRequestFactory.java:640) -payment.processing.Request(
accessID = "Internet"
accountType = payment.processing.AccountType(
value = 1
)
sum = 1
AccountID = 0
consumerID = "33333333333"
consumerPIN = ""
currency = "RAT"
merchantID = "C"
name = "Request"
pin = ""
productID = "Roam"
purpose = "436640104,38763891420,"
roleID = 4
routingInfo = "3"
tranID = "000210000002783264"
type = "Request"
userID = "C"
)
2008-01-16 09:10:39,104 INFO [http-8080] (PaymentConnectionFactory.output) - getRef( q1 )
...
....

...
..

Is there a way to turn this into the following output:

accessID = "Internet";sum = 1;AccountID = 0;consumerID = "33333333333";consumerPIN = "";currency = "RAT";merchantID = "C";name = "Request";pin = "";productID = "Roam";purpose = "436640104,38763891420,";roleID = 4;routingInfo = "3";tranID = "000210000002783264";type = "Request";userID = "C"

where the field separator is ";" (FS=;).

The parser should start at the accessID field and stop when it finds userID.






mschulz67
Occasional Advisor

Re: parsing problem..shell script

You can try awk; I'd prefer Perl.
General outline: use a trigger variable. Set the trigger when the line starts with "accessID" and unset it when the line starts with "userID".
Print only the lines while the trigger is set (mind the last one: the userID line itself must still be printed).
With awk: Set the output field separator to ";", print a "\n" after each "userID"
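A minimal awk sketch of that outline, assuming the request fields start in column 1 as in the sample log (the function name join_requests is hypothetical). Like the outline, it keeps the accountType lines:

```sh
# Hypothetical helper: join_requests [FILE...]
# Trigger-variable sketch: "on" is set at an accessID line and cleared
# only after the userID line has been printed.
join_requests() {
    awk '
    /^accessID/ { on = 1; sep = "" }                  # set the trigger
    on          { printf "%s%s", sep, $0; sep = ";" } # print while triggered
    /^userID/   { if (on) print ""; on = 0 }          # end the record, unset
    ' "$@"
}
```

Usage: join_requests your-file > output.txt. Because the printing rule comes before the userID rule, the userID line is printed before the trigger is cleared.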

Have fun.
amonamon
Regular Advisor

Re: parsing problem..shell script

In case you misunderstood, I would like this output:

accessID = "Internet";sum = 1;AccountID = 0;consumerID = "33333333333";consumerPIN = "";currency = "RAT";merchantID = "C";name = "Request";pin = "";productID = "Roam";purpose = "436640104,38763891420,";roleID = 4;routingInfo = "3";tranID = "000210000002783264";type = "Request";userID = "C"

where the field separator is ";" (FS=;).


The input is that file, with lots of snippets like:

accessID = "Internet"
accountType = payment.processing.AccountType(
value = 1
)
sum = 1
AccountID = 0
consumerID = "33333333333"
consumerPIN = ""
currency = "RAT"
merchantID = "C"
name = "Request"
pin = ""
productID = "Roam"
purpose = "436640104,38763891420,"
roleID = 4
routingInfo = "3"
tranID = "000210000002783264"
type = "Request"
userID = "C"
Hein van den Heuvel
Honored Contributor
Solution

Re: parsing problem..shell script

Try this awk line:

$ awk '/^accessID/,/^userID/ {if (/^userID/) {print x $0; x=""} else { x = x $0 ";" }}' your-file


Please specify what to do with the "accountType" entry, that is, the nested part inside the request (accessID...).

Your example output omits it; your specification does not.

hth,
Hein.
amonamon
Regular Advisor

Re: parsing problem..shell script

That is pretty much it, except that accountType should not be in the parsed result, just as you said.


Also, all three of these lines should NOT be in the result:

accountType = payment.processing.AccountType(
value = 1
)

amonamon
Regular Advisor

Re: parsing problem..shell script

Also... :(
Can you help me with an explanation of the code? I do not know the purpose of the x variable. :(
Hein van den Heuvel
Honored Contributor

Re: parsing problem..shell script

In my first solution I use the awk 'range' selection "from one match until the other": /one/,/other/
The 'x' variable is an accumulator where I add the pieces of string found, plus a separator. Print and clear it at the end.
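To see what the range selection picks out on its own, before the accumulator joins the pieces, you can run just the pattern with no action; awk then prints each selected line. A sketch, with show_blocks as a hypothetical name:

```sh
# Hypothetical helper: show_blocks [FILE...]
# A range pattern with no action prints every line from an accessID line
# through the next userID line, inclusive.
show_blocks() {
    awk '/^accessID/,/^userID/' "$@"
}
```

These are exactly the lines that the one-liner glues together with ";".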

The begin/end match can also be built by hand, using a flag that is set on the first match and cleared on the last, and testing the flag on every line.
That's the 'f' in the next example.
The 'x' is still an accumulator, now reset by assigning the first value.

awk '/^userID/ {print x ";" $0; f=0} (f) {x = x ";" $0} /^accessID/ {x=$0;getline;getline;getline;f=1} ' your-file

In slow motion:

awk '
/^userID/ {                   # At the end...
    print x ";" $0            # print what we have in x so far and add this line
    f = 0                     # no longer matching
}
(f) {                         # if matching
    x = x ";" $0              # add this chunk to accumulator x
}
/^accessID/ {                 # In the beginning
    x = $0                    # start a fresh accumulator
    getline; getline; getline # hardcoded skip of the 3 accountType lines
    f = 1                     # start matching
}
' your-file
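The three getline calls assume the accountType sub-block is always exactly three lines. A sketch that instead skips any nested block, from a line ending in "(" to its matching ")" line, by counting depth. It assumes no field value itself ends in "(", and join_requests_nested is a hypothetical name:

```sh
# Hypothetical helper: join_requests_nested [FILE...]
# Flag-and-accumulator approach, but nested "Name(" ... ")" blocks are
# skipped by depth counting instead of a fixed number of getline calls.
join_requests_nested() {
    awk '
    /^accessID/ { f = 1; x = ""; depth = 0 }  # start of a request
    !f          { next }                      # outside a request: ignore
    depth > 0   {                             # inside a nested block
        if ($0 ~ /[(]$/) depth++              # another nested opener
        else if ($0 ~ /^[)]/) depth--         # closer for the current level
        next
    }
    /[(]$/      { depth = 1; next }           # e.g. accountType = ...(
    { x = (x == "") ? $0 : x ";" $0 }         # accumulate the field
    /^userID/   { print x; f = 0 }            # last field: print and reset
    ' "$@"
}
```

Usage: join_requests_nested your-file > output.txt. The same input works whether the nested block has 3 lines or 30.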


hth,
Hein.
amonamon
Regular Advisor

Re: parsing problem..shell script

Thanks for the help. So far I have extended the script, and it now produces this output:


myfile.txt:
....
..
784928|0002785283|Charge|SRoam
476431|0002785284|Charge|SRoam
844473|0012415890|Charge|SRoam
..
...
..


In my second.txt file there are further details for field $2 (of myfile.txt).

For example, if I do grep 0002785283 second.txt, I get:


2008-01-16 12:58:28,137 0002785283 FINISHED timeout = 1
2008-01-16 12:58:28,144 Result : 0002785283 res:1

What I need is to somehow get a file like this:

784928|0002785283|Charge|SRoam|2008-01-16 12:58:28,137 0002785283 FINISHED timeout = 1|2008-01-16 12:58:28,144 Result : 0002785283 res:1

(that is, append those lines)

so the final file should be:

output.txt

784928|0002785283|Charge|SRoam|2008-01-16 12:58:28,137 0002785283 FINISHED timeout = 1|2008-01-16 12:58:28,144 Result : 0002785283 res:1
476431|0002785284|Charge|SRoam|2007-01-16 11:38:28,137 0002785283 FINISHED timeout = 1|2008-01-16 12:58:28,144 Result : 0002785283 res:2
...
..
etc.

Thank you in advance
amonamon
Regular Advisor

Re: parsing problem..shell script

Maybe I should post what I tried for this last problem:

while read line; do

{
echo line is $line
#pattern1=$(nawk -F"|" '{ print $2 }' $line)
pattern1=$(cut -d"|" -f2 $line)
rest=$(grep $pattern1 second.txt )
echo $line"|"$rest >> output.txt
}

done < myfile.txt



but I get this error:

cut: cannot open 689133|4849844105|charge|cce


Any help?
Dennis Handly
Acclaimed Contributor

Re: parsing problem..shell script

> I created this output: myfile.txt:

For every record in this file, you want to find the matching lines in second.txt and join them?

> I do grep 0002785283 second.txt I get:
> 2008-01-16 12:58:28,137 0002785283 FINISHED timeout = 1
> 2008-01-16 12:58:28,144 Result : 0002785283 res:1

Is this one long line? Or two lines that we need to combine? (I assumed the latter and assumed only 2.)

>so final file should have

What is the connection between 0002785284 and the info for 0002785283 from second.txt, or is that a typo?

Here is my script, it will be slow if second.txt is big:
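awk -F"|" -v file2=second.txt '
{
cmd = "grep " $2 " " file2
save_2 = save_3 = ""   # forget the previous record lines if grep finds nothing
cmd | getline save_2
cmd | getline save_3
close(cmd)             # close the pipe so every record runs a fresh grep
print $0 "|" save_2 "|" save_3
} ' myfile.txt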
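The script above runs one grep per record, which is why it gets slow when second.txt is big. A sketch of a single-pass alternative that reads second.txt once into an awk array. Assumptions: join_fast is a hypothetical name; a detail line is filed under every all-digit, whitespace-separated field it contains (because the ID sits in field 3 on the FINISHED lines and field 5 on the Result lines); this is an exact field match, unlike grep's substring match; and second.txt must not be empty, since the NR == FNR test relies on it:

```sh
# Hypothetical helper: join_fast MAINFILE DETAILFILE
# Reads DETAILFILE (second.txt) once into memory, then appends the matching
# detail lines to each record of MAINFILE (myfile.txt), joined with "|".
join_fast() {
    awk -F'|' '
    # First file read (the detail file): index each line under every
    # whitespace-separated field that is all digits.
    NR == FNR {
        n = split($0, w, " ")
        for (i = 1; i <= n; i++) {
            k = w[i]
            if (k !~ /^[0-9]+$/) continue
            if (k in det) det[k] = det[k] "|" $0
            else          det[k] = $0
        }
        next
    }
    # Second file read (the main file): the key is field 2.
    {
        if ($2 in det) print $0 "|" det[$2]
        else           print $0
    }
    ' "$2" "$1"
}
```

Usage: join_fast myfile.txt second.txt > output.txt. Records with no detail lines are printed unchanged.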

> but error...this one:
> cut: cannot open 689133|4849844105|charge|cce

cut(1) works on files not variables, so change to:
$ pattern1=$(echo $line | cut -d"|" -f2)
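Put back into the original loop, that fix looks roughly like the sketch below. It is hedged: join_loop is a hypothetical name, it assumes the key is field 2, uses grep -F so the key is matched as a fixed string (still a substring match), and joins however many matching lines grep finds with "|" via paste -s instead of assuming exactly two:

```sh
# Hypothetical helper: join_loop MAINFILE DETAILFILE
join_loop() {
    main=$1 detail=$2
    while IFS= read -r line; do
        # cut reads the line from stdin, not a file named after it
        key=$(printf '%s\n' "$line" | cut -d'|' -f2)
        # every line of DETAILFILE containing the key, joined with "|"
        rest=$(grep -F -- "$key" "$detail" | paste -s -d'|' -)
        printf '%s|%s\n' "$line" "$rest"
    done < "$main"
}
```

Usage: join_loop myfile.txt second.txt > output.txt. A record with no match ends with an empty field after the last "|".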
Arturo Galbiati
Esteemed Contributor

Re: parsing problem..shell script

Hi,
to do the merge you can use the join command, which is very powerful:

1) tr -s '|' ' ' < myfile.txt > temp.txt
change the separator in myfile.txt from '|' to space and save the result in temp.txt

2) join -t " " -j1 2 -j2 3 -o 1.1,1.2,1.3,1.4,2.1,2.2,2.3,2.4,2.5,2.6,2.7 temp.txt second.txt

merge the two files on key field 2 of the first file and key field 3 of the second file (join expects both files to be sorted on those fields)




e.g.:

% cat myfile.txt
784928|0002785283|Charge|SRoam
476431|0002785284|Charge|SRoam
844473|0012415890|Charge|SRoam

% cat second.txt
2008-01-16 12:58:28,137 0002785283 FINISHED timeout = 1
2008-01-16 12:58:28,144 Result : 0002785283 res:1

% tr -s '|' ' ' < myfile.txt > temp.txt
% join -t " " -j1 2 -j2 3 -o 1.1,1.2,1.3,1.4,2.1,2.2,2.3,2.4,2.5,2.6,2.7 temp.txt second.txt
784928 0002785283 Charge SRoam 2008-01-16 12:58:28,137 0002785283 FINISHED timeout = 1

See man join for an explanation of all the options used.
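The join approach relies on both inputs being sorted on their join fields, which happens to hold in the small example. A sketch that adds the sort steps, with join_sorted as a hypothetical name and the temporary file names as assumptions. It uses the POSIX spelling -1 2 -2 3 of -j1 2 -j2 3, and LC_ALL=C so sort and join agree on ordering. As in the example output, only the FINISHED line joins, because the Result line has the ID in field 5, and the output is space-separated rather than '|'-separated:

```sh
# Hypothetical helper: join_sorted MAINFILE DETAILFILE
join_sorted() {
    jtmp=${TMPDIR:-/tmp}/join_sorted.$$
    # myfile.txt: turn '|' into ' ', then sort on the key in field 2
    tr -s '|' ' ' < "$1" | LC_ALL=C sort -t ' ' -k 2,2 > "$jtmp.1"
    # second.txt: sort on field 3 (the ID on the FINISHED lines)
    LC_ALL=C sort -t ' ' -k 3,3 "$2" > "$jtmp.2"
    LC_ALL=C join -t ' ' -1 2 -2 3 \
        -o 1.1,1.2,1.3,1.4,2.1,2.2,2.3,2.4,2.5,2.6,2.7 "$jtmp.1" "$jtmp.2"
    rm -f "$jtmp.1" "$jtmp.2"
}
```

Usage: join_sorted myfile.txt second.txt > output.txt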

HTH,
Art