awk Separating Records

 
nz_1
Advisor

Hi

I am trying to separate records using awk.

Below is a sample of the records in a file.
------------
1|D|hfoo01||39322|74883|service-call-master
2|call-no|143814
3|call-status|D|F
3|call-date-last-change|20-AUG-2007|30-AUG-2007
1|U|hfoo01||39322|74893|service-call-master
2|call-no|143814
3|call-user-only-num1|0.0000|23158.0000
1|I|hfoo01||39322|74893|job-cost-master
2|job-code|143814
3|jcm-project-manager|GNOHPH01|HFOO01
1|U|hfoo01||39322|74903|service-call-master
2|call-no|143814
3|call-invoice-no||S3106895
---------------------------

The record separators are,
1|I| or 1|D| or 1|U|

Here is my feeble attempt to do it.
-------------
% more 1.awk
BEGIN { FS = "\n"
RS = "\/<1\|[IDU]" }
{
print "New Record "$1", "$2", "$3", "$4",,"RT
}

# awk -f 1.awk audit.log
New Record 1|D|hfoo01||39322|74883|service-call-master, 2|call-no|143814, 3|call
-status|D|F, 3|call-date-last-change|20-AUG-2007|30-AUG-2007,,
----------------------

Appreciate help from awk gurus on the RS variable.

Thank you.
nash
Dennis Handly
Acclaimed Contributor
Solution

Re: awk Separating Records

It appears that in POSIX awk RS can only be a single character, not a regexp. A regexp RS (and the RT variable) is a gawk extension.

>I am trying to separate records using awk.

Looks like you are joining them.
(You haven't defined RT?)

Here is what I have to concatenate your lines:
awk '
BEGIN { getline; save=$0 }
/^1\|[IDU]\|/ {
print save ",,"
save=$0
next
}
{
save=save $0 # concatenate lines
next
}
END { print save ",," } ' audit.log
James R. Ferguson
Acclaimed Contributor

Re: awk Separating Records

Hi Nash:

Try this:

# perl -0377 -ne '@a=split(/(1\|I\||1\|D\||1\|U\|)/,$_);@a;for $b (@a) {print "$b\n"}' file

...using your data, this would output:

1|D|
hfoo01||39322|74883|service-call-master
2|call-no|143814
3|call-status|D|F
3|call-date-last-change|20-AUG-2007|30-AUG-2007

1|U|
hfoo01||39322|74893|service-call-master
2|call-no|143814
3|call-user-only-num1|0.0000|23158.0000

1|I|
hfoo01||39322|74893|job-cost-master
2|job-code|143814
3|jcm-project-manager|GNOHPH01|HFOO01

...

This reads your whole file into memory (slurps it) without regard to any record separator. It then leverages 'split', where regular expression patterns can be used.

The fact that your pattern to match contains the pipe (vertical bar) and alternation uses that symbol too, means that we have to escape with backslashes making a rather ugly expression.

Regards!

...JRF...
Dennis Handly
Acclaimed Contributor

Re: awk Separating Records

>JRF: means that we have to escape with backslashes making a rather ugly expression.

Yep, awk needs that too.
My script leaves the record separators and joins the lines into a bigger record.
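
One way to sidestep the backslashes in awk, offered as a sketch rather than a required change: put the pipe inside a bracket expression, where it has no special meaning. The sample data is abbreviated from the question, and the variable names `sample` and `headers` are made up for this illustration.

```shell
# Abbreviated sample taken from the question.
sample='1|D|hfoo01||39322|74883|service-call-master
2|call-no|143814
1|U|hfoo01||39322|74893|service-call-master
2|call-no|143814'

# [|] matches a literal pipe, so no backslash escaping is needed.
headers=$(printf '%s\n' "$sample" | awk '/^1[|][IDU][|]/ { n++ } END { print n + 0 }')
echo "record headers found: $headers"
```

The pattern `/^1[|][IDU][|]/` should match the same lines as `/^1\|[IDU]\|/`; which one reads better is a matter of taste.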

James R. Ferguson
Acclaimed Contributor

Re: awk Separating Records

Hi (again) Nash:

>Dennis: My script leaves the record separators and joins the lines into a bigger record.

OK, I missed that requirement, but that's an easy fix:

# perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (@a) {print "NEW RECORD:$b\n\n"}' audit.log

...outputs:

NEW RECORD:1|D|hfoo01||39322|74883|service-call-master2|call-no|1438143|call-sta
tus|D|F3|call-date-last-change|20-AUG-2007|30-AUG-2007

NEW RECORD:1|U|hfoo01||39322|74893|service-call-master2|call-no|1438143|call-use
r-only-num1|0.0000|23158.0000

NEW RECORD:1|I|hfoo01||39322|74893|job-cost-master2|job-code|1438143|jcm-project
-manager|GNOHPH01|HFOO01

NEW RECORD:1|U|hfoo01||39322|74903|service-call-master2|call-no|1438143|call-inv
oice-no||S3106895

Regards!

...JRF...

larsoncu
Advisor

Re: awk Separating Records

how about just doing something like this:

awk '
# prints a newline before printing every line that begins with a 1,
# except for the first line
/^1/ { if ( NR > 1 )
printf("\n%s",$0);
else
printf("%s",$0);
next;
}
# on every line not beginning with a 1,
# print the whole line except the first character, but without a trailing newline
{printf("%s",substr($0,2));}
'
Sandman!
Honored Contributor

Re: awk Separating Records

Try the awk construct below:

awk '{if($0~"^1"){if(l) print l;l=$0}else l=l","$0}END{print l}' file
nz_1
Advisor

Re: awk Separating Records

Using Dennis' solution, adjusted a bit.
---------------
BEGIN { getline; save=$0 }
/^1\|[IDU]\|/ {
print save
save=$0"|"
next
}
{
save=save $0"|" # concatenate lines
next
}
END {}
-------------
Voila! I got records with | as the delimiter.
-----------------
% awk -f 1.awk audit.log
1|D|hfoo01||39322|74883|service-call-master2|call-no|143814|3|call-status|D|F|3|
call-date-last-change|20-AUG-2007|30-AUG-2007|
1|U|hfoo01||39322|74893|service-call-master|2|call-no|143814|3|call-user-only-nu
m1|0.0000|23158.0000|
1|I|hfoo01||39322|74893|job-cost-master|2|job-code|143814|3|jcm-project-manager|
GNOHPH01|HFOO01|
-----------

Ah... except for the first record which missed a "|" and the fourth record which is totally missing.
nz_1
Advisor

Re: awk Separating Records

Sandman's solution.
----
% awk '{if($0~"^1"){if(l) print l;l=$0}else l=l"|"$0}END{print l}' audit.log
1|D|hfoo01||39322|74883|service-call-master|2|call-no|143814|3|call-status|D|F|3
|call-date-last-change|20-AUG-2007|30-AUG-2007
1|U|hfoo01||39322|74893|service-call-master|2|call-no|143814|3|call-user-only-nu
m1|0.0000|23158.0000
1|I|hfoo01||39322|74893|job-cost-master|2|job-code|143814|3|jcm-project-manager|
GNOHPH01|HFOO01
1|U|hfoo01||39322|74903|service-call-master|2|call-no|143814|3|call-invoice-no||
S3106895|
--------

Yes! All records retrieved!
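
For later readers, the one-liner can be spelled out roughly as follows. This is a sketch, not Sandman's exact code: the `sep` variable is introduced here so the joiner is not hard-coded, and the header test is tightened from "^1" to the 1|I|, 1|D|, 1|U| prefixes named in the question. The sample data is abbreviated.

```shell
# Abbreviated sample taken from the question.
sample='1|D|hfoo01||39322|74883|service-call-master
2|call-no|143814
3|call-status|D|F
1|U|hfoo01||39322|74893|service-call-master
2|call-no|143814'

joined=$(printf '%s\n' "$sample" | awk -v sep='|' '
/^1[|][IDU][|]/ {              # a header line starts a new record
    if (rec != "") print rec   # flush the previous record, if any
    rec = $0
    next
}
{ rec = rec sep $0 }           # detail line: append to the current record
END { if (rec != "") print rec }
')
printf '%s\n' "$joined"
```

Passing the separator with -v means the same script can produce "," or "|" joined records without editing it.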
nz_1
Advisor

Re: awk Separating Records

James' solution works fine, but I can't figure out how to append | to the subsequent lines using perl.
-----------
% perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (>
NEW RECORD:1|D|hfoo01||39322|74883|service-call-master2|call-no|1438143|call-sta
tus|D|F3|call-date-last-change|20-AUG-2007|30-AUG-2007

NEW RECORD:1|U|hfoo01||39322|74893|service-call-master2|call-no|1438143|call-use
r-only-num1|0.0000|23158.0000

NEW RECORD:1|I|hfoo01||39322|74893|job-cost-master2|job-code|1438143|jcm-project
-manager|GNOHPH01|HFOO01

NEW RECORD:1|U|hfoo01||39322|74903|service-call-master2|call-no|1438143|call-inv
oice-no||S3106895
James R. Ferguson
Acclaimed Contributor

Re: awk Separating Records

Hi (again) Nash:

> James' solution works fine but I can't figure out how to append | to the subsequent lines using perl.

OK, try this:

# perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (@a) {print "NEW RECORD:$b|\n\n"}' audit.log

I suspect that you want to drop the "NEW RECORD:" preamble and the additional newline ('\n') that I added for clarity, so:

# perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (@a) {print "$b|\n"}' audit.log

1|D|hfoo01||39322|74883|service-call-master2|call-no|1438143|call-status|D|F3|ca
ll-date-last-change|20-AUG-2007|30-AUG-2007|
1|U|hfoo01||39322|74893|service-call-master2|call-no|1438143|call-user-only-num1
|0.0000|23158.0000|
1|I|hfoo01||39322|74893|job-cost-master2|job-code|1438143|jcm-project-manager|GN
OHPH01|HFOO01|
1|U|hfoo01||39322|74903|service-call-master2|call-no|1438143|call-invoice-no||S3
106895|

Regards!

...JRF...

Dennis Handly
Acclaimed Contributor

Re: awk Separating Records

>except for the first record which missed a "|" and the fourth record which is totally missing.

You need to adjust the BEGIN to add that "|".
And you lost my END to print out the last record.
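
Put together, those two fixes might look like the sketch below. It assumes the input is non-empty (the BEGIN getline would otherwise leave only a lone "|" to print) and uses an abbreviated copy of the sample data; the shell variable names are made up for the illustration.

```shell
# Abbreviated sample taken from the question.
sample='1|D|hfoo01||39322|74883|service-call-master
2|call-no|143814
3|call-status|D|F
1|U|hfoo01||39322|74893|service-call-master
2|call-no|143814'

fixed=$(printf '%s\n' "$sample" | awk '
BEGIN { getline; save = $0 "|" }   # first line now gets its "|" too
/^1\|[IDU]\|/ {
    print save                     # previous record is complete
    save = $0 "|"
    next
}
{ save = save $0 "|" }             # concatenate detail lines
END { print save }                 # restored: print the last record
')
printf '%s\n' "$fixed"
```

With the full audit.log this should yield all four records, each ending in "|".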