topic Re: awk Separating Records in Operating System - Linux

awk Separating Records

nz_1 — Fri, 31 Aug 2007 03:55:35 GMT

Hi

I am trying to separate records using awk.

Below is a sample of the records in a file.
------------
1|D|hfoo01||39322|74883|service-call-master
2|call-no|143814
3|call-status|D|F
3|call-date-last-change|20-AUG-2007|30-AUG-2007
1|U|hfoo01||39322|74893|service-call-master
2|call-no|143814
3|call-user-only-num1|0.0000|23158.0000
1|I|hfoo01||39322|74893|job-cost-master
2|job-code|143814
3|jcm-project-manager|GNOHPH01|HFOO01
1|U|hfoo01||39322|74903|service-call-master
2|call-no|143814
3|call-invoice-no||S3106895
---------------------------

The record separators are,
1|I| or 1|D| or 1|U|

Here is my feeble attempt to do it.
-------------
% more 1.awk
BEGIN { FS = "\n"
RS = "\/<1\|[IDU]" }
{
print "New Record "$1", "$2", "$3", "$4",,"RT
}

# awk -f 1.awk audit.log
New Record 1|D|hfoo01||39322|74883|service-call-master, 2|call-no|143814, 3|call
-status|D|F, 3|call-date-last-change|20-AUG-2007|30-AUG-2007,,
----------------------

Appreciate help from awk gurus on the RS variable.

Thank you.
nash

Re: awk Separating Records

Dennis Handly — Fri, 31 Aug 2007 04:35:15 GMT

It appears that RS can only be a single character here, not a regexp.

>I am trying to separate records using awk.

Looks like you are joining them.
(You haven't defined RT?)

Here is what I have to concatenate your lines:
awk '
BEGIN { getline; save=$0 }
/^1\|[IDU]\|/ {
print save ",,"
save=$0
next
}
{
save=save $0 # concatenate lines
next
}
END { print save ",," } ' audit.log
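
If the goal is to keep each line as its own field, as in the original attempt with FS = "\n", one portable option is to insert an empty line before every header and let a second awk read the result in paragraph mode (RS = ""), which needs only a standard awk. This is a sketch under assumptions: the helper name split_records and the inline sample are for illustration only, and the approach assumes the log itself contains no empty lines.

```shell
#!/bin/sh
# Sketch: split_records is a hypothetical helper name, not from the thread.
# Stage 1 prints an empty line before every header line except the first.
# Stage 2 reads blank-line-separated records (RS = "") with one line per field.
split_records() {
    awk '/^1\|[IDU]\|/ && NR > 1 { print "" } { print }' "$@" |
    awk 'BEGIN { RS = ""; FS = "\n" }
         { printf "New Record %d has %d lines, first: %s\n", NR, NF, $1 }'
}

# Sample: the first two records posted at the top of the thread.
printf '%s\n' \
    '1|D|hfoo01||39322|74883|service-call-master' \
    '2|call-no|143814' \
    '3|call-status|D|F' \
    '3|call-date-last-change|20-AUG-2007|30-AUG-2007' \
    '1|U|hfoo01||39322|74893|service-call-master' \
    '2|call-no|143814' \
    '3|call-user-only-num1|0.0000|23158.0000' |
split_records
```

Inside the second awk, $1 is the header line and $2 through $NF are the detail lines, so the print statement can be changed to emit them with any delimiter.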

Re: awk Separating Records

James R. Ferguson — Fri, 31 Aug 2007 07:19:16 GMT

Hi Nash:

Try this:

# perl -0377 -ne '@a=split(/(1\|I\||1\|D\||1\|U\|)/,$_);for $b (@a) {print "$b\n"}' file

...using your data, this would output:

1|D|
hfoo01||39322|74883|service-call-master
2|call-no|143814
3|call-status|D|F
3|call-date-last-change|20-AUG-2007|30-AUG-2007

1|U|
hfoo01||39322|74893|service-call-master
2|call-no|143814
3|call-user-only-num1|0.0000|23158.0000

1|I|
hfoo01||39322|74893|job-cost-master
2|job-code|143814
3|jcm-project-manager|GNOHPH01|HFOO01

...

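This reads your whole file into memory (slurps it) without regard to any record separator. It then leverages 'split', where regular expression patterns can be used. Strictly speaking, -0377 sets the record separator to the byte with octal value 377, which a text file like this will not contain; -0777 is the conventional switch for slurping.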

Because your pattern contains the pipe (vertical bar), and alternation uses that symbol too, we have to escape with backslashes, making a rather ugly expression.

Regards!

...JRF...

Re: awk Separating Records

Dennis Handly — Sun, 25 Sep 2011 00:48:54 GMT

>JRF: means that we have to escape with backslashes making a rather ugly expression.

Yep, awk needs that too.
My script leaves the record separators and joins the lines into a bigger record.
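
For awk, a bracket expression is one way around the backslashes: inside [...] the pipe is an ordinary character, so /^1[|][IDU][|]/ should match the same header lines as /^1\|[IDU]\|/. A small sketch to compare the two spellings; the helper name count_headers is hypothetical:

```shell
#!/bin/sh
# Sketch: count_headers is a hypothetical name used only for this illustration.
# It counts header lines twice, once per spelling of the regexp, so the two
# patterns can be compared on the same input.
count_headers() {
    awk '/^1\|[IDU]\|/   { escaped++ }
         /^1[|][IDU][|]/ { bracketed++ }
         END { printf "%d %d\n", escaped, bracketed }' "$@"
}

# Two header lines and two detail lines from the sample data.
printf '%s\n' \
    '1|D|hfoo01||39322|74883|service-call-master' \
    '2|call-no|143814' \
    '3|call-status|D|F' \
    '1|U|hfoo01||39322|74893|service-call-master' |
count_headers
```

Both counts should agree; on the four sample lines each pattern matches the two header lines.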

Re: awk Separating Records

James R. Ferguson — Fri, 31 Aug 2007 08:37:22 GMT

Hi (again) Nash:

>Dennis: My script leaves the record separators and joins the lines into a bigger record.

OK, I missed that requirement, but that's an easy fix:

# perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (@a) {print "NEW RECORD:$b\n\n"}' audit.log

...outputs:

NEW RECORD:1|D|hfoo01||39322|74883|service-call-master2|call-no|1438143|call-sta
tus|D|F3|call-date-last-change|20-AUG-2007|30-AUG-2007

NEW RECORD:1|U|hfoo01||39322|74893|service-call-master2|call-no|1438143|call-use
r-only-num1|0.0000|23158.0000

NEW RECORD:1|I|hfoo01||39322|74893|job-cost-master2|job-code|1438143|jcm-project
-manager|GNOHPH01|HFOO01

NEW RECORD:1|U|hfoo01||39322|74903|service-call-master2|call-no|1438143|call-inv
oice-no||S3106895

Regards!

...JRF...

Re: awk Separating Records

larsoncu — Fri, 31 Aug 2007 10:04:52 GMT

how about just doing something like this:

awk '
# prints a newline before every line that begins with a 1,
# except for the first line
/^1/ { if ( NR > 1 )
    printf("\n%s",$0);
  else
    printf("%s",$0);
  next;
}
# on every line not beginning with a 1,
# print the whole line except the first character, without a trailing newline
{printf("%s",substr($0,2));}
# terminate the last record
END {printf("\n");}
'

Re: awk Separating Records

Sandman! — Fri, 31 Aug 2007 16:31:48 GMT

Try the awk construct below:

awk '{if($0~"^1"){if(l) print l;l=$0}else l=l","$0}END{print l}' file

Re: awk Separating Records

nz_1 — Sat, 01 Sep 2007 01:37:47 GMT

Using Dennis' solution, adjusted a bit.
---------------
BEGIN { getline; save=$0 }
/^1\|[IDU]\|/ {
print save
save=$0"|"
next
}
{
save=save $0"|" # concatenate lines
next
}
END {}
-------------
Voila! I got records with | as the delimiter.
-----------------
% awk -f 1.awk audit.log
1|D|hfoo01||39322|74883|service-call-master2|call-no|143814|3|call-status|D|F|3|
call-date-last-change|20-AUG-2007|30-AUG-2007|
1|U|hfoo01||39322|74893|service-call-master|2|call-no|143814|3|call-user-only-nu
m1|0.0000|23158.0000|
1|I|hfoo01||39322|74893|job-cost-master|2|job-code|143814|3|jcm-project-manager|
GNOHPH01|HFOO01|
-----------

Ah... except for the first record which missed a "|" and the fourth record which is totally missing.

Re: awk Separating Records

nz_1 — Sat, 01 Sep 2007 01:41:13 GMT

Sandman's solution.
----
% awk '{if($0~"^1"){if(l) print l;l=$0}else l=l"|"$0}END{print l}' audit.log
1|D|hfoo01||39322|74883|service-call-master|2|call-no|143814|3|call-status|D|F|3
|call-date-last-change|20-AUG-2007|30-AUG-2007
1|U|hfoo01||39322|74893|service-call-master|2|call-no|143814|3|call-user-only-nu
m1|0.0000|23158.0000
1|I|hfoo01||39322|74893|job-cost-master|2|job-code|143814|3|jcm-project-manager|
GNOHPH01|HFOO01
1|U|hfoo01||39322|74903|service-call-master|2|call-no|143814|3|call-invoice-no||
S3106895|
--------

Yes! All records retrieved!

Re: awk Separating Records

nz_1 — Sat, 01 Sep 2007 01:49:56 GMT

James' solution works fine, but I can't figure out how to append | to the subsequent lines using perl.
-----------
% perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (>
NEW RECORD:1|D|hfoo01||39322|74883|service-call-master2|call-no|1438143|call-sta
tus|D|F3|call-date-last-change|20-AUG-2007|30-AUG-2007

NEW RECORD:1|U|hfoo01||39322|74893|service-call-master2|call-no|1438143|call-use
r-only-num1|0.0000|23158.0000

NEW RECORD:1|I|hfoo01||39322|74893|job-cost-master2|job-code|1438143|jcm-project
-manager|GNOHPH01|HFOO01

NEW RECORD:1|U|hfoo01||39322|74903|service-call-master2|call-no|1438143|call-inv
oice-no||S3106895

Re: awk Separating Records

James R. Ferguson — Sat, 01 Sep 2007 08:24:44 GMT

Hi (again) Nash:

> James' solution works fine, but I can't figure out how to append | to the subsequent lines using perl.

OK, try this:

# perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (@a) {print "NEW RECORD:$b|\n\n"}' audit.log

I suspect that you want to drop the "NEW RECORD:" preamble and the additional newline ('\n') that I added for clarity, so:

# perl -0377 -wne 's/\012//g;@a=split(/(?=1\|I\||1\|D\||1\|U\|)/,$_);for $b (@a) {print "$b|\n"}' audit.log

1|D|hfoo01||39322|74883|service-call-master2|call-no|1438143|call-status|D|F3|ca
ll-date-last-change|20-AUG-2007|30-AUG-2007|
1|U|hfoo01||39322|74893|service-call-master2|call-no|1438143|call-user-only-num1
|0.0000|23158.0000|
1|I|hfoo01||39322|74893|job-cost-master2|job-code|1438143|jcm-project-manager|GN
OHPH01|HFOO01|
1|U|hfoo01||39322|74903|service-call-master2|call-no|1438143|call-invoice-no||S3
106895|

Regards!

...JRF...

Re: awk Separating Records

Dennis Handly — Sat, 01 Sep 2007 16:16:24 GMT

>except for the first record which missed a "|" and the fourth record which is totally missing.

You need to adjust the BEGIN to add that "|".
And you lost my END to print out the last record.
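
Putting those two adjustments together, the script might look like the sketch below. The wrapper name join_records is hypothetical, the sample is the data from the first post, and the sketch assumes the input is not empty (the getline in BEGIN expects a first line).

```shell
#!/bin/sh
# Sketch of the two fixes described above; join_records is a hypothetical name.
join_records() {
    awk '
    BEGIN { getline; save = $0 "|" }   # fix 1: the first line also gets a "|"
    /^1\|[IDU]\|/ {                    # a header line starts a new record
        print save                     # emit the record collected so far
        save = $0 "|"
        next
    }
    { save = save $0 "|" }             # append detail lines to the record
    END { print save }                 # fix 2: emit the last record
    ' "$@"
}

# Sample data from the first post; a real run would pass audit.log instead.
printf '%s\n' \
    '1|D|hfoo01||39322|74883|service-call-master' \
    '2|call-no|143814' \
    '3|call-status|D|F' \
    '3|call-date-last-change|20-AUG-2007|30-AUG-2007' \
    '1|U|hfoo01||39322|74893|service-call-master' \
    '2|call-no|143814' \
    '3|call-user-only-num1|0.0000|23158.0000' \
    '1|I|hfoo01||39322|74893|job-cost-master' \
    '2|job-code|143814' \
    '3|jcm-project-manager|GNOHPH01|HFOO01' \
    '1|U|hfoo01||39322|74903|service-call-master' \
    '2|call-no|143814' \
    '3|call-invoice-no||S3106895' |
join_records
```

With a file, the call would be join_records audit.log. Each of the four records should come out on one line, with a "|" after every original line including the last.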