Operating System - HP-UX

Re: Scripting Help - extracting data from multi-record file

 
SOLVED
Kellogg Unix Team
Trusted Contributor

Scripting Help - extracting data from multi-record file

Hi Gurus,

I need some help extracting data from a file that contains multiple record types. In the following example -

The file content is -
EDI_DC rgrvgtr rgrgrg tr4ggtrg gtgtggg
EDIEDK01 tigtii rogrrk inv34573 edrtgf
EDIEDK02 wsderfs wsderf WE 34976854 wedscfrg
EDIEDK02 wkivmfu wjhhtn SF 4567890 edsedfgr
EDIEDK02 jbvngrbg fevefv QE 997855 fgnres
EDI_DC jbjbtrj ovevvrgg rfrfrere erferfe
EDIEDK01 jgbnb ergfbv inv54674 trtbtr
EDIEDK02 vvvjnv jegnvgn WE 687854 tjgrntn
EDIEDK02 bngbng tgttgte SF 456345 tyhnmty
EDIEDK02 trjbnnj eferfrff QE 876895 jgrnbtr
...and so on

For each EDI_DC block, I want to extract

EDIEDK01 segment's 4th field (inv number)
EDIEDK02 segment with WE - 5th field
EDIEDK02 segment with SF - 5th field

in that order. With a simple egrep command, I get ALL the "inv number"s, followed by all the WEs, and then all the SFs. I want my result to be grouped the same way the input file is, viz.
inv number1
WE1
SF1
inv number2
WE2
SF2
...and so on

How do I loop over the file with EDI_DC marking the start of each iteration? Any suggestion is welcome.
Thanks & Regards
work is fun ! (my manager is standing behind me!!)
10 REPLIES
Peter Nikitka
Honored Contributor
Solution

Re: Scripting Help - extracting data from multi-record file

Hi,

let's try with awk:

awk '$1 == "EDI_DC" {if(found) printf("%s\n%s\n%s\n", inv,we,sf); found++}
$1 == "EDIEDK01" {inv=$4}
$1 == "EDIEDK02" && $4 == "WE" {we=$5}
$1 == "EDIEDK02" && $4 == "SF" {sf=$5}
END {if(found) printf("%s\n%s\n%s\n", inv,we,sf)}' inputfile
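
One thing to watch with the approach above: inv, we and sf keep their old values from block to block, so a block that lacks a WE or SF segment would silently reuse the previous block's value. A minimal sketch of a variant that clears the variables after each block is printed follows. The wrapper name extract_blocks and the awk function name emit are made up for illustration, and the field positions are assumed to match the sample data in the question.

```sh
# Hypothetical helper: prints inv, WE and SF (one per line) for every EDI_DC block.
# Reads the files given as arguments, or standard input if there are none.
extract_blocks() {
    awk '
        # Print what was collected for the block that just ended, then reset.
        function emit() {
            if (seen) printf("%s\n%s\n%s\n", inv, we, sf)
            inv = we = sf = ""    # nothing carries over into the next block
        }
        $1 == "EDI_DC"                 { emit(); seen = 1 }
        $1 == "EDIEDK01"               { inv = $4 }
        $1 == "EDIEDK02" && $4 == "WE" { we = $5 }
        $1 == "EDIEDK02" && $4 == "SF" { sf = $5 }
        END                            { emit() }
    ' "$@"
}
```

With this variant, a missing segment shows up as an empty line in the output instead of a stale value, which keeps the three-lines-per-block grouping intact.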

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Hein van den Heuvel
Honored Contributor

Re: Scripting Help - extracting data from multi-record file

Or use perl with full regular expressions instead of splitting by word...
---------- x.pl ---------
$inv = $we = $sf = "" if /^EDI_DC/;
$inv = $1 if /^EDIEDK01\s+\w+\s+\w+\s+(\w+)/;
next unless $inv;
$we = $1 if /^EDIEDK02\s+\w+\s+\w+\s+WE\s+(\w+)/;
next unless $we;
if (/^EDIEDK02\s+\w+\s+\w+\s+SF\s+(\w+)/) {
$sf = $1;
print ("$inv\n$we\n$sf\n");
}

Usage with data in x.txt:

perl -n x.pl x.txt

workings:

1) clear vars if header seen
2) remember inv if line starts with...
3) look for WE record only if inv seen
4) look for SF only if WE seen, and print all when found.
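
For comparison, the four steps above can also be sketched in awk with the same gating logic. This is only an illustration, not a replacement for the perl script: the wrapper name extract_gated is hypothetical, and the field positions ($4 for the tag or invoice, $5 for the value) are assumed from the sample data in the question.

```sh
# Hypothetical helper mirroring the four steps: reset, remember inv,
# accept WE only after inv, accept SF only after WE and then print.
extract_gated() {
    awk '
        /^EDI_DC/                      { inv = we = sf = ""; next }  # 1) clear on header
        $1 == "EDIEDK01"               { inv = $4 }                  # 2) remember inv
        inv == ""                      { next }                      #    no inv yet: skip line
        $1 == "EDIEDK02" && $4 == "WE" { we = $5 }                   # 3) WE only once inv is set
        we == ""                       { next }                      #    no WE yet: skip line
        $1 == "EDIEDK02" && $4 == "SF" {                             # 4) SF only once WE is set
            sf = $5
            printf("%s\n%s\n%s\n", inv, we, sf)
        }
    ' "$@"
}
```

Like the perl version, this prints nothing for a block in which SF comes before WE, so it relies on the segment order shown in the example file.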

Sandman!
Honored Contributor

Re: Scripting Help - extracting data from multi-record file

Yet another way to do it using sed:

# sed -n 's/.*\(inv.*\) .*/\1/p;s/.*\(WE .*\) .*/\1/p;s/.*\(SF .*\) .*/\1/p' infile

cheers!
Sandman!
Honored Contributor

Re: Scripting Help - extracting data from multi-record file

Ignore my last post, as it prints the "WE" and "SF" tags followed by the number, like...

=============
inv34573
WE 34976854
SF 4567890
inv54674
WE 687854
SF 456345
=============

...and you want only the "inv" number and the numbers following the tags "WE" and "SF". So here's the corrected version of the sed command:

# sed -n 's/.*\(inv.*\) .*/\1/p;s/.*WE \(.*\) .*/\1/p;s/.*SF \(.*\) .*/\1/p' infile
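
Note that the patterns above match "inv", "WE " and "SF " anywhere on a line. If that turns out to be too loose for real data, a stricter sketch is to anchor each expression on the segment name and count fields explicitly. The wrapper name extract_sed is hypothetical, and the expressions assume fields are separated by exactly one space, as in the sample file.

```sh
# Hypothetical helper: same output as the sed one-liner, but each
# substitution only fires on the intended segment type.
extract_sed() {
    sed -n \
        -e 's/^EDIEDK01 [^ ]* [^ ]* \([^ ]*\).*/\1/p' \
        -e 's/^EDIEDK02 [^ ]* [^ ]* WE \([^ ]*\).*/\1/p' \
        -e 's/^EDIEDK02 [^ ]* [^ ]* SF \([^ ]*\).*/\1/p' \
        "$@"
}
```

Because sed works line by line and the input is already grouped by EDI_DC block, the output keeps the inv, WE, SF order of the file without any explicit looping.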

hope it helps!
Kellogg Unix Team
Trusted Contributor

Re: Scripting Help - extracting data from multi-record file

Thanks to all for your help. I tried Peter's awk solution and, with a few tweaks, got it working. I am getting the following -
INVO4400033530 1301 0001492191
INVO4400033531 1302 0001492192

Can I use the cut command within awk to strip off INVO and get a result like -

4400033530 1301 0001492191
4400033531 1302 0001492192

work is fun ! (my manager is standing behind me!!)
James R. Ferguson
Acclaimed Contributor

Re: Scripting Help - extracting data from multi-record file

Hi:

To "cut" off characters you don't want, you can use 'substr':

# echo "xxxxgoodstring"|awk '{print substr($0,5)}'

Character positions in a string are numbered starting at one (1), so substr($0,5) returns everything from the fifth character onward.
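
As a small illustration applied to the INVO prefix from the question, here are two ways to drop it in awk: by position with substr, or by pattern with sub. The function names strip_by_position and strip_by_pattern are made up for this sketch, and both assume the prefix sits at the very start of the line.

```sh
# Hypothetical helper: drop the first four characters, whatever they are.
strip_by_position() {
    awk '{ print substr($0, 5) }' "$@"
}

# Hypothetical helper: drop a leading "INVO" only when it is actually present.
strip_by_pattern() {
    awk '{ sub(/^INVO/, ""); print }' "$@"
}
```

The substr form is shorter, but it removes four characters unconditionally; the sub form leaves lines without the INVO prefix untouched.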

Regards!

...JRF...
Peter Godron
Honored Contributor

Re: Scripting Help - extracting data from multi-record file

Hi,
cut -c5-

will cut from the 5th character to end of line
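
Keep in mind that cut is a separate utility, not an awk function, so it has to be applied to awk's output through a pipe. A minimal sketch, assuming the INVO value is the first thing on each output line (the wrapper name strip_with_cut is hypothetical):

```sh
# Hypothetical helper: let awk print the three columns, then let cut
# remove the first four characters ("INVO") of every output line.
strip_with_cut() {
    awk '{ print $1, $2, $3 }' "$@" | cut -c5-
}
```

This only works because the prefix is at the start of the line; if the value to trim were in a later column, cut -c could not target it by position this way.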
Kellogg Unix Team
Trusted Contributor

Re: Scripting Help - extracting data from multi-record file

Thanks JRF, substr did what I was looking for. Using cut within awk was giving me an error. Here is the final script that works ($x is the filename) -

awk '$1 == "EDI_DC" {if(found) printf("%s\t%s\t%s\n", inv,sf,we); found++}
$1 ~ "E2EDK01" {inv=substr($6,5)}
$1 ~ "E2EDKA1" && $3 == "SF" {sf=$4}
$1 ~ "E2EDKA1" && $3 == "WE" {we=$4}
END {if(found) printf("%s\t%s\t%s\n", inv,sf,we)}' "$x"
work is fun ! (my manager is standing behind me!!)
Hein van den Heuvel
Honored Contributor

Re: Scripting Help - extracting data from multi-record file

Just FYI... in the perl solution with regexps it is trivial to make INVO part of the match string, but not part of the text to be remembered.

Run as: perl -n test.pl test.txt
The -n switch makes perl wrap an implicit loop over the input lines around the script.

------------- test.pl -------
$inv = $we = $sf = "" if /^EDI_DC/;
$inv = $1 if /^EDIEDK01\s+\w+\s+\w+\s+INVO(\w+)/;
next unless $inv;
$we = $1 if /^EDIEDK02\s+\w+\s+\w+\s+WE\s+(\w+)/;
next unless $we;
if (/^EDIEDK02\s+\w+\s+\w+\s+SF\s+(\w+)/) {
$sf = $1;
print ("$inv $we $sf\n");
}

The explicit word matching can of course be replaced by a match-all .*:

$inv = $we = $sf = "" if /^EDI_DC/;
$inv = $1 if /^EDIEDK01.*INVO(\w+)/;
next unless $inv;
$we = $1 if /^EDIEDK02.*WE\s+(\w+)/;
next unless $we;
if (/^EDIEDK02.*SF\s+(\w+)/) {
$sf = $1;
print ("$inv $we $sf\n");
}


fwiw,
Hein.