Re: Scripting Help - extracting data from multi-record file

Kellogg Unix Team · ‎05-26-2006

Hi Gurus,

I need some help extracting data from a file that contains multiple records. For example, the file content is:
EDI_DC rgrvgtr rgrgrg tr4ggtrg gtgtggg
EDIEDK01 tigtii rogrrk inv34573 edrtgf
EDIEDK02 wsderfs wsderf WE 34976854 wedscfrg
EDIEDK02 wkivmfu wjhhtn SF 4567890 edsedfgr
EDIEDK02 jbvngrbg fevefv QE 997855 fgnres
EDI_DC jbjbtrj ovevvrgg rfrfrere erferfe
EDIEDK01 jgbnb ergfbv inv54674 trtbtr
EDIEDK02 vvvjnv jegnvgn WE 687854 tjgrntn
EDIEDK02 bngbng tgttgte SF 456345 tyhnmty
EDIEDK02 trjbnnj eferfrff QE 876895 jgrnbtr
...and so on

For each EDI_DC block, I want to extract

EDIEDK01 segment's 4th field (inv number)
EDIEDK02 segment with WE - 5th field
EDIEDK02 segment with SF - 5th field

in that order. With a simple egrep command, I get ALL the "inv number"s, followed by all the WEs, and then all the SFs. I want my result to be grouped the same way as the input file, viz.
inv number1
WE1
SF1
inv number2
WE2
SF2
...and so on

How do I loop over the file with EDI_DC marking the start of each iteration? Any suggestion is welcome.
Thanks & Regards

work is fun ! (my manager is standing behind me!!)

Peter Nikitka · ‎05-26-2006

Hi,

let's try with awk:

awk '$1 == "EDI_DC" {if(found) printf("%s\n%s\n%s\n", inv,we,sf); found++}
$1 == "EDIEDK01" {inv=$4}
$1 == "EDIEDK02" && $4 == "WE" {we=$5}
$1 == "EDIEDK02" && $4 == "SF" {sf=$5}
END {if(found) printf("%s\n%s\n%s\n", inv,we,sf)}' inputfile

mfG Peter

The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
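
One caveat with Peter's awk one-liner: inv, we and sf are never cleared, so a block that lacks a WE or SF line would print the value left over from the previous block. Below is a minimal sketch of a variant that resets them at each EDI_DC header. It assumes the field layout of the sample data in the question; the names extract_blocks and emit are made up for illustration.

```shell
#!/bin/sh
# extract_blocks FILE: print the inv, WE and SF values, one per line, per EDI_DC block.
# Hypothetical helper name; field positions are taken from the sample data only.
extract_blocks() {
    awk '
        # Print the values collected for the block that just ended.
        function emit() { if (found) printf("%s\n%s\n%s\n", inv, we, sf) }

        $1 == "EDI_DC"                 { emit(); found = 1; inv = we = sf = "" }
        $1 == "EDIEDK01"               { inv = $4 }
        $1 == "EDIEDK02" && $4 == "WE" { we = $5 }
        $1 == "EDIEDK02" && $4 == "SF" { sf = $5 }
        END                            { emit() }
    ' "$1"
}
```

Called as `extract_blocks inputfile`, a block with a missing segment then yields an empty line in that position instead of a stale value.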

Hein van den Heuvel · ‎05-26-2006

Or Perl, with full regular expressions instead of splitting by word...
---------- x.pl ---------
$inv = $we = $sf = "" if /^EDI_DC/;
$inv = $1 if /^EDIEDK01\s+\w+\s+\w+\s+(\w+)/;
next unless $inv;
$we = $1 if /^EDIEDK02\s+\w+\s+\w+\s+WE\s+(\w+)/;
next unless $we;
if (/^EDIEDK02\s+\w+\s+\w+\s+SF\s+(\w+)/) {
    $sf = $1;
    print ("$inv\n$we\n$sf\n");
}

Usage with data in x.txt:

perl -n x.pl x.txt

workings:

1) clear vars if header seen
2) remember inv if line starts with...
3) look for WE record only if inv seen
4) look for SF only if WE seen, and print all when found.
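
For comparison, the four steps above can also be sketched in awk, using fields rather than regular expressions. This is only an illustration under the sample layout from the question, not Hein's script; the function name extract_ordered is hypothetical.

```shell
#!/bin/sh
# extract_ordered FILE: print inv, WE and SF per block, but only when the
# segments appear in that order (EDIEDK01, then WE, then SF).
# Hypothetical helper name; field positions follow the sample data only.
extract_ordered() {
    awk '
        $1 == "EDI_DC"                 { inv = we = ""; next }   # 1) header seen: clear state
        $1 == "EDIEDK01"               { inv = $4; next }        # 2) remember the inv number
        inv == ""                      { next }                  # 3) ignore lines until inv is known
        $1 == "EDIEDK02" && $4 == "WE" { we = $5; next }
        we == ""                       { next }                  # 4) ignore lines until WE is known
        $1 == "EDIEDK02" && $4 == "SF" { printf("%s\n%s\n%s\n", inv, we, $5) }
    ' "$1"
}
```

As with the Perl version, nothing is printed for a block unless all three segments are found in order.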

Sandman! · ‎05-26-2006

Yet another way to do it using sed:

# sed -n 's/.*\(inv.*\) .*/\1/p;s/.*\(WE .*\) .*/\1/p;s/.*\(SF .*\) .*/\1/p' infile

cheers!

Sandman! · ‎05-26-2006

Ignore my last post, as it prints the "WE" and "SF" tags followed by the number, like this:

=============
inv34573
WE 34976854
SF 4567890
inv54674
WE 687854
SF 456345
=============

...and you want only the "inv" number and the numbers following the tags "WE" and "SF". So here's the corrected version of the sed command:

# sed -n 's/.*\(inv.*\) .*/\1/p;s/.*WE \(.*\) .*/\1/p;s/.*SF \(.*\) .*/\1/p' infile

hope it helps!

Kellogg Unix Team · ‎05-30-2006

Thanks to all for your help. I tried Peter's awk solution and, with a few tweaks, got it working. I am getting the following:
INVO4400033530 1301 0001492191
INVO4400033531 1302 0001492192

Can I use the cut command within awk to strip off INVO, to get a result like this:

4400033530 1301 0001492191
4400033531 1302 0001492192

work is fun ! (my manager is standing behind me!!)

James R. Ferguson · ‎05-30-2006

Hi:

To "cut" off characters you don't want, you can use 'substr':

# echo "xxxxgoodstring"|awk '{print substr($0,5)}'

Strings are numbered starting at one (1).

Regards!

...JRF...
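
Applied to an invoice value from the question, the substr suggestion might look like the sketch below. The name strip_prefix is made up, and the fixed prefix length of four characters is an assumption based on the INVO sample.

```shell
#!/bin/sh
# strip_prefix STRING: drop the first four characters of STRING using awk substr.
# Hypothetical helper; assumes the prefix is always exactly 4 characters long.
strip_prefix() {
    # substr($0, 5) returns everything from character position 5 onwards.
    printf '%s\n' "$1" | awk '{ print substr($0, 5) }'
}

strip_prefix "INVO4400033530"   # prints 4400033530
```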

Peter Godron · ‎05-30-2006

Hi,
cut -c5-

will cut from the 5th character to end of line
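
Note that cut is a separate command rather than an awk function, so it is applied to awk's output through a pipe instead of being called inside the awk program. Here is a small sketch using one of the output lines quoted in the question; the function name strip_invo is made up for illustration.

```shell
#!/bin/sh
# strip_invo: read lines on stdin and drop their first four characters.
# Hypothetical wrapper around cut; assumes every line starts with a 4-character prefix.
strip_invo() {
    cut -c5-
}

printf '%s\n' 'INVO4400033530 1301 0001492191' | strip_invo
# prints: 4400033530 1301 0001492191
```

The output of an awk command can be piped through cut in the same way.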

Kellogg Unix Team · ‎05-30-2006

Thanks JRF, substr did what I was looking for. Using cut within awk was giving me an error. Here is the final script that works ($x is the filename):

awk '$1 == "EDI_DC" {if(found) printf("%s\t%s\t%s\n", inv,sf,we); found++}
$1 ~ "E2EDK01" {inv=substr($6,5)}
$1 ~ "E2EDKA1" && $3 == "SF" {sf=$4}
$1 ~ "E2EDKA1" && $3 == "WE" {we=$4}
END {if(found) printf("%s\t%s\t%s\n", inv,sf,we)}' "$x"

work is fun ! (my manager is standing behind me!!)

Hein van den Heuvel · ‎05-30-2006

Just FYI... in the Perl solution with regular expressions, it is trivial to make INVO part of the match string but not part of the text to be remembered.

Run as: perl -n test.pl test.txt
The -n switch wraps an implicit loop around the input lines.

------------- test.pl -------
$inv = $we = $sf = "" if /^EDI_DC/;
$inv = $1 if /^EDIEDK01\s+\w+\s+\w+\s+INVO(\w+)/;
next unless $inv;
$we = $1 if /^EDIEDK02\s+\w+\s+\w+\s+WE\s+(\w+)/;
next unless $we;
if (/^EDIEDK02\s+\w+\s+\w+\s+SF\s+(\w+)/) {
    $sf = $1;
    print ("$inv $we $sf\n");
}

The explicit word matching can of course be replaced by a match-all .* pattern:

$inv = $we = $sf = "" if /^EDI_DC/;
$inv = $1 if /^EDIEDK01.*INVO(\w+)/;
next unless $inv;
$we = $1 if /^EDIEDK02.*WE\s+(\w+)/;
next unless $we;
if (/^EDIEDK02.*SF\s+(\w+)/) {
    $sf = $1;
    print ("$inv $we $sf\n");
}

fwiw,
Hein.

Kellogg Unix Team · ‎05-30-2006

Closing the thread, Thanks everyone!!

work is fun ! (my manager is standing behind me!!)
