
script tips help

 
sheevm
Regular Advisor

script tips help

Hi All,

In my input file, each line contains the "|" delimiter. I have to process this file by counting the number of pipes in each line. If the count is not 18, I have to put that line into a different file.

Basically, split the file into two:

one file with the lines that have 18 pipes,
another with all other lines.

Please see the sample input line:

C|2630|000000058|No Item Name Available||9||||||||1|1|1|1|1|N

thanks
be good and do good
Peter Nikitka
Honored Contributor
Solution

Re: script tips help

Hi,

18 delimiters == 19 fields; awk solution:

awk -F'|' 'NF != 19' infile >outfile
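A quick way to convince yourself of the "delimiters + 1 == fields" relation on the sample line from the question. This is only a sketch: gsub() returns the number of replacements it made, so replacing every pipe by itself counts the pipes. It assumes a POSIX-style awk (on Solaris that would be nawk or /usr/xpg4/bin/awk).

```shell
line='C|2630|000000058|No Item Name Available||9||||||||1|1|1|1|1|N'

# Save NF first, then count the pipes via gsub()'s return value
result=$(printf '%s\n' "$line" |
    awk -F'|' '{ nf = NF; n = gsub(/\|/, "|"); print n, nf }')

# Prints "<pipes> <fields>"; the second number is the first plus one
echo "$result"
```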

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Bob E Campbell
Honored Contributor

Re: script tips help

After thinking about it I thought that keeping this code dumb was the best option. I suggest the following...

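#! /usr/bin/sh

LINE="C|2630|000000058|No Item Name Available||9||||||||1|1|1|1|1|N"

numFields=1
subLINE=${LINE}

# Strip one "field|" prefix per pass; stop once no pipe is left
# (comparing against the original LINE would never end the loop
# when the last field is non-empty)
while [[ ${subLINE} = *\|* ]]
do
(( numFields = numFields + 1 ))
subLINE=${subLINE#*\|}
done

print "$numFields"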


Add your "read LINE" and similar code as needed.
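One possible way to wrap this in a "read LINE" loop, as a rough sketch only. The file names infile, match.txt and other.txt are made up for the example, and the tiny sample input is there just so the script can be tried as-is. It sticks to plain POSIX sh constructs (case, printf) so it is not tied to one shell. Expect a pure-shell loop like this to be far slower than awk on a large file.

```shell
#!/bin/sh
# Sample input: the line from the question plus one deliberately short line
printf '%s\n' \
    'C|2630|000000058|No Item Name Available||9||||||||1|1|1|1|1|N' \
    'C|2630|short line' > infile

want=18            # required number of pipes per line
: > match.txt      # lines with exactly $want pipes
: > other.txt      # everything else

while IFS= read -r LINE
do
    n=0
    rest=$LINE
    # Strip one "field|" prefix per pass until no pipe is left
    while :
    do
        case $rest in
            *'|'*) rest=${rest#*'|'}; n=$((n + 1)) ;;
            *)     break ;;
        esac
    done
    if [ "$n" -eq "$want" ]
    then printf '%s\n' "$LINE" >> match.txt
    else printf '%s\n' "$LINE" >> other.txt
    fi
done < infile
```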
Hein van den Heuvel
Honored Contributor

Re: script tips help

awk -F'|' 'NF != 19' infile >outfile

Peter's solution will give the bad lines.

You would need a second run over the file to get the good lines:

awk -F'|' 'NF == 19' infile > good

Or combine them with

awk -F'|' 'NF==19 {print} NF!=19{print > "Bad.txt\"' Good.txt


Or

awk -F'|' '{ if (NF==19) {print} else {print > "Bad.txt\"}' Good.txt

Hein.



Hein van den Heuvel
Honored Contributor

Re: script tips help

Oops, remove the \ from \" in my suggestions. Testing on Windoze again...

Hein.
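For reference, a single-pass variant with the stray backslash removed might look like the sketch below. As far as I can tell, the earlier one-liners also name Good.txt where the input file should go, and the if/else form is missing a closing brace, so here the input is given as infile and the good lines are redirected to Good.txt. The two sample lines only make the command runnable as-is; infile, Good.txt and Bad.txt are example names.

```shell
# Tiny sample: one line with 19 fields (18 pipes), one short line
printf '%s\n' \
    'C|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18' \
    'C|bad' > infile

# Lines with exactly 19 fields go to stdout (Good.txt), the rest to Bad.txt
awk -F'|' 'NF == 19 { print; next } { print > "Bad.txt" }' infile > Good.txt
```

Note that Bad.txt is only created if at least one bad line shows up.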
sheevm
Regular Advisor

Re: script tips help

All,

Thanks a lot for the tips. I will try and let you know and assign points.

kesh
be good and do good
sheevm
Regular Advisor

Re: script tips help

Hi All,

The solution works with two passes. For some reason I am not able to make it work with one pass. I can live with it for now.

Another question is I have this file 2GB I need to split the file into 20MB files. Can someone help with any tips on this?

Thanks
kesh
be good and do good
James R. Ferguson
Acclaimed Contributor

Re: script tips help

Hi:

> Another question is I have this file 2GB I need to split the file into 20MB files. Can someone help with any tips on this?

See the manpages for 'split':

http://docs.hp.com/en/B2355-60127/split.1.html

http://docs.hp.com/en/B2355-60127/csplit.1.html

Regards!

...JRF...
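A minimal sketch of the two usual ways to call split; the output prefixes bytes. and lines. are arbitrary example names, and the five-line infile is just a stand-in for the real 2GB file. Check the manpage on your platform for the exact size suffixes it accepts.

```shell
# Small stand-in for the real input file
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' > infile

# By size: pieces of at most 20 MB each; this can cut a line in two
split -b 20m infile bytes.

# By line count: every piece ends on a line boundary (2 lines each here)
split -l 2 infile lines.

ls bytes.* lines.*
```

The pieces get suffixes aa, ab, ac, and so on, and concatenating them in that order gives back the original file.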
sheevm
Regular Advisor

Re: script tips help

I just got more requests for this script:

1. The input file is 2GB; I need to split it into 20MB chunks

2. The first field in each line must be changed from "C" to "R", except on the last line, where it must be "X"

This is an urgent production request, not much time for learning curve. My script skills are very limited. I appreciate any help.

Thanks



be good and do good
Peter Nikitka
Honored Contributor

Re: script tips help

Hi,

splitting by size could break the file in the middle of a line - I'm sure this is not what you want.
If you only want small output files, I suggest counting lines and starting a new file at, e.g., every 100000 lines - configure this limit and the base names of the resulting files in the BEGIN section. The output file suffixes start at zero.

awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (NF==19) { g++; print $0>(okb""int(g/lim))}
else {b++; print $0>(fab""int(b/lim))} }' infile

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Peter Nikitka
Honored Contributor

Re: script tips help

Hi,

forgot to ask for your request 2:
What is the 'last line' in respect to request 1?

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
sheevm
Regular Advisor

Re: script tips help

Peter,

Thanks a lot for your help.

Another request is

First field in each line must be replaced by "R" from "C". If it is last line it must be replaced by "X".

I have a question on your awk line:

awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (NF==19) { g++; print $0>(okb""int(g/lim))}
else {b++; print $0>(fab""int(b/lim))} }' infile


what will be the output file names?

Another point I would like to bring to your attention is that the input file is 2GB. I hope processing time will not be an issue.

Thanks
kesh
be good and do good
sheevm
Regular Advisor

Re: script tips help

Peter,

I see the output file names. I tested your AWK command, and the split seems to be working.

As for the first field replacement, I was going to read each line in a loop and replace the first field with "R", except on the last line, which gets "X".

Is there a better way to do it?

Thanks
be good and do good
Peter Nikitka
Honored Contributor

Re: script tips help

Hi,

the names of the output files will be
/tmp/outok0, /tmp/outok1, ...
for the OK-lines and
/tmp/fail0, /tmp/fail1, ...
for the irregular lines.
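To see the numbering in action, here is a scaled-down sketch: NF==3 and lim=3 stand in for NF==19 and lim=100000, and the demo_out directory is just an example location instead of /tmp.

```shell
mkdir -p demo_out
# Seven valid 3-field lines
printf '%s\n' 'a|b|1' 'a|b|2' 'a|b|3' 'a|b|4' 'a|b|5' 'a|b|6' 'a|b|7' > infile

awk -F'|' 'BEGIN {okb="demo_out/outok"; fab="demo_out/fail"; g=0; b=0; lim=3}
{ if (NF==3) { g++; print $0 > (okb "" int(g/lim)) }
  else       { b++; print $0 > (fab "" int(b/lim)) } }' infile

wc -l demo_out/outok*
```

Because the counter is incremented before the division, the first file ends up one line short (2 lines here, then 3, then the remaining 2). If every file should hold exactly lim lines, int((g-1)/lim) would do that.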

Processing time for one very big file won't differ from the total for several smaller files.

Changing 'C' to 'R' in the first field is easy; the 'X' substitution is not.
Dealing with "the last line" leads to a common problem: during processing we do not know whether more input will arrive or not.
For this kind of processing we need a buffering mechanism, i.e. we hold each line back until we have seen the next one:

awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (buf) print buf>outf
if($1 == "C") sub("^C","R")
buf=$0
if (NF==19) { g++; outf=(okb""int(g/lim))}
else {b++; outf=(fab""int(b/lim))}
}
END {if (buf) {sub("^R","X",buf);print buf >outf} }' infile

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
sheevm
Regular Advisor

Re: script tips help

Peter,

I cut and pasted your script. This is the error I got:

awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 8
awk: illegal statement near line 8

-------------------------------------------
awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (buf) print buf>outf
if($1 == "C") sub("^C","R")
buf=$0
if (NF==19) { g++; outf=(okb""int(g/lim))}
else {b++; outf=(fab""int(b/lim))}
}
END {if (buf) {sub("^R","X",buf);print buf >outf} }' infile
be good and do good
sheevm
Regular Advisor

Re: script tips help

Hi All,

I am trying to implement the AWK script that Peter sent. I am getting a syntax error on the "sub" line.

Can someone help me?


awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 8
awk: illegal statement near line 8

-------------------------------------------
awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (buf) print buf>outf
if($1 == "C") sub("^C","R")
buf=$0
if (NF==19) { g++; outf=(okb""int(g/lim))}
else {b++; outf=(fab""int(b/lim))}
}
END {if (buf) {sub("^R","X",buf);print buf >outf} }' infile
be good and do good
Hein van den Heuvel
Honored Contributor

Re: script tips help

>> This is an urgent production request, not much time for learning curve. My script skills are very limited. I appreciate any help.

And I do hope you get all the help you need,
but I cannot help feeling worried about an organization that relies on best effort from a bunch of geeks and self-proclaimed wizards like myself to help with 'urgent production problems'.

>> Can someone help me?

Sure, for mere money I'll be glad to solve this problem. Be sure to contact me!


>> if($1 == "C") sub("^C","R")

I think you'll need {} around the conditional part after the if, and a ; after the next line?

IMHO this is now beyond a one-liner and should be recoded as a little awk script.

See my UNTESTED re-org below...

Cheers,
Hein.

BEGIN {
    okb="/tmp/outok";
    fab="/tmp/fail";
    g=0;
    b=0;
    lim=100000
}

{
    if (buf) { print buf>outf }
    if ($1 == "C") { sub("^C","R") }
    buf = $0;
    if (NF==19) {
        g++;
        outf=(okb""int(g/lim))
    } else {
        b++;
        outf=(fab""int(b/lim))
    }
}
END {
    if (buf) {
        sub("^R","X",buf);
        print buf >outf
    }
}


Peter Nikitka
Honored Contributor

Re: script tips help

Hi,

a plain copy of my ITC answer pasted into a shell worked well - are you running that code on something other than HP-UX? On Solaris, a call to 'nawk' or '/usr/xpg4/bin/awk' is required.

Additional '{}' are not required, ';' only when rearranging the lines of the code: a newline is an implicit semicolon.

I support Hein's suggestion to put the awk core program into a separate file, e.g. myprog.awk. Use
awk -F'|' -f myprog.awk infile

in this case.
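Put together, an end-to-end trial run could look like the sketch below. It is a variation on the code in this thread, not a drop-in replacement: the base names outok and fail are relative to the current directory rather than /tmp, the test input is generated on the spot, and the check "if (buf)" is swapped for a test on NR so that an empty line is not silently dropped. On Solaris, call nawk or /usr/xpg4/bin/awk instead of awk.

```shell
# Write the awk program to its own file
cat > myprog.awk <<'EOF'
BEGIN { okb = "outok"; fab = "fail"; g = 0; b = 0; lim = 100000 }
{
    if (NR > 1) print buf > outf       # flush the previous line
    if ($1 == "C") sub(/^C/, "R")      # C -> R in the first field
    buf = $0                           # hold this line back for now
    if (NF == 19) { g++; outf = okb int(g / lim) }
    else          { b++; outf = fab int(b / lim) }
}
END {
    # Only now do we know that buf holds the last line: R -> X
    if (NR > 0) { sub(/^R/, "X", buf); print buf > outf }
}
EOF

# Generate three test lines with exactly 18 pipes each
awk 'BEGIN { for (i = 1; i <= 3; i++) {
                 line = "C"
                 for (j = 1; j <= 18; j++) line = line "|" j
                 print line } }' > infile

awk -F'|' -f myprog.awk infile
cut -c1-10 outok0
```

With this input, the first two lines in outok0 should start with R and the last one with X.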

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
sheevm
Regular Advisor

Re: script tips help

Peter,

You are correct. I am working on a Solaris 8 box, but the script will be implemented on HP-UX 11.23. Currently I have limited access to the HP box. I will try to see if I can run this on the HP box or make changes to the code as per your comments.

Thanks for all your help.

Hein,

Thank you for offering your services. Please send me your contact information so we can discuss it.

Thanks
be good and do good
Hein van den Heuvel
Honored Contributor

Re: script tips help

I have my Email in my forum profile.
It is all 16 characters of my name together at gmail or hotmail.

Regards,
Hein van den Heuvel
Peter Nikitka
Honored Contributor

Re: script tips help

Hi kesh,

time....
date +%T....
t4p

i.e. time for points.

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
sheevm
Regular Advisor

Re: script tips help

Peter,

Just before your message I have assigned points for your assistance.

By the way, the script is working.

Can you suggest a good book/tutorial on AWK, SED and PERL?

Thanks
be good and do good
Peter Nikitka
Honored Contributor

Re: script tips help

Hi,

online information of the GNU awk:

http://www.gnu.org/manual/gawk/gawk.html

You'll have to tell the gawk-only features apart from those of the nawk/awk family yourself, however.

Arnold Robbins, maintainer of gawk, wrote "Effective awk Programming", and there was an AWK+SED book in the O'Reilly series as well.

I didn't read either of the books (yet), however.

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
sheevm
Regular Advisor

Re: script tips help

Hi

Is there any "date" function in awk to get the current system date? Or is there a way to use a shell variable in the body of the "awk" program?

Thanks
be good and do good
Hein van den Heuvel
Honored Contributor

Re: script tips help

>> Is there any "date" function in awk to get the current system date?

If it is there, it is called systime() or strftime(). It depends on the awk version.
Check your manpage / documentation. Gawk has both. Try it.
- systime() returns the seconds since 1-Jan-1970
- strftime() takes a format string and a seconds value.

>> Or is there a way to use a shell variable in the body of the "awk" program?

Yes:

use: "command" | getline var

For example:

awk 'BEGIN { "date " | getline xx; sub (/..../,"test ",xx); print xx}'

Hein.
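Another common way to get a shell value into awk is the -v option, sketched below. The variable names today and stamp are just examples. -v needs a POSIX-style awk (nawk or /usr/xpg4/bin/awk on Solaris); as far as I know the old Solaris /usr/bin/awk does not accept it.

```shell
today=$(date +%Y%m%d)      # shell variable holding the current date

# -v copies the shell value into the awk variable "stamp"
out=$(awk -v stamp="$today" 'BEGIN { print "run date: " stamp }')
echo "$out"
```

This also covers the date question on awks without systime() or strftime(): let the shell's date command do the formatting and hand the result over.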