script tips help

sheevm · ‎03-09-2007

Hi All,

Each line of my input file is delimited by "|". I have to process this file by counting the number of pipes in each line. If the count is not 18, I have to put those lines into a different file.

Basically, split the file into two:

one file with the lines that have 18 pipes,
the other with all remaining lines.

Please see the sample input line:

C|2630|000000058|No Item Name Available||9||||||||1|1|1|1|1|N

thanks

be good and do good

Peter Nikitka · ‎03-09-2007

Hi,

18 delimiters == 19 fields; awk solution:

awk -F'|' 'NF != 19' infile >outfile
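
To make the field arithmetic concrete, here is a small sketch that checks the sample line from the question both ways. The variable names (line, fields, pipes) are only for illustration.

```shell
# Sample line copied from the question.
line='C|2630|000000058|No Item Name Available||9||||||||1|1|1|1|1|N'

# awk splits on '|' and reports the number of fields in NF.
fields=$(printf '%s\n' "$line" | awk -F'|' '{ print NF }')

# Cross-check: delete everything except the pipes and count what is left.
pipes=$(printf '%s' "$line" | tr -cd '|' | wc -c | tr -d ' ')

echo "fields=$fields pipes=$pipes"
```

Whatever the line contains, the field count should be exactly one more than the pipe count, which is why the test is NF != 19 rather than NF != 18.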

mfG Peter

The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"

Bob E Campbell · ‎03-09-2007

After thinking about it I thought that keeping this code dumb was the best option. I suggest the following...

#! /usr/bin/sh

LINE="C|2630|000000058|No Item Name Available||9||||||||1|1|1|1|1|N"

numFields=1
subLINE=${LINE}

# Strip one leading field per pass for as long as a pipe remains.
while [[ ${subLINE} = *\|* ]]
do
(( numFields = numFields + 1 ))
subLINE=${subLINE#*\|}
done

print "$numFields"

Add your "read LINE" and similar code as needed.

Hein van den Heuvel · ‎03-09-2007

awk -F'|' 'NF != 19' infile >outfile

Peter's solution will give the bad lines.

You would need a second run over the file to give the good lines:

awk -F'|' 'NF == 19' infile > good

Or combine them with

awk -F'|' 'NF==19 {print} NF!=19 {print > "Bad.txt\"}' infile > Good.txt

Or

awk -F'|' '{ if (NF==19) {print} else {print > "Bad.txt\"} }' infile > Good.txt

Hein.

Hein van den Heuvel · ‎03-09-2007

Oops, remove the \ from \" in my suggestions. Testing on Windoze again...

Hein.
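
For anyone who wants to try the one-pass idea on a small sample first, here is a minimal sketch with the quoting already corrected. The scratch directory and the three input lines are made up for the demo; Good.txt and Bad.txt are the names used above.

```shell
# Work in a scratch directory so the demo leaves nothing behind.
tmpdir=${TMPDIR:-/tmp}/pipesplit.$$
mkdir -p "$tmpdir" && cd "$tmpdir" || exit 1

# Hypothetical input: two lines with 19 fields, one line with 3.
{
  echo 'C|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19'
  echo 'C|too|short'
  echo 'C|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s'
} > infile

# One pass: 19-field lines go to stdout (Good.txt), the rest to Bad.txt.
awk -F'|' '{ if (NF == 19) print; else print > "Bad.txt" }' infile > Good.txt

good=$(wc -l < Good.txt | tr -d ' ')
bad=$(wc -l < Bad.txt | tr -d ' ')
echo "good=$good bad=$bad"

# Clean up the scratch directory.
cd / && rm -rf "$tmpdir"
```

Note that the input file is named before the redirection; if only Good.txt follows the program, awk treats it as the input file.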

sheevm · ‎03-09-2007

All,

Thanks a lot for the tips. I will try and let you know and assign points.

kesh

sheevm · ‎03-12-2007

Hi All,

The solution works with two passes. For some reason I am not able to make it work with one pass. I can live with it for now.

Another question: I have a 2GB file that I need to split into 20MB files. Can someone help with any tips on this?

Thanks
kesh

James R. Ferguson · ‎03-12-2007

Hi:

> Another question: I have a 2GB file that I need to split into 20MB files. Can someone help with any tips on this?

See the manpages for 'split':

http://docs.hp.com/en/B2355-60127/split.1.html

http://docs.hp.com/en/B2355-60127/csplit.1.html
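
As a rough sketch of how split behaves, assuming a tiny stand-in file instead of the real 2GB input (the scratch directory and the "part." prefix are made up for the demo):

```shell
tmpdir=${TMPDIR:-/tmp}/splitdemo.$$
mkdir -p "$tmpdir" && cd "$tmpdir" || exit 1

# Hypothetical small input: 10 short lines standing in for the big file.
i=1
while [ "$i" -le 10 ]; do
  echo "C|line $i"
  i=$((i + 1))
done > infile

# Split by line count, 4 lines per piece, so no line is cut in half.
# Output names are part.aa, part.ab, ...
split -l 4 infile part.

# For fixed-size pieces the byte form would be: split -b 20m infile part.
# (that form can cut a line in the middle)

pieces=$(ls part.* | wc -l | tr -d ' ')

# Concatenating the pieces in name order should reproduce the input.
cat part.* | cmp -s - infile && roundtrip=ok || roundtrip=differs
echo "pieces=$pieces roundtrip=$roundtrip"

cd / && rm -rf "$tmpdir"
```

With 10 lines and 4 lines per piece this should leave three pieces; for the real file, pick a line count that gives pieces of roughly the size you need.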

Regards!

...JRF...

sheevm · ‎03-12-2007

I just got more requests on this script:

1. The input file is 2GB; I need to split it into 20MB chunks.

2. The first field in each line must be changed from "C" to "R", except on the last line, where it must be "X".

This is an urgent production request, and there is not much time for a learning curve. My script skills are very limited. I appreciate any help.

Thanks

Peter Nikitka · ‎03-12-2007

Hi,

splitting by size could break the file in the middle of a line; I'm sure this is not what you want.
If you only want smaller output files, I suggest counting lines and starting a new file every, e.g., 100000 lines. Configure this limit and the base names of the output files in the BEGIN section. The output file suffixes start at zero.

awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (NF==19) { g++; print $0>(okb""int(g/lim))}
else {b++; print $0>(fab""int(b/lim))} }' infile
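
To see which files this scheme produces, here is a scaled-down sketch of the same command. It assumes lim=3 instead of 100000, treats NF == 2 as "regular" so the sample stays short, and writes into a made-up scratch directory rather than /tmp directly.

```shell
tmpdir=${TMPDIR:-/tmp}/chunkdemo.$$
mkdir -p "$tmpdir" && cd "$tmpdir" || exit 1

# Hypothetical input: six regular (2-field) lines and one irregular line.
printf '%s\n' 'C|1' 'C|2' 'bad' 'C|3' 'C|4' 'C|5' 'C|6' > infile

# Same structure as the command above, with a small limit and local names.
awk -F'|' 'BEGIN { okb = "outok"; fab = "fail"; g = 0; b = 0; lim = 3 }
{ if (NF == 2) { g++; print $0 > (okb "" int(g / lim)) }
  else         { b++; print $0 > (fab "" int(b / lim)) } }' infile

ls outok* fail*
count=$(ls outok* fail* | wc -l | tr -d ' ')
first=$(wc -l < outok0 | tr -d ' ')
echo "files=$count lines_in_outok0=$first"

cd / && rm -rf "$tmpdir"
```

Because the counter is incremented before the division, the first file of each kind ends up one line short of lim; the later files hold lim lines each.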

mfG Peter

Peter Nikitka · ‎03-12-2007

Hi,

forgot to ask for your request 2:
What is the 'last line' in respect to request 1?

mfG Peter

sheevm · ‎03-12-2007

Peter,

Thanks a lot for your help.

Another request is

The first field in each line must be changed from "C" to "R". On the last line it must be changed to "X".

I have a question on your awk line:

awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (NF==19) { g++; print $0>(okb""int(g/lim))}
else {b++; print $0>(fab""int(b/lim))} }' infile

what will be the output file names?

Another point I would like to bring to your attention is that the input file is 2GB. I hope processing time will not be an issue.

Thanks
kesh

sheevm · ‎03-12-2007

Peter,

I see the output file names. I tested your AWK command, and it seems the split is working.

As for the first field replacement, I was going to read each line in a loop, replacing the first field with "R", except on the last line, where it becomes "X".

Is there a better way to do it?

Thanks

Peter Nikitka · ‎03-12-2007

Hi,

the names of the output files will be
/tmp/outok0, /tmp/outok1, ...
for the OK-lines and
/tmp/fail0, /tmp/fail1, ...
for the irregular lines.

Processing time will be about the same whether you process one very big file or several smaller ones of the same total size.

The change from 'C' to 'R' in the first field is easy; the 'X' substitution is not.
Dealing with "the last line" leads to a common problem: during processing we do not know whether more input will arrive or not.
For such processing we need a buffering mechanism:

awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (buf) print buf>outf
if($1 == "C") sub("^C","R")
buf=$0
if (NF==19) { g++; outf=(okb""int(g/lim))}
else {b++; outf=(fab""int(b/lim))}
}
END {if (buf) {sub("^R","X",buf);print buf >outf} }' infile
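
The buffering idea on its own, without the file splitting, can be sketched like this. It is a simplified variant and not the command above: it tests NR instead of buf, so empty lines are not skipped, and it assumes every line contains at least one pipe. The input file and the names prev and result are made up for the demo.

```shell
tmpdir=${TMPDIR:-/tmp}/lastline.$$
mkdir -p "$tmpdir" || exit 1

# Hypothetical four-line input; every first field is "C".
printf '%s\n' 'C|a' 'C|b' 'C|c' 'C|d' > "$tmpdir/infile"

# Hold each line back by one record. When the next line arrives, the held
# line is known not to be the last and gets "R"; whatever is still held at
# END is the last line and gets "X".
result=$(awk -F'|' '
  NR > 1 { print "R" substr(prev, index(prev, "|")) }
         { prev = $0 }
  END    { if (NR > 0) print "X" substr(prev, index(prev, "|")) }
' "$tmpdir/infile")

echo "$result"
rm -rf "$tmpdir"
```

The first field is replaced by cutting the line at its first pipe, so this variant does not depend on sub().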

mfG Peter

sheevm · ‎03-12-2007

Peter,

I cut and pasted your script. This is the error I got.

awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 8
awk: illegal statement near line 8

-------------------------------------------
awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (buf) print buf>outf
if($1 == "C") sub("^C","R")
buf=$0
if (NF==19) { g++; outf=(okb""int(g/lim))}
else {b++; outf=(fab""int(b/lim))}
}
END {if (buf) {sub("^R","X",buf);print buf >outf} }' infile

sheevm · ‎03-12-2007

Hi All,

I am trying to implement this AWK script which Peter has sent. I am getting syntax error on the "sub" line.

Can someone help me?

awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 8
awk: illegal statement near line 8

-------------------------------------------
awk -F'|' 'BEGIN {okb="/tmp/outok"; fab="/tmp/fail"; g=0;b=0;lim=100000}
{ if (buf) print buf>outf
if($1 == "C") sub("^C","R")
buf=$0
if (NF==19) { g++; outf=(okb""int(g/lim))}
else {b++; outf=(fab""int(b/lim))}
}
END {if (buf) {sub("^R","X",buf);print buf >outf} }' infile

Hein van den Heuvel · ‎03-12-2007

>> This is an urgent production request, not much time for learning curve. My script skills are very limited. I appreciate any help.

And I do hope you get all the help you need,
but I cannot help feeling worried about an organization which relies on the best efforts of a bunch of geeks and self-proclaimed wizards like myself to help with 'urgent production problems'.

>> Can someone help me?

Sure, for mere money I'll be glad to solve this problem. Be sure to contact me!

>> if($1 == "C") sub("^C","R")

I think you'll need {} around the conditional part after the if, and a ; after the next line?

IMHO this is now beyond a one-liner and should be recoded as a little awk script.

See my UNTESTED re-org below...

Cheers,
Hein.

BEGIN {
    okb = "/tmp/outok";
    fab = "/tmp/fail";
    g = 0;
    b = 0;
    lim = 100000
}

{
    if (buf) { print buf > outf }
    if ($1 == "C") { sub("^C", "R") }
    buf = $0;
    if (NF == 19) {
        g++;
        outf = (okb "" int(g / lim))
    } else {
        b++;
        outf = (fab "" int(b / lim))
    }
}

END {
    if (buf) {
        sub("^R", "X", buf);
        print buf > outf
    }
}

Peter Nikitka · ‎03-13-2007

Hi,

a plain copy of my ITC answer pasted into a shell worked well. Are you running that code on something other than HP-UX? On Solaris, a call to 'nawk' or '/usr/xpg4/bin/awk' is required.

Additional '{}' are not required, and ';' is needed only when rearranging the lines of the code: a newline is an implicit semicolon.

I support Hein's suggestion to put the awk core program into a separate file, e.g. myprog.awk. Use
awk -F'|' -f myprog.awk infile

in this case.
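
A minimal sketch of that workflow, assuming a throwaway program that only counts regular and irregular lines. The scratch directory, the sample input and the counters good and bad are made up; myprog.awk is the name suggested above.

```shell
tmpdir=${TMPDIR:-/tmp}/awkfile.$$
mkdir -p "$tmpdir" || exit 1

# The quoted heredoc keeps the shell from touching the double quotes
# and any $ signs inside the awk code.
cat > "$tmpdir/myprog.awk" <<'EOF'
# Count regular (19-field) and irregular lines.
NF == 19 { good++ }
NF != 19 { bad++ }
END      { print "good=" (good + 0), "bad=" (bad + 0) }
EOF

# Hypothetical input: one irregular line and one 19-field line.
printf '%s\n' 'C|only|three' \
  'C|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19' > "$tmpdir/infile"

summary=$(awk -F'|' -f "$tmpdir/myprog.awk" "$tmpdir/infile")
echo "$summary"
rm -rf "$tmpdir"
```

Keeping the program in its own file also sidesteps the shell quoting problems that come up when pasting a multi-line awk program between single quotes.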

mfG Peter

sheevm · ‎03-13-2007

Peter,

You are correct. I am working on a Solaris 8.0 box, but the script will be implemented on HP-UX 11.23. Currently I have limited access to the HP box. I will try to see if I can run this on the HP box, or change the code as per your comments.

Thanks for all your help.

Hein,

Thank you for offering your services. Please send me your contact information. We can discuss it.

Thanks

Hein van den Heuvel · ‎03-13-2007

I have my Email in my forum profile.
It is all 16 characters of my name together at gmail or hotmail.

Regards,
Hein van den Heuvel

Peter Nikitka · ‎03-13-2007

Hi kesh,

time....
date +%T....
t4p

i.e. time for points.

mfG Peter

sheevm · ‎03-13-2007

Peter,

Just before your message I have assigned points for your assistance.

By the way, the script is working.

Can you suggest a good book/tutorial on AWK, SED and PERL?

Thanks

Peter Nikitka · ‎03-13-2007

Hi,

online information of the GNU awk:

http://www.gnu.org/manual/gawk/gawk.html

You'll have to distinguish gawk-only features from those common to the nawk/awk family yourself, however.

Arnold Robbins, maintainer of gawk, wrote "Effective awk Programming", and there was an AWK+SED book in the O'Reilly series as well.

I haven't read either of the books (yet), however.

mfG Peter

sheevm · ‎03-14-2007

Hi

Is there any "date" function in awk to get the current system date? Or is there a way to use a shell variable in the body of the "awk" program?

Thanks

Hein van den Heuvel · ‎03-14-2007

>> Is there any "date" function in awk to get the current system date?

If it is there, it is called systime() or strftime(). It depends on the awk version.
Check your manpage / documentation. Gawk has both. Try it.
- systime() returns the seconds since 1-Jan-1970
- strftime() takes a format string and a seconds value.

>> Or is there a way to use a shell variable in the body of the "awk" program?

Yes:

use: "command" | getline var

For example:

awk 'BEGIN { "date " | getline xx; sub (/..../,"test ",xx); print xx}'
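
Both routes can be sketched side by side. The first passes a shell variable in with -v, which is part of POSIX awk but may not be accepted by very old awks such as the default awk on Solaris; the second runs date from inside awk with getline, as above. The variable names (today, stamp, d) and the sample record are made up.

```shell
# Route 1: capture the date in the shell and hand it to awk with -v.
today=$(date +%Y%m%d)
out=$(echo 'C|2630' | awk -F'|' -v stamp="$today" '{ print $1 "|" stamp }')
echo "$out"

# Route 2: run the command inside awk and read its output with getline.
out2=$(awk 'BEGIN {
  cmd = "date +%Y%m%d"
  cmd | getline d     # first line of the command output goes into d
  close(cmd)
  print d
}')
echo "$out2"
```

The -v form is usually the easier one for plain shell variables; the getline form is handy when the value comes from a command that awk should run itself.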

Hein.
