Operating System - HP-UX

Re: AWK script for more than 200 fields

 
Aishwarya P
New Member

AWK script for more than 200 fields

The application we run is very old and uses an awk script to split a huge file into separate files based on the first two columns (like 01, 02, 03, 04, ...). Due to a recent enhancement the number of fields in the input file has increased to nearly 250, so the awk script fails. Could someone please help with an alternative? I have provided the current script below:

#
awk ' BEGIN {
    hold_tbl_code = "00"                    # initialise hold tbl code
}
{
    tbl_code = substr($0,1,2)               # set variables
    #
    if (substr(tbl_code,2,1) != " ")
    {
        if (tbl_code == hold_tbl_code)      # if tbl code = old tbl code
        {
            print $0 > tbl_filename         # print record to file
        }
        else
        {
            if (hold_tbl_code == "00")      # if first dealer
            {                               # continue
            }
            else
            {
                close(tbl_filename)         # close previous tbl file
            }
            #
            tbl_filename = sprintf("%32s%2s%4s",\
                "/apps/production/visa/data/table",\
                tbl_code,".dat")            # set tbl filename variable
            #
            print $0 > tbl_filename         # print data to file
            #
            hold_tbl_code = tbl_code        # move tbl code to hold dlr code
        }
    }
} '
#
#
Dennis Handly
Acclaimed Contributor

Re: AWK script for more than 200 fields

Possibly use gnu awk?

I also don't see you using any fields. So you could use -Fx, where "x" is a character that doesn't appear (or appears only rarely) in your data.
Aishwarya P
New Member

Re: AWK script for more than 200 fields

It is a very old system which does not support gnu awk. Is there any other option to split the file? I'll give a sample format of the input file below. The file is split into table01, table02, table03, ... based on the first two columns, i.e. 01, 02, 03, 04, ... All the 01 records are put into table01, the 02 records into table02, etc. Now the number of fields in each 03 record has increased to 250.

01;VE;003301;020;2009_01_20;045502226;0280;2009_01_20;;0009;0;LS;;;2009_04_28;;N;A;;4373503;N;N;Y;2009_03_06;2009_04_06;2009_04_13;2009_04_07;2009_04_28;;;;;VL999 137777 06/03 20/04/09;
02;045502226;VE;8EZ19;538;;L331286;6G1MZ55Y69L331286;P;DOM;;690F;80U;PHANTOM BLACK;;;;51I;;;
03;VE;045502226;463 A45 A88 AGA AGB AHU AQ9 AW5 AX2 AY0 B13 B34 B35 BMK BSI C63 CE1 CX5 DL1 E20 EOF ESA EVG FE1 GW8 JL4 JL9 L76 MYC N10 N40 N87 NK4 NT3 P34 QWD RHD T81 TT7 U32 U71 UFR UWE UWQ V7O XW6;
04;045502226;VE;18324016;;;;SBP082660042;2099A1;92R003441;01575609;619CVA2015;;
05;045502226;93254552700;;;2B-61232;S0813;
07;045502226;0;VE;;2009_04_21;DELV;2009_04_07;;;;;2009_04_01;HBD OPTIONS=DLOK;
08;045502226;PREF;2009_02_16;;VE;
08;045502226;RELD;2009_03_03;;VE;
08;045502226;SCHD;2009_03_16;;VE;
08;045502226;CHK1;2009_04_06;1;VE;
08;045502226;CHK2;2009_04_06;1;VE;
08;045502226;CHK3;2009_04_07;1;VE;
08;045502226;CHK4;2009_04_07;1;VE;
08;045502226;CHK5;2009_04_07;1;VE;
08;045502226;CHK6;2009_04_07;1;VE;
08;045502226;BILT;2009_04_07;;VE;
08;045502226;TRAN;2009_04_07;;VE;
08;045502226;DELV;2009_04_07;;VE;
09;045502226;463;N;
09;045502226;A45;N;
09;045502226;A88;N;
09;045502226;AGA;N;
09;045502226;AGB;N;
09;045502226;AHU;N;
12;045502226;VE;VL999;;;;;;0280;;0280;VIC;;VIC;;;;GELZ;;
14;045502226;2009_01_19;0009;
14;045502226;2009_04_06;0280;
20;045502226;GELZ;GELZ;;;;;;E;Y;N;N;N;N;N;;0;;;N;;;;;;
01;VE;003306;304;2009_02_03;045504066;0280;2009_02_03;;0252;0;LS;;;2009_04_28;;N;A;;4372186;N;N;N;;2009_04_13;2009_04_22;2009_04_07;2009_04_28;;;;;VW000 487415 03/02 10/04/09 CHRIS JURESKO PET NET;
02;045504066;VE;8EX35;C55;;L331241;6G1EX85729L331241;P;DOM;;690F;80U;PHANTOM BLACK;;;;51I;;;
03;VE;045504066;463 A45 A88 AG3 AH8 AHU AQ9 AX2 AY0 B13 B34 B35 BMK BS2 BSI CE1 CJ2 CX5 DL1 E20 ESA EVF FE1 GW8 JL4 JL9 LY7 M82 N10 N40 N65 N87 NK4 NT3 QI7 RHD U32 U71 UFR UWE UWP V7O XW6 YE3;
04;045504066;VE;18273309;;;;LY7090770452;2209A1;92R003379;01576409;619HKGS219;;
05;045504066;93180665699;;;2B-61192;S0831;
07;045504066;0;VE;;2009_04_21;DELV;2009_04_07;;;;;2009_04_01;HBD OPTIONS=DCAR;
08;045504066;PREF;2009_02_16;;VE;
08;045504066;RELD;2009_03_03;;VE;
08;045504066;SCHD;2009_03_16;;VE;
08;045504066;CHK1;2009_04_06;1;VE;
08;045504066;CHK2;2009_04_06;1;VE;
08;045504066;CHK3;2009_04_07;1;VE;
08;045504066;CHK4;2009_04_07;1;VE;
08;045504066;CHK5;2009_04_07;1;VE;
08;045504066;CHK6;2009_04_07;1;VE;
08;045504066;BILT;2009_04_07;;VE;
08;045504066;TRAN;2009_04_07;;VE;
08;045504066;DELV;2009_04_07;;VE;
09;045504066;463;N;
09;045504066;A45;N;
09;045504066;A88;N;
09;045504066;AG3;N;
09;045504066;AH8;N;
Dennis Handly
Acclaimed Contributor

Re: AWK script for more than 200 fields

>Is there any other option to split the file?

Nothing to split, you don't have any fields other than $0 and one you create with substr.

As I said, you can use -F: or possibly -F"" to fool awk. Or -F"dummy separator".
Aishwarya P
New Member

Re: AWK script for more than 200 fields

No, there is no other option. The split happens only on the first two columns, i.e. 01, 02, 03, 04, ... These are all separate record types. All the 01 records will be sorted and sent to table01.dat; similarly the 02, 03, 04, ... records will be stored in different .dat files.
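For what it's worth, the split you describe can be sketched with the dummy-separator workaround suggested above. This is a minimal sketch, not the production script: the /tmp paths and the small sample file are made up here, and old awks that hit the 15-open-files limit would also need the close() call your original script already has.

```shell
# Sample input standing in for the real file (paths are illustrative only).
cat > /tmp/sample.dat <<'EOF'
01;VE;003301;
02;045502226;VE;
01;VE;003306;
EOF

# -F'\001' sets a field separator that should never occur in the data,
# so awk never splits a record into more than one field and the
# "too many fields" limit is never reached.
awk -F'\001' '
{
    code = substr($0, 1, 2)            # two-digit table code
    out  = "/tmp/table" code ".dat"    # e.g. /tmp/table01.dat
    print $0 > out
}' /tmp/sample.dat
```

After this runs, the two 01 records land in /tmp/table01.dat and the 02 record in /tmp/table02.dat.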
James R. Ferguson
Acclaimed Contributor

Re: AWK script for more than 200 fields

Hi:

If you can't install GNU 'awk' then what about using Perl? Perl doesn't suffer from arbitrary limits.

Regards!

...JRF...
OldSchool
Honored Contributor

Re: AWK script for more than 200 fields

As Dennis noted, you don't appear to be using "fields", you only have $0 and you do a substring.

so..."no. of fields in the input file have increased and its nearly 250, so the awk script fails." ... can't be an accurate description of the problem.

What error message are you getting?

according to "Sed & Awk", approximations of some common limits for older awks, while implementation specific, are:

# fields / record: 100
# characters / record (in or out): 3000
# characters / field: 1024
# characters in printf string: 1024
# files open: 15

I suspect you're bumping into one of the above, but there's insufficient information to say which.

as far as "very old system which does not support gnu awk" goes? I take this to mean "it's not installed, and I can't find a pre-compiled depot". If you've got a half-way decent compiler, you can build it.

James R. Ferguson
Acclaimed Contributor

Re: AWK script for more than 200 fields

Hi:

Actually Dennis has given you the easiest solution! As he said, use a dummy field separator to stop 'awk' from splitting and tallying more than 199 fields in the process.

To demonstrate, make a file with too many fields for 'awk' to handle:

# perl -e 'for (1..500) {printf "f%s ",$_};print "\n"' > /tmp/toomany

# awk '{print $1;print substr($0,1,2)}' /tmp/toomany
awk: Line f1 f2 f3 f4 f5 f6 f7 cannot have more than 199 fields.

...now impose a dummy field separator as Dennis suggested:

# awk -F"\000" '{print $1;print substr($0,1,2)}' /tmp/toomany
f1

...which works...

Of course, the next limit you will bounce off is a line length > 3,000 characters :-)

Regards!

...JRF...
Michael Mike Reaser
Valued Contributor

Re: AWK script for more than 200 fields

I'm with OldSchool - what is the specific awk command you're issuing to get the above program to run, and what error messages are being output?

"...so the awk script fails" doesn't tell us HOW it failed, so to have any idea of how to make it Not-Fail we need to know how it's failing NOW.
There's no place like 127.0.0.1

HP-Server-Literate since 1979
Dennis Handly
Acclaimed Contributor

Re: AWK script for more than 200 fields

>OldSchool: can't be an accurate description of the problem. What error message you are getting?

Sure it can, it should be obvious what the message is and why. Though from the input, I don't see that many fields.

>Michael: what is the specific awk command you're issuing to get the above program to run, and what error messages are being output?
>"...so the awk script fails" doesn't tell us HOW it failed

Sure it does. And you can do the experiment that JRF proposed.

OldSchool
Honored Contributor

Re: AWK script for more than 200 fields

Dennis > "Sure it can, it should be obvious what the message is and why. Though from the input, I don't see that many fields. "

I had one of those "Well...duh!" moments....

didn't particularly *look* at the input, only that it used $0 and a substring, and not $200, $250 or $NF. Oh well...

Just out of curiosity, I did run JRF's test thru GNU awk, which *didn't* have a problem with the file at all and produced correct results.

Then I had it '{print NF}' using the sample data posted and the default separators... the max was 46. So the sample doesn't appear to be "representative".

Of course, given that he *doesn't* reference anything but $0, the "-F" switch should certainly get him going, one would think (at least until the length of a record becomes an issue)
Michael Mike Reaser
Valued Contributor

Re: AWK script for more than 200 fields

Urgh. Dennis, when I was in Atlanta, do you remember how I had a great propensity for "can't see the forest for the trees"? Yep, I still do it. :-P

Duhr. Duhr duhr duhr duhr. Duhr.
There's no place like 127.0.0.1

HP-Server-Literate since 1979
Aishwarya P
New Member

Re: AWK script for more than 200 fields

If I use gawk, it says gawk not found; nor is there a man page entry for gawk.
How do I upgrade the system so I can use gawk?

The server specification:

HP-UX dhp0037 B.10.20 A 9000/851 2013207678 two-user license

Dennis Handly
Acclaimed Contributor

Re: AWK script for more than 200 fields

>How to upgrade the system to use gawk option:

gawk == GNU awk, which you said wasn't supported and, now, isn't installed. (Did you look in /usr/local/bin/*awk?)

There is no need to use gawk if you just use -F"dummy separator".
OldSchool
Honored Contributor

Re: AWK script for more than 200 fields

"If I use gawk, it says gawk not found...."

which means that either "gawk" isn't installed, or it can't be found in $PATH. But as noted previously, on several occasions: all you need to do is use the "-F" switch and set the field separator to something not used in the data (perhaps "|").

That should suffice to eliminate the "too many fields" error.
James R. Ferguson
Acclaimed Contributor

Re: AWK script for more than 200 fields

Hi:

While you can use any character or regular expression for your inter-field delimiter, I chose the _nul_ character as one that seems highly unlikely to be found in your data and thus the most likely to prevent 'awk' from splitting your input into too many fields.

This is the reason I wrote:

# awk -F"\000" '{print $1;print substr($0,1,2)}' /tmp/toomany

Regards!

...JRF...

Dennis Handly
Acclaimed Contributor

Re: AWK script for more than 200 fields

>JRF: While you can use any character or regular expression

Yes, that's why I suggested to use literally this long string: -F"dummy separator"

If your "fs" is more than one char, you go through the ERE engine. Your -F"\000" is an ERE.
James R. Ferguson
Acclaimed Contributor

Re: AWK script for more than 200 fields

Hi:

> Dennis: ...that's why I suggested to use literally this long string: -F"dummy separator" If your "fs" is more than one char, you go through the ERE engine. Your -F"\000" is an ERE.

OK, and do you say "ERE" because 'awk' supports the Posix ERE engine as opposed to the Posix RE engine?

Too, I could have (should have?) used simply :

"\0"

...in lieu of:

"\000"

Anyway, wouldn't the engine do _less_ work when attempting to match only one character than potentially matching the "d" in "dummy...", then having to assess the "u" before (e.g. finding none) bumping to the next character in the input string and starting all over?

Regards!

...JRF...
Dennis Handly
Acclaimed Contributor

Re: AWK script for more than 200 fields

>I could have (should have?) used simply: "\0"

That's still two chars.

>wouldn't the engine do _less_ work when attempting to match only one character

Yes but it still switches to the ERE path.
If you used a control-A, it would be almost as unique as that NUL.
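A quick sketch of that single-character alternative. Note the $'\001' quoting is a bash/ksh93 feature (it expands to a literal Ctrl-A byte), so on a stock HP-UX POSIX shell you'd have to embed the control character some other way:

```shell
printf 'f1 f2 f3\n' > /tmp/one.dat

# A single Ctrl-A byte as the field separator: stays on awk's
# single-character FS fast path instead of the ERE path, and is
# about as unlikely as NUL to appear in text data.
awk -F$'\001' '{ print NF; print substr($0, 1, 2) }' /tmp/one.dat
# prints:
# 1
# f1
```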