
awk difference (RE) between HP-UX and Linux

 
support_billa
Valued Contributor


hello,

 

I found the following awk difference between HP-UX and Linux (SLES 11).

I want to find an entry using an RE:

 

file:

;;DB_DEF#field1#field2#FS#junk#junk#junk#junk

 

HPUX OK:

awk -v DB_TOKEN="DB_DEF" -F# '$1 ~ (+DB_TOKEN) { print $1,$2,$3,$4 }' file 

 

LINUX :

It finds all the other entries in the file, but not the exact entry.

I think my RE is OK. Is it not a standard RE?

When I change "(+DB_TOKEN)" to "DB_TOKEN", it works on Linux.

 

regards

Nighwish
Frequent Advisor

Re: awk difference (RE) between HP-UX and Linux

Hi

 

AWK syntax differs between HP-UX and Linux; there is nothing unusual in this behavior.

 

 

Regards.

Bill Hassell
Honored Contributor

Re: awk difference (RE) between HP-UX and Linux

There are several versions of awk. Many years ago, HP replaced the standard awk with nawk but left the name the same. And then there's gawk, which may be named awk too.

 

Here are some useful references. The first explains a lot of the design differences, the second is a great cheat sheet.

 

http://www.catonmat.net/blog/awk-nawk-and-gawk-cheat-sheet/
http://www.catonmat.net/download/awk.cheat.sheet.pdf

 



Bill Hassell, sysadmin
BowlesCR
Advisor

Re: awk difference (RE) between HP-UX and Linux

All good explanations. I just wanted to add that you can download gawk from the Porting and Archive Centre (it installs to a different location than the system awk), and it should behave nearly, if not completely, identically to the awk on Linux.
support_billa
Valued Contributor

Re: awk difference (RE) between HP-UX and Linux

>AWK syntax differs between HP-UX and Linux; there is nothing unusual in this behavior.

In my case, which behavior is right: HP-UX or Linux?

 

Because of the thread below, I use a lot of awk REs.

The last part of that thread has good examples from James and Dennis:

replace a string with "/" in a variable

 

regards

support_billa
Valued Contributor

Re: awk difference (RE) between HP-UX and Linux

I think the Linux awk is gawk. I also tested gawk on HP-UX; according to this thread, the HP-UX system awk is nawk.

 

Info about Version

 

LINUX: Version
awk -W version
GNU Awk 3.1.8

HPUX: Version
gawk -W version
GNU Awk 3.1.5

 

"awk -W version" isn't accepted by the HP-UX system awk.

 

Tests on Linux and HP-UX with different REs (+ or .*):


LINUX: OK
DB_TOKEN=DB_DEF
awk -F'#' '$1 ~ /^.*'"${DB_TOKEN}"'$/ { print $1,$2,$3,$4 }' file

awk -v DB_TOKEN="DB_DEF" -F# '$1 ~ (DB_TOKEN) { print $1,$2,$3,$4 }' file
awk -v DB_TOKEN="DB_DEF" -F# '$1 ~ DB_TOKEN { print $1,$2,$3,$4 }' file

LINUX: NOT OK
awk -v DB_TOKEN="DB_DEF" -F# '$1 ~ (+DB_TOKEN) { print $1,$2,$3,$4 }' file

HPUX: OK
DB_TOKEN=DB_DEF
awk -F'#' '$1 ~ /^.*'"${DB_TOKEN}"'$/ { print $1,$2,$3,$4 }' file

/usr/local/bin/gawk -v DB_TOKEN="DB_DEF" -F# '$1 ~ (DB_TOKEN) { print $1,$2,$3,$4 }' file
/usr/local/bin/gawk -v DB_TOKEN="DB_DEF" -F# '$1 ~ DB_TOKEN   { print $1,$2,$3,$4 }' file

HPUX: NOT OK
/usr/local/bin/gawk -v DB_TOKEN="DB_DEF" -F# '$1 ~ (+DB_TOKEN) { print $1,$2,$3,$4 }' file

 

regards

support_billa
Valued Contributor

Re: awk difference (RE) between HP-UX and Linux

I found a form that works on both Linux and HP-UX:

 

awk -v DB_TOKEN="DB_DEF" -F# '$1 ~ (".+"DB_TOKEN) { print $1,$2,$3,$4 }' file

 

OK ?

 

But gawk's ERE options, like r{n,m} with POSIX, can't be used with the HP-UX awk :-((

 

Info:

 

[abc...]   character list, matches any of the characters abc....
[^abc...]  negated character list, matches any character except abc....
r1|r2      alternation: matches either r1 or r2.
r1r2       concatenation: matches r1, and then r2.
r+         matches one or more r's.
r*         matches zero or more r's.
r?         matches zero or one r's.
(r)        grouping: matches r.
r{n}
r{n,m}     One or two numbers inside braces denote an interval expression.
           If there is one number in the braces, the preceding regular
           expression r is repeated n times. If there are two numbers
           separated by a comma, r is repeated n to m times. If there is
           one number followed by a comma, then r is repeated at least n
           times. Interval expressions are only available if either
           --posix or --re-interval is specified on the command line.
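
If a particular awk does not accept interval expressions, one possible workaround is to build the repetition by string concatenation and use the result as a dynamic regex. This is only a sketch: rep() is a hypothetical helper written for this example (it is not a built-in), and the id=... input lines are made up.

```shell
# rep(r, n) is a hypothetical helper: it returns the regex fragment r
# repeated n times, so rep("[0-9]", 3) stands in for [0-9]{3}.
matches=$(printf '%s\n' 'id=123' 'id=12' 'id=1234' |
awk '
function rep(r, n,    i, s) {
    s = ""
    for (i = 0; i < n; i++)
        s = s r
    return s
}
BEGIN { re = "^id=" rep("[0-9]", 3) "$" }   # same intent as ^id=[0-9]{3}$
$0 ~ re { print }
')
printf '%s\n' "$matches"    # prints: id=123
```

Because the pattern is plain concatenation, it does not depend on --posix or --re-interval being available.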


Bill Hassell
Honored Contributor

Re: awk difference (RE) between HP-UX and Linux

>...but gawk's ERE options, like r{n,m} with POSIX, can't be used with the HP-UX awk :-((

 

The POSIX shell is no different than ksh or bash. Braces (and parentheses, semicolons, etc.) have special meaning to the shell and must therefore be excluded from shell processing. There is no problem at all if the awk statements are in an awk script, but on the command line, you must use single quotes (apostrophes) to turn off shell processing.
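
As a small illustration of the quoting point, here are two ways to get a shell variable into a single-quoted awk program, both already used earlier in this thread. The sample line and the variable names tok, via_v and via_splice are made up for this sketch.

```shell
DB_TOKEN=DB_DEF
line=';;DB_DEF#field1#field2#FS'

# 1) Pass the value with -v; the awk program stays fully single-quoted,
#    so the shell never sees $1, $2 or the braces.
via_v=$(printf '%s\n' "$line" |
        awk -v tok="$DB_TOKEN" -F'#' '$1 ~ tok { print $2 }')

# 2) Close the single quotes, splice in the shell variable inside
#    double quotes, then reopen the single quotes.
via_splice=$(printf '%s\n' "$line" |
        awk -F'#' '$1 ~ /'"$DB_TOKEN"'/ { print $2 }')

printf '%s %s\n' "$via_v" "$via_splice"    # prints: field1 field1
```

The -v form is usually easier to read; the splice form breaks if the variable contains a "/" or other characters that are special inside /.../.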



Bill Hassell, sysadmin
Dennis Handly
Acclaimed Contributor

Re: awk difference (RE) between HP-UX and Linux

>awk -v DB_TOKEN="DB_DEF" -F# '$1 ~ (".+" DB_TOKEN) { print $1,$2,$3,$4 }' file

 

I assume this is required since you need to do string concatenation and you need that "." before the "+".

 

>but the options of ERE of gawk isn't possible to use for awk HP-UX like r{n,m}  with POSIX

 

Do you have an example where it fails?

 

 

Dennis Handly
Acclaimed Contributor
Solution

Re: awk difference (RE) between HP-UX and Linux

>I think my RE is OK?  Is it not a standard RE?

 

No, this is a bogus ERE, in that it most likely won't do anything useful.

 

>>I assume this is required since you need to do string concatenation and you need that "." before the "+".

 

Yes.  This is the problem. Error recovery is different between the two versions of awk.

 

It appears HP-UX's version is broken. The POSIX standard says that converting a string to a number should behave like atof(3). Unfortunately, it doesn't clearly say that a bogus (non-numeric) string yields 0.

 

You can see this if you change awk to add:

BEGIN { print "ERE:", (+DB_TOKEN) }

 

For HP-UX, it seems to treat the unary "+" as a no-op and it prints: DB_DEF

For gawk, it honors unary "+" and converts the bogus string and prints: 0
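
A sketch of what that means for the match itself, assuming an awk that converts the string to 0 as gawk does: the pattern effectively becomes $1 ~ "0", so only first fields containing the character 0 match. The ;;DB_10 line is a made-up entry; this may be why the original test on Linux found other entries but not DB_DEF, though the thread does not show the full file.

```shell
# With gawk, (+DB_TOKEN) evaluates to the number 0, which is then
# used as the regex "0". The second input line is hypothetical.
out=$(printf '%s\n' ';;DB_DEF#field1' ';;DB_10#field1' |
      awk -v DB_TOKEN="DB_DEF" -F'#' '$1 ~ (+DB_TOKEN) { print $1 }')
printf '%s\n' "$out"    # with gawk, prints: ;;DB_10
```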

 

So if you want your ERE to skip one or more chars, you need: (".+" DB_TOKEN)
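
A quick check of that form against the sample line from the original question. The second and third input lines are made up for this sketch, to show what ".+" does and does not accept.

```shell
# ".+" DB_TOKEN concatenates to the regex .+DB_DEF, which needs at
# least one character before the token in $1.
result=$(printf '%s\n' \
    ';;DB_DEF#field1#field2#FS#junk#junk#junk#junk' \
    'DB_DEF#no#leading#chars' \
    ';;OTHER_0#a#b#c' |
  awk -v DB_TOKEN="DB_DEF" -F'#' '$1 ~ (".+" DB_TOKEN) { print $1,$2,$3,$4 }')
printf '%s\n' "$result"    # prints: ;;DB_DEF field1 field2 FS
```

The line that starts directly with DB_DEF is skipped because nothing precedes the token; use ".*" instead of ".+" if such lines should match too.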