Operating System - Linux

help needed scripting urgent plzz

 
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

To make the scenario clear, the file will definitely look like this:
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345

Numeric fields are not enclosed in "".
But the script you provided to check the length of each field and
the numeric type of fields 3, 7 and 10 is not working for this :(
It only works if no " characters are present in the file.

cat file
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345

You provided me with:
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file

This is not working. Peter, please modify it and send it to me; it's very, very urgent.


Peter Nikitka
Honored Contributor

Re: help needed scripting urgent plzz

Hi,

I begin to understand now:
You have
- , as field delimiter
- fields with a defined length of their content
- BUT if a field contains quotes ("), they do NOT count toward the field length

This does not make sense to me.
It would be best to check the REAL length of each field, whether it contains a quote or not. My algorithm will work well for this.

If possible, adjust the values of the field length to the correct values and no further change will be required.
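For example, if the quotes are counted as part of each field, the sample record "hi","bh",12,"ih","j","mk",34,"hi","sony",12345 has the lengths 4 4 2 4 3 4 2 4 6 5, and the original checks can stay as they are. A minimal sketch, assuming those lengths (derived from the single sample record in this thread) describe the real format; check_file is a hypothetical wrapper name.

```shell
# Expected lengths INCLUDING surrounding quotes; assumed from the single
# sample record in this thread, not from a real format definition.
len='4 4 2 4 3 4 2 4 6 5'

# check_file is a hypothetical wrapper around the original awk checks.
check_file() {
    awk -F, -v len="$len" '
    BEGIN { split(len, l, " ") }
    {
        for (i = 1; i <= NF; i++) {
            if (length($i) != l[i])
                printf("size mismatch in line %d, field %d\n", NR, i)
            # numeric fields are unquoted, so quotes do not affect this test
            if ((i == 3 || i == 7 || i == 10) && $i ~ /[^0-9]/)
                printf("line %d field %d contains non-numeric data,%s,\n", NR, i, $i)
        }
    }' "$1"
}
```

Usage: check_file file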

I think the data format is not well defined and relies on hidden assumptions.

Nevertheless, I have already given you everything you need to do even this yourself:

...original...
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)

...to...(don't forget: awk -v qu=#"' ..)
if(length($i)!=l[i]) {
  if(!(match($i,qu) && ((length($i)-2) == l[i])))
    printf("size mismatch in line %d, field %d\n",NR,i)
}

This will additionally check for quotes and adjust the length-definition accordingly.
NOTE: UNTESTED!
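Put together with the quote character passed as -v qu='"' and the parentheses of the if balanced, the modified check might look like the sketch below. It keeps the rule from this post (a field containing a quote may be exactly 2 characters longer than its defined length); check_quoted is a hypothetical name.

```shell
# Defined field lengths WITHOUT the quotes, as given in this thread.
len='2 2 2 2 2 2 2 2 4 5'

# check_quoted is a hypothetical name for the combined length/numeric check.
check_quoted() {
    awk -F, -v qu='"' -v len="$len" '
    BEGIN { split(len, l, " ") }
    {
        for (i = 1; i <= NF; i++) {
            if (length($i) != l[i]) {
                # accept a quoted field whose length is the defined length + 2
                if (!(match($i, qu) && (length($i) - 2) == l[i]))
                    printf("size mismatch in line %d, field %d\n", NR, i)
            }
            if ((i == 3 || i == 7 || i == 10) && match($i, "[^0-9]"))
                printf("line %d field %d contains non-numeric data,%s,\n", NR, i, $i)
        }
    }' "$1"
}
```

On the record "hi","bh",112,"ih","j","mk",34,"hi","sony",12345 this reports field 3 (112 is 3 digits) and also field 5, because "j" holds one character where 2 are defined.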

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Sandman!
Honored Contributor

Re: help needed scripting urgent plzz

>numerics are not enclosed in ""

If this pattern in your data is consistent, then simply test whether fields 3, 7 and 10 start with a ".
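As an awk test, the leading-quote check is one regular-expression match per field. A sketch, assuming (as stated above) that numeric fields are never quoted and text fields always are; check_unquoted is a hypothetical name.

```shell
# check_unquoted is a hypothetical name; it reports fields 3, 7 and 10
# that start with a double quote, i.e. fields that are not unquoted numerics.
check_unquoted() {
    awk -F, '
    BEGIN { n = split("3 7 10", want, " ") }
    {
        for (k = 1; k <= n; k++) {
            i = want[k]
            if ($i ~ /^"/)
                printf("line %d field %d starts with a quote: %s\n", NR, i, $i)
        }
    }' "$1"
}
```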

What is still unclear is whether this test should be logically ANDed with the field-length test.

>I want to check whether all the fields are of a specific length, which is predefined (I have kept it in an array len='2 2 2 2 2 2 2 2 4 5'), and also check whether the numeric fields (3,7,10) are numeric or not.
>The line number should be returned in both cases.

Does this mean that you want to check whether each field is of the specified length AND whether fields 3, 7 and 10 are numeric, returning the line number only if this ANDed test is true? IMHO, what should happen if the fields are of the specified length but 3, 7 and 10 are non-numeric, or vice versa? Please clarify this.

The script below removes embedded commas from the quoted (non-numeric) fields and prints every line number, labeled according to whether fields 3, 7 and 10 are numeric or not. Invoke it as:

# myawkscr input_file

==========================================================
#!/usr/bin/sh -x

awk -F, '{
    for (i=1;i<=NF;++i) {
        if ($i~/^"[A-Za-z]+$/)          # first piece of a quoted field split at a comma
            printf("%s ",$i)
        else if ($i~/^[A-Za-z]+$/)      # middle piece
            printf("%s ",$i)
        else if ($i~/^[A-Za-z]+"$/)     # last piece: restore the field separator
            printf((i<NF)?"%s,":"%s\n",$i)
        else                            # ordinary field
            printf((i<NF)?"%s,":"%s\n",$i)
    }
}' "$1" | awk -F, '{
    if ($3 !~ /^"/ && $7 !~ /^"/ && $10 !~ /^"/)
        printf("line %d [$3 $7 $10 all-numeric] %s\n",NR,$0)
    else
        printf("line %d [$3 $7 $10 non-numeric] %s\n",NR,$0)
}'
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Peter, it's not working!!
>cat file
"hi","bh",112,"ih","j","mk",34,"hi","sony",12345

len='2 2 2 2 2 2 2 2 4 5'
awk -F, -v qu=#"' len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]){if(!(match($i,qu) && ((length($i)-2)==l[i])) printf("size mismatch in line %d, field %d\n",NR,i)}
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Sandy,
I need to check whether each field is of the specified length or not. If not, return that record number... AND for the numeric fields 3, 7 and 10, check whether they are numeric or not.
Peter Nikitka
Honored Contributor

Re: help needed scripting urgent plzz

Hi,

Of course you shouldn't type blindly and copy my typing errors into your script; you should understand (slowly, but in the end surely...) what the characters that form the awk program do.

My mistake:
awk -v qu=#"' ...
should have been written as
awk -v qu='"' ...
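In '"' the single quotes are shell quoting around one double-quote character, so awk receives qu as a one-character string. A quick check; show_qu is a hypothetical name.

```shell
# show_qu is a hypothetical demo function.
# '"' is a single-quoted shell word containing one double quote,
# so awk receives qu as the one-character string ".
show_qu() {
    awk -v qu='"' 'BEGIN {
        printf("qu has length %d\n", length(qu))
        if (match("\"hi\"", qu))
            printf("quote found at position %d\n", RSTART)
    }'
}
```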

mfG Peter

who has really done some work in this thread. Here is some additional work for you, which really should be done:
http://forums1.itrc.hp.com/service/forums/helptips.do?#28
Hein van den Heuvel
Honored Contributor

Re: help needed scripting urgent plzz

Just for grins, here is how you could add the extra (length, type) checking using Perl, as an outline of the principle. Of course you would not actually want to use Perl to do this.

Cheers,
Hein.


--------------------- tmp.pl --------------
use warnings;
use strict;

# element: 0 1 2 3 4 5 6 7 8 9
my @len = (2,2,2,0,1,2,2,0,4,5);
my @num = (0,0,1,0,0,0,1,0,0,1);

sub strip { return ($_[0] =~ /^"(.*)"$/)? $1 : $_[0] }
#
# Replace commas in quoted strings with spaces.
#
while (<>) {
    chomp;
    my (@quoted) = split /"/;
    my ($i)=1;
    while ($i < @quoted) {
        $quoted[$i] =~ s/,/ /g;
        $i += 2;
    }
    $_ = join "\"", @quoted ;
    #
    # Now find the real fields
    #
    my (@values) = split /,/;
    #
    # Deal with optional fields
    #
    if (&strip($values[8]) ne "") {
        if (&strip($values[3]) eq "") {
            print STDERR "Line $. field 3 - missing\n";
        }
    }
    #
    # Validate each field for length (if non-0 length) and numericness
    #
    for ($i = 0; $i < @values; $i++) {
        my $strip = &strip($values[$i]);
        #debug print STDERR "-- $.:$i:$num[$i]:$len[$i]:$values[$i]:$strip\n";
        if ($len[$i] && (length($strip) != $len[$i])) {
            print STDERR "Line $. field $i - bad lenght\n";
        }
        if ($num[$i] && ($values[$i] =~ /\D+/)) {
            print STDERR "Line $. field $i - not numeric\n";
        }
    }
    print "$_\n";
}
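The central trick of this script, splitting each line on " so that every second piece lies inside quotes, carries over to awk. awk numbers array elements from 1, so there the quoted pieces are the even-numbered ones; and unlike Perl's split, awk's split keeps trailing empty pieces, so a line ending in a quote is rejoined intact. A sketch under the same assumption as the Perl code (quotes are balanced and never escaped); fix_quoted_commas is a hypothetical name, and only the comma replacement is shown, not the length checks.

```shell
# fix_quoted_commas is a hypothetical name: replaces commas inside
# double-quoted fields with spaces, leaving all other commas alone.
fix_quoted_commas() {
    awk '{
        n = split($0, piece, "\"")
        line = piece[1]
        for (k = 2; k <= n; k++) {
            if (k % 2 == 0)
                gsub(/,/, " ", piece[k])   # even pieces are inside quotes
            line = line "\"" piece[k]
        }
        print line
    }' "$1"
}
```

On the second line of the test data below it produces "hi there", just like the Perl run.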


--- test data ----
--- notice comma in field on 2nd line ---

"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi,there","sony",12345
"hi","bh",12,"ih","j","mk",34567,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","jxxx","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",3x,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345

---- sample run ---

> perl tmp.pl tmp.tmp > tmp.new
Line 3 field 6 - bad lenght
Line 4 field 3 - missing
Line 5 field 8 - bad lenght
Line 7 field 4 - bad lenght
Line 8 field 6 - not numeric
>
> cat tmp.new
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi there","sony",12345
"hi","bh",12,"ih","j","mk",34567,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","jxxx","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",3x,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
Sandman!
Honored Contributor

Re: help needed scripting urgent plzz

>I need to check whether each field is of the specified length or not. If not, return that record number... AND for the numeric fields 3, 7 and 10, check whether they are numeric or not.

When checking the field length, do you take the double quotes into account? For example, if a field is "hi", its length is 4 when the quotes are counted and 2 when they are not; please clarify.
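The two interpretations are easy to compare by printing, for each field, its raw length and its length with one pair of surrounding quotes removed. A small sketch; show_lengths is a hypothetical name.

```shell
# show_lengths is a hypothetical helper: prints raw and unquoted lengths.
show_lengths() {
    awk -F, '{
        for (i = 1; i <= NF; i++) {
            s = $i
            # drop one pair of surrounding double quotes, if present
            if (s ~ /^".*"$/)
                s = substr(s, 2, length(s) - 2)
            printf("line %d field %d: %s raw=%d unquoted=%d\n", NR, i, $i, length($i), length(s))
        }
    }' "$1"
}
```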
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Sandy, I don't want to take the double quotes into account.
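With quotes not counted, one way to write the check is to strip a surrounding pair of quotes from each field before measuring it, and to report length problems and numeric problems separately, each with its line number. A sketch using the len values quoted earlier in the thread; validate is a hypothetical name, and reporting both kinds of error independently (rather than ANDing them) is an assumption.

```shell
# Defined lengths of the field CONTENTS (quotes not counted), from this thread.
len='2 2 2 2 2 2 2 2 4 5'

# validate is a hypothetical name for the combined check.
validate() {
    awk -F, -v len="$len" '
    BEGIN { split(len, l, " ") }
    {
        for (i = 1; i <= NF; i++) {
            s = $i
            # strip one pair of surrounding double quotes before measuring
            if (s ~ /^".*"$/)
                s = substr(s, 2, length(s) - 2)
            if (length(s) != l[i])
                printf("line %d field %d: length %d, expected %d\n", NR, i, length(s), l[i])
            # quoted values such as "12" also count as non-numeric here
            if ((i == 3 || i == 7 || i == 10) && $i !~ /^[0-9]+$/)
                printf("line %d field %d: not numeric: %s\n", NR, i, $i)
        }
    }' "$1"
}
```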
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Peter,
I have put our validation script in a function. How can I pass NR (the record on which validation fails) back to that function? When I try to use return $NR outside awk, it doesn't work. How can I do this?
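awk runs as a separate process, so its NR is gone once awk exits; in the shell, $NR is just an ordinary (normally unset) shell variable, and a shell function's return only sets a numeric exit status (0 to 255). The usual ways to get the result back are awk's standard output, captured with $(...), and awk's exit status. A sketch assuming only the failing line numbers are needed; bad_lines and report are hypothetical names, and only the length check is shown.

```shell
len='2 2 2 2 2 2 2 2 4 5'

# bad_lines (hypothetical) prints the failing record numbers on one line
# and exits with status 1 if any record failed the length check.
bad_lines() {
    awk -F, -v len="$len" '
    BEGIN { split(len, l, " ") }
    {
        for (i = 1; i <= NF; i++) {
            s = $i
            if (s ~ /^".*"$/) s = substr(s, 2, length(s) - 2)
            if (length(s) != l[i]) {
                bad = (bad == "" ? NR : bad " " NR)   # remember this record
                next
            }
        }
    }
    END {
        if (bad != "") print bad
        exit (bad != "")       # status 1 if any record failed
    }' "$1"
}

# report (hypothetical) uses both the captured output and the exit status.
report() {
    if lines=$(bad_lines "$1"); then
        echo "all records valid"
    else
        echo "invalid records: $lines"
    fi
}
```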