
Re: help needed scripting urgent plzz

Peter Nikitka
Honored Contributor

Hi,

Of course you shouldn't type blindly and copy my spelling errors into your script; you should learn (slowly, but in the end surely ...) what the characters that form the awk program do.

My mistake:
awk -v qu=#"' ...
should have been written as
awk -v qu='"' ...
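A quick way to check what the corrected option hands to awk is a small test like the one below. It is only a sketch, and the helper names show_qu and count_quotes are made up for illustration. The shell removes the outer single quotes, so awk's variable qu holds exactly one character (a double quote), which awk can then use as a regular expression.

```shell
# Hypothetical helper: print what awk receives for -v qu='"'.
# The shell strips the outer single quotes; qu is one double-quote character.
show_qu() {
    awk -v qu='"' 'BEGIN { printf("qu=[%s] length=%d\n", qu, length(qu)) }'
}

# Hypothetical helper: count the double quotes on each input line,
# using qu as a dynamic regular expression (gsub returns the match count).
count_quotes() {
    awk -v qu='"' '{ n = gsub(qu, qu); print n }' "$@"
}
```

With the original typo (qu=#"'), the quotes would be unbalanced and the shell would not even pass a sensible program to awk.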

Regards, Peter

P.S. I have really put some work into this thread; here is some additional work for you, which really should be done:
http://forums1.itrc.hp.com/service/forums/helptips.do?#28
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Hein van den Heuvel
Honored Contributor

Just for grins, here is how you could add the extra (length, type) checking using perl, as an outline of the principle. Of course you would not actually want to use perl to do this.

Cheers,
Hein.


--------------------- tmp.pl --------------
use warnings;
use strict;

# Field numbers count from 0 (one less than awk's $1..$10).
# element:   0 1 2 3 4 5 6 7 8 9
my @len   = (2,2,2,0,1,2,2,0,4,5);   # required length (0 = not checked)
my @num   = (0,0,1,0,0,0,1,0,0,1);   # 1 = must be numeric

# Return a field without its surrounding double quotes (if any).
sub strip { return ($_[0] =~ /^"(.*)"$/) ? $1 : $_[0] }

while (<>) {
    chomp;
    #
    # Replace commas in quoted strings with spaces.
    # After splitting on '"', the odd-numbered pieces are inside quotes.
    # The limit -1 keeps trailing empty pieces, so a closing quote at the
    # very end of the line is not lost by the join below.
    #
    my @quoted = split /"/, $_, -1;
    for (my $i = 1; $i < @quoted; $i += 2) {
        $quoted[$i] =~ s/,/ /g;
    }
    $_ = join '"', @quoted;
    #
    # Now find the real fields
    #
    my @values = split /,/;
    #
    # Deal with optional fields: field 3 is required when field 8 is set
    #
    if (strip($values[8]) ne "") {
        if (strip($values[3]) eq "") {
            print STDERR "Line $. field 3 - missing\n";
        }
    }
    #
    # Validate each field for length (if non-0 length) and numericness
    #
    for (my $i = 0; $i < @values; $i++) {
        my $strip = strip($values[$i]);
        #debug print STDERR "-- $.:$i:$num[$i]:$len[$i]:$values[$i]:$strip\n";
        if ($len[$i] && (length($strip) != $len[$i])) {
            print STDERR "Line $. field $i - bad length\n";
        }
        if ($num[$i] && ($values[$i] =~ /\D/)) {
            print STDERR "Line $. field $i - not numeric\n";
        }
    }
    print "$_\n";
}


--- test data ----
--- notice comma in field on 2nd line ---

"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi,there","sony",12345
"hi","bh",12,"ih","j","mk",34567,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","jxxx","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",3x,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345

---- sample run ---

> perl tmp.pl tmp.tmp > tmp.new
Line 3 field 6 - bad length
Line 4 field 3 - missing
Line 5 field 8 - bad length
Line 7 field 4 - bad length
Line 8 field 6 - not numeric
>
> cat tmp.new
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi there","sony",12345
"hi","bh",12,"ih","j","mk",34567,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","sony",12345
"hi","bh",12,"","j","mk",34,"hi","",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
"hi","bh",12,"ih","jxxx","mk",34,"hi","sony",12345
"hi","bh",12,"ih","j","mk",3x,"hi","sony",12345
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
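For comparison with the awk answers in this thread, here is a minimal awk sketch of just the "optional fields" rule from tmp.pl (the field at index 3 must be filled in when the field at index 8 is). It assumes the commas inside quotes have already been replaced, as the perl loop does first; the function name check_optional is hypothetical. Note that awk counts fields from 1, so perl's index 3 and 8 become $4 and $9, and the message reports field 4.

```shell
# Hypothetical helper: awk port of the "optional fields" rule from tmp.pl.
# Assumes commas inside quoted fields were already replaced, so -F, is safe.
# awk fields are 1-based: perl's $values[3] is $4 and $values[8] is $9.
check_optional() {
    awk -F, '
    # strip(): remove one pair of surrounding double quotes, if present
    function strip(s) {
        if (s ~ /^".*"$/) return substr(s, 2, length(s) - 2)
        return s
    }
    {
        if (strip($9) != "" && strip($4) == "")
            printf("Line %d field 4 - missing\n", NR)
    }' "$@"
}
```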
Sandman!
Honored Contributor

>i need to check whether each field is of specified length or not..If not return that record number...AND for numeric fields 3,7,10 check whether they r numeric or not.

When checking the field length, do you take the double quotes into account? For example, if a field is "hi", its length is 4 when the quotes are counted and 2 when they are not; so please clarify.
viseshu
Frequent Advisor

Sandy, I don't want to take the double quotes into account.
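With that answer, a field's length is measured after removing its surrounding double quotes. Here is a minimal sketch of such a check, assuming the commas inside quotes were already replaced and that the expected lengths are passed as a space-separated list like the len variable used elsewhere in this thread. The function name check_len and the convention that 0 means "do not check" (borrowed from Hein's perl table) are assumptions.

```shell
# Hypothetical helper: report fields whose length, not counting the
# surrounding double quotes, differs from the expected length.
# $1 = space-separated expected lengths, e.g. "2 2 2"; 0 means "do not check".
# Remaining arguments = input files (stdin if none).
check_len() {
    lens=$1
    shift
    awk -F, -v len="$lens" '
    BEGIN { n = split(len, l, " ") }
    {
        for (i = 1; i <= NF; i++) {
            f = $i
            # drop one pair of surrounding double quotes, if present
            if (f ~ /^".*"$/) f = substr(f, 2, length(f) - 2)
            if (i <= n && l[i] > 0 && length(f) != l[i])
                printf("size mismatch in line %d, field %d\n", NR, i)
        }
    }' "$@"
}
```

For example, `check_len "2 2 2" /tmp/corrected_file` would report line 2, field 2 for a record like "hi","bhx",12.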
viseshu
Frequent Advisor

Peter,
I have put our validation script into a shell function. How can I return NR (the number of the record in which validation fails) from that function? When I try to use return $NR outside awk, it does not work. How can I do this?
Peter Nikitka
Honored Contributor

Hi,

you asked:
>>
...how can i return the NR(record in which validation fails) to that function??? when i try to give return $NR outside awk it is not taking.
<<

I ask: which script are you talking about?
If you combined all the solutions I gave you for your problems, it should look something like this:

awk 'trim_fields_with_commata' /original/file >/tmp/corrected_file

awk -v qu='"' -v len="$len" 'check_structure_of_records' /tmp/corrected_file

If you do not need the temporary file for inspection, just connect the two awk commands with a pipe. The output of the second awk then contains all ill-formatted records.

If you want to process them further, simply redirect this output to another file and deal with this one.
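On the NR question: NR exists only inside awk, so the shell cannot read it after awk has exited. One way around this, sketched here, is to let awk print the failing record numbers and capture that output with command substitution, while the shell function returns only a status. The names bad_records and validate_file, and the placeholder check (field 3 must be all digits), are hypothetical stand-ins for your real checks; the sketch also assumes commas inside quotes were already replaced.

```shell
# Hypothetical example: awk prints the record numbers (NR) that fail a check,
# and the shell captures them, because awk variables do not survive awk.
# Placeholder check: field 3 must consist of digits only.
bad_records() {
    awk -F, '$3 !~ /^[0-9]+$/ { print NR }' "$@"
}

# Returns 0 if every record is valid, 1 otherwise.
# The failing record numbers are written to stdout, one per line.
validate_file() {
    bad=$(bad_records "$@")
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

Usage, for example: `if ! lines=$(validate_file /tmp/corrected_file); then echo "invalid records: $lines"; fi`. A shell `return` can only carry an exit status from 0 to 255, which is why the record numbers travel over stdout instead.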

Regards, Peter
Sandman!
Honored Contributor

I've pasted a script which checks whether the fields have the specified length AND whether fields 3, 7 and 10 are numeric. If either check fails, the record number, the message "validation fails", and the record itself are printed. Otherwise only the record itself is output. Let me know if this is what you wanted. Execute as follows:

# myawkscr.sh inputfile

~hope it helps

========================myawkscr.sh========================
#!/usr/bin/sh

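# Pass 1: re-join fields that were split at a comma inside double quotes,
# putting a space where the comma was.
awk -F, '{
  for (i=1;i<=NF;++i) {
    if ($i~/^"[A-Za-z]+$/)         # first piece of a split quoted field
      printf("%s ",$i)
    else if ($i~/^[A-Za-z]+$/)     # middle piece
      printf("%s ",$i)
    else if ($i~/^[A-Za-z]+"$/)    # last piece: end with "," or newline
      printf((i<NF)?"%s,":"%s\n",$i)
    else                           # ordinary field
      printf((i<NF)?"%s,":"%s\n",$i)
  }
}' "$1" | awk -F, '{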
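  # Pass 2: check the field lengths and the numeric fields.
  # Lengths of quoted fields include the two double quotes.
  vlen = 0      # number of fields with a wrong length
  numerr = 0    # number of numeric fields containing non-digits
  for (i=1;i<=NF;++i) {
    if (i==1 || i==2 || i==4 || i==5 || i==6 || i==8)
      if (length($i) != 4)
        vlen++
    if (i==9)
      if (length($i) != 6)
        vlen++
    if (i==3 || i==7) {
      if (length($i) != 2)
        vlen++
      if ($i ~ /[^0-9]/)
        numerr++
    }
    if (i==10) {
      if (length($i) != 5)
        vlen++
      if ($i ~ /[^0-9]/)
        numerr++
    }
  }
  if (vlen || numerr)
    printf("line %d [validation fails]: %s\n",NR,$0)
  else
    print
}'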
Peter Nikitka
Honored Contributor

Hi viseshu,

>>
scripting urgent plzz
<<

seems to be no longer so urgent :-)
viseshu
Frequent Advisor

Peter, the logic you have given me,
len='2 2 2 2 2 2 2 2 4 5'
awk -F, -v qu=#"' len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]){if(!(match($i,qu) && ((length($i)-2)==l[i])) printf("size mismatch in line %d, field %d\n",NR,i)}
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file

is not working when the file does not contain double quotes at the start. Can you please suggest what can be done?
The input file will look like this:
"ABC","124","dkfjkd","jkdf","45678"
"sony","home,hi","890"
NOTE: a comma can be present inside "" (i.e. inside a quoted field).
Peter Nikitka
Honored Contributor

Hi,

In one of my replies I told you to correct a typo in the awk syntax. The characters in the assignment to the variable 'qu' in the -v option are single quote, double quote, single quote: '"'.
To make the algorithm clearer, this version checks for the existence of double quotes first.

awk -F, -v qu='"' -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
  # a quoted field may be 2 characters longer (the two double quotes)
  if (match($i,qu)) len2chk=l[i]+2
  else len2chk=l[i]
  if(length($i)!=len2chk){printf("size mismatch in line %d, field %d\n",NR,i)}
  if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
  }
}' file

Regarding these two lines of your example input data:
"ABC","124","dkfjkd","jkdf","45678"
"sony","home,hi","890"

I advise you to read one of my previous replies to this post again:
You MUST consolidate your data first, so that it fulfils the condition of NOT HAVING additional commas inside the fields enclosed in double quotes.
You can use a replacement character which is not used elsewhere (e.g. ^) and re-substitute it later.
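The consolidation step described above can be sketched in awk by splitting each line on the double quote: the even-numbered pieces (awk arrays start at 1) are the text inside the quotes, so only there are commas replaced by ^. A second function turns the ^ back into commas after validation. This assumes the quotes are balanced, fields contain no escaped quotes, and ^ does not occur anywhere in the data; the names protect_commas and restore_commas are hypothetical.

```shell
# Hypothetical helpers for the "consolidate first" step.
# Assumptions: quotes are balanced, fields contain no escaped quotes,
# and the character ^ does not occur anywhere in the data.

# Replace commas inside double-quoted fields with ^.
protect_commas() {
    awk '{
        n = split($0, p, "\"")
        out = p[1]
        for (i = 2; i <= n; i++) {
            # pieces 2, 4, 6, ... lie between an opening and a closing quote
            if (i % 2 == 0) gsub(/,/, "^", p[i])
            out = out "\"" p[i]
        }
        print out
    }' "$@"
}

# Turn the ^ placeholders back into commas after validation.
restore_commas() {
    awk '{ gsub(/\^/, ","); print }' "$@"
}
```

A pipeline could then look like `protect_commas /original/file | awk -F, ...checks...`, with restore_commas applied to whatever output you keep.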

But for the above awk solution (and, as I stated earlier, for ALL possible solutions) the data must be clean.

Regards, Peter