Operating System - Linux

viseshu
Frequent Advisor

help needed scripting urgent plzz

hi all,

I have a file in the following format.
"ABC",1809593008,"MYHOME",20061002,"SITON,theback",abcdef,...
There will be many records in the file and 17 fields in every record. I have 3 requirements.
1. I want to remove commas (,) ONLY where they occur inside " " in all records.

2. Each field has a specified length (predefined; I have the lengths, but I can't read them from a file, I need to hard code them in the script). I want to check whether each field is of its predefined length or not.
3. If the 15th field is present, then the 14th and 11th fields should also be present. If not, it should return the record number.

I need a function for the last 2.
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Please note: I want to replace the comma mentioned above with a space.
Doug O'Leary
Honored Contributor

Re: help needed scripting urgent plzz

Hey;

Can you provide a short test file for us to work against? Maybe 100 lines or so?

Doug

------
Senior UNIX Admin
O'Leary Computers Inc
linkedin: http://www.linkedin.com/dkoleary
Resume: http://www.olearycomputers.com/resume.html
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

I am attaching a short file containing 8 fields in each record. As I don't have a sample file of the same specification, I'm attaching this. Please help.
Hein van den Heuvel
Honored Contributor

Re: help needed scripting urgent plzz

Hmm, it's a bit lame not to have decent sample data. How will you verify your work?

Anyway, here is something to get you going.
It does not deal with the length requirements, but if you read and understand the split on double-quote, then you can do something similar, splitting the line by commas and walking the fields.

#cat test.pl
my (@quoted) = split /"/;
my ($i)=1;
while ($i < @quoted) {
$quoted[$i] =~ s/,/ /g;
$i += 2;
}
$_ = join "\"", @quoted ;
my ($one,$two,$three,$four,$five) = split /,/;
if ($four ne "" and $two eq "") {
print STDERR "Missing field X at line $.\n";
}

#cat x.txt
"ABC",1809593008,"MYHOME",20061002,"SITON,theback"
"DEF",1809593008,"MY,HOME",20061002,"SITON,theback"
"GHI",,"MYHOME",,"SITON,theback"
"JKL",,"MYHOME",20061002,"SITON,theback"
"MNO",1809593008,"MYHOME",20061002,"SITON,theback"

# perl -p test.pl x.txt > y.txt
Missing field X at line 4

#cat y.txt
"ABC",1809593008,"MYHOME",20061002,"SITON theback"
"DEF",1809593008,"MY HOME",20061002,"SITON theback"
"GHI",,"MYHOME",,"SITON theback"
"JKL",,"MYHOME",20061002,"SITON theback"
"MNO",1809593008,"MYHOME",20061002,"SITON theback"

Good luck!
Hein.
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Hein, sorry im not doing it in perl..
Peter Nikitka
Honored Contributor

Re: help needed scripting urgent plzz

Hi,

I asked myself, if I should send an answer to a thread, containing the message
>>
I'm not doing it in perl
<<
Why?
Is it allowed to use awk?

But nevertheless ...:
1.) I assume a field is NOT allowed to contain a lone double quote (").
I use the quote as the delimiter and substitute in the even-numbered fields only.
cat /tmp/a
"ABC",1809593008,"MYHOME",20061002,"SITON,the,back",abcdef,..,""

awk -F'"' '{printf "%s", $1; for(i=2;i<=NF;i++) {if(! (i%2)) gsub(","," ",$i); printf "%s%s", FS, $i}; printf "\n"}' /tmp/a
"ABC",1809593008,"MYHOME",20061002,"SITON the back",abcdef,..,""


2.) I assume your field delimiter is comma, and your data is clean in respect to this delimiter after 1.)

Set a variable containing the length data in the format
len='l1 l2 l3 .. ln'
That way you have built an array where
l[i] contains the length of the i-th field.
=> No hardcoding needed!

awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)}'


3.) self explanatory - add to 2.)
{if($15 != "" && ($14 == "" || $11 == "")) print NR}


It should be easy to combine the tasks together.
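
A minimal sketch of how the three steps above might be combined, assuming a POSIX shell and awk. The function name check_records and its argument order are hypothetical, chosen only for illustration. Lengths are compared against the raw field here, so surrounding quotes count towards the length.

```shell
# Sketch only. check_records is a hypothetical helper name.
# $1: space-separated expected field lengths, $2: input file
check_records() {
    awk -F'"' '{
        # Step 1: split on the double quote; even-numbered pieces lie
        # inside quotes, so commas found there become spaces.
        line = $1
        for (i = 2; i <= NF; i++) {
            if (i % 2 == 0) gsub(",", " ", $i)
            line = line FS $i
        }
        print line
    }' "$2" |
    awk -F, -v len="$1" 'BEGIN { split(len, l, " ") }
    {
        # Step 2: the comma is now a safe delimiter; compare lengths.
        for (i = 1; i <= NF; i++)
            if (length($i) != l[i])
                printf("size mismatch in line %d, field %d\n", NR, i)
        # Step 3: field 15 present requires fields 14 and 11 as well.
        if ($15 != "" && ($14 == "" || $11 == ""))
            printf("line %d: field 15 set, field 14 or 11 missing\n", NR)
    }'
}
```

It could be called as check_records "$len" /tmp/a, with len set as described in 2.).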

mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Sandman!
Honored Contributor

Re: help needed scripting urgent plzz

If you're okay with awk then try the script below. It removes embedded commas from fields that are alphabetic strings and are enclosed in double-quotes:

awk -F, '{
for (i=1;i<=NF;++i) {
if ($i~/^"[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+"$/)
printf((i<NF ? "%s," : "%s\n"),$i)
else
printf((i<NF ? "%s," : "%s\n"),$i)
}
}' infile

~hope it helps
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Peter,
thanks a lot.
1) Replacing , with space is working fine, but can you please explain the concept of the even field? I did not get that. {if(! (i%2)) what is this doing? Please explain clearly. :(
Peter Nikitka
Honored Contributor

Re: help needed scripting urgent plzz

Hi,

look at this example string:
"ABC",1809593008,"MYHOME",20061002,"SITON,the,back",abcdef,..,""

If you take the quote (") as the delimiter, these are your fields - I call them f1:
1
2 ABC
3 ,1809593008,
4 MYHOME
5 ,20061002,
6 SITON,the,back
7 ,abcdef,..,
...

If you take your original delimiter (the comma), I call the fields f2.

You see that commas inside an f1 field are only of interest when its field number is even: those are the parts inside quotes.
Odd-numbered f1 fields that contain commas consist only of f2 fields that contain NO commas themselves.
So note how important my assumption for solution 1 is, that no f2 field may contain a lone quote:
If that were the case, no algorithm could decide how the fields of f1 and f2 relate to each other.


So you must transform the commas to spaces in the even-numbered fields - and only in the even ones.

The % operator is the modulo function, so
(i%2) is zero for even and one for odd i, leading to the expression
(! (i%2)) which evaluates to true for even numbers.
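
To make the even/odd rule visible, here is a small sketch that prints every piece awk sees when the quote is the delimiter. The function name show_pieces is hypothetical; a POSIX shell and awk are assumed.

```shell
# Sketch only: label each quote-delimited piece as inside or outside quotes.
# $1: one line of input
show_pieces() {
    printf '%s\n' "$1" | awk -F'"' '{
        for (i = 1; i <= NF; i++) {
            # i % 2 is 0 for even i, so even pieces are the quoted ones.
            where = (i % 2) ? "outside" : "inside"
            printf("%d %s [%s]\n", i, where, $i)
        }
    }'
}
```

For example, show_pieces '"AB",12,"C,D"' lists piece 4 as inside quotes with the content C,D, which is the only place where a comma needs to be replaced.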


IMHO some looking at the man page of awk could help...

mfG Peter

PS: I really think I have earned some points now :-)
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Thank you very much, Peter, your explanation was very helpful. One more thing: for 3), along with the current requirement, I want to check whether fields 3, 7 and 10 are numeric or not. You have provided me

awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)}

I want to check whether fields 3, 7 and 10 are numeric or not; if not numeric, I want to return the record number.
Peter Nikitka
Honored Contributor
Solution

Re: help needed scripting urgent plzz

Hi,

to check for numeric data in field i only, use something like this:

...
if(match($i,"[^0-9]")) printf("record %d line %d contains non-numeric data\n",i,NR)
...

To integrate this check I suggest:
- set an additional variable containing the field numbers to check for numeric input
nc='3 7 10'
- feed it to awk the same way as "$len"
- loop over this additional array in every cycle of the loop over all records.
If your field numbers are static, you can also hard-code them in your program. Though I do not recommend this in general (things may change and solutions migrate...), this will be faster:

awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}'
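
The data-driven variant suggested above (a second variable such as nc, fed to awk like "$len") might look roughly like this. It is a sketch only, assuming a POSIX shell and awk; the function name check_numeric is hypothetical.

```shell
# Sketch only. check_numeric is a hypothetical helper name.
# $1: space-separated field numbers to test, $2: input file
check_numeric() {
    awk -F, -v nc="$1" 'BEGIN { n = split(nc, c, " ") }
    {
        # Walk the list of field numbers instead of hard-coding them.
        for (k = 1; k <= n; k++) {
            i = c[k]
            if (match($i, "[^0-9]"))
                printf("line %d field %d contains non-numeric data,%s,\n", NR, i, $i)
        }
    }' "$2"
}
```

With nc='3 7 10' it would be called as check_numeric "$nc" file. As in the static version, an empty field is not reported, because it contains no non-digit character.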

mfG Peter
Hein van den Heuvel
Honored Contributor

Re: help needed scripting urgent plzz

>> Hein, sorry im not doing it in perl..

Yes indeed.

You will be sorry you are not doing it in perl

:-).

Hein.
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Peter,
I want to check whether a variable is numeric (it contains a number) or not. How can I do that?
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Peter,
ur script
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}'
is working very fine with data like
hi,bye,12,ui,ki,lo,344,bo,mlbo,sony
with length array as
len='2 2 2 2 2 2 2 2 4 5'
But my concern is that
MY DATA CONTAINS quoted values like
"hi","bh","12","ih","j","mk","12","hi","sony","12345"
All data will be in "" separated by commas. Please provide me a solution for this scenario. It's urgent, please.
Sandman!
Honored Contributor

Re: help needed scripting urgent plzz

Did you try my awk script...does it not meet your requirements?
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

sandy,
it's working fine, thank you very much. I'll assign the points as soon as I'm done with my work.
How can you check whether a variable is numeric or not?
Sandman!
Honored Contributor

Re: help needed scripting urgent plzz

>How can you check whether a variable is numeric or not?

Do you mean within the awk script or outside of it? Could you give a more concrete example of what you're trying to do?

thanks!
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Not in awk, sandy, just like we check whether a string is null or not.
Ex: if [[ -n $VAR ]], something like this? If there is no such option then I'm ready to go with awk.
I have a file with many records like
"hi","bh",12,"ih","j","mk",12,"hi","sony",12345
I want to check whether all the fields are of a specific length which is predefined (I have kept it in an array len='2 2 2 2 2 2 2 2 4 5') and also check whether the numeric fields (3,7,10) are numeric or not.
The line number should be returned in both cases.
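
For the shell-side test asked about here, one common approach is a case pattern rather than a [[ ]] option. This is a sketch only, assuming a POSIX shell; the function name is_numeric is hypothetical, and "numeric" is taken to mean a non-empty string of digits.

```shell
# Sketch only: succeed when $1 is a non-empty string of digits.
is_numeric() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}
```

It can then be used like the -n test: if is_numeric "$VAR"; then echo "numeric"; fi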
Peter Nikitka
Honored Contributor

Re: help needed scripting urgent plzz

Hi,

I'm not sure what you want now:
If your data is as you described in your first questions, this 'check-for-numeric' should be done in awk.
Nevertheless, data like
"123"
are (normally) non-numeric IMHO.

You have the choice of making an implicit assumption that the fields you want to check always contain data in the format "", or of checking that explicitly.
I would do the second.

So modify your awk
1) to set a variable containing the quote ", so that you can handle this character inside the awk program

awk -v qu='"' ...

2) add additional checks for quoted numeric values like
...original...
if(match($i,"[^0-9]")) printf("record %d line %d contains non-numeric data\n",i,NR)
... to ...

if(match($i,qu)) {split($i,tt,qu); rec=tt[2]}
else rec=$i
if(match(rec,"[^0-9]")) printf("record %d line %d contains non-numeric data\n",i,NR)

mfG Peter
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

To make the scenario clear, the file will definitely be like this
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345

Numerics are not enclosed in "".
But the script which you provided to check the length of each field and
the numeric type of fields (3,7,10) is not working for this :(
It only works if " " are not present in the file.

cat file
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345

u provided me
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file

This is not working :) Please, Peter, modify this and send it to me, it's very urgent.


Peter Nikitka
Honored Contributor

Re: help needed scripting urgent plzz

Hi,

I begin to understand now:
You have
- , as field delimiter
- fields whose content has a defined length
- BUT if a field contains quotes ", they do NOT count towards this field length

This does not make sense to me:
It would be best to check the REAL length of a field - whether it contains quotes or not. My algorithm will work well for this.

If possible, adjust the field length values to the real lengths and no further change will be required.

I think the whole definition of the data format is not well defined and relies on hidden assumptions.

Nevertheless, I have already given you everything you need to do even this by yourself:

...original...
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)

...to...(don't forget: awk -v qu='"' ..)
if(length($i)!=l[i]) {
if(!(match($i,qu) && ((length($i)-2) == l[i])))
printf("size mismatch in line %d, field %d\n",NR,i)
}

This will additionally check for quotes and adjust the length-definition accordingly.
NOTE: UNTESTED!
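
Putting the quote handling, the length check and the numeric check together could look like the sketch below. It assumes a POSIX shell and awk, that embedded commas have already been replaced as in 1.), and that quotes do not count towards the field length. The function name check_file and its argument order are hypothetical. Unlike the snippet above, it strips one leading and one trailing quote before measuring, and it also accepts quoted digits as numeric, which is a choice and not something the requirements state.

```shell
# Sketch only. check_file is a hypothetical helper name.
# $1: expected lengths, $2: numeric field numbers, $3: input file
check_file() {
    awk -F, -v len="$1" -v nc="$2" -v qu='"' '
    BEGIN {
        split(len, l, " ")
        n = split(nc, c, " ")
        for (k = 1; k <= n; k++) isnum[c[k]] = 1
    }
    {
        for (i = 1; i <= NF; i++) {
            val = $i
            # Drop one leading and one trailing quote, if both are present.
            if (length(val) >= 2 && substr(val, 1, 1) == qu && substr(val, length(val), 1) == qu)
                val = substr(val, 2, length(val) - 2)
            if (length(val) != l[i])
                printf("size mismatch in line %d, field %d\n", NR, i)
            if ((i in isnum) && match(val, "[^0-9]"))
                printf("line %d field %d contains non-numeric data,%s,\n", NR, i, $i)
        }
    }' "$3"
}
```

A call would be check_file "$len" '3 7 10' file.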

mfG Peter
Sandman!
Honored Contributor

Re: help needed scripting urgent plzz

>numerics are not enclosed in ""

If this pattern in your data is consistent then simply test if 3,7 and 10 start with ".

What is still confusing is whether this test is logically ANDed to the test of field length?

>I want to check whether all the fields are of a specific length which is predefined (I have kept it in an array len='2 2 2 2 2 2 2 2 4 5') and also check whether the numeric fields (3,7,10) are numeric or not.
>The line number should be returned in both cases.

Does this mean that you want to check if each of the fields is of the specified length AND if 3,7,10 are numeric? If this ANDed test is true then return line number otherwise not??? IMHO...what happens if fields are of specified length but 3,7,10 are not-numeric or vice-versa??? please clarify this.

The script below removes embedded commas from non-numeric fields and prints the line numbers if 3,7,10 are numeric or not. Invoke as:

# myawkscr input_file

==========================================================
#!/usr/bin/sh -x

awk -F, '{
for (i=1;i<=NF;++i) {
if ($i~/^"[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+"$/)
printf((i<NF ? "%s," : "%s\n"),$i)
else
printf((i<NF ? "%s," : "%s\n"),$i)
}
}' $1 | awk -F, '{
if ($3 !~ /^"/ && $7 !~ /^"/ && $10 !~ /^"/)
printf("line %d [$3 $7 $10 all-numeric] %s\n",NR,$0)
else
printf("line %d [$3 $7 $10 non-numeric] %s\n",NR,$0)
}'
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Peter its not working!!
>cat file
"hi","bh",112,"ih","j","mk",34,"hi","sony",12345

len='2 2 2 2 2 2 2 2 4 5'
awk -F, -v qu=#"' len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]){if(!(match($i,qu) && ((length($i)-2)==l[i])) printf("size mismatch in line %d, field %d\n",NR,i)}
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file
viseshu
Frequent Advisor

Re: help needed scripting urgent plzz

Sandy,
I need to check whether each field is of the specified length or not. If not, return that record number. AND for numeric fields 3,7,10, check whether they are numeric or not.