- help needed scripting urgent plzz
09-28-2006 11:58 PM
I have a file in the following format:
"ABC",1809593008,"MYHOME",20061002,"SITON,theback",abcdef,...
The file contains many records, with 17 fields in every record. I have 3 requirements.
1. I want to remove any commas (,) that occur ONLY inside " ", in all records.
2. Each field has a predefined length. I have the lengths, but I can't read them from a file; I need to hard-code them in the script. I want to check whether each field has its predefined length.
3. If the 15th field is present, then the 14th and 11th fields should also be present. If not, the script should return the record number.
I need a function for each of the last 2.
09-28-2006 11:59 PM
Re: help needed scripting urgent plzz
09-29-2006 01:16 AM
Re: help needed scripting urgent plzz
Can you provide a short test file for us to work against? Maybe 100 lines or so?
Doug
------
Senior UNIX Admin
O'Leary Computers Inc
linkedin: http://www.linkedin.com/dkoleary
Resume: http://www.olearycomputers.com/resume.html
09-29-2006 01:39 AM
Re: help needed scripting urgent plzz
09-29-2006 02:20 AM
Re: help needed scripting urgent plzz
Anyway, here is something to get you going.
It does not deal with the length requirements, but once you have read and understood the split on the double quote, you can do something similar by splitting the line on commas and walking the fields.
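#cat test.pl
# Run with perl -p, so this body executes once per input line (held in $_).
# Splitting on " puts the quoted strings at the odd indexes 1, 3, 5, ...
my (@quoted) = split /"/;
my ($i)=1;
while ($i < @quoted) {
$quoted[$i] =~ s/,/ /g;   # replace commas inside quotes with spaces
$i += 2;
}
$_ = join "\"", @quoted ;
# Every remaining comma is now a real field separator.
my ($one,$two,$three,$four,$five) = split /,/;
if ($four ne "" and $two eq "") {
print STDERR "Missing field X at line $.\n";
}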
#cat x.txt
"ABC",1809593008,"MYHOME",20061002,"SITON,theback"
"DEF",1809593008,"MY,HOME",20061002,"SITON,theback"
"GHI",,"MYHOME",,"SITON,theback"
"JKL",,"MYHOME",20061002,"SITON,theback"
"MNO",1809593008,"MYHOME",20061002,"SITON,theback"
# perl -p test.pl x.txt > y.txt
Missing field X at line 4
#cat y.txt
"ABC",1809593008,"MYHOME",20061002,"SITON theback"
"DEF",1809593008,"MY HOME",20061002,"SITON theback"
"GHI",,"MYHOME",,"SITON theback"
"JKL",,"MYHOME",20061002,"SITON theback"
"MNO",1809593008,"MYHOME",20061002,"SITON theback"
Good luck!
Hein.
09-29-2006 02:30 AM
Re: help needed scripting urgent plzz
09-29-2006 06:24 AM
Re: help needed scripting urgent plzz
I asked myself whether I should answer a thread containing the message
>>
I'm not doing it in perl
<<
Why?
Is awk allowed?
Nevertheless:
1.) I assume a field is NOT allowed to contain a single (unpaired) double quote ".
I use the quote as the delimiter and substitute in the even-numbered fields only.
cat /tmp/a
"ABC",1809593008,"MYHOME",20061002,"SITON,the,back",abcdef,..,""
awk -F'"' '{printf "%s",$1;for(i=2;i<=NF;i++) {if(! (i%2)) gsub(","," ",$i);printf "%s%s",FS,$i}; printf "\n"}' /tmp/a
"ABC",1809593008,"MYHOME",20061002,"SITON the back",abcdef,..,""
2.) I assume your field delimiter is a comma, and that your data is clean with respect to this delimiter after 1.)
Set a variable containing the length data in the format
len='l1 l2 l3 .. ln'
Splitting it inside awk builds an array where
l[i] contains the length of the i-th field.
=> No hardcoding needed!
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)}'
3.) Self-explanatory; add this rule to the program in 2.):
{if($15 != "" && ($14 == "" || $11 == "")) print NR}
It should be easy to combine the tasks together.
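As a rough illustration of how the three steps above might be combined into one pass, here is a minimal sketch. The function name check_records and the message text for step 3 are made up for this example. It assumes that field lengths are counted with the quotes included (as in 2.) and that a field counts as "present" when it is not the empty string. It only prints diagnostics, not the cleaned-up lines.

```shell
#!/bin/sh
# check_records LENGTHS FILE   (hypothetical helper, not from the thread)
# LENGTHS is a space-separated list of expected field lengths, quotes included.
check_records() {
    awk -F, -v len="$1" '
    BEGIN { nlen = split(len, l, " ") }
    {
        # Step 1: split on the double quote; even-numbered pieces are the
        # quoted strings, so only their commas are replaced by spaces.
        n = split($0, q, "\"")
        line = q[1]
        for (i = 2; i <= n; i++) {
            if (i % 2 == 0) gsub(/,/, " ", q[i])
            line = line "\"" q[i]
        }
        $0 = line    # assigning $0 re-splits the record on commas

        # Step 2: compare each field with its expected length.
        for (i = 1; i <= nlen; i++)
            if (length($i) != l[i])
                printf("size mismatch in line %d, field %d\n", NR, i)

        # Step 3: if field 15 is present, fields 14 and 11 must be too.
        if ($15 != "" && ($14 == "" || $11 == ""))
            printf("line %d: field 15 present but 14 or 11 missing\n", NR)
    }' "$2"
}
```

It could be called as check_records "$len" /tmp/a, with len set as described in 2.).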
mfG Peter
09-29-2006 09:26 AM
Re: help needed scripting urgent plzz
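awk -F, '{
for (i=1;i<=NF;++i) {
if ($i~/^"[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+"$/)
# The rest of this line and the next were lost in the source; the
# (i<NF) ternary is an assumed reconstruction.
printf((i<NF)?"%s,":"%s\n",$i)
else
printf((i<NF)?"%s,":"%s\n",$i)
}
}' infile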
~hope it helps
09-29-2006 09:53 PM
Re: help needed scripting urgent plzz
thanks a lot
1) Replacing , with a space is working fine, but can you please explain the concept of the even field? I did not get that. What is {if(! (i%2)) doing? Please explain clearly. :(
09-30-2006 10:13 PM
Re: help needed scripting urgent plzz
Look at this example string:
"ABC",1809593008,"MYHOME",20061002,"SITON,the,back",abcdef,..,""
If you take the quote (") as the delimiter, these are your fields; I call them f1:
1
2 ABC
3 ,1809593008,
4 MYHOME
5 ,20061002,
6 SITON,the,back
7 ,abcdef,..,
...
If you take your original delimiter (the comma), I call the fields f2.
You see that commas in f1 fields are only of interest when the field number is even.
Odd-numbered f1 fields that contain commas are made up only of f2 fields that themselves contain NO commas.
So you must note how important my assumption for solution 1 is, that no f2 field may contain a single (unpaired) quote:
If that were the case, no algorithm could decide how the fields of f1 and f2 fit together.
So you must change commas to spaces in the even-numbered fields, and only in the even-numbered ones.
The % operator is the modulo function, so
(i%2) is zero for even and one for odd i, leading to the expression
(! (i%2)) which evaluates to true for even numbers.
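To see the parity for yourself, a small sketch like the following may help. It prints each quote-delimited piece together with its index and whether that index is odd or even. The function name show_quote_fields is made up for this illustration.

```shell
#!/bin/sh
# show_quote_fields LINE   (hypothetical helper)
# Splits LINE on the double quote and labels every piece as odd or even.
show_quote_fields() {
    printf '%s\n' "$1" | awk -F'"' '{
        for (i = 1; i <= NF; i++)
            printf("%d %s [%s]\n", i, (i % 2) ? "odd" : "even", $i)
    }'
}

show_quote_fields '"ABC",1809593008,"MY,HOME"'
# 1 odd []
# 2 even [ABC]
# 3 odd [,1809593008,]
# 4 even [MY,HOME]
# 5 odd []
```

Only the even pieces are the contents of quoted strings, so only they may have their commas replaced.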
IMHO some looking at the man page of awk could help...
mfG Peter
PS: I really think I have earned some points now :-)
10-02-2006 10:02 PM
Re: help needed scripting urgent plzz
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)}
I want to check whether fields 3, 7 and 10 are numeric; if they are not numeric, I want to return the record number.
10-03-2006 12:51 AM
Solution
To check for numeric data in field i only, use something like this:
...
if(match($i,"[^0-9]")) printf("field %d line %d contains non-numeric data\n",i,NR)
...
To integrate this check I suggest:
- set an additional variable containing the field numbers to check for numeric input
nc='3 7 10'
- feed it to awk the same way as "$len"
- loop over this additional array in every cycle of the loop over all fields.
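The suggestion above could be sketched roughly as follows. The function name check_numeric_fields is made up for this example, and the sketch loops directly over the listed field numbers rather than over every field. Note that an empty field is not reported, because it contains no non-digit character.

```shell
#!/bin/sh
# check_numeric_fields "3 7 10" FILE   (hypothetical helper)
# Reports every listed field that contains a character other than 0-9.
check_numeric_fields() {
    awk -F, -v nc="$1" '
    BEGIN { n = split(nc, c, " ") }    # c[1..n] hold the field numbers to test
    {
        for (k = 1; k <= n; k++) {
            i = c[k]
            if ($i ~ /[^0-9]/)
                printf("line %d field %d contains non-numeric data,%s,\n", NR, i, $i)
        }
    }' "$2"
}
```

Changing the list of numeric fields then only means changing the nc string, not the awk program.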
If your field numbers are static, you can hard-code them in your program as well. Though I do not generally recommend this (things may change and solutions migrate...), it will be faster:
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}'
mfG Peter
10-03-2006 02:15 AM
Re: help needed scripting urgent plzz
Yes indeed.
You will be sorry you are not doing it in perl
:-).
Hein.
10-04-2006 04:09 PM
Re: help needed scripting urgent plzz
I want to check whether a variable is numeric (that is, whether it contains a number) or not. How can I do that?
10-04-2006 05:05 PM
Re: help needed scripting urgent plzz
Your script
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}'
is working fine with data like
hi,bye,12,ui,ki,lo,344,bo,mlbo,sony
with the length array
len='2 2 2 2 2 2 2 2 4 5'
But my concern is that
MY DATA CONTAINS quoted ("") data like
"hi","bh","12","ih","j","mk","12","hi","sony","12345"
All data will be in "" separated by commas. Please provide a solution for this scenario. It's urgent, please.
10-04-2006 06:07 PM
Re: help needed scripting urgent plzz
10-04-2006 06:54 PM
Re: help needed scripting urgent plzz
It's working fine, thank you very much. I'll assign the points as soon as I'm done with my work.
How can you check whether a variable is numeric or not?
10-04-2006 07:07 PM
Re: help needed scripting urgent plzz
Do you mean within the awk script or outside of it? Could you give a more concrete example of what you're trying to do?
thanks!
10-04-2006 08:57 PM
Re: help needed scripting urgent plzz
For example, something like if [[ -n $VAR ]]? If there is no such option, I'm ready to go with awk as well.
I have a file with many records like
"hi","bh",12,"ih","j","mk",12,"hi","sony",12345
I want to check whether all the fields have a specific, predefined length (I have kept the lengths in an array, len='2 2 2 2 2 2 2 2 4 5') and also whether the numeric fields (3, 7, 10) are numeric.
The line number should be returned in both cases.
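For the plain shell part of the question: [[ -n $VAR ]] only tests that the variable is non-empty, not that it is numeric. One common way to test for digits only in a POSIX shell is a case pattern. The sketch below treats "numeric" as "one or more digits 0-9, nothing else" (no sign, no decimal point), and the function name is_numeric is made up for this example.

```shell
#!/bin/sh
# is_numeric VALUE   (hypothetical helper)
# Succeeds only if VALUE is non-empty and consists of digits 0-9 only.
is_numeric() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

VAR=12345
if is_numeric "$VAR"; then
    echo "numeric"
else
    echo "not numeric"
fi
```

Inside awk the equivalent test is the regular-expression match against [^0-9] already used in this thread.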
10-05-2006 12:15 AM
Re: help needed scripting urgent plzz
I'm not sure what you want now:
If your data is as you described in your first questions, this 'check-for-numeric' should be done in awk.
Nevertheless, data like
"123"
is (normally) non-numeric, IMHO.
You have the choice of making an implicit assumption, that your records you want to check contain always data of format "
I would do the second.
So modify your awk
1) to set a variable containing the quote ", so that this character can be handled inside the awk program
awk -v qu='"' ...
2) add additional checks for quoted numeric values like
...original...
if(match($i,"[^0-9]")) printf("field %d line %d contains non-numeric data\n",i,NR)
... to ...
if(match($i,qu)) {split($i,tt,qu); rec=tt[2]}
else rec=$i
if(match(rec,"[^0-9]")) printf("field %d line %d contains non-numeric data\n",i,NR)
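Put together as a complete command, the change above might look like the sketch below. It is an illustration only: the function name check_numeric_quoted is made up, fields 3, 7 and 10 are hard-coded as in the earlier script, and the surrounding quotes are removed with substr instead of split.

```shell
#!/bin/sh
# check_numeric_quoted FILE   (hypothetical helper)
# Checks fields 3, 7 and 10 for non-digits, ignoring one pair of enclosing quotes.
check_numeric_quoted() {
    awk -F, -v qu='"' '
    {
        for (i = 1; i <= NF; i++) {
            if (i != 3 && i != 7 && i != 10) continue
            val = $i
            # Strip one pair of surrounding quotes, if present.
            if (substr(val, 1, 1) == qu && substr(val, length(val), 1) == qu)
                val = substr(val, 2, length(val) - 2)
            if (val ~ /[^0-9]/)
                printf("field %d line %d contains non-numeric data\n", i, NR)
        }
    }' "$1"
}
```

With this, "12" and 12 are both accepted as numeric, while "1x" is reported.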
mfG Peter
10-05-2006 12:26 AM
Re: help needed scripting urgent plzz
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
The numerics are not enclosed in "".
But the script you provided to check the length of each field and
the numeric type of fields 3, 7 and 10 is not working for this :(
It only works if " " are not present in the file.
cat file
"hi","bh",12,"ih","j","mk",34,"hi","sony",12345
You provided
awk -F, -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file
This is not working. Please, Peter, modify it and send it to me; it's very urgent.
10-05-2006 03:33 AM
Re: help needed scripting urgent plzz
I am beginning to understand now:
You have
- , as the field delimiter
- fields whose content has a defined length
- BUT if a field contains quotes ", they do NOT count toward this field length
This does not make sense to me:
It would be best to check the REAL length of a field, whether it contains a quote or not. My algorithm works well for this.
If possible, adjust the field length values to the correct values and no further change will be required.
I think the definition of the data format is not well defined and has hidden assumptions.
Nevertheless, I have already given you everything you need to do even this yourself:
...original...
if(length($i)!=l[i]) printf("size mismatch in line %d, field %d\n",NR,i)
...to...(don't forget: awk -v qu='"' ..)
if(length($i)!=l[i]) {
if(!(match($i,qu) && ((length($i)-2) == l[i])))
printf("size mismatch in line %d, field %d\n",NR,i)
}
This additionally checks for quotes and adjusts the length definition accordingly.
NOTE: UNTESTED!
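One way to try out the fragment above is to wrap it in a small function and run it against a sample line. This is a sketch only: the function name check_lengths is made up, and index() is used in place of match() to look for the quote character.

```shell
#!/bin/sh
# check_lengths LENGTHS FILE   (hypothetical helper)
# A field passes if its length equals the expected length, or if it contains
# a quote and its length minus 2 equals the expected length.
check_lengths() {
    awk -F, -v qu='"' -v len="$1" '
    BEGIN { n = split(len, l, " ") }
    {
        for (i = 1; i <= NF; i++) {
            flen = length($i)
            if (flen != l[i] && !(index($i, qu) && flen - 2 == l[i]))
                printf("size mismatch in line %d, field %d\n", NR, i)
        }
    }' "$2"
}
```

For the line "hi","bh",112,"ih","j","mk",34,"hi","sony",12345 with len='2 2 2 2 2 2 2 2 4 5', this reports fields 3 (112 is three characters) and 5 ("j" has only one character between the quotes).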
mfG Peter
10-05-2006 06:26 AM
Re: help needed scripting urgent plzz
If this pattern in your data is consistent, then simply test whether fields 3, 7 and 10 start with ".
What is still confusing is whether this test is logically ANDed with the test of field length.
>i want to check whether all the fields are of specific lenght which is predefined
>(i have kept it in an array len='2 2 2 2 2 2 2 2 4 5' ) and also check whether
>numeric fields (3,7,10)are numeric or not.
>line number should be returned in both cases
Does this mean that you want to check whether each of the fields has the specified length AND whether 3, 7 and 10 are numeric? If this ANDed test is true, return the line number, otherwise not? What happens if the fields have the specified length but 3, 7 and 10 are non-numeric, or vice versa? Please clarify this.
The script below removes embedded commas from non-numeric fields and prints, for each line, whether 3, 7 and 10 are numeric or not. Invoke as:
# myawkscr input_file
==========================================================
#!/usr/bin/sh -x
awk -F, '{
for (i=1;i<=NF;++i) {
if ($i~/^"[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+$/)
printf("%s ",$i)
else if ($i~/^[A-Za-z]+"$/)
# The rest of this line and the next were lost in the source; the
# (i<NF) ternary is an assumed reconstruction.
printf((i<NF)?"%s,":"%s\n",$i)
else
printf((i<NF)?"%s,":"%s\n",$i)
}
}' "$1" | awk -F, '{
if ($3 !~ /^"/ && $7 !~ /^"/ && $10 !~ /^"/)
printf("line %d [$3 $7 $10 all-numeric] %s\n",NR,$0)
else
printf("line %d [$3 $7 $10 non-numeric] %s\n",NR,$0)
}'
10-05-2006 04:53 PM
Re: help needed scripting urgent plzz
>cat file
"hi","bh",112,"ih","j","mk",34,"hi","sony",12345
len='2 2 2 2 2 2 2 2 4 5'
awk -F, -v qu='"' -v len="$len" 'BEGIN {f=split(len,l," ")}
{for(i=1;i<=NF;i++) {
if(length($i)!=l[i]){if(!(match($i,qu) && ((length($i)-2)==l[i]))) printf("size mismatch in line %d, field %d\n",NR,i)}
if ((i==3)||(i==7)||(i==10)) if(match($i,"[^0-9]")) printf("line %d field %d contains non-numeric data,%s,\n",NR,i,$i)
}
}' file
10-05-2006 09:36 PM
Re: help needed scripting urgent plzz
I need to check whether each field has its specified length; if not, return that record number. AND for the numeric fields 3, 7 and 10, check whether they are numeric or not.