
awk help

 
SOLVED
viseshu
Frequent Advisor

awk help

I have a CSV file like this:
"ab",12345,456,"home ahi","new one","get that me",45678,"new",123,"mnw",45,"mnuens"
"ab",12345,456,"home ahi","new one","get that me",45678,"new",123,"mnw",45,"mnuens","kil je"

I want to find the fields where a character field contains a double quote inside the field (not the double quotes that enclose the field).

Please help me out.
12 REPLIES
Dennis Handly
Acclaimed Contributor

Re: awk help

I don't see any entry in your sample that contains embedded double quotes.

This script assumes that a comma is never embedded in a field. It also assumes that a double quote is never embedded in a non-quoted string.

#!/usr/bin/ksh
awk -F"," '
function check_field(i, ff) {
    len = length(ff)
    if (substr(ff, 1, 1) == "\"" && substr(ff, len, 1) == "\"") {
       if (index(substr(ff,2,len-2), "\"") != 0) {
          print "Found embedded double quote"
          print "record:", NR, " field:", i, ":" ff ":"
       }
    }
}
{
# check each field for embedded double quotes
for (i=1; i <= NF; ++i) {
    check_field(i, $i)
}
}' itrc_cvs.in
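
To see why the comma assumption matters, here is a small sketch. The sample line and the helper name count_fields are made up for illustration; they are not from the thread.

```shell
# count_fields is a hypothetical helper: print how many fields awk sees per line
count_fields() {
    awk -F"," '{ print NF }'
}

# Two CSV columns, but the first one contains a comma inside the quotes
printf '%s\n' '"home, ahi",123' | count_fields
```

awk reports 3 fields here rather than 2, so a quoted field that contains a comma would be checked in pieces.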

viseshu
Frequent Advisor

Re: awk help

Dennis, it may happen that a double quote is missing at the start or end of a field. Will the script still work in that case?
Dennis Handly
Acclaimed Contributor
Solution

Re: awk help

No, try this version. Note: it doesn't check for quotes being balanced.
#!/usr/bin/ksh
awk -F"," '
function check_field(i, ff) {
    len = length(ff)
    # strip off any double quotes
    if (substr(ff, len, 1) == "\"")
       --len
    start=1
    if (substr(ff, 1, 1) == "\"") {
       start=2
       --len
    }
    if (index(substr(ff,start,len), "\"") != 0) {
       print "Found embedded double quote"
       print "record:", NR, " field:", i, ":" ff ":"
    }
}
{
# check each field for embedded double quotes
for (i=1; i <= NF; ++i) {
    check_field(i, $i)
}
}' itrc_cvs.in
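
Since the sample data in the question has no embedded quotes, a made-up test line helps to confirm the behaviour. The sketch below is a compact variant of the same idea (strip at most one quote from each end, then look for any quote left over), not the script above itself. The function name find_embedded_quotes and the input lines are assumptions for illustration, and it still assumes that no field contains a comma.

```shell
# find_embedded_quotes is a hypothetical helper name.
# Prints "record field contents" for each field with a quote inside it.
find_embedded_quotes() {
    awk -F"," '{
        for (i = 1; i <= NF; ++i) {
            f = $i
            sub(/^"/, "", f)    # drop one leading quote, if present
            sub(/"$/, "", f)    # drop one trailing quote, if present
            if (index(f, "\"") != 0)
                print NR, i, $i
        }
    }'
}

# Made-up input: only field 3 of record 1 has a quote between its ends
printf '%s\n' '"ab",123,"ho"me","x' 'plain,"ok"' | find_embedded_quotes
```

With this input the only line printed is: 1 3 "ho"me"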

Peter Nikitka
Honored Contributor

Re: awk help

Hi,

I would divide this request into two parts:
1) check for balanced quoting
2) generate the requested output

Solution 1)
awk -F, -v qu='"' '{for(i=1;i<=NF;i++) {if(! index($i,qu)) continue
m=match($i,qu".*"qu)
if(m) { if(length($i)==RLENGTH) continue
printf("Err: line=%d f=%d data outside quotes:%s:\n",NR,i,$i)}
else printf("Err: line=%d f=%d unbalanced quotes:%s:\n",NR,i,$i)}}'

Feeding this data
1234,aa","aa bb","dd"hh,"bb,456
would create
Err: line=1 f=2 unbalanced quotes:aa":
Err: line=1 f=4 data outside quotes:"dd"hh:
Err: line=1 f=5 unbalanced quotes:"bb:

Solution 2)
awk -F, -v qu='"' '{out=0; for(i=1;i<=NF;i++) {if(! index($i,qu)) continue
if (match($i,qu".*"qu) && (length($i)==RLENGTH)) {if(out) printf("%s",FS);out++;printf("%s",substr($i,2,RLENGTH-2))}}
if(out) printf("\n")}' above.csv

would create this output (with above data)
ab,home ahi,new one,get that me,new,mnw,mnuens
ab,home ahi,new one,get that me,new,mnw,mnuens,kil je

Kind regards, Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
Sandman!
Honored Contributor

Re: awk help

Another way to do it would be to count the number of double quotes in each field. If the count is zero (a numeric or unquoted field), do nothing. If the count is two, print the field only if the quotes aren't at both ends. For any other count, print the field.

See the awk script below...

awk -F, '{
    for (i=1;i<=NF;++i) {
        f=$i
        s=gsub("\"","",f)
        if (s) {
            if (s==2) {
                if ($i!~/^".*"$/)
                    print $i
            }
            else
                print $i
        }
    }
}' file
john korterman
Honored Contributor

Re: awk help

Hi,

a simple approach based on an idea similar to Sandman's:

#!/usr/bin/sh

typeset -i POS=0 NUMBER=0

while read LINE
do
    while [ "$POS" -lt "${#LINE}" ]
    do
        POS=$POS+1
        # quote LINE so runs of blanks are kept and positions match ${#LINE}
        KARAK=$(echo "${LINE}" | cut -c "$POS")
        if [ "$KARAK" = "\"" ]
        then
            NUMBER=$(( $NUMBER + 1 ))
            KARAK=" "
        fi
        # an odd quote count means we are inside a quoted field
        let DECISION=NUMBER%2
        if [ "$DECISION" != 0 ]
        then
            echo "$KARAK\c"
        fi
    done
    echo ""
    POS=0
    NUMBER=0
done <"$1"


Run it using your inputfile as $1

regards,
John K.
it would be nice if you always got a second chance
viseshu
Frequent Advisor

Re: awk help

Is it possible to enclose the numeric fields in double quotes (fields 6, 10 and 19)?
At present they are not enclosed in double quotes.
curt larson_1
Honored Contributor

Re: awk help

I know you asked for help using awk.

But there are several Perl modules written for handling CSV files. Just a suggestion: use code that has already been written and tested instead of creating your own.

http://www.perlmeme.org/tutorials/parsing_csv.html

http://www.interopp.org/ncci/man/CSV.htm

http://search.cpan.org/search?query=csv&mode=all
Sandman!
Honored Contributor

Re: awk help

Numeric fields can be enclosed in double quotes too; they will then be treated the same as the non-numeric fields, irrespective of whether the field contains only double quotes.
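
If the aim is to add the quotes to the numeric fields in the file itself, a minimal sketch could look like the one below. It assumes that a numeric field is an unquoted run of digits and that no field contains a comma. The function name quote_numeric_fields and the sample line are made up for illustration.

```shell
# quote_numeric_fields is a hypothetical helper name.
quote_numeric_fields() {
    awk -F"," -v OFS="," '{
        for (i = 1; i <= NF; ++i)
            if ($i ~ /^[0-9]+$/)      # unquoted, digits only
                $i = "\"" $i "\""
        print
    }'
}

printf '%s\n' '"ab",12345,456,"home ahi"' | quote_numeric_fields
```

For this sample line it prints: "ab","12345","456","home ahi"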

~hope it helps