Operating System - Linux

How do I make awk select fields between quotes?

 
Stuart Abramson
Trusted Contributor

How do I make awk select fields between quotes?

Do you see how the 1st and 2nd share names in this list are "oneword" entries, but the 3rd and 4th contain spaces?

I want awk to read everything between a pair of quote marks as one field, but I also want a space to be a field separator.

Can you do that?

share "ERDisk" "/fs02/ERDisk" netbios=OH01PDMS01 maxusr=4294967295 umask=22
share "SPSBackup" "/fs01/SPSBackup" netbios=OH01PDMS01 maxusr=4294967295
share "Architect Team" "/fs01/Architect Team" netbios=OH01PDMS01
share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01
9 REPLIES
James R. Ferguson
Acclaimed Contributor

Re: How do I make awk select fields between quotes?

Hi Stuart:

Use 'split'. For instance:

# X='share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01'

# echo ${X}|awk -F\" '{WORD=split($2,a," ");print a[2]}'

...prints:

Area

...or the second field of the second field.
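
A related sketch, assuming every record follows the layout of the sample lines: when the double quote is the only field separator, the text between each pair of quotes lands in the even-numbered fields, so the quoted strings can be read whole without a second split. The function name quoted_fields is made up for this illustration.

```shell
# Hypothetical helper: print the quoted strings of each input line, one per line.
# With FS set to a double quote, the text between quote pairs is in $2, $4, ...
quoted_fields() {
    awk -F'"' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

X='share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01'
printf '%s\n' "$X" | quoted_fields
```

The unquoted parts (the odd-numbered fields) keep their surrounding spaces, so they would still need trimming or a further split.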

Regards!

...JRF...
Mike Stroyan
Honored Contributor

Re: How do I make awk select fields between quotes?

Awk isn't really good at parsing quoted
input. There is no feature made just for
quoting input. Here is an implementation
that uses the match() function to identify
pairs of quotes. It then changes spaces
within the quotes to a placeholder character
that can be made back into a space after
splitting the line into fields. You may
want to handle tab characters as well.

$ cat quoting.awk
{
    l=$0
    q="\""
    qsp="\377" # A character for quoted spaces
    nq="[^\"]"
    while (match(l, q nq "*" q)) { #find quote pairs
        quoted=substr(l, RSTART, RLENGTH)
        gsub(" ",qsp,quoted) #change spaces to quoted spaces inside quotes
        gsub(q,"",quoted) #remove quotes
        l2=substr(l, 1, RSTART-1) quoted substr(l, RSTART+RLENGTH)
        l=l2 #reassemble line
    }
    print "parsing '" $0 "'"
    nf=split(l, a)
    for (i=1; i<=nf; i++) {
        gsub(qsp," ",a[i]) # change quoted spaces back to spaces
        print "field " i " is '" a[i] "'"
    }
}
$ X='share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01'
$ echo "${X}" | awk -f quoting.awk
parsing 'share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01'
field 1 is 'share'
field 2 is 'Storage Area Network'
field 3 is '/fs01/Storage'
field 4 is 'netbios=OH01PDMS01'
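
A hedged sketch of the tab handling mentioned above, not a replacement for quoting.awk: it protects tabs inside quotes with a second placeholder, the same way spaces are protected. The placeholder bytes \001 and \002 and the function name split_quoted are choices made for this illustration, and the sketch assumes those bytes never occur in the input.

```shell
# Hypothetical wrapper around an awk program that prints one field per line.
split_quoted() {
    awk '
    {
        line = $0
        out = ""
        # Consume the line one quoted string at a time.
        while (match(line, /"[^"]*"/)) {
            quoted = substr(line, RSTART + 1, RLENGTH - 2)   # text without the quotes
            gsub(/ /, "\001", quoted)                         # protect spaces
            gsub(/\t/, "\002", quoted)                        # protect tabs
            out = out substr(line, 1, RSTART - 1) quoted
            line = substr(line, RSTART + RLENGTH)
        }
        out = out line
        n = split(out, f)             # default splitting: runs of spaces and tabs
        for (i = 1; i <= n; i++) {
            gsub("\001", " ", f[i])   # restore spaces
            gsub("\002", "\t", f[i])  # restore tabs
            print i ": " f[i]
        }
    }'
}

printf 'share "Architect\tTeam" "/fs01/Architect Team"\tnetbios=OH01PDMS01\n' | split_quoted
```

One limitation of this approach: an empty quoted string ("") leaves nothing behind after the quotes are removed, so it does not produce a field of its own.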

Sandman!
Honored Contributor

Re: How do I make awk select fields between quotes?

Hi Stuart,

Can you show what the output should look like? It's hard to figure out from your post exactly what you're trying to accomplish.

thanks!
Stuart Abramson
Trusted Contributor

Re: How do I make awk select fields between quotes?

I want to break this line:

share "Architect Team" "/fs01/Architect Team" netbios=OH01PDMS01

Into 4 fields:

$1 = share
$2 = Architect Team
$3 = /fs01/Architect Team
$4 = netbios=OH01PDMS01

Note that $2 has a space in it. I could replace it with something distinctive, like "&". This may not be possible...
Sandman!
Honored Contributor

Re: How do I make awk select fields between quotes?

Assuming your input file contains the records that you want parsed then here's an awk construct that does what you're looking for:

# awk '{sub("\" \"","\"");FS="\"";print $1,$2,$3,$4}' infile

cheers!
Hein van den Heuvel
Honored Contributor

Re: How do I make awk select fields between quotes?


I don't see a single function in AWK that does this.
You'll have to go through the motions.
The example below splits the whole line into words. It then builds an array of 'tokens', which is the target deliverable.
For each input word, copy it to the new array and check whether it starts with a quote. If it does, look for the first word ending in a quote, allowing for that to be the same word.
As long as you do not see an end quote, keep appending words to the same token.

Enjoy!
Hein.


------ demonstration x.awk -----------
{
    word_count=split ($0, words, " ");
    n=1
    for (i=1; i<=word_count; i++) {
        word = words[i];
        tokens[n]=word;
        if (index(word,"\"")==1) {
            # drop leading double-quote to deal with quoted single words
            word = substr (word,2);
            while (index(word,"\"") != length(word)) {
                word = words[++i];
                tokens[n] = tokens[n] " " word; # single space.
            }
        }
        n++;
    }
    #
    # Finally we have all new tokens
    #
    print "\nInput #",NR, $0;
    for (i=1; i<n; i++) {
        print i, tokens[i];
    }
}

------------ demonstration with data from original question

# awk -f x.awk x.txt

Input line # 1 share "ERDisk" "/fs02/ERDisk" netbios=OH01PDMS01 maxusr=4294967295 umask=22
1 share
2 "ERDisk"
3 "/fs02/ERDisk"
4 netbios=OH01PDMS01
5 maxusr=4294967295
6 umask=22

Input line # 2 share "SPSBackup" "/fs01/SPSBackup" netbios=OH01PDMS01 maxusr=4294967295
1 share
2 "SPSBackup"
3 "/fs01/SPSBackup"
4 netbios=OH01PDMS01
5 maxusr=4294967295

Input line # 3 share "Architect Team" "/fs01/Architect Team" netbios=OH01PDMS01
1 share
2 "Architect Team"
3 "/fs01/Architect Team"
4 netbios=OH01PDMS01

Input line # 4 share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01
1 share
2 "Storage Area Network"
3 "/fs01/Storage"
4 netbios=OH01PDMS01

#
Hein van den Heuvel
Honored Contributor

Re: How do I make awk select fields between quotes?

Hmmm, a reply by Sandman snuck in. It looks interesting, but I do not think it does the job.

When we use it with explicit labels, we get:

{print "input", NR, $0;
sub("\" \"","\"");
FS="\"";
print "1:", $1
print "2:", $2
print "3:", $3
print "4:", $4
}

----- demonstration using the same input data ---

# awk -f y.awk x.txt
input 1 share "ERDisk" "/fs02/ERDisk" netbios=OH01PDMS01 maxusr=4294967295 umask=22
1: share
2: "ERDisk"/fs02/ERDisk"
3: netbios=OH01PDMS01
4: maxusr=4294967295
input 2 share "SPSBackup" "/fs01/SPSBackup" netbios=OH01PDMS01 maxusr=4294967295
1: share
2: SPSBackup
3: /fs01/SPSBackup
4: netbios=OH01PDMS01 maxusr=4294967295
input 3 share "Architect Team" "/fs01/Architect Team" netbios=OH01PDMS01
1: share
2: Architect Team
3: /fs01/Architect Team
4: netbios=OH01PDMS01
input 4 share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01
1: share
2: Storage Area Network
3: /fs01/Storage
4: netbios=OH01PDMS01

Not what was requested.

Mind you, mine is also not exactly right, as I left the quotes in.
Below is a minor adaptation that removes the quotes.

----- x.awk removing quotes from tokens ---
{
    word_count=split ($0, words, " ");
    n=1
    for (i=1; i<=word_count; i++) {
        word = words[i];
        tokens[n]=word;
        if (index(word,"\"")==1) {
            # drop leading double-quote
            word = substr (word,2);
            tokens[n] = word;
            while (index(word,"\"") != length(word)) {
                word = words[++i];
                tokens[n] = tokens[n] " " word; # single space.
            }
            # drop trailing quote
            tokens[n] = substr (tokens[n], 1, length(tokens[n]) - 1);
        }
        n++;
    }
    #
    # Finally proudly display new tokens.
    #
    print "\nInput #",NR, $0;
    for (i=1; i<n; i++) {
        print i, tokens[i];
    }
}

--- demonstration --------

# awk -f x.awk x.txt

Input # 1 share "ERDisk" "/fs02/ERDisk" netbios=OH01PDMS01 maxusr=4294967295 umask=22
1 share
2 ERDisk
3 /fs02/ERDisk
4 netbios=OH01PDMS01
5 maxusr=4294967295
6 umask=22

Input # 2 share "SPSBackup" "/fs01/SPSBackup" netbios=OH01PDMS01 maxusr=4294967295
1 share
2 SPSBackup
3 /fs01/SPSBackup
4 netbios=OH01PDMS01
5 maxusr=4294967295

Input # 3 share "Architect Team" "/fs01/Architect Team" netbios=OH01PDMS01
1 share
2 Architect Team
3 /fs01/Architect Team
4 netbios=OH01PDMS01

Input # 4 share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01
1 share
2 Storage Area Network
3 /fs01/Storage
4 netbios=OH01PDMS01

Sandman!
Honored Contributor

Re: How do I make awk select fields between quotes?

Hein...the input file I used looks like:

# cat infile
share "Architect Team" "/fs01/Architect Team" netbios=OH01PDMS01
share "Storage Area Network" "/fs01/Storage" netbios=OH01PDMS01

and in order to see the fields one per line here's the revised code:

# awk '{
> sub("\" \"","\"")
> FS="\""
> print "$1= "$1"\n$2= "$2"\n$3= "$3"\n$4= "$4"\n"
> }' infile

the above awk construct outputs...

$1= share
$2= Architect Team
$3= /fs01/Architect Team
$4= netbios=OH01PDMS01

$1= share
$2= Storage Area Network
$3= /fs01/Storage
$4= netbios=OH01PDMS01

...no double quotes in any of the fields

regards!
Hein van den Heuvel
Honored Contributor

Re: How do I make awk select fields between quotes?

You are right Sandman.
It does work for those two lines.
10 points for your clever solution for this specific case!
Of course it will split into only 4 fields, but that may well be desirable.
And of course it requires fields 2 and 3 to be quoted, even if they are single words, but that is likely to be the case.

The reason I replied is because it did NOT work on the first line, on my test (XP system).

It turns out that when I copy that first line, it fails on the first instance, but works on the next! Bad luck?

I then tried the same on a Linux box (no access to hpux just now) and the same effect?!
First line fails, but an identical second line works.
I'm at a loss for an explanation.
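
One likely explanation, offered as a sketch rather than a confirmed diagnosis of the runs above: awk splits a record into fields using the value FS holds at that moment. The sub() call rebuilds the fields of the first record while FS is still the default, and the FS="\"" assignment that follows only takes effect for records read later. Setting the separator before any input is read, for example with -F, makes the first line behave like the rest. The function names below are made up for the illustration.

```shell
# Hypothetical name: FS is assigned inside the action, as in the one-liner above.
fs_in_action() {
    awk '{ sub("\" \"", "\""); FS = "\""; print $2 }'
}

# Hypothetical name: FS is set with -F before the first record is read.
fs_up_front() {
    awk -F'"' '{ sub("\" \"", "\""); print $2 }'
}

LINE='share "Architect Team" "/fs01/Architect Team" netbios=OH01PDMS01'

# In a POSIX-conforming awk the first record is still split on whitespace here,
# so only the second record gives the whole share name.
printf '%s\n%s\n' "$LINE" "$LINE" | fs_in_action

# Here both records are split on double quotes.
printf '%s\n%s\n' "$LINE" "$LINE" | fs_up_front
```

A BEGIN { FS = "\"" } block would have the same effect as -F in this sketch.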

For more generic handling of quoted strings I would probably turn to perl.
One approach would be to replace each quoted string with a recognizable word, then split, and when you see a special word, substitute the matching quoted string back:

----------- x.pl -------------
$special = "special";

print "Input #$. $_";    # $. is the current input line number
$i = 0;
while (/\s(".*?")\s/) {
    $special{$special.$i} = substr($1, 1, length($1) - 2);
    s/\Q$1\E/$special.$i++/e;    # \Q...\E matches the quoted text literally
}

$j = 0;
foreach $word (split) {
    $word = $special{$word} if ($word =~ /^$special/);
    print ++$j, ": $word\n";
}
------------------------
use as: perl -n x.pl x.txt


Btw.. I only just noticed the new 'retain format' option after I pushed 'submit'.
I'm going to use this re-reply to see how that would work on my awk attempt.

Cheers,
Hein.


{
    word_count=split ($0, words, " ");
    n=1
    for (i=1; i<=word_count; i++) {
        word = words[i];
        tokens[n]=word;
        if (index(word,"\"")==1) {
            # drop leading double-quote
            word = substr (word,2);
            tokens[n] = word;
            while (index(word,"\"") != length(word)) {
                word = words[++i];
                tokens[n] = tokens[n] " " word; # single space.
            }
            # drop trailing quote
            tokens[n] = substr (tokens[n], 1, length(tokens[n]) - 1);
        }
        n++;
    }
    #
    # Finally we have all new tokens
    #
    print "\nInput #",NR, $0;
    for (i=1; i<n; i++) {
        print i, tokens[i];
    }
}