1827930 Members
2014 Online
109973 Solutions
New Discussion

AWK - FS and output

 
SOLVED
Go to solution
rmueller58
Valued Contributor

AWK - FS and output

I have a tabbed delimited flat file, I want to change it to CSV file,

it has 7 fields.

Any thoughts on how I can redefine output?

22 REPLIES 22
Michael Schulte zur Sur
Honored Contributor

Re: AWK - FS and output

Hi,

try
awk -FS: '{gsub("\t",",");print $0}' filename

greetings,

Michael
A. Clay Stephenson
Acclaimed Contributor

Re: AWK - FS and output

This is slighty tricky in that string fields in CSV files must be surrounded by double quotes (in case the strings contain commas). This is a very easy awk or Perl task but it does requires a little customization if string data are in the file.
If it ain't broke, I can fix that.
Sandman!
Honored Contributor

Re: AWK - FS and output

You can use tr for that...

# tr "\t" "," outfile

cheers!
rmueller58
Valued Contributor

Re: AWK - FS and output

I've tried all three suggestions, I am thinking the format of the file is problematic..

If I do:

awk '{print $1"'"$2","....}' infile > outfile

the fields separate on white space with commas, But if it try to tag the "tab" with FS or F the original file is retained in the output file.

I think it is something specific with the field splitting.

I've tried "" ' ' no quotes, around the "\t"
any other ideals?
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: AWK - FS and output

Oh, and the other "gotcha" for CSV files, is that not only must string fields be surrounded by "'s but "'s within string fields must be escapped with an additional ". It's more than simply a trivial tr or sed task.
If it ain't broke, I can fix that.
rmueller58
Valued Contributor

Re: AWK - FS and output

Clay I need to escape the Commas as well?
Hein van den Heuvel
Honored Contributor

Re: AWK - FS and output

Hmm,

Why not simply use tr to replace tab's by comma's?

Do you need to worry about commas in the tab seperated fields, within quoted strings perhaps?
Similarly, can there be tabs in (quoted) fields in the input file?

Simple sample solution:

/mnt/root # cat > x
a b c d e
/mnt/root # od -b x
0000000 141 040 142 011 143 040 144 011 145 012
0000012

Using FS (I is silent :-) and OFS:

/mnt/root # awk 'BEGIN{FS="\011"; OFS=","}{print $1,$2,$3,$4,$5,$6,$7}' x
a b,c d,e,,,,

Using simple gobal substitute:
/mnt/root # awk '{gsub(/\011/,",",$0); print}' x
a b,c d,e


hth,
Hein.
Sandman!
Honored Contributor

Re: AWK - FS and output

Based on Clay's post, if commas are embedded inside the fields, which are separated by characters themselves, then each of the fields should be punctuated by double-quotes.

Here's an awk construct that would help...

# awk -F"\t" '{OFS=",";print c$1c,c$2c,c$3c,c$4c,c$5c,c$6c,c$7c}' c='"' inp >out

cheers!
A. Clay Stephenson
Acclaimed Contributor

Re: AWK - FS and output

In your case, printf in awk will be your friend. The idea is that you use -F as your input field separator and the printf will include comma's in the format string. Without knowing you data format, I can't be more specific because as I have tried to explain, CSV strings require additional work --- if you are build a true CSV file.
If it ain't broke, I can fix that.
Sandman!
Honored Contributor

Re: AWK - FS and output

Hi Rex,

Could you post a sample of your tab delimited file since that would help in deciphering all the special or invisble characters that need to be accounted for or escaped.

cheers!
rmueller58
Valued Contributor

Re: AWK - FS and output

referencecode firstname lastname group homephone emailaddress institution
Sandman,

that is about the best I can do, it contains private info.
rmueller58
Valued Contributor

Re: AWK - FS and output

Data Entry People drive me nuts,

In the First Name field there are a bunch with FN MI, so I would have to substr off the middle initial as well. I don't think the person that build the SQL output realized that.. I am going to take it back to her and have her work it from the Informix side.
James R. Ferguson
Acclaimed Contributor

Re: AWK - FS and output

Hi Rex:

This may accomodate your requirements:

# perl -pe 's/(.+?)\t+/"$1",/g;s/,([^"]+?)$/,"$1"/' filename

Regards!

...JRF...
rmueller58
Valued Contributor

Re: AWK - FS and output

Thanks James, Clay, Sandman and all.

I am going take a step back before I chase the damn goose around the barnyard.

This file she gave acts as if the entire row is one field unless I break it down from the Whitespace.. (which maybe what I wind up doing.. I appreciate the help..

I am going to get the informix syntax book out and work with the person on formatting the output file from Informix instead.
rmueller58
Valued Contributor

Re: AWK - FS and output

Sandman,

this is the output from your first AWK statement
awk -F"\t" '{OFS=",";print c$1c,c$2c,c$3c,c$4c,c$5c,c$6c,c$7c}' c='"' Conn*csv > rex.con


"","","","","","",""
"referencecode firstname lastname group homephone emailaddress institution ","","","
","","",""
"

rmueller58
Valued Contributor

Re: AWK - FS and output

Taking a different tact.. Going to deal with the original file from Informix.
A. Clay Stephenson
Acclaimed Contributor

Re: AWK - FS and output


Okay, these all all string fields so to produce a CSV file w/o headings then create an awk file, my.awk, like this:

function do_csv_string(x)
{
gsub("\"","\"\"",x)
return "\"" x "\""
}

{
printf("%s,%s,%s,%s,%s,%s,%s\n",
do_csv_string($1),do_csv_string($2),
do_csv_string($3),do_csv_string($4),
do_csv_string($5),do_csv_string($6),
do_csv_string($7))
}

Invoke as awk -F '' -f my.awk < infile > outfile

This will correctly enclose each string in double quotes are required by CSV format and , in addition, will escape each quote within a string with an additional quote.
If it ain't broke, I can fix that.
rmueller58
Valued Contributor

Re: AWK - FS and output

Clay, it is putting the "," at the end of each record and not splitting the fields.

A. Clay Stephenson
Acclaimed Contributor

Re: AWK - FS and output

You have incorrectly captured the character as the FS.

awk -F "\t" -f my.awk < infile > outfile

If it ain't broke, I can fix that.
rmueller58
Valued Contributor

Re: AWK - FS and output

Clay,

the output file is not correct..

I am going to bug the SQL programmer a bit when she gets out of her meeting.
rmueller58
Valued Contributor

Re: AWK - FS and output

Clay,

the output file is not correct..

I am going to bug the SQL programmer a bit when she gets out of her meeting.

Thanks for the assistance.. I think the source file might be the real problem here.
rmueller58
Valued Contributor

Re: AWK - FS and output

We changed the format of the Informix Script from "output to" to "unload to" and added the delimiter = ',' the file is format correctly but now informix is ignoring the header record..

One step closer..

Thanks guys..