Operating System - HP-UX
1829627 Members
1627 Online
109992 Solutions
New Discussion

Re: Comma Separated Files

 
Edgar_10
Frequent Advisor

Comma Separated Files

Hi,

I have a comma separated file containing irrelevant data which is separated by commas, does anyone know of a way I can extract only the fields/columns which have data and ignore the blank/comma separated values?

Thanks
14 REPLIES 14
William Wong_2
Trusted Contributor

Re: Comma Separated Files

I think you will need to give slightly more detail in terms of what you are trying to accomplish. You will be importing this information into what? A spreadsheet? A database? Are you sure you want to get rid of the fields that have blank data? You can use
vi to edit the file and the following will eliminate the blank/comma separated values:

:s/ ,//g

The above searches for a blank space followed by a comma and deletes both and replaces them globally in the file. But you do need to provide more information for someone to be able to help you.
Edgar_10
Frequent Advisor

Re: Comma Separated Files

Hi William,

I need to extract only the field which contain data as that is the data I will use with oracle sql loader to load into a table. Basically I have a comma separated files which contain approx.6000 records. Each record has about 100 fields but only 20 are populated with data, the rest is comma separated blanks.

What I want to do is extract only the 20 populated fields.

Thanks!
Edgar_10
Frequent Advisor

Re: Comma Separated Files

Hi William,

I need to extract only the field which contain data as that is the data I will use with oracle sql loader to load into a table. Basically I have a comma separated files which contain approx.6000 records. Each record has about 100 fields but only 20 are populated with data, the rest is comma separated blanks.

What I want to do is extract only the 20 populated fields.

Thanks!
Kenneth_19
Trusted Contributor

Re: Comma Separated Files

Hi,

A cut might probably help you, see the example below:

# echo "a,,c,,e" | cut -d , -f 1,3,5
a,c,e

You should know which field(s) in the record contains data and I believe that the sequence of fields that contain data are same for each record, so to convert the file to extract only fields with data, you can:

# cat input.csv | cut -d , -f 2,3,4,8,9 > output.csv

you can get the new csv file with only the fields you want.

Kenneth
Always take care of your dearest before it is too late
Massimo Bianchi
Honored Contributor

Re: Comma Separated Files

Hi,
my idea is the following: we translate each comma in a space, and after all you your input is separated by space.

After you can get all the data with a cycle..


example:

ciccio.txt contains

1,2,3,,,,5,6,7,,9
,,3,4,6,,8,,,2,,4,5,


Then we do:


cat ciccio.txt | tr "," " " | while read ....
do

done


the series of dots should be at least equal to the max number of argument expected.

HTH,
Massimo
Edgar_10
Frequent Advisor

Re: Comma Separated Files

Hi Ken,

I had already attempted to use "cut" but this is only for fixed format records and unfortunately not all the records contain identical populated columns/fields.

Do you/anyone know of a perl/sed/awk script that could read through a file, and pick out specific fields/columns?

Thanks in advance!
Christian Gebhardt
Honored Contributor

Re: Comma Separated Files

Hi

Something like that ???

# cat c.txt
, , , , 2 , ,2 ,3,3, , , ,
3, , , ,3,4, , ,4,5 , ,5 ,5 ,
,45,5,345,5,4,6,6, , , ,5 , ,

# sed 's/,/ /g' c.txt | sed 's/ */,/g' | sed -e 's/^,//' -e 's/,$//'

2,2,3,3
3,3,4,4,5,5,5
45,5,345,5,4,6,6,5

first sed replaces , with space
second sed replaces all spaces with one ,
third sed eliminates , from beginnig and end of line


Hope this helps
Chris


Kenneth_19
Trusted Contributor

Re: Comma Separated Files

Hi,

In that case, if the number and positions of each record are not the same, how can you import them back to the database with sqlldr?
You might have hundreds of combinations then!!!

Get one step back, why you want to eliminate the empty fields if they are finally loaded into the same table, where the fields will still be empty even you don't load anything for it?

If you really want to make this done, try:

# sed "1,$ s/,,/,/g" input.csv > output.csv

if the csv files have character fields that are also wrapped with double quote, run again:

# sed "1,$ s/,"",//g" output.csv > output2.csv

This can eliminate empty fields, but you might need some extra effort to get rid of the extra "," at the very beginning and towards the end of each record if the first and the last field is empty.

Kenneth
Always take care of your dearest before it is too late
Massimo Bianchi
Honored Contributor

Re: Comma Separated Files

Hi,
if you know the number of arguments, you can:
- use my previous answer
- put as many variable as you nedd in the while loop
- print only those field that are not empty !

Massimo

Leif Halvarsson_2
Honored Contributor

Re: Comma Separated Files

Hi,
Try a very simple solution,
cat |ts -s "," >newfile

This removes repeating , and the fields with data is extracred.
Leif Halvarsson_2
Honored Contributor

Re: Comma Separated Files

Sorry,
It should of course be "tr" not "ts".
Edgar_10
Frequent Advisor

Re: Comma Separated Files

Hi All,

Thanks for the invaluable feedback! As mentioned I have csv files containing approx.16000 rows/records which are 100 fileds/columns in length.

My aim is to extract only fields 3,7,12,22,33,46.

Is this possible?

Thanks in advance!
Kenneth_19
Trusted Contributor

Re: Comma Separated Files

Hi,

Then my previous answer is the solution!

# cat input.csv | cut -d , -f 3,7,12,22,33,46 > output.csv

Let me explain the syntax:

for the cut command, -d is for specifying delimiter therefore "-d ," is for comma seperated fields, -f is for specifying the field number based on the delimiter given, "so -f 3,7,12,22,33,46" is to extract the 3rd, 7th, 12th, 22th, 33th, and 46th field from each record.

Kenneth
Always take care of your dearest before it is too late
Massimo Bianchi
Honored Contributor

Re: Comma Separated Files

Hi,
awk solution:



awk -F, '( print $3,$7,$12,$22,$33,$46 )' FILE


the purpose of the "," in the print to put a space between the fieds...

Massimo