Kristopher March_1
New Member

Perl Help needed: reading file into array and pulling certain fields?

I'm a beginner - Day 1.

I'm reading a .csv file into an array (not sure if this is correct approach). Then I'm trying to pull certain columns of data from that.

Here's the script I've been working on:

open (DATAFILE, "Inventory.csv");
@PND_inv = <DATAFILE>;
$numitems = @PND_inv;
print "There are $numcmd commands.", "Here is your data: @PND_inv[0]\n";

What I realized was that items are on a single line and not columnized as I had thought. So I'm getting the entire first line and not the entire first column.

I know how to do this with cat and awk, but I need this single script to work on both NT and UNIX platforms.
% cat filename.csv | awk -F , '{print $4, $5}' > /tmp/newfile.out

Where am I going wrong?
James R. Ferguson
Acclaimed Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?

Hi:

Just as you would do in 'awk' (or with 'IFS' in the shell) so must you do in perl -- setup your field separator.

You can do this with the perl switch '-F' in conjunction with '-a' to autosplit. For instance:

# echo "a,b,c,d" |perl -lanF"," -e 'print $F[2]' -

...would print "c" since it is the third field (index 2, counting from zero) of the line delimited by ",".

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?

Hi (again):

I should hasten to add that you can 'split()' fields into items in an array:

# perl -wle '@a=split(/,/,$ARGV[0]);print $a[2]' a,b,c,d

...also returns "c"...

Regards!

...JRF...
Kristopher March_1
New Member

Re: Perl Help needed: reading file into array and pulling certain fields?

Thanks for the quick response. Although the one liner looks simple I'm still having trouble understanding exactly what's going on.

At what point are you reading the file in?
Torsten.
Acclaimed Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?

It is always a good idea to learn a new language! But likely you won't be an expert after day 1.

But here is the quick alternative:

Use "GNU utilities for Windows" and run your shell script on win as on unix.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
James R. Ferguson
Acclaimed Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?

Hi (again):

In the example:

# perl -lanF"," -e 'print $F[2]' -

I was using the '-n' switch to read the file(s) specified on the command line. The '-' argument equates to STDIN.

Consider:

# cat /tmp/mydata
one,two,three
red fish,blue fish,one fish,two fish

# perl -lanF, -e 'print $F[2]' /tmp/mydata

...yields:

three
one fish

...since I asked to print the third field ($F[2], counting from zero) of the automatically split (-a) array built from each line read from /tmp/mydata. The '-n' creates a read loop just like you would with:

while (<>)

I used the '-l' to autogenerate linefeeds when printing.
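
For anyone who would rather see the one-liner written out as a script, here is a minimal sketch of roughly what the switches do. The helper name extract_fields is made up for illustration, and the sample data is read through an in-memory filehandle only so that the sketch runs without a file on disk.

```perl
use strict;
use warnings;

# extract_fields is a hypothetical helper name, not from the thread.
# It reads comma-separated lines from an open filehandle and returns,
# for each line, the requested zero-based fields joined by a space.
sub extract_fields {
    my ($fh, @indexes) = @_;
    my @out;
    while (my $line = <$fh>) {           # the read loop that -n supplies
        $line =~ s/\r?\n\z//;            # drop an LF or CRLF line ending
        my @F = split /,/, $line, -1;    # roughly what -a with -F, does
        push @out, join ' ', map { defined $F[$_] ? $F[$_] : '' } @indexes;
    }
    return @out;
}

# The two sample lines from /tmp/mydata, held in a string.
my $sample = "one,two,three\nred fish,blue fish,one fish,two fish\n";
open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
print "$_\n" for extract_fields($fh, 2);   # index 2, as in print $F[2]
close $fh;
```

To read a real file instead, open it by name (for example open my $fh, '<', 'Inventory.csv' or die ...) and ask for indexes 3 and 4 to mirror the awk '{print $4, $5}' from the question.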

Regards!

...JRF...



H.Merijn Brand (procura
Honored Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?

Splitting on ',' for csv files is both easy and wrong. Not for simple csv, but consider these two csv lines:

1,,Blah,2,"Foo, Bar"

and

1;;Blah;2;Foo, Bar

The first is correct CSV, and the second is Micro$oft's way of interpreting the C in CSV (C translates to semicolon in M$' dictionaries, instead of comma).

These problems are all solved for perl in the Text::CSV_XS module (or Text::CSV for the slower pure-perl version)
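
To see the difference on the first line above, here is a minimal sketch that uses only modules shipped with Perl. Note that it swaps in Text::ParseWords as a quote-aware splitter purely for illustration; it is not Text::CSV_XS, and naive_fields and quoted_fields are made-up helper names. parse_line also treats apostrophes and backslashes specially, which real CSV does not, so for actual data the CSV modules remain the better choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # ships with Perl

# Naive approach: split on every comma, whether it is quoted or not.
sub naive_fields {
    my ($line) = @_;
    return split /,/, $line, -1;
}

# Quote-aware approach: parse_line(delimiter, keep, line) respects "..."
# and, with keep set to 0, strips the quotes from the returned fields.
sub quoted_fields {
    my ($line) = @_;
    return parse_line(',', 0, $line);
}

my $line   = '1,,Blah,2,"Foo, Bar"';
my @naive  = naive_fields($line);
my @quoted = quoted_fields($line);
printf "split:      %d fields, last is [%s]\n", scalar @naive,  $naive[-1];
printf "parse_line: %d fields, last is [%s]\n", scalar @quoted, $quoted[-1];
```

The plain split cuts the quoted "Foo, Bar" in two, so it reports one field too many, while the quote-aware parse keeps it as a single field.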

Now I have written a module to hide all the implementation specifics for spreadsheets (and CSV) and provide a uniform interface:

--8<---
use Spreadsheet::Read;
my $ref = ReadData ("Inventory.csv");

print "This CSV file contains ", $ref->[1]{maxcol}, " columns (fields) and ", $ref->[1]{maxrow}, " rows (lines).\n";
print "The third field on line 6 is: ", $ref->[1]{cell}[3][6], "\n";
-->8---

The module comes with a command line utility, so you can see the content of Excel (xls), OpenOffice (sxc, ods), and CSV (csv) files pretty easily:

# xlscat Inventory.csv

# xlscat -?

will show the available options.
If you also have Perl/Tk installed, a new util 'ss2tk' will be included in the next release, and it will give you a multi-tabbed perl/tk read-only interface to these formats with search capabilities


And yes, Spreadsheet::Read and Text::CSV_XS work on both Linux and NT (and HP-UX, and AIX, and Solaris, and Cygwin, and .....)

Enjoy, Have FUN! H.Merijn
Muthukumar_5
Honored Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?

You can use the cut utility to get better performance :) Anyway, if you are going with perl, then:

echo "a,b,c,d,e,f" | perl -aF, -ne 'print "@F[3,4]\n";'

If you want to update the same file in place, then give the file name as an argument (filename.csv here, as in your awk example):

perl -aF, -ni -e 'print "@F[3,4]\n";' filename.csv

-Muthu
Easy to suggest when don't know about the problem!
James R. Ferguson
Acclaimed Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?

Hi (again):

Merijn: While I was aware of the inherent problems with CSV files, your pointer was (as always) very germane. Thanks.

Muthu: I fail to see how 'cut' could provide better performance in repetitive processing like this. Given that 'cut' isn't a shell built-in, you would have to spawn a new process (for 'cut') for every line read.

...JRF...
Hein van den Heuvel
Honored Contributor

Re: Perl Help needed: reading file into array and pulling certain fields?


Kristopher,

Welcome to the ITRC forums. Be sure to glance over the 'rules' a little: http://forums1.itrc.hp.com/service/forums/helptips.do?#overview

First, may I applaud you for tackling a new problem with a new language. Be sure to read up on the many generic web resources on perl. (google is your friend)

Next, be sure to read Merijn's reply 5 times over. He (procura) is 'the man' in this space.

Finally, my own modest contribution.
It is still using a simplistic 'split /,/'... boo hiss... but it may help reduce the learning curve a little.
My solution expects a 'header' line, and is willing to deal with column numbers as well as names.
It currently reads the whole file into arrays, but I find that often I don't need the data in arrays, just process the lines as they come by such as the other solutions imply. Code, Data, and sample usage below.

Hope this helps,
Hein.

------ csv.pl ----------

use strict 'vars';
my ($datafile, $tmp, $i, $x, $y);
my ($rows, $columns) = (0, 0);
my (@col, @dataline, @cell, %col);

$datafile = shift @ARGV; # pick up data file name from command line
$datafile = "Inventory.csv" unless $datafile; # Provide default
$x = shift @ARGV;
$y = shift @ARGV;

open (DATAFILE, "<$datafile") or die "Failed to open $datafile for input: $!";
#
# Read column-header line
# create array of column-names to numbers and numbers to names
#
$_ = <DATAFILE>;
chomp;
foreach $tmp (split /,/) {
    $col[$columns] = $tmp;
    $col{$tmp} = $columns++;
}
#
# Read the data lines, keeping each raw line and its split cells
#
while (<DATAFILE>) {
    chomp; # strip the newline so the last cell prints cleanly
    $dataline[$rows] = $_;
    $i = 0;
    foreach $tmp (split /,/) { $cell[$rows][$i++] = $tmp };
    $rows++;
}
close DATAFILE;
print "There were $columns columns read with $rows rows of datalines\n";
print "Columns are: ", join(", ",@col), "\n";
if (defined $y) {
    if ($y =~ /^\d+$/) { # all digits: treat $y as a column number
        print "cell $x,$y = $cell[$x][$y]\n";
    } else { # otherwise look the column up by name
        print "cell $x,$y = $cell[$x][$col{$y}]\n";
    }
} else {
    print "no specific cell data requested\n";
}


------ C:\Temp>type tmp.dat -----------
a,b,c,d
aap,noot,mies,teun
12,34,56,78
test,,more test,done

------------- usage examples -------------

C:\Temp>perl csv.pl tmp.dat
There were 4 columns read with 3 rows of datalines
Columns are: a, b, c, d
no specific cell data requested

C:\Temp>perl csv.pl tmp.dat 0 0
There were 4 columns read with 3 rows of datalines
Columns are: a, b, c, d
cell 0,0 = aap

C:\Temp>perl csv.pl tmp.dat 0 d
There were 4 columns read with 3 rows of datalines
Columns are: a, b, c, d
cell 0,d = teun


C:\Temp>perl csv.pl tmp.dat 2 2
There were 4 columns read with 3 rows of datalines
Columns are: a, b, c, d
cell 2,2 = more test
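
As a variation on csv.pl, the remark about processing lines as they come by (rather than storing the whole file in arrays) could be sketched as follows. This is only an illustration under the same simplistic 'split /,/' assumption; each_row and its callback argument are made-up names, and the tmp.dat contents are read from a string so that the sketch runs on its own.

```perl
use strict;
use warnings;

# each_row is a hypothetical helper name, not part of csv.pl.
# It reads the header line, then calls $callback once per data line with
# only the wanted columns, given by zero-based number or by header name.
sub each_row {
    my ($fh, $wanted, $callback) = @_;

    my $header = <$fh>;
    return unless defined $header;
    $header =~ s/\r?\n\z//;
    my @names = split /,/, $header, -1;
    my %index;
    @index{@names} = (0 .. $#names);     # column name -> column number

    # Translate each request into a column number once, up front.
    my @cols;
    for my $want (@$wanted) {
        if    ($want =~ /^\d+$/)      { push @cols, $want }
        elsif (exists $index{$want})  { push @cols, $index{$want} }
        else                          { die "Unknown column '$want'\n" }
    }

    while (my $line = <$fh>) {           # one line in memory at a time
        $line =~ s/\r?\n\z//;
        my @cell = split /,/, $line, -1;
        $callback->(map { defined $cell[$_] ? $cell[$_] : '' } @cols);
    }
    return;
}

# Same data as tmp.dat; ask for column 'd' by name and column 1 by number.
my $sample = "a,b,c,d\naap,noot,mies,teun\n12,34,56,78\ntest,,more test,done\n";
open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
each_row($fh, ['d', 1], sub { print join(' | ', @_), "\n" });
close $fh;
```

Because nothing is kept after the callback returns, memory use stays flat however long the file is; the trade-off is that you can no longer look up an arbitrary cell afterwards the way csv.pl does.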