
Perl sort on a big file

 
Chandrakumar Karunamoor
Occasional Advisor

Perl sort on a big file

Hi,
I need help improving a sort in Perl. I am reading a file full of records with pipe (|) delimited fields. There are about 13 fields in total, and I am sorting the file on the first and third fields.

In other words, I am sorting the records numerically on the first field and, where the first fields are equal, as strings on the third field.

Below is the code I am using. Since our input files run to gigabytes, I would greatly appreciate any help in doing this better and faster.

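open(DAT, '<', $ARGV[0]) or die "Cannot open $ARGV[0] for read: $!\n";
my @records = <DAT>;                      # read every record into memory
close(DAT);
@records = sort by_gst_date @records;

# Compare numerically on field 1, then as strings on field 3.
sub by_gst_date {
my @first = split(/\|/,$a);
my @second = split(/\|/,$b);
$first[0] <=> $second[0]
or
$first[2] cmp $second[2]
}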

Thank you for your help.
Regards,
Chandra
6 REPLIES 6
Bill Hassell
Honored Contributor

Re: Perl sort on a big file

I think the standard HP-UX sort command will do this very easily (and pretty darn fast):

sort -t \| -k1,1n -k3,3 SomeBig_file

To reverse the sort, just add -r to the sort options. Note that you need to escape the pipe symbol with a backslash as it has special meaning to the shell.


Bill Hassell, sysadmin
Chandrakumar Karunamoor
Occasional Advisor

Re: Perl sort on a big file

And I forgot to mention that I do a lot of processing after this. I summarize the data, transpose it, etc., which is difficult in a shell script.

Thank you,
Chandra
Bill Hassell
Honored Contributor

Re: Perl sort on a big file

Just include the sort command in the Perl script and process the output.
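
One way to do that, as a minimal sketch: open a pipe from the system sort and read the already-sorted records line by line, so Perl never holds the whole file in memory. The helper name open_sorted is made up for illustration, and the key options assume a POSIX-style sort (numeric on field 1, string on field 3).

```perl
use strict;
use warnings;

# Hypothetical helper: run the system sort on $file and return a read handle.
sub open_sorted {
    my ($file) = @_;
    # List-form pipe open bypasses the shell, so '|' needs no escaping here.
    open(my $fh, '-|', 'sort', '-t', '|', '-k1,1n', '-k3,3', $file)
        or die "Cannot run sort on $file: $!\n";
    return $fh;
}

if (@ARGV) {
    my $sorted = open_sorted($ARGV[0]);
    while (my $line = <$sorted>) {
        chomp $line;
        my @fields = split /\|/, $line, -1;   # -1 keeps trailing empty fields
        # ... summarize, transpose, etc. on @fields ...
    }
    close($sorted) or die "sort exited with an error\n";
}
```

Checking the result of close on a pipe handle is how the script learns that sort itself failed.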


Bill Hassell, sysadmin
A. Clay Stephenson
Acclaimed Contributor

Re: Perl sort on a big file

I would at least try feeding your Perl script with the piped output of the system sort command. You are having to do tons of processing in your comparison routine. This is a case where the system utility is almost certainly faster than Perl's sort although both are based on quicksort.
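
To put a rough number on that cost, here is a small sketch that counts how often the comparison routine runs. The sample data is made up, and count_comparisons is a hypothetical helper, not part of the original script. Every call performs two splits, so the split count is twice the comparison count.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the records with the original comparison logic
# and return how many times the comparison block was called.
sub count_comparisons {
    my @records = @_;
    my $calls = 0;
    my @sorted = sort {
        $calls++;
        my @first  = split /\|/, $a;
        my @second = split /\|/, $b;
        $first[0] <=> $second[0] or $first[2] cmp $second[2];
    } @records;
    return $calls;
}

# Made-up sample: 1000 records with a random numeric first field.
my @sample = map { join '|', int(rand 100), 'x', "k$_" } 1 .. 1000;
my $calls  = count_comparisons(@sample);
printf "%d records, %d comparisons, %d splits\n",
    scalar @sample, $calls, 2 * $calls;
```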
If it ain't broke, I can fix that.
H.Merijn Brand (procura
Honored Contributor
Solution

Re: Perl sort on a big file

You need the Schwartzian transform :)

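# Split each record once and sort on a precomputed packed key.
# "N" packs field 1 as a big-endian unsigned 32-bit integer so a plain string
# cmp orders it numerically (assumes non-negative integers below 2**32).
my @recs = map { $_->[1] }
sort { $a->[0] cmp $b->[0] }
map { [ (pack "NA*", (split/\|/)[0,2]), $_ ]}
<DAT>;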

bloody fast and efficient. (warning: written from brain, not tested)

Beats the hell out of external sort.
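
For comparison, a minimal sketch of the transform with the two keys kept separate instead of packed into one string. This variant makes no assumption about the range of the first field. The helper name sort_records and the sample records are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: sort pipe-delimited records numerically on field 1,
# then as strings on field 3, splitting each record only once.
sub sort_records {
    return map  { $_->[2] }                                     # 3. unwrap record
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }  # 2. compare keys
           map  { [ (split /\|/, $_)[0, 2], $_ ] }              # 1. extract keys
           @_;
}

# Made-up sample records.
my @demo = ("10|a|beta|x\n", "2|b|zeta|x\n", "2|c|alpha|x\n");
print sort_records(@demo);
```

The sample prints the 2|c|alpha record first, then 2|b|zeta, then 10|a|beta. The two-key comparison costs a little more per call than a single cmp on a packed key, which is the trade-off against the packed version.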

Enjoy, have FUN! H.Merijn
H.Merijn Brand (procura
Honored Contributor

Re: Perl sort on a big file

Clay, FWIW Perl 5.8.0 and up use merge sort :)

Up through 5.6.1 it was indeed quicksort.

Enjoy, have FUN! H.Merijn