Re: Sort in awk is very slow

Neil Edwards · ‎05-09-2002

Hello Gang,

I seem to have coded myself into a corner. I am working on a fairly complex set of financial reports written mainly in awk. The reports work but take a very long time to complete. I have found that my main problem is the sort. The sort example I use was taken from the O'reilly Book "Sed & Awk" on page 222. It is very compact and works well when the array is small but when processing about 5000 accounts can take several minutes to finish. My data is actually accumulated in a number of parallel arrays. I also have an array which looks like this:
202-456-4490 908
102-412-6512 134
000-000-0000 0
000-000-0001 1

The first field is the account number and the second field is the
index in the original arrays. I then sort this array and output the original parallel array values using the second field of this sorted array to access the data. The reports now print in account number order just as they should. My only problem is the speed of the sort. I am thinking about writing the data to a file and then calling the sort program but maybe someone has a better idea.

Thanks in advance, Neil

It wasn't me.

harry d brown jr · ‎05-09-2002

perl is an option, or even faster hardware.

live free or die
harry

Live Free or Die

harry d brown jr · ‎05-09-2002

Try converting your awk script to perl:

a2p

live free or die
harry

Live Free or Die

A. Clay Stephenson · ‎05-09-2002

Hi Neil:

I had to pull out my copy of 'Sed & Awk' to see what you were talking about but as soon as I did I saw what your problem is. This is a classic 'bubble sort' and its performances degrades with the square of the number of elements.

Before I give you a good answer, I suppose I should point out that all that parallel array nonsense (excuse me - code) could have been avoided with the use of perl (specifically the Class:Struct module) which could have kept all your data related to one account in a nice box (a struct/record).

The good news is that in my awk code library, I have a sort function that will do the trick and should do in at most a few seconds what takes your bubblesort minutes. (It does assume that your arrays are numeric indexed rather than associative and the lowest index is 1).

use it like this:

qsort(my_array,1,count)
Arg 1 = the array to sort
Arg 2 = 1 (because this is a recursive function it needs an array lower bounds; should always be 1)
Arg 3 = number of elements in the array

This should fix you, Clay

If it ain't broke, I can fix that.

Neil Edwards · ‎05-09-2002

Thanks for the quick responses!

I've tried to learn Perl but I seem to get bogged down everytime I try. Clay, I put in your sort function and sorts that were taking 15 minutes are now done in 1 or 2 seconds!!! I can only say wow!!! I am truly amazed because your sort is only a few lines longer than mine. My users are going to be impressed with 'my' work.

Thanks again, Neil

It wasn't me.

A. Clay Stephenson · ‎05-09-2002

Hi again Neil:

I'm glad you liked the sort function. The fundamental problem with your sort is that the exchange interval (1) is so small that repeated passes are required to sort the data.
I actually have a better function that combines the best features of quicksort and bubblesort by switching to bubblesort when the array boundaries are small. Bubblesort excels in this environment. Since you are already in the 1-2 second range and seem happy, I don't think I will bother.

If it ain't broke, I can fix that.

Neil Edwards · ‎05-10-2002

Hi Clay,

I just saw your last post. Would you mind posting your better sort function? I would really like to squeeze all the speed out of this program that I can.

TIA, Neil

It wasn't me.

A. Clay Stephenson · ‎05-10-2002

Okay, but I suspect the improvements are going to be small. Use it like this:

bqsort(my_array,1,count,6)

The 4th argument is the 'bubblepoint' and whenever a quicksort partition size falls to this value, bubblesort is used instead.

The 'bubblepoint' should probably be kept in the range of 4 to 10 - 6 is an all-around good value for the vast majority of datasets.

Clay

If it ain't broke, I can fix that.

A. Clay Stephenson · ‎05-10-2002

Oops ....

I'm am idiot. That should be:
qbsort(my_array,1,count,6)

Clay

If it ain't broke, I can fix that.

Neil Edwards · ‎05-10-2002

Thanks again Clay. That speeded up the sort by maybe 10%.

Neil

It wasn't me.

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Re: Sort in awk is very slow

Sort in awk is very slow