- Community Home
- >
- Servers and Operating Systems
- >
- Operating Systems
- >
- Operating System - HP-UX
- >
- Re: Sort in awk is very slow
Categories
Company
Local Language
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Discussions
Discussions
Forums
Forums
Discussions
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Community
Resources
Forums
Blogs
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-09-2002 10:52 AM
тАО05-09-2002 10:52 AM
I seem to have coded myself into a corner. I am working on a fairly complex set of financial reports written mainly in awk. The reports work but take a very long time to complete. I have found that my main problem is the sort. The sort example I use was taken from the O'reilly Book "Sed & Awk" on page 222. It is very compact and works well when the array is small but when processing about 5000 accounts can take several minutes to finish. My data is actually accumulated in a number of parallel arrays. I also have an array which looks like this:
202-456-4490 908
102-412-6512 134
000-000-0000 0
000-000-0001 1
The first field is the account number and the second field is the
index in the original arrays. I then sort this array and output the original parallel array values using the second field of this sorted array to access the data. The reports now print in account number order just as they should. My only problem is the speed of the sort. I am thinking about writing the data to a file and then calling the sort program but maybe someone has a better idea.
Thanks in advance, Neil
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-09-2002 10:56 AM
тАО05-09-2002 10:56 AM
Re: Sort in awk is very slow
perl is an option, or even faster hardware.
live free or die
harry
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-09-2002 11:03 AM
тАО05-09-2002 11:03 AM
Re: Sort in awk is very slow
a2p
live free or die
harry
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-09-2002 11:09 AM
тАО05-09-2002 11:09 AM
SolutionI had to pull out my copy of 'Sed & Awk' to see what you were talking about but as soon as I did I saw what your problem is. This is a classic 'bubble sort' and its performances degrades with the square of the number of elements.
Before I give you a good answer, I suppose I should point out that all that parallel array nonsense (excuse me - code) could have been avoided with the use of perl (specifically the Class:Struct module) which could have kept all your data related to one account in a nice box (a struct/record).
The good news is that in my awk code library, I have a sort function that will do the trick and should do in at most a few seconds what takes your bubblesort minutes. (It does assume that your arrays are numeric indexed rather than associative and the lowest index is 1).
use it like this:
qsort(my_array,1,count)
Arg 1 = the array to sort
Arg 2 = 1 (because this is a recursive function it needs an array lower bounds; should always be 1)
Arg 3 = number of elements in the array
This should fix you, Clay
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-09-2002 11:53 AM
тАО05-09-2002 11:53 AM
Re: Sort in awk is very slow
I've tried to learn Perl but I seem to get bogged down everytime I try. Clay, I put in your sort function and sorts that were taking 15 minutes are now done in 1 or 2 seconds!!! I can only say wow!!! I am truly amazed because your sort is only a few lines longer than mine. My users are going to be impressed with 'my' work.
Thanks again, Neil
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-09-2002 12:00 PM
тАО05-09-2002 12:00 PM
Re: Sort in awk is very slow
I'm glad you liked the sort function. The fundamental problem with your sort is that the exchange interval (1) is so small that repeated passes are required to sort the data.
I actually have a better function that combines the best features of quicksort and bubblesort by switching to bubblesort when the array boundaries are small. Bubblesort excels in this environment. Since you are already in the 1-2 second range and seem happy, I don't think I will bother.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-10-2002 07:29 AM
тАО05-10-2002 07:29 AM
Re: Sort in awk is very slow
I just saw your last post. Would you mind posting your better sort function? I would really like to squeeze all the speed out of this program that I can.
TIA, Neil
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-10-2002 08:06 AM
тАО05-10-2002 08:06 AM
Re: Sort in awk is very slow
bqsort(my_array,1,count,6)
The 4th argument is the 'bubblepoint' and whenever a quicksort partition size falls to this value, bubblesort is used instead.
The 'bubblepoint' should probably be kept in the range of 4 to 10 - 6 is an all-around good value for the vast majority of datasets.
Clay
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-10-2002 08:22 AM
тАО05-10-2002 08:22 AM
Re: Sort in awk is very slow
I'm am idiot. That should be:
qbsort(my_array,1,count,6)
Clay
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
тАО05-10-2002 10:05 AM
тАО05-10-2002 10:05 AM
Re: Sort in awk is very slow
Neil