Operating System - OpenVMS

Sort Dilemma - Need Suggestions

 
Jerry Rieman
Advisor

Sort Dilemma - Need Suggestions

We have always used SORT32 and it works fine. The normal procedure is to write out records to a file, sort them, and read the sorted file back into a program. This works well for both large and small sorts, but is slow when there are thousands of small sorts. The excessive file I/O is a killer.

We have switched a key application program that does thousands of sorts to use the "sor$" library routines and pass records to be sorted directly to the sort via the "record" option. This method passes the records to the sort in memory and returns the sorted records in memory. As expected, we have achieved dramatic improvements in performance with this method - from 3 hours to 20 minutes in one case.

However, we have run into one glitch. As long as the number of records to be sorted is small, performance is excellent. As the number of records increases, performance degrades sharply and becomes even slower than the original file-based approach. The creation and use of sortwork files appears to be the bottleneck. SORT32 does not seem to expand dynamically to use available memory, even when there is lots of memory available and WSEXTENT is large.

We then tried to use Hypersort V08-007 via the assignment of the SORTSHR logical. It does exactly what we want, i.e., it expands dynamically to use available memory as needed. Performance is excellent until a sort is encountered that requires more than the available memory. We expected Hypersort to then create work files as needed. However, it did not; instead it seemed to go into an infinite process loop (no direct I/O) at the point at which one would expect it to be out of available memory. We reviewed all the HP documentation on Hypersort, including the guidance that WSEXTENT be no more than 1/3 the size of PGFLQUOTA. All our adjusting of WSEXTENT seemed to have no effect.

We are running Alpha VMS 8.3 - current on all patches.
Has anyone else experienced a similar Hypersort problem? Any suggestions on things to try - short of rewriting the application to do one large sort instead of many smaller sorts? Thanks in advance for any advice.
9 REPLIES
Karl Rohwedder
Honored Contributor

Re: Sort Dilemma - Need Suggestions

Jerry,

I have no experience with Hypersort, but according to the HELP it should use work files (up to 255), so perhaps you should contact HP about this issue.

Regarding SORT: I assume you have spread the work files across different disks (via the SORTWORKn logicals)? One may try using RAM disks for them.

regards Kalle
Leif R. Jansson
Occasional Advisor

Re: Sort Dilemma - Need Suggestions

Hi

I guess the problem is file access to the disks for the sortwork files. To make this faster, you may try the product DECram to create a disk in memory. Depending on how much memory you have and how big the sortwork files become during the sort session, you can combine DECram disks with ordinary disks.

/Regards
Leif
All questions are good questions
Edwin Gersbach_2
Valued Contributor

Re: Sort Dilemma - Need Suggestions

Jerry,

Make sure that SYS$SCRATCH is defined in the context of the sort task.

I once had some issues where scripts failed in certain contexts (detached? network?) because this logical was not defined and work files could therefore not be created.

Edwin
Robert Gezelter
Honored Contributor

Re: Sort Dilemma - Need Suggestions

Jerry,

Can you give us some more details?

In particular, what are the logical name definitions, including those in the Job logical name table?

- Bob Gezelter, http://www.rlgsc.com
Hein van den Heuvel
Honored Contributor

Re: Sort Dilemma - Need Suggestions

Jerry, sorry to put in a 'lame', hand-waving reply, but SORT and HyperSort are VERY sensitive to WS vs PAGFIL vs VA settings. Sort tries to make the right decisions, and the wrong relationships between those settings can misguide it. You may well find you need to turn DOWN some settings!

Also... what about record sizes? Check the LRL settings for the files. Do they match the actual record lengths? If not (like 32K vs '200'), then sort may decide to write out 50 times more data than really needed.

Sorry, gotta go, maybe more later.
Hope this helps some,

Hein van den Heuvel
HvdH Performance Consulting.
Jerry Rieman
Advisor

Re: Sort Dilemma - Need Suggestions

Some responses to suggestions -

Kalle - We tested both without defining the sortwork* logicals and with explicitly defining them. No luck. Normally, when the sortwork* logicals are not defined, the work files default to sys$scratch:. That is exactly what happened for SORT32; Hypersort never did create work files.

Kalle and Leif - We did indeed look closely at DECram as an option. It is now integrated into OpenVMS 8.3, with no separate installation, but it does require a separate license. I was hoping that for a few hours of coding I could bypass the DECram option. The coding was easy; the testing has been problematic.

Edwin - sys$scratch: was indeed there. While testing we have been running interactively. This avoids any possible problems with sysgen PQL settings.

Bob- We have no special logical name tables or definitions. We did assign "sortshr" to hypersort in order to invoke the hypersort. We had no problem invoking hypersort - it just went into a process loop. Is there something else you are looking for?

Hein - Mostly we have been moving wsextent up and down to see if we get any improvement. We have moved it from 100,000 all the way up to 2,000,000. pgflquo is 2,000,000, as is wsmax.
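As a quick arithmetic check of the one-third guideline quoted in the original post (WSEXTENT no more than 1/3 of the page file quota), the values above can be compared as follows. This is only a sketch: the function names are made up for illustration, both values are assumed to be in the same units, and nothing here is claimed to explain the loop.

```python
def max_wsextent_for(pgflquota: int) -> int:
    """Largest WSEXTENT that stays within one third of the page file quota."""
    return pgflquota // 3


def within_guideline(wsextent: int, pgflquota: int) -> bool:
    """True if wsextent respects the one-third guideline."""
    return wsextent <= max_wsextent_for(pgflquota)


PGFLQUOTA = 2_000_000  # value reported in this thread

# The two ends of the WSEXTENT range that was tried.
for wsextent in (100_000, 2_000_000):
    status = "within" if within_guideline(wsextent, PGFLQUOTA) else "above"
    print(f"WSEXTENT {wsextent:>9,}: {status} the one-third guideline")
```

With a quota of 2,000,000 the guideline would cap WSEXTENT at 666,666, so only part of the tested range falls inside it.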

Record size for all records was 1434. I did try 1536 just for grins. No luck. It is a full record sort, which is the default for the sor$ routines. We may try a key sort; it just requires passing another argument to sor$begin_sort.

Thanks to all for the suggestions - keep em coming.
James Cristofero
Occasional Advisor

Re: Sort Dilemma - Need Suggestions

Jerry,

We had some similar experience at our site with Hypersort.

We ended up redefining SYS$SCRATCH to point to a large disk volume (and still do for SYNCSORT), as many of the processes had those large files (and you know what I'm talking about here) to sort. We allocated 200GB, if you're attempting to sort the files I think you are.

The client ended up reverting to SYNCSORT due to issues related to the ability to pass variables.

Hein and I talked about this last fall; there were some issues where SYNCSORT was superior to Hypersort.

Accounts required large PGFLQUOTA and WSEXTENT values.
Andreas Gruhl
Occasional Visitor

Re: Sort Dilemma - Need Suggestions

Hello Jerry,

we see good performance when using the FILE_ALLOC parameter of SOR$BEGIN_SORT.
We set it to the total count of blocks the data would occupy if it were stored on disk, i.e. record_count*bytes_per_record/512.
It seems as if SORT then allocates internal memory of this size.
The default is only 1000 blocks which is way too small.
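The block count described above can be worked out like this. It is a minimal sketch of the arithmetic only: the helper name and the record count of 100,000 are hypothetical, the 1434-byte record size is the one mentioned earlier in the thread, and rounding up to a whole block is an assumption on top of the formula given.

```python
BLOCK_SIZE = 512  # bytes per disk block, as used in the formula above


def file_alloc_blocks(record_count: int, bytes_per_record: int) -> int:
    """Blocks the data would occupy on disk, rounded up to a whole block."""
    total_bytes = record_count * bytes_per_record
    return (total_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE


# Hypothetical example: 100,000 records of 1434 bytes each.
blocks = file_alloc_blocks(100_000, 1434)
print(f"FILE_ALLOC estimate: {blocks} blocks")
```

For these example numbers the estimate is 280,079 blocks, far above the 1000-block default mentioned above.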

Good luck
Andreas
Jerry Rieman
Advisor

Re: Sort Dilemma - Need Suggestions

James - fancy meeting you here! I tried putting sys$scratch on a drive with about 200 gigs of free space - no luck.

Andreas - couple of questions -
1. are any of your sorts so big that it requires hypersort to create work files - and do those work files get created successfully? I have yet to get hypersort to create any work files. What kind of working set params are you using?

2. Are you doing a full record sort (the default) or passing keys for a key sort? We are using the default and I am thinking of trying a key sort - perhaps it will force hypersort into another area of logic and not go into a process loop.