Sort and the SortworkN files


When I do a very big sort, I always follow these steps:

1. Find the size, in blocks, of the file(s) to be sorted.
2. Multiply that by 3.
3. Assign sortworkN to as many "N" as needed to obtain the "3-times-blocksize" space.
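
The arithmetic behind these three steps can be sketched in Python. This is only an illustration of the rule of thumb, not anything SORT itself does; the function name `plan_work_files` and the free-space figures in the usage note are hypothetical.

```python
def plan_work_files(input_blocks, free_blocks_per_disk, factor=3):
    """Pick disks, largest first, until factor * input_blocks is covered.

    Returns (blocks_needed, free_space_of_chosen_disks).
    """
    needed = factor * input_blocks
    # Largest disk first, matching the habit of giving SORTWORK1 the most space.
    ordered = sorted(free_blocks_per_disk, reverse=True)
    chosen = []
    total = 0
    for free in ordered:
        if total >= needed:
            break
        chosen.append(free)
        total += free
    if total < needed:
        raise ValueError("not enough free space on the listed disks")
    return needed, chosen
```

For example, `plan_work_files(1_000_000, [500_000, 2_000_000, 1_500_000])` asks for 3,000,000 blocks and selects the two largest disks, so two SORTWORKn logicals would be enough under this rule.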

I usually assign the space so that Sortwork1 has the most space, Sortwork2 has the second most space, etc. But it seems that VMS always rearranges things so that the first disk used is the one with the least space, and so on.

Is this true? If so, why?

Dom
12 REPLIES
Hein van den Heuvel
Honored Contributor

Re: Sort and the SortworkN files


Are we talking about sorting variable record length sequential files?

Sort sizes the work file based on the longest record length. So if that is 'way off' then you may end up using much more work space than you actually need. Typically this is seen with LRL=32767. Trim that back to the actual longest record and you may be much better off.
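
To see why an inflated LRL hurts, here is a rough Python sketch. It assumes a deliberately simplified model in which every record is budgeted at the longest record length; the real SORT sizing formula is not given in this thread, and the record count and the trimmed length of 120 bytes are made-up figures.

```python
def estimated_work_bytes(record_count, longest_record_length):
    # Assumed model: each record is budgeted at the longest record length.
    return record_count * longest_record_length


RECORDS = 1_000_000  # hypothetical record count

inflated = estimated_work_bytes(RECORDS, 32_767)  # LRL left at 32767
trimmed = estimated_work_bytes(RECORDS, 120)      # LRL set to a hypothetical real maximum

print(f"LRL=32767: {inflated:,} bytes")
print(f"LRL=120:   {trimmed:,} bytes")
print(f"ratio:     {inflated / trimmed:.0f}x")
```

Under that assumption the estimate scales linearly with the LRL, so correcting the LRL shrinks the work space request by the same factor.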

How big a sort are we talking about? Gigabytes?

Hein.

Re: Sort and the SortworkN files

The specific file I am sorting is 16 bytes/record (fixed length) with 36,603,791 records. But I have noticed this behavior with many large sorts on many different file types. It always seems that SORT will use the disk with the smallest space first, and then move up a ladder to the disk with the largest space.
Hein van den Heuvel
Honored Contributor

Re: Sort and the SortworkN files

>> The specific file I am sorting is 16 bytes/record (fixed length) with 36,603,791

Hmmm, under 600MB. Just sort in memory?
(needs a big pgflquo and such!)
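
The "under 600MB" figure follows directly from the numbers Dom gave. A quick check in Python, assuming the usual 512-byte disk block:

```python
RECORD_BYTES = 16          # fixed record length from the post
RECORD_COUNT = 36_603_791  # record count from the post
BLOCK_BYTES = 512          # assumed disk block size

total_bytes = RECORD_BYTES * RECORD_COUNT
# Ceiling division: a partly filled last block still occupies a whole block.
total_blocks = -(-total_bytes // BLOCK_BYTES)

print(f"{total_bytes:,} bytes")
print(f"{total_blocks:,} blocks")
print(f"{total_bytes / 1_000_000:.1f} MB")
```

That is 585,660,656 bytes of record data, or about 1.14 million blocks, ignoring any file overhead.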

Also, did you try and compare Hypersort with good ol' sort?

Best I understand, Sort will write 'runs' of sorted data to a scratch file for as long as it can, then switch to the next file for the next run and so on until all done. At the end it merges those pre-sorted runs back together into a single output.
The (dis)order within the input data will influence which workfile will get used more or less.
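
The run-then-merge idea can be sketched in Python. This is a generic external merge sort, not SORT's actual implementation: the function name `external_sort`, the fixed `run_size`, and the round-robin choice of work directory are all assumptions made for illustration, and the records are assumed to be integers.

```python
import heapq
import os
import tempfile


def external_sort(records, run_size, work_dirs):
    """Sort integers via sorted runs on disk; return (sorted_list, run_count)."""
    run_paths = []
    buffer = []

    def flush():
        # Write one sorted run to the next work directory (round-robin, assumed).
        directory = work_dirs[len(run_paths) % len(work_dirs)]
        fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
        with os.fdopen(fd, "w") as handle:
            for item in sorted(buffer):
                handle.write(f"{item}\n")
        run_paths.append(path)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= run_size:
            flush()
    if buffer:
        flush()

    # Merge phase: stream every run and merge them into one sorted sequence.
    handles = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in handle) for handle in handles]
        return list(heapq.merge(*streams)), len(run_paths)
    finally:
        for handle in handles:
            handle.close()
        for path in run_paths:
            os.remove(path)
```

Calling it with two temporary directories standing in for SORTWORK0 and SORTWORK1 shows the runs being spread across both before the final merge. In a real sort the run lengths depend on available memory and on how ordered the input already is, which is why the work files fill unevenly.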

hth,
Hein.
John Gillings
Honored Contributor
Solution

Re: Sort and the SortworkN files

Dom,

How SORT chooses which disk to allocate workspace to is not documented and subject to change.

I think the only way SORT knows how much space is available on a particular disk is by an extend failure, so I doubt the behaviour you observe is deliberate. More likely it's a "last in first out" consequence of the way you order your work files. Have you tried reversing your order of definitions?

That said, it really shouldn't matter which way you define the logical names, or the order which SORT uses them.

Search the source code [SORT32.LIS] if you really want to find out the design decisions, but please DON'T make your application dependent on undocumented behaviour.
A crucible of informative mistakes
Antoniov.
Honored Contributor

Re: Sort and the SortworkN files

Dominic,
because of the size of your file, hypersort may not work. See the HP note
http://h71000.www7.hp.com/doc/731FINAL/4493/4493pro_060.html#4493_sor_chap

About work files.
HP documentation says:
Consider the following when you assign work files to devices:
- Assign work files to the fastest devices available. For example, random-access, mass storage devices such as disks.
- Choose devices with the least activity and the most space available.
- Assign each work file to a different physical device to maximize overlapping input and output.
Therefore SORT's choice does not depend only on your declaration.

The HP documentation also says:
If Sort requires work files (for example, if you are sorting a large file), a larger working set can increase sort efficiency. However, if your system is used heavily, it might be unable to allocate all the pages in the working set extent to your process. This can result in paging, which occurs when the operating system transfers parts of a process between physical memory and memory on a paging device; only the active part of the process remains in the physical memory. To avoid excessive paging, you can decrease the working set extent for your process. (Use the SET WORKING_SET command to decrease the working set extent.)

For further information, read here:
http://h71000.www7.hp.com/doc/731FINAL/6489/6489pro_023.html

Antonio Vigliotti
Antonio Maria Vigliotti
Robert Gezelter
Honored Contributor

Re: Sort and the SortworkN files

Dom,

For diagnostic purposes, I would be interested in your checking the following items:
- please give us the results of a SHOW LOGICAL SORTWORK*
- the quotas for your account on each of the disks identified in the step above
- a SHOW PROCESS/QUOTAS and SHOW PAGE from your SORT process before the operation.

While I would, as John mentioned, not RELY on undocumented SORT behavior, it is useful at least in understanding what is happening (and reporting behavior anomalies to Engineering).

One thing I am curious about is the possibility that the relationship between free space and available space quota on your disks is reversed, namely, that the disk on which you (by your quota) have the most free space also has the least overall free space available.

There are other possibilities. In other situations, I have seen SORT have problems if certain UAF parameters were set in what would seem innocuous ways.

- Bob Gezelter, http://www.rlgsc.com
Hein van den Heuvel
Honored Contributor

Re: Sort and the SortworkN files

Antoniov wrote on Aug 23, 2005 06:40:52 GMT
"Because of the size of your file, hypersort may not work. See the HP note
http://h71000.www7.hp.com/doc/731FINAL/4493/4493pro_060.html#4493_sor_chap"

Why would you conclude that? Please explain so that we can fix the documentation if needed. Best I know, the file size has no implication on whether hypersort can be used or not. ALL the doc indicates is that you may have to be more careful with working set / quota settings.

Hein.
Antoniov.
Honored Contributor

Re: Sort and the SortworkN files

Hein,
don't be angry :-)
The linked HP documentation suggests not using hypersort on large files.
I don't know why, I merely extracted a short sentence from the guide.

Antonio Vigliotti
Antonio Maria Vigliotti
Hein van den Heuvel
Honored Contributor

Re: Sort and the SortworkN files

Hi Antoniov,

I am not angry, just dumb.
I fail to see the line that made you conclude hypersort can not do large files.
I would like to know which line that was, to try to understand whether there is perhaps a misunderstanding and how we can clarify this in the future.

Is it the note "Memory allocation differences may limit the high-performance Sort/Merge utility's ability to perform the same number of concurrent sort operations as the Sort/Merge utility can perform in the same amount of virtual memory" ?

Feel free to send Email on this in order not to clutter this topic: firstname@company.com


Mind you, I am interested to know whether Dominic tried hypersort vs classic sort and what his experience might be.


Thanks,
Hein.
Antoniov.
Honored Contributor

Re: Sort and the SortworkN files

Hein,
you are right :-O
It would be interesting to know whether Dominic uses hypersort.
Dominic,
if you want to try, please inform us about results.

Antonio Vigliotti
Antonio Maria Vigliotti

Re: Sort and the SortworkN files

We do not have an Alpha (wish we did!), so no, I did not try hypersort.

My major concern was with the other thread I started, the one which asked if SORT was handling the error messages from the system. Hein gave me the definitive answer there.

This thread was posted because I was curious about the way SORT handled its workspace. I've made a few tests and it seems that SORT always uses the workspace with the least amount of space first. I guess the engineers have a reason for that.

Dom
Antoniov.
Honored Contributor

Re: Sort and the SortworkN files


>> This thread was posted because I was curious about the way SORT handled its workspace. I've made a few tests and it seems that SORT always uses the workspace with the least amount of space first. I guess the engineers have a reason for that.


Hi Dom,
if you read my post of Aug 23, 2005 06:40:52 GMT you can see a link to the HP documentation. SORT uses this strategy:
- Assign work files to the fastest devices available. For example, random-access, mass storage devices such as disks.
- Choose devices with the least activity and the most space available.
- Assign each work file to a different physical device to maximize overlapping input and output.

So it may appear to you that SORT uses the disk with the least space first.

Antonio Vigliotti
Antonio Maria Vigliotti