Operating System - OpenVMS

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

 
Hein van den Heuvel
Honored Contributor

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

Yes, pre-sorting can be faster as you noticed.

There are multiple reasons for this.

- CONVERT uses RMS record I/O to read the input file. Since you don't specify RMS settings or the OpenVMS version, I have to guess that this means a 16-block (old VMS) or 32-block buffer? Try $ SET RMS /BLOC=127/BUF=8

- Because CONVERT uses record I/O, it will be in and out of EXEC mode 'all the time' and burn a good bit more CPU than anything else (SORT, C-RTL, COBRTL) reading this file.

- CONVERT typically uses an RFA sort for the primary key. So after the SORT it re-reads all input, randomly, using the potentially few, small buffers.

- CONVERT will switch to a record sort if there are multiple input files. For grins, add ",NL:" to your input file spec (see the sketch after this list)! Of course, with that the CONVWORK file will become 200/6 times bigger. Depending on where the work file lives, that may be faster or slower.

- SORT and HYPERSORT use 127-block I/Os independent of RMS, and have their own async/multi-buffer handling.
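
Something along these lines, as a sketch (file and FDL names are placeholders; adjust to your own job):

$ SHOW RMS_DEFAULT                        ! check the current block/buffer counts
$ SET RMS_DEFAULT/BLOCK_COUNT=127/BUFFER_COUNT=8/SEQUENTIAL
$ CONVERT/SORT/FDL=OUT.FDL/STATISTICS=FULL IN.DAT OUT.DAT
$!
$! to force the record sort instead of the RFA sort, append ",NL:" to the input spec
$ CONVERT/SORT/FDL=OUT.FDL/STATISTICS=FULL IN.DAT,NL: OUT.DAT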


I checked with LD TRACE (sketch below). You may want to do that also, with a chunk of the input file.

As input I used an expanded SYS$LIBRARY:VMS$PASSWORD_DICTIONARY.DATA/FDL="RECORD; SIZE 200; FORM FIX"/PAD

RMS/BLOC=32 direct-io load = 589, sort = 556

RMS/BLOC=127 direct-io load = 196, sort = 162

NL:, BLOCK=127 direct-io load = 383, sort = 582

NOSORT, BLOCK=127 direct-io load = 153

f$file("lda6:[000000]test.dat","EOF")/127 = 132

Sort, direct IO = 332 (irrespective of RMS settings)
HYPERSORT direct IO = 288
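
For the record, the LD-based test setup looks roughly like this (sizes and device names are only examples, and the exact LD qualifiers may differ per version - check LD HELP):

$ LD CREATE TESTDISK.DSK/SIZE=3000000      ! container file for the test disk
$ LD CONNECT TESTDISK.DSK LDA6:            ! present it as device LDA6:
$ INITIALIZE LDA6: TEST
$ MOUNT/SYSTEM/NOCACHE LDA6: TEST          ! /NOCACHE so the XFC does not mask the I/O pattern
$ LD TRACE LDA6:                           ! start collecting the I/O trace
$ CONVERT/SORT/FDL=OUT.FDL/STAT=FULL LDA6:[000000]TEST.DAT LDA6:[000000]TEST.IDX
$ LD SHOW/TRACE LDA6:                      ! dump the captured I/Os
$ LD NOTRACE LDA6:                         ! stop tracing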


HYPERSORT cannot READ indexed files, but it can be used by CONVERT to do its work just fine.


Memory-pool and reboot? No.


John, yeah, improper LRLs can block up RECORD sorts. Convert's RFA sort would not be affected.

Phil, No. Think about it.

Mark, use /STAT=FULL, use ^T.
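
That is, something like (names are placeholders):

$ SET CONTROL=T                            ! make sure Ctrl-T is enabled for the process
$ CONVERT/SORT/FDL=OUT.FDL/STATISTICS=FULL IN.DAT OUT.DAT
$ ! press Ctrl-T while it runs: each press shows elapsed/CPU time and the DIO/BIO counts climbing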

Cheers,
Hein.
Mark Corcoran
Frequent Advisor

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

JOHN/HEIN: yep, I normally do select all of the text before attempting to submit, because of this issue of updates failing.

Unfortunately, it's been a while since I last used Forums, and forgot that this happened. Maybe I should get a label stuck on my monitor to remind me ;-)


STEPHEN/HEIN: The hardware on which the unload & indexing takes place will depend on which queue Scheduler happens to have submitted the job to, but whilst doing manual testing, I was using the faster of the nodes in the cluster...

It's a 24-CPU Alphaserver GS1280 7/1300, running OVMS v7.3-2 (yes, yes, I know).


HEIN: I guess using the PROCIO and DIR/SIZE=ALL might be useful in determining how far you are through processing, if you know how much I/O or disk space the files normally end up taking :-D

However, all I know is that when it works (using CONVERT/SORT) it takes <=7hrs, and when it doesn't work (using CONVERT/SORT) it can take >7hrs; SORT + CONV/NOSORT takes ~1hr.

After I'd finished trying to make the CONVERT/SORT as equivalent to the other method as possible (reducing the variables), it was still running after several hours.

I was curious as to how you derived the /SECONDARY=3 value - as I understand it, the /SECONDARY relates to "alternate keys" and by definition, would preclude the primary key.

This particular set of indexing uses 3 keys in total, so I would have assumed the value should be 2?

It's interesting to note that you mention this would make CONVERT use "just one, bigger CONVWORK" - I'm pretty sure I read a similar statement in the manual, but I've not seen CONVERT /SORT use more than one CONVWORK file - what I have seen however, is it using two versions of each SORTWORK file...


JOHN: In answer to your question "does SORT of an indexed file (in this case) actually achieve?"...

The default for CONVERT is to do /SORT unless you specify otherwise.

This particular example table has three keys because three different reports used the file to convert one "identifier" to another, and then use that "identifier" to look up another unloaded & indexed table.

e.g. let's say you have indices on columns A B and C in this unloaded/indexed table, but you have a report that needs to take identifier "A" and determine what its "Z" identifier value is.

However, column "Z" doesn't exist in this table; in order to get the "Z" value, you have to use say the "B" value to look up another unloaded & indexed table, which gives you a "G" value that you use to
index a third unloaded & indexed table to finally get your "Z" column/identifier value.

As it transpires (from discussing the history of this particular table), 2 of the reports no longer use the unloaded/indexed tables, but have been rewritten to access the live database.

The third report is written in C; I've not yet looked at the code to see what it does or how it does it, but apparently, if the indexed file is NOT sorted, it takes about 5 days to run instead of ~7 hours (I'm told it doesn't just simply fopen() it as a sequential file, and that it does use RABs and FABs, but I don't know any more than that).

Also, if the input file does not have records in sorted order, CONVERT /NOSORT will report lots of informationals - "%CONVERT-I-SEQ, record not in order". I don't think this makes it problematic (because you'll still have it indexed), but it does mean either the job log file gets very big, or you have to turn off messages for the duration of the CONVERT, which may mean losing diagnostic info if the CONVERT fails. I've also had unconfirmed reports that each record which results in an informational message doesn't in fact appear in the output file (not tried it myself, so I can't confirm).

You also mentioned the LRL - the LRL is set to 122, which is correct (RMU UNLOAD generates files "properly" through RMS, rather than fputs() calls in C :-D)


ALL: I am in the process of doing a number of trials on different sizes of input files, so I'm not being ignorant and not responding - I'm just trying to gather some suitable stats to report back here.

I have already noticed however, that the elapsed time for the sort of key 0 seems to be directly proportional to the number of records - 10m records takes ~20 times as much as 500k records.

However, the key 0 load time looks like it might be on some kind of exponential "curve", climbing steeply the more records are in the input file.

It doesn't appear to make any (significant) time difference as to whether or not I reduce the FDL file just to the primary key, or (when having all three keys) use /SECONDARY=3.

Not sure how the "sort tree size" is derived, but again, this doesn't appear to change regardless of the number of records in the input file.

Here is the stats output from indexing on the primary key only (input file reduced to 10m records):

$ convert x.dat x.tmp /FDL=single_key_index.fdl /stat=full

Finished Load of Key 0 at 14:52:31.95

SORT Statistics for Key 0
Records Sorted: 10000000 Initial Runs: 255
Sort Tree Size: 19674 Work File Allocation: 422210
Merge Order: 20 Merge Passes: 14
Direct I/O Count: 79774 Buffered I/O Count: 682
Elapsed Time: 0 00:00:56.58 CPU Time: 0 00:00:45.93
Page Faults: 108

LOAD Statistics for Key 0
Direct I/O Count: 10102898 Buffered I/O Count: 64
Elapsed Time: 0 00:07:49.28 CPU Time: 0 00:07:20.80
Page Faults: 311

CONVERT Statistics
Number of Files Processed: 1
Total Records Processed: 10000000 Buffered I/O Count: 775
Total Exception Records: 0 Direct I/O Count: 10182698
Total Valid Records: 10000000 Page Faults: 435
Elapsed Time: 0 00:08:45.96 CPU Time: 0 00:08:06.74
Hein van den Heuvel
Honored Contributor
Solution

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

[just happened to me. Blank screen after submit. Typed too long? Back + submit fixed it this time]

>> It's a 24-CPU Alphaserver GS1280 7/1300, running OVMS v7.3-2 (yes, yes, I know).

Not a bad box.


>> if you know how much I/O or disk space the files normally end up taking :-D

Back-of-the-envelope calculations, early on, tend to give a rather accurate indication of the anticipated end time: blocks / I/O sizes / times / I/O counts.
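
As a rough worked example (round numbers in the same ballpark as your file): a ~2.4-million-block input read sequentially in 127-block I/Os is about 19,000 reads, whereas re-reading it record by record is on the order of one I/O per record, i.e. millions. Divide by the I/O rate the storage can sustain and you have a fair ETA before you even start.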

>>(using CONVERT/SORT) it can take >7hrs; SORT + CONV/NOSORT takes ~1hr.

And that's critical information which I don't dispute. I just want to explain it, make it understood.

My money is on the default RMS buffers being too small.
What does SHOW RMS tell you? 16 blocks? 32 blocks?
So that's 4 or 8x more I/Os than SORT.
Plus CONVERT reads twice. Another 2x.
Plus CONVERT would typically cause more random-ish I/Os, to which the storage subsystem may react poorly.


>> I was curious as to how you derived the /SECONDARY=3 value

No magic. It's larger than 1, which is all that matters, and it is often (but not always) a reasonable break-even point between too much data in CONVWORK versus too many scans.

>> so I would have assumed the value should be 2?

Sure. Or larger.

>> I've not seen CONVERT /SORT use more than one CONVWORK file

It would only use one at a time.
Don't recall whether it truncates or deletes + creates. The listings could tell.

Or... use SET WATCH FILE/CLASS=MAJOR


> is it using two versions of each SORTWORK file...

Right. And for critical/frequent converts you may well learn how big they all will be and when they are written/read/deleted.

> The default for CONVERT is to do /SORT unless you specify otherwise.

Right... or when it detects that the keys are defined the same.


>>> if the indexed file is NOT sorted, it takes about 5 days to run instead of ~7 hours (I'm told it doesn't just simply fopen() it as a sequential file, and that it does use RABs and FABs, but I don't know any more than that).

That can probably be readily improved upon. It is rare that a well-designed job takes hours and hours to process a (large) file.

>> Also, if the input file does not have records in sorted order, CONVERT /NOSORT will report lots of informationals - "%CONVERT-I-SEQ, record not in order";

You don't want to do that. Specify /EXCEPTIONS_FILE if you must. And check the stats for in, out, exception.

>> for each record that results in an informational message, the record doesn't in fact appear in the output file

Correct. Check the stats or the exception file.
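
Roughly like this (names are placeholders; the full qualifier is /EXCEPTIONS_FILE):

$ CONVERT/NOSORT/FDL=OUT.FDL/STATISTICS=FULL -
        /EXCEPTIONS_FILE=OUT_OF_ORDER.DAT -
        IN.DAT OUT.DAT
$ ! Total Valid Records + Total Exception Records should add up to Total Records Processed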

>> I'm just trying to gather some suitable stats to report back here.

Excellent.

>> - 10m records takes ~20 times as much as 500k records.

Right. So just try with 1 million or so records.

>> However, the key 0 load time looks like it might be on some kind of exponential "curve", climbing steeply the more records are in the input file.

Randomness versus sequential could explain some. More likely the smaller files had all data in the XFC cache, so randomness was 'forgiven'. That's why I do my experiments on an LD drive mounted /NOCACHE or on a RMS drive.


>> It doesn't appear to make any (significant) time difference as to whether or not I reduce the FDL file just to the primary key, or (when having all three keys) use /SECONDARY=3.

No surprise... if the RFA sort for the primary is the cause.

SORT Statistics for Key 0
Records Sorted: 10000000 Direct I/O Count: 79774

LOAD Statistics for Key 0
Direct I/O Count: 10102898 Elapsed Time: 0 00:07:49.28
CPU Time: 0 00:07:20.80


Hmmm, that's a factor of ~127 more I/O on the load (10102898 load DIOs versus 79774 sort DIOs). I expected 8x - 16x. Very small RMS default buffer size (SHOW RMS)?

Also, the working set seems too small to allow SORT to be optimal.

Good luck,

Hein
Hein van den Heuvel
Honored Contributor

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

"rms drive" --> RAM drive = MDxx:

Hein
Mark Corcoran
Frequent Advisor

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

******************************************************************************************
CONCLUSION/SUMMARY
******************************************************************************************

I'm confident that in our case, the difference in performance between (SORT & CONVERT /NOSORT) and CONVERT /SORT is down to the following:

1) Unlike SORT (which defaults to /PROCESS=RECORD), CONVERT/SORT effectively sorts the input file using /PROCESS=TAG
2) This results in an initial pass through the input file to build a list (in SORTWORK files) of primary keys and pointers to locations in the input file, where those records can be found.
3) A second pass is then performed through the input file, where - because the file content is not in primary key order - blocks are not read in sequential order.
4) It's clear from the stats that irrespective of what RMS buffer size or quantity you specify, or whether or not the input or SORTWORK files are fragmented when CONVERT /SORT is performed, the number of I/Os required to perform this second pass is "the same".
5) However, with a "large" input file, the amount of time taken for I/O requests on the second pass
to complete is longer.
6) This is consistent with read requests not being satisfied from the XFC cache, and instead having to be serviced by reading "in a haphazard manner" from the input file.
7) This likely contributes to delays whilst disk heads move to the required location (along with delays introduced by using SAN disks rather than directly-attached ones).


Since there is still a need for the dumped/RMU-unload RDB table to be accessed as an index file*, the RMU table dump/unload file (a sequential file) still needs to be CONVERTed into an indexed file.

However we know that attempting to CONVERT a file where the primary keys are not in sequence will result in (some) records not being written to the output file.

The only means of surmounting this is either to pre-sort the input file, or to get CONVERT to sort the input file during processing.

I've already established that in my environment, attempting to CONVERT /SORT a file of this size (partly because it is more than twice the size of the XFC cache) is a non-starter...

Once CONVERT has built a sorted list of primary keys, the time taken to complete the I/Os to then retrieve each record from the input file (in order to write it to the output file) would be unacceptable.

[And that is aside from adding more load to the system and to the SAN link & farm itself]

Therefore the only acceptable means of converting this sequential file containing an RDB table unloaded by RMU, is to pre-sort it (using SORT), and then use that as the input to CONVERT /NOSORT.


*(and with the records in sorted primary key order, for reasons yet to be investigated)
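
For reference, the shape of the working two-step job is roughly this (the key position/size is illustrative only - use whatever the FDL actually defines for the primary key):

$ SORT/KEY=(POSITION:1,SIZE:12)/STATISTICS INPUT_FILE.DAT SORTED_INPUT.TMP
$ CONVERT/NOSORT/FDL=3KEYS.FDL/STATISTICS=FULL SORTED_INPUT.TMP OUTPUT_FILE.DAT
$ DELETE SORTED_INPUT.TMP;*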




******************************************************************************************
DIAGNOSIS/EVIDENCE OF MY CONCLUSION
******************************************************************************************
After much wailing and gnashing of teeth, I'm satisfied that I now know the reason for the difference between SORT & CONVERT/NOSORT vs CONVERT/SORT...

Mainly as a result of Hein's input, I considered a number of possible factors.

A) RMS buffers - number and size thereof.

Examining the existing settings showed this:

MULTI- | MULTIBUFFER COUNTS | NETWORK
BLOCK | Indexed Relative Sequential | BLOCK
COUNT | Disk Magtape Unit Record | COUNT
Process 0 | 0 0 0 0 0 | 0
System 32 | 0 0 0 0 0 | 16

Prolog Extend Quantity QUERY_LOCK
Process 0 0 System
System 0 0 Enabled

CONTENTION_POLICY
Process System
System Never


Common factors across testing:

1) Input file is contiguous, sequential, and has 10m records
2) Output file is located on, and SYS$SCRATCH points to, the same disk as the input file
3) Convert command used is $ CONVERT INPUT_FILE.DAT OUTPUT_FILE.DAT /FDL=3KEYS.FDL /STAT=FULL


a) The above ("default") RMS settings are in use.
Finished Load of Key 0 at 09:09:30.00

SORT Statistics for Key 0
Records Sorted: 10000000 Initial Runs: 255
Sort Tree Size: 19674 Work File Allocation: 422210
Merge Order: 20 Merge Passes: 14
Direct I/O Count: 79777 Buffered I/O Count: 682
Elapsed Time: 0 00:00:47.98 CPU Time: 0 00:00:43.90
Page Faults: 100

LOAD Statistics for Key 0
Direct I/O Count: 10114452 Buffered I/O Count: 1057
Elapsed Time: 0 00:09:10.21 CPU Time: 0 00:08:49.45
Page Faults: 414

Finished Load of Key 1 at 09:10:04.90

SORT Statistics for Key 1
Records Sorted: 10000000 Initial Runs: 312
Sort Tree Size: 14050 Work File Allocation: 546626
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 2 Buffered I/O Count: 3
Elapsed Time: 0 00:00:00.01 CPU Time: 0 00:00:00.00
Page Faults: 0

LOAD Statistics for Key 1
Direct I/O Count: 15676 Buffered I/O Count: 111
Elapsed Time: 0 00:00:34.87 CPU Time: 0 00:00:29.26
Page Faults: 0

Finished Load of Key 2 at 09:13:14.17

SORT Statistics for Key 2
Records Sorted: 10000000 Initial Runs: 0
Sort Tree Size: 10002448 Work File Allocation: 0
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 2769 Buffered I/O Count: 7
Elapsed Time: 0 00:00:53.09 CPU Time: 0 00:00:52.21
Page Faults: 269178

LOAD Statistics for Key 2
Direct I/O Count: 1671 Buffered I/O Count: 3
Elapsed Time: 0 00:02:16.17 CPU Time: 0 00:02:02.46
Page Faults: 58

CONVERT Statistics
Number of Files Processed: 1
Total Records Processed: 10000000 Buffered I/O Count: 1916
Total Exception Records: 0 Direct I/O Count: 10214378
Total Valid Records: 10000000 Page Faults: 269797
Elapsed Time: 0 00:13:42.49 CPU Time: 0 00:12:57.35


b) RMS settings changed with the following command:
$ SET RMS_DEFAULT /BLOCK_COUNT=127 /EXTEND_QUANTITY=65376 /BUFFER_COUNT=255 /INDEXED /SEQUENTIAL

[65376 for the extend quantity rather than 65535, because it is the largest multiple of 288 (the
cluster size of the disk on which the output file is generated) that does not exceed 65535]
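
(A quick way to derive that figure for any disk - the device name here is just an example:)

$ CLUSTER = F$GETDVI("DKA100:","CLUSTER")        ! volume cluster size; 288 in this case
$ WRITE SYS$OUTPUT (65535 / CLUSTER) * CLUSTER   ! integer arithmetic -> 65376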

The above changes are perhaps at the extreme end (max buffers * max size), but they actually caused
the key 0 load time to double.


Finished Load of Key 0 at 08:52:58.72

SORT Statistics for Key 0
Records Sorted: 10000000 Initial Runs: 255
Sort Tree Size: 19674 Work File Allocation: 422210
Merge Order: 20 Merge Passes: 14
Direct I/O Count: 23388 Buffered I/O Count: 683
Elapsed Time: 0 00:01:07.16 CPU Time: 0 00:00:51.12
Page Faults: 2007

LOAD Statistics for Key 0
Direct I/O Count: 9918781 Buffered I/O Count: 1069
Elapsed Time: 0 00:21:48.72 CPU Time: 0 00:20:57.03
Page Faults: 414

Finished Load of Key 1 at 08:53:35.06

SORT Statistics for Key 1
Records Sorted: 10000000 Initial Runs: 312
Sort Tree Size: 14050 Work File Allocation: 546626
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 2 Buffered I/O Count: 3
Elapsed Time: 0 00:00:00.01 CPU Time: 0 00:00:00.00
Page Faults: 0

LOAD Statistics for Key 1
Direct I/O Count: 15675 Buffered I/O Count: 111
Elapsed Time: 0 00:00:36.32 CPU Time: 0 00:00:27.69
Page Faults: 0

Finished Load of Key 2 at 08:55:57.14

SORT Statistics for Key 2
Records Sorted: 10000000 Initial Runs: 0
Sort Tree Size: 10002448 Work File Allocation: 0
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 2769 Buffered I/O Count: 7
Elapsed Time: 0 00:00:19.58 CPU Time: 0 00:00:18.42
Page Faults: 39138

LOAD Statistics for Key 2
Direct I/O Count: 1673 Buffered I/O Count: 3
Elapsed Time: 0 00:02:02.49 CPU Time: 0 00:01:58.23
Page Faults: 6

CONVERT Statistics
Number of Files Processed: 1
Total Records Processed: 10000000 Buffered I/O Count: 1929
Total Exception Records: 0 Direct I/O Count: 9962320
Total Valid Records: 10000000 Page Faults: 41872
Elapsed Time: 0 00:25:54.41 CPU Time: 0 00:24:32.54

c) RMS settings changed with the following command:
$ SET RMS_DEFAULT /BLOCK_COUNT=127 /EXTEND_QUANTITY=65376 /BUFFER_COUNT=0 /INDEXED /SEQUENTIAL

Finished Load of Key 0 at 09:49:27.22

SORT Statistics for Key 0
Records Sorted: 10000000 Initial Runs: 255
Sort Tree Size: 19674 Work File Allocation: 422210
Merge Order: 20 Merge Passes: 14
Direct I/O Count: 23392 Buffered I/O Count: 682
Elapsed Time: 0 00:00:51.03 CPU Time: 0 00:00:44.10
Page Faults: 108

LOAD Statistics for Key 0
Direct I/O Count: 10058459 Buffered I/O Count: 1057
Elapsed Time: 0 00:21:02.08 CPU Time: 0 00:20:08.74
Page Faults: 407

Finished Load of Key 1 at 09:49:54.15

SORT Statistics for Key 1
Records Sorted: 10000000 Initial Runs: 312
Sort Tree Size: 14050 Work File Allocation: 546626
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 3 Buffered I/O Count: 3
Elapsed Time: 0 00:00:00.00 CPU Time: 0 00:00:00.00
Page Faults: 0

LOAD Statistics for Key 1
Direct I/O Count: 15679 Buffered I/O Count: 111
Elapsed Time: 0 00:00:26.92 CPU Time: 0 00:00:22.82
Page Faults: 0

Finished Load of Key 2 at 09:53:17.63

SORT Statistics for Key 2
Records Sorted: 10000000 Initial Runs: 0
Sort Tree Size: 10002448 Work File Allocation: 0
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 2769 Buffered I/O Count: 7
Elapsed Time: 0 00:01:21.65 CPU Time: 0 00:01:19.35
Page Faults: 489689

LOAD Statistics for Key 2
Direct I/O Count: 1671 Buffered I/O Count: 3
Elapsed Time: 0 00:02:01.82 CPU Time: 0 00:01:58.98
Page Faults: 238

CONVERT Statistics
Number of Files Processed: 1
Total Records Processed: 10000000 Buffered I/O Count: 1916
Total Exception Records: 0 Direct I/O Count: 10102004
Total Valid Records: 10000000 Page Faults: 490488
Elapsed Time: 0 00:25:43.59 CPU Time: 0 00:24:34.01

d) RMS settings changed with the following command:
$ SET RMS_DEFAULT /BLOCK_COUNT=0 /EXTEND_QUANTITY=65376 /BUFFER_COUNT=255 /INDEXED /SEQUENTIAL

Finished Load of Key 0 at 10:09:51.26

SORT Statistics for Key 0
Records Sorted: 10000000 Initial Runs: 255
Sort Tree Size: 19674 Work File Allocation: 422210
Merge Order: 20 Merge Passes: 14
Direct I/O Count: 79830 Buffered I/O Count: 682
Elapsed Time: 0 00:00:56.67 CPU Time: 0 00:00:46.34
Page Faults: 333

LOAD Statistics for Key 0
Direct I/O Count: 10078515 Buffered I/O Count: 1057
Elapsed Time: 0 00:12:10.60 CPU Time: 0 00:10:06.04
Page Faults: 418

Finished Load of Key 1 at 10:10:19.35

SORT Statistics for Key 1
Records Sorted: 10000000 Initial Runs: 312
Sort Tree Size: 14050 Work File Allocation: 546626
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 2 Buffered I/O Count: 3
Elapsed Time: 0 00:00:00.00 CPU Time: 0 00:00:00.00
Page Faults: 0

LOAD Statistics for Key 1
Direct I/O Count: 15678 Buffered I/O Count: 111
Elapsed Time: 0 00:00:28.08 CPU Time: 0 00:00:23.57
Page Faults: 0

Finished Load of Key 2 at 10:16:10.34

SORT Statistics for Key 2
Records Sorted: 10000000 Initial Runs: 0
Sort Tree Size: 10002448 Work File Allocation: 0
Merge Order: 0 Merge Passes: 0
Direct I/O Count: 2769 Buffered I/O Count: 7
Elapsed Time: 0 00:03:33.43 CPU Time: 0 00:03:25.65
Page Faults: 1803297

LOAD Statistics for Key 2
Direct I/O Count: 1671 Buffered I/O Count: 3
Elapsed Time: 0 00:02:17.55 CPU Time: 0 00:02:00.35
Page Faults: 61

CONVERT Statistics
Number of Files Processed: 1
Total Records Processed: 10000000 Buffered I/O Count: 1916
Total Exception Records: 0 Direct I/O Count: 10178497
Total Valid Records: 10000000 Page Faults: 1804442
Elapsed Time: 0 00:19:26.45 CPU Time: 0 00:16:41.98




B) XFC Cache
Despite what I've indicated above, my testing wasn't always as standard as above; Hein had
suggested that "More likely the smaller files had all data in the XFC cache, so randomness was
'forgiven'".

I have to confess that I don't know a great deal about cacheing in VMS - I just trust it to do
stuff by itself without any interference from me, so haven't had the need to read up on it before.

Consequently, I didn't know what the XFC cache was (or even at the time, what XFC stood for), what
precisely it caches, or how it works (e.g. if you have large (numbers of) RMS buffers, does this
cause file reads to read more from sequential files than you request, on the basis that you'll
want the next few blocks etc.)

I decided to have a look into cacheing, and I compared the volume shadowset I was using for my
testing with the one normally used by the process which has been having problems.

Test volume:
Extent cache size 64 Maximum blocks in extent cache 60941376
File ID cache size 64 Blocks in extent cache 3788928
Quota cache size 0 Maximum buffers in FCP cache 57995

Normal volume:
Extent cache size 64 Maximum blocks in extent cache 28335832
File ID cache size 64 Blocks in extent cache 0
Quota cache size 0 Maximum buffers in FCP cache 57995

I had been using the test volume for input & output files, and the normal volume for the
sort work files (to improve throughput, and to help reduce fragmentation of the sort work
files (depending on when CONVERT requests to create these in relation to the output file)).

I wasn't sure what the significance was of 0 blocks being in the extent cache (whatever that is),
but I suspected it wasn't good, so I decided to move the sort work files to the same device as
the input/output files; this didn't make any difference to timings, though.


I then had another thought about Hein's mention of "smaller files had all data in the XFC cache, so randomness was forgiven".

I thought that perhaps the randomness was a causal factor...

As rows are deleted from the database and deleted rows are subsequently reused by new rows being added,
the rows would be less likely to be in sequential order.

Given that I knew that attempts to CONVERT /SORT the first 10m records "was okay", whereas the first 20m (and even the first 15m) records "took forever", I did consider that for industry-specific reasons, the data added to the table in the earlier years was
probably more likely to still be in sequential order, and that later data was non-sequential.

Extracting the second block of 10m records, and SORTing this (rather than CONVERT /SORTing it), didn't prove the point - timings for sorting the first 10m and the second 10m records were approximately the same.

I'd hoped that like the description for how CONVERT/SORT works (in the Record Management Utilities Reference Manual), there might be something similar for SORT - but no such luck, so I tried the help library on the offchance that there might be a description.

There wasn't, but it did lead me to finding the default behaviour - SORT /PROCESS=RECORD

What CONVERT /SORT does is essentially SORT /PROCESS=TAG (probably because unlike SORT, CONVERT allows for multiple keys, and therefore, it effectively "needs" to do SORT /PROCESS=TAG).

In my particular case, I've established that we no longer need the second and third keys, but even with a single key, CONVERT /SORT will still essentially do a SORT /PROCESS=TAG.
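
To see the difference in isolation, the two flavours of sort can be compared directly (key position/size illustrative, as before):

$ SORT/KEY=(POSITION:1,SIZE:12)/PROCESS=RECORD/STATISTICS INPUT_FILE.DAT REC_SORTED.TMP
$ ! whole records flow through the SORTWORK files; the input is read once, sequentially
$ SORT/KEY=(POSITION:1,SIZE:12)/PROCESS=TAG/STATISTICS INPUT_FILE.DAT TAG_SORTED.TMP
$ ! only keys + record addresses are sorted; the input is then re-read record by record, in key order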


I then considered that (as Hein had mentioned) CONVERT /SORT is performing at least 2 passes of the input file, and that perhaps the issue was whether or not the file was in the cache in its entirety.

Attempting to SORT 10m records /PROCESS=TAG was taking "forever" too.

I thought that perhaps because of the randomness of the location of the records in the input file, when SORT came to generate the output file (after first having generated sort work files with primary keys and pointers to the location of the records in the input files), the problem was the "virtual" (SAN farm) disk thrashing about all over the input file to get the records in order, rather than the file not being in the cache.

[I also thought that perhaps when CONVERT/SORT happened to work "normally" in the past, it wasn't so much because the input file wasn't fragmented, but *because* it was fragmented...

i.e. the "random" location of the fragments meant that the non-sequential contents of the file, now happened to be laid out on disk in a sequential manner (a bit like a "double negative" making a positive, IYSWIM)]


However, whilst having searched c.o.v postings related to XFC, I came across a command that I'd not seen/used before:

$ SHOW MEMORY /CACHE=FILE=

I used this whilst performing a CONVERT/SORT, and after CONVERT creates its sort work files with the primary keys and pointers to record locations in the input file, it uses them to build the output file.

As the number of reads to the input file increased, the cache hit rate went down (from 98% to 58% and counting^w decrementing).

Obviously, the bigger the file, the less likely it is that it would be cacheable in its entirety (depending on the size of the XFC, and what demands are made on it by other files).

On our system, the full file (~5.5GB) can't be entirely cached by the XFC (which is sized at 2.06GB).

Even if the file could be entirely cached, by the time the file is fully loaded into the cache, due to the random nature in which CONVERT /SORT will be retrieving records,
it will have had a lot of cache hit failures, and many of the blocks read into the cache would only be referenced once or a handful of times, "taking up space" that would be better used by as-yet-unread blocks.

[I'm not sure, but I'm assuming that when SHOW MEM /CACHE reports page counts, it's talking about pages of size F$GETSYI("PAGE_SIZE"), rather than defaulting to VAX page sizes?
Do (all) facilities reporting pages really mean pages of size F$GETSYI("PAGE_SIZE"), or do they sometimes (for backwards compatibility purposes) mean VAX page sizes?]

My tests with 500k records resulted in an input file of ~120608 blocks, which would comfortably fit in the varying numbers of allocated cache pages I saw (between 136000 and 189000), if the page size being reported on is 'F$GETSYI("PAGE_SIZE")'.

Similarly, with 10m records, the input file is ~2.4m blocks; 189000 pages would equate (on our cluster) to 189000 * 8192 bytes, equivalent to 3024000 512-byte blocks, i.e. enough to cache the 2.4m blocks (same caveat on page size applies).
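
(The arithmetic, for what it's worth, with the page size confirmed via a lexical:)

$ PAGE_SIZE = F$GETSYI("PAGE_SIZE")              ! 8192 on our Alphas
$ WRITE SYS$OUTPUT 189000 * (PAGE_SIZE / 512)    ! -> 3024000 512-byte blocks' worth of cache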
Hein van den Heuvel
Honored Contributor

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

Mark,

Thank you very much for the extensive summary reply. It validates the effort others have put in to try to help.

- Random versus sequential IO is a HUGE factor.
- Larger buffers help a lot in sequential access.
- Many buffers rarely help; they just burn CPU to find/maintain them.
-- For sequential you only need a few more than one to get read-ahead and write-behind.
-- For random access you need enough to significantly improve the 'odds' of having the right buffers. That's often unlikely.
- Many big buffers is just silly: each I/O reads lots of blocks, only to use one or a few.
A few years back I saw 50 buffers already wasting more CPU time to wade through than simply re-reading from the XFC, because RMS uses a (silly!) sequential search through the buffer descriptors. That 50 is not a hard number, but 255 is probably always a CPU loss (unless reading from floppies or similar :-)

- When the XFC can help it will... big time. But when you exceed its size, the cliff is steep.
-- That's why I often run tests against (virtual) disks mounted /NOCACHE, or against an MD device, which the XFC does not attempt to cache.

>> "attempting to CONVERT a file where the primary keys are not in sequence will result in (some) records not being written to the output file."

- I get the impression there might only be thousands, not millions, out of order. If the out-of-orderness is limited (the odd record dropping back), then CONVERT with /EXCEPTIONS_FILE can be used to capture the odd ones:
For example, the 12 in 10,11,15,16,12,17,18 will show up in the exception file.
However, if the records were 10,11,20,12,15,16,17,18
then the exception file would have 12,15,16,17,18.

You and I might recognize the 20 as the odd one out, but convert/sort will not.

Some program/perl script might be able to split the file into an ordered main stream and the odd-ones-out (see the sketch below). You could then convert/nosort the bulk file, and convert/merge in the odd ones later.
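
A rough DCL sketch of that idea (purely illustrative - a compiled program or perl would be far quicker for 10 million records, and it assumes the records are readable as text by DCL, with the key in the first 12 bytes):

$ OPEN/READ  IN   INPUT_FILE.DAT
$ OPEN/WRITE BULK BULK_IN_ORDER.DAT
$ OPEN/WRITE ODD  ODD_ONES_OUT.DAT
$ LAST_KEY = ""
$ LOOP:
$     READ/END_OF_FILE=DONE IN REC
$     KEY = F$EXTRACT(0,12,REC)
$     IF KEY .GES. LAST_KEY
$     THEN
$         WRITE BULK REC
$         LAST_KEY = KEY
$     ELSE
$         WRITE ODD REC          ! fell behind the running maximum: divert it
$     ENDIF
$     GOTO LOOP
$ DONE:
$ CLOSE IN
$ CLOSE BULK
$ CLOSE ODD

The bulk file can then be CONVERT/NOSORTed and the odd ones added afterwards with CONVERT/MERGE. Note it has the same blind spot as CONVERT itself: a single high key (the 20 above) stays in the bulk stream and pushes the records after it into the odd file until the keys catch up.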

>> XFC cache was ... how it works ... read more from sequential files than you request

Yes, there is a read ahead.
ANALYZE/SYSTEM has a nice XFC extension.

>> decided to have a look into cacheing
"Extent cache size "....

Those are just the file system meta-data caches.
Not the data caches.
The file system caches typically only have a marginal effect.
Yes, you ought to have them set reasonably, but they will not make a 10x difference like the XFC may.


>> What CONVERT /SORT does is essentially SORT /PROCESS=TAG

Yes, it uses an RFA sort, and the main reason is SPACE. A record sort doubles the space requirement, and CONVERT, unlike you or me, cannot decide to delete the input once sorted.
The RFA sort needs 6 bytes per record, which typically is a factor of 10 - 1000 less space than the real records.
The price is speed, but remember this was coded up 25 years ago, when space was at a premium.

>> SHOW MEMORY /CACHE=FILE=

Yes, that's the real DATA cache for a file, and its stats. That's the good stuff!

Yes, XFC pages are the system page size: typically 8192 bytes.

You can also use SHOW MEMO/UNIT=BYTES to help confirm.

>> Similarly, with 10m records, the input file is ~2.4m blocks

Yes, that 1+ GB is readily cacheable on a low memory pressure modern system.
You see that in the CPU time approaching elapsed time.


Mark Corcoran
Frequent Advisor

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

Grrr!

I've lost my edit session again here, so need to remember what I typed.

[I think, in part, it was either because I was in VMS mode when I was typing, and hit CTRL-H (which in IE will not take you to the head of the line in an edit box, but actually take you back a page), or because my laptop had not sensed/received the break keycode for the depress of the CTRL key.

So, I'm now going to edit this in MS Word then cut & paste from there (because at least I can undo/redo in Word)]


>Thank you very much for the extensive summary reply. It validates the efforts others have put in to try to help some.

No problem; thanks everyone for their input - it was much appreciated.

I know that not everybody likes a tome-sized update (especially round here, where bullet-pointed 2-word summaries are preferred, even though that makes them meaningless), but it's a bit like being back in school - getting credit for showing how you arrived at your answer (to show you understand the process), rather than just giving the answer (using a calculator).


>Many big buffers is just silly: Each IO reads lots of block, only to use one or a few
I'd guessed as much, but I felt the need to take the extremes at each end, with suitable intervals in between.

Unfortunately, the amount of time taken to perform tests (and a lot of the tests I did were not shown in my update) limited the number of intermediate RMS (and other) settings that I could use.

[I'd already felt guilty about having spent a lot of time recoding ~35 command files into a single one, to prevent input/output file fragmentation (and to standardise on naming conventions, and to avoid having to make ~35 sets of changes each time something changed).

I was at the stage of testing it when a colleague found that SORT & CONVERT/NOSORT was quicker than CONVERT/SORT; everybody jumped on the bandwagon and blamed CONVERT.

I was a little sceptical - I wasn't happy not knowing why the difference occurred, and was concerned that either the use of SORT worked but might stop working in X amount of time, or that coincidentally something else changed around the same time, and in fact, the SORT didn't fix the problem.

Some IT people/techies are happy to let self-resolving issues go unexplained, but not me; I realise that you can get to a point of "diminishing returns", where how much time you spend investigating it vs how likely you are to get to the bottom of it, becomes more time & less likely; I just hope I'm sensible enough to realise when I've reached that point :-D]


>I get the impression there might only be thousands, not millions, out of order
I was actually wondering as to how many there were, and had considered writing a script to determine this, but I tried your suggestion of using the exceptions file.

The problem is that after ~8mins of running, the number of records processed was ~1/10th the total, and the exceptions file was 1/10th the size of the input file, so I aborted it.

Looking at the exceptions file and the input file, it was clear why this was the case, and is something that I find sometimes happens with DIFFERENCES.EXE when you compare two similar (but not the same) files.

Suppose the first 4 records' primary keys were this:

000000000001
900000000002
000000000003
000000000004

As humans, we can clearly see that the 900000000002 record is the odd one out here; CONVERT however, doesn't have the artificial intelligence to be able to do this - it simply sees that 900000000002 is next in sequence after 000000000001, and that anything less than or equal to 900000000001 must be out of sequence.

[i.e. your example was spot on]

On reflection, I'm not sure how one would programmatically generate some percentage value for out-of-orderness, given the above...


>Yes, it uses an RFA sort, and the main reason is SPACE.
It had struck me that space would be the likely reason for the way that CONVERT sorts records, but I think perhaps when a user specifies /FDL and the FDL file has only a single key in it, this type of sort is no longer suitable (in speed terms - obviously, it depends on the disk space available to CONVERT on SYS$SCRATCH).


>You can also use SHOW MEMO/UNIT=BYTES to help confirm.
Alas, it seems that SHOW MEM /CACHE always uses bytes (or GB/MB anyway) as the unit for display, but always uses pages for SHOW MEM /CACHE=FILE.

I tried doing SET PROCESS /UNIT=BYTES, in case for some reason the /UNIT qualifier on the SHOW command was being ignored or overridden, but it made no difference :-(

Thanks again for your help. Now I've got some recoding to do...

Mark
Mark Corcoran
Frequent Advisor

Re: SORT plus CONVERT /NOSORT is quicker than CONVERT /SORT ???

See my last update to the thread.