Operating System - OpenVMS

Re: standard OPEN VMS checksum on large size file

 
SOLVED
John_960
New Member

standard OPEN VMS checksum on large size file

The OpenVMS CHECKSUM utility runs very fast on small to medium-sized files, but very slowly on large files.

Does anyone have any insight into how to improve checksum performance on large files?
10 REPLIES
Garry Fruth
Trusted Contributor
Solution

Re: standard OPEN VMS checksum on large size file

You could try SET RMS/BLOCK=127 and see if that helps.
David B Sneddon
Honored Contributor

Re: standard OPEN VMS checksum on large size file

John,

How big is "large"?
Is the file in question fragmented?

$ dump/header/block=count=0 file.type

will show how many extents are in the file.

Regards
Dave
Karl Rohwedder
Honored Contributor

Re: standard OPEN VMS checksum on large size file

I've experimented a little with a 75 MB saveset (contiguous) and different RMS /BLOCK and /BUFFER values. The results are in the attached text files. A higher /BLOCK_COUNT seems to be the important factor.

mfg Kalle
Volker Halle
Honored Contributor

Re: standard OPEN VMS checksum on large size file

John,

I've just done some simple tests on a 1,250,000 block file (no caching). The performance seems to mainly depend on the throughput delivered by the disk drive:

Test on rx2600 72 GB SCSI drive:

Multiblock count= 32:
9.97 sec elapsed, 6.56 sec CPU time
39066 DIRIO -> 3918 DIRIO/sec -> 62 MB/sec

Multiblock count= 127:
8.27 sec elapsed, 6.78 sec CPU time
9846 DIRIO -> 1190 DIRIO/sec -> 74 MB/sec

Test on Alpha 1000 8 GB SCSI drive:

Multiblock count= 32:
198.49 sec elapsed, 31.07 sec CPU time
39040 DIRIO -> 196 DIRIO/sec -> 3.14 MB/sec

Multiblock count= 127:
203.27 sec elapsed, 27.94 sec CPU time
9847 DIRIO -> 48 DIRIO/sec -> 3 MB/sec
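
As a rough cross-check of the figures above, the throughput can be recomputed from the file size and the elapsed times. This is only a sketch: it assumes 512-byte disk blocks and counts a MB as 2**20 bytes, so the results may differ slightly from the rounded numbers quoted, and the function name is made up for illustration.

```python
BLOCK_BYTES = 512        # assumption: standard 512-byte disk block
FILE_BLOCKS = 1_250_000  # size of the test file, from the post

def throughput(blocks, elapsed_sec, dirio):
    """Return (direct I/Os per second, MB per second) for one test run."""
    total_bytes = blocks * BLOCK_BYTES
    return dirio / elapsed_sec, total_bytes / elapsed_sec / 2**20

# (label, elapsed seconds, DIRIO count) as reported in the post
runs = [
    ("rx2600     MBC=32 ", 9.97, 39066),
    ("rx2600     MBC=127", 8.27, 9846),
    ("Alpha 1000 MBC=32 ", 198.49, 39040),
    ("Alpha 1000 MBC=127", 203.27, 9847),
]

for label, elapsed, dirio in runs:
    ios, mb = throughput(FILE_BLOCKS, elapsed, dirio)
    print(f"{label}: {ios:7.0f} DIRIO/sec, {mb:6.2f} MB/sec")
```

On both machines the larger multiblock count cuts the number of I/Os by roughly a factor of four, while the MB/sec figure stays close to what the drive can deliver.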

Volker.
Volker Halle
Honored Contributor

Re: standard OPEN VMS checksum on large size file

John,

The multi-buffer count (/BUFF=n/SEQ) also seems to play an important part. Setting /BUF=1/SEQ/BLOCK=127 reduced the runtime in my test on the Alpha SCSI disk to 109.47 secs.

Volker.
John_960
New Member

Re: standard OPEN VMS checksum on large size file

These are RMS files with fixed-length records. The average size is 2 GB.
John_960
New Member

Re: standard OPEN VMS checksum on large size file

Thanks all. Setting the block count to 127 makes a huge difference.
Hein van den Heuvel
Honored Contributor

Re: standard OPEN VMS checksum on large size file


I realize that the main question is largely answered. Just a few observations...

SET RMS/BUF = the number of buffers. Well, this application will never come back to a buffer, so just one would be enough. Any more buffers will just cause more administrative work, page management and the like. This is nicely visible in Karl's log.

Note however that RMS can do READ-AHEAD (connect-time option RAB$V_RAH). If RAH is active, then the second buffer will give the bulk of the improvement. 4 will help some more, but you'll find rapidly diminishing returns after that. The default is 2 buffers, if RAH is active.

SET RMS/BLOCK = the size of each buffer. In general, the bigger the better. For just about the same work, you get more data to work on. Straightforward. Again, see Karl's log for an example.

The RMS maximum is 127 blocks. This is an ODD number due to a 16-bit field limitation. (Because 0 bytes is a valid value, 128 blocks = 2**16 bytes = 64 KB = 65536 bytes would require 17 bits, which we do not have.)
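
The arithmetic behind that limit can be spelled out in a few lines. This is just an illustration, assuming 512-byte blocks and an unsigned 16-bit byte count as described above; the names are made up.

```python
BLOCK_BYTES = 512               # assumption: 512-byte disk blocks
FIELD_BITS = 16                 # width of the byte-count field
MAX_BYTES = 2**FIELD_BITS - 1   # largest value 16 bits can hold: 65535

def fits(blocks):
    """True if a transfer of `blocks` blocks fits in the 16-bit byte count."""
    return blocks * BLOCK_BYTES <= MAX_BYTES

# Largest whole number of blocks whose byte count still fits.
largest = max(n for n in range(1, 256) if fits(n))

print(largest, largest * BLOCK_BYTES)        # block count and its byte count
print((128 * BLOCK_BYTES).bit_length())      # bits needed for 128 blocks
```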

For many setups you will find that a large multiple of 16 (specifically 96 or 64) works better than 127. For example, the XFC works a little better because each cache line is 16 blocks, so it can use a minimal number of cache lines per I/O. No split.
And for RAID disks, if your disk cluster size as well as the underlying stripe/chunk sizes are multiples of 16, it'll all work out just a little bit better also.
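
To see why a multiple of 16 avoids splits, here is a small sketch that counts how sequential transfers line up with 16-block cache lines. It assumes the reads start at block 0 of the file and that cache lines begin on multiples of 16 blocks; both are simplifications for illustration, and the function names are made up.

```python
CACHE_LINE_BLOCKS = 16  # XFC cache line size, per the post above

def lines_touched(start_block, count):
    """Number of 16-block cache lines overlapped by one transfer."""
    first = start_block // CACHE_LINE_BLOCKS
    last = (start_block + count - 1) // CACHE_LINE_BLOCKS
    return last - first + 1

def split_ios(io_blocks, n_ios=1000):
    """Count sequential I/Os that start or end in the middle of a cache line."""
    split = 0
    for i in range(n_ios):
        start = i * io_blocks
        end = start + io_blocks
        if start % CACHE_LINE_BLOCKS or end % CACHE_LINE_BLOCKS:
            split += 1
    return split

for mbc in (64, 96, 127):
    print(mbc, split_ios(mbc), lines_touched(mbc, mbc))
```

With 64 or 96 blocks per transfer every I/O covers whole cache lines; with 127, every transfer shares at least one cache line with its neighbour.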

Now, why does RMS not do the right thing? Well, when you open the file it has no hint as to what you will do. Maybe you'll just read the first block only? So if an application knows it will read the whole file (ftp, checksum, copy, search, grep, ...), then it behooves that application to tell RMS so: RAH (read-ahead), MBC (multiblock count), MBF (multibuffer count), SQO (sequential only).

Cheers,
Hein.
John Gillings
Honored Contributor

Re: standard OPEN VMS checksum on large size file

John,

Another point...

The CHECKSUM command is "officially" undocumented and unsupported. It's there for use by OpenVMS utilities (specifically VMSINSTAL), so as long as it meets the requirements for its intended purpose, how it works in other contexts is irrelevant!

As should be obvious, the performance is almost certainly dependent on how fast you can read the entire file. The calculation part is trivial and I doubt there's anything you can do to tune it. I'd expect the scaling for larger files to be linear.

Depending on what you want it for, it may make sense to write your own checksum utility which better meets your needs.
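
A minimal sketch of what such a home-grown utility might look like is below. It is written in Python only for illustration, reads the file sequentially in large chunks, and uses CRC-32 from zlib as a stand-in: this is not the algorithm the CHECKSUM command uses, so the values will not match. The function name and chunk size are assumptions.

```python
import zlib

# Assumption: mirror the 127-block (127 * 512 byte) transfer size
# discussed earlier in this thread.
CHUNK_BYTES = 127 * 512

def file_crc32(path, chunk_bytes=CHUNK_BYTES):
    """Compute a CRC-32 over a file, reading it in large sequential chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:          # empty read means end of file
                break
            crc = zlib.crc32(chunk, crc)  # fold this chunk into the running CRC
    return crc & 0xFFFFFFFF
```

Because the file is never held in memory as a whole, the same code works for a 2 GB file as for a small one; the run time should then be dominated by how fast the file can be read.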

One other trick that might be worth trying. Use:

$ SET FILE/ATTR=(RFM:FIX,MRS:16384,LRL:16384)

(remember to SAVE the attributes first so you can restore them correctly!)

On a 200K block file with variable length records, this reduced the time from 1:30 to 25 seconds.

This may give a different checksum from the original attributes, but as long as you always calculate it the same way, that shouldn't matter.
A crucible of informative mistakes

Re: standard OPEN VMS checksum on large size file

>> The CHECKSUM command is "officially" undocumented and unsupported

CHECKSUM is supported at least as of version 8.2
(HP OpenVMS Version 8.2 New Features and Documentation Overview, 4.3)