Operating System - OpenVMS

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

 
comarow
Trusted Contributor

Hein,

Thank you so much for bringing this discussion to a higher level.


From a rehashed version of the VMS File System Internals book (section 5.4.5):

5.4.5 Dynamic Highwater Marking

Disk scavenging is a security problem where
...
VMS solves this problem with the combination of the two following techniques:


o Erase-on-allocate
o Highwater marking

Both are enabled when the highwater marking volume attribute is enabled with
the SET VOLUME/HIGHWATER command.

VMS maintains a highwater mark which indicates how far the file has been
written in its allotted space on the disk. All blocks in the file up to the highwater
mark are guaranteed to have been written since they were allocated to the file.
The user is not permitted to read beyond the highwater mark, and thus cannot
read stale data from the file.

Erase-on-allocate is the more costly but conservative technique. It is used when
the file is open, allowing any form of shared access or nonsequential access.
Erase-on-allocate, as its name implies, simply means erasing all disk blocks when
they are allocated to the file. The file's highwater mark is set to point to the end
of the newly allocated and erased space.

Highwater marking is used only when the file is open for write with exclusive
access in sequential-only mode. In this mode, the highwater mark is maintained
in memory and cannot be maintained across multiple nodes of a cluster with
acceptable performance (which is why access is limited to a single accessor).
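The excerpt's two mechanisms can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions, not how VMS actually implements them: the volume is a list of 512-byte blocks that may still hold data from deleted files, `ToyFile`, `allocate()`, `write()` and `read()` are hypothetical names, and reads past the highwater mark return zeros (the excerpt only says such reads are not permitted).

```python
from dataclasses import dataclass, field

BLOCK = 512
ZERO = bytes(BLOCK)  # one disk block of zeros


@dataclass
class ToyFile:
    """Hypothetical toy model of a file on a volume with highwater marking enabled."""
    disk: list                                # volume: LBN -> 512-byte block (may be stale)
    lbns: list = field(default_factory=list)  # retrieval map: index 0 holds the LBN of VBN 1
    erase_on_allocate: bool = False           # True for shared or non-sequential access
    hwm: int = 0                              # highest VBN guaranteed to have been written

    def allocate(self, new_lbns):
        self.lbns.extend(new_lbns)
        if self.erase_on_allocate:
            for lbn in new_lbns:
                self.disk[lbn] = ZERO         # pay the whole erase cost at allocation time
            self.hwm = len(self.lbns)         # HWM points at the end of the erased space

    def write(self, vbn, data):
        # Keep the invariant that every block up to the HWM has been written.
        for gap in range(self.hwm + 1, vbn):
            self.disk[self.lbns[gap - 1]] = ZERO
        self.disk[self.lbns[vbn - 1]] = data
        self.hwm = max(self.hwm, vbn)

    def read(self, vbn):
        if vbn > self.hwm:
            return ZERO                       # stale data beyond the HWM is never returned
        return self.disk[self.lbns[vbn - 1]]


# Sequential, exclusive writer: nothing is erased at allocation; stale data stays hidden.
disk = [b"OLD SECRET".ljust(BLOCK, b".")] * 8
f = ToyFile(disk)
f.allocate([2, 3, 4])
f.write(1, b"new data".ljust(BLOCK, b"."))
print(f.read(1)[:8])                     # b'new data'
print(f.read(2) == ZERO, disk[3][:10])   # True b'OLD SECRET'
```

In the sequential case the old contents of LBN 3 are still physically on disk, but the model never hands them back because they lie above the highwater mark.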



Thus, as far as I understand, erase-on-allocate occurs only with shared or random access, not with sequential, exclusive access.

Back to the original question: is there a performance penalty? The answer is, definitely maybe.

Have fun!

Steven Schweda
Honored Contributor

> Thus, as far as I understand, erase on
> allocate only occurs on shared or random
> access, but not on sequential, private
> access.

More or less. If the application sets the
SQO flag, strictly sequential access is
pretty painless. (As I recall, with SQO set,
non-sequential access fails with a run-time
error. I never tried any shared access with
SQO.) Without the SQO bit set, the resulting
erase-on-allocate behavior can be pretty
close to crippling (when a large allocation
is done).

As I learned the hard way, SQO is _not_ set
by default, and setting it can be a very good
idea.

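Steven's rule of thumb can be reduced to a few lines of Python. This is a hedged sketch that encodes only what the book excerpt and this thread say: highwater marking alone is used for exclusive, sequential-only writers, everything else falls back to erase-on-allocate, and that erase cost grows with the size of the allocation. The function names are hypothetical; `sqo` stands in for the RMS SQO (sequential-only) file-processing option.

```python
def uses_erase_on_allocate(sqo: bool, exclusive_write: bool) -> bool:
    """Toy rule: HWM alone suffices only for exclusive, sequential-only writers."""
    return not (sqo and exclusive_write)


def blocks_erased_at_allocation(alloc_blocks: int, sqo: bool = False,
                                exclusive_write: bool = True) -> int:
    """Blocks zeroed when space is allocated on a HIGHWATER volume (toy model).

    sqo defaults to False because, per this thread, RMS does not set SQO by default.
    """
    return alloc_blocks if uses_erase_on_allocate(sqo, exclusive_write) else 0


# One-shot preallocation of a 500,000-block output file, as an extractor might do:
print(blocks_erased_at_allocation(500_000))            # 500000
print(blocks_erased_at_allocation(500_000, sqo=True))  # 0
```

With SQO set, the allocation itself costs nothing extra in this model; the blocks are covered by the application's own sequential writes, which is why a program that preallocates its whole output benefits most.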
Hein van den Heuvel
Honored Contributor

Comarow, good follow up and clarification. Thanks.

Steve. Ditto. The need for SQO keeps surprising me, but I suppose that's just the way it is.

Cheers,
Hein.

John Gillings
Honored Contributor

I'm not sure if I can explain Comarow's observations.

Here's an experiment I just tried:

1) SCSI disk DKB0 with HIGHWATER enabled
2) Create a reasonably large text file (~12,000 blocks) with a rotating alphabet pattern
3) Delete the file
4) $ COPY NL: TEST.DAT/ALLOCATE=4000
5) $ SET FILE/END TEST.DAT
6) $ DUMP/BLOCK=(START:2000,COUNT:1) TEST.DAT

As expected, the result is all zeros.

7) $ SET VOLUME/NOHIGH DKB0
8) $ COPY NL: TEST.DAT/ALLOCATE=4000
9) $ SET FILE/END TEST.DAT
10) $ DUMP/BLOCK=(START:2000,COUNT:1) TEST.DAT

As expected, the block contains my rotating alphabet.

11) $ SET VOLUME/HIGH DKB0
12) $ COPY NL: TEST.DAT/ALLOCATE=4000
13) $ DUMP/HEAD/BLOCK=COUNT:0 TEST.DAT

Examine the map area to determine the LBN of VBN 2000 within the file: 96216.

Now dump the LBN directly from the disk - this bypasses any HWM processing.

14) $ DUMP/BLOCK=(START:96216,COUNT:1) DKB0:

As expected, the block contains my rotating alphabet.

15) $ SET FILE/END TEST.DAT
16) $ DUMP/BLOCK=(START:96216,COUNT:1) DKB0:

Block now contains zeros. The SET FILE/END has pushed the EOF and HWM to the last allocated block, forcing the OS to zero out all the blocks in between. This demonstrates that the zeroing has NOT happened on allocation, it only happened when the EOF was moved without writing data.
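John's steps can be replayed against a toy model to show where the zeroing happens. This is a hedged Python sketch, not an exact reproduction of VMS behaviour: it assumes the 4000-block allocation is one contiguous extent starting at LBN 94217 (so that VBN 2000 maps to LBN 96216, as found in step 13 above), that the deleted file left a rotating alphabet in those blocks, and that SET FILE/END zeroes every block between the highwater mark and the new EOF when HIGHWATER is enabled. `stale_block()` and `run_experiment()` are hypothetical helpers.

```python
import string

BLOCK = 512
ALPHABET = string.ascii_uppercase.encode()


def stale_block(lbn: int) -> bytes:
    """Leftover data from the deleted file: a rotating alphabet pattern."""
    start = lbn % 26
    pattern = (ALPHABET[start:] + ALPHABET[:start]) * (BLOCK // 26 + 1)
    return pattern[:BLOCK]


def run_experiment(highwater: bool, vbn: int = 2000, alloc: int = 4000,
                   first_lbn: int = 94217):
    """Return the raw LBN behind `vbn` before and after SET FILE/END."""
    lbns = list(range(first_lbn, first_lbn + alloc))  # assumed single contiguous extent
    disk = {lbn: stale_block(lbn) for lbn in lbns}    # freed blocks still hold old data
    hwm = 0                                           # COPY NL: wrote no data at all
    target = lbns[vbn - 1]
    before = disk[target]                             # like step 14: DUMP the LBN directly
    eof = alloc                                       # SET FILE/END: EOF -> last allocated block
    if highwater:
        # Moving EOF past the HWM without writing forces the gap to be zeroed.
        for v in range(hwm + 1, eof + 1):
            disk[lbns[v - 1]] = bytes(BLOCK)
    return before, disk[target]


before, after = run_experiment(highwater=True)
print(before == stale_block(96216))        # True: allocation alone erased nothing
print(after == bytes(BLOCK))               # True: zeros appear only after SET FILE/END
_, after_nohigh = run_experiment(highwater=False)
print(after_nohigh == stale_block(96216))  # True: without HIGHWATER the alphabet survives
```

As in John's dumps, the model leaves the old data on disk after allocation; only extending the EOF without writing triggers the erase.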

The only explanation I can give for the behaviour seen by Steven is that the ZIP code is somehow READING a high VBN within the newly created file. If all it does is write at the EOF, there should be no unnecessary writing of zeros.

As Hein said earlier, for a "well-behaved" application there should be no significant overhead from HWM. Only applications that do nasty things like SET FILE/END suffer.

Steven, perhaps you could send me a tiny example program that demonstrates the SQO and non-SQO behaviour?
A crucible of informative mistakes

Steven Schweda
Honored Contributor

> [...] perhaps you could send me a tiny
> example program that demonstrates the SQO
> and non-SQO behaviour?

I don't see a small test case in my code
pile. As I said, UnZip 5.52 (whose source
everyone should already have) was the
original test case. (If you run the
experiment on UnZip, be sure to disable _all_
the SQO-setting code in VMS.C, as there are
multiple (4?) instances.)

Since being informed of the SQO miracle flag,
I've been trying to set it every time I get
a chance, as it makes such a big difference
with large files. UnZip is a particularly
good candidate, as it knows the output file
size before it creates it, so it can (and
does) allocate the whole thing in one shot.
Sadly, before 5.52, this could cause a disk
seizure for minutes at a time on a large
allocation.

You're welcome to look, but I believe that
UnZip does nothing other than sys$create(),
perhaps sys$extend(), and sys$connect(),
then strictly sequential writes. As I
recall, the choke-hold on the disk occurred
early, at the sys$create().

Jan van den Boogaard
Frequent Advisor

Thanks for all the input !!

After reading the answers, and after searching the docs for "highwater" AND ALSO for "high-water" (with hyphen!), my own conclusion is:

A lot depends on how applications behave.

Highwater marking can have considerable overhead when the "erase-on-allocate" feature is triggered.

When a new disk is initialized with INIT/ERASE, and SET VOL/ERASE is in effect from the very start of the volume's life, highwater marking is of no use, because data is erased when files are deleted or purged. So in a situation where files are seldom deleted or purged, the combination of INIT/ERASE and SET VOL/ERASE/NOHIGH seems sensible to me.