Pros and cons of HIGHWATERMARKING and INIT/ERASE

 
Jan van den Boogaard
Frequent Advisor

Hi everyone,
OpenVMS has several features for file security: SET VOLUME/HIGHWATER_MARKING, INIT/ERASE and SET VOLUME/ERASE_ON_DELETE. What are the pros and cons of these options? I have heard that highwater marking carries an overhead, but doesn't SET VOL/ERASE have an overhead as well? (INIT/ERASE clearly has a one-time-only overhead.)

In applications where new files are constantly being created but deletions seldom happen, would SET VOL/ERASE be better than HIGHWATER_MARKING?

Thnx, jan
comarow
Trusted Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

Jan:

Those settings are there to prevent scavengers
from allocating disk space and then dumping the
information. The benefit is very clear:
security.

The penalty is performance.

With High Water marking, as one allocates
blocks of disk space, the space is written with zeros.

Set Volume/ERASE is great security, as every time a file is deleted, the blocks
are written with zeros.

They both have a performance penalty. The choice is dependent on your environment.

Neither will satisfy DIS regulations.

If you have DFU and want to undelete a file,
(in that short space of time on an active disk), /erase will make that impossible.


The performance penalty is significant.

Have fun
Hein van den Heuvel
Honored Contributor
Solution

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

comarow wrote...

>> Those settings are to prevent scavengers
from allocating disk space

Right.

>> With High Water marking, as one allocates
blocks of disk space, the space is written with zeros.

WRONG WRONG WRONG.

With HWM, zeros are written when one READs beyond the current HWM for the file, which normally is at the EOF of the file.

For normal, sequential, file writes HWM has NO OVERHEAD. Only badly behaving applications, trying to write beyond where they have written, get a penalty... which seems fair to me.

HWM will NOT protect against privileged users accessing unallocated/uninitialized blocks through non-file-structured, logical-block I/O.

>> Set Volume/ERASE is great security, as every time a file is deleted, the blocks
are written with zeros.

Right, and this will thus also prevent privved users from using brute force to read the data after delete (of course they probably had 'all the time in the world' to read that data before it was deleted).

>> They both have a performance penalty. The choice is dependent on your environment.

Jan,

INIT/Erase is a nice simple thing to do when not in a rush to release a new drive to usage.

SET VOL/ERASE sounds like a good suggestion in your low-delete case, but it is normally a 'waste of time': instead of writing an erase pattern after last use, you can just write fresh data on re-use, and write zeros only on attempted re-use before writing (HWM).

High Water Marking has been designed as a no-overhead security feature, but its erase pattern might not serve your needs.

Met vriendelijke groetjes,
Hein.


Steven Schweda
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

> For normal, sequential, file writes HWM
> has NO OVERHEAD. [...]

This assumes that the FAB$M_SQO bit is set,
right? That is, a naive application may
easily see a disk lock up for a long time if
a file is extended by, say, several GB.

For a good time, Zip-compress a file of a GB
or two, then UnZip it using an UnZip before
5.52. Then compare the behavior of UnZip
5.52 (or later). 5.52 sets the
sequential-access-only flag. Earlier
versions do not. Disable highwater marking
on the volume, and the difference goes away.
John Gillings
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

Steven,

>That is, a naive application may
>easily see a disk lock up for a long time
>if a file is extended by, say, several GB.

No. That shouldn't happen. HWM will only write zeros if an attempt is made to access data beyond the HWM (which should normally be at or above EOF). Extending the file isn't a problem, the blocks are allocated to the file and still contain the old data. Writing at EOF will overwrite with new data, extending both EOF and HWM. If you try to read the data above the HWM (and implicitly, above EOF), the file system will first fill the gap between HWM and your target block with zeros, update the HWM, then read the block (now containing zeros).

Normally HWM and INIT/ERASE do quite different things, BUT in the specific workload where data is rarely deleted, INIT/ERASE and SET VOL/ERASE_ON_DELETE achieve the same result, but at higher "cost". All the "ERASE" options are a definite cost - you will incur the I/O to erase the blocks. With HWM, you will only incur the cost if you absolutely have to - that is, someone attempts to read data that they haven't written.

The only overhead well behaved applications will suffer from enabling HWM is that of updating the HWM in the file header, but for sequential files, HWM will follow EOF, so there shouldn't be any extra overhead.
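
To put the cost argument above into numbers, here is a minimal Python sketch. It is a toy model of the behaviour as described in this post, not a measurement of OpenVMS, and the function names and the workload are hypothetical. It only counts how many blocks each scheme has to overwrite.

```python
# Toy cost model (illustration only, not OpenVMS code).
# Counts the blocks each scheme must overwrite with an erase pattern.

def erase_on_delete_cost(deleted_file_sizes):
    """SET VOLUME/ERASE_ON_DELETE: every block of every deleted file is erased."""
    return sum(deleted_file_sizes)

def highwater_cost(accesses_beyond_hwm):
    """HWM as described in this post: only the gap between the current
    highwater mark and a block accessed beyond it has to be zeroed.
    Each entry is a (hwm_block, target_block) pair."""
    return sum(max(0, target - hwm) for hwm, target in accesses_beyond_hwm)

# Hypothetical workload: three 1000-block files are deleted, and all new
# files are written strictly sequentially (nothing is read beyond the HWM).
print(erase_on_delete_cost([1000, 1000, 1000]))  # 3000
print(highwater_cost([]))                        # 0

# A badly behaved program reads block 500 of a file whose HWM is at block 100.
print(highwater_cost([(100, 500)]))              # 400
```

In this model the erase options always pay for every deleted block, while HWM pays only for the gaps that are actually touched.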
A crucible of informative mistakes
comarow
Trusted Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

Thanks.

I'm researching the behavior of HWM.
Wim Van den Wyngaert
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

If I remember correctly, the "erase rate" item of "mon fcp" indicates how many HWM erases are really done.

Wim
comarow
Trusted Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

Thanks.

I'm very interested in where you obtained the description of the behavior.

I examined
http://h71000.www7.hp.com/doc/732FINAL/aa-q2hlg-te/aa-q2hlg-te.HTMl

Which describes high water marking, but does not describe how it is accomplished.
Similarly, the Guide to System Performance
mentions how to turn off high water marking to improve performance. But it too does not describe the actual mechanics.

All the Performance Cookbooks recommend turning it off if not needed.

This does not mean you are wrong.

However, we did some testing.

Here are our results:


Bob, my testing shows that a file gets zeroed on allocation, but maybe there is another way to allocate blocks to a file that will not zero them.

What I did was fill DVA0 with some files with lots of text

then I deleted the file and turned on HWM

Then using an FDL file I created a file w/ a large allocation

DIR showed the file as 0/2849

Then I turned off HWM and did a SET FILE/END and DUMP/BLOCK=(S:2000,C:1) and I got all zeros back

So unless CREATE was doing something behind my back I never used VBN 2000 and it should have had random text in it


Perhaps you would like to repeat the test?

I'd be very interested in how you came to your conclusions.
Steven Schweda
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

> No. That shouldn't happen. [...]

When I started to add large-file support to
[Un]Zip, I quickly tired of having the system
go to sleep for minutes during the initial
allocation of a large output file on a disk
with highwater marking enabled. Adding the
code to set the SQO bit was the cure.

You can run the experiment using UnZip 5.52.
The SQO code is all in [.VMS]VMS.C, just
above the sys$create() call which creates the
output file(s). Simply Zip a 2GB (or so)
file, and then UnZip it. Disable the code
which sets the SQO bit, rebuild UnZip, and
run it again.

For even more impressive delays, you'd need
to switch to the not-yet-released BETA source
kits for Zip 3.0 and UnZip 6.0, as Zip 2.x
and UnZip 5.x are limited to files no larger
than 2GB.

For the original discussion on comp.os.vms,
see:

http://groups.google.com/group/comp.os.vms/browse_thread/thread/e40c7dbd70bab7c9/f8a01603c0ff49b3
comarow
Trusted Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

Hein,

Thank you so much for bringing this discussion to a higher level.


From a rehashed version of the VMS file system internals book (section 5.4.5):

"5.4.5 Dynamic Highwater Marking

Disk scavenging is a security problem where
...
VMS solves this problem with the combination of the two following techniques:


o Erase-on-allocate
o Highwater marking

Both are enabled when the highwater marking volume attribute is enabled with
the SET VOLUME/HIGHWATER command.

VMS maintains a highwater mark which indicates how far the file has been
written in its allotted space on the disk. All blocks in the file up to the highwater
mark are guaranteed to have been written since they were allocated to the file.
The user is not permitted to read beyond the highwater mark, and thus cannot
read stale data from the file.

Erase-on-allocate is the more costly but conservative technique. It is used when
the file is open, allowing any form of shared access or nonsequential access.
Erase-on-allocate, as its name implies, simply means erasing all disk blocks when
they are allocated to the file. The file's highwater mark is set to point to the end
of the newly allocated and erased space.

Highwater marking is used only when the file is open for write with exclusive
access in sequential-only mode. In this mode, the highwater mark is maintained
in memory and cannot be maintained across multiple nodes of a cluster with
acceptable performance (which is why access is limited to a single accessor)."



Thus, as far as I understand, erase on allocate only occurs on shared or random access, but not on sequential, private access.
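
The rule in the quoted section can be written down as a small decision function. This is a minimal Python sketch of the text above only; the function names are hypothetical, and the 512-byte block size is the standard OpenVMS disk block.

```python
# Sketch of the rule in the quoted section 5.4.5 (not OpenVMS code).

BLOCK_SIZE = 512  # bytes per disk block

def protection_technique(open_for_write, exclusive, sequential_only):
    """Return which technique the quoted text says is applied to a file."""
    if open_for_write and exclusive and sequential_only:
        return "highwater marking"
    return "erase-on-allocate"

def blocks_erased_on_allocation(technique, blocks_allocated):
    """Erase-on-allocate zeroes every newly allocated block up front;
    highwater marking erases nothing at allocation time."""
    return blocks_allocated if technique == "erase-on-allocate" else 0

two_gb_in_blocks = 2 * 1024**3 // BLOCK_SIZE  # 4194304 blocks

# Exclusive, sequential-only writer: nothing is erased at allocation.
t = protection_technique(True, True, True)
print(t, blocks_erased_on_allocation(t, two_gb_in_blocks))
# highwater marking 0

# Same allocation without the sequential-only flag: every block is erased.
t = protection_technique(True, True, False)
print(t, blocks_erased_on_allocation(t, two_gb_in_blocks))
# erase-on-allocate 4194304
```

Under this reading, a single 2 GB allocation without sequential-only access means zeroing about four million blocks before the first byte of data is written.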

Back to the original question, is there a performance penalty? The answer is, definitely maybe.

Have fun!

Steven Schweda
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

> Thus, as far as I understand, erase on
> allocate only occurs on shared or random
> access, but not on sequential, private
> access.

More or less. If the application sets the
SQO flag, strictly sequential access is
pretty painless. (As I recall, with SQO set,
non-sequential access fails with a run-time
error. I never tried any shared access with
SQO.) Without the SQO bit set, the resulting
erase-on-allocate behavior can be pretty
close to crippling (when a large allocation
is done).

As I learned the hard way, SQO is _not_ set
by default, and setting it can be a very good
idea.
Hein van den Heuvel
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

Comarow, good follow up and clarification. Thanks.

Steve. Ditto. The need for SQO keeps surprising me, but I suppose that's just the way it is.

Cheers,
Hein.

John Gillings
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

I'm not sure if I can explain Comarow's observations.

Here's an experiment I just tried:

1) SCSI disk with HIGHWATER enabled, DKB0
2) Create a reasonably large text file with a rotating alphabet pattern ~ 12,000 blocks
3) Delete the file
4) $ COPY NL: TEST.DAT/ALLOCATE=4000
5) $ SET FILE/END TEST.DAT
6) $ DUMP/BLOCK=(START:2000,COUNT:1) TEST.DAT

As expected, result is all zeros

7) $ SET VOLUME/NOHIGH DKB0
8) $ COPY NL: TEST.DAT/ALLOCATE=4000
9) $ SET FILE/END TEST.DAT
10) $ DUMP/BLOCK=(START:2000,COUNT:1) TEST.DAT

As expected the block contains my rotating alphabet.

11) $ SET VOLUME/HIGH DKB0
12) $ COPY NL: TEST.DAT/ALLOCATE=4000
13) $ DUMP/HEAD/BLOCK=COUNT:0 TEST.DAT

examine map area to determine LBN of VBN 2000 within the file => 96216

Now dump the LBN directly from the disk - this bypasses any HWM processing.

14) $ DUMP/BLOCK=(START:96216,COUNT:1) DKB0:

As expected, block contains my rotating alphabet.

15) $ SET FILE/END TEST.DAT
16) $ DUMP/BLOCK=(START:96216,COUNT:1) DKB0:

Block now contains zeros. The SET FILE/END has pushed the EOF and HWM to the last allocated block, forcing the OS to zero out all the blocks in between. This demonstrates that the zeroing has NOT happened on allocation, it only happened when the EOF was moved without writing data.
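
Steps 11 to 16 can be replayed against a toy model. The sketch below is Python, not OpenVMS code: ToyVolume and ToyFile are hypothetical names, the file is assumed to be one contiguous extent, and the model encodes only the lazy zeroing observed in this experiment. It ignores the erase-on-allocate case from the internals book excerpt quoted earlier in the thread.

```python
# Toy replay of steps 11-16 (a model of the observed behaviour only).

class ToyVolume:
    """A disk as a list of blocks; the list index plays the role of the LBN."""
    def __init__(self, blocks):
        self.blocks = list(blocks)

class ToyFile:
    """A file with one contiguous extent, an EOF and a highwater mark."""
    def __init__(self, volume, first_lbn, allocated):
        self.volume = volume
        self.first_lbn = first_lbn
        self.allocated = allocated
        self.eof = 0   # last VBN in use
        self.hwm = 0   # last VBN known to be written or zeroed

    def lbn_of(self, vbn):
        return self.first_lbn + vbn - 1  # VBNs start at 1

    def set_eof_to_end(self):
        """Like SET FILE/END: EOF moves without data being written, so the
        gap between the old HWM and the new EOF is zero-filled first."""
        for vbn in range(self.hwm + 1, self.allocated + 1):
            self.volume.blocks[self.lbn_of(vbn)] = "ZERO"
        self.eof = self.hwm = self.allocated

    def read(self, vbn):
        if vbn > self.eof:
            raise ValueError("attempt to read beyond EOF")
        return self.volume.blocks[self.lbn_of(vbn)]

# Stale data left behind by the deleted "rotating alphabet" file.
volume = ToyVolume(["ALPHABET"] * 100)
f = ToyFile(volume, first_lbn=10, allocated=40)   # allocate, write nothing

lbn = f.lbn_of(20)
before = volume.blocks[lbn]   # raw LBN read bypasses the file system
f.set_eof_to_end()
after = volume.blocks[lbn]
print(before, after)          # ALPHABET ZERO
```

The raw read before the EOF move still sees the old contents, which is what step 14 showed on the real disk.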

The only explanation I can give for the behaviour seen by Steven is somehow the ZIP code is READING a high VBN within the newly created file. If all it does is write at the EOF there should be no unnecessary writing of zeros.

As Hein said earlier, for a "well behaved" application, there should be no significant overhead for HWM. It's only applications which do nasty things like SET FILE/END which suffer.

Steven, perhaps you could send me a tiny example program which demonstrates the SQO and non-SQO behaviour?
A crucible of informative mistakes
Steven Schweda
Honored Contributor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

> [...] perhaps you could send me a tiny
> example program which demonstrates the SQO
> and non-SQO behaviour?

I don't see a small test case in my code
pile. As I said, UnZip 5.52 (source for
which everyone should already have) was the
original test case. (If you run the
experiment on UnZip, be sure to disable _all_
the SQO-setting code in VMS.C, as there are
multiple (4?) instances.)

Since being informed of the SQO miracle flag,
I've been trying to set it every time I get
a chance, as it makes such a big difference
with large files. UnZip is a particularly
good candidate, as it knows the output file
size before it creates it, so it can (and
does) allocate the whole thing in one shot.
Sadly, before 5.52, this could cause a disk
seizure for minutes at a time on a large
allocation.

You're welcome to look, but I believe that
UnZip does nothing other than sys$create(),
perhaps sys$extend(), and sys$connect(),
then strictly sequential writes. As I
recall, the choke-hold on the disk occurred
early, at the sys$create().
Jan van den Boogaard
Frequent Advisor

Re: Pros and cons of HIGHWATERMARKING and INIT/ERASE

Thanks for all the input!

After reading the answers and after searching the docs for "highwater" AND ALSO for "high-water" (with hyphen!) my own conclusion is:

A lot depends on how applications behave.

Highwatermarking can have a considerable overhead when the "erase-on-allocate" feature is triggered.

When a new disk is initialized with INIT/ERASE, and SET VOL/ERASE is used right from the start of the volume's life, highwater marking is of no use, because data is erased when it is deleted or purged. So in a situation where files are seldom deleted or purged, the combination of INIT/ERASE and SET VOL/ERASE/NOHIGH seems sensible to me.