Operating System - OpenVMS

File copy versus disk cluster size

 
John Symmonds
New Member

File copy versus disk cluster size

When I copy a file on our VMS 7.3-2 system from DSKA (cluster size 69) to DSKB (cluster size 9) and back again, the file size (blocks used) changes. I'm not sure why this happens, but is there any way I can restore the file to its original condition after copying?

$Dir/size=all DSKA:[xx]temp.fil
TEMP.FIL;1 2001/2001

$Copy DSKA:[XX]TEMP.FIL DSKB:[XX]TEMP.FIL
$DIR/SIZE=ALL DSKB:[XX]TEMP.FIL
TEMP.FIL;1 2001/2007

$Copy DSKB:[XX]TEMP.FIL DSKA:[XX]TEMP2.FIL
$DIR/SIZE=ALL DSKA:[XX]TEMP2.FIL
TEMP2.FIL;1 2007/2070

TEMP.FIL info:
File organization: Relative, maximum record number: 2147483647
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 2001, Extend: 0, Bucket size: 2
Global buffer count: 0, No version limit, Contiguous
Record format: Fixed length 512 byte records
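
For reference, a volume's cluster size can be confirmed with SHOW DEVICE/FULL, e.g.:

$ SHOW DEVICE/FULL DSKA:

The cluster size should appear in the volume information section of the output.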

Thanks,
John
Steven Schweda
Honored Contributor

Re: File copy versus disk cluster size

> File organization: Relative, [...]

Hmmm. I don't do much with these.

What does BACKUP do?
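
For example, a straight file-to-file BACKUP (file specs taken from your post):

$ BACKUP DSKA:[XX]TEMP.FIL DSKB:[XX]TEMP.FIL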
John Symmonds
New Member

Re: File copy versus disk cluster size


With BACKUP:
$DIR/SIZE=ALL DSKA:[XX]TEMP2.FIL
TEMP2.FIL;1 2001/2070

This seems better. Is COPY broken or should I have expected this behaviour?

Hoff
Honored Contributor

Re: File copy versus disk cluster size

Is this box reasonably current on its OpenVMS Alpha V7.3-2 patches? If it is not, do start there.

A DUMP /HEADER /BLOCK=END=0 and a DIRECTORY /FULL before and after the copy, too, please?
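
For instance (file spec from the original post):

$ DUMP /HEADER /BLOCK=END=0 DSKA:[XX]TEMP.FIL
$ DIRECTORY /FULL DSKA:[XX]TEMP.FIL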

And the obvious question, why do you care? Some sort of a data integrity check in use locally? Any particular untoward behavior noted? Sheer curiosity?
John Symmonds
New Member

Re: File copy versus disk cluster size


All the latest patches as of mid-July 2009 have been applied.

The reason that it matters is that the program which uses this file uses the 'blocks used' value in its index calculation, so it could encounter errors if that number changes. We are looking into migrating our disk storage to a SAN, which will probably have different disk cluster sizes.

I've attached a text file which has the Dir/full and Dump/header output for all 3 files.

Thanks,
John
Jess Goodman
Esteemed Contributor

Re: File copy versus disk cluster size

The used-block count from DIRECTORY comes from the RMS file attribute EBK (end-of-file block), and of course there is a companion attribute, FFB (first free byte).

The RMS manual says this:

XAB$L_EBK Field

The XAB$L_EBK field is meaningful for sequential files only.

XAB$W_FFB Field

The XAB$W_FFB field is meaningful for sequential files only.
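
Both values can also be read from DCL with the F$FILE_ATTRIBUTES lexical, e.g. (file spec from the original post; "EOF" is the used-block count, "FFB" the first free byte):

$ WRITE SYS$OUTPUT F$FILE_ATTRIBUTES("DSKA:[XX]TEMP.FIL","EOF")
$ WRITE SYS$OUTPUT F$FILE_ATTRIBUTES("DSKA:[XX]TEMP.FIL","FFB")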
John McL
Trusted Contributor

Re: File copy versus disk cluster size

2001 is a multiple of 69 blocks (29 x 69), and the end-of-file marker is notionally the start of the next block.

2007 is a multiple of 9 blocks (the cluster size on that disk), and the end-of-file marker is given as block 2002.

Copying it back to the first disk is then seen as copying 2002 blocks, which according to the cluster size needs 2070 blocks (the next multiple of 69).
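
The rounding is just integer arithmetic; a quick DCL sketch (symbol names are made up):

$ SIZE = 2002
$ CLUSTER = 69
$ ALLOC = ((SIZE + CLUSTER - 1) / CLUSTER) * CLUSTER  ! DCL division truncates
$ SHOW SYMBOL ALLOC  ! ALLOC = 2070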

I think the problem lies in having records of 512 bytes, because this forces the EOF marker to refer to the next block, which may or may not actually exist.
Hein van den Heuvel
Honored Contributor

Re: File copy versus disk cluster size

RMS _does_ maintain the EBK for relative files.
If the file is extended by RMS for record processing (COPY uses block processing), then the EOF is set to the allocated block minus one, and on the extend all new buckets are initialized to make sure no data magically appears.

The reason for the growth itself is easy, right? Just a matter of rounding the allocation up to each disk's cluster size.
But the transition from 2001/2007 to 2007/2070 looks very suspect. If High Water Marking is not enabled, then that may have given the file a few more records than it had.

I'd be tempted to check with DUMP/RECO=(COUNT=1,START=1001) on the original and final.

I would also get onto an 8.3 system (EISNER?) to repeat the experiment, but I have no time right now to do it properly.

For proper experiments, just use an MD or LD device with selected cluster sizes. Perhaps disable HWM, pre-fill the space with a pattern (PERL, CONVERT/PAD, whatever), delete the file to get the dirty space next, and then do the copy experiment.

Using the EOF is relatively scary, as you can only use that when also taking the bucket size and record size into consideration.

This file is really SILLY... each 512-byte record is prepended with a 1-byte flag, so at most 1 record fits per 2-block bucket --> 50% waste and 1 IO per record.

Please consider / experiment with a convert to a bucket size of 8 or 16 or so, but test extensively, as the EOF calculations will change and CONVERT 'packs' records.
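
A hypothetical sketch of such a convert, using an inline FDL string as in the demo further down (the bucket size and the output name TEMP_B16.FIL are only examples):

$ CONVERT/FDL="FILE; ORG REL; BUCKET_SIZE 16; RECORD; FORMAT FIXED; SIZE 512" -
      TEMP.FIL TEMP_B16.FIL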

Hope this helps some,
Hein.
John McL
Trusted Contributor

Re: File copy versus disk cluster size

What effect does $ SET FILE/ATTRIB=(EBK:2002,FFB:0) have on the file?

(It would be a good idea to take a copy of the file and test this command on the copy.)

If it resets the two fields in the file header (seen with DUMP/HEADER), then, if the size hasn't been trimmed immediately, you might try SET FILE/TRUNCATE.

I think the question is whether COPY is happy with a relative file finishing precisely on a block (and cluster) boundary.


Hein van den Heuvel
Honored Contributor
Solution

Re: File copy versus disk cluster size

Actually... I spoke too soon.

The true EOF for a relative file is maintained in the header. You can see that easily enough with $ ANAL/RMS/INT.

RMS actually 'fixes' the file system EOF to match this when opening the file shared.

>> but is there any way I can restore the file back to original condition after copying it?

Yes!

Watch this (8.3):

$ convert/fdl="fil; org rel; rec; for fix; siz 512"/pad/trun tt: tmp.old
aap
noot
mies
$
$ mcr sysman io connect mda1 /driver=sys$mddriver/noadap
$ ini mda1: hein/clus=23/size=5000/max=20/nohigh
$ mou mda1: hein
$ copy/log tmp.old mda1:[000000]tmp.tmp
%COPY-S-COPIED, TMP.OLD;1 copied to MDA1:[000000]TMP.TMP;1 (9 blocks)
$ copy/log mda1:[000000]tmp.tmp sys$login:tmp.new
%COPY-S-COPIED, MDA1:[000000]TMP.TMP;1 copied to TMP.NEW;1 (23 blocks)
$ dir/size=all tmp.old;,.new;,MDA1:[000000]TMP.TMP;

TMP.NEW;1 23/24
TMP.OLD;1 9/9
Directory MDA1:[000000]
TMP.TMP;1 9/23

$ open/read/write/share=write new tmp.new
$ close new
$ dir /size=all tmp.new;
TMP.NEW;1 9/24
$ set file/trun tmp.new
$ dir /size=all tmp.new;
TMP.NEW;1 9/9


Also, the RMS default for creating a 512-byte fixed-length-record relative file is a 2-block bucket. So RMS is silly as well. Its excuse is that it cannot read your mind. It does not know how the file will be used. You should.

Those 512-byte records more often than not prove that a little knowledge is a dangerous thing, or that 'some folks know just enough to be dangerous'. They know about 512-byte blocks, and will 'fill out' a record to a 'nice' 512. Why? In this example, due to the flag byte, it becomes horrible.

Hope this helps better,
Hein.
Hein van den Heuvel
Honored Contributor

Re: File copy versus disk cluster size

Argh... I wrote 'header' but intended 'prologue'.
That would be the 'internal header', or VBN 1.

John McL, relative files are organized in BUCKETS, and therefore it is utterly irrelevant whether the record size has a special value.

Yes, you can SET FILE/ATTR=EBK=xxx for a relative file.
RMS itself will ignore it and read up until the prologue EOF.
If you then truncate, RMS will silently stop reading at the HIGH (allocated) block.

ANAL/RMS will complain. For example:
$ dir/size=all tmp.new;
TMP.NEW;1 2/6
$ anal/rms tmp.new
:
End-of-File VBN: 10
Prolog Version: 1
*** Attempt to read block with invalid VBN 6.
Unrecoverable error encountered in structure of file.


Hein.
John Symmonds
New Member

Re: File copy versus disk cluster size


Ok here's what I've learned so far:

1) We should use BACKUP to copy files like this whenever possible.
2) Our file is not very well thought-out.
3) RMS fixes the file system EOF when the file is opened for shared access.
4) I've got a lot of reading to do in my spare time to understand some of this.

Also, I have to spend some time to understand how this file is used in our system.
Thanks a LOT for all the info.

Cheers,
John