INIT/SHADOW vs INIT off by one error?

SOLVED
Jon Pinkley
Honored Contributor


I am using Alpha VMS 7.3-2 with updates current as of May 6, 2007.

I was initializing two devices to be used as a HBVS shadowset, and used the following command:

$ init /shadow=($4$dkc207:,$1$dga3207:)/erase/system/header=1000/cluster=8/limit/index=begin/user=ils/own=[1,1] syslog_22

$4$dkc207: is an RZ29B (4.3 GB) on an HSZ40 controller pair, $1$DGA3207: is a 5 GB vdisk on an EVA6000.

I expected the logical volume size of the shadowset to be the size of the smaller device, in this case the RZ29B.

However, the logical volume size is 1 block smaller than the size of the RZ29B.

If I use init on the RZ29B without /shadow, then create a shadowset with the member, it is the full size.

Is this an off by one coding error in INIT/SHADOW (using the highest LBN instead of number of blocks) when setting the size of the device?

I am not worried about the loss of one block of disk space; in fact with a clustersize of 8 it makes no difference in number of available clusters, since 8378028 is not a multiple of 8.
However, it seems odd to me that the logical volume size is different based on whether /shadow is used.
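As a quick check of the arithmetic above, here is a minimal Python sketch. The block counts are the ones quoted in this post; the helper names are illustrative only and are not VMS routines.

```python
# Block counts quoted in the post; the helper names are hypothetical.
RZ29B_BLOCKS = 8378028      # expected logical volume size (number of blocks)
OBSERVED_BLOCKS = 8378027   # size actually reported after INIT/SHADOW
CLUSTER_SIZE = 8            # from /cluster=8


def highest_lbn(block_count: int) -> int:
    """LBNs are numbered from 0, so the last LBN is block_count - 1."""
    return block_count - 1


def whole_clusters(block_count: int, cluster_size: int) -> int:
    """Number of complete clusters that fit in block_count blocks."""
    return block_count // cluster_size


print(highest_lbn(RZ29B_BLOCKS))
print(whole_clusters(RZ29B_BLOCKS, CLUSTER_SIZE),
      whole_clusters(OBSERVED_BLOCKS, CLUSTER_SIZE))
```

The observed size equals the highest LBN of the RZ29B, which is what an off-by-one would produce, and both sizes yield the same number of whole 8-block clusters.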

And I tried the obvious solution:

Neither

$ set volume/size dsa3207:

nor

$ set volume/size=8378028 dsa3207:

changed the logical size from 8378027 to 8378028, even with dismounting/remounting.

So it looks to me like there are two inconsistencies, since the SET VOLUME command meets all the requirements listed in the help for SET VOLUME/SIZE. I noticed this earlier in another adventure with V7.3-2 BACKUP (id "AXP72R001") not preserving /limit or /size when using backup/image/noinit.

I am attaching a log file demonstrating my findings.
it depends
18 REPLIES
Jan van den Ende
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Jon,

it looks odd, but I can well live with it, except
>>>
I noticed this earlier in another adventure with V7.3-2 BACKUP (id "AXP72R001") not preserving /limit or /size, when using backup/image/noinit.
<<<
Do you mean, you INITed the volume /LIMIT=..
and then BACKUP/NOINIT does _NOT_ leave your /LIMIT in the target;
or do you mean
you have a BACKUP/INIT of a volume with /LIMIT, and upon restore /NOINIT that value gets lost?
I have not yet been in a position to do either, but if it is confirmed, that would be a nasty, and potentially troublesome bug indeed!!

Please clarify.

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.
Volker Halle
Honored Contributor
Solution

Re: INIT/SHADOW vs INIT off by one error?

Jon,

I've tried this on a simple LD device with OpenVMS Alpha V8.2 and I don't see this behaviour:

$ LD CONN file LDA1:
$ INIT LDA1: TEST/LIMIT
$ MOUNT/OVER=ID LDA1:
$ SHOW DEV/FULL LDA1:

and same sequence with

$ INIT TEST/SHAD=(LDA1:)/LIMIT

In both cases, the Total Blocks and Logical Volume Size exactly match the size of the LD container file.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Could it have something to do with the /erase? You didn't use /erase on the single-disk init. And there is an error on the disk. Could it be a bad block that is subtracted from the size while erasing?

Wim (without anything higher than 7.3, so no test env)
Wim
Jon Pinkley
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Volker,

If it works in 8.2 then it must have been fixed.

Here's a small reproducer with LD v9 on 7.3-2

$ ld create disk$archive:[000000]disk1.dsk/size=10000/contig/nobackup
$ ld create disk$archive:[000000]disk2.dsk/size=11000/contig/nobackup
$ ld conn disk$archive:[000000]disk1.dsk lda1/share
$ ld conn disk$archive:[000000]disk2.dsk lda2/share
$ sho dev ld

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$4$LDA0:      (SIGMA)   Online               0
$4$LDA1:      (SIGMA)   Online               0
$4$LDA2:      (SIGMA)   Online               0
$ init/shad=($4$lda1:,$4$lda2:)/erase itrc/cluster=1/system/index=begin/own=[1,1]/user=itrc
$ mou/sys/noassist dsa999: /shadow=($4$lda1:,$4$lda2:) itrc
%MOUNT-I-MOUNTED, ITRC mounted on _DSA999:
%MOUNT-I-SHDWMEMSUCC, _$4$LDA1: (SIGMA) is now a valid member of the shadow set
%MOUNT-I-SHDWMEMSUCC, _$4$LDA2: (SIGMA) is now a valid member of the shadow set
$ sho dev/ful dsa999

Disk DSA999:, device type Generic SCSI disk, is online, mounted, file-oriented
device, shareable, available to cluster, error logging is enabled, device
supports bitmaps (no bitmaps active).

Error count                    0    Operations completed                  25
Owner process                 ""    Owner UIC                       [SYSTEM]
Owner process ID        00000000    Dev Prot             S:RWPL,O:RWPL,G:R,W
Reference count                1    Default buffer size                  512
Total blocks               10000    Sectors per track                     10
Total cylinders              100    Tracks per cylinder                   10
Logical Volume Size         9999    Expansion Size Limit               12288

Volume label              "ITRC"    Relative volume number                 0
Cluster size                   1    Transaction count                      1
Free blocks                 9966    Maximum files allowed               2750
Extend quantity                5    Mount count                            1
Mount status              System    Cache name         "_$4$DKA407:XQPCACHE"
Extent cache size             64    Maximum blocks in extent cache       996
File ID cache size            64    Blocks in extent cache                 0
Quota cache size               0    Maximum buffers in FCP cache        4475
Volume owner UIC        [FILSYS]    Vol Prot     S:RWCD,O:RWCD,G:RWCD,W:RWCD

Volume Status: ODS-2, subject to mount verification, erase on delete, file
high-water marking, write-back caching enabled.

Disk $4$LDA1:, device type Generic SCSI disk, is online, member of shadow set
DSA999:, shadow set virtual unit.

Error count 0 Shadow member operation count 141
Allocation class 4

Disk $4$LDA2:, device type Generic SCSI disk, is online, member of shadow set
DSA999:, shadow set virtual unit.

Error count 0 Shadow member operation count 145
Allocation class 4

$
it depends
Jon Pinkley
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Jan,

In VMS 7.3-2 you can initialize a volume with a /SIZE less than the physical size of the device. You would normally do this so you could shadow with a smaller device. In my case, I wanted to shadow an EVA vdisk that has 1GB size granularity with an RZ29B, so I initialized the vdisk with /SIZE=8378028 and /LIMIT. I then did a backup/image/noinit from the RZ29 to the vdisk, planning to convert the vdisk into a shadowset, and then add the RZ29 back into the shadowset with copy.

However, that backup (with VMS 7.3-2) results in the warning
%BACKUP-I-LOGNOTPRES, logical volume size of volume $1$DGA3205: not preserved
%BACKUP-I-LIMITNOTPRES, expansion size limit of volume $1$DGA3205: not preserved

I came up with a workaround, thanks to Jur's LDDRIVER:

$ ld connect $1$DGA3205 /LBN=(start:0,size:8378028) LDA3205:

Then init LDA3205: with the desired cluster size etc., followed by backup/image/noinit to LDA3205: (BACKUP still doesn't preserve the size, but the size is now enforced by LDDRIVER). After the backup is complete: dismount LDA3205:, mount LDA3205: privately, set volume/limit lda3205:, dismount LDA3205:, and disconnect LDA3205:. Then mount DGA3205: and it is nearly how you would like it. You are still limited in how large a value MAX_FILES can be set to.

The best solution is to upgrade to 8.3, where BACKUP has been taught about dynamic volume expansion. However, we still have some products that aren't ready for 8.3.

I am attaching a log file showing a reproducer using the LDA1 and LDA2 from my response to Volker.
it depends
Jon Pinkley
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Wim,

I don't believe it has anything to do with errors on the SCSI devices. The same thing happens on an LD device; see my response to Volker.

The errors on the SCSI devices have to do with shared SCSI and the HSZ controllers; they are "normal" for us, especially when we shut down or reboot a node on the shared SCSI bus.

BTW, I just inited the device without /shadow and then added in the other device and let the shadowing synchronize. These were "small" 4.3 GB devices, so it doesn't take two days.
it depends
Volker Halle
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Jon,

I've now repeated your test on V8.2 and ... the error is still there !

$ ld create disk1.dsk/size=10000/contig/nobackup
$ ld create disk2.dsk/size=11000/contig/nobackup
$ ld conn disk1.dsk lda1/share
$ ld conn disk2.dsk lda2/share
$ init/shad=(lda1:,lda2:)/erase itrc/cluster=1/system/index=begin
%INIT-I-LIMITCHANGED, value for /LIMIT increased to 1048576
%INIT-I-LIMITCHANGED, value for /LIMIT increased to 1048576
$ mou/sys/noassist dsa999: /shadow=($1$lda1:,$1$lda2:) itrc
%MOUNT-I-MOUNTED, ITRC mounted on _DSA999:
%MOUNT-I-SHDWMEMSUCC, _$1$LDA1: (AXPVMS) is now a valid member of the shadow set
%MOUNT-I-SHDWMEMSUCC, _$1$LDA2: (AXPVMS) is now a valid member of the shadow set
$ sho dev/full dsa999

Disk DSA999:, device type Foreign disk type 1, is online, mounted, file-oriented
device, shareable, available to cluster, error logging is enabled, device
supports bitmaps (no bitmaps active).

Error count                    0    Operations completed                  25
Owner process                 ""    Owner UIC                       [SYSTEM]
Owner process ID        00000000    Dev Prot             S:RWPL,O:RWPL,G:R,W
Reference count                1    Default buffer size                  512
Total blocks               10000    Sectors per track                     10
Total cylinders              100    Tracks per cylinder                   10
Logical Volume Size         9999    Expansion Size Limit             1048576

So this problem is introduced by some combination and values of qualifiers:

If I just do an INIT/SHADOW=(LDA1:), then I get the correct Logical Volume Size !

Volker.
Volker Halle
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Jon,

same problem on OpenVMS Alpha V8.3 with ALL current patches installed.

So it seems to be some bug in the INIT/SHADOW code. Worth a low-priority call to HP...

Volker.
Jan van den Ende
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Jon,

thanks for finding this out!

Now that I know about it, we can work around it if needed, but I REALLY would hate to have to find out in a real recovery situation!
For the foreseeable future we are stuck on 7.3-2, so we will have to keep this in the back of our heads.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Robert Brooks_1
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Please formally log a call with HP.

I'm pretty sure that the shadowing engineer has already fixed this problem for V8.3-1H1, but I'm not sure if it has made it back into older versions.


-- Rob
Jon Pinkley
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

---------------------------
Volker>>>
So this problem is introduced by some combination and values of qualifiers:

If I just do an INIT/SHADOW=(LDA1:), then I get the correct Logical Volume Size !

Volker<<<
---------------------------

I just repeated the reproducer but used two LD devices that were the same size.

In this case, the logical volume size is created correctly. So it seems that the trigger is the dissimilar size, rather than having more than a single device in the /shadow list.

$ ld create disk$archive:[000000]disk1.dsk/cont/size=10000/noback
$ ld create disk$archive:[000000]disk2.dsk/cont/size=10000/noback
$ ld connect disk$archive:[000000]disk1.dsk lda1:/share
$ ld connect disk$archive:[000000]disk2.dsk lda2:/share
$ init/shadow=(lda1:,lda2:)/erase/cluster=1/own=[1,1]/index=begin/limit=100000/headers=1000 itrc
$ mou/system dsa999: /shadow=($4$lda1:,$4$lda2:) itrc
%MOUNT-I-MOUNTED, ITRC mounted on _DSA999:
%MOUNT-I-SHDWMEMSUCC, _$4$LDA1: (SIGMA) is now a valid member of the shadow set
%MOUNT-I-SHDWMEMSUCC, _$4$LDA2: (SIGMA) is now a valid member of the shadow set
$ sho dev/ful dsa999

Disk DSA999:, device type Generic SCSI disk, is online, mounted, file-oriented
device, shareable, available to cluster, error logging is enabled, device
supports bitmaps (no bitmaps active).

Error count                    0    Operations completed                  25
Owner process                 ""    Owner UIC                       [SYSTEM]
Owner process ID        00000000    Dev Prot             S:RWPL,O:RWPL,G:R,W
Reference count                1    Default buffer size                  512
Total blocks               10000    Sectors per track                     10
Total cylinders              100    Tracks per cylinder                   10
Logical Volume Size        10000    Expansion Size Limit               12288

Rest removed by Jon
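The results reported so far in this thread (10000 and 11000 blocks give 9999, 10000 and 10000 give 10000, a single member gives the full size) fit a simple rule. The following Python sketch only models what SHOW DEVICE reported here; it is not taken from the INIT source, and the function name is made up for illustration.

```python
def observed_logical_size(member_blocks):
    """Model of the Logical Volume Size seen in this thread after INIT/SHADOW.

    Assumption (inferred from the reproducers, not from INIT's code):
    when the members differ in size, the result is one block less than
    the smallest member; otherwise it is the full member size.
    """
    smallest = min(member_blocks)
    if len(set(member_blocks)) > 1:   # dissimilar member sizes
        return smallest - 1
    return smallest


print(observed_logical_size([10000, 11000]))  # dissimilar LD devices
print(observed_logical_size([10000, 10000]))  # identical LD devices
print(observed_logical_size([10000]))         # single-member /SHADOW list
```

Whether the rule holds for other size combinations has not been tested in this thread.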

Another annoyance is that the volumes are processed sequentially, but are not allocated at the start. So the following can happen:

$ sho dev ld

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$4$LDA0:      (SIGMA)   Online               0
$4$LDA1:      (SIGMA)   ShadowSetMember      0  (member of DSA999:)
$4$LDA2:      (SIGMA)   ShadowSetMember      0  (member of DSA999:)
$ dism dsa999
$ mou $4$lda2:/ov=id
%MOUNT-W-VOLSHDWMEM, mounting a shadow set member volume; volume write locked
%MOUNT-I-MOUNTED, ITRC mounted on _$4$LDA2: (SIGMA)
$ init/shadow=(lda1:,lda2:)/erase/cluster=1/own=[1,1]/index=begin/limit=100000/headers=1000 itrc2
%SYSTEM-F-DEVMOUNT, device is already mounted
$ mou/ov=id lda1
%MOUNT-W-VOLSHDWMEM, mounting a shadow set member volume; volume write locked
%MOUNT-I-MOUNTED, ITRC2 mounted on _$4$LDA1: (SIGMA)
$

Note that the error did not occur until after the first device had been erased/initialized (evidenced by the changed label).

The problem is that it still erased and inited the first volume, and didn't fail immediately. So, for example, if the devices were 1 TB in size instead of 10000 blocks, you wouldn't know about the problem until a long time after the init command had been issued. In fact, it appears that you could even start using a device in the list for something else and, in the worst case, finish using it and dismount it before the device was initialized.

Example data loss scenario.

Proc1$ init/shadow=('TBdev1','TBdev2')/erase BIGHBVS 'other_init_qualifiers'

While 'TBdev1' is being initialized/erased something else starts to use 'TBdev2'. The worst case is if data is written to it, and then the device is dismounted, all while 'TBdev1' is still being erased.

Example follows:

Proc2$ init 'TBdev2' archive
Proc2$ mou/ov=id 'TBdev2'
Proc2$ backup disk1:[000000.important...] disk$archive:[*...]*.*;*/own=orig/ver/delete
Proc2$ dismount 'TBdev2'

Some time later, the init/erase of 'TBdev1' completes, and the init/erase of 'TBdev2' starts. All the data written to the 'TBdev2' is overwritten.

While the above is not a likely scenario, it shouldn't be possible, i.e. the init command should allocate every device in the list, and fail if it cannot, before it starts to do any writing.
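The "allocate everything first, then write" behaviour argued for here can be sketched in Python. Everything in this sketch is hypothetical: the `in_use` set merely stands in for VMS device allocation, and `initialize` is a caller-supplied placeholder, not a real VMS interface.

```python
class DeviceBusy(Exception):
    """Raised when at least one requested device cannot be reserved."""


def init_shadow_members(devices, in_use, initialize):
    """Reserve every device up front; write to none if any is unavailable.

    devices    -- names of the prospective shadow set members
    in_use     -- set of device names currently reserved by anyone
    initialize -- callable that erases/initializes a single device
    """
    busy = [d for d in devices if d in in_use]
    if busy:
        # Fail before the first write, not after the first member is erased.
        raise DeviceBusy(", ".join(busy))
    in_use.update(devices)                   # hold all members for the duration
    try:
        for dev in devices:
            initialize(dev)
    finally:
        in_use.difference_update(devices)    # release them afterwards


done = []
try:
    init_shadow_members(["LDA1", "LDA2"], {"LDA2"}, done.append)
except DeviceBusy as exc:
    print("refused, nothing written; busy:", exc)
print(done)
```

With this ordering, the case where the second device is already mounted is rejected before the first device has been touched.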
it depends
Volker Halle
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Jon,

the INIT/SHADOW problem, where the logical volume size was off by one, has been solved in V7.3-2 and higher. The solutions will appear in some future ECO kit.

Volker.
Hoff
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

FWIW, and now largely lost in the mists of time, this particular (mis?)feature was deliberately introduced, and intended to allow volumes initialized with /SHADOW to be differentiated from other volumes. It's obviously not particularly useful. And yes, I know who implemented this.
Robert Brooks_1
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

It's obviously not particularly useful. And yes, I know who implemented this.

====

I have an idea who did it, and if I'd known about it at the time, I'd have "appealed to a higher authority" to have it removed . . .

-- Rob
Jon Pinkley
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

"this particular (mis?)feature was deliberately introduced, and intended to allow volumes initialized with /SHADOW to be differentiated from other volumes."

Since it isn't a particularly harmful feature, I can see how it has remained dormant for a while. What I don't understand is why it only "flags" the volumes when they are different sizes. Volker discovered that when he was unable to reproduce my problem using two identically sized LD devices. My point is that it doesn't seem to indicate that the /SHADOW qualifier was used; it seems to indicate that INIT/SHADOW was used to initialize a set of disks that did not all have the same MAXBLOCK value.

If knowing that the devices were initialized with INIT/SHADOW is useful (How?), then I would have expected a flag bit in something in the homeblock or storage control block. For example a bit in HM2$W_VOLCHAR.
it depends
Hoff
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

The max-1 technique is non-volatile and it is propagated past volume dismount and remount (and reboot), and it doesn't require having the shadowing driver have knowledge of the file structure.

More recent disks can have non-volatile storage out in the drive reserved for the driver or the OS (or for malware); storage outside of the file system. Something which would obviate the max-1, but does require devices to have the storage -- and not all do. Both SAS and SCSI devices can -- but not necessarily do -- have this storage. There's a mechanism that can allow this on IDE widgets per the T10 specs, but it's not transportable.

There are drivers that know the innards of the volume structure, and this is a design decision. In some ways it makes the driver somewhat easier, but it also means the driver has to have added knowledge.

There are cases of this "carnal volume knowledge" within the EFI console that show these basic trade-offs; where the console "knows" the disk structures, and strange and (initially) mysterious changes can ensue.

Jon Pinkley
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

Ok, I'll take the bait. Where is this max-1 value stored if not in SCB$L_VOLSIZE? I thought the driver was able to determine the MAXBLOCK from the hardware, for example on a SCSI device with READ CAPACITY and sector size; but where could the driver get the "logical size" information if not from the SCB? Is this max-1 technique doing something different than what initialize /size does?

If I initialize a 100,000 block LD device using the command:

$ init lda2: fifty /cluster=1 /size=50000 /system

then mount/over=id, I see the following in output of show device /full lda2:

Total blocks              100000    Sectors per track                     18
Total cylinders              327    Tracks per cylinder                   17
Logical Volume Size        50000    Expansion Size Limit              102400

If I mount the same device /foreign, and do a show device/full it has no notion of logical volume size, since that is defined by ODS in the storage control block (SCB).

As far as I know, the SCB is "non-volatile and it is propagated past volume dismount and remount (and reboot)", but it DOES "require having the shadowing driver have knowledge of the file structure." at least to the extent of knowing how to find the SCB.

And the shadowing driver already has to know about the SCB, since it keeps things like SCB$Q_GENERNUM there. In fact, I don't even think the shadowing driver will allow you to mount a DSA device/foreign, given that to do an image restore you must restore to a member, then reform a single member shadow set with the restored device as the master member, followed by the addition of other shadow set members with full copy.
it depends
Jon Pinkley
Honored Contributor

Re: INIT/SHADOW vs INIT off by one error?

[V83.INIT.LIS]INIVOLEXEC.LIS has this comment:

"If dissimilar devices are specified with /SHADOW, make sure that /SIZE is less than UCB$L_MAXBLOCKS so that INIDSK will always create the same geometry for all the members."

Evidently, a workaround for the potential geometry problem has been found since October 2003.
---------
My other observation was that the devices were not all allocated before starting the initializations.

Can anyone think of a reason that you would not want all the devices allocated for the duration of the init?

As for allocating all the devices before initializing, it appears the only workaround to safely use init/shadow is to wrap the init in explicit allocate commands before the initialize and explicit deallocate commands after it.

For example:

$ allocate dev1
$ allocate dev2
$ allocate dev3
$ allocate dev4
$ allocate dev5
$ allocate dev6
$ initialize /shadow=(dev1,dev2,dev3,dev4,dev5,dev6) /erase ...
$ deallocate dev1
$ deallocate dev2
$ deallocate dev3
$ deallocate dev4
$ deallocate dev5
$ deallocate dev6

Jon
it depends