
Re: Result of MVTIMEOUT expiring.

 
The Brit
Honored Contributor

Result of MVTIMEOUT expiring.

Just in the interest of clarity.

If a unit in a shadow set has a problem for some reason and the shadow set ends up in a MountVerify state, what happens to the shadow set when MVTIMEOUT expires?

1. Does the problem unit dismount?
2. Does the shadow set dismount?
3. Does the shadow set just go offline?
4. Other?

thanks

Dave.
Robert Brooks_1
Honored Contributor

Re: Result of MVTIMEOUT expiring.

In general, the problematic member will get tossed out of the shadow set some time after SHADOW_MBR_TMO seconds have elapsed.

From the online help . . .

Sys_Parameters

SHADOW_MBR_TMO

SHADOW_MBR_TMO controls the amount of time the system tries to
fail over physical members of a shadow set before removing them
from the set. The SHADOW_MBR_TMO parameter replaces the temporary
VMSD3 parameter used in prior releases.

The SHADOW_MBR_TMO parameter is valid for use only with Phase II
of Volume Shadowing for OpenVMS. You cannot set this parameter
for use with Phase I, which is obsolete.

Use the SHADOW_MBR_TMO parameter (a word) to specify the number
of seconds, in decimal from 1 to 65,535, during which recovery
of a repairable shadow set is attempted. If you do not specify
a value or if you specify 0, the default delay of 120 seconds is
used.

Because SHADOW_MBR_TMO is a dynamic parameter, you should use the
SYSGEN command WRITE CURRENT to permanently change its value.

SHADOW_MBR_TMO is a DYNAMIC parameter.
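
To make the value rules in the help text concrete, here is a minimal Python sketch of how the effective timeout could be derived from a requested value. This is only an illustration of the text above, not OpenVMS code; the function name `effective_shadow_mbr_tmo` and the constants are hypothetical.

```python
# Illustrative model only: mirrors the rules quoted from the online help.
# The names below are hypothetical and are not OpenVMS interfaces.

DEFAULT_SHADOW_MBR_TMO = 120   # default delay in seconds, per the help text
MAX_SHADOW_MBR_TMO = 65535     # the parameter is a word (16 bits)


def effective_shadow_mbr_tmo(value=None):
    """Return the timeout in seconds that would apply for a requested value."""
    # No value, or 0, means the default delay is used.
    if value is None or value == 0:
        return DEFAULT_SHADOW_MBR_TMO
    # Otherwise the value must lie in the documented range 1..65535.
    if not 1 <= value <= MAX_SHADOW_MBR_TMO:
        raise ValueError("SHADOW_MBR_TMO must be between 1 and 65535 seconds")
    return value
```

For example, `effective_shadow_mbr_tmo(0)` falls back to the 120-second default, while `effective_shadow_mbr_tmo(180)` returns 180.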

Note that the above is true for a multi-member set where only one member is in trouble. For a single-member set, or a set with all members in trouble, the virtual unit itself times out of mount verification, just as a non-shadowed disk will.

The concept of mount verification is a somewhat twisted topic, and shadowing makes a complicated topic even more weird.

-- Rob
Wim Van den Wyngaert
Honored Contributor

Re: Result of MVTIMEOUT expiring.

1. Does the problem unit dismount?
Yes
2. Does the shadow set dismount?
No
3. Does the shadow set just go offline?
No
4. Other?
See Robert's answer.

We have SHADOW_MBR_TMO set to 180 seconds and MVTIMEOUT set to 900 seconds.

After 180 seconds the failed member is thrown out of the set. If both members have failed, it takes 900 seconds before the shadow set itself times out. A single non-shadowed disk (e.g. one holding a pagefile) will also time out after 900 seconds.

During the wait, all I/Os are accepted and then hang.
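
The timing described in this thread can be summarised in a small Python sketch. It is a simplified model of what the posts here describe, not of the actual shadowing driver: the function name and return strings are hypothetical, the default values are just the settings quoted in this post, and the times are lower bounds (the member is removed "some time after" the timer expires). It does not try to model any interaction between the two timers.

```python
# Simplified model of the behaviour described in this thread.
# Hypothetical names; not an OpenVMS API.

def mount_verify_outcome(total_members, failed_members,
                         shadow_mbr_tmo=180, mvtimeout=900):
    """Return (seconds, outcome) for a shadow set with some failed members.

    'seconds' is the timer that governs the outcome, as a lower bound.
    """
    if total_members < 1 or not 0 <= failed_members <= total_members:
        raise ValueError("failed_members must be between 0 and total_members")

    if failed_members == 0:
        # Nothing is in trouble, so no timer is running.
        return (0, "no timeout pending")

    if failed_members < total_members:
        # At least one good member remains: only the failed member(s)
        # are removed, and the shadow set stays mounted.
        return (shadow_mbr_tmo, "failed member removed from the set")

    # Every member is in trouble (this includes a single-member set):
    # the virtual unit itself times out of mount verification.
    return (mvtimeout, "virtual unit times out of mount verification")
```

With the settings above, `mount_verify_outcome(2, 1)` is governed by the 180-second timer, while `mount_verify_outcome(2, 2)` and `mount_verify_outcome(1, 1)` are governed by the 900-second one.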

You can force the dismount with dism /abort/force member_dev

Wim
Hoff
Honored Contributor

Re: Result of MVTIMEOUT expiring.

Dave, what might you be up to here? Your two-point value to Robert's response tells me you're headed in a different direction than most questions here. What's your version and your configuration?

The status of the shadow set virtual unit is deliberately fairly separate from that of its constituents, the member volumes. If the whole shadow set drops offline as your question implies (states?), the whole of the I/O system is stuffed up. This could be due to multiple member failures or due to bug(s) in host-based volume shadowing (HBVS) or mount verification (MV) or elsewhere in the I/O stack.

There are a number of important fixes in the FIBRE_SCSI ECO packages; these kits are often where fixes related to HBVS or MV errors are shipped.

When I've seen entire shadow sets drop into MV, it's usually been a seriously stuffed-up FC SAN controller, a firmware error, or some other more systemic problem (e.g. a lost lock). I can't say I've seen an HBVS shadow set drop entirely offline; I've usually found these wedged in MV. And in the cases I've worked, the MV state usually derailed critical applications. (If you need to know what happens with your hardware and your configuration, run some tests.)

There is fairly little clarity within HBVS and MV; this whole area of OpenVMS is inherently very twisty code, and exceedingly device- and version-dependent.


The Brit
Honored Contributor

Re: Result of MVTIMEOUT expiring.

Hoff,
The answers above were really related to the behaviour of SHADOW_MBR_TMO, which was not mentioned in the OP.

SHADOW_MBR_TMO is fairly easy to understand; however, as Rob Brooks points out, MVTIMEOUT is a somewhat different and more complex consideration.

Basically, we are adding a third shadow unit to our volumes, and this unit is located in a remote facility (~5km away). What I was trying to pin down was "what happens to the shadow volume if it enters a MountVerify state which is not resolved, and then MVTIMEOUT expires."

My understanding is that MVTIMEOUT acts on the volume as a whole, and then the question becomes, "Does the volume then dismount? And if so, I assume all outstanding IO's are lost!!"

Although we were originally considering it in the context of the data center separation, in retrospect I don't think it is actually relevant to that, so the discussion just became a "search for enlightenment".

thanks

Dave.
Robert Brooks_1
Honored Contributor
Solution

Re: Result of MVTIMEOUT expiring.

My understanding is that MVTIMEOUT acts on the volume as a whole, and then the question becomes, "Does the volume then dismount? And if so, I assume all outstanding IO's are lost!!"

--

When a volume times out of mount verification, any I/Os that are still "known" to the relevant device driver are returned to the calling application(s) with a failure status (probably SS$_VOLINV).

Once a device has timed out of mount verification, the volume "valid" bit is cleared. So, although the volume is still mounted, no I/Os can be sent to it while its valid bit is clear. If you try to do I/O to such a volume, it'll fail immediately with SS$_VOLINV.
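
As a rough illustration of the sequence just described, here is a toy Python model of a volume going through mount verification and timing out. It is a sketch under stated assumptions, not driver code: the `Volume` class and its method names are hypothetical, status codes are represented as plain strings, and the outstanding I/Os are assumed to come back with SS$_VOLINV (described above only as "probably").

```python
# Toy model of the mount verification timeout described above.
# The class and method names are hypothetical, not OpenVMS interfaces.
from collections import deque


class Volume:
    def __init__(self):
        self.mounted = True          # stays True even after the timeout
        self.valid = True            # the volume "valid" bit
        self.in_mount_verify = False
        self.pending = deque()       # I/Os stalled during mount verification

    def enter_mount_verify(self):
        self.in_mount_verify = True

    def submit_io(self, request):
        """Return a status string, or None if the I/O is stalled."""
        if not self.valid:
            # Valid bit clear: the request fails immediately.
            return "SS$_VOLINV"
        if self.in_mount_verify:
            # Accepted, but it hangs until mount verification ends.
            self.pending.append(request)
            return None
        return "SS$_NORMAL"

    def mount_verify_timeout(self):
        """MVTIMEOUT expired: fail the stalled I/Os and clear the valid bit."""
        # Assumption: the failure status is SS$_VOLINV.
        failed = [(request, "SS$_VOLINV") for request in self.pending]
        self.pending.clear()
        self.in_mount_verify = False
        self.valid = False           # the volume remains mounted
        return failed
```

In this model, an I/O submitted during mount verification stalls, is handed back with a failure status when the timeout fires, and any later I/O fails at once even though `mounted` is still true.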


-- Rob
The Brit
Honored Contributor

Re: Result of MVTIMEOUT expiring.

Thanks Rob,

That was exactly the information I was trying to pin down.

Dave
The Brit
Honored Contributor

Re: Result of MVTIMEOUT expiring.

Got what I need.

Thanks

Dave.