Operating System - OpenVMS

MountVerify after attempt to add third member of shadow set.

 
Thomas Ritter
Respected Contributor

Re: MountVerify after attempt to add third member of shadow set.

DSM, what was the actual MOUNT command?
Could the /SYSTEM qualifier have been omitted?
Have you checked on each node that all three disks are visible?
DSM_1
Advisor

Re: MountVerify after attempt to add third member of shadow set.

I don't have a definitive answer for the first event yet. But for this week's event, my colleague confirms that he did not run SYSMAN IO AUTO on all three nodes. He created the volume in the EVA, presented it to all three nodes at the Fibre Channel level, ran SYSMAN IO AUTO on only one node (Oberon), and then ran the MOUNT command. This sounds like a good explanation to me.

In answer to the other questions: The device is currently visible on all three nodes (using SHOW DEVICE), but not mounted. I have not posted this display. But the other nodes have been bounced since then, so the new device would have been picked up then. The original MOUNT command was done with the /SYSTEM qualifier. It was dismounted immediately after the shadow copy finished. This shows up at 14:54:57.69, in the log portion I posted.

A new question: Ideally, I would like to see this hypothesis tested with another shadow set, before it gets tried again on something critical. But, if we can reproduce this situation, with a volume showing as MntVerifyTimeout, is it easy to recover? Would a SYSMAN IO AUTO fix it at that point?
Volker Halle
Honored Contributor

Re: MountVerify after attempt to add third member of shadow set.

DSM,

so it looks like my theory was on the mark...

What happens is this: you mount the 3rd member on one node; that node sees the new disk (and the old ones, of course) and happily adds it to the shadowset. A shadowset must have a single, consistent state across all nodes it's mounted on, so the other 2 nodes, not being able to see the new member, drop ALL the members from their view of the shadowset and immediately end up with a shadowset with zero members (shown as MntVerifyTimeout). You can NOT recover that shadowset in place, because by then it's already too late. If there are no open files on that shadowset on those 2 nodes, you could DISM/ABORT DSA1:, then run SYSMAN IO AUTO and then remount the shadowset with its current members, i.e. MOUNT/SYS DSA1: label. But if this is a system disk, you're out of luck.
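To make the failure mode concrete, here is a toy Python model of it. It is a simplification under stated assumptions, not how the shadowing driver is written: the class and function names, the device names $1$DGA1: to $1$DGA3: and the nodes NODE2/NODE3 are hypothetical (only OBERON comes from this thread). The one rule it encodes is the one described above: a node that cannot see the proposed member loses every member from its view of DSA1:.

```python
# Toy model of the failure described above. Hypothetical names throughout;
# this is NOT OpenVMS code, just an illustration of the membership rule.
from dataclasses import dataclass, field


@dataclass
class NodeView:
    name: str
    configured: set[str]                 # devices with a UCB on this node
    members: list[str] = field(default_factory=list)
    state: str = "Mounted"


def add_member_clusterwide(nodes: list[NodeView], new_member: str) -> None:
    """Apply one membership change on every node that has DSA1: mounted."""
    for node in nodes:
        if new_member in node.configured:
            node.members.append(new_member)
        else:
            # Assumed simplification: the node cannot validate the new member,
            # so its view of the shadow set ends up with zero members.
            node.members.clear()
            node.state = "MntVerifyTimeout"


old = ["$1$DGA1:", "$1$DGA2:"]
new = "$1$DGA3:"
# IO AUTOCONFIGURE was run only on OBERON, so only OBERON has a UCB for the new disk.
nodes = [
    NodeView("OBERON", set(old) | {new}, list(old)),
    NodeView("NODE2", set(old), list(old)),
    NodeView("NODE3", set(old), list(old)),
]
add_member_clusterwide(nodes, new)
for n in nodes:
    print(n.name, n.state, n.members)
```

In the model, OBERON ends up with three members while NODE2 and NODE3 are left in MntVerifyTimeout with none. Adding the new disk to every node's `configured` set before the call, the model's counterpart of running SYSMAN IO AUTO on all nodes first, makes all three views agree.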

If you had enabled MSCP serving on all nodes, you could have prevented this fatal scenario. OpenVMS supports failover to the MSCP-served path at any time, and V7.3-2 also supports failback, i.e. I/O goes back to the FC path once that becomes available again. In such a configuration you could actually pull all the FC cables from one node and it would continue to access the disks via the MSCP-served path, although it would not be able to boot that way.
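The failover/failback behaviour can be sketched as a small path-selection function. This is a hypothetical Python model, not the actual OpenVMS multipath code: `choose_path`, its parameters and the `Path` values are invented for illustration, and the `failback` flag stands in for the V7.3-2 behaviour mentioned above.

```python
# Hypothetical model of the path choice described above (not OpenVMS code).
from enum import Enum


class Path(Enum):
    FC = "direct Fibre Channel path"
    MSCP = "MSCP-served path through another cluster member"
    NONE = "no path: the disk stays in mount verification"


def choose_path(current: Path, fc_up: bool, mscp_up: bool, failback: bool) -> Path:
    """Pick the path for the next I/O.

    failback=True models V7.3-2: return to FC as soon as it is usable.
    failback=False models a release without failback: stay on a working MSCP path.
    """
    if current is Path.MSCP and mscp_up and not failback:
        return Path.MSCP
    if fc_up:
        return Path.FC
    if mscp_up:
        return Path.MSCP  # failover: reach the disk via a serving node
    return Path.NONE


# All FC cables pulled from this node: I/O fails over to the served path.
print(choose_path(Path.FC, fc_up=False, mscp_up=True, failback=True))
# FC restored: with failback the node returns to the direct path.
print(choose_path(Path.MSCP, fc_up=True, mscp_up=True, failback=True))
```

The model covers data I/O only; as noted above, a node cannot boot over the MSCP-served path. It also shows why serving helps in this thread's scenario: a node with no direct path to a disk still has a served one.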

Volker.
DSM_1
Advisor

Re: MountVerify after attempt to add third member of shadow set.

Thanks. I am tempted to keep asking questions. I know nothing about MSCP. I shall do some reading and googling.

But I think my original question has been answered.

I am not sure what the norm is in this forum. If it is up to me to close the thread, I will leave it another 24 hours in case there are further comments and then close it.
Volker Halle
Honored Contributor

Re: MountVerify after attempt to add third member of shadow set.

DSM,

MSCP means 'serving' the disks via the cluster interconnect. This presents a path to a disk 'local' to another system. FC disks are considered 'local' in this context.

You can close the topic at any time. It is still possible to enter replies to a closed topic.

I hope you enjoyed the 'ITRC OpenVMS forum' experience ;-)

Volker.
Dean McGorrill
Valued Contributor

Re: MountVerify after attempt to add third member of shadow set.

So it was your theory, Volker. It would seem to me that some defensive coding in VMS could check with the other nodes and return something like "device not configured on other cluster member" in a case like this. (Not that it will get done...)

Dean
Robert Brooks_1
Honored Contributor

Re: MountVerify after attempt to add third member of shadow set.

it would seem to me some user defensive coding by vms could check with the other nodes and return something like "device not configured on other cluster member" in a case like this. (Not that it will get done..)

---

There has been some talk of doing this. For various reasons, the ability to do that didn't exist until recently. What is needed (and now exists) is a "voting" scheme where one node could "veto" a change that another node is proposing, if it doesn't have the capabilities needed to allow the proposed change. Dissimilar device shadowing uses this construct.
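A voting scheme of the kind described here could look roughly like this. It is a hypothetical Python sketch: the `Node` class, `vote` and `propose_add` are invented names, and the real shadowing implementation is not described in this thread. Every node that has the shadow set mounted votes on the proposed member before anything changes, and a single veto rejects the change, much like the "device not configured on other cluster member" check suggested earlier.

```python
# Hypothetical "propose / veto" check for adding a shadow set member.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    configured: set[str]  # devices this node can actually see

    def vote(self, new_member: str) -> bool:
        """Approve only if this node has the proposed device configured."""
        return new_member in self.configured


def propose_add(nodes: list[Node], new_member: str) -> tuple[bool, list[str]]:
    """Return (accepted, names of vetoing nodes); any single veto rejects."""
    vetoes = [n.name for n in nodes if not n.vote(new_member)]
    return (not vetoes, vetoes)


cluster = [
    Node("OBERON", {"$1$DGA1:", "$1$DGA2:", "$1$DGA3:"}),
    Node("NODE2", {"$1$DGA1:", "$1$DGA2:"}),
    Node("NODE3", {"$1$DGA1:", "$1$DGA2:", "$1$DGA3:"}),
]
accepted, vetoes = propose_add(cluster, "$1$DGA3:")
if not accepted:
    print("add rejected: device not configured on", ", ".join(vetoes))
```

The design point is ordering: the check runs before any node changes its membership, so no node is left holding a view it cannot reconcile.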

That's not to say that the check to add a new member will be done, but it has been discussed. In any event, it wouldn't happen until V8.4.

-- Rob
Dean McGorrill
Valued Contributor

Re: MountVerify after attempt to add third member of shadow set.

DSM,
I almost always enable MSCP (Mass Storage Control Protocol) serving, and it might
have protected you on this one.

Rob,
I assumed (wrongly) that there was already some defensive protection for a scenario like DSM's, hence my post. Thanks for your
post; I hope they implement it. Dean
Volker Halle
Honored Contributor

Re: MountVerify after attempt to add third member of shadow set.

DSM,

Now, if you had been running OpenVMS I64 V8.3, this is the patch you would want to install: VMS83I_SYS-V0300

5.2.11 Verify New Shadow Member

5.2.11.1 Problem Description:

If a disk is not presented to all nodes in the cluster, yet the member is added to a shadow set on all nodes, the shadow set will end up in Mount Verification and unable to recover, even after SHADOW_MBR_TMO seconds.

5.2.11.3 Problem Analysis:

A proposed new member is validated on the node which is performing the actual mount. On other nodes, it is added via a "trigger validate". However, if the UCB for the device does not exist, then the trigger validate can not complete but the member cannot be expelled properly because it is not yet a valid member of the shadow set on that node.


I would not expect this ever to be back-ported to V7.3-2.

Volker.