07-02-2007 07:23 PM
Our VMS guy attempted to copy the system volume onto the EVA8000, by adding a third member to the existing shadow set. This resulted in two members of the cluster losing contact with the system disk, showing the device as MntVerifyTimeout.
We thought it must have been the EVA8000 and/or communication to it, and waited for the HP support people to upgrade firmware and check the configuration and so forth.
Yesterday, a colleague wanted to clone the system disk, to be used in another node. So he mounted a third shadowset member WITHIN the EVA5000. Once again two of the cluster nodes lost contact with the system disk. We literally had to power them off, as not even the console responded.
Any ideas?
By the way, I am not the VMS guy at this site, I just work closely with him. So my VMS knowledge is patchy.
07-02-2007 07:33 PM
Re: MountVerify after attempt to add third member of shadow set.
Welcome to the OpenVMS ITRC forum.
Please carefully check ERRLOG.SYS and OPERATOR.LOG to determine the exact sequence of events. This is a complex scenario and likely requires a lot of configuration information to be able to understand what has been going on...
If you say 'not even the console responded', did you try CTRL-P? If the system disk is offline, you would not expect any response from the console. Did you capture the console output?
Volker.
07-02-2007 09:24 PM
Apparently our current VMS guy forgot about Ctrl-P, perhaps in the panic of the situation. (Our regular VMS guy is on vacation.) He has opened a point with HP and is sending off the console logs and other assorted pieces.
I guess this is not the place for a complex diagnosis. I just thought it was possible that someone might have seen something like this before. I decided to try this forum partly because I use other (not HP) forums from time to time, while my colleagues don't.
I have attached a small section of the operator log from the node that remained up. Nothing was recorded from the other two nodes. The time frame covered mounting the new member to crashing the other two nodes.
Using "anal/error/elv translate" for a one hour period covering the significant events, there were no messages listed that looked significant to me (just timestamps and volume changes).
Normally I would stick to my own patch (Oracle) and leave VMS to the VMS guys, but this problem worries and frustrates me, and it ought to have been diagnosed properly the last time. If it looks like this is getting too complex, I will close the thread and hope HP come up with something, this time.
07-02-2007 09:40 PM
The shadowing driver would have logged an errlog entry on VENUS and HOYLE about the reason for dropping the members, but as you didn't crash those nodes, that information is lost.
What are the values of SHADOW_MBR_TMO in this cluster? Are the paths to those disks as expected, i.e. does SHOW DEV/MULTI show consistent multipath counts?
Volker.
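A minimal sketch of how the two checks asked for above could be gathered from every node in one pass. It assumes a privileged account and that SYSMAN can reach all cluster members; the output will differ per site.

```dcl
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER          ! apply the following commands to every node
SYSMAN> PARAMETERS SHOW SHADOW_MBR_TMO   ! active shadow member timeout on each node
SYSMAN> DO SHOW DEVICE/MULTIPATH_SET     ! multipath sets and path counts as each node sees them
SYSMAN> EXIT
```

Comparing the per-node output side by side is the point: a disk or path that is missing on only one node is what to look for.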
07-02-2007 10:06 PM
As this is V7.3-2, mount-verification messages will tend to be suppressed. Something must have happened on VENUS and HOYLE to cause the shadowing driver to drop all members from the DSA1: shadowset.
What if VENUS and HOYLE could not access the new DGA500 disk, and OBERON then added it to the shadowset? Is MSCP-serving of that disk enabled on OBERON ($ SHO DEV/SERVED)?
Volker.
07-02-2007 10:18 PM
I have attached the results of "show dev/multi dsa1" from all three nodes.
MSCP is not loaded/enabled. Both Venus and Hoyle can see DGA500 (as evidenced by Show Dev). I believe that gets into the intricacies of Fibre Channel, which is well outside my comfort zone.
A question: if the MOUNT command was run on one node without a /CLUSTER qualifier, could that cause problems?
07-02-2007 10:27 PM
Solution
The view of the members in the shadowset must be IDENTICAL cluster-wide, i.e. every node must be able to see and access every member in the shadowset.
HOYLE and VENUS can see $1$DGA500 NOW, but could they when the problem happened?
Chances are that DGA500 had just been added to the EVA and MC SYSMAN IO AUTO had not been run on HOYLE and VENUS. It had to be run on OBERON, as the new member was to be mounted on that node.
And this has happened twice, always with NEW members being added to an existing cluster-wide shadowset. Am I beginning to see a pattern here?
Volker.
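As a hedged illustration of the precaution this implies, the sequence below configures the new unit on every node and confirms it is visible everywhere before the member is added. The device name $1$DGA500: and DSA1: come from this thread; "label" is a placeholder for the real volume label, and the exact MOUNT qualifiers used at this site are an assumption.

```dcl
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> DO MCR SYSMAN IO AUTOCONFIGURE   ! discover newly presented FC units on every node
SYSMAN> DO SHOW DEVICE $1$DGA500:        ! every node must list the device
SYSMAN> EXIT
$ ! Only if ALL nodes show $1$DGA500: should the member be added:
$ MOUNT/SYSTEM DSA1: /SHADOW=($1$DGA500:) label
```

If any node reports that the device does not exist, stop there and sort out the presentation on the storage side first.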
07-02-2007 10:34 PM
07-02-2007 10:49 PM
Wim
07-03-2007 11:03 AM
SYSMAN IO AUTO not being run is it, as Volker suggests.
Dean
07-03-2007 11:16 AM
Could the /SYSTEM qualifier have been omitted?
Have you checked on each node that the three disks are visible?
07-03-2007 01:36 PM
In answer to the other questions: The device is currently visible on all three nodes (using SHOW DEVICE), but not mounted. I have not posted this display. But the other nodes have been bounced since then, so the new device would have been picked up then. The original MOUNT command was done with the /SYSTEM qualifier. It was dismounted immediately after the shadow copy finished. This shows up at 14:54:57.69, in the log portion I posted.
A new question: Ideally, I would like to see this hypothesis tested with another shadow set, before it gets tried again on something critical. But, if we can reproduce this situation, with a volume showing as MntVerifyTimeout, is it easy to recover? Would a SYSMAN IO AUTO fix it at that point?
07-03-2007 05:21 PM
So it does look like my theory was on target...
What happens is this: you mount the 3rd member on one node. That node sees the new disk (and the old ones, of course) and happily adds it to the shadowset. A shadowset must have the same state on all nodes it is mounted on, so the other 2 nodes, not being able to see the new member, drop ALL the members from their view of the shadowset and immediately end up with a shadowset with zero members (shown as MntVerifyTimeout). You can NOT recover from that, because it's already too late. If there are no open files on that shadowset on those 2 nodes, you could DISM/ABORT DSA1:, then run SYSMAN IO AUTO and then remount the shadowset with its current members, i.e. MOUNT/SYS DSA1: label. But if this is a system disk, you're out of luck.
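Spelled out, the recovery path for a shadowset that is NOT the system disk and has no open files might look like the sketch below, run on each affected node. DSA1: and $1$DGA500: are simply the names used in this thread, and "label" stands for the actual volume label.

```dcl
$ DISMOUNT/ABORT DSA1:          ! drop the stuck, zero-member view on this node
$ MCR SYSMAN IO AUTOCONFIGURE   ! configure the new unit this node could not see
$ SHOW DEVICE $1$DGA500:        ! confirm the new member is now visible here
$ MOUNT/SYSTEM DSA1: label      ! rejoin the shadowset with its current members
```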
If you had enabled MSCP-serving across all nodes, you could have prevented this fatal scenario. OpenVMS supports failover to the MSCP-served path at any time and V7.3-2 also supports failback, i.e. the path will go back to the FC path if that becomes available. In such a config, you could actually pull out all FC cables from one node and it would continue to access the disks via the MSCP-served path, although it would not be able to boot that way.
Volker.
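For readers who have not set this up before, a rough sketch of enabling MSCP serving follows. The parameter values are assumptions for illustration only (MSCP_SERVE_ALL is a bit mask, and the right setting depends on the configuration), so check the cluster documentation for your version before changing anything.

```dcl
! Lines to add to SYS$SYSTEM:MODPARAMS.DAT on each node (values are assumptions)
MSCP_LOAD = 1        ! load the MSCP server at boot
MSCP_SERVE_ALL = 1   ! serve all available disks

$ ! Then, from DCL on each node:
$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
$ ! After the next reboot, confirm what is being served:
$ SHOW DEVICE/SERVED
```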
07-03-2007 06:59 PM
But I think my original question has been answered.
I am not sure what the norm is in this forum. If it is up to me to close the thread, I will leave it another 24 hours in case there are further comments and then close it.
07-03-2007 07:10 PM
MSCP means 'serving' the disks via the cluster interconnect. This presents a path to a disk 'local' to another system. FC disks are considered 'local' in this context.
You can close the topic at any time. It is still possible to enter replies to a closed topic.
I hope you enjoyed the 'ITRC OpenVMS forum' experience ;-)
Volker.
07-05-2007 07:44 AM
Dean
07-05-2007 08:12 AM
---
There has been some talk of doing this. For various reasons, the ability to do that didn't exist until recently. What is needed (and now exists) is a "voting" scheme where one node could "veto" a change that another node is proposing, if it doesn't have the capabilities needed to allow the proposed change. Dissimilar device shadowing uses this construct.
That's not to say that the check to add a new member will be done, but it has been discussed. In any event, it wouldn't happen until V8.4.
-- Rob
07-05-2007 09:30 AM
I almost always enable MSCP (Mass Storage Control Protocol); it might have protected you on this one.
Rob,
I assumed (wrongly) that they had some defensive protection for a scenario like DSM's, hence my post. Thanks for your post; I hope they implement it.
Dean
07-24-2007 05:20 PM
Now, if you had been running OpenVMS I64 V8.3, here is the patch you would want to install: VMS83I_SYS-V0300
5.2.11 Verify New Shadow Member
5.2.11.1 Problem Description:
If a disk is not presented to all nodes in the cluster, yet the member is added to a shadow set on all nodes, the shadow set will end up in Mount Verification and unable to recover, even after SHADOW_MBR_TMO seconds.
5.2.11.3 Problem Analysis:
A proposed new member is validated on the node which is performing the actual mount. On other nodes, it is added via a "trigger validate". However, if the UCB for the device does not exist, then the trigger validate can not complete but the member cannot be expelled properly because it is not yet a valid member of the shadow set on that node.
I would not expect this ever to be back-ported to V7.3-2.
Volker.