Quorum Disk Failure
02-05-2004 05:16 AM
What happens if the Quorum disk fails?
Is the scenario the same on the hosts - cluster reconfigure with msgs referencing the loss of the quorum disk or does something different happen?
What happens when the failed Quorum disk is made available again?
Do the hosts automatically recognize the reappearance of the quorum disk and use it (with appropriate msgs)?
In other words, how does the loss/restore of a quorum disk compare to loss/restore of a host node?
One more question (for extra points!). What happens when the quorum disk "temporarily" disappears? We have our quorum disk on an HP SW SAN. If we make a Zoning or Presentation change that affects the quorum disk, besides VMS going through a Mount Verify, what does the host recovery look like?
Thanks much
02-05-2004 05:25 AM
Re: Quorum Disk Failure
The loss is noticed within QDSKINTERVAL seconds. You will see messages about the quorum disk being unavailable and, as long as there are enough votes, the cluster will continue. When the quorum disk returns it will be recognised as the quorum disk by the presence of QUORUM.DAT, and its votes (QDSKVOTES) will be counted again.
I assume both systems are directly connected to the quorum disk.
Purely Personal Opinion
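For reference, a minimal sketch of how to check the relevant settings on each node (DISK_QUORUM names the quorum disk, QDSKVOTES its votes, QDSKINTERVAL the polling interval in seconds; SHOW CLUSTER then shows the current membership):
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW DISK_QUORUM
SYSGEN> SHOW QDSKVOTES
SYSGEN> SHOW QDSKINTERVAL
SYSGEN> SHOW VOTES
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> EXIT
$ SHOW CLUSTER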
02-06-2004 06:15 AM
Solution
It is indirect. The first node to reconnect checks the quorum disk for recent 'stamps' left by the other node(s). They are not there, so it simply leaves its own imprint. The second node finds that stamp (and it had better be from a known node). It also leaves its trace, and the next time node #1 comes along it can conclude that the quorum disk is a valid member again.
Even IF (an unwanted situation) the departure of the quorum disk leads to a loss of quorum (the cluster 'hangs'), this mechanism operates above the hang, and if the return of the quorum disk suffices to regain quorum, that WILL be recognised and the hang will be over.
jan
02-06-2004 08:33 AM
Re: Quorum Disk Failure
Only in the event of a double failure does the surviving node hang.
Maybe that will help someone.
john
02-06-2004 09:14 PM
Re: Quorum Disk Failure
works fine, but not even needed.
@ nodes each 1 vote + qdsk 1 vote = 3 votes expected. Any single voter gone leaves 2 vote +> quorum maintained.
The above-mentioned temporary hang (and resume) would occur with one node out (e.g. for maintenance) and THEN having your SAN disconnect and reconnect the qdsk.
hth
Jan
02-07-2004 12:36 AM
Re: Quorum Disk Failure
"@ nodes" should read "2 nodes"
" +> " " " " => "
... I sometimes (have to) work on systems with different keyboard layouts. It should have been forbidden, but then, who should be allowed to declare "THE" correct layout?
02-08-2004 10:29 PM
Re: Quorum Disk Failure
I agree with Jan. Why set votes=2 for the cluster members? It works fine for me with votes=1.
Best regards,
Lokesh
02-08-2004 11:01 PM
Re: Quorum Disk Failure
IF
1) node 1 is stopped with REMOVE_NODE
2) the quorum disk gets lost after 1) has completed
THEN
your cluster is still alive (because a minority of 1 vote left the cluster with total votes equal to 3, so 2 votes left).
If you have 1-1-1, the cluster would hang until the disk is replaced or the second node is rebooted.
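To make the arithmetic explicit: quorum is the integer part of (EXPECTED_VOTES + 2) / 2, and a shutdown with REMOVE_NODE lets the remaining members recompute it as if the departed node's votes were gone. A sketch for the two schemes discussed here:
2-2-1: EXPECTED_VOTES = 5, quorum = (5 + 2) / 2 = 3
       REMOVE_NODE on a 2-vote node: quorum drops to (3 + 2) / 2 = 2
       quorum disk lost: 2 votes remain >= quorum of 2, the cluster keeps running
1-1-1: EXPECTED_VOTES = 3, quorum = (3 + 2) / 2 = 2
       REMOVE_NODE on a 1-vote node: quorum stays at (2 + 2) / 2 = 2
       quorum disk lost: 1 vote remains < quorum of 2, the cluster hangs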
02-09-2004 12:10 AM
Re: Quorum Disk Failure
Indeed, that IS the reason to use 2-2-1.
Then the issue of the SAN disconnecting/reconnecting disappears almost completely: only if one node leaves WITHOUT adjustment (either by a crash or by the operator forgetting REMOVE_NODE), and THEN the SAN connection disappears BEFORE a SET CLUSTER/EXPECTED_VOTES, only then will the hang still occur. That should be very rare.
Jan
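A minimal sketch of what the 2-2-1 scheme looks like in each node's MODPARAMS.DAT (the quorum-disk device name is only an example; run AUTOGEN and reboot afterwards for the values to take effect):
VOTES = 2                  ! each of the two nodes contributes 2 votes
EXPECTED_VOTES = 5         ! 2 + 2 + 1 (quorum disk)
DISK_QUORUM = "$1$DGA100"  ! quorum disk device name - example only
QDSKVOTES = 1              ! votes contributed by the quorum disk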
02-09-2004 01:25 AM
Re: Quorum Disk Failure
Thanks for explaining the advantage of 2-2-1. I will note it down.
Best regards,
Lokesh
02-09-2004 05:54 AM
Re: Quorum Disk Failure
A parallel to the REMOVE_NODE option is when one has a system crash.
I set up the 2-2-1 votes so that if Node A crashes, then I can go to Node B's console and adjust quorum:
^P
>>> dep sirr c
>>> cont
IPC> q
IPC> ^z
Now I can lose the quorum disk too and the surviving node can stay up as a standalone system.
BE very CAREFUL when using the above console commands. If the above is entered incorrectly or you make a typo, the surviving node can crash as well. Test it before you need it.
john
02-09-2004 06:38 AM
Re: Quorum Disk Failure
Just THAT would be your emergency escape if you do get a HANG, e.g. 1-1-1 and losing 2 voters.
In the 2-2-1 scheme, if one node crashes, you DON'T have to IPC the other node. If your cluster is NOT YET hung, but you fear the crashed node might be down for whatever you define as 'a prolonged time', then you get exactly the same result with SET CLUSTER/EXPECTED_VOTES from any sufficiently privileged (CMKRNL, SYSNAM, & SYSLCK) process.
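In its simplest form that is just one line (with no value given, the command derives new expected votes from the voting members currently present; you can also supply an explicit value):
$ SET CLUSTER/EXPECTED_VOTES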
Alternatively, AMDS or Availability Manager, if installed and configured correctly, will do the same for you, and in a much more controlled way.
Then again, I ALWAYS carry a note with the IPC sequence in my wallet. In 20 years I needed it twice, and then carrying a small note is invaluable!
Jan
02-10-2004 02:25 AM
Re: Quorum Disk Failure
I guess I still have an old school mentality since I still remember having to hit bit 28 to boot TOPS10 on a KI10 (are there any KI10's left on the planet??? or TOPS10 for that matter?)
> Then again, I ALWAYS carry a note with the IPC sequence in my wallet. In 20 years I needed it twice, and then carrying a small note is invaluable!
Once was all I ever needed it, so you have me beat. :-)
Kept it on paper as well until I recently bought an iPaq 1945; something about paper was more reassuring though . . .
02-10-2004 03:23 AM
Re: Quorum Disk Failure
1) This is probably obvious, but what does IPC stand for in the context of the previous msgs?
2) In John's msg, he has ">>> dep sirr c", which is a deposit cmd, but what is "sirr" and why the value "c"?
Thanks
02-10-2004 04:42 AM
Re: Quorum Disk Failure
SIRR is the Software Interrupt Request Register; C (hex) is the IPL at which to request the interrupt, i.e. IPL 12 decimal.
The handler for software interrupts at IPL 12 is the quorum-recalculation routine, which has a prompt of IPC> (Interrupt Priority C).
The use of this routine is not recommended nowadays. Use AMDS or Availability Manager instead. If you have not got AMDS/AvailMgr set up, do so now!
They are invaluable for many reasons.
Purely Personal Opinion
02-10-2004 06:32 PM
Re: Quorum Disk Failure
Remember that when you fix the quorum by hand, you leave the gate open for a split (partitioned) cluster.
E.g. an inter-building cluster with a quorum station in one building. The quorum station plus one node in that building go down. The remaining node in the other building is alive but blocked for lack of quorum. You adjust the quorum by hand and that single node continues. Meanwhile the inter-building link is down, and when the nodes in the first building resume activity (e.g. after a power failure) they form their own cluster. You now have 2 clusters.
That's why my procedures refuse to boot if I don't see all disks in both buildings.
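Such a check can be as simple as a few lines near the top of the site-specific startup; a sketch only, where the device names and the surrounding procedure are assumptions rather than anything posted above:
$! Refuse to continue cluster startup unless both site disks are visible
$ IF .NOT. F$GETDVI("$1$DGA100","EXISTS") .OR. .NOT. F$GETDVI("$2$DGA200","EXISTS")
$ THEN
$   WRITE SYS$OUTPUT "Site disk missing - refusing to continue startup"
$   EXIT
$ ENDIF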