Three node OpenVMS cluster hanging issues
07-23-2008 11:58 AM
I have a three-node cluster, all running OpenVMS V7.3-2 on ES45s, clustered via the LAN. Each time I shut down (REM,REB) a node, the other nodes hang until that node is booted and the VAXcluster state transition completes. On each system: VOTES = 1, EXPECTED_VOTES = 3, DISK_QUORUM = $1$DGA110, VAXCLUSTER = 2. We need the quorum disk so that, if two of the three nodes are down, the remaining node will keep the cluster running.
Question: Why do the two nodes hang until the third node is booted?
Thank you,
Ron Russik
Ron.Russik@yrcw.com
07-23-2008 12:18 PM
Normally, if you want a quorum disk, you would set its votes to (number of nodes - 1), give each node one vote, and set expected votes to (number of nodes * 2) - 1.
In your 3 node case:
Each node 1 vote
Quorum disk 2 votes
Expected votes 5
which leaves quorum at 3.
What do you have for quorum disk votes?
Jon
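A minimal Python sketch of the arithmetic behind Jon's rule of thumb, assuming the OpenVMS quorum formula quorum = (EXPECTED_VOTES + 2) // 2 (integer division). The function names are illustrative only, not part of any OpenVMS tool.

```python
# Sketch of Jon's rule of thumb for a cluster with a quorum disk.
# Assumption: OpenVMS quorum = (EXPECTED_VOTES + 2) // 2 (integer division).


def quorum(expected_votes: int) -> int:
    """Votes the cluster needs in order to keep running."""
    return (expected_votes + 2) // 2


def recommended_settings(num_nodes: int) -> dict:
    """One vote per node, QDSKVOTES = nodes - 1, EXPECTED_VOTES = 2 * nodes - 1."""
    expected = 2 * num_nodes - 1
    return {
        "VOTES": 1,
        "QDSKVOTES": num_nodes - 1,
        "EXPECTED_VOTES": expected,
        "QUORUM": quorum(expected),
    }


def lone_node_survives(settings: dict) -> bool:
    """True if a single node plus the quorum disk still reaches quorum."""
    return settings["VOTES"] + settings["QDSKVOTES"] >= settings["QUORUM"]


for n in (2, 3, 4):
    s = recommended_settings(n)
    print(n, "nodes:", s, "one node + QD keeps quorum:", lone_node_survives(s))
```

For three nodes this gives QDSKVOTES = 2, EXPECTED_VOTES = 5 and quorum 3, the values above. In general 2N - 1 expected votes gives quorum N, which one node plus an (N - 1)-vote quorum disk exactly reaches.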
07-23-2008 01:11 PM
You have your EXPECTED_VOTES set too high. Use the formula Jon gave in his answer. Make sure you enter the values as hard-coded settings in MODPARAMS.DAT, then run AUTOGEN with a reboot, and you should be OK. Jon's formula allows two of the nodes to be down while the third stays up.
Phil@Vital
07-23-2008 01:25 PM
Since VMS will protect itself as well as possible, the expected votes will ratchet up. So if you have a quorum disk with 1 vote, and each node has 1 vote, expected votes will be bumped to 4 as soon as the third node joins the cluster. That will make quorum = 3. As long as every node has a consistent set of cluster related sysgen parameters, and each node has a direct link to the quorum disk, the cluster should survive the unexpected loss of 1 node. There would be a temporary hang during cluster transition, but the remaining nodes should remove the member from the cluster and continue.
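The ratchet can be traced with a short Python sketch. It assumes quorum = (EXPECTED_VOTES + 2) // 2, that the effective expected votes is the larger of the current value and the votes actually present (nodes plus quorum disk), that every node can reach the quorum disk, and it starts from EXPECTED_VOTES = 3 as in the original post. The REMOVE_NODE adjustment is not modeled.

```python
# Sketch of the expected-votes "ratchet": adding members can raise
# EXPECTED_VOTES (and therefore quorum), but an unexpected departure does not
# lower it. Assumption: quorum = (EXPECTED_VOTES + 2) // 2.


def quorum(expected_votes: int) -> int:
    return (expected_votes + 2) // 2


def ratchet(current_ev: int, present_votes: int) -> int:
    """Expected votes never stays below the votes actually present."""
    return max(current_ev, present_votes)


NODE_VOTES = 1
QD_VOTES = 1   # quorum disk with 1 vote, as in the example above
ev = 3         # EXPECTED_VOTES from the original post

# Nodes join one at a time; each join may bump EXPECTED_VOTES.
for members in (1, 2, 3):
    present = members * NODE_VOTES + QD_VOTES
    ev = ratchet(ev, present)
    print(f"{members} node(s): present={present}, EV={ev}, quorum={quorum(ev)}")

# Unexpected loss of one node: EV stays ratcheted, so check what remains.
remaining = 2 * NODE_VOTES + QD_VOTES
print("survives loss of one node:", remaining >= quorum(ev))
```

Once the third node joins, EXPECTED_VOTES becomes 4 and quorum 3, and the two survivors plus the quorum disk still hold exactly 3 votes.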
You stated that the loss of a node caused a permanent hang (with no indication about which node, so I will assume you meant any node). You also stated that this happens even when using a shutdown with the remove_node option, which should trigger the remaining nodes to adjust quorum based on the remaining votes.
Summary: Given the information you provided, you should not be seeing what you have reported. So there must be something unstated that is causing the behavior you are seeing.
Cut and paste the following into a file, for example cluster.debug:
$ create sys$scratch:show_cluster$init.debug
INITIALIZE
ADD CLUSTER/ALL
ADD TRANSITION_TIME
ADD QUORUM
ADD EXPECTED
ADD QDVOTES
ADD QF_ACTIVE
ADD QF_SAME
ADD QF_WATCHER
SET SCREEN = 132
$ define/user show_cluster$init sys$scratch:show_cluster$init.debug
$ show cluster
$ delete sys$scratch:show_cluster$init.debug;
Then do the following and show us the output.
$ @cluster.debug
Jon
07-23-2008 01:36 PM
Also, what system is your quorum disk connected to? Is it served via MSCP to the other systems? Is it possible that the quorum disk is local to the system you're shutting down, so that the other two systems lose connectivity to the quorum disk and the cluster loses quorum?
Phil
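To illustrate Phil's question, here is a hypothetical Python model using the values Ron posts from SYSGEN (VOTES = 1 per node, QDSKVOTES = 1, EXPECTED_VOTES = 5). It assumes quorum = (EXPECTED_VOTES + 2) // 2, that the quorum disk's votes count only while a surviving node can reach it, and it ignores the REMOVE_NODE quorum adjustment. Which node serves the disk (OHMS01 here) is an invented example.

```python
from typing import Optional

# Values from Ron's SYSGEN SHOW/CLUSTER output.
NODES = ("OHMS01", "OHMS02", "OHMS03")
VOTES = 1
QDSKVOTES = 1
EXPECTED_VOTES = 5


def quorum(expected_votes: int) -> int:
    return (expected_votes + 2) // 2


def votes_after_shutdown(down: str, qd_server: Optional[str]) -> int:
    """Votes left after `down` leaves the cluster.

    qd_server=None models direct FC access from every node; otherwise the
    quorum disk is reachable only while the (hypothetical) serving node is up.
    """
    survivors = [n for n in NODES if n != down]
    qd_reachable = qd_server is None or qd_server in survivors
    return len(survivors) * VOTES + (QDSKVOTES if qd_reachable else 0)


for server in (None, "OHMS01"):
    for down in NODES:
        left = votes_after_shutdown(down, server)
        state = "keeps quorum" if left >= quorum(EXPECTED_VOTES) else "HANGS"
        print(f"QD via {server or 'direct FC'}: {down} down -> {left} votes, {state}")
```

With direct FC access every single shutdown leaves 3 votes, exactly quorum; if OHMS01 were the only path to the disk, shutting it down would leave 2 votes and the survivors would hang.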
07-23-2008 03:33 PM
There are several different possibilities. Access to the quorum disk could be compromised, or different quorum disks could be identified by different nodes (I have seen both, as well as some other problems involving quorum disks that created symptoms similar to what is described in this post).
It is also possible to have incorrectly set voting parameters, or inconsistent voting parameters across the cluster.
The physical configuration can also be a problem. As has been noted, using a served quorum disk can be problematical if one or two machines have the ability to sever all access to the quorum disk.
More data (the settings of each machine with regards to voting and quorum disk access) would, needless to say, be helpful in better understanding this situation.
- Bob Gezelter, http://www.rlgsc.com
07-23-2008 06:52 PM
1+1+1+2 (QD) = 5, quorum = 3.
Post the SHOW /CLUSTER parameters from each of the three nodes; the interesting ones here are:
VAXCLUSTER, EXPECTED_VOTES, VOTES, DISK_QUORUM and QDSKVOTES. Or the SYSMAN PARAM SHOW /CLUSTER output from each, if that's easier.
Ensure each of the three nodes can access $1$DGA110, and MOUNT the disk.
Setting EXPECTED_VOTES too low risks corruption of shared resources during cases of partitioning. Don't "game" the settings; set this value to the number of votes that should be present. OpenVMS will correct this setting once connections are established. Unfortunately, if two lobes cannot connect but both have quorum because EXPECTED_VOTES was "gamed" and set too low, clustering will do what you asked and your shared disks are toast.
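Hoff's partitioning warning can be checked with a few lines of Python. The split is hypothetical: OHMS01 keeps the quorum disk (2 votes) while OHMS02 and OHMS03 (1 vote each) lose contact with it. For simplicity the quorum disk's votes are credited to one lobe only, and quorum is assumed to be (EXPECTED_VOTES + 2) // 2.

```python
# Two lobes that cannot see each other each compute quorum from their own
# EXPECTED_VOTES. If both reach quorum, the cluster is partitioned.


def quorum(expected_votes: int) -> int:
    return (expected_votes + 2) // 2


# Hypothetical split: one node holds the quorum disk, the other two do not.
LOBE_A = 1 + 2   # OHMS01 (1 vote) + quorum disk (2 votes)
LOBE_B = 1 + 1   # OHMS02 and OHMS03, 1 vote each


def both_lobes_have_quorum(ev: int) -> bool:
    q = quorum(ev)
    return LOBE_A >= q and LOBE_B >= q


for ev in range(1, 6):
    print(f"EXPECTED_VOTES={ev}: quorum={quorum(ev)}, "
          f"partition possible: {both_lobes_have_quorum(ev)}")
```

Both lobes reach quorum for EXPECTED_VOTES of 3 or less; from 4 upward only the lobe holding the disk does. Setting EXPECTED_VOTES to the full total of 5 guarantees that two disjoint lobes can never both have quorum, because quorum is then more than half of all votes.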
07-24-2008 05:46 AM
%SYSMAN-I-ENV, current command environment:
Clusterwide on local cluster
Username RRUSSIK will be used on nonlocal nodes
SYSMAN> do mcr sysgen show/cluster
%SYSMAN-I-OUTPUT, command execution on node OHMS03
Parameters in use: Active
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
VAXCLUSTER 2 1 0 2 Coded-valu
EXPECTED_VOTES 5 1 1 127 Votes
VOTES 1 1 0 127 Votes
DISK_QUORUM "$1$DGA110 " " " " " "ZZZZ" Ascii
QDSKVOTES 1 1 0 127 Votes
QDSKINTERVAL 3 3 1 32767 Seconds
ALLOCLASS 1 0 0 255 Pure-numbe
LOCKDIRWT 1 0 0 255 Pure-numbe
CLUSTER_CREDITS 32 32 10 128 Credits
NISCS_CONV_BOOT 0 0 0 1 Boolean
NISCS_LOAD_PEA0 1 0 0 1 Boolean
NISCS_PORT_SERV 0 0 0 3 Bitmask
MSCP_LOAD 1 0 0 16384 Coded-valu
TMSCP_LOAD 0 0 0 3 Coded-valu
MSCP_SERVE_ALL 1 4 0 15 Bit-Encode
TMSCP_SERVE_ALL 0 0 0 15 Bit-Encode
MSCP_BUFFER 1024 1024 256 -1 Coded-valu
MSCP_CREDITS 32 32 2 1024 Coded-valu
TAPE_ALLOCLASS 0 0 0 255 Pure-numbe
SD_ALLOCLASS 0 0 0 255 Pure-numbe
NISCS_MAX_PKTSZ 8192 8192 576 9180 Bytes
NISCS_LAN_OVRHD 0 0 0 256 Bytes
SERVED_IO 0 0 0 0 Obsolete
CWCREPRC_ENABLE 1 1 0 1 Bitmask D
RECNXINTERVAL 20 20 1 32767 Seconds D
MSCP_CMD_TMO 0 0 0 2147483647 Seconds D
%SYSMAN-I-OUTPUT, command execution on node OHMS02
Parameters in use: Active
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
VAXCLUSTER 2 1 0 2 Coded-valu
EXPECTED_VOTES 5 1 1 127 Votes
VOTES 1 1 0 127 Votes
DISK_QUORUM "$1$DGA110 " " " " " "ZZZZ" Ascii
QDSKVOTES 1 1 0 127 Votes
QDSKINTERVAL 3 3 1 32767 Seconds
ALLOCLASS 2 0 0 255 Pure-numbe
LOCKDIRWT 1 0 0 255 Pure-numbe
CLUSTER_CREDITS 32 32 10 128 Credits
NISCS_CONV_BOOT 0 0 0 1 Boolean
NISCS_LOAD_PEA0 1 0 0 1 Boolean
NISCS_PORT_SERV 0 0 0 3 Bitmask
MSCP_LOAD 1 0 0 16384 Coded-valu
TMSCP_LOAD 0 0 0 3 Coded-valu
MSCP_SERVE_ALL 1 4 0 15 Bit-Encode
TMSCP_SERVE_ALL 0 0 0 15 Bit-Encode
MSCP_BUFFER 1024 1024 256 -1 Coded-valu
MSCP_CREDITS 32 32 2 1024 Coded-valu
TAPE_ALLOCLASS 0 0 0 255 Pure-numbe
SD_ALLOCLASS 0 0 0 255 Pure-numbe
NISCS_MAX_PKTSZ 8192 8192 576 9180 Bytes
NISCS_LAN_OVRHD 0 0 0 256 Bytes
SERVED_IO 0 0 0 0 Obsolete
CWCREPRC_ENABLE 1 1 0 1 Bitmask D
RECNXINTERVAL 20 20 1 32767 Seconds D
MSCP_CMD_TMO 0 0 0 2147483647 Seconds D
%SYSMAN-I-OUTPUT, command execution on node OHMS01
Parameters in use: Active
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
VAXCLUSTER 2 1 0 2 Coded-valu
EXPECTED_VOTES 5 1 1 127 Votes
VOTES 1 1 0 127 Votes
DISK_QUORUM "$1$DGA110 " " " " " "ZZZZ" Ascii
QDSKVOTES 1 1 0 127 Votes
QDSKINTERVAL 3 3 1 32767 Seconds
ALLOCLASS 3 0 0 255 Pure-numbe
LOCKDIRWT 1 0 0 255 Pure-numbe
CLUSTER_CREDITS 32 32 10 128 Credits
NISCS_CONV_BOOT 0 0 0 1 Boolean
NISCS_LOAD_PEA0 1 0 0 1 Boolean
NISCS_PORT_SERV 0 0 0 3 Bitmask
MSCP_LOAD 1 0 0 16384 Coded-valu
TMSCP_LOAD 0 0 0 3 Coded-valu
MSCP_SERVE_ALL 1 4 0 15 Bit-Encode
TMSCP_SERVE_ALL 0 0 0 15 Bit-Encode
MSCP_BUFFER 1024 1024 256 -1 Coded-valu
MSCP_CREDITS 32 32 2 1024 Coded-valu
TAPE_ALLOCLASS 0 0 0 255 Pure-numbe
SD_ALLOCLASS 0 0 0 255 Pure-numbe
NISCS_MAX_PKTSZ 8192 8192 576 9180 Bytes
NISCS_LAN_OVRHD 0 0 0 256 Bytes
SERVED_IO 0 0 0 0 Obsolete
CWCREPRC_ENABLE 1 1 0 1 Bitmask D
RECNXINTERVAL 20 20 1 32767 Seconds D
MSCP_CMD_TMO 0 0 0 2147483647 Seconds D
SYSMAN>
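A short Python sketch can check the posted parameters mechanically: first that the cluster-wide voting values agree on all three nodes, then which sets of running nodes reach quorum. It assumes quorum = (EXPECTED_VOTES + 2) // 2 and that every running node can reach $1$DGA110; the values are transcribed from the output above, with the blank padding of DISK_QUORUM dropped.

```python
from itertools import combinations

# Voting parameters transcribed from the SYSGEN SHOW/CLUSTER output.
POSTED = {
    "OHMS01": {"EXPECTED_VOTES": 5, "VOTES": 1, "DISK_QUORUM": "$1$DGA110", "QDSKVOTES": 1},
    "OHMS02": {"EXPECTED_VOTES": 5, "VOTES": 1, "DISK_QUORUM": "$1$DGA110", "QDSKVOTES": 1},
    "OHMS03": {"EXPECTED_VOTES": 5, "VOTES": 1, "DISK_QUORUM": "$1$DGA110", "QDSKVOTES": 1},
}
CLUSTERWIDE = ("EXPECTED_VOTES", "DISK_QUORUM", "QDSKVOTES")


def quorum(expected_votes: int) -> int:
    return (expected_votes + 2) // 2


def inconsistent(params: dict) -> list:
    """Names of cluster-wide parameters whose values differ between nodes."""
    return [
        name for name in CLUSTERWIDE
        if len({node[name] for node in params.values()}) > 1
    ]


def quorum_by_subset(params: dict) -> list:
    """(nodes up, votes, has quorum) for every non-empty set of running nodes,
    assuming each running node can reach the quorum disk."""
    sample = next(iter(params.values()))
    needed = quorum(sample["EXPECTED_VOTES"])
    names = sorted(params)
    results = []
    for k in range(1, len(names) + 1):
        for up in combinations(names, k):
            votes = sum(params[n]["VOTES"] for n in up) + sample["QDSKVOTES"]
            results.append((up, votes, votes >= needed))
    return results


print("inconsistent parameters:", inconsistent(POSTED))
for up, votes, ok in quorum_by_subset(POSTED):
    print(", ".join(up), "->", votes, "votes", "(quorum)" if ok else "(hang)")
```

The settings are consistent, and any two nodes plus the quorum disk reach exactly 3 votes, so losing one node should not hang the cluster while the disk is reachable; a single node (2 votes) cannot run alone.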
07-24-2008 06:13 AM
I'll assume all three of these nodes have functional FC and all three have direct access to $1$DGA110:.
As for why this box is hanging while awaiting the third node, that implies (dis)connectivity, probably around when the quorum disk is manifested to the newly forming cluster. With EV=5 and no QD connection, you need all 3 nodes present.
For grins (and I'm guessing at several key aspects of this cluster configuration not yet in evidence), use the system disk as the quorum disk. This assumes that the quorum disk is another controller-based FC SAN DG disk, with or without controller-based RAID, and that the system disk here is common, FC SAN-based, and not host-shadowed.
Also load the current ECO kits; this is a boilerplate response to any weirdness. If you're not current when weirdness arises, get current first and then go hunting for the weirdness.
07-24-2008 10:03 AM
I concur with Hoff. Since the stated goal is to keep a single node running as the cluster, the sum of that node's votes and the votes assigned to the quorum disk (QDSKVOTES) must reach quorum.
- Bob Gezelter, http://www.rlgsc.com
07-24-2008 10:17 AM
expected_votes = 5
vaxcluster = 2
disk_quorum = "$1$DGA1112"
votes = 1
qdskvotes = 2
(anything else I might be missing?)
Thank you,
Ron Russik
Ron.Russik@yrcw.com
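Ron's revised values can be compared with the earlier posted ones using a small Python sketch. It assumes three nodes with equal votes, quorum = (EXPECTED_VOTES + 2) // 2, and an optional flag for losing access to the quorum disk; `minimum_survivors` is a hypothetical helper name.

```python
def quorum(expected_votes: int) -> int:
    return (expected_votes + 2) // 2


def minimum_survivors(votes: int, qdskvotes: int, expected_votes: int,
                      num_nodes: int = 3, qd_reachable: bool = True):
    """Fewest running nodes that still hold quorum, or None if even all
    num_nodes cannot."""
    needed = quorum(expected_votes)
    for k in range(1, num_nodes + 1):
        total = k * votes + (qdskvotes if qd_reachable else 0)
        if total >= needed:
            return k
    return None


print("posted  (QDSKVOTES=1):", minimum_survivors(1, 1, 5))
print("revised (QDSKVOTES=2):", minimum_survivors(1, 2, 5))
print("revised, QD unreachable:", minimum_survivors(1, 2, 5, qd_reachable=False))
```

With QDSKVOTES = 2, one node plus the quorum disk is enough (the requirement Bob describes), whereas losing access to the quorum disk still requires all three nodes.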
07-24-2008 12:40 PM
I like to create a file sys$common:[sysexe]agen_cluster_common_modparams.dat that holds all the cluster parameters (votes, expected votes, quorum disk votes, quorum disk name, etc.), and then in sys$system:modparams.dat I put the line
AGEN$INCLUDE_PARAMS SYS$COMMON:[SYSEXE]AGEN_CLUSTER_COMMON_MODPARAMS.DAT
Then I only need to change one file if the cluster values need to change. I have other common include files for site-specific settings, application-specific settings, etc. Each node's own MODPARAMS.DAT then contains only the AGEN$INCLUDE_PARAMS lines followed by a few node-specific items, such as the node name and SCSSYSTEMID. After a system upgrade you will need to clean up each node's MODPARAMS.DAT, because the upgrade usually appends values that supersede anything in the include files, so the method isn't maintenance-free.
Jon
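Jon's warning about upgrades comes down to ordering: a value appended after the AGEN$INCLUDE_PARAMS line overrides the shared file. The Python sketch below is a simplified, hypothetical model that assumes assignments are applied in file order with the last one winning, and it ignores the rest of the MODPARAMS.DAT syntax, such as the ADD_, MIN_ and MAX_ prefixes. The file contents are invented for illustration.

```python
# Hypothetical file contents, for illustration only.
INCLUDE_PATH = "SYS$COMMON:[SYSEXE]AGEN_CLUSTER_COMMON_MODPARAMS.DAT"

COMMON = """\
! cluster-wide voting parameters
VOTES = 1
EXPECTED_VOTES = 5
DISK_QUORUM = "$1$DGA110"
QDSKVOTES = 2
"""

NODE_AFTER_UPGRADE = f"""\
AGEN$INCLUDE_PARAMS {INCLUDE_PATH}
SCSNODE = "OHMS01"
! appended later, e.g. by an upgrade
EXPECTED_VOTES = 3
"""


def effective_params(text: str, includes: dict) -> dict:
    """Apply assignments in file order; a later value overrides an earlier one."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment
        if line.upper().startswith("AGEN$INCLUDE_PARAMS"):
            path = line.split(None, 1)[1]
            # Pull in the shared file at this point in the sequence.
            params.update(effective_params(includes[path], includes))
            continue
        name, _, value = line.partition("=")
        params[name.strip().upper()] = value.strip()
    return params


result = effective_params(NODE_AFTER_UPGRADE, {INCLUDE_PATH: COMMON})
print(result["EXPECTED_VOTES"])
```

Here the stale EXPECTED_VOTES = 3 appended below the include line wins over the shared value of 5, which is the kind of cleanup Jon mentions.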
07-25-2008 09:41 AM
Thank you for all your help in this matter. Your solutions, recommendations, and advice were very helpful in resolving this issue.
Again, thank you for all your help.
Ron Russik
Ron.Russik@yrcw.com
11-24-2008 11:13 AM
expected_votes = 5
vaxcluster = 2
disk_quorum = "$1$DGA1112"
votes = 1
qdskvotes = 2