02-26-2007 11:50 PM
Strange values in the SHOW CLUSTER display
We run a cluster with two boot servers, a quorum disk and 50 workstations as satellites.
Each boot server has 2 votes, the quorum disk has one and the satellites have none.
A few days ago we upgraded the cluster from 7.3-2 to 8.3 as follows:
1) Create a new SAN disk with an image backup of the (single) system disk (see the command sketch after the list)
2) Boot one server from the 8.3 CD, perform the upgrade on the new disk, and boot from it
3) Disable MOP on the 7.3-2 server
4) Reboot most satellites
5) Reboot the second server
6) Reboot the remaining satellites
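Step 1 was essentially a disk-to-disk image backup along these lines (a sketch; the device names are placeholders, not our actual ones):
$ MOUNT/FOREIGN DGA100:               ! new SAN disk, mounted foreign as the BACKUP target
$ BACKUP/IMAGE/VERIFY DKA0: DGA100:   ! image copy of the (single) system disk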
Everything runs fine - no problems have been encountered so far.
Now a colleague got panicky because when we do a SHOW CLUSTER, some nodes show a value of 3 and others a value of 2 for the Q column in the MEMBERS section. The latter would indicate the possibility of a cluster split! However, doing a SHOW CLUSTER on all nodes shows a value of 3 for CL_Q on each of them while CL_EXP and CL_V are both 5.
Some analysis revealed that all satellites showing a Quorum of 2 had been rebooted before the second boot server while all satellites showing a Quorum of 3 got rebooted after the second boot server.
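For what it's worth, I am looking at the values roughly like this (the ADD commands are from memory and may need adjusting):
$ SHOW CLUSTER/CONTINUOUS
Command> ADD VOTES,QUORUM   ! the per-member V and Q columns (MEMBERS class)
Command> ADD CLUSTER        ! the CL_EXP, CL_Q and CL_V summary fields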
The documentation (7.3) says
-------------------
Derived from EXPECTED_VOTES and calculated by the connection manager. It represents an initial value for the minimum number of votes that must be present for this node to function. The dynamic QUORUM value is the CL_QUORUM field, which is described in the CLUSTER class category
-------------------
The second sentence would indicate a value that was once valid but is not necessarily valid now -> ergo useless. The third sentence would make sense, but it is not what we actually get.
What is wrong here?
Edwin
02-27-2007 01:08 AM
Re: Strange values in the SHOW CLUSTER display
The EXPECTED_VOTES system parameter appears to be set incorrectly on several nodes. In your current configuration, it should be set to five on all nodes.
I'd likely configure this cluster with one vote for each of the two servers and one vote for the quorum disk, assuming the quorum disk is connected on a multi-host bus.
The CL values for VOTES and for quorum are the calculated values; the running values. In your configuration, CL_QUORUM should be 3 and CL_VOTES should be 5 everywhere.
The risk here is not in the running configuration, but in the interval before connectivity is established. Once connections are established, values for parameters such as EXPECTED_VOTES are floated upward to the correct value. If connections cannot be established -- such as with name collisions; two nodes erroneously booting from the same system root is the classic example -- you can end up with a partitioned cluster, and can stomp on your environment.
MODPARAMS has an include-file syntax, and it can be convenient to stuff things like the quorum values into a common file. Here's a somewhat complex example of customizing AUTOGEN, including local dynamic parameter calculation and use of AGEN$INCLUDE_PARAMS:
http://h71000.www7.hp.com/wizard/wiz_3604.html
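A minimal sketch of the include-file idea, with example names (CLUSTER_VOTES.DAT is just an illustrative file name, and the value reflects your current 2+2+1 voting scheme):
In each node's MODPARAMS.DAT:
AGEN$INCLUDE_PARAMS SYS$COMMON:[SYSMGR]CLUSTER_VOTES.DAT
In the shared CLUSTER_VOTES.DAT:
! quorum-related settings common to all cluster members
EXPECTED_VOTES = 5
! per-node settings such as VOTES stay in each node's own MODPARAMS.DAT
Run AUTOGEN as usual afterwards; it picks up the included file automatically.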
There are details on the display settings here: http://h71000.www7.hp.com/doc/83final/6048/6048pro_065.html#shcl_part
There are details on properly setting VOTES and EXPECTED_VOTES for various specific local configuration requirements here:
http://www.hoffmanlabs.com/vmsfaq/
The present cluster configuration appears valid and stable, though I would address the settings as part of the next reboot cycle, and I would also specifically address the VOTES and EXPECTED_VOTES settings on each node before bringing that node into the cluster.
Stephen Hoffman
HoffmanLabs
02-27-2007 01:53 AM
Re: Strange values in the SHOW CLUSTER display
As I said in the first place:
>> However, doing a SHOW CLUSTER on all nodes shows
>> a value of 3 for CL_Q on each of them while
>> CL_EXP and CL_V are both 5.
I may not have made it clear enough that this is true for all systems. Also, our MODPARAMS.DAT has just two lines besides three includes: SCSSYSTEMID and SCSNODE.
The problem is in the second column of the MEMBERS class, as can be seen in the attached file. ASSM80, which is my workstation, shows a 2 in the 'Q' column but a 3 for CL_Q at the bottom.
Edwin
02-27-2007 02:16 AM
Re: Strange values in the SHOW CLUSTER display
Q=2 and Q=3: the setting taken from the system parameters, as derived from EXPECTED_VOTES or whatever is available, determined at boot time. It changes only when the node is rebooted.
CL_EXP=5: the running value, as corrected.
CL_Q=3: the running value of the cluster quorum for all nodes, derived from the total number of votes present: (votes + 2) / 2, using integer arithmetic.
The running value can be adjusted upward automatically, and downward only with manual command input: IPC (the IPL C console interrupt), DECamds, the SET CLUSTER/EXPECTED_VOTES command, or the like.
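For example, after deliberately removing a voting member (manual command input, suitably privileged):
$ SET CLUSTER/EXPECTED_VOTES        ! recompute quorum from the votes currently present
$ SET CLUSTER/EXPECTED_VOTES=3      ! or state the value explicitly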
Please confirm that EXPECTED_VOTES is set to five on all nodes:
SYSMAN
SET ENV/CLUSTER
PARAM SHOW EXPECTED_VOTES
It appears that various nodes, including ASSM80, have a different value, and one that is lower than what HP recommends.
As for the system-level parameter information, there should be a value for VOTES and EXPECTED_VOTES in each MODPARAMS.DAT or stored in a (usually shared) AGEN$INCLUDE_PARAMS-based file.
The OpenVMS FAQ describes how to determine the correct and most appropriate settings for VOTES and QDSKVOTES, and how to derive EXPECTED_VOTES from those values.
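As a quick worked example using the current voting scheme here (the quorum formula is the documented one, integer arithmetic):
EXPECTED_VOTES = 2 + 2 + 1 = 5
quorum = (EXPECTED_VOTES + 2) / 2 = 7 / 2 = 3
which matches the CL_Q of 3 reported on every node.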
02-27-2007 02:35 AM
Re: Strange values in the SHOW CLUSTER display
The stations are set up in such a way that they boot as soon as they see a cluster. Normally this cluster would have 3 to 5 votes, but the stations say EXPECTED_VOTES=2 so that they can also join a reduced cluster (broken disk and one node brought down with REMOVE_NODE). And because the stations have no votes, they cannot form a cluster of their own.
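In MODPARAMS terms the stations look roughly like this (a sketch of the idea, not the literal files):
VOTES = 0            ! stations contribute nothing, so they can never form a cluster on their own
EXPECTED_VOTES = 2   ! deliberately low, so a station will attempt to join a reduced cluster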
Wim
02-27-2007 03:31 AM
Re: Strange values in the SHOW CLUSTER display
>> The stations are set up in such a way that they boot as soon as they see a cluster. Normally this cluster would have 3 to 5 votes, but the stations say EXPECTED_VOTES=2 so that they can also join a reduced cluster (broken disk and one node brought down with REMOVE_NODE). And because the stations have no votes, they cannot form a cluster of their own.
I've certainly seen it before, too.
The hazard is that you can boot into a partitioned cluster. There are certainly cases when you want to bootstrap into a degraded configuration, but -- in this case, with (say) two one-vote boot nodes and a one-vote quorum disk on an (assumed) shared cluster interconnect -- there seems to be no degraded configuration you could conceivably even want to boot into.
If either boot/voting node plus the quorum disk is up, or both boot/voting nodes are up without the quorum disk, you can boot a satellite.
If you're down to one node and no quorum disk, you are also down into the range where the cluster could conceivably be partitioned; that is, where the two voting nodes have multiple problems. Is automatic booting a good idea there?
One configuration I saw up close and personal had incorrect settings, and was booted into a partitioned configuration when an SRM console environment variable had gotten reset. Bye-bye disk data.
How does one get corrupted data? I'll assume for the sake of argument here that the quorum disk is on shared SCSI. If you look at the EXPECTED_VOTES values for the boot nodes shown in the attachment, you'll see that should you erroneously boot the two boot nodes from the same root, the cluster WILL start and the nodes will not connect. Unfortunately, the nodes will believe they have quorum because of the value in EXPECTED_VOTES, and each will allow processing. This is a partitioned cluster. Disks will get stomped on.
I have not tried to generate a partitioned cluster on a Fibre Channel (FC) SAN, but -- unless one of the FC controllers detects and prevents this -- the "duplicate" configuration is basically the same as a shared SCSI bus. It's a storage interconnect, meaning that unless the SAN notices two different nodes pounding on the same disk, it'll have the same effect. The two nodes cannot "see" each other over SCSI or over the FC SAN, but can reach storage.
The fellow who was concerned is right to have been concerned, IMHO. A set of blade guards has been disabled here. If the blade guards were intentionally disabled, there should be an understanding of the risks and of the intended operations and command sequences for use in the degraded configurations. (And in this case, there isn't a whole lot of value in disabling the blade guards, as the connection manager will float the running quorum value just as soon as connections are established.) Personally, I generally prefer to allow the automatic blade guards to remain in place and to work, and to disable them only upon explicit manual command input.
I prefer to have the cluster encounter a user data integrity interlock -- what can be called the quorum hang -- rather than have the cluster proceed and risk stomping on data. The quorum scheme was not implemented to cause folks a hang and an outage; it is a set of blade guards specifically designed and implemented to prevent a serious outage.
Again, there's a whole section on this topic in the FAQ. (For the next edition of the FAQ, I'll add some text on deliberately-degraded bootstraps when initially forming a cluster or when booting in a degraded state, as I see that's not listed in the current edition.)
02-27-2007 08:29 PM
Re: Strange values in the SHOW CLUSTER display
>> SYSMAN
>> SET ENV/CLUSTER
>> PARAM SHOW EXPECTED_VOTES
Yes, it is 5 on all servers and satellites!
Also, this value has never changed - at least not intentionally, and I don't know how it could change otherwise. As mentioned, this value is specified in an included file.
As to the configuration assumptions:
The quorum disk (and all other disks) are SAN disks, with two FC interfaces and four FC paths. The interconnect between the two servers consists of two LAN links via two different switches plus a third, dedicated SCS link over a direct wire between the two boxes.
>> A set of blade-guards has been disabled here.
Not that I'm aware of! The only unusual thing is that for some time (until all systems got rebooted) the cluster ran from two different system disks, one with 7.3-2 and one with 8.3, with individual LDB, UAF, etc. And whenever a satellite was rebooted, both servers and the quorum disk were available.
After all, this was just a rolling upgrade.
As there seems to be no risk for now, I will leave the cluster as it is until I perform a planned AUTOGEN/reboot with feedback on all nodes in a few days. Let's see how it looks afterwards.
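The plan is just the standard AUTOGEN sequence, something along the lines of (per node):
$ @SYS$UPDATE:AUTOGEN SAVPARAMS REBOOT FEEDBACK   ! recalculate with feedback data, set parameters, reboot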
Edwin
03-14-2007 12:39 PM
Re: Strange values in the SHOW CLUSTER display
If there are two boot servers and a quorum disk, and we assign one vote to each of the boot servers and the quorum disk, then according to my count the expected votes should be 3, not 5. What am I missing?
What is a blade guard?
Thanks
Cass
03-14-2007 01:16 PM
Re: Strange values in the SHOW CLUSTER display
> Feb 27, 2007 12:50:13 GMT
> Each boot server has 2 votes, the quorum disk has one and the satellites have none.
2 + 2 + 1 = 5
Don't ask me why "[e]ach boot server has 2 votes".
> What is a blade guard?
The inconvenient and annoying part of a power tool which is intended to keep your vulnerable body parts away from the dangerous moving parts. Here, used metaphorically.
03-15-2007 05:39 AM
Re: Strange values in the SHOW CLUSTER display
That only gave me three votes, not five.