06-18-2007 07:06 PM
I have a problem for which I can't find the solution, so I hope one of you can help me.
I have a cluster with 4 members (each member has 1 vote, expected votes = 3, no quorum disk). Last week the CPU fan of one cluster member failed, so that machine shut down.
In my opinion, the rest of the cluster should have kept running normally. Instead, the other cluster members appeared to hang: they couldn't be reached, and even on the console you couldn't do anything. They resumed working (without a reboot) when the failed cluster member came back.
So, what could have caused this?
Regards,
Kirsten
06-18-2007 07:15 PM
Solution
How is storage organized in your cluster? Do the remaining nodes have access to vital disks (e.g. the system disk), or are some disks served by the failing node?
If quorum was lost, there should be messages on the consoles or in OPERATOR.LOG.
regards Kalle
06-18-2007 07:17 PM
Re: Cluster suspended while one member had a defect
Do you have a quorum disk?
Can you post the votes of all the members?
It seems the number of votes fell below quorum, which would explain why the cluster hung.
It is a pity that you do not have AMDS or Availability Manager: it tells you when quorum is not reached, and it lets you force a new quorum value so that the cluster works again.
06-18-2007 07:39 PM
The system disk is reachable from all machines in the cluster, so that can't be the problem.
It was a bit of a mystery to me. The moment one cluster member failed, all the other machines hung. There was no entry in the operator log giving the reason. They worked again when the failed cluster member was back. I can't understand this.
Short summary: 4 nodes in a cluster, no quorum disk, expected votes 3, each machine 1 vote.
Regards,
Kirsten
06-18-2007 08:12 PM
At least some lost-connection... messages should be in OPERATOR.LOG.
Can you give some more background on your configuration, e.g. storage, interconnects...?
regards Kalle
06-18-2007 09:28 PM
Quorum = (expected_votes/2 + 1), rounded down.
The calculated value of quorum in your scenario is then 3/2 + 1 = 2.5, rounded down, i.e. 2.
So as long as at least two nodes are alive, your cluster should stay up. I think you should check the SYSGEN parameters (VOTES, EXPECTED_VOTES, QDSKVOTES) and also the MODPARAMS.DAT file.
06-18-2007 09:37 PM
regards Kalle
06-18-2007 11:26 PM
Your votes configuration is correct. For 4 nodes, the majority (i.e. QUORUM) is 3, so the cluster should continue if only one node is lost.
It may be too late to find out why the cluster apparently hung. Do you capture your console data with a console manager application? If not, there should at least be some messages in OPERATOR.LOG, written once the 4th system came back.
If this happens again, and if it really has something to do with lost quorum, you could try the IPC interrupt on the console to recalculate quorum.
Volker.
06-19-2007 04:54 AM
Your formula is wrong.
The correct formula for calculating quorum is:
quorum = (expected_votes+2)/2
In this case, (4+2)/2 gives 3, which is the correct quorum value for 4 votes.
Volker.
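The arithmetic in this post can be checked with a short sketch. It assumes the division truncates to an integer, and the names `quorum` and `has_quorum` are illustrative only; they are not OpenVMS routines, and the sketch does not model how the running cluster may adjust its own values.

```python
def quorum(expected_votes: int) -> int:
    """Quorum per the formula quoted above, assuming truncating division."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the calculated quorum."""
    return votes_present >= quorum(expected_votes)


if __name__ == "__main__":
    # Compare the value from the original posting (3) with the total votes (4).
    for ev in (3, 4):
        print(f"expected_votes={ev}: quorum={quorum(ev)}")
    # Four one-vote nodes with one node lost leaves three votes present.
    print("3 votes, expected_votes=4:", has_quorum(3, 4))
```

Under either setting, three surviving one-vote nodes still reach the calculated quorum, which is consistent with the view in the thread that losing a single node should not, by this formula alone, suspend the cluster.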
06-19-2007 05:10 AM
Check whether the votes, expected votes, etc. are what you think they are. While the cluster is running, do a:
$ SHOW CLUSTER/CONTINUOUS
add vote
add quorum
add cluster
That will show what the running cluster has.
See if that makes sense. Dean
06-19-2007 05:16 AM
Expected_Votes -- per the original posting -- is set incorrectly. If connectivity is lost (due to a console configuration error or a partial communications disconnection), then with Expected_Votes set to 3, Quorum is calculated as 2, which could allow two disjoint partitions to operate in parallel, with the data corruption that typically ensues.
If you wish to preserve the integrity of your disk data, Expected_Votes should be set to 4, not 3.
http://64.223.189.234/node/153
Personally, I view the existing quorum mechanism implemented with system parameters as a design mistake. Far too often, somebody either sets the values incorrectly by mistake, or sets them "creatively", deliberately misconfiguring the cluster.
The central rationale for the cluster quorum scheme is to prevent your data from getting stomped on. It's not something you want to mis-set, lest you allow your data to get stomped on. And by "stomped on", I here mean "massively corrupted; how current is your BACKUP?", or such.
Stephen Hoffman
HoffmanLabs LLC
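The partitioning risk described in this post can be illustrated with a simplified model. The sketch below assumes the quorum formula given earlier in the thread, (expected_votes+2)/2 with truncating division, and four hypothetical nodes named A to D with one vote each. It only enumerates two-way splits and is not a model of the actual connection manager.

```python
from itertools import combinations


def quorum(expected_votes: int) -> int:
    """Quorum per the formula in the thread, assuming truncating division."""
    return (expected_votes + 2) // 2


def risky_splits(votes: dict, expected_votes: int) -> list:
    """Return the two-way splits in which BOTH sides reach quorum."""
    q = quorum(expected_votes)
    nodes = sorted(votes)
    total = sum(votes.values())
    first, rest = nodes[0], nodes[1:]
    found = []
    # Pin the first node to side A so each split is counted only once.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            side_a = (first,) + extra
            side_b = tuple(n for n in nodes if n not in side_a)
            if not side_b:
                continue  # no split: every node is on one side
            votes_a = sum(votes[n] for n in side_a)
            if votes_a >= q and total - votes_a >= q:
                found.append((side_a, side_b))
    return found


if __name__ == "__main__":
    cluster = {"A": 1, "B": 1, "C": 1, "D": 1}  # hypothetical node names
    for ev in (3, 4):
        print(f"expected_votes={ev}: {risky_splits(cluster, ev)}")
```

In this model, an expected-votes value of 3 leaves every 2/2 split with quorum on both halves, while a value of 4 leaves none, which matches the recommendation above.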