02-26-2010 03:03 AM
Cluster Transitions
Four node cluster.
One of the nodes crashed and exited the cluster with a CPU bcache hardware fault.
I would have expected the other 3 nodes to hang for the 4 seconds we have RECNXINTERVAL set to, and then carry on with business as normal.
However, when we look at our application latency stats, we don't see any hangs at the time of the crash. We measure latency in milliseconds, and a 4 second hang would stick out like a sore thumb.
Not complaining ... we survived a server crash and saw no loss of service.
What am I missing?
P.S. This is the first server crash we have had in 4 years. We run 24 hours a day, from Sunday evening start to Friday night finish.
Latency measured in milliseconds.
Above 3 seconds is considered an outage and investigated.
Rock on OpenVMS and ES45 Alpha servers ....
02-26-2010 03:48 AM
Re: Cluster Transitions
What are your applications doing?
Precisely what "response time" are you measuring?
- Bob Gezelter, http://www.rlgsc.com
02-26-2010 05:17 AM
Re: Cluster Transitions
If there was a real OpenVMS crash (probably a MACHINECHK), then during crash processing the node sent a 'last gasp' message over all cluster channels, so the other nodes IMMEDIATELY removed that node from the cluster without waiting for RECNXINTERVAL to expire.
RECNXINTERVAL only comes into play if no 'last gasp' message has been sent, i.e. if you just HALT the machine or power it off, or if the network connections all fail at once.
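As a rough illustration of the distinction above, here is a minimal Python sketch of how long the surviving nodes would wait on RECNXINTERVAL for different kinds of node exit. This is a toy model, not OpenVMS code: the names `NodeExit` and `recnx_wait_seconds` are hypothetical, and it ignores the time taken by the state transition work itself. The 4 second value is the one quoted in the original post.

```python
from dataclasses import dataclass


@dataclass
class NodeExit:
    """How a member left the cluster (hypothetical model, not an OpenVMS API)."""
    cause: str
    last_gasp_sent: bool


def recnx_wait_seconds(exit_event: NodeExit, recnxinterval: float) -> float:
    """Time the survivors spend waiting for the reconnection interval.

    If the departing node sent a last-gasp message, the survivors already
    know it is gone and do not wait. Otherwise they wait up to RECNXINTERVAL
    in case the connection comes back.
    """
    if exit_event.last_gasp_sent:
        return 0.0
    return recnxinterval


RECNXINTERVAL = 4.0  # seconds, as configured in the original post

events = [
    NodeExit("crash (bugcheck / machine check)", last_gasp_sent=True),
    NodeExit("console HALT", last_gasp_sent=False),
    NodeExit("power off", last_gasp_sent=False),
    NodeExit("all network connections fail", last_gasp_sent=False),
]

for event in events:
    wait = recnx_wait_seconds(event, RECNXINTERVAL)
    print(f"{event.cause}: wait up to {wait:.1f} s")
```

Under this model the crash case shows no RECNXINTERVAL wait at all, which would be consistent with the latency stats showing no hang.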
OpenVMS clusters - the best clusters there are !
Volker.
02-26-2010 05:22 AM
Re: Cluster Transitions
Yes, OpenVMS Clusters are simply the best.
02-26-2010 05:28 AM
Re: Cluster Transitions
In particular, a crashing cluster member node usually sends out the so-called last-gasp message. Receipt of this message allows the remaining nodes to completely bypass the RECNXINTERVAL mechanisms; they "know" what happened to the host.
Cases where no last gasp is sent (console halt, network disconnection, certain hardware failures, etc.) will require a longer transition.
As for the voting configuration, if these boxes have a shared RAID-capable external controller, then the typical uptime configuration would be one vote to each host, and three votes to a shared quorum disk with controller RAID. You'd see fewer or no votes configured to the quorum disk when faster transitions are required, presuming the hosts are stable.
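To make the vote arithmetic concrete, here is a small Python sketch that applies the documented OpenVMS quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction discarded, to the four-node cluster from the original post. The function names are hypothetical, and the sketch assumes quorum is computed once from the full EXPECTED_VOTES; it does not model the cluster adjusting quorum dynamically after a transition.

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum formula: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def has_quorum(hosts_up: int, qdisk_up: bool, n_hosts: int = 4,
               host_votes: int = 1, qdisk_votes: int = 3) -> bool:
    """Check whether the surviving members still hold quorum.

    Assumes every host carries the same number of votes and quorum is
    fixed at the value derived from the full configuration.
    """
    expected = n_hosts * host_votes + qdisk_votes
    current = hosts_up * host_votes + (qdisk_votes if qdisk_up else 0)
    return current >= quorum(expected)


# "Uptime" layout: 4 hosts x 1 vote, quorum disk with 3 votes.
print("expected votes:", 4 * 1 + 3, "quorum:", quorum(4 * 1 + 3))
for up in range(4, 0, -1):
    print(f"{up} host(s) + quorum disk -> quorum held: {has_quorum(up, True)}")

# Same hosts with no votes on the quorum disk.
print("expected votes:", 4, "quorum:", quorum(4))
for up in range(4, 0, -1):
    held = has_quorum(up, False, qdisk_votes=0)
    print(f"{up} host(s), no quorum disk votes -> quorum held: {held}")
```

With three votes on the quorum disk, a single surviving host plus the disk still meets quorum, which is why that layout favours uptime. The cost is that transitions involving the quorum disk take longer, hence the suggestion to reduce or drop its votes when faster transitions matter more.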
Inferring much from your statements around operations and latency and uptime, I'd also expect to be prototyping replacement servers here. Most any of the rx2660 series or blades would be the replacement path for this configuration. These boxes are nice, but they're also big and slow and getting older all the time.
Some reading:
http://labs.hoffmanlabs.com/node/153
http://labs.hoffmanlabs.com/node/569