03-27-2009 06:12 AM
Disabling node TOC?
Is there any (supported) way to stop Serviceguard from rebooting a node? We have a number of machines that run more than one service (some not under MC/SG), and when SG reboots a node, all the other services on that node are killed as well.
3 REPLIES
03-27-2009 06:52 AM
Re: Disabling node TOC?
If your packages have the "service_fail_fast_enabled" flag set, SG is forced to reboot the node whenever the flagged service fails. That would be a case of "you asked for it, you got it."
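To find out which services carry that flag, you could scan the text of a package configuration file. The following is only a minimal sketch: it assumes the file consists of whitespace-separated "keyword value" lines, that `#` starts a comment, and that each `service_name` line comes before the `service_fail_fast_enabled` line it belongs to. The function name `fail_fast_services` is made up for this example.

```python
def fail_fast_services(config_text):
    """Return the names of services whose fail-fast flag is set to yes.

    Assumed layout (not taken from any official grammar): one
    "keyword value" pair per line, '#' starts a comment, and a
    service_name line precedes its service_fail_fast_enabled line.
    """
    flagged = []
    current = None  # most recent service_name seen so far
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without a value, ignore it
        key, value = parts[0].lower(), parts[1].strip()
        if key == "service_name":
            current = value
        elif key == "service_fail_fast_enabled" and value.lower() == "yes":
            flagged.append(current if current is not None else "<unnamed>")
    return flagged
```

Keywords are compared case-insensitively, so upper-case and lower-case spellings of the keyword are both picked up. Feed it the contents of your own package file and review each service it reports.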
If you don't use the "service_fail_fast_enabled" setting, the only reason for SG to trigger a reboot is that all the heartbeat connections are lost and the node fails to get the cluster lock. If this happens often, you should work on improving the reliability and/or redundancy of your heartbeat connections.
In a robust cluster, an SG-triggered reboot should be a very rare event - and when it happens, the rebooting node will usually be isolated by network faults, so it could not provide any services to anyone at that time anyway.
Avoiding the reboots completely would require developing a new strategy for avoiding split-brain situations. This strategy must be absolutely bulletproof: a wrong decision here means either not providing a service when you could, or corrupting your package disks.
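The reboot conditions described in this reply can be written down as a small decision function. This is a simplified model of the explanation above, not Serviceguard's actual algorithm; the `NodeState` class, its field names and `node_would_reboot` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    heartbeats_up: int               # heartbeat links still working
    got_cluster_lock: bool           # True if this node won the cluster lock
    service_failed: bool = False     # a monitored package service has failed
    fail_fast_enabled: bool = False  # that service has the fail-fast flag set


def node_would_reboot(state):
    """Model of the two reboot triggers named in the reply (an assumption)."""
    # Trigger 1: a service with the fail-fast flag fails.
    if state.service_failed and state.fail_fast_enabled:
        return True
    # Trigger 2: every heartbeat link is down AND the cluster lock was lost.
    if state.heartbeats_up == 0 and not state.got_cluster_lock:
        return True
    return False
```

In this model, an isolated node that still wins the cluster lock stays up, and so does a node that keeps at least one heartbeat link, which is why redundant heartbeat connections make the reboot so rare.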
MK
03-27-2009 06:56 AM
Re: Disabling node TOC?
It is indeed a rare event, but when it happens (e.g. during a maintenance event) it can be catastrophic. It seems like in the situation where it would normally reboot the node (all HB links down and no cluster lock), SG could just shut *itself* down, but that doesn't seem to be an option...
03-29-2009 09:42 AM
Re: Disabling node TOC?
The solution in that case is to tell Serviceguard in advance that a node is going to go away, using the "cmhaltnode" command.
It causes all running packages on that node to shut down and fail over to other nodes (if enabled and possible), *and* it prepares the rest of the cluster to accept that the halting node may go off-line after the halt operation is completed.
The "cmhaltnode" command is *not* in any way similar to the "shutdown" command: the OS of the node and any non-packaged services on the node will keep running; only Serviceguard operations on the node are shut down.
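A planned-maintenance sequence built around that command might look like the sketch below. It only builds the command lines and prints them; it does not run anything. The use of `cmviewcl -v` before and after, the `-f` option (so that packages still running on the node are halted as part of the operation) and the node name `node1` are assumptions for illustration, so check the man pages for your Serviceguard version before relying on them.

```python
def maintenance_plan(node):
    """Return the commands for taking one node out of the cluster.

    Dry-run helper (hypothetical): nothing is executed here.
    """
    if not node or any(ch.isspace() for ch in node):
        raise ValueError("node name must be a non-empty single word")
    return [
        ["cmviewcl", "-v"],          # assumed: review cluster state first
        ["cmhaltnode", "-f", node],  # assumed: -f also halts running packages
        ["cmviewcl", "-v"],          # assumed: confirm the node shows as halted
    ]


if __name__ == "__main__":
    # "node1" is a placeholder node name.
    for argv in maintenance_plan("node1"):
        print(" ".join(argv))
```

Keeping the plan as data makes it easy to review before any of it is typed on a production node.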
If you have a two-node cluster and you use cmhaltnode to stop one node, check the syslog of the other node: you will see a message about shutting down the "safety time protection" (= the "deadman" panic reboot module) as the cluster transitions into single-node operation. After that, it's safe to do whatever you want with the halted node.
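If you would rather not read the whole syslog by eye, a small filter can pull out the candidate lines. This is a sketch under assumptions: the exact wording of the message varies, so the search phrase is just the words quoted above and is passed in as a parameter, and `safety_timer_lines` is a made-up helper name.

```python
def safety_timer_lines(log_lines, needle="safety time"):
    """Return log lines that contain the search phrase, case-insensitively.

    The default phrase is an assumption based on the wording quoted in
    this thread; adjust it to match what your syslog actually shows.
    """
    needle = needle.lower()
    return [line.rstrip("\n") for line in log_lines if needle in line.lower()]
```

On the surviving node you would open its syslog (on HP-UX this is usually /var/adm/syslog/syslog.log, but verify the path on your system) and pass the file object to the function. An empty result means the phrase was not found, not necessarily that the transition did not happen.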
If this is not applicable to your situation, please explain in more detail. If we can find out exactly why the reboot is triggered in your situation, perhaps we can find a way to avoid it.
MK
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.