Operating System - HP-UX


Will both node halt if heartbeat fails

According to the manual:

In the event of a LAN interface failure, a local switch is done to a standby LAN interface if one exists. If a heartbeat LAN interface fails and no standby is configured, the node fails with a TOC. If a data LAN interface fails without a standby, the node fails with a TOC only if Package Failfast (described further in the "Planning" chapter under "Package Configuration Planning") is enabled for the package.


This weekend we're doing some upgrades to the electrical system in one of the server rooms, and therefore node 1 and the discs will be turned off completely.

Question is, will node 2 go into a TOC and halt (shut down) because it loses all contact with node 1 and the discs?
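
For reference, the manual paragraph quoted above can be read as a small decision table. The following is only a sketch of that reading in Python, not Serviceguard code; the function and enum names are made up for illustration, and the "node does not TOC" outcome says nothing about what happens to the package itself, since the quoted text does not cover that.

```python
from enum import Enum


class Outcome(Enum):
    LOCAL_SWITCH = "local switch to the standby LAN interface"
    NODE_TOC = "node fails with a TOC"
    NO_NODE_TOC = "node does not TOC"


def lan_failure_outcome(is_heartbeat_lan: bool,
                        has_standby: bool,
                        package_failfast: bool = False) -> Outcome:
    """Hypothetical model of the quoted manual paragraph.

    is_heartbeat_lan: True for a heartbeat LAN, False for a data LAN.
    has_standby:      a standby LAN interface is configured.
    package_failfast: Package Failfast is enabled (only matters for a data LAN).
    """
    if has_standby:
        # A standby exists, so the failure is handled by a local switch.
        return Outcome.LOCAL_SWITCH
    if is_heartbeat_lan:
        # Heartbeat LAN lost with no standby: the node fails with a TOC.
        return Outcome.NODE_TOC
    # Data LAN lost with no standby: TOC only if Package Failfast is enabled.
    return Outcome.NODE_TOC if package_failfast else Outcome.NO_NODE_TOC
```

Note that this only covers a LAN interface failing on a running node; an orderly shutdown of the other node is a different case.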
Sridhar Bhaskarla
Honored Contributor
Solution

Re: Will both node halt if heartbeat fails

Hi,

If 'all' heartbeats fail, then the node that cannot acquire the lock disk (or quorum) will TOC itself.

However, if you manually bring down the node using 'cmhaltnode', then you do not need to worry about the other node crashing, as the heartbeats to that node will no longer be checked until it is put into the cluster again.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try

Re: Will both node halt if heartbeat fails

Just to be perfectly clear here: if I were to shut down and power down node 1 in a 2-node cluster, will node 2 also shut down, since it will lose all contact with node 1 because of the heartbeats being lost?

Also, just to ponder a bit around this question: why does the whole node shut down? Isn't it enough to just stop the cluster service? Is there a way to configure it so only the cluster stops in this case and not the whole server?

Are there any drawbacks to not shutting down the server?

I assume TOC here actually means shutting down the server completely and not just the cluster daemon?
Sridhar Bhaskarla
Honored Contributor

Re: Will both node halt if heartbeat fails

Hi,

//Just to be perfectly clear here, if I were to shut down and power down node 1 in a 2-node cluster, node 2 will also shut down since it will lose all contact with node 1 because of the heartbeats being lost?//

If you shut down node1 after halting the cluster daemon (cmhaltnode), then node2 will not go down, as the cluster will be reformed with only node2 as the member. This is not a failure. This is one of the common practices SAs use for maintenance: halt the node, do the maintenance, move the package to that node, bring down the primary node for maintenance, and so on, to minimize the downtime.

//Also, just to ponder a bit around this question. Why does the whole node shut down, isn't it enough to just stop the cluster service? Is there a way to configure it so only the cluster stops in this case and not the whole server?//

The whole node has to go down because there is a good chance that the shared volume groups are still active on that system (for example after a package failure). If another node then activates the same volume groups, the data may be corrupted.

//I assume TOC here actually means shutting down the server completely and not just the cluster daemon?//

TOC means 'transfer of control.' The control will be passed immediately to hardware to ensure a quick shutdown of the system to prevent further data corruption.

-Sri

melvyn burnard
Honored Contributor

Re: Will both node halt if heartbeat fails

The simple answer here is that if you do an orderly shutdown of one node, then the cluster will reform on the remaining node as a single node cluster.
As this is NOT a failure condition, there is no requirement to check for the cluster lock or to verify heartbeats to the other node, so this node should remain in operation during the maintenance period of the other node.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

Re: Will both node halt if heartbeat fails

But if one node TOCs or fails, or the net is lost (IOW, both heartbeats) or similar, will both nodes shut down because of the heartbeat being lost?
melvyn burnard
Honored Contributor

Re: Will both node halt if heartbeat fails

Not necessarily.
Only one node should TOC if you lose heartbeats or have network issues.
The only time BOTH nodes should TOC is when neither node can get the cluster lock disc.
You are planning an outage by shutting down the node in question.
This causes the node to leave the cluster, resulting in a cluster reformation into a single node cluster. This is normal, expected behaviour, not a failure, and once the remaining node has reformed as a single node cluster it will not try to monitor the other node until that node is rebooted and rejoins the running cluster.
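
To make the "one node or both" rule concrete, here is a rough Python sketch of the tie-break after all heartbeats are lost in a two-node cluster. It is an illustration only and not how Serviceguard is implemented: the function name is hypothetical, and the order of the dictionary entries is an assumption standing in for which node reaches the lock disc first.

```python
def heartbeat_loss_outcome(can_get_lock):
    """Hypothetical model of cluster lock arbitration after total heartbeat loss.

    can_get_lock maps node name -> True if that node can reach the cluster
    lock disc. Dictionary order stands in for the order in which the nodes
    race for the lock (an assumption made for this sketch).

    Returns (survivor, nodes_that_toc); survivor is None if no node got the lock.
    """
    survivor = None
    nodes_that_toc = []
    for node, reachable in can_get_lock.items():
        if reachable and survivor is None:
            # First node to obtain the lock keeps running.
            survivor = node
        else:
            # Lock unreachable, or already taken by the other node: TOC.
            nodes_that_toc.append(node)
    return survivor, nodes_that_toc
```

With both nodes able to reach the lock disc, exactly one node TOCs; with the lock disc unreachable from both, the survivor is None and both nodes TOC.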

Re: Will both node halt if heartbeat fails

Yeah, that part is pretty much covered by you two guys now, and I'm thankful.

Reason I keep asking is that we started a what-if discussion here and I became a bit confused and needed some clarification.

Someone claimed that one of the nodes had a serious crash some time ago, which resulted in node 1 going down hard, and there was a loss of the network, so the heartbeats and LANs between the nodes went down.

This in turn resulted in node 2 shutting down, which I can't quite understand. I can see how it has to halt the cluster service for the reasons mentioned earlier here.

Reason is that node 2 is also our OmniBack server (not in the cluster), so we really don't want the whole server shutting down in cases like this.

So basically I'm just wondering if such a scenario could have happened? And if so, is there a safe way to keep the server from shutting down and rather just stop the cluster daemon? I would think stopping the cluster daemon would achieve the same result as shutting down the node to avoid data loss.
melvyn burnard
Honored Contributor

Re: Will both node halt if heartbeat fails

If during a previous scenario one node TOC'ed, the other should have remained up. If it did not, then there was an issue with either the cluster lock disc or some other part of the configuration.
Checking all the OLDsyslogs and package logs from the time of the incident may have helped.

For your planned outage, simply shut down the one node; the other node should remain alive.

Re: Will both node halt if heartbeat fails

For the planned job, I'm also shutting down the disc system, which means no cluster lock disc. Is this something that would result in a TOC on node 2?

So should I stop the cluster service on node 2 too?
melvyn burnard
Honored Contributor

Re: Will both node halt if heartbeat fails

Please see my first response: if the cluster is reformed to become a single node cluster by doing an orderly shutdown of one node (or cmhaltnode -vf nodename), this is not a failure, and therefore we do NOT require the cluster lock disc.
But this does indicate that in your previous failure you may have lost the ability to contact the cluster lock disc, and this WOULD result in the 2nd node TOC'ing.
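
The difference between the planned outage and the earlier incident can be summed up from node 2's point of view as below. This is a hedged sketch of the reasoning in this thread, with a hypothetical function name, and it models nothing beyond the two inputs shown.

```python
def node2_after_losing_node1(node1_left_orderly: bool,
                             lock_disc_reachable: bool) -> str:
    """Hypothetical summary of what happens to node 2 when node 1 goes away."""
    if node1_left_orderly:
        # Orderly shutdown or cmhaltnode: not a failure, so the cluster
        # lock disc is never consulted.
        return "stays up as a single node cluster"
    if lock_disc_reachable:
        # Node 1 failed, but node 2 can still obtain the cluster lock.
        return "stays up after obtaining the cluster lock"
    # Node 1 failed and the lock disc cannot be reached.
    return "TOC"
```

For the planned job (node 1 halted in an orderly way, discs powered off) this gives "stays up as a single node cluster"; for the earlier incident (node 1 crashed, lock disc possibly unreachable) it gives "TOC".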

Re: Will both node halt if heartbeat fails

Ahh.....thanks for all the info. Helped me clear up every issue regarding this.