Operating System - HP-UX

Riccardo Capuzzi_1
Frequent Advisor

HeartBeat and PowerFailure TOC

Hi All,
We have two nodes configured in a cluster (MC/SG 11.14) and, on each node, an aggregated LAN link built with APA.
We have also configured the APA interface as a standby heartbeat.
Now the question is:
Is there any difference between a power failure and the loss of the heartbeat signal?
We know that a loss of the heartbeat signal causes a TOC, but what happens if a cluster node has a power failure?

Best regards,
Riccardo Capuzzi
7 REPLIES
Bernhard Mueller
Honored Contributor

Re: HeartBeat and PowerFailure TOC

Riccardo,

If the two nodes cannot reach each other, arbitration has to take place to decide which one will continue running the cluster.

Only then does the race for the lock disk happen, and if one node has a power failure, you would expect the other to be able to acquire the lock.

If you have a quorum server outside the cluster and one node has a power failure, you would expect the other to still reach the quorum server, right?

A node will TOC if it cannot reach the other node *and* cannot acquire the lock (or reach the quorum server).

Regards,
Bernhard

Steven E. Protter
Exalted Contributor

Re: HeartBeat and PowerFailure TOC

A power failure on one node, but not both, should not cause the surviving node to TOC. The surviving node should realize that its partner is down, and if you have properly configured your packages to fail over, they should fail over along with the other resources defined in the failover configuration.

Hopefully you have a UPS so the shutdown in the event of a power failure is a little more graceful.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Riccardo Capuzzi_1
Frequent Advisor

Re: HeartBeat and PowerFailure TOC

Thank you all, but I want to know what the real difference is between a power failure and a loss of heartbeat. In both cases, one of the cluster nodes is not reachable. So why do we get a TOC in one situation and not in the other?

Best Regards,
Riccardo Capuzzi
RAC_1
Honored Contributor

Re: HeartBeat and PowerFailure TOC

Others have explained pretty well what happens if one node loses power. In the case of a loss of heartbeat, the server that ends up owning the lock will continue to run and the other node will TOC.

Anil.
There is no substitute to HARDWORK
Steven E. Protter
Exalted Contributor

Re: HeartBeat and PowerFailure TOC

Power fail = no electricity.
Loss of heartbeat = can happen with the power on. Pull the cable on the heartbeat LAN: everything is still powered up when the TOC happens.

SEP
Bernhard Mueller
Honored Contributor
Solution

Re: HeartBeat and PowerFailure TOC

Riccardo,

If both nodes are up but cannot communicate through any heartbeat LAN, each one thinks "the other node may be dead, so I should continue running the cluster (and packages)". But if *both* are alive, you could end up with a "split brain" syndrome, i.e. each node would run its own instance of the cluster and all packages. The result would most likely be some sort of data corruption.

Therefore, when MC/SG needs to decide where the cluster should run, it uses what is called arbitration, and the typical mechanism is a "lock disk". The node that acquires the lock (and sets a flag) is allowed to run the cluster (and packages), and the node that is *denied* the lock MUST GO DOWN (i.e. TOC) to avoid a split brain.
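
To make the difference between the two cases concrete, here is a minimal sketch of the arbitration idea in Python. It is only a toy model of the reasoning above, not how MC/SG is actually implemented: the names Node, LockDisk and arbitrate are hypothetical, and the "race" is simplified so that the first node in the list gets to the lock first.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    powered: bool = True  # False models a node that lost electrical power


class LockDisk:
    """Hypothetical stand-in for the cluster lock disk: first claimant wins."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, node_name):
        # The first node to ask becomes the owner; later requests are denied.
        if self.owner is None:
            self.owner = node_name
        return self.owner == node_name


def arbitrate(nodes, lock):
    """Run after the heartbeat is lost: every live node races for the lock."""
    outcome = {}
    for node in nodes:
        if not node.powered:
            # A node without power cannot race and cannot TOC: it is simply off.
            outcome[node.name] = "down (no power)"
        elif lock.try_acquire(node.name):
            outcome[node.name] = "runs cluster"
        else:
            # Denied the lock: must go down to avoid a split brain.
            outcome[node.name] = "TOC"
    return outcome


def heartbeat_loss():
    # Both nodes alive, heartbeat LAN cut: one wins the lock, the other TOCs.
    return arbitrate([Node("A"), Node("B")], LockDisk())


def power_failure():
    # Node B has no power: A is the only contender, so nobody TOCs.
    return arbitrate([Node("A"), Node("B", powered=False)], LockDisk())


if __name__ == "__main__":
    print("heartbeat loss:", heartbeat_loss())
    print("power failure: ", power_failure())
```

In this model the TOC only appears when two live nodes compete for the lock. With a power failure there is a single contender, so the survivor takes the lock and keeps the cluster running.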

Regards
Bernhard
Riccardo Capuzzi_1
Frequent Advisor

Re: HeartBeat and PowerFailure TOC

Thank you all for the explanations.

Best Regards,
Riccardo Capuzzi