Heart Beat
vinayan
Advisor

Can anybody tell me what happens if the heartbeat LAN of my HA cluster fails, and what steps I need to carry out to diagnose and troubleshoot it?
HP-UX version 11.11
Serviceguard 11.16
Ludovic Derlyn
Esteemed Contributor

Re: Heart Beat

Hi,

If the heartbeat fails, the nodes will TOC ...

What exactly is your question?
What problem have you encountered?

Regards
spex
Honored Contributor

Re: Heart Beat

Hello,

See "Optimizing failover time in a Serviceguard environment":

http://h71028.www7.hp.com/enterprise/downloads/Optimizing%20failover_6-22.pdf

PCS
Stephen Doud
Honored Contributor

Re: Heart Beat

First, you should configure multiple heartbeat networks for redundancy; then you won't have to worry about this :)
Do this by editing the cluster configuration file and changing STATIONARY_IP references to HEARTBEAT_IP. Then halt the cluster and run cmapplyconf on the file.
HEARTBEAT_IP simply allows Serviceguard to use that network to pass heartbeat (HB) messages; it does not restrict data from using the network.
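
As a rough illustration of the edit described above, here is a minimal Python sketch that renames STATIONARY_IP entries to HEARTBEAT_IP in the text of a cluster configuration file. The function name `promote_to_heartbeat`, the `only_ips` parameter, and the sample node, interface, and address values are all made up for illustration; the script only rewrites text, and you would still halt the cluster and run cmapplyconf on the edited file yourself.

```python
import re

# Matches a non-commented "STATIONARY_IP <address>" entry at the start of a line.
_STATIONARY = re.compile(r"^([ \t]*)STATIONARY_IP([ \t]+)(\S+)", re.MULTILINE)


def promote_to_heartbeat(config_text, only_ips=None):
    """Return config_text with STATIONARY_IP entries renamed to HEARTBEAT_IP.

    only_ips is a hypothetical convenience: when given (a set of address
    strings), only entries with those addresses are changed.
    """
    def swap(match):
        indent, gap, ip = match.groups()
        if only_ips is not None and ip not in only_ips:
            return match.group(0)  # leave this entry untouched
        return f"{indent}HEARTBEAT_IP{gap}{ip}"

    return _STATIONARY.sub(swap, config_text)


# Illustrative fragment only; names and addresses are invented.
sample = """\
NODE_NAME node1
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 192.168.10.1
  NETWORK_INTERFACE lan1
    STATIONARY_IP 10.0.0.1
"""

updated = promote_to_heartbeat(sample)
print(updated)
```

Commented-out lines (those starting with `#`) are left alone, since the pattern only matches entries that begin a line after optional indentation.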

But if all HB networks fail, check /var/adm/syslog/syslog.log (or OLDsyslog.log) to see whether it registers LAN NIC failures, e.g. "lan3 failed".
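
A quick way to do that check is to scan the log for lines that mention a lanN interface together with "fail". The sketch below is a minimal Python example; the helper name `find_lan_failures` and the sample log lines are invented, and the exact wording of real syslog messages may differ from the "lan3 failed" example, so treat the pattern as a starting point.

```python
import re

# Loose pattern: an interface name like "lan3" followed later by "fail...".
LAN_FAIL = re.compile(r"\b(lan\d+)\b.*\bfail", re.IGNORECASE)


def find_lan_failures(lines):
    """Return (line_number, interface, text) for each line that looks like a LAN failure."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LAN_FAIL.search(line)
        if match:
            hits.append((number, match.group(1), line.rstrip("\n")))
    return hits


# Made-up log lines for illustration only.
sample_log = [
    "Jan  1 00:00:01 node1 cmcld: lan3 failed\n",
    "Jan  1 00:00:02 node1 inetd: unrelated message\n",
]

for number, iface, text in find_lan_failures(sample_log):
    print(f"line {number}: {iface}: {text}")

# On a real node you would pass the open log file instead, e.g.:
#   with open("/var/adm/syslog/syslog.log") as log:
#       hits = find_lan_failures(log)
```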

You could also run linkloop between the HB NICs to see whether they can communicate.

When ALL HB networks fail between nodes, SG uses a cluster reformation protocol:

-> If the remaining nodes that can communicate with one another comprise less than half of the original nodes, this set will TOC (reboot).

-> If the remaining nodes that can communicate with one another comprise more than half of the original nodes, this set will reform a new cluster.

-> If the remaining nodes that can communicate with one another comprise exactly half of the original nodes, this set will seek the cluster arbitration device to determine which half gets to form the new cluster. The first half that checks in with the arbitration device gets to form a new cluster, and the slower half will be forced to TOC (reboot).

The arbitration device is either a cluster lock disk or a quorum server.
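
The three cases above boil down to comparing the size of the surviving group against half of the original cluster. Here is a minimal Python sketch of that rule; the function name `reformation_outcome` and the returned labels are invented for illustration and are not Serviceguard terminology or code.

```python
def reformation_outcome(reachable_nodes, original_nodes):
    """Classify what a group of mutually reachable nodes does after all HB links fail.

    Returns "TOC" (less than half), "REFORM" (more than half),
    or "ARBITRATE" (exactly half, so the lock disk or quorum server decides).
    """
    if original_nodes <= 0 or not 0 < reachable_nodes <= original_nodes:
        raise ValueError("need 0 < reachable_nodes <= original_nodes")

    # Compare 2 * reachable against original to avoid fractional halves.
    if 2 * reachable_nodes < original_nodes:
        return "TOC"
    if 2 * reachable_nodes > original_nodes:
        return "REFORM"
    return "ARBITRATE"


# A 2-node cluster that loses all heartbeat links: each node is exactly half.
print(reformation_outcome(1, 2))
# A 3-node cluster split 2 / 1.
print(reformation_outcome(2, 3), reformation_outcome(1, 3))
```

In the exactly-half case, both halves race for the arbitration device, so which half survives is not something this function can predict.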
Patrick Wallek
Honored Contributor
Solution

Re: Heart Beat

You really need to have multiple heartbeat LANs. The whole purpose of the heartbeat is to periodically verify that the nodes of the cluster are up. If the heartbeat fails, then this is not possible and SG will see it as a node failure.

Its exact behavior from here depends on your setup. If you have a 2- or 3-node cluster and are using a lock disk, then the node that grabs the lock disk first will stay up. All other nodes will TOC.

If you have a quorum server, then the quorum server will determine the cluster's behavior.

If you have only a single heartbeat, then this is a bad cluster design, because it is a SPOF (single point of failure).
san.sa
Advisor

Re: Heart Beat

Hi Vinayan,

If you are looking for high availability, you should configure at least 2 heartbeat LANs.

If you have only one heartbeat LAN, please also configure the stationary IP network as a heartbeat.

Regards

SAN