Operating System - HP-UX

Campus serviceguard config and lock disk question

Richard Pereira_1
Regular Advisor


Hi,

I have a 4-node SG (Serviceguard) cluster built as follows:

- 2 remote sites (sites A and B).
- 4 servers, 2 at each site (1 and 2 at A, 3 and 4 at B).
- Each server runs a package and fails over to a sister server at the other site (1 fails over to 3, and 2 fails over to 4).
- We are using dual cluster lock disks. Each disk is in its own VG and is visible to all 4 nodes.

   A <--------> B
   1 <--------> 3
   2 <--------> 4
vglock1      vglock2

I have a nagging question about this dual lock disk setup. The Managing Serviceguard guide (B3936-90065.pdf), page 58, states:

"If one of the dual lock disks fails, ServiceGuard will detect this when it
carries out periodic checking, and it will write a message to the syslog
file. After the loss of one of the lock disks, the failure of a cluster node
could cause the cluster to go down."

Does this mean that if data center B were to crash completely (instant loss of power, servers, telecom, and disks), the nodes at data center A would panic and then attempt to re-form the cluster instead of staying online?
John Bigg
Esteemed Contributor

Re: Campus serviceguard config and lock disk question

A dual cluster lock is a compound lock, which means that under normal circumstances you have to obtain both locks. This is why, in a normal (non-campus) cluster, adding a second cluster lock does not give extra redundancy. Instead it makes the cluster less available, because the failure of either lock disk would prevent the cluster from obtaining the lock, risking an entire cluster failure. In other words, it doubles the risk of a cluster lock failure.
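To put rough numbers on "doubling the risk": assume, purely for illustration, that each lock disk is unavailable independently with some small probability p (not a figure from the guide). A compound lock that needs both disks is then unavailable with probability

```latex
P_{\text{dual}} = 1 - (1 - p)^2 = 2p - p^2 \approx 2p \qquad (p \ll 1)
```

For example, with p = 0.01 this gives 1 - 0.99^2 = 0.0199, roughly twice the single-lock risk.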

However, there is a difference between failing to get a lock because an error is detected, and not being able to contact the cluster lock disk at all, which is what happens when you have a site failure.

In the situation you describe, if the cluster lock at site B fails and you then lose the entire site B, the cluster nodes at site A would attempt to re-form and, after obtaining the cluster lock at site A, would continue running. This is because the request to obtain cluster lock B would time out rather than fail, and since the lock at A was obtained, the cluster would form.

This contrasts with the situation where the cluster lock at site B fails and then nodes 3 and 4 at site B fail at the same time, but not the whole site; i.e., cluster lock B is still reachable but has failed, so it returns an I/O error rather than a timeout. In this situation nodes 1 and 2 at site A would also fail, since they could reach the cluster lock at site B but could not obtain it, even if they had obtained the lock at site A.
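The two scenarios above can be sketched as a small decision model. This is a simplified illustration of the behavior as described in this reply, not Serviceguard's actual implementation: the names `LockResult` and `can_form_cluster` are hypothetical, and the rule that at least one lock must actually be obtained is an assumption drawn from "since the lock at A was obtained".

```python
from enum import Enum


class LockResult(Enum):
    """Hypothetical outcomes of one attempt to obtain a cluster lock disk."""
    OBTAINED = "obtained"  # lock disk reachable and the lock was acquired
    IO_ERROR = "io_error"  # lock disk reachable but failed, so it returns an error
    TIMEOUT = "timeout"    # lock disk unreachable (e.g. its whole site is down)


def can_form_cluster(results: list[LockResult]) -> bool:
    """Simplified model of the dual-lock rule described in this thread.

    - A lock that is reachable but returns an I/O error makes the attempt fail.
    - A lock that merely times out (unreachable) is tolerated.
    - At least one lock must actually be obtained (assumption).
    """
    if any(r is LockResult.IO_ERROR for r in results):
        return False
    return any(r is LockResult.OBTAINED for r in results)


# Scenario 1: lock B has failed and all of site B is lost,
# so the request for lock B times out instead of returning an error.
site_b_lost = [LockResult.OBTAINED, LockResult.TIMEOUT]

# Scenario 2: lock B has failed but is still reachable (nodes 3 and 4
# fail, not the whole site), so lock B returns an I/O error.
lock_b_reachable_but_failed = [LockResult.OBTAINED, LockResult.IO_ERROR]

print(can_form_cluster(site_b_lost))                  # True: site A keeps running
print(can_form_cluster(lock_b_reachable_but_failed))  # False: site A nodes fail too
```

Under normal circumstances both entries are `OBTAINED`, which also returns `True`. The model deliberately covers only the cases discussed in this thread.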

In short, you are safe with your configuration and are using dual cluster locks correctly, unlike the many who think that adding a second lock is the right way to protect against cluster lock failures in a non-campus cluster environment.
Richard Pereira_1
Regular Advisor

Re: Campus serviceguard config and lock disk question

Thanks, closing thread