Servers: The Right Compute

High availability HP-UX and Linux disaster recovery solutions that circumvent costly downtime


Guest blog written by Ravichandra Krishnamurthy, HP Serviceguard Technical Architect


Despite designs with redundant network paths and switches, high availability clusters can still experience network partitions. When network failures happen in a specific sequence, or only in specific segments, the cluster is partitioned in various ways. When such partitions happen, clusters typically use arbitration mechanisms to determine which partition survives and which is evicted from the membership. After the arbitrator determines the partition to be evicted, I/O fencing mechanisms such as SCSI-3 Persistent Reservation, in combination with node fencing mechanisms, ensure that the evicted members of the cluster reboot and generate no I/O until they rejoin the cluster.


HP Serviceguard clusters for HP-UX and Linux can use several mechanisms for quorum arbitration: a quorum server process, a dedicated shared LUN, or a volume group per cluster. Each is discussed below.


Various arbitration mechanisms


The quorum server is a process that runs on a separate node, outside of the Serviceguard cluster, and arbitrates in the event of a network partition. It is very simple to set up and can be shared by up to 50 clusters for the purpose of arbitration. However, a cluster node may be unable to communicate with the quorum server because of partitions in the network. To overcome this, HP Serviceguard supports multiple paths from each cluster node to the quorum server to provide redundancy: if one path fails, the node uses the other path to communicate with the quorum server.
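The path-failover behavior described above can be sketched as follows. This is an illustrative sketch only: the address list, port number, and injectable `connect` function are assumptions for the example, not the actual Serviceguard implementation or wire protocol.

```python
import socket

# Hypothetical primary and alternate network paths to the quorum server.
QUORUM_PATHS = [("10.0.1.50", 1238), ("10.0.2.50", 1238)]

def reach_quorum_server(paths, connect=socket.create_connection, timeout=2.0):
    """Try each configured path in order; return the first live connection."""
    for addr in paths:
        try:
            return connect(addr, timeout)
        except OSError:
            continue  # this path is down or partitioned; try the next one
    raise ConnectionError("quorum server unreachable on all configured paths")
```

The key design point is that arbitration only degrades to "unreachable" when every configured path has failed, rather than on the first network fault.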


The lock LUN arbitration mechanism uses a small, dedicated, shared LUN on which a fast mutex structure is laid out. The coordinator nodes of each partition try to acquire the mutex on the disk; the partition that obtains it survives, while the other is evicted from the cluster membership.
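The race for the on-disk mutex can be modeled as a compare-and-swap on a single ownership field. The class and field names below are illustrative assumptions; a `threading.Lock` stands in for the atomicity that the real mechanism gets from the disk.

```python
import threading

class LockLUN:
    """Simulates the fast mutex laid out on the dedicated shared LUN."""
    def __init__(self):
        self._atomic = threading.Lock()  # stands in for the disk's atomicity
        self.owner = None                # None means the mutex is free

    def try_acquire(self, partition_id):
        """Coordinator of each partition races to claim the mutex."""
        with self._atomic:
            if self.owner is None:
                self.owner = partition_id  # this partition survives
                return True
            return False                   # already claimed: partition is evicted

# Coordinators of the two partitions race; exactly one wins.
lun = LockLUN()
survives_a = lun.try_acquire("partition-A")
survives_b = lun.try_acquire("partition-B")
```

Whatever the ordering of the race, the invariant is that exactly one partition acquires the mutex, which is what makes the mechanism a safe tiebreaker.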


Another arbitrator on Serviceguard (HP-UX) is the volume group lock, or vglock, where the same fast mutex is laid out in the metadata of the volume group; the coordinator members of each partition try to acquire the lock in order to survive the partition.


Disk-based arbitration mechanisms like lock LUN and vglock are convenient and useful when the cluster has four or fewer nodes and all the nodes are co-located. A quorum server is preferable either when the number of cluster nodes exceeds four or when the nodes are spread across more than one location, i.e., clusters stretched across cities or across two metropolitan areas.
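The guidance above reduces to a simple rule of thumb, expressed here as a tiny helper. This is an illustrative sketch of the recommendation, not an official sizing policy.

```python
def choose_arbitration(node_count, site_count):
    """Pick disk-based arbitration for small, co-located clusters;
    otherwise prefer a quorum server."""
    if node_count <= 4 and site_count == 1:
        return "lock LUN / vglock"
    return "quorum server"
```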


Handling network partitions and fencing


When a Serviceguard cluster experiences a 50-50 network partition (i.e., the cluster is partitioned into exactly two sets of nodes that can communicate within themselves but not with each other), the cluster asks the configured arbitration mechanism to select one of these partitions to continue in the membership of that cluster. Once the arbitrator selects a partition, the nodes of the other partition time out and evict themselves from the cluster. When the two sides of the network partition have unequal numbers of nodes, the Serviceguard cluster chooses the partition with the majority of nodes without employing the services of an arbitrator, only informing it about the new membership. HP Serviceguard clusters should be configured with redundant network paths for heartbeat and data so that a single failure does not cause cluster reformations or workload failovers.
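The decision rule described above can be sketched as a small function: the majority partition survives outright, and only an exact 50-50 tie consults the arbitrator. The function and parameter names are illustrative; `consult_arbitrator` stands in for the quorum server, lock LUN, or vglock.

```python
def surviving_partition(part_a, part_b, consult_arbitrator):
    """Return the set of nodes that keeps cluster membership."""
    if len(part_a) > len(part_b):
        return part_a  # majority wins without arbitration
    if len(part_b) > len(part_a):
        return part_b
    # Exact 50-50 split: defer to the configured arbitration mechanism.
    return consult_arbitrator(part_a, part_b)
```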


During a cluster reconfiguration, HP Serviceguard uses a deadman kernel module (Linux) or kernel driver (HP-UX) as its node fencing mechanism to ensure that nodes evicted from the membership of a cluster reboot within a guaranteed duration. This is the first of two guarantees essential to prevent data corruption. The second is the prevention of ghost I/O. HP Serviceguard ensures this by using SCSI-3 Persistent Reservation on all the data storage configured as part of the workloads in the cluster. The registration of the evicted node is revoked from the shared cluster storage so that any delayed I/O from that node, arriving after the cluster reconfiguration phase is completed, is rejected by the storage.
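The ghost-I/O guarantee can be illustrated with a toy model of registration-based fencing: storage tracks a set of registered node keys and rejects writes from any key whose registration was revoked. The class and method names are assumptions for the example; real SCSI-3 Persistent Reservation semantics (registration, reservation, and preemption) live in the storage target, not in host code like this.

```python
class SharedStorage:
    """Toy model of SCSI-3 PR-style fencing on a shared LUN."""
    def __init__(self, registered_keys):
        self.registered = set(registered_keys)

    def preempt(self, evicted_key):
        """Revoke the evicted node's registration during reconfiguration."""
        self.registered.discard(evicted_key)

    def write(self, node_key, data):
        """Accept I/O only from currently registered nodes."""
        if node_key not in self.registered:
            raise PermissionError(f"ghost I/O from {node_key} rejected")
        return len(data)  # write accepted
```

The point of the model: even if the evicted node's delayed writes arrive after reconfiguration, the storage itself refuses them, so correctness does not depend on the evicted node behaving well.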


Arbitration and fencing mechanisms ensure that clusters handle network partitions while preventing data corruption and ghost I/O. Together, these mechanisms contribute to the robustness and stability of HP Serviceguard clusters on HP-UX and Linux.


HP Serviceguard for HP-UX and Linux provides a comprehensive set of solutions ensuring high availability and disaster recovery for the mission-critical needs of enterprise workloads. Stay tuned for more about the unique capabilities of this solution.




Are you interested in staying in the loop on HP Server topics and trends?
We have several topic-specific Twitter accounts you may be interested in, such as:

Blades, HP-UX, Integrity, NonStop, and ProLiant


We also invite you to join the HP Servers LinkedIn Group and our Facebook page to stay up to date on the latest from HP Servers.

