Operating System - HP-UX

Cluster configuration problem

Jim Mulshine
Frequent Advisor


We have a 2-node cluster installed in two buildings. There is one computer and a rack with disk arrays per building, with everything connected using Fibre Channel, and LVM mirroring is used to mirror data on the disk arrays in building A to those in building B.

Recently we discovered a flaw in our cluster configuration when we lost electric power to the computer and disk arrays in building A. Instead of the cluster node in building B starting up the package that was previously running on node A, node B TOC'ed itself. The problem is that the FIRST_CLUSTER_LOCK_PV is on the disk array in building A, so node B couldn't find it while disk array A was down. (If only node A goes down while disk array A keeps running, then there is no problem; everything works as it should.)

I've read about the possibility of setting up a SECOND_CLUSTER_LOCK_PV, which perhaps could be placed on one of the building B disk arrays. However, it seems to me that this could result in node A running all cluster packages on the disks in building A at the same time that node B runs the same cluster packages on the disks in building B, if we lose all heartbeat connections between the buildings (not likely, but not impossible).
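(For context, a second lock disk is declared per node in the cluster ASCII file alongside the first. The volume group and device file names below are made-up placeholders for illustration only:)

```
# Excerpt from a Serviceguard cluster ASCII file.
# VG and PV names here are hypothetical.
FIRST_CLUSTER_LOCK_VG     /dev/vglock
SECOND_CLUSTER_LOCK_VG    /dev/vglock2

NODE_NAME                 nodeA
  FIRST_CLUSTER_LOCK_PV   /dev/dsk/c1t2d0   # array in building A
  SECOND_CLUSTER_LOCK_PV  /dev/dsk/c5t2d0   # array in building B
```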

Does anyone know of a tried and proven solution to our problem?
2 REPLIES
melvyn burnard
Honored Contributor

Re: Cluster configuration problem

The scenario you mention with the 2 cluster lock discs is known as Split Brain Syndrome.
It is highly unlikely that you would hit it, but the risk is definitely there, especially in the campus cluster configuration you appear to have.
There are a few options you could look at, such as ensuring you have multiple heartbeat connections via completely different routings, or adding a third node in a different building that does not connect to the discs but has full network connectivity to the other two nodes, and then removing the cluster lock disc.
Of course this also carries a low risk of a major failure: e.g. you lose one node, the cluster reforms from a 3-node to a 2-node cluster, which does not need the cluster lock disc, but then you lose the arbitration node, which would result in the third and final node TOC'ing due to having no cluster lock disc. Having said that, you would then have experienced an MPOF (multiple point of failure) and not just a SPOF (single point of failure).
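(In case it helps, reconfiguring to the three-node variant would go roughly like this; node names are made up, so check the details against your Serviceguard manuals:)

```
# Regenerate the cluster configuration with the arbitrator node
# included; with three nodes, no cluster lock disc is required.
cmquerycl -v -C /etc/cmcluster/cluster.ascii -n nodeA -n nodeB -n arbiter

# Remove any remaining CLUSTER_LOCK lines from cluster.ascii,
# then verify and apply the new configuration:
cmcheckconf -v -C /etc/cmcluster/cluster.ascii
cmapplyconf -v -C /etc/cmcluster/cluster.ascii
```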
HTH
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Michael F. Dick
Advisor

Re: Cluster configuration problem

Hi,

Having a cluster lock disk in the same room as one of the cluster nodes is a bad idea (as you experienced). Anyway, your configuration is a campus cluster (not a local cluster). Set up a third room with independent power and put an arbitrator node in that room; with that, you wouldn't need a lock disk at all. This machine doesn't need to be big, just big enough to run the cluster software.

just a thought

Michael
Well, that's all just my $.02