Operating System - HP-UX
jithu_1
Occasional Contributor

FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

Hello Folks,
I was hoping someone with lots of MC/SG expertise would answer this.

In a two-node cluster, can we have multiple lock PVs? What are the advantages and disadvantages?

A two-node cluster failed, and here is the answer I got from a sysadmin:
"The inability of PROD1 (the server) to obtain the cluster lock disk was due to the fact that the only path to that disk was through an odd-numbered director (adapter) on the EMC Symmetrix. Device c2t0d3 was part of a volume group and protected by PVlinks. The cluster lock device is accessed at the device-file level, below the LVM layer, thus negating any alternate paths for that device."

Is this true? If so, is there any way to avoid this in the future?

thanks in advance
jithu
melvyn burnard
Honored Contributor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

This is correct: the cluster lock disc is accessed via its physical device file, e.g. /dev/dsk/c1t6d0, so it cannot be reached via the alternate link.
You can configure a maximum of two cluster lock discs, but this is generally inadvisable for various reasons, and is normally only done in a Campus Cluster configuration, or when the single cluster lock disc is powered from the same source as one of the nodes.
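
For illustration, the lock disc is named by its raw device file in the cluster ASCII file, once per node, with no place to list an alternate link; the VG name, device files and addresses below are only examples:

  FIRST_CLUSTER_LOCK_VG   /dev/vglock

  NODE_NAME               node1
    NETWORK_INTERFACE     lan0
      HEARTBEAT_IP        192.6.143.10
    FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t6d0

  NODE_NAME               node2
    NETWORK_INTERFACE     lan0
      HEARTBEAT_IP        192.6.143.20
    FIRST_CLUSTER_LOCK_PV /dev/dsk/c2t6d0

There is exactly one FIRST_CLUSTER_LOCK_PV entry per node, which is why PVlinks on the same disk do not help the lock itself.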

A more important question is to discover and rectify why the system could not see the disc via the correct path.

HTH

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Vladislav Demidov
Honored Contributor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

Hello,
This is just a quotation from the MC/SG manual:
A dual lock disk does not provide a redundant cluster lock. In fact, the dual lock is a compound lock. This means that two disks must be available at cluster formation time rather than the one that is needed for a single lock disk. Thus, the only recommended usage of the dual cluster lock is when the single cluster lock cannot be isolated at the time of a failure from exactly one half of the cluster nodes. If one of the dual lock disks fails, ServiceGuard will detect this when it carries out periodic checking, and it will write a message to the syslog file. After the loss of one of the lock disks, the failure of a cluster node could cause the cluster to go down.
Sridhar Bhaskarla
Honored Contributor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

Hi Jithu,

A typical MC/ServiceGuard setup is built entirely from redundant components.

Your SA is precisely correct in his "negating" statement. We specify the lock disk by its device file, and it will be accessed only through that device file. It will not fall back to the alternate link in case of a link failure.

You can configure two lock disks. However, HP strongly recommends a single lock disk wherever possible. If you are planning to configure two locks, they need to be seen on two separate controllers.
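
If you do go for two locks, the second one is declared with the SECOND_CLUSTER_LOCK_VG and SECOND_CLUSTER_LOCK_PV parameters alongside the first, roughly like this (the VG and device names are examples only; each lock should sit behind a different controller):

  FIRST_CLUSTER_LOCK_VG    /dev/vglock1
  SECOND_CLUSTER_LOCK_VG   /dev/vglock2

  NODE_NAME                node1
    # first lock behind controller 1, second lock behind controller 2
    FIRST_CLUSTER_LOCK_PV  /dev/dsk/c1t6d0
    SECOND_CLUSTER_LOCK_PV /dev/dsk/c3t6d0

  NODE_NAME                node2
    # same two locks, as seen from node2
    FIRST_CLUSTER_LOCK_PV  /dev/dsk/c2t6d0
    SECOND_CLUSTER_LOCK_PV /dev/dsk/c4t6d0

Running cmcheckconf -C against the edited ASCII file should flag most mistakes before you apply it with cmapplyconf.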

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
oiram
Regular Advisor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

Hi,

Have you thought about using RAID? That way you can still access the lock disk even if one disk fails.
Using two lock disks is only recommended in some special situations, for example a campus cluster, or nodes with only internal disks (a power cut on the node holding the lock disk would cause a TOC on the other node, because it cannot access the lock disk). If you use two lock disks you can end up in a situation where the heartbeat between the nodes is lost but both nodes are up; each node could then take one lock disk, and both nodes would be working at the same time.

Best regards.
Ila Nigam
Occasional Contributor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

I have a two-node cluster in our environment with two cluster lock disks, each with PVlinks. This basically depends upon your cluster requirements. For more information go to:

http://docs.hp.com/hpux/ha/index.html#ServiceGuard%20OPS%20Edition%20(MC/LockManager).

Did the cluster fail because of the tie-breaker?
jithu_1
Occasional Contributor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

To add more info,
node1 is the primary. node1 had access to the cluster lock disk via a host bus adapter (HBA) connected to an odd-numbered fibre adapter (FA) on the EMC Symmetrix. When that FA on the Symmetrix failed, node1 had excessive I/O errors even though all the disks have PVlinks via even-numbered FAs. The lock disk, however, was behind the odd FA (it had PVlinks, but they were of no use in this case). node1 was then shut down and node2 was brought up; node2 couldn't get access to the lock disk, and hence the cluster failed.

Would two lock disks, one on an even-numbered FA and another on an odd-numbered FA, solve this problem in future?

TIA
jithu
melvyn burnard
Honored Contributor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

The answer is probably yes, but it would not give you a 100% guarantee.
You also run the remote risk of encountering split-brain syndrome in this configuration, so you have to take that into account.
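
If you do try it, first check that the two candidate device files really sit behind different FAs/HBAs, something along these lines (c2t0d3 is the device from your post; the VG name is only an example):

  # hardware path and controller for each disk device file
  ioscan -fnC disk

  # primary path and alternate links LVM knows about for the lock VG
  vgdisplay -v /dev/vglock

  # details of the physical volume used as the lock
  pvdisplay /dev/dsk/c2t0d3
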
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
jithu_1
Occasional Contributor

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

Can anyone explain what split-brain syndrome is, and under what circumstances it happens in a two-node cluster?

TIA
jithu

Re: FIRST_CLUSTER_LOCK_PV and issues with Lock device in MC/SG

Split-brain syndrome is where more than one node in the cluster believes it is running a package, and both attempt to activate the VGs, mount the file systems and start the application. Split-brain syndrome will usually end in corrupt application data and broken file systems.

This is why HP uses cluster lock disks in MC/SG: when all network connections between two nodes fail (either because the network is really down, or because one machine's power supply has failed), a race for the cluster lock disk establishes ownership of the cluster's packages, and causes the losing node to do a TOC to avoid data corruption.

Why do two cluster lock disks increase the chance of split-brain syndrome? A situation could arise where each node wins the race to a different cluster lock, and both think they own the application.

If you really want to avoid the situations brought on by having only one cluster lock (like the problem you suffered), then you should look at implementing an arbitrator node instead of using cluster locks. This requires another physically separated node on separate power supplies from the live nodes, but with connections to the networks used by the other nodes.
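
In outline, the arbitrator is just a third cluster member that runs no packages; with three nodes a cluster lock is not required at all, because a surviving two-of-three majority can re-form the cluster. Very roughly (node names and addresses are only illustrative):

  NODE_NAME               node1
    NETWORK_INTERFACE     lan0
      HEARTBEAT_IP        192.6.143.10

  NODE_NAME               node2
    NETWORK_INTERFACE     lan0
      HEARTBEAT_IP        192.6.143.20

  # arbitrator: in the cluster, on separate power and in a separate location,
  # but not configured to run any packages
  NODE_NAME               arbiter
    NETWORK_INTERFACE     lan0
      HEARTBEAT_IP        192.6.143.30

No FIRST_CLUSTER_LOCK_VG or FIRST_CLUSTER_LOCK_PV entries are needed in this configuration.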

HTH

Duncan

I am an HPE Employee