
SG - Node reboot upon network cables disconnections

 
SOLVED
Farid Abizeid
Regular Advisor

SG - Node reboot upon network cables disconnections

Hi,

I would like to have your feedback regarding this issue:
The setup is a two-node Serviceguard (SG) cluster on HP-UX 11i v1, using two rp3440 servers and EVA storage.
Each node has one NIC for the heartbeat (HB) and two NICs for data (one active, one standby).

When we remove all three network connections from the standby node, the active node reboots by itself. Furthermore, the cluster does not come back on its own; it has to be started manually.

Your response is highly appreciated

Regards,
Farid
12 REPLIES
David Child_1
Honored Contributor

Re: SG - Node reboot upon network cables disconnections

Farid,

Refer to this document. It covers many scenarios related to this:

UXSGLVKBAN00000010

(http://www2.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000062686681)

David
Farid Abizeid
Regular Advisor

Re: SG - Node reboot upon network cables disconnections

I am sorry, David, I cannot find the link.
Can you please verify it?

Thank you for your fast response

Farid
Farid Abizeid
Regular Advisor

Re: SG - Node reboot upon network cables disconnections

It is OK, I found it. The HB loss seems to be our case; I will check that.
I wonder if there are newer updates on this issue.

Thank you David

Farid
John Poff
Honored Contributor
Solution

Re: SG - Node reboot upon network cables disconnections

Hi,

This is normal behavior. MC/SG will TOC a node (Transfer of Control, a forced halt and reboot) if it cannot communicate with a majority of the other cluster members. It does this to protect data integrity.
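
To make the majority rule concrete, here is a minimal Python sketch. It is only a rough model of the rule, not Serviceguard code; the function name and the three outcome labels are made up for illustration. It assumes that a group holding more than half of the nodes can carry on, a group holding exactly half needs a tie-breaker, and a group holding less than half halts.

```python
def partition_outcome(reachable_nodes: int, cluster_size: int) -> str:
    """Classify a node's situation after the cluster membership changes.

    reachable_nodes counts the node itself plus every peer it can still
    exchange heartbeats with. The labels are illustrative, not SG terms.
    """
    if not 1 <= reachable_nodes <= cluster_size:
        raise ValueError("reachable_nodes must be between 1 and cluster_size")
    if 2 * reachable_nodes > cluster_size:
        return "majority"   # more than half: can re-form the cluster
    if 2 * reachable_nodes == cluster_size:
        return "tie"        # exactly half: a tie-breaker has to decide
    return "minority"       # less than half: the node halts (TOC)


# Two-node cluster with the heartbeat cut: each node sees only itself.
print(partition_outcome(1, 2))
# Three-node cluster with one node isolated: the pair, then the loner.
print(partition_outcome(2, 3))
print(partition_outcome(1, 3))
```

In a two-node cluster a lost heartbeat always lands in the "tie" case, which is why neither node can simply carry on by itself.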

Try this link at the HP Docs site. It is a section in the Managing MC/ServiceGuard manual, which explains what you are seeing:

http://docs.hp.com/en/B3936-90073/ch03s07.html#d0e4517

JP
Farid Abizeid
Regular Advisor

Re: SG - Node reboot upon network cables disconnections

Thank you, guys, for your great support.

Farid
Farid Abizeid
Regular Advisor

Re: SG - Node reboot upon network cables disconnections

Going back to this subject:
when we disconnect all network connections from node1, the TOC occurs on node2,
which makes the system unavailable.

One would expect the TOC to happen on node1,
leaving node2 operational.

Your opinions are highly appreciated.

Regards,
Farid

Kent Ostby
Honored Contributor

Re: SG - Node reboot upon network cables disconnections

Farid --

Here is how it works.

When there is a network disconnect between the nodes, the nodes attempt to "reform" the cluster.

Since the nodes cannot communicate with each other, neither of them can form a cluster containing a majority of the nodes (in a two-node cluster, each node on its own is only 1 of the 2 nodes).

Serviceguard, therefore, uses a tie-breaking system to ensure that both nodes don't try to access the data (and hence corrupt it).

The tie-breaker is known as the lock disk (some systems use a quorum server instead).

Once the nodes realize that they cannot talk to each other, they "race" to the lock disk to try to get it.

In your case, the node with the "failed" network gets to the lock disk first.

When node 2 gets to the lock disk, it sees that it's already owned by node 1.

To ensure that there is no data corruption, node 2 kills itself with the TOC.

This is standard Serviceguard behavior.
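
The race described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Serviceguard is implemented: the LockDisk class, resolve_split function and the fate labels are hypothetical names, and the only rule modelled is "the first node to reach the lock keeps it".

```python
from typing import Dict, List, Optional


class LockDisk:
    """Toy stand-in for the cluster lock: the first claimant keeps it."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def try_acquire(self, node: str) -> bool:
        # The lock is granted only if nobody owns it yet.
        if self.owner is None:
            self.owner = node
        return self.owner == node


def resolve_split(arrival_order: List[str]) -> Dict[str, str]:
    """Return each node's fate when nodes reach the lock in this order."""
    lock = LockDisk()
    fate: Dict[str, str] = {}
    for node in arrival_order:
        fate[node] = "re-forms cluster" if lock.try_acquire(node) else "TOC"
    return fate


# node1 (the one with the cables pulled) happens to reach the disk first.
print(resolve_split(["node1", "node2"]))
```

Note that nothing in this model looks at network state. The lock disk is reached over the storage path, so the node whose network cables were pulled can still win the race, and the node with working networks is then the one that takes the TOC.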

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Farid Abizeid
Regular Advisor

Re: SG - Node reboot upon network cables disconnections

Thank you Kent for your reply
So this is by design. Can you think of a workaround? Or do you believe that losing all network connections at the same time is not a real-life scenario? I would appreciate further comments.

On the other hand, if you disconnect all FC connections from Node1 to the SAN storage system, Node2 still works but the package does not fail over to Node2.
Do you have an explanation for this?

Best regards,
Farid
melvyn burnard
Honored Contributor

Re: SG - Node reboot upon network cables disconnections

>So this is by design, can you think of a workaround ?
Yes, you could use the serial heartbeat for a 2-node cluster, but there are issues with this.

>or you believe that losing all network connections at the same time is not a real life scenario..
Well, this is essentially a Multiple Points of Failure (MPOF) scenario, which Serviceguard is generally not designed to cater for.
The suggested option here would be to configure all LANs as HEARTBEAT_IP.
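
As a rough illustration of why more heartbeat LANs help, here is a small Python sketch. It assumes that a peer counts as reachable while at least one heartbeat-carrying LAN still has link; the function name and the interface names lan0 and lan1 are hypothetical.

```python
from typing import Dict, Set


def heartbeat_reachable(link_up: Dict[str, bool], heartbeat_lans: Set[str]) -> bool:
    """True while at least one heartbeat-carrying LAN still has link."""
    return any(link_up.get(lan, False) for lan in heartbeat_lans)


# Hypothetical interface names: lan0 is the dedicated heartbeat LAN,
# lan1 is the active data LAN.
one_cable_pulled = {"lan0": False, "lan1": True}
all_cables_pulled = {"lan0": False, "lan1": False}

# Heartbeat configured on lan0 only: a single pulled cable loses it.
print(heartbeat_reachable(one_cable_pulled, {"lan0"}))
# Heartbeat configured on both LANs: the same failure is survived.
print(heartbeat_reachable(one_cable_pulled, {"lan0", "lan1"}))
# Every cable pulled: no heartbeat configuration can help.
print(heartbeat_reachable(all_cables_pulled, {"lan0", "lan1"}))
```

The three calls print False, True and False. Extra heartbeat LANs protect against losing a single network, but pulling every cable at once still leads to a cluster re-formation.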


>On the other hand if you disconnect all FC connections to the SAN storage system of Node1, Node2 still works but the package does not failover to node2. Do you have an interpretation for this ?
Yes, the network manager does not monitor FC disk interfaces. To monitor these, you need to look at using EMS monitors.
But again, this is an MPOF....
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Farid Abizeid
Regular Advisor

Re: SG - Node reboot upon network cables disconnections

Thank you Melvyn for your answers
Do you have some official MC/SG failure test scenarios? That would be great.

We will follow your suggestions to use EMS and to configure all networks as HEARTBEAT_IP.

Again thank you very much and best regards,
Farid
melvyn burnard
Honored Contributor

Re: SG - Node reboot upon network cables disconnections

Take a look at pages 320-323 of http://docs.hp.com/en/B3936-90079/B3936-90079.pdf

There are some suggested tests, and also a short discussion regarding using EMS monitoring.
Richard Perez
Valued Contributor

Re: SG - Node reboot upon network cables disconnections

Farid
"or you believe that losing all network connections at the same time is not a real life scenario.. would appreciate further comments."

I would recommend looking at a Quorum Server to decide which node remains up. With a QS you can be sure that the node with IP connectivity will be the one that stays up.
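
A minimal Python sketch of that idea, under the assumption that the quorum server is reached over the IP network and grants the tie-break to the first node that asks. The QuorumServer class, the resolve_with_qs function and the fate labels are hypothetical names for illustration, not the real Quorum Server interface.

```python
from typing import Dict, List, Optional


class QuorumServer:
    """Toy arbitrator: grants the tie-break to the first node that asks."""

    def __init__(self) -> None:
        self.granted_to: Optional[str] = None

    def request(self, node: str) -> bool:
        if self.granted_to is None:
            self.granted_to = node
        return self.granted_to == node


def resolve_with_qs(request_order: List[str],
                    can_reach_qs: Dict[str, bool]) -> Dict[str, str]:
    """Return each node's fate; a node without an IP path cannot ask at all."""
    qs = QuorumServer()
    fate: Dict[str, str] = {}
    for node in request_order:
        if can_reach_qs.get(node, False) and qs.request(node):
            fate[node] = "re-forms cluster"
        else:
            fate[node] = "TOC"
    return fate


# node1 has every network cable pulled, so only node2 can reach the server.
print(resolve_with_qs(["node1", "node2"], {"node1": False, "node2": True}))
```

Unlike a lock disk, the arbitrator here sits on the network side, so an isolated node cannot win even if it tries first. In this model, if neither node can reach the quorum server, both end up with a TOC.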