Operating System - HP-UX

Benoît
Regular Advisor

Server TOC with lan connection lost

Hi,

Here is our configuration:
NODE_NAME smcpr11
NETWORK_INTERFACE lan11
HEARTBEAT_IP 10.100.219.1
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.113.1.1
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan9
HEARTBEAT_IP 10.100.201.1
NETWORK_INTERFACE lan10

FIRST_CLUSTER_LOCK_PV /dev/dsk/c14t0d6
SECOND_CLUSTER_LOCK_PV /dev/dsk/c25t8d6

NODE_NAME smcpr21
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.113.1.2
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan9
HEARTBEAT_IP 10.100.201.2
NETWORK_INTERFACE lan10
NETWORK_INTERFACE lan11
HEARTBEAT_IP 10.100.219.2

lan11 is a direct connection between the nodes over Gigabit Ethernet fiber.

When we cut lan9 and lan0 at the same time, server smcpr11 (node1) TOCs.
The cluster went down.
Then the package restarted on node2 (smcpr21).

Where should we look?
lan11 should have kept the nodes connected, shouldn't it?
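
To make that expectation concrete, here is a minimal Python sketch that reads the excerpt above and lists which heartbeat interfaces remain on both nodes once lan0 and lan9 are cut. It only inspects the configuration as written; it is not how Serviceguard itself decides anything. The function names are hypothetical, and it assumes that an interface with the same name on both nodes sits on the same network.

```python
from collections import defaultdict

# Cluster ASCII excerpt copied from this post (cluster lock PV lines omitted).
CLUSTER_ASCII = """\
NODE_NAME smcpr11
NETWORK_INTERFACE lan11
HEARTBEAT_IP 10.100.219.1
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.113.1.1
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan9
HEARTBEAT_IP 10.100.201.1
NETWORK_INTERFACE lan10
NODE_NAME smcpr21
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.113.1.2
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan9
HEARTBEAT_IP 10.100.201.2
NETWORK_INTERFACE lan10
NETWORK_INTERFACE lan11
HEARTBEAT_IP 10.100.219.2
"""


def heartbeat_interfaces(text):
    """Map node name -> {interface: heartbeat IP}.

    Interfaces without a HEARTBEAT_IP line (lan1, lan10 here) are skipped.
    """
    nodes = defaultdict(dict)
    node = iface = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or unexpected lines
        key, value = parts
        if key == "NODE_NAME":
            node, iface = value, None
        elif key == "NETWORK_INTERFACE":
            iface = value
        elif key == "HEARTBEAT_IP" and node and iface:
            nodes[node][iface] = value
    return dict(nodes)


def surviving_heartbeats(nodes, cut):
    """Heartbeat interfaces still configured on every node after removing `cut`.

    Assumption: matching interface names on the nodes share a network.
    """
    remaining = [set(ifaces) - set(cut) for ifaces in nodes.values()]
    return sorted(set.intersection(*remaining)) if remaining else []


nodes = heartbeat_interfaces(CLUSTER_ASCII)
print(surviving_heartbeats(nodes, cut={"lan0", "lan9"}))  # prints ['lan11']
```

So on paper lan11 is still a configured heartbeat path on both nodes, which is why the TOC surprised us.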

Thanks for your advice.

Ben
3 REPLIES
melvyn burnard
Honored Contributor

Re: Server TOC with lan connection lost

1) check the syslogs on BOTH nodes for the time of the failure

2) check the package log on the node that failed

3) verify the network information is indeed as you expect by running cmscancl and then checking the linkloops and what the binary says about heartbeat LANs, backups, etc.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Mel Burslan
Honored Contributor

Re: Server TOC with lan connection lost

Ben,

This won't fully explain why your cluster failed, but it looks like one of the two LAN connections, lan0 or lan9, was being used as the primary heartbeat LAN. When you severed the heartbeat, your primary node, to prevent split-brain syndrome, decided to TOC itself for the sake of application availability, letting the app move over to the secondary node.

Serviceguard really does not like having LAN connections yanked, even just to test resiliency. I have seen this happen many, many times: the LAN goes down and the primary node TOCs. Nothing special about your case.

Also, I have read here on the forums that lan0 on the newer family of servers, I believe especially on Itanium servers, is not a LAN supported under Serviceguard. As you did not indicate your server or OS type, I thought this might be relevant to your case.

HTH.
________________________________
UNIX because I majored in cryptology...
Benoît
Regular Advisor

Re: Server TOC with lan connection lost

Hi,

Sorry for answering late.

The problem has been solved after changing some package dependencies on networks.

Thanks for your help.