MC Service Guard 11.18 - Single Node Cluster - Lan recovery after dual failure

Michele Albertone · ‎10-28-2011

Hi,

I was testing our configuration of MC/SG 11.18 (on B.11.31) and have a question about the expected behavior of one scenario.

We have a single-node cluster consisting of one rx2800 node, with a subnet supporting the package across two interfaces (lan1 as Primary and lan0 as Standby).

The test was performed with both interfaces connected to the same switch.

Single failure (tested by unplugging one of the cables) worked fine: the IP switched locally as expected, with Primary taking the IP whenever possible and Standby taking over when needed.

My question is about the dual-failure scenario, where we unplugged both cables. The package went down as expected and, this being a single node, stayed down. I was surprised that when I reconnected one of the cables nothing happened: the syslog did not report the interface as available and cmviewcl showed BOTH interfaces as DOWN.

After plugging in the second cable the situation returned to normal.

After some more tests we saw that when both cables are disconnected and only one is connected back, there are two scenarios:

a) If the first cable to be reconnected did not host the IP when the two cables were disconnected, the LAN is not identified as available.

b) If the first cable to be reconnected hosted the IP, the interface is identified as available, BUT only after an IP connection (either a ping or telnet/SSH) is attempted to it.

As I understand it, when both interfaces are up, one is designated as the poller and starts sending packets to the other to verify that it is healthy. Once a driver error is reported for an interface, that interface is marked as bad and the appropriate action is taken, depending on the configuration.

But what happens after both available LAN interfaces are down and then one (only one) is brought up?

In this case the polling won't help since the far end won't respond, but we would still have connectivity available.

Am I missing anything obvious?

Thanks as usual!

Mike

John Bigg · ‎11-09-2011

You are not missing anything. Unfortunately this is a double failure that the product is not designed to protect you from. The product does not handle the situation where all interfaces on a single bridged net in the cluster go down. If there are no pollers left, polling stops and does not start again without external influence. You should find that if you use linkloop to manually induce traffic, the LANs will recover. You will need to send several messages over a few seconds for the increase in statistics to be enough to recover the LAN.

You really need to configure the cluster so that no single point of failure can take down both LAN interfaces at the same time, or accept the risk of this scenario.
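To make the behavior described in this thread easier to follow, here is a toy Python model of it. This is only a sketch under stated assumptions and not Serviceguard's actual algorithm: the `Lan` class, the `monitor_round` function and the `RECOVERY_THRESHOLD` value are all hypothetical names and numbers made up for illustration. The only ideas taken from the thread are that interfaces still marked up act as pollers, that polling stops when no pollers are left, and that a down interface recovers once its inbound statistics increase enough.

```python
from dataclasses import dataclass

# Hypothetical value: inbound frames needed in one round for a down LAN to recover.
RECOVERY_THRESHOLD = 3


@dataclass
class Lan:
    name: str
    connected: bool = True  # cable plugged in
    up: bool = True         # status as the (toy) cluster monitor sees it
    rx: int = 0             # inbound frame counter


def monitor_round(lans, external_frames=None):
    """Run one simplified monitoring round for interfaces on one bridged net."""
    external_frames = external_frames or {}
    before = {lan.name: lan.rx for lan in lans}

    # A link-level (driver) failure marks the interface down.
    for lan in lans:
        if not lan.connected:
            lan.up = False

    # Only interfaces still marked up act as pollers.
    pollers = [lan for lan in lans if lan.up]
    for poller in pollers:
        for target in lans:
            if target is not poller and target.connected:
                target.rx += RECOVERY_THRESHOLD

    # Traffic induced from outside the polling mechanism.
    for lan in lans:
        if lan.connected:
            lan.rx += external_frames.get(lan.name, 0)

    # A down interface recovers only if its inbound counter grew enough.
    for lan in lans:
        if not lan.up and lan.connected:
            if lan.rx - before[lan.name] >= RECOVERY_THRESHOLD:
                lan.up = True
    return {lan.name: lan.up for lan in lans}


lan0, lan1 = Lan("lan0"), Lan("lan1")
lans = [lan0, lan1]

lan0.connected = lan1.connected = False
after_dual_failure = monitor_round(lans)          # both cables pulled

lan1.connected = True
after_one_cable = monitor_round(lans)             # one cable back, no pollers left

after_one_frame = monitor_round(lans, {"lan1": 1})  # a single frame is not enough
after_traffic = monitor_round(lans, {"lan1": 5})    # several frames in one round
```

In this model, reconnecting lan1 alone changes nothing because no interface is left to generate polling traffic; lan1 is only marked up again once enough externally induced frames arrive within one round, which is the role linkloop plays in the reply above.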
