MCSG LAN fail over issue.

Joe Short
Super Advisor

MCSG LAN fail over issue.

I have an MCSG (v11.16) cluster configured on 2 DL585 servers running 32-bit RHEL AS 4.
I have configured bonding for 2 of the 3 NICs on each server; the third NIC is used for a dedicated heartbeat on a dedicated network.
With the package up and running on the primary server, I tested bonding by unplugging a NIC. The bond worked, and no failover occurred. However, when I unplugged the second NIC in the bond (only 2 NICs are bonded), again no failover occurred. On HP-UX this would have triggered a failover of the package. When I completely disconnected all NICs on the primary server, the alternate server crashed and rebooted, but did not take the package.
Is this normal behavior on Linux, or did I miss something? If so, what might I have missed?
6 REPLIES
Steven E. Protter
Exalted Contributor

Re: MCSG LAN fail over issue.

Shalom Joe,

You slightly misunderstand bonding in Linux. Unplugging one of the two bond cables does not result in a failure. Bonding defaults to active-passive on Linux and you can't force active-active except on Intel NICs.

So: If you unplug both cables, you should trigger a failover.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Joe Short
Super Advisor

Re: MCSG LAN fail over issue.

Steven,
That is my exact issue. Unplugging both cables does not trigger a failover. Based on my experience building clusters on HP-UX, I expected it would, but it did not. I am wondering if perhaps I am missing something, or there is another issue here.
Steven E. Protter
Exalted Contributor

Re: MCSG LAN fail over issue.

Ah!

Both cables and no failover.

What do the serviceguard logs say?

I would tend to think the failover LAN configuration is wrong, the NIC is bad, or the card is plugged into the wrong LAN.

My question now is whether a NIC can be both a failover LAN and a heartbeat LAN. My understanding of SG is that it's one or the other, not both.

Please clarify.

SEP
Joe Short
Super Advisor

Re: MCSG LAN fail over issue.

If you define your NIC as HEARTBEAT_IP, that NIC carries both HB and data. If it is defined as STATIONARY_IP, no HB is passed over it. What I have is 2 servers, each with 3 NICs. 2 NICs are in a bond; the third is on a separate network that runs only between the 2 clustered servers. It is there to pass HB in the event the production network goes dark. If both bonded NICs on a server are unplugged, the cluster should respond by moving the package to the alternate server. In this case, that did not occur. My cluster config file is attached.
Joe Short
Super Advisor

Re: MCSG LAN fail over issue.

And the package log (I have a single package) and system log do not indicate anything out of the ordinary. I am wondering if the NODE_TIMEOUT parameter is set too high.
Joe Short
Super Advisor

Re: MCSG LAN fail over issue.

I had the bonding driver set incorrectly.
In /etc/modprobe.conf there is an options entry for the bond. It should read as follows:

options bond0 miimon=100 mode=1

Mine was incorrect. It read:

options bond0 miimon=100 mode=0

Mode 1 is failover mode (active-backup).
Mode 0 is load balancing mode (balance-rr).
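
For anyone who wants to confirm which mode the bond is actually running in, the bonding driver reports its state in /proc/net/bonding/bond0. Below is a minimal Python sketch that reads that file and summarizes the mode and the link status of each slave. The function names (parse_bonding_status, is_active_backup) are made up for illustration, and the exact wording of the "Bonding Mode" line may differ between driver versions, so treat this as a rough check rather than a definitive tool.

```python
import os


def parse_bonding_status(text):
    """Summarize the contents of /proc/net/bonding/<bond> (passed in as text)."""
    summary = {"mode": None, "bond_mii": None, "slaves": {}}
    current_slave = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line with no "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            summary["mode"] = value
        elif key == "Slave Interface":
            # Every "MII Status" line after this one belongs to this slave.
            current_slave = value
            summary["slaves"][current_slave] = None
        elif key == "MII Status":
            if current_slave is None:
                summary["bond_mii"] = value  # status of the bond as a whole
            else:
                summary["slaves"][current_slave] = value
    return summary


def is_active_backup(summary):
    """True if the reported mode looks like mode 1 (active-backup)."""
    # Assumption: the driver mentions "active-backup" in the mode line.
    return "active-backup" in (summary["mode"] or "")


if __name__ == "__main__":
    path = "/proc/net/bonding/bond0"  # bond name taken from the thread
    if os.path.exists(path):
        with open(path) as handle:
            status = parse_bonding_status(handle.read())
        print("Mode:", status["mode"])
        print("Active-backup:", is_active_backup(status))
        for name, mii in status["slaves"].items():
            print("Slave", name, "MII Status:", mii)
    else:
        print(path, "not found; is the bonding module loaded?")
```

Running this before and after pulling a cable should show the corresponding slave's MII status change, which helps separate a bonding problem from a Serviceguard configuration problem.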