
Rui Amaral_1
New Member

bonding 10g rac failover

Hi all,

Hope someone has a solution:

I have the following:

3 dl380 g3
6 nics per machine
RedHat AS 3 update 2
bcm5700-7.1.9e-1
e1000-5.2.16b-1
bonding-1.0.4o-1
Oracle 10g RAC 10.1.0.2/10.1.0.3
NetApp FAS 270c

I have all NICs bonded successfully, and the system is stable. We are doing fault testing and are trying to mimic a complete loss of the bonds that manage the NFS storage or of the bond that manages the interconnect, so we pull both cables. The OS sees that the NICs have faulted, but the bond remains up and the database either keeps running or hangs. When we plug the cables back in, the database faults and the system reboots.

We want the system to reboot when all the NICs in a bond are gone. I know the documentation says that the bond will still be there when the NICs fail, but does someone have a way to make the bond fail in this scenario?

Thanks.
8 REPLIES
Steven E. Protter
Exalted Contributor

Re: bonding 10g rac failover

This is not an Oracle RAC issue; it's a bonding issue.

Here is the doc I used to bond two Intel NIC cards. Note: If you are not using an Intel card or another card that supports bonding it won't work. You need to check to see if your NIC explicitly supports bonding.

http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/ref-guide/s1-modules-ethernet.html

It may say it's bonded but still not work.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rui Amaral_1
New Member

Re: bonding 10g rac failover

Sorry, but yes, I do know it's not a RAC issue, nor is it the NICs (all NICs support bonding; I checked months ago). NIC failover within the bond works fine, and I have no problem with that at all.

The scenario is this:

Bond0 has 2 nics eth0 and eth1.

Eth0 fails and bond now uses eth1.
Eth1 fails and there is no active nic in the bond. Bond0 does not fail at all which causes the database to hang.

I have to manually bring down the bond which then causes the server to reboot (as expected within the rac environment).

So instead of me manually bringing the bond down, is there a parameter that I can pass to the bonding driver, or some other setting I can use to make the bond fail if all NICs are dead?
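
The state described above (every slave dead while bond0 stays up) can also be read from the bonding driver's own status file rather than from ifconfig. What follows is only a minimal sketch under stated assumptions: the driver exposes a status file in the layout shown in the comment, link monitoring (for example the miimon option) is enabled so the MII status is kept current, and the file lives at a path such as /proc/net/bonding/bond0. Older drivers may use a different location or format, so check what your system actually provides. count_up_slaves is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: count bond slaves whose link the bonding driver reports as up.
# Assumed status-file layout (verify on your own system):
#   Slave Interface: eth0
#   MII Status: up

count_up_slaves() {
    # $1: path to the bonding status file (assumed, not confirmed for this setup)
    awk '
        /^Slave Interface:/        { in_slave = 1; next }
        in_slave && /^MII Status:/ { if ($3 == "up") up++; in_slave = 0 }
        END                        { print up + 0 }
    ' "$1"
}
```

A result of 0 would mean no slave has link, which is the point at which the bond could be taken down by hand or by a script. The bond's own "MII Status" line, which appears before the first slave entry in this layout, is deliberately ignored.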
Johannes Krackowizer_1
Valued Contributor
Solution

Re: bonding 10g rac failover

hi Rui Amaral,

I have no experience with NIC bonding under Linux, but I know a workaround. Try running this script from cron every minute, or at whatever interval you want.

#!/bin/bash

# check whether the first slave is up
ifconfig eth0 2>/dev/null | grep UP >/dev/null 2>&1
returnvalue=$?

if [ $returnvalue -ne 0 ]; then

    # eth0 is down, so check the second slave
    ifconfig eth1 2>/dev/null | grep UP >/dev/null 2>&1
    returnvalue=$?

    if [ $returnvalue -ne 0 ]; then
        # both slaves are down: take the bond down
        ifconfig bond0 down
    fi
fi
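
One thing worth testing before relying on the script above: on many Linux systems the UP flag in ifconfig output reflects the administrative state of the interface, while RUNNING tracks whether the link has carrier. Whether that holds for the drivers in this thread is an assumption to verify by pulling a cable and looking at the output. A small sketch that makes the flag a parameter, so either one can be tried (has_flag and iface_has_flag are hypothetical helper names):

```bash
#!/bin/bash
# Sketch: test for a flag word in ifconfig-style output.

has_flag() {
    # Reads ifconfig output on stdin; $1 is the flag, e.g. UP or RUNNING.
    # -w matches whole words only, -q suppresses output.
    grep -qw -- "$1"
}

iface_has_flag() {
    # $1: interface name, $2: flag to look for
    ifconfig "$1" 2>/dev/null | has_flag "$2"
}
```

For example, "iface_has_flag eth0 RUNNING" could replace the "ifconfig eth0 ... | grep UP" line if RUNNING turns out to be the flag that drops when the cable is pulled.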
"First off, I'd suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it's a great symbolic gesture." (Linus Torvalds)
Rui Amaral_1
New Member

Re: bonding 10g rac failover

Thanks so much. That's what I was looking for!

Here's what I got after reviewing your response:

#!/bin/bash

# This little script checks that the bond and its associated slaves are up.

# Bonding does not let the bond fail even when all slaves in the
# bond are down, so this script brings the bond down when all slaves are down.

# The sleep commands at the start of the script make sure that a
# "service network restart" does not cause the script to drop the bond.

sleep 60
echo "Slept for 1 minute"

sleep 60
echo "Slept for 2 minutes"

sleep 60
echo "Slept for 3 minutes"

ifconfig bond2 2>/dev/null | grep UP >/dev/null 2>/dev/null
valup=$?
echo $valup

# a value of 0 means that the card is up and a value of 1 means that the card is down

if [ $valup = 0 ]; then
    echo 'Start test'

    until [ $valup != 0 ]
    do
        sleep 10
        # this is to check for the first slave in the bond
        ifconfig eth4 2>/dev/null | grep UP >/dev/null 2>/dev/null
        valeth4up=$?
        echo "A value of 0 means the card is up"
        echo "State of eth4"
        echo $valeth4up

        # this is to check for the second slave in the bond
        ifconfig eth5 2>/dev/null | grep UP >/dev/null 2>/dev/null
        valeth5up=$?
        echo "State of eth5"
        echo $valeth5up

        if [ $valeth4up != 0 ]; then
            echo "First step :"
            echo $valeth4up

            if [ $valeth5up != 0 ]; then
                echo "Second step might kill the bond"
                echo $valeth5up
                ifconfig bond2 down
            fi
        fi

        # re-check the bond so the loop ends once it has been brought down
        ifconfig bond2 2>/dev/null | grep UP >/dev/null 2>/dev/null
        valup=$?
    done
fi


Do you think this will do the trick, or am I missing something?
Johannes Krackowizer_1
Valued Contributor

Re: bonding 10g rac failover

hi rui amaral,

I think the script should work, but I have a few small comments on it:

1) Let cron run the script so you don't need to sleep for 180 seconds. Add the following line to /etc/crontab:
*/3 * * * * root /path/scriptname

2) Don't echo anything in the script during normal operation, because any output makes cron generate an e-mail every 3 minutes. Echo only when both NICs fail, so you get an e-mail when the server restarts.

3) Why not do a 'shutdown -r now' when both NICs fail, instead of 'ifconfig bond2 down'?
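
Points 1 and 2 could be combined roughly as follows. This is only a sketch, not a tested replacement for the script posted earlier: it keeps the same "grep for UP" check and the same 'ifconfig ... down' action, drops the sleeps and the loop because cron supplies the interval, and prints a single line only when every slave is down. The IFCONFIG variable and the function names is_up and check_bond are hypothetical, added so the logic can be exercised with a stub command.

```bash
#!/bin/bash
# Sketch of a cron-friendly check: silent unless every slave is down.
# IFCONFIG can be overridden for testing; the default is the real command.
IFCONFIG=${IFCONFIG:-ifconfig}

is_up() {
    # Succeeds when the interface's ifconfig output contains the UP flag.
    "$IFCONFIG" "$1" 2>/dev/null | grep -qw UP
}

check_bond() {
    # $1: bond name; remaining arguments: its slave interfaces
    local bond=$1 slave
    shift
    is_up "$bond" || return 0          # bond already down: nothing to do
    for slave in "$@"; do
        is_up "$slave" && return 0     # at least one slave is up: stay silent
    done
    # The only output of the script, so cron mails only on this event.
    echo "all slaves of $bond are down; bringing $bond down"
    "$IFCONFIG" "$bond" down
}

# Run only when called with a bond and at least one slave,
# e.g. /path/scriptname bond2 eth4 eth5
if [ "$#" -ge 2 ]; then
    check_bond "$@"
fi
```

The crontab line from point 1 would then carry the interface names as arguments, for example "*/3 * * * * root /path/scriptname bond2 eth4 eth5".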

best regards,

johannes
Rui Amaral_1
New Member

Re: bonding 10g rac failover

Ah great. Thanks. All good points that I will implement, except for point 3.

I only do the ifconfig down because I let the Oracle cluster software decide when the node reboots, so that the other nodes can do a quick reconfig on their own before the problem node goes down.
Johannes Krackowizer_1
Valued Contributor

Re: bonding 10g rac failover

hi rui amaral,

It would be nice if you assigned some points to the answers that helped you.

thanks,

johannes
Rui Amaral_1
New Member

Re: bonding 10g rac failover

Oh yes. Sorry. I haven't gotten the hang of this setup yet.