
Infiniband bonding

 
Florian Heigl (new acc)
Honored Contributor


Hi all,

It's been quite some time since I last posted here... :(

I'm looking for someone with working experience of RHEL and InfiniBand for some advice. I have a Cisco InfiniBand gateway/switch and a few servers with SDR/DDR HCAs in them.

In the CentOS 5.4 release notes I see that the "new" InfiniBand bonding module now actually comes with load-balancing / multipath support for IPoIB, which means I could really push over 20 Gbit/s into and out of the servers. Unfortunately, the whole thing is as undocumented as it gets.

First things first: right now I just want to build an IB bonding interface to test with, using the stock tools in RHEL, but even for that I find the documentation totally contradictory.

I wonder if one of you can tell me where to find good documentation on the bonding bit, or has another hint.

I'm sure I can go the rest of the way from there :)

Regards,
Flo
yesterday I stood at the edge. Today I'm one step ahead.
Tim Nelson
Honored Contributor
Solution

Re: Infiniband bonding

This is for active/passive (failover) bond mode.
(My Voltaire switches do not support active/active (balanced) mode.)

/etc/modprobe.conf
alias bond0 bonding


/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.10.1.8
NETMASK=255.255.255.0
BROADCAST=10.10.1.255
ONBOOT=YES
BOOTPROTO=none
USERCTL=no
TYPE=Bonding
MTU=65520
BONDING_OPTS=" mode=1 miimon=100 primary=ib2"


/etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
USERCTL=no
ONBOOT=yes
MASTER=bond1
BOOTPROTO=none
SLAVE=yes
TYPE=InfiniBand
HOTPLUG=no
CONNECTED_MODE=yes
MTU=65520


/etc/sysconfig/network-scripts/ifcfg-ib2
DEVICE=ib2
USERCTL=no
ONBOOT=yes
MASTER=bond1
BOOTPROTO=none
SLAVE=yes
TYPE=InfiniBand
PRIMARY=yes
HOTPLUG=no
CONNECTED_MODE=yes
MTU=65520
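
A mismatch between a bond's DEVICE and the MASTER named in its slave files is easy to miss when retyping configs like the ones above. The following is a minimal sketch, not part of the original setup, of a check for that: `parse_ifcfg`, `find_orphaned_slaves` and the `SAMPLE` dictionary are hypothetical names, and the sample text is trimmed down from the files as posted.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines from an ifcfg-style file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace, then surrounding quotes.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def find_orphaned_slaves(configs):
    """Return {slave_device: master} for slaves whose MASTER has no config.

    configs maps a file name to that file's text.
    """
    parsed = [parse_ifcfg(text) for text in configs.values()]
    # Every DEVICE that is not itself a slave is a possible master.
    masters = {
        c["DEVICE"]
        for c in parsed
        if "DEVICE" in c and c.get("SLAVE", "no").lower() != "yes"
    }
    orphans = {}
    for c in parsed:
        if c.get("SLAVE", "no").lower() == "yes":
            master = c.get("MASTER")
            if master not in masters:
                orphans[c.get("DEVICE", "?")] = master
    return orphans


# Trimmed copies of the files as posted (assumption: only these keys matter).
SAMPLE = {
    "ifcfg-bond0": 'DEVICE=bond0\nBONDING_OPTS=" mode=1 miimon=100 primary=ib2"\n',
    "ifcfg-ib0": "DEVICE=ib0\nMASTER=bond1\nSLAVE=yes\n",
    "ifcfg-ib2": "DEVICE=ib2\nMASTER=bond1\nSLAVE=yes\n",
}

if __name__ == "__main__":
    for device, master in sorted(find_orphaned_slaves(SAMPLE).items()):
        print(f"{device}: MASTER={master} has no matching bond config")
```

Run against the real directory, the same functions could be fed the contents of each ifcfg-* file; the sketch deliberately avoids touching the filesystem.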



cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: ib2
Currently Active Slave: ib2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: ib0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:00:48:fe:80

Slave Interface: ib2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:00:48:fe:80
Florian Heigl (new acc)
Honored Contributor

Re: Infiniband bonding

10pts just for CONNECTED_MODE=yes! :)

I'll post my results tomorrow.
Tim Nelson
Honored Contributor

Re: Infiniband bonding

Yep, I had an issue with this, as the OpenIB doc stated to enter this in /etc/infiniband/openib.conf.

This did not work.

After many moons of searching I found in the /etc/sysconfig/ifup-ib script that CONNECTED_MODE is read from the ifcfg-ibX files.
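
Whether CONNECTED_MODE actually took effect can be checked after the interface is up. The following is a rough sketch, assuming the IPoIB driver exposes the current mode in /sys/class/net/<iface>/mode (as "connected" or "datagram") next to the usual mtu file; `check_connected` and its parameters are hypothetical names.

```python
from pathlib import Path

CONNECTED_MAX_MTU = 65520  # MTU used in the ifcfg files in this thread


def read_ipoib_settings(iface, sysfs_root="/sys/class/net"):
    """Return (mode, mtu) for an IPoIB interface as reported by sysfs."""
    base = Path(sysfs_root) / iface
    mode = (base / "mode").read_text().strip()
    mtu = int((base / "mtu").read_text().strip())
    return mode, mtu


def check_connected(iface, wanted_mtu=CONNECTED_MAX_MTU, sysfs_root="/sys/class/net"):
    """Return a list of problems; an empty list means the settings match."""
    try:
        mode, mtu = read_ipoib_settings(iface, sysfs_root)
    except (OSError, ValueError) as exc:
        return [f"{iface}: cannot read settings ({exc})"]
    problems = []
    if mode != "connected":
        problems.append(f"{iface}: mode is {mode!r}, expected 'connected'")
    if mtu != wanted_mtu:
        problems.append(f"{iface}: mtu is {mtu}, expected {wanted_mtu}")
    return problems


if __name__ == "__main__":
    # Interface names are taken from the configs in this thread.
    for name in ("ib0", "ib2"):
        for problem in check_connected(name):
            print(problem)
```

An interface that silently stayed in datagram mode would show up here with both a mode and an MTU complaint, since the large MTU is only accepted in connected mode.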

Florian Heigl (new acc)
Honored Contributor

Re: Infiniband bonding

Wow, this is a whole new area to learn about and find errors in!

After correcting the typo in your config (DEVICE=bond0 in the bond1 config file) I managed to retype it into my own config.

That meant the bond0 Ethernet bond changed, among other things, its mode.

That's all sorted now and the InfiniBand bond1 looks good. I can't ping through it, though.

One thing: does yours also say it is the "Ethernet Channel Bonding Driver"?


[root@waxh0003 ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac)
Primary Slave: ib0
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: ib0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:04:fe:80

Slave Interface: ib1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:05:fe:80
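
Status output like the above can also be checked from a script instead of by eye. This is a minimal sketch under the assumption that the file keeps the "Key: value" layout shown here; `parse_bonding_status`, `slaves_down` and `SAMPLE_STATUS` are hypothetical names, and the sample is a shortened copy of the output above.

```python
def parse_bonding_status(text):
    """Parse /proc/net/bonding/<bond> text into a summary dict."""
    status = {"mode": None, "primary": None, "active": None, "slaves": {}}
    current = None  # name of the slave section being read, if any
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so hardware addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            status["slaves"][current] = {}
        elif current is not None:
            status["slaves"][current][key] = value
        elif key == "Bonding Mode":
            status["mode"] = value
        elif key == "Primary Slave":
            status["primary"] = value
        elif key == "Currently Active Slave":
            status["active"] = value
    return status


def slaves_down(status):
    """Return the names of slaves whose MII Status is not 'up'."""
    return [
        name
        for name, fields in status["slaves"].items()
        if fields.get("MII Status") != "up"
    ]


SAMPLE_STATUS = """\
Bonding Mode: fault-tolerance (active-backup) (fail_over_mac)
Primary Slave: ib0
Currently Active Slave: ib0
MII Status: up

Slave Interface: ib0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:04:fe:80

Slave Interface: ib1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:05:fe:80
"""

if __name__ == "__main__":
    parsed = parse_bonding_status(SAMPLE_STATUS)
    print("active slave:", parsed["active"])
    print("slaves down:", slaves_down(parsed))
```

On a live host the text would come from reading /proc/net/bonding/bond1 rather than from the embedded sample.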


While pinging I can see the counters for ib0 going up (both rx and tx) and ib1, being a good girl, waits for a failover event.

The next thing I'll check is the connected mode setting on the other host; maybe that's all that's to blame. Thanks so much for your help, I'm not sure I'd ever have noticed the ifup-ib file to start with.
Tim Nelson
Honored Contributor

Re: Infiniband bonding

Sorry about any errors...

My configuration has bond0 as an Ethernet bond and bond1 as an InfiniBand bond.

I attempted to change the bond1 references to bond0 in my post to simplify things and not have to explain the Ethernet bond. Sorry about that, I should have just left it.

Florian Heigl (new acc)
Honored Contributor

Re: Infiniband bonding

That's great - our setups seem identical in all but the switch vendor.

Unfortunately, I can't ping though ;))

Do you see anything wrong with my modprobe.conf here?


alias bond1 bonding
alias ib0 ib_ipoib
alias ib1 ib_ipoib

alias bond0 bonding
alias eth0 e1000
alias eth1 e1000e

options bonding max_bonds=4

alias scsi_hostadapter ahci
alias scsi_hostadapter1 usb-storage

options netloop nloopbacks=0
Tim Nelson
Honored Contributor

Re: Infiniband bonding

Is there an IP stack on your bond interface?
Can you ping locally but not a remote server on the fabric? If so, I would guess it is an IB switch routing issue. There are some IB diagnostic utilities that can be installed to help view your fabric and port connections:
infiniband-diags

My modprobe.conf is below, nothing special.

alias bond0 bonding
options bonding max_bonds=2
alias bond1 bonding
primary=ib0
alias eth0 bnx2
alias eth1 bnx2
alias eth2 bnx2
alias eth3 bnx2
alias scsi_hostadapter cciss
alias scsi_hostadapter1 ata_piix
alias scsi_hostadapter2 qla2xxx
alias scsi_hostadapter3 usb-storage
options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=10 ql2xloginretrycount=30