
Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

 

Cluster configuration file:
CLUSTER_NAME cluster1

NODE_NAME holmes
CLUSTER_LOCK_LUN /dev/cciss/c0d1
NETWORK_INTERFACE bond0.7
HEARTBEAT_IP 172.18.7.25
NETWORK_INTERFACE bond0.11
HEARTBEAT_IP 192.168.11.2

NODE_NAME watson
CLUSTER_LOCK_LUN /dev/cciss/c0d1
NETWORK_INTERFACE bond0.7
HEARTBEAT_IP 172.18.7.26
NETWORK_INTERFACE bond0.11
HEARTBEAT_IP 192.168.11.1

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000

AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000

MAX_CONFIGURED_PACKAGES 10


Ifconfig holmes:
bond0.7 Link encap:Ethernet HWaddr 00:11:85:ED:82:21
inet addr:172.18.7.25 Bcast:172.18.7.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:36053 errors:0 dropped:0 overruns:0 frame:0
TX packets:36951 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3807125 (3.6 Mb) TX bytes:3754920 (3.5 Mb)

bond0.11 Link encap:Ethernet HWaddr 00:11:85:ED:82:21
inet addr:192.168.11.2 Bcast:192.168.11.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:37 errors:0 dropped:0 overruns:0 frame:0
TX packets:47 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2613 (2.5 Kb) TX bytes:4447 (4.3 Kb)

Ifconfig Watson:
bond0.7 Link encap:Ethernet HWaddr 00:11:85:EA:CE:CE
inet addr:172.18.7.26 Bcast:172.18.7.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:17917 errors:0 dropped:0 overruns:0 frame:0
TX packets:18581 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2126293 (2.0 Mb) TX bytes:2331196 (2.2 Mb)

bond0.11 Link encap:Ethernet HWaddr 00:11:85:EA:CE:CE
inet addr:192.168.11.1 Bcast:192.168.11.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:47 errors:0 dropped:0 overruns:0 frame:0
TX packets:37 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3831 (3.7 Kb) TX bytes:3085 (3.0 Kb)


The hosts can ping each other, and the bond0.11 addresses are listed in the hosts file as holmes-hb and watson-hb. The hosts can also rsh, rexec and rlogin to each other.


Checking cluster file: /etc/cmcluster/archief.ascii
Note : a NODE_TIMEOUT value of 2000000 was found in line 118. For a
significant portion of installations, a higher setting is more appropriate.
Refer to the comments in the cluster configuration ascii file or ServiceGuard
manual for more information on this parameter.
Checking nodes ... Done
Checking existing configuration ... Done
Warning: Can not find configuration for cluster cluster1
Gathering configuration information ... Done
Gathering configuration information ..
Gathering storage information ..
Found 1 devices on node holmes
Found 1 devices on node watson
Analysis of 2 devices should take approximately 1 seconds
0%----10%----20%----30%----40%----50%----60%----70%----80%----90%----100%
.
Gathering Network Configuration .................. Done

Error: Network interface bond0.11 cannot be found on node holmes
Error: Network interface bond0.11 cannot be found on node watson
Warning: Network interface bond0.11 on node holmes couldn't talk to itself.
Warning: Network interface bond0.11 on node watson couldn't talk to itself.
cmcheckconf : Unable to reconcile configuration file /etc/cmcluster/archief.ascii
with discovered configuration information.
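
For anyone comparing the file against what the nodes actually report, here is a minimal Python sketch that reads the NODE_NAME / NETWORK_INTERFACE / HEARTBEAT_IP lines from a cluster ASCII file like the one above and lists the interfaces with a dotted (VLAN-style) name. The function names and the "name contains a dot" heuristic are assumptions made for illustration; this does not reproduce how cmcheckconf itself discovers interfaces.

```python
SAMPLE = """
NODE_NAME holmes
CLUSTER_LOCK_LUN /dev/cciss/c0d1
NETWORK_INTERFACE bond0.7
HEARTBEAT_IP 172.18.7.25
NETWORK_INTERFACE bond0.11
HEARTBEAT_IP 192.168.11.2

NODE_NAME watson
CLUSTER_LOCK_LUN /dev/cciss/c0d1
NETWORK_INTERFACE bond0.7
HEARTBEAT_IP 172.18.7.26
NETWORK_INTERFACE bond0.11
HEARTBEAT_IP 192.168.11.1
"""


def parse_cluster_ascii(text):
    """Return {node: [(interface, heartbeat_ip_or_None), ...]} (hypothetical helper)."""
    nodes = {}
    current_node = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        parts = line.split()
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1]
        if key == "NODE_NAME":
            current_node = value
            nodes[current_node] = []
        elif key == "NETWORK_INTERFACE" and current_node is not None:
            nodes[current_node].append((value, None))
        elif key == "HEARTBEAT_IP" and current_node is not None and nodes[current_node]:
            # Attach the address to the most recently declared interface.
            iface, _ = nodes[current_node][-1]
            nodes[current_node][-1] = (iface, value)
    return nodes


def vlan_style_interfaces(nodes):
    """Assumption: a '.' in the name (bond0.11) marks a VLAN sub-interface."""
    return [(node, iface)
            for node, entries in nodes.items()
            for iface, _ in entries
            if "." in iface]


nodes = parse_cluster_ascii(SAMPLE)
for node, iface in vlan_style_interfaces(nodes):
    print(node, iface)
```

Run against the file above, it lists all four bond0.7 / bond0.11 entries, which is a quick way to spot every interface that sits on top of the same bond.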
10 REPLIES
Steven E. Protter
Exalted Contributor

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

Is bonding working correctly?

ifconfig bond0

or

ifconfig | more

# look for bond.

Make sure the interfaces are pinging.

To get this to work on my Fedora installation, I had to add a hack line to the network startup script. Every time I upgrade that hack is erased.

SEP
Serviceguard for Linux
Honored Contributor

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

It may be that since bond0.7 and bond0.11 are using the same HW and are on different subnets, SG is getting confused.

From the "managing" manual:
"IP addresses are configured only on each primary network interface card. Multiple IP addresses on the same network card must belong to the same IP subnet."

Is there any reason you are doing this?
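
The rule quoted from the manual can be checked mechanically. Below is a rough Python sketch, using only the standard ipaddress module, that groups interfaces by hardware address and reports any address whose IPs fall into more than one subnet. The data is copied from the ifconfig output for holmes earlier in the thread; the function names are made up for this example, and treating "same HWaddr" as "same network card" is an assumption.

```python
import ipaddress
from collections import defaultdict

# (interface, hardware address, IP, netmask) as shown by ifconfig on holmes
HOLMES = [
    ("bond0.7", "00:11:85:ED:82:21", "172.18.7.25", "255.255.255.0"),
    ("bond0.11", "00:11:85:ED:82:21", "192.168.11.2", "255.255.255.0"),
]


def subnets_per_hwaddr(interfaces):
    """Map each hardware address to the set of IP networks configured on it."""
    by_hw = defaultdict(set)
    for _name, hwaddr, ip, netmask in interfaces:
        network = ipaddress.ip_interface(f"{ip}/{netmask}").network
        by_hw[hwaddr.lower()].add(network)
    return dict(by_hw)


def hwaddrs_spanning_subnets(interfaces):
    """Hardware addresses that carry IPs from more than one subnet."""
    return sorted(hw for hw, nets in subnets_per_hwaddr(interfaces).items()
                  if len(nets) > 1)


print(hwaddrs_spanning_subnets(HOLMES))  # ['00:11:85:ed:82:21']
```

Both VLAN interfaces report the same HWaddr but sit in 172.18.7.0/24 and 192.168.11.0/24, which is exactly the situation the quoted sentence rules out.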

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

It's a little complex, and we were even advised by an on-site HP consultant this would work.

We have 2 network adapters and we want them to be redundant, which is why we use bonding (which gives you one logical network adapter, bond0). Serviceguard needs at least 2 networks (heartbeat and the normal network), so HP advised us to put VLANs on the bonding interface. In theory this should work. In practice it does not.

Our initial plan was to use the ppp0 interfaces (through a null-modem cable on the serial port) but HP advised against this (the machines are in the same rack, so distance is not a problem).

I think to solve this problem, we will revive the PPP plan. That means we will have ppp0 and bond0.

Robert


Serviceguard for Linux
Honored Contributor

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

If you only have 2 network interfaces, then it is acceptable to just bond those 2 into a single bond. It is recommended to have another (separate) connection for heartbeat. This is in case the traffic over the bonded pair were so great as to cause you to lose heartbeat messages. So it is all targeted at HA. With the bond, you will not lose connection if a single NIC fails.

Setting up a VLAN doesn't help. And you still have the problem of a subnet that can't be monitored. Go back to a single bond. If you have the opportunity to add another NIC HBA, then it is best to bond that to one of the built-in NICs (I assume there are two built-in NICs in your servers.)
melvyn burnard
Honored Contributor

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

If you only have two interfaces, then you create only a single bond, bond0.
This gives you the redundancy of network failover: if a network cable or NIC fails, the bond does a local LAN switch.
The problem remains that you now have only one subnet, and hence one heartbeat. For a better configuration, you could add another NIC in each node and use those purely as a heartbeat LAN.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

Finally managed to get the ppp0 interface to work; the answer was here (pins 1, 6 and 4 are to be connected):
http://sunportal.sunmanagers.org/pipermail/summaries/2003-March/003317.html

But now we have hit this:
[root@watson /]# cmcheckconf -v -C /etc/cmcluster/archief.ascii

Checking cluster file: /etc/cmcluster/archief.ascii
Checking nodes ... Aborted
[root@watson /]# echo $?
134
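
A note on the exit status: shells report 128 + N when a process is killed by signal N, so 134 points at signal 6. The small Python sketch below decodes that; on Linux signal 6 is SIGABRT, which matches the "Aborted" message and suggests the command crashed rather than rejected the file. The helper name is made up for this example, and the mapping assumes a Linux host.

```python
import signal


def signal_from_exit_status(status):
    """Return the signal implied by a shell exit status above 128, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128)
    except ValueError:
        # Not a signal number known on this platform.
        return None


sig = signal_from_exit_status(134)
print(sig.name if sig is not None else "no signal")
```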




Configuration:
watson-hb and holmes-hb can reach each other. Ping, rlogin and other services work.
watson and holmes can reach each other. Ping, rlogin and other services work.


CLUSTER_NAME cluster1

NODE_NAME holmes
NETWORK_INTERFACE bond0
HEARTBEAT_IP 172.18.7.25
NETWORK_INTERFACE ppp0
HEARTBEAT_IP 192.168.12.1
CLUSTER_LOCK_LUN /dev/cciss/c1d0

NODE_NAME watson
NETWORK_INTERFACE bond0
HEARTBEAT_IP 172.18.7.26
NETWORK_INTERFACE ppp0
HEARTBEAT_IP 192.168.12.2
CLUSTER_LOCK_LUN /dev/cciss/c1d0

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 20000000

AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000

MAX_CONFIGURED_PACKAGES 10
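
Note that NODE_TIMEOUT differs between the two files posted in this thread (2000000 in the first, 20000000 here). A quick Python sketch of what those numbers mean, assuming the timing parameters are expressed in microseconds as the comments in the cluster ASCII template describe; the helper name is hypothetical.

```python
US_PER_SECOND = 1_000_000  # assumption: timing values are in microseconds


def summarize_timing(heartbeat_interval_us, node_timeout_us):
    """Convert the raw values to seconds and count heartbeat intervals per timeout."""
    return {
        "heartbeat_s": heartbeat_interval_us / US_PER_SECOND,
        "timeout_s": node_timeout_us / US_PER_SECOND,
        "intervals_per_timeout": node_timeout_us // heartbeat_interval_us,
    }


first_file = summarize_timing(1_000_000, 2_000_000)
second_file = summarize_timing(1_000_000, 20_000_000)
print(first_file)
print(second_file)
```

Under that assumption the first file allows 2 heartbeat intervals before a node times out, and this one allows 20.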


On holmes and watson, this is in the /etc/hosts file:
127.0.0.1 localhost.localdomain localhost
192.168.12.2 watson-hb.ensdu.nl watson-hb
192.168.12.1 holmes-hb.ensdu.nl holmes-hb
172.18.7.25 holmes.ensdu.nl holmes
172.18.7.26 watson.ensdu.nl watson

[root@watson /]# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:11:85:EA:CE:CE
inet addr:172.18.7.26 Bcast:172.18.7.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:4233 errors:0 dropped:0 overruns:0 frame:0
TX packets:4104 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2251020 (2.1 Mb) TX bytes:408420 (398.8 Kb)

[root@watson /]# ifconfig ppp0
ppp0 Link encap:Point-to-Point Protocol
inet addr:192.168.12.2 P-t-P:192.168.12.1 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:2742 errors:1 dropped:0 overruns:0 frame:0
TX packets:1975 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:3
RX bytes:145633 (142.2 Kb) TX bytes:219719 (214.5 Kb)

[root@holmes root]# ifconfig bond0
bond0 Link encap:Ethernet HWaddr 00:11:85:ED:82:21
inet addr:172.18.7.25 Bcast:172.18.7.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:9487 errors:0 dropped:0 overruns:0 frame:0
TX packets:10453 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2822777 (2.6 Mb) TX bytes:1338871 (1.2 Mb)

[root@holmes root]# ifconfig ppp0
ppp0 Link encap:Point-to-Point Protocol
inet addr:192.168.12.1 P-t-P:192.168.12.2 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:1976 errors:0 dropped:0 overruns:0 frame:0
TX packets:2744 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:3
RX bytes:220238 (215.0 Kb) TX bytes:145737 (142.3 Kb)
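
One visible difference between the two interfaces is the netmask: ppp0 is a point-to-point link with mask 255.255.255.255, so neither end has a subnet that contains its peer, while the bond0 addresses share 172.18.7.0/24. The Python sketch below illustrates this with the addresses shown above. Whether this is what makes cmcheckconf abort is not established here; the function name is made up for the example.

```python
import ipaddress


def shares_subnet(local_ip, netmask, peer_ip):
    """True if peer_ip lies inside the network implied by local_ip and netmask."""
    network = ipaddress.ip_interface(f"{local_ip}/{netmask}").network
    return ipaddress.ip_address(peer_ip) in network


# ppp0 on holmes versus its peer on watson
print(shares_subnet("192.168.12.1", "255.255.255.255", "192.168.12.2"))  # False
# bond0 on holmes versus bond0 on watson
print(shares_subnet("172.18.7.25", "255.255.255.0", "172.18.7.26"))      # True
```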

modules.conf on both nodes:
alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter cciss
alias usb-controller usb-uhci
alias usb-controller1 ehci-hcd
alias md-personality-7 multipath
alias bond0 bonding
options bonding miimon=100 mode=1

[root@watson bond0]# cat /proc/net/bond0/info
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Multicast Mode: all slaves

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 2

[root@holmes root]# cat /proc/net/bond0/info
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Multicast Mode: all slaves

Slave Interface: eth1
MII Status: up
Link Failure Count: 1

Slave Interface: eth0
MII Status: up
Link Failure Count: 1


The link failure counts come from reconfiguring the switches while the servers were up.
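
To compare the bonding state of both nodes without reading the dumps by eye, a small Python sketch like the following can pull out the active slave and the per-slave counters. It assumes the "Key: value" layout shown in the /proc/net/bond0/info output above; the function name and the returned dictionary layout are made up for this example.

```python
WATSON_INFO = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Multicast Mode: all slaves

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 2
"""


def parse_bond_info(text):
    """Extract the active slave plus MII status and failure count per slave."""
    info = {"active_slave": None, "slaves": {}}
    current = None  # slave section currently being read
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Currently Active Slave":
            info["active_slave"] = value
        elif key == "Slave Interface":
            current = value
            info["slaves"][current] = {}
        elif current is not None and key == "MII Status":
            info["slaves"][current]["mii_status"] = value
        elif current is not None and key == "Link Failure Count":
            info["slaves"][current]["link_failures"] = int(value)
    return info


bond = parse_bond_info(WATSON_INFO)
print(bond["active_slave"], bond["slaves"])
```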

melvyn burnard
Honored Contributor

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

What does cmquerycl -v -C test.ascii -n holmes -n watson reveal when you examine the test.ascii file and look at the node network information?

To my knowledge we do not support using PPP for Serviceguard.

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

Melvyn,

In the meantime I've opened a case with HP Support Netherlands. It seems that neither PPP nor VLAN over bonding is supported by Serviceguard on Linux >:-( The funny thing is, we had an RHCE HP consultant who said PPP was possible but not advisable, and who also told us that VLANs over bonding should be possible. Let's just say we are not too happy with HP consultancy at the moment (and that's putting it mildly).

We have since made it work by adding another ethernet adapter. We were trying to save the PCI slot associated with this for future expansion.

Robert
melvyn burnard
Honored Contributor
Solution

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

Oh dear. Yes, it seems you were given bad advice in this instance.
I certainly would have said that this could not be done when planning a Serviceguard on Linux cluster.
All I can do is give a small apology on behalf of HP.

Re: Linux RHES3U4 and SG 11.15 don't seem to accept 2 bonding interfaces

Thanks Melvyn,

We now have the following output, using bond0 and eth2:

[root@watson cmcluster]# cmcheckconf -C archief.ascii

Begin cluster verification...
Adding node holmes to cluster cluster1.
Adding node watson to cluster cluster1.

Verification completed with no errors found.
Use the cmapplyconf command to apply the configuration.


Thanks to everyone here who has given patient advice!

Kind Regards,

Robert Campbell