Serviceguard

Load balancing bonding and DEADMAN timeout under SG

SOLVED
Rui Vilao
Regular Advisor

Hi,

We are having serious problems with a Serviceguard (A.11.16.02-0) cluster running on Red Hat Enterprise Linux AS 4.0 (x86_64) on two DL585 G1 boxes.

From time to time a server hangs and, after 10 minutes, is rebooted by the ASR timeout. When it comes back up, the bonded network fails.

The network bonding mode I have configured is “balance-tlb” (mode=5). I read that SG does not support load balancing modes.

1. Is this still true with Serviceguard A.11.16.02 and Red Hat 4?


When the server hangs, it is rebooted by the ASR timeout and not by the DEADMAN timeout.

2. What is the value of the DEADMAN timeout?

By the way I have NODE_TOC_BEHAVIOR="reboot"


Any help/suggestion is highly appreciated.

TIA.

Kind Regards,

Rui Vilao.
"We should never stop learning"_________ rui.vilao@rocketmail.com
2 REPLIES
Serviceguard for Linux
Honored Contributor
Solution

Re: Load balancing bonding and DEADMAN timeout under SG

1. You said 4.0. If this is the base Red Hat 4 release, it is not supported. The certification matrix shows RH4 support starting with Update 1.

See ftp://ftp.compaq.com/pub/products/servers/ha/linux/svcguard-certmatrix.pdf

The deadman driver timeout varies based on the heartbeat timeouts. But if the hang is that "hard", the deadman driver will not run at all. The deadman driver is there to catch the system after it "unhangs", so that it causes no problems. So ASR rebooting the system is to be expected in some cases. The key question from a Serviceguard perspective is: did the packages fail over?

I think the ASR timeout is configurable, so drop it to a lower value if you wish.

On bonding: if I remember correctly, modes 0 and 1 are supported. I believe this is in the docs. Try a search for "mode" in Acrobat. Too late here for me to double-check.
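
To confirm which mode a bond is actually running in, something like the following rough sketch may help. It reads the "Bonding Mode:" line from the text the Linux bonding driver exposes under /proc/net/bonding/ and compares it with modes 0 and 1. Assumptions: the supported set is taken only from my recollection above, not from the docs; the mode description strings are the ones the bonding driver normally prints and may differ between driver versions; the function names and the bond0 default are made up for illustration.

```python
# Assumption: only modes 0 and 1 are treated as supported, per the reply above.
# The keys are the descriptions the bonding driver is expected to print;
# verify them against your own /proc/net/bonding/<bond> output.
SUPPORTED_MODES = {
    "load balancing (round-robin)": 0,      # mode=0, balance-rr
    "fault-tolerance (active-backup)": 1,   # mode=1, active-backup
}


def bonding_mode(status_text):
    """Return the text after 'Bonding Mode:' in a bonding status dump, or None."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Bonding Mode":
            return value.strip()
    return None


def is_supported(status_text):
    """True if the reported bonding mode is one of the assumed supported modes."""
    return bonding_mode(status_text) in SUPPORTED_MODES


def check_bond(path="/proc/net/bonding/bond0"):
    """Read a bond's status file (hypothetical default path) and report its mode."""
    with open(path) as status_file:
        text = status_file.read()
    return bonding_mode(text), is_supported(text)
```

Calling check_bond() on a node returns the mode description and whether it falls in the assumed supported set. A bond configured with mode=5 should report "transmit load balancing" and come back as unsupported under these assumptions.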

Rui Vilao
Regular Advisor

Re: Load balancing bonding and DEADMAN timeout under SG

Thanks!
"We should never stop learning"_________ rui.vilao@rocketmail.com