
ServiceGuard on Linux Problem

Kevin Feret_2
Occasional Visitor

ServiceGuard on Linux Problem

Hi,

I have a problem with SG where a node (mysteriously) reboots with no errors in /var/log/messages. The other nodes in the cluster detect a node timeout caused by the reboot and attempt to update the cluster membership, but within 2 minutes every remaining node in the cluster automatically reboots as well, which I can only attribute to some type of SG issue. I'm running Red Hat with kernel 2.4.18-19.7.xsmp and SG for Red Hat A.11.14.02-99 i386. Has anyone encountered this same problem?

Thanks

Kevin
4 REPLIES
Serviceguard for Linux
Honored Contributor

Re: ServiceGuard on Linux Problem

Kevin,
You may want to post a little more info, for example the number of nodes in the cluster, the type of shared storage, whether or not there is a QS (quorum server), and snippets of the systems' logs from around the time of the failure.
The symptom seems to be that the cluster cannot reform after the loss of the first system.
I'm assuming that you don't have support, otherwise you would have logged a support call.
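
For pulling the log snippets around the time of failure, a minimal Python sketch follows. It assumes the classic syslog line format in /var/log/messages (a leading `Mon DD HH:MM:SS` stamp with no year); the function names `parse_syslog_time` and `lines_around` are made up for this illustration and are not part of Serviceguard or any standard tool.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse a leading 'Mon DD HH:MM:SS' stamp; return None if the line has none.

    Classic syslog stamps carry no year, so the caller must supply one.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_around(lines, center, minutes=5):
    """Return the lines stamped within +/- `minutes` of `center`.

    Lines without a parseable stamp are skipped. The year is taken from
    `center`, which is an assumption that breaks across a New Year boundary.
    """
    window = timedelta(minutes=minutes)
    selected = []
    for line in lines:
        stamp = parse_syslog_time(line, center.year)
        if stamp is not None and abs(stamp - center) <= window:
            selected.append(line.rstrip("\n"))
    return selected


# Example use on each node (the date and time are placeholders):
#   with open("/var/log/messages") as f:
#       for entry in lines_around(f, datetime(2003, 3, 3, 14, 22), minutes=10):
#           print(entry)
```

Running this on every node with the same center time gives comparable windows that can be posted side by side.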

Rick
Kevin Feret_2
Occasional Visitor

Re: ServiceGuard on Linux Problem

 

Re: ServiceGuard on Linux Problem

Are we to assume the cluster was up and had formed, or does this happen when starting the cluster?
I would suggest you have either networking issues/config problems, or authorisation issues.
Check your configuration thoroughly.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Kevin Feret_2
Occasional Visitor

Re: ServiceGuard on Linux Problem

Hi,

The SG cluster is formed and runs for about 30 to 40 days before SG has problems. I did find a power plug issue that has been resolved (vibration from the cooling fans was causing the plugs to wiggle out of the socket), but as the log snippets suggest, SG was not able to adjust the cluster membership and take control of the failed node's packages before the rest of the nodes decided to reboot themselves. I'm looking into possible network problems, but everything still points to SG having problems. Does anyone have suggestions for either network monitoring tools on Linux or SG configuration changes, based on the info that I posted in the last messages?

Thanks

Kevin
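
On the network monitoring question, one low-tech option is to log reachability between the nodes yourself and look for outages that line up with the reboots. The sketch below is only an illustration under stated assumptions: it probes an ordinary TCP port (for example SSH on port 22, an assumption about what the peers are running), it does not observe Serviceguard's own heartbeat traffic, and the names `probe`, `collect` and `outages` are hypothetical.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collect(host, port, count, interval=1.0):
    """Probe `count` times, `interval` seconds apart; return (timestamp, ok) pairs."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return samples


def outages(samples, min_duration):
    """Return (first_failure, last_failure) for each run of consecutive failed
    probes that spans at least `min_duration` seconds.

    `samples` must be (timestamp, ok) pairs in time order.
    """
    runs = []
    start = last = None
    for stamp, ok in samples:
        if not ok:
            if start is None:
                start = stamp
            last = stamp
        elif start is not None:
            if last - start >= min_duration:
                runs.append((start, last))
            start = last = None
    # A run still open at the end of the data counts too.
    if start is not None and last - start >= min_duration:
        runs.append((start, last))
    return runs
```

Setting `min_duration` close to the cluster's configured node timeout would show whether the plain network ever drops for long enough to matter; that comparison is a suggestion, not something the logs in this thread confirm.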