Operating System - HP-UX

Service Guard Behaviour on reboot.

 
SOLVED
Ian Killer_1
Regular Advisor

Hi experts...

We had a situation last week where our junior admins were paged about a problem on a member node of a cluster. Attempts to connect to the machine were failing because the ssh daemon reported "unable to fork". One of them happened to have a session open, and the consensus was to reboot the box. The command issued was simply "reboot". I would have expected the cluster to fail the packages on that node over to the remaining members once the heartbeat timeout expired. This didn't happen. Can you tell me why?

What is the MC/SG response to a reboot of a member node?
Wherever the gypsies roam.
Ian Killer_1
Regular Advisor

Re: Service Guard Behaviour on reboot.

Here are the pertinent syslog entries....
This is a 2-node cluster (rysap108 and rysap109). The syslog is from rysap108; rysap109 was the node that got rebooted.
______________________________________________
Sep 5 10:30:16 rysap108 cmcld: Communication with node rysap109 has been interrupted
Sep 5 10:30:16 rysap108 cmcld: Node rysap109 may have died
Sep 5 10:30:16 rysap108 cmcld: Attempting to adjust cluster membership
Sep 5 10:30:19 rysap108 cmcld: Obtaining First Dual Cluster Lock
Sep 5 10:30:26 rysap108 cmcld: Obtaining Second Dual Cluster Lock
Sep 5 10:30:27 rysap108 cmcld: Turning off safety time protection since the cluster
Sep 5 10:30:27 rysap108 cmcld: may now consist of a single node. If ServiceGuard
Sep 5 10:30:27 rysap108 cmcld: fails, this node will not automatically halt
Sep 5 10:30:49 rysap108 cmcld: 1 nodes have formed a new cluster, sequence #22
Sep 5 10:30:49 rysap108 cmcld: The new active cluster membership is: rysap108(id=1)
Sep 5 10:51:46 rysap108 cmcld: New node rysap109 is joining the cluster
Sep 5 10:51:46 rysap108 cmcld: Attempting to adjust cluster membership
Sep 5 10:51:48 rysap108 cmcld: Enabling safety time protection
Sep 5 10:51:48 rysap108 cmcld: Clearing First Dual Cluster Lock
Sep 5 10:51:49 rysap108 cmcld: Clearing Second Dual Cluster Lock
Sep 5 10:51:49 rysap108 cmcld: 2 nodes have formed a new cluster, sequence #23
Sep 5 10:51:49 rysap108 cmcld: The new active cluster membership is: rysap108(id=1), rysap109(id=2)
Sep 5 10:52:45 rysap108 cmcld: Request from node rysap109 to start package ryprtsv9 on node rysap109.
Sep 5 10:52:52 rysap108 cmcld: (rysap109) Started package ryprtsv9 on node rysap109.
Sep 5 10:55:45 rysap108 cmcld: Enabled switching for package ryprtsv9.
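To make the gap in this log easier to see, here is a short Python sketch that pulls the cmcld messages out of syslog lines and checks whether any package was started while the cluster had shrunk to a single node. It assumes the message formats exactly as they appear in the excerpt above; the function names are hypothetical, and lines from other daemons are simply skipped.

```python
import re

# One cmcld syslog line, e.g. "Sep 5 10:30:49 rysap108 cmcld: <message>"
LINE_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+cmcld:\s+(.*)$"
)


def parse_cmcld(lines):
    """Return (timestamp, host, message) tuples for cmcld lines; skip everything else."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            month, day, clock, host, msg = m.groups()
            events.append((f"{month} {day} {clock}", host, msg))
    return events


def packages_started_while_alone(events):
    """Return 'Started package' events logged while the cluster had only one member."""
    single_node = False
    started = []
    for stamp, _host, msg in events:
        if msg.startswith("1 nodes have formed a new cluster"):
            single_node = True   # the surviving node is now the only member
        elif "is joining the cluster" in msg:
            single_node = False  # a second node is back; stop watching
        elif single_node and "Started package" in msg:
            started.append((stamp, msg))
    return started


sample = [
    "Sep 5 10:30:49 rysap108 cmcld: 1 nodes have formed a new cluster, sequence #22",
    "Sep 5 10:51:46 rysap108 cmcld: New node rysap109 is joining the cluster",
    "Sep 5 10:52:52 rysap108 cmcld: (rysap109) Started package ryprtsv9 on node rysap109.",
]
print(packages_started_while_alone(parse_cmcld(sample)))  # prints []
```

An empty list here matches what the thread observes: the only package start in the log happens on rysap109 after it rejoins, not on rysap108 while it was alone.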
Ashwani Kashyap
Honored Contributor
Solution

Re: Service Guard Behaviour on reboot.

On failure of a node in a cluster, the packages on the failed node are failed over to other nodes in the cluster, provided the adoptive nodes for the packages are properly defined in the config files and switching is turned on for each package.
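Those two conditions can be written down as a small predicate. This is a deliberately simplified Python model, not ServiceGuard's actual logic: the `Package` class and its fields are hypothetical stand-ins for the node list in the package ASCII file and the package's switching (AUTO_RUN) state. It assumes adoptive nodes are tried in the order they are listed and ignores node-level switching, failover policies and resource dependencies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Package:
    """Hypothetical, simplified view of one failover package."""
    name: str
    nodes: list[str]                 # primary node first, then adoptive nodes
    switching_enabled: bool = True   # global package switching (AUTO_RUN)


def failover_target(pkg: Package, failed_node: str, up_nodes: set[str]) -> str | None:
    """Return the node expected to adopt pkg after failed_node dies, or None."""
    if not pkg.switching_enabled:
        return None  # switching is off: the package just stays down
    for node in pkg.nodes:
        if node != failed_node and node in up_nodes:
            return node  # first surviving node, in configured order
    return None  # no adoptive node is defined or up


pkg = Package("ryprtsv9", ["rysap109", "rysap108"])
print(failover_target(pkg, "rysap109", {"rysap108"}))  # prints rysap108

pkg.switching_enabled = False
print(failover_target(pkg, "rysap109", {"rysap108"}))  # prints None
```

The second call is the situation in this thread: the adoptive node is defined and up, but with switching off the model predicts that nothing moves.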
Martin Johnson
Honored Contributor

Re: Service Guard Behaviour on reboot.

Post the results of "cmviewcl"

Marty
Dietmar Konermann
Honored Contributor

Re: Service Guard Behaviour on reboot.

Hi!

Assuming a correct package configuration, the only reason I can think of is that package switching (AUTO_RUN) was disabled.

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Ian Killer_1
Regular Advisor

Re: Service Guard Behaviour on reboot.

Thanks, Ashwani. Do you know why rysap108 decided not to start the package? Between 10:30 and 10:50 in the log above, nothing happens.

I've checked both package ASCII files, and the adoptive nodes are fine.
Ian Killer_1
Regular Advisor

Re: Service Guard Behaviour on reboot.

I was on a plane when it happened, but I found out today that they had done this before and not told me. It's possible that, when the package was restarted the previous time, no one ran cmmodpkg -e on it to re-enable switching.

I also wasn't around to get a cmviewcl on it.

I just needed reassurance that, after a reboot with all configurations normal, the package would have restarted on the adoptive node. Thanks.

Ian
Points shortly.
Ashwani Kashyap
Honored Contributor

Re: Service Guard Behaviour on reboot.

Ian,

I believe that cmmodpkg is your culprit. A detailed description of cmmodpkg is available in the man pages, and I think you might find something akin to your situation there.

However, turning global switching back on (cmmodpkg -e) should take care of your problem at the next reboot.
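The sequence Ian suspects can be sketched as a tiny state model in Python. The assumption, which should be checked against the cmhaltpkg, cmrunpkg and cmmodpkg man pages for your release, is that halting a package with cmhaltpkg disables its switching, starting it with cmrunpkg leaves switching as it is, and only cmmodpkg -e turns it back on. The class and method names are hypothetical; nothing here calls the real commands.

```python
from __future__ import annotations


class PackageState:
    """Hypothetical model of one package's location and global switching flag."""

    def __init__(self, name: str, node: str | None):
        self.name = name
        self.node = node        # node currently running the package, or None if down
        self.switching = True   # assume switching starts out enabled

    def halt(self) -> None:
        """Like cmhaltpkg: stop the package; assumed to disable switching too."""
        self.node = None
        self.switching = False

    def run(self, node: str) -> None:
        """Like cmrunpkg: start the package on node; switching is left as it was."""
        self.node = node

    def enable_switching(self) -> None:
        """Like cmmodpkg -e: turn global switching back on."""
        self.switching = True

    def node_failed(self, failed: str, adoptive: str) -> None:
        """If the running node fails, move to adoptive only when switching is on."""
        if self.node == failed:
            self.node = adoptive if self.switching else None


# Earlier manual restart where nobody ran cmmodpkg -e afterwards
p = PackageState("ryprtsv9", "rysap109")
p.halt()
p.run("rysap109")
p.node_failed("rysap109", "rysap108")
print(p.node)  # prints None: the package stays down

# Same restart, followed by cmmodpkg -e
q = PackageState("ryprtsv9", "rysap109")
q.halt()
q.run("rysap109")
q.enable_switching()
q.node_failed("rysap109", "rysap108")
print(q.node)  # prints rysap108
```

The "Enabled switching for package ryprtsv9" line at 10:55:45 in the log, a few minutes after the package was started on rysap109, fits this pattern of a manual start followed by a separate cmmodpkg -e.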