Operating System - HP-UX

Re: critical nodes showing up upon failover

 
Lisa McBride
Occasional Advisor

critical nodes showing up upon failover

We have a 2-node cluster with Serviceguard, and we do maintenance weekly. During the failover, there are always nodes that show up as critical because the agent processes are not running. That is the first indication we get that there is a problem with the agent. It is not the same nodes each time, and we have no idea how long they have been critical.
-lis
Rita C Workman
Honored Contributor

Re: critical nodes showing up upon failover

A little confusing.
2 node cluster

>>nodes that show up critical regarding the agent processes not running.
I'm going to guess you are referring to packages whose processes aren't running after a failover. If so, your boxes and/or your application are not set up right in your package. If they were, everything the package needs on a failover would be there and would start properly when this happens.
** Check your package cntl logs to see what starts, what doesn't, and any messages. Then go back and get it fixed so the processes/agents the package needs on the failover node are working right. Once you have that done, do a couple of manual cmhaltpkg runs on the primary, then cmrunpkg on the failover node (server). Stop and start the packages a number of times, going back and forth, till they run perfectly clean. That includes making sure that every time you stop the package on a node, you go back and confirm that every process and memory segment got cleaned up properly too!
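The back-and-forth drill described above can be written down as a dry-run plan before anyone touches the cluster. This is only a sketch: the package and node names (pkg1, nodeA, nodeB) are made up, the function name drill_commands is hypothetical, and the script prints commands rather than running them. The ps -ef and ipcs -m lines are reminders for the manual cleanup check, not an automated test.

```python
"""Dry-run planner for a manual Serviceguard failover drill (illustrative only)."""


def drill_commands(package, primary, adoptive, round_trips=2):
    """Return the commands for moving `package` back and forth.

    One round trip is primary -> adoptive -> primary, i.e. two moves.
    Nothing is executed; the caller reviews and runs the commands by hand.
    """
    commands = []
    current, other = primary, adoptive
    for _ in range(2 * round_trips):
        # Halt the package where it is currently running.
        commands.append(f"cmhaltpkg {package}")
        # Reminder: verify the halted node is clean before moving on.
        commands.append(
            f"# on {current}: check ps -ef and ipcs -m for leftovers from {package}"
        )
        # Start the package on the other node.
        commands.append(f"cmrunpkg -n {other} {package}")
        current, other = other, current
    return commands


if __name__ == "__main__":
    # Hypothetical names; substitute your own package and node names.
    for line in drill_commands("pkg1", "nodeA", "nodeB"):
        print(line)
```

Printing the plan first makes it easy to compare each step against the package cntl log afterwards and see which move did not run clean.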

As for how long they have been critical, or without certain agents running: that sounds like an internal monitoring issue, i.e. making certain things are up and running. Don't you have anything in place watching your syslog that will email you when a node/package/cluster fails? We have alerts like this send messages to our central server, and from there we email out alerts. You may want to set something up to keep you informed when failovers happen.
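One way to picture the syslog-to-email idea is a small filter that turns matching syslog lines into a mail message. This is a rough sketch, not how our setup actually works: build_alert is a hypothetical helper, the patterns are whatever substrings you decide are worth an alert, and the addresses are placeholders. Actually sending the message (for example with smtplib to your own mail relay) is left out on purpose.

```python
from email.message import EmailMessage


def build_alert(lines, patterns, sender, recipient):
    """Return an EmailMessage listing syslog lines that contain any pattern.

    Returns None when nothing matches, so the caller can skip sending.
    `patterns` are plain substrings chosen by the caller (an assumption here).
    """
    hits = [ln.rstrip("\n") for ln in lines if any(p in ln for p in patterns)]
    if not hits:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"syslog alert: {len(hits)} matching line(s)"
    msg.set_content("\n".join(hits))
    return msg


if __name__ == "__main__":
    # /var/adm/syslog/syslog.log is the usual HP-UX syslog location.
    with open("/var/adm/syslog/syslog.log", errors="replace") as fh:
        alert = build_alert(fh, ["cmcld"], "root@localhost", "oncall@example.com")
    print(alert if alert is not None else "nothing to report")
```

Run from cron against only the new part of the log, something like this would at least tell you when a failover happened instead of leaving you to find out from the browser.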

I hope this helps and is what you are talking about. Maybe I need more coffee, or I have totally misunderstood what you're asking.

Kindest regards,
Rita
Lisa McBride
Occasional Advisor

Re: critical nodes showing up upon failover

Rita,
It was my bad for not being clear enough. The nodes coming back critical are NOT the 2 management (clustered) nodes, but some of the managed nodes. Those nodes do not report that the agent is not running until we fail over. It is not the same nodes each time, and it does not always happen. Gremlins. :}
-lis
Stephen Doud
Honored Contributor

Re: critical nodes showing up upon failover

The terms management and agent imply OpenView. Serviceguard doesn't use those terms, so clarifying the actual problem, with messages from either the package control log or syslog.log, would help.
Lisa McBride
Occasional Advisor

Re: critical nodes showing up upon failover

There is nothing in the syslog or the SG cntl log other than normal failover entries. All that we see are messages in the browser from a handful of managed nodes for 'message agent not running' or 'comm broker is down'. For whatever reason, these messages were not received UNTIL the failover, almost as if they were buffering in some inner space somewhere.
-lis
Stephen Doud
Honored Contributor

Re: critical nodes showing up upon failover

Unfortunately, the details you have provided lack context. What user interface are you referring to: OpenView, or SMH-Serviceguard Manager?

If your issue centers on OpenView, I can redirect this thread to the correct team. Some of the terms you are using are not part of the Serviceguard jargon, i.e. managed nodes and agent.

If the syslog.log does not include cmcld or cmclconfd statements indicating a Serviceguard problem, I have to conclude you are dealing with an OpenView issue.
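A quick way to check whether syslog.log has any cmcld or cmclconfd entries at all is to count them. The sketch below assumes the daemon name appears in the common "name[pid]:" syslog form; if your entries are formatted differently, the regular expression needs adjusting. The function name is hypothetical, and a count only shows that entries exist: whether they indicate a problem still takes a person reading them.

```python
import re
from collections import Counter

# Assumption: entries are tagged like "cmcld[1234]:" or "cmclconfd[1234]:".
DAEMON_RE = re.compile(r"\b(cmcld|cmclconfd)\[\d+\]:")


def count_serviceguard_lines(lines):
    """Count syslog lines per Serviceguard daemon (cmcld, cmclconfd)."""
    counts = Counter()
    for line in lines:
        match = DAEMON_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # /var/adm/syslog/syslog.log is the usual HP-UX syslog location.
    with open("/var/adm/syslog/syslog.log", errors="replace") as fh:
        counts = count_serviceguard_lines(fh)
    for daemon in ("cmcld", "cmclconfd"):
        print(f"{daemon}: {counts[daemon]} line(s)")
```

If both counts cover only the normal failover window, that supports looking at the management software rather than Serviceguard.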