Operating System - HP-UX

Re: ServiceGuard syslog messages

 
SOLVED
Eric Guerizec
Frequent Advisor

ServiceGuard syslog messages

Hi!
Does anybody know the most "popular" messages that cmcld reports in syslog (package failure, package switch, etc.)?
The aim is to trap these messages with TNG.

Thanks.

11 REPLIES
Paula J Frazer-Campbell
Honored Contributor

Re: ServiceGuard syslog messages

Eric

I am afraid that this is an "it depends" answer.

It depends on your configuration.


Paula
If you can spell SysAdmin then you is one - anon
Massimo Bianchi
Honored Contributor

Re: ServiceGuard syslog messages

Hi,
the easiest way is to issue a

grep cmcld /var/adm/syslog/syslog.log

on a couple of servers with MC/SG on them.
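
To see which messages actually come up most often on a given node, the grep above can be extended into a small frequency count. This is only a sketch: the function name is made up for illustration, and the sed expressions assume the usual "Mon DD HH:MM:SS host daemon[pid]: text" syslog layout.

```shell
#!/bin/sh
# Sketch: count distinct cmcld messages in a syslog file.
# cmcld_message_summary is a hypothetical helper name.
cmcld_message_summary() {
    # $1 = log file to scan
    grep 'cmcld' "$1" |
        # Drop "Mon DD HH:MM:SS host " and the "[pid]" part so that
        # identical messages collapse into one line.
        sed -e 's/^[A-Za-z]* *[0-9]* [0-9:]* [^ ]* //' \
            -e 's/\[[0-9]*\]:/:/' |
        sort | uniq -c | sort -rn
}

# Usage (path as in the grep above):
# cmcld_message_summary /var/adm/syslog/syslog.log
```

Messages that contain node or package names will still show up as separate entries, so the counts are a rough guide rather than an exact ranking.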

HTH,
Massimo

John Poff
Honored Contributor

Re: ServiceGuard syslog messages

Hi,

Another idea is to use the 'logger' command to write your own messages to syslog. You could put these into your package start and stop functions so that you would have definite messages for your software to trap. That would free you from having to worry about the exact syntax of the cmcld system generated messages.
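
A minimal sketch of that idea follows. The package name, the SGPKG tag, the hook names and the message wording are all made up for illustration; the only real command used is logger with its -t (tag) and -p (priority) options.

```shell
#!/bin/sh
# Sketch only. PKG_NAME, the SGPKG tag, the hook names and the message
# wording are hypothetical; adapt them to your own control script.
PKG_NAME="${PKG_NAME:-pkg1}"
LOGGER="${LOGGER:-logger}"   # overridable, so the hooks can be dry-run

# Write one tagged, fixed-format line to syslog.
log_pkg_event() {
    # $1 = event keyword, e.g. START or STOP
    "$LOGGER" -t SGPKG -p daemon.notice "$1 package=$PKG_NAME node=$(uname -n)"
}

# Call these from the start and stop functions of the package control script.
pkg_start_hook() { log_pkg_event START; }
pkg_stop_hook()  { log_pkg_event STOP; }
```

The monitoring tool then only has to match the SGPKG tag and the START/STOP keyword, whatever cmcld itself prints.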

JP
Chris Wilshaw
Honored Contributor
Solution

Re: ServiceGuard syslog messages

Here's a sample of common errors (the %s and %d values represent variables). The list was extracted from /usr/lbin/cmcld using

strings cmcld.

I suppose the most common messages would be those relating to the start/switch/halt processes.

Daemon could not clean up.
Disabled node %s from running package %s.
Examine the file %s.log for more details.
Halted package %s on node %s.
Package %s halt script %s could not be started.
Package %s halt script %s does not exist.
Package %s halt script %s does not have execute permission.
Package %s halt script exited with NO_RESTART.
Package %s halt script failed.
Package %s halt script failed: exit value %d.
Package %s halt script timed out.
Package %s run script %s could not be started.
Package %s run script %s does not exist.
Package %s run script %s does not have execute permission.
Package %s run script exited with NO_RESTART.
Package %s run script exited with RESTART.
Package %s run script failed.
Package %s run script failed: exit value %d.
Package %s run script timed out.
Started package %s on node %s.
Switching disabled on package %s.
Unable to run halt script for package %s.
Unable to set realtime priority to RTPRIO_RTOFF.
Unable to start package %s.
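
As a rough illustration, some of the format strings above can be turned into grep patterns by replacing each %s or %d with a wildcard. The function name is hypothetical and the pattern list is a hand-picked subset of the failure-type strings, not a complete set; it also assumes a grep that accepts -E with several -e options.

```shell
#!/bin/sh
# Sketch: select cmcld lines that point to a package problem.
# cmcld_failures is a hypothetical helper; the patterns are derived from
# a subset of the strings listed above, with %s replaced by ".*".
cmcld_failures() {
    # $1 = log file to scan
    grep 'cmcld' "$1" | grep -E \
        -e 'Package .* (run|halt) script (failed|timed out|exited with)' \
        -e 'Package .* (run|halt) script .* (could not be started|does not exist|does not have execute permission)' \
        -e 'Unable to (start package|run halt script)' \
        -e 'Switching disabled on package' \
        -e 'Disabled node .* from running package'
}

# Usage:
# cmcld_failures /var/adm/syslog/syslog.log
```

Normal "Started package" and "Halted package" lines are deliberately left out, so the output should contain only lines worth an alert.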
Eric Guerizec
Frequent Advisor

Re: ServiceGuard syslog messages

Thanks everybody!
Chris, your list is interesting! Thank you.
John, the use of logger is a good idea!

T. M. Louah
Esteemed Contributor

Re: ServiceGuard syslog messages

Here is a sample of the cmcld process failing to connect after a system reboot, when the cmrunnode command was issued:

The messages logged in the syslog.log are:

Jun 14 10:52:15 nodeA CM-CMD[19055]: cmrunnode -v hostname
Jun 14 10:52:17 nodeA cmclconfd[19059]: Executing "/usr/lbin/cmcld" for node nodeA.abc.com
Jun 14 10:52:17 nodeA cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 2.
Jun 14 10:52:17 nodeA cmcld: Reserving 1448 Kbytes of memory and 34 threads
Jun 14 10:52:17 nodeA cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 18.
Jun 14 10:52:17 nodeA cmcld: Warning. No cluster lock is configured.
Jun 14 10:52:27 nodeA cmcld: Processing exit status for service cmlogd
Jun 14 10:52:27 nodeA cmcld: Service cmlogd terminated due to an exit(1).
Jun 14 10:52:27 nodeA cmcld: Automatically restarted service cmlogd for the 1st time after failure.
Jun 14 10:52:27 nodeA cmcld: Processing exit status for service cmlvmd
Jun 14 10:52:27 nodeA cmcld: Service cmlvmd terminated due to an exit(1).
Jun 14 10:52:27 nodeA cmcld: Daemon exiting to preserve data integrity
Jun 14 10:52:27 nodeA cmcld: Reason: LVM daemon failed
Jun 14 10:52:28 nodeA cmsrvassistd[19064]: Lost connection to the cluster daemon.
Jun 14 10:52:28 nodeA cmsrvassistd[19064]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
Jun 14 10:52:28 nodeA cmsrvassistd[19068]: Unable to notify ServiceGuard main daemon (cmcld): Connection reset by peer
Jun 14 10:52:28 nodeA cmclconfd[19067]: Unable to lookup any node information in CDB: Connection refused
----- end of syslog.log -----
The above errors are related to security software denying access to the rsh and ServiceGuard "hacl" ports.
Little learning is dangerous!
Michael Steele_2
Honored Contributor

Re: ServiceGuard syslog messages

It might be easier to check for what should be there and note any exceptions, since new messages come out with different patches or releases, and those will break your parsing logic.

Here are some cmcld error messages.

Most critical:

"...MC/ServiceGuard: Unable to maintain contact with cmcld daemon.
Performing TOC to ensure data integrity..."

###############################

/etc/rc.config.d/cmcluster file to enable the new node to join the
cluster automatically each time it reboots.

###############################

Dec 14 14:34:44 star04 cmcld[2048]: Executing "/etc/cmcluster/pkg5/pkg5_run start" for package pkg5.
Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02
Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART.
Dec 14 14:34:45 star04 cmcld[2048]: Examine the file

###############################

Sep 24 09:35:12 prc02b03 cmcld: Stopped accepting local connection requests because there are currently too many concurrent connections to the daemon (48).
Sep 24 09:35:12 prc02b03 cmcld: There are no longer too many local connections. Now accepting local connection requests.

###############################

"...node..."[###]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
Support Fatherhood - Stop Family Law
Eric Guerizec
Frequent Advisor

Re: ServiceGuard syslog messages

Hi!

I don't need to know all the cmcld messages, of course. I want to trap critical messages, especially when a package switches to another node.

When I switch a package manually, I see messages like these in syslog:
cmcld: Halted package Package2 on node NodeA
cmcld: (NodeB) Started package Package2 on node NodeB
I can't trap these messages, because the switch is done manually and I don't want TNG to report them. I need messages when the switch is due to an error (hardware). You see what I mean...
John Poff
Honored Contributor

Re: ServiceGuard syslog messages

Here is something you could do. Add logic to your start and stop functions in the package control script so that they check for the presence of a file in /etc/cmcluster. You could name the file MAINT or maintenance or whatever you like. Now, if that file doesn't exist, use the logger command to write something to syslog that your TNG will trap for, otherwise, if the file is there just ignore things. This way, you can touch the file when you are going to do some work stopping and starting packages, and TNG won't get notified via syslog. Remove the file when you are done, and then if something crashes your control script will see that the maint file isn't there, write to syslog using logger, and your TNG will get excited and call you.
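
Roughly, the check could look like the sketch below. The flag file name /etc/cmcluster/MAINT, the SGPKG tag, the function name and the message text are all assumptions for illustration; pick whatever your TNG rules expect.

```shell
#!/bin/sh
# Sketch only: MAINT_FLAG, the SGPKG tag and the message text are hypothetical.
MAINT_FLAG="${MAINT_FLAG:-/etc/cmcluster/MAINT}"
LOGGER="${LOGGER:-logger}"   # overridable, so the function can be dry-run

# Log an alert line unless the maintenance flag file is present.
notify_unless_maintenance() {
    # $1 = event (START or STOP), $2 = package name
    if [ -f "$MAINT_FLAG" ]; then
        return 0    # planned work: stay silent
    fi
    "$LOGGER" -t SGPKG -p daemon.err "UNPLANNED $1 package=$2 node=$(uname -n)"
}

# In the control script's start/stop functions, for example:
# notify_unless_maintenance STOP Package2
```

Before planned work you would touch the flag file, and remove it again afterwards. A forgotten flag file silences real failures too, so it is worth checking for a stale one from cron or at login.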

JP
Michael Steele_2
Honored Contributor

Re: ServiceGuard syslog messages

Why can't you use 'grep -v' in some way? Make a list of startup messages and:

grep -v -f startup syslog.log (* approximate *)
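
Spelled out a little more, the exclusion approach might look like this. The function and file names are placeholders, and the sketch assumes a grep that supports -F (fixed strings) together with -f.

```shell
#!/bin/sh
# Sketch: show cmcld lines that do NOT match a list of expected messages.
# unexpected_cmcld and the file names are hypothetical.
unexpected_cmcld() {
    # $1 = pattern file, one fixed string per line (no blank lines)
    # $2 = log file to scan
    grep 'cmcld' "$2" | grep -v -F -f "$1"
}

# Usage, with a pattern file holding lines such as "Started package":
# unexpected_cmcld /etc/cmcluster/expected.txt /var/adm/syslog/syslog.log
```

Keep blank lines out of the pattern file: an empty pattern matches every line, so grep -v would then print nothing at all.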
Support Fatherhood - Stop Family Law
Eric Guerizec
Frequent Advisor

Re: ServiceGuard syslog messages

Thanks guys. I will do some tests now :)