Operating System - HP-UX

Romaric Guilloud
Regular Advisor

Error message when starting cluster (cmruncl)

When trying to restart an MC/ServiceGuard cluster, I get the following error message:

hooke1-> cmrunnode -v hooke1
Successfully started $SGLBIN/cmcld on hooke1.
cmrunnode : Waiting for cluster to form..Sep 3 13:19:45 hooke1 cmcld: cl_kepd_printf, fstat: kepd_fd=7, st_dev=1073741827, st_ino=142, st_rdev=-486539264

Message from syslogd@hooke1 at Fri Sep 3 13:19:45 2004 ...
hooke1 cmcld: cl_kepd_printf, fstat: kepd_fd=7, st_dev=1073741827, st_ino=142, st_rdev=-486539264

Message from syslogd@hooke1 at Fri Sep 3 13:19:45 2004 ...
hooke1 cmcld: Aborting! Failed to communicate with DLPI
Sep 3 13:19:45 hooke1 cmcld: Aborting! Failed to communicate with DLPI
..........
cmrunnode : Node hooke1 unable to join Cluster. Check the syslog file on that node for information.


Looking at syslog.log, the details around the DLPI call are:

Sep 3 13:19:44 hooke1 CM-CMD[20059]: cmrunnode -v hooke1
Sep 3 13:19:44 hooke1 cmclconfd[20061]: Executing "/usr/lbin/cmcld" for node hooke1
Sep 3 13:19:44 hooke1 cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 30.
Sep 3 13:19:44 hooke1 cmcld: Global Cluster Information:
Sep 3 13:19:44 hooke1 cmcld: Heartbeat Interval is 2 seconds.
Sep 3 13:19:44 hooke1 cmcld: Node Timeout is 10 seconds.
Sep 3 13:19:44 hooke1 cmcld: Network Polling Interval is 8 seconds.
Sep 3 13:19:44 hooke1 cmcld: Auto Start Timeout is 600 seconds.
Sep 3 13:19:44 hooke1 cmcld: Information Specific to node hooke1:
Sep 3 13:19:44 hooke1 cmcld: Cluster lock disk: /dev/dsk/c19t2d6.
Sep 3 13:19:44 hooke1 cmcld: lan0 0x00306e0ca99b 172.18.1.14 bridged net:1
Sep 3 13:19:44 hooke1 cmcld: lan1 0x00306e06b215 10.137.146.50 bridged net:2
Sep 3 13:19:44 hooke1 cmcld: lan6 0x00306e270f0d 10.137.138.57 bridged net:3
Sep 3 13:19:44 hooke1 cmcld: lan3 0x00306e040541 standby bridged net:2
Sep 3 13:19:44 hooke1 cmcld: lan4 0x00306e040542 standby bridged net:3
Sep 3 13:19:44 hooke1 cmcld: Heartbeat Subnet: 172.18.1.0
Sep 3 13:19:44 hooke1 cmcld: Heartbeat Subnet: 10.137.144.0
Sep 3 13:19:45 hooke1 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 58.
Sep 3 13:19:45 hooke1 cmcld: Lookup of link /nodes/hooke1/networks/lan/lan3/peers failed.
Sep 3 13:19:45 hooke1 cmcld: Unable to send DLPI info request, Not a stream
Sep 3 13:19:45 hooke1 cmcld: cl_abort: abort cl_kepd_printf failed: Invalid argument
Sep 3 13:19:45 hooke1 cmcld: cl_kepd_printf, fstat: kepd_fd=7, st_dev=1073741827, st_ino=142, st_rdev=-486539264
Sep 3 13:19:45 hooke1 cmcld: Aborting! Failed to communicate with DLPI
Sep 3 13:19:46 hooke1 cmsrvassistd[20068]: The cluster daemon aborted our connection.
Sep 3 13:19:46 hooke1 cmsrvassistd[20068]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
Sep 3 13:19:46 hooke1 cmsrvassistd[20070]: Unable to notify ServiceGuard main daemon (cmcld): Connection reset by peer
Sep 3 13:19:46 hooke1 cmsrvassistd[20069]: The cluster daemon aborted our connection.
Sep 3 13:19:46 hooke1 cmsrvassistd[20069]: Unable to notify ServiceGuard main daemon (cmcld): Software caused connection abort
Sep 3 13:19:46 hooke1 cmclconfd[20061]: The ServiceGuard daemon, /usr/lbin/cmcld[20062], died upon receiving signal number 6.
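
To pull just the cmcld failure lines out of a long syslog like the one above, a small filter can help. This is only a sketch: the function name cmcld_errors is made up for illustration, the keyword list (failed, Unable, Aborting) is taken from the messages in this excerpt and is not a complete list of Serviceguard errors, and the path in the usage comment assumes the usual HP-UX syslog location.

```shell
# Print only the cmcld lines that look like failures.
# $1: path to a syslog file.
cmcld_errors() {
    # First keep lines logged by cmcld itself (" cmcld: "), which skips
    # cmsrvassistd and cmclconfd lines that merely mention cmcld.
    # Then keep only lines containing one of the failure keywords.
    grep ' cmcld: ' "$1" | grep -E 'failed|Unable|Aborting'
}

# Example (path is an assumption, adjust to your system):
# cmcld_errors /var/adm/syslog/syslog.log
```

Reading the matches in time order shows which message came first, which is usually a better starting point than the final "Aborting!" line.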

This is a blocking problem. Any idea what it could be?
Your quick feedback or help would be greatly appreciated.
Regards,


Romaric.
"And remember: There are no stupid questions; there are only stupid people." (To Homer Simpson, in "The Simpsons".)
Stephen Doud
Honored Contributor
Solution

Re: Error message when starting cluster (cmruncl)

I found the error string
"cmcld: cl_kepd_printf, fstat: "
... in patches to Serviceguard A.11.14 and A.11.15

What version do you have?
Use:
# what /usr/lbin/cmcld | grep Date

Patches that may help you...
HP-UX 11.00/11.11, SG A.11.14: PHSS_31015 (first patch version: PHSS_30028)
HP-UX 11.11, SG A.11.15: PHSS_30370 (first patch version: PHSS_30087)
HP-UX 11.23, SG A.11.15: PHSS_30371 (first patch version: PHSS_29902)
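
To check quickly whether any of these patches is already on the node, something like the following could be used. It is a rough sketch, not a definitive check: check_patches is a hypothetical helper name, it only does a plain text match on whatever listing is piped into it (for example the output of swlist -l patch, which is assumed here as the source), and it will not notice a newer patch that supersedes one of these IDs.

```shell
# Read a patch listing on stdin and report which of the patch IDs
# named in this thread appear in it.
check_patches() {
    listing=$(cat)    # slurp the whole listing once
    for p in PHSS_31015 PHSS_30370 PHSS_30371; do
        if printf '%s\n' "$listing" | grep -q "$p"; then
            echo "$p present"
        else
            echo "$p missing"
        fi
    done
}

# Example (assumes swlist is the source of the listing):
# swlist -l patch | check_patches
```

Only the line for your own HP-UX and Serviceguard release matters; the other two IDs will normally show as missing.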

StephenD.

Geoff Wild
Honored Contributor

Re: Error message when starting cluster (cmruncl)

I'm assuming this was working at some point?

Okay, you might get more error information by running cmquerycl:

cmquerycl -C test.ascii -c test1 -n hooke1 -n hooke2

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Romaric Guilloud
Regular Advisor

Re: Error message when starting cluster (cmruncl)

Thanks, Stephen, for the hint.
Indeed, my cmcld version is pretty old and needs patching...
However, this time it was a network problem, which I have since resolved.
Rgds,

Romaric.
"And remember: There are no stupid questions; there are only stupid people." (To Homer Simpson, in "The Simpsons".)