
cmcld: Halting node to preserve data integrity

 
Mihails Nikitins
Super Advisor

cmcld: Halting node to preserve data integrity

Hi,

I have just installed a test cluster system (HP-UX 11i (June 2003) on 2 servers with 2 shared disk arrays) and configured Apache as a test HA service.

Problem: when I start the package on either node, that node always reboots.


Aug 19 16:21:05 db2 cmcld: Request from node db2 to start package pkg1 on node db2.
Aug 19 16:21:05 db2 cmcld: Executing '/etc/cmcluster/pkg1/pkg1.sh start' for package pkg1, as service PKG*40961.
Aug 19 16:21:06 db2 LVM[5364]: vgchange -a y vgdb
Aug 19 16:21:07 db2 CM-pkg1[5396]: cmmodnet -a -i 10.20.2.90 10.20.2.0
Aug 19 16:21:07 db2 CM-pkg1[5406]: cmrunserv www >> /etc/cmcluster/pkg1/pkg1.sh.log 2>&1 /etc/cmcluster/pkg1/www monitor
Aug 19 16:21:07 db2 cmcld: Service www terminated due to an exit(127).
Aug 19 16:21:07 db2 cmcld: Service PKG*40961 terminated due to an exit(0).
Aug 19 16:21:07 db2 cmcld: Started package pkg1 on node db2.
Aug 19 16:21:07 db2 cmcld: Service www in package pkg1 has gone down.
Aug 19 16:21:07 db2 cmcld: Service fail fast is set. Node will be failed.
Aug 19 16:21:07 db2 cmcld: Failed node in response to failure of package pkg1.
Aug 19 16:21:07 db2 cmcld: Halting db2 to preserve data integrity
Aug 19 16:21:07 db2 cmcld: Reason: A crucial package failed
Aug 19 16:21:07 db2 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Aug 19 16:21:07 db2 cmlvmd: CLVMD exiting
Aug 19 16:21:07 db2 cmsrvassistd[5042]: The cluster daemon aborted our connection.
Aug 19 16:21:07 db2 cmsrvassistd[5042]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
Aug 19 16:21:07 db2 cmtaped[5045]: The cluster daemon aborted our connection.
Aug 19 16:21:07 db2 cmtaped[5045]: cmtaped terminating. (ATS 1.14)
Aug 19 16:21:07 db2 cmclconfd[5355]: The cluster daemon aborted our connection.
Aug 19 16:21:12 db2 vmunix: SCSI: Reset detected -- lbolt: 7927106, bus: 4
Aug 19 16:21:12 db2 vmunix: lbp->state: 4060
Aug 19 16:21:12 db2 vmunix: lbp->offset: ffffffff
Aug 19 16:21:12 db2 vmunix: lbp->uPhysScript: f9fef000
Aug 19 16:21:12 db2 vmunix: From most recent interrupt:
Aug 19 16:21:12 db2 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: f9fef028
Aug 19 16:21:12 db2 vmunix: lsp: 0000000000000000
Aug 19 16:21:12 db2 vmunix: lbp->owner: 0000000000000000
Aug 19 16:21:12 db2 vmunix: scratch_lsp: 0000000000000000
Aug 19 16:21:12 db2 vmunix: Pre-DSP script dump [fffffffff9fef0e0]:
Aug 19 16:21:12 db2 vmunix: e0340004 00000000 e0100004 00000000
Aug 19 16:21:12 db2 vmunix: 48000000 00000000 78350000 00000000
Aug 19 16:21:12 db2 vmunix: Script dump [fffffffff9fef100]:
Aug 19 16:21:12 db2 vmunix: 50000000 f9fef028 80000000 0000000b
Aug 19 16:21:12 db2 vmunix: 0f000001 f9fef5c0 60000040 00000000

I suspect something is wrong with VG sharing; some patches may help. BTW, where can I find a list of recommended patches for MC/ServiceGuard?

Many thanks and points for your comments!

BR,
Mihails

KISS - Keep It Simple Stupid
melvyn burnard
Honored Contributor
Solution

Re: cmcld: Halting node to preserve data integrity

You have configured a package with a service, and that service is failing on startup:

Aug 19 16:21:07 db2 CM-pkg1[5406]: cmrunserv www >> /etc/cmcluster/pkg1/pkg1.sh.log 2>&1 /etc/cmcluster/pkg1/www monitor
Aug 19 16:21:07 db2 cmcld: Service www terminated due to an exit(127).
Aug 19 16:21:07 db2 cmcld: Service PKG*40961 terminated due to an exit(0).
Aug 19 16:21:07 db2 cmcld: Started package pkg1 on node db2.
Aug 19 16:21:07 db2 cmcld: Service www in package pkg1 has gone down.
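
Exit status 127 is what a POSIX shell returns when it cannot find the command it was asked to run, so the service command itself (here /etc/cmcluster/pkg1/www) is the first thing to look at. A minimal sketch of such a check, assuming the service is launched through a shell; the function name check_service_cmd is made up for illustration and is not part of ServiceGuard:

```shell
#!/bin/sh
# Hypothetical helper (not a ServiceGuard tool): report why a service
# command might fail to start. Pass the command path as the first argument.
check_service_cmd() {
    cmd=$1
    if [ ! -f "$cmd" ]; then
        echo "missing"          # a shell would exit 127 (command not found)
        return 127
    elif [ ! -x "$cmd" ]; then
        echo "not executable"   # a shell would exit 126 (cannot execute)
        return 126
    fi
    echo "ok"
    return 0
}

# Example: check the service script named in the package log.
# check_service_cmd /etc/cmcluster/pkg1/www
```

If the helper prints "missing" on one node but "ok" on the other, the script was probably never copied to that node.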


You also have the package configured with SERVICE_FAIL_FAST=YES.
This causes the system to TOC when a service fails.

Aug 19 16:21:07 db2 cmcld: Service fail fast is set. Node will be failed.
Aug 19 16:21:07 db2 cmcld: Failed node in response to failure of package pkg1.
Aug 19 16:21:07 db2 cmcld: Halting db2 to preserve data integrity
Aug 19 16:21:07 db2 cmcld: Reason: A crucial package failed


Reconfigure the package with SERVICE_FAIL_FAST=NO, and then try again.
Also look at the package log to see if you can track down why your service is failing.
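
To see which services in a package configuration file have fail-fast switched on before editing it, something like the following may help. This is a rough sketch, assuming the legacy ASCII package file layout in which each SERVICE_NAME line is followed by a fail-fast line; the parameter is written SERVICE_FAIL_FAST in this thread and, as far as I know, SERVICE_FAIL_FAST_ENABLED in the file itself, so the sketch accepts both. The function name and the example path are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: list services whose fail-fast flag is YES.
# Assumes whitespace-separated "SERVICE_NAME <name>" lines followed by
# "SERVICE_FAIL_FAST_ENABLED YES|NO" (or SERVICE_FAIL_FAST); '#' starts a comment.
list_fail_fast_services() {
    awk '
        $1 ~ /^#/ { next }                      # skip comment lines
        $1 == "SERVICE_NAME" { name = $2 }      # remember the current service
        $1 ~ /^SERVICE_FAIL_FAST(_ENABLED)?$/ && toupper($2) == "YES" { print name }
    ' "$1"
}

# Example (the path is an assumption):
# list_fail_fast_services /etc/cmcluster/pkg1/pkg1.conf
```

After changing the value to NO, the file would normally be re-verified with cmcheckconf -P and applied with cmapplyconf -P.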
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Michael Steele_2
Honored Contributor

Re: cmcld: Halting node to preserve data integrity

Regarding "... BTW, how where to find a list of recommended patches for MC/Servce Guard?..."

Maintenance and support for HP products > individual patches > HP-UX > enter the version in the data field > enter "serviceguard" in the 'search by keyword' data field. 47 patches are returned.

http://www1.itrc.hp.com/service/patch/search.do

#############################################

This link provides endless ServiceGuard advice:

http://docs.hp.com/hpux/ha/

#############################################

Regarding "...I suspect something is not OK with vg sharing..."

From either node you should be able to activate the VGs without bringing up the cluster. Here is the procedure:

# vgchange -c y /dev/vgoracle
# vgexport -p -s -m /tmp/vgoracle.map /dev/vgoracle

# rcp /tmp/vgoracle.map nodeB:/tmp/vgoracle.map

On nodeB

# mkdir /dev/vgoracle
# mknod /dev/vgoracle/group c 64 0x0X0000
(X is a placeholder: pick a minor number not used by any other VG on this node)
# vgimport -s -m /tmp/vgoracle.map /dev/vgoracle
# vgchange -c y /dev/vgoracle
# vgchange -a y /dev/vgoracle

#############################################

However, I feel the problem is within your package.conf or package.cntl files. These files are linked together by the service name, which must be exactly the same in both, so check for this with:

cmcheckconf -P package.conf

The service name must match exactly in both files.
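
A quick way to compare the names by eye or by script is sketched below. It assumes the legacy package layout, with SERVICE_NAME <name> lines in package.conf and SERVICE_NAME[n]="<name>" assignments in package.cntl; the function name is hypothetical and the file names are only the placeholders used above.

```shell
#!/bin/sh
# Hypothetical helper: compare service names between the package
# configuration file ($1) and the package control script ($2).
# Assumes "SERVICE_NAME <name>" in the configuration file and
# SERVICE_NAME[n]="<name>" in the control script (legacy package layout).
service_names_match() {
    conf_names=$(awk '$1 == "SERVICE_NAME" { print $2 }' "$1" | sort)
    cntl_names=$(sed -n 's/^[[:space:]]*SERVICE_NAME\[[0-9]*]=//p' "$2" |
                 tr -d '"' | sort)
    if [ "$conf_names" = "$cntl_names" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}

# Example:
# service_names_match package.conf package.cntl
```

The comparison is case-sensitive on purpose, since "www" and "WWW" count as different service names here.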
Support Fatherhood - Stop Family Law
Stephen Doud
Honored Contributor

Re: cmcld: Halting node to preserve data integrity

Note that "vgchange -c y" can only be performed if the node on which the command is executed is currently running the ServiceGuard daemons. cmlvmd must be running to authorize the "clusterizing" of VGs.

-s.
Mihails Nikitins
Super Advisor

Re: cmcld: Halting node to preserve data integrity

Hi,

Thank you, the error was a bad value for the SERVICE_FAIL_FAST parameter.

BR,
Mihails
KISS - Keep It Simple Stupid