topic cmcld: Halting node to preserve data integrity in Operating System - HP-UX

cmcld: Halting node to preserve data integrity

Mihails Nikitins — Tue, 19 Aug 2003 13:18:09 GMT

Hi,

I have just installed a test cluster system (HP-UX 11i (June 2003) on 2 servers with 2 shared disk arrays), and configured Apache as a test HA service.

Problem: when I start the package on any node, that node always reboots.

Aug 19 16:21:05 db2 cmcld: Request from node db2 to start package pkg1 on node db2.
Aug 19 16:21:05 db2 cmcld: Executing '/etc/cmcluster/pkg1/pkg1.sh start' for package pkg1, as service PKG*40961.
Aug 19 16:21:06 db2 LVM[5364]: vgchange -a y vgdb
Aug 19 16:21:07 db2 CM-pkg1[5396]: cmmodnet -a -i 10.20.2.90 10.20.2.0
Aug 19 16:21:07 db2 CM-pkg1[5406]: cmrunserv www >> /etc/cmcluster/pkg1/pkg1.sh.log 2>&1 /etc/cmcluster/pkg1/www monitor
Aug 19 16:21:07 db2 cmcld: Service www terminated due to an exit(127).
Aug 19 16:21:07 db2 cmcld: Service PKG*40961 terminated due to an exit(0).
Aug 19 16:21:07 db2 cmcld: Started package pkg1 on node db2.
Aug 19 16:21:07 db2 cmcld: Service www in package pkg1 has gone down.
Aug 19 16:21:07 db2 cmcld: Service fail fast is set. Node will be failed.
Aug 19 16:21:07 db2 cmcld: Failed node in response to failure of package pkg1.
Aug 19 16:21:07 db2 cmcld: Halting db2 to preserve data integrity
Aug 19 16:21:07 db2 cmcld: Reason: A crucial package failed
Aug 19 16:21:07 db2 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Aug 19 16:21:07 db2 cmlvmd: CLVMD exiting
Aug 19 16:21:07 db2 cmsrvassistd[5042]: The cluster daemon aborted our connection.
Aug 19 16:21:07 db2 cmsrvassistd[5042]: Lost connection with ServiceGuard cluster daemon (cmcld): Software caused connection abort
Aug 19 16:21:07 db2 cmtaped[5045]: The cluster daemon aborted our connection.
Aug 19 16:21:07 db2 cmtaped[5045]: cmtaped terminating. (ATS 1.14)
Aug 19 16:21:07 db2 cmclconfd[5355]: The cluster daemon aborted our connection.
Aug 19 16:21:12 db2 vmunix: SCSI: Reset detected -- lbolt: 7927106, bus: 4
Aug 19 16:21:12 db2 vmunix: lbp->state: 4060
Aug 19 16:21:12 db2 vmunix: lbp->offset: ffffffff
Aug 19 16:21:12 db2 vmunix: lbp->uPhysScript: f9fef000
Aug 19 16:21:12 db2 vmunix: From most recent interrupt:
Aug 19 16:21:12 db2 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: f9fef028
Aug 19 16:21:12 db2 vmunix: lsp: 0000000000000000
Aug 19 16:21:12 db2 vmunix: lbp->owner: 0000000000000000
Aug 19 16:21:12 db2 vmunix: scratch_lsp: 0000000000000000
Aug 19 16:21:12 db2 vmunix: Pre-DSP script dump [fffffffff9fef0e0]:
Aug 19 16:21:12 db2 vmunix: e0340004 00000000 e0100004 00000000
Aug 19 16:21:12 db2 vmunix: 48000000 00000000 78350000 00000000
Aug 19 16:21:12 db2 vmunix: Script dump [fffffffff9fef100]:
Aug 19 16:21:12 db2 vmunix: 50000000 f9fef028 80000000 0000000b
Aug 19 16:21:12 db2 vmunix: 0f000001 f9fef5c0 60000040 00000000

I suspect something is not OK with vg sharing; some patches may help. BTW, where can I find a list of recommended patches for MC/ServiceGuard?

Many thanks, and points for your comments!

BR,
Mihails

Re: cmcld: Halting node to preserve data integrity

melvyn burnard — Tue, 19 Aug 2003 13:37:04 GMT

You have configured a package with a service, and this service is failing on startup.

Aug 19 16:21:07 db2 CM-pkg1[5406]: cmrunserv www >> /etc/cmcluster/pkg1/pkg1.sh.log 2>&1 /etc/cmcluster/pkg1/www monitor
Aug 19 16:21:07 db2 cmcld: Service www terminated due to an exit(127).
Aug 19 16:21:07 db2 cmcld: Service PKG*40961 terminated due to an exit(0).
Aug 19 16:21:07 db2 cmcld: Started package pkg1 on node db2.
Aug 19 16:21:07 db2 cmcld: Service www in package pkg1 has gone down.

You then have the package configured with SERVICE_FAIL_FAST=YES.
This causes the system to TOC (Transfer of Control, an immediate halt) on the failure of a service.

Aug 19 16:21:07 db2 cmcld: Service fail fast is set. Node will be failed.
Aug 19 16:21:07 db2 cmcld: Failed node in response to failure of package pkg1.
Aug 19 16:21:07 db2 cmcld: Halting db2 to preserve data integrity
Aug 19 16:21:07 db2 cmcld: Reason: A crucial package failed

Reconfigure the package to have SERVICE_FAIL_FAST=NO, and then try again.
Also look at the package log to see if you can track down why your service is failing.
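A minimal sketch of that change, run against a throwaway copy rather than a live file. It assumes the legacy package ASCII format, where the parameter is usually spelled SERVICE_FAIL_FAST_ENABLED (this thread shortens it to SERVICE_FAIL_FAST); the sample values and the pkg1.conf file name are made up for illustration, so check the spelling in your own file.

```shell
#!/bin/sh
# Work on a scratch copy; the sample content below is hypothetical.
conf=${TMPDIR:-/tmp}/pkg1.conf.$$
cat > "$conf" <<'EOF'
SERVICE_NAME                 www
SERVICE_FAIL_FAST_ENABLED    YES
SERVICE_HALT_TIMEOUT         300
EOF

# Flip YES to NO on the fail-fast line only.
sed 's/^\(SERVICE_FAIL_FAST_ENABLED[[:space:]][[:space:]]*\)YES/\1NO/' "$conf" > "$conf.new"

# Read the value back to confirm the edit took effect.
new_value=$(awk '$1 == "SERVICE_FAIL_FAST_ENABLED" { print $2 }' "$conf.new")
echo "SERVICE_FAIL_FAST_ENABLED is now: $new_value"

# On the cluster node, the real file would then be checked and applied
# (path assumed, adjust to your package directory):
#   cmcheckconf -P /etc/cmcluster/pkg1/pkg1.conf
#   cmapplyconf -P /etc/cmcluster/pkg1/pkg1.conf
rm -f "$conf" "$conf.new"
```

With fail fast off, a failing service should only fail the package instead of halting the node, which makes the underlying service problem much easier to debug.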
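One clue is already in the log: exit(127) is the status a POSIX shell returns when the command it was asked to run cannot be found (126 would mean found but not executable). Assuming the service command is started through a shell, the path of the service command is worth checking first. A small sketch that reproduces the status, using a deliberately nonexistent path as a stand-in for the real service command:

```shell
#!/bin/sh
# Hypothetical path that does not exist; it stands in for a mistyped service command.
missing_cmd=/nonexistent/pkg1/www

# Run it the way a shell would and capture the exit status.
status=0
sh -c "$missing_cmd monitor" 2>/dev/null || status=$?
echo "exit status: $status"
```

Running the real service command by hand on the node, outside the cluster, and checking `$?` the same way is a quick way to confirm whether this is what is happening.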

Re: cmcld: Halting node to preserve data integrity

Michael Steele_2 — Tue, 19 Aug 2003 13:55:06 GMT

Regarding "... BTW, where can I find a list of recommended patches for MC/ServiceGuard?..."

maintenance and support for hp products > individual patches > hp-ux > enter version in data field > enter serviceguard in 'search by keyword' data field > 47 patches are returned

http://www1.itrc.hp.com/service/patch/search.do

#############################################

This link provides endless ServiceGuard advice:

http://docs.hp.com/hpux/ha/

#############################################

Regarding "...I suspect something is not OK with vg sharing..."

From either node you should be able to activate VGs without bringing up the cluster. But here is the procedure:

On nodeA

# vgchange -c y /dev/vgoracle
# vgexport -p -s -m /tmp/vgoracle.map /dev/vgoracle
# rcp /tmp/vgoracle.map nodeB:/tmp/vgoracle.map

On nodeB

# mkdir /dev/vgoracle
# mknod /dev/vgoracle/group c 64 0xNN0000    (NN = a hex minor number not used by any other VG on nodeB)
# vgimport -s -m /tmp/vgoracle.map /dev/vgoracle
# vgchange -c y /dev/vgoracle
# vgchange -a y /dev/vgoracle

#############################################

However, I feel the problem is within your package.conf or package.cntl files. These files are linked together by the service name, which must be exactly the same in both, so check for this with:

cmcheckconf -P package.conf
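A quick way to compare the service names in the two files by eye or by script is sketched below. It runs on hypothetical sample fragments, and it assumes the configuration file uses `SERVICE_NAME <name>` lines while the control script uses the quoted array form `SERVICE_NAME[0]="<name>"`; adjust the patterns if your files differ.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample fragments of both files.
tmp=${TMPDIR:-/tmp}/sgnames.$$
mkdir -p "$tmp"
cat > "$tmp/package.conf" <<'EOF'
PACKAGE_NAME     pkg1
SERVICE_NAME     www
EOF
cat > "$tmp/package.cntl" <<'EOF'
SERVICE_NAME[0]="www"
SERVICE_CMD[0]="/etc/cmcluster/pkg1/www monitor"
EOF

# Service names declared in the package configuration file.
conf_names=$(awk '$1 == "SERVICE_NAME" { print $2 }' "$tmp/package.conf" | sort)
# Service names declared in the control script (assumes the quoted array form).
cntl_names=$(sed -n 's/^SERVICE_NAME\[[0-9][0-9]*]="\([^"]*\)".*/\1/p' "$tmp/package.cntl" | sort)

# The two sorted lists must be identical.
if [ "$conf_names" = "$cntl_names" ]; then
    result=match
else
    result=mismatch
fi
echo "service names: $result"
rm -rf "$tmp"
```

Pointing the two extraction commands at the real files under /etc/cmcluster makes any difference in spelling or case stand out immediately.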

Re: cmcld: Halting node to preserve data integrity

Stephen Doud — Wed, 20 Aug 2003 11:27:12 GMT

Note that "vgchange -c y " can only be performed if the node the command is executed on is currently running the ServiceGuard daemons. cmlvmd must be running to authorize the "clusterizing" of VGs.

-s.
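The precondition above can be checked before attempting `vgchange -c y`. This is only a sketch: it assumes the daemon shows up in `ps -e` output under the name cmlvmd, and `daemon_running` is a helper name invented here.

```shell
#!/bin/sh
# Return 0 if a process whose command name equals $1 appears in "ps -e".
daemon_running() {
    ps -e | awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}

if daemon_running cmlvmd; then
    state=running
    echo "cmlvmd is running; vgchange -c y can be attempted"
else
    state=stopped
    echo "cmlvmd is not running; start the cluster on this node first"
fi
```

`cmviewcl` gives a fuller picture of cluster and node state if the process check alone is not conclusive.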

Re: cmcld: Halting node to preserve data integrity

Mihails Nikitins — Thu, 21 Aug 2003 15:20:22 GMT

Hi,

Thank you, the error was a bad value of the SERVICE_FAIL_FAST parameter.

BR,
Mihails