Dima Kouznetsov
Advisor

SG nodes fail when starting the multi-node package

Hello, I have been stuck on this one for a few days. Any advice would be appreciated:

I am doing a configuration of Serviceguard with Veritas Cluster File System on HP-UX 11i v3 and Oracle 10g R2 RAC. Right now I have Serviceguard running on 2 Integrity Virtual machines which are hosted on an rx8640 server.

swlist | grep -i serviceguard
T1905CA A.11.19.00 Serviceguard
T2777CB A.02.01 HP Serviceguard Cluster File System for RAC
PHSS_40152 1.0 Serviceguard A.11.19.00

I loaded the latest Patches from the ITRC website for Serviceguard Storage Management Suite A.02.01 for HP-UX 11i v3.

The cluster starts and runs just fine:
cmviewcl -v

CLUSTER STATUS
cluster1 up

NODE STATUS STATE
vguest1 up running

Cluster_Lock_LVM:
VOLUME_GROUP PHYSICAL_VOLUME STATUS
/dev/vglock /dev/dsk/c0t0d0 up

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/0/1/0 lan0
STANDBY up 0/0/2/0 lan1

NODE STATUS STATE
vguest2 up running

Cluster_Lock_LVM:
VOLUME_GROUP PHYSICAL_VOLUME STATUS
/dev/vglock /dev/dsk/c0t2d0 up

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/0/2/0 lan0
STANDBY up 0/0/1/0 lan1

The problem comes when I create and start the package:
cfscluster config
CVM is now configured
cfscluster start
Starting CVM...

I attached the txt document with the error output from there. Basically, one node fails with an error saying a crucial package failed, then the other node goes down with the same error, and both nodes reboot.

Any ideas for what may be causing this, or what I can try to fix it? I will attach the SG-CFS-pkg.log in my next post (this only lets me attach one file it looks like).

Thank you for your help,
Dima

Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

Attached is the SG-CFS-pkg.log

Dima
Steven E. Protter
Exalted Contributor

Re: SG nodes fail when starting the multi-node package

Shalom,

Cluster works okay without this package?

Is it possible to run through another cmquerycl/cmcheckconf/cmapplyconf series on this system?

It would be nice to see what the package control script looks like versus the cluster configuration script.

SEP
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

Thanks for the quick reply Steven. I attached the output from the commands that create the cluster from the beginning. After those commands is where I try to create the package and the error occurs:

# cfscluster config
# cfscluster start

Thanks,
Dima
freddy_21
Respected Contributor

Re: SG nodes fail when starting the multi-node package

Did you run vxinstall to install the license for VxVM?

Regards
Freddy
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

I ran vxinstall on both nodes. I am able to view the presented disks with vxdisk list.
Stephen Doud
Honored Contributor

Re: SG nodes fail when starting the multi-node package

Create a 1-node cluster on the vguest2 host and attempt to build the SG-CFS-pkg. If it starts without failure, focus your attention on the other server that TOCs.

I have only seen CVM/CFS TOCs occur when CFS-related filesets are not properly patched. Check (swlist) the affected server for properly installed patches to the CFS (vx*) filesets.
The filesets that affect CFS are:

Also ensure it is using the same version of VxVM (4.0/5.0 etc) as vguest2.
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

Stephen, thanks for your suggestion.

I tried to create the cluster on vguest2 with just itself as the node and run the package, and it also crashed. It gave this error message:

Message from syslogd@vguest2 at Wed Oct 28 10:34:05 2009 ...
vguest2 cmcld[25315]: Reason: A crucial package failed
Executing "/usr/sbin/cmrunpkg -v SG-CFS-pkg"
Unable to retrieve package status for package SG-CFS-pkg
cmrunpkg: Unable to start some package or package instances
Error: Failed to start CVM

So it looks like CVM is failing to start, which is something I had not seen before. I also tried the same thing with vguest1, and it gave the same error. Both of these machines have the same software loaded on them and are at the same patch level. I suspect you may be right about the patches, but I have installed everything I can think of.

Here is the list of CFS patches installed:

swlist -l patch | grep -i cfs
# Cluster-CFS A.11.19.00 HP SG Cluster CFS SD Product
# Cluster-CFS.CFS-ADMIN A.11.19.00 Cluster File System Admin SD Product
# Cluster-CFS.CFS-MAN A.11.19.00 CFS MAN pages
# PHSS_40152.CM-CVM-CFS 1.0 CVM CFS Package Fileset applied
# PHSS_40152.CM-CVM-CFS-COM 1.0 CVM CFS Package Fileset applied
# Package-CVM-CFS A.11.19.00 HP SG Cluster CVM CFS SD Product
# Package-CVM-CFS.CM-CVM-CFS A.11.19.00 CVM CFS Package Fileset
PHSS_40152.CM-CVM-CFS 1.0 CVM CFS Package Fileset applied
# Package-CVM-CFS.CM-CVM-CFS-COM A.11.19.00 CVM CFS Package Fileset
PHSS_40152.CM-CVM-CFS-COM 1.0 CVM CFS Package Fileset applied

Also, I don't know if this means anything, but I keep seeing these error messages in the SG-CFS-pkg.log:

10/28/09 10:34:04 cmrunserv -r 5 SG-CFS-sgcvmd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /etc/cmcluster/cfs/SG-CFS-sgcvmd.sh
10/28/09 10:34:04 List current imported disk groups:
10/28/09 10:34:04 SG configuration monitor started (poll_interval = 10)
/etc/cmcluster/cfs/SG-CFS-vxfsckd.sh[53]: /usr/sbin/vxfsckd: not found.
10/28/09 10:34:04 ERROR: Unable to determine pid of vxfsckd

...and...

10/28/09 10:32:37 Starting VXFEN
10/28/09 10:32:42 Enabling cluster ODM
10/28/09 10:33:48 /sbin/init.d/odm start (exit=1)
ERROR: The module 'odm' has a dependency on the module 'fdd' that
cannot be satisfied.

Also, I think you were going to list the filesets that affect CFS in your post, but the list didn't show up. Could you please list those again?

Thanks for all the suggestions,
Dima
Stephen Doud
Honored Contributor

Re: SG nodes fail when starting the multi-node package

Hello Dima,

Your SG-CFS-pkg.log contains very important clues to why it's not working.

First:
# ll /usr/sbin/vxfsckd
-r-xr-xr-x 1 root sys 285184 May 21 2005 /usr/sbin/vxfsckd

# swlist -l file | grep vxfsckd
PHSS_37601.CM-CVM-CFS-COM: /etc/cmcluster/cfs/SG-CFS-vxfsckd.sh
Package-CVM-CFS.CM-CVM-CFS-COM: /etc/cmcluster/cfs/SG-CFS-vxfsckd.sh
VRTSvxfs.VXFS-RUN: /opt/VRTS/bin/vxfsckd
VRTSvxfs.VXFS-RUN: /usr/sbin/vxfsckd

If you don't see the file - it's not installed and likely neither is the fileset which CVM needs!
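
A minimal sketch of that check as a shell function, in case it is useful to script it on each node. The name `check_binaries` is made up for illustration; the two paths in the example comment are the ones shown in the swlist output above.

```shell
# Report whether each given path exists and is executable.
# Prints "OK <path>" or "MISSING <path>" per argument and
# returns non-zero if at least one path is missing.
check_binaries() {
    rc=0
    for f in "$@"; do
        if [ -x "$f" ]; then
            echo "OK $f"
        else
            echo "MISSING $f"
            rc=1
        fi
    done
    return $rc
}

# Example (paths from the swlist output above):
#   check_binaries /usr/sbin/vxfsckd /opt/VRTS/bin/vxfsckd
```

A MISSING line only says the file is absent; which fileset should have delivered it still has to be looked up with swlist.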

Second: the messages you see about the ODM dependency on fdd not being satisfied reiterate the problem. Apparently not all required filesets have been installed.
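
Both clues came from failure lines in SG-CFS-pkg.log. A rough sketch for pulling such lines out of a long log follows. `scan_pkg_log` is a hypothetical helper name, and the pattern only covers the three kinds of failure lines seen in this thread (ERROR messages, shell "not found" errors, and non-zero `(exit=N)` codes), so treating that as sufficient for other logs is an assumption.

```shell
# Print, with line numbers, every line of a package log that looks
# like a failure: ERROR messages, "not found" errors from the shell,
# and init scripts that exited with a non-zero code.
# grep returns non-zero when nothing matches, i.e. the log looks clean.
scan_pkg_log() {
    grep -n -E 'ERROR|not found|\(exit=[1-9][0-9]*\)' "$1"
}

# Example (log path as used in this thread):
#   scan_pkg_log /etc/cmcluster/cfs/SG-CFS-pkg.log
```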


As for the list of filesets that are supplied with T2775CB (HP Serviceguard Cluster File System), it is too long for this forum. If you would like to get it, please email me and I will send you the list. But since ODM has unsatisfied dependencies, it's very likely the SG product bundle did not install properly.

As for patches - please use this website to get a list and download currently recommended patches for the version of SMS (storage management suite) installed on your servers:
http://www.hp.com/go/sgsms/patches
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

Thank you for your reply, that definitely makes sense now why they keep crashing. I am missing the /usr/sbin/vxfsckd file.
...and...
swlist -l file | grep vxfsckd
PHSS_40152.CM-CVM-CFS-COM: /etc/cmcluster/cfs/SG-CFS-vxfsckd.sh
Package-CVM-CFS.CM-CVM-CFS-COM: /etc/cmcluster/cfs/SG-CFS-vxfsckd.sh

I do remember that certain components had errors when installing Serviceguard CFS for RAC. I am not exactly sure why, though. I am installing the newest version of the HP-UX OS I have, 11.31.0909. Previously I was working on 11.31.0709. I will reinstall Serviceguard and let you know if that fixes the missing /usr/sbin/vxfsckd file. Is there a way to install only that component?
Stephen Doud
Honored Contributor

Re: SG nodes fail when starting the multi-node package

Dima,
Your swlist indicates that there is a patch (PHSS_40152) installed on the SG-CFS-vxfsckd.sh package script; the vxfsckd binary itself is part of the VRTSvxfs product.
You might be able to drill into the T2777CB product (in the swinstall GUI) and reinstall just VRTSvxfs, but you will also have to reinstall the patch. I don't experiment with such things, so I just don't know for certain. I hope it goes well for you.
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

Ok, so I reinstalled one of the machines with 11.31.0909, loaded the latest SG CFS for RAC, and installed the latest patches from the ITRC website. I am still getting the package-failed TOC, but it seems like something else is messing up. The log below says that LLT failed to start, yet when my system boots, the start-up progress shows that LLT is started. However, the start-up progress also shows "Starting VXFEN ......FAIL*". Before I installed some patches for VXFEN, the package syslog had an error message about vxfen failing; after installing the patches that message went away, but VXFEN still fails in the start-up progress. I am confused. Is there a patch for LLT that I also need to install outside the bundle?

10/30/09 13:35:57 ########### Node "vguest1": Starting package ###########
10/30/09 13:35:57 Starting service SG-CFS-vxconfigd
10/30/09 13:35:57 cmrunserv SG-CFS-vxconfigd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /etc/cmcluster/cfs/SG-CFS-vxconfigd.sh
10/30/09 13:35:57 Cleaning up any old GAB/LLT configuration
10/30/09 13:35:57 /etc/cmcluster/cfs/vx-modules clean
10/30/09 13:35:58 Cleaning up old LLT/GAB
10/30/09 13:35:58 ERROR: Unable to reset GAB
10/30/09 13:35:58 rm -f /etc/llttab /etc/llthosts /etc/gabtab
10/30/09 13:35:58 Starting service SG-CFS-cmvxpingd
10/30/09 13:35:58 cmrunserv SG-CFS-cmvxpingd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /usr/lbin/cmvxpingd -t 643
10/30/09 13:35:58 rm -f /var/adm/cmcluster/cmvxd.socket
10/30/09 13:35:58 Starting service SG-CFS-cmvxd
10/30/09 13:35:58 cmrunserv SG-CFS-cmvxd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /usr/lbin/cmvxd run -s /var/adm/cmcluster/cmvxd.socket -t 643
10/30/09 13:35:58 Creating LLT configuration
10/30/09 13:35:58 Monitoring vxconfigd (pid= 363) every 20 secs
10/30/09 13:35:58 mktemp -d /etc
10/30/09 13:35:58 touch /etc/003890
10/30/09 13:35:58 chmod 644 /etc/003890
10/30/09 13:35:58 chmod 444 /etc/003890
10/30/09 13:35:58 mv /etc/003890 /etc/llttab
10/30/09 13:35:58 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/llttab
10/30/09 13:35:58 Creating GAB configuration
10/30/09 13:35:58 mktemp -d /etc
10/30/09 13:35:58 touch /etc/003915
10/30/09 13:35:58 chmod 644 /etc/003915
10/30/09 13:35:58 chmod 444 /etc/003915
10/30/09 13:35:58 mv /etc/003915 /etc/gabtab
10/30/09 13:35:58 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/gabtab
10/30/09 13:35:58 chmod 544 /etc/gabtab
10/30/09 13:35:58 Creating initial LLT hosts file
10/30/09 13:35:58 mktemp -d /etc
10/30/09 13:35:58 touch /etc/003935
10/30/09 13:35:58 chmod 644 /etc/003935
10/30/09 13:35:58 chmod 444 /etc/003935
10/30/09 13:35:58 mv /etc/003935 /etc/llthosts
10/30/09 13:35:58 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/llthosts
10/30/09 13:35:58 Starting Veritas stack
10/30/09 13:35:58 /etc/cmcluster/cfs/vx-modules start
10/30/09 13:35:58 Starting LLT
10/30/09 13:35:58 /sbin/init.d/llt start
LLT lltconfig ERROR V-14-2-15040 node ID is already set, use -o to override
10/30/09 13:35:58 /sbin/init.d/llt start (exit=1)
10/30/09 13:35:58 ERROR: Failed to start LLT
10/30/09 13:35:58 ERROR: Could not start veritas stack
10/30/09 13:35:58 ########### Node "vguest1": Package script failed ###########
Michael Steele_2
Honored Contributor

Re: SG nodes fail when starting the multi-node package

Hi

Everything that I have read on this error:

"...ERROR: Could not start veritas stack..."

points to an incompletely patched node in the cluster.

ALL NODES MUST BE IDENTICAL.

http://docs.hp.com/en/T2771-90028/ch01s04.html#d0e2254
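
One way to check that nodes are identical is to save the swlist output from each node to a file and compare the product tags and revisions. A hedged sketch, assuming each listing has the tag in the first column and the revision in the second, as in the swlist output earlier in this thread; `compare_node_software` and the file names in the example are hypothetical.

```shell
# Compare two saved software listings, one per node.
# Lines are tagged with their source (A or B) so that awk can tell
# the two files apart, then reduced to "tag revision" keys.
# Prints entries found on only one node; prints nothing if they match.
compare_node_software() {
    { sed 's/^/A /' "$1"; sed 's/^/B /' "$2"; } | awk '
        NF < 3 || $2 ~ /^#/ { next }     # skip blank, short, and comment lines
        { key = $2 " " $3 }              # product tag + revision
        $1 == "A" { first[key] = 1; next }
        { second[key] = 1 }
        END {
            for (k in first)  if (!(k in second)) print "only in first: " k
            for (k in second) if (!(k in first))  print "only in second: " k
        }
    ' | sort
}

# Example (file names are placeholders):
#   compare_node_software /tmp/vguest1.swlist /tmp/vguest2.swlist
```

Descriptions are ignored on purpose, so only a missing product or a revision mismatch shows up in the output.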
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

Right now I am just trying to get this setup to work on one server (i.e. a cluster with one node). I made sure that I downloaded the latest patches from the HP website. At this point the only thing I can think of trying is going back to the "recommended" or older patches instead of the newest patches that I got from the website. Although I don't think that will fix my issue, I'm going to give that a shot.
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

Ok, FINALLY got it up and working. Here is what was wrong with my setup just in case anyone else runs into this issue:

I went back and checked the install of HP Serviceguard Cluster File System for RAC and saw that not all of the components had been installed; there were some errors during the install. According to the error messages, the system was missing Base-VxTools-50 and SGeRAC (I don't know why it needed it; is that normal?). After installing those two, I went back and reinstalled SGCFS for RAC, and it completed without errors. I also applied all the newest patches for SG, SGCFSRAC, and VxVM. The main underlying issue was that VXFEN was failing on system startup. After the complete install and reboot it finally started.
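
For anyone who wants to script this check before starting the package, here is a small sketch that reads a software listing on stdin and reports which names are absent. `missing_products` is a hypothetical helper, and matching on the names exactly as written in this post (Base-VxTools-50, SGeRAC) is an assumption; the tags swlist actually prints on your system may differ.

```shell
# Read a software listing on stdin and report which of the named
# products do not appear anywhere in it (case-sensitive substring match).
# Returns non-zero if at least one name is missing.
missing_products() {
    listing=$(cat)
    rc=0
    for name in "$@"; do
        case "$listing" in
            *"$name"*) ;;                          # found, nothing to report
            *) echo "missing: $name"; rc=1 ;;
        esac
    done
    return $rc
}

# Example (names as written in this post; real swlist tags may differ):
#   swlist | missing_products Base-VxTools-50 SGeRAC
```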

Right now it is working on one node, which is still huge progress. I will need to reinstall the second node to match it and then try to get the two working together. I will close the thread as soon as I have both up and running. Thanks for all the help!

Dima
Dima Kouznetsov
Advisor

Re: SG nodes fail when starting the multi-node package

The patches definitely did the trick. What baffles me is why those patches weren't listed in the SG installation document. I guess they assume certain things about your system before the install.