after disk replacement, node won't join cluster

 
Greg Heim
New Member

This is a three-node cluster. All nodes run HP-UX 10.20. The MC/ServiceGuard software installed is:

B3936AA_APZ A.10.06 MC / Service Guard
B5125AA_APZ A.10.05 MC/ServiceGuard NFS Toolkit
PHSS_10340 B.10.00.00.AA MC/ServiceGuard NFS Toolkit cumulative patch
PHSS_20577 B.10.00.00.AA MC/ServiceGuard and MC/LockManager A.10.06 patch

When I run cmrunnode on the system that just had to have a disk replaced (/usr), I get:

{porky:root}# cmrunnode
Unable to receive message from configuration daemon on porky: Software caused connection abort

There apparently was some disk corruption. I have already ftp'd over the cmcld executable from one of the working nodes:

{froggy:root}# file cmcld
cmcld: s800 shared executable dynamically linked -not stripped

{porky:root}# file cmcld
cmcld: commands text


I'm filling in for someone and my ServiceGuard knowledge is quite rusty. I see the following running on the two other nodes:

froggy:

{froggy:root}# ps -ef |grep cm
root 11444 1 0 Sep 23 ? 56:33 /usr/lbin/cmcld -j
root 11469 11444 0 Sep 23 ? 0:00 /usr/lbin/cmlvmd

I don't seem to have a man page for cmcld, so I don't know what the -j option (join?) does. I see that cmcld is run with different options on the two running nodes.

Can someone point me in the right direction?

Thanks

butch:

{butch:root}# ps -ef |grep cm
root 2842 2836 0 Sep 20 ? 0:00 /usr/lbin/cmlvmd
root 2922 2836 0 Sep 20 ?
root 2836 1 0 Sep 20 ? 109:31 /usr/lbin/cmcld -m -n froggy -n butch

Devender Khatana
Honored Contributor

Re: after disk replacement, node won't join cluster

Hi,

Which ServiceGuard-related files have you FTP'd over from the other system? If they include binaries, you need to check and apply the configuration. Were there any problems restoring /usr, or did the restore go cleanly?

HTH,
Devender
Impossible itself mentions "I m possible"
Torsten.
Acclaimed Contributor

Re: after disk replacement, node won't join cluster

Hi Greg,

What do you mean by:

"There apparently was some disk corruption"?

Please tell us more about the failed disk and how you replaced it. Is the disk mirrored? Your system has to be stable to run production. If there is any corruption, restore from backup.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

melvyn burnard
Honored Contributor
Solution

Re: after disk replacement, node won't join cluster

I would not worry about the options to cmcld; you do not change them.
The bad news is that the cmcld binary on the suspect system has been corrupted. What else may have been corrupted?
You may be better off recovering a known good backup to this system.
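
As a rough first pass at the "what else" question, the same `file` check used on cmcld can be run across a whole directory. The sketch below is only an illustration: `suspect_binaries` is a hypothetical helper name, and the `/usr/lbin/cm*` glob in the usage comment is an assumption about where the binaries of interest live. It simply filters `file` output for entries not reported as executables.

```shell
#!/bin/sh
# suspect_binaries: read `file` output on stdin and print the names of
# entries that `file` did NOT classify as an executable.
# (Hypothetical helper; not part of ServiceGuard or HP-UX.)
suspect_binaries() {
    # Drop lines describing executables, then keep only the name before ':'.
    grep -v 'executable' | sed 's/:.*$//'
}

# Example use on the suspect node (the glob is an assumption):
#   file /usr/lbin/cm* | suspect_binaries
```

Legitimate shell scripts are also reported as text, so anything this prints is a candidate to compare against a good node, not proof of corruption. A clean result does not prove the file is intact either.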

As an aside, you are running a totally obsolete unsupported version of Serviceguard, on an unsupported OS version.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Greg Heim
New Member

Re: after disk replacement, node won't join cluster

The reason I said there was apparent file corruption is that the cmcld executable was showing up as a text file (vs. s800 shared executable...).

That is the only file I have recovered so far (just ftp'd from a known good, working node).

Any thoughts on what to recover?

/usr/lbin/cm*?
/etc/cmcluster
??
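
One way to narrow down what needs recovering, sketched here under assumptions: take `cksum` listings of the same paths on a known-good node and on the suspect node, then list the files whose checksum or size differs. `diff_cksums` and the `/tmp` file names are hypothetical, and the comparison is only meaningful if both nodes are at the same software and patch level.

```shell
#!/bin/sh
# diff_cksums GOOD_LIST SUSPECT_LIST
# Each list holds `cksum` output lines: "<checksum> <size> <path>".
# Prints every path in SUSPECT_LIST whose checksum/size differs from
# GOOD_LIST, or that is absent from GOOD_LIST.
# Files missing from SUSPECT_LIST are not reported.
# (Hypothetical helper; assumes paths contain no spaces.)
diff_cksums() {
    awk 'NR == FNR { good[$3] = $1 " " $2; next }
         !($3 in good) || good[$3] != ($1 " " $2) { print $3 }' "$1" "$2"
}

# Example use (file names are assumptions):
#   on froggy:  cksum /usr/lbin/cm* > /tmp/cm.froggy.cksum
#   on porky:   cksum /usr/lbin/cm* > /tmp/cm.porky.cksum
#   after copying both lists to one node:
#   diff_cksums /tmp/cm.froggy.cksum /tmp/cm.porky.cksum
```

Configuration files under /etc/cmcluster may legitimately differ between nodes, so this kind of comparison is more useful for binaries than for configuration.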

The disk was mirrored, but there were apparently some stale extents even before the one disk failed. As you can surmise, this is an old cluster and we are moving away from it soon. We can live without this node up, but as long as we are using the cluster I want all three nodes up and running if I can.
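
For the stale extents, a small sketch of how they might be counted from `lvdisplay -v` output. `count_stale` is a hypothetical helper, the logical volume path in the usage comment is an assumption, and the filter assumes that stale extents appear as lines containing the word "stale" in the extent listing.

```shell
#!/bin/sh
# count_stale: read `lvdisplay -v` output on stdin and print how many
# lines mention a stale extent. Prints 0 when there are none.
# (Hypothetical helper; matches on the word "stale" only.)
count_stale() {
    awk '/stale/ { n++ } END { print n + 0 }'
}

# Example use (the logical volume path is an assumption):
#   lvdisplay -v /dev/vg00/lvol7 | count_stale
```

A non-zero count on the surviving mirror before the failure would be consistent with restored or resynced data not being fully trustworthy.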

thanks
Devender Khatana
Honored Contributor

Re: after disk replacement, node won't join cluster

Hi,

By recovery I meant recovering the complete OS from an Ignite backup, on the assumption that other files in /usr may have been corrupted just as this one was.

HTH,
Devender
Impossible itself mentions "I m possible"
Greg Heim
New Member

Re: after disk replacement, node won't join cluster

I can recover /usr (or some subset) using NetBackup. I didn't have an Ignite tape; we've been neglecting these for some time as the systems are about to go away.
Steven E. Protter
Exalted Contributor

Re: after disk replacement, node won't join cluster

It appears to me that the LVM configuration of a volume group that either contains software for SG or is activated by SG has been affected.

If it's vg00, you probably need to replace the disk and restore your make_net_recovery or make_sys_recovery from Ignite.

Otherwise, you probably have to use vgreduce -f to force the reduction of the volume group in question, and then rebuild it after following the mandatory steps that the vgreduce -f command displays after runtime.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Greg Heim
New Member

Re: after disk replacement, node won't join cluster

Thanks for the suggestions. I did a restore of /usr and /etc/cmcluster and now it has joined the cluster.

Thanks again!
Greg Heim
New Member

Re: after disk replacement, node won't join cluster

Restored /usr and /etc/cmcluster as suggested, and the node has now joined the cluster.