
3 Node Cluster cmcld Process down

 
Ajin_1
Valued Contributor

Hi experts,

We received a cmcld process down alert on one server.

We checked the process status using:

ps -ef | grep cmcld

 

 root  5028  5026  0  Aug 21  ?        41:52 /usr/lbin/cmcld -j

 

On the other nodes it shows:

 

root 25811 25810  0  Apr  4  ?        444:22 /usr/lbin/cmcld -m -n node10101 -n node10301 -n node10201

root 26703 26702  0  Apr  4  ?        442:59 /usr/lbin/cmcld -m -n node10101 -n node10301 -n node10201

 

What does the following mean?

/usr/lbin/cmcld -j

 

Thanks in advance.

Thanks & Regards
Ajin.S
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Matti_Kurkela
Honored Contributor

Re: 3 Node Cluster cmcld Process down

My guess is that the parent process (PID 5026) is either "cmrunnode" or "cmruncl", and it is running "cmcld -j" in an attempt to join an existing cluster. Maybe it cannot join for some reason?

 

In the other nodes, the cmcld processes are running in full cluster member mode, with a list of all the nodes the cluster is supposed to have.
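As a rough illustration of the distinction above, here is a minimal Python sketch that classifies a "ps -ef" line by the cmcld arguments. The function name classify_cmcld is hypothetical, and reading "-j" as "joining" and "-m" as "member" follows the interpretation given in this reply, not any documented cmcld option list.

```python
def classify_cmcld(ps_line):
    """Return (mode, parent_pid, nodes) for one `ps -ef` output line.

    Assumption: "-j" means the daemon is trying to join a cluster and
    "-m" means it is running as a full member, as guessed in this thread.
    """
    fields = ps_line.split()
    # Locate the cmcld executable; everything after it is its argument list.
    for i, field in enumerate(fields):
        if field.endswith("/cmcld"):
            args = fields[i + 1:]
            break
    else:
        return ("not cmcld", None, [])

    # In `ps -ef` output the third column is the parent PID.
    parent_pid = int(fields[2])
    # Collect every node name that follows a "-n" flag.
    nodes = [args[j + 1] for j, a in enumerate(args)
             if a == "-n" and j + 1 < len(args)]

    if "-j" in args:
        return ("joining", parent_pid, nodes)
    if "-m" in args:
        return ("member", parent_pid, nodes)
    return ("unknown", parent_pid, nodes)


if __name__ == "__main__":
    print(classify_cmcld(" root  5028  5026  0  Aug 21  ?  41:52 /usr/lbin/cmcld -j"))
```

Run against the lines posted above, the first node comes out as "joining" with an empty node list, while the other two come out as "member" with all three node names.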

 

Unless the cmcld process was properly shut down using the "cmhaltnode" command, the single node should have crashed & rebooted after cmcld died, as the kernel safety timer should have expired and triggered a TOC.
Did the single node reboot on Aug 21?

 

If it did not reboot, you probably should reboot it to be safe, and then check Serviceguard version and system patch level, and look for any indications of hardware failures: the cmcld process is normally very stable. It should not die without a reason.

 

What does the "cmviewcl" command report on the other nodes? If the single node is still down according to them, then there is a problem (possibly a network issue) that prevents the single node from hearing the cluster heartbeat from the other nodes, and/or prevents the other nodes from hearing any traffic from the single node.

 

As this is a 3-node cluster, the single node has lost cluster quorum and must assume that the other nodes may be happily running on the other side of the network break, using the cluster disks and IP addresses. So the single node MUST NOT use any cluster disks/IPs, until it either can communicate with at least one other cluster node again, or the sysadmin explicitly uses the cmruncl command to start the cluster in single-node mode (effectively telling Serviceguard that all the other nodes are down for sure).
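The quorum reasoning above can be sketched as simple arithmetic. This is only an illustration of a strict-majority rule, not Serviceguard's actual implementation: the function name can_reform_cluster and the has_tie_breaker parameter (standing in for something like a cluster lock disk or quorum server) are assumptions made for the example.

```python
def can_reform_cluster(reachable_nodes, previous_members, has_tie_breaker=False):
    """Decide whether a group of mutually reachable nodes may carry on as the cluster.

    Assumed rule: a strict majority of the previous membership is enough,
    an exact half needs a tie-breaker, and anything less must stay out.
    """
    if reachable_nodes * 2 > previous_members:
        return True
    if reachable_nodes * 2 == previous_members:
        return has_tie_breaker
    return False


if __name__ == "__main__":
    # The isolated node in a 3-node cluster: 1 of 3 is not a majority.
    print(can_reform_cluster(1, 3))
    # The two nodes on the other side of the break: 2 of 3 is a majority.
    print(can_reform_cluster(2, 3))
```

Under this rule the single node (1 of 3) can never win on its own, which is why it has to stay away from the cluster disks and IPs until it hears from another node or the sysadmin overrides it.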

MK
Ajin_1
Valued Contributor

Re: 3 Node Cluster cmcld Process down

 

Hi MK

 

Thanks for the prompt reply. Here are the answers to the questions you asked.

Did the single node reboot on Aug 21?

Yes, we have rebooted node2.

What does the "cmviewcl" command report on the other nodes?

All packages are running on the primary node.

 

Thanks in advance for your kind support.

Thanks & Regards
Ajin.S
Matti_Kurkela
Honored Contributor

Re: 3 Node Cluster cmcld Process down

>> What does the "cmviewcl" command report on the other nodes?

> All packages are running on the primary node.

Yes, but what does cmviewcl tell you about the status of the nodes? Do the other nodes think this node is halted or running? Or maybe starting? Or "unknown"?

MK