Cluster monitoring

 
David Prinz
Occasional Contributor

What is the difference between the HALTED and FAILED local-node cluster states in EMS monitoring? I need this information for some testing. I figure I can create a HALTED state by running cmhaltnode, but I cannot figure out how to force a FAILED state. Any help would be appreciated. Thanks.
Ashwani Kashyap
Honored Contributor

Re: Cluster monitoring

I am not sure, but try either TOCing one of the nodes or physically turning the power off without shutting it down, and then run cmviewcl on the other node.
Victor_5
Trusted Contributor

Re: Cluster monitoring

Failed: A node never sees itself in this state. Other active members of the cluster will see a node in this state.

Halted: A node never sees itself in this state. Other nodes will see it in this state after the node has gracefully left the cluster.
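To make the distinction concrete, here is a minimal Python sketch of the peer's-eye view described above: a surviving member reports a node that left gracefully as HALTED and one that simply stopped communicating as FAILED, while a departed node never reports itself as either. This is a hypothetical model, not Serviceguard code; the `NodeState` enum, the `observed_state()` function, the `departed` mapping, and the "RUNNING" label for a live member are all assumptions for illustration.

```python
from enum import Enum
from typing import Dict, Optional


class NodeState(Enum):
    RUNNING = "running"  # assumed label for a node that is still a member
    HALTED = "halted"    # left the cluster gracefully (e.g. cmhaltnode)
    FAILED = "failed"    # dropped out without a graceful halt


def observed_state(observer: str, target: str,
                   departed: Dict[str, bool]) -> Optional[NodeState]:
    """Return the state an active member `observer` reports for `target`.

    `departed` maps each node that has left the cluster to True if it left
    gracefully, or False if it simply stopped communicating (lost
    heartbeat, power off, TOC). Nodes not in `departed` are still members.
    """
    if observer == target and target in departed:
        # A node never sees itself as HALTED or FAILED.
        return None
    if observer in departed:
        raise ValueError("observer must still be an active cluster member")
    if target not in departed:
        return NodeState.RUNNING
    return NodeState.HALTED if departed[target] else NodeState.FAILED
```

For example, with `departed = {"node2": True, "node3": False}`, node1 would report node2 as HALTED and node3 as FAILED, while asking node2 about itself returns None.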
David Prinz
Occasional Contributor

Re: Cluster monitoring

I appreciate the quick response, but then why do HALTED and FAILED seem to refer to the node I am on? When I open the EMS Monitoring Service from SAM, select the /cluster/localNode/status resource, and look at the possible values for the instance, both appear to describe the local node.

I tested this by running cmhaltnode on one of the two nodes in my cluster, and the state changed to HALTED as I expected.

FAILED is defined as "node is no longer a member of an active cluster".
Rita C Workman
Honored Contributor

Re: Cluster monitoring

Have you tried dropping the network connections? That will force a failover fairly handily.

Halted means you have stopped the node, i.e., halted it gracefully.
Failed means the node is out of communication with the cluster and is viewed as failed. The reason could be any number of points of failure: heartbeat, LAN, power, the node just plain dead, etc.

I'd stop my applications before dropping the network, but I'd leave the package running so that it would fail over.

Rgrds,
Rita
Jeff Schussele
Honored Contributor

Re: Cluster monitoring

I agree with Rita: pull the NIC cable(s) to force the pkg to fail. The system keeps running, but the pkg will halt or fail over depending on the pkg control definition.
That's how we do our failover testing here.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Jeff Schussele
Honored Contributor
Solution

Re: Cluster monitoring

Oh, I should have added that pulling the public IP cables forces a pkg halt/failover.
Pulling the heartbeat cables will force a node failure.

Jeff
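The suggestions in this thread (cmhaltnode, TOC or power off, pulling the public IP cables, pulling the heartbeat cables) can be collected into a small fault-injection checklist. The Python sketch below is hypothetical and not a Serviceguard tool: the action names, the `Expected` tuple, the `check()` helper, and the "RUNNING" label for a node that stays in the cluster are assumptions. The intent is to compare each expectation against what cmviewcl on a surviving node actually reports.

```python
from typing import Dict, NamedTuple


class Expected(NamedTuple):
    peer_view: str  # node state a surviving member should report
    note: str


# Hypothetical test matrix built from the suggestions in this thread;
# "RUNNING" is an assumed label for a node that is still a cluster member.
TEST_MATRIX: Dict[str, Expected] = {
    "cmhaltnode": Expected("HALTED", "node leaves the cluster gracefully"),
    "pull public IP cables": Expected(
        "RUNNING", "node stays up; pkg halts or fails over per its control definition"),
    "pull heartbeat cables": Expected("FAILED", "forces a node failure"),
    "TOC or power off": Expected("FAILED", "node drops out without a graceful halt"),
}


def check(action: str, observed: str) -> bool:
    """Compare the state seen from a surviving node with the expectation.

    The comparison is case-insensitive, so "halted" matches "HALTED".
    """
    return TEST_MATRIX[action].peer_view == observed.upper()


if __name__ == "__main__":
    # Print the checklist as a quick reference for a test session.
    for action, exp in TEST_MATRIX.items():
        print(f"{action:<22} -> peers report {exp.peer_view}: {exp.note}")
```

Keeping the expectations in one table makes it easy to add further cases, such as a different package failover policy, without changing the check logic.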