
Red Hat cluster running on RHEL 6.4 fails

 
ramesh9
Honored Contributor


Hi All

 

I am new to clustering and hope you can help with this query.

 

We have a 2-node cluster configured on RHEL 6.4 with fencing, and we have installed an application on top of it.

 

Suddenly, the service on one cluster node stopped responding and was moved to the other node. It did not start on the other node either, and the cluster service is now in the stopped state.

 

Looking at /var/log/cluster/rgmanager.log, I see the following:

 

Jan 13 10:32:01 rgmanager [NNMscript] Executing /var/opt/OV/hacluster/perfspi/nnmharhcs status
Jan 13 10:32:14 rgmanager [NNMscript] NNMscript:perfspi-APP: status of /var/opt/OV/hacluster/perfspi/nnmharhcs failed (returned 1)
Jan 13 10:32:14 rgmanager status on NNMscript "perfspi-APP" returned 1 (generic error)
Jan 13 10:32:14 rgmanager Stopping service service:perfspi
Jan 13 10:32:15 rgmanager [NNMscript] Executing /var/opt/OV/hacluster/perfspi/nnmharhcs stop
Jan 13 10:32:59 rgmanager [fs] unmounting /NNMPerformanceSPI
Jan 13 10:32:59 rgmanager [fs] umount failed: 1
Jan 13 10:32:59 rgmanager [fs] Sending SIGTERM to processes on /NNMPerformanceSPI
Jan 13 10:33:04 rgmanager [fs] unmounting /NNMPerformanceSPI
Jan 13 10:33:04 rgmanager [fs] umount failed: 1
Jan 13 10:33:04 rgmanager [fs] Sending SIGKILL to processes on /NNMPerformanceSPI
Jan 13 10:33:10 rgmanager [fs] unmounting /NNMPerformanceSPI
Jan 13 10:33:14 rgmanager [lvm] Stripping tag, node1.cluster.com

Jan 13 10:33:15 rgmanager [ip] Removing IPv4 address xx.xx.xx.xx/xx.xx from eth1
Jan 13 10:33:26 rgmanager Service service:perfspi is recovering
Jan 13 10:35:04 rgmanager Service service:perfspi is stopped

 

As I understand it, "nnmharhcs status" checks the cluster service, the shared disk, or something else, and it returned 1 (generic error), which is why "nnmharhcs stop" was called.
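
To make the sequence in the log easier to follow, here is a minimal sketch in Python that pulls the failed status checks out of rgmanager log lines. It assumes the line format shown in the excerpt above; the names STATUS_FAIL, failed_status_checks and SAMPLE are made up for illustration and are not part of rgmanager.

```python
import re

# Assumed format, taken from the excerpt above:
#   Jan 13 10:32:14 rgmanager status on NNMscript "perfspi-APP" returned 1 (generic error)
STATUS_FAIL = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+rgmanager\s+'
    r'status on (?P<agent>\S+) "(?P<name>[^"]+)" '
    r'returned (?P<rc>\d+) \((?P<reason>[^)]*)\)'
)


def failed_status_checks(lines):
    """Return (timestamp, agent, resource, return code, reason) per failed check."""
    hits = []
    for line in lines:
        m = STATUS_FAIL.match(line.strip())
        if m:
            hits.append(
                (m.group("ts"), m.group("agent"), m.group("name"),
                 int(m.group("rc")), m.group("reason"))
            )
    return hits


# Three lines copied from the log excerpt, used only as sample input.
SAMPLE = [
    'Jan 13 10:32:01 rgmanager [NNMscript] Executing /var/opt/OV/hacluster/perfspi/nnmharhcs status',
    'Jan 13 10:32:14 rgmanager status on NNMscript "perfspi-APP" returned 1 (generic error)',
    'Jan 13 10:32:14 rgmanager Stopping service service:perfspi',
]
```

Calling failed_status_checks(SAMPLE) picks out only the second line, which is the point where rgmanager decides the resource has failed and begins stopping the service.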

 

Please let me know under what conditions the status check can return 1 (generic error).
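
One check that might help narrow this down is running the status action by hand on the node and recording its exit code and output. A minimal sketch in Python follows. It assumes Python 3 is available (stock RHEL 6 ships Python 2.6, which has no subprocess.run), and run_status is a hypothetical helper name, not something provided by the cluster software.

```python
import subprocess


def run_status(command):
    """Run a command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # fold stderr into stdout so nothing is lost
        universal_newlines=True,    # decode output as text
    )
    return proc.returncode, proc.stdout
```

For example, run_status(["/var/opt/OV/hacluster/perfspi/nnmharhcs", "status"]) would run the same command that appears in the log. A non-zero exit code together with the captured output may show which check inside the script is failing.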

 

Please help.