Server Clustering

HP insight CMU Secondary server returns an error

wmz
Occasional Visitor


Hello!

 

I have a problem with the dynamic information shown for nodes in the CMU node list.

 

Two nodes in one network entity do not show their dynamic information, and on the diagram they are greyed out with the message: "State: Secondary server returns an error"

"cpuload: Not applicable"

 

I can log in to these nodes with passwordless ssh. All the libraries (tk, tcl, expect ...) and cmu_cn.... are installed.

 

I think "Install CMU monitoring client" doesn't work.

 

In SecondaryServerMonitoring_<nodename>.log:

missing sec debug level

missing smd debug level

missing master_hostname_ip

missing sec_ip

missing outgoingMSRRPort

missing outgoingMSHelloPort

missing incomingSLPort

missing nodes_file_path

missing AAFile

missing timestep

....\
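
For a long log, the names reported this way can be collected with a short script. This is only a sketch: it assumes every relevant line has the form `missing <name>`, as in the excerpt above, and the function name `missing_parameters` is made up for illustration, not part of CMU.

```python
import re

def missing_parameters(log_text):
    """Return the names reported on 'missing <name>' lines, in log order."""
    names = []
    for line in log_text.splitlines():
        # Assumption: the line is exactly 'missing' followed by the name.
        match = re.match(r"\s*missing\s+(.+?)\s*$", line)
        if match:
            names.append(match.group(1))
    return names

# Sample taken from the first lines of the excerpt above.
sample = "missing sec debug level\nmissing smd debug level\nmissing master_hostname_ip\n"
print(missing_parameters(sample))
```

Comparing the resulting list between a working node and a failing node may show whether the daemon is being started without its configuration on the failing one.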

 

 

What should I do?

3 REPLIES
Alok_Pandey
Advisor

Re: HP insight CMU Secondary server returns an error

Is this an HP-internal cluster or a customer cluster?

If this is a customer cluster, please raise the issue with the local HP Support Center.

 

Which version of CMU are you using?

How are you starting the monitoring service? Did you try to start the Secondary monitoring daemon manually on any of the compute nodes?

 

Can you please try restarting the CMU Monitoring service?

  1. Stop the Monitoring service (from the GUI, in Admin mode):
    • Monitoring ---> Stop Monitoring Engine
      Wait a few minutes to allow all the daemons to stop on all nodes.
  2. Start the Monitoring service (from the GUI, in Admin mode):
    • Monitoring ---> Start Monitoring Engine
      Give it a while to allow the daemons to start on all nodes.

 

Also make sure that passwordless ssh login from each compute node to itself works on all compute nodes.
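
One way to check the self-login without being prompted is to run ssh in batch mode from the management node. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the management node can already reach each compute node with passwordless ssh. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```python
import subprocess

SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=5"]

def ssh_check_command(node, target):
    """Build a command that logs in to `node` and, from there, to `target`.

    BatchMode=yes makes ssh fail instead of asking for a password.
    """
    inner = "ssh " + " ".join(SSH_OPTS) + f" {target} true"
    return ["ssh", *SSH_OPTS, node, inner]

def self_login_works(node, target=None):
    """True if `node` can ssh to `target` (default: itself) without a prompt."""
    cmd = ssh_check_command(node, target or node)
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0

# Example (node names are placeholders):
#   for node in ["node01", "node02"]:
#       print(node, self_login_works(node))
```

Passing the node's IP address as `target` tests the login by IP as well as by hostname. A host key that has never been accepted also makes the batch-mode login fail, so a failure here does not always mean the keys are wrong.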

wmz
Occasional Visitor

Re: HP insight CMU Secondary server returns an error

CMU v. 7.0

 

I don't know what happened, but now I have only one problem node.

 

CMU shows dynamic information for that node for a short period and then goes back to the previous state ("not applicable").

 

Then CMU tries to restart the daemon, and I can see the dynamic information again for a while.

 

 

SecondaryServerMonitoring_<>.log:

mypid is 36984 CMUGetMonitoringDaemonLockFile

monitoring synchro is on

monitoring memlock is off

monitoring realtime priority parameter is 0

mypid is 36984 CMUGetMonitoringDaemonLockFile

killing process 36984: CMUKillDaemon

Halt single daemon msg received, exiting program MonitSlActOnMessageReceived

Fatal, thread_cancel failed could not find thread CMUThreadCancel

[Fatal] Error while trying to kill thread MonitRsKillThread

Coild not kill CS thread HaltMyThreadsAndDie

Stopping now HaltMyThreadsAndDie

--------------   HaltMyThreadsAndDie

 

Alok_Pandey
Advisor

Re: HP insight CMU Secondary server returns an error

Is this an HP-internal cluster or a customer cluster?
If it is a customer cluster, please raise the issue with the local HP Support Center.

 

You are using an old version of CMU that is no longer supported.
Please upgrade to the latest CMU version, 7.3.2.

 

As you are using an out-of-support version of CMU, we can help you only on a best-effort basis.
Here are a few pointers for finding the cause of the issue.

 

Are you able to see the monitoring data for other node(s) in the Network Entity?

  1. Ensure that passwordless ssh login from each compute node to itself works on all compute nodes (using both hostname and IP).
  2. Increase the debug level for the monitoring logs by setting CMU_MAIN_MONITORING_DEBUG_LEVEL, CMU_SEC_MONITORING_DEBUG_LEVEL and CMU_SMD_MONITORING_DEBUG_LEVEL to 4 in "/opt/cmu/etc/cmuserver.conf".
  3. Restart the CMU Monitoring service:
    • Stop the Monitoring service (from the GUI, in Admin mode):
      • Monitoring ---> Stop Monitoring Engine
        Wait a few minutes to allow all the daemons to stop on all nodes.
    • Start the Monitoring service (from the GUI, in Admin mode):
      • Monitoring ---> Start Monitoring Engine
        Give it a while to allow the daemons to start on all nodes.
  4. Send us the following logs:
    • On the management node:
        /opt/cmu/log/MainMonitoringDaemon_<node>.log
    • On the compute node running the Secondary Monitoring Daemon:
        /opt/cmu/log/SecondaryServerMonitoring_<node>.log
    • On all compute nodes:
        /opt/cmu/log/SmallMonitoringDaemon_<node>.log
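
If you prefer to script the debug-level change rather than edit the file by hand, something along these lines might do. Treat it as a sketch under an assumption: it supposes the settings in cmuserver.conf are written one per line as `KEY=value`, which you should confirm against your own file first. The function name is hypothetical, and the script works on text only, so keep a backup of the original file.

```python
import re

DEBUG_KEYS = (
    "CMU_MAIN_MONITORING_DEBUG_LEVEL",
    "CMU_SEC_MONITORING_DEBUG_LEVEL",
    "CMU_SMD_MONITORING_DEBUG_LEVEL",
)

def set_debug_levels(conf_text, level=4, keys=DEBUG_KEYS):
    """Return conf_text with every key in `keys` set to `level`.

    ASSUMPTION: one setting per line, written as KEY=value.
    Commented-out lines and unrelated settings are left untouched.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        match = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*=", line)
        if match and match.group(1) in keys:
            seen.add(match.group(1))
            out.append(f"{match.group(1)}={level}")
        else:
            out.append(line)
    # Append any key that was not present in the file at all.
    out.extend(f"{key}={level}" for key in keys if key not in seen)
    return "\n".join(out) + "\n"

# Example (writes nothing; prints the proposed content for review):
#   with open("/opt/cmu/etc/cmuserver.conf") as f:
#       print(set_debug_levels(f.read()))
```

The monitoring service still has to be restarted as in step 3 for the new levels to take effect.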