
SG Node failure

 
Terry Johnson_1
Occasional Advisor

SG Node failure

Folks,

HPUX: 10.20
SG: 10.10

Thanks in advance for your help. I am having a problem getting a Service Guard node active. When I run cmclquery -v -n from the node (or from any other node), I get "Protocol failure talking with cmclconfd on omtest01: no such file or directory. Error: unable to establish communication to node".

I have checked all the network configs, .rhosts and /etc/services, and everything looks good and appears to be functioning correctly. In SAM the heartbeat shows "unknown" for both interfaces of that node, and the node's state is "unreachable". Syslog shows "cmclconfd: unable to lookup any node information in CDB: No such file or directory".

Any ideas?

Thanks

Gross
Uday_S_Ankolekar
Honored Contributor

Re: SG Node failure

Hi,
This problem has been known to occur when a cluster node is missing /var/adm/cmcluster/.cm_start_time.

Try recreating this file on the node that no longer has a copy.
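
A quick way to confirm whether the file is actually missing on a node is sketched below. The helper name check_cm_start_time is hypothetical; the default path is the one quoted above, and the function only reports on the file without trying to recreate it.

```shell
# Hypothetical helper: report whether the ServiceGuard start-time file exists.
# The default path is the one mentioned in this reply; pass another path to override.
check_cm_start_time() {
    file="${1:-/var/adm/cmcluster/.cm_start_time}"
    if [ -f "$file" ]; then
        echo "present: $file"
    else
        echo "missing: $file"
        return 1
    fi
}
```

Running check_cm_start_time on each cluster node in turn should show which node, if any, has lost its copy.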

Good luck,

-USA..
Good Luck..
James R. Ferguson
Acclaimed Contributor

Re: SG Node failure

Hi Buzz:

Try shutting down ServiceGuard and verify that all processes ('ps -ef|grep -i cm') are gone. Then restart and try again.
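
The process check above can be wrapped in a small filter, sketched here as an illustration only. The function name list_cm_procs is hypothetical; it reads ps -ef style output on standard input, keeps lines that contain "cm" in any case, and drops the grep command's own entry.

```shell
# Hypothetical helper: list leftover ServiceGuard processes.
# Reads `ps -ef`-style lines on stdin, matches "cm" case-insensitively
# (the same loose match suggested in this reply) and hides grep itself.
# Returns non-zero when nothing matches.
list_cm_procs() {
    grep -i 'cm' | grep -v 'grep'
}
```

On a node this would be used as ps -ef | list_cm_procs; an empty result suggests that no cm* daemons are still running. The match is deliberately loose, so unrelated processes with "cm" in their name may also appear.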

Regards!

...JRF...
Terry Johnson_1
Occasional Advisor

Re: SG Node failure

Uday,

Thanks for your response, but I forgot to mention that I also tried that without success.

Any other ideas?

Thanks
Terry Johnson_1
Occasional Advisor

Re: SG Node failure

JRF,

Thanks too ... I tried killing all of the processes with kill -9. When I restart the daemon, it looks like it tries to form the cluster but then fails. The syslog shows retries every two minutes.

Let me know if you have any other suggestions.

THANKS!!!
Sridhar Bhaskarla
Honored Contributor

Re: SG Node failure

Hi Buzz,

I can think of a few cases.

1. .rhosts or cmclnodelist is missing on the node. It happened on my systems when my security dept removed .rhosts entries without letting me know.

2. The inetd.conf information is incorrect, for example the hacl-cfg entry is missing.

3. nslookup problems. Make sure the hosts can be resolved with nslookup. Each entry in /etc/hosts should list the short name first, optionally followed by the fully qualified domain name, rather than starting with the fully qualified domain name.
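
Cases 2 and 3 can be checked roughly as sketched below. Both function names are hypothetical, and the default file paths are the standard /etc/hosts and /etc/inetd.conf locations; pass another file as the first argument to check a copy.

```shell
# Hypothetical helper: print hosts-file entries whose first name after the
# address contains a dot, i.e. entries that lead with the fully qualified
# name instead of the short name.
fqdn_first_entries() {
    awk '
        /^[ \t]*#/ { next }            # skip comment lines
        NF >= 2 && $2 ~ /\./ { print } # first name is fully qualified
    ' "${1:-/etc/hosts}"
}

# Hypothetical helper: succeed if the inetd.conf-style file has at least
# one uncommented line mentioning hacl-cfg.
has_hacl_cfg() {
    grep -v '^[[:space:]]*#' "${1:-/etc/inetd.conf}" | grep -q 'hacl-cfg'
}
```

Any line printed by fqdn_first_entries is a candidate for reordering, and has_hacl_cfg || echo "hacl-cfg entry missing" flags the inetd.conf case. Neither check validates the contents of the entries themselves.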

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
James R. Ferguson
Acclaimed Contributor

Re: SG Node failure

Hi (again) Buzz:

There is a (usually) ten-minute AUTO_START_TIMEOUT before cluster formation will occur. Have you let this expire before manually initiating the cluster?

Also: What changed before your 'cmclquery' failure?

...JRF...
Uday_S_Ankolekar
Honored Contributor

Re: SG Node failure

Hi,

Check if you can resolve the hosts by nslookup.
Also check that the entries are present in the .rhosts or cmclnodelist file.
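
As a rough illustration of the second check, the sketch below prints every node name that has no entry in a given file. The function name missing_nodes is hypothetical, and it assumes each entry starts with the host name as the first field, which is the usual "host user" layout of these files.

```shell
# Hypothetical helper: print each node name that does not appear as the
# first field of any uncommented line in FILE.
# Usage: missing_nodes FILE node1 node2 ...
missing_nodes() {
    file="$1"; shift
    for node in "$@"; do
        awk -v n="$node" '
            /^[ \t]*#/ { next }        # ignore commented-out entries
            $1 == n { found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$file" || echo "$node"
    done
}
```

No output means every listed node has an entry. Names must match exactly, so a node listed only by its fully qualified name is reported as missing when checked by its short name.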

If possible, remove the old cluster configuration binary file cmclconfig from all hosts, revalidate the cluster ASCII configuration file, and re-apply it with cmcheckconf and cmapplyconf.
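
The sequence might look roughly like the sketch below. The ASCII file path is a hypothetical placeholder, the binary file location assumes the default /etc/cmcluster directory, and the sketch renames the binary file rather than deleting it so that it can be restored. It defaults to a dry run that only prints the commands.

```shell
# Sketch only. ASCII_FILE is a hypothetical path; BINARY_FILE assumes the
# default /etc/cmcluster directory. Set DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"
ASCII_FILE="${ASCII_FILE:-/etc/cmcluster/cluster.ascii}"
BINARY_FILE="${BINARY_FILE:-/etc/cmcluster/cmclconfig}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reapply_cluster_config() {
    # Move the old binary aside (this step is needed on every node),
    # then validate and apply the ASCII configuration.
    run mv "$BINARY_FILE" "$BINARY_FILE.old" &&
    run cmcheckconf -v -C "$ASCII_FILE" &&
    run cmapplyconf -v -C "$ASCII_FILE"
}
```

Calling reapply_cluster_config with the defaults prints the three commands in order, which makes it easy to review the paths before running anything against the cluster.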

-USA..
Good Luck..