Operating System - HP-UX

Serviceguard mystery problem

c.h.Ip
Occasional Contributor


Hi,

Recently I found a two-node cluster in a very strange state. The cluster starts up cleanly and runs (one Oracle package and two NFS packages), but...

After about two hours of operation, one node starts behaving strangely. The node can still be pinged and I can log in to it with telnet or ssh, but 'cmviewcl' reports the node as unreachable, and the services running on that node become unavailable.

Nothing unusual is written to the syslog or to the cluster package logs on that machine.

The problem is also repeatable: it comes back every time I halt the node and restart the whole cluster.

Are there any hints on where I should look?

Many thanks!
nanan
Trusted Contributor

Re: Serviceguard mystery problem

Hi
Could you post the error message that appears on screen when you run cmviewcl?

Regards
nanan
IT_2007
Honored Contributor

Re: Serviceguard mystery problem

What do you get when you run cmviewcl -v?
Steven E. Protter
Exalted Contributor

Re: Serviceguard mystery problem

Shalom,

Sounds to me like networking has crashed on the second node.

Take a look at /var/adm/syslog/syslog.log

Also while the second node is up, examine the network configuration in /etc/rc.config.d/netconf
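To compare the interface settings of the two nodes side by side, a small Python sketch like the one below could help. It assumes the usual HP-UX netconf layout of indexed shell variables (INTERFACE_NAME[n], IP_ADDRESS[n], SUBNET_MASK[n] and so on); the exact variable set in INTERFACE_VARS is my assumption, so check it against the file on your own nodes. It also assumes a Python 3 interpreter is available somewhere you can copy both netconf files to.

```python
import os
import re
import sys
from collections import defaultdict

DEFAULT_NETCONF = "/etc/rc.config.d/netconf"

# Assumed set of per-interface variables; ROUTE_* arrays are skipped because
# they use their own index numbering, unrelated to the interface index.
INTERFACE_VARS = {"INTERFACE_NAME", "IP_ADDRESS", "SUBNET_MASK",
                  "BROADCAST_ADDRESS", "INTERFACE_STATE", "DHCP_ENABLE"}

# Matches lines like INTERFACE_NAME[0]=lan0 or IP_ADDRESS[1]="10.0.0.5"  # comment
_ASSIGN = re.compile(r'^\s*([A-Z_]+)\[(\d+)\]\s*=\s*"?([^"#]*?)"?\s*(?:#.*)?$')


def parse_netconf(text):
    """Group the per-interface netconf variables by array index."""
    entries = defaultdict(dict)
    for line in text.splitlines():
        match = _ASSIGN.match(line)
        if match and match.group(1) in INTERFACE_VARS:
            name, index, value = match.groups()
            entries[int(index)][name] = value.strip()
    return dict(entries)


def summarize(entries):
    """Return (index, interface, IP address, subnet mask) tuples sorted by index."""
    rows = []
    for index in sorted(entries):
        fields = entries[index]
        rows.append((index,
                     fields.get("INTERFACE_NAME", ""),
                     fields.get("IP_ADDRESS", ""),
                     fields.get("SUBNET_MASK", "")))
    return rows


def main(argv):
    path = argv[1] if len(argv) > 1 else DEFAULT_NETCONF
    if not os.path.exists(path):
        print("netconf file not found: %s" % path)
        return 1
    with open(path) as handle:
        for row in summarize(parse_netconf(handle.read())):
            print("[%d] %-8s %-16s %s" % row)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it once per node (or against a copy of each node's file) and diff the two outputs; a mismatched subnet mask or an interface missing on one side stands out quickly.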

cstm/mstm/xstm might be useful in pinpointing a network problem.

With no evidence other than circumstantial, my prime suspect is a bad NIC.

SEP
Bill Hassell
Honored Contributor

Re: Serviceguard mystery problem

You might start with the very basics for each LAN card:

1. From each node, use linkloop between the local card and the other node. You'll need the MAC address for each card pair. This bypasses most of the networking software.

2. Ping each node from the other node.

3. Run traceroute between each node. This will rule out a possible switch or router problem, or a route configuration error on one of the nodes.

4. Verify each network service (telnet, ssh, remsh/rlogin, ftp).

Also, sketch out each LAN card's connectivity to make sure that the network path is still correct (electrically as well as logically).
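Step 4 can be partly scripted. This is a minimal Python sketch that only checks whether a TCP connection to each service port completes; it assumes the standard port numbers (ftp 21, ssh 22, telnet 23, rlogin 513, remsh 514) and does not replace the linkloop, ping and traceroute checks in steps 1 to 3. An "open" result only means something (typically inetd) accepted the connection.

```python
import socket
import sys

# Standard TCP ports for the services listed in step 4.
SERVICES = {"ftp": 21, "ssh": 22, "telnet": 23, "rlogin": 513, "remsh": 514}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host or no route
        return False


def check_services(host, services=SERVICES, timeout=3.0):
    """Map each service name to whether its TCP port on host accepted a connection."""
    return {name: tcp_port_open(host, port, timeout)
            for name, port in services.items()}


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: %s host [host ...]" % sys.argv[0])
    for host in sys.argv[1:]:
        for name, ok in sorted(check_services(host).items()):
            print("%-15s %-7s %s" % (host, name, "open" if ok else "closed/filtered"))
```

Running it from each node against the other (for example with both node names as arguments) gives a quick yes/no table to compare before and after the two-hour mark.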


Bill Hassell, sysadmin
Stephen Doud
Honored Contributor

Re: Serviceguard mystery problem

What is the exact error message, please?

Since the node appears to be unreachable only to Serviceguard, it seems that Serviceguard cannot connect to the cmclconfd daemon via inetd.

Try 'inetd -k' followed by 'inetd'.
Does the problem stop?
Try the cmviewcl command from the other node in the cluster - does it work?
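Since the problem only appears after about two hours, it may also help to record exactly when the peer stops accepting connections on the Serviceguard configuration port. This is a minimal Python 3 sketch, assuming cmclconfd is registered in /etc/services as hacl-cfg; 5302/tcp is the value I would expect, and the script uses it only as a fallback if the lookup fails, so verify it on your nodes. A successful connect only shows that inetd is accepting on that port, not that cmclconfd itself is healthy.

```python
import socket
import sys
import time
from datetime import datetime

# Assumption: cmclconfd is registered as "hacl-cfg" in /etc/services;
# 5302 is only a fallback if the lookup fails. Verify on your nodes.
FALLBACK_HACL_CFG_PORT = 5302


def hacl_cfg_port():
    """Look up the hacl-cfg TCP port, falling back to the assumed default."""
    try:
        return socket.getservbyname("hacl-cfg", "tcp")
    except OSError:
        return FALLBACK_HACL_CFG_PORT


def probe(host, port, timeout=5.0):
    """Return True if host:port accepts a TCP connection within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host, port, interval=60.0, iterations=None, log=print):
    """Poll host:port and log every change of state with a timestamp.

    Runs forever when iterations is None; returns the (timestamp, state)
    transitions observed."""
    transitions = []
    previous = None
    count = 0
    while iterations is None or count < iterations:
        state = probe(host, port)
        if state != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            transitions.append((stamp, state))
            log("%s %s:%d %s" % (stamp, host, port,
                                 "reachable" if state else "UNREACHABLE"))
            previous = state
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
    return transitions


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: %s peer-node" % sys.argv[0])
    else:
        monitor(sys.argv[1], hacl_cfg_port())
```

Run it from the healthy node against the node that goes "unreachable", then compare the timestamp of the first UNREACHABLE line with syslog.log on both nodes and with the time of any 'inetd -k' restart.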

From the symptoms, this doesn't sound like a permission problem, but rather a connection problem, either physical or configuration-wise.