Serviceguard mystery problem

c.h.Ip · ‎08-26-2006

hi,

Recently I found a 2-node cluster in a very strange state. The cluster starts up cleanly and runs (with one Oracle package and two NFS packages), but...

After about two hours of operation, one node begins to behave strangely. The node can still be pinged and logged in to via telnet/ssh, but 'cmviewcl' reports that the node is unreachable, and the services running on that node become unavailable.

Inspecting the syslog and the cluster package logs on that machine shows nothing unusual.

In addition, the problem is repeatable: it has recurred after several attempts at halting the node and restarting the whole cluster.

Are there any hints about what I should look into?

Many thanks!

nanan · ‎08-26-2006

Hi,
Could you post the errors shown on the screen when you run cmviewcl?

Regards
nanan

IT_2007 · ‎08-26-2006

What do you get when you type cmviewcl -v?

Steven E. Protter · ‎08-27-2006

Shalom,

Sounds to me like networking has crashed on the second node.

Take a look at /var/adm/syslog/syslog.log

Also while the second node is up, examine the network configuration in /etc/rc.config.d/netconf

cstm/mstm/xstm might be useful in pinpointing a network problem.

With no evidence other than circumstantial, my prime suspect is a bad NIC card.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Bill Hassell · ‎08-27-2006

You might start with the very basics for each LAN card:

1. From each node, use linkloop between the local card and the other node. You'll need the MAC address for each card pair. This bypasses most of the networking software.

2. Ping each node from the other node.

3. Run traceroute between each node. This will rule out a possible switch or router problem, or a route config error in one of the nodes.

4. Verify each network service (telnet, ssh, remsh/rlogin, ftp).

Note that you need a good picture of each LAN card's connectivity to ensure that the network path is still correct (electrically as well as logically).
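
The checks above could be scripted roughly as follows. This is only a sketch under stated assumptions: run_check and basic_lan_checks are made-up helper names, and the peer hostname, peer MAC address and PPA number are placeholders you would take from your own lanscan output. The linkloop and ping invocations use HP-UX syntax; step 4 is left as a manual check.

```shell
#!/bin/sh
# Sketch only: helper names and arguments are hypothetical placeholders.

# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND if it exists on this system and prints OK, FAIL or SKIP.
run_check() {
    label=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "SKIP $label ($1 not found)"
        return 0
    fi
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# basic_lan_checks PEER PEER_MAC PPA
# PEER is the other node's hostname, PEER_MAC its station address,
# PPA the local card's PPA number (both visible in lanscan output).
basic_lan_checks() {
    peer=$1
    peer_mac=$2
    ppa=$3
    # 1. Link-level test, bypassing most of the networking software.
    run_check "linkloop to $peer_mac" linkloop -i "$ppa" "$peer_mac"
    # 2. IP-level test (HP-UX ping syntax: host first, then -n count).
    run_check "ping $peer" ping "$peer" -n 3
    # 3. Path test, to spot a switch, router or route config problem.
    run_check "traceroute to $peer" traceroute "$peer"
    # 4. telnet, ssh, remsh/rlogin and ftp are left as manual checks.
}
```

Run it from each node against the other, once per LAN card pair, for example: basic_lan_checks node2 0x00306E000000 0 (all three values here are invented for illustration).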

Bill Hassell, sysadmin

Stephen Doud · ‎08-27-2006

What is the exact error, please?

Since the node appears to be unreachable only to Serviceguard, it seems that Serviceguard cannot connect to the cmclconfd daemon via inetd.

Try 'inetd -k' followed by 'inetd'.
Does the problem stop?
Try the cmviewcl command from the other node in the cluster - does it work?

From the symptoms, this doesn't sound like a permission problem, but rather a connection problem, either physical or configuration-wise.
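
Before restarting inetd, it may be worth confirming that inetd is actually configured to start cmclconfd. A minimal sketch, assuming the entry lives in /etc/inetd.conf and mentions cmclconfd by name; has_cmclconfd_entry is a made-up helper, not a Serviceguard command.

```shell
#!/bin/sh
# Sketch only: has_cmclconfd_entry is a hypothetical helper name.

# has_cmclconfd_entry FILE
# Succeeds if FILE contains at least one uncommented line that
# mentions cmclconfd; fails if the file is missing or has no such line.
has_cmclconfd_entry() {
    grep -v '^[[:space:]]*#' "$1" 2>/dev/null | grep -q 'cmclconfd'
}
```

For example: has_cmclconfd_entry /etc/inetd.conf && echo "cmclconfd entry present". If the entry is present, restart inetd with 'inetd -k' followed by 'inetd' as described above and retry cmviewcl.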
