Operating System - HP-UX

Re: Why would cmviewcl command timeout?

Douglas D. Denney
Frequent Advisor

Why would cmviewcl command timeout?

I have a working cluster running HP-UX 11.11 and ServiceGuard A.11.15 on two PA-RISC rp7410 servers.

/etc/cmcluster/cmclconf.ascii shows:

NODE_NAME nodeA
NETWORK_INTERFACE lan0
STATIONARY_IP 71.1.239.200
NETWORK_INTERFACE lan4
NETWORK_INTERFACE lan2
HEARTBEAT_IP 192.168.5.1
NETWORK_INTERFACE lan6
HEARTBEAT_IP 192.168.6.1
FIRST_CLUSTER_LOCK_PV /dev/dsk/c4t0d0
NODE_NAME nodeB
NETWORK_INTERFACE lan0
STATIONARY_IP 71.1.239.201
NETWORK_INTERFACE lan4
NETWORK_INTERFACE lan2
HEARTBEAT_IP 192.168.5.2
NETWORK_INTERFACE lan6
HEARTBEAT_IP 192.168.6.2
FIRST_CLUSTER_LOCK_PV /dev/dsk/c4t0d0

lan0 is the primary LAN on each node and lan4 is its standby. The heartbeat LANs are lan2 and lan6, and the cluster uses a lock disk.
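To double-check this layout, you can group the excerpt above by node. An interface with no IP address listed under it is the standby. The Python sketch below does that. `parse_cluster_nodes` is a hypothetical helper, not part of ServiceGuard. It assumes one parameter per line, with `#` starting a comment, as in the excerpt.

```python
def parse_cluster_nodes(ascii_text):
    """Group per-node network parameters from a cmclconf.ascii-style excerpt.

    Returns {node: {"lans": {lan: (kind, ip) or None}, "lock_pv": path or None}}.
    A LAN mapped to None has no IP address configured (a standby LAN).
    Hypothetical helper; assumes one "KEY value" parameter per line.
    """
    nodes = {}
    node = lan = None
    for raw in ascii_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "NODE_NAME":
            node, lan = value, None
            nodes[node] = {"lans": {}, "lock_pv": None}
        elif node is None:
            continue  # cluster-wide parameters that precede the first node
        elif key == "NETWORK_INTERFACE":
            lan = value
            nodes[node]["lans"][lan] = None  # stays None unless an IP follows
        elif key in ("HEARTBEAT_IP", "STATIONARY_IP") and lan is not None:
            nodes[node]["lans"][lan] = (key, value)
        elif key == "FIRST_CLUSTER_LOCK_PV":
            nodes[node]["lock_pv"] = value
    return nodes
```

For the excerpt above, nodeA's lan4 maps to None (the standby), and lan2 and lan6 map to their HEARTBEAT_IP addresses.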

All of this works together to provide an HA solution. The servers sit in a special subnet of my company's network. Because they are used for critical tasks, we want to be able to isolate them from the rest of the network in case of a network attack, virus outbreak, etc. So our telecom people devised what we call a "drawbridge" on their routers. All services that are needed (such as DNS) are inside the special subnet. When an attack is declared, the telecom people pull up the drawbridge, which severs the connection between the special subnet and the rest of the company. All servers inside the special subnet should be able to survive without the link to the main company network.

A few months ago, we tested this to see if it would work as advertised, and for the most part it did. However, something odd happened to the cluster mentioned above: while the drawbridge was up, cmviewcl and most other cluster commands failed with:

cmviewcl : Cannot view the cluster configuration. Either this node is not configured in a cluster, or else there is some obstacle to viewing the configuration. Check the syslog file for more information. For a list of possible causes, see the ServiceGuard manual for cmviewcl.

Occasionally (1 out of 10 tries), it would complete successfully.

When the drawbridge was put back down (i.e., the special subnet was reconnected to the main company network), cmviewcl worked correctly again.

No packages failed and the cluster stayed up, which made the cmviewcl failures shown above all the more strange.

It caused problems with the applications because most of them use cmviewcl to determine on which node they are running and then act accordingly.
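Because the applications depend on cmviewcl, a defensive wrapper can at least keep them from blocking indefinitely when cmviewcl misbehaves. Below is a minimal Python sketch. The command list, timeout, retry count and pause are placeholders to tune, and `run_with_retries` is a hypothetical name. It only works around the symptom and does not fix the underlying cause.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, timeout_s=30.0, pause_s=5.0):
    """Run cmd, retrying after a non-zero exit status or a timeout.

    Returns the CompletedProcess of the first successful attempt, or raises
    RuntimeError after the last failed attempt. Hypothetical helper; all
    numeric defaults are placeholders.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child if the timeout expires
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout_s)
            if result.returncode == 0:
                return result
            last_error = "exit status %d: %s" % (result.returncode,
                                                 result.stderr.strip())
        except subprocess.TimeoutExpired:
            last_error = "timed out after %.0f s" % timeout_s
        if attempt < attempts:
            time.sleep(pause_s)  # give a hung helper process time to clear
    raise RuntimeError("%s failed %d times; last error: %s"
                       % (cmd[0], attempts, last_error))
```

For example, `run_with_retries(["cmviewcl"], attempts=3, timeout_s=60)` returns the output of the first successful run or raises after three failures, so the application can decide what to do instead of hanging.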

I've exhausted my list of things to check. The DNS servers are correct. I even hard-coded the host entries in /etc/hosts in case name resolution was the issue. /etc/cmcluster/cmclnodelist contains all the nodes, IP addresses, etc. that I thought it would need. Obviously the cluster is trying to reach something outside its subnet, but I don't know what. Does anyone have ideas on where I can look next? Does anyone have a quick-start guide to using nettl to look at ServiceGuard traffic?

Thanks,
Doug
3 REPLIES
Sundar_7
Honored Contributor

Re: Why would cmviewcl command timeout?

Doug,

I do vaguely recollect seeing a problem similar to this.

I can tell you why it only works occasionally. This is most probably because of a hung cmclconfd daemon.

Whenever cmviewcl is issued, a cmclconfd process is forked to gather configuration information and status from the other nodes in the cluster.

If for some reason cmclconfd cannot finish or hangs, cmviewcl will fail, but cmclconfd takes minutes to time out and exit. During this period all cm* commands will fail, since they will use the existing cmclconfd daemon.

Next time this happens, the quickest way to recover is to kill the cmclconfd process (the one started with -p) and then issue cmviewcl again.
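Finding the right process can be scripted. The sketch below only parses `PID ARGS` text, such as the output of `ps -e -o pid,args` (on HP-UX, `-o` generally needs the `UNIX95` environment variable set). It does not kill anything. `find_cmclconfd_p` is a hypothetical helper.

```python
import os


def find_cmclconfd_p(ps_output):
    """Return PIDs of cmclconfd processes whose arguments include -p.

    Expects lines of the form "PID COMMAND ARGS...", e.g. from
    `ps -e -o pid,args`; header and unparsable lines are skipped.
    Hypothetical helper: it only reports PIDs, it does not signal them.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # header line ("PID COMMAND") or blank line
        pid, argv = int(fields[0]), fields[1:]
        # match on the program's basename so the full path does not matter
        if os.path.basename(argv[0]) == "cmclconfd" and "-p" in argv[1:]:
            pids.append(pid)
    return pids
```

You could then pass the returned PIDs to `kill` by hand, after confirming that they are the hung instances.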

Sundar.

Learn what to do, how to do it and, more importantly, when to do it.
melvyn burnard
Honored Contributor

Re: Why would cmviewcl command timeout?

One thing to point out is that Serviceguard A.11.15 is no longer supported. You may want to at least make sure the latest patch is installed, or upgrade to a later supported revision (A.11.16 in this case).

Also, there are specific ports that Serviceguard needs to have available; see:
http://docs.hp.com/en/5874/securingserviceguard_nov2005.pdf
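The Python sketch below is a rough way to test whether those ports get through. It assumes that ServiceGuard's entries in /etc/services carry a `hacl-` prefix, so verify that against your own file and the document above. Only TCP can be probed this way, because UDP has no handshake to confirm. `hacl_ports` and `tcp_port_reachable` are hypothetical helpers.

```python
import socket


def hacl_ports(services_text, prefix="hacl-"):
    """Return (name, port, proto) tuples for /etc/services entries with prefix.

    The "hacl-" default is an assumption about how the entries are named.
    """
    entries = []
    for raw in services_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) < 2 or not fields[0].startswith(prefix):
            continue
        port, _, proto = fields[1].partition("/")
        if port.isdigit():
            entries.append((fields[0], int(port), proto))
    return entries


def tcp_port_reachable(host, port, timeout_s=3.0):
    """True if a TCP connection to host:port completes within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Run the check from each node against the other node's stationary and heartbeat addresses, once with the drawbridge down and once with it up. A port that goes from reachable to unreachable would point at the router configuration.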
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Stephen Doud
Honored Contributor

Re: Why would cmviewcl command timeout?

It sounds like your servers are configured to rely on DNS, which is not recommended for Serviceguard.

In /etc/nsswitch.conf, set:
hosts: files dns

In /etc/hosts, list every fixed IP address assigned to each NIC on each node in the cluster.

Finally, add the simple hostname of the server to the end of each line if it is not already listed on that line.
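These checks are easy to automate. The Python sketch below reports the `hosts:` lookup order from nsswitch.conf and lists any expected IP address whose /etc/hosts line lacks the simple hostname. You would build the `expected` mapping (IP to simple hostname) yourself, for example from cmclconf.ascii. Both function names are hypothetical.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources on the 'hosts:' line in order, ignoring [action] clauses."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            # remove bracketed clauses such as [NOTFOUND=return]
            sources = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return sources.split()
    return []


def missing_host_entries(hosts_text, expected):
    """Return (sorted) IPs from expected {ip: simple_name} whose /etc/hosts
    entry is absent or does not list that simple hostname."""
    names_by_ip = {}
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:
            names_by_ip.setdefault(fields[0], set()).update(fields[1:])
    return sorted(ip for ip, name in expected.items()
                  if name not in names_by_ip.get(ip, set()))
```

On a node configured as recommended above, `hosts_lookup_order(...)` should start with `files`, and `missing_host_entries(...)` should return an empty list.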