
vmunix: Serviceguard Aborting!

 
Ignacio Javier
Regular Advisor



Hi everybody:

When I try to start the cluster, I get this:

Jan 30 18:09:19 primario cmcld: Starting cluster management protocols.
Jan 30 18:09:19 primario cmcld: Attempting to form a new cluster
Jan 30 18:09:19 primario cmcld: Beginning standard election
Jan 30 18:09:28 primario vmunix: Serviceguard Aborting!
Jan 30 18:09:28 primario vmunix: Cause: Timed out waiting for connection cleanup
Jan 30 18:09:28 primario vmunix: (File: rcomm/comm_config.c, Line: 1517)
Jan 30 18:09:28 primario cmcld: Communication to node secundar has been interrupted
Jan 30 18:09:28 primario cmcld: Attempting to form a new cluster
Jan 30 18:09:28 primario cmcld: Beginning standard election
Jan 30 18:09:28 primario cmcld: New node secundar is joining the cluster
Jan 30 18:09:28 primario cmcld: Clearing Cluster Lock
Jan 30 18:09:29 primario cmcld: Request to clear cluster lock /dev/dsk/c5t1d0 failed: Device busy
Jan 30 18:09:29 primario cmcld: Aborting! Timed out waiting for connection cleanup
Jan 30 18:09:34 primario cmsrvassistd[6056]: The cluster daemon aborted our connection.
Jan 30 18:09:34 primario cmsrvassistd[6056]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Jan 30 18:09:34 primario cmlvmd[6058]: The cluster daemon aborted our connection.
Jan 30 18:09:34 primario cmlvmd[6058]: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Jan 30 18:09:34 primario cmclconfd[5822]: The cluster daemon aborted our connection.

It is a two-node cluster of rx1620 (IA-64) servers running HP-UX 11.23.
SG: A.11.16.00
When this happens, the other node hangs.

I have tried to build a single-node cluster to simplify things and find the error, but when it starts it causes the node to hang.

What do you think it could be?

Thanks for helping
A. Clay Stephenson
Acclaimed Contributor

Re: vmunix: Serviceguard Aborting!

Does the other node hang or does it do a TOC? Normally, when a cluster cannot form, the desirable behavior is for one of the nodes to do a TOC (Transfer of Control). This prevents possible data corruption should more than one node attempt to access data.

I would edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=0 on both nodes and then reboot. This will prevent the cluster from trying to form on boot and allow you to check network connections and disk availability manually. After troubleshooting the network and disks, you can then run cmruncl manually to attempt to start the cluster.
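
For anyone who prefers to script that edit, here is a minimal sketch. The function name disable_autostart and the .bak backup suffix are my own choices for illustration, not anything Serviceguard provides, and it assumes the file already contains an AUTOSTART_CMCLD= line. It avoids sed -i, which the stock HP-UX sed does not have.

```shell
# disable_autostart [FILE]
# Sets AUTOSTART_CMCLD=0 in FILE (default: the path named in this post).
# Hypothetical helper; keeps a backup copy as FILE.bak before changing anything.
disable_autostart() {
    conf=${1:-/etc/rc.config.d/cmcluster}
    tmp="$conf.tmp.$$"
    # Keep a copy of the original so the change is easy to undo.
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite only the AUTOSTART_CMCLD line; every other line passes through.
    sed 's/^AUTOSTART_CMCLD=.*/AUTOSTART_CMCLD=0/' "$conf" > "$tmp" || return 1
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Try it on a copy of the file first, for example disable_autostart /tmp/cmcluster.copy, and run it against the real file on both nodes only once the result looks right.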
If it ain't broke, I can fix that.
Ignacio Javier
Regular Advisor

Re: vmunix: Serviceguard Aborting!


Hi:

It hangs...
I can ping it.
I can telnet to it, but I cannot log in...

The console hangs too. I can connect to it, but it does not respond...


Regards
Mustafa Gulercan
Respected Contributor

Re: vmunix: Serviceguard Aborting!

Hi,
I searched for "cmcld: Request to clear cluster lock /dev/dsk/c5t1d0 failed: Device busy" and "cmcld: Clearing Cluster Lock".

I found this; please read it. Maybe it will help you.

"SG is checking clusterlock disk just before it tries
to clear clusterlock. Because of checking, the kernel is returning EBUSY to
Serviceguard when Serviceguard makes a request to clear the clusterlock.
The clusterlock check is done every minute during cluster reformation.
Then on halting a node, we do clear clusterlock. If the check is not finished,
the clear will fail. But this is not critical because the clear will be retried
an unlimited number of times and go through once the check is finished.
If all retries fail and a new cluster tries to form, then the reformation
would fail and this node would go down."

regards;
mustafa
melvyn burnard
Honored Contributor

Re: vmunix: Serviceguard Aborting!

Do you have the relevant patch for Serviceguard? Check with:
what /usr/lbin/cmcld | grep PHSS
If not, patch the servers with the latest SG A.11.16 patch for HP-UX 11.23, available from the ITRC.
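
If the grep output is hard to read, a small filter can reduce it to just the patch IDs. This is only a sketch: list_sg_patches is a made-up name, and it assumes the patch ID shows up in the what strings as PHSS_ followed by digits.

```shell
# list_sg_patches: read `what` output on stdin and print each distinct
# PHSS_nnnnn token once. Hypothetical helper, not part of Serviceguard.
list_sg_patches() {
    # Split the input into one token per line, keep PHSS_<digits>, de-duplicate.
    tr -cs 'A-Za-z0-9_' '\n' | grep '^PHSS_[0-9][0-9]*$' | sort -u
}
```

On the node it would be used as: what /usr/lbin/cmcld | list_sg_patches. Running it on both nodes and comparing the two lists shows quickly whether they are at the same patch level.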
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Carsten Krege
Honored Contributor

Re: vmunix: Serviceguard Aborting!

cmcld aborted and wrote a core file to /var/adm/cmcluster/core. This has happened before when someone used ifconfig to remove the IP address from a LAN interface used by SG. I'd say SG is the messenger here, not the problem itself. You should compare the information stored in the SG binary configuration (run cmviewconf) with what you actually have on the system.

Carsten
-------------------------------------------------------------------------------------------------
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- HhGttG
Ignacio Javier
Regular Advisor

Re: vmunix: Serviceguard Aborting!


Hi:

The system is patched. Maybe that is one of the problems: I have installed the September 2006 QPK.
I have checked the configuration and it looks fine. It does not seem to be a misconfigured package, because the startup does not get that far.
If it is a misconfiguration, it has to be a problem with the cluster configuration.
I had a copy of the cluster config file from when it was running fine. I have applied it and it still fails.

Help!!

Thanks everybody
Ignacio Javier
Regular Advisor

Re: vmunix: Serviceguard Aborting!


OK:

My next step to try to isolate the problem:

I have made a one-node cluster.
It hangs like I said before.
I ran cmscancl and the only strange thing that I see is:

------ Output of lvmpvg (primario) ------

cat: Cannot open /etc/lvmpvg: No such file or directory


What do you think?

Thanks again
Stephen Doud
Honored Contributor

Re: vmunix: Serviceguard Aborting!

/etc/lvmpvg is not a required file. Don't pay attention to that in scancl.out

Melvyn suggested installing the latest SG patch.

Carsten suggested comparing the network configuration stored in the cluster binary with the actual network configuration (use cmviewconf, netstat -in and lanscan).

Did you patch the cluster nodes?
How do the network configurations compare?
If you built a one-node cluster on one node, can you do the same on the other and see which one is giving you the trouble?
If so, focus on that node. You may have to re-install the Serviceguard code if patching doesn't help (it may be corrupted). Or did you install other patches on the server before the problem started?
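
For the network comparison, a crude first pass is to save the output of cmviewconf and netstat -in to files and list the IPv4 addresses that appear in the first but not in the second. The sketch below does that; extract_ips and missing_ips are names I made up, and it makes no attempt to understand either output format, so subnet and network addresses will show up as noise and still need a look by eye.

```shell
# extract_ips: print the distinct IPv4-looking strings found on stdin.
# Hypothetical helper; it matches the dotted-quad shape only, not valid ranges.
extract_ips() {
    tr -cs '0-9.' '\n' | grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' | sort -u
}

# missing_ips FILE_A FILE_B: print addresses found in FILE_A but not in FILE_B.
missing_ips() {
    ips_a="/tmp/ips_a.$$"
    ips_b="/tmp/ips_b.$$"
    extract_ips < "$1" > "$ips_a"
    extract_ips < "$2" > "$ips_b"
    # comm -23 keeps the lines unique to the first (sorted) list.
    comm -23 "$ips_a" "$ips_b"
    rm -f "$ips_a" "$ips_b"
}
```

For example: cmviewconf > /tmp/sg.out; netstat -in > /tmp/net.out; missing_ips /tmp/sg.out /tmp/net.out. Any address printed is one the cluster binary knows about that is not currently configured on an interface, which is the situation Carsten described.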
Ignacio Javier
Regular Advisor

Re: vmunix: Serviceguard Aborting!


Hi:

I have rolled back the September 2006 QPK and everything is OK now.

Now I have to investigate what is happening with those patches.

Thanks everybody