Serviceguard

reboot of cluster node when using a package IP address

 
mmfpbw
Advisor

reboot of cluster node when using a package IP address

Hello,

I'm using MCSG 11.16.01 and SLES9 SP3 on a DL380 cluster. When I try to start a package that uses a package IP address, the start fails and the cluster node reboots after 20 seconds.

I'm using the 32-bit versions of SLES and the MCSG software, and I have also installed the current Proliant Service Packs. When I use the same package (a test package that does nothing) without the package IP address, it starts as it should.

I have not opened a case with HP support yet; maybe somebody here knows the cause. The case at http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1012158 is probably related to my problem.


Matthias


--------------
wallace:/opt/cmcluster/ctest # cmrunpkg ctest

Message from syslogd@wallace at Mon May 22 11:48:48 2006 ...
wallace cmcld[12337]: Aborting! select failed (file: lcomm/local_server.c, line: 1165)

Message from syslogd@wallace at Mon May 22 11:48:48 2006 ...
wallace cmcld[12337]: Aborting! select failed (file: rcomm/comm_ip.c, line: 443)
cmrunpkg : Node wallace is currently unable to run package ctest.
Check the syslog on node wallace and pkg log files for more detailed information.


-------------------------
ctest.ctl.log

###### Node "wallace": Starting package at Mon May 22 10:48:47 BST 2006 ######
May 22 10:48:47 - Node "wallace": Starting md /dev/md10 .
mdadm: /dev/md10 has been started with 2 drives.
May 22 10:48:48 - Node "wallace": Activating volume group vgtest .
May 22 10:48:48 - Node "wallace": Checking filesystems:
/dev/vgtest/lvtest
/dev/vgtest/lvtest1
e2fsck 1.38 (30-Jun-2005)
/dev/vgtest/lvtest: clean, 11/25688 files, 7377/102400 blocks
e2fsck 1.38 (30-Jun-2005)
/dev/vgtest/lvtest1: clean, 11/51200 files, 10590/204800 blocks
May 22 10:48:48 - Node "wallace": Mounting /dev/vgtest/lvtest at /opt/informix
May 22 10:48:48 - Node "wallace": Mounting /dev/vgtest/lvtest1 at /home
May 22 10:48:48 - Node "wallace": Adding IP address 193.29.240.73 to subnet 193.29.240.0
WARNING: IP 193.29.240.73 is already configured on the subnet 193.29.240.0
May 22 10:48:48 - Node "wallace": Starting service ctest using
"/opt/cmcluster/ctest/ctest.mon"
cmrunserv : Unable to connect to daemon: Communication error on send
cmrunserv : Use the cmrunnode command to start the daemon on this node
ERROR: Function start_services; Failed to start service ctest
May 22 10:48:48 - Node "wallace": Halting service ctest
cmhaltserv : Unable to connect to daemon: Communication error on send
cmhaltserv : Use the cmrunnode command to start the daemon on this node
WARNING: Function halt_services; Failed to halt service ctest
May 22 10:48:48 - Node "wallace": Remove IP address 193.29.240.73 from subnet 193.29.240.0
May 22 10:48:48 - Node "wallace": Unmounting filesystem on /home
May 22 10:48:48 - Node "wallace": Unmounting filesystem on /opt/informix
May 22 10:48:48 - Node "wallace": Deactivating volume group vgtest
May 22 10:48:49 - Node "wallace": Deactivating md /dev/md10
###### Node "wallace": Package start FAILED at Mon May 22 10:48:49 BST 2006 ######



----------------------------------
/var/log/messages

May 22 10:48:48 wallace CM-ctest[13130]: cmmodnet -a -i 193.29.240.73 193.29.240.0
May 22 11:48:48 wallace kernel: NET: Registered protocol family 17
May 22 10:48:48 wallace cmcld[12337]: Aborting! select failed (file: lcomm/local_server.c, line: 1165)
May 22 10:48:48 wallace cmcld[12337]: select for port 46100 failed with Interrupted system call
May 22 10:48:48 wallace cmcld[12337]: 28, 1e1a4000, 0
May 22 10:48:48 wallace cmcld[12337]: 14 (read)
May 22 10:48:48 wallace cmcld[12337]: 17 (read)
May 22 10:48:48 wallace cmcld[12337]: 19 (read)
May 22 10:48:48 wallace cmcld[12337]: 20 (read)
May 22 10:48:48 wallace cmcld[12337]: 25 (read)
May 22 10:48:48 wallace cmcld[12337]: 26 (read)
May 22 10:48:48 wallace cmcld[12337]: 27 (read)
May 22 10:48:48 wallace cmcld[12337]: 28 (read)
May 22 10:48:48 wallace cmcld[12337]: Aborting! select failed (file: rcomm/comm_ip.c, line: 443)
May 22 10:48:48 wallace cmcld[12337]: select for port 46356 failed with Interrupted system call
May 22 10:48:48 wallace cmcld[12337]: 34, 81e00000, 7
May 22 10:48:48 wallace cmcld[12337]: 21 (read)
May 22 10:48:48 wallace cmcld[12337]: 22 (read)
May 22 10:48:48 wallace cmcld[12337]: 23 (read)
May 22 10:48:48 wallace cmcld[12337]: 24 (read)
May 22 10:48:48 wallace cmcld[12337]: 31 (read)
May 22 10:48:48 wallace cmcld[12337]: 32 (read)
May 22 10:48:48 wallace cmcld[12337]: 33 (read)
May 22 10:48:48 wallace cmcld[12337]: 34 (read)
May 22 10:48:48 wallace cmcld[12337]: Aborting! select failed (file: rcomm/comm_ip.c, line: 443)
May 22 10:48:48 wallace cmsrvassistd[12377]: The cluster daemon aborted our connection.
May 22 10:48:48 wallace cmsrvassistd[12377]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
May 22 10:48:48 wallace cmclconfd[13009]: The cluster daemon aborted our connection.
May 22 10:48:48 wallace cmclconfd[12328]: The Serviceguard daemon, /opt/cmcluster/bin/cmcld[12337], died upon receiving signal number 6.
May 22 10:48:48 wallace CM-ctest[13148]: cmrunserv ctest >> /opt/cmcluster/ctest/ctest.ctl.log 2>&1 /opt/cmcluster/ctest/ctest.mon
May 22 10:48:48 wallace CM-ctest[13153]: cmhaltserv ctest
May 22 10:48:48 wallace CM-ctest[13160]: cmmodnet -r -i 193.29.240.73 193.29.240.0


9 REPLIES
Luk Vandenbussche
Honored Contributor

Re: reboot of cluster node when using a package IP address

Hi,

It seems to me that your virtual IP address 193.29.240.73 is already in use somewhere on your network when you start the package.

Can you ping 193.29.240.73 when your package is down? Do you receive any reply?
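
A minimal sketch of that check, assuming a Linux node with the usual `ping` from iputils and the package halted on both nodes. The helper name `ip_answers` and the `PING` override are hypothetical and only here for illustration. A missing reply is not proof that the address is free, because a host can be configured to drop ICMP echo requests.

```shell
# Hypothetical helper: succeeds if the address in $1 answers an ICMP echo.
# PING can be overridden (for example in a test); it defaults to "ping".
ip_answers() {
    # -c 2: send two requests, -W 1: wait at most one second per reply
    ${PING:-ping} -c 2 -W 1 "$1" >/dev/null 2>&1
}

# Example use while the package is down:
#   if ip_answers 193.29.240.73; then
#       echo "193.29.240.73 replies although the package is halted"
#   fi
```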
mmfpbw
Advisor

Re: reboot of cluster node when using a package IP address

Hello Luk,

No, the IP address 193.29.240.73 is not used elsewhere in the network. It's definitely used only by this cluster. The cluster node IPs are 193.29.240.71 and 193.29.240.72.

Matthias
Rajesh SB
Esteemed Contributor

Re: reboot of cluster node when using a package IP address

Hi,

There are two aspects here.

1. The WARNING message clearly says that IP 193.29.240.73 is already configured.
Verify whether this IP is configured statically on the HA node, for example by checking the entries in /etc/sysconfig/network-scripts/ifcfg-eth
The package IP is a virtual IP that the cluster daemon cmcld assigns to the node as an alias such as eth0:0.
Or the IP may be in use by another host.

2. The node reboot after the package start failure could be caused by the parameter SERVICE_FAIL_FAST_ENABLED in the package configuration file. Check whether it is set to YES; if so, set it to NO and reapply the package configuration.

# The value for SERVICE_FAIL_FAST_ENABLED can be either YES or
# NO. If set to YES, in the event of a service failure, the
# cluster software will halt the node on which the service is
# running. If SERVICE_FAIL_FAST_ENABLED is not specified, the
# default will be NO.
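
The two checks above can be sketched roughly as follows. This is only an illustration: the function names `has_ipv4` and `fail_fast_settings` are hypothetical (they are not Serviceguard commands), the `ip` tool from iproute2 is assumed to be installed, and the configuration file path in the usage comment is a guess based on the package directory shown in the first post.

```shell
# Hypothetical helpers; the names are illustrative, not Serviceguard commands.

# Succeeds if the IPv4 address in $1 appears in "ip -o -4 addr show" output
# read from standard input (field 4 of that output is ADDRESS/PREFIX).
has_ipv4() {
    awk -v want="$1" '
        { split($4, part, "/"); if (part[1] == want) found = 1 }
        END { exit !found }
    '
}

# Prints the value of every SERVICE_FAIL_FAST_ENABLED entry in the package
# configuration file $1, skipping comment lines. Both "NAME VALUE" and
# "NAME=VALUE" spellings are accepted.
fail_fast_settings() {
    awk '
        /^[ \t]*#/ { next }
        {
            gsub(/=/, " ")
            if (toupper($1) == "SERVICE_FAIL_FAST_ENABLED") print toupper($2)
        }
    ' "$1"
}

# Example use on the node (the file path is an assumption):
#   ip -o -4 addr show | has_ipv4 193.29.240.73 && echo "already configured"
#   fail_fast_settings /opt/cmcluster/ctest/ctest.conf
```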


Thanks & Regards,
Rajesh
mmfpbw
Advisor

Re: reboot of cluster node when using a package IP address

Hi Rajesh,

I've already checked this. The IP address is not in use, neither on this node nor on any other node on the network. The parameter SERVICE_FAIL_FAST_ENABLED is set to NO.

Attached you'll find the applied conf and the ctl-script.

Thanks & regards
Matthias
John Bigg
Esteemed Contributor

Re: reboot of cluster node when using a package IP address

The reboot happens because cmcld aborted:

May 22 10:48:48 wallace cmcld[12337]: Aborting! select failed (file: rcomm/comm_ip.c, line: 443)
May 22 10:48:48 wallace cmcld[12337]: select for port 46356 failed with Interrupted system call

To be honest, this should really go to HP engineering, since cmcld should not abort even if there is a duplicate IP or similar.

I'd suggest contacting your response centre.
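
When collecting evidence for such a case, the abort lines can be pulled out of the syslog with a small filter. This is a sketch only: the function name `cmcld_aborts` is hypothetical, and the pattern is based purely on the message format quoted in this thread, so other Serviceguard versions may log differently.

```shell
# Hypothetical helper: print the cmcld "Aborting!" messages and the
# "select for port ..." errors that follow them.
# $1 = path to a syslog file, for example /var/log/messages
cmcld_aborts() {
    grep -E 'cmcld\[[0-9]+\]: (Aborting!|select for port )' "$1"
}

# Example use:
#   cmcld_aborts /var/log/messages
```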
Rajesh SB
Esteemed Contributor

Re: reboot of cluster node when using a package IP address

Hi,

This is really peculiar behaviour.
Run the command
# cmscancl
This generates the file /tmp/scancl.out; take a look at the network (Ethernet) section. This will help to isolate the problem.

Thanks & Regards,
Rajesh
mmfpbw
Advisor

Re: reboot of cluster node when using a package IP address

Hi,

I've attached the output of the command to this post. Nothing looks suspicious (at least to me).
I'll open a case with the support center and let you know the results.

Thanks & regards
Matthias
mmfpbw
Advisor

Re: reboot of cluster node when using a package IP address

Hello,

I needed to install the current patch (16.11.05). This solved my problem.

Thanks & Regards
Matthias
