
Update to SG 11.16_03

SOLVED
Huettner
Advisor


I tried to update a 2-node cluster.
OS: SLES9 SP2
SG 11.16
Update to 11.16_03

The update completed with no errors or warnings.
After starting the cluster with cmruncl, nothing works.
cmviewcl reports the cluster as down (no information about forming the cluster). I then looked in the messages file and found the following messages.

Nothing is reported in the cluster log file.


Feb 11 14:45:52 bklsora1 CM-CMD[7873]: cmruncl -v
Feb 11 14:45:53 bklsora1 cmclconfd[7877]: Request from root on node bklsora1 to start the cluster on this node
Feb 11 14:45:53 bklsora1 cmcld[7886]: Logging level changed to level 0.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Logging level changed to level 0.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Daemon Initialization - Maximum number of packages supported for this incarnation is 150.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Global Cluster Information:
Feb 11 14:45:53 bklsora1 cmcld[7886]: Heartbeat Interval is 2.00 seconds.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Node Timeout is 8.00 seconds.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Network Polling Interval is 2.00 seconds.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Auto Start Timeout is 600.00 seconds.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Failover Optimization is disabled.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Information Specific to node bklsora1:
Feb 11 14:45:53 bklsora1 cmcld[7886]: Cluster lock disk: /dev/sda1.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Quorum Server: localhost.
Feb 11 14:45:53 bklsora1 cmcld[7886]: eth0 0x00:08:02:28:ff:04 150.12.23.110 bridged net:1
Feb 11 14:45:53 bklsora1 cmcld[7886]: eth2 0x00:11:0a:55:a2:59 192.168.0.8 bridged net:2
Feb 11 14:45:53 bklsora1 cmcld[7886]: Heartbeat Subnet: 150.12.0.0
Feb 11 14:45:53 bklsora1 cmcld[7886]: Heartbeat Subnet: 192.168.0.0
Feb 11 14:45:53 bklsora1 cmcld[7886]: The maximum # of concurrent local connections to the daemon that will be supported is 994.
Feb 11 14:45:53 bklsora1 cmcld[7886]: CLUSTER_RUNTIME_ID is set to 0
Feb 11 14:45:53 bklsora1 cmcld[7886]: Quorum server port number is 1238
Feb 11 14:45:53 bklsora1 cmcld[7886]: qm_cluster_lock_config:my_appl_id = bklsora1 old_appl_id = 1
Feb 11 14:45:53 bklsora1 cmcld[7886]: Quorum server probe interval is 1800000000
Feb 11 14:45:53 bklsora1 cmcld[7886]: Quorum server probe timeout interval is 28000000
Feb 11 14:45:53 bklsora1 cmcld[7886]: Quorum server request timeout interval is 28000000
Feb 11 14:45:53 bklsora1 cmcld[7886]: Lock LUN Device is /dev/sda1
Feb 11 14:45:53 bklsora1 cmcld[7886]: The quorum device localhost is being initialized.
Feb 11 14:45:53 bklsora1 cmcld[7886]: Assertion failed: (tsb_tmp).tsb_low <= TICKS_PER_MAX_USEC, file: cm/timers.c, line: 1082
Feb 11 14:45:53 bklsora1 cmsrvassistd[7921]: Unable to notify Serviceguard main daemon (cmcld): Connection reset by peer
Feb 11 14:45:53 bklsora1 cmsrvassistd[7920]: Unable to send 64 bytes (Software caused connection abort).
Feb 11 14:45:53 bklsora1 cmsrvassistd[7920]: Unable to notify Serviceguard main daemon (cmcld): Software caused connection abort
Feb 11 14:45:53 bklsora1 cmsrvassistd[7920]: The cluster daemon aborted our connection.
Feb 11 14:45:53 bklsora1 cmsrvassistd[7920]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Feb 11 14:46:23 bklsora1 cmclconfd[7877]: The Serviceguard daemon, /opt/cmcluster/bin/cmcld[7886], died upon receiving signal number 6.
7 REPLIES
Steven E. Protter
Exalted Contributor

Re: Update to SG 11.16_03

Shalom Hüttner,

1) Have both nodes been updated with the software?
2) Check that networking and heartbeat are steady.
3) The log says: "The quorum device localhost is being initialized."

localhost as a quorum device?

That's a bit unusual.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Huettner
Advisor

Re: Update to SG 11.16_03

Hi Steven,
Both machines are updated to SP2.
SGLX is updated to 03.
I didn't make any changes to the network for the update. With version SGLX01 everything was working fine.

The cluster uses only a lock disk. A quorum server isn't configured.

I don't know where the message about the quorum server localhost comes from.

Best regards
Hüttner
Huettner
Advisor

Re: Update to SG 11.16_03

Hello,

After the update to SLES9 SP2 the system still uses kernel 2.6.5-97 (the old kernel from before the update).
John Bigg
Esteemed Contributor
Solution

Re: Update to SG 11.16_03

The error:

Feb 11 14:45:53 bklsora1 cmcld[7886]: Assertion failed: (tsb_tmp).tsb_low <= TICKS_PER_MAX_USEC, file: cm/timers.c, line: 1082

indicates that you are running the new version of cmcld from the patch, but the old deadman driver from the original install. It would appear that something went wrong during the patch install process. Did you use sgupdate to install the patch as described in the patch text, or did you simply load the rpm? Loading the rpm is not enough. Another possibility is that you were booted from the wrong kernel when you ran sgupdate, so the newer deadman driver was loaded into the wrong kernel modules directory.
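
To confirm that a node hit this particular assertion, the system log can be searched for the message text quoted above. The following is only a minimal sketch: the function name has_timer_assertion is made up for illustration and is not part of Serviceguard, and the log file location is an assumption that depends on your syslog configuration.

```shell
# Return 0 if the given log file contains the cmcld timer assertion
# quoted in this thread, non-zero otherwise.
# The log file path is supplied by the caller (assumption: syslog
# writes to a plain text file on this system).
has_timer_assertion() {
    grep -F -q 'Assertion failed: (tsb_tmp).tsb_low <= TICKS_PER_MAX_USEC' "$1"
}
```

For example, `has_timer_assertion /var/log/messages && echo "possible cmcld/deadman mismatch"`, assuming /var/log/messages is where syslog writes on your node.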

Anyway, to resolve this problem I suggest you perform the following actions:

1) Ensure your system is booted from the kernel you wish to use with Serviceguard.

2) Unload the old deadman driver:

# rmmod -f deadman

3) Build the new deadman driver installed by the 11.16.03 patch:

# cd $SGROOT/drivers/
# make modules
# make modules_install

4) Reload this new driver:

# depmod -a
# modprobe deadman

Once you have done this you should find that the cluster forms without trouble. You can also check that the deadman.ko file in the $SGROOT/drivers/ directory is the same as the file in /lib/modules//extra/deadman.ko
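
The file comparison mentioned above can be sketched as a small shell function built on cmp. This is an illustration only: same_module is a hypothetical helper name, and both paths must be supplied by you, since the installed location depends on the kernel you are running.

```shell
# Compare a freshly built module with an installed copy, byte for byte.
# Prints "identical" and returns 0 when the files match; prints
# "different" and returns 1 when they differ or one cannot be read.
# Both paths are passed in by the caller (hypothetical helper, not a
# Serviceguard command).
same_module() {
    built=$1
    installed=$2
    if cmp -s "$built" "$installed"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

It would be called as `same_module "$SGROOT/drivers/deadman.ko" <installed deadman.ko>`, where the second argument is the copy under the modules directory of the booted kernel.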

There was a defect in the original deadman code which required new code in cmcld and new code in the deadman driver in the 11.16.03 patch. If you have a mismatch between the deadman code and the cmcld code you get the assertion error you report.
Huettner
Advisor

Re: Update to SG 11.16_03

Hello John,
I updated my system with sgupdate, but I didn't install a new kernel.
I think that will solve the problem on my system.
$SGROOT/drivers/README also has the information about building a new deadman driver; this driver isn't created by sgupdate. When you install a new kernel you get the same problem, so this is a problem with sgupdate.
I reported this problem to HP, and they are opening an SCR for the deadman problem.

John Bigg
Esteemed Contributor

Re: Update to SG 11.16_03

I meant to say that the README contains more information. You are correct that sgupdate does not build the new deadman driver. This is done by the Serviceguard rpm. However, the Serviceguard rpm does not remove any old drivers first. This is why you need to use sgupdate which removes the old driver and moves the new driver built by the rpm into place.
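
Since old drivers can be left behind, it may help to list every deadman.ko under the kernel modules tree with a checksum, so that a stale copy stands out. A rough sketch, assuming the modules live under /lib/modules (the default used here) and using a made-up helper name, list_deadman_modules:

```shell
# Print one cksum line (checksum, size, path) for every deadman.ko found
# below the given directory. The default of /lib/modules is an
# assumption; pass another root if your modules live elsewhere.
list_deadman_modules() {
    find "${1:-/lib/modules}" -type f -name deadman.ko -exec cksum {} \;
}
```

Comparing the paths in the output with the result of `uname -r` shows which copy belongs to the running kernel; copies with a different checksum are candidates for an old driver.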
Huettner
Advisor

Re: Update to SG 11.16_03

Thanks for your help.