
Safety timer TOC, with SG on Integrity VM

 
SOLVED
likid0
Honored Contributor

Safety timer TOC, with SG on Integrity VM

Hi,

I have built a small test environment on one box with two Integrity VM 3.50 guests running 11.23. I have installed and configured SG 11.17, and I have a cluster running with no packages. When I halt the cluster, or just one node, the other node TOCs, and a safety timer panic is recorded in the shutdownlog.

Here is the panic and some config:

Apr 30 2008. Reboot after panic: SafetyTimer expired, INIT, IIP:0xe0000000014123d0 IFA:0x0000000000000045

This is what I get in the syslog of the machine that doesn't TOC:
Apr 29 18:15:37 iumtest2 cmcld[5118]: Request from root on node iumtest3 to halt the cluster on this node
Apr 29 18:15:37 iumtest2 cmcld[5118]: Turning off safety time: node halting
Apr 29 18:15:37 iumtest2 cmcld[5118]: Service cmlvmd terminated due to an exit(0).

On the machine that TOCs I get no information at all.

Heartbeat networks and Quorum server are working ok.

I increased the node timeout, but I still have the same problem.

CPU and memory are fine, because the machines are completely idle.

Any idea what else I can check?
Windows?, no thanks
13 REPLIES
likid0
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

I am also adding the timeout parameters:

Apr 30 12:51:16 iumtest2 cmcld[3441]: Global Cluster Information:
Apr 30 12:51:16 iumtest2 cmcld[3441]: Heartbeat Interval is 1.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Logging level changed to level 0.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Node Timeout is 12.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Network Polling Interval is 1.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: IO Timeout Extension is 70.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Auto Start Timeout is 600.00 seconds.
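For anyone comparing these settings across test runs, here is a minimal sketch that pulls the timing values out of the cmcld startup lines above. It assumes the lines keep exactly the "<name> is <value> seconds." shape shown in this post; the function name `parse_cmcld_timers` is hypothetical and is not part of Serviceguard.

```python
import re

# Sample lines copied from the syslog excerpt in this post.
SYSLOG = """\
Apr 30 12:51:16 iumtest2 cmcld[3441]: Global Cluster Information:
Apr 30 12:51:16 iumtest2 cmcld[3441]: Heartbeat Interval is 1.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Logging level changed to level 0.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Node Timeout is 12.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Network Polling Interval is 1.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: IO Timeout Extension is 70.00 seconds.
Apr 30 12:51:16 iumtest2 cmcld[3441]: Auto Start Timeout is 600.00 seconds.
"""

# Assumed line shape: "... cmcld[PID]: <name> is <value> seconds."
PARAM_RE = re.compile(
    r"cmcld\[\d+\]: (?P<name>.+?) is (?P<value>\d+(?:\.\d+)?) seconds\.\s*$"
)


def parse_cmcld_timers(text):
    """Return {parameter name: seconds} for every matching cmcld line."""
    timers = {}
    for line in text.splitlines():
        match = PARAM_RE.search(line)
        if match:
            timers[match.group("name")] = float(match.group("value"))
    return timers


timers = parse_cmcld_timers(SYSLOG)
for name, seconds in timers.items():
    print(f"{name}: {seconds} s")

# Ratio of node timeout to heartbeat interval for the values logged above.
ratio = timers["Node Timeout"] / timers["Heartbeat Interval"]
print(f"Node Timeout / Heartbeat Interval = {ratio}")
```

Lines that do not describe a value in seconds, such as the logging level message, are simply skipped.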
Windows?, no thanks
Mridul Shrivastava
Honored Contributor
Solution

Re: Safety timer TOC, with SG on Integrity VM

How did you halt the node? Can you share the command syntax?

Do you see the same issue if you halt the other node, the one that is TOCing now?

As long as one node is leaving the cluster gracefully there shouldn't be any issues on the other node.
Time has a wonderful way of weeding out the trivial
likid0
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

The thing is, if I run:

cmhaltcl --> the node where I run the command TOCs
cmhaltnode iumtest3 --> iumtest2 TOCs
cmhaltnode iumtest2 --> iumtest3 TOCs
Windows?, no thanks
likid0
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

It has to be some kind of communication timeout, because if I use:

cmruncl -n iumtest3

and then:

cmhaltcl


the cluster stops with no problem.
Windows?, no thanks
Mridul Shrivastava
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

Could you please check OLDsyslog.log on both nodes, since these are the main sources of information in this kind of case.

Please check the output of cmscancl as well.

You can also check the flight recorder logs for more details. Also check whether the node that is TOCing is saving a crash dump.
Time has a wonderful way of weeding out the trivial
likid0
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

Hi,

In the OLDsyslog.log from the machine that TOCs there is nothing at all, not even the node stop command. On the other node you get:

Apr 29 18:15:37 iumtest2 cmcld[5118]: Request from root on node iumtest3 to halt the cluster on this node
Apr 29 18:15:37 iumtest2 cmcld[5118]: Turning off safety time: node halting
Apr 29 18:15:37 iumtest2 cmcld[5118]: Service cmlvmd terminated due to an exit(0).

and nothing else.

It is not leaving any cluster core dumps either.

Everything looks OK in the cmscancl output. I have attached it, if you want to have a look.

Thanks
Windows?, no thanks
Eric SAUBIGNAC
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

Hello,

I took a look at the cmscancl output. The only thing I noticed concerns the heartbeat: it is attached to only one network, 10.132.75.0 on lan1. I would also attach it to 10.10.10.0 on lan0.

I don't understand how it could be an issue, but I found this in "HP Integrity Virtual Machines Installation, Configuration, and Administration Version A.03.50", page 154:

-----------------------------
Whether Serviceguard is installed on the VM Host system or on the guest, HP
recommends that you configure every LAN as a heartbeat LAN.
-----------------------------

So try replacing STATIONARY_IP with HEARTBEAT_IP for lan0 and tell us what happens.

Eric
likid0
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

I have made both networks heartbeat networks.

The same thing happens. This time a little more information was available on the machine that didn't panic:

May 5 15:53:05 iumtest2 cmcld[3110]: Request from root on node iumtest3 to halt the cluster on this node
May 5 15:53:05 iumtest2 cmcld[3110]: Request from node iumtest3 to disable node switching for package test1 on node iumtest2.
May 5 15:53:05 iumtest2 cmcld[3110]: Request from node iumtest3 to disable global switching for package test1.
May 5 15:53:05 iumtest2 cmcld[3110]: (iumtest3) Halted package test1 on node iumtest3.
May 5 15:53:05 iumtest2 cmcld[3110]: Request from node iumtest3 to enable global switching for package test1.
May 5 15:53:05 iumtest2 cmcld[3110]: Package test1 cannot run on this node because switching has been disabled for this node
May 5 15:53:05 iumtest2 cmcld[3110]: Turning off safety time: node halting
May 5 15:53:05 iumtest2 cmcld[3110]: Service cmlvmd terminated due to an exit(0).
May 5 15:56:19 iumtest2 cmcld[3110]: HB connection to 10.10.10.2 not responding, closing
May 5 15:56:19 iumtest2 cmcld[3110]: HB connection to 10.132.75.47 not responding, closing
May 5 15:56:19 iumtest2 cmcld[3110]: GS connection to 10.10.10.2 not responding, closing
May 5 15:56:19 iumtest2 cmcld[3110]: GS connection to 10.132.75.47 not responding, closing
May 5 15:56:19 iumtest2 cmcld[3110]: Service cmnetassistd terminated due to an exit(0).


It looks like the whole machine goes AWOL and freezes as soon as you type the cmhaltcl command.
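To put a number on the freeze, here is a minimal sketch that measures the time between the "Turning off safety time" message and the first "HB connection ... not responding" message in the excerpt above. Assumptions: syslog lines carry no year, so 2008 is filled in from the panic date quoted earlier in the thread; month names are parsed with the default English locale; and the helper names (`parse_time`, `first_time`, `gap_seconds`) are hypothetical.

```python
from datetime import datetime

# Assumption: syslog omits the year; 2008 is taken from the panic line in the thread.
ASSUMED_YEAR = 2008

# Two lines copied from the iumtest2 syslog excerpt in this post.
LOG = """\
May 5 15:53:05 iumtest2 cmcld[3110]: Turning off safety time: node halting
May 5 15:56:19 iumtest2 cmcld[3110]: HB connection to 10.10.10.2 not responding, closing
"""


def parse_time(line):
    """Parse the leading 'Mon D HH:MM:SS' syslog timestamp of one line."""
    month, day, clock = line.split(None, 3)[:3]
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )


def first_time(log, needle):
    """Timestamp of the first line containing needle, or None if absent."""
    for line in log.splitlines():
        if needle in line:
            return parse_time(line)
    return None


def gap_seconds(log, start_needle, end_needle):
    """Seconds between the first start_needle line and the first end_needle line."""
    start = first_time(log, start_needle)
    end = first_time(log, end_needle)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


gap = gap_seconds(LOG, "Turning off safety time", "HB connection to")
print(f"Seconds between halt and first HB drop: {gap}")
```

For the two timestamps above (15:53:05 and 15:56:19) the gap is 194 seconds, which can be compared against whatever node timeout was configured for that run.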
Windows?, no thanks
Mridul Shrivastava
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

These "HB connection not responding" messages should not appear, and this seems to be the issue. Do you see these messages frequently in syslog.log, or do they appear only after cmhaltnode/cmhaltcl is executed?

What about the stability of the setup if you don't halt a node? Does it run without error messages like these on either node?

If the heartbeat is not responding, then the cluster lock disk comes into the picture. Do you have a cluster lock VG and PV in place to break a tie between the nodes?
Time has a wonderful way of weeding out the trivial
likid0
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

Hi, yes, as you say, I only get these errors when I issue cmhaltcl or cmhaltnode. The rest of the time there are no errors in the syslog.

When it comes to the cluster lock, I have tried both a Quorum Server and a lock disk, with the same result.

As a test I have tried increasing the node timeout up to 10 minutes. The machine where you issue the command freezes for 10 minutes and then TOCs.
Windows?, no thanks
Ivan Krastev
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

You can analyse the flight recorder dumps in /var/adm/cmcluster/frdump.cmcld.x using /usr/contrib/bin/cmfmtfr.


regards,
ivan
Eric SAUBIGNAC
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

Good evening,

Sorry for my late post: I was mostly out of the office today.

Well, I have no obvious idea of what is happening. Maybe you could post some more information:

- "hpvmnet", and "hpvmnet -S XXX" for each virtual switch XXX
- "hpvmstatus", and "hpvmstatus -P YYY" for each guest YYY
- cluster configuration file, package configuration file and package control script

Please post it as an attachment, in a gzipped tar file: it is more readable and easier to use ;-)

Have you placed a call with HP? And have you searched for all the patches for HPVM and MCSG?

Regards

Eric
likid0
Honored Contributor

Re: Safety timer TOC, with SG on Integrity VM

Hi,

At the moment I am updating the host and client patches, and also HPIVM to 3.5, and I am going to try using the AVIO LAN drivers.

I will post an update when I have finished and tried the cluster out.
Windows?, no thanks