
Re: Reboot after panic:

 
Ashwin_4
Frequent Advisor

Reboot after panic:

Hi Masters,
We have an HP L-Class server running HP-UX 11.x that reboots frequently with the following message in shutdownlog:

13:06 Mon Oct 11 2004. Reboot after panic: SafetyTimer expired, isr.ior = 0'92
27ffff.c0000000'e83b1030

How to resolve this issue?

Thanks.
6 REPLIES
Sunil Sharma_1
Honored Contributor

Re: Reboot after panic:

Have a look at this thread:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?admit=716493758+1097740689233+28353475&threadId=301388

Sunil
*** Dream as if you'll live forever. Live as if you'll die today ***
Sridhar Bhaskarla
Honored Contributor

Re: Reboot after panic:

Hi,

"Safety Timer" is associated with the Serviceguard daemon cmcld. If you are running Serviceguard, set NODE_TIMEOUT to at least 8 seconds and HEARTBEAT_INTERVAL to 2 seconds. That should cover any intermittent network issues.

Also try installing the latest Serviceguard patches. Go to itrc.hp.com -> patches and search for 'cmcld' to find the patch.

If these systems are not set up with Serviceguard, ignore this advice. It's better to open a call with HP after running a Q4 analysis.
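
For reference, the timeout values suggested above can be translated into the microsecond units that the cluster configuration file uses. This is only a minimal sketch in Python: the helper name to_microseconds is made up for illustration, and the "NAME value" line format is assumed from the configuration excerpt quoted in this thread.

```python
# Hypothetical helper: convert a value in seconds to microseconds.
def to_microseconds(seconds):
    return int(seconds * 1_000_000)


# Values suggested in the reply above, in seconds.
suggested = {"HEARTBEAT_INTERVAL": 2, "NODE_TIMEOUT": 8}

# Render them as "NAME value" lines (format assumed, not verified
# against the Serviceguard documentation).
lines = [f"{name} {to_microseconds(sec)}" for name, sec in suggested.items()]
print("\n".join(lines))
```

The printed lines are only meant to show the unit conversion; any real change still has to go through the normal Serviceguard configuration procedure.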

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Ashwin_4
Frequent Advisor

Re: Reboot after panic:

Hi, the parameter settings are as below:
# Cluster Timing Parameters (microseconds).
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000


# Configuration/Reconfiguration Timing Parameters (microseconds).

AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
Ashwin_4
Frequent Advisor

Re: Reboot after panic:

The output of Q4 is pasted below:

q4> trace event 0
stack trace for event 0
crash event was a TOC
Send_Monarch_TOC+0x58
safety_time_check+0x190
per_spu_hardclock+0x3c
clock_int+0x58
mp_ext_interrupt+0x150
ivti_patch_to_nop3+0x0
idle+0xcbc
swidle_exit+0x0
Sridhar Bhaskarla
Honored Contributor

Re: Reboot after panic:

Hi Ashwin,

Any errors in OLDsyslog.log prior to the crash?

The crash event was a "TOC", so there is a strong possibility that the node TOC'ed due to heartbeat timeouts. See if there are any errors in OLDsyslog.log.

With a heartbeat interval of 1 second and a node timeout of 2 seconds, a node that does not receive two successful heartbeats within 2 seconds will consider the other node down.
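
The arithmetic behind this can be sketched as follows. This is a rough Python illustration, not a Serviceguard tool: parse_timing and intervals_per_timeout are hypothetical helper names, and the sample text simply repeats the values posted earlier in this thread.

```python
# Sample text copied from the settings posted in this thread (microseconds).
CONFIG_TEXT = """\
# Cluster Timing Parameters (microseconds).
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000
"""


def parse_timing(text):
    """Collect 'NAME value' pairs, skipping comments and blank lines."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            params[parts[0]] = int(parts[1])
    return params


def intervals_per_timeout(heartbeat_us, timeout_us):
    """How many heartbeat intervals fit inside the node timeout."""
    return timeout_us / heartbeat_us


params = parse_timing(CONFIG_TEXT)
current = intervals_per_timeout(params["HEARTBEAT_INTERVAL"], params["NODE_TIMEOUT"])

# Values suggested in this thread: 2 s heartbeat, 8 s node timeout.
suggested = intervals_per_timeout(2_000_000, 8_000_000)

print(f"current settings:   {current} heartbeat intervals per timeout")
print(f"suggested settings: {suggested} heartbeat intervals per timeout")
```

With the posted settings only two heartbeat intervals fit in the timeout window, whereas the suggested settings leave room for four and give a longer absolute window (8 seconds) to ride out a short freeze.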

This can happen if there are any 'intermittent freezes' on the system. These could be due to a buffer cache that is set too high (50%), a known bug with ident (in inetd.conf), etc. You can turn off ident in inetd.conf.

Do not forget the latest Serviceguard patches.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Kent Ostby
Honored Contributor

Re: Reboot after panic:

Ashwin --

Update your NODE_TIMEOUT to 8 seconds and that will likely take care of the problem.

This is fairly common for people running SG with the default NODE_TIMEOUT setting.

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"