Re: Reboot after panic:

Ashwin_4 · ‎10-13-2004

Hi Masters,
We have an HP L-Class server running HP-UX 11.x that is rebooting frequently, with the following message in shutdownlog:

13:06 Mon Oct 11 2004. Reboot after panic: SafetyTimer expired, isr.ior = 0'9227ffff.c0000000'e83b1030

How can we resolve this issue?

Thanks.

Sunil Sharma_1 · ‎10-13-2004

Have a look at this thread:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?admit=716493758+1097740689233+28353475&threadId=301388

Sunil

*** Dream as if you'll live forever. Live as if you'll die today ***

Sridhar Bhaskarla · ‎10-13-2004

Hi,

"Safety Timer" is associated with the Serviceguard daemon cmcld. If you are running Serviceguard, set your NODE_TIMEOUT value to at least 8 seconds, with HEARTBEAT_INTERVAL at 2 seconds. That should cover any intermittent network issues.

Also try installing the latest Serviceguard patches. Go to itrc.hp.com -> patches and search for 'cmcld' to find the patch.

If these systems are not set up with Serviceguard, ignore this advice. It's better to open a call with HP after running a Q4 analysis.

-Sri

You may be disappointed if you fail, but you are doomed if you don't try

Ashwin_4 · ‎10-13-2004

Hi, the parameter settings are as below:
# Cluster Timing Parameters (microseconds).
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000

# Configuration/Reconfiguration Timing Parameters (microseconds).

AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000

Ashwin_4 · ‎10-13-2004

The output of Q4 is pasted below:

q4> trace event 0
stack trace for event 0
crash event was a TOC
Send_Monarch_TOC+0x58
safety_time_check+0x190
per_spu_hardclock+0x3c
clock_int+0x58
mp_ext_interrupt+0x150
ivti_patch_to_nop3+0x0
idle+0xcbc
swidle_exit+0x0

Sridhar Bhaskarla · ‎10-13-2004

Hi Ashwin,

Are there any errors in OLDsyslog.log prior to the crash?

The crash event was a "TOC", so it is quite possible that the node TOC'ed due to heartbeat timeouts. See if there are any errors in OLDsyslog.log.

A heartbeat interval of 1 second with a node timeout of 2 seconds means that if the node does not receive the two heartbeats expected within 2 seconds, it will consider the other node down.
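To make the arithmetic concrete, here is a minimal Python sketch comparing the posted settings with the suggested ones. It is a simplified model only: the helper names are made up for illustration, the 3-second stall is an arbitrary example, and cmcld's real timing logic is more involved than this.

```python
# Simplified model only; this is NOT how cmcld itself is implemented.
US_PER_SECOND = 1_000_000  # the cluster file expresses timings in microseconds


def heartbeats_per_timeout(heartbeat_interval_us: int, node_timeout_us: int) -> int:
    """Number of whole heartbeat intervals that fit in one NODE_TIMEOUT window."""
    return node_timeout_us // heartbeat_interval_us


def survives_stall(stall_seconds: float, node_timeout_us: int) -> bool:
    """True if a freeze of stall_seconds is shorter than NODE_TIMEOUT."""
    return stall_seconds * US_PER_SECOND < node_timeout_us


# Values posted earlier in this thread.
current = {"HEARTBEAT_INTERVAL": 1_000_000, "NODE_TIMEOUT": 2_000_000}
# Values suggested earlier in this thread (2 s heartbeat, 8 s timeout).
suggested = {"HEARTBEAT_INTERVAL": 2_000_000, "NODE_TIMEOUT": 8_000_000}

for name, cfg in (("current", current), ("suggested", suggested)):
    n = heartbeats_per_timeout(cfg["HEARTBEAT_INTERVAL"], cfg["NODE_TIMEOUT"])
    ok = survives_stall(3.0, cfg["NODE_TIMEOUT"])  # hypothetical 3 s freeze
    print(f"{name}: {n} heartbeat intervals per timeout, survives 3 s stall: {ok}")
```

Under this model the posted settings leave room for only two heartbeat intervals before the timeout, so even a short freeze or network hiccup is enough to trip it, while the suggested settings leave room for four.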

This can happen if there are any intermittent freezes on the system. They could be due to an oversized buffer cache (for example, set to 50%), a known bug with ident (in inetd.conf), etc. You can turn off ident in inetd.conf.

Do not forget the latest Serviceguard patches.

-Sri

You may be disappointed if you fail, but you are doomed if you don't try

Kent Ostby · ‎10-14-2004

Ashwin --

Update your NODE_TIMEOUT to 8 seconds and that will likely take care of the problem.

This is fairly common for people running SG with the default NODE_TIMEOUT setting.
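Since the cluster file posted above expresses these timings in microseconds, 8 seconds would be written as 8000000. A rough Python sketch that reads such settings and flags a low NODE_TIMEOUT follows. The parser, the function name, and the sample text are illustrative assumptions based on the lines posted in this thread, not a Serviceguard tool.

```python
US_PER_SECOND = 1_000_000
# 8 seconds, the value suggested in this thread, in the file's microsecond units.
RECOMMENDED_NODE_TIMEOUT_US = 8 * US_PER_SECOND


def parse_timing(text: str) -> dict:
    """Collect 'NAME value' pairs with integer values, skipping comments and blanks."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            params[parts[0]] = int(parts[1])
    return params


# Sample text copied from the settings posted earlier in this thread.
sample = """\
# Cluster Timing Parameters (microseconds).
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
"""

params = parse_timing(sample)
for name, value in params.items():
    print(f"{name}: {value / US_PER_SECOND:g} s")

if params["NODE_TIMEOUT"] < RECOMMENDED_NODE_TIMEOUT_US:
    print(f"NODE_TIMEOUT is below 8 s; the file would need "
          f"NODE_TIMEOUT {RECOMMENDED_NODE_TIMEOUT_US}")
```

The check only reports what the file says; the edited configuration still has to be verified and applied to the cluster with the usual Serviceguard procedure.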

Best regards,

Kent M. Ostby

"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
