07-03-2007 03:50 AM
safety time check causes TOC and crash
I have a 2-node cluster with Serviceguard. When I reboot a node, it crashes the first time with a TOC, and the event is the following:
#0 0x211800 in Send_Monarch_TOC+0x70 ()
#1 0x431f70 in safety_time_check+0x170 ()
#2 0x15fab8 in per_spu_hardclock+0x340 ()
#3 0x1601cc in clock_int+0x94 ()
#4 0x1692ac in mp_ext_interrupt+0x3ec ()
#5 0x16783c in lpmc_handler+0x90c ()
#6 0x179afc in idle+0xcac () <--- Trap in Kernel mode
#7 0x1772b4 in swidle+0x28 ()
During the next startup it fails again for some reason (and then keeps crashing on every boot), and the event is the following:
Event selected is 0. It was a panic
#0 0x20d9a4 in panic+0x6c ()
#1 0x26308c in report_trap_or_int_and_panic+0x94 ()
#2 0x1666c8 in trap+0xef8 ()
#3 0x1689e8 in thandler+0xd24 ()
#4 0xb2858 in wsio_dev_node+0x40 () <--- Trap in Kernel mode
#5 0x1cf6e0 in wsio_get_node+0x38 ()
I think I have 2 separate problems here:
The first one is the TOC generated by safety_time_check. For that I tried to increase NODE_TIMEOUT, but that didn't help.
The second problem is the fact that this TOC causes an endless loop of crashes.
Thanks,
Anat
07-03-2007 02:32 PM
Re: safety time check causes TOC and crash
A couple of questions:
What version of SG?
What type of hardware?
Which patches are installed, both for the OS and those that depend on the SG version?
What is running on the box? Is there any special 3rd party hardware, or any 3rd party drivers in the kernel? What types of applications? What is the normal memory and CPU load on the box? If you have real-time or POSIX processes running, they could starve cmcld of CPU and keep it from doing its normal checks.
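To make the starvation point concrete, here is a toy Python model of a safety timer. It is only a sketch: the class, the function names and the tick counts are invented for illustration, and it does not reflect how Serviceguard or the HP-UX kernel actually implement safety_time_check. The idea is simply that a daemon must keep pushing a deadline forward, and a periodic clock tick checks whether the deadline has passed.

```python
# Toy model of a safety (watchdog) timer. This is NOT Serviceguard code;
# every name and number here is invented for illustration.

class SafetyTimer:
    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.deadline = timeout_ticks  # tick at which the timer expires

    def refresh(self, now):
        """Called by the cluster daemon each time it gets to run."""
        self.deadline = now + self.timeout_ticks

    def expired(self, now):
        """Checked from the periodic clock tick."""
        return now >= self.deadline


def first_expiry(timeout_ticks, daemon_runs_at, total_ticks):
    """Return the tick at which the timer expires, or None if it never does.

    daemon_runs_at is the set of ticks on which the daemon is scheduled.
    """
    timer = SafetyTimer(timeout_ticks)
    for now in range(total_ticks):
        if now in daemon_runs_at:
            timer.refresh(now)
        if timer.expired(now):
            return now
    return None


# Daemon scheduled every 2 ticks: the timer is always refreshed in time.
healthy = first_expiry(5, set(range(0, 20, 2)), 20)
# Daemon starved after tick 4: the last refresh sets the deadline to 4 + 5 = 9.
starved = first_expiry(5, {0, 2, 4}, 20)
print(healthy, starved)
```

In this model the healthy run never expires and the starved run expires at tick 9. A larger timeout only moves that tick later if the daemon never gets to run again.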
07-03-2007 07:57 PM
Re: safety time check causes TOC and crash
You may be able to get away with a reboot -q, but really this should only be used as a last resort as you are simulating a node failure rather than a graceful exit from the cluster so the other nodes have to re-form as if a failure had occurred.
You should *always* make sure Serviceguard is halted using cmhaltcl or cmhaltnode before issuing a reboot.
With regard to the second problem, this will probably require dump analysis, and you may need additional help from HP, so I would suggest you contact your local support organisation.
07-03-2007 08:56 PM
Re: safety time check causes TOC and crash
First, I'll give some missing information:
SG version: A.11.16.00
hardware version: HPux 11iv6
One problem we have is that we don't meet the minimum requirements of SG: we have only 1 NIC.
I tried what you suggested: I halted the cluster, and only then ran reboot -q,
but I still got a crash.
When I tried to analyze the crashes,
there were no events:
q4> trace event 0
Invalid event no ( 0 )
q4> examine panicstr using s
Error accessing memory address 0x0: Invalid argument.
07-03-2007 09:10 PM
Re: safety time check causes TOC and crash
This is confusing. Was that 1.6? 11.22 is no longer supported. But your stack trace doesn't look like an IPF box.
07-03-2007 09:17 PM
Re: safety time check causes TOC and crash
HP-UX B.11.11 U 9000/800
07-04-2007 09:00 PM
Re: safety time check causes TOC and crash
07-07-2007 05:25 PM
Re: safety time check causes TOC and crash
The INDEX file for the crash with safety_time_check is the following:
comment savecrash crash dump INDEX file
version 2
hostname daisy
modelname 9000/800/rp3440
panic SafetyTimer expired, isr.ior = 0'9227fff9.c0000000'e0391030
dumptime 1183329139 Sun Jul 1 16:32:19 MDT 2007
savetime 1183329492 Sun Jul 1 16:38:12 MDT 2007
release @(#) $Revision: vmunix: vw: -proj selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108' Wed Nov 8 19:24:56 PST 2000 $
memsize 4292870144
chunksize 67108864
module /stand/vmunix vmunix 19347752 1077864474
module /stand/dlkm/mod.d/MY_MOD_0 1713536 3034328821
image image.1.1 0x0000000000000000 0x0000000003ff9000 0x0000000000000000 0x0000000000004b6f 1621674512
image image.1.2 0x0000000000000000 0x0000000003ff9000 0x0000000000004b70 0x0000000000008b67 1541992192
image image.1.3 0x0000000000000000 0x0000000003ff5000 0x0000000000008b68 0x00000000000304d7 3715982170
image image.1.4 0x0000000000000000 0x0000000003fff000 0x00000000000304d8 0x0000000000038457 1863461765
image image.1.5 0x0000000000000000 0x0000000003bdf000 0x0000000000038458 0x000000000003ffff 4053057157
image image.2.1 0x0000000000000000 0x0000000000540000 0x0000000004040000 0x00000000040ffdff 2206639257
The INDEX file for the crash that followed is:
comment savecrash crash dump INDEX file
version 2
hostname daisy
modelname 9000/800/rp3440
panic Data page fault
dumptime 1183329557 Sun Jul 1 16:39:17 MDT 2007
savetime 1183329897 Sun Jul 1 16:44:57 MDT 2007
release @(#) $Revision: vmunix: vw: -proj selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108' Wed Nov 8 19:24:56 PST 2000 $
memsize 4292870144
chunksize 67108864
module /stand/vmunix vmunix 19347752 1077864474
image image.1.1 0x0000000000000000 0x0000000003fff000 0x0000000000000000 0x0000000000004b77 1800458738
image image.1.2 0x0000000000000000 0x0000000003ff9000 0x0000000000004b78 0x0000000000008b6f 95519830
image image.1.3 0x0000000000000000 0x0000000003ff8000 0x0000000000008b70 0x0000000000030357 2358701351
image image.1.4 0x0000000000000000 0x0000000003ffb000 0x0000000000030358 0x000000000003834f 2641749233
image image.1.5 0x0000000000000000 0x0000000003cb3000 0x0000000000038350 0x000000000003ffff 3958451556
image image.2.1 0x0000000000000000 0x000000000048f000 0x0000000004040000 0x00000000040ffdff 758875273
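For anyone comparing the two INDEX files, a small Python sketch like the one below can pull out the panic strings and the timing. The parser is an assumption on my part (it just treats the first word of each line as the key, which is not an official description of the INDEX format), and the helper names parse_index and epoch are made up. The inputs are abridged copies of the two files above.

```python
def parse_index(text):
    """Parse a savecrash INDEX file into a dict.

    The first word of each line is treated as the key; keys that repeat
    (module, image) are collected into lists of split fields.
    """
    info = {"module": [], "image": []}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if not key:
            continue
        if key in ("module", "image"):
            info[key].append(value.split())
        else:
            info[key] = value.strip()
    return info


def epoch(info, field):
    """Return the leading integer (seconds since the epoch) of dumptime or savetime."""
    return int(info[field].split()[0])


# Abridged copies of the two INDEX files quoted in this post.
first = parse_index("""\
panic SafetyTimer expired, isr.ior = 0'9227fff9.c0000000'e0391030
dumptime 1183329139 Sun Jul 1 16:32:19 MDT 2007
savetime 1183329492 Sun Jul 1 16:38:12 MDT 2007
module /stand/vmunix vmunix 19347752 1077864474
module /stand/dlkm/mod.d/MY_MOD_0 1713536 3034328821
""")
second = parse_index("""\
panic Data page fault
dumptime 1183329557 Sun Jul 1 16:39:17 MDT 2007
savetime 1183329897 Sun Jul 1 16:44:57 MDT 2007
module /stand/vmunix vmunix 19347752 1077864474
""")

gap_between_dumps = epoch(second, "dumptime") - epoch(first, "dumptime")
gap_after_save = epoch(second, "dumptime") - epoch(first, "savetime")
first_modules = [m[0] for m in first["module"]]
second_modules = [m[0] for m in second["module"]]

print(first["panic"])
print(second["panic"])
print(gap_between_dumps, gap_after_save)
print(first_modules, second_modules)
```

With these inputs the second dump was taken 418 seconds after the first, and 65 seconds after the first dump's savetime. The module /stand/dlkm/mod.d/MY_MOD_0 is listed only in the first INDEX file.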
07-08-2007 03:10 AM
Re: safety time check causes TOC and crash
Confirm that you have enough dump space configured to capture the full dump.
The "crashconf" output shows the configured dump devices. Compare it with "swapinfo -tam"; ideally all the swap devices should be configured as dump devices too, assuming you don't have any device/LV configured specifically as a dump-only device.
What is the physical memory on the system (I am not sure if it is 4GB)? What is the average memory utilization when the server crashes?
Also try "savecrash -vr DIR" if you want to create the crash dump for analysis.
The first crash is an SG TOC and the second one is likely related to memory.
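On the memory question, the memsize field of the INDEX files posted earlier can be converted with a few lines of Python. This is a sketch that assumes memsize is the physical memory size in bytes. The dump_space_ok helper is hypothetical, and its input is a figure you would have to read off your own crashconf output.

```python
# memsize and chunksize as reported in the INDEX files earlier in the thread.
MEMSIZE_BYTES = 4292870144
CHUNKSIZE_BYTES = 67108864

MIB = 1024 * 1024

memsize_mib = MEMSIZE_BYTES / MIB
shortfall_mib = 4 * 1024 - memsize_mib  # distance from a full 4 GiB
chunksize_mib = CHUNKSIZE_BYTES / MIB


def dump_space_ok(dump_space_kb, memsize_bytes=MEMSIZE_BYTES):
    """Hypothetical helper: True if the dump space can hold a full memory image.

    dump_space_kb is supplied by the reader, for example from crashconf output.
    """
    return dump_space_kb * 1024 >= memsize_bytes


print(memsize_mib, shortfall_mib, chunksize_mib)
print(dump_space_ok(4 * 1024 * 1024), dump_space_ok(2 * 1024 * 1024))
```

Under that assumption the box reports 4094 MiB, which is 2 MiB short of 4 GiB. The check is deliberately conservative: it compares against all of memory, even though a dump does not necessarily contain every page.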