Operating System - HP-UX

Tim Medford
Valued Contributor

Urgent help needed - system crash

Our production HP-UX server just rebooted itself for no apparent reason. This box was put into the production environment one week ago and had been behaving perfectly until now.

It is an rx3600 running 11.23 (September 2006). Nothing was written to syslog at all, and I cannot find a crash dump in /var/adm/crash. The /etc/rc.config.d/savecrash file has SAVECRASH set to 0.

I have never had this happen before; can someone please advise? I tried following some Q4 instructions I found here, but since there's no crash file I'm not sure how to proceed.


In the iLO server log I can see these messages:


192 OSA 2 0x2146D319F0021160 FF0F066F001F0300 BOOT_FINISHED
27 Aug 2007 18:37:36
191 SFW 0 2 0x40801CBB00E01140 0000000000000000 BOOT_SWITCH_INSECURE_MODE
27 Aug 2007 18:34:36
190 SFW 0 0 0x548002C500E01120 0000000000000000 BOOT_REBOOT
27 Aug 2007 18:33:54
189 BMC 2 0x2046D31905021110 FFFF027000120300 SOFT_RESET
27 Aug 2007 18:33:41
188 HPUX 2 *3 0x7F80033702E010F0 00000000000AD700 HP-UX_CRASHDUMP_STARTED
27 Aug 2007 18:32:25
187 HPUX 2 *5 0xBF80033802E010D0 000000000002B100 HP-UX_HEX_FAULT_CODE
27 Aug 2007 18:32:23
186 OSA *7 0x2146D318B30210C0 FF0F016F00200300 OS_CRITICAL_SHUTDOWN
27 Aug 2007 18:32:19
185 SFW 6 *7 0xF480007906E010A0 0000000000000006 INIT_INITIATED
27 Aug 2007 18:32:17
184 SFW *7 0xC146D318B1021090 003FA36F00130300 INIT_INTERRUPT_INITIATED
27 Aug 2007 18:32:17
183 SFW 4 *7 0xF480007904E01070 0000000000000004 INIT_INITIATED
27 Aug 2007 18:32:17
182 SFW *7 0xC146D318B1021060 003FA36F00130300 INIT_INTERRUPT_INITIATED
27 Aug 2007 18:32:17
181 SFW 0 *7 0xF480007900E01040 0000000000000000 INIT_INITIATED
27 Aug 2007 18:32:17
180 SFW *7 0xC146D318B0021030 003FA36F00130300 INIT_INTERRUPT_INITIATED
27 Aug 2007 18:32:16



The Console log in the iLO also has this information:



Console Login: Calling function e000000000ff5680 for Shutdown State 8 type 0x2

Stored message buffer up to panic:
820175
4- 7 00000000_04378960 00000000_00000000 00000000_00000000 00000000_00000000
8-11 e0000001_b506ad6c 00000000_000005ac 00000000_00000001 e0000001_e98f5d48
12-15 e0000001_01197f40 9fffffff_7f7e8000 e0000001_e8f0cc40 e0000001_b506ab74
16-19 e0000001_b506ab30 e0000001_01197f40 e0000001_00a37b48 e0000001_e2fc33e8
20-23 00000000_00000083 00000000_00000007 e0000001_e2fc34c0 e0000001_01198000
24-27 00000000_00000631 e0000000_f00000d8 e0000001_b506ab74 e0000001_b9110508
28-31 e0000000_f00000d8 00000000_00000631 e0000001_b9110518 e0000000_f0000000

br_0-3 e0000000_005f0780 e0000000_005a5bf0 00000000_00000000 00000000_00000000
br_4-7 00000000_00000000 00000000_00000000 e0000000_005c1180 e0000001_080005c0

5 44 3 3 2 2 2 1 0 00 0
pr bits = -------------0-87---------7----2---8-6--3-----7-------9---54---0
pr value = 00058021_14820231

k0 00000000_00001b3c rsc 00000000_00000013 fpsr 0009804c_0270033f
k1 00000000_0029a288 bsp e0000001_011846f0 unat ffffffff_ffffffff
k2 00000000_00000000 bspstore e0000001_011844e0 lc 00000000_0000028e
k3 00000000_00000000 bsp_base 00000000_00000000
ppdp e0000001_01198000 dirty 00000000_00000210 csd 00000000_00000000
ktp e0000001_afd100c0 rnat 00000000_00000000 ssd 00000000_00000000
ksp 9fffffff_7f7e8882 ccv 00000000_00000000
sv e0000001_08003000 pfs 00000000_00001cc1 ec 00

iip e0000000_005f07c0/0 ifs 80000000_00001cc1 [i]psr 00001010_086ae01a
iipa e0000000_005f07b0 tpr 00000000_000000c0 isr 00000002_00000010
ibe sddiimic rtldsdpsddd p i mmaub
andrisdadtcspl tbpbiipphlt kic hlcpe
psr bits = ------------------01000000010000----10000110101-111-------01101-

e snirsn
deioirsparwx < code >
isr bits = --------------------000000000010--------000000000000000000010000

arg0=e0000000_005f07c0 (cr.iip)
arg1=00000000_00000000 (unknown)

Dirty Registers:
loc0 : e0000001_e98f5b80 e0000001_eb1f00c0 00000000_00000d9f e0000000_005a5dd0
loc4 : 00000000_0000028e 00058021_14820133 e0000001_01197f60 e0000001_01197f10
loc8 : e0000001_eb1f0225 e0000001_006e4510 e0000001_00bce920 e0000001_b506ac74
locc : e0000001_b506ac40 e0000001_b506ab64 e0000001_b506ac68 e0000001_b506ac6c
loc10: e0000001_b506ab68 e0000001_b506ac8c e0000001_b506ab58 e0000001_b506ab60
loc14: 00000000_00000000 00000000_00000000 e0000001_eb1f0224 e0000001_b506ab6c
loc18: e0000001_eb1f0227 e0000001_eb1f0226 e0000001_b506ae7c e0000001_eb1f0218
loc1c: e0000001_b506abf6 00000000_0000ffff e0000001_b506ac0c e0000001_b506ae78
loc20: e0000001_b506ac08 e0000001_b506abb8 00000000_00000112 e0000001_b506ae90
loc24: 00000000_e0880aee e0000001_b506ac70 e0000001_b506ab28 e0000001_b506acc0
loc28: e0000001_01197f70 00000000_00000000 00000000_000000ff e0000000_f0000058
loc2c: e0000001_eb1f0204 00000000_00008000 00000000_1c7ad1a9 e0000001_eb1f00e0
loc30: e0000001_eb1f00d8 00000000_1c7ad1aa e0000001_b506aba8 e0000001_006e22a0
loc34: 00000000_00000000 ffffffff_ffffff7f e0000001_b506acc4 00000000_000000cd
loc38: 00000000_00001125
out0 : e0000001_b506ab28 e0000001_eb1f0204 e0000001_eb1f0218 e0000001_b506ad10
out4 : 00000000_00000000 00000000_000005ac 00000000_00000001 e0000001_eb1f0234

System Panic:

panic: Bad News!

Stack Trace:
IP Function Name
0xe000000000c96ca0 bad_news+0x950
0xe000000000c95cc0 bubbledown
0xe0000000005f07c0 tcp_rput+0x22a0
0xe0000000005a5dd0 ip_wput_ioctl+0x5d0
0xe000000000836ac0 soft_intr_handler+0x220
0xe000000000835250 external_interrupt+0x3b0
0xe000000000c95cc0 bubbledown
0x0000000000000000
0x00000000000000c0 +0xc0
End of Stack Trace

linkstamp: Wed Aug 01 09:37:23 PDT 2007
_release_version: @(#) $Revision: vmunix: B11.23_LR FLAVOR=perf Fri Aug 29$

NOT sync'ing disks (on the ICS) (0 buffers to flush):
0 buffers not flushed
9 buffers still dirty

*** A system crash has occurred. (See the above messages for details.)
*** The system is now preparing to dump physical memory to disk, for use
*** in debugging the crash.

*** Cannot dump with compression because there are too few processors



*** Dumping without compression
*** The dump will be a SELECTIVE dump: 3452 of 32739 megabytes.
*** To change this dump type, press any key within 10 seconds.
*** Proceeding with selective dump.



Primary Dump Header Location :
Device : /dev/rdsk/c3t0d0 offset: 9689972.
*** The dump may be aborted at any time by pressing ESC.
***********************************************************dsk/c3t0d0 )
* ROM Version : 02.03
* ROM Date : 11/29/2006
* BMC Version : 05.14
***********************************************************
0 0 0x0015B2 0x0000000025228945 boot time event
1 0 0x0000A4 0x0000000000000000 start memory configuration
2 0 0x001CBB 0x0000000000000000 System set to insecure mode
Don Morris_1
Honored Contributor

Definitely enable saving of system crash dumps, since having the dump would be useful if this is a new problem. (Edit the /etc/rc.config.d/savecrash file or use crashconf; see man 1M crashconf.)
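For reference, turning the boot-time save step back on is a one-line change in /etc/rc.config.d/savecrash. This is a sketch that assumes the stock variable name SAVECRASH (the same one quoted later in this thread) and that the rest of the file stays at its defaults; the setting is read at the next boot.

```shell
# /etc/rc.config.d/savecrash (excerpt; other settings left unchanged)
# 1 = at boot, run savecrash to copy a panic dump from the dump device
#     into the crash directory; 0 = skip that step.
SAVECRASH=1
```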

Most of the hits on this panic stack and string point to JAGaf71680, which should have been resolved by PHNE_34671. Since that patch has a Warning against it, though, I would think you'd want PHNE_35766. That's a July 2007 patch, so I expect you don't have it. If it were me, and I could take the system down again (having just rebooted), I'd fix the crash dump configuration, apply the patch, and watch for the problem to recur. If it does, contact Support and give them the crash dump.
Tim Medford
Valued Contributor

Thanks Don,

I disabled savecrash on the advice of another forum post. When I set up the boot disk, I created separate swap and dump volumes, and I thought this would just let the system come up faster without having to save the crash dump during bootup?

Other post info:

"If you configure separate dump and swap then modify /etc/rc.config.d/savecrash and set SAVECRASH=0 because there is no need to compress and move the dump to /var before swap can be used."
Patrick Wallek
Honored Contributor

In your case, that is correct. What you can do now is run 'savecrash' manually to copy the crash dump from the dump device to a filesystem. See the 'savecrash' man page for details on the command.

As far as your crash goes, does /etc/shutdownlog show anything?
A. Clay Stephenson
Acclaimed Contributor

I've seen these symptoms, and they were fixed by PHNE_34671 and PHNE_33732, but these patches have been superseded by PHNE_35766 and PHNE_34788 respectively.
If it ain't broke, I can fix that.
A. Clay Stephenson
Acclaimed Contributor

Notice that your system DID dump; it just didn't compress the image and save it in /var before swap was activated. You probably read one of my postings about separate dump and swap areas and not having to compress/save the dump image. In any event, I'm all but certain the two patches I listed will fix you.
Tim Medford
Valued Contributor

There is nothing in /etc/shutdownlog other than a record of the last normal boot on the 19th.

I read through the savecrash man page, but I'm not exactly sure which options to choose. This is a production system and I cannot do anything destructive right now.

Is it OK to run savecrash now? My root disk has separate swap and dump volumes.

Steven E. Protter
Exalted Contributor

Shalom,

You can press the TOC (Transfer of Control) button on your system to force a crash dump.

Having crash dumps enabled lets you do Q4 analysis and lets HP do what A. Clay did for you: name the patches you need to install.

The tricky part is you need the system to run long enough to get the patch installed. I suggest not letting users on the box or starting applications until the patches are in.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Don Morris_1
Honored Contributor

With the dump ensconced on a non-swap dump device, savecrash shouldn't affect the system other than consuming some CPU and I/O resources (same as any process moving data around). As far as I can tell, no special arguments should be needed: you're reading from the configured dump device, and /var/adm/crash should be fine as the target directory. Use "-v" if you want to see everything it does; worst case, you might have to use "-r" to resave if things aren't as you wish.
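As a sketch of that manual run, assuming you are root on the HP-UX host and the dump is still on the configured (non-swap) dump device, only the -v and -r options discussed above are used, with /var/adm/crash as the target directory:

```shell
# Copy the panic dump from the configured dump device into /var/adm/crash.
# -v prints progress messages while savecrash works.
savecrash -v /var/adm/crash

# savecrash normally writes each dump into its own numbered subdirectory
# (crash.0, crash.1, ...); list the directory to confirm the copy landed.
ls -l /var/adm/crash

# Only if the saved copy is not what you wanted: -r resaves the dump
# that is still sitting on the dump device.
# savecrash -r -v /var/adm/crash
```

Because the dump device is separate from swap, nothing overwrites the dump while the system runs, so this can be done at a quiet moment rather than immediately at boot.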

Then you can get the dump off to Support for further analysis. I'd still recommend the network patch(es) since the one I mentioned covered many instances of this class of problem.

(I don't want to disagree with SEP here, but I wouldn't TOC the box at this point, as you'd overwrite the dump of the problem with the TOC dump. Just get the dump written to /var/adm/crash and off the dump device as soon as possible.)
Tim Medford
Valued Contributor

Thanks everyone for the replies. I have the crash dump created now.

There seem to be alternative ways to run the Q4 analysis. It looks like there's a kwdb command now for Itanium on 11.23:

kwdb -q4
Bill Hassell
Honored Contributor

And a note about logs: there won't be any, with the possible exception of a one-liner in /etc/shutdownlog. When the kernel panics, it stops instantly and begins the dump steps. There are no filesystems, no mountpoints, nothing. A crash dump is required to determine the reason for a crash. By having a separate dump area, you can transfer the dump to a directory or tape after you reboot.


Bill Hassell, sysadmin
Tim Medford
Valued Contributor

Thanks Bill. Yes, /etc/shutdownlog was no help at all "14:04 Mon Aug 27 2007. Reboot after panic: Bad News!".

I finally got the Q4 analysis completed and sent the info to HP. There were several errors and warnings during the analysis; I hope that's normal, since I haven't run it before.

I'll attach the files in case anyone is extremely cored and wants to look at them before HP gets back to me.

Thanks everyone for the help.
whiteknight
Honored Contributor

Tim,

This is a known ARPA-related issue.

PHNE_35766 11.23 cumulative ARPA Transport patch

These are the required patches plus dependencies:
========================================
PHCO_35524 LVM commands patch
PHKL_31500 Sept04 base patch
PHKL_36244 LVM Cumulative Patch
PHNE_34788 Cumulative STREAMS Patch
PHNE_35766 cumulative ARPA Transport patch


WK
Problem never ends, you must know how to fix it
tkc
Esteemed Contributor

If you want to verify whether this is related to a hardware issue, send the most recent MCA file saved in the /var/tombstones directory to HP for analysis.
Tim Medford
Valued Contributor

HP has analyzed the crash dump and confirmed that the cause was a known bug, fixed in PHNE_35766 s700_800 11.23 cumulative ARPA Transport patch (as all of you have been telling me!)

Thank you everyone for your help in resolving this matter.

Tim
Tim Medford
Valued Contributor

Resolved