Operating System - HP-UX

Erik Bond
Occasional Contributor

D380 Crashes/Undetermined

I have a D380 system that keeps going down every other day. I have checked STM for PDT entries, syslog, OLDsyslog, dmesg, etc., and there is nothing in the ts99/ts98 files either.
The shutdown log only has this entry: 12:16 Sun Jul 18 2004. Reboot after panic: , isr.ior = 0'9227ffff.c0000000'e

Memory in 5a/5b was recently replaced as a precaution after a multi-bit error. The unit was plugged into a UPS; to rule out the UPS, we removed the server from it and connected it directly to facility power through a surge protector. Can anyone suggest what my problem may be? This is HP-UX 11.0. All files were checked (STM, dmesg, ts99, shutdownlog, etc.) and nothing shows up as a problem. There was one message in OLDsyslog: the very last entry stated "going down on signal 15", and that was it. Thanks,
Jeff Schussele
Honored Contributor

Re: D380 Crashes/Undetermined

Hi,

You should have tombstones for these events.
Look in /var/tombstones for files named ts99, ts98, etc. The most recent will be ts99, then ts98, and so on. HP can decode these.
I suspect you're getting HPMCs which would indicate a serious HW problem.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Chris Wilshaw
Honored Contributor

Re: D380 Crashes/Undetermined

It could still be a hardware issue.

Check /var/tombstones/ts99

If there have been no problems detected, you'll see:

------- Processor 0 HPMC Information - PDC Version: 42.19 ------

* * * No valid timestamp * * *


No HPMC chassis codes logged


If there are chassis codes, then you'll need to pass the details to HP.
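
The check above is easy to script if you have several tombstones to go through. This is only a rough sketch in Python, assuming every tombstone uses the same layout as the sample above. The function name `sections_with_hpmc` is made up for illustration, and the marker string is simply copied from that sample rather than taken from HP documentation.

```python
# Marker copied from the clean tombstone sample above (an assumption that
# every PDC version prints it in exactly this form).
NO_HPMC_MARKER = "No HPMC chassis codes logged"


def sections_with_hpmc(tombstone_text):
    """Return the headers of processor sections that lack the clean marker."""
    suspicious = []
    header = None   # header of the section currently being read
    clean = False   # True once the clean marker is seen in that section
    for line in tombstone_text.splitlines():
        if "HPMC Information" in line:
            # A new processor section starts: close out the previous one.
            if header is not None and not clean:
                suspicious.append(header)
            header = line.strip(" -")
            clean = False
        elif NO_HPMC_MARKER in line:
            clean = True
    if header is not None and not clean:
        suspicious.append(header)
    return suspicious


sample = """\
------- Processor 0 HPMC Information - PDC Version: 42.19 ------

* * * No valid timestamp * * *

No HPMC chassis codes logged
"""

# An empty list means no section looked like it had chassis codes logged.
print(sections_with_hpmc(sample))
```

To run it against a real file, read the tombstone first, for example `sections_with_hpmc(open("/var/tombstones/ts99").read())`. Anything it reports still needs to go to HP for decoding.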

If there's no hardware error, then it's a kernel panic, so you'll need to look at your patching levels. Again, HP will be able to assist.
Erik Bond
Occasional Contributor

Re: D380 Crashes/Undetermined

We've already had the tombstones checked. All entries for all 3 crashes are clean; there are no hex codes in the tombstones. Every aspect of the machine was checked, and neither HP nor I see anything out of sorts. But the unit still goes down every so often.
Patrick Wallek
Honored Contributor

Re: D380 Crashes/Undetermined

If the machine is panicking, then it should be generating a crash dump when it comes back up. Check /var/adm/crash for the crash dump files. Verify that your /var file system has at least as much free space as you have RAM so that you can obtain the full crash dump. Also, if your primary swap space is your dump space, make sure it is at least as large as your RAM.

The /var/adm/crash files are really what needs to be analyzed here. If HP hasn't asked about them, you need a different tech helping you out.
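
The free-space rule of thumb above can be checked with a few lines of Python, wherever Python happens to be available. This is a sketch only: the helper names `dump_shortfall` and `room_for_full_dump` are hypothetical, and the RAM size has to be supplied by hand because the script does not query the hardware.

```python
import shutil


def dump_shortfall(free_bytes, ram_bytes):
    """Bytes still missing before a full dump of ram_bytes would fit (0 if it fits)."""
    return max(0, ram_bytes - free_bytes)


def room_for_full_dump(crash_dir, ram_bytes):
    """True if the file system holding crash_dir has at least ram_bytes free."""
    free_bytes = shutil.disk_usage(crash_dir).free
    return dump_shortfall(free_bytes, ram_bytes) == 0
```

For example, `room_for_full_dump("/var/adm/crash", 1536 * 1024 * 1024)` would test a machine with 1.5 GB of RAM. It says nothing about the dump device itself, which still has to be checked separately.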
Kent Ostby
Honored Contributor

Re: D380 Crashes/Undetermined

Erik ....

isr.ior is generally one of two things:

#1) HPMC -- I'd be interested to have you post one of the ts99 files, since another set of eyes will occasionally find something.

#2) ServiceGuard TOC -- Are you running ServiceGuard? If so, take a look at the end of the OLDsyslog.log file on the machine that crashed and post the last 40 lines or so here, so we can see what was going on.
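
If it helps, pulling the last 40 lines out of a log can be done like this. It is a minimal sketch in Python, not anything ServiceGuard-specific; the function name `tail` and the default of 40 are just illustrative choices.

```python
from collections import deque


def tail(path, count=40):
    """Return the last `count` lines of a text file as a list of strings."""
    # errors="replace" keeps the read going if the log holds odd bytes.
    with open(path, errors="replace") as handle:
        # A bounded deque only ever keeps the newest `count` lines in memory.
        return list(deque(handle, maxlen=count))
```

On the crashed node this would be called as `print("".join(tail("/var/adm/syslog/OLDsyslog.log")))`, assuming the log is in its usual place.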

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Steven E. Protter
Exalted Contributor

Re: D380 Crashes/Undetermined

Check the UPS batteries.

We had a box that started going down with minor power fluctuations because we didn't know the batteries were 15 months past HP's life cycle.

We still have a D380 in production for a few more weeks believe it or not.

I'd also set up for crash dumps.

Set the first variable in /etc/rc.config.d/savecrash to 1.

If it already is, check /var/adm/crash for crash dumps.

Do a q4 analysis and have HP tell you what patch is missing.

q4 docs attached. There is an HP doc number at the top which you can search ITRC for. My document is a summary, because HP doesn't like me posting docs that are only available to service contract users. The original is somewhat better.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Erik Bond
Occasional Contributor

Re: D380 Crashes/Undetermined

here is the dump from Sunday,

BSA.vistar>pwd
/var/adm/crash/crash.0
BSA.vistar>ll
total 884944
-rw-r--r-- 1 root root 1289 Jul 18 12:16 INDEX
-rw-r--r-- 1 root root 67104768 Jul 18 12:15 image.1.1
-rw-r--r-- 1 root root 67080192 Jul 18 12:15 image.1.2
-rw-r--r-- 1 root root 67088384 Jul 18 12:15 image.1.3
-rw-r--r-- 1 root root 67080192 Jul 18 12:16 image.1.4
-rw-r--r-- 1 root root 67088384 Jul 18 12:16 image.1.5
-rw-r--r-- 1 root root 67080192 Jul 18 12:16 image.1.6
-rw-r--r-- 1 root root 28741632 Jul 18 12:16 image.1.7
-rw-r--r-- 1 root root 21784632 Jul 18 12:14 vmunix

The only readable file is INDEX:

BSA.vistar>cat IN*
comment savecrash crash dump INDEX file
version 2
hostname select01
modelname 9000/800/R380
panic , isr.ior = 0'9227ffff.c0000000'e8331030
dumptime 1090166521 Sun Jul 18 12:02:01 EDT 2004
savetime 1090167286 Sun Jul 18 12:14:46 EDT 2004
release @(#) $Revision: vmunix: vw: -proj selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108' Wed Nov 8 19:05:38 PST 2000 $
memsize 1610612736
chunksize 67108864
module /stand/vmunix vmunix 21784632 4057314133
image image.1.1 0x0000000000000000 0x0000000003fff000 0x0000000000000000 0x00000000000049c7 4202223366
image image.1.2 0x0000000000000000 0x0000000003ff9000 0x00000000000049c8 0x0000000000017917 2797385876
image image.1.3 0x0000000000000000 0x0000000003ffb000 0x0000000000017918 0x000000000002b447 3328354480
image image.1.4 0x0000000000000000 0x0000000003ff9000 0x000000000002b448 0x000000000002f43f 3632271723
image image.1.5 0x0000000000000000 0x0000000003ffb000 0x000000000002f440 0x0000000000046d8f 3254409714
image image.1.6 0x0000000000000000 0x0000000003ff9000 0x0000000000046d90 0x000000000005e497 1016631242
image image.1.7 0x0000000000000000 0x0000000001b69000 0x000000000005e498 0x000000000005ffff 3488816613
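
A few numbers can be pulled out of the INDEX file above without q4. The sketch below is Python and rests on an assumption: that the third column of each `image` line is the size of that image file in bytes. That is not from HP documentation, but the values do match the sizes in the `ll` listing. The function name `summarize_index` is made up for illustration.

```python
# Lines copied from the INDEX file above (only the fields used here).
INDEX_SAMPLE = """\
dumptime 1090166521 Sun Jul 18 12:02:01 EDT 2004
savetime 1090167286 Sun Jul 18 12:14:46 EDT 2004
memsize 1610612736
image image.1.1 0x0000000000000000 0x0000000003fff000 0x0000000000000000 0x00000000000049c7 4202223366
image image.1.2 0x0000000000000000 0x0000000003ff9000 0x00000000000049c8 0x0000000000017917 2797385876
image image.1.3 0x0000000000000000 0x0000000003ffb000 0x0000000000017918 0x000000000002b447 3328354480
image image.1.4 0x0000000000000000 0x0000000003ff9000 0x000000000002b448 0x000000000002f43f 3632271723
image image.1.5 0x0000000000000000 0x0000000003ffb000 0x000000000002f440 0x0000000000046d8f 3254409714
image image.1.6 0x0000000000000000 0x0000000003ff9000 0x0000000000046d90 0x000000000005e497 1016631242
image image.1.7 0x0000000000000000 0x0000000001b69000 0x000000000005e498 0x000000000005ffff 3488816613
"""


def summarize_index(text):
    """Collect dumptime, savetime, memsize and the total size of the image files."""
    summary = {"images": 0, "image_bytes": 0}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        if key in ("dumptime", "savetime", "memsize") and len(fields) >= 2:
            summary[key] = int(fields[1])
        elif key == "image" and len(fields) >= 4:
            # Assumed: fields[3] is the image size in bytes, written in hex.
            summary["image_bytes"] += int(fields[3], 16)
            summary["images"] += 1
    return summary


info = summarize_index(INDEX_SAMPLE)
print("image files:", info["images"])
print("bytes saved:", info["image_bytes"], "of memsize", info["memsize"])
print("seconds between dump and save:", info["savetime"] - info["dumptime"])
```

Read that way, the seven image files add up to well under memsize, which would be consistent with a selective dump rather than a full one. That interpretation is a guess from the numbers; q4 or HP would be needed to say what the dump actually contains.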
Kent Ostby
Honored Contributor

Re: D380 Crashes/Undetermined

Erik --

Again, you would have to run q4 to get more details from these files, or log a call with HP.

Do you have ServiceGuard on the box?

Can you type the following command and post the results here as well:

grep HPMC /var/tombstones/ts99

Thanks,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Erik Bond
Occasional Contributor

Re: D380 Crashes/Undetermined

Hey Kent,
you can call my cell, 614-506-3374.
Erik.

I can explain a little better.

Erik
Mohanasundaram_1
Honored Contributor

Re: D380 Crashes/Undetermined

Hi Erik,

The D-class systems had a problem with the DC switch on the front panel. This could cause shutdowns like the ones you describe.

The crash analysis should reveal this. If there is no HPMC and if it has nothing to do with your applications, give this a shot.

Check with your HP team about this switch issue. The solution will be to replace the DC switch kit.

Cheers,
Mohan.
Attitude, Not aptitude, determines your altitude
Mohanasundaram_1
Honored Contributor

Re: D380 Crashes/Undetermined

Also check if anything specific is run when the system goes down.

Maybe some batch job in cron or some user initiated application process. If your crash is at a particular time of the day these things may apply.

Crashes without an HPMC can also be due to inadequate patching on the OS. Do you have SCSI, LVM, LAN and other I/O-related patches up to date? If not, this is a good time to consider doing that.

Attitude, Not aptitude, determines your altitude