Operating System - HP-UX

 
Marcos_9
Occasional Advisor

Machine rebooting itself

I have an L1000 that is having some trouble. It keeps filling /var/adm/crash with crash.xx directories. When the filesystem fills up, the machine reboots itself again and again.

It also has the Fault LED flashing.

Does this mean a hardware error?

How can I check it?

When accessing the machine through the web console, I get two system alerts at startup:
******** SYSTEM ALERT *******
SYSTEM NAME: uninitialized
DATE: 08/01/2002 TIME: 09:55:47
ALERT LEVEL: 7 = reserved

REASON FOR ALERT
SOURCE: 0 = unknown, no source stated
SOURCE DETAIL: 0 = unknown, no source stated SOURCE ID: FF
PROBLEN DETAIL: 0 = no problem detail
LEDs:  RUN  ATTENTION  FAULT  REMOTE  POWER
       ON   FLASH      FLASH  OFF     ON
LED State: System Running. Unexpected reboot. Non-critical Error Detected
Check Chassis and Console Logs for error messages.

0x0000207000FF6292 00000000 00000000 - type 0 = Data Field Unused
0x5800287000FF6292 00006607 0109372F - type 11 = Timestamp 08/01/2002 09:55:47
A: ack read of this entry - X: Disable all future alert messages.
-> Choice


After acknowledging that one, I get this one:

******** SYSTEM ALERT *******
SYSTEM NAME: uninitialized
DATE: 08/01/2002 TIME: 09:57:47
ALERT LEVEL: 3 = System blocked waiting for operator input

REASON FOR ALERT
SOURCE: 1 = processor
SOURCE DETAIL: 1 = processor general SOURCE ID: 0
PROBLEN DETAIL: 0 = no problem detail
LEDs:  RUN  ATTENTION  FAULT  REMOTE  POWER
       ON   FLASH      FLASH  OFF     ON
LED State: Unrecognized state - Refer to LED Decoder on your Support CD.

0xF8E000301100E000 00000000 0000E000 - type 31 = legacy PA HEX chassis-code
0xF8E000301100E000 00006607 01092F0A - type 11 = Timestamp 08/01/2002 09:57:47
A: ack read of this entry - X: Disable all future alert messages.
-> Choice

After I acknowledge this alert, the machine reboots itself. If I'm not connected to the web console, the machine does not reboot itself, and after deleting /var/adm/crash/crash.* the system remains stable for a few hours.

How can I find out what the error is and how to solve it?
I have tried to analyze the crash with q4, but it does not seem to be installed on this machine.

Thanks in advance.


Bill McNAMARA_1
Honored Contributor

Re: Machine rebooting itself

What is in /etc/shutdownlog?

You can safely delete the old crash dumps (for example, link crash.1 to /dev/null) so that only the last one is saved. I'd assume all the panics are the same.
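
As a rough sketch of the cleanup described above, the following POSIX shell function only lists the crash.N directories that are older than the newest one, so the list can be reviewed before anything is removed. The function name old_crash_dirs is made up for illustration, and it assumes the dumps are named crash.N under /var/adm/crash as in the original post.

```shell
# List every crash.N directory except the highest-numbered (newest) one.
# Dry run only: nothing is deleted. The function name is hypothetical.
old_crash_dirs() {
    dir=${1:-/var/adm/crash}
    for d in "$dir"/crash.[0-9]*; do
        [ -d "$d" ] || continue
        num=${d##*crash.}
        # Skip names such as crash.1.old that are not purely numeric.
        case $num in
            *[!0-9]*) continue ;;
        esac
        echo "$num $d"
    done |
        sort -n |        # order numerically by dump number
        sed '$d' |       # drop the last line, i.e. the newest dump
        cut -d' ' -f2-   # print only the path
}
```

Running old_crash_dirs with no argument prints the candidates under /var/adm/crash; once the list looks right, those directories can be removed by hand with rm -r.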

q4 would most likely be at /usr/contrib/bin/q4 if installed. Do you have a tombstone in /var/tombstones?

Q4 should be installed with
HPUXEng64RT.OS-Core.Q4,
UXCoreMedia.OS-Core.Q4
and/or
QPK1100.PHCO_20262.Q4.
You could install the SUPPORT_PLUS bundle which should include q4.
It works for me (tm)
Stefan Farrelly
Honored Contributor

Re: Machine rebooting itself


What are the contents of /var/tombstones/ts99?

Install q4 as per the previous reply.

Is there anything in /var/adm/shutdownlog or /var/adm/OLDmessages?
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Bill McNAMARA_1
Honored Contributor

Re: Machine rebooting itself

Someone from the Response Center may send you q4 (from wtec.cup.hp.com) if you have trouble installing it from the Support Plus bundles (from software.hp.com). Once it is installed, set it up like this:

cd /usr/contrib/lib
uncompress Q4Lib.tar.Z
tar -xf ./Q4Lib.tar
cd /
cp /usr/contrib/lib/q4lib/sample.q4rc.pl .q4rc.pl
vi .q4rc.pl
Replace the line:
push(INC, "./q4lib");
with:
push(INC, "/usr/contrib/lib/q4lib");
cd to your dump directory (default = /var/adm/crash)
Preprocess the kernel file (where "X" is the number of the crash dump directory):
/usr/contrib/bin/q4pxdb ./crash.X/vmunix
export PATH=$PATH:/opt/perl/bin
q4 -p /var/adm/crash/crash.0/

You may then need to:
q4> include whathappened.pl
Then:
q4> run WhatHappened > /tmp/crash.wh
When prompted for the kernel file, give the one you are debugging: /var/adm/crash/crash.X/vmunix
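
To avoid retyping the dump number in each step, the preprocessing and startup commands from this reply can be generated for a given crash.X directory. This is only a sketch: the function name q4_prep_commands is hypothetical, it prints the commands rather than running them, and it assumes the paths quoted in this reply (/usr/contrib/bin/q4pxdb, /opt/perl/bin, /var/adm/crash) and the q4 -p form of the startup command.

```shell
# Print (do not run) the q4 preparation commands for dump number N.
# Usage: q4_prep_commands N [dump_directory]
# The function name is hypothetical; paths are the ones quoted in this thread.
q4_prep_commands() {
    n=$1
    dump=${2:-/var/adm/crash}
    case $n in
        ''|*[!0-9]*)
            echo "usage: q4_prep_commands N [dump_directory]" >&2
            return 1
            ;;
    esac
    cat <<EOF
cd $dump
/usr/contrib/bin/q4pxdb ./crash.$n/vmunix
export PATH=\$PATH:/opt/perl/bin
q4 -p $dump/crash.$n/
EOF
}
```

For example, q4_prep_commands 3 prints the four commands for /var/adm/crash/crash.3, which can then be pasted into a root shell one at a time.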


Your problem is either software or hardware!
The q4 analysis will tell us exactly.
If you have another identically configured machine (software-wise) that is working okay, it is most likely a hardware problem. Panics typically occur due to SCSI problems, but can also be due to memory or CPU errors. It is difficult to determine exactly where the error is from the information you have provided. (I'm sure the isr can lead to an I/O path.)

What are your machine model, OS revision, software list, running applications, and ioscan output? Is the crash reproducible on demand?
What's in the syslog around the time of the panics?


Later,
Bill
It works for me (tm)
BFA6
Respected Contributor

Re: Machine rebooting itself

Hi,

You can always log a support call with HP and get them to analyse the latest crash dump.

They can also analyse the panic string.

I think (but I may be incorrect) that if you create a /var/tombstones directory, a tombstone file will be created the next time the box goes down.

Regards,

Hilary
Denise Kent
Advisor

Re: Machine rebooting itself

Hi,

If you don't have Serviceguard, which could be TOC'ing the system, then this looks like a hardware problem. Does the shutdownlog say HPMC? Those are usually hardware errors. I would log a hardware support call and get the hardware checked.

Kind regards
Denise
Paula J Frazer-Campbell
Honored Contributor

Re: Machine rebooting itself

Hi

Look at /var/tombstones/ts99

If you find that the machine had an HPMC (High Priority Machine Check), you have a hardware problem and you should contact hardware technical support.

also:-

crash commands
cd /var/adm/crash/core.*
q4 .
trace event 0

Stack trace from the first crash event:
stack trace for event 0

crash event was an HPMC <- typically hardware
or

crash event was a TOC <- hang or Serviceguard TOC
or

crash event was a Panic <- typically software
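
If the q4 output has been saved to a file, the rule of thumb above can be applied with a small filter. This is a sketch only: the function name classify_crash_event is hypothetical, and it assumes the "crash event was ..." wording shown in this reply, which may differ between q4 versions.

```shell
# Read q4 output on stdin and print the rule-of-thumb cause for each
# "crash event was ..." line. The function name is hypothetical and the
# matched wording is taken from this thread, not from q4 documentation.
classify_crash_event() {
    while IFS= read -r line; do
        case $line in
            *"crash event was an HPMC"*)
                echo "HPMC: typically hardware" ;;
            *"crash event was a TOC"*)
                echo "TOC: hang or Serviceguard TOC" ;;
            *"crash event was a Panic"*|*"crash event was a panic"*)
                echo "Panic: typically software" ;;
        esac
    done
}
```

For example, classify_crash_event < /tmp/crash.wh prints one hint per crash event found in the saved WhatHappened output.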



Paula
If you can spell SysAdmin then you is one - anon
Bill McNAMARA_1
Honored Contributor

Re: Machine rebooting itself

The tombstone software (pdcinfo) is installed with:

# OnlineDiag B.11.00.18.09 HPUX 11.0 Support Tools Bundle, Sep 2001
OnlineDiag.Sup-Tool-Mgr B.11.00.18.09 Support Tools Manager for HPUX systems
OnlineDiag.EMS-KRMonitor A.11.00.04 EMS Kernel Resource Monitor
OnlineDiag.EMS-Core A.03.20 EMS Core Product
OnlineDiag.EMS-Config A.03.20 EMS Config
OnlineDiag.Predictive C.11.00.18.07 HP Predictive Support
OnlineDiag.Contrib-Tools B.11.00.18.09 Contributed Tools
OnlineDiag.LIF-LOAD B.11.00.18.09 HP LIF LOAD Tools

pdcinfo (run on reboot after a panic by /sbin/init.d/pdcinfo) is supplied by the OnlineDiag Contrib-Tools fileset.

http://www.software.hp.com/SUPPORT_PLUS/index.html

http://www.software.hp.com/cgi-bin/swdepot_parser.cgi/cgi/displayProductInfo.pl?productNumber=B6191AAE




It works for me (tm)
Denise Kent
Advisor

Re: Machine rebooting itself

This type of panic, isr.ior, is either a user-initiated TOC, a Serviceguard TOC, or an HPMC (hardware related). It is best to log a call with HP and send in the dump so we can do a full dump reading on it.
Kind regards
Denise
Marcos_9
Occasional Advisor

Re: Machine rebooting itself

Thanks, Bill, for helping me with the new threads I opened.

Sorry, everybody; I'm a newbie at the forums.

I'm now installing OnlineDiag_11.00.depot. Once it is installed, what should I do: wait for another crash, wait for another reboot, or use some tool to find out what my problem is?

Bill McNAMARA_1
Honored Contributor

Re: Machine rebooting itself

Run the q4 tool WhatHappened and redirect the output to a file.
Post that as an attachment here.

The next time you get a crash, the tombstone should be saved to /var/tombstones/ts99
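
After the next crash, a quick check like the one below can confirm whether a tombstone was actually written. It is a minimal sketch: the function name tombstone_status is hypothetical, and the default path is the /var/tombstones/ts99 location mentioned in this thread.

```shell
# Report whether a non-empty tombstone file exists.
# Usage: tombstone_status [path]   (default: /var/tombstones/ts99)
# The function name is hypothetical.
tombstone_status() {
    ts=${1:-/var/tombstones/ts99}
    tsdir=$(dirname "$ts")
    if [ -s "$ts" ]; then
        echo "present: $ts"
    elif [ -d "$tsdir" ]; then
        echo "directory exists, no tombstone yet: $ts"
    else
        echo "missing directory: $tsdir"
    fi
}
```

If the directory is reported missing, creating it by hand, as suggested earlier in the thread, is a reasonable first step.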

Once you have this file, it is relatively easy to determine the failed or troublesome hardware component. It is possibly memory, which would explain why you're getting random panics, i.e., not tied to one particular running application.

This is easily detectable via the tombstone software, which you are just about to install. Recall that HP 9000 systems have parity-checked memory.

Is there anything in root's email? You may have EMS messages reporting memory problems; that is, once EMS is installed (you are installing/patching it now as well).

Note also that the software you are installing is going to install some corrective patches. If the problem was due to a software defect, it may now be corrected. Send the crash dump on tape to your HP rep, who will confirm whether it is hardware or software.

Note also that q4 does not need to be run on the same system that the crash occurred on. If q4 is on another system, you just need to be able to read the dump there in order to analyse it.

Later,
Bill
It works for me (tm)