
Re: System crash?

 
Edward McCouch
Frequent Advisor

System crash?

I have a K460 running HP-UX 11.0. It is a SAP application server. Last night, the operator on duty called me and said that the server had "crashed", so he rebooted it. Since the reboot, the system has been functioning normally. I have been trying to figure out what caused the crash, but the OLDsyslog has revealed nothing, and neither has /var/adm/crash or /etc/shutdownlog. There is an entry in /var/tombstones, but I am not sure where else I should be looking. As far as I can tell, it almost appears that the server was rebooted rather than having crashed.

Anyone have any ideas on where I should be looking?
13 REPLIES
RAC_1
Honored Contributor

Re: System crash?

Do you have any codes in the ts99 file? If not, there was no crash.

The time stamp of ts99 should match the time the operator called you. If it does not, it was not a crash. (Check the crash-saving configuration in /etc/rc.config.d/savecrash.)
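
The timestamp comparison described above can be sketched in a few lines. This is only an illustration, assuming Python is available on a machine that can read the file. The function name `modified_near`, the 30-minute tolerance, and the incident time are all made up for the example. Since the tombstone may be written at the reboot rather than at the moment of the failure, a generous tolerance is probably sensible.

```python
import os
from datetime import datetime, timedelta


def modified_near(path, incident, tolerance=timedelta(minutes=30)):
    """Return True if the file's modification time is within
    `tolerance` of the (local, naive) `incident` time."""
    mtime = datetime.fromtimestamp(os.path.getmtime(path))
    return abs(mtime - incident) <= tolerance


if __name__ == "__main__":
    # Hypothetical incident time: substitute the time the operator called.
    incident = datetime(2002, 1, 1, 23, 15)
    tombstone = "/var/tombstones/ts99"  # path taken from this thread
    if os.path.exists(tombstone):
        print(tombstone, "modified near incident:",
              modified_near(tombstone, incident))
    else:
        print(tombstone, "not found")
```

If the result is False with a wide tolerance, the tombstone most likely belongs to an older event.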

There is no substitute to HARDWORK
Ken Hubnik_2
Honored Contributor

Re: System crash?

Do a Ctrl-B at the console and see if there were any errors logged by the system.
RAC_1
Honored Contributor

Re: System crash?

You can also check the sl logs from the GSP.
There is no substitute to HARDWORK
Sanjay_6
Honored Contributor

Re: System crash?

Hi Edward,

Have you done the crash analysis? If not, try these links on how to do it:

http://www2.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000065011660

http://www2.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000063211994

These, along with the ts99 file in /var/tombstones, normally help in finding the cause of the crash.

If you post your ts99, we can try and have a look.

Hope this helps.

Regds
S.K. Chan
Honored Contributor

Re: System crash?

Whether the crash left any clues behind depends on how hard the server crashed and on the nature of the crash. You're looking in the right places.
a) /etc/shutdownlog => Should be able to tell you if someone accidentally rebooted the system. It would also have some error information, especially if the crash is hardware related.
b) /var/adm/crash => You'll see some info here only if the crash was serious enough; otherwise it's not going to show anything.
c) /var/tombstones => Look at the latest tsXX file (probably ts99). It should have some register dumps and possibly a readable error message. This is best decoded by HP.
You may also want to look for:
1) Core files that may have been left behind.
2) STM logs, or better still, run STM's "infolog".
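
A quick way to survey the three locations in the list above is sketched here. It is only an illustration, assuming Python is available. The helper names `describe` and `newest_entry` are invented for the example, and the paths are simply the ones mentioned in this thread. It does not decode anything; it only shows what exists, when it was last modified, and whether a directory is empty.

```python
import os
import time

# Locations named in this thread; adjust for your system.
CANDIDATES = ["/etc/shutdownlog", "/var/adm/crash", "/var/tombstones"]


def describe(path):
    """Return a one-line summary of a file or directory."""
    if not os.path.exists(path):
        return f"{path}: missing"
    stamp = time.strftime("%Y-%m-%d %H:%M",
                          time.localtime(os.path.getmtime(path)))
    if os.path.isdir(path):
        count = len(os.listdir(path))
        return f"{path}: directory, {count} entries, modified {stamp}"
    return f"{path}: file, {os.path.getsize(path)} bytes, modified {stamp}"


def newest_entry(directory):
    """Return the most recently modified entry in a directory,
    or None if the directory is empty."""
    entries = [os.path.join(directory, name) for name in os.listdir(directory)]
    return max(entries, key=os.path.getmtime, default=None)


if __name__ == "__main__":
    for path in CANDIDATES:
        print(describe(path))
        if os.path.isdir(path):
            latest = newest_entry(path)
            if latest:
                print("  newest:", describe(latest))
```

A crash directory that exists but reports 0 entries suggests that a dump save was started but nothing was actually written.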
Jerome Swiniarski
New Member

Re: System crash?

Hi,

Perhaps you don't have a dump in /var/adm/crash because /etc/rc.config.d/savecrash or the dump area (lvlnboot -v) is not configured correctly. But if both are correct, no dump was generated, and no messages were sent to syslog or the shutdownlog file, I think you had a hardware problem on your server. An entry in the ts99 file may confirm this theory. The best way forward now is to contact HP hardware support.
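
To see what /etc/rc.config.d/savecrash actually sets, something like the following sketch may help. It assumes the file uses plain shell-style NAME=value lines with # comments, and that the variable controlling dump saving is called SAVECRASH; treat both as assumptions to verify against the file on your own system. The function name `read_rc_config` is made up for the example.

```python
import os


def read_rc_config(text):
    """Parse uncommented NAME=value lines from rc.config.d-style text.

    Anything after a '#' is treated as a comment, so values that
    contain '#' are not handled by this simple sketch.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


if __name__ == "__main__":
    path = "/etc/rc.config.d/savecrash"  # path taken from this thread
    if os.path.exists(path):
        with open(path) as handle:
            settings = read_rc_config(handle.read())
        # SAVECRASH is an assumed variable name; check your file.
        print("SAVECRASH =", settings.get("SAVECRASH", "(not set)"))
        print("All active settings:", settings)
    else:
        print(path, "not found")
```

Lines that are commented out do not appear in the result, which makes it easy to spot a setting that was never enabled.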

Bye, Jerome.
Helen French
Honored Contributor

Re: System crash?

I would question the operator who called you and ask him what happened. How did he know that the system had crashed? Any application errors? System errors? Any issues with the power supply? Any hardware errors (/var/adm/syslog/syslog.log)?

Check root's mail (elm) and the STM logs, and check the hardware with STM.
Life is a promise, fulfill it!
Edward McCouch
Frequent Advisor

Re: System crash?

I attached the ts99 file. It looks kinda weird. I'm not sure how to read ts logs, so I've never really looked at one before this incident. I will have to change that. ;) Anyway, I know that this particular K-box was a 360 but was later upgraded to a 460. I also know that the memory isn't all HP, some of it is 3rd party memory, but those cards are in the last set of trays.

The timestamp on ts99 matches the syslog boot time, and there *is* a crash.0 directory with the same timestamp, but it is empty.

From what I was told by the operator, the users called him and he couldn't log into the console. He called the admin on duty (not me), and that admin told him to turn the server off and then on again. Then the other admin called me to tell me that the server had crashed. By the time I called the operator, he had already switched the machine off and back on. Since the machine came up normally, I figured that it could wait until morning. This is also the reason that I didn't go into service mode (Ctrl-B) and check the service logs.

I think I've answered everyone's questions.... Thank you for the links... I'm looking through them right now. Thank you all for your speedy responses.

-Ed
Bill McNAMARA_1
Honored Contributor

Re: System crash?

The tombstone is NOT always saved to /var/adm/ on reboot after a panic. You need to install the appropriate diagnostics patches.

In either case, /etc/shutdownlog should tell you a lot.
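
For a long /etc/shutdownlog, a small filter can pull out the lines worth reading first. This is a rough sketch, assuming Python is available and that the log is plain text with one event per line. The keywords "panic" and "reboot" are guesses at what the relevant entries contain, and the function name `matching_lines` is invented for the example.

```python
import os


def matching_lines(text, keywords=("panic", "reboot")):
    """Return the lines that contain any of the keywords,
    compared case-insensitively."""
    wanted = [keyword.lower() for keyword in keywords]
    return [line for line in text.splitlines()
            if any(keyword in line.lower() for keyword in wanted)]


if __name__ == "__main__":
    path = "/etc/shutdownlog"  # path taken from this thread
    if os.path.exists(path):
        with open(path) as handle:
            for line in matching_lines(handle.read()):
                print(line)
    else:
        print(path, "not found")
```

If nothing in the output falls on the night in question, the log recorded neither an orderly shutdown nor a panic for that event.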

Typically crashes are due to things changing (adding or upgrading software or hardware) and, if not that, to system hardware failure.
Memory beats decoding via q4!

Later,
Bill
It works for me (tm)
Edward McCouch
Frequent Advisor

Re: System crash?

Ok HERE is the ts99 file.

Nothing has been changed on the box for a couple of months. /etc/shutdownlog doesn't have any entry at all for last night.
Anil C. Sedha
Trusted Contributor

Re: System crash?

Hi Edward,

Looking at your ts99 file, I believe the error is most likely memory related.

One thing you can do is reseat the hardware one piece at a time: pull each component and insert it again. By this I mean only the memory modules and the CPU.

Another option is to analyze the files in your most recent /var/adm/crash/* directory and send the errors to HP for analysis. You may use their number, 1-800-633-3600, to ask for support, but only if your company has a support agreement with HP.

You may also run the Q4 analysis utility to analyze the dump. Follow this document for that:
http://www1.itrc.hp.com/service/cki/search.do?searchString=OZBEKBRC00000611&mode=id&submit=Search&searchCrit=allwords&docType=Security&docType=Patch&docType=EngineerNotes&docType=BugReports&docType=Hardware&docType=ReferenceMaterials&docType=ThirdParty

Regards,
Anil
If you need to learn, now is the best opportunity
S.K. Chan
Honored Contributor

Re: System crash?

My advice is to get HP to analyze the file. This is not a guessing game, and considering that this is an SAP application server, it has got to be a critical machine.
Edward McCouch
Frequent Advisor

Re: System crash?

Thank you all once again for your helpfulness. I am calling HP to analyze the file. I can't pull the K-box down without first scheduling the down time, so I've started that process too.

Thank you all!
-Ed