Operating System - HP-UX

HP9000 E45 : Systems reboot once in 15 days

 
Sanjiv Sharma_1
Honored Contributor


Dear Friends,

We are facing a peculiar problem on an HP 9000 E45 server running HP-UX 10.20 and Oracle 8.0.4.
The system gives the messages "Disk Sync timedout,continuing reboot. It was not possible for the kernel to find out the process which caused this crash".
Dumpsys ( ) called.
It happens about once every 15 days.

Please help me resolve it.

Regards,
Sanjiv Sharma
Everything is possible
8 REPLIES
Patrick Wessel
Honored Contributor

Re: HP9000 E45 : Systems reboot once in 15 days

Sanjiv,
Your first step should be to check the INDEX file in /var/adm/crash/core.x. It should contain a "panic string" that gives you a brief idea of what happened. Can you provide us with that information? If the message is "HPMC occurred", you had better contact your local support to find the hardware bug. For any other message, call software support to analyze the dump in /var/adm/crash/core.x.
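As a rough sketch of that first step, the function below prints the panic line from every INDEX file under the crash directory. The function name is made up for illustration, and it assumes the panic string sits on a line that begins with the word "panic"; if your INDEX files are laid out differently, just read them with more or cat.

```shell
#!/bin/sh
# show_panic_strings is a hypothetical helper name, not an HP-UX command.
# It looks at <crash_dir>/core.*/INDEX and prints any line starting with
# "panic". The "^panic" layout is an assumption about the INDEX format.
show_panic_strings() {
    crash_dir="${1:-/var/adm/crash}"
    for index in "$crash_dir"/core.*/INDEX; do
        # Skip the unexpanded pattern when no core.* directory exists.
        [ -f "$index" ] || continue
        echo "== $index"
        grep '^panic' "$index" || echo "(no panic line found)"
    done
}
```

Call it as show_panic_strings with no argument to use /var/adm/crash, or pass another directory if savecore writes somewhere else.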
There is no good troubleshooting with bad data
Manju Kampli
Trusted Contributor

Re: HP9000 E45 : Systems reboot once in 15 days

Enable the savecore option in /etc/rc.config.d/savecore. The next time the system crashes, it will dump memory into the /var/adm/crash directory, and that dump can be used to analyse the cause of the crash.
Tools like adb are used to debug these dumps. Send the dumps to the HP Response Centre and they should be able to help you identify the problem.
Never stop "LEARNING"
John Palmer
Honored Contributor

Re: HP9000 E45 : Systems reboot once in 15 days

/var/adm/shutdownlog may also give you a panic string which can aid diagnosis.

Does the server crash regularly (every two weeks or so)? If so, some sort of memory leak may be the problem. Keep an eye on the amount of swap space in use with:
swapinfo -t
The PCT USED value on the last line is the significant one. If it increases steadily over time, a memory leak is likely.
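To pull just that number out for trending, something like the following might do. It is only a sketch: the function name is invented here, and it assumes the summary line starts with the word "total" and carries PCT USED (for example 20%) in the fifth column, so check it against the output on your own system first.

```shell
#!/bin/sh
# swap_pct_used is a hypothetical helper name.
# Reads "swapinfo -t" output on stdin and prints the PCT USED figure from
# the "total" line, without the percent sign.
# Assumption: on the total line the percentage is the fifth field.
swap_pct_used() {
    awk '$1 == "total" { sub(/%/, "", $5); print $5 }'
}
```

Typical use would be swapinfo -t | swap_pct_used, noted down once a day or so to see whether the figure keeps climbing between reboots.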

Regards,

John
Anthony deRito
Respected Contributor

Re: HP9000 E45 : Systems reboot once in 15 days

Sanjiv,

Are you saving dumps?

If you have savecore enabled but are not sure where the dump files are, look at /etc/rc.config.d/savecore and check what the $SAVECORE_DIR parameter is set to. cd to that directory and check whether it contains core.x directories. Also check in the same file what your $SAVE_PAGES parameter is set to. Are you doing a full dump, a partial dump if not enough space is available, etc.?
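A quick way to see what is actually set in that file is to list only the active assignments and skip the comments. This is a sketch with a made-up function name; it assumes the file uses plain NAME=value lines, as rc.config.d files normally do.

```shell
#!/bin/sh
# show_savecore_settings is a hypothetical helper name.
# Prints the uncommented NAME=value lines from the savecore config file,
# so the values of SAVECORE_DIR and the other parameters are easy to spot.
show_savecore_settings() {
    conf="${1:-/etc/rc.config.d/savecore}"
    grep -E '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=' "$conf"
}
```

After running show_savecore_settings, an ls -ld on the core.* entries in the directory it reports shows whether any dumps were saved and when.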

If you do have a good dump:

If /etc/shutdownlog does not tell you enough about what caused the panic, download q4 by getting patch PHCO_20261 (it may have been superseded) and install it. q4 is very easy to use and gives valuable information on what caused the crash. If you need steps for using q4, let me know. Without a good dump, though, this is pointless.

If you do not have a good dump, check how to configure savecore.


Tony
Terja
Frequent Advisor

Re: HP9000 E45 : Systems reboot once in 15 days

It sounds like Oracle has a memory leak that is causing you to run out of swap. Put a little routine into cron that records swap usage, say, every hour; "swapinfo" is the command. Other things to look at are:

a) Did you patch Oracle? There are a number of patches for that release.
b) What does dmesg say?
c) What is in syslog?
d) What jobs ran on the days of the crash?
e) Run a manual fsck of your entire system.
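The hourly cron routine could look roughly like this. The function name, the log path /var/adm/swapinfo.log, and the script path in the crontab comment are all placeholders chosen for illustration, so adjust them to your own layout.

```shell
#!/bin/sh
# log_swap_snapshot is a hypothetical helper name.
# Appends the current date and the output of "swapinfo -t" to a log file,
# so swap usage can be compared across the days leading up to a crash.
# The default log path below is an assumption, not a standard location.
log_swap_snapshot() {
    log="${1:-/var/adm/swapinfo.log}"
    {
        date
        swapinfo -t
        echo
    } >> "$log" 2>&1
}

# To run it from cron, save this as a script (for example
# /usr/local/bin/swaplog.sh, a made-up path), add a final line that calls
# log_swap_snapshot, and install a crontab entry such as:
#   0 * * * * /usr/local/bin/swaplog.sh
```

After the next crash, the tail of the log shows whether swap was nearly exhausted just before the reboot.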
UNIX - Live free or Die
Philip Chan_1
Respected Contributor

Re: HP9000 E45 : Systems reboot once in 15 days

Look under /var/adm/crash and see if there are any dump directories that correspond to your server's reboot time (check the timestamps). If there are, give HP a call. They will ask for the core dump files under that directory, examine them, and should be able to find out what went wrong on your system.

You probably have a faulty hardware component. Last time, we had a bad CPU that caused our machine to reboot almost every week.

Rgds,
Philip
David Ritts
Occasional Contributor

Re: HP9000 E45 : Systems reboot once in 15 days

Along with /var/adm/crash, look in /var/tombstones. There may be a log named ts9x, which will tell you if there is an HPMC associated with this problem. Make sure your system has enough dump space to save the full crash dump for evaluation. Kev
Patrick Wessel
Honored Contributor

Re: HP9000 E45 : Systems reboot once in 15 days

Kevin,
Unfortunately, it is not possible to read the PIM data of the PA7100LC CPUs online. That is why the diagnostics cannot create a useful tombstone file on an E-Class server.
There is no good troubleshooting with bad data