Operating System - HP-UX

Swapper process caused system crash?

 
Andrew Griffin
Advisor


We had a system crash last night, and I've spent most of today digging through the crash dump with q4 trying to figure out exactly why. Long story short, one of the processors crashed due to a spinlock deadlock timeout - some process locked up one processor and caused the system to crash. After analyzing the dump with q4, I discovered that the process that locked up was process 0, the swapper process. Does anyone have any ideas how or why the swapper process might do this? What conditions or situations might have caused it? The system is of course back up and running fine; I'm just searching for a possible explanation. Any help is appreciated.
8 replies
harry d brown jr
Honored Contributor

Re: Swapper process caused system crash?


How sure are you that the process id was actually zero?

Is your system up to date on patches?

live free or die
harry
Patrick Wallek
Honored Contributor

Re: Swapper process caused system crash?

Are you sure it was process 0, not processor 0?

I didn't think there was a PID 0; I thought PIDs started with 1, which is init.

I would make sure you are up to date on patches and firmware. I think there are some patches for spinlock deadlock problems.
Andrew Griffin
Advisor

Re: Swapper process caused system crash?

Well, I'm far from a q4 expert, but I have used it quite a bit (probably too much) for analyzing system crashes. This is basically what I did in q4:

1. Loaded crash_event_t from crash_event_table - found out that the crash was from a spinlock timeout on processor 3. Processors 1, 2, and 4 crashed due to a CT_TOC panic - meaning they panicked because processor 3 did.

2. Then I loaded mpinfo_t from mpproc_info and confirmed that processors 1, 2, and 4 were in the SPUSTATE_IDLE state and processor 3 was in the SPUSTATE_SYSTEM state, confirming that proc 3 was the root of the crash.

3. From there I loaded the proc structure and determined the PID of the process that was using processor 3 when the system crashed. That's when I determined that process 0 was the process running when the system crashed.
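The triage in steps 1-3 can be sketched in Python as follows. This is a minimal sketch, not q4 code: `CpuSnapshot`, its fields, `find_initiator`, and the `"SPINLOCK_TIMEOUT"` label are hypothetical stand-ins for values read by hand from `crash_event_table`, `mpproc_info`, and the proc structure. Only `CT_TOC`, `SPUSTATE_IDLE`, and `SPUSTATE_SYSTEM` come from the post, and the sample data simply mirrors the crash described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CpuSnapshot:
    # Hypothetical per-processor summary of values read by hand in q4;
    # these are NOT real q4 or kernel field names.
    cpu: int
    crash_event: str      # "CT_TOC", or a hypothetical "SPINLOCK_TIMEOUT" label
    spu_state: str        # "SPUSTATE_IDLE" or "SPUSTATE_SYSTEM"
    pid: Optional[int]    # PID running on this CPU at crash time, if any


def find_initiator(snapshots: List[CpuSnapshot]) -> CpuSnapshot:
    """Step 1: CPUs showing CT_TOC went down because another CPU panicked,
    so the initiator is the single CPU whose crash event is not CT_TOC."""
    initiators = [s for s in snapshots if s.crash_event != "CT_TOC"]
    if len(initiators) != 1:
        raise ValueError(f"expected one non-TOC CPU, found {len(initiators)}")
    init = initiators[0]
    # Step 2: cross-check that the initiator was in system (kernel) state.
    if init.spu_state != "SPUSTATE_SYSTEM":
        raise ValueError(f"CPU {init.cpu} is in {init.spu_state}, not SPUSTATE_SYSTEM")
    return init


# Sample data mirroring the crash described in this post.
snapshots = [
    CpuSnapshot(1, "CT_TOC", "SPUSTATE_IDLE", None),
    CpuSnapshot(2, "CT_TOC", "SPUSTATE_IDLE", None),
    CpuSnapshot(3, "SPINLOCK_TIMEOUT", "SPUSTATE_SYSTEM", 0),
    CpuSnapshot(4, "CT_TOC", "SPUSTATE_IDLE", None),
]

# Step 3: report the PID that was running on the initiating CPU.
init = find_initiator(snapshots)
print(f"CPU {init.cpu} initiated the crash; running PID {init.pid}")
# prints: CPU 3 initiated the crash; running PID 0
```

Keep in mind that this only identifies the CPU that raised the panic; with a spinlock timeout, that CPU is not necessarily the one holding the lock.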
Andrew Griffin
Advisor

Re: Swapper process caused system crash?

We are up to date through the June 2002 QPK.
S.K. Chan
Honored Contributor

Re: Swapper process caused system crash?

This is just one example of why patches are critical for this kind of issue. The section under "Resolution" explains how and why a spinlock deadlock panic can crash the system. I suggest a call to the response center; if you're up to date on patches, this could be a hardware problem. Have you checked for any other hardware error logs? (DocID=UXDNKBRC00007848)
http://support1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000062922779
Hope it helps.
Andrew Griffin
Advisor

Re: Swapper process caused system crash?

Yeah, I thought of that initially too - there were several "I-Cache parity error" messages in crash_event_table. However I checked through the syslog and stm logs and I couldn't find any other instances of that error. I will be keeping a close eye on the processor - it may be going out. I will continue to look for spinlock patches as well.
Dietmar Konermann
Honored Contributor

Re: Swapper process caused system crash?

Brian,

just a small comment on spinlock deadlock panics... you wrote that proc 3 panicked the system. In the case of a spinlock deadlock, the panicking processor is NOT the guilty one. It only acted as a kind of watchdog... and here it found that the spinlock had been held for longer than 60 secs by someone _else_. The lock owner needs to be determined.
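The watchdog idea can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the HP-UX kernel's spinlock code: `WatchdogSpinlock` and `SpinlockTimeout` are hypothetical names, the "CPUs" are just string labels, and the demo uses a 0.2-second timeout instead of the 60 seconds mentioned above. The point it shows is that the waiter times out and reports, but the error names the *owner*.

```python
import threading
import time


class SpinlockTimeout(Exception):
    """Raised by the waiting CPU; carries the identity of the lock owner."""

    def __init__(self, waiter, owner, held_for):
        super().__init__(f"{waiter} timed out; lock held by {owner} for {held_for:.2f}s")
        self.waiter = waiter
        self.owner = owner
        self.held_for = held_for


class WatchdogSpinlock:
    """Toy spinlock that records who holds it and since when (hypothetical)."""

    def __init__(self, timeout):
        self._guard = threading.Lock()  # protects owner / acquired_at
        self.timeout = timeout
        self.owner = None
        self.acquired_at = None

    def acquire(self, who):
        start = time.monotonic()
        while True:
            with self._guard:
                if self.owner is None:
                    self.owner = who
                    self.acquired_at = time.monotonic()
                    return
                owner, acquired_at = self.owner, self.acquired_at
            # The waiter acts as the watchdog: it gives up after `timeout`
            # seconds and reports the current owner, not itself.
            if time.monotonic() - start > self.timeout:
                raise SpinlockTimeout(who, owner, time.monotonic() - acquired_at)
            time.sleep(0)  # briefly yield; a real spinlock would busy-wait

    def release(self, who):
        with self._guard:
            if self.owner != who:
                raise RuntimeError(f"{who} does not hold the lock")
            self.owner = None
            self.acquired_at = None


lock = WatchdogSpinlock(timeout=0.2)
lock.acquire("cpu1")           # cpu1 takes the lock and (in this demo) never releases it
try:
    lock.acquire("cpu3")       # cpu3 spins, then gives up
except SpinlockTimeout as e:
    print(e.waiter, e.owner)   # prints: cpu3 cpu1
```

The design choice that matters is recording the owner at acquire time, so the timeout report points at the holder rather than the waiter; in a real dump, that owner is what still has to be tracked down.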

You really should go ahead and log a call with your response center to get this dump analyzed. The LPMCs may also be an important hint.

Regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
ASO CENTRAL
Advisor

Re: Swapper process caused system crash?

The I-cache parity errors are crucial to this problem. syslog and stm logs are not written during a crash, so they won't contain any details on internal hardware faults. Check the ts99 file for further details, but I would strongly suspect a hardware failure (I-cache parity) that caused HP-UX to misbehave, which led to the spinlock deadlock, which in turn caused the crash.

Usually, internal hardware failures like this will cause an HPMC, which almost always indicates a hardware failure. Time to call HP hardware support.