Walt Stupinski
Occasional Advisor

K580 Daily HPMC

Getting a daily HPMC from a K580 w/4 240MHz processors. Each HPMC is exactly the same. Based upon this, today I pulled Proc 0 out, and am now running w/only 3 processors.

I'm attaching the info and am curious if people here agree or can point me in the right direction. TIA.

Walt
Eugeny Brychkov
Honored Contributor

Re: K580 Daily HPMC

Walt,
all CPUs in the box should work synchronously. Looking at the output provided, I conclude that every CPU saw the HPMC except CPU 3 (its date is invalid). So I conclude that the cause is CPU 3.
In addition, please check the PDC firmware revision. The latest is 42.01.
Eugeny
Michael Steele_2
Honored Contributor

Re: K580 Daily HPMC

Cross reference your ts99 file by reading the PIM information in the GSP:

># shutdown -r now
-interrupt boot cycle
-ser (* From BCH *)
-piminfo (* Check for HPMC *)
-pim 0 hpmc (* cpu 0 *)
-pim 1 hpmc (* cpu 1 *)
-pim 2 hpmc (* cpu 2 *)
-pim 3 hpmc (* cpu 3 *)

-info /in
-pr (* should list deconfigured processors *)
Support Fatherhood - Stop Family Law
Walt Stupinski
Occasional Advisor

Re: K580 Daily HPMC

Wow! - I never noticed that CPU 3 time stamp. Yeah, I'd say that's a little off, huh? I was hung up on the CPU 0 bus timeout, where the others didn't show that.

(Oh, great, now that I think of it, I pulled CPU 0 and moved 3 into its slot)
Claudiu Schmidt
Valued Contributor

Re: K580 Daily HPMC

Hi,

500b on CPU 0 means it detected a bus timeout, and because CPU 3 has an old timestamp, it means this CPU blocked the bus. So CPU 3 is for sure the defective part, and you should replace it.
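The timestamp comparison described here can be sketched in a few lines of Python. This is only an illustration of the reasoning, not an HP tool: the function name `find_stale_cpus`, the five-minute tolerance, and the timestamps are all made up for the example, and the values would have to be copied by hand from the PIM dump of each CPU.

```python
from datetime import datetime, timedelta


def find_stale_cpus(pim_timestamps, tolerance=timedelta(minutes=5)):
    """Return the CPU numbers whose HPMC timestamp is missing or much
    older than the newest one, i.e. CPUs that did not log the latest HPMC.

    pim_timestamps maps CPU number -> datetime, or None if the date in
    the PIM dump is invalid.
    """
    valid = [ts for ts in pim_timestamps.values() if ts is not None]
    if not valid:
        # No CPU has a usable timestamp, so none can be singled out.
        return sorted(pim_timestamps)
    newest = max(valid)
    return sorted(
        cpu
        for cpu, ts in pim_timestamps.items()
        if ts is None or newest - ts > tolerance
    )


# Hypothetical values, NOT taken from the attached dump.
pim = {
    0: datetime(2004, 3, 1, 2, 15, 7),
    1: datetime(2004, 3, 1, 2, 15, 7),
    2: datetime(2004, 3, 1, 2, 15, 8),
    3: None,  # invalid date in the PIM dump
}
print(find_stale_cpus(pim))  # prints [3]
```

A CPU flagged this way is only a first suspect; the other checks in this thread still apply.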

Claudiu
Vincent Farrugia
Honored Contributor

Re: K580 Daily HPMC

Hello,

CPU 3's 5404 and 5504 errors refer to errors in the system board. These errors are NOT the same as previous CPUs.

5xy4 refers to "parity error", while 5xy8 refers to "broad error". x=4 and x=5 refer to the system board.
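As a rough aid, the rule above can be written as a small Python lookup. It encodes only what is stated in this post (last digit 4 or 8, x of 4 or 5); the function name `decode_check_code` is invented for the example, and any code that does not fit the 5xy4/5xy8 pattern, such as 500b, is simply reported as unknown rather than guessed at.

```python
def decode_check_code(code):
    """Decode a 4-digit code using the 5xy4 / 5xy8 rule from this post.

    Returns a dict with the error type and whether the code points at
    the system board, or None if the code does not fit the pattern.
    """
    code = code.strip().lower()
    if len(code) != 4 or code[0] != "5":
        return None
    kinds = {"4": "parity error", "8": "broad error"}
    x, last = code[1], code[3]
    if last not in kinds:
        return None
    return {"type": kinds[last], "system_board": x in ("4", "5")}


for c in ("5404", "5504", "500b"):
    print(c, decode_check_code(c))
```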

HTH,
Vince
Tape Drives RULE!!!
Patrick Wessel
Honored Contributor

Re: K580 Daily HPMC

Walt,
As Claudiu already stated, CPU 0 discovered a Runway bus timeout. This timeout occurred during a TLB purge. Because a TLB purge is a CPU-to-CPU transaction and CPU 3 didn't report the HPMC, CPU 3 is my first suspect.
But before you touch any piece of hardware, make sure that the PDC is at least revision 39.11. Everything below 39.11 can cause trouble like the Runway bus timeout (or the older parity error that is shown by CPU 3).
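If you keep the firmware levels of several boxes in a list, the 39.11 check is easy to get wrong with a plain string or float comparison. Here is a minimal Python sketch, assuming the revision is always written as two numeric fields separated by a dot (as in 39.11 and 42.01); the helper names are made up for the example.

```python
def parse_pdc_revision(text):
    """Split a revision such as '39.11' into a (major, minor) integer pair."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def pdc_is_at_least(installed, minimum="39.11"):
    """True if the installed PDC revision is the minimum or newer."""
    return parse_pdc_revision(installed) >= parse_pdc_revision(minimum)


print(pdc_is_at_least("42.01"))  # prints True
print(pdc_is_at_least("38.20"))  # prints False
```

Comparing the fields as integer pairs avoids surprises from string ordering or from treating the revision as a decimal number.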

Claudiu,
It's good to see you. Please say hi to Patricia.

There is no good troubleshooting with bad data
Walt Stupinski
Occasional Advisor

Re: K580 Daily HPMC

Hi everybody. I really appreciate all the responses. Based upon those responses, here's what I'll do:
1. Verify PDC level is 39.11 or better.
2. Replace CPU 3
3. Monitor system for a few days.
4. If the same HPMC occurs again, replace the system board.

I guess it'll be a couple of days before I'll know if the system's stable.

Once again, thanks to all, and I'll keep you posted.....

Walt
Michael Steele_2
Honored Contributor

Re: K580 Daily HPMC

Any points for anyone, big guy?
Support Fatherhood - Stop Family Law
Walter Stupinski
New Member

Re: K580 Daily HPMC

We're still monitoring this one here. Users are not convinced problem has been fixed. A week without HPMCs will convince them. Points are forthcoming.