Superdome sx2000 hardware degraded

 
Peter Marais
Frequent Advisor

When I reboot a particular nPar on my Superdome, I receive a warning that my hardware is degraded:

Keyword = CELL_HW_DEGRADED

Description:

A dimm or CPU has failed and is not operational for the system. This event is emitted prior to determining if the cell should be integrated into the Partition.

Cause/Action:

A deconfigured dimm or cpu has been detected. Examine earlier events to isolate the problem.


Recommendation:

None

Yet when I look at the system via parstatus or stm, I cannot find any degraded memory or CPU.
Also, my SEL does not give me any more detail about what failed.

Any other suggestions of where I can look? I am running HP-UX 11.23.
Torsten.
Acclaimed Contributor

Re: Superdome sx2000 hardware degraded

Use "info all" from EFI (Integrity server) or BCH (PA-RISC), depending on the CPUs installed.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

Thanks Torsten. Are there any UNIX commands I can try, as I do not have the luxury of bringing the system down?
Torsten.
Acclaimed Contributor

Re: Superdome sx2000 hardware degraded

You can check the memory with stm.

Hope this helps!
Regards
Torsten.

Matti_Kurkela
Honored Contributor

Re: Superdome sx2000 hardware degraded

If you have currently unused iCOD/iCAP CPUs in your system, the system is smart enough to "swap" a failed CPU with an unused one.

The rationale is apparently something like: you've paid for X CPUs, so the system will provide you with X CPUs as long as there are enough usable CPUs to do that.

MK
Andrew Rutter
Honored Contributor

Re: Superdome sx2000 hardware degraded

Hi,

Did you run the info tool in stm and look at the log for the memory or CPUs? Also try running the info tool on the system itself, usually the first item in the map in CSTM.

This should show any degraded/deconfigured items.

Andy
Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

Hi Guys

Thanks for all the responses.

As I stated in my opening post, stm reveals no deconfigured memory or CPUs. The only message I get is the warning quoted above, which comes from the SEL on the MP, and the cell board is marked with a "w" in the VFP. No iCOD is running on this system.
machinfo also reports the correct number of CPUs and amount of memory.
Torsten.
Acclaimed Contributor

Re: Superdome sx2000 hardware degraded

So I would investigate the cells from MP with

MP:CM> ps

Hope this helps!
Regards
Torsten.

Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

Thanks Torsten, but ps only gives a power status. No joy with that.
Torsten.
Acclaimed Contributor

Re: Superdome sx2000 hardware degraded

I would expect an item in "failed" status.

However, there must be a related message in the diags logs, maybe in root's mailbox and in the MP logs.

Hope this helps!
Regards
Torsten.

Torsten.
Acclaimed Contributor

Re: Superdome sx2000 hardware degraded

Apropos stm: if you run the info tool for memory, do you get "Config" for all DIMMs?

...

Basic Memory Description

Module Type: MEMORY
Page Size: 4096 Bytes
Total Physical Memory: 32768 MB
Total Configured Memory: 32768 MB
Total Deconfigured Memory: 0 MB

Memory Board Inventory

DIMM Location           Size(MB)  State    Serial Num        Part Num
----------------------  --------  -------  ----------------  ------------------
Cab 0 Cell 1 DIMM 0A    2048      Config   PRY1234567        A9846-60301
Cab 0 Cell 1 DIMM 0B    2048      Config   PRY1234567        A9846-60301
Cab 0 Cell 1 DIMM 1A    2048      Config   PRY1234567        A9846-60301
...
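
If the inventory is long, a saved copy of the report can be checked with a small script. The following is only a sketch in Python: it assumes the report was saved as plain text in the layout shown above, and the function name `check_memory_report` and the state value "Deconfig" in the sample are made up for illustration (the real state strings may differ).

```python
import re

# Matches inventory rows such as:
#   Cab 0 Cell 1 DIMM 0A   2048   Config   PRY1234567   A9846-60301
ROW = re.compile(
    r"^Cab\s+(?P<cab>\d+)\s+Cell\s+(?P<cell>\d+)\s+DIMM\s+(?P<dimm>\S+)\s+"
    r"(?P<size_mb>\d+)\s+(?P<state>\S+)"
)
TOTAL = re.compile(r"^Total Deconfigured Memory:\s*(?P<mb>\d+)\s*MB")


def check_memory_report(text):
    """Return (deconfigured_mb, suspects) for a saved info-tool report.

    suspects lists (cab, cell, dimm, state) for every DIMM row whose
    state is anything other than "Config".
    """
    deconfigured_mb = None
    suspects = []
    for raw in text.splitlines():
        line = raw.strip()
        m = TOTAL.match(line)
        if m:
            deconfigured_mb = int(m.group("mb"))
            continue
        m = ROW.match(line)
        if m and m.group("state").lower() != "config":
            suspects.append(
                (int(m.group("cab")), int(m.group("cell")),
                 m.group("dimm"), m.group("state"))
            )
    return deconfigured_mb, suspects


# Sample input: the last row uses a hypothetical "Deconfig" state.
SAMPLE = """\
Total Physical Memory: 32768 MB
Total Configured Memory: 32768 MB
Total Deconfigured Memory: 0 MB
Cab 0 Cell 1 DIMM 0A 2048 Config PRY1234567 A9846-60301
Cab 0 Cell 1 DIMM 0B 2048 Config PRY1234567 A9846-60301
Cab 0 Cell 1 DIMM 1A 2048 Deconfig PRY1234567 A9846-60301
"""

mb, bad = check_memory_report(SAMPLE)
print("Total deconfigured memory (MB):", mb)
for cab, cell, dimm, state in bad:
    print(f"Cab {cab} Cell {cell} DIMM {dimm}: {state}")
```

An empty list together with 0 MB deconfigured would match what the info tool shows when every DIMM is in "Config" state.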

Hope this helps!
Regards
Torsten.

Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

All DIMMs show as configured in STM. The only record of the error I can find is in the SEL on the MP, but that does not tell me what failed or where, only that the cell has been degraded. The only thing left for me, it seems, is to run offline diags when I get some downtime on the system. I find it weird, though, that there is no sign of the problem in the OS.
Andrew Rutter
Honored Contributor

Re: Superdome sx2000 hardware degraded

Peter,

Out of interest, what does parstatus show?

Are there any problems indicated here with the CPUs?

# parstatus -Vp0

Are there any vPars created on the nPar? I presume not, but it is worth knowing.

Andy
Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

No vPars configured. parstatus -V shows everything up as normal. I have some downtime on the system on Monday and will update my findings on Tuesday.
Bill Hassell
Honored Contributor

Re: Superdome sx2000 hardware degraded

How about the PDC logs from the MP? From the MP prompt:

sl
e

That may give a hint about what is happening.


Bill Hassell, sysadmin
Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

Thanks Bill. Please note this is an Itanium box: MP -> SL -> SEL / FPL.
cnb
Honored Contributor

Re: Superdome sx2000 hardware degraded

You might try looking in logtool and the event.log to see if anything unusual is happening:

# cstm
cstm> ru logtool
Logtool Utility> rs

Look at memory errors:

Logtool Utility> vd

The event log is at:

/var/opt/resmon/log/event.log

Or use the slview command or the IPMI event viewer to look at low-level events:
http://docs.hp.com/en/diag/eit/st_event_viewer.htm
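
For a large text log, a keyword scan can narrow things down. This is only a rough sketch in Python: the keywords are taken from the event text quoted at the top of this thread, the function names `find_hits` and `scan_file` are made up for illustration, and it assumes the log is a plain text file (nothing here depends on a particular log format).

```python
import re

# Keywords taken from the event text quoted in this thread; extend as needed.
KEYWORDS = ("CELL_HW_DEGRADED", "deconfigured", "degraded")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def find_hits(lines):
    """Return (line_number, text) pairs for lines mentioning any keyword."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_file(path):
    """Scan a plain-text log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_hits(handle)


# Small in-memory demonstration using the event text from this thread.
demo = [
    "boot ok\n",
    "Keyword = CELL_HW_DEGRADED\n",
    "A deconfigured dimm or cpu has been detected.\n",
    "all fine\n",
]
for number, text in find_hits(demo):
    print(f"{number}: {text}")
```

To use it on a real log, copy the file to a machine with Python and call, for example, scan_file("/var/opt/resmon/log/event.log").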


Rgds,
Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

The same problem occurred again on a different partition. I reset the cell NVRAM by removing the battery on the PDH, and this cleared the fault.
Peter Marais
Frequent Advisor

Re: Superdome sx2000 hardware degraded

Reset the NVRAM on the cell reporting the error.