HPE 9000 and HPE e3000 Servers
Mann_2
Frequent Advisor

CPU failure: reboot possible

Dear all,

One of our customers has an rp5450 with four 360 MHz CPUs and 8 GB of RAM. Six months ago we installed a new CPU in this server, and it ran fast and very well.

But now the newly installed CPU is no longer available to the system: top shows only 3 CPUs.

The customer now has poor performance.
My question: is it possible to restart the server without boot problems?

I can't remove the CPU at the moment because the customer is not nearby.

Is a reboot possible, or will the server refuse to start with the "dead" CPU?

Thanks in advance for your answer.
BR
Roland
Roland
19 REPLIES 19
jaivinder
Frequent Advisor

Re: CPU failure: reboot possible

Hi,

As you say, the output of top shows only three of the four CPUs.
Check the state of the fourth CPU using ioscan:
ioscan -fnC processor
The output will reveal whether the CPU is claimed or not.
If you reboot the system in this state, it should come up without any problem. Have you checked the console logs for the CPU failure?

First of all, check the CPU state, and then reboot if required.
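
The check above can be scripted if the ioscan output is saved to a file and inspected elsewhere. This is only a minimal sketch: it assumes the whitespace-separated column layout that ioscan -fnC processor prints, the function names are made up for illustration, and the sample text is invented (not taken from a real machine).

```python
def parse_ioscan_processors(text):
    """Return (instance, hw_path, sw_state) for each processor line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines start with the class name "processor" followed by a
        # numeric instance; header and separator lines do not.
        if len(fields) >= 5 and fields[0] == "processor" and fields[1].isdigit():
            instance, hw_path, _driver, sw_state = fields[1:5]
            rows.append((int(instance), hw_path, sw_state))
    return rows


def unclaimed(rows):
    """Return the rows whose software state is anything other than CLAIMED."""
    return [row for row in rows if row[2] != "CLAIMED"]


# Invented sample output, for illustration only.
sample = """\
Class I H/W Path Driver S/W State H/W Type Description
===================================================================
processor 0 160 processor CLAIMED PROCESSOR Processor
processor 1 162 processor CLAIMED PROCESSOR Processor
processor 2 164 processor UNCLAIMED PROCESSOR Processor
"""

rows = parse_ioscan_processors(sample)
print(len(rows), "processors listed;", len(unclaimed(rows)), "not claimed")
```

Note that a CLAIMED state only says the driver has bound to the processor. A CPU can be claimed and still carry no load, so this check does not replace looking at the logs.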

Jaivinder
Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi Jaivinder,

ioscan shows that the CPU is claimed:

ioscan -fnC processor
Class      I  H/W Path  Driver     S/W State  H/W Type   Description
=====================================================================
processor  0  160       processor  CLAIMED    PROCESSOR  Processor
processor  1  162       processor  CLAIMED    PROCESSOR  Processor
processor  2  164       processor  CLAIMED    PROCESSOR  Processor
processor  3  166       processor  CLAIMED    PROCESSOR  Processor
#

But in top I can see only 3 CPUs.

I found this entry from yesterday in syslog.log:
Jul 10 16:55:17 HE_H1_DB EMS [2348]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/system/events/cpu/lpmc/cache_errors" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 153878530 -r /system/events/cpu/lpmc/cache_errors -n 153878562 -a

What can I do in this case?
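
For anyone who wants to pull the follow-up command out of such a syslog entry automatically, here is a rough sketch. It assumes the entry has been joined onto a single line and has the same wording as the one quoted above; the pattern and the function name are my own and are not part of EMS.

```python
import re
import shlex

# Pattern written against the wording of the entry quoted in this post;
# other EMS versions may word the notification differently.
EMS_PATTERN = re.compile(
    r'Value: "(?P<severity>[^"]+)" for Resource: "(?P<resource>[^"]+)"'
    r'.*?(?P<command>/opt/resmon/bin/resdata\s.+)$'
)


def parse_ems_line(line):
    """Return severity, resource and the resdata argv, or None if no match."""
    match = EMS_PATTERN.search(line.strip())
    if match is None:
        return None
    return {
        "severity": match.group("severity"),
        "resource": match.group("resource"),
        "argv": shlex.split(match.group("command")),
    }


entry = (
    'Jul 10 16:55:17 HE_H1_DB EMS [2348]: ------ EMS Event Notification ------ '
    'Value: "MAJORWARNING (3)" for Resource: "/system/events/cpu/lpmc/cache_errors" '
    '(Threshold: >= " 3") Execute the following command to obtain event details: '
    '/opt/resmon/bin/resdata -R 153878530 -r /system/events/cpu/lpmc/cache_errors '
    '-n 153878562 -a'
)

info = parse_ems_line(entry)
print(info["severity"], info["resource"])
print(" ".join(info["argv"]))
```

The extracted argv is the command the notification itself tells you to run on the affected host to see the full event text.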
Roland
Torsten.
Acclaimed Contributor

Re: CPU failure: reboot possible

It looks like the system took all load off this CPU because of cache errors.

If you reboot, stop at the boot menu.

Check whether the CPU is disabled (I guess it is).

Main Menu: Enter command or menu > co

---- Configuration Menu ------------------------------------------------------

Command                              Description
-------                              -----------
AUto [BOot|SEArch|STart] [ON|OFF]    Display or set specified flag
BootID [] []                         Display or set Boot Identifier
BootINfo                             Display boot-related information
BootTimer [0 - 200]                  Seconds allowed for boot attempt
CPUconfig [] [ON|OFF]                Config/Deconfig processor
DEfault                              Set the system to predefined values
FastBoot [ON|OFF]                    Display or set boot tests execution
ResTart [ON|OFF]                     Display or set the System Restart Policy
PAth [PRI|ALT] []                    Display or modify a path
SEArch [DIsplay|IPL] []              Search for boot devices
TIme [c:y:m:d:h:m:[s]]               Read or set the real time clock in GMT

BOot [PRI|ALT|]                      Boot from specified path
DIsplay                              Redisplay the current menu
HElp []                              Display help for specified command
RESET                                Restart the system
MAin                                 Return to Main Menu


Use the CPUconfig command to check the CPU and disable it if needed.

Did you check the logs/mails for any related messages?

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi Torsten,

Have you seen a CPU failure on an HP-UX system before?

I would never have thought that RAM or a CPU in an HP-UX machine could be defective...

Tomorrow I will drive to the customer to see the error live and to disable the CPU.

Thanks for your help !

BR
Roland

Roland
Patrick Wallek
Honored Contributor

Re: CPU failure: reboot possible

I have seen defects in both RAM and CPUs on HP-UX machines. Any component in your system has the potential to fail.

Bharath PSB
Advisor

Re: CPU failure: reboot possible

Are you sure that all the CPUs are of the same type? I ask because if there is a mismatch between the CPUs' HVersion and CVersion, there is a risk that some of the processors will not boot. Can you please confirm this? To get this information, you may have to run "in pr" at BCH.
Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi Bharath,

Yes, I think so. The server worked very well with 4 CPUs for six months, fast and without problems.

But one of the CPUs has not been working for 2 days.

Today I'll drive to the customer to install 4 new (refurbished) 440 MHz CPUs (at the moment they have 360 MHz), and I hope we don't get more problems than before :-)

we'll see:-)

Thanks for all your answers

BR
Roland
Roland
Marcel Burggraeve
Trusted Contributor

Re: CPU failure: reboot possible

If you're going to replace those CPUs with faster ones, make sure you have a document with you showing all DIP switch settings on the system board for the different CPU speeds.
Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi Marcel,

How many DIP switches must be configured?

Thanks for your answer!

BR
Roland
Roland
Marcel Burggraeve
Trusted Contributor
Solution

Re: CPU failure: reboot possible

There are 8 DIP switches in total, which can be found on the left side of the system board (if you're facing the front of the system).
However, I just tried to find a document on ITRC and Google showing the correct settings but couldn't find one.
Maybe someone else on ITRC can provide a link or a document?
Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi,

Thanks for your help and for searching for the documentation :-)

I will try to find detailed documentation as well.

BR
Roland
Roland
Andrew Merritt_2
Honored Contributor

Re: CPU failure: reboot possible

Also, what is the full text of the EMS Event that was notified in syslog.log? Either run the command given, or look in /var/opt/resmon/log/event.log to see it.

Andrew
Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi Andrew,

Here is the message from the log file:

# more /var/opt/resmon/log/event.log

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Sun Jun 17 16:55:03 2007

HE_H1_DB sent Event Monitor notification information:

/system/events/cpu/lpmc/cache_errors is >= 1.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Sun Jun 17 16:55:03 2007
Severity............: MAJORWARNING
Monitor.............: lpmc_em
Event #.............: 100521
System..............: HE_H1_DB

Summary:

Module at Hard Physical Address = 0xfffffffffffa2000 : The faulty
processor is still active.


Description of Error:

On Fri Jun 8 16:44:04 2007, a SERIOUS event was generated indicating an
abnormally high failure rate for this processor. The processor is still
found to be active and continued use of the system may lead to a
catastrophic failure.

Probable Cause / Recommended Action:

Reboot the system to deconfigure the faulty processor. Also, the processor
should be replaced as soon as possible.

Additional Event Data:
System IP Address...: 139.64.2.5
Event Id............: 0x46754b4700000000
Monitor Version.....: B.01.00
Event Class.........: LPMC
Client Configuration File...........:
/var/stm/config/tools/monitor/default_lpmc_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800
EMS Version.....................: A.03.20
STM Version.....................: A.26.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/lpmc_em.htm#100521

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
HPA.....................: 0xfffffffffffa2000
Processor Number........: 1
Physical Device Path....: 162
Serial Number...........: 0


>---------- End Event Monitoring Service Event Notification ----------<
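
The "Component Data" part of the notification above is what ties the event to a specific processor. As a rough sketch (the function name and the pattern are mine, and it assumes the dotted "Name....: value" layout shown in this log), the fields can be extracted like this:

```python
import re

# Matches lines of the form "Name.....: value". At least two dots are
# required before the colon so that ordinary sentences containing a colon
# (timestamps, URLs) are not picked up.
FIELD = re.compile(r"^\s*([^.:]+?)\.{2,}:\s*(.*?)\s*$")


def parse_event_fields(text):
    """Return a dict of the dotted 'Name....: value' fields in an EMS event."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Component Data block copied from the notification in this post.
component_data = """\
Component Data:
HPA.....................: 0xfffffffffffa2000
Processor Number........: 1
Physical Device Path....: 162
Serial Number...........: 0
"""

fields = parse_event_fields(component_data)
print("Processor", fields["Processor Number"],
      "at H/W path", fields["Physical Device Path"])
```

The Physical Device Path reported here, 162, is the H/W path listed for processor instance 1 in the ioscan output posted earlier in this thread.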






BR
Roland
Roland
Andrew Merritt_2
Honored Contributor

Re: CPU failure: reboot possible

Ok, I think the text in that event answers your question about rebooting. The CPU would be deallocated, but the others would still work, and the system should boot OK.

It's not the cause of the problem, but I would strongly recommend that you install a current version of the OnlineDiags. That message shows that A.26.00 is installed, which was the June 2001 release! The current release for 11.11 is A.57.00 (see http://www.docs.hp.com/en/diag/stm/stm_upd.htm#table for the currently supported versions).

If you don't have the CDs, you can download the current version from http://h20293.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B6191AAE

Andrew

Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi,

thanks for help.

Do you know how many DIP switches must be changed when upgrading the CPUs from 360 MHz to 440 MHz?

Thanks in advance for your answer!

BR
Roland
Roland
Phil uk
Honored Contributor

Re: CPU failure: reboot possible

Hi,
You may have a rev A or a rev B system board.
If rev A:
360 MHz:
SW1 ON, SW2 OFF, SW3 ON, SW4 OFF, SW5 ON
440 MHz:
SW1 ON, SW2 OFF, SW3 OFF, SW4 OFF, SW5 ON

If rev B:
360 MHz:
SW1 ON, SW2 OFF, SW3 ON, SW4 ON, SW5 ON
440 MHz:
SW1 ON, SW2 OFF, SW3 OFF, SW4 ON, SW5 ON

As you can see, the 360 MHz switch settings are different for the rev A and rev B boards, so you can work out which system board you have.
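
The same table can be written down as a small lookup, which makes it easy to see what actually has to change. This is only a sketch built from the values listed above: it covers SW1 to SW5 only, the function names are made up, and the settings should be confirmed against the board documentation before touching the hardware.

```python
# Switch positions SW1..SW5 as listed in this post (True = ON, False = OFF).
SETTINGS = {
    ("A", 360): (True, False, True, False, True),
    ("A", 440): (True, False, False, False, True),
    ("B", 360): (True, False, True, True, True),
    ("B", 440): (True, False, False, True, True),
}


def identify_revision(current, mhz):
    """Return 'A' or 'B' if the current positions match exactly one board
    revision at the given CPU speed, otherwise None."""
    matches = [rev for (rev, speed), positions in SETTINGS.items()
               if speed == mhz and positions == tuple(current)]
    return matches[0] if len(matches) == 1 else None


def switches_to_change(rev, old_mhz, new_mhz):
    """Return (switch name, new position) for every switch that differs."""
    old = SETTINGS[(rev, old_mhz)]
    new = SETTINGS[(rev, new_mhz)]
    return [(f"SW{i}", "ON" if n else "OFF")
            for i, (o, n) in enumerate(zip(old, new), start=1) if o != n]


# Example: positions read off a board currently running 360 MHz CPUs.
rev = identify_revision((True, False, True, True, True), 360)
print("Board revision:", rev)
print("Change for 440 MHz:", switches_to_change(rev, 360, 440))
```

With the values as posted, SW3 is the only switch that differs between 360 MHz and 440 MHz on either board revision.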
HTH,
Phil
Phil uk
Honored Contributor

Re: CPU failure: reboot possible

P.S.
The switch bank is located next to a button battery on the system board, near the CPU sockets.
Mann_2
Frequent Advisor

Re: CPU failure: reboot possible

Hi Phil,

thanks for your answer.

I'm going to my customer now and will give you an answer tomorrow.

Thanks a lot

BR
Roland
Roland