CPU_FAN_SLOW / FAILED on rp7410

 

Hello everybody,

I have installed a new rp7410 (2 x 750 MHz) and noticed that it was shutting down due to over-temperature once or twice a day. For almost a week I suspected the A/C in the server room, which is actually not working too well, but after installing another A/C the problem persisted and, to my surprise, two days ago it started shutting down 10-15 minutes after booting.
From the MP ---> SL, I checked the error logs and found a CPU_FAN_SLOW entry; almost 10 minutes later a CPU_FAN_FAILED entry is logged before the server shuts itself down due to over-temperature. (Attached is the complete log for SLOW. Unfortunately I couldn't capture the FAILED one.)

I have removed the Cell Board, "checked" the 3 fans, unplugged and replugged them, and retested with no success.

I checked the ITRC Forum and found that this might be due to the PDC firmware revision (although the server was shipped from HP only 45 days ago). I do not know the actual PDC revision of the server (I should be able to get it tomorrow).

Can someone help with this before we replace the defective CPU/fan assembly (assuming we can work out which of the 2 installed ones it is; that is not easy!!!)?

Thanks,
Charbel
14 REPLIES
Michael Steele_2
Honored Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Boy, this was hard to find. PDC version 17.005 in firmware patch PF_CKEYMAT0600, aka firmware version 5.0.

http://www1.itrc.hp.com/service/cki/patchDocDisplay.do?patchId=PF_CKEYMAT0600

Search on 'PDC'.
Support Fatherhood - Stop Family Law
Saurav_1
Valued Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi,

Just check. If this is a new machine, there should not be any problem with the hardware, but you can control the fan speed from the GSP menu; see the help there. If there were a problem with the fans, they should not work at all. If a fan has failed, the logs will show the number of the fan that has failed or is not responding. Better still, reset the GSP settings to factory default.

Saurav
Jeff Schussele
Honored Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi Charbel,

Yes, I had a new rp7410 that was taking itself down because it thought 2 fans failed.

It was at PDC 16.09 & MP 3.
The solution was to upgrade PDC to 16.11 & MP to 4.
I also upgraded all my other 7410s as this was a nasty little bug.

So upgrade yours to at least those values - newer would be better & you should be OK.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi all,

Thanks for your answers.

Michael,
I have downloaded Firmware 6.0 and upgraded the PDC/PDH/MP firmware (with fw), but the problem persisted.
Thanks anyway for your reply.

Saurav,
Thanks for the info regarding the fan speed in GSP. Unfortunately I have only just read your reply and have to wait till tomorrow to test it, including resetting GSP to factory default (which I think it already is, as I had only changed the Web Console LAN config).
Using "ps" from "MP>CM>" and under "Cabinet" I found that the CPU0 Fan is "Failed".
I will let you know by tomorrow.

Jeff,
The new firmware versions are now 17.05 for PDC, 2.002 for PDH and 4.32 for MP, and I'm still having the same problem with this CPU0 Fan.
I think I'm not lucky at all and CPU0 must be replaced, as the fan cannot be ordered separately....
Thanks anyway for your reply.

I will let you know what will happen tomorrow after checking what Saurav proposed.

Cheers,
Charbel
Tony Horton
Frequent Advisor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi Charbel,

I had a similar problem recently on an L1500. The machine kept logging that one of the CPU fans had failed (even though it was running) and the fans went into high speed mode. After a while it would report that both fans had failed and immediately do an ungraceful shutdown.

It turned out to be the environmental monitoring board. HP replaced it and the problem went away.

Regards,

Tony.
No man is an isthmus
Saurav_1
Valued Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi,
Please check the power supply status from the GSP prompt. Type SL to show the logs and list all errors; you may find something interesting there. Also look at /var/tombstones/ts99 and please attach the latest file to your reply. Check /var/adm/syslog/OLDsyslog.log and syslog.log as well.
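
For anyone who wants to sift those syslog files quickly, here is a minimal sketch in Python. The two paths are the ones mentioned above; the keyword pattern is only an assumption about what fan and temperature messages might contain, and the script is meant to run wherever Python 3 is available (for example on a copy of the logs taken off the server).

```python
import re
from pathlib import Path

# Assumed keywords; adjust to whatever your logs actually contain.
PATTERN = re.compile(r"fan|over.?temp|temperature", re.IGNORECASE)


def find_fan_events(lines):
    """Return (line_number, text) pairs for lines mentioning fans or temperature."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_file(path):
    """Scan one log file; a missing file simply yields no hits."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(errors="replace") as fh:
        return find_fan_events(fh)


if __name__ == "__main__":
    for path in ("/var/adm/syslog/OLDsyslog.log", "/var/adm/syslog/syslog.log"):
        for n, text in scan_file(path):
            print(f"{path}:{n}: {text}")
```

The output is just a list of matching lines with their line numbers, which makes it easier to line them up against the timestamps in the MP error log.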

Saurav
Saurav_1
Valued Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi,

EMS sends a mail about system health to root if something goes wrong. Please check root's mailbox with elm and read all the mail. Cheers.

Saurav
Jeff Schussele
Honored Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi (again) Charbel,

Yes it appears you'll probably have to replace CPU0. My problem was with misreported cabinet fan failures - not CPU fans.
Should be covered under warranty if it's a new system so log a HW call with the RC & get a CE on-site to check it.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi all,

Just to let you know that I have done a factory reset of the GSP, powered the cabinet down and up, and restarted the server.
I waited for 2 hours without any problem. I have left the server running and will recheck tomorrow to see whether it has failed again (which I hope it will not).

Basically, for me the solution (in case it is resolved) is the combination of the firmware update and the GSP reset. Please also note that after the last failure, which occurred immediately after the FW update, I kept the server on standby power (almost 24 hours) before doing the GSP reset 3 hours ago (I'm not too sure whether this helped).

Anyway, everything seems OK now, but I will reconfirm by tomorrow.

p.s. Tony, unfortunately the Power Monitor Module and the Fan Monitor Module are integrated on the backplane in an rp7410 (I'm not too sure about L1500/rp54xx), so anyone with a Monitor Module fault has to replace the System Backplane.

p.s. The log files contained only OverTemp/Shutdown critical error entries, while the mail pointed to a particular CPU fan failure and ts99 showed nothing serious.

Thanks to all of you.

Cheers,
Charbel
Tony Horton
Frequent Advisor

Re: CPU_FAN_SLOW / FAILED on rp7410

The L/rp54X series has a separate card that plugs into the main CPU board. I haven't seen a 74x series machine, just assumed it would be similar :)

Our problem also went away after a hard reset, but only for 2 hours to 2 days; then it returned, eventually got worse, and would happen almost immediately. One telltale sign was that the main cabinet fans (the L class doesn't have any fans on the CPUs themselves) were running at a speed somewhere between maximum and normal, and if you changed the speed to high at the initial boot prompt they hardly changed at all (if anything they slowed down a bit!).

Hope it's just a glitch and the firmware update and reset has fixed it for you. I remember seeing a firmware notice from HP mentioning problems with the fans.

Regards,

Tony.
No man is an isthmus
Jeff Schussele
Honored Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi Tony,

The rp7410 is not a "standard" rp74XX system.
It's a cell-based system like the rp8400 & SuperDomes.
It can host 2 hard partitions - nPars - and doesn't use the same power monitor HW/FW as an rp7450/7470.
It's really more like a SuperDome than an N-class.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi again,

Unfortunately the server failed again only 30 minutes after I left it running for the test.

Today, after intensive troubleshooting, I discovered that the source of this fault is simply the Cell Board and not the CPU/fan itself. I swapped CPU0 and CPU1 and "ps" was still showing CPU0 Fan Failed!!!
I also got a new Cell Board, which I have installed using the same CPUs, and everything has been working fine for more than 5 hours now (hope it will continue....).

I've read some of you saying that we can change the CPU fan speed from the GSP. How do we do it? Could this setting have been the problem with this Cell Board, or what do you think?
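
The swap test above boils down to a simple rule: if the reported fault stays on the same position after two parts are exchanged, suspect the board (or its sensing circuitry); if it moves with the part, suspect the part. A minimal sketch of that reasoning in Python follows. The function name and the "slot" / "component" labels are made up for illustration and are not part of any HP tool.

```python
def locate_fault(failed_before, failed_after, swapped):
    """Interpret a swap test.

    failed_before / failed_after: position reported as failed before and
    after the swap (e.g. "CPU0"), or None if nothing was reported.
    swapped: the two positions whose parts were exchanged.
    """
    a, b = swapped
    if failed_before not in (a, b):
        return "inconclusive"  # the swap did not involve the failing position
    other = b if failed_before == a else a
    if failed_after == failed_before:
        return "slot"          # fault stayed put: suspect the board side
    if failed_after == other:
        return "component"     # fault followed the part: suspect the CPU/fan
    return "inconclusive"


# The case described above: CPU0 and CPU1 swapped, "ps" still shows CPU0 failed.
print(locate_fault("CPU0", "CPU0", ("CPU0", "CPU1")))
```

For the case in this thread the sketch returns "slot", which matches the conclusion that the Cell Board rather than the CPU/fan assembly was at fault.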

Cheers,
Charbel

Tony Horton
Frequent Advisor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi Charbel,

I'm not 100% sure about this, but I think you can change the fan speed at the initial boot prompt (interrupt when it says press any key within 10 seconds to interrupt). I think it was something like fan high/fan normal, probably from the service menu. I never actually did it myself (I watched the engineer do it), so I'm a little unsure.

Regards,

Tony.
No man is an isthmus
Tony Horton
Frequent Advisor

Re: CPU_FAN_SLOW / FAILED on rp7410

Just did a bit of a search: it's under the configuration menu, not the service menu, at the Boot Console Handler (hope this is the same on an rp7410 as it is on an L class)....

Regards,

Tony.
No man is an isthmus