HPE 9000 and HPE e3000 Servers

Charbel Bou-eid
Advisor

CPU_FAN_SLOW / FAILED on rp7410

Hello everybody,

I have installed a new rp7410 (2 x 750 MHz) and noticed that it was shutting down due to over-temperature once or twice a day. For almost a week I suspected the A/C in the server room, which is not working too well, but after installing another A/C the problem persisted and, to my surprise, for the past 2 days it has been shutting down 10-15 minutes after booting.
From MP > SL, I checked the error logs and found a CPU_FAN_SLOW entry, followed almost 10 minutes later by a CPU_FAN_FAILED entry just before the server shut itself down due to over-temperature. (Attached is the complete log for the SLOW event. Unfortunately I couldn't capture the FAILED one.)

I removed the cell board, "checked" the 3 fans, unplugged and replugged them, and retested, with no success.

I checked the ITRC forum and found that this might be due to the PDC firmware revision (although the server was shipped from HP only 45 days ago). I do not know the server's current PDC revision yet (I should be able to get it tomorrow).

Can someone help with this before I replace the defective CPU/fan assembly (assuming we can identify which of the 2 installed ones it is; that is not easy!)?

Thanks,
Charbel
Michael Steele_2
Honored Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Boy, this was hard to find. PDC version 17.005 in firmware patch PF_CKEYMAT0600, aka firmware version 5.0.

http://www1.itrc.hp.com/service/cki/patchDocDisplay.do?patchId=PF_CKEYMAT0600

Search on 'PDC'.
Support Fatherhood - Stop Family Law
Saurav_1
Valued Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi,

Just check: if this is a new machine, there should not be any problem with the hardware. You can control the fan speed from the GSP menu; use the help there. If there were a real problem with the fans, they would not work at all, and if a fan has failed, the logs will show the number of the fan that has failed or is not responding. Better still, reset the GSP settings to factory default.

Saurav
Jeff Schussele
Honored Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi Charbel,

Yes, I had a new rp7410 that was taking itself down because it thought 2 fans had failed.

It was at PDC 16.09 & MP 3.
The solution was to upgrade PDC to 16.11 & MP to 4.
I also upgraded all my other 7410s as this was a nasty little bug.

So upgrade yours to at least those values - newer would be better & you should be OK.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Charbel Bou-eid
Advisor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi all,

Thanks for your answers.

Michael,
I downloaded Firmware 6.0 and upgraded the PDC/PDH/MP firmware (with fw), but the problem persisted.
Thanks anyway for your reply.

Saurav,
Thanks for the info about the fan speed setting in the GSP. Unfortunately I only just read your reply and will have to wait until tomorrow to test it, including resetting the GSP to factory defaults (which I think it already is, as I only changed the Web Console LAN configuration).
Using "ps" from "MP>CM>", under "Cabinet", I found that the CPU0 fan is reported as "Failed".
I will let you know by tomorrow.

Jeff,
The firmware is now at 17.05 for PDC, 2.002 for PDH and 4.32 for MP, and I'm still having the same problem with the CPU0 fan.
I think I'm out of luck and CPU0 must be replaced, as the fan cannot be ordered separately...
Thanks anyway for your reply.

I will let you know what happens tomorrow after trying what Saurav proposed.

Cheers,
Charbel
Tony Horton
Frequent Advisor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi Charbel,

I had a similar problem recently on an L1500. The machine kept logging that one of the CPU fans had failed (even though it was running), and the fans went into high-speed mode. After a while it would report that both fans had failed and immediately do an ungraceful shutdown.

It turned out to be the environmental monitoring board. HP replaced it and the problem went away.

Regards,

Tony.
No man is an isthmus
Saurav_1
Valued Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi,
Please check the power supply status from the GSP prompt. Type SL (show log) and list all errors; maybe you will find something interesting there. Also check /var/tombstones/ts99 and please attach the most recent file to your reply. Check /var/adm/syslog/OLDsyslog.log and syslog.log as well.
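A minimal sketch for going through the two syslog files mentioned above, assuming Python 3 is available (it may not be on an older HP-UX host; the logs can be copied to another machine). The function name `scan_logs` and the keyword list ("fan", "temp") are hypothetical choices for illustration, not HP-defined message strings; adjust them to whatever your system actually logs.

```python
import os
from typing import Iterable, List, Tuple

# Log files named in the reply above (HP-UX syslog locations).
DEFAULT_LOGS = ("/var/adm/syslog/OLDsyslog.log", "/var/adm/syslog/syslog.log")

# Hypothetical keywords: not official HP message text, adjust as needed.
DEFAULT_KEYWORDS = ("fan", "temp")


def scan_logs(paths: Iterable[str] = DEFAULT_LOGS,
              keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for each line containing any keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip files that were rotated away or never created
        with open(path, "r", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                text = line.rstrip("\n")
                if any(k in text.lower() for k in lowered):
                    hits.append((path, number, text))
    return hits


if __name__ == "__main__":
    for path, number, text in scan_logs():
        print(f"{path}:{number}: {text}")
```

Matching is a plain substring test, so a broad keyword like "temp" may also catch unrelated lines; the output is only a starting point for reading the full log.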

Saurav
Saurav_1
Valued Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi,

EMS sends mail about system health to root if something goes wrong. Please check root's mailbox (use elm) and read all the mail. Cheers.

Saurav
Jeff Schussele
Honored Contributor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi (again) Charbel,

Yes it appears you'll probably have to replace CPU0. My problem was with misreported cabinet fan failures - not CPU fans.
It should be covered under warranty since it's a new system, so log a HW call with the RC and get a CE on-site to check it.

Rgds,
Jeff
Charbel Bou-eid
Advisor

Re: CPU_FAN_SLOW / FAILED on rp7410

Hi all,

Just to let you know: I did a factory reset of the GSP, power-cycled the cabinet and restarted the server.
It has now run for 2 hours without any problem. I have left the server running and will check again tomorrow to see whether it has failed again (hopefully not).

Basically, if the problem is resolved, the solution for me was the combination of the firmware update and the GSP reset. Please also note that after the last failure, which occurred immediately after the FW update, I kept the server in standby power for almost 24 hours before doing the GSP reset 3 hours ago (I'm not sure whether this helped).

Anyway, everything seems OK now, but I will reconfirm tomorrow.

P.S. Tony, unfortunately on an rp7410 the Power Monitor Module and the Fan Monitor Module are integrated on the backplane (I'm not sure about the L1500/rp54xx), so a Monitor Module fault means replacing the system backplane.

P.S. The log files contained only OverTemp/Shutdown critical error entries, while the EMS mail pointed to a specific CPU fan failure and ts99 showed nothing serious.

Thanks to all of you.

Cheers,
Charbel