DL140 spontaneous shutdown
10-02-2004 04:25 AM
When running serious stress tests on the NICs, the DL140s (no other machine types are affected) spontaneously shut down about 10 seconds into the test. Often this is preceded by one or two bursts where the cooling fans speed up for half a second or so and then slow down again.
Having read about the problems with BMC and IPMI on these machines (which, by the way, are running Red Hat Enterprise Linux 3), I upgraded both the BMC and BIOS flash.
After the upgrade it now takes between 10 and 15 minutes for the machines to switch off, but switch off they still do. I have also loaded ipmi.o and tried to use shutdown_watchdog (which, thank you HP, gives no messages at all about what, if anything, it's doing, not even in the system log), but this does not seem to make a difference to the shutdown behaviour.
The only other piece of evidence I have is that each time the system shuts down there is a message in the event log saying
Temperature 59 4C 4B
or with other hex digits following.
I also have a logic analyser on the PCI bus and there is no unusual activity, parity or system error reported, just reset being asserted and then the power going off.
The problem only arises when the PCI bus and processors are heavily loaded.
I don't buy the temperature argument as the fans kick in for such a small period of time. Other times the system will shut off without blipping the fans at all.
Is anyone able to decode the hex digits? What do they mean?
Any other ideas what might be doing this, or how to get it fixed?
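On the hex digits: one possible reading, assuming the three bytes are the Event Data 1 to 3 fields of a standard IPMI threshold sensor event (the thread does not confirm this). Under that assumption the first byte says what the other two contain and which threshold was crossed. The function name `decode_threshold_event` is made up for this sketch.

```python
# Sketch only: assumes "59 4C 4B" are IPMI Event Data 1..3 of a
# threshold-based sensor event. The thread does not confirm this layout.

# Event offsets for threshold sensors (low nibble of Event Data 1).
THRESHOLD_OFFSETS = {
    0x0: "Lower Non-critical going low",
    0x1: "Lower Non-critical going high",
    0x2: "Lower Critical going low",
    0x3: "Lower Critical going high",
    0x4: "Lower Non-recoverable going low",
    0x5: "Lower Non-recoverable going high",
    0x6: "Upper Non-critical going low",
    0x7: "Upper Non-critical going high",
    0x8: "Upper Critical going low",
    0x9: "Upper Critical going high",
    0xA: "Upper Non-recoverable going low",
    0xB: "Upper Non-recoverable going high",
}


def decode_threshold_event(data1, data2, data3):
    """Split Event Data 1 into its bit fields and label bytes 2 and 3."""
    byte2_kind = (data1 >> 6) & 0x3  # 1 means byte 2 is the trigger reading
    byte3_kind = (data1 >> 4) & 0x3  # 1 means byte 3 is the trigger threshold
    offset = data1 & 0xF
    return {
        "event": THRESHOLD_OFFSETS.get(offset, "reserved"),
        "raw_reading": data2 if byte2_kind == 1 else None,
        "raw_threshold": data3 if byte3_kind == 1 else None,
    }


if __name__ == "__main__":
    print(decode_threshold_event(0x59, 0x4C, 0x4B))
```

If that layout applies, 59 4C 4B would mean an upper critical threshold was crossed with a raw reading of 0x4C (76) against a raw threshold of 0x4B (75). These are raw sensor counts, which are not necessarily degrees Celsius.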
10-02-2004 05:06 AM
Re: DL140 spontaneous shutdown
Temperature events can be controlled in the BIOS ASR events. You might try setting the ASR Thermal events and see if the problem goes away. I am surprised you do not see the POST message about a thermal event when the server boots up. You may need to contact HP directly and see if they have a better answer.
G'luck! -john
10-02-2004 05:42 AM
Re: DL140 spontaneous shutdown
How do I configure these things? I see no controls in the BIOS for anything like this.
Thanks
Derek
10-02-2004 05:57 AM
Solution
Very sorry about that... I was working from memory on this. Other ProLiant servers have the option to configure ASR events in the BIOS. The DL140 appears to be like all the other 100 series servers in that the BIOS does not have it. It does appear to have a Watchdog Timer, but I do not think you can control it.
http://h200001.www2.hp.com/bc/docs/support/UCR/SupportManual/TPM_349109-002/TPM_349109-002.pdf
page 42
Are you running the IPMI "Health Driver"?
http://h18023.www1.hp.com/support/files/server/us/download/20414.html
for update 1 or later.
As a temporary fix (test) ... put a room fan in front of the server to move more air through the box. If the load tests run longer or complete altogether, contact HP and let them know. They may be able to work this into their issue tracker and get a fix for you.
10-02-2004 06:18 AM
Re: DL140 spontaneous shutdown
I tried using the IPMI driver but it didn't seem to make much difference.
The only consistent thing is that it keeps dropping messages in the event log about over-temperature, but the fans change speed for such a short time that I can't believe this is real.
I can only conclude that there's some sort of intermittent interference in the temperature measurement circuitry which is fooling the BMC into thinking the processors are in meltdown.
10-02-2004 06:27 AM
Re: DL140 spontaneous shutdown
http://welcome.hp.com/country/uk/en/contact_us.html
Let them know of the condition and get a support ticket started. They may have some suggestions off the bat that may help, but be sure to get a ticket started. They may warranty the boards if they know of an issue they have not published.
G'luck! -john
10-29-2004 04:17 PM
Re: DL140 spontaneous shutdown
We've been dealing with this exact same problem with an order of 12 DL140s. However, we do not have a proprietary NIC in our systems. They are HP's standard DL140s with 2 processors and no CD-ROM.
We opened a support ticket with HP but have not received any resolution to the problem. The only additional information I have (that you may know by now) is that the "Critical Temperature Threshold" is only 60 degrees! Too bad we can't change (or even disable) this "feature."
I would greatly appreciate hearing of any success you have with this.
Thanks,
Devin
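A note on comparing the 60-degree threshold with raw bytes from the event log: the two are only comparable once the sensor's conversion factors are known. Below is a minimal sketch, assuming the sensor uses the linear conversion formula from the IPMI specification. The factor values passed in are placeholders chosen for illustration, not values read from a DL140, and `raw_to_reading` is a made-up helper name.

```python
# Sketch only: linear IPMI sensor conversion,
#   reading = (M * raw + B * 10**b_exp) * 10**r_exp
# The factors come from the sensor's SDR record. The values used in the
# example call are hypothetical, not taken from a DL140.

def raw_to_reading(raw, m, b, b_exp, r_exp):
    """Convert a raw sensor byte to engineering units (linear sensors only)."""
    return (m * raw + b * 10 ** b_exp) * 10 ** r_exp


if __name__ == "__main__":
    # With unit factors (M=1, B=0, both exponents 0) the raw value is
    # returned unchanged.
    print(raw_to_reading(0x4B, m=1, b=0, b_exp=0, r_exp=0))
```

With unit factors a raw threshold of 0x4B would be 75 and a 60-degree threshold would be raw 0x3C, so either the factors differ from these placeholders or the logged events come from a different sensor than the one with the 60-degree limit.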
11-30-2004 11:08 AM
Re: DL140 spontaneous shutdown
As long as the jobs being run have disk I/O, there is no problem. But I have software which loads a design into memory and then does computing. This runs the CPUs at a higher load. I can crash a DL140 consistently.
This is using HP memory (brand new).
Has anyone gotten a fix?
My company policy is to use HP whenever possible and I need to get 8 more machines.
Thanks
11-30-2004 11:18 AM
Re: DL140 spontaneous shutdown
It looks like HP released a BMC firmware update on 11/25 that might help.
http://h18007.www1.hp.com/support/files/server/us/download/22197.html
(I don't know why it isn't listed on the DL140 "Software and Drivers" page ...)
I just updated our systems today, so hopefully we'll see if it helps. Let me know if you have already applied it or if you see any improvement.
Thanks,
Devin
P.S. If you haven't already seen it, there's also a BIOS update from 11/25.
12-01-2004 04:01 AM
Re: DL140 spontaneous shutdown
I just wish HP could have had this fixed back in August.