Skrou
Occasional Visitor

DL380G5 unexpected shutdowns on a daily basis

Hello all,

Since last Wednesday, one of my company's servers keeps shutting down. It's a DL380G5 running IIS and hosting some printers. On Wednesday it shut down once. Then it started shutting down twice a day, and today I powered it on from home at 7:30am (GMT+2), only to find that it had shut down again by the time I got to work (a 30-minute drive).

Through Event Viewer, I can't find anything related to it. So my first guess was the PSU. Later, though, I found that the server's memory is running at high temperatures, and that might be the cause. Even after it has been off for 3-4 hours, the temperatures are at 50-60 °C the moment it boots. I have attached a picture of the SpeedFan readings.

Another problem is that I cannot access the HP System Management Homepage because I don't have the username/password, and I have no way of finding them out. I've been with the company for only a month, and the people who set up the servers are nowhere to be found.

Should the temperature be that high the moment the server boots after 4 hours of downtime? Is it possible that the memory readings are wrong?

Thank you in advance for your patience and possible solutions.

MTtech
Visitor

I am not the most qualified on your specific system, but I'll give you what I know.

Is the event log showing a dirty shutdown? You wrote: "Through Event Viewer, I can't find anything related to it."

There really should be some entry unless the system is configured not to log them; see the TechNet article on Windows not logging event 6008. If it is configured to log unexpected shutdowns and is not doing so, then I think you are on the right path with the overheated memory. My understanding has been that memory should only operate between 30 and 40 °C max.
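
One way to check this is to compare boots against the shutdown events Windows actually logged. Here is a minimal sketch in Python, assuming you can get the System log's EventLog-service entries as a chronological list of (timestamp, event ID) pairs. The IDs 6005 (event log service started, logged at boot), 6006 (event log service stopped, logged on a clean shutdown) and 6008 (previous shutdown was unexpected) are standard Windows event IDs; the function names are hypothetical. A boot with no 6006 before it implies a dirty shutdown, and if those appear while no 6008 entries do, the logging configuration is the first thing to look at. The `fetch_eventlog_text` helper only dumps raw `wevtutil` text output; turning that into the pair list is left to you.

```python
import subprocess
from typing import List, Sequence, Tuple

# Standard Windows System-log event IDs written by the EventLog service.
BOOT = 6005        # event log service started (logged at each boot)
CLEAN_STOP = 6006  # event log service stopped (logged on a clean shutdown)
UNEXPECTED = 6008  # "The previous system shutdown ... was unexpected."


def fetch_eventlog_text(max_events: int = 50) -> str:
    """Windows only: dump recent 6005/6006/6008 events as text via wevtutil.

    /rd:true returns newest first; parsing the text output is left to you.
    """
    xpath = "*[System[(EventID=6005 or EventID=6006 or EventID=6008)]]"
    cmd = ["wevtutil", "qe", "System", f"/q:{xpath}",
           f"/c:{max_events}", "/rd:true", "/f:text"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def audit_shutdowns(events: Sequence[Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    """Compare inferred dirty shutdowns with the 6008 entries actually logged.

    `events` is a chronological list of (timestamp, event_id) pairs.
    Returns (boots_after_dirty_shutdown, logged_6008_timestamps).
    """
    inferred: List[str] = []
    logged: List[str] = []
    booted = False           # have we seen a boot yet?
    stopped_cleanly = False  # did a 6006 follow the most recent boot?
    for ts, event_id in events:
        if event_id == BOOT:
            if booted and not stopped_cleanly:
                inferred.append(ts)  # new boot with no clean stop before it
            booted = True
            stopped_cleanly = False
        elif event_id == CLEAN_STOP:
            stopped_cleanly = True
        elif event_id == UNEXPECTED:
            logged.append(ts)
    return inferred, logged


if __name__ == "__main__":
    sample = [("Mon 08:00", BOOT), ("Mon 18:00", CLEAN_STOP),
              ("Tue 08:00", BOOT), ("Tue 13:10", BOOT)]  # hypothetical log
    dirty, logged = audit_shutdowns(sample)
    print(dirty, logged)  # ['Tue 13:10'] []
```

In the sample, the 13:10 reboot is flagged even though no 6008 was logged, which is exactly the situation where the logging configuration deserves a look.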

Memory and CPU start heating up immediately at power on, so the fact that there is no delay in the problems doesn't surprise me much.  Drives would be more inclined to heat up over time.

The first thing I would do is slide that system out of the rack and make sure all the internal components are free of dust and that the memory is able to "breathe" and has good airflow over it.

With the top cover off and a proper static strap on, you should be able to physically verify whether the software is lying to you or you really have a memory heat issue, by holding your hand directly above the memory or using a laser thermometer.

If overheated memory is not the issue, then I would power off and reseat both hot-swap PS units. I had a similar issue with a G3, and this corrected the problem.

Historically, I have seen this behavior with a failing power supply, as you mentioned in your post. Do you have redundant power supplies in your DL380G5? If so, I would isolate the individual supplies: unplug one and check whether the system fails in the same manner, then plug that supply back in, remove power from the other supply, and document the behavior in that setup. If it behaves badly or turns off while running on only one supply, that supply is the likely culprit.
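
The isolation steps above boil down to a small decision table. This is a sketch only: the function name and messages are hypothetical, and each flag means "with only that supply powered, the server still shut down or misbehaved during the observation period".

```python
def isolate_psu(fails_on_psu1_only: bool, fails_on_psu2_only: bool) -> str:
    """Interpret two single-supply test runs (hypothetical helper).

    Each flag is True if, with only that supply powered, the server still
    shut down or misbehaved during the observation period.
    """
    if fails_on_psu1_only and fails_on_psu2_only:
        # Failing on either supply alone points away from a single bad PSU,
        # though both supplies could still be faulty.
        return "both failed: fault may lie outside the supplies, or both are bad"
    if fails_on_psu1_only:
        return "suspect PSU 1"
    if fails_on_psu2_only:
        return "suspect PSU 2"
    # Neither run reproduced the fault: an intermittent problem needs a longer test.
    return "inconclusive: extend the observation period"


print(isolate_psu(fails_on_psu1_only=False, fails_on_psu2_only=True))  # suspect PSU 2
```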

I feel for you on the "homepage password problem"; I had the same issue at my new job.

I never found a solution to that. Maybe HP can help there.

Skrou
Occasional Visitor

Thank you for your reply, MTtech.

The thing is, there aren't two PSUs, just one. I've already ordered a second one and expect it to arrive by Monday. I have "tested" the memory temperature by hand, and the modules are in fact quite hot. I cleaned the server, it ran for two consecutive days, and I have to admit it did give me some peace of mind.

But just a few minutes ago, Nagios alerted me that it went down again. I only hope it is a failing PSU, since a replacement is on its way.

MTtech
Visitor

Good luck. If the PS is bad, it could be causing a little of both problems. If the memory has been operating that hot, I would also schedule replacing it sooner rather than later.

A way to get by with the old parts for a while (don't tell any tier 3 techs): hang an extra 3" brushless muffin fan inside the chassis, positioning it with zip ties under tension, to pull heat directly away from the memory. The chassis should still exhaust normally, but the memory area usually has a little air space that doesn't flow well. I'd hang it just behind the drive enclosure and suspend it off the motherboard. I have my fingers crossed that the PS will correct the issues.

Best wishes.

waaronb
Respected Contributor

Have you checked iLO or the Integrated Management Log (IML) to see what, if anything, it reported?

If it's a thermal shutdown, it would be logged in the IML. It's pretty unmistakable when you see it happening: the server reboots, goes to a POST screen that shows the thermal warning, then shuts down for a while and may try booting back up later. If it has cooled down, it boots okay; otherwise it repeats and shuts down again.

If it's a power supply issue, it may or may not be logged in the IML; it might just look as if the power was unplugged, because the PSU conked out. The PSUs do have some basic self-tests, but usually they either work or they don't. Having redundant supplies is awesome and definitely recommended for any important server.
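
If you can export the IML entries as text (the iLO web interface lists them), a rough keyword triage helps separate thermal, power and ASR events before reading each one. This is a sketch: the regular expressions are assumptions rather than HP's actual message wording, and as noted above a PSU that simply dies may leave no entry at all.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed keyword patterns; adjust them to the wording your IML actually uses.
CATEGORIES = {
    "thermal": re.compile(r"\b(thermal|temperature|overheat\w*)\b", re.IGNORECASE),
    "power": re.compile(r"\b(power supply|power fail\w*|psu)\b", re.IGNORECASE),
    "asr": re.compile(r"\b(automatic server recovery|asr)\b", re.IGNORECASE),
}


def triage_iml(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Put each IML text line in the first category whose pattern matches."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        for category, pattern in CATEGORIES.items():
            if pattern.search(line):
                buckets[category].append(line)
                break
        else:  # no pattern matched
            buckets["other"].append(line)
    return dict(buckets)
```

A line that matches several patterns lands in the first category listed, so order the dictionary by what you most want to see.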

I'm looking at one of my DL380 G5s right now, and the iLO doesn't show the memory temperature the way that program you use does, but 55-60 °C doesn't seem too bad, does it? My CPU readings are 33 °C, but there's not much load on it right now.

Your temperature readings will vary with load, so keep that in mind. Also, do you know whether it has all of the fans installed, including the "redundant" ones? (I always install all fans just for peace of mind, even on a single-CPU system.)

Check the airflow inside: make sure all of the baffles are present and correctly installed, and that all the fans are working. If a fan failed and you lost redundancy, it would warn you. But yeah, I don't think 55-60 °C is enough to cause it to fail, though I guess it depends on the condition and age of the module.
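
Since readings vary with load, it helps to compare like with like. A minimal sketch, assuming you log (CPU load %, memory temperature °C) pairs yourself from whatever monitoring tool you use; the 25% bucket width and the `caution_c` threshold are arbitrary placeholders, not HP specifications.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarize_by_load(samples: Iterable[Tuple[float, float]],
                      caution_c: float) -> Dict[int, Tuple[float, float, bool]]:
    """Group (cpu_load_percent, memory_temp_c) samples into 25% load buckets.

    Returns {bucket_start: (mean_temp, max_temp, max_at_or_above_caution)}.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for load, temp in samples:
        # 100% load is folded into the 75-100 bucket.
        start = min(int(load // 25) * 25, 75)
        buckets[start].append(temp)
    return {
        start: (round(mean(temps), 1), max(temps), max(temps) >= caution_c)
        for start, temps in sorted(buckets.items())
    }
```

Comparing the idle bucket from one day against the idle bucket from another avoids mistaking a busy period for a failing module.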

Jonashdez
Occasional Advisor

I am having the same issue. Does anyone have an idea how to determine what the problem could be?

ShawnL
New Member

We are having the same issue on about 6-8 different servers at different locations. All are showing unexpected shutdowns and Automatic Server Recovery (ASR) events from HP.

All of them started within the past month or so.

We have updated to the recent HP SPP firmware from April.

We ran iLO diagnostics on all of them, and all hardware passed on every server.
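
For context on those ASR entries: as HP documents it, ASR is a fail-safe timer that the management (health) driver resets periodically while the OS is running normally; if the OS hangs and the timer expires, the server is restarted. An ASR event therefore records that the OS stopped checking in, not what made it stop. The toy model below only illustrates that mechanism; the class, the function and the 10-minute timeout in the example are hypothetical.

```python
from typing import Iterable, Optional


class WatchdogModel:
    """Toy model of an ASR-style fail-safe timer; times are in minutes."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_reset = 0.0

    def heartbeat(self, now: float) -> None:
        # While the OS is healthy, the management driver keeps resetting the timer.
        self.last_reset = now

    def expired(self, now: float) -> bool:
        # No reset for longer than `timeout`: the hardware would restart the server.
        return now - self.last_reset > self.timeout


def first_restart(heartbeats: Iterable[int], timeout: float, horizon: int) -> Optional[int]:
    """Return the minute the watchdog would fire, or None if it never does."""
    wd = WatchdogModel(timeout)
    beats = set(heartbeats)
    for minute in range(horizon + 1):
        if minute in beats:
            wd.heartbeat(minute)
        if wd.expired(minute):
            return minute
    return None


# OS checks in every minute up to minute 30, then hangs.
print(first_restart(range(31), timeout=10, horizon=120))  # 41
```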