
All of our HP servers kicked into high fan speeds for no obvious reason at the same time?

 
etank52
Occasional Contributor

We have three DL380 G10s in a Hyper-V cluster, two HP DL380 G9s, two HP DL380 G8s, two HP DL380 G7s, and one HP DL380 G6. We also have a P2000 G3 SAN, a 2040 SAN, and a 2052 SAN. I can't be sure about the SANs' fans, but all of our servers suddenly kicked into high fan speeds today around 1:22 PM CST. They continued to spin up and slow down over the course of the next hour or so. It sounded like someone was opening the chassis, or like the servers were rebooting in a sauna.

Our Hyper-V cluster went down: all of our servers had black screens and were unresponsive. Two of our three Hyper-V cluster servers suddenly came back to life on the VGA ports, showing that they needed some Windows updates, but what would cause them to be unresponsive for so long and then suddenly return to a normal Windows screen? I powered off the third server in the cluster manually to try to bring it back to life. It stayed at the Windows spinning circle for 15 minutes before I pulled its power again.

To get them working again, the only thing I did differently was remove all of the servers' network connections. They use a 1 Gb port for heartbeat and have four 10 Gb ports for all the Hyper-V and VM traffic. After they booted up without network connections, I plugged their cables back in, and after a few minutes of stability I rebooted each one again for good measure.

I've left all the iLO ports unplugged for now, thinking maybe we got hacked. I upgraded the iLO firmware a few months back, so we're not too far behind, if at all. I'm just baffled as to why all of our HP servers started freaking out at the same time and all became unresponsive. One of our G9 servers I did NOT detach from the network, but it eventually came back to life as well. After a few more restarts, life is back to normal, without any indication in the Windows logs as to what caused this fiasco. Since I still have the iLO ports disconnected, I'm going to wait until this weekend before I tinker with those and look at the iLO log files, which might shed some light on a possible hardware failure.

It just doesn't make any sense that servers in different racks, on different circuits, and on different generators and UPSes would all get hit at the same time, all become unresponsive to a directly attached keyboard and monitor, and all finally come back to life after I disconnected the NICs and restarted the Windows Server 2016 Datacenter OSes.

Anyone seen something like this before?

2 REPLIES

Re: All of our HP servers kicked into high fan speeds for no obvious reason at the same time?

Dear etank

Certainly not. It's better to have a disaster recovery plan and open a case with HP, so that you will get local partner support as well.

Best Regards
Mohamed
System Engineer
Suvamay
HPE Pro

Re: All of our HP servers kicked into high fan speeds for no obvious reason at the same time?

Hello,

This is a scenario-based issue. The logs from that point in time need to be checked. We suggest that you log a support ticket with HPE to work further on this issue.

I am an HPE employee