
ML350p - HDD Access freeze/lock

 
Andy Warner
Advisor

ML350p - HDD Access freeze/lock

The issue is with an HP ML350p Gen8 (Xeon E5-2620, 64GB RAM) running Server 2012 with Hyper-V, where all the guest OSes lose access to the hard drives...

 

RAID 1 (2x146GB SAS) C: Partition

Server 2012 (HOST OS)

 

RAID 10 (6x450GB SAS) E: Partition

SBS 2011 (Guest1 VHDX)

Server 2008R2 (Guest2 VHDX)

Server 2012 (Guest3 VHDX)

Windows 7 (Guest4 VHDX)

 

512MB cache (75% write/25% read) on the RAID 10 only

 

What happens is that all four guest OSes freeze while the host OS continues to work correctly. After 5-10 minutes they all start working again (without a restart or reboot), and the event logs on each guest OS indicate issues accessing the hard drives during this period. If I open Resource Monitor during this time and look at disk activity, it shows no reads or writes to any of the VHDXs on the RAID 10, but the 'Active Time' is around 95%+ for the E: partition only. The System process (PID 4) read and write total gradually increases up to 120,000,000 Total (B/sec), and then all of a sudden the system starts writing to the VHDXs and this figure drops to around 5,000,000 Total (B/sec) within 10 seconds. There is nothing in the event logs of the host OS during this time.

 

The server is running the latest Service Pack for ProLiant and all hardware is reporting as healthy. The problem is random and has occurred at the following times:

 

16/05 - 9.13am

10/05 - 1.19pm

03/05 - 3.38am

26/04 - 8.24am

25/04 - 8.22am

23/04 - 9.15am

23/03 - 0.05am

27/02 - 7.38am
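
A quick way to check whether these times are really random is to compute the gaps between freezes and see how many fall in the same part of the day. The sketch below does that with the times listed above. The post gives no year, so `ASSUMED_YEAR` is an arbitrary non-leap placeholder, and the 07:00-10:00 "morning" window is just an illustrative choice, not something from the post.

```python
from datetime import datetime

# Freeze times copied from the post. The year is NOT given in the post;
# 2013 is an arbitrary non-leap placeholder so the date gaps are defined.
ASSUMED_YEAR = 2013
FREEZES = [
    "27/02 7.38am", "23/03 0.05am", "23/04 9.15am", "25/04 8.22am",
    "26/04 8.24am", "03/05 3.38am", "10/05 1.19pm", "16/05 9.13am",
]

def parse(stamp):
    """Parse 'DD/MM H.MMam' or 'DD/MM H.MMpm' into a datetime."""
    day_month, clock = stamp.split()
    day, month = (int(p) for p in day_month.split("/"))
    hour, minute = (int(p) for p in clock[:-2].split("."))
    if clock.endswith("pm") and hour != 12:
        hour += 12
    if clock.endswith("am") and hour == 12:
        hour = 0
    return datetime(ASSUMED_YEAR, month, day, hour, minute)

events = sorted(parse(s) for s in FREEZES)

# Whole days between consecutive freezes.
gaps_days = [(b.date() - a.date()).days for a, b in zip(events, events[1:])]

# Freezes inside the illustrative 07:00-10:00 window.
morning = [e for e in events if 7 <= e.hour < 10]

print("gaps in days:", gaps_days)
print(len(morning), "of", len(events), "freezes between 07:00 and 10:00")
```

Under those assumptions, five of the eight freezes land between 07:00 and 10:00 and the last three gaps are 7, 7 and 6 days, which may be worth comparing against scheduled tasks, backups or morning logon activity before treating the pattern as random.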

 

I have attached some screenshots of Resource Monitor during the freeze and shortly afterwards; view them in order 1-6.

 

The RAID cache was replaced because the server did not freeze for around a week after we turned it off. However, 15 minutes before the HP engineer turned up to replace it, the server froze. The engineer replaced the part anyway, and we also removed a third-party USB 3.0 card in case that was causing an issue, but the server has frozen since.

 

The customer/end user is starting to get very annoyed about this problem and, although HP have been very helpful, they want the problem resolved ASAP.

6 REPLIES
Andy Warner
Advisor

Re: ML350p - HDD Access freeze/lock

Crash2.jpg

Crash5.jpg

Oscar A. Perez
Honored Contributor

Re: ML350p - HDD Access freeze/lock

This could be a long shot, but please log into iLO4, go to "System Information->Firmware" and check which version of the "System Programmable Logic Device" you have on that system. Also, please check which iLO4 firmware version you have.

There is a new System Programmable Logic Device version 4C for the ML350p Gen8 on the web that fixes the incorrect frequency at which iLO4 runs on the ML350p Gen8. Since the Matrox video is embedded in the iLO4 chip, the wrong frequency also affects the video refresh rate.

As I said, this could be a long shot, but it is worth a try. Beware that flashing the "System Programmable Logic Device" is a very delicate process, so PLEASE read the instructions carefully. A reboot is needed after flashing. If the upgrade process is interrupted, you will end up with an 18.19 x 8.58 x 29.13 in, 70 lb paperweight. I would recommend upgrading the iLO4 firmware to version 1.22 before attempting to upgrade the System Programmable Logic Device.

 

 




TarunJain
Respected Contributor

Re: ML350p - HDD Access freeze/lock

Hi Andy,
Replacing the controller is also worth a shot in this case. It will help confirm whether this is a hardware fault or a Hyper-V bug.
If the issue isn't resolved, then Microsoft should be engaged to check a memory dump.
You can configure the server for a kernel memory dump and generate a dump at the time of the issue.
Hope this helps.
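
As a rough illustration of the dump configuration mentioned above: Windows stores the dump type in the `CrashDumpEnabled` value under the `CrashControl` registry key, and `2` selects a kernel memory dump. The sketch below only builds the `reg.exe` command line and prints it; the helper name `kernel_dump_command` is made up for this example, and the command itself would need to be run from an elevated prompt on the host, with the change normally taking effect after a restart.

```python
# CrashDumpEnabled values as documented by Microsoft:
# 0 = none, 1 = complete, 2 = kernel, 3 = small, 7 = automatic.
CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"
KERNEL_DUMP = 2

def kernel_dump_command():
    """Return the reg.exe argument list that selects a kernel memory dump.

    Hypothetical helper for illustration: it builds the command but does
    not execute it, so it is safe to run on any machine.
    """
    return [
        "reg", "add", CRASH_CONTROL_KEY,
        "/v", "CrashDumpEnabled",
        "/t", "REG_DWORD",
        "/d", str(KERNEL_DUMP),
        "/f",
    ]

if __name__ == "__main__":
    # Print the command for review instead of changing the registry.
    print(" ".join(kernel_dump_command()))
```

The same setting is available in the GUI under System Properties, Advanced, Startup and Recovery, which may be the safer route on a production host.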
-Tarun Jain
Andy Warner
Advisor

Re: ML350p - HDD Access freeze/lock

OK, I'm leaning towards a software issue or bug.

 

During the freeze the System process (PID 4) is going crazy with reads and writes, but not to any file. The available disk space decreases dramatically, by almost 15GB a minute, so it must be writing something somewhere. Two of the three VHDXs are fixed-size and the other is very small; nothing else exists on the partition.
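
To put the two figures side by side, here is a small worked calculation comparing the free-space drop of almost 15GB a minute with the 120,000,000 Total (B/sec) peak reported for PID 4 in the first post. It assumes decimal gigabytes (the post does not say GB or GiB) and treats both numbers as rough readings.

```python
GB = 10**9  # assumption: decimal gigabytes; the post does not specify

free_space_drop_per_min = 15 * GB   # "almost 15GB a minute"
system_peak_total = 120_000_000     # B/sec, read + write peak for PID 4

# Write rate needed to consume free space that quickly.
implied_write_rate = free_space_drop_per_min / 60   # bytes per second

# How the implied rate compares with what Resource Monitor showed.
ratio = implied_write_rate / system_peak_total

print(f"implied: {implied_write_rate / 1e6:.0f} MB/s, "
      f"observed peak: {system_peak_total / 1e6:.0f} MB/s, "
      f"ratio: {ratio:.2f}")
```

The implied rate is roughly twice the observed peak total. One possible reading, not a diagnosis, is that space is being allocated faster than data is actually written, or simply that one of the two readings is approximate.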

 

The question is what this process is writing, where it is writing it, and what is causing it.
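
One way to narrow this down is to log free space on the volume at short intervals, so the start and rate of the growth can be lined up against other logs. The following is a minimal sketch using only the Python standard library; the function names `rates` and `sample_free_space` are made up for this example, and the demo uses synthetic numbers rather than real measurements.

```python
import shutil
import time

def rates(samples):
    """Convert [(seconds, free_bytes), ...] into bytes consumed per second
    between consecutive samples (positive means free space is shrinking)."""
    return [(f0 - f1) / (t1 - t0)
            for (t0, f0), (t1, f1) in zip(samples, samples[1:])]

def sample_free_space(path, count, interval):
    """Poll free space on `path` `count` times, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.monotonic(), shutil.disk_usage(path).free))
        if i < count - 1:
            time.sleep(interval)
    return samples

# Synthetic demo: 15 GB of free space lost over 60 seconds.
demo = [(0, 100 * 10**9), (60, 85 * 10**9)]
print([f"{r / 1e6:.0f} MB/s" for r in rates(demo)])
```

On the host, something like `rates(sample_free_space("E:\\", count=6, interval=10))` would give the consumption rate for the RAID 10 volume; it shows when space disappears, not which file or stream is taking it.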

TarunJain
Respected Contributor

Re: ML350p - HDD Access freeze/lock

It could be writing to any of several places: page files, system files, file streams, etc.
Maybe you can get more clues on the Microsoft Hyper-V forum.
TarunJain
Respected Contributor

Re: ML350p - HDD Access freeze/lock

Hi Andy,
Hope you are doing fine.
Did you nail down the cause?