ProLiant Servers (ML,DL,SL)

HPE ProLiant Gen 10 Server's Smart Cache Issues

 
PDS-Consulting
Collector


We have decided to take to these forums in a desperate attempt to get a resolution to an issue that has been plaguing us since the release of the Gen 10 servers. Our many attempts at contacting HP support for a resolution have failed and have been met with poor attempts to make light of the issue. Our company, which is an MSP and HP partner, has taken to selling Gen 10s to our customers. With that being said, since the first Gen 10 we sold we have had an issue with HP Smart Cache. We now have 3 Gen 10s that have this issue, one of which is in production while the other two are waiting on our workbench for this issue to be resolved. Here is the rundown of what the issue is and how it occurs. I will also list the hardware specs of each server, which are very similar.

No matter how the server is configured, after installing Server 2016 using Intelligent Provisioning the Smart Cache seems to fail. We have the HDDs in the server configured as RAID 5 and the 2 SSDs in the server configured as Smart Cache. Now, to answer the obvious question: yes, we do have Smart Cache licensed. Let me take you through the steps of setting up one of these and what happens.

After receiving the server, we get it out of the box, install the hard drives and boot it to the Smart Array manager, where we put in the Smart Cache license. While we are in the Smart Array manager we set up all the HDDs, usually in RAID 5, and leave the SSDs for the Smart Cache.
We then proceed to install Microsoft Server 2016 Standard using HP Intelligent Provisioning. This is the point where we thought there might be an inconsistency, so we reversed the order and even tried setting up Smart Cache before installing the OS, but we still ran into the same issue. Anyway, after the OS is installed and ready we reboot to the Smart Array manager once again, where we proceed to enable Smart Cache on the 2 Intel 480 GB SSDs. We reboot out of Smart Array and load back into the OS. We then do a standard reboot from the OS to make sure everything is working fine, and this is where the trouble begins. After the reboot, during POST, we get the error that Smart Cache has failed.

The code that shows is “1831-Slot X Drive Array – Data in Write-Back Smart Cache has been lost”. Luckily, on a fresh OS install this doesn’t really cause an issue. We can just rebuild the HP Smart Cache and continue into the OS. However, if you have Hyper-V installed, which is what these servers are used for, then you will corrupt the OS and must do a complete reinstall or restore a backup of Windows. Of course, this issue occurs every time the system restarts, so it’s impossible to use and dangerous to put in production.
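For anyone who wants to flag this condition automatically, here is a minimal Python sketch that pulls the slot identifier out of a message shaped like the one quoted above. The function name `parse_smart_cache_loss` and the idea of feeding it text copied from a log are assumptions for illustration, not an HPE tool, and the pattern is built only from the message text we saw on screen.

```python
import re
from typing import Optional

# Pattern built only from the message quoted in this post (an assumption about
# its general shape); the dash may be a plain hyphen or an en dash.
_PATTERN = re.compile(
    r"1831-Slot\s+(?P<slot>\w+)\s+Drive Array\s*[-\u2013]\s*"
    r"Data in Write-Back Smart Cache has been lost",
    re.IGNORECASE,
)


def parse_smart_cache_loss(text: str) -> Optional[str]:
    """Return the slot identifier if text contains the 1831 message, else None."""
    match = _PATTERN.search(text)
    return match.group("slot") if match else None
```

Running it over captured POST or log text would give back the slot when the message is present and `None` otherwise, which is enough to raise an alert before anyone boots into Hyper-V.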

Now here are the specs of the 3 servers.

Server 1:

HP ProLiant ML350 Gen 10
32 GB ECC RAM
1 Intel Xeon Silver 4110
HPE Smart Array P408i-a SR Gen10 Controller
1 Intel SSD S4600 Series 480 GB
4 HPE 10K 1.2 TB SAS Drives

 

Server 2:

HP ProLiant DL360 Gen10
80 GB ECC RAM
1 Intel Xeon Silver 4116
HPE Smart Array P408i-a SR Gen10 Controller
2 Intel SSD S4600 Series 480 GB
6 HPE 10K 1.2 TB SAS Drives

 

Server 3:

HP ProLiant ML350 Gen 10
48 GB ECC RAM
1 Intel Xeon Silver 4116
HPE Smart Array P408i-a SR Gen10 Controller
2 Intel SSD S4600 Series 480 GB
4 HPE 10K 1.2 TB SAS Drives

 

At this point we decided to do some investigation of our own, and spent some time working on server #3, which is the one in production right now. We found that the issue seems to be related to the driver for the HPE Smart Array P408i-a SR Gen10 Controller. We tried several combinations of drivers and firmware with both the SSDs and the controller, and after all that it seems the combination that works around this issue is the oldest driver available for the controller with the newest firmware. Now you can say at this point, “Well, what is the issue? You got it working,” and you would be right, we did get it working. However, running an outdated driver is not something we consider stable for production, and if iLO or Windows Update decides to run and update the driver then we lose everything because the issue will return.
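Until there is a proper fix, one way to at least notice when the pinned driver gets replaced is to compare the installed version against the one known to work. The Python sketch below covers only that comparison. The version strings are made-up placeholders, the function names are hypothetical, and how you obtain the installed version (Device Manager, an inventory tool, and so on) is left out on purpose.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.2.3.4' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_drifted(installed: str, pinned: str) -> bool:
    """True when the installed driver differs from the pinned known-good one."""
    return parse_version(installed) != parse_version(pinned)


# Placeholder versions for illustration only, not real driver releases.
PINNED = "1.0.0.1"
if driver_drifted("1.0.0.2", PINNED):
    print("WARNING: Smart Array driver changed; Smart Cache issue may return")
```

A check like this does not prevent the update, it only tells you that the workaround is gone before the next reboot does.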

This is the point where we took to HPE Support in hopes of finding a better resolution than having to use old drivers on a new server. After several hours and several support agents we have gotten nowhere. The first thing they had us try was replacing the 2 SSDs and the HPE Smart Array P408i-a SR Gen10 Controller with new ones in case it was hardware related? After that HP logged into the servers and made sure that the BIOS was completely up to date, which it was, as we had done that in our testing. Next, they went into the Service Pack for ProLiant, which we had installed, and updated all drivers and firmware. None of this resolved the issue. So then came the excuses and wild guesses, yet with no offer of a resolution.
We had one L2 agent tell us that Gen 10 servers do not support the 480 GB Intel SSDs that HP sales sent with the servers. He suggested we use Samsung SSDs instead. However, he could not show any documentation or provide any proof of this. Once we pointed to the Gen 10 and Smart Cache QuickSpecs sheets, which say they work together and are recommended, we got quickly brushed off and told we would get an email with some proof, which never showed up. The next agent told us that the last agent was crazy, and that they do work. He suggested just using the old driver till they get it resolved. Once again, we mentioned that the old driver is unstable at best and should not be the only solution, and once again we were brushed off and told there is nothing they can do.

So, after countless hours of time spent and after delaying the deployment of these servers to our customers, we are still no closer to a resolution and it feels like HP has given up on trying to help. We aren’t asking for much other than a resolution that does not compromise the servers. We are hoping to get an official response from HP on this and maybe some feedback from the community.

3 REPLIES
krobert1978
Occasional Visitor

Re: HPE ProLiant Gen 10 Server's Smart Cache Issues

Hello PDS-Consulting,

Thank you for your recent submission to the HPE community regarding your issues with HPE Smart Cache etc.

This was shared with me this morning and I have elevated your message to the appropriate team. 

Great news! The engineering team is testing the issue you shared and attempting to replicate it. They have asked me for your contact information for when they find the resolution or if they have specific questions.

Please use the Private Message option to send me your contact details and I'll continue to do whatever it takes to find you a resolution.

Regards, Kelly Robert, HPE Service Business Manager

 

 

PDS-Consulting
Collector

Re: HPE ProLiant Gen 10 Server's Smart Cache Issues

Thank you,

Contact information sent

krob2018rks
Occasional Visitor

Re: HPE ProLiant Gen 10 Server's Smart Cache Issues

Thank you again for sharing your experience and for partnering with Hewlett Packard Enterprise to ensure a resolution was reached for you.  Today I confirmed with our elevation team that you informed them the issue has been resolved after further working with an HPE Engineer to update a driver and the case has been closed.  Thank you again, we appreciate the opportunity to make sure your case was resolved. KR