
Microserver Gen10 Plus v2 NVMe SSD randomly crashes

 
SOLVED
Liceo
Occasional Advisor

Microserver Gen10 Plus v2 NVMe SSD randomly crashes

I have two MicroServer Gen10 Plus v2 servers and I am having trouble getting a Samsung 990 Pro NVMe SSD to run stably. I have used a simple PCIe 4.0 adapter card with no active components on it. I'm running Windows Server 2022 and installed the latest ProLiant updates using SUM.

On both servers, the Samsung NVMe SSDs are running and I measure high throughput (up to 7000 MB/s sequential), which is pretty good.

But suddenly the drive disappears completely (from Disk Management and Device Manager), followed by the event: Reset to device, \Device\RaidPort1, was issued.

Any ideas what could cause this issue?

12 REPLIES
Liceo
Occasional Advisor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

 

Screenshot 2023-05-08 224919.jpg

The problem seems to be related to the Standard NVM Express Controller.

Suman_1978
HPE Pro

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

Hi,

I am not sure about the MicroServer, but on other ProLiant servers it was caused by Smart Array drivers or WBEM. Please refer to these articles.

Certain HPE ProLiant Gen10 and Gen10 Plus Servers - HPE 12G SAS Expander Card May Stop Responding Due to Task Set Full Condition When Attached Storage is Accessed in HBA Mode

Multiple HpCISSs3 "Event ID 129" Messages May be Displayed in the Windows System Event Log When an HPE Smart Array H241 Storage Controller Card is Attached

https://community.hpe.com/t5/proliant-servers-ml-dl-sl/reset-to-device-device-raidport2-and-device-raidport1-was-issued/td-p/6848435

Thank You!
I work with HPE but opinions expressed here are mine.
HPE Tech Tips videos on How To and Troubleshooting topics



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Liceo
Occasional Advisor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

@Suman_1978 

That's interesting, indeed! But the MicroServer has no Smart Array controller on board, as far as I know? To be more precise, the error event is thrown by stornvme, followed by a number of other errors in this order:

  1. Event-ID: 11, Source: stornvme, Error: The driver detected a controller error on \Device\RaidPort1.
  2. Event-ID: 153, Source: disk, Warning: The IO operation at logical block address 0x432680 for Disk 4 (PDO name: \Device\00000074) was retried.
  3. Event-ID: 134, Source: ReFS, Error: The file system was unable to write metadata to the media backing volume E:. A write failed with status "A device which does not exist was specified." ReFS will take the volume offline. It may be mounted again automatically.
  4. Other subsequent errors...

But how do I "uninstall" or update stornvme.sys, which is a Windows driver component?
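
To find out how often the drive actually drops out, the event order listed above can be searched for in an exported System log. The following is only a minimal Python sketch under assumptions: the `Event` record, the `find_dropouts` helper, and the five-minute window are hypothetical names and values chosen for illustration, and the timestamp in the sample is a placeholder, not a value from the real log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# (source, event ID) pairs in the order reported in this thread.
FAILURE_CHAIN = [("stornvme", 11), ("disk", 153), ("refs", 134)]


@dataclass
class Event:
    time: datetime
    source: str
    event_id: int


def find_dropouts(events, window=timedelta(minutes=5)):
    """Return the start time of every complete failure chain.

    A chain counts only if all three events appear in order and the last
    one occurs within `window` of the first (the window is an assumption).
    """
    dropouts = []
    step = 0        # index of the chain entry we are waiting for
    started = None  # time of the stornvme error that opened the chain
    for ev in sorted(events, key=lambda e: e.time):
        if started is not None and ev.time - started > window:
            step, started = 0, None  # chain went stale, start over
        if (ev.source.lower(), ev.event_id) == FAILURE_CHAIN[step]:
            if step == 0:
                started = ev.time
            step += 1
            if step == len(FAILURE_CHAIN):
                dropouts.append(started)
                step, started = 0, None
    return dropouts


# Sample records modeled on the three events above; the timestamp is a placeholder.
t0 = datetime(2023, 5, 8, 22, 40)
sample = [
    Event(t0, "stornvme", 11),
    Event(t0 + timedelta(seconds=2), "disk", 153),
    Event(t0 + timedelta(seconds=3), "ReFS", 134),
]
print(find_dropouts(sample))
```

Matching on the whole chain rather than on Event ID 11 alone avoids counting controller errors that did not take the volume offline.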

 

Liceo
Occasional Advisor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

Another weird behaviour: just rebooting the server doesn't bring the drive back to life. At first I thought this was a hardware issue, until it happened on the other server as well. I even replaced the "host bus adapter" PCIe 4.0 card with a different make/model (no active components), but the problem persists.

When I run a SUM update deployment, the drive is back online after the reboot, maybe because the deployment toggles the power off and on.

I will test today whether the drives come back after a normal shutdown, a power cut, and a fresh boot.

Liceo
Occasional Advisor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

OK, I can confirm: the drives come back after power-cycling the servers. A reboot alone is not sufficient.

Suman_1978
HPE Pro

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

Hi,

After the power reset, the drives came back online?
Is your issue solved after power reset?

Thank You!
Liceo
Occasional Advisor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

@Suman_1978 

Sorry for the late reply. Yes, the drives come back after a power reset. But I don't think this is solved; I am just waiting until it happens again. I had to move the VMs away from this drive, so there is no load on it at the moment...

Any ideas?

Vinky_99
Esteemed Contributor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

@Liceo 

The suggestions below might help. Please give them a try:

1. Ensure that the Samsung NVMe SSD 990 Pro is compatible with your Microserver Gen10 Plus v2 and the PCIe 4.0 card you are using. Check the specifications and compatibility lists for both the server and the SSD to verify their compatibility with each other.
2. NVMe SSDs can consume significant power and generate heat. Make sure that your server's power supply is adequate to handle the power requirements of the SSDs and that the cooling system is properly functioning to keep the drives within acceptable temperature ranges.
3. Check for firmware updates for the Microserver Gen10 Plus v2, the PCIe card, and the Samsung NVMe SSD. Updated firmware often includes bug fixes and compatibility improvements that can help resolve such issues.
4. Make sure that you have the latest drivers installed for the PCIe card, the server's RAID controller (if applicable), and any other relevant hardware components.
5. If you have multiple PCIe slots available, try different slots for the NVMe SSDs. It's possible that there could be an issue with a specific slot that is causing the intermittent crashes. Experimenting with different slots can help isolate the problem.
6. Review the server's system logs and event viewer for any related error messages or warnings that may provide further insight into the cause of the SSD crashes. Look for patterns or specific error codes that could point to a particular issue.
7. Run stress tests on the NVMe SSDs to see if you can consistently reproduce the issue. This can help identify whether the problem occurs under specific conditions or workload scenarios, which may provide additional clues for troubleshooting.
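
For step 7, a simple write-and-verify loop is one way to put load on the drive while also checking data integrity. The following is only a rough Python sketch, not a replacement for a dedicated benchmark tool: the `write_and_verify` helper and its default sizes are hypothetical, and the read figure may be inflated by the operating system's file cache.

```python
import hashlib
import os
import time


def write_and_verify(path, total_mib=64, chunk_mib=4):
    """Write random data to `path`, read it back, and compare SHA-256 digests.

    Returns (ok, write_mib_per_s, read_mib_per_s). Reads may be served from
    the OS cache, so the read figure is only a rough indication.
    """
    chunk_size = chunk_mib * 1024 * 1024
    chunks = total_mib // chunk_mib
    written = hashlib.sha256()

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            block = os.urandom(chunk_size)
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the device
    write_s = max(time.perf_counter() - start, 1e-9)

    read_back = hashlib.sha256()
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            read_back.update(block)
    read_s = max(time.perf_counter() - start, 1e-9)

    size_mib = chunks * chunk_mib
    ok = written.digest() == read_back.digest()
    return ok, size_mib / write_s, size_mib / read_s
```

Calling the function repeatedly against a file on the NVMe volume, with a larger `total_mib`, and logging each result with a timestamp would show whether a dropout coincides with sustained load. If the drive disappears mid-run, the call raises an `OSError`, which is itself a useful marker of when the failure happened.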


If you have exhausted all troubleshooting options and the issue persists, consider reaching out to technical support for the Microserver Gen10 Plus v2, the PCIe card manufacturer, and Samsung. They may have specific insights or solutions tailored to your setup.

I hope this gives you some insight and helps you solve the problem!

Have a Good day!

These are my opinions so use it at your own risk.
Liceo
Occasional Advisor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

@Vinky_99 thanks for the advice, but these suggestions are rather generic. Most of the points I have already covered. And no, this is not an officially supported configuration, but I have used a very common NVMe card which is running in hundreds of thousands of systems. I already exchanged the PCIe card for another one, with the same result. I did load tests, but I could not reproduce the issue. This SSD does not get that hot, so it seems to be pretty efficient.

Vinky_99
Esteemed Contributor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

@Liceo 

I apologize for the redundancy in the suggestions. If you have already covered those steps, and the issue persists, here are a few additional suggestions:

* Make sure that the power management settings for the NVMe SSDs and the PCIe card are properly configured. Disable any power-saving features that may be causing the drives to intermittently disconnect.

* Ensure that you have the latest BIOS version installed for your Microserver Gen10 Plus v2. BIOS updates often include stability improvements and bug fixes that could potentially resolve the issue.

* Verify that there are no IRQ conflicts between the PCIe card and other devices in the server. Conflicting IRQ assignments can lead to instability and device disconnections. If there is a conflict, try changing the slot or rearranging other devices to resolve the conflict.

* If possible, try using a different brand or model of NVMe SSD to see if the issue persists. This can help determine if the problem is specific to the Samsung 990 Pro or if it's a more general compatibility issue with NVMe SSDs in your setup.

* Since your configuration is not officially supported, reaching out to HPE support may provide further insights or possible solutions. They may have encountered similar issues or have specific recommendations for your setup.

* If you are unable to resolve the issue with the Samsung NVMe SSDs, you might consider alternative storage options such as SATA SSDs or traditional hard drives. Although they may not offer the same performance as NVMe SSDs, they can provide stable and reliable storage in your Microserver.
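
For the power management suggestion at the top of this list, one setting that can be checked on Windows is PCI Express Link State Power Management in the active power plan. The Python sketch below only builds the `powercfg` calls and prints them by default. It assumes the `SUB_PCIEXPRESS` and `ASPM` aliases are present on the system (they can be listed with `powercfg /aliases`); the `aspm_commands` and `apply` helpers are hypothetical names. Whether this setting has any influence on the dropouts described in this thread is not established.

```python
import subprocess
import sys

# ASPM setting index: 0 = Off, 1 = Moderate, 2 = Maximum power savings.
ASPM_OFF = 0


def aspm_commands(index=ASPM_OFF):
    """Build the powercfg calls that set PCIe Link State Power Management
    for the active power plan (AC and DC) and then re-apply the plan."""
    value = str(index)
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT", "SUB_PCIEXPRESS", "ASPM", value],
        ["powercfg", "/setdcvalueindex", "SCHEME_CURRENT", "SUB_PCIEXPRESS", "ASPM", value],
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply(commands, dry_run=True):
    """Print each command; execute it only on Windows when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run and sys.platform == "win32":
            subprocess.run(cmd, check=True)


apply(aspm_commands())  # dry run: prints the commands without changing anything
```

Running with `dry_run=False` requires an elevated prompt. This covers only the OS-level link setting; BIOS-level power options have to be checked separately in the system setup.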

Hope this helps!

Liceo
Occasional Advisor
Solution

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

There was another firmware update available, which I rolled out on the 10th of June. Since then, the system seems to be stable. Knock on wood...

Vinky_99
Esteemed Contributor

Re: Microserver Gen10 Plus v2 NVMe SSD randomly crashes

That's good. Glad it is solved. 
