ProLiant Servers (ML,DL,SL)

Unrecoverable Media Errors Detected on Drives

 
SOLVED
Fabio D'Angelo
Occasional Advisor

Hello!

I get this message from the SSA CLI:

 

"Warning: Unrecoverable Media Errors Detected on Drives during previous Rebuild or Background Surface Analysis (ARM) scan. Errors will be fixed automatically when the sector(s) are overwritten. Backup and Restore are recommended."

 

 Everything works fine and no disk is marked as failed.

Obviously I cannot back up and restore a production server just to check whether the warning goes away.

How can I check every single disk without affecting RAID volumes?

How can I fix this issue?

Thank you.

F.

Controller: HPE Smart Array P408i-a SR Gen10
Server: DL380 GEN10

11 REPLIES
Fabio D'Angelo
Occasional Advisor

Re: Unrecoverable Media Errors Detected on Drives

=> ctrl slot=0 show

HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)


Warning: Unrecoverable Media Errors Detected on Drives during previous Rebuild
or Background Surface Analysis (ARM) scan. Errors will be fixed automatically
when the sector(s) are overwritten. Backup and Restore are recommended. The
following logical drives are affected: 1, 2

   Bus Interface: PCI
   Slot: 0
   Serial Number: PEXXXXXXXXX9R
   RAID 6 Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 2.65
   Firmware Supports Online Firmware Activation: True
   Driver Supports Online Firmware Activation: True
   Rebuild Priority: High
   Expand Priority: Medium
   Surface Scan Delay: 1 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: Yes
   Current Parallel Surface Scan Count: 4
   Max Parallel Surface Scan Count: 16
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Enabled
   Write Cache Bypass Threshold Size: 1040 KiB
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 10% Read / 90% Write
   Configured Drive Write Cache Policy: Disable
   Unconfigured Drive Write Cache Policy: Default
   Total Cache Size: 2.0
   Total Cache Memory Available: 1.8
   Battery Backed Cache Size: 1.8
   No-Battery Write Cache: Disabled
   SSD Caching RAID5 WriteBack Enabled: True
   SSD Caching Version: 2
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 53
   Capacitor Temperature  (C): 43
   Number of Ports: 2 Internal only
   Encryption: Not Set
   Express Local Encryption: False
   Driver Name: SmartPqi.sys
   Driver Version: Windows 1010.64.0.1037 Build qa
   WWN Port: 51402EC0128F1690
   PCI Address (Domain:Bus:Device.Function): 0000:5C:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: Mixed
   Port Max Phy Rate Limiting Supported: False
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Survival Mode: Enabled
   Host Serial Number: CXXXXXXXK
   Sanitize Erase Supported: True
   Sanitize Lock: None
   Sensor ID: 0
      Location: Capacitor
      Current Value (C): 43
      Max Value Since Power On: 48
   Sensor ID: 1
      Location: ASIC
      Current Value (C): 53
      Max Value Since Power On: 59
   Sensor ID: 2
      Location: Unknown
      Current Value (C): 46
      Max Value Since Power On: 50
   Primary Boot Volume: logicaldrive 1 (600508B1001C9B6E7E1E0086F500D42B)
   Secondary Boot Volume: None
   SPDM Supports Get Slot Certificate Chain: no
   SPDM Supports Get Controller Info       : no
   SPDM Supports Get Slot Info             : no
   SPDM Supports Set Import Certificate    : no
   SPDM Supports Set Invalidate Slot       : no
   Surface Scan Completion Supported: False
   Persistent Event Log Policy Change Supported: False
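
For anyone who wants to track this warning from a script rather than by eye, here is a minimal Python sketch that reads saved `ctrl slot=0 show` output and pulls out the affected logical drives and the surface scan settings. The function names and the embedded sample are illustrative only, and the parsing assumes the text layout shown in the output above; other SSA versions may format it differently.

```python
import re

# Excerpt of the controller output posted above, used as sample input.
SAMPLE = """\
Warning: Unrecoverable Media Errors Detected on Drives during previous Rebuild
or Background Surface Analysis (ARM) scan. Errors will be fixed automatically
when the sector(s) are overwritten. Backup and Restore are recommended. The
following logical drives are affected: 1, 2

   Surface Scan Delay: 1 secs
   Surface Scan Mode: Idle
   Current Parallel Surface Scan Count: 4
"""


def affected_logical_drives(text):
    """Return the logical drive numbers named in the media error warning."""
    # Collapse whitespace so the phrase matches even if it wraps across lines.
    flat = " ".join(text.split())
    match = re.search(r"logical drives are affected:\s*((?:\d+\s*,\s*)*\d+)", flat)
    if not match:
        return []
    return [int(number) for number in match.group(1).split(",")]


def surface_scan_settings(text):
    """Return every 'key: value' line whose key mentions 'Surface Scan'."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and "Surface Scan" in key:
            settings[key] = value.strip()
    return settings


print(affected_logical_drives(SAMPLE))
print(surface_scan_settings(SAMPLE))
```

Running this against output captured before and after a new surface scan makes it easy to see whether the list of affected logical drives has changed.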
Fabio D'Angelo
Occasional Advisor

Re: Unrecoverable Media Errors Detected on Drives

=> ctrl slot=0 pd all show

HPE Smart Array P408i-a SR Gen10 in Slot 0 (Embedded)

   Array A

      physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
      physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
      physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
      physicaldrive 2I:3:5 (port 2I:box 3:bay 5, SAS HDD, 1.2 TB, OK)
      physicaldrive 2I:3:6 (port 2I:box 3:bay 6, SAS HDD, 1.2 TB, OK)
      physicaldrive 2I:3:7 (port 2I:box 3:bay 7, SAS HDD, 1.2 TB, OK)

   Array B

      physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS SSD, 960 GB, OK)
      physicaldrive 2I:3:8 (port 2I:box 3:bay 8, SAS SSD, 960 GB, OK)
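
The listing above can also be checked programmatically. The following Python sketch parses saved `pd all show` output and reports any drive whose status is not `OK`. It assumes the parenthesised fields always appear in the order seen above (location, type, size, status); the helper names are hypothetical and not part of any HPE tool.

```python
import re

# The drive listing posted above, used as sample input.
SAMPLE = """\
   Array A

      physicaldrive 1I:3:1 (port 1I:box 3:bay 1, SAS HDD, 1.2 TB, OK)
      physicaldrive 1I:3:2 (port 1I:box 3:bay 2, SAS HDD, 1.2 TB, OK)
      physicaldrive 1I:3:3 (port 1I:box 3:bay 3, SAS HDD, 1.2 TB, OK)
      physicaldrive 2I:3:5 (port 2I:box 3:bay 5, SAS HDD, 1.2 TB, OK)
      physicaldrive 2I:3:6 (port 2I:box 3:bay 6, SAS HDD, 1.2 TB, OK)
      physicaldrive 2I:3:7 (port 2I:box 3:bay 7, SAS HDD, 1.2 TB, OK)

   Array B

      physicaldrive 1I:3:4 (port 1I:box 3:bay 4, SAS SSD, 960 GB, OK)
      physicaldrive 2I:3:8 (port 2I:box 3:bay 8, SAS SSD, 960 GB, OK)
"""

PD_LINE = re.compile(r"physicaldrive\s+(?P<id>\S+)\s+\((?P<fields>[^)]*)\)")


def parse_physical_drives(text):
    """Return one dict per physicaldrive line, tagged with its array letter."""
    drives = []
    array = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Array "):
            array = line.split()[1]
            continue
        match = PD_LINE.match(line)
        if match:
            # Assumed field order: location, type, size, status.
            fields = [f.strip() for f in match.group("fields").split(",")]
            drives.append({
                "array": array,
                "id": match.group("id"),
                "type": fields[1],
                "size": fields[2],
                "status": fields[-1],
            })
    return drives


def not_ok(drives):
    """Return the drives whose status is anything other than 'OK'."""
    return [d for d in drives if d["status"] != "OK"]


drives = parse_physical_drives(SAMPLE)
print(len(drives), "drives,", len(not_ok(drives)), "not OK")
```

With the listing above, every drive reports `OK`, which matches the observation that no disk is marked as failed even though the controller still shows the warning.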
TVVJ
HPE Pro

Re: Unrecoverable Media Errors Detected on Drives

Hello,

You can generate an Array Diagnostic Utility (ADU) report and check whether any hard drive is reported as defective. The command is described in section "13.1.5.3 Generating ADU Report" on page 114 of the HPE Smart Storage Administrator CLI User Guide.
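
As a rough illustration of the ADU report step, the Python sketch below builds the command line and runs it only if `ssacli` is found on the PATH. The `diag file=` form is recalled from the SSA CLI guide and should be treated as an assumption: check the section cited above or `ssacli help` for the exact syntax on your version. The function names and the output file name are hypothetical.

```python
import shutil
import subprocess


def adu_report_command(slot, out_file):
    """Build the argument list for an ADU report on one controller.

    Assumption: the 'ctrl slot=N diag file=PATH' form follows the SSA CLI
    guide; verify it against your installed ssacli version.
    """
    return ["ssacli", "ctrl", f"slot={slot}", "diag", f"file={out_file}"]


def run_if_available(cmd):
    """Run cmd only when its binary is on PATH; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Print the command for review instead of executing it.
print(" ".join(adu_report_command(0, "adu_report.zip")))
```

Printing the command first, and calling `run_if_available` only after reviewing it, keeps the script safe to try on a machine without SSA installed.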

Regards,



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[All opinions expressed here are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Fabio D'Angelo
Occasional Advisor

Re: Unrecoverable Media Errors Detected on Drives

None of the drives is reported as defective.
Sebasbin
HPE Pro

Re: Unrecoverable Media Errors Detected on Drives

 

Hi,

Please refer to the advisory below and follow the instructions in its resolution section:

https://support.hpe.com/hpesc/public/docDisplay?docId=a00104843en_us&docLocale=en_US



I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Fabio D'Angelo
Occasional Advisor

Re: Unrecoverable Media Errors Detected on Drives

@Sebasbin 

This is not an acceptable solution for a production server.

Thank you.

Fabio D'Angelo
Occasional Advisor

Re: Unrecoverable Media Errors Detected on Drives

Fixed! (I hope...)

I did not delete the RAID volumes, and I did not back up and restore. I did not take the server offline.

I will share my solution here if the warning does not recur; I just want to wait a couple of days first.

Fabio D'Angelo
Occasional Advisor
Solution

Re: Unrecoverable Media Errors Detected on Drives

My solution:

- Update the firmware on the disks, controller, motherboard and iLO. (I had to update some firmware manually because the SPP did not update it; I don't know why.)

- Update Smart Storage Administrator to the latest release. I then changed the surface scan priority to high (or changed the delay, I don't remember which), and a new surface scan started.

The Unrecoverable Media Errors warning went away and has not come back.
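
For readers who prefer the CLI to the SSA GUI, the surface scan change described above might look like the Python sketch below. It only builds the command so it can be reviewed before anything is run. The `surfacescanmode` and `surfacescandelay` keywords are recalled from the SSA CLI guide and are an assumption here, as is the choice of `high`, since the post does not say which setting was actually changed. Confirm both with `ssacli help` before use.

```python
VALID_MODES = {"disable", "idle", "high"}


def surface_scan_command(slot, mode="high", delay_secs=None):
    """Build an ssacli 'modify' command for the controller's surface scan.

    Assumption: 'surfacescanmode' and 'surfacescandelay' are the keyword
    names used by the SSA CLI; check your version's help output.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported surface scan mode: {mode!r}")
    cmd = ["ssacli", "ctrl", f"slot={slot}", "modify", f"surfacescanmode={mode}"]
    if delay_secs is not None:
        # The delay is only meaningful for idle-mode scanning.
        cmd.append(f"surfacescandelay={delay_secs}")
    return cmd


# Print the commands for review; nothing is executed here.
print(" ".join(surface_scan_command(0, "high")))
print(" ".join(surface_scan_command(0, "idle", delay_secs=3)))
```

The second call shows how to return the controller to idle-mode scanning once the new scan has finished; the 3 second delay is an arbitrary example value.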

Bye

WardenSirpa
Established Member

Re: Unrecoverable Media Errors Detected on Drives

@Fabio D'Angelo 

Sorry to necro, but how did you upgrade the firmware of your disks and motherboard without taking the server out of production? Or was this scheduled downtime for maintenance?

Sunitha_Mod
Moderator

Re: Unrecoverable Media Errors Detected on Drives

Hello @WardenSirpa,

Thank you for posting.

You might want to consider creating a new topic using the "New Discussion" button, as this will not only give it more visibility than the old topic but also improve your chances of receiving responses from experts.



Thanks,
Sunitha G
I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Danny9000
Occasional Visitor

Re: Unrecoverable Media Errors Detected on Drives

Hi

"update Smart Storage Administrator to latest release then I changed surface scan priority to high (or delay, I don't remember) and a new surface scan takes place."

Did you run the scan under ESXi or Windows?

Did you set it to high and reboot the server, or could you change this online?

Thanks