ProLiant Servers (ML,DL,SL)

Failed Disk in RAID5 fails to rebuild

SOLVED
DekPlen
Advisor


Hi all

I have been battling with an issue trying to replace a disk in a RAID 5 volume:

# hpssacli ctrl slot=0 pd all show

Smart Array P420i in Slot 0 (Embedded)

Array A

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA SSD, 250 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA SSD, 250 GB, OK)

Array B

physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA SSD, 2 TB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA SSD, 2 TB, OK)
physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SATA SSD, 2 TB, OK)
physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA SSD, 2 TB, OK)
physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA SSD, 2 TB, Failed)
physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA SSD, 2 TB, OK)

Unassigned

physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SATA SSD, 2 TB, OK)
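To pick out the member that is holding an array in a degraded state, a `pd all show` listing like the one above can be scanned programmatically. This is a minimal Python sketch. It assumes the plain-text layout shown in this post: group headers such as `Array B` or `Unassigned`, followed by `physicaldrive` lines whose last comma-separated field is the status. The function name `non_ok_drives` is hypothetical, and the layout may differ between hpssacli/ssacli versions.

```python
import re

# Matches lines such as:
# physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA SSD, 2 TB, Failed)
PD_LINE = re.compile(r"^\s*physicaldrive\s+(\S+)\s+\((.*)\)\s*$")


def non_ok_drives(listing: str) -> list[tuple[str, str, str]]:
    """Return (group, drive id, status) for every drive whose status is not OK.

    `group` is the most recent header seen ("Array A", "Unassigned", ...).
    Assumes the status is the last comma-separated field inside the parentheses.
    """
    group = "?"
    problems = []
    for line in listing.splitlines():
        stripped = line.strip()
        if stripped.startswith("Array ") or stripped == "Unassigned":
            group = stripped
            continue
        match = PD_LINE.match(line)
        if match:
            drive_id, details = match.groups()
            status = details.split(",")[-1].strip()
            if status != "OK":
                problems.append((group, drive_id, status))
    return problems


if __name__ == "__main__":
    sample = """
Array B

physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SATA SSD, 2 TB, OK)
physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA SSD, 2 TB, Failed)

Unassigned

physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SATA SSD, 2 TB, OK)
"""
    print(non_ok_drives(sample))  # [('Array B', '1I:1:7', 'Failed')]
```

Any status other than `OK` is reported, so a drive in `Predictive Failure` would be flagged as well.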

The drive itself:

physicaldrive 1I:1:7
Port: 1I
Box: 1
Bay: 7
Status: Failed
Last Failure Reason: Hot removed
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/4096
Firmware Revision: M3CR046
Serial Number: **Confidential info erased**
WWID: 50014380314CA987
Model: ATA CT2000MX500SSD1
SATA NCQ Capable: True
SATA NCQ Enabled: True
Maximum Temperature (C): 37
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: Unknown
Drive Authentication Status: Not Applicable
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

The disk was originally in a Predictive Failure state, so I pulled it, waited a few minutes, and replaced it with another drive of the same type. The system starts the rebuild, but at approximately 5% the new disk fails again. I have tried this with 3 other drives, all with the same outcome. In this state I cannot add a hot spare, as hpssacli reports that the RAID volume is not OK.
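For context on what the controller is doing during that rebuild: RAID 5 regenerates each block of the missing member from the surviving members, because the parity block of a stripe is the XOR of its data blocks. This means any single missing block equals the XOR of all the others. The sketch below illustrates that principle on toy byte strings. It is a simplified model, not the Smart Array implementation: real controllers rotate parity across members and work on much larger strips. The helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recompute the missing member's block for one stripe.

    Parity is the XOR of the data blocks, so any single missing block
    (data or parity) is the XOR of all the remaining blocks in the stripe.
    """
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    # Pretend the member holding data[1] failed and rebuild it from the rest.
    rebuilt = rebuild_missing([data[0], data[2], parity])
    assert rebuilt == data[1]
```

Every stripe has to be written to the replacement this way, so the rebuild touches the whole new drive. A problem with the new drive, the bay, or the path to it can therefore surface partway through, as it did here at around 5%.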

I tried the modify disablepd command to see if I could temporarily disable the drive.

I also looked at modify reenable forced, but that looks like it would wipe the LD.

I am wondering whether bay 7 itself might be bad.

I cannot reboot at the moment, as this is a Proxmox server hosting some VMs, but if necessary I will have to restart out of hours.

I was wondering whether it is possible to use a modify drive command to drop the failed drive in bay 7 and rebuild onto the unassigned drive in bay 9 instead?

Any suggestions would be welcome.

Thanks Dek

15 REPLIES
support_s
System Recommended

Query: Failed Disk in RAID5 fails to rebuild

System recommended content:

1. HPE OmniStack 4.1.3 for vSphere Release Notes

Thank you for being a valued HPE community member.

DekPlen
Advisor

Re: Query: Failed Disk in RAID5 fails to rebuild

Hi There,

Thanks for the reference. Was there a particular section you wanted me to check?

Regards

Dek

DekPlen
Advisor

Re: Query: Failed Disk in RAID5 fails to rebuild

After persisting with a fourth disk, the array finally rebuilt, so I added a spare just in case.

But now I am seeing the following, which I have not seen before:

Cache Board Present: True
Cache Status: Temporarily Disabled
Cache Status Details: Cache disabled; requires reboot to enable cache.
Cache Disable Reason: Temporary disable condition. Posted write operations were disabled by the Flush/Disable Posted-Write Cache command.

Does anyone know whether there is any way to re-enable this cache or reset this status without a reboot?

Thanks again


Dek

BPSingh
HPE Pro

Re: Query: Failed Disk in RAID5 fails to rebuild

Greetings!

Do you see a cache disable code in the array diagnostics report?



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
DekPlen
Advisor

Re: Query: Failed Disk in RAID5 fails to rebuild

@BPSingh 

Hi There,

Is this report only available from the Array Configuration Utility? If so, can it be run from within the OS? Unfortunately, I am unable to reboot this server right now.

Is there any other way to get it whilst the system is running?

EDIT: I will look into installing hpssaducli for Debian, if I can.

Thanks and Regards

Derek

BPSingh
HPE Pro

Re: Query: Failed Disk in RAID5 fails to rebuild

Greetings!

Did you get a chance to install the HPE SSAcli utility and check the status?



DekPlen
Advisor

Re: Query: Failed Disk in RAID5 fails to rebuild

@BPSingh 

Hi There,

I have already installed hpssacli; the output I posted was from that utility. Would the full diagnostics report show anything more about the cache? The hpssacli output was:

# hpssacli ctrl slot=0 show detail

Smart Array P420i in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: XXXXXXXXXXXX
Cache Serial Number: XXXXXXXXXXXX
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 8.32-0
Rebuild Priority: Low
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: No
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: Temporarily Disabled
Cache Status Details: Cache disabled; requires reboot to enable cache.
Cache Disable Reason: Temporary disable condition. Posted write operations were disabled by the Flush/Disable Posted-Write Cache command.
Drive Write Cache: Disabled
Total Cache Size: 2.0
Total Cache Memory Available: 1.8
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: False
SSD Caching Version: 1
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 67
Cache Module Temperature (C): 34
Capacitor Temperature (C): 24
Number of Ports: 2 Internal only
Encryption: Not Set
Driver Name: hpsa
Driver Version: 3.4.20
Driver Supports SSD Smart Path: True
PCI Address (Domain:Bus:Device.Function): 0000:02:00.0
Port Max Phy Rate Limiting Supported: False
Host Serial Number: XXXXXXXX
Sanitize Erase Supported: False
Primary Boot Volume: logicaldrive 1 (600508B1001C84C325FF5126867E2892)
Secondary Boot Volume: None
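Because `show detail` output is a flat list of `Key: Value` lines, the cache condition above could be detected from a script, for example in a monitoring check. This is a minimal Python sketch assuming the layout shown in this post. `parse_detail` and `cache_needs_reboot` are hypothetical helper names, and field names may vary between controller firmware and CLI versions. The sketch only reads the text; it does not change any controller setting.

```python
def parse_detail(output: str) -> dict[str, str]:
    """Parse 'Key: Value' lines from hpssacli 'show detail' output.

    Splits on the first ': ' only, so values that contain colons
    (e.g. the PCI address 0000:02:00.0) stay intact. Lines without
    ': ' (such as the controller title line) are skipped.
    """
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def cache_needs_reboot(fields: dict[str, str]) -> bool:
    """True when the cache is reported as temporarily disabled pending a reboot."""
    return (
        fields.get("Cache Status") == "Temporarily Disabled"
        and "requires reboot" in fields.get("Cache Status Details", "")
    )


if __name__ == "__main__":
    sample = """Smart Array P420i in Slot 0 (Embedded)
   Cache Status: Temporarily Disabled
   Cache Status Details: Cache disabled; requires reboot to enable cache.
   PCI Address (Domain:Bus:Device.Function): 0000:02:00.0
"""
    fields = parse_detail(sample)
    print(fields["PCI Address (Domain:Bus:Device.Function)"])  # 0000:02:00.0
    print(cache_needs_reboot(fields))  # True
```

The input text could come from running the same `hpssacli ctrl slot=0 show detail` command shown above and capturing its standard output.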

DekPlen
Advisor

Re: Query: Failed Disk in RAID5 fails to rebuild

Has anyone come across this condition, as reported by hpssacli?

Temporary disable condition. Posted write operations were disabled by the Flush/Disable Posted-Write Cache command.
Drive Write Cache: Disabled

Thanks again

BPSingh
HPE Pro
Solution

Re: Query: Failed Disk in RAID5 fails to rebuild

Greetings!

May I know if you had a chance to reboot the node and check the outcome?

Cache Status: Temporarily Disabled
Cache Status Details: Cache disabled; requires reboot to enable cache.
Cache Disable Reason: Temporary disable condition. Posted write operations were disabled by the Flush/Disable Posted-Write Cache command.



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Accept or Kudo
DekPlen
Advisor

Re: Query: Failed Disk in RAID5 fails to rebuild

Hi There,

Thanks for your message.

No, we have not been able to reboot yet, as some services running on guest VMs cannot be stopped at the moment.

I may be able to during the week commencing the 13th.

But have you seen this message before? Or do you know of a way to clear it and re-enable the cache without a reboot?

Thanks

Dek

PDP-Fan
Valued Contributor

Re: Failed Disk in RAID5 fails to rebuild

Is your replacement drive exactly the same as the failed one? It sounds as though the new drive may be a tiny bit too small. If it is smaller than the other members, even by only a few blocks, this could be the result.
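To make the size argument concrete: a RAID 5 member can only be rebuilt onto a drive that exposes at least as many blocks as the member it replaces. Drives sold under the same nominal size (here "2 TB") do not always report identical block counts. This is a minimal Python sketch of the comparison. The block counts in the example are hypothetical, and the 512-byte default matches the logical block size reported for the drive earlier in the thread.

```python
def can_replace(member_blocks: int, replacement_blocks: int) -> bool:
    """A rebuild target must expose at least as many blocks as the member it replaces."""
    return replacement_blocks >= member_blocks


def shortfall_bytes(member_blocks: int, replacement_blocks: int, block_size: int = 512) -> int:
    """Bytes by which the replacement falls short of the original member (0 if none)."""
    return max(0, member_blocks - replacement_blocks) * block_size


if __name__ == "__main__":
    member = 3_907_029_168       # hypothetical block count of an existing 2 TB member
    replacement = 3_907_029_160  # hypothetical replacement, 8 blocks smaller
    print(can_replace(member, replacement))      # False
    print(shortfall_bytes(member, replacement))  # 4096
```

A shortfall of only a few kilobytes is invisible in a "2 TB" label, which is why comparing model numbers alone may not settle the question.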

Why not move the drive from slot 9 to slot 7 and see if that one works? It doesn't make much sense to have an unconfigured disk running in your system anyway...

***********************************************
"If it seems illogical... you just don't have enough information"
DekPlen
Advisor

Re: Failed Disk in RAID5 fails to rebuild

Hi There,

Many thanks for the reply. As far as I can see from the model number and the details on the device, the disk is an identical model. The spare disk is a hot spare, so I could potentially pull disk 7 to see whether it uses disk 9.

Regards

Derek

BPSingh
HPE Pro

Re: Query: Failed Disk in RAID5 fails to rebuild

Greetings!

As the error message indicates that a reboot is required, please reboot the server and check the outcome. During the reboot, watch closely for either of the following messages during the POST process:

HPE Smart Storage Battery Backup - Start
HPE Smart Storage Battery Backup - Error



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Accept or Kudo
DekPlen
Advisor

Re: Query: Failed Disk in RAID5 fails to rebuild

Hi There,

I have finally managed to reboot the server, and the error condition has cleared, so it appears there is no way to clear it without a reboot.

Thanks for everyone's time. I have another issue, for which I will post a new question.

Thanks

Sunitha_Mod
Moderator

Re: Query: Failed Disk in RAID5 fails to rebuild

Hello @DekPlen,

That's perfect! 

We are extremely glad to know the problem has been resolved and we appreciate you for keeping us posted. 



Thanks,
Sunitha G
I'm an HPE employee.