ProLiant Servers (ML,DL,SL)

SOLVED
acrossley
Visitor

Smart Array Ready for Recovery - but recovery not starting

I have a RAID 1 array which had a disk failure. I replaced the drive and it now says "Ready for Recovery", but after a couple of weeks it still says "Ready for Recovery" and there's no sign of the recovery starting. The server is operating normally, but I presume this array is currently unprotected. The array is the main C: drive with Windows on it, so it won't have many quiet periods.

Could someone advise on how to move this forward?

Thanks

parnassus
Honored Contributor

You provided no info about which HP/HPE ProLiant server and which HP/HPE Smart Array controller you are dealing with... so what server? What Smart Array controller? What about checking your (physical/logical) RAID 1 array status using the HPE Smart Storage Administrator (known as HPE SSA) application directly from within your Windows OS? (This supposes you have already installed the HPE SSA application on your Windows OS.)
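For anyone who prefers a scriptable check: HPE SSA also has a command-line companion, named `hpssacli` in older releases and `ssacli` in newer ones. Below is a minimal Python sketch that runs `ctrl all show config`, which lists each controller's arrays, logical drives and physical drives along with their status. It assumes one of those tools is installed and on PATH; on Windows you may need to add its install directory to PATH or call it by its full path.

```python
from __future__ import annotations

import shutil
import subprocess

# Possible names of the HPE SSA command-line tool: "hpssacli" in older releases,
# "ssacli" in newer ones. Whether either is installed is an assumption.
CLI_NAMES = ("ssacli", "hpssacli")


def find_cli() -> str | None:
    """Return the first SSA CLI found on PATH, or None if none is installed."""
    for name in CLI_NAMES:
        if shutil.which(name):
            return name
    return None


def show_config_command(cli: str) -> list[str]:
    """argv asking every controller for its arrays, logical and physical drives."""
    return [cli, "ctrl", "all", "show", "config"]


def show_config() -> str:
    """Run the CLI and return its text output (raises if the tool is missing)."""
    cli = find_cli()
    if cli is None:
        raise FileNotFoundError("neither ssacli nor hpssacli found on PATH")
    result = subprocess.run(
        show_config_command(cli), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Usage: call `print(show_config())` from an elevated prompt on the server and look at the state reported for the RAID 1 logical drive and for the replaced physical drive.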


I'm not an HPE Employee
Torsten.
Acclaimed Contributor

Some entry-class disk controllers need a reboot after a disk is replaced. During boot you need to answer the prompt about how to proceed (press F1 or F2).


Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

acrossley
Visitor

Sorry, too early in the day to think...

It's a ProLiant DL80 Gen9 server with a Dynamic Smart Array B140i RAID Controller

It's running Windows Server 2012 R2

I had been using the HPE Systems Management and that was giving me the details on the disk controller. I've now installed HPE SSA and run a diagnostics report, which is at https://www.dropbox.com/s/ig8ikj35ne7dcq6/ADUReport.zip?dl=0. I'm working my way through it, but with my untrained eye I'm not seeing anything that helps.

Thanks

[Note: broken link updated/removed by Mod]

acrossley
Visitor

Thanks - it has been rebooted several times.

F2 shows messages that the disk array needs to be rebuilt and is queued for rebuilding.

F1 continues to boot.

I've looked in the Smart Disk Array controller from boot up and couldn't see any options there to trigger the rebuild. Everything indicates it is queued for rebuilding and it should just happen. But it's been a few weeks and still nothing!

Torsten.
Acclaimed Contributor
Solution

In the B140i User Guide, please read the chapter "Recovering from compromised fault tolerance".


Hope this helps!
Regards
Torsten.

parnassus
Honored Contributor

Your server has a logical volume of about 4TB to rebuild from the "good" disk (D006) to the replacement (D007):

[2017-07-19 09:03:50] Array B Unit U01: RAID 1
[2017-07-19 09:03:50]   U01 from 2 drives: D006  D007[R]
[2017-07-19 09:03:50]   stripsize=512 (256 KiB) volstate=NEEDS_REBUILD datadrives=1 paritygroups=1 cache=disabled accel=disabled/disabled
[2017-07-19 09:03:50]   offset=0x0 logical_blocks=0x1D1BFBEB0 (3725 GiB)
[2017-07-19 09:03:50]   uf=0x0 srf=0x1 dt=3 pdm=0 psf=4 bd=0x0 naz=0x0 nwz=0x0 bsf=512 muf=0x0

where D006 and D007 are, respectively:

[2017-07-19 09:03:50] D006 p0|0x1 [01]P2I:02:03,HDD  ATA     ,ST4000NM0033-9ZM,SN06    ,S1Z2GDX1            ,NCQ,07K,SCFW=11,SCTYPE=1
[2017-07-19 09:03:50] D007 p0|0x1 [01]P2I:02:04,HDD  ATA     ,ST4000NM0033-9ZM,SN06    ,S1Z2KC9C            ,NCQ,07K,SCFW=11,SCTYPE=1

Supposing the only remaining disk (D006) of that RAID 1 is good enough (has no unrecoverable errors), the rebuild process would, even in the very best case, take a long time, leaving RAID 1 volume U01 "degraded" and therefore unprotected against another drive failure... probably up to 2 weeks (considering an average of 20 minutes per gigabyte)... if it starts.
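Both the volume size and the time estimate can be re-derived from the log lines above. Here is a minimal Python sketch. It assumes 512-byte logical blocks, which reproduces the "(3725 GiB)" the log prints. It also reads a trailing "[R]" as "drive being rebuilt", which is consistent with the "Starting rebuild U01 D006" entry in the 2016 log further down. The rates are illustrative inputs, not measurements. At a constant 20 minutes per GiB the arithmetic gives roughly 52 days for this volume; finishing in two weeks corresponds to about 5.4 minutes per GiB.

```python
import re

# Logical-unit lines copied from the ADU report quoted above (Array B, Unit U01).
ADU_UNIT_LINES = """\
[2017-07-19 09:03:50]   U01 from 2 drives: D006  D007[R]
[2017-07-19 09:03:50]   stripsize=512 (256 KiB) volstate=NEEDS_REBUILD datadrives=1 paritygroups=1 cache=disabled accel=disabled/disabled
[2017-07-19 09:03:50]   offset=0x0 logical_blocks=0x1D1BFBEB0 (3725 GiB)
"""

BLOCK_SIZE = 512  # assumed bytes per block; reproduces the "(3725 GiB)" in the log


def parse_unit(text: str) -> dict:
    """Pull volume state, member drives and size out of an ADU logical-unit dump."""
    members = re.search(r"from \d+ drives:\s*(.+)", text).group(1).split()
    blocks = int(re.search(r"logical_blocks=(0x[0-9A-Fa-f]+)", text).group(1), 16)
    return {
        "volstate": re.search(r"volstate=(\S+)", text).group(1),
        # "[R]" is read as the drive being rebuilt; the other member is the source.
        "target": [m[:-3] for m in members if m.endswith("[R]")],
        "source": [m for m in members if not m.endswith("[R]")],
        "size_gib": blocks * BLOCK_SIZE // 2**30,
    }


def rebuild_hours(size_gib: float, minutes_per_gib: float) -> float:
    """Hours to copy size_gib at a constant rate of minutes_per_gib."""
    return size_gib * minutes_per_gib / 60


def minutes_per_gib_for(size_gib: float, target_hours: float) -> float:
    """Constant rate (minutes per GiB) needed to finish within target_hours."""
    return target_hours * 60 / size_gib


unit = parse_unit(ADU_UNIT_LINES)
print(unit)  # {'volstate': 'NEEDS_REBUILD', 'target': ['D007'], 'source': ['D006'], 'size_gib': 3725}
print(round(rebuild_hours(unit["size_gib"], 20) / 24, 1))        # 51.7 (days at 20 min/GiB)
print(round(minutes_per_gib_for(unit["size_gib"], 14 * 24), 1))  # 5.4 (min/GiB for two weeks)
```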

It's not starting...AFAIK.

Regarding the error logged by ADU, the HPE ProLiant Gen9 Troubleshooting Guide (Volume II: Error Messages) says:

Logical drive state: The logical drive is queued for rebuilding

Symptom
Logical drive state: The logical drive is queued for rebuilding.
 
Cause
A logical drive is queued for rebuilding.
 
Action
No action is required. Normal operations can occur; however, performance will be less than optimal during the rebuilding process.
 
Maybe I'm reading the ADU logs wrong, but it seems that the drive you used as a replacement (D007) has at least 1400 power-on hours, so it's not "new", compared to about 8300 hours for the drive your RAID 1 is rebuilding from (D006). I don't know whether this matters for rebuild speed; server I/O load certainly does, since an online rebuild happens while the OS is running, so you should consider that point. Another thing I noticed reading your logs (Slot0b.txt) is that about a year ago (June 2016) D006 and D007 were different units of the same model (see their respective serial numbers):
 
[2016-06-28 09:05:06] D006 p0|0x1 [01]P2I:02:03,HDD  ATA     ,ST4000NM0033-9ZM,0001    ,Z1Z0TMBP            ,NCQ,07K,SCFW=11,SCTYPE=1
[2016-06-28 09:05:06] D007 p0|0x1 [01]P2I:02:04,HDD  ATA     ,ST4000NM0033-9ZM,0001    ,Z1Z18VW3            ,NCQ,07K,SCFW=11,SCTYPE=1
This means that something (bad sectors? failures due to overheating?) already happened to those two drives in the past. First D006 failed: U01 remained degraded for days or weeks after D006 was replaced with a new disk, and the rebuild finally started on September 9th, 2016 [*]. Then June 22nd, 2017 was a very bad day for your old D007: your server started having issues with it (the one with S/N Z1Z18VW3), and it was replaced the day after.
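Spotting those swaps by eye is error-prone, so a small parser can diff the drive inventory between two ADU dumps. Below is a minimal Python sketch over the drive lines quoted in this post. It assumes the comma-separated fields are, in order, bay location, type/interface, model, firmware revision and serial number. The serial position matches the S/N values cited above; treating the fourth field as the firmware revision is an assumption.

```python
from __future__ import annotations

# Drive lines copied from the ADU logs quoted in this post.
LOG_2016 = """\
[2016-06-28 09:05:06] D006 p0|0x1 [01]P2I:02:03,HDD  ATA     ,ST4000NM0033-9ZM,0001    ,Z1Z0TMBP            ,NCQ,07K,SCFW=11,SCTYPE=1
[2016-06-28 09:05:06] D007 p0|0x1 [01]P2I:02:04,HDD  ATA     ,ST4000NM0033-9ZM,0001    ,Z1Z18VW3            ,NCQ,07K,SCFW=11,SCTYPE=1
"""
LOG_2017 = """\
[2017-07-19 09:03:50] D006 p0|0x1 [01]P2I:02:03,HDD  ATA     ,ST4000NM0033-9ZM,SN06    ,S1Z2GDX1            ,NCQ,07K,SCFW=11,SCTYPE=1
[2017-07-19 09:03:50] D007 p0|0x1 [01]P2I:02:04,HDD  ATA     ,ST4000NM0033-9ZM,SN06    ,S1Z2KC9C            ,NCQ,07K,SCFW=11,SCTYPE=1
"""


def parse_drives(text: str) -> dict[str, dict[str, str]]:
    """Map drive ID (e.g. "D006") to its model, firmware and serial number."""
    drives = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 5:
            continue  # not a drive line
        drive_id = fields[0].split()[2]  # "[date time] D006 p0|0x1 ..." -> "D006"
        drives[drive_id] = {
            "model": fields[2],
            "firmware": fields[3],  # assumed meaning of the fourth field
            "serial": fields[4],
        }
    return drives


def swapped_drives(old: dict, new: dict) -> dict[str, tuple[str, str]]:
    """Drive IDs present in both dumps whose serial changed, as (old, new)."""
    return {
        d: (old[d]["serial"], new[d]["serial"])
        for d in sorted(old.keys() & new.keys())
        if old[d]["serial"] != new[d]["serial"]
    }


print(swapped_drives(parse_drives(LOG_2016), parse_drives(LOG_2017)))
# {'D006': ('Z1Z0TMBP', 'S1Z2GDX1'), 'D007': ('Z1Z18VW3', 'S1Z2KC9C')}
```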
 
On June 23rd, D007 was replaced with the drive your server currently has (S/N S1Z2KC9C), but its rebuild process never started (or, if it did start as expected, it never finished, which I personally doubt).
 
Surface Analysis also seems to have issues in general ("Surface Analysis not completed") on all your RAID 1 / RAID 5 volumes (at least U00, U01 and U02, according to the ADU logs).
 
At this point, what about:
 
  • Checking your server's firmware versions (especially the System ROM, which embeds the firmware for the Dynamic Smart Array B140i) and the B140i software driver versions (especially the storage driver) for your Microsoft Windows Server 2012 R2 operating system?
  • Checking whether the disks you use are good enough (the failure rate is odd: either they are very heavily used or they are not well suited to server use), considering they look like non-genuine HPE SATA drives (their firmware looks like the original Seagate Constellation ES.3 firmware, not the typical HPE-customized HPxx(x) firmware found on genuine HPE-branded drives for ProLiant Gen9).
  • Checking the server's operating temperature (the higher it is, the worse it is for rotational disks).

So, in the end, the questions are:

  • Why is the rebuild process of U01 not starting?
  • Is the D006 drive really good? (Does it have unrecoverable errors that may prevent the rebuild process from starting?)
Note that the Rebuild Priority for the HPE Dynamic Smart Array B140i is set to Rapid High.
 
[*] The September 9th rebuild of D006 from D007:
 
[2016-09-09 10:29:19] Array B Unit U01: RAID 1
[2016-09-09 10:29:19]   U01 from 2 drives:  D006[R] D007
[2016-09-09 10:29:19]   stripsize=512 (256 KiB) volstate=NEEDS_REBUILD datadrives=1 paritygroups=1 cache=enabled accel=disabled/disabled
[2016-09-09 10:29:19]   offset=0x0 logical_blocks=0x1D1BFBEB0 (3725 GiB)
[2016-09-09 10:29:19]   uf=0x0 srf=0x1 dt=3 pdm=0 psf=4 bd=0x0 naz=0x0 nwz=0x0 bsf=512 muf=0x0
[2016-09-09 10:29:19] Array B Unit U01: RAID 1
[2016-09-09 10:29:19]   U01 from 2 drives:  D006[R] D007
[2016-09-09 10:29:19]   stripsize=512 (256 KiB) volstate=REBUILDING datadrives=1 paritygroups=1 cache=enabled accel=disabled/disabled
[2016-09-09 10:29:19]   offset=0x0 logical_blocks=0x1D1BFBEB0 (3725 GiB)
[2016-09-09 10:29:19]   uf=0x0 srf=0x1 dt=3 pdm=0 psf=4 bd=0x0 naz=0x0 nwz=0x0 bsf=512 muf=0x0
[2016-09-09 10:29:19] Starting rebuild U01 D006 ibc=2048
[2016-09-09 10:39:19] Rebuild progress U01 D006: 0x1C66B16B0 blocks remaining, 2.5% complete (will START OVER if reset)
[2016-09-09 10:49:20] Rebuild progress U01 D006: 0x1BB09D6B0 blocks remaining, 4.9% complete (will START OVER if reset)
[2016-09-09 10:59:21] Rebuild progress U01 D006: 0x1AFA4F6B0 blocks remaining, 7.4% complete (will START OVER if reset)
[2016-09-09 11:09:22] Rebuild progress U01 D006: 0x1A458F6B0 blocks remaining, 9.8% complete (will START OVER if reset)
[2016-09-09 11:19:22] Rebuild progress U01 D006: 0x19923B6B0 blocks remaining, 12.2% complete (will START OVER if reset)
[2016-09-09 11:29:24] Rebuild progress U01 D006: 0x18DE92EB0 blocks remaining, 14.6% complete (will START OVER if reset)
[2016-09-09 11:39:25] Rebuild progress U01 D006: 0x182BC76B0 blocks remaining, 17.0% complete (will START OVER if reset)
[2016-09-09 11:49:26] Rebuild progress U01 D006: 0x177B946B0 blocks remaining, 19.4% complete (will START OVER if reset)
[2016-09-09 11:59:26] Rebuild progress U01 D006: 0x16CAFBEB0 blocks remaining, 21.7% complete (will START OVER if reset)
[2016-09-09 12:09:27] Rebuild progress U01 D006: 0x161D216B0 blocks remaining, 24.1% complete (will START OVER if reset)
[2016-09-09 12:19:27] Rebuild progress U01 D006: 0x156FFE6B0 blocks remaining, 26.4% complete (will START OVER if reset)
[2016-09-09 12:29:28] Rebuild progress U01 D006: 0x14C3756B0 blocks remaining, 28.7% complete (will START OVER if reset)
[2016-09-09 12:39:29] Rebuild progress U01 D006: 0x141A45EB0 blocks remaining, 31.0% complete (will START OVER if reset)
[2016-09-09 12:49:31] Rebuild progress U01 D006: 0x1370DEEB0 blocks remaining, 33.3% complete (will START OVER if reset)
[2016-09-09 12:59:32] Rebuild progress U01 D006: 0x12C7536B0 blocks remaining, 35.5% complete (will START OVER if reset)
[2016-09-09 13:09:33] Rebuild progress U01 D006: 0x12219E6B0 blocks remaining, 37.8% complete (will START OVER if reset)
[2016-09-09 13:19:35] Rebuild progress U01 D006: 0x117BE16B0 blocks remaining, 40.0% complete (will START OVER if reset)
[2016-09-09 13:29:36] Rebuild progress U01 D006: 0x10D8BF6B0 blocks remaining, 42.2% complete (will START OVER if reset)
[2016-09-09 13:39:37] Rebuild progress U01 D006: 0x103661EB0 blocks remaining, 44.4% complete (will START OVER if reset)
[2016-09-09 13:49:38] Rebuild progress U01 D006: 0xF9719EB0 blocks remaining, 46.5% complete (will START OVER if reset)
[2016-09-09 13:59:38] Rebuild progress U01 D006: 0xEF8736B0 blocks remaining, 48.6% complete (will START OVER if reset)
[2016-09-09 14:09:40] Rebuild progress U01 D006: 0xE5B566B0 blocks remaining, 50.7% complete (will START OVER if reset)
[2016-09-09 14:19:40] Rebuild progress U01 D006: 0xDBFF36B0 blocks remaining, 52.8% complete (will START OVER if reset)
[2016-09-09 14:29:41] Rebuild progress U01 D006: 0xD27086B0 blocks remaining, 54.9% complete (will START OVER if reset)
[2016-09-09 14:39:42] Rebuild progress U01 D006: 0xC8FC9EB0 blocks remaining, 56.9% complete (will START OVER if reset)
[2016-09-09 14:49:43] Rebuild progress U01 D006: 0xBFAC26B0 blocks remaining, 58.9% complete (will START OVER if reset)
[2016-09-09 14:59:44] Rebuild progress U01 D006: 0xB679E6B0 blocks remaining, 60.9% complete (will START OVER if reset)
[2016-09-09 15:09:45] Rebuild progress U01 D006: 0xAD4C16B0 blocks remaining, 62.8% complete (will START OVER if reset)
[2016-09-09 15:19:46] Rebuild progress U01 D006: 0xA4710EB0 blocks remaining, 64.7% complete (will START OVER if reset)
[2016-09-09 15:29:47] Rebuild progress U01 D006: 0x9B94AEB0 blocks remaining, 66.6% complete (will START OVER if reset)
[2016-09-09 15:39:49] Rebuild progress U01 D006: 0x92E556B0 blocks remaining, 68.5% complete (will START OVER if reset)
[2016-09-09 15:49:49] Rebuild progress U01 D006: 0x8A701EB0 blocks remaining, 70.3% complete (will START OVER if reset)
[2016-09-09 15:59:50] Rebuild progress U01 D006: 0x81FA46B0 blocks remaining, 72.1% complete (will START OVER if reset)
[2016-09-09 16:09:51] Rebuild progress U01 D006: 0x799F96B0 blocks remaining, 73.9% complete (will START OVER if reset)
[2016-09-09 16:19:51] Rebuild progress U01 D006: 0x715E6EB0 blocks remaining, 75.7% complete (will START OVER if reset)
[2016-09-09 16:29:52] Rebuild progress U01 D006: 0x6942DEB0 blocks remaining, 77.4% complete (will START OVER if reset)
[2016-09-09 16:39:53] Rebuild progress U01 D006: 0x615F76B0 blocks remaining, 79.1% complete (will START OVER if reset)
[2016-09-09 16:49:54] Rebuild progress U01 D006: 0x59AB66B0 blocks remaining, 80.8% complete (will START OVER if reset)
[2016-09-09 16:59:54] Rebuild progress U01 D006: 0x51F5FEB0 blocks remaining, 82.5% complete (will START OVER if reset)
[2016-09-09 17:09:55] Rebuild progress U01 D006: 0x4A93F6B0 blocks remaining, 84.0% complete (will START OVER if reset)
[2016-09-09 17:19:56] Rebuild progress U01 D006: 0x43347EB0 blocks remaining, 85.6% complete (will START OVER if reset)
[2016-09-09 17:29:57] Rebuild progress U01 D006: 0x3C1F06B0 blocks remaining, 87.1% complete (will START OVER if reset)
[2016-09-09 17:39:57] Rebuild progress U01 D006: 0x3524CEB0 blocks remaining, 88.6% complete (will START OVER if reset)
[2016-09-09 17:49:59] Rebuild progress U01 D006: 0x2E298EB0 blocks remaining, 90.1% complete (will START OVER if reset)
[2016-09-09 18:00:00] Rebuild progress U01 D006: 0x27628EB0 blocks remaining, 91.6% complete (will START OVER if reset)
[2016-09-09 18:10:01] Rebuild progress U01 D006: 0x20C1F6B0 blocks remaining, 93.0% complete (will START OVER if reset)
[2016-09-09 18:20:02] Rebuild progress U01 D006: 0x1A3986B0 blocks remaining, 94.4% complete (will START OVER if reset)
[2016-09-09 18:30:04] Rebuild progress U01 D006: 0x1400E6B0 blocks remaining, 95.8% complete (will START OVER if reset)
[2016-09-09 18:40:06] Rebuild progress U01 D006: 0xDCABEB0 blocks remaining, 97.1% complete (will START OVER if reset)
[2016-09-09 18:50:08] Rebuild progress U01 D006: 0x7BFEEB0 blocks remaining, 98.4% complete (will START OVER if reset)
[2016-09-09 19:00:10] Rebuild progress U01 D006: 0x1E076B0 blocks remaining, 99.6% complete (will START OVER if reset)
[2016-09-09 19:03:25] Ending rebuild U01 D006 status=0x0
[2016-09-09 22:02:13] Logical drive U01 has completed a surface analysis pass.
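These progress lines also show how fast a rebuild on this hardware actually ran. Below is a minimal Python sketch over an excerpt of the log above. It computes the total duration from the Starting/Ending entries, and a naive linear ETA and rate from the last two progress samples, assuming the "blocks remaining" count is in 512-byte blocks. The log's own percentages grow more slowly towards the end (about 2.4% per 10 minutes at the start, about 1.3% near the end), so an ETA taken from early samples is likely to be optimistic.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Excerpt of the September 2016 rebuild log quoted above.
LOG = """\
[2016-09-09 10:29:19] Starting rebuild U01 D006 ibc=2048
[2016-09-09 10:39:19] Rebuild progress U01 D006: 0x1C66B16B0 blocks remaining, 2.5% complete (will START OVER if reset)
[2016-09-09 10:49:20] Rebuild progress U01 D006: 0x1BB09D6B0 blocks remaining, 4.9% complete (will START OVER if reset)
[2016-09-09 19:03:25] Ending rebuild U01 D006 status=0x0
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
EVENT_RE = re.compile(r"^\[([\d-]+ [\d:]+)\] (Starting|Ending) rebuild")
PROGRESS_RE = re.compile(r"^\[([\d-]+ [\d:]+)\] Rebuild progress .*?: (0x[0-9A-Fa-f]+) blocks remaining")
BLOCK_SIZE = 512  # assumed bytes per block


def rebuild_duration(text: str) -> timedelta:
    """Time between the 'Starting rebuild' and 'Ending rebuild' entries."""
    stamps = {}
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            stamps[m.group(2)] = datetime.strptime(m.group(1), TS_FORMAT)
    return stamps["Ending"] - stamps["Starting"]


def progress_samples(text: str) -> list[tuple[datetime, int]]:
    """(timestamp, blocks remaining) for every progress line."""
    samples = []
    for line in text.splitlines():
        m = PROGRESS_RE.match(line)
        if m:
            samples.append((datetime.strptime(m.group(1), TS_FORMAT), int(m.group(2), 16)))
    return samples


def linear_eta(samples: list[tuple[datetime, int]]) -> tuple[float, float]:
    """Naive ETA in seconds from the last two samples, plus the rate in MiB/s."""
    (t0, r0), (t1, r1) = samples[-2], samples[-1]
    blocks_per_s = (r0 - r1) / (t1 - t0).total_seconds()
    return r1 / blocks_per_s, blocks_per_s * BLOCK_SIZE / 2**20


print(rebuild_duration(LOG))  # 8:34:06
eta_s, mib_s = linear_eta(progress_samples(LOG))
print(f"{mib_s:.0f} MiB/s, ETA {eta_s / 3600:.1f} h")  # 155 MiB/s, ETA 6.5 h
```

Here the early-sample ETA (about 6.5 hours after 10:49) undershoots the real end time of 19:03. This fits the slowdown visible in the percentages.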

acrossley
Visitor

Thanks - that seems to have solved it.

I followed the F2 option at boot, and several layers down, behind the information message, was an obscure option which, when selected, started the rebuild. We're now at 1.78%, so hopefully it will be finished by next week.

For anyone coming across this: all the "you don't need to do anything, it just does it itself" posts out there didn't work for me. It did need to be initiated manually via the F2 option that comes up once the Smart Array loads at boot (well before Windows).

acrossley
Visitor

Thanks for the analysis of the Smart Array output. You picked up and identified some interesting issues:

Both drive failures occurred on hot days. The server isn't in an air-conditioned room, and this obviously needs looking at.

The drives aren't HP; they are Seagate Constellation. Given my experience of 2 out of 8 failing within 18 months, I would look at something like the WD range if, like me, you want something cheaper than HP.
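As a rough yardstick, "2 out of 8 failing within 18 months" can be expressed as an annualized failure rate, that is, failures per drive-year. Below is a minimal Python sketch. It assumes all 8 drives were in service for the full 18 months, which the post doesn't state. A sample this small is nowhere near conclusive.

```python
def annualized_failure_rate(failures: int, drives: int, months: float) -> float:
    """Failures per drive-year, assuming every drive was in service for all `months`."""
    drive_years = drives * months / 12
    return failures / drive_years


# 2 failures among 8 drives over 18 months (from the post above).
print(f"{annualized_failure_rate(2, 8, 18):.1%}")  # 16.7%
```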

 

parnassus
Honored Contributor

A high-temperature environment (where "high" means a temperature well above the suggested operating range for the selected rotational drives, say something like 10°C to 35°C) is quite dangerous for the expected HDD lifespan...

Regarding the price of genuine HDDs, you're right: non-genuine HPE HDDs are "cheap" in a one-to-one comparison that accounts for storage capacity only (4TB versus 4TB). But if you start considering their failure rate under suggested operating conditions, then "expensive" HPE drives can be considered cheap too in the long term. Plus, they benefit from custom HPE firmware and strict product selection and, if I recall correctly, support for Smart Carrier disks (for servers that support them).

In your case, a possible alternative to the Seagate Constellation ES.3 4TB SATA 3.5" would be a genuine HPE disk such as the HPE 4TB 6G SATA 7.2k RPM LFF (3.5") Non-Hot Plug Midline drive with 1 Year Warranty (SKU: 801888-B21). I suggest Non-Hot Plug because, even though the HPE Dynamic Smart Array B140i controller supports both Hot Plug and Non-Hot Plug configurations, Non-Hot Plug disks generally cost less...

