ProLiant Servers (ML,DL,SL)

Re: Smart Array E200i stuck on "Ready for rebuild"

 
tjubb
Occasional Contributor

Smart Array E200i stuck on "Ready for rebuild"

ProLiant ML350 G5 running SLES 11 in a RAID5 configuration.

I've been installing new SAS drives in our array one at a time, waiting for each rebuild to complete successfully before adding the next drive. After I installed the last drive, there were numerous write errors on the swap device, and services became slow and sluggish. Disabling the swap file has alleviated the slowdowns for the short term. However, the array still shows a status of "Ready for Rebuild" with no further activity. It is trying to rebuild the logical drive that holds the boot partition (/dev/cciss/c0d0). HPSMH shows drive 6 as having 14961 read errors, and that number has not increased since I turned off swap.

/var/log/messages only seems to report errors on the "swap-device", not necessarily on any real data area. Also, those are write errors, whereas HPSMH reports read errors.

The drive POST complained about is in 1I:1:3, but HPSMH reports read errors on drive 2I:1:6.

Those are the two most recent drive swaps I performed.

I tried booting from a Linux live CD and using dd to image the drives/partitions. It was going excruciatingly slowly on the boot partition because of the disk errors, so I moved on to imaging the other drives/partitions. As the backup progressed, it became clear it would take 15 hours to complete, and another 15 hours to restore once the array was reconfigured after removing the faulty drives. That was not feasible, so that plan was scrapped.
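As a rough check on the 15-hour figure, imaging time is simply the amount of data divided by the sustained throughput. A minimal bash sketch, assuming the amount to copy is about the size of the two logical drives (32.0 GB and 924.9 GB in the config output below, treated here as binary GiB) and a hypothetical sustained USB 2.0 write rate of 20 MB/s; both inputs are assumptions, not measurements from this server, and `image_hours` is an illustrative helper name:

```bash
#!/usr/bin/env bash
# image_hours (hypothetical helper): hours = bytes / (bytes per second) / 3600.
# $1 = size in GiB (2^30 bytes), $2 = sustained throughput in MB/s (10^6 bytes/s).
image_hours() {
  local size_gib=$1 mb_per_s=$2
  awk -v g="$size_gib" -v r="$mb_per_s" \
    'BEGIN { printf "%.1f\n", g * 1024^3 / (r * 10^6) / 3600 }'
}

# Assumed inputs: LD 1 (32.0 GiB) + LD 2 (924.9 GiB), copied at 20 MB/s.
total_gib=$(awk 'BEGIN { print 32.0 + 924.9 }')
image_hours "$total_gib" 20   # prints 14.3
```

At the assumed 20 MB/s, roughly 957 GiB works out to a little over 14 hours in each direction, in line with the 15-hour estimate above.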

So, at this point I'm kind of stuck. I thought about putting a USB 3.0 expansion card in to increase the throughput of the backups, but it turns out this server does not respond well to USB 3.0 because of its age.

My question is this: if I remove the drive (bay 6) that HPSMH reports as having read errors while logical drive 1 is stuck on "Ready for Rebuild", am I going to destroy my RAID? The same goes for the drive that /var/log/messages reports as having write errors: if I remove that drive and install a fresh one, will I destroy the RAID and/or the boot partition?

I'm not terribly fluent with Linux or RAID configuration, so let me know if anything here is unclear. I have a diagnostic file created with hpacucli if that would be helpful. My other option is to upgrade the server itself due to its age; I was planning to upgrade servers this year anyway.

---------------------------------------------

Details about the controller and drives:


# /opt/compaq/hpacucli/bld/hpacucli ctrl all show config detail
 

Smart Array E200i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: QT83MP3021    
   Cache Serial Number: --
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Hardware Revision: A
   Firmware Version: 1.86
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Status Details: A cache error was detected. Run a diagnostic report for more information.
   Cache Ratio: 50% Read / 50% Write
   Drive Write Cache: Disabled
   Total Cache Size: 128 MB
   Total Cache Memory Available: 96 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: False
 
   Array: A
      Interface Type: SAS
      Unused Space: 0  MB
      Status: OK
      Array Type: Data
 
 
 
      Logical Drive: 1
         Size: 32.0 GB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 8224
         Strip Size: 64 KB
         Full Stripe Size: 448 KB
         Status: Ready for Rebuild
         Caching:  Enabled
         Parity Initialization Status: Initialization Completed
         Unique Identifier: 600508B1001032333720202020200002
         Disk Name: /dev/cciss/c0d0
         Mount Points: / 30.0 GB
         OS Status: LOCKED
         Logical Drive Label: AFC929D4QT7BMU0237     4700
         Drive Type: Data
      Logical Drive: 2
         Size: 924.9 GB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 64 KB
         Full Stripe Size: 448 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: Initialization Completed
         Unique Identifier: 600508B1001032333720202020200003
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         OS Status: LOCKED
         Logical Drive Label: AC2929DFQT7BMU0237     5207
         Drive Type: Data
 
      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPDF
         Serial Number: Info erased
         Model: HP      EG0300FAWHV    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
 
      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPDF
         Serial Number: --
         Model: HP      EG0300FAWHV    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
 
      physicaldrive 1I:1:3
         Port: 1I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPDE
         Serial Number: --
         Model: HP      EG0300FAWHV    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
 
      physicaldrive 1I:1:4
         Port: 1I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 146 GB
         Rotational Speed: 10000
         Firmware Revision: HPDF
         Serial Number: --
         Model: HP      DG0146FAMWL    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
 
      physicaldrive 2I:1:5
         Port: 2I
         Box: 1
         Bay: 5
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD6
         Serial Number: --
         Model: HP      EG0300FBDSP    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
 
      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPDF
         Serial Number: --
         Model: HP      EG0300FAWHV    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
 
      physicaldrive 2I:1:7
         Port: 2I
         Box: 1
         Bay: 7
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPDF
         Serial Number: --
         Model: HP      EG0300FAWHV    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
 
      physicaldrive 2I:1:8
         Port: 2I
         Box: 1
         Bay: 8
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPDF
         Serial Number: --
         Model: HP      EG0300FAWHV    
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown
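To pull just the logical drive status lines out of a long `show config detail` listing like the one above, a small awk filter is enough. A minimal sketch, assuming the layout shown here (a `Logical Drive: N` header followed by an indented `Status:` line); the function name `ld_status` is hypothetical:

```bash
#!/usr/bin/env bash
# ld_status (hypothetical helper): print "LD <n>: <status>" for each logical
# drive found in hpacucli "ctrl all show config detail" output on stdin.
# Assumes the layout shown above: a "Logical Drive: N" header followed by an
# indented "Status:" line. The LD number is cleared after each match, so the
# array-level and physical-drive "Status:" lines are skipped.
ld_status() {
  awk '
    /^[[:space:]]*Logical Drive:/ { ld = $3; next }
    /^[[:space:]]*Status:/ && ld != "" {
      sub(/^[[:space:]]*Status:[[:space:]]*/, "")
      sub(/[[:space:]]+$/, "")
      print "LD " ld ": " $0
      ld = ""
    }
  '
}

# Example, using the command from the original post:
# /opt/compaq/hpacucli/bld/hpacucli ctrl all show config detail | ld_status
```

Run against the listing above, this should print `LD 1: Ready for Rebuild` and `LD 2: OK`.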
Harsh_b
HPE Pro

Re: Smart Array E200i stuck on "Ready for rebuild"

Hello, 

Please provide/attach an ADU report so I can check the complete array health status and assist you accordingly.

 

Regards

 

I am an HPE employee
tjubb
Occasional Contributor

Re: Smart Array E200i stuck on "Ready for rebuild"

I didn't see a way to attach the log file here, and pasting it into this message exceeds the maximum number of characters allowed. I have uploaded the files to my Google Drive in plain text, XML, and HTML versions:
https://drive.google.com/drive/folders/1f28f9GKshafRBhuCw39zu8TOc-oSIH0N?usp=sharing

Appreciate any insight before I do something catastrophic.

Thanks in advance!

Tom

 

 

Harsh_b
HPE Pro

Re: Smart Array E200i stuck on "Ready for rebuild"

Hello Tom,

I have checked the logs and can see that two logical drives are configured on Array A:

LD 1 - RAID 5 - 34 GB

LD 2 - RAID 5 - 993 GB
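These ADU sizes (34 GB and 993 GB) differ from the hpacucli sizes in the original post (32.0 GB and 924.9 GB). The figures are consistent with hpacucli reporting binary gigabytes (GiB, 2^30 bytes) and the ADU report using decimal gigabytes (10^9 bytes); that reading is an inference from the numbers, not something either tool states here. A quick bash check, with `gib_to_gb` as a hypothetical helper:

```bash
#!/usr/bin/env bash
# gib_to_gb (hypothetical helper): convert binary gigabytes (2^30 bytes)
# to decimal gigabytes (10^9 bytes), rounded to one decimal place.
gib_to_gb() {
  awk -v g="$1" 'BEGIN { printf "%.1f\n", g * 1024^3 / 10^9 }'
}

gib_to_gb 32.0    # LD 1 as reported by hpacucli; prints 34.4
gib_to_gb 924.9   # LD 2 as reported by hpacucli; prints 993.1
```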

0 Physical Drive (146 GB SAS) 1I:1:4
1 Physical Drive (300 GB SAS) 1I:1:3
2 Physical Drive (300 GB SAS) 1I:1:2
3 Physical Drive (300 GB SAS) 1I:1:1
4 Physical Drive (300 GB SAS) 2I:1:8
5 Physical Drive (300 GB SAS) 2I:1:7
6 Physical Drive (300 GB SAS) 2I:1:6
7 Physical Drive (300 GB SAS) 2I:1:5

05-24-2019 07:44:03 Logical Drive Status Rebuild aborted due to read error, logical drive 00000000.
due to Medium Error / Unrecovered Read Error on 2I:1:6
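Why one unreadable area on 2I:1:6 can stop the rebuild: in RAID 5, each strip that belonged to the replaced drive is recomputed as the XOR of all the surviving strips (data plus parity) in the same stripe. With the 64 KB strip size and 448 KB full stripe size from the config output, each stripe here has 448 / 64 = 7 data strips plus one parity strip across the eight drives. The bash sketch below is a simplified model under those assumptions: strips are shrunk to small integers, parity rotation and the controller's actual error handling are not modeled, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified RAID 5 model: one stripe = 7 data strips + 1 parity strip.

# parity_of (hypothetical): XOR all arguments together; this is how the
# parity strip is derived from the data strips.
parity_of() {
  local p=0 s
  for s in "$@"; do
    p=$(( p ^ s ))
  done
  echo "$p"
}

# rebuild_missing (hypothetical): recompute the replaced drive's strip as the
# XOR of the surviving strips. A survivor passed as "ERR" stands for a medium
# error / unrecovered read error; that stripe cannot be rebuilt.
rebuild_missing() {
  local s
  for s in "$@"; do
    if [[ $s == ERR ]]; then
      echo "rebuild aborted: unreadable surviving strip" >&2
      return 1
    fi
  done
  parity_of "$@"
}

p=$(parity_of 11 22 33 44 55 66 77)       # parity for 7 data strips
rebuild_missing 11 22 44 55 66 77 "$p"    # strip "33" was on the new drive; prints 33
rebuild_missing 11 22 44 ERR 66 77 "$p" || echo "stripe could not be rebuilt"
```

Because XOR is its own inverse, any single missing strip can be recovered, but only if every other strip in that stripe can still be read.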

I recommend replacing the drive in 2I:1:6 to start the rebuild. If the logical drive still does not rebuild, please log a support case so it can be investigated further.

Disclaimer: Data integrity is the customer's responsibility. HPE recommends backing up data frequently and before performing any activity, so that it can be restored during disaster recovery.

Thanks

 

I am an HPE employee
tjubb
Occasional Contributor

Re: Smart Array E200i stuck on "Ready for rebuild"

Thank you for your reply. My concern (and my local consultant's) is that removing a drive in this unfinished state could damage the boot partition.

Here is an excerpt from /var/log/messages, which seems to report errors on the "swap-device" but not necessarily on any real data area, whereas HPSMH reports read errors.

The drive POST complained about is in 1I:1:3, but HPSMH reports read errors on drive 2I:1:6.

 

We had initially begun an image backup of /dev/cciss/c0d0p2 and /dev/cciss/c0d1p1 so that, if swapping out the drive in bay 6 destroyed the RAID configuration or data, we could rebuild from scratch. However, the backup was going to take 15 hours over USB 2.0 and another 15 hours to restore, which was NOT a reasonable amount of server downtime for the business, so we aborted that plan.

So, are you confident that pulling the problematic drive in bay 6 is NOT going to destroy my RAID? I'm hesitant to pull drives while a rebuild is incomplete. My alternative was to purchase a new server, which is needed soon anyway; however, a simple drive swap would be a preferable immediate solution.

BTW, I was already in the process of pulling drives and replacing them with larger drives. This is what triggered the error on the array.

 

Thanks!

Tom

Harsh_b
HPE Pro

Re: Smart Array E200i stuck on "Ready for rebuild"

Hello Tom, 

Since we do not have the complete logs (IML logs + OS logs), I am unable to comment at the OS partition level. Based on the ADU logs, I don't see any error on bay 3; only 2I:1:6 (serial no. 3SE21B0E00009038VY45) has reported 13485 read errors.

The logical drive rebuild was aborted due to a read error, so it is now queued for rebuilding. I believe that once the problematic disk is replaced, the rebuild should start and complete.

The affected disk is 300 GB. Are you planning to replace all disks with higher-capacity ones, or only the affected one?

I would suggest logging a support case with the HPE Support Center, with all logs, in case further assistance is required.

Regards

I am an HPE employee
tjubb
Occasional Contributor

Re: Smart Array E200i stuck on "Ready for rebuild"


@Harsh_b wrote:

Hello Tom, 

The affected disk is 300 GB. Are you planning to replace all disks with higher-capacity ones, or only the affected one?

I would suggest logging a support case with the HPE Support Center, with all logs, in case further assistance is required.

Regards


Yes, I was in the process of replacing all the drives; the last drive yet to be replaced is in bay 4:
0 Physical Drive (146 GB SAS) 1I:1:4
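For context on why that last 146 GB drive matters: in a RAID 5 array built from mixed drive sizes, every member contributes only as much as the smallest drive, and one drive's worth of space goes to parity, so usable capacity is roughly (number of drives - 1) x smallest drive. A minimal bash sketch using the nominal sizes from the drive list above (real formatted capacities differ slightly); the second call assumes, hypothetically, that bay 4 is replaced with a 300 GB drive like the others, and `raid5_usable_gb` is an illustrative name:

```bash
#!/usr/bin/env bash
# raid5_usable_gb (hypothetical helper): approximate usable RAID 5 capacity,
# (drives - 1) * smallest drive. Arguments are nominal per-drive sizes in GB.
raid5_usable_gb() {
  (( $# >= 3 )) || { echo "RAID 5 needs at least 3 drives" >&2; return 1; }
  local min=$1 n=0 s
  for s in "$@"; do
    if (( s < min )); then
      min=$s
    fi
    n=$(( n + 1 ))
  done
  echo $(( (n - 1) * min ))
}

raid5_usable_gb 300 300 300 146 300 300 300 300   # bay 4 still 146 GB; prints 1022
raid5_usable_gb 300 300 300 300 300 300 300 300   # bay 4 replaced (assumed 300 GB); prints 2100
```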

I will reach out to HPE support if you don't think posting the additional OS and IML logs here would be of any benefit.

 

Thanks

Tom

 

Harsh_b
HPE Pro

Re: Smart Array E200i stuck on "Ready for rebuild"

Hello Tom, 

A support ticket will need to be logged for further assistance if you still have issues with this server.

Thanks

I am an HPE employee