07-29-2002 08:26 PM
Remapping Media Errors when Rebuilds Fail
Hello All,
I wanted to get some opinions on this:
The scenario is a hardware RAID in a NetServer with a NetRaid controller running RAID 5 with 3 or more HDDs.
Ch 0
ID 0 Online A0-0 10 media errors
ID 1 Online A0-1 no errors
ID 2 Failed A0-2 no errors
ID 2 is marked as Failed, and the rebuild of ID 2 fails at 30% on multiple replacement HDDs. On closer inspection, one of the HDDs that is still in an Online state, ID 0, has 10 media errors.
Some techs recommended running a disk verify from a Symbios or Adaptec SCSI controller and remapping the bad blocks (which also show up at 30%). Once the remapping is complete, the rebuild runs to 100%. This resolves the failed rebuild issue.
However, the question is: what is the status of the data on the bad blocks? If the data on the bad blocks was corrupt, are you introducing corruption into the array, because you essentially have two failed blocks in the data stripe? If the data is OK you should be OK, but how do you know? Any thoughts on this?
Is there a possibility of corruption in the backups themselves as well? I am also looking for thoughts on why the HDD with the media errors stays in an Online state while another HDD is marked as Failed.
Cheers,
Greg
P.S. I had also posted this under NetServers but have only had one reply..
Let's Roll!
2 REPLIES
07-30-2002 09:40 AM
Re: Remapping Media Errors when Rebuilds Fail
A successful remapping of a bad block indicates that the data was successfully read from the bad block, or reconstructed using the drive's error-correcting code (ECC), and written to another area of the media.
If the data was too badly corrupted, the remapping would have failed.
The HDD with the media errors can be kept online because media errors are normally recoverable. Non-recoverable errors cause failed drives. Some RAID controllers are more conservative than others, which means that a different RAID controller might have failed the drive. (For example, XP Disk Arrays are very conservative and will fail any drive with media errors.)
I would replace the drive with the media errors anyway, if I were you. Media errors are usually an indication that the drive is going to fail soon.
Good luck!
No matter where you go, there you are.
07-30-2002 08:03 PM
Re: Remapping Media Errors when Rebuilds Fail
The BIOS-level disk verification performed by Adaptec and Symbios SCSI controllers does not care about data. It only cares that each block on the disk drive can be read.
If a given block cannot be read, i.e., media errors are encountered, then that block address is remapped to one of the spare blocks on the drive.
So, yes, it is possible that taking the drive with media errors from your degraded RAID to a PC with an Adaptec or Symbios controller will cause data corruption for those blocks that get remapped by the verification process. The replacement disk block will not have the data from the old, now superseded, RAID-formatted disk block.
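To make the risk concrete, here is a minimal sketch of RAID 5 XOR parity for a single stripe. It is only an illustration, not how the NetRaid firmware works internally: the block contents, the placement of data and parity on each ID, and the assumption that a remapped spare block reads back as zeros are all made up for the example.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a hypothetical 3-disk RAID 5 (layout assumed for illustration).
d0 = b"ACCOUNTS"              # data block on ID 0 (the drive with media errors)
d1 = b"PAYROLL!"              # data block on ID 2 (the failed drive)
parity = xor_blocks(d0, d1)   # parity block on ID 1

# Healthy survivors: the missing block on ID 2 is regenerated correctly.
rebuilt_clean = xor_blocks(d0, parity)

# ID 0's block was unreadable and got remapped to a spare. The spare does not
# hold the old data; here it is ASSUMED to read back as zeros.
remapped_d0 = bytes(len(d0))
rebuilt_after_remap = xor_blocks(remapped_d0, parity)

# The stripe is self-consistent again (XOR of all three blocks is zero),
# so the rebuild completes, but both data blocks now hold wrong contents.
stripe_check = xor_blocks(remapped_d0, parity, rebuilt_after_remap)

print(rebuilt_clean == d1)          # rebuild from good survivors
print(rebuilt_after_remap == d1)    # rebuild after the remap
print(stripe_check == bytes(8))     # consistency check still passes
```

The point of the sketch is the last line: after the remap the controller has nothing left to detect the problem with, because the rebuild recomputes the missing block from whatever the surviving drives return. Only a comparison against a known-good backup (or file-level checksums) would reveal which files were hit.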
From the sound of it, I would guess that the media error(s) encountered at the 30% threshold are preventing the successful rebuild of the RAID.
If you have irreplaceable data on this RAID, you might consider changing the retry thresholds for the disk drive first, and seeing whether you can recover the RAID that way. This typically requires SCSI tools unavailable to the normal end user, such as SCSI Toolbox.
Otherwise, do the offline verify, return the drive to the RAID system, and hope for the best.
By the way, the verification process will display which disk blocks are being remapped. It is (remotely) possible for you to track these blocks into the file system to see which file(s) and/or directories are affected by the now corrupt data.
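As a rough sketch of what that tracking involves, the function below turns a block address on one member drive into a stripe number and a logical block address on the array. Everything about the geometry is an assumption: the stripe unit size, the parity rotation (a common "left-asymmetric" scheme is used here), and the absence of any reserved metadata area at the start of the drive. The real NetRaid layout may differ, so treat `locate_block` and its parameters as hypothetical.

```python
from typing import Optional, Tuple

def locate_block(disk_index: int, disk_lba: int, n_disks: int,
                 stripe_unit_blocks: int) -> Tuple[int, str, Optional[int]]:
    """Map a block on one member drive to (stripe, role, logical array LBA).

    ASSUMED layout: left-asymmetric RAID 5. Parity sits on the last disk for
    stripe 0 and moves one disk to the left on each following stripe; data
    units fill the remaining disks in increasing disk order.
    """
    stripe = disk_lba // stripe_unit_blocks
    offset = disk_lba % stripe_unit_blocks
    parity_disk = n_disks - 1 - (stripe % n_disks)

    if disk_index == parity_disk:
        # A remapped parity block damages no file data directly.
        return stripe, "parity", None

    # Position of this disk among the data disks of the stripe.
    data_index = disk_index if disk_index < parity_disk else disk_index - 1
    logical_unit = stripe * (n_disks - 1) + data_index
    return stripe, "data", logical_unit * stripe_unit_blocks + offset

# Example with made-up numbers: 3 drives, 128-block stripe units.
print(locate_block(disk_index=0, disk_lba=10, n_disks=3, stripe_unit_blocks=128))
print(locate_block(disk_index=0, disk_lba=300, n_disks=3, stripe_unit_blocks=128))
```

From the logical array LBA you would still have to subtract the partition offset and ask the file system which file owns that block, which is why this is only remotely practical.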
A journey of 1000 steps ends in a mile.
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
© Copyright 2024 Hewlett Packard Enterprise Development LP