MSA Storage

Re: 2050 disk error detected

 
sashavl
Regular Visitor

2050 disk error detected

Hi HP community,

We have a couple of 2040/2050 MSA systems, and yesterday one of the 2050s started reporting "disk error detected" for one disk. We are receiving dozens of emails per day like this:

An event was reported by a disk drive. (disk: channel: 0, ID: 4, SN: WFK0GCTB, enclosure: 1, slot: 5) (Key,Code,Qual,UEC:0x1,0xB,0x5,0x0000) (CDB:Rd 8191e000 0400)(CmdSpc:0x0, FRU:0x0, SnsKeySpc:0x0)(Recovered Error, no decode for ASC/ASCQ)

and

An event was reported by a disk drive. (disk: channel: 0, ID: 4, SN: WFK0GCTB, enclosure: 1, slot: 5) (Key,Code,Qual,UEC:0x1,0x17,0x2,0x0000) (CDB:Rd 723ae6b8 0008)(Info:0x723AE6BC)(CmdSpc:0x81000626, FRU:0x1, SnsKeySpc:0x22)(Recovered Error, recovered data with positive head offset)

We also had a couple of these (2 so far):

Disk channel event. (channel: 0, ID: 4, SN: WFK0GCTB, enclosure: 1, slot: 5): I/O Timeout  CDB:Wr 7259b800 0008

and

An error was reported by a disk drive. (disk: channel: 0, ID: 4, SN: WFK0GCTB, enclosure: 1, slot: 5) (Key,Code,Qual,UEC:0x3,0x11,0x0,0x0000) (CDB:Rd 6fc93800 0048)(Info:0x6FC93809)(CmdSpc:0x8103E7FF, FRU:0x85, SnsKeySpc:0xF1)(Medium Error, unrecovered read error)

A bad block was corrected by the drive after the controller wrote the block. LBA: 0x6FC93809, (disk: channel: 0, ID: 4, enclosure: 1, slot: 5)
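
For anyone sorting through a mailbox full of these, here is a minimal Python sketch that pulls the sense key, ASC and ASCQ (the Key,Code,Qual fields) out of a message. The regex and the function name `parse_disk_event` are my own and simply assume the message layout shown in the samples above; the sense key names are the standard SCSI ones. This is not an HPE tool.

```python
import re

# Standard SCSI sense key names (subset).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0xB: "Aborted Command",
}

# Assumed layout, taken from the sample messages in this post.
EVENT_RE = re.compile(
    r"SN: (?P<sn>\w+), enclosure: (?P<encl>\d+), slot: (?P<slot>\d+)\)\s*"
    r"\(Key,Code,Qual,UEC:(?P<key>0x[0-9A-Fa-f]+),(?P<asc>0x[0-9A-Fa-f]+),"
    r"(?P<ascq>0x[0-9A-Fa-f]+),(?P<uec>0x[0-9A-Fa-f]+)\)"
)

def parse_disk_event(message):
    """Return the disk location and sense data, or None if the message has no sense data."""
    m = EVENT_RE.search(message)
    if m is None:
        return None
    key = int(m["key"], 16)
    return {
        "serial": m["sn"],
        "enclosure": int(m["encl"]),
        "slot": int(m["slot"]),
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "Unknown"),
        "asc": int(m["asc"], 16),
        "ascq": int(m["ascq"], 16),
    }
```

Messages without a Key,Code,Qual group, such as the I/O Timeout channel event, return None and need to be handled separately.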

Is this just a bad disk, and should we replace it right now or wait for it to fail completely (it is still marked healthy)? What do you suggest?

 

Update: the disk failed and the disk group went into a critical state (it's RAID 10), so I guess the next step is to replace it.

 

Thank you all

sbhat09
HPE Pro

Re: 2050 disk error detected

Hello @sashavl,

This error (ID:4) is fine as long as the bad block is getting corrected automatically.
It is just an informational alert.

If it is too repetitive, then you may have to replace the disk.

Regards,
Srinivas Bhat

Note: All of my comments are my own and are not any official representation of HPE.


I am an HPE Employee


JonPaul
HPE Pro

Re: 2050 disk error detected

@sashavl 
Drives which start going bad don't get better.
This drive may stop having bad blocks and be perfectly fine for operation for a while but it might also report more errors on every full drive read.
You should be prepared to replace this drive. If you have a spare already allocated, the disk group will reconstruct onto it when the drive dies. If you don't have a spare, contact HPE support about at least getting a drive to have on hand in case the drive fails completely.
You are likely getting errors each day because the disk-group scrub reads the entire disk about once a day (the default) and finds more bad blocks on each pass.
Keep the disk group scrub on so that bad blocks don't just sit there silently.
Also consider using the HPE HealthCheck to validate this and other best practices:  www.hpe.com/storage/msahealthcheck
It's free and can point out other best practices such as security.
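
To see whether the scrub keeps turning up errors on the same drive, the notification texts can be tallied per disk serial number. Below is a rough Python sketch, assuming the message wording quoted earlier in this thread; the function names and the `recovered_limit` threshold are hypothetical choices for illustration, not HPE guidance.

```python
import re
from collections import Counter

SERIAL_RE = re.compile(r"SN: (\w+)")

def classify(message):
    """Bucket a notification by the phrases seen in the sample messages."""
    if "Medium Error" in message:
        return "medium_error"
    if "I/O Timeout" in message:
        return "timeout"
    if "Recovered Error" in message:
        return "recovered"
    return "other"

def tally(messages):
    """Count notifications per (serial number, category)."""
    counts = Counter()
    for msg in messages:
        m = SERIAL_RE.search(msg)
        if m:
            counts[(m.group(1), classify(msg))] += 1
    return counts

def disks_to_watch(counts, recovered_limit=10):
    """Flag disks with any medium error or timeout, or many recovered errors.

    recovered_limit is an arbitrary example value.
    """
    flagged = set()
    for (serial, kind), n in counts.items():
        if kind in ("medium_error", "timeout"):
            flagged.add(serial)
        elif kind == "recovered" and n >= recovered_limit:
            flagged.add(serial)
    return sorted(flagged)
```

Feeding it one day's worth of messages at a time gives a simple way to compare counts between scrub passes.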

I work for HPE
sashavl
Regular Visitor

Re: 2050 disk error detected

@JonPaul 

The disk failed that same afternoon, so we are going to replace it anyhow