
MSA 2040 disk spare failed

 
Nagios73
Frequent Visitor

Hi,

I have an MSA 2040 with a failed disk, and I followed this guide:

https://www.stephenwagner.com/2017/01/27/hpe-msa-2040-disk-failure-considerations-and-steps/

I did what is described in point two. After confirmation, the disk shows as a spare for about 15 seconds and then becomes failed.

Why doesn't the MSA accept the disk as a spare, and why does it mark it as failed instead?
I replaced it with another disk and got the same result: failed.

Logs:

Health                     Fault
Health Reason       The disk has a probable hardware failure.
Health Recommendation
- Replace the disk with one of the same type (SAS SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.

Event ID    Code    Message

B5794       314     There is a problem with a FRU. (FRU type: disk, enclosure: 1, device ID: 13, vendor: HP , product ID: EH0300FBQDD , SN: 6XN4R9AL0000B409FBPW, version: HPD3, related event serial number: B5793, related event code: 62)

B5793       62      A spare disk drive failed. The disk was a dedicated spare for a vdisk. (disk: channel: 0, ID: 13, SN: 6XN4R9AL0000B409FBPW, enclosure: 1, slot: 14) (vdisk: vd02, SN: 00c0ff1b22d6000054908c5300000000)
Additional Information:
None.
Recommended Action:
- Replace the disk with one of the same type (SAS SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.
- If the failed disk was a global spare, configure the new disk as a global spare.
- If the failed disk was a dedicated spare, configure the new disk as a dedicated spare for the same vdisk.
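
For anyone collecting these events from several arrays, a small script can pull the enclosure, slot, and serial numbers out of an event 62 message. This is only a minimal sketch in Python: the regular expression is an assumption based solely on the one message quoted above, and `parse_spare_failure` is a hypothetical helper name, not part of any HPE tool.

```python
import re

# Pattern assumed from the single event 62 message quoted in this post;
# other firmware versions may word the message differently.
EVENT_62 = re.compile(
    r"\(disk: channel: (?P<channel>\d+), ID: (?P<disk_id>\d+), "
    r"SN: (?P<disk_sn>\w+), enclosure: (?P<enclosure>\d+), slot: (?P<slot>\d+)\)"
    r"\s*\(vdisk: (?P<vdisk>\w+), SN: (?P<vdisk_sn>\w+)\)"
)


def parse_spare_failure(message):
    """Return the disk and vdisk fields of an event 62 message, or None."""
    match = EVENT_62.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("channel", "disk_id", "enclosure", "slot"):
        fields[key] = int(fields[key])
    # Disk location in the enclosure.slot notation used in this thread.
    fields["location"] = f"{fields['enclosure']}.{fields['slot']}"
    return fields


message = (
    "A spare disk drive failed. The disk was a dedicated spare for a vdisk. "
    "(disk: channel: 0, ID: 13, SN: 6XN4R9AL0000B409FBPW, enclosure: 1, slot: 14) "
    "(vdisk: vd02, SN: 00c0ff1b22d6000054908c5300000000)"
)
info = parse_spare_failure(message)
print(info["location"], info["disk_sn"], info["vdisk"])
```

Grouping parsed events by `location` makes it easier to see whether failures follow one slot or one drive serial number.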

 

 


Re: MSA 2040 disk spare failed

Which controller firmware version is running?

How many drives are present in this system in total?

How many vdisks are present in this system?

Did you verify whether any other drive in the same vdisk, vd02, has hardware errors? It is possible that another drive has a problem that prevents reconstruction onto the drive in 1.14 from starting or completing, and that this is why the drive in slot 1.14 fails again and again.

The other possibility is a problem with the slot itself.

If you have multiple vdisks, you can schedule downtime, stop host I/O, and shut down the system. Then take a working drive that is already part of another vdisk, such as vd01, and swap it into slot 1.14. Start the MSA again and check whether this drive stays healthy in slot 1.14. If it does not, the problem is with the slot, and the entire chassis needs to be replaced. If the drive stays healthy after the move, you need to troubleshoot vd02 and find out whether any other drive has a problem.
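
The swap test above comes down to a two-way decision. The sketch below only illustrates that reasoning in Python; `interpret_swap_test` and its argument are hypothetical names, and the code does not communicate with the array.

```python
def interpret_swap_test(known_good_drive_failed_in_slot):
    """Map the outcome of the swap test to the next step suggested in this reply."""
    if known_good_drive_failed_in_slot:
        # A drive that worked in another slot fails in 1.14: suspect the slot.
        return "suspect slot 1.14; chassis replacement may be needed"
    # The drive stays healthy in 1.14: the slot is fine, so look at the vdisk members.
    return "slot looks fine; check the other drives in vd02 for errors"


print(interpret_swap_test(True))
print(interpret_swap_test(False))
```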

 

Hope this helps!
Regards
Subhajit

I am an HPE employee

If you feel this was helpful please click the KUDOS! thumb below!

Nagios73
Frequent Visitor

Re: MSA 2040 disk spare failed

Hi,

The firmware version is GL101R001.

There are 24 drives in the system in total.

There are 3 vdisks in the system: vd01, vd02, and vd03.

I checked, and no other drive in vd02 has hardware errors.

The solution of moving a working drive is very difficult and risky for me. The vdisk from which the disk is moved to 1.14 may be damaged.
In vd01 I have one spare disk, and maybe this working spare could be moved to slot 1.14 for vd02.
What do you think about that?
Can such a test be done on a running MSA, or do I have to shut the MSA down?

Re: MSA 2040 disk spare failed

Yes, you can move any spare drive from one slot to another, and no downtime is required for that. Our objective is to determine whether there is a problem with slot 1.14.

 

Hope this helps!
Regards
Subhajit

Nagios73
Frequent Visitor

Re: MSA 2040 disk spare failed

I moved a spare disk from a working slot to 1.14. The disk failed after it was moved into slot 1.14.

The second problem is that when I moved the spare disk back to its original slot, it failed there too. How can I reset the disk to its factory state and make it a spare again?

It looks like slot 1.14 is damaged.

Re: MSA 2040 disk spare failed

At this point it looks like slot 1.14 is bad, in which case you may have to replace the chassis.

If the drive is back in its original slot and shows as failed, the only option is to physically re-seat the drive to get it back to a proper state. If you have another system, you can check the drive's health there as well. If possible, schedule downtime and do a proper power cycle of the entire system. Give it a last try to see whether slot 1.14 and the other drive recover. It is also very important that you update the firmware for all components.

There are a few more things. The controller firmware is very old and is listed as inactive, as shown in the attached screenshot (firmware.JPG).

 

I also don't know the drive model or which firmware the drives are running.

Many issues can be avoided by updating firmware from time to time.

You can find all firmware at the links below:

https://h41111.www4.hpe.com/storage/msafirmware.html

https://support.hpe.com/hpsc/doc/public/display?docLocale=en_US&docId=emr_na-a00041326en_us

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04315770&lang=en-us&cc=us

 

Hope this helps!
Regards
Subhajit

Nagios73
Frequent Visitor

Re: MSA 2040 disk spare failed

How do I re-seat the drive to get it back to a proper state?

In another system the disk works fine.

If I replace the chassis, will the array configuration be lost? I have two controllers.

Re: MSA 2040 disk spare failed

You just have to physically pull the drive out of the slot. Pulling it halfway out of the bay is enough. Wait 1 minute and then push it back in.

Then check the state of the drive in the CLI and the SMU.

Since you mentioned that this drive looks fine in another system, I would suggest trying a power cycle of the MSA before you go for a chassis replacement.

Regarding chassis replacement, everything should stay intact. The only risk is losing mapping information. All vdisk and volume information is saved in the drive metadata. Since no controller is being replaced, all should be fine. In any case, you can capture the MSA logs before any kind of replacement activity.

 

Hope this helps!
Regards
Subhajit

Shawn_K
HPE Pro

Re: MSA 2040 disk spare failed

Hello,

If the disk works in another system then there are a couple of steps you can take.

Both your array and your drives need upgrading. Several drive types have a critical firmware upgrade that prevents premature failure. You need to get those drives upgraded quickly! Your array is also running an old version of code; newer versions include several improvements in drive error handling and reporting.

If you have another system that works with the drive, you should review the drive errors in that system before swapping the drive back into the original system. It sounds as if that drive might have thrown errors. Without a full log review it is impossible to determine whether the drive has a predictive failure or UREs and how close it is to failure.

Cheers,
Shawn

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.


Nagios73
Frequent Visitor

Re: MSA 2040 disk spare failed

Hi, I removed the drive from the slot, waited 1 minute, and put it back. Nothing changed: the disk is still failed. In the SMU the drive state is failed.