MSA 2040 Error Disk

 
robertoaxity
Occasional Advisor

MSA 2040 Error Disk

Hello,

I have 3 disks with errors in one pool, and I have 2 disks configured as Dedicated Spares, but the pool does not swap them in for the bad disks.

So I don't understand what I need to do: do I need to swap the spare disks with the bad disks manually (pull them out and reinsert them), or is there an option I need to use to do that?

Can you help, please? The pool is in a critical state.

https://imgur.com/nrTlOAD

https://imgur.com/xA6CadC

https://imgur.com/10DQ92r

Regards.

Dardan
Valued Contributor

Re: MSA 2040 Error Disk

Hi,
Some questions out of curiosity.
What firmware is your MSA running? Did the disks all fail on the same day/time?
I had a similar issue with several disks being marked as Leftovers, and I couldn't find any reasonable root cause.
Thanks,
Dardan

robertoaxity
Occasional Advisor

Re: MSA 2040 Error Disk

Hello,

Version is GL210R004.

I don't know, because the disks have been in this state for a few months.

I have two pools. One pool has 3 disks in the Degraded (LEFTOVR) state; pool 2 is OK and has 2 disks marked Dedicated Spare. So I don't understand why the storage does not use these spare disks to rebuild or replace the degraded disks.

Sorry, I don't know much about this storage.

Regards

Re: MSA 2040 Error Disk

First of all, you need to understand the MSA architecture. The documents below give an overview of the MSA 2040 product:

SMU guide -> https://support.hpe.com/hpesc/public/docDisplay?docId=c04220794

CLI guide -> https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c03791989

User Guide -> https://support.hpe.com/hpesc/public/docDisplay?docId=c03792322

You need to provide a few details:

>> Is this array configured as a Linear array or a Virtual array?

>> If it is configured as a Virtual array, there is no concept of a dedicated spare.

>> If you are not sure, please provide the output of the commands below (mask or hide the serial numbers before posting here so they are not visible in public):

show pools

show disk-groups

show vdisks

show disks

If all vdisk or disk-group states show as FTOL, check the logs or events for hardware or medium errors on the LEFTOVER drives. If no hardware errors exist, you can clear the metadata and re-configure them as spare drives.

If any vdisk or disk group shows as offline or QTOF, don't take any action and contact HPE Support.
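
The triage rule above can be written down as a small decision function. This is only a sketch in Python: the function name `next_action` and the returned strings are made up for illustration, and the state strings are the ones mentioned in this thread (FTOL, QTOF, offline/OFFL), not an exhaustive list taken from the CLI guide.

```python
def next_action(group_states, leftover_has_hw_error):
    """Suggest a next step from vdisk/disk-group states (illustrative only).

    group_states: status strings as read from `show vdisks` or
        `show disk-groups` output.
    leftover_has_hw_error: True if the logs or events show hardware or
        medium errors for the LEFTOVER drives.
    """
    states = [s.strip().upper() for s in group_states]

    # Any offline or quarantined group: take no action and escalate.
    if any(s in ("OFFL", "OFFLINE", "QTOF") for s in states):
        return "contact HPE Support"

    # Everything fault tolerant: the LEFTOVER drives can be dealt with.
    if states and all(s == "FTOL" for s in states):
        if leftover_has_hw_error:
            return "replace drive"
        return "clear metadata and reuse as spare"

    # Anything else (for example a Critical group) needs a closer look.
    return "investigate further"
```

For example, `next_action(["FTOL", "FTOL"], False)` suggests clearing the metadata, while any mix that is not all FTOL, such as one Critical group, falls through to "investigate further".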

You can also get help from the MSA Health Check by following the steps below:

  1. Download MSA Log File from the MSA Storage Management Utility (SMU)
  2. Upload MSA Log File in the MSA Health Check website
  3. Review Results by clicking through the tabs and saving the PDF report
  4. Take Action and start improving your MSA availability

There is no cost to use the MSA Health Check.   Upload your log file today and check the health status of your MSA.

 

Hope this helps!
Regards
Subhajit

I am an HPE employee

robertoaxity
Occasional Advisor

Re: MSA 2040 Error Disk

Hello;

Pool 1, name vd0001, is RAID 50, class Linear, status Critical, 12 disks, 3 disks Degraded ("the disk may contain invalid metadata").

Pool 2, name vd0002, is RAID 5, class Linear, status FTOL, 10 disks, 2 disks Dedicated SP (disk group vd0002).

So how can I move one or two Dedicated SP disks from vd0002 to pool 1 (vd0001) to replace the degraded disks? Or do I need to physically remove the Dedicated SP disks and insert them into the bad disks' slots?

Can you help, please?

Regards

Re: MSA 2040 Error Disk

You first need to understand the vdisk condition below:

"Pool 1, name vd0001 is RAID 50, class Linear, status Critical, 12 disks, 3 disks Degraded (the disk may contain invalid metadata)"

RAID 50 is a combination of RAID 5 (striping with parity) and RAID 0 (striping), where RAID 5 sub-arrays are striped together. So if you lose two disks in the same RAID 5 sub-array, your data is lost, but if you lose one disk in each sub-array, your data is intact.

Now, in your case vd0001 is made of 12 disks, so I am assuming each sub-vdisk was created with 6 drives. So technically, if one drive fails in each of the two 6-drive sub-vdisks, you still have data access and vd0001 is alive. You mentioned 3 drives are degraded, yet vd0001 still shows Critical instead of offline or QTOF, which means, I assume, that a dedicated spare was configured and is now also in a degraded state. So clarity on your full setup is very important; otherwise troubleshooting will be difficult.
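
The RAID 50 reasoning above can be checked with a few lines of Python. This is a minimal sketch, not anything the array itself runs: the function names are hypothetical, and the 2 x 6 layout is the assumption made in this post rather than a confirmed fact about vd0001.

```python
from itertools import product


def raid50_survives(failed_per_subarray):
    """Return True if a RAID 50 set still has data access.

    failed_per_subarray: number of failed disks in each RAID 5 sub-array.
    Each RAID 5 sub-array tolerates exactly one failed disk. Because the
    sub-arrays are striped together (RAID 0), losing any single sub-array
    takes the whole set down.
    """
    return all(failed <= 1 for failed in failed_per_subarray)


def any_survivable_layout(num_subarrays, total_failed):
    """Return True if some distribution of failures keeps the set alive."""
    for dist in product(range(total_failed + 1), repeat=num_subarrays):
        if sum(dist) == total_failed and raid50_survives(dist):
            return True
    return False
```

With two sub-arrays, `raid50_survives([1, 1])` is true and `raid50_survives([2, 0])` is false. `any_survivable_layout(2, 3)` is false, because three failures across two sub-arrays always put at least two in the same one; that is why three failed members in a vdisk that is still up looks inconsistent under the 2 x 6 assumption. A different sub-array count would change the result, so the actual layout is worth confirming.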

So you should verify why the 3 drives went into the LEFTOVER state and whether they have any hardware errors. If so, you may have to replace them. If they don't have any hardware errors, you can clear the metadata and re-use them. Still, I would recommend capturing the MSA log and verifying with HPE Support, as you have a multiple-drive degraded situation.

Coming to your other query: yes, you can remove the dedicated spare from vd0002 and create a new dedicated spare for vd0001. Refer to the "remove spares" and "add spares" commands in the CLI guide.

My recommendation would be: don't add any spare to vd0001 without fixing the LEFTOVER drives first, otherwise it may make the situation worse.

 

Hope this helps!
Regards
Subhajit

robertoaxity
Occasional Advisor

Re: MSA 2040 Error Disk

Unfortunately we do not have support from HP. Sorry for asking so many questions; I am new to this storage. So you recommend that I perform the "clear metadata" action on the degraded disks that belong to vd0001? Would this not erase data from the disks? This is a production environment and I don't want a disaster to happen.

How can I see whether the disks have hardware errors? I only have remote access.

https://ibb.co/7G5nP5Q

https://ibb.co/6J57f7S

https://ibb.co/FsQNjMg

The 3 disks have the same message.

I followed https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=emr_na-a00046239en_us

And I see:

Step 4: Is there more than one drive in a leftover state?

  • Yes - If there is more than one disk in a leftover state the Vdisk is marked as Degraded or OFFL. Data is at risk in this condition. Extreme care should be taken to preserve data integrity. Do not clear disk metadata when a Vdisk is in a Degraded or OFFL state to try to begin a rebuild or in the hope the rebuild will begin from a Spare. Contact HPE support or the next level of HPE support for further assistance.

So can you help me? Do I need to move the Dedicated Spare from vd0002 to vd0001, or do I need to buy new disks?

Thanks very much and Regards.

 

Shawn_K
HPE Pro

Re: MSA 2040 Error Disk

Hello,

If you clear the metadata of a drive it removes all data from the drive. Taking this step should only be done on drives that have not gone into a leftover state due to drive errors. 

You will need to review your Event logs to determine why the drives went into a leftover state. If the drives had media errors, you will see Event messages from the drives stating the error type. UREs or Smart Trips are indications to never reuse that drive.

You should also take a backup of your data before performing any further troubleshooting on your own. Another option would be to purchase support from HPE for a limited time in order to recover your Vdisk. A log review is needed in order to recover your Vdisk without risking loss of data.

Cheers,
Shawn

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.



Re: MSA 2040 Error Disk

Log in to the command line of either controller and run the commands below:

show events both detail

show events both error

Save the output of the above commands to a text editor such as Notepad (Notepad++ is better), then search for the serial numbers of the drives that are in the LEFTOVER state. That will tell you whether any of the LEFTOVER drives has a hardware error.
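
If the saved output is large, the serial-number search can also be scripted. The following is a rough Python sketch under stated assumptions: the output was saved as a plain text file, the helper name `lines_mentioning` is hypothetical, and the exact layout of the event output is not assumed, so it only reports lines that literally contain a serial number.

```python
import sys


def lines_mentioning(serials, log_text):
    """Map each serial number to the log lines that contain it."""
    hits = {serial: [] for serial in serials}
    for line in log_text.splitlines():
        for serial in serials:
            if serial in line:
                hits[serial].append(line.strip())
    return hits


# Command-line use (file name and serials are placeholders):
#   python find_serials.py events.txt SERIAL1 SERIAL2 SERIAL3
if __name__ == "__main__" and len(sys.argv) > 2:
    path, serials = sys.argv[1], sys.argv[2:]
    with open(path, encoding="utf-8", errors="replace") as fh:
        results = lines_mentioning(serials, fh.read())
    for serial, lines in results.items():
        print(f"== {serial}: {len(lines)} matching line(s)")
        for line in lines:
            print("   " + line)
```

Because one event may span several lines in the detailed output, it is still worth opening the file and reading the text around each match rather than relying on the matching line alone.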

You haven't yet confirmed whether vd0001 had a dedicated spare configured, because with RAID 50 it's not possible for the vdisk to still be up after 3 drives have failed.

It's always recommended to take a backup before doing any further troubleshooting.

Clearing metadata doesn't necessarily mean you will have data loss. It depends on which RAID level your vdisk is configured with. For example, if your vdisk is configured with RAID 5 and more than one drive fails, your vdisk will be down with a chance of data loss. In this case, however, vd0001 is configured with RAID 50, which means one drive failure per sub-vdisk is allowed and you should still have data access. A 3rd drive failure will bring down the RAID 50 vdisk.

You can perform clear metadata on a LEFTOVER drive only if no hardware errors exist on that drive; otherwise you need to replace it.

 

Hope this helps!
Regards
Subhajit

robertoaxity
Occasional Advisor

Re: MSA 2040 Error Disk

Hello,

I ran the events commands.

Link to the txt file:

https://intellego365-my.sharepoint.com/:f:/g/personal/roberto_floresq_axity_com/EoOdyQJZ4e9Oi5M2Uo3GtNwBSjIIWxfGpfbENf9RkgzglA?e=Ee89OD

 

 

Can you help me, please?

Thanks all for your help.

Regards