
Degraded disk group

 
DmitriyZ
Occasional Advisor


Hi all,

We have an MSA 2040 SAN and are facing an issue with it:

There is a problem with a FRU. (FRU type: disk, enclosure: 1, slot: 10, device ID: 10, vendor: HP , product ID: EG1200JEHMC , SN: info erased, version: HPD3, related event serial number: B1126, related event code: 55)

The disk is somehow in the "Healthy" state, but the disk group is not:

The disk group is not fault tolerant. Reconstruction cannot start because there is no spare disk available of the proper type and size.

- Replace the disk with one of the same type (SAS SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.

The disk group is "Degraded" and in CRIT state. We also have 3 disks set as "Global Spare", but reconstruction has not started. Could you advise what needs to be done?
Pools.1623229451.png

Disc1.1623229543.png


Re: Degraded disk group

@DmitriyZ 

It looks to me as though dgA02 was created with 12 drives, but I am not sure how many are available now. The screenshot shows 10 drives, but one or two more may be hidden and only become visible after scrolling up or down. At least from the screenshot, it looks like two drives are missing.

Please provide screenshots of all 3 Global Spares, the way you did for disk 1.10.

 

Hope this helps!
Regards
Subhajit

I am an HPE employee

If you feel this was helpful please click the KUDOS! thumb below!

**********************************************************************

 



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
DmitriyZ
Occasional Advisor

Re: Degraded disk group

@SUBHAJIT KHANBARMAN_1 

Thanks for your answer!

I don't really know how dgA02 was created. Here are screenshots showing the whole dgA02 and all disks.

Pools.png
System.png

 

I agree that the group was created with 12 disks, but how do I add the missing disk back or assign a spare to the group? What do you mean by "Please provide screenshots of all 3 Global Spares the way you have given for 1.10"? What should I do to provide this info?
Thank you again!

Re: Degraded disk group

@DmitriyZ 

First schedule downtime and then use the rescan option, because I/O will halt for a few seconds.

If that does not help either, try rebooting Controller A.

 

Hope this helps!
Regards
Subhajit

DmitriyZ
Occasional Advisor

Re: Degraded disk group

@SUBHAJIT KHANBARMAN_1 

Is this the rescan option?

Spares.png

How do I reboot Controller A? I don't see such an option in the menu.

DmitriyZ
Occasional Advisor

Re: Degraded disk group

After the rescan I got these messages:

A.png
B.png

Re: Degraded disk group

@DmitriyZ 

It looks like disk 1.10 has an issue as well.

What is the status of dgA02 now?

Do you see a rebuild progress bar now?

To restart a controller, if required:
1. Perform one of the following:
- In the banner, click the system panel and select Restart System.
- In the System topic, select Action > Restart System.
The Controller Restart and Shut Down panel opens.
2. Select the Restart operation.
3. Select the controller type to restart: Management or Storage.
4. Select the controller module to restart: Controller A, Controller B, or both.
5. Click OK. A confirmation panel appears.
6. Click Yes to continue. Otherwise, click No. If you clicked Yes, a message describes restart activity.

 

Hope this helps!
Regards
Subhajit


Re: Degraded disk group

@DmitriyZ 

It is always advisable to have a valid data backup before you try a controller restart or any other operation. I hope you have already taken a data backup, as a best practice, before trying anything.

 

Hope this helps!
Regards
Subhajit

DmitriyZ
Occasional Advisor

Re: Degraded disk group

@SUBHAJIT KHANBARMAN_1 

We have backups of all VMs running on this storage. But in normal operation, can I restart the controllers (one after another) without any issue?

The group is still degraded:
Pools.png

 

Maybe I can somehow add disks to the existing group? The group's disk count is 12, but only 11 disks are actually in the Up state.

[Moderator edit: Erased the confidential Info.]

DmitriyZ
Occasional Advisor

Re: Degraded disk group

Dear @SUBHAJIT KHANBARMAN_1 

I have rebooted both controllers and there is still no change: no jobs have started and the disk group is still degraded.

Re: Degraded disk group

@DmitriyZ 

Can you please provide a picture of each Global Spare? Hover your mouse pointer over each Global Spare, as marked below, so that the drive details pop up as in this example:

drive1.10.JPG

Hope this helps!
Regards
Subhajit

JonPaul
HPE Pro

Re: Degraded disk group

Have you run your logs through the MSA HealthCheck?   https://www.hpe.com/storage/MSAHealthCheck

I notice your drive firmware is downlevel, although that is not what is causing this problem.
Have you tried setting the spares back to available and then back again to SPARE?
Are the SPARE drives suitable replacements?  Same or larger size and same type of drive?
One of the updated best practices (see HealthCheck) is to enable Dynamic Sparing which will also take AVAIL drives and use them as spares.
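
The questions above can be sketched as a small check. This is only a simplified model of the rule quoted in the health message (same disk type, same or greater capacity) and of the dynamic sparing idea, not the array firmware's actual logic. The `Disk` class, the `usage` labels, and all disk locations and capacities below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Disk:
    location: str     # enclosure.slot (hypothetical values below)
    disk_type: str    # "SAS SSD", "enterprise SAS", or "midline SAS"
    capacity_gb: int
    usage: str        # "SPARE", "AVAIL", or "FAILED" (labels assumed for this sketch)


def is_suitable_spare(failed: Disk, candidate: Disk) -> bool:
    """Same type and the same or greater capacity, as the health message states."""
    return (candidate.disk_type == failed.disk_type
            and candidate.capacity_gb >= failed.capacity_gb)


def pick_spare(failed: Disk, disks: List[Disk],
               dynamic_sparing: bool = False) -> Optional[Disk]:
    """Prefer configured spares; with dynamic sparing, also consider AVAIL disks."""
    usable = ("SPARE", "AVAIL") if dynamic_sparing else ("SPARE",)
    for usage in usable:
        for disk in disks:
            if disk.usage == usage and is_suitable_spare(failed, disk):
                return disk
    return None


# Hypothetical inventory: two spares that do not match, one matching AVAIL disk.
failed = Disk("2.0", "enterprise SAS", 1200, "FAILED")
disks = [
    Disk("2.1", "midline SAS", 2000, "SPARE"),     # wrong type
    Disk("2.2", "enterprise SAS", 900, "SPARE"),   # too small
    Disk("2.3", "enterprise SAS", 1200, "AVAIL"),  # suitable, but not a spare
]

print(pick_spare(failed, disks))
print(pick_spare(failed, disks, dynamic_sparing=True))
```

In this made-up inventory, spares exist but none is of the proper type and size, so nothing is picked until AVAIL disks are also considered.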

I work for HPE
DmitriyZ
Occasional Advisor

Re: Degraded disk group

@SUBHAJIT KHANBARMAN_1 

1.5.png

1.13.png

1.24.png

 

@JonPaul 

HealthCheck:

Unhealthy components:

Virtual Pool A
Description: The virtual pool is degraded.
Recommendation:
- Ensure that spare disks are available. Reconstruction should start automatically.
- When the reconstruction is complete, replace the failed disk(s). (Look for event 8 in the event log to determine which disk(s) failed.)
- Disk groups that cannot find compatible spares will automatically move data to fault-tolerant components.

Disk Group dgA02
Description: The disk group is not fault tolerant. Reconstruction cannot start because there is no spare disk available of the proper type and size.
Recommendation:
- Replace the disk with one of the same type (SAS SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.
- Configure the new disk as a spare so the system can start reconstructing the vdisk.
- To prevent this problem in the future, configure one or more additional disks as spare disks.

Firmware versions:

Component | Installed Version (Quantity) | Recommended Version (Download Pages)
Enclosure 1 - Controllers | GL210R004 (2) | GL225P002-02 (Windows, Linux)
Drive Model - EG1200JEHMC | HPD3 (20) | HPD5 (Windows, Linux, FLA)
Drive Model - EG1200JEMDA | HPD4 (2) | HPD6 (Windows, Linux, FLA)
Drive Model - ST400FM0403 | 0007 (2) |

 

HC.png
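
For readers who want to pick the downlevel components out of the firmware rows above, here is a minimal sketch. The version strings and quantities are copied from the HealthCheck output in this post. The `FirmwareRow` type and the rule "downlevel means a recommended version is listed and it differs from the installed one" are assumptions made for this sketch, and it does not try to order version strings.

```python
from typing import List, NamedTuple, Optional


class FirmwareRow(NamedTuple):
    component: str
    installed: str
    quantity: int
    recommended: Optional[str]  # None when the report lists no recommendation


# Rows copied from the HealthCheck output above.
ROWS = [
    FirmwareRow("Enclosure 1 - Controllers", "GL210R004", 2, "GL225P002-02"),
    FirmwareRow("Drive Model - EG1200JEHMC", "HPD3", 20, "HPD5"),
    FirmwareRow("Drive Model - EG1200JEMDA", "HPD4", 2, "HPD6"),
    FirmwareRow("Drive Model - ST400FM0403", "0007", 2, None),
]


def downlevel(rows: List[FirmwareRow]) -> List[FirmwareRow]:
    """Rows whose listed recommendation differs from the installed version."""
    return [r for r in rows
            if r.recommended is not None and r.installed != r.recommended]


for row in downlevel(ROWS):
    print(f"{row.component}: {row.installed} -> {row.recommended} "
          f"({row.quantity} installed)")
```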

 

[Moderator edit: Erased the confidential info]

Re: Degraded disk group

@DmitriyZ 

1.10 - EG1200JEHMC
1.5 - EG1200JEHMC
1.13 - EG1200JEMDA
1.24 - EG1200JEHMC

There seems to be no issue with the drive models.

Next, you can try deleting all Global Spares so that 1.5, 1.13 and 1.24 become AVAIL drives. Then configure Dynamic Spares, following the process in this document:

https://support.hpe.com/hpesc/public/docDisplay?docId=c04220794 (page no 183)

With dynamic spares enabled, if a disk fails and you replace it with a compatible disk, the storage system rescans the bus, finds the new disk, automatically designates it a spare, and starts reconstructing the disk group.

If that still does not work, try configuring only one Global Spare, for example 1.5 or 1.24, and leave the other two drives as AVAIL.

If it is still not working, log an HPE Support case:

https://support.hpe.com/hpesc/public/docDisplay?docId=cep-help_en_us&page=index.html

 

 

Hope this helps!
Regards
Subhajit


 

[Moderator edit: Updated the broken link.]



DmitriyZ
Occasional Advisor

Re: Degraded disk group

@SUBHAJIT KHANBARMAN_1 

I have finished with the settings, but had no luck resolving the issue.

We replaced the disks with a new model, EG001200JWJNQ. Is it compatible with the MSA 2040? There is still no change.

Re: Degraded disk group

@DmitriyZ 

It looks to me like a supported model, according to the supported drive firmware link below:

https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX_e4af5736d2d743bc836ba7bee4

Please go ahead and log a support case with HPE.

 

Hope this helps!
Regards
Subhajit

Imtiaz_Waraich
Occasional Visitor

Re: Degraded disk group

@SUBHAJIT KHANBARMAN_1 

Hi SUBHAJIT,

I have the same issue: Disk Group dgB02 is Degraded. HPE Support advised me to reboot the controller, but I have a heavy production workload. Can you suggest a CLI command to rescan the disks, or another way to restart the reconstruction operation? Please also advise whether rescanning the disk channels will impact the running SAN.

 

System Health
Degraded

Reason:
A subcomponent of this component is unhealthy.

Unhealthy Components

Virtual Pool B - Degraded

A virtual disk group is missing one or more disks.

- Ensure that spare disks are available. Reconstruction should start automatically.
- When the reconstruction is complete, replace the failed disk(s). (Look for event 8 in the event log to determine which disk(s) failed.)
- Disk groups that cannot find compatible spares will automatically move data to fault-tolerant components.

Disk Group dgB02 - Degraded

One disk in the RAID disk group failed. Reconstruction cannot start because there is no spare disk available of the proper type and size.

- Replace the disk with one of the same type (SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.
- Configure the new disk as a spare so the system can start reconstructing the disk group.
- To prevent this problem in the future, configure one or more additional disks as spare disks.
- If the disk group is being expanded, reconstruction will start after the expansion is complete.