Degraded disk group

DmitriyZ · ‎06-09-2021

Hi all,

We have an MSA 2040 SAN and are facing an issue with it:

There is a problem with a FRU. (FRU type: disk, enclosure: 1, slot: 10, device ID: 10, vendor: HP , product ID: EG1200JEHMC , SN: ~~info erased~~, version: HPD3, related event serial number: B1126, related event code: 55)

The disk is somehow in the "Healthy" state, but the disk group is not.

The disk group is not fault tolerant. Reconstruction cannot start because there is no spare disk available of the proper type and size.

- Replace the disk with one of the same type (SAS SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.

The disk group is "Degraded" and in CRIT state. We also have 3 disks set as "Global Spare", but reconstruction has not started. Could you advise what needs to be done?
Pools.1623229451.png

Disc1.1623229543.png

SUBHAJIT KHANBARMAN_1 · ‎06-10-2021

@DmitriyZ

It looks to me like dgA02 was created with 12 drives, but I am not sure how many are available now. The screenshot shows 10 drives, but the other one or two drives may be hidden and need scrolling up or down to become visible. At least from the screenshot, it looks like two drives are missing.

Please provide screenshots of all 3 Global Spares the way you have given for 1.10.

Hope this helps!
Regards
Subhajit

I am an HPE employee

If you feel this was helpful please click the KUDOS! thumb below!

**********************************************************************

I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]

DmitriyZ · ‎06-11-2021

@SUBHAJIT KHANBARMAN_1

Thanks for your answer!

I don't really know how dgA02 was created. Here are screenshots of the whole dgA02 and all disks.

I agree that the group was created with 12 disks, but how do I add the missing disk back or assign a spare to the group? What do you mean by "Please provide screenshots of all 3 Global Spares the way you have given for 1.10"? What should I do to provide this info?
Thank you again!

SUBHAJIT KHANBARMAN_1 · ‎06-11-2021

@DmitriyZ

First, schedule downtime and then run the rescan option, because I/O will halt for a few seconds during the rescan.

If this does not help either, try rebooting Controller A.

Hope this helps!
Regards
Subhajit

DmitriyZ · ‎06-11-2021

@SUBHAJIT KHANBARMAN_1

This rescan option?

How do I reboot controller A? I don't see such an option in the menu...

DmitriyZ · ‎06-11-2021

After the rescan I got these messages:

SUBHAJIT KHANBARMAN_1 · ‎06-11-2021

@DmitriyZ

It looks like disk 1.10 is having an issue as well.

What is the status of dgA02 now?

Do you see a rebuild progress bar now?

To perform a controller restart, if required:
1. Perform one of the following:
   - In the banner, click the system panel and select Restart System.
   - In the System topic, select Action > Restart System.
   The Controller Restart and Shut Down panel opens.
2. Select the Restart operation.
3. Select the controller type to restart: Management or Storage.
4. Select the controller module to restart: Controller A, Controller B, or both.
5. Click OK. A confirmation panel appears.
6. Click Yes to continue. Otherwise, click No. If you clicked Yes, a message describes restart activity.

Hope this helps!
Regards
Subhajit

SUBHAJIT KHANBARMAN_1 · ‎06-11-2021

@DmitriyZ

It is always recommended to have a valid data backup before you try a controller restart or any other operation. I hope you have already taken a data backup, as best practice, before trying anything.

Hope this helps!
Regards
Subhajit

DmitriyZ · ‎06-11-2021

@SUBHAJIT KHANBARMAN_1

We have backups of all VMs running on this storage. In any case, under normal conditions, can I restart the controllers one after another without any issue?

The group is still degraded:

Can I somehow add disks to the existing group? The number of disks is 12, but only 11 are actually in the Up state.

[Moderator edit: Erased the confidential Info.]

DmitriyZ · ‎06-11-2021

Dear @SUBHAJIT KHANBARMAN_1

I have rebooted both controllers and there are still no changes: no jobs started, and the disk group is still degraded.

SUBHAJIT KHANBARMAN_1 · ‎06-11-2021

@DmitriyZ

Can you please provide a picture of each Global Spare? Hover your mouse pointer over each Global Spare, as marked below, so that the drive details pop up like this:

Hope this helps!
Regards
Subhajit

JonPaul · ‎06-14-2021

Have you run your logs through the MSA HealthCheck? https://www.hpe.com/storage/MSAHealthCheck

I notice your drive firmware is downlevel, but this is not causing this problem.
Have you tried setting the spares back to available and then back again to SPARE?
Are the SPARE drives suitable replacements? Same or larger size and same type of drive?
One of the updated best practices (see HealthCheck) is to enable Dynamic Sparing which will also take AVAIL drives and use them as spares.
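
The two checks above (same type, same or larger size, and whether AVAIL drives count when Dynamic Sparing is on) can be sketched in a few lines of Python. This is only a simplified model of the rule as described in this thread, not the array firmware's actual selection logic. The `Disk` class, the `spare_candidates` helper, and the inventory values (locations, types, capacities) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Disk:
    location: str       # enclosure.slot (hypothetical values below)
    disk_type: str      # e.g. "enterprise SAS", "midline SAS", "SAS SSD"
    capacity_gb: float
    usage: str          # "SPARE" (global spare), "AVAIL", or "FAILED"


def is_suitable(failed: Disk, candidate: Disk) -> bool:
    """Same type and same or greater capacity, per the rule quoted in the thread."""
    return (candidate.disk_type == failed.disk_type
            and candidate.capacity_gb >= failed.capacity_gb)


def spare_candidates(failed: Disk, disks: List[Disk], dynamic_sparing: bool) -> List[str]:
    """Locations of disks that could stand in for the failed disk.

    Assumption: with dynamic sparing enabled, AVAIL disks are considered
    in addition to global spares; otherwise only global spares are.
    """
    allowed = {"SPARE", "AVAIL"} if dynamic_sparing else {"SPARE"}
    return [d.location for d in disks
            if d.usage in allowed and is_suitable(failed, d)]


# Illustrative inventory: assumed values, not read from any real array.
failed = Disk("2.0", "enterprise SAS", 1200.0, "FAILED")
disks = [
    Disk("2.1", "enterprise SAS", 1200.0, "SPARE"),
    Disk("2.2", "enterprise SAS", 1200.0, "AVAIL"),
    Disk("2.3", "midline SAS", 2000.0, "SPARE"),  # larger, but wrong type
]

print(spare_candidates(failed, disks, dynamic_sparing=False))  # ['2.1']
print(spare_candidates(failed, disks, dynamic_sparing=True))   # ['2.1', '2.2']
```

If a check like this passes for every spare and reconstruction still does not start, the cause is probably something other than spare compatibility.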

I work at HPE
DmitriyZ · ‎06-15-2021

@SUBHAJIT KHANBARMAN_1

@JonPaul

HealthCheck:

Status: Unhealthy

Virtual Pool A

The virtual pool is degraded.

- Ensure that spare disks are available. Reconstruction should start automatically.
- When the reconstruction is complete, replace the failed disk(s). (Look for event 8 in the event log to determine which disk(s) failed.)
- Disk groups that cannot find compatible spares will automatically move data to fault-tolerant components.

Disk Group dgA02

The disk group is not fault tolerant. Reconstruction cannot start because there is no spare disk available of the proper type and size.

- Replace the disk with one of the same type (SAS SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.
- Configure the new disk as a spare so the system can start reconstructing the vdisk.
- To prevent this problem in the future, configure one or more additional disks as spare disks.

| Component | Installed Version (Quantity) | Recommended Version (Download Pages) |
| --- | --- | --- |
| Enclosure 1 - Controllers | GL210R004 (2) | GL225P002-02 (Windows, Linux) |
| Drive Model - EG1200JEHMC | HPD3 (20) | HPD5 (Windows, Linux, FLA) |
| Drive Model - EG1200JEMDA | HPD4 (2) | HPD6 (Windows, Linux, FLA) |
| Drive Model - ST400FM0403 | 0007 (2) | |
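
To make the firmware rows above easier to act on, here is a minimal Python sketch that flags components whose installed version differs from the recommended one. The row values are copied from the HealthCheck output in this post. The `FirmwareRow` type and the `downlevel` helper are hypothetical names used for illustration, and HealthCheck itself may use different logic to decide what counts as downlevel.

```python
from typing import List, NamedTuple, Optional


class FirmwareRow(NamedTuple):
    component: str
    installed: str
    quantity: int
    recommended: Optional[str]  # None when the report lists no recommendation


# Rows as reported in this post.
ROWS = [
    FirmwareRow("Enclosure 1 - Controllers", "GL210R004", 2, "GL225P002-02"),
    FirmwareRow("Drive Model - EG1200JEHMC", "HPD3", 20, "HPD5"),
    FirmwareRow("Drive Model - EG1200JEMDA", "HPD4", 2, "HPD6"),
    FirmwareRow("Drive Model - ST400FM0403", "0007", 2, None),
]


def downlevel(rows: List[FirmwareRow]) -> List[str]:
    """Components that have a recommendation listed and do not match it."""
    return [r.component for r in rows
            if r.recommended is not None and r.installed != r.recommended]


for name in downlevel(ROWS):
    print(name)
```

This only compares version strings for inequality; it does not try to order firmware versions. As noted earlier in the thread, the downlevel drive firmware is not the cause of the reconstruction problem.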

[Moderator edit: Erased the confidential info]

SUBHAJIT KHANBARMAN_1 · ‎06-15-2021

@DmitriyZ

1.10 - EG1200JEHMC
1.5 - EG1200JEHMC
1.13 - EG1200JEMDA
1.24 - EG1200JEHMC

There seems to be no issue with the drive models.

Next, you can try deleting all Global Spares so that 1.5, 1.13 and 1.24 become AVAIL drives. Then configure Dynamic Spare, following the process below:

https://support.hpe.com/hpesc/public/docDisplay?docId=c04220794 (page no 183)

With dynamic spares enabled, if a disk fails and you replace it with a compatible disk, the storage system rescans the bus, finds the new disk, automatically designates it a spare, and starts reconstructing the disk group.

If the above still does not work, try configuring only one Global Spare, for example 1.5 or 1.24, and leave the other two drives as AVAIL drives.

If it is still not working, log an HPE Support case:

https://support.hpe.com/hpesc/public/docDisplay?docId=cep-help_en_us&page=index.html

Hope this helps!
Regards
Subhajit

[Moderator edit: Updated the broken link.]

DmitriyZ · ‎06-28-2021

@SUBHAJIT KHANBARMAN_1

I have finished with the settings, but had no luck resolving the issue.

We replaced the disks with a new model, EG001200JWJNQ. Is it compatible with the MSA 2040 or not? There are still no changes...

SUBHAJIT KHANBARMAN_1 · ‎06-28-2021

@DmitriyZ

It looks like a supported model to me, per the supported drive firmware link below:

https://support.hpe.com/hpesc/public/swd/detail?swItemId=MTX_e4af5736d2d743bc836ba7bee4

Please go ahead and log a support case with HPE.

Hope this helps!
Regards
Subhajit

Imtiaz_Waraich · ‎12-20-2023

@SUBHAJIT KHANBARMAN_1

Hi SUBHAJIT

I have the same issue: Disk Group dgB02 is Degraded, and HPE support advised me to reboot the controller. However, I have a heavy production workload. Can you suggest a CLI command to rescan the disks, or another way to restart the reconstruction operation? Also, please advise whether rescanning the disk channels will impact the running SAN.

System Health
Degraded

Reason:
A subcomponent of this component is unhealthy.

Unhealthy Components

Virtual Pool B - Degraded

A virtual disk group is missing one or more disks.

- Ensure that spare disks are available. Reconstruction should start automatically.
- When the reconstruction is complete, replace the failed disk(s). (Look for event 8 in the event log to determine which disk(s) failed.)
- Disk groups that cannot find compatible spares will automatically move data to fault-tolerant components.

Disk Group dgB02 - Degraded

One disk in the RAID disk group failed. Reconstruction cannot start because there is no spare disk available of the proper type and size.

- Replace the disk with one of the same type (SSD, enterprise SAS, or midline SAS) and the same or greater capacity. For continued optimum I/O performance, the replacement disk should have performance that is the same as or better than the one it is replacing.
- Configure the new disk as a spare so the system can start reconstructing the disk group.
- To prevent this problem in the future, configure one or more additional disks as spare disks.
- If the disk group is being expanded, reconstruction will start after the expansion is complete.
