MSA Storage

MSA P2000 G3 replace failed controller -cant shutdown the failed controller from the SMU

 
Luke_Y
Advisor


Hello.

I have a failed controller A and tried to follow the steps for replacing it. I tried to shut it down in order to get the blue light indicating that it is safe to replace.

I got an error saying that controller B is not active, so controller A cannot be shut down.

Output from show redundancy-mode:

HP MSA Storage P2000 G3 SAS

System Name: ********

System Location: *******

Version: TS251P005

# show redundancy-mode

System Redundancy

-----------------

Controller Redundancy Mode: Fail Over

Controller Redundancy Status: Operational but not redundant

Controller A Status: Down

Controller A Serial Number:

Controller B Status: Operational

Controller B Serial Number: ******

Output of show controllers:

Controllers

-----------

Controller ID: A

Serial Number:

Hardware Version:

CPLD Version:

MAC Address: *******

WWNN: *******

IP Address: 0.0.0.0

IP Subnet Mask: 0.0.0.0

IP Gateway: 0.0.0.0

Disks: *

Vdisks: *

Cache Memory Size (MB): 2048

Host Ports: 4

Disk Channels: 2

Disk Bus Type: SAS

Status: Down

Failed Over to This Controller: No

Fail Over Reason: Not applicable

Health: Degraded

Health Reason: A subcomponent of this component is unhealthy.

Health Recommendation: - See the information about unhealthy components that is shown in the WBI or by the CLI 'show system' command.

Position: Top

Phy Isolation: Enabled

Controller Redundancy Mode: Fail Over

Controller Redundancy Status: Operational but not redundant

 

Controllers

-----------

Controller ID: B

Serial Number: *****

Hardware Version: 53

CPLD Version: 22

MAC Address: ******

WWNN: *******

IP Address: *.*.*.*

IP Subnet Mask: *.*.*.*

IP Gateway: *.*.*.*

show system output:

System Information

------------------

System Name: *********

System Contact: *********

System Location: ***********

System Information: Uninitialized Info

Midplane Serial Number: *********

Vendor Name: HP

Product ID: P2000 G3 SAS

Product Brand: MSA Storage

SCSI Vendor ID: HP

SCSI Product ID: P2000 G3 SAS

Enclosure Count: 2

Health: Degraded

Health Reason: A subcomponent of this component is unhealthy.

Supported Locales: English (English), Spanish (español), French (français), German (Deutsch), Italian (italiano), Japanese (日本語), Dutch (Nederlands), Chinese-Simplified (简体中文), Chinese-Traditional (繁體中文), Korean (한국어)

 

  Unhealthy Component

  -------------------

  Component ID: Enclosure 1, Controller A, CompactFlash

  Health: Fault

  Health Reason: The component is not present.

  Health Recommendation: - Replace the FRU that contains this component.

 

  Unhealthy Component

  -------------------

  Component ID: Enclosure 1, Controller A

  Health: Degraded

  Health Reason: A subcomponent of this component is unhealthy.

  Health Recommendation: - See the information about unhealthy components that is shown in the WBI or by the CLI 'show system' command.

 

  Unhealthy Component

  -------------------

  Component ID: Enclosure 1, Controller A, Management Port

  Health: Degraded

  Health Reason: The network port Ethernet cable is unplugged, or the network is inoperable.
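When the `show system` dump gets long, it can help to paste it into a text file and pull out only the components reported as faulted. The following is a minimal Python sketch, not an HPE tool. It assumes the output keeps the "Unhealthy Component" heading followed by "Key: Value" lines as shown above, and the function names `parse_unhealthy_components` and `faulted_components` are made up for this illustration.

```python
def parse_unhealthy_components(cli_text):
    """Collect one dict per 'Unhealthy Component' block in saved CLI text."""
    components = []
    current = None
    for raw in cli_text.splitlines():
        line = raw.strip()
        if line == "Unhealthy Component":
            current = {}
            components.append(current)
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and current is not None:
            current[key.strip()] = value.strip()
    return components


def faulted_components(components):
    """Return the Component ID of every block whose Health is 'Fault'."""
    return [c["Component ID"] for c in components if c.get("Health") == "Fault"]


# Sample text copied from the output in this post.
SHOW_SYSTEM_SAMPLE = """\
  Unhealthy Component
  -------------------
  Component ID: Enclosure 1, Controller A, CompactFlash
  Health: Fault
  Health Reason: The component is not present.
  Health Recommendation: - Replace the FRU that contains this component.

  Unhealthy Component
  -------------------
  Component ID: Enclosure 1, Controller A
  Health: Degraded
  Health Reason: A subcomponent of this component is unhealthy.

  Unhealthy Component
  -------------------
  Component ID: Enclosure 1, Controller A, Management Port
  Health: Degraded
  Health Reason: The network port Ethernet cable is unplugged, or the network is inoperable.
"""

if __name__ == "__main__":
    for component_id in faulted_components(
        parse_unhealthy_components(SHOW_SYSTEM_SAMPLE)
    ):
        print(component_id)
```

On the sample above, the only entry with Health: Fault is the CompactFlash of controller A; the other two entries are Degraded.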

 

I'm wondering whether it would be better to shut it down completely, or to just remove controller A and replace it while disabling the partner firmware upgrade.

Thank you in advance.

arun_r
HPE Pro

Re: MSA P2000 G3 replace failed controller -cant shutdown the failed controller from the SMU

Hi Luke,

The output does not seem to capture the reason controller B is reporting a degraded status.

I would recommend fixing the controller B issues first, before replacing controller A.

If it is a false alert, restarting storage controller B after scheduling downtime would be a good idea.

Share the output of the following commands:

show cache-parameters

show vdisks

show ports

Also confirm whether the show network-parameters output reports a health status.

Have you already tried restarting storage controller A and removing/reseating it?

It would be good to engage the HPE support team through the chat option so they can review the logs and provide recommendations. I believe basic support is available even for out-of-warranty units.

https://support.hpe.com/hpesc/home/chat

 

 

 

I am an HPE Employee


Luke_Y
Advisor

Re: MSA P2000 G3 replace failed controller -cant shutdown the failed controller from the SMU

I already tried re-seating controller A, but that didn't work.

I don't see anything in the logs showing that B is degraded. The only thing I got, when I started the controller replacement process (trying to shut down controller A via the SMU), was a message that B is not operational.

What also concerns me is that I don't see the status of controller B in the show controllers output, only in show redundancy-mode, which reports Controller B Status: Operational.

 

 

 

arun_r
HPE Pro

Re: MSA P2000 G3 replace failed controller -cant shutdown the failed controller from the SMU

Hi,

Could you share the output of the following commands?

show cache-parameters

show vdisks

show ports

show controllers

 

I am an HPE Employee


Re: MSA P2000 G3 replace failed controller -cant shutdown the failed controller from the SMU

@Luke_Y 

Is controller A physically dead, i.e. not drawing any power?

I would suggest connecting to controller B with a serial cable and checking everything before you proceed further. A few commands to check:

show system
show controllers
show shutdown-status
show redundancy-mode
show cache-parameters
show vdisks
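If you save the output of these commands to a text file, a few lines of Python can pull out the fields that matter before you touch any hardware. This is only a rough sketch: it assumes the `show redundancy-mode` output uses the "Key: Value" layout posted earlier in this thread, and the helper names `parse_key_values` and `redundancy_summary` are invented for the example.

```python
def parse_key_values(cli_text):
    """Parse 'Key: Value' lines from saved MSA CLI output into a dict."""
    fields = {}
    for line in cli_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headings, separator rows and blank lines have no colon
        fields[key.strip()] = value.strip()
    return fields


def redundancy_summary(cli_text):
    """Pick out the redundancy status and the per-controller status."""
    fields = parse_key_values(cli_text)
    return {
        "redundancy": fields.get("Controller Redundancy Status", "unknown"),
        "A": fields.get("Controller A Status", "unknown"),
        "B": fields.get("Controller B Status", "unknown"),
    }


# Sample text copied from the show redundancy-mode output in this thread.
REDUNDANCY_SAMPLE = """\
System Redundancy
-----------------
Controller Redundancy Mode: Fail Over
Controller Redundancy Status: Operational but not redundant
Controller A Status: Down
Controller A Serial Number:
Controller B Status: Operational
"""

if __name__ == "__main__":
    summary = redundancy_summary(REDUNDANCY_SAMPLE)
    for name, status in summary.items():
        print(f"{name}: {status}")
```

With the sample from this thread, the summary shows controller A as Down and controller B as Operational, which is the state you would want to confirm from controller B's side before pulling anything.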

 

Hope this helps!
Regards
Subhajit

I am an HPE employee

Shawn_K
HPE Pro

Re: MSA P2000 G3 replace failed controller -cant shutdown the failed controller from the SMU

Hello Luke,

It is a little hard to tell what state the controllers are in from the limited output. If you are connected to controller A while gathering output, it may not be responsive enough to communicate with controller B. This results in commands not completing fully, which makes the output misleading.

First, does controller A have an amber LED? If yes, you will not be able to shut it down, as the controller is not up enough to accept a shutdown command. In that case, I suggest removing controller A, waiting 5-10 minutes for controller B to stabilize, and then inserting the new controller A. I would ensure that PFU is enabled before the controller replacement, but you can check this option after removing controller A. Once you remove controller A, the other controller should respond better.

If the LED on controller A is green, the controller is up. Try connecting to controller B, either via telnet or the CLI, and then run the shutdown command for controller A.
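The two LED cases above can be written down as a small decision helper. This is just a sketch of the advice in this reply, not an official HPE procedure. The function name `next_step` and the step wording are mine, and the steps are plain descriptions rather than exact CLI syntax.

```python
def next_step(controller_a_led):
    """Map controller A's LED colour to the steps suggested in this reply."""
    led = controller_a_led.strip().lower()
    if led == "amber":
        # Controller is not up enough to accept a shutdown command.
        return [
            "Remove controller A",
            "Wait 5-10 minutes for controller B to stabilize",
            "Check that PFU is enabled",
            "Insert the new controller A",
        ]
    if led == "green":
        # Controller is up, so a normal shutdown should be possible.
        return [
            "Connect to controller B",
            "Run the shutdown command for controller A",
        ]
    raise ValueError(f"No guidance in this reply for LED state: {controller_a_led!r}")


if __name__ == "__main__":
    for number, step in enumerate(next_step("amber"), start=1):
        print(f"{number}. {step}")
```

Any other LED state raises an error on purpose, since this reply only covers the amber and green cases.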

A word of caution: you have provided no reason (logs or event messages) for why controller A failed and needs replacement. Be sure you are replacing the correct controller; it can sometimes be confusing whether controller A killed B or controller B killed A, and which one is actually the bad player on the system.

I work for Hewlett Packard Enterprise. The comments in this post are my own and do not represent an official reply from HPE. No warranty or guarantees of any kind are expressed in my reply.

Cheers,
Shawn



Luke_Y
Advisor

Re: MSA P2000 G3 replace failed controller -cant shutdown the failed controller from the SMU

Hello.

There is an amber LED on controller A; the other lights are on.

the unhealthy components of controller A (connected):

Unhealthy Component

  -------------------

  Component ID: Enclosure 1, Controller A, CompactFlash

  Health: Fault

  Health Reason: The component is not present.

  Health Recommendation: - Replace the FRU that contains this component.

 

  Unhealthy Component

  -------------------

  Component ID: Enclosure 1, Controller A

  Health: Degraded

  Health Reason: A subcomponent of this component is unhealthy.

  Health Recommendation: - See the information about unhealthy components that is shown in the WBI or by the CLI 'show system' command.

 

  Unhealthy Component

  -------------------

  Component ID: Enclosure 1, Controller A, Management Port

  Health: Degraded

  Health Reason: The network port Ethernet cable is unplugged, or the network is inoperable.

System Cache Parameters
-----------------------

Operation Mode: Fail Over

  Controller A Cache Parameters

  -----------------------------

  Write Back Status: Not up

  CompactFlash Status: Not Installed

  Cache Flush: Disabled

  Controller B Cache Parameters

  -----------------------------

  Write Back Status: Enabled

  CompactFlash Status: Installed

  Cache Flush: Enabled
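To see at a glance where the two controllers disagree in the cache parameters above, the saved output can be split per controller and compared. This is a minimal Python sketch that assumes the "Controller X Cache Parameters" headings and "Key: Value" lines appear as posted here. The names `parse_cache_parameters` and `differing_settings` are made up for the example.

```python
import re


def parse_cache_parameters(cli_text):
    """Group 'Key: Value' lines under each 'Controller X Cache Parameters' heading."""
    sections = {}
    current = None
    for raw in cli_text.splitlines():
        line = raw.strip()
        heading = re.fullmatch(r"Controller ([AB]) Cache Parameters", line)
        if heading:
            current = heading.group(1)
            sections[current] = {}
            continue
        key, sep, value = line.partition(":")
        # Lines before the first controller heading (system-wide) are skipped.
        if sep and current is not None:
            sections[current][key.strip()] = value.strip()
    return sections


def differing_settings(sections):
    """Return {setting: (value_on_A, value_on_B)} for settings that differ."""
    a = sections.get("A", {})
    b = sections.get("B", {})
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Sample text copied from the cache parameter output above.
CACHE_SAMPLE = """\
Operation Mode: Fail Over
  Controller A Cache Parameters
  -----------------------------
  Write Back Status: Not up
  CompactFlash Status: Not Installed
  Cache Flush: Disabled
  Controller B Cache Parameters
  -----------------------------
  Write Back Status: Enabled
  CompactFlash Status: Installed
  Cache Flush: Enabled
"""

if __name__ == "__main__":
    diffs = differing_settings(parse_cache_parameters(CACHE_SAMPLE))
    for setting, (on_a, on_b) in diffs.items():
        print(f"{setting}: A={on_a} / B={on_b}")
```

For the output above, all three per-controller settings differ between A and B, which matches the CompactFlash fault reported for controller A.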

show ports

Ports Media    Target ID         Status        Speed(A) Health

  Health Reason                                   Health Recommendation

----------------------------------------------------------------------------

A1    SAS      500c0ff19f33e000  Disconnected  1Gb      N/A

  There is no host connection to this host port.  - No action is required.

 

   Topo(C) Width

   --------------

   Direct  0

 

A2    SAS      500c0ff19f33e100  Disconnected  1Gb      N/A

  There is no host connection to this host port.  - No action is required.

 

   Topo(C) Width

   --------------

   Direct  0

 

A3    SAS      0000000000000000  Disconnected  1Gb      N/A

  There is no host connection to this host port.  - No action is required.

 

   Topo(C) Width

   --------------

   Direct  0

A4    SAS      0000000000000000  Disconnected  1Gb      N/A

  There is no host connection to this host port.  - No action is required.

 

   Topo(C) Width

   --------------

   Direct  0

 

----------------------------------------------------------------------------

Ports Media    Target ID         Status        Speed(A) Health

  Health Reason                                   Health Recommendation

----------------------------------------------------------------------------

B1    SAS      500c0ff19f33e400  Up            6Gb      OK

 

 

   Topo(C) Width

   --------------

   Direct  4

 

B2    SAS      500c0ff19f33e500  Disconnected  Auto     N/A

  There is no host connection to this host port.  - No action is required.

# show vdisks

All vdisks are in OK status.