StoreVirtual Storage

P4300 2 Node out of 4 fail - Please help

joevn
Occasional Contributor

P4300 2 Node out of 4 fail - Please help

Hi

Please help me. I have been trying to rescue this P4300 storage for at least a week and I am totally stuck. I am completely new to the P4000 storage line.

The setup of the storage system is as follows:
1 cluster with 2-Way Mirror (Network RAID-10) enabled
4 nodes of 16 TB each
Each node has 8 hard disks

Last week the storage went down with 2 nodes failing at the same time: in one node a single hard disk failed, and the other node had 2 hard disk failures.

To ensure that there is an odd number of managers, I started a virtual manager on an existing node that is still operational and proceeded with recovery.

During recovery the process stalled: the node was moved out of the cluster and a ghost node was created, but that ghost node was corrupted. It indicated that the usable disk space was -25 MB. Furthermore, the ghost node has the same MAC address as the node that requires recovery, but with a RIP tag in front of the MAC address. This ghost node cannot be deleted at all.

For the existing node that was supposed to be recovered, the RAID was marked Off and each disk was labeled Inactive.

Next, trying to recover the other failed node, the Recover Node function returned the error "Node cannot be recovered as the Replica is the most up to date".

This node has the same problem: its RAID is Off as well and the hard disks are all Inactive.

Please give me an idea how to solve these problems. I really need help. Thank you.

5 REPLIES
oikjn
Honored Contributor

Re: P4300 2 Node out of 4 fail - Please help

The ghost node is the one to target first, assuming the other node still somehow has data on it.


The RIP:MAC address node is just a placeholder; ignore the size it shows. You just need to get a clean new node into the management group, and then you can do a "node exchange" with the RIP node. Once that exchange completes its restripe, the RIP node will disappear. The reason it shows as RIP:MAC is that you could in theory (and most likely will) just replace the disk(s) and reset the bad node, after which it could have the same name and the same MAC address, so the system uses that RIP entry as a placeholder instead.


Once that is done with the RIP node, just rinse and repeat with the second node.

joevn
Occasional Contributor

Re: P4300 2 Node out of 4 fail - Please help

Hi oikjn

In addition to my earlier question about the clean node, please advise whether I can try the following as well:

I am thinking that, since I now have 2 nodes down and the RIP:MAC node shows the error "unknown IP", I could perform the following:

1> I will switch off the 2 working nodes and the 1 node that is not functioning

2> Remove the 1 non-functional node from the cluster volume and recreate it as a new node

3> After this I will turn on the 2 functioning nodes; when they are stable I will add the new node into the cluster and use Restore Configuration on the new node to populate it

Will this cause the 2 functioning nodes to sync their data with the new node and make it function again?

Earlier question

oikjn
Honored Contributor

Re: P4300 2 Node out of 4 fail - Please help

You get a clean new node by taking one of the existing dead nodes and fixing it... Take the one that matches up with the RIP node, replace its dead HDDs, and use the local console to reset it. Give it an IP address (it can be the original IP), then use the CMC to discover the node and add it to the management group. I would suggest getting it to the correct patch level before adding it to the management group, but you can do that afterwards as well. Once it is in the management group, you can right-click on the cluster, select "Exchange Nodes", and exchange the RIP node with the new node. Once it completes its rebuild, the RIP node will just go away.

Stor_Mort
HPE Pro

Re: P4300 2 Node out of 4 fail - Please help

Hi joevn,

This situation is often caused when multiple drives fail in a RAID set, causing the RAID set to fail and taking the node offline. There may be some recovery that we can attempt at HPE, but in some cases like this the data may not be recoverable. We will need to examine the management group logs. Call 1-800-633-3600 to open a support case. Time and material charges will apply if you do not have a current support contract.

 

I am an HPE employee - HPE StoreVirtual Support