
StoreVirtual 4730 RAID OFF

 
VictorHugo Ortiz
Regular Visitor


Yesterday morning we lost two 4730 nodes from a four-node cluster. One of them simply stopped working and came back after a reboot. The second failed because two of its drives failed almost at the same time. We didn't find out until several hours after the failure.

All the LUNs show a replication status of offline and critical. The node with the failed drives has its RAID0 off, and I can't remove it from the group to reconfigure the RAID because a LUN is replicating or being moved.

We don't have support anymore; we used to, but no longer.

Any assistance will be highly appreciated.

Rachna-K
HPE Pro

Re: StoreVirtual 4730 RAID OFF

@VictorHugo Ortiz 

We may need more information to understand the configuration, such as the RAID configured on the node and the volume configuration. We would also need to know the Storage System status: is it just the RAID that is Off, or is the Storage System Offline as well?

We can reboot the node and check the Logical Drive status during POST. It may have been disabled due to the two drive failures.


Regards,
Rachna K
I am an HPE Employee





Note: While I am an HPE Employee, all of my comments (whether noted or not), are my own and are not any official representation of the company
VictorHugo Ortiz
Regular Visitor

Re: StoreVirtual 4730 RAID OFF

Here is some additional information with screenshots. If additional information is required from a specific log or node, please let me know.

Last Friday two nodes, ESC-SV-28 and ESC-SV-25, went out almost at the same time. Node ESC-SV-28 lost two drives at that time. By the time I found out the two nodes had failed, about 10 hours had passed, and neither the nodes nor the cluster were recovering. I replaced the two drives and node ESC-SV-28 began to recover. While I was looking into the logs, I found that a third drive was about to fail, but the GUI wasn't showing anything yet.

2 nodes.png

 

I couldn't find anything in the logs that would tell me how long it would take to recover, or whether it was going to recover at all. It stayed in this state for about 36 hours, until I removed ESC-SV-28 from the cluster. When I then tried to remove it from the management group, I was unable to do so because it was migrating data, and it has been migrating data since last Tuesday.

Fig-3.jpg

Fig-5.jpg

On Wednesday or Thursday, the state of the LUNs changed to:

Fig-3.jpg

The LUNs are RAID-10 (2-way mirror).

For some reason, although all nodes are licensed, ESC-SV-28 shows as unlicensed.

Fig-4.jpg

Node ESC-SV-28 has shown "migration in progress" from Wednesday night/Thursday morning until today, 8/8/2020.

LUNs offline.JPG


Rachna-K
HPE Pro

Re: StoreVirtual 4730 RAID OFF

@VictorHugo Ortiz 

Per the screenshots you shared, node ESC-SV-28 is outside the cluster. The volumes are Offline and trying to restripe.

 

This may need in-depth analysis.


Please let me know whether the volumes were already Offline when you removed node ESC-SV-28 from the cluster. They might have gone Offline when the two nodes, ESC-SV-28 and ESC-SV-25, went down. We need to put this node back in the cluster; however, since the licenses are showing as expired, it might not allow us to. Please log a ticket with our Licensing Team to fix the license issues, then add the node back to the cluster and check whether the volumes come online. If they don't, we might have to perform a force authorization to get the volumes back online, which can only be done by the HPE Support Team.

Below is the link to the HPE Licensing Portal. They will ask for the License Key and a Feature Key.

https://myenterpriselicense.hpe.com/cwp-ui/auth/login


Regards,
Rachna K
I am an HPE Employee





VictorHugo Ortiz
Regular Visitor

Re: StoreVirtual 4730 RAID OFF

Hello @Rachna-K ,

Based on several posts I read, I removed the node from the cluster but not from the management group. It stayed like this for several days, showing that data was being replicated. Based on some other posts, I powered down node SV-28 and left it that way for some time. Then I powered the node up again and re-joined it to the cluster. By the time it finished the process, the licenses were back and there was no more data migration, but the volumes remain offline and the storage system remains unavailable. If I recall correctly, the volumes have been offline since the failure of the two nodes and the two drives (plus the third).

I wish I could call support, but we don't have support anymore, and due to Covid-19, resources are allocated as needed.

Please let me know what I can provide so I can resolve this. My users have been down since July 26/27.

Thank you in advance for your prompt response.

VictorHugo Ortiz

Rachna-K
HPE Pro

Re: StoreVirtual 4730 RAID OFF

@VictorHugo Ortiz 

The volumes are still Offline even after the volume resync completed and the node was added back to the cluster. As I mentioned before, this requires a force authorization, which needs approval from our Engineering Team. Only our support personnel can do this.

I understand there is no warranty on the storage; however, a backend entry into the node is required here, which cannot be done without the Support Team.


Regards,
Rachna K
I am an HPE Employee




