HPE StoreVirtual Storage / LeftHand

Deployed patch to Cluster, put one node into an unknown state.. Help! (DL320s/SANiQ 9.5)

craigl2112
Occasional Visitor

Hi Folks -

We have an old 5-node DL320s LeftHand cluster running SANiQ 9.5. At the suggestion of our hardware maintenance provider, we attempted to deploy Patch set 05 (25050-02) Saturday night, along with a couple of other patches that were available.

Sadly, the upgrade failed towards the end and left one of our nodes in an 'Unknown' state in the CMC. Additionally, at the console, after you hit 'Login', we are presented with 'Cannot log in to the console because the storage system has not yet fully initialized. Wait a minute and try again.'

On a good note, the node boots enough so that all of the volumes (all Network RAID 10) re-sync without issue, so we are confident it's not totally hosed.  We even get alerts saying the node is up!

Also, the node in the 'Unknown' state reports software version 9.5.00.1237, whereas the other four are all on 9.5.00.1215.0. It seems like one of the patches it was trying to install bumped the version number, and that's probably where it puked.

At this point, we are considering removing that node from the cluster, rebuilding it from scratch, and re-admitting it to the cluster. We have plenty of excess capacity, and the rebuild process looks pretty simple.

Does anyone have any suggestions on things we can try before the removal/rebuild?

Thank you!

-Craig

oikjn
Honored Contributor

If you have the capacity to do it, I would definitely remove it from the cluster and then from the management group (MG) before messing with it. It's likely that once it's removed from the MG it will appear normal, or at least give you the option to apply whatever patches are required. The alternative is that you will need to use a recovery boot disk to reinstall the OS. Either way, once you get that node out of the cluster/MG, you have a lot more wiggle room to make sure it comes back online without issue and without risk to your data.