HPE Storage Tech Insiders

Tip: How not to shoot yourself in the foot

rfenton4

Over the years, I have seen many a customer spend serious amounts of time, effort and money deploying a variety of backup and replication solutions to ensure they can recover data when it matters. Of course, this is extremely wise and has long been a necessity for the majority of production environments. However, it is often not a flood, fire, rogue application or natural disaster that causes the majority of outages. Many studies suggest that up to 75% of the time, it's us (human error) that is the root cause of datacentre outages. It's no wonder that, with increasing complexity and so little time, mistakes and errors are made. One of the things I really like about Nimble Storage is that a lot of the complexity of managing the storage layer is simply removed, giving the infrastructure administrator time back to focus on other activities.

One of my customers recently asked us to add a 'waste-basket' capability to the GUI, allowing deleted volumes to be moved into a holding area and deleted at a later date. His primary requirement was to avoid exactly these kinds of human error when deprovisioning and decommissioning services. I pointed out that Nimble arrays already have a similar feature, and thought it was worth sharing, as it's often the little things that can save an inordinate amount of time and heartache and stop you from proverbially 'shooting yourself in the foot'.


Each volume in Nimble OS has an online status. Typically, when a volume is first provisioned it is marked as online, so that any host in its associated Initiator Group can rescan and access the volume. To delete a volume, it must first be placed offline (removing host access) and then deleted. Often, as administrators in a hurry, we do both actions in quick succession when clearing up volumes.
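To make the rule concrete, here is a minimal Python sketch of that lifecycle. The `Volume` class and its methods are purely hypothetical and are not the Nimble OS GUI, CLI or REST API; the sketch only models the idea that a volume must be offline before it can be deleted, and that bringing it back online is a cheap step right up until the delete happens.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical model of a volume's lifecycle (not the Nimble OS API)."""
    name: str
    online: bool = True    # newly provisioned volumes start online
    deleted: bool = False

    def set_offline(self) -> None:
        # Hosts in the associated initiator group lose access at this point.
        self.online = False

    def set_online(self) -> None:
        # Undoing a mistaken offline is a single, cheap step...
        if self.deleted:
            # ...but there is no undo once the volume has been deleted.
            raise RuntimeError(f"{self.name} has been deleted and cannot be brought back online")
        self.online = True

    def delete(self) -> None:
        # Deletion is only permitted once the volume is offline.
        if self.online:
            raise RuntimeError(f"{self.name} is online; set it offline before deleting")
        self.deleted = True


if __name__ == "__main__":
    vol = Volume("sql-data-01")
    try:
        vol.delete()
    except RuntimeError as err:
        print(err)          # refused: the volume is still online
    vol.set_offline()
    vol.set_online()        # oops, wrong volume: put it straight back
```

The point of the sketch is the asymmetry: the offline step is reversible, the delete step is not, which is why the gap between them matters.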


Here's how to take a volume offline. Browse to the volume:




Click Set Offline. You will receive a warning that connected hosts will lose access to the volume.




Click OK and the volume will be placed offline, turning grey in the process.





Of course, the hosts will lose access when a volume is placed offline, but if you inadvertently chose the wrong volume, placing it back online is a very simple operation!


This is where my top tip comes in... STOP!


Rather than making decommissioning a single atomic transaction of offline-then-delete, take the volume offline and wait a while. Have a cup of tea or coffee, wait until the end of the day or week, and once nobody is complaining that their application is no longer available... then delete the volume! It's a simple technique, but one I couldn't recommend more highly with any storage system, as most will allow you to take a volume offline (or unmask it) before deleting it.
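If you script your decommissioning, the same discipline can be written as a simple check before any delete. This Python sketch is illustrative only: `safe_to_delete` is a hypothetical helper, the one-day default grace period is an assumption rather than any product setting, and `complaint_received` stands in for "someone noticed their application disappeared".

```python
from datetime import datetime, timedelta


def safe_to_delete(offlined_at: datetime, now: datetime,
                   complaint_received: bool = False,
                   grace: timedelta = timedelta(days=1)) -> bool:
    """Return True only once a volume has sat offline, unmissed, for the grace period."""
    if complaint_received:
        # Someone missed the volume: bring it back online instead of deleting it.
        return False
    # Otherwise only allow the delete after the full waiting period has elapsed.
    return now - offlined_at >= grace


if __name__ == "__main__":
    offlined = datetime(2018, 4, 2, 9, 0)
    print(safe_to_delete(offlined, datetime(2018, 4, 2, 11, 0)))  # False: only two hours offline
    print(safe_to_delete(offlined, datetime(2018, 4, 3, 9, 0)))   # True: a full day with no complaints
```

Keeping the check separate from the delete itself means a hasty operator has to get past two gates (offline, then wait) rather than one.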


Hopefully this is sound advice for anyone, but I'd be interested to hear what other simple and effective tips you've picked up along the way. Please share them in the comments section, or better still, write your own document or blog on Nimble Connect!

About the Author

rfenton4

Comments
amirul93

Excellent advice, Rich.

I suppose an extension to this could be that, if you are replicating the volume to a second Nimble array, you could delete the primary volume (if you needed the space, for example) but retain the replica for a while longer.

rfenton4

Agreed, but even waiting a short time will save you the hassle of failing over, promoting and failing back.

j_mcdonald

For ease of cleanup later, we take the volume offline, then rename it as:

zzz-${volume}-tbd-yyyy-mm-dd

Usually for a prod volume we give it a date about 1 month in the future, and for dev/test we give it 7 days.

The 'zzz' puts it at the end of the list so it's easy to find and keeps them out of the way in the gui.

By putting the 'tbd' and date in the name, we can easily go through and see which of these old offline volumes are safe to delete, and which ones haven't reached their 'to be deleted' date yet.
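This convention is easy to script. The following Python sketch is illustrative only: `tbd_name` and `due_for_deletion` are hypothetical helpers that work on plain name strings rather than talking to an array, and treating "about 1 month" as 30 days is an assumption. It builds the `zzz-${volume}-tbd-yyyy-mm-dd` name and picks out the renamed volumes whose 'to be deleted' date has been reached.

```python
import re
from datetime import date, timedelta
from typing import List

# Retention from the comment above: about a month for prod, 7 days for dev/test.
# Using 30 days for "about 1 month" is an assumption.
RETENTION = {"prod": timedelta(days=30), "dev": timedelta(days=7), "test": timedelta(days=7)}

# Matches names such as "zzz-sql-data-01-tbd-2018-05-02".
TBD_PATTERN = re.compile(r"^zzz-(?P<volume>.+)-tbd-(?P<due>\d{4}-\d{2}-\d{2})$")


def tbd_name(volume: str, env: str, today: date) -> str:
    """Build the offline-volume name zzz-<volume>-tbd-<yyyy-mm-dd>."""
    due = today + RETENTION[env]
    return f"zzz-{volume}-tbd-{due.isoformat()}"


def due_for_deletion(names: List[str], today: date) -> List[str]:
    """Return the renamed volumes whose 'to be deleted' date is today or earlier."""
    ready = []
    for name in names:
        match = TBD_PATTERN.match(name)
        # Names that don't follow the convention are never selected.
        if match and date.fromisoformat(match["due"]) <= today:
            ready.append(name)
    return ready


if __name__ == "__main__":
    today = date(2018, 4, 2)
    print(tbd_name("sql-data-01", "prod", today))  # zzz-sql-data-01-tbd-2018-05-02
    print(due_for_deletion(["zzz-old-tbd-2018-04-01", "zzz-new-tbd-2018-05-02", "live-vol"], today))
```

Because the ISO date sorts lexically, the same names also line up in due-date order under the 'zzz' prefix in the GUI list.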
