StoreVirtual Storage

ertech
Occasional Visitor

Lefthand RAID Levels

I've got the P4500 28.8 Multi-Site solution set up, with all 4 nodes in a Network RAID-10 cluster on the one site. Management wants off-site DR, and we're running out of space. Instead of adding another 2 or 4 nodes I'd like to upgrade the hard disks in each node to 2TB, and then split the cluster across 2 sites.

 

My plan for this is to remove 2 of the nodes from the current cluster, chuck the new hard drives in there, reinstall SAN/iQ from the DVD and set that up as a new cluster. Then transfer all of the information across from old to new, and repeat the HDD process with the other 2 nodes. Then I can re-add those 2 nodes and make this cluster properly multi-site.

 

The question I have is: can I 'degrade' the Network RAID-10 cluster down to RAID-0 without losing any data? If so, is this done through the 'remove nodes' option in the CMC? And how do I know which of the nodes to remove (so I don't remove both members of one mirrored pair)?

 

I'm aware there will be a risk running in RAID-0, but the entire process should be completed in less than a week so it's worth it. I'm also aware the 2TB drives are 7.2K (vs the 600GB being 15K) and therefore slower, but our network links are only 1Gbps at present (long story, can't get funds for 10Gbps for the switch) and our Cacti monitoring says we're not hitting 1Gbps sustained throughput, so I'm not worried about speed.
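
For what it's worth, here's the rough back-of-envelope behind that (the per-drive MB/s figures are just my assumptions, not benchmarks):

```python
# Rough check: is the 1Gbps link or the disk tier the likely bottleneck?
# Per-drive MB/s figures below are ballpark assumptions, not measured values.

GBPS_LINK = 1.0
link_mb_s = GBPS_LINK * 1000 / 8          # ~125 MB/s ceiling per 1Gbps link

DRIVES_PER_RAID5_SET = 6
SETS_PER_NODE = 2

seq_mb_s_15k_sas = 150      # assumed sequential throughput of a 600GB 15K drive
seq_mb_s_7k2_mdl = 100      # assumed sequential throughput of a 2TB 7.2K drive

def node_seq_throughput(per_drive_mb_s):
    # RAID-5 reads stream from all members minus the parity drive (rough model)
    return SETS_PER_NODE * (DRIVES_PER_RAID5_SET - 1) * per_drive_mb_s

print(f"1Gbps link ceiling       : {link_mb_s:.0f} MB/s")
print(f"Node of 15K drives (est) : {node_seq_throughput(seq_mb_s_15k_sas):.0f} MB/s")
print(f"Node of 7.2K drives (est): {node_seq_throughput(seq_mb_s_7k2_mdl):.0f} MB/s")
# Either drive type comfortably exceeds ~125 MB/s sequential per node, so for
# sequential work the Gb link saturates first; random IOPS is another story.
```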

 

I've got umpteen resources from HP saying the LeftHand P4x00 SUPPORTS Network RAID-0, but nothing on how to transition from Network RAID-10 to Network RAID-0. Is anyone able to assist?

 

 

P.S. this thread has been moved from Storage Area Networks (SAN) (Enterprise) to HP StoreVirtual / HP LeftHand Storage - HP Forums Moderator

KurtG
Regular Advisor

Re: Lefthand RAID Levels

When you bought the "Multi-Site bundle" you basically bought a bundle of software and hardware together. As with everything else today, you would lose your right to support from HP if you change the hardware to something that is "unsupported".

 

Would it technically work? Probably, unless there is a check in the software for a "supported" hardware config.

Would you lose the "rights" to upgrades and patches? Probably, as you would no longer be running a supported configuration.

 

Having said all that, you can change your LUN's RAID level on the fly, provided you have the space needed to get to the requested level.

Would I choose to go the route you are drawing up for us? Probably not. If management demands multi-site, it should be possible to get the money to do it the right way.

 

KurtG

 

 

ertech
Occasional Visitor

Re: Lefthand RAID Levels

Hi Kurt, thanks for the response

 

I'm aware I can change the LUN RAID level, and there's a nice little dialog box for that. But how about changing the NODE RAID level, or in other words changing the cluster's Network RAID level?

 

The only option I can see is to 'remove a node' from the cluster. But given that I'm using Network RAID-10, I don't know which nodes I can remove. We have two mirrored pairs of nodes, striped across the pairs, and if I remove both nodes that are part of one mirrored pair I will lose data.

 

I'm also looking for some reassurance that we won't lose data doing this.
Maybe if I phrase the question differently: when using Network RAID-10, if I use the 'Remove a Node' option to remove a node from the array, will data loss result?

KurtG
Regular Advisor

Re: Lefthand RAID Levels

Disclaimer: I have not done this! ;-)

If you have a correctly configured system utilizing the site definitions, data is placed in both "sites" (but you cannot know on which node, and so on).

Easiest way, as I see it, is to convert from multi-site to a standard cluster, restripe, remove a node at a time (x2), reconfigure the new hardware/nodes/sites, and repeat for the remaining nodes after the restripe. New capacity should become available when you're finished, but let's face it: who are you going to call about that if you're running unsupported drives?

Performance impact can be high.
Time needed to reach your goal is high.
Work hours can be high.

I wouldn't do it.

KurtG

oikjn
Honored Contributor

Re: Lefthand RAID Levels

Kurt, you seem to be a little confused about Network RAID levels... Network RAID is applied at the LUN level, not at the node level, so you can have a mix of LUN NR levels all on the same nodes.

There is a disconnect here and your company is either going to have to spend lots of money or change their requirements. Multi-site assumes you are asking for active-active... that requires $$$ site-to-site link costs which you probably don't have. Remember that link will become your LOCAL storage throughput limit, so if you don't have a WAN link fast enough to handle the latency and bandwidth of your SAN, you are out of luck for a multi-site cluster. If you only need A/P using snapshots, you don't have to be as worried about that (though you do have to calculate change rates for bandwidth).
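
The change-rate maths is easy enough to sketch out; something like this, where every number is a placeholder you would swap for your own monitoring figures:

```python
# Minimal sketch: can a WAN link keep up with scheduled remote snapshots?
# daily_change_gb, replication_window_h and wan_mbps are hypothetical inputs
# to be replaced with figures from your own change-rate monitoring.

daily_change_gb = 200        # assumed data changed per day on the LUN
replication_window_h = 10    # assumed overnight window to ship the snapshot
wan_mbps = 100               # assumed usable WAN bandwidth in Mbit/s

required_mbps = daily_change_gb * 1024 * 8 / (replication_window_h * 3600)
hours_at_link = daily_change_gb * 1024 * 8 / wan_mbps / 3600

print(f"Bandwidth needed to fit the window: {required_mbps:.0f} Mbit/s")
print(f"Transfer time on a {wan_mbps} Mbit/s link: {hours_at_link:.1f} h")
# If hours_at_link exceeds the window (or the snapshot interval), the schedule
# falls behind and the remote copy drifts further out of date.
```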

 

Assuming your current storage needs are enough and you have the latency/bandwidth link, AND you are OK with your data only being mirrored once (effectively RAID-0 at each site and RAID-1 between the sites), then you can simply edit your nodes to change two of them to a 2nd site while still at your current site. Once the restripe completes, you just ship those units out to the new site and turn them on. You will be unprotected (effectively RAID-0) while those nodes are in transit, but if you were thinking about RAID-0 anyway, this should be a viable option.

 

If you 100% require more storage, you are stuck and will have to get more nodes. The cheapest way would be to get additional nodes to meet your needs at the local site and, at that time, get them to throw in some VSA licenses for you to run at your DR site. Run those VSAs as a separate management group, link the groups together, and do remote snapshots across the WAN.

 

I've never tried, but it wouldn't hurt to talk with the sales guys about upgrading the HDDs or asking to trade in your current nodes for higher capacity ones.... you can always get storage elsewhere so they should hopefully cooperate in some way.  As long as they allow some time for overlap for having the new nodes with the old, you can easily swap out the old for the new nodes.

 

Nothing will be "free" so if management doesn't want to pay for it then you are screwed.

ertech
Occasional Visitor

Re: Lefthand RAID Levels

OK, there's a bit of a disconnect here, let me try again.

Whilst we purchased the "Multi-Site" bundle, as I said in the first sentence these are currently all running on one site. So I cannot 'convert to a regular cluster' - it already is one.

 

If I simplify things a bit that might help. We currently have 4 x LeftHand P4500 nodes, filled with 600GB drives (BQ889B), running as one logical cluster. That cluster is exposing only one LUN, using 2 x 6-disk RAID-5 on each node and Network RAID-10 across the nodes, for a total of approx 10.1 TB usable. We utilise only a single LUN as we are storing Hyper-V VMs on it (the LUN is utilised as a Cluster Shared Volume via Microsoft Clustering Services).
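
For anyone checking my maths, the usable figure works out roughly like this (the metadata overhead note at the end is my assumption, not an HP number):

```python
# Back-of-envelope for the quoted ~10.1 TB usable figure.
# Drive size uses the vendor's decimal GB; the overhead comment at the end is
# a rough assumption about SAN/iQ metadata, not an official number.

DRIVE_GB = 600              # decimal GB per drive
RAID5_SETS = 2              # per node
DRIVES_PER_SET = 6
NODES = 4
NR_COPIES = 2               # Network RAID-10 keeps two copies of every block

drive_tib = DRIVE_GB * 1e9 / 2**40
node_tib = RAID5_SETS * (DRIVES_PER_SET - 1) * drive_tib   # RAID-5 loses one drive per set
cluster_tib = NODES * node_tib
nr10_tib = cluster_tib / NR_COPIES

print(f"Per node after RAID-5 : {node_tib:.2f} TiB")
print(f"Cluster raw usable    : {cluster_tib:.2f} TiB")
print(f"Network RAID-10 usable: {nr10_tib:.2f} TiB")   # ~10.9 TiB before metadata
# The gap down to the ~10.1 TB shown in the CMC is plausibly SAN/iQ overhead
# plus binary/decimal rounding, but that last step is an assumption.
```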

 

So, given the above and leaving out any issues concerning support or warranty, here's the questions -

 

"Can I 'remove' 2 of the nodes from the cluster and still have all my data? If I use the 'remove node from cluster' option in HP SAN CMC and choose 2 of the nodes, will the CMC be intelligent enough to reconfigure the nodes in Network RAID 0 and 're-stripe' my data without losing any of it?"

 

If so, the rest of what I need to do is easy. I'm aware the re-striping will take some time (9TB of data) but we can live with that, as long as the SAN remains operational and the VMs remain live during this time.
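
My own rough sanity check on capacity and duration looks like the sketch below - the restripe rate is a pure guess, so treat the hours as an order of magnitude only:

```python
# Sanity check before committing: will ~9 TB fit on the two remaining nodes at
# NR0, and roughly how long might one restripe run?  The effective restripe
# rate is a guess; SAN/iQ throttles restriping, so this is order-of-magnitude.

node_usable_tib = 5.46      # per-node usable after 2 x 6-disk RAID-5 of 600GB drives
data_tib = 9.0              # data currently on the LUN
remaining_nodes = 2

nr0_capacity_tib = remaining_nodes * node_usable_tib
print(f"NR0 capacity on {remaining_nodes} nodes: {nr0_capacity_tib:.1f} TiB "
      f"-> {'fits' if data_tib <= nr0_capacity_tib else 'does NOT fit'} {data_tib} TiB")

restripe_mb_s = 200         # assumed effective cluster-wide restripe rate
hours = data_tib * 2**40 / (restripe_mb_s * 2**20) / 3600
print(f"One full restripe at ~{restripe_mb_s} MB/s: ~{hours:.0f} h")
```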

 

Any assistance is appreciated

oikjn
Honored Contributor

Re: Lefthand RAID Levels

Ah, I thought you were trying to split the cluster because you actually wanted to use your multi-site SAN as a multi-site config, not that you were looking to remove nodes from your system temporarily.

 

You "could" do that, but if you really don't care about the warranty, then why not just shut down one node, rip the drives out, replace them, and see if the node recognises the drives? If it does, the system will recognise that the storage unit is corrupted and allow you to swap out the ghost of the original unit with the new unit (assuming the new drives are recognised).

 

I've never tried with real hardware, but this works for the VSA version so I know it will work on the management group side (even with the same MAC addresses).  Can't say the node will actually accept the new disks, but maybe you can tell us :)

 

I would suggest you spin up a few VSA nodes in another mgt group and try the procedure yourself.

 

Side comment: why set up a single massive LUN? Talk about all your eggs in one basket. I try and keep my LUNs under 2TB each so that if, for whatever reason, a LUN loses connection to a server you don't lose all your VMs in one shot. Sure, MS lets you make CSVs that large, but what if you get corruption and have to run a chkdsk? Can you handle that kind of total system downtime? At least with a 2TB LUN, that downtime would be 1/5 the time and only ~1/5 of your VMs would be affected.

 

If you don't want to try my method of just pulling a node on the running system and want to go with the NR0 option, you can remove the nodes by selecting the edit cluster option and removing the nodes you want. You have to make sure that you have spare capacity in the cluster, and since you don't, you would have to first edit your LUN and change its RAID level to NR0. After that restripes, you can edit the cluster and remove the nodes you want... that is going to force a 2nd restripe, and once that restripe is done you can remove the nodes from the management group. During the migration process you will see how much data remains to be transferred if you click on the node in question.

 

I would suggest you try my first option if you really want to risk the downtime and data loss possible with NR0. My reasoning is that with my option you only leave 1/2 your data unprotected AND it only requires 1/2 a restripe, as your data gets reseeded from the mirror node. If you remove two nodes and go to RAID-0, you will have to restripe all of your data something like 6 times, assuming you do two nodes at a time. I wouldn't want ALL of my servers at risk on NR0 for that long.
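
To put very rough numbers on that trade-off (these are just my approximations of the steps above, not anything the CMC reports):

```python
# Very rough restatement of the trade-off in numbers.  The restripe counts
# mirror the reasoning in this thread and are approximations only.

data_tib = 9.0

# Option 1: pull one node at a time and let it reseed from its mirror.
# Each node holds roughly half the user data (two copies spread over four
# nodes), so each rebuild moves about data/2 and only that pair is exposed.
per_node_reseed = data_tib / 2
print(f"Node-swap route: ~{per_node_reseed:.1f} TiB moved per node, "
      f"{4 * per_node_reseed:.1f} TiB over all four nodes, "
      f"never worse than one mirror pair exposed at a time")

# Option 2: NR10 -> NR0, remove two nodes, rebuild them, re-add, go back to
# NR10, then repeat for the other pair: on the order of six full passes over
# the data, with every node a single point of failure the whole time.
full_restripes = 6
print(f"NR0 route: on the order of {full_restripes * data_tib:.0f} TiB moved, "
      f"all data exposed to any single node outage throughout")
```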

 

PS. When I say "at risk" for NR0, it's just because I see it like any RAID-0 set. Sure, the chance of data loss might be a little lower, since the risk of losing a node should be less than losing a single disk, but I try and avoid those risks for things as important as ALL of your company's VMs. Technically NR0 does function on the SAN, but you lose your 99.9+% availability... which right now you probably think is OK to risk, until you run into a temporary outage on NR0 and your company comes to a crashing halt.

ertech
Occasional Visitor

Re: Lefthand RAID Levels

Hi oikjn
 

That's a useful answer, thank you.

 

I did think of swapping the drives and waiting for a rebuild, but unfortunately there's another requirement attached to this process. We currently run Cluster Services 1 & Hyper-V 2. I wanted to have 2 of the nodes upgraded with 2TB drives so I could spin them up as a second cluster, install Server 2012 (Clustering 2 / Hyper-V 3) on some of the current Hyper-V hosts and do a sliding migration of the VMs. We don't need to get into the mechanics of that option here, but it's the reason I can't simply rip out and replace the drives - I need a second SAN (or LUN) to expose to the new 2012 cluster.

 

It is a good idea in general though, and thanks for suggesting it.

 

Re: the large LUN, it's because we have an environment that changes frequently. I mean, a lot. We have 2 VMs (Exchange and File) which each have 2 x 2TB disks. So best case is we could split into 3 LUNs (1 Exchange, 1 File, 1 Other), but that doesn't leave a lot of flexibility. And we need that flexibility due to some upcoming organisational changes. We've had drive failures before, and of course there's degraded performance during a rebuild, but it's nothing we can't handle.

 

Similarly with NR0, yes, I agree there is a risk. I'm hoping the fact that each node is composed of 2 x 6-disk RAID-5 arrays will provide me with some redundancy (i.e. up to 2 disks per node can fail, so long as they're not both in the same RAID-5 set). Yes there's a risk, but it should only be for a short period and the benefits are quite significant.
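
For context, here's the quick combination count behind that hope (purely illustrative, ignoring rebuild windows and hot spares):

```python
# Quick combinatorial check of "two disks can fail": with two 6-disk RAID-5
# sets in a node, a second failed disk is only survivable if it lands in the
# *other* set.  Purely illustrative; ignores rebuild windows and hot spares.

from itertools import combinations

DISKS = range(12)                     # disks 0-5 in set A, 6-11 in set B
raid5_set = lambda d: d // 6          # which RAID-5 set a disk belongs to

pairs = list(combinations(DISKS, 2))
fatal = [p for p in pairs if raid5_set(p[0]) == raid5_set(p[1])]

print(f"Two-disk failure combinations : {len(pairs)}")
print(f"Fatal (both in one RAID-5 set): {len(fatal)}  ({len(fatal)/len(pairs):.0%})")
# 30 of 66 combinations (~45%) take the node down, and with NR0 that one node
# going down takes the whole LUN with it.
```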

 

So going back to the original question: I can edit the LUN and change its Network RAID level to 0, yes? Will the CMC indicate which nodes are now being used and which are 'spare'? Once the initial restripe is done there, if I then edit the cluster and remove the 'spare' nodes, why is another re-stripe forced?

 

Approximately how long does the re-stripe process take? (I know it will be only an estimate.) We back up every single VM & VHD to tape on the weekend, so if I start the re-stripe process and something goes wrong I do have a fallback, but my recovery window will be small. I'm thinking this could be my process:

 

- Do a backup

- Once complete, edit the LUN for NR0

- Once re-striped, edit the cluster to remove the spare nodes (assuming the CMC tells me which are spare)

- Once re-striped (again), start the drive swap process

 

Will that work?

 

Again, thanks for your help

RonsDavis
Frequent Advisor

Re: Lefthand RAID Levels

When you switch from NR-10 to NR-0 you will restripe. To ALL of the nodes. You won't have a spare, and you won't be able to sustain the outage of any of your nodes. This is not a situation you want to be in. One 3-second problem that takes out one node and all of your systems go down. Why on earth would you want to take that risk?
As for testing out some of these restripe scenarios, make a quick 10 MB LUN, and start testing.
oikjn
Honored Contributor

Re: Lefthand RAID Levels

What Ron just said is correct. It will restripe across all the nodes, so you will have to restripe for the switch to NR0 and restripe again to remove the nodes you want.

 

This is going to take a LONG time and I see no way to do it "offline", so the backups are important as an insurance item, but if something goes wrong 30 hours into your restripe, you've just lost all the data that changed since that backup.

 

Really, since you are OK with running NR0, it seems more logical to just power off one node, replace the HDDs and try to rebuild that node than to go through your current thought process. Assuming pulling and rebuilding that node works and you get the extra space shown as available, you can do that for all the nodes one at a time. You won't get the expanded usable space until the process completes on all the nodes, but the transition risk is significantly reduced and the time to transition would probably be cut by 90%.

 

One big risk you seem to overlook with NR0 is that availability (not data security) is inversely proportional to the number of nodes you have in your cluster: since the data is striped across all nodes, an interruption on one node will halt access to the LUN. These interruptions definitely happen, and probably happen on your system now without you realising it because MPIO handles them without a problem, but that all goes away with NR0.

 

The advantage of just shutting off one node and changing the disks on it would be the following:

Assuming you have Node1, Node2, Node3, Node4 in NR10, data is mirrored on Node1+Node3 and Node2+Node4 and striped between those two groups. If you turn off Node1, your risk to data availability would only be if something then happens to Node3. If you have NR0, then you have a risk if something happens to ANY node. Beyond that, the data restripe would only be 1/2 the data when you put the node back into service, and there would be no initial restripe required to split out the nodes into NR0.
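
If it helps to see it spelled out, here's a tiny enumeration of that availability argument using the pairing above (illustrative only - the real data placement is managed by SAN/iQ):

```python
# Tiny enumeration of the availability argument, using the pairing described
# in this post (Node1+Node3 and Node2+Node4 mirrored).  Illustrative only.

mirror_pairs = [("Node1", "Node3"), ("Node2", "Node4")]
nodes = [n for pair in mirror_pairs for n in pair]

def lun_available(down, network_raid10=True):
    if not network_raid10:                      # NR0: every node is a stripe member
        return not down
    # NR10: LUN stays up while at least one node of every mirror pair is up
    return all(any(n not in down for n in pair) for pair in mirror_pairs)

# NR10 with Node1 already powered off for the disk swap:
exposed = [n for n in nodes if not lun_available({"Node1", n})]
print(f"NR10, Node1 offline: a second outage only hurts if it hits {exposed}")

# NR0: any single node outage stops the LUN
print("NR0, any of these going down stops the LUN:",
      [n for n in nodes if not lun_available({n}, network_raid10=False)])
```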

 

If I were you, I would do one of the following, in order of preference:

1. If I had a physical computer with the same free storage capacity as one of the current nodes, I would spin up a VSA with the same capacity as the current nodes, then do a node swap to pull out a node to test on. Then swap it back in once the drives are changed. Rinse and repeat for all nodes; this can be done on a live system and you lose no protection or availability. You should be able to use a trial licence of the VSA, assuming you didn't get free ones with your package when you bought it.

2.  If I didn't have a place to put a VSA, I would just pull a node and try replacing the disks and rebuilding as I said above.

3. I would simply plan an extended outage. Back up the servers on the CSV, shut them all down, delete everything from the Hyper-V cluster, destroy that cluster, and remove the iSCSI connections to the SAN. Delete the cluster and the management group on the SAN. Replace all the disks in the SAN. Rebuild the management group and cluster. Upgrade your Hyper-V hosts and create the new cluster. Restore the servers to the new cluster from backup. I would guess this would take a complete weekend of downtime and would be playing it very fast and loose, with the potential of extending out into the work week. But assuming your backups work, this would be just like a test of your DR planning with a SAN upgrade in the middle, done in a controlled manner where you would not lose any data, instead of potentially running that same DR plan unprepared when your NR0 cluster suddenly craps out mid-transition.