Array Performance and Data Protection

Can replication go faster?

 
bubbagump41
Occasional Advisor

Is there any way to speed up Nimble replication? There seems to be a relatively hard limit of ~1000Mb/s. I assume replication is a very low-priority task so that it does not impact production traffic, but even in our off hours replication never gets much past 1000Mb/s, and this is coming from an AF7000 that can certainly push some IOPS. We don't have any replication QoS configured.

22 REPLIES
Valdereth
Trusted Contributor

Re: Can replication go faster?

I haven't run into any replication issues with my deployments, but I'm curious to hear the follow-up to your question. Did you reach out to Nimble Support to see if something could be causing the bandwidth ceiling you're noticing?

bubbagump41
Occasional Advisor

Re: Can replication go faster?

I did open a ticket. They pulled a core dump to see what the deal is. However, I am seeing this same ceiling on all of our 8 or 10 arrays, all on 3.7 code. We just upgraded the dev/test array to 3.8.0.1 and the SFAs to 4.3, so I am not sure whether we'll see an improvement, but the release notes don't indicate any replication fixes. We also have a volume on a CS5000 that will only replicate at about 200Mb/s when other volumes on the array replicate at that mystery 1000Mb/s ceiling. It's all somewhat strange.

bubbagump41
Occasional Advisor

Re: Can replication go faster?

I should also mention we have AFs, which I would expect to replicate very quickly... but nope. Same ~800-1000MB/s ceiling.

Valdereth
Trusted Contributor

Re: Can replication go faster?

With synchronous replication on the horizon, I would imagine any throughput bottlenecks within NOS or the controllers will be actively addressed. I know Nimble has plenty of service providers that accept replication points from customers, so I'm hopeful they'll be able to point you in the right direction on how to exceed the cap. Thank you for the update!

gregbuchner79
New Member

Re: Can replication go faster?

The specs don't say, but are the management ports 1 GbE? Replication is usually configured to run through those ports. If they are 1 GbE, that would definitely limit your replication speed.

milovanov88
Advisor

Re: Can replication go faster?

bubbagump

Thank you for posting this question. I am glad you have opened the support case. Nimble Storage Support should be able to answer your question using the support data and statistics coming from the arrays.

Without seeing your statistics, I would like to at least provide some guidance regarding your question.

Yes, there are system resource constraints and design limitations in the replication engine. That scenario most often occurs when the data is randomly written on the RAID, making the replication message read requests take more time, even from SSDs. While the "slow blocks" are being retrieved, any other problem that may exist (network congestion or packet loss) will degrade performance even more, since the messages will not be sent out at the right time (or will have to be retransmitted) and the system cannot move on to the next blocks until the data in the TCP buffer moves to the network.

For your particular situation, your statement about 1000Mbps being the "ceiling" on multiple arrays makes me wonder whether some network path crosses a 1Gbps switch. I have seen replication perform as fast as 230MiBps (~1.9Gbps) on the 10Gbps interface card when the data was laid out favorably on the RAID with multiple volumes replicating at once. When you state that many of your arrays are "capped" at 1000 Mbps, I cannot imagine that all of them have exactly the same data structure on RAID. I would expect the peak and average replication performance to differ from array to array. I would continue working with Nimble Storage Support and seek to identify the common point between all your arrays that limits the bandwidth to 1Gbps.

Usually, to maximize replication bandwidth, the best solution is to replicate as many volumes as possible for a good mix of data streams. With Nimble OS 3.x and 4.x the system will replicate as many as 8 volumes at once, allowing it to obtain blocks of data to replicate from all of them at once (some will be slower than others) and push them onto the network. However, I do not think it will be possible to obtain 800MiBps (6.7Gbps) with the current replication implementation, even with perfectly laid out sequential blocks in each RAID stripe and no other contention on the system or the network.
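
For anyone checking the unit conversions quoted in this thread (MiBps versus Mbps/Gbps), here is a minimal arithmetic sketch in Python. The function names are illustrative only and are not part of any Nimble tool. It assumes 1 MiB = 2^20 bytes and that network rates are decimal (1 Gbps = 10^9 bits per second).

```python
MIB = 2**20    # bytes in one mebibyte (binary unit)
GIGA = 10**9   # bits per second in one Gbps (decimal unit, as links are rated)

def mibps_to_gbps(mib_per_s: float) -> float:
    """Convert a throughput in MiB/s to Gbps."""
    return mib_per_s * MIB * 8 / GIGA

def mbps_to_mibps(megabits_per_s: float) -> float:
    """Convert megabits per second (10**6 bits/s) to MiB/s."""
    return megabits_per_s * 10**6 / 8 / MIB

print(round(mibps_to_gbps(230), 2))    # 1.93
print(round(mibps_to_gbps(800), 2))    # 6.71
print(round(mbps_to_mibps(1000), 1))   # 119.2
```

In other words, a 1000Mbps ceiling is roughly 119MiBps, well below both the 230MiBps and 800MiBps figures mentioned above.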

On the non-AFA platforms, with Nimble OS 3.x or above it is best to replicate more frequently, such as every 15 minutes, once initial seeding has completed. This should pull the data from the SSDs instead of the HDDs, which increases the read speed of the blocks to replicate and thus the bandwidth used for replication. Take care that the snapshots do not coincide (schedule them at different times of the day), so that there are enough resources to complete all snapshot jobs before replication.
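
As a rough way to reason about whether a 15-minute schedule can keep up, the sketch below estimates how much changed data fits in one interval at a sustained rate. It is only back-of-the-envelope arithmetic: the function name is hypothetical, and it ignores compression, protocol overhead, and the read and network contention described above.

```python
def max_delta_gb(rate_mbps: float, interval_minutes: float) -> float:
    """Decimal gigabytes that can be sent in one interval at a sustained rate.

    rate_mbps is in megabits per second (10**6 bits/s).
    """
    bytes_per_second = rate_mbps * 1_000_000 / 8
    return bytes_per_second * interval_minutes * 60 / 1_000_000_000

print(max_delta_gb(1000, 15))  # 112.5
print(max_delta_gb(100, 15))   # 11.25
```

So at the ~1000Mbps ceiling discussed in this thread, a 15-minute interval can carry at most about 112.5 GB of changes; if the per-interval change rate is higher than that, replication will fall behind the schedule.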

I hope the information I have provided is useful and will lead you to the answer to your original question. If possible, please post the final resolution and the isolation from Support here.

bubbagump41
Occasional Advisor

Re: Can replication go faster?

Our management ports are indeed 1G, but I am certain that this traffic is going over 10G. See below. The configuration as well as the interface stats support 10G all the way. Additionally, the arrays all live on an isolated 10G flat switching fabric, so there are no 1G hops anywhere.

Regarding streams and such, we are replicating ~40 volumes out of 80 in the group, so I would expect that to be a reasonable mix. 

Thanks for all of the insight. I'll keep poking with engineering.

gregbuchner79
New Member

Re: Can replication go faster?

Guess it couldn't be simple. This came immediately to mind because we just moved our replication traffic around. Not for the speed (we're limited to a 100 Mb connection to our DR site), but so we could put the replication traffic on a different subnet and route it over a different connection than normal daily traffic.

Good luck getting this resolved and seeing faster speeds.

milovanov88
Advisor

Re: Can replication go faster?

Not sure it matters, but can you identify if specific volumes are a problem?

Are they keeping up with your schedules at this time?

You can use InfoSight >> Manage >> Labs >> Replication Timeline to see how well or poorly each volume is replicating.