01-03-2018 11:58 AM
Nimble replication speed not consistent / how to check?
Hi all, I have two AF7k arrays in different data centers, both running the recent 18.104.22.168 code. There's a 10gig link between the two facilities, but because of firewall limitations there's a soft limit of roughly a gigabit across the VPN. I've reproduced that gigabit cap consistently with iperf TCP tests, so it's a reliable limit with little variance.
I'm seeing snapshot replication peak at about 500 Mbit/sec, although I haven't seen it get that high since the first day of replication. When it's in one of its good phases it averages around 320 Mbit/sec, but in between it can sit at just 20-50 Mbit/sec for hours with no obvious cause. Based on how snapshots have replicated over the past day or two, the slow periods may occur toward the end of replicating each snapshot, so perhaps this is some checksumming phase and data isn't really moving at that point?
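To see why the hours-long slow phases matter so much, it can help to compute a time-weighted average rather than eyeballing the peaks. The sketch below is a hypothetical illustration: the rates echo the figures above, but the phase durations are made up, not measurements from these arrays.

```python
# Time-weighted average throughput across replication phases.
# The phase durations used in the example are hypothetical illustrations,
# not measurements from this array; only the rates echo the figures quoted above.


def blended_rate_mbit(phases):
    """Return the time-weighted mean rate in Mbit/sec.

    phases: list of (duration_hours, rate_mbit_per_sec) tuples.
    """
    total_hours = sum(hours for hours, _ in phases)
    if total_hours == 0:
        raise ValueError("phases must cover a non-zero duration")
    weighted = sum(hours * rate for hours, rate in phases)
    return weighted / total_hours


if __name__ == "__main__":
    # Hypothetical stretch: 6 h at the "good" rate, then 4 h stalled near 30 Mbit/sec.
    phases = [(6, 320), (4, 30)]
    print(f"{blended_rate_mbit(phases):.0f} Mbit/sec")  # prints "204 Mbit/sec"
```

Even with the good phase dominating the clock, the effective rate in this made-up example drops to about 204 Mbit/sec, well under the 320 Mbit/sec plateau.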
Currently I'm falling behind on getting snapshots over to the other side because of these drops in performance, which can last for hours before throughput spins back up, possibly when it moves on to the next snapshot. I have about 4TB left to replicate before the two arrays are completely in sync, and these swings make it difficult to predict how long that will take, and what the minimum replication interval can be in the future.
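For a rough sense of the spread, here is a back-of-the-envelope calculation of how long the remaining ~4TB would take at a constant rate. It is a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 Mbit = 10^6 bits), and it ignores any data reduction or protocol overhead, which would change the real figure.

```python
# Rough time-to-sync for the remaining backlog at a constant link rate.
# Assumes decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits) and ignores
# data reduction and protocol overhead, which would change the real figure.


def hours_to_replicate(remaining_tb, rate_mbit):
    """Hours needed to move remaining_tb terabytes at rate_mbit Mbit/sec."""
    bits = remaining_tb * 10**12 * 8
    seconds = bits / (rate_mbit * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    # Rates: the iperf gigabit cap, the day-one peak, the good-phase average,
    # and the two ends of the slow phase.
    for rate in (1000, 500, 320, 50, 20):
        print(f"{rate:>5} Mbit/sec: {hours_to_replicate(4, rate):7.1f} h")
```

At 320 Mbit/sec the 4TB backlog works out to roughly 28 hours, and at the full gigabit to about 9 hours; at 20 Mbit/sec it stretches to about 444 hours, i.e. over 18 days.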
The arrays are not particularly loaded. The primary is averaging <20k IOPS and the target has no activity other than receiving the replication data. The replication is via the management interface, not the 10gig data interfaces.
Here's the past few days since initiating replication:
You'll see throughput hitting 500 Mbit just before Dec 30, and how since then it ramps up and plateaus when it's in a good mood. Right now, however, I'm only seeing 15 Mbit/sec.
- Is there any way, perhaps via CLI, to see what is truly going on with any given replication task so I can really judge how far along it is or when it might complete?
- Is there any way to speed up replication to get back to the original 500 Mbit/sec, or preferably closer to a gigabit?
- What is the most efficient way to complete an initial seeding of the target? For example, the production array is currently set to take hourly snaps and replicate every 8th. The ~11TB of data on it was loaded over those first few days (VMware live migrations) but has now stabilized. Replication is still working on a snapshot from yesterday afternoon, i.e. just over 24 hours old, and it has gone through these odd phases of <20 Mbit and >250 Mbit, but never back to the original 500 Mbit. I'm happy to kill the replication, delete all snaps, and start over if that would help.
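Absent a built-in progress view, one generic approach is to sample a cumulative "bytes sent" counter at regular intervals and derive a windowed rate and ETA from it. This is a hypothetical sketch: where the counter comes from (array performance graphs, a switch or firewall interface counter, etc.) is left open, no Nimble CLI command is assumed, and the class and method names are made up for illustration.

```python
# Estimate current rate and ETA from periodic samples of cumulative bytes sent.
# Hypothetical: how the counter is obtained is outside this sketch; no Nimble
# CLI command or API is assumed. Class and method names are illustrative only.
from collections import deque


class ReplicationProgress:
    def __init__(self, total_bytes, window=6):
        self.total_bytes = total_bytes
        # Keep only the most recent samples so the rate tracks the current phase.
        self.samples = deque(maxlen=window)

    def add_sample(self, timestamp_s, bytes_done):
        """Record the cumulative bytes transferred at a given time (seconds)."""
        self.samples.append((timestamp_s, bytes_done))

    def rate_mbit(self):
        """Mean rate over the window in Mbit/sec, or None with fewer than 2 samples."""
        if len(self.samples) < 2:
            return None
        (t0, b0), (t1, b1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            return None
        return (b1 - b0) * 8 / (t1 - t0) / 10**6

    def eta_hours(self):
        """Hours left at the current windowed rate, or None if unknown or stalled."""
        rate = self.rate_mbit()
        if not rate:
            return None
        remaining_bits = (self.total_bytes - self.samples[-1][1]) * 8
        return remaining_bits / (rate * 10**6) / 3600


if __name__ == "__main__":
    # Hypothetical samples 10 minutes apart at 40 MB/s (320 Mbit/sec).
    progress = ReplicationProgress(total_bytes=4 * 10**12)
    progress.add_sample(0, 0)
    progress.add_sample(600, 24 * 10**9)
    print(progress.rate_mbit(), progress.eta_hours())
```

Because the window only covers the last few samples, the ETA will swing between the good and slow phases described above; that swing is itself a useful signal of which phase a replication is in.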
04-24-2018 11:17 AM
In short, the arrays should replicate as fast as possible so long as it doesn't hinder the array's ability to serve production data. I also checked whether we have any known bugs open on this particular issue, and we do not. This is also something I have not seen in my own customer base, and I have quite a few customers running on 4.x.
Probably not the answer you're hoping for, but this one is going to require a support call to figure out. Luckily, Nimble has the best support around. Give them a call. If you remember to, come back and update us on what the root cause ended up being.
Solutions Architect – Storage & Hybrid IT