Array Performance and Data Protection

Nimble replication speed not consistent / how to check?

EliteX2Owner
Occasional Contributor

Hi all, I have two AF7k arrays in different data centers, both running the recent 4.5.0.0 code. There's a 10 Gbit link between the two facilities, but due to firewall limitations there's a soft limit of roughly 1 Gbit/sec across the VPN. I've reproduced the gigabit cap consistently with iperf TCP tests, so it's a reliable limit with little variance.

I'm seeing replication of snaps peak at about 500 Mbit/sec, although it hasn't been that high since the first day of replication. Since then it averages around 320 Mbit/sec when it's in one of its good phases, but between those phases it may inexplicably run for hours at just 20-50 Mbit/sec. Judging by how snaps have replicated over the past day or two, the slow period may fall towards the end of replicating any one snap, so perhaps this is some checksumming phase and data is not really moving at that point?

I'm currently falling behind on getting snaps over to the other side because of these weird drops in performance, which can last for hours before replication spins back up to high speed, possibly when it moves on to the next snap. I have about 4 TB left to replicate to get the two arrays completely in sync, and this makes it difficult to predict how long that will take and what the minimum replication interval can be in future.
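To put numbers on that, here is a rough back-of-the-envelope sketch in Python of how long the remaining 4 TB would take at the rates mentioned in this post. It assumes decimal terabytes, that the observed Mbit/sec figure is all payload, and that no compression or dedupe changes the amount sent on the wire. The function name `transfer_hours` is just an illustrative helper, not anything from the Nimble tooling.

```python
def transfer_hours(remaining_tb: float, rate_mbit_s: float) -> float:
    """Hours needed to move remaining_tb (decimal TB) at rate_mbit_s (megabits/sec)."""
    remaining_bits = remaining_tb * 1e12 * 8      # TB -> bytes -> bits
    seconds = remaining_bits / (rate_mbit_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Rates taken from the post: slow phase, typical plateau, day-one peak, VPN cap.
    for rate in (20, 320, 500, 1000):
        print(f"{rate:>5} Mbit/sec -> {transfer_hours(4, rate):7.1f} hours")
```

Under those assumptions the same 4 TB takes a bit over a day at 320 Mbit/sec but more than two weeks if it sat at 20 Mbit/sec the whole time, which is why the slow phases make the finish time so hard to predict.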

The arrays are not particularly loaded. The primary is averaging <20k IOPS and the target has no activity other than receiving the replication data. The replication runs over the management interface, not the 10 Gbit data interfaces.

Here's the past few days since initiating replication:

Historical

You'll see the throughput just before Dec 30, where it was hitting 500 Mbit/sec, and how it has tended to ramp up and plateau since then when it's in a good mood. Right now, however, I'm only seeing 15 Mbit/sec.

Current Speed

So, questions:

  • Is there any way, perhaps via CLI, to see what is truly going on with any given replication task so I can really judge how far along it is or when it might complete?
  • Any way to speed up the replication to get back to the original 500 Mbit/sec or preferably closer to gigabit?
  • Is there a most efficient way to complete an initial seeding of the target?  For example, the production array is currently set to take hourly snaps and replicate every 8th.  The ~11 TB of data on it was loaded over those first few days (VMware live migrations) but has now stabilized.  It's still working on a snapshot from yesterday afternoon, i.e. just over 24 hours old, and it's gone through these weird phases of <20 Mbit/sec and >250 Mbit/sec, but never back to the original 500 Mbit/sec.  I'm happy to kill the replication, delete all snaps, and start over if that would help.
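
On the minimum-interval part of the question, a simple sanity check is whether the data changed during one replication window can be shipped before the next window starts. The sketch below assumes a purely hypothetical 200 GB of changed data per 8-hour window (I don't have a measured change rate yet), decimal units, and no compression or dedupe savings. The helper names `replication_hours` and `keeps_up` are illustrative only.

```python
def replication_hours(delta_gb: float, rate_mbit_s: float) -> float:
    """Hours needed to send delta_gb (decimal GB) at rate_mbit_s (megabits/sec)."""
    return delta_gb * 1e9 * 8 / (rate_mbit_s * 1e6) / 3600


def keeps_up(delta_gb: float, interval_hours: float, rate_mbit_s: float) -> bool:
    """True if one interval's changed data replicates within that interval."""
    return replication_hours(delta_gb, rate_mbit_s) <= interval_hours


if __name__ == "__main__":
    delta_gb = 200        # HYPOTHETICAL changed data per window, not a measured value
    interval_hours = 8    # replicate every 8th hourly snap, as configured today
    for rate in (20, 320):
        hours = replication_hours(delta_gb, rate)
        status = "keeps up" if keeps_up(delta_gb, interval_hours, rate) else "falls behind"
        print(f"{rate:>4} Mbit/sec: {hours:5.1f} hours per delta, {status}")
```

With that made-up delta, the 8-hour schedule is comfortable at 320 Mbit/sec but falls behind badly at 20 Mbit/sec, so the real answer depends on the actual change rate and on how much time replication spends in the slow phase.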