Array Performance and Data Protection
Showing results for 
Search instead for 
Did you mean: 

SQL Server Backup Performance (Not Using Nimble VSS Snapshots)

Go to solution
Frequent Advisor

SQL Server Backup Performance (Not Using Nimble VSS Snapshots)

So just stay with me on this one, it'll sound counter intuitive at first.

Just wondering if anyone else out there is using integrated SQL Server backups that are backed up on a Nimble volume and if you've seen any performance bottlenecks.

To set the stage we have a SQL 2k8 machine, and our DBA runs full verified SQL backups each day.  The SQL server is a 2k8 r2 VM on ESXi.  VMDKs for system drives.  Then we have a windows iscsi guest initiated volume for our database volume as well as a separate one for t-logs and a backup volume, all of which are on our Nimble.  So yes we are creating a backup file on the Nimble of a database on the same Nimble.  There are some reasons for this for spinning off to tape as well as for the DBA's sanity.situation

We know there would be a performance hit in this situation than if we were backing up to a different location.  However, we see maybe an average of 20MB/s on the Nimble and next to no utilization on any network connection, no disk pressure at all, no CPU or memory contention.

I know this is all kind of vague, but curious if someone had seen this when switching from one storage provider to another.

We're working with Nimble support on this as well, but was just curious if anyone else was doing something similar and seeing 3-4x the duration.  Side note, backups aren't failing, they just take a much longer time.

Near as we can tell we're using all SQL and Nimble best practices.

Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Is the DBA / you just using SQL Jobs to create the backups or are you using a 3rd party program to initiate SQL backups?

If you're not using SQL, I would try that first. Just run a manual backup task on the database to see what performance you get.

I've got an SQL server which has all the disks inside VMDK files.  We have 5 volumes (OS, SQL Data, SQL Logs, tempDB, backups).  I've just kicked off a manual backup, uncompressed and the Nimble is sitting around a solid 100-120MB/sec.

I'm not sure what throughput we get when backing this up with the commvault agent.

Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

He's using SQL Jobs to create the backups for each database.  The database itself is not compressed, but the backups are.  The volume that the backups are going to has compression turned off just because we don't want to get that occurrence where we get negative compression on the storage because we're compressing compressed files.

I'll elaborate on the whole setup.

Basically all drives are on the same Nimble array.

VMware 5.1 infrastructure, all NICs on the VM data and iSCSI for the guest are all VMXNET3, all network connections are 1gbe on the server side.  2 iscsi connections 1 data.

On the SAN side we have 10GbE connectivity up to our Nexus 5k switches.

Using Nimble OS 2.x with the Nimble connection manager software.

Server 2008 R2 SP1

SQL Server 2008 SP1

And the drive layout looks like this for us.

C: and D: are VMDKs.

C: we normally use for the system drive and D: for a sort of apps drive, so SQL server is actually located there.

The rest from here on out are Windows iSCSI guest initiator volumes:

F: TembDB

G: SystemDB

I: Backup Volume

T: T-Logs

V: Actual SQL DBs

So we're backing up to I: in this case.

Average bandwidth we're seeing is 20-40MB/s

Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

What is SQL performance like normally?  Have you run SQLIO on the SQL data volumes?

I would check:

1. Performance Policy for all volumes.  I use VMware ESX 5 for system drives.  I made a VMware-No-Cache for Backup drives that aren't iSCSI attached directly in the VM (4k, compression, no cache).  Use the appropriate SQL performance policy for the SQL volumes.

2. Do you run Jumbo Frames?  Remember to set Jumbo frames on the network adapters inside the VM.  Run "ping <nimble iSCSI> -l 9000 -f" to check that jumbo frames are working end-to-end.

3.  Run SQLIO on the data volume (in a quiet period) to make sure the sql data volume is working correctly.

Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

We don't really see any SQL Performance problems.  I'm not sure if we've run SQLIO specifically on these volumes since they've been created, we can attempt to do that during the evening to see what it looks like.  But I would like to think considering we used Nimble and Microsoft best practices for SQL server using the SQL Performance policies per Nimble and 64K allocation unit size per Microsoft's best practice we should see solid performance.

1. Performance policies for all volumes are set to SQL (with the exception of our C and D drives seen above which are VMDKs use the ESX 5 policy, and I drive for backups use the default policy and have the default NTFS allocation unit size).

2. No jumbo frames per suggestion of Nimble engineers because it wasn't worth the gain, plus it seems as though our Nexus 5ks are an all or nothing type switch, you can't enable jumbo for certain ports or port groups, which is problematic.

Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Dave could you give me some insight as to how you did your SQL backups and what kind of hardware it runs on?  I used you as an example with Nimble support yesterday and they were wondering how you were configured to see if there was just a difference in setup (CS260 vs 460 or 10gb end to end, multithreaded backup etc...)  That'd be very helpful.

Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

We are in fact seeing some similar behavior, not from SQL but from a different database system. We are a brand new Nimble customer so I haven't had a chance to start beating the bushes to solve the problem yet -- indeed, it's not really a problem, more a curiosity.

It's a back office system is a program running on a database called Progress, and once a day we run a "live backup" (basically a snapshot). On our crappy old HP MSA2012i it would take around 10 minutes to complete this, and the entire company would slow to a crawl if they were trying to access that program. After 10 minutes everything was just fine. Since moving everything over to Nimble, it now takes 72 minutes to run the identical backup process (it's only about 15 GB of data).

The server is 2008r2 on a VMDK version 3 volume -- it's one big vdisk, everything under the C: volume. Performance Policy is "ESX" (the Nimble's default settings). So like you mentioned, it's reading and writing the same data to and from the same volume, but it's dog slow. I do notice the overall response time for this system is better during the backup period with the Nimble , probably due to the caching.

I was going to look at moving just the data files off to a directly mapped iSCSI volume (instead of running inside a VMDK) and see if it's better.

Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Thought I'd chime in as we're running a similar setup.  Our enviornment:

- Hyper-V

- SQL 2008

- Windows 2008R2, iSCSI drives

     - H: (Data)

     - O: (Logs)

     - T: (Backups - see what we did there? )

- Servers connected to Nexus 5K

- 2 MPIO 1GB NICs for iSCSI

- 2 Teamed 1GB NICs for standard traffic

We moved our SQL from a physical cluster connected to an old XIO SAN to a virtual cluster inside a Hyper-V cluster on the Nimble.  We run full nightly backups and they finish just under an hour, similar to our old setup.  We've always been more focused on uptime than performance, and with a window that didn't change much we didn't really think to investigate from that end.  Anyhow, it seems like a similar enough setup that it could be helpful.  If it gets down to it I could probably run some tests for comparison.

Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Sure.  We're only new to Nimble (in the last month), so we did some performance tests with SQLIO as you do, but that's no where close to what it will do in real life.

The setup I'm using here is:

Nimble 460G with 10gbit iSCSI interfaces.

Connected to a Nexus 5548 10gbit switch.

Connecting to a HP C7000 Blade Chassis via 10gbit Flex Connect.

Each blade is then provisioned with 2x 2.5gbit iSCSI interfaces.  This is what shows up in ESX.

So theoretically, each ESXi host has 5gbit connection through to the Nimble, with a max throughput available of 600MB/s.

Our SQL server is set up with 5 disks, configured as 5 separate volumes all residing on the Nimble.  They're all attached to VMware, we have no iSCSI from within our SQL Server VM directly to the Nimble (against Nimble best practices).

vol1: C Drive, Custom VMware ESX 5 performance profile (4KB blocks, Compression, No cache)

vol2: SQL Data, Standard VMware ESX 5 performance profile (4KB blocks, Compression, No cache)

vol3: SQL Logs, Custom VMware ESX 5 performance profile (4KB blocks, Compression, No cache)

vol4: TempDB, Standard VMware ESX 5 performance profile (4KB blocks, Compression, No cache)

vol5: Backups: Custom VMware ESX 5 performance profile (4KB blocks, Compression, No cache)

I've purposely disabled cache on C Drive, Logs and Backups to maximise read cache available for other workloads.  I *may* enable it again for the C drives as with pagefile, registry and so on it may help.

Test 1: Standard backup to backups volume.

Avg transfer of 90MB/s

Test 2: I re-ran setting all Nimble disks in VMware to change round robin path after every IO.

Avg transfer of 90MB/s

Test 3: Manually copied a .BAK file from the backups volume to the SQL data volume, just to test normal windows transfer speed

Transfer started around 200MB/s but over time dropped to a steady 130MB/

Test 4:  Manually copied that .BAK file from the SQL data volume to the backups volume.

Transfer speed stayed perfectly at 190MB/s the whole time.

So the two things I got out of this was:

1. The major speed killer here seems to be SQL servers ability to read the data from the database and write it to the .bak file.  This could be RAM / integrity checking ?  CPU usage was fine throughout the process <20% usage.

2. Copying data, even large sequential still seems to perform better from read cached volumes.

I'd be interested to see how you go with this, as again we're only new to Nimble and my SQL DBA skills certainly aren't the best so I'd be keen to see what extra performance we can squeeze out of it.


Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Sounds like what you guys have Dave is a little higher end than what we're on actually.  We actually have two CS260s right now, on one side it's a 10gig but still using 1gig connections to the servers until we can make a switch to UCS or something similar.  The other side which is where our SQL boxes are is a CS260 also, but is currently the 1gig model.  We plan on upgrading both sides within a year or so to 10gig end to end, but until then this is what we have.

I think we're in a similar boat though, this is actually my second company I've helped deploy Nimble at now, but previously we were using all VMFS volumes even for the SQL since we were using 3rd party backup utilities we weren't leveraging Nimble VSS verified snaps, so nearly identical to what you are doing.  This company is different however and we're using a lot of windows guest initiator iscsi volumes.  Our DBA however is more comfortable at this point in time to do the native SQL backups rather than to use the Nimble snaps.

I do understand why it's working differently than what our old storage was.  We were backup up to a different disk aggregate at that point, so we never expected to see a 1 to 1 performance trend for this type of oddball action.  But we also didn't expect to see no disk pressure, cpu pressure, memory pressure or network pressure while seeing marginal at best throughput on the array.  At worst I would have expected 2x the performance loss, but not 5x in this case.  I'm sure it's a configuration issue of some sort on our end, but I just don't know what.

I'm with you though, I personally am no DBA, my area of expertise is usually VMware and Windows Server(s).  The rest of my knowledge is anywhere from somewhat knowledgeable to I know enough to get myself into trouble.  Storage is a middle ground between I generally know what i'm doing and somewhat knowledgeable.  SQL stuff though is the lower end of the stick

Thanks for the info though Dave, knowing your performance metrics on those various tests is very helpful!