Array Performance and Data Protection
1752794 Members
6272 Online
108789 Solutions
New Discussion

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

 
SOLVED
Go to solution
julez66
Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Nice drive layout
A couple questions for you.

First how are you doing your SQL backups, I'm assuming from within SQL server to a bak file on T:

Second how are your drives mapped.  Are you using the 2008 R2 guest initiator from directly in that VM or are you passing it through Hyper-V as a virtual disk?

Last so you say you're getting the same performance on your backup.  When the backup runs how long does it run and during the backup what sort of Nimble bandwidth spike do you see on the array?  20MB/s 50MB/s, 200MB/s?

tmoore106
Trusted Contributor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Julian I have some updates for you and want to talk with you an Ryan about some things to test/try. Let's chat!

ryanmus124
Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Correct, we are using running our backups within SQL to a .bak on the T drive.  All SQL drives are mapped directly with the 2008R2 initiator with 2 1GB virtual NICs, MPIO (least queue depth), and Jumbo Packets.  The OS drive is a virtual disk also on the SAN.

Our backup runs for ~50 minutes nightly and the sum of our .baks for a night are ~45GB, here's what I'm seeing:

- Data drive

     Read spikes to 130MB/sec, averages around 60 MB/sec

- Backup Drive

     Write spikes to 60 MB/sec, averages around 20 MB/sec

I'm getting averages by eyeballing, for some reason I can't use a small enough custom window so the calculated numbers are being skewed by big periods of inactivity.

julez66
Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

So still not completely sure what's going on with this since we're not seeing any bottleneck really anywhere.  However, we have found that if you split up the backup into multiple files (even if it's on the same disk volume) where it can essentially use multiple threads it runs faster.  I haven't had a chance to really look to see what the array is doing since our DBA has modified this.  I know it's sort of a band-aid, but still doesn't explain why we're seeing this hidden bottleneck.

mallocarray12
Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Just for giggles and grins, trying running the SQL backup to multiple .bak files, even on the same volume.  So rather than having SQL go to backup.bak, have it go to backup1.bak, backup2.bak, backup3.bak, etc.  I run our 200 GB database to 10 backup files like this and went from around 40MB/sec to around 130 MB/sec.  Restores are faster when restoring from multiple files as well.

julez66
Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Sorry Joshua, I hadn't updated this thread for a little bit.  Our DBA has since tried that and it does increase the speed, but still doesn't explain why we see slower backup performance than before.  Basically we're not really doing apples to apples when we do this, but it has worked as a band-aid for the time being, we've got the time closer to where it was.  We're still experimenting with this right now to see if we can't figure out where the bottleneck is, because we've yet to find one anywhere, disk, network, memory or CPU.  I'd like to see something somewhere show me a bottleneck.

hduong71
Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Hi Julez,

I'm sorry about your issue here. It looks like the backup software here is copying the files in a single threaded fashion.

If you can try two test:

1. Manually copy one large from one drive to another drive. Measure the performance.

2. Manually copy TWO large files from one drive to another drive. Measure the performance.

If test number 2 is faster, that indicates a single threaded operation with the backup software. Depending on the backup software, it may or may not be tunable.

If the performance numbers on both number 1 and 2 are the same, then it may be the array, network, or host.

Feel free to reply back on the results of your test

julez66
Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Thanks Huy, as seen above in a response to Joshua Post we have verified that to be helpful, but still does not explain why the slowdown is occurring when using the same backup procedure and same server hardware and same software.  Even single threaded we should be seeing a bottleneck somewhere I would think.

hduong71
Occasional Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Hi Julez,

The problem is single threaded I/O processes would send 1 I/O and wait, send one I/O and wait.

You can think of this like an amusement park ride. Let's say the ride can do 32 at a time. You send one person at a time, wait until the person finishes the ride then send one more person to line up for the next ride. It would be more efficient to send two, or three, or up to 32 people  in the line, have them go on the ride and then fill the line again with 32 people.

This is what is happening with single threaded. The thread would send 1 I/O in the queue (line), wait for it to come back, and then fill the queue again with 1 I/O event. By threading it out, you are filling the queue with more I/Os at the same time.

The support engineer is correct that in general Jumbo frames does not help with most workloads. With random workloads, it does not help much. With Sequential workloads like we are seeing here, it MAY help, but no guarantee. From experience, you do sacrifice manageability when configuring jumbo frames, that's why most customers don't implement them.

julez66
Frequent Advisor

Re: SQL Server backup performance (We're not using Nimble VSS snapshots)

Right which is all well and good, but still doesn't explain why it was faster before, but 6x slower now on the same VM on the same hardware.  Single threaded process would be based on the server hardware it's located on, if it's on the same server hardware, using the same CPU to process the backup, I don't know why there's a difference, that makes absolutely no sense.  I'm still searching for something tangible I can visibly see bottle-necking the job.  Maybe there's that much of a difference in line speed to cause the problem, I don't know yet, but until I see some sort of bottleneck somewhere like in queue depth, cpu, memory, network, something, somewhere, I'm going to have to keep searching.

Think of it this way.  Old system we saw 30 min backup jobs.  New system we see closer to 4hrs.  Same server, same backup job, same process.

If I were to multithread with multiple backup files on the old system as we are now with the new, would we not see that 30 min backup job decrease as well?  Logic says yes.

If I run 8 backup files on that 4hr job, and get 30 min backups.  What would I get on the 30 min backup job?  4min?

That's the logic I'm looking at here.

The the only differences I'm seeing is the fact that we no longer have a secondary disk aggregate (but we're not seeing any disk pressure on the new storage anyway, so that shouldn't be the problem).

The other would be we're using 1gb multipathed iscsi connections instead of 4gb multipathed fc...(but again I'm not seeing any network pressure either, we're not flooding those iscsi connections).