Array Setup and Networking
dajonx135
Advisor

Considering Nimble SANs

Hi,

I am considering Nimble as a storage solution for SQL Server 2016 (when it comes out) as well as Hyper-V VMs and file shares.  The largest database I have is about 2 TB.

I was wondering what you guys think about Nimble SANs.  How is performance?  I've read that you need to get the cache size right or performance can take a big hit since it's hitting the disks.  How are the features such as snapshots, replication, etc?  Is there any downtime in upgrading firmware?

Any insight is greatly appreciated!

12 REPLIES
Nick_Dyer
Honored Contributor

Re: Considering Nimble SANs

Hi James, welcome to Nimble.

There's a wealth of information out there about Nimble - one of my favourite threads most recently is this one from Reddit. Hopefully other users will chime in on this thread for you too.

If you're considering the Adaptive Flash arrays then yes, the size of the SSD flash cache is important relative to the working set size of your workloads. However, the flash cache is only used for random reads (not random writes, nor sequential reads/writes), so we tend to need somewhere in the region of 5-10% of flash cache capacity relative to the total size of your workload. If you're considering our All Flash array then of course there is no requirement to size for flash cache.
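
To make that 5-10% rule of thumb concrete, here's a rough sizing sketch; the workload figures are made-up examples (your actual mix of SQL, Hyper-V and file share data would go in), and real sizing should come from your account team:

```python
# Rough flash-cache sizing sketch based on the 5-10% rule of thumb above.
# The workload sizes below are illustrative assumptions, not a sizing tool.

def estimate_flash_cache_tb(total_workload_tb, low_pct=0.05, high_pct=0.10):
    """Return a (low, high) flash cache estimate in TB."""
    return total_workload_tb * low_pct, total_workload_tb * high_pct

workloads_tb = {
    "SQL Server databases": 2.5,   # ~2 TB main DB plus smaller ones
    "Hyper-V VMs": 6.0,
    "File shares": 4.0,
}

total_tb = sum(workloads_tb.values())
low, high = estimate_flash_cache_tb(total_tb)
print(f"Total workload: {total_tb:.1f} TB")
print(f"Suggested flash cache: {low:.2f} - {high:.2f} TB")
```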

Nick Dyer
twitter: @nick_dyer_
dajonx135
Advisor

Re: Considering Nimble SANs

Thanks, Nick!

That was a great thread to read.  Pretty entertaining!

Can you please explain the part where you said that the flash cache is only used for random reads?  How come it's not being used for sequential reads and random/sequential writes?  (I'm trying to understand...)

Also, how can I get pricing to get an idea if it's outside of our budget?

pfrisoli27
Valued Contributor

Re: Considering Nimble SANs

Hi James,

All incoming IO is serviced via cores/NVDIMM/DRAM. Once IO enters the NVDIMM we mirror it to the standby controller and couple that workload into DRAM, where CASL processes the IO by CRC/write verify, indexing the metadata, serializing the stripe, and applying variable compression. From there the authoritative data (the entire stripe) is written to the triple-parity-protected NL-SAS layer. Back upstream in DRAM, CASL analyzes the blocks that appear to be flash worthy (heuristic pattern review), then buffers those blocks and writes them to the adaptive flash layer in a single write (write amplification mitigation). All incoming data is converted to sequential writes so that any read calls are serviced as sequential reads. I hope this helps.
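
Purely as a conceptual illustration of that write path (this mirrors the stages Paul describes; it is not Nimble's actual code, and the class and method names are invented for the sketch):

```python
import zlib

# Conceptual sketch of the CASL write path described above: land in NVDIMM and
# mirror, process in DRAM (checksum, compression), write the full stripe to the
# triple-parity NL-SAS layer, then write only cache-worthy blocks to flash.

class ConceptualCaslPipeline:
    def __init__(self):
        self.nvdimm, self.stripe_buffer = [], []
        self.nlsas_stripes, self.flash_cache = [], []

    def ingest(self, block: bytes, is_random_read_candidate: bool):
        self.nvdimm.append(block)            # land in NVDIMM...
        mirrored = block                     # ...and mirror to the standby controller
        checksum = zlib.crc32(mirrored)      # CRC / write verify
        compressed = zlib.compress(mirrored) # variable compression
        self.stripe_buffer.append((checksum, compressed, is_random_read_candidate))

    def flush_stripe(self):
        # Serialize the whole stripe and write it sequentially to the
        # triple-parity-protected NL-SAS layer (the authoritative copy).
        self.nlsas_stripes.append(list(self.stripe_buffer))
        # Separately, buffer only the "flash-worthy" blocks (likely random-read
        # targets) and write them to the flash cache in one sequential pass,
        # which is the write-amplification mitigation Paul mentions.
        self.flash_cache.extend(b for _, b, hot in self.stripe_buffer if hot)
        self.stripe_buffer.clear()
        self.nvdimm.clear()

pipeline = ConceptualCaslPipeline()
pipeline.ingest(b"random OLTP page", is_random_read_candidate=True)
pipeline.ingest(b"sequential backup block", is_random_read_candidate=False)
pipeline.flush_stripe()
print(len(pipeline.nlsas_stripes), "stripe(s) on NL-SAS,",
      len(pipeline.flash_cache), "block(s) promoted to flash cache")
```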

PS: our All Flash array doesn't use a proportional flash cache; the flash is the array's persistent storage layer itself, so it may be a great fit for you.

thanks

Nick_Dyer
Honored Contributor

Re: Considering Nimble SANs

Hi James,

Paul's given a great explanation of the filesystem layout - there's a further detailed blog post from the founder of CASL here which is a good read: The File System Powering Nimble Flash Arrays | Nimble Storage

TL;DR - We've separated the read and write IO paths and are able to service random and sequential IO in isolation for both reads and writes, causing minimal performance bottlenecks for mixed workloads.

Nick Dyer
twitter: @nick_dyer_
dajonx135
Advisor

Re: Considering Nimble SANs

Thank you, Paul and Nick!

How can I find out how much your systems would cost?  Just curious to see how much I'm looking at since I have absolutely no idea.

Also, can you please explain to me how the redirect-on-write snapshots work since so many people say that their snapshots take up very little room?

Nick_Dyer
Honored Contributor

Re: Considering Nimble SANs


James Yang wrote:



How can I find out how much your systems would cost?  Just curious to see how much I'm looking at since I have absolutely no idea.


If you ping me - nick at nimblestorage dot com - I'll hook you up with your local account team, who can assist with pricing for you.


Also, can you please explain to me how the redirect-on-write snapshots work since so many people say that their snapshots take up very little room?


Oh man, that's a WHOLE world of discussion right there! Thankfully, we've already got useful content available which should answer those questions. Take a look here: clicky
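
For a rough intuition of why those snapshots take so little room, here's a generic redirect-on-write illustration; it shows the general concept (a snapshot freezes the block map and shares unchanged blocks with the live volume), not Nimble's specific implementation:

```python
# Generic redirect-on-write illustration (not Nimble-specific code).
# A snapshot is just a frozen copy of the volume's block map; data blocks are
# shared until the live volume overwrites a logical block, at which point the
# new data is redirected to a fresh physical block and only the live map moves.

class Volume:
    def __init__(self):
        self.blocks = {}        # physical block id -> data
        self.block_map = {}     # logical block -> physical block id
        self.snapshots = []     # list of frozen block maps
        self._next_id = 0

    def write(self, logical_block, data):
        pid = self._next_id
        self._next_id += 1
        self.blocks[pid] = data
        self.block_map[logical_block] = pid   # redirect: the old block stays intact

    def snapshot(self):
        self.snapshots.append(dict(self.block_map))  # just copy the map, no data

    def space_used(self):
        return len(self.blocks)  # unique physical blocks, shared by all maps

vol = Volume()
for lb in range(4):
    vol.write(lb, f"v1-block{lb}")
vol.snapshot()                       # snapshot costs no extra data blocks
vol.write(1, "v2-block1")            # only the changed block consumes new space
print("physical blocks used:", vol.space_used())   # 5, not a full second copy
```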

Nick Dyer
twitter: @nick_dyer_
dajonx135
Advisor

Re: Considering Nimble SANs

Thank you, Nick!

I'll have a good amount of reading this afternoon...

Have you tested with SQL Server 2014?  Is there any information regarding SAN-to-SAN replication, deduplication with SQL Server, Triple-Stripe Parity, and whether hot spares are taken into account?

I apologize for bombarding you with questions, but I'm pretty intrigued by your product.

pbitpro96
Advisor
Solution

Re: Considering Nimble SANs

James,

I'm a SQL Server DBA/Architect, virt engineer, storage admin, and a Nimble partner (if you're in TX, I can get you pricing!).  In order to be the best MSP/consultant for SQL Server, I've found I have to be nearly an expert in the entire stack, down to storage.

Nimble, honestly, is the easy button for SQL Server. For so much of what I do (ETL processes, standard maintenance, and both transactional and warehouse workloads), Nimble has been a no-brainer move.  From the smallest single-application DB to the eCommerce/portal databases, write latency is the bottleneck, since CPU and memory can be easily upgraded.

Backup windows are reduced, log waits go to near 0, and Nimble's adaptive cache takes care of the rest. It keeps the data that doesn't fit in SQL's memory fresh at hand in the read cache. If it is cold data, it is on the spinning disk (where it belongs). I don't cache my backups, and the read/verify process is still pretty impressive compared to much higher spindle-count SANs.

I have 2 large clients on SQL 2014, both on Nimble. They are huge fans of the performance, and I didn't have to try hard - I just moved the VMDK files off of the Compellent to the Nimble datastore with no other changes to the VM/network/indexes/etc., and it completed every job 4-10x as fast. Again - write latency is king.

When you decouple write performance from spindle/disk count (or speed) amazing things can happen. And that is the key differentiation (to me) that Nimble does better than their competitors.

Replication - Nimble can do database-consistent snapshots when in SIMPLE recovery mode. While nice, 90% of my production databases are in FULL mode, so that native backup doesn't help me. I replicate my backup drive (where I keep FULL and LOG backups) to my DR every hour. Failover is simple, and I don't even have to fail over to test my restores at DR. I can create a zero-copy clone of the latest replicated snapshot, mount it, do the restore, do my checks, and then delete the clone without breaking anything!
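
As a sketch of that restore-test workflow, here's the shape of the process; every helper below is a hypothetical stub that just prints what it would do, not an actual Nimble API or CLI call:

```python
# Sketch of the DR restore-test workflow described above. All helpers are
# hypothetical stand-ins, stubbed to print the step they represent.

def clone_snapshot(snap):
    print(f"zero-copy clone of {snap}")
    return f"{snap}-clone"

def mount_volume(clone):
    print(f"mount {clone} on the DR SQL host")
    return "T:"

def restore_database(db, backup_path):
    print(f"RESTORE {db} from FULL and LOG backups under {backup_path}")

def run_checkdb(db):
    print(f"DBCC CHECKDB on {db}")

def delete_clone(clone):
    print(f"delete {clone}; the replica and production volumes are untouched")

def test_dr_restores(latest_replicated_snapshot, db_name):
    clone = clone_snapshot(latest_replicated_snapshot)
    try:
        drive = mount_volume(clone)
        restore_database(db_name, f"{drive}\\Backups")
        run_checkdb(db_name)
    finally:
        delete_clone(clone)      # nothing on the replica or source is modified

test_dr_restores("backup-vol-snap-0100", "SalesDB")
```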

On the hybrid-storage Nimbles, I don't have any dedupe yet. The newest All Flash Arrays include dedupe, and I am anxious to see what it can do with my large warehouses. One nice thing is that dedupe can be turned OFF with Nimble if you don't want the performance overhead or would rather keep full replicas of your data.

Triple-stripe parity/hot spares: on the Nimble there are no hot spares, and the drive redundancy is "Triple+" parity, meaning the RAID group can survive three simultaneous drive failures. A hot spare would be a waste of capacity, and with NBD drive replacement, I have no worries about multiple concurrent failures taking the SAN down.

Hope this is helpful!!

jstinson105
Advisor

Re: Considering Nimble SANs

Hi James,

I can try to answer your questions from my perspective as a Nimble customer. About a year ago we moved all production storage to Nimble and have had a really good experience with it. Aside from ~500 VMs, we also have a few large SQL Server instances on it. One of them is a SQL 2014 Availability Group where the main database is a bit over 2 TB. We have 5 replicas of this data, 2 on our production Nimble array and 3 on the backup/DR array. We have had zero problems, and in fact it has been incredibly responsive compared to our old IBM array. That Big Blue array was about 4 years old, but average latency was 4-5 ms, spiking regularly to 20 ms+. On Nimble, our average read latency is right around 1 ms, with write latency consistently less than 0.2 ms! On top of that, we're seeing array-side compression of >70% on this database, completely transparent to the server. Needless to say, we've been pretty pleased!

As for your other questions:

  • Snapshots work as advertised. Our most critical database gets snap'd and replicated every 15 minutes without any perceivable impact on performance. All of our VMs are on a 30 minute snap and replicate schedule, also without performance impact.
  • For firmware updates, we do perform ours after hours because of exactly one SQL failover cluster, which goes offline if there is a controller handoff, but this problem existed on the NetApp array it was originally built on. I have a colleague here in town who does his updates during the day. At lunch last week, he looked at his phone and laughed because he forgot he'd kicked off a firmware update before leaving the office! Clearly he has no worries about the process. I don't think my paranoia would allow me to leave the building while the upgrade was happening, but if it weren't for the one fussy system, I wouldn't hesitate to upgrade in the middle of the day.
  • Cache sizing is important, but your Nimble sales engineer should be able to guide you. We were also nervous about this, so we added extra cache to the system beyond what they recommended. We did this so that we could "pin" our main database into cache, ensuring all random reads come from SSD. It is true that a read request that can't be found in cache will have to go out to spinning disk, which will be slower, but my statistics (of which Nimble gives you plenty) show that this is pretty uncommon in our environment, meaning Nimble's caching algorithms are solid. Our cache hit rate is 98%+ during business hours, only dropping below that during nightly backups, which obviously access parts of the system that might have been idle for many hours (the rough latency calculation after this list shows why the hit rate matters so much).
    • The extra SSDs we ordered effectively doubled the cache size. However, Nimble's Infosight analysis shows that even with the large volumes we have pinned in cache, we only need about 40% of the installed cache which would be 80% of what they recommended so they were pretty spot on.
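
To see why that hit rate matters so much, here's a back-of-the-envelope effective read latency calculation; the per-tier latencies are assumed round numbers, not measurements from Jonathan's array:

```python
# Back-of-the-envelope effective read latency vs. cache hit rate.
# The per-tier latencies are assumed round numbers, not measured values.

FLASH_MS = 0.5    # assumed SSD cache read latency
DISK_MS = 8.0     # assumed NL-SAS random read latency

def effective_read_latency_ms(hit_rate):
    return hit_rate * FLASH_MS + (1 - hit_rate) * DISK_MS

for hit_rate in (0.80, 0.90, 0.98):
    print(f"hit rate {hit_rate:.0%}: ~{effective_read_latency_ms(hit_rate):.2f} ms average read")
# hit rate 80%: ~2.00 ms average read
# hit rate 90%: ~1.25 ms average read
# hit rate 98%: ~0.65 ms average read
```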

Altogether, we have about 100 TB of used space, 40% average compression (a large image datastore tanked this metric, which was over 50% before), an average of 8,000 IOPS, and all of it is averaging sub-millisecond latency. Nimble has worked very well for us and I can recommend it without reservation.

Cheers,
Jonathan