Around the Storage Block

Don’t get duped with a dedupe-ratios-only approach to capacity efficiency

StorageExperts

By Priyadarshi (Pd), Product Line Manager, HP 3PAR Storage   @Priyadarshi_Pd

 

Here is a chart depicting deduplication ratios and the storage savings they represent. Something very interesting stands out: the incremental effect of higher dedupe ratios on storage savings falls off sharply as the ratio increases.

 

[Chart: Dedupe ratios and storage savings]

 

So while just a 2:1 dedupe ratio delivers 50% incremental savings (vs. 1:1), going from 4:1 to 5:1 delivers only 5% incremental savings.
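For anyone who wants to reproduce the chart's arithmetic, here is a minimal sketch (my own illustration, not a vendor tool): the savings fraction at a dedupe ratio r is simply 1 - 1/r, so each additional step up in ratio buys less and less.

```python
# Illustrative sketch of the savings math behind the chart.
def savings(ratio: float) -> float:
    """Fraction of capacity saved at a given dedupe ratio, e.g. 2:1 -> 0.5."""
    return 1.0 - 1.0 / ratio

for prev, curr in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 10)]:
    step = savings(curr) - savings(prev)
    print(f"{prev}:1 -> {curr}:1  adds {step:.0%} (total savings {savings(curr):.0%})")
# 1:1 -> 2:1 adds 50%; 4:1 -> 5:1 adds only 5%; 5:1 -> 10:1 adds 10%.
```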

 

Why is this so important to realize? Because vendors often brag about their high dedupe ratios. A 10:1 ratio sounds way cooler than 5:1, yet it results in just 10% additional savings. Is 10% additional savings bad? Of course not. Any savings are good. But one would be wise not to assume a 10:1 dedupe ratio for planning purposes, because risk rises sharply (while incremental savings shrink) when a storage capacity plan is built around high dedupe ratios. And while initial data sets might be very dedupe-friendly, there is no guarantee that repeated updates will keep the dedupe ratios intact.

 

Be aware of the storage overheads when looking at dedupe

Dedupe ratios are all well and good. However, don’t forget that dedupe is a means to an end, not an end in itself. The final goal is “storage efficiency.” And storage efficiency is not just about dedupe. If an architecture is inherently inefficient, a higher dedupe ratio might not help much. Let me dig into this a bit more.

 

Take, for example, two architectures, where architecture 1 has an overall overhead (RAID, metadata, sparing) of 25% and architecture 2 has an overall overhead of 40% or more. To make up for that 15-point up-front disadvantage, architecture 2 will have to deliver some crazy ratios. Think about it: when architecture 1 delivers just 4:1 savings (so 75% savings), architecture 2 will have to deliver 10:1 savings (90% savings) just to be on par with architecture 1 in terms of storage capacity efficiency. Would you want to bet your projects on a consistent 10:1 ratio?
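As a back-of-the-envelope sketch of that comparison (treating architectural overhead and dedupe savings as percentages of the same baseline, which is the simplification used above; the numbers are illustrative):

```python
def ratio_to_savings(r: float) -> float:
    """Dedupe ratio -> fraction saved, e.g. 4:1 -> 0.75."""
    return 1.0 - 1.0 / r

def savings_to_ratio(s: float) -> float:
    """Fraction saved -> dedupe ratio, e.g. 0.90 -> 10:1."""
    return 1.0 / (1.0 - s)

arch1_overhead, arch2_overhead = 0.25, 0.40   # RAID + metadata + sparing
arch1_ratio = 4.0                             # architecture 1 delivers 4:1

# Architecture 2 must offset its extra overhead with extra points of savings.
required_savings = ratio_to_savings(arch1_ratio) + (arch2_overhead - arch1_overhead)
print(f"architecture 2 needs ~{savings_to_ratio(required_savings):.0f}:1 "
      f"({required_savings:.0%} savings) to be on par")   # ~10:1
```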

 

Coming back to the chart above, it clearly shows the law of diminishing returns with higher dedupe ratios. So what can be done to improve storage efficiency? We believe the answer lies in focusing on every aspect of storage architecture to eventually result in a lower $/GB usable.

 

At HP Storage, that’s what we have been doing. Instead of only chasing higher dedupe ratios, we take a holistic approach called compaction that includes dedupe but does not stop there. Dedupe ratios are good but not enough. The diagram below shows our approach; all of the elements shown have already been delivered (and we are working on more elements that tie into this holistic approach of reducing $/GB usable).

 

[Diagram: Compaction - a multifaceted approach to efficiency]

 

Just as an example of how powerful this approach has been, with HP 3PAR StoreServ we typically see at least a 50% savings with our thin technologies, resulting in 50% reduction in $/GB usable. Then, with thin deduplication, even if you plan for just a 2:1 dedupe ratio, that’s 25% additional reduction in $/GB usable.

 

Next, with adaptive sparing, we are able to increase the usable capacity of SSDs by 20%, thereby obtaining roughly another 20% reduction in $/GB usable. Just with these three elements of compaction, the overall reduction in $/GB usable is roughly 80%! No wonder our 3PAR StoreServ 7450 all-flash array was the first AFA to make flash mainstream with its $2/GB usable price point. And this is just the beginning. With our flash-optimized architecture, we are able to take advantage of lower flash costs like few others can. We have embraced cMLC (which comes with a 5-year warranty at no additional support cost), and we already offer one of the biggest cMLC drives on the market (1.92TB SSD). We are on the leading edge of the SSD cost-decline curve, to the benefit of our customers.
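Here is a rough sketch of how those three compaction elements (thin technologies, a conservative 2:1 dedupe plan, adaptive sparing) compound. This is my illustrative arithmetic, not an official sizing tool; the exact math lands at about 79%, i.e. roughly the 80% figure above.

```python
# Each factor is the fraction of the original $/GB usable that remains
# after the step. Illustrative arithmetic only.
thin_remaining    = 1 - 0.50      # thin technologies: ~50% savings
dedupe_remaining  = 1 / 2.0       # conservatively planned 2:1 dedupe ratio
sparing_remaining = 1 / 1.20      # adaptive sparing: +20% usable SSD capacity

remaining = thin_remaining * dedupe_remaining * sparing_remaining
print(f"remaining $/GB usable: {remaining:.1%} "
      f"(overall reduction ~{1 - remaining:.0%})")   # ~20.8% remaining, ~79% reduction
```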

 

But price may not even be the most important reason you might like us! With our consistent sub-millisecond performance even under multi-tenant workloads, a very rich enterprise feature set, and a portfolio that spans all-flash, hybrid, and disk-based systems to match your varied needs to the right solutions, all within a single architecture, you would want to consider 3PAR flash not only because it can accelerate your business but also because it can keep your risks in check.

 

Check us out at:

www.hp.com/storage/flash

 

Interested in learning more? Check out these papers:

HP 3PAR StoreServ optimized for flash technical white paper

HP 3PAR StoreServ Architecture technical white paper

HP 3PAR StoreServ 7450 All-Flash datasheet

About the Author

StorageExperts

Our team of Hewlett Packard Enterprise storage experts helps you to dive deep into relevant infrastructure topics.

Comments

== disclaimer: Pure Storage employee ==

 

Priyadarshi

 

I think there’s an error in the correlation you are presenting, and an omission in addressing considerations around the lifecycle of NAND flash.

 

The market understands that NAND flash is more expensive than disk storage when measured in terms of $/GB – this form of cost is often referred to as the price per raw GB.

 

Raw prices are worthless, as the market also understands that storage architectures employ various forms of overhead to deliver capabilities like resiliency and to ensure performance. Storage capacity is lost to these capabilities, resulting in an increase in price per GB – this form of cost is often referred to as the price per usable GB.

 

Usable prices are how one looks at the legacy disk storage market. Data reduction technologies are the norm with all-flash arrays, and they reverse the loss of capacity by amplifying the effective capacity. Data reduction lowers the $/GB by ensuring that every block of data on the array is unique (this is data deduplication) and that each unique block requires less storage capacity than it logically occupies on the host (this is data compression).
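As a deliberately simplified illustration of those two mechanisms, here is a toy sketch; it is not how any shipping array implements them, just the dedupe-then-compress idea in miniature.

```python
# Toy illustration: dedupe stores one copy of each identical block, and
# compression shrinks each unique block before it lands on flash.
import hashlib
import zlib

blocks = [b"A" * 4096, b"A" * 4096, b"B" * 4096, b"B" * 4096, b"C" * 4096]

unique = {hashlib.sha256(b).digest(): b for b in blocks}             # deduplication
stored_bytes = sum(len(zlib.compress(b)) for b in unique.values())   # compression

logical_bytes = sum(len(b) for b in blocks)
# Toy data is extremely compressible, so the ratio here is unrealistically high.
print(f"logical {logical_bytes}B -> stored {stored_bytes}B "
      f"({logical_bytes / stored_bytes:.0f}:1 on this toy data)")
```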

 

*** Note: Thin Provisioning is not a data reduction technology – it is a means to dynamically allocate and return capacity on demand. It does increase efficiency but it does not reduce the amount of data stored on disk or flash storage. ***

 

Your correlation misses the mark in regards to effective capacity. A customer can either store 10TB of data on 10TB of usable capacity, or store 10TB of data on 2TB of usable capacity (with a 5:1 data reduction ratio). In the latter example, the customer is receiving 10TB of effective capacity for the price of 2TB of usable capacity.
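In code form, that effective-capacity math looks like this (illustrative numbers only):

```python
# Effective-capacity framing: usable capacity needed = data / reduction ratio.
data_to_store_tb = 10.0   # TB of host data
reduction_ratio  = 5.0    # 5:1 data reduction

usable_needed_tb = data_to_store_tb / reduction_ratio
print(f"{data_to_store_tb:.0f}TB of data fits on {usable_needed_tb:.0f}TB usable: "
      f"{data_to_store_tb:.0f}TB effective for the price of {usable_needed_tb:.0f}TB.")
```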

 

Summary: Data reduction directly correlates to customers purchasing less raw capacity.

 

Additionally, the market understands that NAND flash has a finite number of write cycles. There are two methods for increasing the life of a flash drive: 1) over-provision SSD capacity (i.e. provide more raw capacity than is addressable, allowing spare cells to replace failed cells over time), and 2) reduce the number of writes a flash drive must execute (i.e. avoid writing duplicate data (dedupe and array clones) and store all data in a dense format (compression)).

 

By eliminating redundant data writes – both from hosts and from SSD processes like garbage collection – SSD resiliency is increased multi-fold.

 

Summary: Data reduction directly correlates to SSD reliability.

 

I know we work for competing vendors, but I believe the market shares a common objective whether we are employed by Pure Storage, HP, EMC, NetApp or another… we all believe that innovation in areas like data reduction is enabling customers to adopt all-flash storage arrays today at a price on par with – and sometimes below – that of mechanical disk arrays.

 

All-flash is the new norm in storage arrays for performance-centric applications and data sets, and the future is even more exciting: as the raw price per GB of flash drops, new markets will open, like capacity-optimized storage for unstructured data. It’s an exciting time – keep up the good work!

 

--cheers,

Vaughn Stewart

@vStewed

http://FlashStorageGuy.com

The last two papers that you link to point to addresses that are not working.

 

BR

Thomas

I love blogs that get attention! 

 

I think the main point of the write-up is to invite customers to distance themselves from mere ratios and look at what storage efficiency really is.

 

The second point is really around the question “what’s in a ratio?”:

 

Every vendor uses a different approach for calculating and displaying savings: some present clear breakdowns, others don’t; some include zero detection and pattern removal, others don’t; sometimes RAID and metadata are included, sometimes not.

 

Below is a simple example:

 

20 images of 80GiB each for a small VDI deployment – host allocation ~1.34TB

 

3PAR – Dedup ratio displayed 1.85:1  - Data allocated on the array 130GB

Competitor – Data Reduction displayed 9:1 – Data allocated on the array 200GB

 

So the rhetorical questions are:

 

  • Which data reduction technology is more efficient?
  • And which approach empowers the customer to do accurate capacity planning?
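To make the “what’s in a ratio” point concrete, here is a quick hypothetical sketch: the same physical footprint on the array produces very different ratios depending on which numerator is divided by it. The “data written” figure is an assumed value for illustration, not a number from the example above.

```python
# Hypothetical sketch: same array footprint, different numerators, very
# different reported ratios. The data_written value is assumed for illustration.
allocated_on_array_gb = 130    # GB actually consumed on the array
data_written_gb       = 240    # GB the hosts actually wrote (assumed)
host_allocation_gb    = 1340   # GB provisioned to hosts via thin volumes

print(f"written / allocated:     {data_written_gb / allocated_on_array_gb:.2f}:1")
print(f"provisioned / allocated: {host_allocation_gb / allocated_on_array_gb:.2f}:1")
# One definition reports ~1.85:1, the other ~10:1, for the same footprint.
```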

 

I also find Vaughn’s comment around Thin Provisioning interesting, as Pure Storage advertises Thin Provisioning as part of “Total Reduction.” So if the claim is that Thin Provisioning is not a data reduction technology, what does “Reduction” stand for? :)

Ivan - interesting points, especially as the math doesn't work in either scenario.

 

-- cheers,

v

CalvinZito

The last two papers that you link to point to addresses that are not working.


 

 

Hi Thomas - they work for me. Not sure why, but every once in a while the links are flaky and work if you try again. Thanks for letting me know.

I think some of the confusion and disagreement here is around 3PAR's definition of deduplication and the analysis that follows that definition. Thus far in my experience, 3PAR is the only vendor who employs a ratio such as you described (~1372GB reducing to 130GB, resulting in a ratio of 1.85:1). Of course, the "3PAR vs. Competitor" comparison is missing a key piece of data: the actual data written (not the OS/image partition size of 80GB). Thus, the formulas that follow are a bit incomplete. Assuming the 1.34TB of "host allocation" is an indicator, though, then each VDI image is a fat 67GB.

 

As a customer, I think 3PAR is confusing the playing field and imposing a disadvantage on itself by using this non-intuitive dedupe ratio. When I look at XtremIO and Pure presenting dedupe or data reduction ratios of 1.7:1 and 4.5:1, I know that my physical capacity is being multiplied to a logical, usable space of 1.7x and 4.5x, respectively (i.e., 170GB fits into 100GB, and 450GB is squeezed down to occupy 100GB). It's simple and it makes sense.

 

If 3PAR wants to be more user friendly in the data analysis field, I would encourage the team to adopt this common methodology for calculation. 3PAR can still present the "Thin Built-In" data alongside the intuitive data reduction ratio, just as XtremIO shows users the dedupe and compression breakdowns under its total ratio.

 

Wrapping up, I just want to reduce the marketing spin out there. The graph at the top is spin in my opinion because it makes it look like 10:1 isn't noticeably better than 5:1 or 3:1. In my world, if 10TB of physical disks is all I need to hold 100TB of data, that's vastly better than needing 20TB or 33.3TB to hold that same 100TB of data. Hope that makes sense.
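Putting that framing into numbers (a trivial, illustrative sketch):

```python
# Physical capacity needed to hold a fixed amount of data at different
# data reduction ratios (the commenter's framing, illustrative only).
logical_tb = 100.0
for ratio in (3, 5, 10):
    print(f"{ratio:>2}:1 -> {logical_tb / ratio:.1f}TB of physical capacity")
# 3:1 -> 33.3TB, 5:1 -> 20.0TB, 10:1 -> 10.0TB
```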

--
Vaughn Stewart wrote: "Ivan - interesting points, especially as the math doesn't work in either scenario."
--

 

That is exactly my point: both sets of math are correct (at least I know ours is), but the assumptions about what is considered data reduction or deduplication are clearly different. So the only baseline a customer can use to draw a conclusion is to look at what is exported (let's call it thin provisioning for the sake of argument) vs. what is actually allocated on the storage array.

It's odd to me that there's all this interest in the dedupe ratio when the effect of what Vaughn is saying is to remove the need to know about it at all.

 

I should only care about $/usable GB for my workload. If I'm doing VDI, maybe I get great dedupe, if I'm doing database OLTP, not so much. Doesn't matter. How many $ did it cost to buy my 500GB of usable storage?

 

If I get 500GB of storage for my database for $50,000 (made up numbers), I don't care what the dedupe ratio is.

 

You, as vendors, care a lot, because better dedupe means you can deliver more usable capacity with fewer parts and keep the cost of your gear down. Maybe you keep the savings as better margins, maybe you pass them on to customers as a lower price.

 

 

Thanks all for your lively comments.

 

Hi Chris - On your comment on 10:1 vs. 5:1, my point is not to disparage the higher savings one could get. Higher ratios (and savings) are welcome. It is about risk: to achieve the same $/GB usable, a capacity plan built assuming a continued high ratio is riskier than one built with lower ratios. In the end, customers might still get the higher savings, but at least they avoid the up-front risk. Secondly, I agree with you - we need to do a better job of explaining 'what's in a ratio' and not let competitors bedazzle customers with high ratios. This is why the right conversation needs to be around which solution lets me accomplish my job in the most economical way and without adding risk. 

 

Hi Vaughn - Flash vendors have done a good job of keeping the discussion at what I call the "widgets" level: my ratio is better than yours, this tech is not data reduction vs. that tech is, I treat NAND as a touch-me-not queen, etc. And the number of capacity types is just insane: raw, post-RAID usable, post-RAID-and-metadata usable, post-RAID-metadata-and-spares usable, usable with dedupe/compression (which some refer to as effective), used (by host), provisioned (to host), and so on. 

 

Like I said above, the real question is only around the business objective that the storage solution is being designed for (all-flash or otherwise), the cost of achieving it, and the associated risk. The simple math used in the blog shows why a solution promising a high ratio can be more risky (and expensive). The technical architectural details are important as well, and this is why we have architecture whitepapers (could you point me to a Pure architecture whitepaper link if one is available now?).

 

 
