HPE Storage Tech Insiders

## Compression Ratios vs. Space Savings: A Numbers Game

This has come up in conversation a number of times since I joined Nimble, where data compression is such a key component of our system design.  It may be old news to a lot of folks but there's bound to be someone out there who'll benefit from this.

Thanks to the analytics engine behind InfoSight, we have really good real-world data indicating how much compression you can get out of typical enterprise workloads.  This is almost always expressed as a compression ratio such as 1.5X, which is really shorthand for 1.5 to 1 or simply 1.5:1.  But what most people want to know is, "By how much capacity will the data footprint of my workload be reduced?  How much space do I really need for this particular application once Nimble's compression has worked on my data?"

To answer this, I found it helpful to "rephrase" the compression ratio concept.  Mathematically, a data compression ratio is defined as the size of the uncompressed data divided by the size of the compressed data:

$$\text{Compression ratio} = \frac{\text{Uncompressed data}}{\text{Compressed data}}$$

So if an application had 3 TB of uncompressed data but it compressed down to 2 TB, the resulting compression ratio would be 3 TB / 2 TB = 1.5:1 or 1.5X.

But many people find it more useful to think of data compression in terms of space savings or reduction percentages, which you would calculate as follows:

$$\text{Reduction percentage} = \frac{\text{Uncompressed data} - \text{Compressed data}}{\text{Uncompressed data}} = 1 - \frac{\text{Compressed data}}{\text{Uncompressed data}}$$

Using our previous example, we calculate the space savings in going from 3 TB uncompressed down to 2 TB compressed as 1 - (2 TB / 3 TB) ≈ 0.33, or about 33%.
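For anyone who wants to convert between the two metrics quickly, here is a minimal Python sketch of the formulas above. The function names `ratio_to_savings` and `savings_to_ratio` are just illustrative choices, not part of any Nimble or InfoSight tooling.

```python
def ratio_to_savings(ratio: float) -> float:
    """Space savings as a fraction of the original footprint (0.33 means 33%)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1 - 1 / ratio


def savings_to_ratio(savings: float) -> float:
    """Compression ratio (1.5 means 1.5:1) for a given fractional space savings."""
    if savings >= 1:
        raise ValueError("savings must be below 100%")
    return 1 / (1 - savings)


# The example from the text: 3 TB uncompressed, 2 TB compressed.
ratio = 3 / 2
print(f"{ratio}:1 -> {ratio_to_savings(ratio):.0%} of the original footprint saved")
```

The two functions are inverses of each other, so a quoted savings percentage can always be turned back into a ratio and vice versa.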


### tmcmullin51

Nice post Taylor, very useful info!

Hi Taylor,

Although I am replying to a very old post, I think it still makes sense to do so, because anyone can drop into this conversation when looking for this information, as I did.

Anyway, the point I want to make is that your calculation, expressed as a percentage space savings, doesn't give the full picture the way the compression ratio does. A 33% space savings is not the final answer, because the saved space gets reused to store more data. The compounded savings would be much larger.

It is much simpler to think in terms of the compression ratio: a 1.5x compression ratio lets us store 1.5 TB of data on 1 TB of raw capacity, which is 50% more data. With 2x compression it is 100% more, with 4x it is 300% more, and so on. Ultimately what matters is \$\$ per GB. I hope that makes my point.

-Rajat
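The "how much more data fits" view in the comment above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the `raw_cost_per_gb` parameter is an assumed input, not a real price.

```python
def extra_data_percent(ratio: float) -> float:
    """Extra logical data that fits on the same raw capacity, as a percentage."""
    return (ratio - 1) * 100


def effective_cost_per_gb(raw_cost_per_gb: float, ratio: float) -> float:
    """Cost per GB of logical data, assuming a constant compression ratio."""
    return raw_cost_per_gb / ratio


# The ratios mentioned in the comment: 1.5x, 2x, and 4x.
for ratio in (1.5, 2, 4):
    print(f"{ratio}x -> {extra_data_percent(ratio):.0f}% more data on the same capacity")
```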

I see what you mean, Rajat, and I agree that compression ratio is a better way to express space savings.  The intent of my original post was simply to help people convert from one "marketing metric" to the other.

I don't see a way to reasonably quantify compounded space savings, as the calculation would have to be recursive and would only work as a formula if the compressibility factor were constant for all data coming into the system.  But if you have other ideas I'd love to hear them.

I understand your intent, but I wanted to add a careful evaluation, because we recently had a similar debate within our team, and it is easy to underestimate what a change in compression ratio results in. One has to be careful.

Here is the recursive, compounded savings calculation, assuming 4x compression and the same compressibility across all writes. Assume that in each pass we write an amount of data equal to the space left on the volume.

Pass 1: Write 1 unit of data results in 1/4 occupancy

Saving (S) = 3/4

Pass 2: Write to remaining 3/4 space again

S = 3/4 + (3/4)*(3/4)

And so on

S = 3/4 + (3/4)^2 + (3/4)^3 + ... (continuing to infinity)

This is a geometric progression (GP) whose sum is

S = (3/4)/(1 - (3/4)) = 3

So what this means is that if we keep compounding with a compression ratio of 4x, we get 3x savings, or 300% savings.

Where is the catch? With a constant compression ratio, it does not matter how much data we write in each pass, as it always yields 3x savings. So if we could fill up the volume in the first pass, i.e. write 4x the data to it, we would get 3x savings right away. But usage is always going to be incremental, which is why I mentioned compounding the savings.
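The pass-by-pass argument above can be checked numerically. The following is a rough sketch under the same assumptions as the comment (a volume of 1 unit, a constant compression ratio, and each pass writing exactly as much data as there is free space); `compounded_savings` is a hypothetical helper name.

```python
def compounded_savings(ratio: float, passes: int) -> float:
    """Total space saved, in units of volume size, after a number of write passes."""
    free = 1.0   # free space left on a volume of size 1
    saved = 0.0  # data written minus space actually consumed
    for _ in range(passes):
        written = free              # write as much data as there is free space
        consumed = written / ratio  # space that data occupies after compression
        saved += written - consumed
        free -= consumed
    return saved


# With 4x compression, the running total approaches ratio - 1 = 3.
for passes in (1, 2, 10, 50):
    print(passes, round(compounded_savings(4, passes), 4))
```

Under these assumptions the total converges to the ratio minus one, which matches the closed-form sum of the series.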

I think this is just a question of comparison semantics.  I'm comparing the data written (positive space) to a given storage capacity, and you're comparing the data not written (negative space) to the same capacity.

With a 4X compression ratio, you'd store 4 TB of data in a 1 TB volume, right?  That's a net space savings of 3 TB, which is 300% of the total space consumed (the 1 TB volume size).  But it's not 300% of the original data footprint (4 TB) - it's 75% of the original data footprint.

Agreed! Both views have value and both calculations are correct; no disagreement there. My point was that one needs to look at compression ratio both ways to evaluate its effectiveness. Since your blog covered only the 75% view, I thought I'd complete the picture with the 300% view as well, in case anybody stumbles upon this blog in the future (like I did).