Around the Storage Block
katedavis

On-premises solution vs. off-premises public cloud: weighing the costs of big data storage

Understand the value of on-premises storage for big data analytics. This storage cost-benefit analysis compares an on-premises solution built on the HPE Elastic Platform with the off-premises public cloud offering AWS S3.

The comparison: Enterprises are increasingly moving to the public cloud for data storage, aiming for greater flexibility and scale while taking advantage of the potential cost benefits of a pay-as-you-go consumption model. Within this trend of moving to the cloud to lower storage costs, Amazon Web Services (AWS) S3 is generally viewed as the market leader in public cloud storage, and many enterprises perceive it as the most cost-effective option for storing cold data. For short-term storage needs or smaller data volumes, AWS S3 can be an attractive option for customers.

Hadoop Distributed File System (HDFS) Erasure Coding offers a compelling, cost-effective alternative to AWS S3. As explained in the blog Big Data update: Hadoop 3.0 HDFS erasure coding performance, the new erasure coding and hardware compression features in Hadoop 3.0 reduce the actual capacity needed to store HDFS workloads by a factor of 6, so cost savings are possible without moving to the cloud.

HPE Elastic Platform for Big Data Analytics vs. AWS S3

To address questions about the cost-effectiveness of on-premises solutions versus the trend of moving to the cloud, we have summarized the results of recent HPE testing. They show that storing data on-premises with HDFS Erasure Coding for a typical hardware lifecycle (3+ years) can cost less than storing the same amount of HDFS data on AWS S3.

For the HDFS infrastructure, we used a density-optimized storage tier from the HPE Elastic Platform for Big Data Analytics (HPE EPA) architecture. HPE EPA is a building-block model for delivering big data and analytics workloads, composed of independent, workload-optimized compute and storage blocks.

This comparison found that the HPE solution costs $0.457 USD per GB, compared to $0.756 USD per GB for AWS S3 (without data transfer costs), a savings of about 40% with the HPE solution. When AWS data transfer pricing is included, the AWS S3 cost rises to about $0.874 USD per GB (not including licenses for the RHEL and Hortonworks distributions, which would also need to be purchased), and the HPE solution saves about 48%.
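As a quick check on those percentages, the savings figure is one minus the ratio of the on-premises cost to the cloud cost. A minimal Python sketch, using only the per-GB figures quoted above (the helper name `savings_pct` is hypothetical, chosen for illustration):

```python
# Per-GB, 3-year costs quoted in this comparison (USD).
HPE_EPA_PER_GB = 0.457
AWS_S3_PER_GB = 0.756
AWS_S3_WITH_TRANSFER_PER_GB = 0.874


def savings_pct(on_prem: float, cloud: float) -> float:
    """Percentage saved by choosing on_prem instead of cloud for the same capacity."""
    return (1 - on_prem / cloud) * 100


if __name__ == "__main__":
    print(f"vs. AWS S3:                 {savings_pct(HPE_EPA_PER_GB, AWS_S3_PER_GB):.1f}%")
    print(f"vs. AWS S3 + data transfer: {savings_pct(HPE_EPA_PER_GB, AWS_S3_WITH_TRANSFER_PER_GB):.1f}%")
```

Running it prints roughly 39.6% and 47.7%, which round to the 40% and 48% stated above.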

The following graph shows the cost in USD per GB for the three options (HPE EPA, AWS S3, and AWS S3 with data transfer costs).

[Graph: price comparison, USD per GB]

HPE On-premises Solution

The HPE Big Data Analytics team tested the latest version of HDFS Erasure Coding with Hadoop 3.0. The on-premises solution consisted of HPE's Elastic Platform Architecture for Big Data Analytics – Balanced and Density Optimized (EPA BDO).

Below are configuration details for the storage tier (the compute tier and compute resources in AWS were not a factor in this evaluation):  

  • 17 HPE Apollo 4200 Gen9 systems
  • 4.76 PB raw capacity (17 nodes, each with 28 x 10 TB disks)
  • 1 HPE 42U G2 Shock Rack

Erasure Coding with RS(10,4) was employed for the exercise, resulting in 3.4PB of actual storage available. 
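The raw and usable capacity figures follow from the node count and the RS(10,4) layout, in which every stripe stores 10 data cells and 4 parity cells, so only 10/14 of raw capacity holds user data. The sketch below reproduces the 3.4 PB figure and, for comparison, what HDFS's default 3x replication would yield on the same disks. It assumes decimal units (1 PB = 1,000 TB) and ignores filesystem and OS overhead; the function names are hypothetical.

```python
# Storage-tier configuration from the list above.
NODES = 17
DISKS_PER_NODE = 28
DISK_TB = 10

# RS(10,4): each erasure-coded stripe has 10 data cells and 4 parity cells.
DATA_CELLS = 10
PARITY_CELLS = 4


def raw_tb(nodes: int, disks_per_node: int, disk_tb: int) -> int:
    """Total raw capacity in TB across all nodes."""
    return nodes * disks_per_node * disk_tb


def usable_tb_erasure_coded(raw: float, data: int, parity: int) -> float:
    """Capacity left for user data when every stripe carries `parity` extra cells."""
    return raw * data / (data + parity)


def usable_tb_replicated(raw: float, replicas: int = 3) -> float:
    """Capacity left for user data when every block is stored `replicas` times."""
    return raw / replicas


if __name__ == "__main__":
    raw = raw_tb(NODES, DISKS_PER_NODE, DISK_TB)
    ec = usable_tb_erasure_coded(raw, DATA_CELLS, PARITY_CELLS)
    rep = usable_tb_replicated(raw)
    # Decimal units assumed: 1 PB = 1,000 TB.
    print(f"raw:               {raw / 1000:.2f} PB")
    print(f"RS(10,4) usable:   {ec / 1000:.2f} PB")
    print(f"3x replica usable: {rep / 1000:.2f} PB")
```

This prints 4.76 PB raw, 3.40 PB usable with RS(10,4), and about 1.59 PB with 3x replication. Erasure coding alone therefore yields a little over twice the usable capacity of replication on the same hardware; the larger reduction described earlier also relies on hardware compression.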

Typically, to provision on-premises infrastructure, the following factors must be considered:

  1. Hardware procurement cost (Rack and servers)
  2. Hardware support cost
  3. Licensing cost for the appropriate Hadoop distribution (testing used Hortonworks)
  4. Power and cooling cost
  5. Space cost
  6. System administration cost
  7. Direct and Indirect Labor cost

HPE's EPA sizer tool provides information on factors 1 through 5 above. This calculation also included:

  • System administration related cost of $26,088 USD for 3 years ($199 USD per rack per month).
  • Direct and indirect data center labor costs and space costs are estimated at $31,189 USD.

This table details the cost components for the HPE EPA on-premises solution:

[Table: HPE EPA on-premises cost components]

AWS S3 Off-premises Solution

For the comparison with AWS S3 object storage, we used the same capacity of 3.4 PB over the same 3-year period.

For AWS S3 pricing, we used the rates published on the AWS website. The lowest-cost "standard" storage available was $0.021/GB/month.[2] Over 3 years (36 months), this totals $0.756/GB.
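The 3-year S3 figure is the monthly per-GB rate multiplied by 36 months. A minimal sketch, assuming the $0.021/GB/month rate cited above stays flat for the whole period and ignoring request and retrieval charges (the helper name is hypothetical):

```python
S3_STANDARD_RATE = 0.021  # USD per GB per month, lowest-cost "standard" tier cited in [2]
MONTHS = 36               # 3-year period used throughout this comparison


def storage_cost_per_gb(rate_per_gb_month: float, months: int) -> float:
    """Cumulative storage cost for one GB held for `months` months at a flat rate."""
    return rate_per_gb_month * months


if __name__ == "__main__":
    print(f"3-year S3 cost: ${storage_cost_per_gb(S3_STANDARD_RATE, MONTHS):.3f} per GB")
```

This prints a 3-year cost of $0.756 per GB, matching the figure used in the comparison.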

Conclusion

In summary, the HPE on-premises offering, based on the Apollo 4200, costs about 60% of the equivalent AWS S3 cost per GB. With AWS S3, you should also factor in the cost of accessing the data and of data transfers in and out, plus additional charges for support. A rough estimate shows that you may end up paying $276,480 over 3 years for data transfer (assuming 150 TB of data transfer per month at USD 0.05 per GB, the rate for volumes above 150 TB per month). This works out to about 17% of total cost of ownership.
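The data transfer estimate can be reproduced the same way. The $276,480 figure matches binary units (1 TB = 1,024 GB), so the sketch makes that conversion an explicit, labeled parameter. Like the estimate itself, it applies the single $0.05/GB rate to the whole monthly volume rather than modeling each AWS pricing tier; the function name is hypothetical.

```python
TB_PER_MONTH = 150  # assumed monthly transfer volume from the estimate above
RATE_PER_GB = 0.05  # USD per GB, the rate for volumes above 150 TB/month
MONTHS = 36         # 3-year period


def transfer_cost(tb_per_month: float, rate_per_gb: float, months: int,
                  gb_per_tb: int = 1024) -> float:
    """Total transfer cost in USD at a single flat per-GB rate."""
    return tb_per_month * gb_per_tb * months * rate_per_gb


if __name__ == "__main__":
    # Binary units (1 TB = 1,024 GB) reproduce the $276,480 estimate.
    print(f"binary units:  ${transfer_cost(TB_PER_MONTH, RATE_PER_GB, MONTHS):,.0f}")
    # Decimal units (1 TB = 1,000 GB) give a slightly lower figure.
    print(f"decimal units: ${transfer_cost(TB_PER_MONTH, RATE_PER_GB, MONTHS, 1000):,.0f}")
```

This prints $276,480 with binary units and $270,000 with decimal units, a reminder that the unit convention shifts such estimates by a few percent.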

To learn more about HPE's Elastic Platform architecture for Big Data analytics, check out HPE reference architectures.


[1] Industry costs obtained from "HPE On-Prem Price-Performance Beats Amazon Web Services (AWS)", https://h20195.www2.hpe.com/V2/GetDocument.aspx?docname=a00043038enw, April 2018

[2] Lowest-cost, region-based pricing: US West (Northern California), https://aws.amazon.com/s3/pricing/, April 2018

About the Author

katedavis

I have been working in the tech industry for over 15 years marketing hot topics including storage, software-defined, big data, hybrid cloud and as-a-service.