Around the Storage Block

Reduce your overall storage footprint and TCO with HPE GreenLake for File Storage

The advanced data reduction in HPE GreenLake for File Storage optimizes your Elasticsearch storage footprint and helps reduce your TCO. Learn how.

– By Keith Vanderford, Storage Solutions Engineer, HPE

In American culture, a familiar children's tongue-twister asks this question: How much wood would a woodchuck chuck? A similar question might be applied to modern storage platforms, although it does not roll off the tongue as poetically: How much data reduction could a storage system achieve? The amount of reduction attained by a storage platform has a direct impact on the storage capacity required for your data, and thus affects your overall TCO. The higher the reduction factor, the smaller the capacity needed to store the data, and the lower your total cost of storage will be.

We tested the data reduction capability of HPE GreenLake for File Storage with several different workloads which represent a variety of data types. Each data type we tested is commonly used across a broad spectrum of businesses. Our results demonstrated that the advanced data reduction of HPE GreenLake for File Storage can reduce your storage footprint, leading to a lower TCO.

Among storage vendors and platforms across the industry, you'll find some pretty bold claims about the amount of data reduction that's possible. But how do you sort through the claims and figure out which ones are accurate and which are exaggerated? How do you know what you can expect to achieve in your use cases and with your specific data types? Some of those claims really do stand up to the scrutiny of real-world experience. But others can be based on highly reducible data that includes a lot of duplicate or very similar entries. It's a little like the advertising claims automobile manufacturers make about getting a certain number of miles per gallon for fuel-powered cars, or a specified range per charge for electric cars. Those figures are usually achievable, but sometimes only under carefully controlled, ideal conditions. What you get in the real world is sometimes a little less glamorous than the marketing claims. The same could be said for some of those claims about data reduction.

What can you really expect to get for data reduction with your data and your applications in your environment? What impact does the data reduction ratio have on the amount of storage capacity you need in your environment? We're here to help you sort through the clutter and determine what you might reasonably expect in your own environment, with your own data. Our aim is to help you develop some realistic expectations about the amount of data reduction you might actually achieve in real-world use cases, not ideal circumstances with highly optimized datasets.

To test this, we did not use highly compressible datasets. Instead, we used a collection of datasets that represent common real-world use cases. Our testing was done using Elasticsearch, with data from the cold tier written to HPE GreenLake for File Storage. The goal of this testing was to determine the data reduction capability of the HPE GreenLake for File Storage platform, which is recommended for Elasticsearch's cold and frozen data tiers. Because our focus was on HPE GreenLake for File Storage, we did not monitor the data reduction of the other storage platforms used for the hot and warm data tiers.
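
For readers less familiar with Elasticsearch's data tiers, here is a minimal sketch of how an index lifecycle management (ILM) policy moves aging indices to the cold tier, using the official Python client. The policy name, timings, and connection details are illustrative assumptions, not the exact configuration used in our testing.

    from elasticsearch import Elasticsearch

    # Connection details are placeholders; point this at your own cluster.
    es = Elasticsearch("https://localhost:9200", api_key="<your-api-key>")

    # A minimal ILM policy: roll over the hot index daily (or at 50 GB),
    # then move indices to the cold tier after 7 days. In recent Elasticsearch
    # versions, entering the cold phase automatically routes shards to
    # data_cold nodes (the tier backed by HPE GreenLake for File Storage
    # in a setup like ours).
    policy = {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            "cold": {"min_age": "7d", "actions": {}},
        }
    }

    es.ilm.put_lifecycle(name="logs-cold-demo", policy=policy)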

Workloads represented commonly used data types

The workloads used in testing were generated using Rally, the tool developed by Elastic for benchmarking Elasticsearch. We used several different tracks in Rally to represent a variety of data types and workloads. A track in Rally is a specification for one or more benchmarking scenarios that includes a specific collection of documents or entries. The track defines the indices to be used in Elasticsearch, as well as the data files and operations to be invoked. Each track represented a different type of data with unique workload characteristics, and was chosen to represent a commonly used data type.
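
As a concrete illustration, a single benchmarking run (a "race" in Rally terms) against an existing cluster can be launched as shown below. The track name and flags are typical examples that assume Rally is installed and a cluster is already running; they are not our exact invocation.

    import subprocess

    # Run one Rally race with the http_logs track (a webserver-log workload)
    # against an existing Elasticsearch cluster. "benchmark-only" tells Rally
    # not to provision Elasticsearch itself.
    subprocess.run(
        [
            "esrally", "race",
            "--track=http_logs",
            "--target-hosts=localhost:9200",
            "--pipeline=benchmark-only",
        ],
        check=True,
    )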

The tracks used consisted of event-based data, a general-purpose logging workload, webserver logs, Q&A posts from a popular software user forum, security workloads, and Kubernetes pod metrics. At first these tracks were tested individually, with all data deleted between rounds of testing. In subsequent testing, several tracks were run simultaneously from different servers to simulate a workload of mixed data types. For each data type tested, the data reduction ratio (DRR) was determined using the HPE GreenLake for File Storage Onboard UI.

Results observed during testing

By default, Elasticsearch compresses data before writing it to the storage platform, regardless of the source or type of data. The purpose of this compression is to optimize storage and network usage and strike a balance between performance and overall cost for your Elasticsearch deployment. To the greatest extent possible, we used the default configurations for Elasticsearch and Rally in testing. This meant the LZ4 compression codec, which is the Elasticsearch default, or the best_compression codec, which was hard-coded in some of the Rally tracks we used. The data was therefore already compressed by Elasticsearch before being written to the HPE GreenLake for File Storage platform, making it difficult to compress further. Even so, the advanced data reduction in HPE GreenLake for File Storage achieved an additional DRR of 1.1:1 to 1.4:1, as described below, letting the system hold 10% to 40% more data in the same physical capacity. This additional reduction helps shrink your storage footprint, lowering your overall storage costs and thus reducing TCO.
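
For reference, the codec is an index-level setting in Elasticsearch. Here is a minimal sketch with the Python client (the index name and connection details are illustrative) of creating an index that uses best_compression instead of the LZ4 default:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("https://localhost:9200", api_key="<your-api-key>")

    # index.codec defaults to "default" (LZ4); "best_compression" trades some
    # speed for a smaller on-disk footprint. The codec can only be set when an
    # index is created or while it is closed.
    es.indices.create(
        index="logs-demo",
        settings={"index": {"codec": "best_compression"}},
    )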

The data reduction ratio observed during testing ranged from 1.1:1 to 1.4:1, depending on the specific characteristics of the data type. The track containing a dump of questions and answers from a software user forum had the lowest data reduction, at 1.1:1; because of the varied content and inconsistent formatting of the posts, that data type is harder to reduce beyond what Elasticsearch has already done. The event-based dataset did slightly better, at 1.2:1. The streamed logging messages and the webserver logs saw the greatest additional data reduction in our testing, with a DRR of 1.4:1 for each of these logging data types. As might be expected, the test with mixed data types showed that the combined data could not be compressed much more than the individual data types: the observed DRR for the mixed workloads ranged from 1.2:1 to 1.4:1, depending on what percentage of the whole each data type contributed. These ratios may not seem like much for use cases with small amounts of data. But at a DRR of 1.4:1, the same physical capacity holds 40% more data, and that savings really adds up in enterprise implementations where the volume of indexed data can grow to hundreds of terabytes or even petabytes.
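
To put those ratios in concrete terms, here is a quick back-of-the-envelope calculation; the 500 TB dataset size is just an example. Note that a 1.4:1 DRR shrinks the physical footprint by about 29%, which is the same thing as fitting 40% more data into the same capacity.

    # Physical capacity required = logical data size / data reduction ratio (DRR).
    logical_tb = 500  # example dataset size; substitute your own

    for drr in (1.1, 1.2, 1.4):
        physical_tb = logical_tb / drr
        saved_pct = (1 - 1 / drr) * 100
        print(f"DRR {drr}:1 -> {physical_tb:.0f} TB physical ({saved_pct:.0f}% smaller footprint)")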

Keep in mind that with data reduction, your specific results will depend heavily on the content and characteristics of your data. In other configurations with different types of data, the amount of data reduction can differ from what we observed in this testing. As with those claims from the automobile manufacturers, your mileage may vary. But armed with what we observed in testing with the real-world workloads described above, you should be well on your way to determining how HPE GreenLake for File Storage can help you reduce your storage footprint and minimize your overall TCO.

In addition to advanced data reduction, the HPE GreenLake for File Storage system is optimized for fast query response times. Management of your storage is simple and intuitive with HPE GreenLake Data Services Cloud Console, and it can be combined with management of other HPE storage and computing platforms in your environment in a single pane of glass. Together, these HPE computing and storage platforms help you increase the efficiency of your Elasticsearch deployment and lower your TCO.

For more information about data reduction and the other great benefits of deploying Elasticsearch with HPE GreenLake for File Storage, HPE Alletra 4110 Storage Servers, and HPE ProLiant servers, read the white paper: Optimize Elasticsearch performance and simplify management with HPE GreenLake for File Storage


Meet Storage Experts blogger Keith Vanderford, Storage Solutions Engineer, HPE

Keith is on the worldwide storage solutions team at HPE with more than 15 years of experience with HPE storage products. For the past several years, he's focused primarily on running data analytics software with HPE Storage.


Storage Experts
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

About the Author

StorageExperts

Our team of Hewlett Packard Enterprise storage experts helps you dive deep into relevant data storage and data protection topics.