Optimize Elasticsearch performance and management with HPE GreenLake for File Storage

StorageExperts · 03-06-2024

Discover how HPE GreenLake for File Storage accelerates your Elasticsearch performance, while at the same time keeping your storage deployment and management streamlined and simple.

– Keith Vanderford, Storage Solutions, HPE

As its name implies, Elasticsearch is very flexible and can be deployed in many different ways. But what are the best ways to optimize performance and keep storage management from getting too complicated?

To answer these questions, we at HPE set up a configuration in our lab and tested it to provide recommendations for the optimal storage architecture for Elasticsearch deployments. Our lab findings demonstrated that HPE GreenLake for File Storage is a great choice for the cold and frozen tiers of Elasticsearch data. It gives your Elasticsearch deployment fast searches over cold data without unnecessarily complicating your storage management.

HPE Alletra 4110 Storage Servers with local NVMe storage pair well with HPE GreenLake for File Storage and are an excellent choice for the hot and warm tiers of Elasticsearch data. This combination enables your Elasticsearch deployment to meet all your performance needs. You get not only fast indexing of new data, but also fast, responsive searches across all your data tiers. Management of the entire environment is consolidated in one place, without the complication of separate management interfaces for your different storage tiers or server platforms.

The configuration used for testing

The environment we used during testing is shown in Figure 1. HPE compute nodes were used in a non-virtualized configuration to host all instances of Elasticsearch. All tiers of Elasticsearch data were stored on high-performance file-based storage provided by the HPE GreenLake for File Storage platform. The testing goal was to measure the performance characteristics and responsiveness of the HPE GreenLake for File Storage platform, and to validate its use for the cold and frozen tiers of Elasticsearch data.

Figure 1. Elasticsearch environment used for testing, with HPE GreenLake for File Storage

The workload used in testing was generated using Rally, the tool developed by Elastic for benchmarking Elasticsearch (https://esrally.readthedocs.io/en/stable/index.html). The Rally track we used simulates a typical customer workload consisting of HTTP access logs based on sample logs from the elastic.co website. It combines rate-limited indexing at varying levels with a fixed level of automated searches. Our purpose was to provide a realistic benchmark that included indexing of documents with concurrent queries for populating Kibana dashboards.

This workload included ingest of log data with simultaneous searches to simulate a real-world use case. Ingest was tested at varying levels up to 6.5 TB/day, with simultaneous searches across pre-existing indices as well as indices currently being written to. The search load ranged from 100 to 200 searches per second throughout the testing. With this load running, the HPE GreenLake for File Storage dashboard showed a peak write bandwidth of over 888 MB/s, with a peak read bandwidth of 350 MB/s.
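
For a sense of scale, the top ingest level can be converted to an average rate. The following is only a small sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and ingest spread evenly over 24 hours, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily ingest volume (TB/day) to an average rate (MB/s).

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    bytes_per_day = tb_per_day * 1e12
    return bytes_per_day / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    rate = tb_per_day_to_mb_per_s(6.5)
    print(f"6.5 TB/day is about {rate:.1f} MB/s on average")
```

This average is not directly comparable to the peak write bandwidth reported by the storage dashboard, because Elasticsearch also writes segment merges, replica copies, and relocated shards in addition to newly ingested data.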

Test results showed sub-millisecond read response times

Under the workload described above, observed read latency during testing consistently hovered around 0.25 milliseconds, with some spikes of up to 0.4 ms. This provided fast search times over Elasticsearch data stored on the HPE GreenLake for File Storage system. Consistent sub-millisecond response time ensures fast searches over data that extends beyond the warm tier of storage. Thus, you get the benefits of a tiered storage architecture, without the downside of performance degradation.

Accelerate your data with RDMA and nconnect

We tested with Linux servers using NFSv3 over Remote Direct Memory Access (RDMA). The advantage of RDMA is that it bypasses some of the buffering layers and socket limitations that are a normal part of connections using TCP as the transport protocol. This streamlines the movement of data between the servers and HPE GreenLake for File Storage. Not only is the overall throughput increased, but CPU utilization is reduced on both the Linux hosts and the storage since there is less protocol overhead on both ends of the connection.

In addition to using RDMA, we configured our servers to mount the NFSv3 file shares from HPE GreenLake for File Storage using nconnect. This is a mount option that allows the NFS driver to use multiple connections between the Linux server and storage port. We tested with an nconnect value of 8, which increased the usable bandwidth between the HPE compute nodes and the storage platform.
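
As an illustration of the mount options described above, here is a minimal sketch that assembles a Linux mount command for an NFSv3 share using RDMA and nconnect. The exact options used in our lab are not listed here, so treat everything in it as an assumption: the host name, export path, and mount point are placeholders, port 20049 is the port conventionally registered for NFS over RDMA, and whether nconnect can be combined with RDMA depends on the NFS client driver and kernel in use.

```python
def nfs_mount_command(server: str, export: str, mount_point: str,
                      nconnect: int = 8, use_rdma: bool = True) -> list[str]:
    """Build (but do not run) a `mount` argument list for an NFSv3 share."""
    # The Linux NFS client accepts nconnect values from 1 to 16.
    if not 1 <= nconnect <= 16:
        raise ValueError("nconnect must be between 1 and 16")

    options = ["vers=3", f"nconnect={nconnect}"]
    if use_rdma:
        # Assumption: the storage listens on the conventional NFS/RDMA port.
        options += ["proto=rdma", "port=20049"]
    else:
        options.append("proto=tcp")

    return ["mount", "-t", "nfs", "-o", ",".join(options),
            f"{server}:{export}", mount_point]


if __name__ == "__main__":
    # Hypothetical names; substitute the values for your environment.
    cmd = nfs_mount_command("storage.example.com", "/elastic/cold", "/mnt/es-cold")
    print(" ".join(cmd))
```

The function only builds the command so that it can be reviewed before being run with root privileges or translated into an /etc/fstab entry.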

When using both RDMA and nconnect, we achieved a peak write throughput of 888 MB/s, and peak read throughput of 350 MB/s. Read latency, as already mentioned, was 0.4 ms or less. The focus during our testing was read performance, since that has the most impact on how responsive searches are over cold data stored in HPE GreenLake for File Storage. Even though cold data is more static than hot or warm data, the cold tier still experiences a significant amount of write activity. As data ages and moves out of the warm tier, it is written to the cold tier. Movement of shards due to rebalancing operations generates additional write activity in the cold tier. HPE GreenLake for File Storage can easily support all the write activity in the cold tier and simultaneously achieve very low read latency to provide sustained fast response times for your queries.
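
The movement of aging data from the warm tier to the cold tier is typically driven by an Elasticsearch index lifecycle management (ILM) policy. The sketch below builds one such policy body; the phase ages, rollover thresholds, and priorities are hypothetical values chosen for illustration, not settings from our testing.

```python
import json


def build_ilm_policy(warm_after: str = "7d", cold_after: str = "30d",
                     delete_after: str = "365d") -> dict:
    """Return an ILM policy body with hot, warm, cold, and delete phases.

    All ages and thresholds are illustrative assumptions.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": "1d",
                                     "max_primary_shard_size": "50gb"}
                    }
                },
                # ILM relocates shards to the matching data tier when an
                # index enters the warm or cold phase.
                "warm": {
                    "min_age": warm_after,
                    "actions": {"set_priority": {"priority": 50}},
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
```

The resulting JSON is the kind of body that would be sent to the `_ilm/policy/<policy-name>` endpoint. Each transition into the cold phase is one of the sources of cold-tier write activity mentioned above.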

Recommended configuration optimized for performance and ease of management

The architecture we recommend is shown in Figure 2, and includes HPE Alletra 4110 Storage Servers for the hot and warm tiers of Elasticsearch data, and HPE GreenLake for File Storage for the cold and frozen tiers.

Figure 2. Recommended Elasticsearch environment for peak performance and ease of management

Optimized for performance

This optimized configuration takes advantage of the performance of HPE Alletra 4110 Storage Servers with local NVMe storage. These servers provide high performance computing and fast NVMe-based local storage for the hot and warm tiers of Elasticsearch data. The Kibana node and cold tier data nodes of the Elasticsearch cluster are HPE ProLiant DL360 Gen11 servers, while high-performance file-based storage for the cold tier of data is provided by HPE GreenLake for File Storage.
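
To make the tier layout concrete, here is a rough sketch of how each class of data node might be assigned its tier role and data path in elasticsearch.yml. The `node.roles` and `path.data` settings are standard Elasticsearch settings, but the directory names and node names are hypothetical, and the frozen tier is left out because it relies on searchable snapshots and needs additional repository settings.

```python
# Hypothetical layout: local NVMe for hot and warm, an NFS mount for cold.
TIER_LAYOUT = {
    "hot": {"roles": ["data_hot", "data_content"], "base_path": "/data/nvme"},
    "warm": {"roles": ["data_warm"], "base_path": "/data/nvme"},
    "cold": {"roles": ["data_cold"], "base_path": "/mnt/es-cold"},
}


def render_node_settings(tier: str, node_name: str) -> str:
    """Render the elasticsearch.yml lines that pin a node to one data tier."""
    layout = TIER_LAYOUT[tier]
    roles = ", ".join(layout["roles"])
    # Each node gets its own subdirectory, since Elasticsearch nodes
    # must not share a data path.
    data_path = f"{layout['base_path']}/{node_name}"
    return (f"node.name: {node_name}\n"
            f"node.roles: [ {roles} ]\n"
            f"path.data: {data_path}\n")


if __name__ == "__main__":
    print(render_node_settings("hot", "es-hot-01"))
    print(render_node_settings("cold", "es-cold-01"))
```

Keeping the mapping in one table makes it easy to see which tiers land on local NVMe and which land on the shared file storage.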

This configuration is optimized to give you fast response times for both indexing of new data in the hot tier of Elasticsearch, and searches of existing data across all your data tiers. It also streamlines and consolidates management for all the servers and storage into a single console that is intuitive, simple to use, and can be accessed from anywhere.

Streamlined environment management with one simple-to-use console

As Figure 2 shows, the servers and storage can all be managed through the HPE GreenLake Cloud Platform. Data Services Cloud Console (DSCC) provides management of the servers via HPE GreenLake for Compute Ops Management. HPE GreenLake for File Storage is easily managed through the same console, using HPE GreenLake Data Ops Manager. With the HPE GreenLake edge-to-cloud platform, environment management is consolidated in a single control plane, freeing you from the complication of multiple management consoles and applications. You get a cloud-native management experience for the entire environment that is simple, intuitive, and accessible from anywhere, on any device.

The entire solution is also available via a managed service, just like our other HPE GreenLake solutions. This consumption-based service helps you free up capital, increase business agility, and relieve your IT staff of the administrative tasks associated with managing your Elasticsearch deployment. Whether you choose a consumption-based managed service for your servers and storage or a self-managed implementation, HPE Alletra 4110 Storage Servers, HPE ProLiant servers, and HPE GreenLake for File Storage give your Elasticsearch deployment all the performance you need for fast indexing and responsive searches over all your tiers of data.

Scale seamlessly without performance limitations

The high-performance architecture used in HPE GreenLake for File Storage provides a modular solution that scales easily and seamlessly. Compute enclosures can be added to increase available bandwidth and processing capability to handle a higher volume of data, or an increase in concurrent users or the number and complexity of concurrent searches. Adding storage enclosures increases the available storage capacity of the system. These compute and storage enclosures can be added independently, allowing you to add just the resources you need, without the worry or expense of ending up with underutilized resources. As your data grows and retention times increase, you can expand compute or storage capacity without disruption – and in the most cost-effective way for your changing business needs.

To learn more about the great benefits of deploying Elasticsearch with HPE Alletra 4110 Storage Servers, HPE ProLiant servers, and HPE GreenLake for File Storage, read the technical paper: Optimize Elasticsearch performance and simplify management with HPE GreenLake for File Storage

Meet Storage Experts blogger Keith Vanderford, storage solutions engineer at HPE

Keith is on the worldwide storage solutions team at HPE with more than 15 years of experience with HPE storage products. For the past several years, he’s focused primarily on running data analytics software with HPE Storage.

Storage Experts
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage
