Servers: The Right Compute

HDFS Data Lake high-speed ingest into HPE Superdome Flex

ServerExperts

Ever wondered how fast you can load data into a large scale-up system like HPE Superdome Flex? Read on for insights into the ingest performance you can expect.

Companies looking to turn their critical business data into real-time insights need a solution that provides the performance and scalability to grow with their business. The unique scale-up architecture of HPE Superdome Flex allows you to scale from 4 to 32 sockets and up to 48TB of memory to handle your largest in-memory compute needs.

Ideal for accelerating data analytics and tackling AI and HPC workloads holistically, HPE Superdome Flex helps you tap into new business insights quickly and gain a competitive advantage.

Data comes in many sizes, and ingesting it efficiently into HPE Superdome Flex is key to gaining those faster insights.

Data coming from various data lakes typically includes large volumes that must be ingested before processing with the large in-memory compute capabilities of HPE Superdome Flex. Let’s walk through one example: ingesting large amounts of data from a Hadoop data lake.

For many customers the timeframe for ingestion is constrained by business requirements, so increasing the ingestion speed becomes critical. To achieve these high ingest speeds, data needs to be read directly and in parallel from all nodes hosting the Hadoop data lake; otherwise, any gateway nodes inserted between the Hadoop data lake and HPE Superdome Flex will most likely become performance bottlenecks or not make sense financially.
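To see why a gateway becomes the choke point, a quick back-of-the-envelope comparison helps. The Python sketch below is illustrative only; the worker-node count and link speeds are assumptions, not measured values.

# Illustrative bandwidth comparison -- node count and link speeds are assumptions
datanodes = 40                  # assumed number of HDFS worker nodes
node_nic_gbit = 25              # assumed 25G networking per worker node
gateway_nic_gbit = 100          # assumed single 100G gateway link

aggregate_gbit = datanodes * node_nic_gbit
print(f"Aggregate DataNode bandwidth: {aggregate_gbit} Gb/s")
print(f"Single gateway bandwidth:     {gateway_nic_gbit} Gb/s")
print(f"A single gateway caps ingest at {gateway_nic_gbit / aggregate_gbit:.0%} of the parallel read potential")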

There are multiple technology options for storing large amounts of data with HPE Superdome Flex, covering RAM, PMEM, NVMe, SAN, and NAS. Storing data on internal NVMe drives offers significant performance benefits along with space and power savings. It also provides up to 614.4TB of raw storage with current 6.4TB NVMe drives in a configuration optimized for high-speed ingest from Hadoop data lakes. Using 100G networking balances the performance provided by the internal NVMe drives. HPE Superdome Flex scales up in units of 4-socket chassis (up to 32 sockets), and each chassis can be configured with a 16-slot or a 12-slot PCI option, for a total of up to 128 PCI slots per system.
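As a quick sanity check, the drive and slot counts implied above can be reproduced with a few lines of Python; the figures are taken directly from the capacities and socket counts quoted in this post.

# Drive and slot counts implied by the figures above
raw_capacity_tb = 614.4
drive_capacity_tb = 6.4
print(raw_capacity_tb / drive_capacity_tb)        # 96 NVMe drives in the maximum configuration

max_sockets = 32
sockets_per_chassis = 4
chassis = max_sockets // sockets_per_chassis      # 8 chassis
slots_per_chassis = 16                            # 16-slot PCI option
print(chassis * slots_per_chassis)                # 128 PCI slots per system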

Typical current Hadoop data lake configurations are based on worker nodes with 10G or 25G networking and 20 or more nodes per rack, combined with redundant Top of Rack (ToR) switches providing six or eight 100G uplinks. Most of these uplinks are aggregated and connected to upper-layer spine switches, but usually at least two uplinks are available. The recommended configuration is to connect these two available uplinks from each rack to separate 4-socket HPE Superdome Flex chassis, thus distributing the HDFS racks across the chassis. Each rack can be connected to a single dual-port 100G network card. This provides the fastest links directly to the HDFS worker nodes, with the size of the HDFS cluster that can be connected limited by the number of PCI slots available in HPE Superdome Flex. It works well for HDFS clusters up to about 32 racks. For larger clusters, connecting through the spine-level switches is the alternative, providing access at unlimited cluster scale.
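A rough sizing check of the ~32-rack limit in Python; the number of PCI slots set aside for 100G NICs per chassis is an assumption for illustration, not a product limit.

# Illustrative check of the ~32-rack limit; NIC slot budget per chassis is an assumption
racks = 32
cards_per_rack = 1                       # one dual-port 100G card per rack, as described above
chassis = 8                              # 32 sockets / 4 sockets per chassis
assumed_nic_slots_per_chassis = 4        # assumption: slots reserved for 100G NICs
print(racks * cards_per_rack, "dual-port 100G cards needed for", racks, "racks")
print(chassis * assumed_nic_slots_per_chassis, "NIC slots available under this assumption")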

[Figure: HPE Superdome Flex - Hadoop]

 

For best performance, we recommend using two NVMe cards per 4-socket chassis, up to a total of 16 NVMe cards per system. The HPE 6.4TB NVMe x8 Lanes Mixed Use HHHL model is currently the highest-capacity and highest-performance NVMe drive we offer on this platform, delivering up to 6.1GB/sec for reads and up to 2.9GB/sec for writes. Using one HPE InfiniBand EDR/Ethernet 100Gb 2-port 841QSFP28 network card per 4-socket chassis ensures that networking is not a bottleneck in this scenario.
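A quick check, using the figures quoted above, that the network keeps pace with the drives; the Gb-to-GB conversion ignores protocol overhead, so treat the results as approximate.

# Per-chassis ingest ceilings from the quoted figures (approximate, overhead ignored)
nvme_write_gbps = 2.9                    # GB/s sustained write per 6.4TB NVMe drive
nvme_per_chassis = 2
drive_ceiling = nvme_per_chassis * nvme_write_gbps       # 5.8 GB/s per chassis

nic_ports = 2                            # 2-port 100Gb adapter
port_speed_gbit = 100
network_ceiling = nic_ports * port_speed_gbit / 8        # ~25 GB/s per chassis before overhead

print(f"Per-chassis NVMe write ceiling: {drive_ceiling:.1f} GB/s")
print(f"Per-chassis network ceiling:    {network_ceiling:.1f} GB/s")

With these numbers the drives, not the 100G links, set the per-chassis ingest ceiling.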

The table below summarizes the number of NVMe devices, top ingestion speed, and internal NVMe raw storage as a function of the number of CPU chassis used. It reflects the currently supported number of NVMe drives per system; this number can be increased through customer special requests and validation tests, up to the available physical slots.

[Table: NVMe devices, top ingestion speed, and internal NVMe raw storage per number of CPU chassis]
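The same table shape can be extrapolated from the per-drive figures quoted above; the sketch below is an approximation under those figures, so the numbers in the published table may differ slightly.

# Rebuild the sizing table from the per-drive figures above (approximate)
drive_write_gbps = 2.9        # GB/s write per NVMe drive
drive_capacity_tb = 6.4       # TB raw per drive
drives_per_chassis = 2        # currently supported configuration

print(f"{'Chassis':>7} {'NVMe drives':>12} {'Top ingest (GB/s)':>18} {'Raw storage (TB)':>17}")
for chassis in (1, 2, 4, 8):
    drives = chassis * drives_per_chassis
    print(f"{chassis:>7} {drives:>12} {drives * drive_write_gbps:>18.1f} {drives * drive_capacity_tb:>17.1f}")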

For these high ingest rates, the software stack and operating system should be properly configured to balance the workload across all physical devices. Ideally, an application running on HPE Superdome Flex reads HDFS data directly using multiple mappers and fully engages the parallelism offered by the Hadoop data lake and by local resources. The recommended option is therefore to make HPE Superdome Flex a compute block in the HPE Elastic Platform for Analytics (EPA) architecture and install Hadoop compute services on it.
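As an illustration of what reading HDFS data directly with multiple mappers looks like in practice, here is a minimal PySpark sketch. It assumes Spark is installed on HPE Superdome Flex as part of the Hadoop compute services; the namenode address, paths, and partition count are placeholders.

from pyspark.sql import SparkSession

# Assumes Spark/Hadoop client services are installed locally; host and paths are placeholders.
spark = SparkSession.builder.appName("hdfs-parallel-ingest").getOrCreate()

# Each HDFS block maps to (at least) one partition, so the read fans out
# across the DataNodes in parallel rather than funneling through a gateway.
df = spark.read.parquet("hdfs://namenode:8020/datalake/events/")

df = df.repartition(256)      # spread in-memory processing across the large core count
df.write.mode("overwrite").parquet("file:///nvme/ingest/events/")   # land on local NVMe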

This allows HPE Superdome Flex to access any data type directly from the HDFS data lake at high speed and also to augment the computing capabilities of the Hadoop cluster when needed. Many Hadoop ecosystem services are priced per node, so running those services on HPE Superdome Flex also provides cost benefits.

If installing Hadoop services on HPE Superdome Flex is not viable, Apache Sqoop2 or similar products can be used to export data from HDFS with a high degree of parallelism.
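For the "similar products" route, one client-side option is a small parallel puller built on libhdfs; the sketch below uses pyarrow and a thread pool, and the namenode host, paths, and worker count are illustrative assumptions.

from concurrent.futures import ThreadPoolExecutor
from pyarrow import fs

# Assumes libhdfs is available; host, paths, and thread count are placeholders.
hdfs = fs.HadoopFileSystem("namenode", port=8020)
local = fs.LocalFileSystem()

files = [info.path
         for info in hdfs.get_file_info(fs.FileSelector("/datalake/events", recursive=True))
         if info.type == fs.FileType.File]

def pull(path):
    # Stream one HDFS file onto local NVMe storage in 8 MiB chunks.
    with hdfs.open_input_stream(path) as src, \
         local.open_output_stream("/nvme/ingest/" + path.rsplit("/", 1)[-1]) as dst:
        while chunk := src.read(8 * 1024 * 1024):
            dst.write(chunk)

with ThreadPoolExecutor(max_workers=64) as pool:    # many streams in flight for parallelism
    list(pool.map(pull, files))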

In summary, HPE Superdome Flex can provide the performance and cost benefits needed for high-speed ingest from an HDFS data lake.

If you want to know more, please reach out to your HPE representative.


Meet Server Experts blogger Daniel Pol, Data and Analytics Architect. Dani is part of HPE’s Data & Analytics team, creating Solution Reference Architectures for Big Data landscapes.

 


Server Experts
Hewlett Packard Enterprise

twitter.com/HPE_HPC
linkedin.com/showcase/hpe-servers-and-systems/
hpe.com/servers

 
