Behind the scenes at Labs

HPE and Hortonworks collaborate to bring big-memory Spark to the enterprise


Pictured: Sriram Narasimhan, Tuan Bui, Jun Li, Mijung Kim, Alexander Ulanov, Manish Marwah, Hernan Laffitte, Haris Volos. (Not pictured: Carlos Zubieta, Tere Gonzalez, Janneth Rivera)

By Curt Hopkins, Managing Editor, Hewlett Packard Labs

Today, Hortonworks made a major announcement about Spark and HPE. Namely, Labs is helping make Spark better. Much better.

Motivated to make The Machine accessible to developers and to demonstrate performance and scale beyond existing barriers, a cross-Labs and business unit (BU) team led by Jun Li, Principal Research Scientist, set its sights on Spark for its in-memory focus. Apache Spark is a distributed in-memory analytics platform and the most active Apache project in big data.

“We wanted to test a hypothesis,” said April Slayden Mitchell, Director of Programmability and Analytics Workloads. “Can in-memory analytics perform better with big shared memory? We wanted to put Spark through the rigors to see if at The Machine scale we could surpass limitations of current memory bandwidth intensive workloads.” Possible use cases include genome sequencing, probabilistic graph inferencing, and network flow analysis – all of which require largely random access over irregular data structures whose total size can reach tens of terabytes or more.

Global shared memory

Today, said Li, “Spark uses disk-based storage for the data. Now, we read and write that data through globally shared memory. And that data is instantaneously accessible.” This also means a tremendous reduction in the time and energy required.

Spark currently communicates intermediate processing results via TCP/IP, a high-latency, low-bandwidth proposition, with a typical end-to-end latency of 0.1 milliseconds and only 10 Gb/s of bandwidth in a cluster environment. This new Spark offering, however, has a “write/read paradigm for sharing data over globally shared memory,” said Li. It is now a low-latency, high-bandwidth proposition, with a remote memory access latency of only 210 nanoseconds and remote memory access bandwidth of 32 GB/s.
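To put those figures side by side, here is a back-of-the-envelope sketch in Python. It uses only the four numbers quoted above and assumes that "Gb/s" means gigabits per second, that "GB/s" means 10^9 bytes per second, and that a transfer costs one latency hit plus size divided by bandwidth. The helper `transfer_time_s` and the 1 GB example block are hypothetical illustrations, not part of the Labs code. Under those assumptions, the latency gap works out to roughly 476x and the bandwidth gap to 25.6x.

```python
# Figures quoted in the article (units converted; see assumptions in the text).
TCP_LATENCY_NS = 100_000        # 0.1 millisecond, expressed in nanoseconds
TCP_BANDWIDTH_GB_S = 10 / 8     # 10 gigabits/s -> 1.25 gigabytes/s
SHM_LATENCY_NS = 210            # remote memory access latency
SHM_BANDWIDTH_GB_S = 32         # remote memory access bandwidth


def transfer_time_s(size_bytes, latency_ns, bandwidth_gb_s):
    """Hypothetical first-order model: one latency hit plus size / bandwidth."""
    return latency_ns * 1e-9 + size_bytes / (bandwidth_gb_s * 1e9)


# How many times lower the latency is, and how many times higher the bandwidth.
latency_ratio = TCP_LATENCY_NS / SHM_LATENCY_NS
bandwidth_ratio = SHM_BANDWIDTH_GB_S / TCP_BANDWIDTH_GB_S

# Illustrative only: time to move a 1 GB block of intermediate results.
block = 10**9
tcp_time = transfer_time_s(block, TCP_LATENCY_NS, TCP_BANDWIDTH_GB_S)
shm_time = transfer_time_s(block, SHM_LATENCY_NS, SHM_BANDWIDTH_GB_S)

print(f"latency ratio:   {latency_ratio:.0f}x")
print(f"bandwidth ratio: {bandwidth_ratio:.1f}x")
print(f"1 GB over TCP/IP: {tcp_time:.4f} s, over shared memory: {shm_time:.4f} s")
```

This simple model ignores protocol overhead, serialization, and contention, so it says nothing about real application speedups; it only shows the scale of the raw gap between the two sets of quoted numbers.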

“We used global shared memory to turn Spark into a true in-memory data processing platform,” said Li, smiling. “And it’s much, much, much faster.”

As the team validated their findings, they realized they had more than a platform for The Machine: they had a platform and hardware they could put in front of customers today.

That hardware was HPE’s Superdome X server.

A funny thing happened on the way to The Machine

“We are confirming here that scale-out and scale-up can both be of benefit to our customers,” said Mitchell. “With Spark, we’ve taken a scale-out platform and turned it into a scale-up platform while still maintaining the same user-level application programming interfaces, so that our customers can use familiar tools in new ways to go beyond current scale and performance limitations.”

By applying their approach for Memory-Driven Computing, Mitchell said, “We have demonstrated the value of large shared memory machines as extreme analytics beasts.”

Collaborating with Hortonworks – an industry innovator that creates, distributes, and supports enterprise-ready open data platforms – will allow Labs to contribute this code to the Apache Spark community. Customers will have access to the software, hardware, and support they need to keep up with their growing requirements for scalable analytics solutions.  

Spark can already run on HPE Superdome X today, Li noted, “but can later run on The Machine. The move will be an instantaneous change because the software is the same.”

Beyond creating a much-improved Spark, this process proved again that the move toward The Machine, while building toward a massive revolution in computer architecture, is producing radical improvements to existing technologies along the way.

Watch Labs Director Martin Fink talk about the partnership. 
