Servers: The Right Compute

DataStax + HP Moonshot: Scale-out Database Meets Scale-out Infrastructure

Guest blog by Matt Kennedy, Architect, DataStax

 

There is a bit of confusion in the Big Data space about how different solutions in the ecosystem intersect with each other and with the infrastructure that they run on. This dynamic is evident in the prevalence of what some vendors position as "Big Data Boxes". These servers are overwhelmingly tuned for Hadoop workloads, with a lot of low-cost, high-capacity drives and high CPU core counts. For Hadoop workloads, those configs make sense. Hadoop overwhelmingly sits in the land of batch processing and high-density, cost-effective storage. At DataStax, we’re less concerned about capacity and more concerned with latency. Whereas Hadoop solutions are often configured with particular core-to-spindle ratios, for our workloads we’d prefer to avoid spindles entirely.

 

DataStax and HP have been working to validate and tune DataStax Enterprise (DSE) on the HP Moonshot dense server platform. DSE is an enterprise-grade database built on a foundation of Apache Cassandra, providing a scalable, always-on, low-latency operational database for web, mobile and IoT applications. Cassandra workloads are all about relatively small, low-latency reads and writes to structured database records.

 

The first observation we have to make is that we need fundamentally different storage. Low-latency databases (that aren’t in-memory only) require low-latency storage, and that means SSD. This is true for all low-latency databases, but Cassandra is particularly well suited to leverage the advantages of SSD. Cassandra is also perhaps the most effective scale-out platform available today. Cassandra clusters commonly number in the hundreds of nodes. When the software scales out that effectively, architecturally simple single-CPU servers can be employed. A single CPU with 4 or more cores provides plenty of parallelism for a single Cassandra node.

 

With dozens or hundreds of nodes, we also expect there to be hardware failures. Cassandra handles these gracefully. However, we have to consider the recovery scenarios when failures do occur. The more data that exists on a node at the time of failure, the longer the recovery time. So servers with lower per-node storage density than is typically encountered with Big Data Boxes are quite useful here. And since SSDs tend to be lower capacity than HDDs, this is actually a convenient dynamic. There are certainly those out there addressing these challenges by carving up larger boxes into partitions, but this introduces complexities around replica placement and resource isolation.
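To make the recovery argument concrete, here is a minimal back-of-the-envelope sketch. It assumes that rebuilding a failed node is dominated by re-streaming its data at a steady rate, so recovery time grows linearly with the data held per node. The function name, the data sizes and the 100 MB/s streaming rate are illustrative assumptions, not measurements from DSE or Moonshot.

```python
def rebuild_hours(data_per_node_gb: float, stream_mb_per_s: float) -> float:
    """Estimate hours needed to re-stream a failed node's data.

    Simplifying assumption: recovery time is data volume divided by a
    sustained streaming rate; real rebuilds also depend on compaction,
    network contention and cluster load.
    """
    seconds = (data_per_node_gb * 1000) / stream_mb_per_s  # GB -> MB
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical per-node data volumes, from a low-density node
    # to a high-capacity "Big Data Box".
    for data_gb in (500, 2000, 8000):
        hours = rebuild_hours(data_gb, stream_mb_per_s=100)
        print(f"{data_gb:>5} GB per node -> about {hours:.1f} h to rebuild")
```

Under these assumptions, a node holding four times as much data takes four times as long to bring back, which is why lower density per node shortens the window in which the cluster runs with a replica missing.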

 

Enter HP’s Moonshot platform, specifically the m710 cartridge, which combines a single CPU with an extremely fast M.2 form factor SSD. The storage density per node is relatively low, but there are so many nodes per chassis that we actually have relatively high physical density. DSE databases deployed on Moonshot can achieve 75% savings in terms of physical space and a staggering 90% power savings. This increase in architectural efficiency also provides a performance boost, supporting up to 1.7 times as many transactions per second as the conventional architecture on our standard benchmarks. From a physical perspective, Moonshot is also a joy to work with. Conventional clusters are often a spaghetti ball of cables. Moonshot nicely tucks 45 servers into a single 4.3U chassis. Each of those servers just plugs in with no additional cabling other than what is required to hook up the chassis. This also makes handling the occasional node failure simply a matter of replacing a cartridge.
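The chassis figures above translate into node counts with simple arithmetic. The sketch below uses the 45 servers and 4.3U chassis height from the text. The 42U rack height is an assumption (a common full-height rack), and the result is an upper bound, since real racks also need room for switches and power distribution.

```python
import math

SERVERS_PER_CHASSIS = 45  # from the text
CHASSIS_HEIGHT_U = 4.3    # from the text
RACK_HEIGHT_U = 42        # assumption: a common full-height rack


def chassis_per_rack(rack_u: float = RACK_HEIGHT_U,
                     chassis_u: float = CHASSIS_HEIGHT_U) -> int:
    """Whole chassis that fit in a rack, ignoring switches and PDUs."""
    return math.floor(rack_u / chassis_u)


def nodes_per_rack(rack_u: float = RACK_HEIGHT_U) -> int:
    """Upper bound on server nodes in one rack filled with chassis."""
    return chassis_per_rack(rack_u) * SERVERS_PER_CHASSIS


if __name__ == "__main__":
    per_u = SERVERS_PER_CHASSIS / CHASSIS_HEIGHT_U
    print(f"{per_u:.1f} nodes per rack unit")
    print(f"{chassis_per_rack()} chassis, {nodes_per_rack()} nodes per rack")
```

With these inputs, a single rack could in principle hold a Cassandra cluster of several hundred nodes, each still small enough to rebuild quickly.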

 

DSE on Moonshot is an architecturally elegant platform with very high compute and storage density, delivering TCO savings of up to 66%. Rather than having to carve up large NUMA boxes to achieve high datacenter efficiency, we can leverage a server platform that is "rightsized" for Cassandra workloads and is cost-effective and high-performance to boot. This not only improves quantitative aspects of database operation like TCO, density, efficiency and performance; from a qualitative perspective, it also gives operators a simple, building-block-like physical infrastructure. That conceptual simplicity is important to professionals who are tasked with running large and complex infrastructures.

 

A technical paper on the solution can be found here.

About the Author

DonnaSMartin

Donna is responsible for identifying training and certification market opportunities and developing strategies, positioning and content for HPE Storage and Networking portfolios. Donna joins the HPE Global Partner Enablement Certification and Learning team from the HPE Enterprise Group Content Marketing and Strategy team where she spearheaded development of customer-facing content at the business and thought leadership level, and as social media strategist for HPE Servers.