Exploiting the economics of density & scale for X-large NoSQL databases using HPE PMEM and Aerospike

Tilman · 08-08-2019

The Hewlett Packard Enterprise (HPE) Open Source Solutions R&D Lab (OSSL) develops solutions exclusively for the world’s hyperscale and extreme scale market. The work performed in this lab is cutting edge, sensitive, and thought-provoking, ranging from end-to-end pipeline development to algorithm engineering for some of the most profound artificial intelligence (AI) problems in existence. During development of global solutions for hundreds of petabytes to tens of exabytes, an understanding of international law, finance, accounting, and economics is just as important as security, programming, statistics, and mathematics. That said, as an open source lab, our charter is purpose-driven rather than profit-driven; our mission lies in bestowing a lasting public benefit on the global open source development community. Although the focus of OSSL is on designing effective “freemium licensing models” for HPE hardware, commercial products are explored and ultimately selected when the client mandates highly specific performance, security, financial, or regulatory requirements that eliminate open source options from consideration.

This was the case with Persistent Memory (pmem), a ground-breaking technical innovation that had an equally ground-breaking price tag attached to it. The quants clearly showed that any pmem solution would have to deliver cost savings: a tangible financial benefit that puts capital back into the customer’s budget, allowing this money to be spent elsewhere in the next fiscal year. Only well-crafted pmem solutions would meet this goal and prove fiscally viable.

However, this in itself presented a challenge because Intel® Optane™ DC Persistent Memory, the technology behind HPE Persistent Memory (along with nearly every other vendor’s Persistent Memory offering out there), was bleeding-edge and lacked any mature application integrations. This was a problem indeed because, on paper, it meant that pmem possessed almost no ability to pay for itself (that is, to generate a positive ROI) at hyperscale and beyond, which is the market that the OSSL team specifically develops for. Thus, a solution to this problem needed to be found, and quickly.

A 50-word summary for the Executive and Twitter communities

Pairing Aerospike Enterprise Server (AES) 4.5 and later with HPE Persistent Memory provides astonishingly high performance at tremendous horizontal and vertical scale. This in turn empowers the current hyperscale and emerging extreme scale markets with significant cost advantages due to their colossal density requirements. Moreover, a NoSQL “freemium licensing model” is projected to be more than 600X more expensive than this commercial NoSQL solution.

1,000 concise words for people with 20 minutes to spare

In order to truly appreciate the magnitude of recent NoSQL market forecasts, specific microeconomics theories—or the science behind how people and businesses make decisions—must be understood. The first theory is the concept of cost density savings, which seems to no longer possess a three-dimensional (3D) spatial proximity boundary, at least not for the NoSQL market. Instead, a new multi-dimensional aspect seems to apply to the economies of density, encompassing multi-cloud spatial properties. This represents the new hyperscale model, and it accommodates some incredibly large cost savings resulting from spatial proximity of suppliers.

Next, economies of scale must also be scrutinized to fully understand how the NoSQL market went from being an upstart, with revenue measuring in the hundreds of millions in 2012, to a mature billion-dollar industry less than a decade later. The underlying premise of economies of scale is relatively simple—unit costs decrease as scale increases. It represents the per unit cost advantages that companies can secure due to the size and scale of their business operations. With this definition in mind, let’s now take a short ride through the past ten years to see if a tech pattern can be established that would directly impact the economies of scale for the NoSQL market.

/* Begin Google Search Stats … This info does not count towards my 1,000 word limit

In 2006 AWS launched into a world that largely failed to recognize its significance. The next year Google released its Cloud Platform along with its Android OS, and Apple’s very first iPhone hit the market. In 2009 Uber launched its rideshare app, and the next year saw the release of Microsoft Azure and new Microsoft Windows phones. In 2011, Lyft launched its rideshare app, and Netflix changed how the world watched TV with its new streaming media service. Skipping ahead in time slightly, Amazon initiated the smart-home race in 2014 with the release of Echo, and Google Home caught up in 2016. [i]

End */

And now, in 2019, nearly every technology device has an embedded sensor that actively harvests data. And so begins our reliance on “things”, or the Internet of Things (IoT) to be more precise. All this mobile and web application, blog and session management, social networking and e-commerce data is ingested into NoSQL Key/Value stores. This in turn fuelled the recent hyperscale NoSQL storm. And it is all this amassed “things” data that now feeds the world’s AI machine.

With companies suddenly capable of collecting an absurd amount of data from a vast number of devices, the intense need for flexible databases drove up the demand for NoSQL technologies, which afforded massive scale. And with data now seen as a commodity, the more data that a company owns, the more leverage the company has over its global providers and suppliers. Thus, the hyperscale and emerging extreme scale markets can obtain cost advantages due to the immense size of their global operations.

By applying a multi-dimensional definition to the economies of density in order to reflect modern multi-cloud trends, the impact on the economies of scale becomes instantly transparent. It is these new hyperscale and extreme scale customers, powered by tremendous amounts of data, that forced an agglomeration between NoSQL vendors and strategic business partners. This assemblage was the only way that this market could achieve the synergies required for high-density server scale and database service provisioning at a lower overall cost to their business. With companies racing to improve their market position through AI and agile software development, NoSQL was hurled to the forefront of the database market, almost overnight.
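The economies-of-scale premise used throughout this section (unit costs decrease as scale increases) can be made concrete with a minimal sketch. The simple fixed-plus-variable cost model and the function name below are illustrative assumptions, not figures or methods from the OSSL research.

```python
def unit_cost(fixed_cost: float, variable_cost_per_unit: float, units: int) -> float:
    """Average cost per unit under a simple fixed + variable cost model.

    Assumption: total cost = fixed_cost + variable_cost_per_unit * units.
    The fixed cost is spread over more units as scale grows, so the
    average cost per unit falls toward the variable cost.
    """
    if units <= 0:
        raise ValueError("units must be a positive integer")
    return fixed_cost / units + variable_cost_per_unit


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    for n in (10, 1_000, 100_000):
        print(n, unit_cost(fixed_cost=1_000.0, variable_cost_per_unit=2.0, units=n))
```

In this toy model the per-unit cost never drops below the variable cost, which is why density (packing more work into each server) matters alongside raw scale.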

The research and technology behind the economics

With the emergence of Persistent Memory, an even greater improvement in NoSQL database scale suddenly seemed possible. The OSSL team spent the better part of 2019 researching and testing every open source NoSQL database project that possessed any kind of native pmem integration in order to determine the following:

The level of development required to get these projects production-ready
Proper placement for these various NoSQL/pmem solutions within our hyperscale and extreme scale designs

Some of the tested open source projects were further along than others, although none were viable candidates for the extreme scale accounts for which OSSL was specifically developing pmem solutions. Having exhausted all open source options, the project then branched out to commercial NoSQL databases with pmem integration, of which a quick Google search showed only one[ii]: Aerospike.

The initial Yahoo! Cloud Serving Benchmark (YCSB) tests were quick, with what is known as a “dirty benchmark” being performed. The sole purpose of a dirty-bench is to test the veracity of code. These types of “extraordinary” tests are necessary for developers to quickly determine which tools allow them to make the best use of their time. As a result, only a single pmem server was used for the entire YCSB test—which is about as far from a best practice YCSB as it gets. What these results ultimately showed is that pmem is simply too new; open source projects need more time to catch up to the technology. However, the commercial AES pmem implementation appeared borderline bullet-proof.

With the exception of AES, none of the tested projects are named in the following chart. This level of detail was deemed counterproductive to the scores of open source developers out there who donate their off-hours to improve projects for the greater community, from which they will likely receive no direct benefit.

Yahoo! Cloud Serving Benchmark (YCSB)

The AES pmem results were astonishing. A real-world test was performed using a best practice YCSB configuration, three production-grade HPE ProLiant DL380 Gen10 servers supporting NVMe using 1.5 TB of HPE Persistent Memory per node, and a high-performing AES cluster configuration, as observed in the following graphic.

It is important to understand what makes AES so unique in the pmem space. Unlike nearly every other NoSQL database, AES can place only indexes in pmem, sending all database data to disk. This gives AES both a tremendous cost and scale advantage. Additionally, the microsecond-level performance of AES combined with its vertical and horizontal scale capabilities also made this commercial NoSQL database stand apart.
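To put the index-only placement in perspective, here is a rough sizing sketch; it is not a figure from the OSSL tests. It assumes the 64 bytes per primary index entry that Aerospike documents, a replication factor of 2, and decimal terabytes. The function names are illustrative, and real deployments lose some capacity to overhead that is ignored here.

```python
INDEX_ENTRY_BYTES = 64  # assumption: size of one Aerospike primary index entry


def max_records_per_node(pmem_bytes: int, entry_bytes: int = INDEX_ENTRY_BYTES) -> int:
    """Upper bound on index entries that fit in one node's pmem."""
    return pmem_bytes // entry_bytes


def max_unique_records(nodes: int, pmem_bytes_per_node: int,
                       replication_factor: int = 2) -> int:
    """Upper bound on unique records across the cluster.

    Each record is indexed once per replica, so the cluster-wide entry
    budget is divided by the replication factor (assumed to be 2 here).
    """
    total_entries = nodes * max_records_per_node(pmem_bytes_per_node)
    return total_entries // replication_factor


if __name__ == "__main__":
    pmem_per_node = 1_500_000_000_000  # 1.5 TB per node, as in the test cluster
    print(max_records_per_node(pmem_per_node))   # index entries per node
    print(max_unique_records(3, pmem_per_node))  # unique records, 3-node cluster
```

Because only the index consumes pmem, the record count is bounded by entry size rather than by record size, which is where the density advantage comes from.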

Lastly, the improved reliability that AES and HPE pmem offer cannot be overstated, with node restart times now reduced to seconds. When AES database indexes are stored in HPE pmem, rather than in volatile memory, the indexes persist across reboots. Suddenly, warm restarts of a NoSQL database are possible. It is for these reasons that AES constitutes the NoSQL Key/Value Store foundation for the newest HPE extreme scale designs, as illustrated in the following graphic.

THERESA MELVIN

Chief Architect, AI-Driven Big Data Solutions

HPE Worldwide Solutions | Open Source Profession

Theresa runs the HPE Open Source Solutions R&D Lab in Fort Collins, Colorado. She is a certified geek who spends free time discreetly helping out other open source developers in the community. Theresa is a J.D. who is now working on a PhD in data science. As a senior pipeline developer, Theresa's expertise in end-to-end AI-driven designs runs deep. Her diversified background in STEM, business, finance, and law gives her a unique and rich insight into extreme scale solution development. Theresa is the proud wife of an Army vet and the mother of a vivacious three-year-old, who enjoys speaking Japanese and debugging Python code with her mommy.

Tilman Walker
Hewlett Packard Enterprise

twitter.com/HPE_Storage
linkedin.com/showcase/hpestorage/
hpe.com/storage

[i] The stats in this paragraph were collected in less than 80 seconds using targeted Google searches. Thanks, Google!

[ii] This Google search was performed at the end of February in 2019. Although Aerospike does offer a community version of its enterprise database, which is open source, their new pmem integration feature is only offered in their commercial database.
