An Oral History of The Machine—Chapter Three: Software
By Curt Hopkins, Managing Editor, Hewlett Packard Labs
The Machine is a computing architecture so radically different from any that has come before that it will affect everything we do in the future. Hewlett Packard Labs has spent the last five years developing the Memory-Driven Computing, photonics, and fabric technologies that have gone into The Machine and that have made the impossible inevitable.
We spoke to several dozen researchers – programmers, architects, open source advocates, optical scientists, and others – to construct a nine-part oral history of the years-long process of bringing about the most fundamental change in computing in 70 years.
These men and women are not only scientists, they are also compelling storytellers with an exciting history to relate. If you’re interested in how differently we will be gathering, storing, processing, retrieving, and applying information in the near future, or you just enjoy good stories about science and discovery, read on.
If you would like to read other entries in the series, click here.
Director, Machine Applications and Software. Involved in The Machine program since it was announced.
Martin instinctively understood that The Machine had the potential to change the way we think about computing, but he needed evidence. “The Machine is good,” he said. “Now show me what it is good for.” He charged us with delivering an interesting application on The Machine that would be 10x better than what was currently possible.
It wasn’t a straightforward task, and it took time to figure out. All applications assume that both memory and compute are limited to the confines of a single server, so the only way to scale an application is either to build large scale-up servers, or to scale out horizontally by adding servers to a cluster. Scale-up architectures do not give me the compute I need. In scale-out, the communication traffic kills me.
Here we had the exact opposite. Memory in the new architecture is centralized so I don’t have to spend time shuffling data back and forth using I/O channels within the cluster. Each CPU can look at any part of global memory, removing much of the communication overhead. However, The Machine retains the benefit of compute scalability that distributed clusters provide me. I can attach as much compute as I need around the global memory.
This was not originally obvious, because we carry a lot of implicit assumptions based on years of experience on the old architecture. We struggled through a lot of apps to get here. We spent almost a year circling around the solution. Once people got it into their heads that we had no memory limit, lots and lots of projects started to come to the fore.
One of the first things we looked at was Hadoop. With The Machine, we have so much bandwidth that we should be able to run it better. But when we tried, it was often slower, and at best it ran only 2X faster. So our team went deep into Hadoop to figure out what the problem was. After a number of investigations, we found two bottlenecks. Hadoop requires all-to-all communication within the cluster during the shuffle phase of many applications; we had the ability to simply exchange pointers to data in memory rather than moving the data itself. In addition, individual Hadoop nodes do not handle the data set sizes that we were using, so we had to redo the memory management within Hadoop. Once we did that, we created a version of Spark on Hadoop in our environment that was 10X-20X faster than the unmodified version running on the same hardware.
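The pointer-exchange idea can be sketched in ordinary Python: in a shared-memory model, a shuffle can group records by passing around references to them rather than serializing and copying the records between nodes. This is an illustrative sketch only, not The Machine's actual Hadoop modification; the function name `shuffle_by_key` and the record layout are invented for the example.

```python
from collections import defaultdict

def shuffle_by_key(records):
    """Group records by key by storing references, not copies."""
    buckets = defaultdict(list)
    for rec in records:
        # rec is not copied; the bucket holds a reference to the same object,
        # which stands in for "exchanging pointers" in shared global memory
        buckets[rec["key"]].append(rec)
    return buckets

records = [{"key": "a", "val": 1}, {"key": "b", "val": 2}, {"key": "a", "val": 3}]
buckets = shuffle_by_key(records)

# Each bucket entry is the very same object as in the input list:
# no record data moved, only references were exchanged.
assert buckets["a"][1] is records[2]
```

In a conventional cluster shuffle, each of those records would be serialized and sent over the network; with one global memory, the grouping step reduces to bookkeeping.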
Another thing we realized was that we could index more easily because of the memory we had available. Frequently, data indices are maintained in memory because they are accessed often and in random order. Many algorithms avoid precomputing results and creating large indices because of memory limitations.
Within The Machine, however, we can trade off space for speed. Instead of redoing the computation every time, we created larger indices that are memory resident, reducing the compute to a look-up. We have found a large class of problems in search, optimization, and machine learning that can benefit from these kinds of changes. With smaller-memory architectures, we must choose whether an application is small and fast or big and slow. The Machine architecture allows us to convert this class of applications to be big and fast.
HPE Fellow, Deputy Director of Hewlett Packard Labs
What software workloads would benefit from The Machine architecture? And how would we show the level of improvement we were charged with delivering? How could we select projects that would not only highlight the improvements The Machine gave but also point to solutions previously deemed unsolvable with conventional architectures?
Director, Systems Software for The Machine
At the end of any big analytics operation, you have to sort on something (name, star rating, and so on). The Machine’s memory fabric testbed will be able to match any current world record holder in sorting, but it will be able to do in one rack what the record holder needs 40 racks to do.
It’s one thing to hypothesize, another to see something run at that performance level. Our in-memory database significantly outperformed anything that had been built before, by up to 100x.
Graph speed opens up the possibility of tackling problems you couldn’t before – malware in an enterprise, bioinformatics, traffic and congestion, national security – real-world scale problems.
How does one create software that can exploit larger and larger memory systems? Those systems will also get smaller, so at some point your laptop or mobile device will give you access to terabytes of non-volatile memory. That won’t be very interesting if you don’t have the software to exploit it. With the right software, and that phenomenal amount of storage in your hand, you won’t have to search the Internet. Using today’s algorithms, certain big data searches take weeks. With The Machine, you’ll be able to do them in near real time.
Director, Programmability and Analytics Workloads
The hardware and software focus for The Machine started with two clear and bold challenges: make Memory-Driven Computing real, and demonstrate its undeniable value. Our researchers and data scientists are never ones to back down from a challenge, so they came together to deliver not just one, or a few, but a stream of relevant examples, proof points, and working implementations that went beyond 10X performance to achieve results previously impossible.
One of those key Machine-capable workloads was graph inference, an approach that is a critical aid to decision making and decision makers. As a decision maker, you look at the observed variables and make a choice. If the state of those variables were different, you might make a different choice. Inference allows you to take advantage of the observed variables to identify unknowns.
Consider the complexity of a huge web of interconnected decision threads, where the states of each choice are constantly changing and there are many more unknowns than knowns. The current approach is to brute-force the answers by increasing demands on processing power. Taking a Memory-Driven Computing approach, our teams learned that we could infer probabilities at an unprecedented scale and make 100X improvements over the state of the art.
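As a rough illustration of inferring unknowns from observed variables, here is a standard two-variable textbook example (Rain -> WetGrass): build a joint distribution, condition on what is observed, and read off the posterior for the unknown. This is a teaching sketch with invented probabilities, not HPE's Memory-Driven Computing inference engine; real graph inference does this over webs of millions of interdependent variables.

```python
from itertools import product

# P(rain) and P(wet | rain) for a tiny two-node model (invented numbers)
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: 0.9, False: 0.1}

# Enumerate the full joint distribution P(rain, wet)
joint = {
    (r, w): p_rain[r] * (p_wet_given_rain[r] if w else 1 - p_wet_given_rain[r])
    for r, w in product([True, False], repeat=2)
}

def posterior_rain(wet_observed):
    """P(rain | wet = wet_observed): condition the joint on the observation."""
    num = joint[(True, wet_observed)]
    den = num + joint[(False, wet_observed)]
    return num / den

# Observing wet grass raises our belief that it rained above the prior
assert posterior_rain(True) > p_rain[True]
```

Brute-force enumeration like this scales exponentially with the number of variables, which is why inference at real-world scale is the hard part the text describes.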
Our projects were designed to really run, and to completely flex The Machine in terms of compute and memory bandwidth. We created a Memory-Driven Computing software engine capable of providing decision makers with the answers they needed in real-time.
HPE Fellow, Deputy Director of Hewlett Packard Labs
The reason The Machine was so compelling to me was that it united many of my professional experiences in the development and understanding of computer architecture. Over my career as an engineer and architect, I have contributed to and led design teams in software, firmware, VLSI, and system development for a broad range of products, from real-time operating systems to mission-critical servers.
During my time in BCS under Martin Fink, a group of technologists, including Kirk Bresniker, would periodically meet with Martin to discuss emerging technology and industry trends and to do deep dives on architectural concepts. A small advanced development software team began testing some of the performance hypotheses of what is now our Memory-Driven Computing (MDC) architecture. These included open data management frameworks, in-memory data management performance measurement, and application acceleration frameworks for ASICs, GPUs, and FPGAs.
After becoming head of Labs and CTO of HP, Martin was in a position to fund a larger advanced development program centered on MDC. At this point, both Kirk Bresniker and I reunited with Martin in Labs to lead the effort. We had an accomplished team of researchers working in this space, but we needed a broader team of technologists and developers to actually build a working prototype.
The model we ultimately created was unusual for the company, at least in terms of scale. We took researchers from Labs and put them together with subject matter experts from across the business units. The success of that effort speaks to the overall strength of the company. No startup could do it, and very few companies of any size would have the expertise to fundamentally alter the architecture of computing and actually prove it and build it out.
To read the other chapters in the series, click here.