Behind the scenes @ Labs

Neuromorphic Computing: a closer look at Labs research previewed at HPE Discover 2016


Cat Graves & Naveen Muralimanohar

By Curt Hopkins, Managing Editor, Hewlett Packard Labs

If you attended HPE Discover 2016 last week, or have been catching up on replays, you may have watched Hewlett Packard Labs’ “Tomorrow Show,” which featured an exclusive first look at our neuromorphic computing research. Here we take a deeper dive into this exciting emerging technology.

Naveen Muralimanohar, a principal researcher in Foundational Technologies at Hewlett Packard Labs, together with his team, has taken a major leap in computing. Their “neuromorphic accelerator” – demonstrated by researcher Cat Graves in an exciting Discover session – exploits the very process of accessing memory to perform in-situ analog computation, vastly increasing the speed and energy efficiency of a critical mathematical operation.

“Our team looks at tech trends with an eye on how they affect system architecture,” Muralimanohar told Behind the Scenes. As Moore’s Law grinds to a halt, we continue to generate data at an exponential rate, and new categories of data arrive with very different computational constraints.

“For example, every self-driving car will likely generate around two petabytes of data every year and most of these data will get processed through a neural network once in real time and might never get reused,” said Muralimanohar. “Certainly, we cannot afford to have a mini server rack in each car. Also, we cannot rely on communicating back and forth to a data center for every operation.”

A new function for a new type of memory

With memory bandwidth a major bottleneck in traditional machines, building new hardware primitives directly into the memory itself allows a computer to scale more gracefully while making computation faster and more energy efficient. But a computer’s architecture has to change to permit this new use of memory and to take maximal advantage of it. The Memory-Driven Computing architecture of The Machine is ideally suited to incorporate accelerators that operate locally and directly on stored data.

“In a typical memory, you activate one row at a time in a grid of cells to perform load/store operations,” Muralimanohar explained. “But in emerging resistive memories, such as the Memristor, activating multiple rows in parallel lets the access operation naturally manifest as an analog multiply-accumulate of a matrix and a vector. The memory array’s contents, which naturally represent a matrix, need not be moved across several layers of caches to perform the computation. But there is also a flip side: you have to precisely tune each cell, and each cell can only store a few bits.”

In practical terms, the ability to perform vector-matrix multiplication – or simply matrix multiplication – this rapidly and efficiently would have a profound impact on many computationally intensive applications that are commonplace today. These algorithms lie at the heart of image filtering and recognition, speech recognition, neural networks, and machine learning.
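
To make the idea concrete, here is a minimal numerical sketch of the multiply-accumulate a crossbar read performs. The names and values are ours, purely for illustration: conductances stand in for the stored matrix, row voltages for the input vector, and the column currents that result are the vector-matrix product.

```python
import numpy as np

# A toy model of a resistive-crossbar read (illustrative only).
# Each cell's conductance G[i, j] stores one matrix entry; driving
# all rows at once with voltages v makes each cell contribute a
# current G[i, j] * v[i] (Ohm's law), and the currents summing down
# each column (Kirchhoff's current law) are the vector-matrix product.

def crossbar_read(G, v):
    """Column currents from one multi-row read: I[j] = sum_i v[i] * G[i, j]."""
    return v @ G

G = np.array([[1.0, 0.5],
              [0.2, 0.8],
              [0.4, 0.1]])      # stored matrix, encoded as conductances
v = np.array([0.3, 0.7, 0.5])  # input vector, applied as row voltages

print(crossbar_read(G, v))     # [0.64, 0.76] -- computed in the memory itself
```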

To get from this thesis to a functional accelerator, Muralimanohar’s team had to answer the following questions.

  • Can we exploit this process to replace complex functional units?
  • If we did, what kind of precision could we get out of analog computation?
  • How limiting could the precision be for emerging applications, especially when each Memristor cell can handle only 5-6 bits?
  • Ultimately, would it improve the speed and the efficiency to a spectacular degree?

It turns out that, with the right microarchitecture, the answer to each of those questions is favorable: yes, the memory can replace complex functional units, the precision is workable, and the gains are indeed spectacular.

“In fact, with our scalable design, we can not only overcome the limitations mentioned above, but also provide varying bit precision with different energy-delay tradeoffs, even within the same application,” he said. This is in stark contrast to the current approach of designing a data path and functional units for a specific word size and maximum precision across an entire processor. While that approach has served well in the past, it is a poor fit for big-data workloads, especially emerging machine learning applications.
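
One way to picture how low-precision cells can still deliver high-precision results, and how precision becomes tunable, is bit slicing: spread each weight across several cells, run one low-precision pass per slice, and recombine the partial results with shift-and-add. The sketch below is our own illustration of the general idea, with hypothetical slice widths; it is not the team’s actual microarchitecture.

```python
import numpy as np

BITS_PER_CELL = 2   # hypothetical per-cell precision
NUM_SLICES = 4      # 4 slices x 2 bits = 8-bit weights

def slice_matrix(W):
    """Split an unsigned-integer weight matrix into low-precision
    slices, least-significant slice first."""
    mask = (1 << BITS_PER_CELL) - 1
    return [(W >> (s * BITS_PER_CELL)) & mask for s in range(NUM_SLICES)]

def sliced_mvm(W, x):
    """W @ x via one low-precision pass per slice, recombined with
    shift-and-add. Using fewer slices trades precision for energy/delay."""
    acc = np.zeros(W.shape[0], dtype=np.int64)
    for s, Ws in enumerate(slice_matrix(W)):
        acc += (Ws @ x) << (s * BITS_PER_CELL)   # one crossbar pass per slice
    return acc

W = np.random.randint(0, 256, size=(4, 4))  # 8-bit weights
x = np.random.randint(0, 16, size=4)
assert np.array_equal(sliced_mvm(W, x), W @ x)
```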

By employing this new design, Muralimanohar believes we can achieve performance more than an order of magnitude beyond that of CMOS ASICs.

Recognizing a breakthrough

“My work is about looking into fundamentally different technologies with potential for huge effects on the community,” Muralimanohar said. In the past, Muralimanohar has worked on other foundational technologies such as Memristor memories and optical interconnects. “The architecture community is very interested in game changing technologies.” As is Labs.

We talked to Stan Williams, HPE Senior Fellow and a director in the Foundational Technologies group at Labs, about the team’s work. “The work on the Dot Product Engine [the prototype’s code name] astonished me,” Williams said enthusiastically. “Naveen has shown that one can break a vector-matrix multiplication using Memristors all the way down to binary arithmetic, pull all of the resultants back together, and still beat a digital CMOS ASIC in terms of speed and power. I never would have thought that possible.”
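
Williams’s “all the way down to binary arithmetic” can be read as taking the same slicing trick to the inputs as well: feed the input vector one bit per cycle, so every analog pass sees only binary inputs, then shift-and-add across cycles. Again, this is a hypothetical sketch of the general idea rather than the Dot Product Engine’s actual design.

```python
import numpy as np

def bit_serial_mvm(W, x, input_bits=4):
    """W @ x with binary inputs only: cycle b applies bit b of every
    input, and the per-cycle results are recombined by shift-and-add."""
    acc = np.zeros(W.shape[0], dtype=np.int64)
    for b in range(input_bits):
        x_bit = (x >> b) & 1        # one binary input plane per cycle
        acc += (W @ x_bit) << b     # weight each cycle by its bit position
    return acc

W = np.random.randint(0, 64, size=(3, 3))
x = np.random.randint(0, 16, size=3)   # 4-bit inputs
assert np.array_equal(bit_serial_mvm(W, x), W @ x)
```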

Muralimanohar has co-authored a paper on this work in collaboration with the School of Computing at the University of Utah. Entitled “ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars,” it will be presented at the International Symposium on Computer Architecture (ISCA), 2016.

With this paper, Muralimanohar joins other heavy hitters – most of whom are far older than him – in ISCA’s informal Hall of Fame, which recognizes authors with eight or more papers in ISCA.

“Doing so is quite an achievement for anyone, but especially so for someone as young as Naveen,” added Williams.

Muralimanohar will present his work at ISCA's 2016 symposium in Seoul, South Korea, June 18-22.

If you haven't had a chance to see this exciting technology in action, watch “The Tomorrow Show: three new technologies from Hewlett Packard Labs” below.

About the Author

Curt Hopkins

Managing Editor, Hewlett Packard Labs

Comments
Santhana Ganesan

Naveen's feat is to be applauded! Though I'm a novice in the field of computing, I could understand that what Naveen has done could lead to the elimination of high-capacity servers in the future! As one who was part of his childhood days, I'm proud that he achieved this breakthrough! Congrats to him and the team!
