Behind the scenes at Labs

Making the mission to Mars compute



By Curt Hopkins, Managing Editor, Hewlett Packard Labs

Hewlett Packard Enterprise (NYSE: HPE) today introduced the world’s largest single-memory computer, the latest milestone in The Machine research project.

Today at the Newseum in Washington, DC, as part of an event titled “On the Launchpad: Return to Deep Space” hosted by The Atlantic, HPE will debut a functioning 160-terabyte prototype from our Machine computing project. Kirk Bresniker, chief architect of The Machine, will give an address that will stream live at 2:20 Eastern/11:20 Pacific. The Machine portion of the live event will feature a feed of the prototype in the Fort Collins lab as well as the Executive Dashboard.

In addition to Bresniker, speakers addressing the opportunities and obstacles of a manned flight to Mars will include scientific luminaries such as acting NASA administrator Robert Lightfoot, former NASA chief scientist Ellen Stofan, and Dr. Robert Zubrin, founder of the Mars Society. The event will be moderated by The Atlantic’s science editor Ross Anderson and the magazine’s Washington editor-at-large, and will be live streamed in its entirety from 1:00 to 5:00.

The Machine

The prototype being unveiled today possesses 160 terabytes of memory, capable of simultaneously working with the data of approximately 160 million books, or five times the collection of the Library of Congress. Based on the current prototype, we expect the architecture to scale to an exabyte-scale single-memory system and eventually to a nearly limitless pool of memory: 4,096 yottabytes, or 250,000 times the size of the entire digital universe today.
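A back-of-the-envelope check on those figures (a rough sketch; assuming binary terabytes and that the book estimate implies roughly a megabyte of text per volume):

```python
TB = 2**40                     # one terabyte, in bytes
YB = 10**24                    # one yottabyte (decimal), in bytes

prototype = 160 * TB           # the 160 TB prototype
books = 160_000_000            # ~160 million books
bytes_per_book = prototype / books          # works out to ~1 MB per book

future_pool = 4096 * YB        # the projected 4,096-yottabyte pool
digital_universe = future_pool / 250_000    # implies ~16 zettabytes of data today
```

The last line is consistent with contemporary industry estimates that put the world’s digital data in the mid-teens of zettabytes.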

“Blinking lights represent an accomplishment,” said Sharad Singhal, director of Machine applications and software, “but they also represent a huge, huge opportunity moving forward for all of us.”

The prototype incorporates several key elements of The Machine program:

  • 160 TB of shared memory spread across 40 physical nodes, interconnected using a high-performance fabric protocol
  • An optimized Linux-based operating system (OS) running on ThunderX2, Cavium’s flagship second generation dual socket capable ARMv8-A workload optimized System on a Chip
  • Photonics/optical communication links, including the new X1 photonics module, online and operational
  • Software programming tools designed to take advantage of abundant persistent memory

Each of the 40 nodes consists of two connected boards: a Fabric-Attached Memory board and a compute board. Each Fabric-Attached Memory board holds four Memory Fabric Controllers and 4 TB of Fabric-Attached Memory per node. Each compute board carries a node processor System-on-a-Chip, almost three terabytes per second of aggregate bandwidth, and a local fabric switch.
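Those per-node numbers add up to the headline figure; a quick sketch of the topology (the field names are illustrative, not HPE’s):

```python
# Model one node of the prototype as described: a Fabric-Attached Memory
# board paired with a compute board.
node = {
    "memory_fabric_controllers": 4,    # on the Fabric-Attached Memory board
    "fabric_attached_memory_tb": 4,    # 4 TB of memory per node
    "compute": "ARMv8-A SoC + local fabric switch",
}

nodes = [dict(node) for _ in range(40)]            # 40 physical nodes
total_tb = sum(n["fabric_attached_memory_tb"] for n in nodes)
assert total_tb == 160                             # 40 nodes x 4 TB = 160 TB
```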

"No computer on Earth can manipulate that much data in a single place at once," said Labs chief architect Kirk Bresniker. "And this is just our prototype."


Life on Mars?

"While computing technology has improved enormously since the Moon landing, the fundamental architecture underlying it all hasn't actually changed much in the last 60 years," said Bresniker. "And that is quickly becoming a problem. As a computer engineer and researcher, this is the thing that keeps me up at night, the idea that our current technology won't be able to deliver on our expectations for the future."

So instead of settling for that current technology, Labs and HPE have invented the future. 

According to Hewlett Packard Labs senior technical communicator Richard Lewington, there are four categories in which a Memory-Driven Computing system can help facilitate our journey to Mars: independence; hardening; space, weight, and power; and anomaly detection and situational awareness.


Independence

The ideal computing system for a journey to Mars will be smart enough to handle both predicted and unpredicted situations. That implies lots of memory to hold all the data created before the flight, during it, and on Mars, enabling an infinite what-if engine to do things like predictive maintenance and disaster preparedness. Today, “non-critical” data like video footage is thrown away. But you should keep everything, because you don’t know what history will be vital in the future. Memory-Driven Computing makes this possible.


Hardening

Anywhere without a proper atmosphere – open space and Mars among them – is very hard on computers because of radiation exposure. On the International Space Station, for instance, laptops are used solely for non-critical functions; radiation-induced blue screens of death twice a day are not unheard of.

A Memory-Driven Computing machine, on the other hand, would be inherently hardened against deep-space radiation. Cosmic rays can’t disrupt photons. They can cause data errors in DRAM by displacing electrons, but not in a technology like the memristor, in which entire atoms are moved.

Space, Weight, and Power

Power, and thus cooling, are critical to successful spaceflight. Not only are spacecraft power-constrained, but dumping the heat they produce is challenging, so every watt saved pays double. On the Space Shuttle, many computer systems were turned off completely except during certain critical events, in order to save power and obviate the need to disperse heat.

Space flight has always been constrained by computing power in an environment where a large number of things can go wrong, so there is a heavy reliance on precomputed scenarios. But what if those scenarios could be computed on board the spacecraft, in real time?

Because of non-volatile memory, you can power down a Memory-Driven Computing device and it won’t lose a thing. It can also store a more-or-less infinite amount of data at zero energy cost until a user needs to access it. The same goes for the processors: because the state of an application is held in memory, processors can be power-cycled without the application losing its place.
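That “power-cycle without losing your place” behavior can be mimicked on ordinary hardware with a memory-mapped file standing in for non-volatile memory. This is a toy sketch, not The Machine’s actual software stack, and the file name is made up:

```python
import mmap
import os
import struct

STATE_FILE = "app_state.bin"        # hypothetical persistent-memory region

def open_state(size=8):
    """Map a persistent region whose contents survive process 'power cycles'."""
    fresh = not os.path.exists(STATE_FILE)
    fd = os.open(STATE_FILE, os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, size)
    mem = mmap.mmap(fd, size)
    if fresh:
        mem[:8] = struct.pack("<q", 0)   # initialize the counter exactly once
    return mem

# Each run of this program picks up exactly where the last one left off:
mem = open_state()
progress = struct.unpack("<q", mem[:8])[0]
mem[:8] = struct.pack("<q", progress + 1)   # the update survives a restart
mem.flush()
mem.close()
```

Run the script twice and the counter resumes from the previous run, the same way an application whose state lives in non-volatile memory would resume after its processor is power-cycled.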

Photonic links are much lighter and smaller than copper wires. That makes a difference, given that you can only lift so much mass out of Earth’s gravity well and can only cram so much into those metal cans. Extremely dense memory arrays using photonics are on the horizon. All in all, the ability to make extremely small, extremely light computing devices increases when you use Memory-Driven Computing.

Anomaly detection and situational awareness

The ability to do anomaly detection on all systems makes a risky proposition much safer. In Memory-Driven Computing this happens by using a node independent of the overall system that is capable of divining the “last best state.” Such a system might help astronauts by giving them advance warning of a failing or compromised component.
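For flavor, anomaly detection over telemetry of this kind can be sketched as a rolling statistical check. This is a toy illustration, not HPE’s implementation; the window size and threshold are arbitrary:

```python
from collections import deque

def make_detector(window=50, sigmas=3.0):
    """Flag a reading as anomalous when it drifts more than `sigmas`
    standard deviations from a rolling window of recent values."""
    history = deque(maxlen=window)

    def check(value):
        anomalous = False
        if len(history) >= 10:                    # need a baseline first
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            std = max(var ** 0.5, 1e-9)           # floor to avoid divide-by-zero
            anomalous = abs(value - mean) > sigmas * std
        history.append(value)
        return anomalous

    return check
```

Fed a steady stream of nominal readings, the detector stays quiet; a sudden excursion from a failing component trips it immediately.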

Another historical limitation is instrumentation. Given the power required and data generated, you just can’t build telemetry into everything. Vehicles are also “display-limited.” Put those issues together and they make a flight crew “insight-limited.”

Astronauts will always want situational awareness. In space travel it is easy to find “edge cases,” situations that fall outside a pre-programmed decision structure and therefore breed bad decisions. With Memory-Driven Computing, the pre-planned responses can be continually updated, so there is no such thing as an edge case.

Finally, for Mars missions, the manned craft won’t be the only thing making the trip. Unmanned craft will be sent ahead to pave the way, carry supplies, and so on. These will have to deal with bandwidth and latency issues and will need much more autonomy.

The 40-node prototype will be the subject of both a Discover Theater session and a demo at HPE Discover 2017. Read more about the computing needs of a trip to Mars here.

Header image via NASA, J. Bell (Cornell U.), and M. Wolff (SSI); graphic via NASA

About the Author


Curt Hopkins, Managing Editor, Hewlett Packard Labs


How do you extend the addressability of memory beyond the 48 bits of the ARM CPU? 



(From Kirk Bresniker) We’ve already crossed that threshold with our prototype platform. At 160 TB of fabric-attached memory, it has over 10X the amount of physical memory that any one of the Cavium ThunderX2 ARM processors can address directly at any one instant. But this is a problem which has been solved before in computing, so we draw inspiration from the past. We take the 160 TB of fabric-attached memory and deal with it in 8 GB chunks.

Every one of the ARM processors’ physical address maps consists of firmware, some I/O devices, and a local 256 GB of memory directly attached to the processor. The balance of the address space can be populated from the fabric-attached memory; think of it as being able to install as many 8 GB DIMMs as the part can address. But these DIMMs are special. They are (pseudo) non-volatile, so they have contents before they are installed, and they can be added and removed dynamically without losing those contents – think of passing an 8 GB DIMM, contents and all, between systems instantly.
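The arithmetic behind that chunking scheme is simple to sketch (assuming binary units throughout):

```python
GB = 2**30
TB = 2**40

fabric_memory = 160 * TB           # total fabric-attached memory in the prototype
chunk = 8 * GB                     # granularity at which it is mapped in and out
chunks = fabric_memory // chunk    # 20,480 hot-pluggable 8 GB "virtual DIMMs"

local_memory = 256 * GB            # memory directly attached to each processor
```

So each processor sees its own 256 GB plus whatever subset of the 20,480 chunks is currently mapped into the remainder of its physical address space.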

Finally, just like normal DIMMs they can be dedicated to a single processor, but they can also be used simultaneously by every processor on the fabric, with multiple readers and/or multiple writers. Since all the data is cacheable, you obviously need to be careful, but some of us are old enough to remember non-coherent and stateless I/O devices back in the UNIX/RISC days, so we know how to handle those cases as well – and we now have some pretty interesting incentives to motivate the creation of lock-free algorithms.
