
Giving purpose to AI: Deep reinforcement learning

The field of artificial intelligence has grown by leaps and bounds in recent years, already providing new capabilities that we see used in real life.

Smartphones can recognize our faces to identify us as their proper owners. We can now translate street signs in a different language using an app. And we are starting to have computers that can respond to voice commands. The future seems bright, with talk of autonomous vehicles, robots, and advances in many different fields, from medical diagnostics to automated factories.

Yet many of the applications we’ve seen are driven by single events. Some examples: Is the image shown that of a cat? Given a word, translate it into English. Execute a given command, such as “Turn on the light.” Deep learning techniques have been responsible for many AI applications like these, but fundamentally, deep learning is task-oriented. After learning from a pool of data, the model is able to make a decision or prediction when presented with new data. After seeing thousands of cat pictures, a model “learns” to recognize cats within an image. The usage pattern for deep learning is typically serial, one independent question after another: Does picture one have cats in it? Does picture two have cats in it? And so on.

Reinforcement learning works at a higher level. Instead of handling single events, it enables sequences of decisions. It is goal-oriented, which allows higher-level problems to be solved. For example, reinforcement learning has enabled computers to play some games better than humans, with Atari video games and, most famously, the game of Go as examples. These games are bounded in the sense that there are distinct rules, and the environment is well defined. For example, a Go board is a 19x19 grid, and the rules clearly specify what actions can be taken on each move.

However, reinforcement learning can also be used in unbounded scenarios, such as teaching a robot how to walk and navigate obstacles, or teaching autonomous vehicles how to merge in traffic.

What characterizes reinforcement learning?

Reinforcement learning involves a learning agent interacting with its environment to achieve a goal. It has three distinguishing characteristics:

  1. Being closed-loop in an essential way, where the learning agent’s actions can influence its later inputs
  2. Not having direct instructions as to what actions to take
  3. Having the consequences of actions play out over extended time periods.

The agent must be able to sense the state of the environment to some extent, be able to take actions that affect the state, and have a goal or goals relating to the state of the environment.
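To make the closed loop concrete, here is a minimal sketch in Python. The `Corridor` environment, its reward values, and the two policies are hypothetical illustrations invented for this example, not part of any particular library. The point is only that each action changes the state the agent senses next, and that the reward for reaching the goal arrives after several steps.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the move.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one closed-loop episode; return (total reward, steps taken)."""
    state = env.reset()
    total_reward = 0.0
    for steps in range(1, max_steps + 1):
        action = policy(state)                  # agent acts on what it senses
        state, reward, done = env.step(action)  # the action shapes later inputs
        total_reward += reward
        if done:
            return total_reward, steps
    return total_reward, max_steps


def always_right(state):
    return 1


def make_random_policy(seed=0):
    rng = random.Random(seed)
    return lambda state: rng.choice([-1, 1])


print(run_episode(Corridor(), always_right))
print(run_episode(Corridor(), make_random_policy()))
```

Note that nothing tells the agent which action is correct. It only sees states and rewards, which is the second characteristic listed above.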

How is reinforcement learning different from supervised learning?

Reinforcement learning differs from supervised learning in that there is no set of labeled examples provided by a knowledgeable external supervisor. The object of supervised learning is for the system to extrapolate or generalize its responses so that it acts correctly in situations not present in the training set. This approach doesn’t work well with interactive learning, as it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. Consider that the game of Go has about 10 to the power of 170 possible board configurations. It would not be possible to collect an example of the right move for each of them. The agent has to be able to learn from its own experience.

Reinforcement learning differs from unsupervised learning in that the object of unsupervised learning is typically to find structure hidden in collections of unlabeled data. Reinforcement learning tries to achieve a goal, using a reward system, instead of trying to find hidden structure. Reinforcement learning can be considered a third machine learning paradigm alongside supervised and unsupervised learning.

Why rewards are a central feature of reinforcement learning

Central to reinforcement learning is the concept of rewards. The agent can receive either positive or negative rewards for its actions. The agent must therefore be able to assign values to actions so that it maximizes its rewards over time. In the simplest case, this is done using a table that stores a value for each action in each state of the environment.
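One common way to fill in such a table is tabular Q-learning. The sketch below is an illustration under stated assumptions, not a reference implementation: the five-state corridor, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)      # move left or move right


def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == GOAL
    return next_state, (1.0 if done else -0.1), done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # the table: (state, action) -> estimated value
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            # Mostly pick the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Value of the best follow-up action (zero once the goal is reached).
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the table entry toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
            if done:
                break
    return q


q = train_q_table()
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

The discount factor `gamma` is what lets a reward received at the goal flow back to the earlier states that led there, so the table ends up valuing the whole sequence of moves rather than only the last one.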

The problem is that this approach has practical limits, both in the size of the table and in the time and data needed to fill it in. As mentioned above with Go, completing the table may not be possible, and there will always be states that have never been experienced before. The key issue is that of generalization; the agent must be able to generalize from previously experienced states to ones that have never been seen.
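A quick back-of-the-envelope calculation shows how fast a table grows. The sketch below is illustrative only: the 4 bytes per value and the 361 actions per state are assumptions made for the arithmetic, and the 3-to-the-power-of-361 figure is an upper bound that also counts illegal positions, which is why it is larger than the roughly 10-to-the-power-of-170 figure quoted earlier.

```python
BOARD_POINTS = 19 * 19   # 361 intersections on a Go board
STATES_PER_POINT = 3     # each point is empty, black, or white

# Upper bound on board configurations (includes illegal positions).
upper_bound = STATES_PER_POINT ** BOARD_POINTS
digits = len(str(upper_bound))


def table_bytes(n_states, n_actions, bytes_per_value=4):
    """Memory for a dense table with one value per (state, action) pair."""
    return n_states * n_actions * bytes_per_value


toy = table_bytes(n_states=5, n_actions=2)           # a five-state corridor
go = table_bytes(n_states=10 ** 170, n_actions=361)  # assumed sizes, see above

print(f"3**361 has {digits} digits")
print(f"toy table: {toy} bytes")
print(f"Go-sized table: on the order of 10**{len(str(go)) - 1} bytes")
```

No amount of hardware closes a gap of that size, which is why the agent has to estimate values for states it has never stored.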

Going deeper: Deep reinforcement learning defined

This need to generalize is a severe problem and, left unsolved, would limit the applicability of reinforcement learning to problems with small numbers of states and actions. However, reinforcement learning methods can be combined with existing generalization methods. One such method has been highly developed over recent years, namely, deep learning. Deep learning is designed to recognize patterns in data, and it can be used to help the reinforcement agent learn how to better assign values to actions. Thus, we have deep reinforcement learning.
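The idea of swapping the table for a learned function can be sketched in a few lines. To keep the example short and dependency-free, a two-weight linear model stands in for the deep neural network; that substitution, the ten-step corridor, the reward of -1 per step, and the learning rate are all assumptions made for illustration. The model is trained only on some states, then asked about a state it has never seen.

```python
GOAL = 10  # hypothetical corridor with states 0..10, goal at state 10


def episode_return(start):
    """Return of an always-move-right policy with a reward of -1 per step."""
    return -float(GOAL - start)


def features(state):
    # A bias term plus the position scaled to [0, 1].
    return (1.0, state / GOAL)


def predict(weights, state):
    return sum(w * x for w, x in zip(weights, features(state)))


def fit(train_states, epochs=2000, lr=0.1):
    """Fit the weights to observed returns with stochastic gradient descent."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for s in train_states:
            error = episode_return(s) - predict(weights, s)
            weights = [w + lr * error * x for w, x in zip(weights, features(s))]
    return weights


train_states = [0, 2, 4, 6, 8]
table = {s: episode_return(s) for s in train_states}  # lookup-table version
weights = fit(train_states)

unseen = 5
print("state 5 in table:", unseen in table)
print("model estimate for state 5:", round(predict(weights, unseen), 3))
```

The table has no entry for the unseen state, while the fitted function still produces an estimate. A deep network plays the same role for inputs such as board positions or screen pixels, where the relationship between state and value is far from linear.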

Deep reinforcement learning gained attention when AlphaGo defeated professional champions in the game of Go, including the world’s number-one-ranked player. Prior to these victories, it had been assumed that it could take another decade before a computer could beat a professional-level player. Further development has resulted in AlphaZero, a system that learned to play chess, shogi (Japanese chess), and Go from scratch by playing against itself, and became the strongest player in history for each. Deep reinforcement learning has also reached a level comparable to that of a professional human games tester across a set of 49 different Atari games.

Use cases for reinforcement learning and deep reinforcement learning

Real-world applications of reinforcement learning are being developed to find faster ways of identifying common eye diseases, reduce energy consumption at data centers, improve autonomous driving, and train robots to do new tasks without manual programming. Deep reinforcement learning promises to have wide application in areas such as robotics, autonomous vehicles, natural language processing, computer vision, financial services, and healthcare.

Aligning the right computer resources

The computer resources needed for deep reinforcement learning vary with the application. AlphaGo used 48 CPUs and 8 GPUs. A distributed version of AlphaGo ran across multiple machines with a total of 1,202 CPUs and 176 GPUs. AlphaZero and AlphaGo Zero used a single machine with 4 first-generation TPUs and 44 CPU cores. While a single server may be enough for reinforcement learning, compute resources can also be distributed across multiple servers. Distributed computing offers a way to handle increasing amounts of data while minimizing training time. For instance, IMPALA is a distributed deep reinforcement learning framework that can scale to thousands of machines without sacrificing data efficiency or resource utilization.
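The general pattern behind distributed setups is to separate actors, which gather experience, from a learner, which updates the value estimates. The sketch below shows only that split and is not how IMPALA itself is implemented (it omits, among other things, IMPALA's off-policy correction). Threads on one machine stand in for separate servers, and the toy corridor, function names, and hyperparameters are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_STATES, GOAL, ACTIONS = 5, 4, (-1, +1)  # hypothetical toy corridor


def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == GOAL
    return next_state, (1.0 if done else -0.1), done


def actor(seed, episodes=50, max_steps=200):
    """Collect transitions using a random behavior policy."""
    rng = random.Random(seed)
    transitions = []
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            transitions.append((state, action, reward, next_state, done))
            state = next_state
            if done:
                break
    return transitions


def learner(batches, passes=20, alpha=0.5, gamma=0.9):
    """Apply Q-learning updates to the pooled experience from all actors."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(passes):
        for batch in batches:
            for s, a, r, s2, done in batch:
                future = 0.0 if done else gamma * max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (r + future - q[(s, a)])
    return q


# Four actors run in parallel, each with its own seed.
with ThreadPoolExecutor(max_workers=4) as pool:
    batches = list(pool.map(actor, range(4)))

q = learner(batches)
greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(greedy)
```

Because the actors never wait for the learner, adding more of them increases the rate at which experience is collected, which is the property that lets this kind of design spread across many machines.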

The HPE Apollo 6500 Gen10 is a flexible accelerated compute platform that offers up to 8 GPUs per server. Users can choose accelerator topologies with PCIe GPUs, selecting either a 4:1 or 8:1 GPU:CPU ratio. If NVIDIA SXM2 NVLink GPUs are used, they are connected in an efficient hybrid cube-mesh topology. Throughput is provided by up to 4 high-speed, low-latency network fabric adapters.

The Apollo 6500 Gen10 server is built to enterprise-level reliability and serviceability standards, providing a solid foundation for reinforcement learning and deep reinforcement learning.



Pankaj Goyal
Vice President, HPE AI Business
Hewlett Packard Enterprise

About the Author


Pankaj is building HPE’s Artificial Intelligence business. He is excited by the potential of AI to improve our lives, and believes HPE has a huge role to play. In his past life, he has been a computer science engineer, an entrepreneur, and a strategy consultant. Reach out to him to discuss everything AI @HPE.
