We only understand a sliver of how the brain works, but we *do* know that it often learns through trial and error. We’re rewarded when we do good things and punished when we do the wrong ones; that’s how we figure out how to live. Reinforcement Learning puts computational power behind that exact process and lets us model it with software.

**What Is Reinforcement Learning?**

The easiest mental model for Reinforcement Learning (RL) is a video game, which, not coincidentally, is one of the most popular applications of RL algorithms. In a typical video game, you might have:

- An **agent** (the player) who moves around doing stuff
- An **action** that the agent takes (moves upward one space, sells cloak)
- A **reward** that the agent acquires (coins, killing other players, etc.)
- An **environment** that the agent exists in (a map, a room)
- A **state** that the agent currently exists in (on a particular square of a map, part of a room)
- A **goal** of your agent getting as many rewards as possible

These literally are the exact building blocks of Reinforcement Learning (maybe Machine Learning is just a game?). In RL, we guide an **agent** through an **environment**, **state** by **state**, by issuing a **reward** every time the agent does the right thing. If you’ve heard the term Markov Decision Process thrown around, it pretty much describes this exact setting.
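To make those building blocks concrete, here is a minimal sketch of the agent/environment loop. The `Corridor` environment, its reward of 1 at the far end, and the `run_episode` helper are all made up for illustration; they are not taken from any particular RL library.

```python
import random


class Corridor:
    """Hypothetical environment: states 0..4, with a coin at the last state."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1 if done else 0
        return self.state, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Let the agent act state by state and return the total reward collected."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = choose_action(state)           # the agent picks an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


# An agent that knows nothing yet: it moves left or right at random.
random.seed(0)
print(run_episode(Corridor(), lambda state: random.choice([-1, 1])))
```

Everything that follows in this post is about replacing that random `choose_action` with something smarter.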

For a visual aid, you can think of a mouse in a maze:

Source: Machine Learning for Humans

If you were navigating the maze yourself and your goal was to collect as many rewards as possible (the water droplets and cheese), what would you do? At each **state** (position in the maze), you’d calculate what steps you need to take to reach the rewards near you. If there are 3 rewards on your right and 1 on your left, you’d go right.

This is how Reinforcement Learning works. At each state, the agent makes an educated calculation about all of the possible **actions** (left, right, up, down, you name it) and takes the action that’ll yield the best result. After completing this process a few times, you’d imagine that our mouse might know the maze pretty well.

But how exactly do you decide what the “best result” is?

**Decision Making in Reinforcement Learning**

There are 2 broad approaches for how you can teach your agent to make the right decisions in a Reinforcement Learning environment.

*Policy Learning*

A policy is best understood as a set of very detailed directions – it tells the agent exactly what to do at each state. A part of a policy might look something like: “if you approach an enemy and the enemy is stronger than you, turn back.” If you think of a policy as a function, it only has one input: the state. But knowing in advance what your policy should be isn’t easy, and requires deep knowledge of the complex function that maps each state to the action that best serves the goal.
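As a rough sketch, the enemy rule above can be written as a function whose only input is the state. The state fields (`enemy_near`, `enemy_strength`, `my_strength`) and the action names are hypothetical, chosen just for this example.

```python
def policy(state):
    """Map a state to an action. The state is the only input."""
    if state["enemy_near"] and state["enemy_strength"] > state["my_strength"]:
        return "turn_back"
    return "advance"


print(policy({"enemy_near": True, "enemy_strength": 9, "my_strength": 4}))   # turn_back
print(policy({"enemy_near": False, "enemy_strength": 9, "my_strength": 4}))  # advance
```

A hand-written rule like this only covers situations you anticipated, which is why learning the policy from experience is attractive.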

There’s been some very interesting research in applying Deep Learning to learn policies for Reinforcement Learning scenarios. Andrej Karpathy implemented a neural net to teach an agent how to play the classic game of Pong. This shouldn’t surprise us, since neural nets can be very good at approximating complicated functions.

*Q-Learning / Value Functions*

Another way of guiding our agent is not by explicitly telling her what to do at each point, but by giving her **a framework** to make her own decisions. Unlike policy learning, Q-Learning takes *two inputs* – state *and* action – and returns a value for each pair. If you’re at an intersection, Q-learning will tell you the expected value of each action your agent could take (left, right, etc.).
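Here is a small sketch of the intersection example, assuming the values have already been learned and stored in a lookup table. The table contents, the state name, and the helper functions are invented for illustration.

```python
# Hypothetical learned values for (state, action) pairs.
q_table = {
    ("intersection", "left"): 1.0,
    ("intersection", "right"): 3.0,
    ("intersection", "straight"): 0.5,
}
ACTIONS = ["left", "right", "straight"]


def q_value(state, action):
    """Two inputs, state and action; one output, the expected value."""
    return q_table.get((state, action), 0.0)


def best_action(state):
    """The agent makes her own decision: pick the highest-valued action."""
    return max(ACTIONS, key=lambda action: q_value(state, action))


print(best_action("intersection"))  # right
```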

One of the quirks of Q-Learning is that it doesn’t just estimate the *immediate* value of taking an action in a given state: it also adds in all of the potential future value that could be had if you take the specified action. For readers familiar with corporate finance, Q-Learning is sort of like a discounted cash flow analysis – it takes all potential future value into account when determining the current value of an action (or asset). In fact, Q-Learning even uses a *discount factor* to model the fact that rewards in the future are worth less than rewards now.
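The discounting idea can be sketched in a few lines. The first function discounts a stream of future rewards; the second is the standard one-step tabular Q-learning update. The values of `gamma` (discount factor) and `alpha` (learning rate) are arbitrary choices for this example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma ** (steps into the future)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def q_update(q_current, reward, q_next_max, alpha=0.5, gamma=0.9):
    """Nudge the current estimate toward reward + gamma * best future value."""
    target = reward + gamma * q_next_max
    return q_current + alpha * (target - q_current)


# Three rewards of 1: the later ones count for less (1 + 0.9 + 0.81).
print(discounted_return([1, 1, 1]))

# One learning step: the estimate moves halfway toward the target 1 + 0.9 * 2.
print(q_update(q_current=0.0, reward=1.0, q_next_max=2.0))
```

With `gamma` close to 1 the agent is patient and values far-off rewards; with `gamma` close to 0 it only cares about what happens next.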

Policy Learning and Q-Learning are the two mainstays of how to guide an agent in Reinforcement Learning, but a bunch of new approaches have been using Deep Learning to combine the two or attempt other creative solutions. DeepMind published a paper about using neural nets (called Deep Q Networks) to approximate Q-Learning functions, and achieved impressive results. A few years later, they pioneered a method called A3C that combined value-function and Policy Learning approaches.

Adding neural nets into anything can make it sound complicated. Just remember that all of these learning approaches have a simple goal: to effectively guide your agent through the environment and acquire the most rewards. That’s it.

**Practical Applications of Reinforcement Learning**

While the concepts supporting Reinforcement Learning have been around for decades, it’s still rarely used in business contexts. There are a number of reasons for that (see the challenges section below), but they all follow a similar thread: Reinforcement Learning struggles to efficiently beat out other algorithms for well-defined tasks.

Most of the practical application of Reinforcement Learning in the past decade has been in the realm of video games. Cutting edge Reinforcement Learning algorithms have achieved impressive results in classic and modern games, often beating out their human counterparts by a significant margin.

This graph is from the above mentioned DQN paper by DeepMind. For more than half the games tested, their agent was able to outperform human benchmarks, often by more than double the skill level. For certain games though, their algorithms weren’t even close to human performance.

The other major area where RL has seen some practical success is in robotics and industrial automation. Robots can easily be understood as agents in an environment, and Reinforcement Learning has been shown to be a feasible teaching solution. Google has also made progress using Reinforcement Learning to cut down costs in their data centers.

Healthcare and education are also promising areas for Reinforcement Learning, but most of the work is purely academic at this point.

**Challenges With Implementing Reinforcement Learning**

While extremely promising, Reinforcement Learning is notoriously difficult to implement in practice.

Source: Alex Irpan

The first issue is data: Reinforcement Learning typically requires a *ton* of training data to reach accuracy levels that other algorithms can get to more efficiently. RainbowDQN (from the most recent DeepMind paper) needs about 18 million frames of Atari gameplay, or roughly 83 hours of play time, to reach human-level performance. A human can pick up the game *much* faster than that. This issue seems to hold across disciplines (like learning a running gait).
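The frames-to-hours conversion is easy to check yourself. This sketch assumes the game runs at 60 frames per second, which is not stated above.

```python
FRAMES = 18_000_000
FRAMES_PER_SECOND = 60  # assumption: Atari gameplay at 60 fps

hours = FRAMES / FRAMES_PER_SECOND / 3600
print(round(hours, 1))  # 83.3
```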

Another challenge in implementing Reinforcement Learning is the domain-specificity problem. Reinforcement Learning is a general algorithm, in that it should theoretically work for all different types of problems. But most of those problems have a domain-specific solution that will work better than RL, like online trajectory optimization for MuJoCo robots. There’s always a tradeoff between scope and intensity.

Finally, the most pressing issue with Reinforcement Learning as it currently stands is the design of the reward function (recall Q-Learning and Policy Learning). If the algorithm designers are the ones setting up rewards, then model results are extremely subject to the bias of the designers. Even when set up properly, Reinforcement Learning has this clever way of finding ways around what you want it to do and getting stuck in local optima.

With so much cutting edge research focused on advancing Reinforcement Learning, expect some of these kinks to get ironed out over time.

**Resources**

**Frameworks and Packages**

RL-Glue – “*RL-Glue (Reinforcement Learning Glue) provides a standard interface that allows you to connect **reinforcement learning** agents, environments, and experiment programs together, even if they are written in different languages.*”

Gym (OpenAI) – “*Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from **walking** to playing games like **Pong** or **Pinball**.*”

RL4J (DL4J) – “*RL4J is a reinforcement learning framework integrated with deeplearning4j and released under an Apache 2.0 open-source license.*”

TensorForce (Reinforce.io) – “*A TensorFlow library for applied reinforcement learning.*”

**Reading**

Machine Learning for Humans, Part 5: Reinforcement Learning (Machine Learning for Humans) – “*In reinforcement learning (RL) there’s no answer key, but your reinforcement learning agent still has to decide how to act to perform its task. In the absence of existing training data, the agent learns from experience. It collects the training examples (“this action was good, that action was bad”) through trial-and-error as it attempts its task, with the goal of maximizing long-term reward.*”

An Introduction to Reinforcement Learning (freeCodeCamp) – “*Reinforcement learning is an important type of Machine Learning where an agent learn how to behave in a environment by performing actions and seeing the results. In recent years, we’ve seen a lot of improvements in this fascinating area of research. In this series of articles, we will focus on learning the different architectures used today to solve Reinforcement Learning problems.*”

A Beginner’s Guide to Deep Reinforcement Learning (DL4J) – “*Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start from a blank slate, and under the right conditions they achieve superhuman performance. Like a child incentivized by spankings and candy, these algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones – this is reinforcement.*”

Deep Reinforcement Learning (DeepMind) – “*Humans excel at solving a wide variety of challenging problems, from low-level motor control through to high-level cognitive tasks. Our goal at DeepMind is to create artificial agents that can achieve a similar level of performance and generality. Like a human, our agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards.*”

Lessons Learned Reproducing a Deep Reinforcement Learning Paper (Amid Fish) – “*I’ve seen a few recommendations that reproducing papers is a good way of levelling up machine learning skills, and I decided this could be an interesting one to try with. It was indeed a super fun project, and I’m happy to have tackled it - but looking back, I realise it wasn't exactly the experience I thought it would be. If you’re thinking about reproducing papers too, here are some notes on what surprised me about working with deep RL.*”

**Papers**

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (13 authors!) – “*In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.*”

Deep Reinforcement Learning: An Overview (Li) – “*We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background of machine learning, deep learning and reinforcement learning. Next we discuss core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration.*”

Playing Atari with Deep Reinforcement Learning (DeepMind) – “*We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.*”

Human-Level Control Through Deep Reinforcement Learning (DeepMind) – “*The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations.*”

**Lectures and Videos**

Reinforcement Learning (Udacity, Georgia Tech) – “*You should take this course if you have an interest in machine learning and the desire to engage with it from a theoretical perspective. Through a combination of classic papers and more recent work, you will explore automated decision-making from a computer-science perspective. You will examine efficient algorithms, where they exist, for single-agent and multi-agent planning as well as approaches to learning near-optimal decisions from experience. At the end of the course, you will replicate a result from a published paper in reinforcement learning.*”

CS234: Reinforcement Learning (Stanford) – “*To realize the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare. This class will provide a solid introduction to the field of reinforcement learning and students will learn about the core challenges and approaches, including generalization and exploration.*”

CS 294: Deep Reinforcement Learning, Fall 2017 (Berkeley) – “*This course will assume some familiarity with reinforcement learning, numerical optimization and machine learning. Students who are not familiar with the concepts below are encouraged to brush up using the references provided right below this list. We’ll review this material in class, but it will be rather cursory.*”

Advanced AI: Deep Reinforcement Learning in Python (Udemy) – “*This course is all about the application of deep learning and neural networks to reinforcement learning. If you’ve taken my first reinforcement learning class, then you know that reinforcement learning is on the bleeding edge of what we can do with AI. Specifically, the combination of deep learning with reinforcement learning has led to AlphaGo beating a world champion in the strategy game Go, it has led to self-driving cars, and it has led to machines that can play video games at a superhuman level.*”

**Tutorials**

Simple Beginner’s Guide to Reinforcement Learning & Its Implementation (Analytics Vidhya) – “*Today, we will explore Reinforcement Learning – a goal-oriented learning based on interaction with environment. Reinforcement Learning is said to be the hope of true artificial intelligence. And it is rightly said so, because the potential that Reinforcement Learning possesses is immense.*”

Simple Reinforcement Learning with Tensorflow Part 0: Q-Learning with Tables and Neural Networks (Arthur Juliani) – “*For this tutorial in my Reinforcement Learning series, we are going to be exploring a family of RL algorithms called Q-Learning algorithms. These are a little different than the policy-based algorithms that will be looked at in the the following tutorials (Parts 1–3). Instead of starting with a complex and unwieldy deep neural network, we will begin by implementing a simple lookup-table version of the algorithm, and then show how to implement a neural-network equivalent using Tensorflow.*”

A Tutorial for Reinforcement Learning (Abhijit Gosavi) – “*The tutorial is written for those who would like an introduction to reinforcement learning (RL). The aim is to provide an intuitive presentation of the ideas rather than concentrate on the deeper mathematics underlying the topic.*”