Introduction to Reinforcement Learning. Part 2: Q-Learning

Figure 1: The complete RL cycle.

Value Function

Q-Learning — Solving the RL Problem

Example: Grid-World

Bellman’s Equation

Equation 1. Bellman’s equation.
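For reference, one common way to write the Bellman equation for Q-values in a deterministic environment is given here. It is an assumed form, not copied from the article, though it is consistent with the Q-values the article prints later if the discount factor is taken to be γ = 0.9. Here s' is the state reached by taking action a in state s, and r is the reward received on that transition.

```latex
Q(s, a) = r + \gamma \max_{a'} Q(s', a')
```

As a check under that assumption, the state next to the goal has Q = 5, the one before it has 0.9 · 5 = 4.5, and the one before that has 0.9 · 4.5 = 4.05, matching the printed values.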

The code, solving the example

Algorithm 1. Defining the Grid-World.
Algorithm 2. Function for the ε-greedy policy.
Algorithm 3. Function to simulate taking an action in the environment.
Algorithm 4. Main loop.
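The four steps captioned above can be sketched in Python as follows. This is a minimal sketch rather than the article's own listings. The 7-state corridor, the terminal rewards of -5 and +5, γ = 0.9, α = 1, ε = 0.2, the episode count, and every function name are assumptions chosen to be consistent with the Q-values the article prints.

```python
import random

# 1. Grid-World definition (assumed: a corridor of 7 states, ends are terminal).
N_STATES = 7
ACTIONS = ["left", "right"]            # action index 0 = left, 1 = right
TERMINAL_REWARDS = {0: -5.0, 6: 5.0}   # assumed rewards for reaching each end
GAMMA = 0.9     # discount factor (assumed)
ALPHA = 1.0     # learning rate (assumed; the environment is deterministic)
EPSILON = 0.2   # exploration rate (assumed)


# 2. Epsilon-greedy policy.
def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    # max() returns the first maximal index, so ties resolve to "left".
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


# 3. Simulate taking an action in the environment.
def step(state, action):
    """Move one cell left or right; return (next_state, reward, done)."""
    next_state = state - 1 if action == 0 else state + 1
    reward = TERMINAL_REWARDS.get(next_state, 0.0)
    return next_state, reward, next_state in TERMINAL_REWARDS


# 4. Main loop: tabular Q-learning.
def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(1, N_STATES - 1)  # start in a non-terminal state
        done = False
        while not done:
            action = epsilon_greedy(q, state, EPSILON, rng)
            next_state, reward, done = step(state, action)
            # Terminal rows stay at zero, so the target there is just the reward.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_values = train()
for s, row in enumerate(q_values):
    print(s, [round(v, 2) for v in row])
```

With these assumed settings, the "right" values should settle at 5 · 0.9^k for a state k steps before the goal, and the terminal rows remain at zero because no action is ever taken from them.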
Q-values are: [[0.0, 0.0], [-5, 3.28], [2.95, 3.64], [3.28, 4.05], [3.64, 4.5], [4.05, 5], [0.0, 0.0]] 
Best action for state 0 is left
Best action for state 1 is right
Best action for state 2 is right
Best action for state 3 is right
Best action for state 4 is right
Best action for state 5 is right
Best action for state 6 is left
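The best actions listed above follow from taking the index of the larger Q-value in each row. The short sketch below reproduces them from the printed Q-values, assuming index 0 means left, index 1 means right, and ties resolve to the first index. The helper name `best_action` is hypothetical.

```python
ACTIONS = ["left", "right"]  # assumed ordering: index 0 = left, index 1 = right

# Q-values copied from the printed output above.
q_values = [[0.0, 0.0], [-5, 3.28], [2.95, 3.64], [3.28, 4.05],
            [3.64, 4.5], [4.05, 5], [0.0, 0.0]]


def best_action(row):
    """Return the name of the action with the largest Q-value (first on ties)."""
    return ACTIONS[row.index(max(row))]


for state, row in enumerate(q_values):
    print(f"Best action for state {state} is {best_action(row)}")
```

Under that tie-breaking assumption, "left" for states 0 and 6 is only the default for two equal values. Both rows are all zeros, which suggests these are terminal states where no action is actually chosen.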

PhD Candidate 2021, NC State University
