Lecture 1: Introduction To Reinforcement Learning: David Silver
Outline
1 Admin
2 About RL
3 The RL Problem
4 Inside An RL Agent
5 Problems within RL
Class Information
Assessment
Textbooks
http://www.ualberta.ca/szepesva/papers/RLAlgsInMDPs.pdf
About RL
[Figure: Venn diagram placing reinforcement learning at the intersection of several fields: Computer Science (Machine Learning), Engineering (Optimal Control), Neuroscience (Reward System), Psychology (Classical/Operant Conditioning), Mathematics (Operations Research), Economics (Bounded Rationality)]
[Figure: branches of machine learning: Supervised Learning, Unsupervised Learning, Reinforcement Learning]
Helicopter Manoeuvres
Bipedal Robots
Atari
The RL Problem
Reward
Rewards
Examples of Rewards
Fly stunt manoeuvres in a helicopter
+ve reward for following desired trajectory
-ve reward for crashing
Defeat the world champion at Backgammon
+/-ve reward for winning/losing a game
Manage an investment portfolio
+ve reward for each $ in bank
Control a power station
+ve reward for producing power
-ve reward for exceeding safety thresholds
Make a humanoid robot walk
+ve reward for forward motion
-ve reward for falling over
Play many different Atari games better than humans
+/-ve reward for increasing/decreasing score
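As an illustration of how one of these examples becomes a scalar signal, the humanoid-walking case might be sketched as below. The function name `walking_reward`, the fall penalty of -100 and the made-up episode are all assumptions for illustration, not values from the lecture.

```python
def walking_reward(forward_distance: float, fell_over: bool,
                   fall_penalty: float = -100.0) -> float:
    """Hypothetical scalar reward for one time-step of a walking robot."""
    # +ve reward proportional to forward motion during this step
    reward = forward_distance
    # -ve reward if the robot fell over during this step (assumed magnitude)
    if fell_over:
        reward += fall_penalty
    return reward


# Cumulative reward over a short, made-up episode: three steps forward, then a fall.
steps = [(0.5, False), (0.5, False), (0.25, False), (0.0, True)]
total = sum(walking_reward(distance, fell) for distance, fell in steps)
print(total)
```

The agent only ever sees the scalar returned at each step; the cumulative sum is what it would try to maximise.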
[Figure: agent-environment loop. The agent receives observation $O_t$ and reward $R_t$ and emits action $A_t$.]
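The interaction between agent and environment can be sketched as a simple loop. Everything named here (`GuessEnv`, `RandomAgent`, `run`, and the coin-guessing task itself) is a hypothetical toy chosen for illustration; it is not an environment from the lecture.

```python
import random


class GuessEnv:
    """Hypothetical environment: reward 1 when the action matches a hidden coin."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.coin = 0

    def reset(self):
        self.coin = self.rng.randint(0, 1)
        return 0, 0.0  # placeholder first observation O_1 and reward R_1

    def step(self, action: int):
        reward = 1.0 if action == self.coin else 0.0
        observation = self.coin             # reveal the coin that was just played
        self.coin = self.rng.randint(0, 1)  # flip a new hidden coin
        return observation, reward


class RandomAgent:
    """Hypothetical agent that ignores its inputs and acts uniformly at random."""

    def __init__(self, seed: int = 1):
        self.rng = random.Random(seed)

    def act(self, observation, reward) -> int:
        return self.rng.randint(0, 1)


def run(env, agent, num_steps: int) -> float:
    observation, reward = env.reset()
    total_reward = 0.0
    for _ in range(num_steps):
        action = agent.act(observation, reward)  # agent sees O_t, R_t and picks A_t
        observation, reward = env.step(action)   # environment emits O_{t+1}, R_{t+1}
        total_reward += reward
    return total_reward


print(run(GuessEnv(), RandomAgent(), num_steps=100))
```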
Environments
History: $H_t = O_1, R_1, A_1, \ldots, A_{t-1}, O_t, R_t$
State is a function of the history: $S_t = f(H_t)$
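To make $S_t = f(H_t)$ concrete, the sketch below stores a history as a flat list and shows two possible choices of $f$. The `(kind, value)` encoding and both function names are assumptions made for this example only.

```python
from typing import Any, List, Tuple

# History H_t stored as a flat list of (kind, value) pairs,
# in the order O_1, R_1, A_1, ..., A_{t-1}, O_t, R_t.
History = List[Tuple[str, Any]]


def observations(history: History) -> list:
    """Extract only the observations from the history, oldest first."""
    return [value for kind, value in history if kind == "O"]


def state_last_observation(history: History):
    """One choice of f: the state is just the most recent observation."""
    return observations(history)[-1]


def state_last_k_observations(history: History, k: int = 2) -> tuple:
    """Another choice of f: the state is the last k observations."""
    return tuple(observations(history)[-k:])


history: History = [
    ("O", "left"), ("R", 0.0), ("A", "go"),
    ("O", "right"), ("R", 1.0),
]
print(state_last_observation(history))
print(state_last_k_observations(history))
```

Both functions read the same history but keep different amounts of it, which is the design freedom that $f$ leaves open.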
State
Environment State
Agent State
The agent state $S^a_t$ is the agent's internal representation,
i.e. whatever information the agent uses to pick the next action,
i.e. it is the information used by reinforcement learning algorithms.
It can be any function of history:
$S^a_t = f(H_t)$
[Figure: agent-environment loop showing agent state $S^a_t$, observation $O_t$, action $A_t$ and reward $R_t$]
Information State
An information state (a.k.a. Markov state) contains all useful
information from the history.
Definition
A state $S_t$ is Markov if and only if
$\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]$
Rat Example
When the environment is fully observable:
Agent state = environment state = information state
Formally, this is a Markov decision process (MDP)
(next lecture and the majority of this course)
Inside An RL Agent
Policy
Value Function
Model
$\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$
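The reward part of a model, $\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$, is an expectation, so one simple way to approximate it from experience is a per-pair sample mean. The sketch below assumes that estimator; the function name `estimate_reward_model` and the sample data are hypothetical and not taken from the lecture.

```python
from collections import defaultdict


def estimate_reward_model(transitions):
    """Average the observed R_{t+1} for each (s, a) pair.

    transitions: iterable of (state, action, next_reward) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in transitions:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    # Sample mean as an estimate of E[R_{t+1} | S_t = s, A_t = a]
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Made-up experience: two visits to (s0, a0), one visit to (s0, a1).
samples = [("s0", "a0", 1.0), ("s0", "a0", 0.0), ("s0", "a1", 2.0)]
model = estimate_reward_model(samples)
print(model)
```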
Maze Example
[Figure: maze with Start and Goal cells]
Rewards: -1 per time-step
Actions: N, E, S, W
States: Agent's location
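A maze of this kind can be sketched in a few lines. The 3x4 layout below is hypothetical (the slide's maze is not recoverable from the text), and treating the goal as terminal with no discounting is an assumption. Under those assumptions the optimal value of a state is minus the number of steps on the shortest path to the goal, which a breadth-first search from the goal computes.

```python
from collections import deque

# Hypothetical maze, not the one from the slide: '#' wall, 'S' start, 'G' goal.
MAZE = [
    "S.#G",
    ".##.",
    "....",
]
MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}


def find(symbol):
    """Return the (row, column) of the first cell containing symbol."""
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError(symbol)


def step(state, action):
    """Return (next_state, reward); hitting a wall or the edge leaves the agent in place."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        state = (r, c)
    return state, -1.0  # -1 per time-step


def optimal_values():
    """Breadth-first search outward from the goal; moves are reversible in this maze."""
    goal = find("G")
    values = {goal: 0.0}  # goal assumed terminal
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        for action in MOVES:
            neighbour, reward = step(current, action)
            if neighbour not in values:
                values[neighbour] = values[current] + reward
                queue.append(neighbour)
    return values


values = optimal_values()
print(values[find("S")])
```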
[Figure: maze between Start and Goal with a number in each cell; surviving values: -16, -17, -6, -7, -18, -19, -5, -24, -20, -4, -3]
Grid layout represents transition model $\mathcal{P}^a_{ss'}$
Value Based: No Policy (Implicit), Value Function
Policy Based: Policy, No Value Function
Actor Critic: Policy, Value Function
Model Free: Policy and/or Value Function, No Model
Model Based: Policy and/or Value Function, Model
RL Agent Taxonomy
[Figure: Venn diagram of agent types. Labels: Model-Free, Value-Based, Policy-Based, Model-Based, Model]
Problems within RL
Rules of the game are unknown
Learn directly from interactive game-play
Pick actions on joystick, see pixels and scores
[Figure: agent-environment loop with observation $O_t$, action $A_t$ and reward $R_t$]
Examples
Restaurant Selection
Exploitation: Go to your favourite restaurant
Exploration: Try a new restaurant
Online Banner Advertisements
Exploitation: Show the most successful advert
Exploration: Show a different advert
Oil Drilling
Exploitation: Drill at the best known location
Exploration: Drill at a new location
Game Playing
Exploitation: Play the move you believe is best
Exploration: Play an experimental move
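One common way to trade off the two behaviours in the examples above is an epsilon-greedy rule: with a small probability pick an option at random (exploration), otherwise pick the option with the best estimate so far (exploitation). This rule is not named in the lecture text; it is swapped in here as a simple illustration, and the Gaussian reward noise, the function names and the parameter values are all assumptions.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an option index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # exploration: try any option
    # exploitation: the option currently believed to be best
    return max(range(len(estimates)), key=lambda i: estimates[i])


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Simulate repeated choices among options with noisy, made-up rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        option = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[option], 1.0)  # assumed noise model
        counts[option] += 1
        # incremental running mean of the rewards seen for this option
        estimates[option] += (reward - estimates[option]) / counts[option]
    return estimates, counts


estimates, counts = run_bandit([0.0, 1.0, 0.5])
print(estimates, counts)
```

Setting `epsilon=0` gives pure exploitation, which can lock onto an early lucky option; larger values spend more choices on options that currently look worse.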
What is the value function for the uniform random policy?
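A question of this form can be answered numerically by repeatedly backing up each state's value from its neighbours until the values stop changing. The sketch below does this for a hypothetical gridworld, not the one in the slide's figure: a 4x4 grid with terminal corners, reward -1 per step, no discounting, and moves off the grid leaving the agent in place. All of those details are assumptions.

```python
SIZE = 4
TERMINALS = {(0, 0), (SIZE - 1, SIZE - 1)}  # assumed terminal corners
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # N, E, S, W


def evaluate_uniform_random_policy(tolerance=1e-10):
    """Iterate the expected one-step backup under a uniform random policy."""
    values = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    while True:
        delta = 0.0
        new_values = {}
        for state in values:
            if state in TERMINALS:
                new_values[state] = 0.0
                continue
            total = 0.0
            for dr, dc in MOVES:
                r, c = state[0] + dr, state[1] + dc
                if not (0 <= r < SIZE and 0 <= c < SIZE):
                    r, c = state  # bumping into the edge leaves the agent in place
                # each action has probability 1/4; reward is -1 per step
                total += 0.25 * (-1.0 + values[(r, c)])
            new_values[state] = total
            delta = max(delta, abs(total - values[state]))
        values = new_values
        if delta < tolerance:
            return values


grid_values = evaluate_uniform_random_policy()
for r in range(SIZE):
    print([round(grid_values[(r, c)], 1) for c in range(SIZE)])
```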
[Figure: (a) gridworld, (b) $v_*$, (c) $\pi_*$]
Course Outline