The main focus of machine learning is making decisions or predictions based on data

Chapter 1 Introduction

1.3 Reinforcement learning

In reinforcement learning, the goal is to learn a mapping from input values x to output values y, but without a direct supervision signal to specify which output values y are best for a particular input. There is no training set specified a priori. Instead, the learning problem is framed as an agent interacting with an environment, in the following setting:

• The agent observes the current state, $x^{(0)}$.
• It selects an action, $y^{(0)}$.
• It receives a reward, $r^{(0)}$, which depends on $x^{(0)}$ and possibly $y^{(0)}$.

Last Updated: 08/04/21 21:06:54

MIT 6.036

Fall 2021

• The environment transitions probabilistically to a new state, $x^{(1)}$, with a distribution that depends only on $x^{(0)}$ and $y^{(0)}$.
• The agent observes the current state, $x^{(1)}$.
• . . .
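The interaction loop described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the course material: the two-state environment, the state and action names, the reward values, the 0.9 success probability, and the functions `reward`, `transition`, `policy`, and `run_episode` are all hypothetical and chosen only to make the loop concrete. The rule for choosing actions is fixed by hand here; nothing is learned.

```python
import random

# Hypothetical two-state environment, invented purely for illustration.
STATES = ("left", "right")
ACTIONS = ("stay", "move")


def reward(x, y):
    """Reward r depends on the state x and, here, also on the action y."""
    base = 1.0 if x == "right" else 0.0
    cost = 0.1 if y == "move" else 0.0
    return base - cost


def transition(x, y, rng):
    """Sample the next state from a distribution that depends only on (x, y)."""
    if y == "move" and rng.random() < 0.9:  # assumed: a move succeeds w.p. 0.9
        return "right" if x == "left" else "left"
    return x


def policy(x):
    """A hand-written rule mapping each state to an action (not learned)."""
    return "move" if x == "left" else "stay"


def run_episode(policy, x0, steps, seed=0):
    """Run the observe / act / reward / transition loop for `steps` steps."""
    rng = random.Random(seed)
    x = x0                           # agent observes the current state
    history = []
    for _ in range(steps):
        y = policy(x)                # agent selects an action
        r = reward(x, y)             # agent receives a reward
        history.append((x, y, r))
        x = transition(x, y, rng)    # environment moves to a new state
    return history


history = run_episode(policy, "left", steps=5)
total_reward = sum(r for _, _, r in history)
print(history)
print("total reward:", total_reward)
```

Note that the data the agent sees (`history`) is produced by its own action choices: a different `policy` would visit different states and collect different rewards, which is what separates this setting from learning on a fixed training set.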

The goal is to find a policy π, mapping x to y (that is, states to actions), such that some long-term sum or average of rewards r is maximized.
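The phrase "long-term sum or average of rewards" can be made concrete in more than one way. As a sketch, three commonly used criteria are written out below. The horizon $T$ and the discount factor $\gamma$ are symbols introduced here for illustration and do not appear in the text above, which does not say which criterion it has in mind. In each case the expectation is over the randomness in the state transitions when actions are chosen as $y^{(t)} = \pi(x^{(t)})$.

```latex
\begin{align*}
\text{finite-horizon sum:} \quad
  & \mathbb{E}\!\left[\sum_{t=0}^{T-1} r^{(t)}\right] \\
\text{discounted sum:} \quad
  & \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r^{(t)}\right],
    \qquad 0 \le \gamma < 1 \\
\text{long-run average:} \quad
  & \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}\!\left[\sum_{t=0}^{T-1} r^{(t)}\right]
\end{align*}
```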

This setting is very different from either supervised learning or unsupervised learning, because the agent’s action choices affect both its reward and its ability to observe the environment. It requires careful consideration of the long-term effects of actions, as well as all of the other issues that pertain to supervised learning.
