The main focus of machine learning is making decisions or predictions based on data

Chapter 1 Introduction

3 Evaluation criteria

Once we have specified a problem class, we need to say what makes an output or the answer to a query good, given the training data. We specify evaluation criteria at two levels: how an individual prediction is scored, and how the overall behavior of the prediction or estimation system is scored.

The quality of predictions from a learned model is often expressed in terms of a loss function. A loss function L(g, a) tells you how much you will be penalized for making a guess g when the answer is actually a. There are many possible loss functions. Here are some frequently used examples:

• 0-1 Loss applies to predictions drawn from finite domains.

(If the actual values are drawn from a continuous distribution, the probability they would ever be equal to some predicted g is 0, except for some weird cases.)

$$L(g, a) = \begin{cases} 0 & \text{if } g = a \\ 1 & \text{otherwise} \end{cases}$$

• Squared loss

$$L(g, a) = (g - a)^2$$

• Linear loss

$$L(g, a) = |g - a|$$

• Asymmetric loss: Consider a situation in which you are trying to predict whether someone is having a heart attack. It might be much worse to predict “no” when the answer is really “yes”, than the other way around.

$$L(g, a) = \begin{cases} 1 & \text{if } g = 1 \text{ and } a = 0 \\ 10 & \text{if } g = 0 \text{ and } a = 1 \\ 0 & \text{otherwise} \end{cases}$$
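As a rough illustration, the loss functions listed above could be written as small Python functions. This is only a sketch: the function names are made up for this example, guesses and answers are assumed to be plain numbers or labels, and the default costs in `asymmetric_loss` (1 for a false alarm, 10 for a miss) are illustrative parameters rather than fixed constants.

```python
def zero_one_loss(g, a):
    """0-1 loss: 0 when the guess equals the answer, 1 otherwise."""
    return 0 if g == a else 1


def squared_loss(g, a):
    """Squared loss: the squared difference between guess and answer."""
    return (g - a) ** 2


def linear_loss(g, a):
    """Linear loss: the absolute difference between guess and answer."""
    return abs(g - a)


def asymmetric_loss(g, a, false_alarm_cost=1, miss_cost=10):
    """Asymmetric loss for a binary prediction (1 = "yes", 0 = "no").

    The cost values are assumptions for illustration; what matters is
    that a miss (g = 0, a = 1) costs more than a false alarm.
    """
    if g == 1 and a == 0:
        return false_alarm_cost
    if g == 0 and a == 1:
        return miss_cost
    return 0


if __name__ == "__main__":
    print(zero_one_loss("cat", "dog"))
    print(squared_loss(3.0, 1.0))
    print(linear_loss(1.0, 3.0))
    print(asymmetric_loss(0, 1))
```

Keeping every loss in the same `loss(g, a)` shape means a scoring routine can take the loss as an argument and stay unchanged when the loss is swapped.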

Any given prediction rule will usually be evaluated based on multiple predictions and the loss of each one. At this level, we might be interested in:

• Minimizing expected loss over all the predictions (also known as risk)

• Minimizing maximum loss: the loss of the worst prediction

• Minimizing or bounding regret: how much worse this predictor performs than the best one drawn from some class

• Characterizing asymptotic behavior: how well the predictor will perform in the limit of infinite training data

• Finding algorithms that are probably approximately correct: they probably generate a hypothesis that is right most of the time.
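The first three criteria can be sketched on a finite set of predictions. The example below is a minimal illustration under stated assumptions: the average loss over a sample stands in for expected loss, regret is measured as the gap in average loss to the best predictor in a small hand-picked comparison class, and all function names and data values are invented for the example.

```python
def squared_loss(g, a):
    """Squared loss, used here only as an example per-prediction loss."""
    return (g - a) ** 2


def average_loss(loss, guesses, answers):
    """Mean per-prediction loss: a sample estimate of expected loss."""
    losses = [loss(g, a) for g, a in zip(guesses, answers)]
    return sum(losses) / len(losses)


def max_loss(loss, guesses, answers):
    """Loss of the single worst prediction."""
    return max(loss(g, a) for g, a in zip(guesses, answers))


def regret(loss, guesses, answers, comparison_class):
    """Gap in average loss to the best predictor in comparison_class.

    comparison_class is a list of guess sequences, one per competing
    predictor, each aligned with answers.
    """
    best = min(average_loss(loss, other, answers) for other in comparison_class)
    return average_loss(loss, guesses, answers) - best


if __name__ == "__main__":
    answers = [1.0, 2.0, 4.0]
    guesses = [1.0, 2.0, 1.0]
    # Hypothetical comparison class: two constant predictors.
    constants = [[c] * len(answers) for c in (2.0, 3.0)]

    print("average loss:", average_loss(squared_loss, guesses, answers))
    print("maximum loss:", max_loss(squared_loss, guesses, answers))
    print("regret:", regret(squared_loss, guesses, answers, constants))
```

Here the predictor is exactly right twice and badly wrong once, so its maximum loss is much larger than its average loss; which of the two numbers matters more depends on the application.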

There is a theory of rational agency that argues that you should always select the action that minimizes the expected loss. This strategy will, for example, make you the most money in the long run, in a gambling setting. Expected loss is also sometimes called risk in the machine-learning literature, but that term means other things in economics or other parts of decision theory, so be careful... it’s risky to use it. We will, most of the time, concentrate on this criterion.

(Of course, there are other models for action selection, and it’s clear that people do not always, or maybe even often, select actions that follow this rule.)
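To make the expected-loss rule concrete, here is a small sketch that picks the guess with the lowest expected loss when the probability of each possible answer is known. It assumes the heart-attack setting with costs of 1 for a false alarm and 10 for a miss; the probabilities and the function names are hypothetical and chosen only for illustration.

```python
def asymmetric_loss(g, a):
    """Assumed costs: false alarm (g=1, a=0) costs 1, miss (g=0, a=1) costs 10."""
    if g == 1 and a == 0:
        return 1
    if g == 0 and a == 1:
        return 10
    return 0


def expected_loss(g, answer_probs, loss):
    """Expected loss of guess g, where answer_probs maps answer -> probability."""
    return sum(p * loss(g, a) for a, p in answer_probs.items())


def min_expected_loss_guess(candidates, answer_probs, loss):
    """Return the candidate guess whose expected loss is smallest."""
    return min(candidates, key=lambda g: expected_loss(g, answer_probs, loss))


if __name__ == "__main__":
    # Hypothetical belief: a 20% chance that the answer is "yes" (1).
    probs = {0: 0.8, 1: 0.2}
    for g in (0, 1):
        print("guess", g, "expected loss", expected_loss(g, probs, asymmetric_loss))
    print("chosen guess:", min_expected_loss_guess([0, 1], probs, asymmetric_loss))
```

With these assumed costs the rule predicts "yes" even though "yes" is the less likely answer, because a miss is ten times as expensive as a false alarm. The choice flips to "no" only once the probability of "yes" falls below 1/11.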

Last Updated: 08/04/21 21:06:54

MIT 6.036

Fall 2021
