The main focus of machine learning is making decisions or predictions based on data

Date: 02.01.2022

Chapter 1 Introduction

4.2 Prediction rule

This two-step process is more typical:

1. “Fit” a model to the training data

2. Use the model directly to make predictions

In the prediction rule setting of regression or classification, the model will be some hypothesis or prediction rule y = h(x; θ) for some functional form h. The idea is that θ is a vector of one or more parameter values that will be determined by fitting the model to the training data and then be held fixed. Given a new $x^{(n+1)}$, we would then make the prediction $h(x^{(n+1)}; \theta)$.
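The two-step process can be sketched in a few lines of Python. This is only an illustration: the linear form of `h` and the value of `theta` are assumptions chosen for the example, not something the notes prescribe, and the fitting step is replaced by a hard-coded placeholder.

```python
def h(x, theta):
    """Hypothetical prediction rule h(x; theta) = theta[0] + theta[1] * x."""
    intercept, slope = theta
    return intercept + slope * x

# Step 1 ("fit") would choose theta from the training data.
# Here theta is a made-up placeholder standing in for the fitted value.
theta = (1.0, 2.0)

# Step 2: theta is held fixed and reused for every new input.
x_new = 3.0
prediction = h(x_new, theta)
```

The point of the separation is that `theta` changes only during fitting; once it is fixed, `h` behaves as an ordinary function of `x`.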

We write f(a; b) to describe a function that is usually applied to a single argument a, but is a member of a parametric family of functions, with the particular function determined by parameter value b. So, for example, we might write $h(x; p) = x^p$ to describe a function of a single argument that is parameterized by p.
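One way to picture the f(a; b) notation in code is to fix the parameter first and obtain an ordinary one-argument function. The sketch below assumes the example in the note reads h(x; p) = x^p; the names `square` and `cube` are introduced here purely for illustration.

```python
from functools import partial

def h(x, p):
    """A parametric family: each value of p picks out one function of x."""
    return x ** p

# Fixing p selects a particular member of the family.
square = partial(h, p=2)
cube = partial(h, p=3)

values = [square(3), cube(2)]
```

After the parameter is fixed, `square` and `cube` are each applied to a single argument, which is the sense in which the note calls h a function of one argument parameterized by p.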

The fitting process is often articulated as an optimization problem: find a value of θ that minimizes some criterion involving θ and the data. If we knew the actual underlying distribution on our data, Pr(X, Y), an optimal strategy would be to predict the value of y that minimizes the expected loss, which is also known as the test error. If we don’t have that actual underlying distribution, or even an estimate of it, we can take the approach of minimizing the training error: that is, finding the prediction rule h that minimizes the average loss on our training data set. So, we would seek θ that minimizes

$$\mathcal{E}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} L(h(x^{(i)}; \theta), y^{(i)})$$

where the loss function L(g, a) measures how bad it would be to make a guess of g when the actual value is a.
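As a rough illustration, the training error can be computed as the average loss over the training pairs and then minimized over θ. The sketch below makes several assumptions that are not in the notes: a one-parameter hypothesis h(x; θ) = θx, squared loss, a made-up data set, and a coarse grid search standing in for a real optimizer.

```python
def h(x, theta):
    """Hypothetical one-parameter prediction rule: a line through the origin."""
    return theta * x

def squared_loss(guess, actual):
    """L(g, a): how bad it is to guess g when the actual value is a."""
    return (guess - actual) ** 2

def training_error(theta, data, loss=squared_loss):
    """Average loss of h(.; theta) over the (x, y) training pairs."""
    return sum(loss(h(x, theta), y) for x, y in data) / len(data)

# Made-up training data lying exactly on y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Grid search over candidate theta values 0.0, 0.1, ..., 4.0.
candidates = [k / 10 for k in range(41)]
best_theta = min(candidates, key=lambda t: training_error(t, data))
```

On this data the search settles on `best_theta == 2.0`, where the training error is zero. A grid is used only because it keeps the example short; any minimizer of the same criterion would play the same role.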

We will find that minimizing training error alone is often not a good choice: it is possible to emphasize fitting the current data too strongly and end up with a hypothesis that does not generalize well when presented with new x values.
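A small numerical sketch can make this concrete. Everything in it is an assumption made for illustration: the data are made up (noisy training labels around the trend y = x), the loss is squared loss, and the two hypotheses compared are a least-squares line and a cubic that passes through every training point.

```python
def squared_loss(guess, actual):
    return (guess - actual) ** 2

def average_loss(predict, data):
    """Average loss of a prediction rule over a list of (x, y) pairs."""
    return sum(squared_loss(predict(x), y) for x, y in data) / len(data)

def fit_line(data):
    """Closed-form least-squares line; returns (intercept, slope)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return mean_y - slope * mean_x, slope

def interpolate(x, data):
    """Lagrange polynomial passing exactly through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(data):
        term = yi
        for j, (xj, _) in enumerate(data):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Made-up data: the trend is y = x, and the training labels are noisy.
train = [(0.0, 0.5), (1.0, 0.5), (2.0, 2.5), (3.0, 2.5)]
new = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)]

intercept, slope = fit_line(train)
line = lambda x: intercept + slope * x
cubic = lambda x: interpolate(x, train)

line_train, line_new = average_loss(line, train), average_loss(line, new)
cubic_train, cubic_new = average_loss(cubic, train), average_loss(cubic, new)
```

The cubic reaches zero training error because it reproduces every noisy label, yet on the new x values its average loss is larger than that of the line, whose training error is nonzero. That is the pattern the paragraph above warns about.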
