Here, $W = (w_1 \ldots w_d)$ is a $d$-dimensional row vector of coefficients that needs to be learned from the training data so as to minimize the unexplained error $\sum_{i=1}^{n} (W \cdot X_i - y_i)^2$ of the model. The response values of test instances can then be predicted with this linear relationship. Note that a constant (bias) term is not needed on the right-hand side, because we can append an artificial dimension¹ with a value of 1 to each data point to include the constant term within $W$. Alternatively, instead of using an artificial dimension, one can mean-center the data matrix and the response variable.

¹Here, we assume that the total number of dimensions is $d$, including the artificial column.
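As an illustration of the bias trick described above, the following minimal NumPy sketch (not from the text; the data values and variable names are hypothetical) appends an artificial column of 1s to a small data matrix so that the constant term is absorbed into the last entry of $W$:

```python
import numpy as np

# Hypothetical 3 x 1 feature matrix (one raw feature).
X = np.array([[1.0], [2.0], [3.0]])

# Append the artificial dimension: a column of 1s.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # now n x d with d = 2

# A d-dimensional coefficient row vector W; its last entry plays the
# role of the constant (bias) term.
W = np.array([2.0, 0.5])

# Predictions W . X_i for every data point X_i (rows of X_aug).
y_hat = X_aug @ W
```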
[Figure 11.1: Examples of linear and nonlinear regression. Both panels plot the response variable against the feature variable, and the fitted curve minimizes the sum of squared errors. (a) Linear regression y = x. (b) Nonlinear regression y = x².]
In such a case, it can be shown that the bias term is not necessary (see Exercise 8). Furthermore, the standard deviations of all columns of the data matrix, except for the artificial column, are assumed to have been scaled to 1. In general, it is common to standardize the data in this way to ensure similar scaling and weighting for all attributes. An example of a linear relationship for a 1-dimensional feature variable is illustrated in Fig. 11.1a.
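The preprocessing described above can be sketched as follows. This is a minimal illustration using a small hypothetical data matrix X and response y, not code from the chapter:

```python
import numpy as np

# Hypothetical n x d data matrix and n-dimensional response vector.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
y = np.array([1.0, 2.0, 3.0])

# Mean-center the data matrix and the response variable
# (this removes the need for a bias term).
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()

# Scale each feature column to standard deviation 1 so that all
# attributes receive similar weighting.
X_standardized = X_centered / X_centered.std(axis=0)
```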
To minimize the squared error of prediction on the training data, one must determine the coefficient vector $W$ that minimizes the following objective function $O$:
$$O = \sum_{i=1}^{n} (W \cdot X_i - y_i)^2$$
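A minimal sketch of this minimization uses NumPy's standard least-squares solver on a hypothetical augmented data matrix; the data values and variable names are illustrative and not defined in the text:

```python
import numpy as np

# Hypothetical n x d data matrix (last column is the artificial 1s column)
# and the corresponding response values.
X_aug = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([1.1, 1.9, 3.2])

# Determine W minimizing O = sum_i (W . X_i - y_i)^2.
W, residuals, rank, _ = np.linalg.lstsq(X_aug, y, rcond=None)

# Value of the objective function O at the solution.
O = np.sum((X_aug @ W - y) ** 2)
```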