Data Mining: The Textbook




[Figure: (a) Hard separation — two linearly separable classes (class A and class B), a test instance, the support vectors, and two candidate separating hyperplanes (hyperplane 1 and hyperplane 2). (b) Soft separation — margin violation with penalty-based slack variables.]

Figure 10.7: Hard and soft SVMs


Consider a hyperplane that cleanly separates two linearly separable classes. The margin of the hyperplane is defined as the sum of its distances to the closest training points belonging to each of the two classes on the opposite side of the hyperplane. A further assumption is that the distance of the separating hyperplane to its closest training point of either class is the same. With respect to the separating hyperplane, it is possible to construct parallel hyperplanes that touch the training data of opposite classes on either side, and have no data point between them. The training data points on these hyperplanes are referred to as the support vectors, and the distance between the two hyperplanes is the margin. The separating hyperplane, or decision boundary, is precisely in the middle of these two hyperplanes in order to achieve the most accurate classification. The margins for hyperplane 1 and hyperplane 2 are illustrated in Fig. 10.7a by dashed lines. It is evident that the margin for hyperplane 1 is larger than that for hyperplane 2. Therefore, the former hyperplane provides better generalization power for unseen test instances in the “difficult” uncertain region separating the two classes where classification errors are most likely. This is also consistent with our earlier example-based observation about the more accurate classification with hyperplane 1.
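As a concrete illustration of this definition, the following sketch (not from the original text; the 2-D points and the two candidate hyperplanes are made up for illustration) computes the margin of a separating hyperplane W · X + b = 0 as the sum of its distances to the closest training point of each class. One candidate has a visibly larger margin than the other, mirroring the comparison of hyperplanes 1 and 2 in Figure 10.7a.

```python
import numpy as np

def margin(W, b, X, y):
    """Margin of the hyperplane W.X + b = 0: the sum of its distances to the
    closest training point of each class (labels +1 and -1). Assumes the
    hyperplane actually separates the two classes."""
    # Perpendicular distance of each point to the hyperplane.
    dist = np.abs(X @ W + b) / np.linalg.norm(W)
    return dist[y == +1].min() + dist[y == -1].min()

# Toy 2-D data (hypothetical): class +1 on the right, class -1 on the left.
X = np.array([[3.0, 1.0], [4.0, 2.0], [-3.0, -1.0], [-4.0, 0.5]])
y = np.array([+1, +1, -1, -1])

# Two candidate separating hyperplanes, analogous to hyperplanes 1 and 2
# in Figure 10.7a (coefficients chosen only for illustration).
for W, b in [(np.array([1.0, 0.0]), 0.0), (np.array([1.0, -1.0]), 0.0)]:
    print("W =", W, "b =", b, "margin =", margin(W, b, X, y))
```

On this toy data the first candidate attains a margin of 6.0 and the second roughly 2.83, so the first generalizes better in the sense discussed above.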

How do we determine the maximum margin hyperplane? The way to do this is to set up a nonlinear programming optimization formulation that maximizes the margin by expressing it as a function of the coefficients of the separating hyperplane. The optimal coefficients can be determined by solving this optimization problem. Let the n data points in the training set be denoted by (X1, y1) . . . (Xn, yn), where Xi is a d-dimensional row vector corresponding to the ith data point, and yi ∈ {−1, +1} is the binary class variable of the ith data point. Then, the separating hyperplane is of the following form:







W · X + b = 0    (10.35)




Here, W = (w1 . . . wd) is the d-dimensional row vector representing the normal direction to the hyperplane, and b is a scalar, also known as the bias. The vector W regulates the orientation of the hyperplane and the bias b regulates the distance of the hyperplane from the origin. The (d + 1) coefficients corresponding to W and b need to be learned from the training data to maximize the margin of separation between the two classes. Because it is assumed that the classes are linearly separable, such a hyperplane can also be assumed to exist. All data points Xi with yi = +1 will lie on one side of the hyperplane satisfying W · Xi + b ≥ 0. Similarly, all points with yi = −1 will lie on the other side of the hyperplane satisfying W · Xi + b ≤ 0.








W · Xi + b ≥ 0    ∀i : yi = +1    (10.36)
W · Xi + b ≤ 0    ∀i : yi = −1    (10.37)
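The sign of W · Xi + b is therefore what assigns a point to one side of the separating hyperplane or the other. A minimal sketch of this side test, assuming an already-given (W, b) (the coefficients and points below are illustrative, not learned from data):

```python
import numpy as np

def predict(W, b, X):
    """Assign each row of X to class +1 or -1 according to which side of
    the hyperplane W.X + b = 0 it falls on."""
    return np.where(X @ W + b >= 0, +1, -1)

# Illustrative hyperplane and points (hypothetical values).
W, b = np.array([1.0, -1.0]), 0.0
X_train = np.array([[3.0, 1.0], [-3.0, -1.0]])
test_instance = np.array([[0.5, -0.5]])

print(predict(W, b, X_train))        # [ 1 -1] for these points
print(predict(W, b, test_instance))  # side of the boundary for the test point
```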




These constraints do not yet incorporate the margin requirements on the data points. A stronger set of constraints is defined using these margin requirements. It may be assumed that the separating hyperplane W · X + b = 0 is located in the center of the two margin-defining hyperplanes. Therefore, the two symmetric hyperplanes touching the support vectors can be expressed by introducing another parameter c that regulates the distance between them.






W · X + b = +c    (10.38)
W · X + b = −c    (10.39)




It is possible to assume, without loss of generality, that the variables W and b are appropriately scaled, so that the value of c can be set to 1. Therefore, the two separating hyperplanes can be expressed in the following form:














W · X + b = +1    (10.40)
W · X + b = −1    (10.41)
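To see why c can be dropped without loss of generality, note that dividing both W and b by c leaves the separating hyperplane W · X + b = 0 unchanged while moving the two touching hyperplanes to the levels +1 and −1. A tiny sketch of this rescaling, with values chosen only for illustration:

```python
import numpy as np

# Illustrative hyperplane with touching hyperplanes at +/- c.
W, b, c = np.array([2.0, -2.0]), 1.0, 4.0

# A point lying on the hyperplane W.X + b = +c.
x_plus = np.array([2.5, 1.0])        # 2*2.5 - 2*1.0 + 1.0 = 4.0 = c

# Rescale so that the same touching hyperplanes sit at +1 and -1.
W_s, b_s = W / c, b / c

print(np.dot(W, x_plus) + b)         # +c  (original scale)
print(np.dot(W_s, x_plus) + b_s)     # +1  (after rescaling)
# The separating hyperplane itself is unchanged, since dividing both sides
# of W.X + b = 0 by c does not alter its solution set.
```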




These constraints are referred to as margin constraints. The two hyperplanes segment the data space into three regions. It is assumed that no training data points lie in the uncertain decision boundary region between these two hyperplanes, and all training data points for each class are mapped to one of the two remaining (extreme) regions. This can be expressed as pointwise constraints on the training data points as follows:














W · Xi + b ≥ +1    ∀i : yi = +1    (10.42)
W · Xi + b ≤ −1    ∀i : yi = −1    (10.43)




Note that the constraints for both the positive and negative classes can be written in the following succinct and algebraically convenient, but rather cryptic, form:



yi(W · Xi + b) ≥ +1    ∀i    (10.44)
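Constraint 10.44 is straightforward to check in code: for a hard-margin solution, every training point must satisfy yi(W · Xi + b) ≥ 1, with equality holding exactly for the support vectors. The sketch below uses hypothetical coefficients and data purely to illustrate the check:

```python
import numpy as np

def satisfies_margin_constraints(W, b, X, y, tol=1e-9):
    """True if every training point obeys y_i * (W.X_i + b) >= 1."""
    return bool(np.all(y * (X @ W + b) >= 1 - tol))

# Illustrative (not fitted) hyperplane with margin hyperplanes at +/- 1.
W, b = np.array([0.5, 0.0]), 0.0
X = np.array([[2.0, 1.0], [3.0, -1.0], [-2.0, 0.0], [-4.0, 2.0]])
y = np.array([+1, +1, -1, -1])

print(satisfies_margin_constraints(W, b, X, y))  # True for this example
print(y * (X @ W + b))                           # entries equal to 1 mark support vectors
```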




The distance between the two hyperplanes for the positive and negative instances is also referred to as the margin. As discussed earlier, the goal is to maximize this margin. What is the distance (or margin) between these two parallel hyperplanes? One can use linear algebra to show that the distance between two parallel hyperplanes is the normalized difference between their constant terms, where the normalization factor is the L2-norm ||W|| = sqrt(w1² + . . . + wd²) of the coefficients. Because the difference between the constant terms of the two aforementioned hyperplanes is 2, it follows that the distance between them is 2/||W||. This is the margin that needs to be maximized with respect to the aforementioned constraints. This form of the objective function is inconvenient because it incorporates a square root in the denominator of the objective function. However, maximizing 2/||W|| is the same as minimizing ||W||²/2. This is a convex quadratic programming problem, because the quadratic objective function ||W||²/2 needs to be minimized subject to a set of linear constraints.
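Putting the pieces together, the hard-margin problem is to minimize ||W||²/2 subject to yi(W · Xi + b) ≥ 1 for all i. The sketch below solves this small quadratic program numerically with a general-purpose constrained solver (scipy's SLSQP) on toy data; this is only an illustration of the formulation, not how SVM libraries typically solve it (they usually work with the Lagrangian dual or specialized optimizers).

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative only).
X = np.array([[3.0, 1.0], [4.0, 2.0], [-3.0, -1.0], [-4.0, 0.5]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
n, d = X.shape

def objective(theta):
    W = theta[:d]                    # theta packs (W, b)
    return 0.5 * np.dot(W, W)        # ||W||^2 / 2

def margin_constraints(theta):
    W, b = theta[:d], theta[d]
    return y * (X @ W + b) - 1.0     # each entry must be >= 0

result = minimize(objective,
                  x0=np.zeros(d + 1),
                  method="SLSQP",
                  constraints=[{"type": "ineq", "fun": margin_constraints}])

W_opt, b_opt = result.x[:d], result.x[d]
print("W =", W_opt, "b =", b_opt)
print("margin = 2/||W|| =", 2.0 / np.linalg.norm(W_opt))
```

The returned W and b satisfy the margin constraints of Eq. 10.44, and the printed quantity 2/||W|| is the maximized margin for this toy data set.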




