Data Mining: The Textbook



… $\sum_{i=1}^{n} \lambda_i y_i = 0$. This alternative formulation of SVMs is discussed in Chap. 13.

10.6.2 Support Vector Machines with Soft Margin for Nonseparable Data


The previous section discussed the scenario where the data points of the two classes are linearly separable. However, perfect linear separability is a rather contrived scenario, and real data sets usually will not satisfy this property. An example of such a data set is illustrated in Fig. 10.7b, where no linear separator can be found. Many real data sets may, however, be approximately separable, where most of the data points lie on the correct sides of well-chosen separating hyperplanes. In this case, the notion of margin becomes a softer one, because training data points are allowed to violate the margin constraints at the expense of a penalty. The two margin hyperplanes separate out "most" of the training data points, but not all of them. An example is illustrated in Fig. 10.7b.


The level of violation of each margin constraint by training data point Xi is denoted by a slack variable ξi ≥ 0. Therefore, the new set of soft constraints on the separating hyperplanes may be expressed as follows:





$$\overline{W} \cdot \overline{X_i} + b \geq +1 - \xi_i \qquad \forall i : y_i = +1$$
$$\overline{W} \cdot \overline{X_i} + b \leq -1 + \xi_i \qquad \forall i : y_i = -1$$
$$\xi_i \geq 0 \qquad \forall i.$$
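To make these constraints concrete, the following minimal Python sketch computes the slack values implied by a candidate hyperplane. The specific numbers, and the names w, b, X, and y, are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

# Illustrative candidate hyperplane (w, b) and toy training points;
# all of these values are made up purely for demonstration.
w = np.array([1.0, -0.5])
b = 0.2
X = np.array([[2.0, 1.0],    # satisfies its margin constraint
              [0.5, 0.8],    # inside the margin, but correctly classified
              [1.0, 0.5]])   # on the wrong side of the separating hyperplane
y = np.array([+1.0, +1.0, -1.0])

# A point satisfies its margin constraint when y_i * (w . x_i + b) >= 1,
# so the slack is the shortfall, floored at zero:
#     xi_i = max(0, 1 - y_i * (w . x_i + b))
margins = y * (X @ w + b)
slack = np.maximum(0.0, 1.0 - margins)
print(slack)  # [0.   0.7  1.95]
```

A slack of 0 corresponds to a point that meets its margin constraint, a value strictly between 0 and 1 to a point inside the margin but on the correct side of the separating hyperplane, and a value greater than 1 to a misclassified point.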

These slack variables ξi may be interpreted as the distances of the training data points from the separating hyperplanes, as illustrated in Fig. 10.7b, when they lie on the "wrong" side of the separating hyperplanes. The values of the slack variables are 0 when the points lie on the correct side. It is not desirable for too many training data points to have positive values of ξi, and therefore such violations are penalized by C · ξi^r, where C and r are user-defined parameters regulating the level of softness in the model. Small values of C result in relaxed margins, whereas large values of C penalize training data errors more heavily and result in narrow margins. Setting C to be sufficiently large disallows any training data error for separable classes, which is the same as setting all slack variables to 0 and defaulting to the hard version of the problem. A popular choice of r is 1, which is referred to as hinge loss. Therefore, the objective function for soft-margin SVMs with hinge loss is defined as follows:
$$O = \frac{\|\overline{W}\|^2}{2} + C \sum_{i=1}^{n} \xi_i$$