10.6.2 Support Vector Machines with Soft Margin for Nonseparable Data
The previous section discussed the scenario where the data points of the two classes are linearly separable. However, perfect linear separability is a rather contrived scenario, and real data sets usually will not satisfy this property. An example of such a data set is illustrated in Fig. 10.7b, where no linear separator may be found. Many real data sets may, however, be approximately separable, where most of the data points lie on the correct side of well-chosen separating hyperplanes. In this case, the notion of margin becomes a softer one because training data points are allowed to violate the margin constraints at the expense of a penalty. The two margin hyperplanes separate out “most” of the training data points but not all of them. An example is illustrated in Fig. 10.7b.
The level of violation of each margin constraint by training data point Xi is denoted by a slack variable ξi ≥ 0. Therefore, the new set of soft constraints on the separating hyperplanes may be expressed as follows:
$$\overline{W} \cdot \overline{X_i} + b \geq +1 - \xi_i \quad \forall i : y_i = +1$$
$$\overline{W} \cdot \overline{X_i} + b \leq -1 + \xi_i \quad \forall i : y_i = -1$$
$$\xi_i \geq 0 \quad \forall i.$$
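As a concrete illustration, the smallest slack that satisfies these constraints for a fixed hyperplane is $\xi_i = \max\{0,\; 1 - y_i(\overline{W} \cdot \overline{X_i} + b)\}$. The short Python sketch below computes these minimal slacks for a toy data set; the function name and the candidate hyperplane (w, b) are illustrative choices rather than values from the text.

```python
import numpy as np

def slack_variables(X, y, w, b):
    """Minimal slack xi_i = max(0, 1 - y_i * (w . x_i + b)) for each training point.

    X: (n, d) array of training points; y: (n,) labels in {-1, +1};
    (w, b): a candidate separating hyperplane (illustrative, not learned here).
    """
    margins = y * (X @ w + b)               # signed margin of each training point
    return np.maximum(0.0, 1.0 - margins)   # zero when the soft constraint already holds

# Toy example: two points satisfy the margin, the third violates it.
X = np.array([[2.0, 2.0], [-2.0, -2.0], [0.2, 0.1]])
y = np.array([+1.0, -1.0, +1.0])
w, b = np.array([1.0, 1.0]), 0.0
print(slack_variables(X, y, w, b))          # [0.  0.  0.7]
```

A slack of 0 means the point satisfies its margin constraint, a value in (0, 1) means it falls inside the margin but on the correct side of the separator, and a value above 1 corresponds to a misclassified point.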
These slack variables ξi may be interpreted as the distances of the training data points from the separating hyperplanes, as illustrated in Fig. 10.7b, when they lie on the “wrong” side of the separating hyperplanes. The values of the slack variables are 0 when they lie on the correct side of the separating hyperplanes. It is not desirable for too many training data points to have positive values of ξi, and therefore such violations are penalized by $C \cdot \xi_i^r$, where C and r are user-defined parameters regulating the level of softness in the model. Small values of C would result in relaxed margins, whereas large values of C would minimize training data errors and result in narrow margins. Setting C to be sufficiently large would disallow any training data error in separable classes, which is the same as setting all slack variables to 0 and defaulting to the hard version of the problem. A popular choice of r is 1, which is also referred to as hinge loss. Therefore, the objective function for soft-margin SVMs, with hinge loss, is defined as follows:
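With $n$ denoting the number of training data points, this objective takes the standard form

$$O = \frac{\|\overline{W}\|^2}{2} + C \sum_{i=1}^{n} \xi_i,$$

which is minimized subject to the soft constraints above. The first term rewards wide margins, while the second penalizes the aggregate margin violation, with C controlling the trade-off between the two.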