constraints (Eqs. 10.42–10.43) on the training points. Note that each training data point leads to a constraint, which tends to make the optimization problem rather large and explains the high computational complexity of SVMs.
Such constrained nonlinear programming problems are solved using a method known as Lagrangian relaxation. The broad idea is to associate an n-dimensional vector of nonnegative Lagrangian multipliers λ = (λ1 . . . λn) ≥ 0 with the different constraints. The multiplier λi corresponds to the margin constraint of the ith training data point. The constraints are then relaxed, and the objective function is augmented by incorporating a Lagrangian penalty for constraint violation:
$$L_P = \frac{||W||^2}{2} - \sum_{i=1}^{n} \lambda_i \left[ y_i (W \cdot X_i + b) - 1 \right] \qquad (10.45)$$
For fixed nonnegative values of λi, violations of the margin constraints increase LP. Therefore, the penalty term pushes the optimized values of W and b toward satisfying the constraints when LP is minimized with respect to W and b. Values of W and b that satisfy the margin constraints always result in a nonpositive penalty. Therefore, for any fixed nonnegative value of λ, the minimum value of LP will always be at most the original optimal objective function value ||W∗||2/2, because of the impact of the nonpositive penalty term at any feasible (W∗, b∗).
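To make the behavior of the penalty term concrete, the following sketch evaluates LP of Eq. 10.45 on a small toy dataset. The data, the candidate (W, b), and the multipliers are all illustrative assumptions, not values taken from the text.

```python
import numpy as np

# Hypothetical linearly separable toy data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def primal_lagrangian(W, b, lam):
    """L_P of Eq. 10.45: ||W||^2 / 2 minus the Lagrangian penalty term."""
    margins = y * (X @ W + b) - 1.0           # y_i (W . X_i + b) - 1
    return 0.5 * W @ W - lam @ margins

# A feasible (W, b): every margin expression is nonnegative.
W, b = np.array([1/3, 1/3]), -1/3
lam = np.array([0.1, 0.2, 0.1, 0.0])          # any nonnegative multipliers

margins = y * (X @ W + b) - 1.0
penalty = -lam @ margins
print(margins)                                # all >= 0, so (W, b) is feasible
print(penalty <= 0)                           # True: feasibility gives a nonpositive penalty
print(primal_lagrangian(W, b, lam) <= 0.5 * W @ W)   # True: L_P <= ||W||^2 / 2
```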
Therefore, if LP is minimized with respect to W and b for any particular λ, and then maximized with respect to the nonnegative Lagrangian multipliers λ, the resulting dual solution L∗D will be a lower bound on the optimal objective function value O∗ = ||W∗||2/2 of the SVM formulation. Mathematically, this weak duality condition can be expressed as follows:
$$O^* \geq L_D^* = \max_{\lambda \geq 0} \; \min_{W,\,b} L_P \qquad (10.46)$$
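The weak duality bound of Eq. 10.46 can be checked numerically. The sketch below, under the same toy-data assumptions as before, minimizes LP over (W, b) for a few fixed nonnegative choices of λ (chosen with Σi λi yi = 0 so that the inner minimum is finite) and confirms that each inner minimum stays at or below the primal optimum. For this particular dataset the primal optimum can be derived by hand: W∗ = (1/3, 1/3), b∗ = −1/3, and O∗ = ||W∗||2/2 = 1/9.

```python
import numpy as np
from scipy.optimize import minimize

# Same hypothetical toy data as in the previous sketch.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

def primal_lagrangian(params, lam):
    W, b = params[:d], params[d]
    margins = y * (X @ W + b) - 1.0
    return 0.5 * W @ W - lam @ margins

def inner_min(lam):
    """min over (W, b) of L_P for fixed lam (finite when sum(lam * y) == 0)."""
    res = minimize(primal_lagrangian, np.zeros(d + 1), args=(lam,))
    return res.fun

# Primal optimum for this dataset, derived by hand: the margin constraints
# hold with equality at the support vectors (2, 2) and (-1, -1).
O_star = 1.0 / 9.0

# Each lam below is nonnegative and satisfies sum(lam * y) == 0.
for lam in [np.array([0.05, 0.0, 0.05, 0.0]),
            np.array([1/9, 0.0, 1/9, 0.0]),    # the optimal multipliers
            np.array([0.2, 0.0, 0.2, 0.0])]:
    print(inner_min(lam), "<=", O_star)
```

At λ = (1/9, 0, 1/9, 0) the bound is tight, which is exactly the strong-duality behavior described next.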
Optimization formulations such as SVM are special because the objective function is convex and the constraints are linear. Such formulations satisfy a property known as strong duality. According to this property, the minimax relationship of Eq. 10.46 yields an optimal and feasible solution to the original problem (i.e., O∗ = L∗D) in which the Lagrangian penalty term has zero contribution. Such a solution (W∗, b∗, λ∗) is referred to as the saddle point of the Lagrangian formulation. Note that zero Lagrangian penalty is achieved by a feasible solution only when each training data point Xi satisfies λi [yi(W · Xi + b) − 1] = 0. These conditions are equivalent to the Kuhn–Tucker optimality conditions, and they imply that data points Xi with λi > 0 lie exactly on the margin and are therefore support vectors. The Lagrangian formulation is solved using the following steps:
1. The Lagrangian objective LP can be expressed more conveniently as a pure maximization problem by eliminating the minimization part of the awkward minimax formulation. This is achieved by eliminating the minimization variables W and b with gradient-based optimization conditions on these variables. By setting the gradient of LP with respect to W to 0, we obtain the following:
$$\nabla L_P = \nabla \frac{||W||^2}{2} - \nabla \sum_{i=1}^{n} \lambda_i \left[ y_i (W \cdot X_i + b) - 1 \right] = 0 \qquad (10.47)$$

$$W - \sum_{i=1}^{n} \lambda_i y_i X_i = 0 \qquad (10.48)$$
Therefore, one can now derive an expression for W in terms of the Lagrangian multipliers and the training data points (this recovery step is illustrated in the sketch after these steps):

$$W = \sum_{i=1}^{n} \lambda_i y_i X_i \qquad (10.49)$$
Furthermore, by setting the partial derivative of LP with respect to b to 0, we obtain the following:

$$\sum_{i=1}^{n} \lambda_i y_i = 0 \qquad (10.50)$$

2. The optimization condition $\sum_{i=1}^{n} \lambda_i y_i = 0$ can be used to eliminate the term $-b \sum_{i=1}^{n} \lambda_i y_i$ from LP. The expression $W = \sum_{i=1}^{n} \lambda_i y_i X_i$ from Eq. 10.49 can then be substituted into LP to create a dual problem LD in terms of only the maximization variables λ. Specifically, the maximization objective function LD for the Lagrangian dual is as follows:
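Carrying out this substitution yields the standard hard-margin dual objective LD = Σi λi − (1/2) Σi Σj λi λj yi yj (Xi · Xj), maximized subject to λ ≥ 0 and Σi λi yi = 0. The sketch below ties the steps together on the same hypothetical toy data as before: it evaluates this dual form, recovers W from the multipliers via Eq. 10.49, infers b from a support vector (a point with λi > 0, which by the Kuhn–Tucker conditions satisfies yi(W · Xi + b) = 1), and classifies the training points. The optimal multipliers are hand-derived for this particular dataset and are an illustrative assumption.

```python
import numpy as np

# Same hypothetical toy data as in the earlier sketches.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def dual_objective(lam):
    """Hard-margin dual: sum(lam) - 0.5 * sum_ij lam_i lam_j y_i y_j (X_i . X_j)."""
    G = (y[:, None] * X) @ (y[:, None] * X).T   # G_ij = y_i y_j (X_i . X_j)
    return lam.sum() - 0.5 * lam @ G @ lam

# Optimal multipliers for this dataset (derived by hand): only the two
# support vectors (2, 2) and (-1, -1) receive nonzero weight.
lam = np.array([1/9, 0.0, 1/9, 0.0])

# Eq. 10.49: recover W from the multipliers.
W = (lam * y) @ X                               # W = sum_i lam_i y_i X_i

# Kuhn-Tucker: a point with lam_i > 0 lies on the margin, so
# y_i (W . X_i + b) = 1 gives b = y_i - W . X_i (since y_i is +1 or -1).
sv = np.flatnonzero(lam > 1e-10)[0]
b = y[sv] - W @ X[sv]

print(W, b)                   # (1/3, 1/3) and -1/3
print(dual_objective(lam))    # 1/9, matching O* = ||W*||^2 / 2
print(0.5 * W @ W)            # 1/9: strong duality holds at the saddle point
print(np.sign(X @ W + b))     # all training points correctly classified
```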