Data Mining: The Textbook




The objective function ||W||²/2 is minimized subject to the constraints (Eqs. 10.42–10.43) on the training points. Note that each training data point leads to a constraint, which tends to make the optimization problem rather large and explains the high computational complexity of SVMs.
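To make the one-constraint-per-point structure concrete, the following minimal NumPy sketch (invented toy data and a hand-picked separator, not code from the book) evaluates the margin constraints yi(W · Xi + b) ≥ 1 for a candidate hyperplane:

```python
import numpy as np

# Invented toy training set: rows of X are points, labels y are in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A candidate separating hyperplane W . x + b = 0 (chosen by hand).
W = np.array([1.0, 1.0])
b = -1.0

# One margin constraint per training point: y_i * (W . X_i + b) >= 1.
margins = y * (X @ W + b)
print(margins)               # [3. 5. 3. 4.]
print(np.all(margins >= 1))  # True: this (W, b) satisfies every constraint
```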

Such constrained nonlinear programming problems are solved using a method known as Lagrangian relaxation. The broad idea is to associate an n-dimensional vector of nonnegative Lagrangian multipliers λ = (λ1 . . . λn) ≥ 0, one for each constraint. The multiplier λi corresponds to the margin constraint of the ith training data point. The constraints are then relaxed, and the objective function is augmented with a Lagrangian penalty for constraint violation:

$$L_P = \frac{\|W\|^2}{2} - \sum_{i=1}^{n} \lambda_i \left[\, y_i (W \cdot X_i + b) - 1 \,\right] \qquad (10.45)$$
For fixed nonnegative values of λi, margin constraint violations increase LP. Therefore, the penalty term pushes the optimized values of W and b toward constraint nonviolation when LP is minimized with respect to W and b. Values of W and b that satisfy the margin constraints will always result in a nonpositive penalty. Therefore, for any fixed nonnegative value of λ, the minimum value of LP will always be at most the original optimal objective function value ||W∗||²/2, because of the nonpositive penalty term at any feasible (W∗, b∗).
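The effect of the penalty can be checked numerically. In this hedged sketch (same invented toy data as above; lagrangian_primal is a name chosen here for illustration), a constraint-violating (W, b) yields a larger LP than a feasible one for the same fixed nonnegative λ:

```python
import numpy as np

def lagrangian_primal(W, b, lam, X, y):
    """L_P of Eq. 10.45: ||W||^2 / 2 minus the lambda-weighted slacks."""
    slack = y * (X @ W + b) - 1.0   # nonnegative iff constraint i holds
    return 0.5 * W @ W - lam @ slack

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
lam = np.full(4, 0.5)               # fixed nonnegative multipliers

# A feasible (W, b) pays a nonpositive penalty; a violating one is pushed up.
print(lagrangian_primal(np.array([1.0, 1.0]), -1.0, lam, X, y))  # -4.5
print(lagrangian_primal(np.array([0.1, 0.1]),  0.0, lam, X, y))  #  1.26
```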


Therefore, if LP is minimized with respect to W and b for any particular λ, and then maximized with respect to the nonnegative Lagrangian multipliers λ, the resulting dual solution LD∗ will be a lower bound on the optimal objective function value O∗ = ||W∗||²/2 of the SVM formulation. Mathematically, this weak duality condition can be expressed as follows:





$$O^* \geq L_D^* = \max_{\lambda \geq 0} \; \min_{W,\, b} L_P \qquad (10.46)$$
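A quick numerical illustration of Eq. 10.46 on the same toy data (hedged: the suboptimal λ is chosen arbitrarily, and the inner minimizer W = Σi λiyiXi anticipates the gradient condition derived below): for any fixed feasible λ, the minimized LP stays below O∗.

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def min_LP(lam):
    # For fixed lam with lam . y == 0, L_P is minimized at
    # W = sum_i lam_i y_i X_i, and the b term vanishes, so b is arbitrary.
    W = (lam * y) @ X
    slack = y * (X @ W) - 1.0
    return 0.5 * W @ W - lam @ slack

lam = np.full(4, 0.05)           # feasible: lam >= 0 and lam . y == 0
print(min_LP(lam))               # 0.05875
W_star = np.array([1/3, 1/3])    # hand-computed optimum for this toy data
print(0.5 * W_star @ W_star)     # O* ~ 0.1111 >= the dual value above
```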




Optimization formulations such as SVM are special because the objective function is convex and the constraints are linear. Such formulations satisfy a property known as strong duality. According to this property, the minimax relationship of Eq. 10.46 yields an optimal and feasible solution to the original problem (i.e., O∗ = LD∗) in which the Lagrangian penalty term has zero contribution. Such a solution (W∗, b∗, λ∗) is referred to as the saddle point of the Lagrangian formulation. Note that zero Lagrangian penalty is achieved by a feasible solution only when each training data point Xi satisfies λi [yi(W · Xi + b) − 1] = 0. These conditions are equivalent to the Kuhn–Tucker optimality conditions, and they imply that data points Xi with λi > 0 are support vectors.
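The Kuhn–Tucker conditions can be verified directly. This sketch uses a saddle point worked out by hand for the toy data above (illustrative values, not from the book): multipliers λi > 0 occur exactly at the on-margin points, the support vectors.

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Saddle point of the toy problem, derived by hand for illustration.
W, b = np.array([1/3, 1/3]), -1/3
lam = np.array([1/9, 0.0, 1/9, 0.0])

slack = y * (X @ W + b) - 1.0
print(lam * slack)           # all ~0: complementary slackness holds
print(np.nonzero(lam)[0])    # [0 2]: the support vectors lie on the margin
```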



The Lagrangian formulation is solved using the following steps:

  1. The Lagrangian objective LP can be expressed more conveniently as a pure maximization problem by eliminating the minimization part of the awkward minimax formulation. This is achieved by eliminating the minimization variables W and b with gradient-based optimization conditions on these variables. By setting the gradient of LP with respect to W to 0, we obtain the following:

$$\nabla L_P = \nabla \frac{\|W\|^2}{2} - \sum_{i=1}^{n} \lambda_i \, \nabla \left[\, y_i (W \cdot X_i + b) - 1 \,\right] = 0 \qquad (10.47)$$

$$W - \sum_{i=1}^{n} \lambda_i y_i X_i = 0 \qquad (10.48)$$

Therefore, one can now derive an expression for W in terms of the Lagrangian multipliers and the training data points:

$$W = \sum_{i=1}^{n} \lambda_i y_i X_i \qquad (10.49)$$

Furthermore, by setting the partial derivative of LP with respect to b to 0, we obtain:

$$\sum_{i=1}^{n} \lambda_i y_i = 0 \qquad (10.50)$$







  2. The optimization condition Σi λiyi = 0 can be used to eliminate the term b Σi λiyi from LP. The expression W = Σi λiyiXi from Eq. 10.49 can then be substituted into LP to create a dual problem LD in terms of only the maximization variables λ. Specifically, the maximization objective function LD for the Lagrangian dual is as follows:

$$L_D = \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j \, X_i \cdot X_j \qquad (10.51)$$
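Putting the pieces together, the dual problem can be solved numerically. The sketch below is an assumption-laden illustration, not the book's method: it relies on scipy.optimize.minimize with the SLSQP solver and the same invented toy data, maximizing LD subject to λ ≥ 0 and Σi λiyi = 0, then recovering W via Eq. 10.49 and confirming strong duality.

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Q_ij = y_i y_j (X_i . X_j), the quadratic part of the dual objective.
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(lam):
    # Negated L_D of Eq. 10.51, so maximizing L_D becomes a minimization.
    return -(lam.sum() - 0.5 * lam @ Q @ lam)

res = minimize(
    neg_dual,
    x0=np.zeros(n),
    method="SLSQP",
    bounds=[(0.0, None)] * n,                                  # lam_i >= 0
    constraints=[{"type": "eq", "fun": lambda lam: lam @ y}],  # Eq. 10.50
)

lam = res.x
W = (lam * y) @ X                 # Eq. 10.49: recover W from the multipliers
print(np.round(lam, 4))           # ~ [0.1111 0.     0.1111 0.    ]
print(np.round(W, 4))             # ~ [0.3333 0.3333]
print(round(float(lam @ y), 6))   # ~ 0: Eq. 10.50 holds at the solution
print(-res.fun, 0.5 * W @ W)      # strong duality: L_D* ~ ||W*||^2 / 2
```

In practice, dedicated quadratic programming or SMO-style solvers are used for this dual; the generic SLSQP call above merely keeps the toy example self-contained.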