Data Mining: The Textbook




L_D = \sum_{i=1}^{n} \lambda_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j y_i y_j K(\overline{X_i}, \overline{X_j}).    (11.1)
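As an illustrative sketch (not from the text), the dual objective of Eq. 11.1 can be evaluated directly. The snippet below assumes a Gaussian kernel and NumPy for concreteness; it materializes the full n × n kernel matrix, which is precisely the O(n^2) structure discussed next:

```python
import numpy as np

def dual_objective(lam, y, X, gamma=1.0):
    # Evaluate L_D of Eq. 11.1 for multipliers lam, labels y in {-1, +1},
    # and data matrix X, using a Gaussian kernel (an illustrative choice).
    sq = np.sum(X ** 2, axis=1)
    # The full n x n kernel matrix K is materialized here: this is the
    # O(n^2) memory footprint that becomes prohibitive for large n.
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    v = lam * y
    return lam.sum() - 0.5 * v @ K @ v

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))        # n = 5 toy points
y = np.array([1, -1, 1, -1, 1])
lam = np.full(5, 0.1)              # arbitrary feasible multipliers
value = dual_objective(lam, y, X)
```

Since the kernel matrix is dense, even storing it (let alone optimizing over it) requires memory quadratic in the number of training points.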




The number of Lagrangian parameters λ_i (or optimization variables) is equal to the number of training data points n, and the size of the kernel matrix K(X_i, X_j) is O(n^2). As a result, the coefficients of the entire optimization problem cannot even be loaded in main memory for large values of n. The SVMLight approach is designed to address this issue. This approach is mainly based on the following two observations:





  1. It is not necessary to solve the entire problem at one time. A subset (or working set) of the variables λ1 . . . λn may be selected for optimization at a given time. Different working sets are selected and optimized iteratively to arrive at the global optimal solution.




  2. The support vectors for the SVMs correspond to only a small number of training data points. Even if most of the other training data points were removed, it would have no impact on the decision boundary of the SVM. Therefore, the early identification of such data points during the computationally intensive training process is crucial for efficiency maximization.
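The second observation can be verified directly: in the kernel decision function f(x) = Σ_i λ_i y_i K(X_i, x) + b, points with λ_i = 0 contribute nothing, so deleting them leaves the boundary unchanged. The multipliers below are hypothetical values chosen purely for illustration (linear kernel, bias term omitted):

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
lam = np.array([0.5, 0.0, 0.5, 0.0])  # hypothetical multipliers; zeros mark non-SVs

def decision(Xtr, ytr, lamtr, x):
    # kernel SVM decision value with a linear kernel (bias omitted)
    return np.sum(lamtr * ytr * (Xtr @ x))

x_test = np.array([0.5, -0.2])
sv = lam > 0                          # retain only the support vectors
full = decision(X, y, lam, x_test)
pruned = decision(X[sv], y[sv], lam[sv], x_test)
# full == pruned: discarding non-support vectors leaves f(x) unchanged
```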

The following discussion explains how each of the aforementioned observations may be leveraged. In the case of the first observation, an iterative approach is used, in which the variables of the optimization problem are improved iteratively by fixing the majority of the variables at their current values and improving only a small working set of the variables. Note that the size of the relevant kernel matrix within each local optimization scales with the square of the size q of the working set S_q, rather than with the number n of training points. The SVMLight algorithm repeatedly executes the following two steps until the global optimality conditions are satisfied:





  1. Select q variables as the active working set Sq, and fix the remaining n − q variables to their current value.




  2. Solve the smaller optimization subproblem L_D(S_q), involving only the q variables in the working set.
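The two steps above can be sketched as follows. The working-set selection rule used here (largest gradient magnitude) and the projected-gradient subproblem solver are simplifications for illustration, not the actual SVMLight rules, and the equality constraint Σ_i λ_i y_i = 0 is omitted for brevity:

```python
import numpy as np

def decomposition_loop(K, y, C=1.0, q=2, n_iter=50, step=0.1):
    # Illustrative working-set decomposition for the dual of Eq. 11.1.
    # Each iteration fixes n - q variables and updates only the active
    # working set S_q, so each subproblem touches a q x q kernel block.
    n = len(y)
    lam = np.zeros(n)
    Q = (y[:, None] * y[None, :]) * K        # Q_ij = y_i y_j K(X_i, X_j)
    for _ in range(n_iter):
        grad = 1.0 - Q @ lam                 # gradient of L_D w.r.t. lam
        S_q = np.argsort(-np.abs(grad))[:q]  # crude working-set choice
        # projected gradient step on S_q only, clipped to the box [0, C]
        lam[S_q] = np.clip(lam[S_q] + step * grad[S_q], 0.0, C)
    return lam

# toy linearly separable data with a linear kernel
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-1.0, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
lam = decomposition_loop(X @ X.T, y)
```

Because only q variables change per iteration, the per-step memory cost depends on q rather than on n, which is the point of the decomposition.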

A key issue is how the working set of size q may be identified in each iteration. Ideally, it is desired to select a working set for which the maximum improvement in the objective



