Data Mining: The Textbook





function is achieved. Let V be a vector whose length equals the number of Lagrangian variables and that has at most q nonzero elements. The goal is to choose the q nonzero elements of V that define the working set. To do so, an optimization problem is set up in which the dot product of V with the gradient of LD (with respect to the Lagrangian variables) is optimized. This separate optimization problem must be solved in each iteration to determine the optimal working set.
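The selection step above can be illustrated with a simplified sketch. The function name select_working_set and the greedy criterion below are illustrative only; the actual SVMLight procedure solves a separate optimization problem rather than a simple greedy ranking:

```python
def select_working_set(gradient, alpha, C, q):
    """Pick q Lagrangian variables to optimize next (illustrative sketch).

    Greedy proxy for the working-set criterion: favor variables whose
    gradient component is large in magnitude and that are free to move
    in that direction, i.e., not pinned by the box constraints
    0 <= alpha_i <= C.
    """
    candidates = []
    for i, g in enumerate(gradient):
        # A variable at the lower bound can only increase; one at the
        # upper bound can only decrease. Skip blocked directions.
        if (alpha[i] <= 0 and g <= 0) or (alpha[i] >= C and g >= 0):
            continue
        candidates.append((abs(g), i))
    # Keep the q variables with the largest usable gradient magnitude.
    candidates.sort(reverse=True)
    return sorted(i for _, i in candidates[:q])
```

In each outer iteration, only the returned variables would be optimized while the remaining Lagrangian variables are held fixed.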


The second idea for speeding up support vector machines is that of shrinking the training data. In the support vector machine formulation, the focus is primarily on the decision boundary. Training examples that lie on the correct side of the margin, and far away from it, have no impact on the solution to the optimization problem, even if they are removed. Identifying such training examples early in the optimization process maximizes the benefit of their removal. The SVMLight approach uses a heuristic based on the Lagrangian multiplier estimates. The specific details of identifying these training examples are beyond the scope of this book, but pointers are provided in the bibliographic notes. A later approach, known as SVMPerf, shows how to achieve linear scale-up, although only for the linear model. For some domains, such as text, the linear model works quite well in practice. Furthermore, the SVMPerf method has O(s · n) complexity, where s is the number of nonzero features per training example, and n is the number of training examples. In cases where s ≪ d, such a classifier is very efficient. This is the case for sparse high-dimensional domains such as text and market basket data. Therefore, this approach will be described in Sect. 13.5.3 of Chap. 13 on text data.
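The shrinking idea can be sketched as follows. This is a hypothetical shrink_training_set helper, not SVMLight's actual Lagrange-multiplier heuristic: for a current linear model (w, b), points whose functional margin comfortably exceeds 1 lie safely beyond the margin and are set aside:

```python
def shrink_training_set(points, labels, w, b, slack=0.1):
    """Return indices of points kept in the active training set.

    A point with functional margin y_i * (w . x_i + b) well above 1
    lies far on the correct side of the margin, cannot influence the
    decision boundary, and is therefore dropped.
    """
    keep = []
    for i, (x, y) in enumerate(zip(points, labels)):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        if margin < 1.0 + slack:  # still near (or violating) the margin
            keep.append(i)
    return keep
```

The subsequent iterations of the optimization then work only with the retained points, which shrinks the effective problem size.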


11.5 Regression Modeling with Numeric Classes

In many applications, the class variables are numerical. In this case, the goal is to minimize the squared error of prediction of the numeric class variable. This variable is also referred to as the response variable, dependent variable, or regressand. The feature variables are referred to as explanatory variables, input variables, predictor variables, independent variables, or regressors. The prediction process is referred to as regression modeling. This section will discuss a number of such regression modeling algorithms.


11.5.1 Linear Regression


Let D be an n × d data matrix whose ith data point (row) is the d-dimensional input feature vector Xi, and let yi be the corresponding response variable. Let the n-dimensional column vector of response variables be denoted by y = (y1, . . . , yn)^T. In linear regression, the dependence of each response variable yi on the corresponding independent variables Xi is modeled in the form of a linear relationship:





yi ≈ W · Xi    ∀i ∈ {1, . . . , n},                (11.2)

where W = (w1 . . . wd) is a d-dimensional row vector of coefficients that is learned from the training data.
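As a concrete illustration, the coefficients of such a linear relationship can be estimated by least squares. The sketch below handles the special case of a single explanatory variable with an intercept, using the closed-form normal equations (the function name fit_simple_linear is illustrative):

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y ≈ w*x + b for one explanatory variable.

    Minimizes sum_i (y_i - w*x_i - b)^2 using the closed-form
    solution: w = cov(x, y) / var(x), b = mean(y) - w * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b
```

For example, fitting the points (0, 1), (1, 3), (2, 5), (3, 7) recovers w = 2 and b = 1, since these points lie exactly on the line y = 2x + 1. The general d-dimensional case replaces this scalar formula with the matrix normal equations.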




