Data Mining: The Textbook
10.8. INSTANCE-BASED LEARNING

Figure 10.11: Importance of class sensitivity in distance function design. (Placeholder for the original plot: feature Y is plotted against feature X on a [0, 1] scale; the points of class A and class B form separate clusters, the direction of the best linear discriminant is marked, and the test instance is labeled X.)


matrix A in Eq. 10.71. In the case of Fig. 10.11, the best discriminating direction is illustrated pictorially. Fisher's linear discriminant, discussed in Sect. 10.2.1.4, can be used to determine this direction and map the data into a 1-dimensional space. In this 1-dimensional space, the different classes are separated perfectly, and a nearest-neighbor classifier will work well in the projected space. This is a special example in which a single 1-dimensional projection suffices; such a projection may not generalize to an arbitrary data set.
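To make the idea concrete, the sketch below computes Fisher's direction for a two-class data set and applies a 1-nearest-neighbor rule in the projected 1-dimensional space. This is only an illustrative sketch under stated assumptions (NumPy, two classes labeled 0 and 1, invented function names), not the implementation described in Sect. 10.2.1.4.

```python
import numpy as np

def fisher_direction(X, y):
    """Fisher's linear discriminant direction for a two-class data set.

    X: (n, d) array of points; y: length-n array of labels in {0, 1}.
    A minimal sketch; a library implementation may be preferable in practice.
    """
    X0, X1 = X[y == 0], X[y == 1]
    # Within-class scatter: sum of the per-class covariance matrices.
    S_w = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    # Direction proportional to S_w^{-1} (mu_1 - mu_0).
    w = np.linalg.solve(S_w, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)

def nearest_neighbor_1d(X_train, y_train, x_test, w):
    """1-nearest-neighbor prediction in the projected 1-dimensional space."""
    z_train = X_train @ w   # project the training points onto w
    z_test = x_test @ w     # project the test instance onto w
    return y_train[np.argmin(np.abs(z_train - z_test))]
```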

A more general way of computing distances in a class-sensitive manner is to use a soft weighting of different directions, rather than selecting specific dimensions in a hard way. This can be achieved with an appropriate choice of the matrix A in Eq. 10.71. The choice of matrix A defines the shape of the neighborhood of a test instance. A distortion of this neighborhood away from the circular Euclidean contour corresponds to a soft weighting, as opposed to a hard selection of specific directions. A soft weighting is also more robust for smaller training data sets, where the optimal linear discriminant cannot be found without overfitting. Thus, the core idea is to "elongate" the neighborhoods along the less discriminative directions and "shrink" the neighborhoods along the more discriminative directions with the use of matrix A. Note that elongating a neighborhood in a direction by a factor α > 1 is equivalent to de-emphasizing that direction by the same factor, because distance components along that direction are divided by α. This is also done in the case of the Mahalanobis metric, except that the Mahalanobis metric is an unsupervised approach that is agnostic to the class distribution. In the unsupervised Mahalanobis metric, the level of elongation achieved by the matrix A depends inversely on the variance along the different directions. In the supervised scenario, the goal is to elongate the directions so that the level of elongation depends inversely on the ratio of the interclass variance to the intraclass variance along the different directions.
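For reference, the weighted distance defined by Eq. 10.71 earlier in the chapter is of the following quadratic form (restated here from the standard convention for such metrics rather than quoted from the equation itself, so the reader should compare with Eq. 10.71):

$$\mathit{Dist}(\overline{X}, \overline{Y}) = \sqrt{(\overline{X} - \overline{Y})\, A\, (\overline{X} - \overline{Y})^{T}},$$

where $\overline{X}$ and $\overline{Y}$ are d-dimensional row vectors and A is a d × d symmetric positive semidefinite matrix. Setting A to the inverse covariance matrix of the data yields the unsupervised Mahalanobis metric mentioned above, whereas the supervised choice of A derived below distorts the neighborhood according to class discrimination.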


Let D be the full database, and let Di be the portion of the data set belonging to class i. Let μ represent the mean of the entire data set. Let pi = |Di|/|D| be the fraction of data points belonging to class i, μi be the d-dimensional row vector of means of Di, and Σi be the d × d covariance matrix of Di. Then, the scaled^10 within-class scatter matrix Sw is defined as follows:

$$S_w = \sum_{i=1}^{k} p_i \Sigma_i. \tag{10.73}$$

The between-class scatter matrix Sb may be computed as follows:

$$S_b = \sum_{i=1}^{k} p_i \, (\mu_i - \mu)^{T} (\mu_i - \mu). \tag{10.74}$$
Note that the matrix Sb is a d × d matrix because it results from the product of a d × 1 matrix with a 1 × d matrix. Then, the matrix A (of Eq. 10.71), which provides the desired distortion of the distances on the basis of the class distribution, can be shown to be the following:





$$A = S_w^{-1} S_b S_w^{-1}. \tag{10.75}$$

It can be shown that this choice of the matrix A provides excellent discrimination between the different classes, where the elongation in each direction depends inversely on the ratio of the between-class variance to the within-class variance along that direction. The reader is referred to the bibliographic notes for pointers to the derivation of the aforementioned steps.
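The following sketch puts Eqs. 10.73-10.75 together: it estimates Sw and Sb from a labeled data set, forms A, and evaluates the resulting class-sensitive distance. It is a minimal illustration rather than the book's code; the function names are invented, NumPy is assumed, and regularization of a near-singular Sw is ignored.

```python
import numpy as np

def class_sensitive_metric(X, y):
    """Compute the matrix A = Sw^{-1} Sb Sw^{-1} of Eq. 10.75.

    X: (n, d) data matrix; y: length-n array of class labels.
    Assumes every class has at least two points so that its covariance
    matrix is defined.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        p_c = len(Xc) / n                                  # fraction of points in class c
        mu_c = Xc.mean(axis=0)
        S_w += p_c * np.cov(Xc, rowvar=False, bias=True)   # Eq. 10.73
        diff = (mu_c - mu).reshape(1, d)
        S_b += p_c * diff.T @ diff                         # Eq. 10.74
    S_w_inv = np.linalg.inv(S_w)
    return S_w_inv @ S_b @ S_w_inv                         # Eq. 10.75

def weighted_distance(x, z, A):
    """Distance of the form sqrt((x - z) A (x - z)^T) for row vectors x, z."""
    diff = x - z
    return float(np.sqrt(diff @ A @ diff))
```

The returned matrix can then be used in place of the identity matrix in the distance computations of a nearest-neighbor classifier.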


10.9 Classifier Evaluation

Given a classification model, how do we quantify its accuracy on a given data set? Such a quantification has several applications, such as the evaluation of classifier effectiveness, the comparison of different models, the selection of the best model for a particular data set, parameter tuning, and several meta-algorithms such as ensemble analysis. The last of these applications will be discussed in the next chapter. This leads to several challenges, both in terms of the methodology used for the evaluation and the specific approach used for quantification. These two challenges are stated as follows:





  1. Methodological issues: The methodological issues are associated with dividing the labeled data appropriately into training and test segments for evaluation. As will become apparent later, the choice of methodology has a direct impact on the evaluation process, such as the underestimation or overestimation of classifier accuracy. Several approaches are possible, such as holdout, bootstrap, and cross-validation.




  2. Quantification issues: The quantification issues are associated with providing a numerical measure for the quality of the method after a specific methodology (e.g., cross-validation) has been selected for evaluation. Examples of such measures include the accuracy, the cost-sensitive accuracy, and the receiver operating characteristic curve, which quantifies the trade-off between true positives and false positives. Other types of numerical measures are specifically designed to compare the relative performance of classifiers.

In the following, these different aspects of classifier evaluation will be studied in detail.
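As a preview of the methodological side, the sketch below estimates accuracy by k-fold cross-validation. It assumes a hypothetical classifier object exposing fit and predict methods (in the style of common machine-learning libraries) and is not tied to any specific method from this chapter.

```python
import numpy as np

def cross_validated_accuracy(model, X, y, folds=10, seed=0):
    """Estimate classifier accuracy by k-fold cross-validation.

    X: (n, d) array; y: length-n label array. The model is refit on each
    training fold and evaluated on the held-out fold.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    fold_indices = np.array_split(indices, folds)
    accuracies = []
    for i in range(folds):
        test_idx = fold_indices[i]
        train_idx = np.concatenate([fold_indices[j] for j in range(folds) if j != i])
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[test_idx])
        accuracies.append(np.mean(predictions == y[test_idx]))
    return float(np.mean(accuracies))
```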






  10. The unscaled version may be obtained by multiplying Sw by the number of data points. Whether the scaled or unscaled version is used makes no difference to the final result, beyond a constant of proportionality.



