Data Mining: The Textbook
10.8. INSTANCE-BASED LEARNING

Figure 10.11: Importance of class sensitivity in distance function design. (Placeholder for the original plot: feature Y is plotted against feature X on a [0, 1] scale; the points of class A and class B form separate clusters, the direction of the best linear discriminant is marked, and the test instance is labeled X.)


matrix A in Eq. 10.71. In the case of Fig. 10.11, the best discriminating direction is illustrated pictorially. Fisher's linear discriminant, discussed in Sect. 10.2.1.4, can be used to determine this direction and map the data into a 1-dimensional space. In this 1-dimensional space, the different classes are separated perfectly, and a nearest-neighbor classifier will work well in the projected space. This is a special example in which a single 1-dimensional projection suffices; such a projection may not generalize to an arbitrary data set.
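To make the idea concrete, the sketch below computes Fisher's direction for a two-class data set and applies a 1-nearest-neighbor rule in the projected 1-dimensional space. This is only an illustrative sketch under stated assumptions (NumPy, two classes labeled 0 and 1, invented function names), not the implementation described in Sect. 10.2.1.4.

```python
import numpy as np

def fisher_direction(X, y):
    """Fisher's linear discriminant direction for a two-class data set.

    X: (n, d) array of points; y: length-n array of labels in {0, 1}.
    A minimal sketch; a library implementation may be preferable in practice.
    """
    X0, X1 = X[y == 0], X[y == 1]
    # Within-class scatter: sum of the per-class covariance matrices.
    S_w = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    # Direction proportional to S_w^{-1} (mu_1 - mu_0).
    w = np.linalg.solve(S_w, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)

def nearest_neighbor_1d(X_train, y_train, x_test, w):
    """1-nearest-neighbor prediction in the projected 1-dimensional space."""
    z_train = X_train @ w   # project the training points onto w
    z_test = x_test @ w     # project the test instance onto w
    return y_train[np.argmin(np.abs(z_train - z_test))]
```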

A more general way of computing distances in a class-sensitive manner is to use a soft weighting of different directions, rather than selecting specific dimensions in a hard way. This can be achieved with an appropriate choice of the matrix A in Eq. 10.71. The choice of matrix A defines the shape of the neighborhood of a test instance. A distortion of this neighborhood away from the circular Euclidean contour corresponds to a soft weighting, as opposed to a hard selection of specific directions. A soft weighting is also more robust for smaller training data sets, where the optimal linear discriminant cannot be found without overfitting. Thus, the core idea is to "elongate" the neighborhoods along the less discriminative directions and "shrink" the neighborhoods along the more discriminative directions with the use of matrix A. Note that elongating a neighborhood in a direction by a factor α > 1 is equivalent to de-emphasizing that direction by the same factor, because distance components along that direction are divided by α. This is also done in the case of the Mahalanobis metric, except that the Mahalanobis metric is an unsupervised approach that is agnostic to the class distribution. In the unsupervised Mahalanobis metric, the level of elongation achieved by the matrix A depends inversely on the variance along the different directions. In the supervised scenario, the goal is to elongate the directions so that the level of elongation depends inversely on the ratio of the interclass variance to the intraclass variance along the different directions.
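For reference, the weighted distance defined by Eq. 10.71 earlier in the chapter is of the following quadratic form (restated here from the standard convention for such metrics rather than quoted from the equation itself, so the reader should compare with Eq. 10.71):

$$\mathit{Dist}(\overline{X}, \overline{Y}) = \sqrt{(\overline{X} - \overline{Y})\, A\, (\overline{X} - \overline{Y})^{T}},$$

where $\overline{X}$ and $\overline{Y}$ are d-dimensional row vectors and A is a d × d symmetric positive semidefinite matrix. Setting A to the inverse covariance matrix of the data yields the unsupervised Mahalanobis metric mentioned above, whereas the supervised choice of A derived below distorts the neighborhood according to class discrimination.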


Let D be the full database, and let Di be the portion of the data set belonging to class i. Let μ represent the mean of the entire data set. Let pi = |Di|/|D| be the fraction of data points belonging to class i, μi be the d-dimensional row vector of means of Di, and Σi be the d × d covariance matrix of Di. Then, the scaled^10 within-class scatter matrix Sw is defined as follows:

$$S_w = \sum_{i=1}^{k} p_i \Sigma_i. \tag{10.73}$$

The between-class scatter matrix Sb may be computed as follows:

$$S_b = \sum_{i=1}^{k} p_i \, (\mu_i - \mu)^{T} (\mu_i - \mu). \tag{10.74}$$
Note that the matrix Sb is a d × d matrix because it results from the product of a d × 1 matrix with a 1 × d matrix. Then, the matrix A (of Eq. 10.71), which provides the desired distortion of the distances on the basis of the class distribution, can be shown to be the following:





$$A = S_w^{-1} S_b S_w^{-1}. \tag{10.75}$$

It can be shown that this choice of the matrix A provides excellent discrimination between the different classes, where the elongation in each direction depends inversely on the ratio of the between-class variance to the within-class variance along that direction. The reader is referred to the bibliographic notes for pointers to the derivation of the aforementioned steps.
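The following sketch puts Eqs. 10.73-10.75 together: it estimates Sw and Sb from a labeled data set, forms A, and evaluates the resulting class-sensitive distance. It is a minimal illustration rather than the book's code; the function names are invented, NumPy is assumed, and regularization of a near-singular Sw is ignored.

```python
import numpy as np

def class_sensitive_metric(X, y):
    """Compute the matrix A = Sw^{-1} Sb Sw^{-1} of Eq. 10.75.

    X: (n, d) data matrix; y: length-n array of class labels.
    Assumes every class has at least two points so that its covariance
    matrix is defined.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        p_c = len(Xc) / n                                  # fraction of points in class c
        mu_c = Xc.mean(axis=0)
        S_w += p_c * np.cov(Xc, rowvar=False, bias=True)   # Eq. 10.73
        diff = (mu_c - mu).reshape(1, d)
        S_b += p_c * diff.T @ diff                         # Eq. 10.74
    S_w_inv = np.linalg.inv(S_w)
    return S_w_inv @ S_b @ S_w_inv                         # Eq. 10.75

def weighted_distance(x, z, A):
    """Distance of the form sqrt((x - z) A (x - z)^T) for row vectors x, z."""
    diff = x - z
    return float(np.sqrt(diff @ A @ diff))
```

The returned matrix can then be used in place of the identity matrix in the distance computations of a nearest-neighbor classifier.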


10.9 Classifier Evaluation

Given a classification model, how do we quantify its accuracy on a given data set? Such a quantification has several applications, such as the evaluation of classifier effectiveness, the comparison of different models, the selection of the best model for a particular data set, parameter tuning, and several meta-algorithms such as ensemble analysis. The last of these applications will be discussed in the next chapter. This leads to several challenges, both in terms of the methodology used for the evaluation and the specific approach used for quantification. These two challenges are stated as follows:





  1. Methodological issues: The methodological issues are associated with dividing the labeled data appropriately into training and test segments for evaluation. As will become apparent later, the choice of methodology has a direct impact on the evaluation process, such as the underestimation or overestimation of classifier accuracy. Several approaches are possible, such as holdout, bootstrap, and cross-validation.




  2. Quantification issues: The quantification issues are associated with providing a numerical measure for the quality of the method after a specific methodology (e.g., cross-validation) has been selected for evaluation. Examples of such measures include the accuracy, the cost-sensitive accuracy, and the receiver operating characteristic curve, which quantifies the trade-off between true positives and false positives. Other types of numerical measures are specifically designed to compare the relative performance of classifiers.

In the following, these different aspects of classifier evaluation will be studied in detail.
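As a preview of the methodological side, the sketch below estimates accuracy by k-fold cross-validation. It assumes a hypothetical classifier object exposing fit and predict methods (in the style of common machine-learning libraries) and is not tied to any specific method from this chapter.

```python
import numpy as np

def cross_validated_accuracy(model, X, y, folds=10, seed=0):
    """Estimate classifier accuracy by k-fold cross-validation.

    X: (n, d) array; y: length-n label array. The model is refit on each
    training fold and evaluated on the held-out fold.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    fold_indices = np.array_split(indices, folds)
    accuracies = []
    for i in range(folds):
        test_idx = fold_indices[i]
        train_idx = np.concatenate([fold_indices[j] for j in range(folds) if j != i])
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[test_idx])
        accuracies.append(np.mean(predictions == y[test_idx]))
    return float(np.mean(accuracies))
```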






  10. The unscaled version may be obtained by multiplying Sw by the number of data points. Whether the scaled or unscaled version is used makes no difference to the final result, beyond a constant of proportionality.



