
2. Wrapper models: It is assumed that a classification algorithm is available to evaluate how well it performs with a particular subset of features. A feature search algorithm is then wrapped around this classification algorithm to determine the relevant set of features; a rough sketch of such a search appears after this list.




  3. Embedded models: The solution to a classification model often contains useful hints about the most relevant features. Such features are isolated, and the classifier is retrained on the pruned set of features.

In the following discussion, each of these models will be explained in detail.
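
As an illustration of the wrapper idea, here is a minimal Python sketch, not taken from the book, of greedy forward selection wrapped around a user-supplied evaluation routine; the function name, the `evaluate` callback, and the stopping rule are illustrative assumptions.

```python
def forward_selection(features, evaluate, max_features=None):
    """Greedy wrapper search: repeatedly add the feature whose inclusion
    most improves the score returned by `evaluate`, a user-supplied function
    that trains and validates a classifier on a given feature subset."""
    selected = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every candidate extension of the current subset.
        candidate_scores = {f: evaluate(selected + [f]) for f in remaining}
        best_feature = max(candidate_scores, key=candidate_scores.get)
        if candidate_scores[best_feature] <= best_score:
            break  # no remaining feature improves the wrapped classifier
        best_score = candidate_scores[best_feature]
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected, best_score
```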


10.2.1 Filter Models


In filter models, a feature or a subset of features is evaluated with the use of a class-sensitive discriminative criterion. The advantage of evaluating a group of features at one time is that redundancies are well accounted for. Consider the case where two feature variables are perfectly correlated with one another, so that each can be predicted using the other. In such a case, it makes sense to use only one of these features because the other adds no incremental knowledge with respect to the first. However, such methods are often expensive because there are 2^d possible subsets of features over which a search may need to be performed. Therefore, in practice, most feature selection methods evaluate the features independently of one another and select the most discriminative ones.


Some feature selection methods, such as linear discriminant analysis, create a linear combination of the original features as a new set of features. Such analytical methods can be viewed either as stand-alone classifiers or as dimensionality reduction methods that are used before classification, depending on how they are used. These methods will also be discussed in this section.


10.2.1.1 Gini Index


The Gini index is commonly used to measure the discriminative power of a particular feature. Typically, it is used for categorical variables, but it can be generalized to numeric attributes by the process of discretization. Let v_1 . . . v_r be the r possible values of a particular categorical attribute, and let p_j be the fraction of data points containing attribute value v_i that belong to class j ∈ {1 . . . k}. Then, the Gini index G(v_i) for the value v_i of a categorical attribute is defined as follows:





G(v_i) = 1 − \sum_{j=1}^{k} p_j^2. \qquad (10.1)







When the different classes are distributed evenly for a particular attribute value, the value of the Gini index is 1 − 1/k. On the other hand, if all data points for an attribute value v_i belong to the same class, then the Gini index is 0. Therefore, lower values of the Gini index imply greater discrimination.








[Figure 10.1: Variation of two feature selection criteria with class distribution skew. The plot compares the Gini index and the entropy for a two-class problem, with the fraction of the first class on the x-axis and the criterion value on the y-axis.]




An example of the Gini index for a two-class problem with varying values of p_1 is illustrated in Fig. 10.1. Note that the index takes on its maximum value at p_1 = 0.5.

The value-specific Gini index is converted into an attribute-wise Gini index. Let n_i be the number of data points that take on the value v_i for the attribute. Then, for a data set containing \sum_{i=1}^{r} n_i = n data points, the overall Gini index G for the attribute is defined as the weighted average over the different attribute values as follows:

G = \sum_{i=1}^{r} n_i G(v_i)/n. \qquad (10.2)

Lower values of the Gini index imply greater discriminative power. The Gini index is typically defined for a particular feature rather than a subset of features.
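
As an illustration of Equations 10.1 and 10.2, the following is a minimal Python sketch, not taken from the book, of the attribute-wise Gini computation for one categorical attribute; the function name and the plain-list data layout are illustrative assumptions.

```python
from collections import Counter, defaultdict

def gini_index(attribute_values, class_labels):
    """Attribute-wise Gini index of Eq. 10.2, built from the value-specific
    index of Eq. 10.1. Lower values indicate greater discriminative power."""
    n = len(attribute_values)
    # Group the class labels by the attribute value v_i they co-occur with.
    groups = defaultdict(list)
    for v, c in zip(attribute_values, class_labels):
        groups[v].append(c)

    weighted_gini = 0.0
    for labels in groups.values():
        n_i = len(labels)
        counts = Counter(labels)
        # G(v_i) = 1 - sum_j p_j^2   (Eq. 10.1)
        g_vi = 1.0 - sum((cnt / n_i) ** 2 for cnt in counts.values())
        # Weight each attribute value by its frequency n_i / n   (Eq. 10.2)
        weighted_gini += n_i * g_vi / n
    return weighted_gini

# A perfectly discriminative attribute has an overall Gini index of 0.
print(gini_index(["a", "a", "b", "b"], [0, 0, 1, 1]))  # 0.0
```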


10.2.1.2 Entropy


The class-based entropy measure is related to notions of information gain resulting from fixing a specific attribute value. The entropy measure achieves a goal similar to that of the Gini index at an intuitive level, but it is based on sound information-theoretic principles. As before, let p_j be the fraction of data points belonging to class j for attribute value v_i. Then, the class-based entropy E(v_i) for the attribute value v_i is defined as follows:





E(v_i) = −\sum_{j=1}^{k} p_j \log_2(p_j). \qquad (10.3)




The class-based entropy value lies in the interval [0, \log_2(k)]. Higher values of the entropy imply greater “mixing” of different classes. A value of 0 implies perfect separation and, therefore, the largest possible discriminative power. An example of the entropy for a two-class problem with varying values of the probability p_1 is illustrated in Fig. 10.1. As in the case of the Gini index, the overall entropy E of an attribute is defined as the weighted




average over the r different attribute values:





E = \sum_{i=1}^{r} n_i E(v_i)/n. \qquad (10.4)




Here, n_i is the frequency of attribute value v_i.
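
A corresponding sketch for Equations 10.3 and 10.4 follows; as with the Gini example above, the function name and data layout are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def class_entropy(attribute_values, class_labels):
    """Weighted class-based entropy of Eq. 10.4 for one categorical
    attribute. Lower values indicate better class separation."""
    n = len(attribute_values)
    groups = defaultdict(list)
    for v, c in zip(attribute_values, class_labels):
        groups[v].append(c)

    weighted_entropy = 0.0
    for labels in groups.values():
        n_i = len(labels)
        counts = Counter(labels)
        # E(v_i) = -sum_j p_j log2(p_j)   (Eq. 10.3); 0 log 0 is taken as 0.
        e_vi = -sum((cnt / n_i) * math.log2(cnt / n_i)
                    for cnt in counts.values() if cnt > 0)
        weighted_entropy += n_i * e_vi / n
    return weighted_entropy

# Evenly mixed classes within every value give the maximum entropy (1 for k = 2).
print(class_entropy(["a", "a", "b", "b"], [0, 1, 0, 1]))  # 1.0
```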


10.2.1.3 Fisher Score


The Fisher score is naturally designed for numeric attributes to measure the ratio of the average interclass separation to the average intraclass separation. The larger the Fisher score, the greater the discriminatory power of the attribute. Let μ_j and σ_j, respectively, be the mean and standard deviation of data points belonging to class j for a particular feature, and let p_j be the fraction of data points belonging to class j. Let μ be the global mean of the data on the feature being evaluated. Then, the Fisher score F for that feature may be defined as the ratio of the interclass separation to the intraclass separation:








F = \frac{\sum_{j=1}^{k} p_j (\mu_j − \mu)^2}{\sum_{j=1}^{k} p_j \sigma_j^2} \qquad (10.5)
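
The following is a minimal Python sketch of this ratio for a single numeric feature, based on the definitions above; the function name and the use of the population (rather than sample) variance within each class are assumptions.

```python
from collections import defaultdict

def fisher_score(feature_values, class_labels):
    """Ratio of interclass separation to intraclass separation for one
    numeric feature. Larger scores indicate greater discriminatory power."""
    n = len(feature_values)
    mu = sum(feature_values) / n  # global mean of the feature

    groups = defaultdict(list)
    for x, c in zip(feature_values, class_labels):
        groups[c].append(x)

    between = 0.0  # sum_j p_j (mu_j - mu)^2
    within = 0.0   # sum_j p_j sigma_j^2
    for xs in groups.values():
        p_j = len(xs) / n
        mu_j = sum(xs) / len(xs)
        var_j = sum((x - mu_j) ** 2 for x in xs) / len(xs)
        between += p_j * (mu_j - mu) ** 2
        within += p_j * var_j
    return between / within if within > 0 else float("inf")

# Well-separated class means with small within-class spread give a large score.
print(fisher_score([1.0, 1.1, 5.0, 5.1], [0, 0, 1, 1]))  # ≈ 1600.0
```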