Data Mining: The Textbook




[Figure: a decision tree that splits on Age < 50 vs. Age > 50, and then on Salary < 50,000 vs. Salary > 50,000, with leaves labeled Donor; the path (Age > 50) AND (Salary > 50,000) => Donor corresponds to a grown rule.]

Figure 10.5: Rule growth is analogous to decision tree construction


The similarity between rule growth and decision tree construction suggests the use of analogous measures, such as the accuracy, the entropy, or the Gini index, which are used as split criteria in decision trees.

The criteria do need to be modified because a rule is relevant only to the training examples covered by the antecedent and the single class in the consequent, whereas decision tree splits are evaluated with respect to all training examples at a given node and all classes. Furthermore, decision tree split measures do not need to account for issues such as the coverage of the rule. One would like to determine rules with high coverage in order to avoid overfitting. For example, a rule that covers only a single training instance will always have 100% accuracy, but it does not usually generalize well to unseen test instances. Therefore, one strategy is to combine the accuracy and coverage criteria into a single integrated measure.


The simplest combination approach is to use Laplacian smoothing with a parameter β that regulates the level of smoothing in a training data set with k classes:





$$\text{Laplace}(\beta) = \frac{n^{+} + \beta}{n^{+} + n^{-} + k\beta} \qquad (10.12)$$

The parameter β > 0 controls the level of smoothing, n+ represents the number of correctly classified (positive) examples covered by the rule, and n− represents the number of incorrectly classified (negative) examples covered by the rule. Therefore, the total number of covered examples is n+ + n−. For cases where the absolute number of covered examples n+ + n− is very small, Laplacian smoothing penalizes the accuracy to account for the unreliability of low coverage. Therefore, the measure favors greater coverage.
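To make the effect of the smoothing concrete, the following sketch computes the Laplace measure of Eq. 10.12; the function name, the default β = 1, and the example counts are illustrative rather than drawn from the text:

```python
def laplace_measure(n_pos, n_neg, k, beta=1.0):
    """Smoothed accuracy of a rule: (n+ + beta) / (n+ + n- + k * beta).

    n_pos -- covered examples of the consequent class (n+)
    n_neg -- covered examples of other classes (n-)
    k     -- number of classes in the training data
    beta  -- smoothing parameter (> 0); larger values penalize low coverage more
    """
    return (n_pos + beta) / (n_pos + n_neg + k * beta)

# A rule covering a single (correct) training instance is pulled toward the
# uniform prior 1/k rather than receiving a perfect score of 1.0:
print(laplace_measure(1, 0, k=2))    # 0.667
# A high-coverage rule stays close to its raw accuracy of 0.9:
print(laplace_measure(90, 10, k=2))  # about 0.892
```

As the coverage n+ + n− grows, the measure converges to the raw accuracy n+/(n+ + n−), so the smoothing matters only for low-coverage rules.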


A second possibility is the likelihood ratio statistic. Let n_j be the observed number of training data points covered by the rule that belong to class j, and let n_j^e be the expected number of covered examples that would belong to class j, if the class distribution of the covered examples were the same as that of the full training data. In other words, if p_1 . . . p_k are the fractions of examples belonging to each class in the full training data, then we have:

$$n_j^e = p_j \cdot \sum_{i=1}^{k} n_i$$
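As a sketch of how this statistic is typically evaluated (assuming the standard likelihood ratio form R = 2 Σ_j n_j ln(n_j / n_j^e), which the text develops beyond this excerpt), the helper below compares observed and expected counts; the function name and the example counts are hypothetical:

```python
import math

def likelihood_ratio(covered_counts, class_fractions):
    """Likelihood ratio statistic of a rule.

    covered_counts  -- observed counts n_j of covered examples per class
    class_fractions -- fractions p_j of each class in the full training data
    Computes R = 2 * sum_j n_j * ln(n_j / n_j_e), where n_j_e = p_j * total.
    """
    total = sum(covered_counts)
    r = 0.0
    for n_j, p_j in zip(covered_counts, class_fractions):
        if n_j > 0:  # a zero observed count contributes nothing in the limit
            r += n_j * math.log(n_j / (p_j * total))
    return 2.0 * r

# Hypothetical rule covering 18 donors and 2 non-donors, where donors make
# up 30% of the full training data; a large value indicates that the rule's
# covered examples deviate strongly from the base class distribution:
print(likelihood_ratio([18, 2], [0.3, 0.7]))  # about 31.8
```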






