Data Mining: The Textbook




The expected number of covered examples $n_i^e$ belonging to class $i$ can be computed from the fraction $p_i$ of class $i$ examples in the full training data and the covered counts $n_1 \ldots n_k$:

$$ n_i^e = p_i \cdot \sum_{i=1}^{k} n_i \qquad (10.13) $$


Then, for a k-class problem, the likelihood ratio statistic R may be computed as follows:





$$ R = 2 \sum_{j=1}^{k} n_j \log\!\left(\frac{n_j}{n_j^e}\right) \qquad (10.14) $$

When the distribution of classes in the covered examples is significantly different from that in the original training data, the value of R increases. Therefore, the statistic tends to favor covered examples whose class distributions are very different from that of the original training data. Furthermore, the presence of the raw frequencies $n_1 \ldots n_k$ as multiplicative factors of the individual terms on the right-hand side of Eq. 10.14 ensures that larger rule coverage is rewarded. This measure is used by the CN2 algorithm.
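The computation in Eqs. 10.13 and 10.14 can be illustrated with a short sketch. The function name and the input representation (per-class covered counts and training-data class fractions) are illustrative choices rather than part of the CN2 specification.

```python
import math

def likelihood_ratio_statistic(covered_counts, training_fractions):
    """Likelihood ratio statistic R (Eq. 10.14) for a candidate rule.

    covered_counts[i]     -- number of covered examples of class i (n_i)
    training_fractions[i] -- fraction p_i of class i in the full training data
    """
    n_total = sum(covered_counts)
    r = 0.0
    for n_j, p_j in zip(covered_counts, training_fractions):
        if n_j == 0:
            continue  # a zero count contributes nothing to the sum
        n_j_expected = p_j * n_total  # expected count from Eq. 10.13
        r += n_j * math.log(n_j / n_j_expected)
    return 2.0 * r
```

For instance, in a balanced two-class problem, a rule covering 30 examples of one class and 5 of the other yields a much larger value of R than a rule covering 18 and 17 examples, reflecting its sharper deviation from the training distribution at comparable coverage.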


Another criterion is FOIL’s information gain. The term “FOIL” stands for first order inductive learner. Consider a rule that covers $n_1^+$ positive examples and $n_1^-$ negative examples, where positive examples are defined as training examples matching the class in the consequent. Suppose that the addition of a conjunct to the antecedent changes the number of positive and negative examples to $n_2^+$ and $n_2^-$, respectively. Then, FOIL’s information gain $FG$ is defined as follows:





$$ FG = n_2^+ \left( \log_2 \frac{n_2^+}{n_2^+ + n_2^-} \;-\; \log_2 \frac{n_1^+}{n_1^+ + n_1^-} \right) \qquad (10.15) $$




This measure tends to select rules with high coverage because $n_2^+$ is a multiplicative factor in $FG$. At the same time, the information gain increases with higher accuracy because of the term inside the parentheses. This particular measure is used by the RIPPER algorithm.
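A direct transcription of Eq. 10.15 is sketched below; the function name, the argument names, and the guard for rules with no positive coverage are illustrative conventions rather than part of the formula.

```python
import math

def foil_gain(p1, n1, p2, n2):
    """FOIL's information gain (Eq. 10.15) for adding a conjunct to a rule.

    p1, n1 -- positive/negative examples covered before the conjunct is added
    p2, n2 -- positive/negative examples covered after the conjunct is added
    """
    if p1 == 0 or p2 == 0:
        return 0.0  # no positive coverage; treat the gain as zero (assumption)
    accuracy_before = math.log2(p1 / (p1 + n1))
    accuracy_after = math.log2(p2 / (p2 + n2))
    return p2 * (accuracy_after - accuracy_before)
```

Because the difference of logarithms is scaled by $n_2^+$ (here `p2`), a conjunct that preserves many positive examples is preferred over one that achieves the same accuracy improvement on only a handful of them.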


As in the case of decision trees, it is possible to grow the rules until 100% accuracy is achieved on the training data, or until the added conjunct does not improve the accuracy of the rule. Another criterion used by RIPPER is that the minimum description length of the rule must not increase by more than a certain threshold because of the addition of a conjunct. The description length of a rule is defined by a weighted function of the size of the conjuncts and the misclassified examples.

10.4.3 Rule Pruning


Rule-pruning is relevant not only for rules generated by the Learn-One-Rule method, but also for methods such as C4.5rules that extract the rules from a decision tree. Irrespective of the approach used to extract the rules, overfitting may result from the presence of too many conjuncts. As in decision tree pruning, the MDL principle can be used for pruning. For example, for each conjunct in the rule, one can add a penalty term δ to the quality criterion in the rule-growth phase. This will result in a pessimistic error rate. Rules with many conjuncts will therefore have larger aggregate penalties to account for their greater model complexity. A simpler approach for computing pessimistic error rates is to use a separate holdout validation set that is used for computing the error rate (without a penalty) but is not used by Learn-One-Rule during rule generation.
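The two ways of obtaining a pessimistic error estimate described above can be summarized in a short sketch; the penalty constant `delta` and the function names are illustrative assumptions rather than values prescribed by the MDL principle.

```python
def penalized_error(misclassified, covered, num_conjuncts, delta=0.5):
    """MDL-style pessimistic error on the covered training examples:
    the observed error rate plus a penalty of delta per conjunct.
    (delta is an illustrative per-conjunct penalty, not a prescribed value.)"""
    return (misclassified + delta * num_conjuncts) / covered

def holdout_error(misclassified, covered):
    """Alternative: plain (unpenalized) error rate of the rule, computed on a
    separate holdout validation set not used by Learn-One-Rule during growth."""
    return misclassified / covered if covered > 0 else 0.0
```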


The conjuncts successively added during rule growth (in sequential covering) are then tested for pruning in reverse order. If pruning reduces the pessimistic error rate on the training examples covered by the rule, then the generalized rule is used. While some algorithms such as RIPPER test the most recently added conjunct first for rule pruning, it is not a strict requirement to do so. It is possible to test the conjuncts for removal in any order, or in greedy fashion, to reduce the pessimistic error rate as much as possible. Rule pruning may result in some of the rules becoming identical. Duplicate rules are removed from the rule set before classification.
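A minimal sketch of this reverse-order pruning loop is given below, assuming a rule represented as a list of conjuncts together with a consequent class, and an `error_fn` that returns the chosen pessimistic error estimate for a candidate rule; these representations are assumptions for illustration, not the exact RIPPER data structures.

```python
def prune_rule(conjuncts, consequent, error_fn):
    """Test conjuncts for removal in reverse order of their addition, keeping a
    removal whenever it reduces the pessimistic error of the rule."""
    conjuncts = list(conjuncts)
    best_error = error_fn(conjuncts, consequent)
    for i in range(len(conjuncts) - 1, -1, -1):  # most recently added first
        candidate = conjuncts[:i] + conjuncts[i + 1:]  # rule without conjunct i
        candidate_error = error_fn(candidate, consequent)
        if candidate_error < best_error:  # pruning reduces the error; generalize
            conjuncts, best_error = candidate, candidate_error
    return conjuncts
```

Testing the conjuncts in the reverse order of their addition mirrors the RIPPER convention noted above; as the text points out, the loop could equally examine the remaining conjuncts in any order or greedily pick the removal that reduces the pessimistic error the most.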



