[Figure 10.5: Rule growth is analogous to decision tree construction. The figure depicts a decision tree that splits first on Age (Age < 50 versus Age > 50) and then on Salary (Salary < 50,000 versus Salary > 50,000); the path along Age > 50 followed by Salary > 50,000 corresponds to the rule (Age > 50) AND (Salary > 50,000) ⇒ Donor.]
The similarity between rule growth and decision tree construction suggests the use of analogous measures, such as the accuracy, entropy, or the Gini index, as used for split criteria in decision trees.
These criteria do, however, need to be modified, because a rule is relevant only to the training examples covered by its antecedent and to the single class in its consequent, whereas decision tree splits are evaluated with respect to all training examples at a given node and all classes. Furthermore, decision tree split measures do not need to account for issues such as the coverage of the rule. One would like to determine rules with high coverage in order to avoid overfitting. For example, a rule that covers only a single training instance always has 100% accuracy, but it usually does not generalize well to unseen test instances. Therefore, one strategy is to combine the accuracy and coverage criteria into a single integrated measure.
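To make the discussion concrete, the following sketch (the function name and the example counts are hypothetical, not from the text) computes the accuracy, entropy, and Gini index of the class distribution among the training examples covered by a rule's antecedent:

```python
import math

def rule_quality(covered_class_counts, target_class):
    """Accuracy, entropy, and Gini index of the examples covered by a rule's antecedent."""
    total = sum(covered_class_counts.values())
    if total == 0:
        return 0.0, 0.0, 0.0
    fractions = [count / total for count in covered_class_counts.values()]
    accuracy = covered_class_counts.get(target_class, 0) / total  # fraction in the consequent class
    entropy = -sum(p * math.log2(p) for p in fractions if p > 0)  # lower is purer
    gini = 1.0 - sum(p * p for p in fractions)                    # lower is purer
    return accuracy, entropy, gini

# Hypothetical rule covering 40 donors and 10 non-donors
print(rule_quality({"Donor": 40, "NonDonor": 10}, "Donor"))  # (0.8, ~0.72, 0.32)
```

Note that none of these three measures, taken by itself, rewards higher coverage; this is precisely the limitation that the integrated measures below address.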
The simplest combination approach is to use Laplacian smoothing with a parameter β that regulates the level of smoothing in a training data set with k classes:
$$\mathrm{Laplace}(\beta) = \frac{n_+ + \beta}{n_+ + n_- + k\beta} \qquad (10.12)$$
The parameter β > 0 controls the level of smoothing, n+ represents the number of correctly classified (positive) examples covered by the rule, and n− represents the number of incorrectly classified (negative) examples covered by the rule. Therefore, the total number of covered examples is n+ + n−. When the absolute number of covered examples n+ + n− is very small, Laplacian smoothing penalizes the accuracy to account for the unreliability of low coverage. Therefore, the measure favors rules with greater coverage.
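A minimal sketch of the Laplacian-smoothed measure of Equation 10.12 follows (the function name and the example counts are illustrative assumptions):

```python
def laplace_measure(n_pos, n_neg, k, beta=1.0):
    """Laplacian-smoothed rule accuracy (Equation 10.12).

    n_pos -- covered examples that match the class in the rule's consequent
    n_neg -- covered examples that belong to other classes
    k     -- number of classes in the training data
    beta  -- smoothing parameter (> 0); larger values penalize low coverage more
    """
    return (n_pos + beta) / (n_pos + n_neg + k * beta)

# A rule covering a single correctly classified example no longer scores a perfect 1.0 ...
print(laplace_measure(n_pos=1, n_neg=0, k=2))    # (1 + 1) / (1 + 0 + 2) ≈ 0.67
# ... whereas a rule with the same raw accuracy but far greater coverage scores close to 1.0.
print(laplace_measure(n_pos=100, n_neg=0, k=2))  # (100 + 1) / (100 + 0 + 2) ≈ 0.99
```

The two calls illustrate the coverage argument above: both rules have 100% raw accuracy, but the smoothed measure favors the rule that covers more examples.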
A second possibility is the likelihood ratio statistic. Let n_j be the observed number of training data points covered by the rule that belong to class j, and let n_j^e be the expected number of covered examples that would belong to class j if the class distribution of the covered examples were the same as that of the full training data. In other words, if p_1 ... p_k denote the fractions of examples belonging to each class in the full training data, then we have:

$$n_j^e = p_j \sum_{i=1}^{k} n_i$$
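A hedged sketch of this computation follows. The expected counts implement the relation just stated; the final 2·Σ n_j ln(n_j / n_j^e) form of the likelihood ratio statistic is the standard G-statistic and is an assumption here, since the excerpt only defines the observed and expected counts:

```python
import math

def expected_counts(observed_counts, class_fractions):
    """Expected per-class counts n_j^e = p_j * (total covered), assuming the covered
    examples follow the class distribution of the full training data."""
    total = sum(observed_counts)
    return [p * total for p in class_fractions]

def likelihood_ratio(observed_counts, class_fractions):
    """Likelihood ratio (G) statistic comparing observed and expected class counts;
    this exact formula is an assumption, not given in the excerpt."""
    expected = expected_counts(observed_counts, class_fractions)
    return 2.0 * sum(n * math.log(n / e)
                     for n, e in zip(observed_counts, expected) if n > 0)

# Hypothetical rule: covers 45 donors and 5 non-donors, while donors
# make up 30% of the full training data.
print(likelihood_ratio([45, 5], [0.3, 0.7]))  # large value: the covered distribution
                                              # deviates strongly from the baseline
```

Larger values indicate that the class distribution of the covered examples deviates more strongly from the baseline distribution of the full training data, which is the desirable behavior for a rule quality measure.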