Data Mining: The Textbook


Date: 07.01.2024



10.4. RULE-BASED CLASSIFIERS


10.4.4 Associative Classifiers

Associative classifiers are a popular strategy because they rely on association pattern mining, for which many efficient algorithmic alternatives exist. The reader is referred to Chap. 4 for algorithms on association pattern mining. The discussion below assumes binary attributes, though any data type can be converted to binary attributes with the process of discretization and binarization, as discussed in Chap. 2. Furthermore, unlike sequential covering algorithms, in which rules are always ordered, the rules created by associative classifiers may be either ordered or unordered, depending upon application-specific criteria. The main characteristic of class-based association rules is that they are mined in the same way as regular association rules, except that they have a single class variable in the consequent. The basic strategy for an associative classifier is as follows:

1. Mine all class-based association rules at a given level of minimum support and confidence.

2. For a given test instance, use the mined rules for classification.

A variety of choices exist for the implementation of both steps. A naive way of implementing the first step would be to mine all association rules and then filter out only the rules in which the consequent corresponds to an individual class. However, such an approach is rather wasteful because it generates many rules with nonclass consequents. Furthermore, there is significant redundancy in the rule set because many rules that have 100 % confidence are special cases of other rules with 100 % confidence. Therefore, pruning methods are required during the rule-generation process.
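The redundancy among 100 % confidence rules can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names, the tuple layout (antecedent, class label, support, confidence), and the brute-force enumeration up to a hypothetical max_len are not from the text, and support is taken here as the fraction of training instances that contain the antecedent and carry the class label.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(transactions, labels, min_support, min_confidence, max_len=2):
    """Brute-force enumeration of rules X => c with a class label as consequent.

    Returns a list of (antecedent, label, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            a = frozenset(antecedent)
            # Class counts among the training instances that contain the antecedent.
            counts = Counter(lab for t, lab in zip(transactions, labels) if a <= t)
            covered = sum(counts.values())
            for label, cnt in counts.items():
                support = cnt / n
                confidence = cnt / covered
                if support >= min_support and confidence >= min_confidence:
                    rules.append((a, label, support, confidence))
    return rules


def drop_redundant_certain_rules(rules):
    """Drop 100 % confidence rules that specialize another 100 % rule of the same class."""
    certain = [(a, c) for a, c, _, conf in rules if conf == 1.0]
    kept = []
    for a, c, s, conf in rules:
        # b < a is the proper-subset test: a more general rule already covers this one.
        redundant = conf == 1.0 and any(b < a and d == c for b, d in certain)
        if not redundant:
            kept.append((a, c, s, conf))
    return kept


transactions = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
labels = ["pos", "pos", "neg", "neg"]
rules = mine_class_rules(transactions, labels, min_support=0.25, min_confidence=1.0)
pruned = drop_redundant_certain_rules(rules)
```

On this toy data, the rule {a, b} => pos is dropped because {a} => pos already holds with 100 % confidence. Filtering after the fact still pays the full enumeration cost, which is why pruning during rule generation is preferable.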

The classification based on associations (CBA) approach uses a modification of the Apriori method to generate associations that satisfy the corresponding constraints. The first step is to generate 1-rule-items. These are newly created items corresponding to combinations of items and class attributes. These rule items are then extended using traditional Apriori-style processing. Another modification is that, when patterns are generated corresponding to rules with 100 % confidence, those rules are not extended in order to retain greater generality in the rule set. This broader approach can be used in conjunction with almost any tree enumeration algorithm. The bibliographic notes contain pointers to several recent algorithms that use other frequent pattern mining methods for rule generation.
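The level-wise idea can be sketched as follows. This is a simplified illustration and not the published CBA procedure: the function name cba_rule_items, the one-item-at-a-time candidate extension, and the max_len cap are assumptions made for brevity. A rule item is modeled as an (itemset, class label) pair, and a candidate is kept only if every smaller rule item it contains is frequent and has confidence below 100 %.

```python
def cba_rule_items(transactions, labels, min_support, min_confidence, max_len=3):
    """Level-wise generation of rule items (itemset, label), Apriori style.

    Returns a list of (itemset, label, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    classes = sorted(set(labels))

    def stats(itemset, label):
        covered = [lab for t, lab in zip(transactions, labels) if itemset <= t]
        hits = sum(1 for lab in covered if lab == label)
        return hits / n, (hits / len(covered) if covered else 0.0)

    # 1-rule-items: every single item paired with every class label.
    frontier = [(frozenset([i]), c) for i in items for c in classes]
    rules = []
    for _ in range(max_len):
        extendable = set()
        for itemset, label in frontier:
            support, confidence = stats(itemset, label)
            if support < min_support:
                continue  # infrequent rule item: neither reported nor extended
            if confidence >= min_confidence:
                rules.append((itemset, label, support, confidence))
            if confidence < 1.0:
                extendable.add((itemset, label))  # 100 % rules are not extended
        candidates = set()
        for itemset, label in extendable:
            for i in items:
                if i in itemset:
                    continue
                cand = itemset | {i}
                # Keep the candidate only if all its immediate sub-rule-items are extendable.
                if all((cand - {j}, label) in extendable for j in cand):
                    candidates.add((cand, label))
        frontier = sorted(candidates, key=lambda r: (sorted(r[0]), r[1]))
    return rules


transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}]
labels = ["pos", "pos", "neg", "neg"]
found = cba_rule_items(transactions, labels, min_support=0.25, min_confidence=0.9)
```

In this toy run no single item predicts a class with enough confidence, so the single-item rule items are extended and the only reported rule is {a, b} => pos.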

The second step of associative classification uses the generated rule set to make predictions for unseen test instances. Either ordered or unordered strategies may be used. The ordered strategy prioritizes the rules on the basis of the support (analogous to coverage) and the confidence (analogous to accuracy). A variety of heuristics may be used to create an integrated measure for ordering, such as a weighted combination of support and confidence. The reader is referred to Chap. 17 for a discussion of a representative rule-based classifier, XRules, which uses different types of measures. After the rules have been ordered, the top m rules matching the test instance are determined. The dominant class label among these matching rules is reported as the relevant one for the test instance. A second strategy does not order the rules but determines the dominant class label from all the triggered rules. Other heuristic strategies may weight the rules differently, depending on their support and confidence, for the prediction process. Furthermore, many variations of associative classifiers do not use the support or confidence for mining the rules, but directly use class-based discriminative methods for pattern mining. The bibliographic notes contain pointers to these methods.
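The two prediction strategies can be sketched as follows. This is a minimal illustration under assumptions that are not in the text: rules are (antecedent, class label, support, confidence) tuples, the ordering score is a weighted combination with a hypothetical weight parameter, and a default label is returned when no rule fires.

```python
from collections import Counter


def predict_ordered(rules, instance, m=3, weight=0.5, default=None):
    """Vote among the top-m matching rules, ranked by a weighted score.

    The score weight * confidence + (1 - weight) * support is one possible
    integrated measure; other heuristics can be substituted.
    """
    ranked = sorted(
        rules,
        key=lambda r: weight * r[3] + (1 - weight) * r[2],
        reverse=True,
    )
    matching = [r for r in ranked if r[0] <= instance][:m]
    if not matching:
        return default
    votes = Counter(label for _, label, _, _ in matching)
    return votes.most_common(1)[0][0]


def predict_unordered(rules, instance, default=None):
    """Report the dominant class label among all rules triggered by the instance."""
    votes = Counter(label for a, label, _, _ in rules if a <= instance)
    return votes.most_common(1)[0][0] if votes else default


rules = [
    (frozenset({"a"}), "pos", 0.5, 0.9),
    (frozenset({"b"}), "neg", 0.4, 0.6),
    (frozenset({"c"}), "neg", 0.3, 0.55),
    (frozenset({"d"}), "pos", 0.2, 0.95),
]
instance = {"a", "b", "c"}
ordered_label = predict_ordered(rules, instance, m=1)
unordered_label = predict_unordered(rules, instance)
```

With m=1 the single best-scoring matching rule decides the label (pos here), whereas the unordered vote over all three triggered rules yields neg, so the two strategies can disagree on the same instance.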

CHAPTER 10. DATA CLASSIFICATION

10.5 Probabilistic Classifiers

Probabilistic classifiers construct a model that quantifies the relationship between the feature variables and the target (class) variable as a probability. There are many ways in which such modeling can be performed. Two of the most popular models are as follows:
