Example reweighting: The training examples from various classes are reweighted according to their misclassification costs. Because rare classes typically carry higher misclassification costs, this approach naturally biases the classifier toward classifying rare class examples more accurately than normal class examples. Classification algorithms therefore need to be modified to work with weighted examples.
Example resampling: The examples from different classes are resampled to undersample normal classes and/or oversample rare classes. In such cases, unweighted classifiers can be directly used.
Each of these different methods will be discussed in the following sections.
11.3.1 Example Reweighting
In this case, the examples are weighted in proportion to their costs. Because the original classification problem is designed to maximize accuracy, the analogous solution to the weighted problem maximizes cost-weighted accuracy. Therefore, all instances belonging to the ith class are weighted by C(i). As a result, existing classification algorithms need to be modified to work with these additional weights. In most cases, the required changes are relatively minor. A brief description of the required changes to various classification algorithms is provided below.
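Before turning to the individual classifiers, the following sketch illustrates the basic reweighting step itself: per-example weights are derived from hypothetical class costs C(i), and the cost-weighted accuracy is computed from them. The cost values and function names are purely illustrative.

```python
import numpy as np

def example_weights(y, cost):
    """Map each training label to the misclassification cost C(i) of its class.

    y    : array of class labels
    cost : dict mapping class label -> misclassification cost C(i)
    """
    return np.array([cost[label] for label in y], dtype=float)

def weighted_accuracy(y_true, y_pred, weights):
    """Cost-weighted accuracy: each correctly classified example contributes
    its weight rather than a unit count."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    return np.sum(correct * weights) / np.sum(weights)

# Hypothetical example: the rare class 1 is 20 times as costly to misclassify
# as the normal class 0.
y = np.array([0, 0, 0, 0, 1])
w = example_weights(y, cost={0: 1.0, 1: 20.0})
```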
Decision trees: Weights can be incorporated into decision-tree algorithms easily. The split criterion requires the computation of measures such as the Gini index and the entropy, both of which are functions of the proportionate class distribution of the training examples. This proportionate class distribution can be computed with the use of
weights on the examples. Tree-pruning can also be modified to measure the impact of removing nodes on the weighted accuracy.
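For concreteness, a minimal sketch of the weighted Gini index and weighted entropy is shown below, assuming the class distribution is estimated from sums of example weights rather than raw counts; the helper names are illustrative.

```python
import numpy as np

def weighted_class_distribution(y, weights):
    """Proportionate class distribution computed from example weights."""
    y, weights = np.asarray(y), np.asarray(weights, dtype=float)
    classes = np.unique(y)
    totals = np.array([weights[y == c].sum() for c in classes])
    return totals / totals.sum()

def weighted_gini(y, weights):
    """Gini index of a node, computed from the weighted class distribution."""
    p = weighted_class_distribution(y, weights)
    return 1.0 - np.sum(p ** 2)

def weighted_entropy(y, weights):
    """Entropy of a node, computed from the weighted class distribution."""
    p = weighted_class_distribution(y, weights)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))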
Rule-based classifiers: Sequential covering algorithms are similar to decision-tree construction. The main difference is in terms of the criteria used to grow rules. Measures such as the Laplace measure and FOIL's information gain use the raw number of positive and negative examples covered by the rule. In this case, the weighted number of examples is used as a substitute for the raw number of examples. Rule-pruning uses weighted accuracy to measure the impact of conjunct pruning. For associative classifiers, the weights on the instances need to be used in the computation of support and confidence.
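As an illustration, the following sketch computes a weighted variant of the Laplace measure for a candidate rule, replacing raw example counts with sums of example weights; the function signature is hypothetical.

```python
import numpy as np

def weighted_laplace(covered_labels, covered_weights, positive_class, num_classes):
    """Laplace-smoothed rule quality with weighted example counts.

    The raw counts n+ (positive examples covered by the rule) and n (all
    examples covered) are replaced by sums of example weights.
    """
    covered_labels = np.asarray(covered_labels)
    covered_weights = np.asarray(covered_weights, dtype=float)
    n_pos = covered_weights[covered_labels == positive_class].sum()
    n_all = covered_weights.sum()
    return (n_pos + 1.0) / (n_all + num_classes)
```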
Bayes classifiers: The implementation of Bayes classifiers remains virtually the same as in the unweighted case, except for one crucial difference in the probability estimation process: the class priors and conditional feature probabilities are now estimated using weights on the instances.
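A minimal sketch of weighted estimation for a naive Bayes classifier is shown below, assuming categorical features and Laplace smoothing; the helper names and the smoothing parameter are illustrative.

```python
import numpy as np

def weighted_priors(y, weights):
    """Class priors estimated from sums of example weights."""
    y, weights = np.asarray(y), np.asarray(weights, dtype=float)
    classes = np.unique(y)
    totals = np.array([weights[y == c].sum() for c in classes])
    return dict(zip(classes, totals / totals.sum()))

def weighted_conditional(x_col, y, weights, value, cls, alpha=1.0, num_values=2):
    """P(feature = value | class = cls) for a categorical feature, using
    weighted counts with Laplace smoothing parameter alpha."""
    x_col, y = np.asarray(x_col), np.asarray(y)
    weights = np.asarray(weights, dtype=float)
    in_class = (y == cls)
    num = weights[in_class & (x_col == value)].sum() + alpha
    den = weights[in_class].sum() + alpha * num_values
    return num / den
```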
Support vector machines: Interestingly, hard-margin support vector machines are not affected by the reweighting of examples because the support vectors do not depend on example weights. In practice, however, the soft-margin formulation is used. In such cases, the slack penalty terms in the objective function are weighted appropriately, which results in modifications to both the primal and dual methods for soft SVMs (see Exercises 3 and 4). This typically moves the boundary of the support vector machine toward the normal class side of the separation. This ensures that fewer rare class examples are penalized for (the more costly) margin violations, and more normal class examples are penalized. The result is a lower likelihood of misclassifying rare class examples but a greater likelihood of misclassifying normal class examples.
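As one possible realization, assuming scikit-learn is available, class-dependent slack penalties can be expressed through its class_weight mechanism, which scales the regularization parameter C separately for each class; the cost ratio and data below are purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical imbalanced data: 95 normal examples and 5 rare examples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)),   # normal class
               rng.normal(1.5, 1.0, size=(5, 2))])   # rare class
y = np.array([0] * 95 + [1] * 5)

# Weighting the slack penalty: class_weight multiplies C per class, so
# margin violations by rare class examples are penalized 20 times more
# heavily than violations by normal class examples.
clf = SVC(kernel="linear", C=1.0, class_weight={0: 1.0, 1: 20.0})
clf.fit(X, y)

# Equivalently, per-example weights can be supplied at training time:
# clf.fit(X, y, sample_weight=per_example_weights)
```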