Data Mining: The Textbook


particular data set. However, over many data sets, the approach has the advantage of being able to use the best model suited to each data set, because different classifiers may work differently on different data sets. The bucket of models is commonly used for model selection and parameter tuning in classification algorithms. In the parameter-tuning setting, each individual model is the same classifier with a different choice of parameters, and the winner therefore provides the optimal parameter choice across all candidates.

The bucket of models approach is based on the idea that different classifiers will have different kinds of bias on different data sets. This is because the “correct” decision boundary varies with the data set. By using a “winner-takes-all” contest, the classifier with the most accurate decision boundary will be selected for each data set. Because the bucket of models selects the winner on the basis of overall accuracy, it will also tend to select a model with lower variance. Therefore, the approach can reduce both the bias and the variance.
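The following is a minimal sketch of this winner-takes-all contest, assuming scikit-learn; the hold-out split, the synthetic data, and the use of a decision tree with varying depth as the candidate set are illustrative assumptions rather than choices prescribed by the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def bucket_of_models(X, y, candidates, holdout_size=0.25, seed=0):
    # Hold out part of the training data to run the winner-takes-all contest.
    X_tr, X_ho, y_tr, y_ho = train_test_split(
        X, y, test_size=holdout_size, random_state=seed)
    best_model, best_acc = None, -1.0
    for model in candidates:
        model.fit(X_tr, y_tr)
        acc = accuracy_score(y_ho, model.predict(X_ho))
        if acc > best_acc:
            best_model, best_acc = model, acc
    best_model.fit(X, y)  # re-train the single winner on all the training data
    return best_model

# The "bucket" here is the same classifier over different parameter choices,
# so the winner also provides the selected parameter value.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
candidates = [DecisionTreeClassifier(max_depth=d, random_state=0)
              for d in (2, 4, 8, None)]
winner = bucket_of_models(X, y, candidates)
```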




11.8.3.5 Stacking


The stacking approach is a very general one, in which two levels of classification are used. As in the case of the bucket of models approach, the training data is divided into two subsets A and B. The subset A is used for the first-level classifiers that are the ensemble components. The subset B is used for the second-level classifier that combines the results from different ensemble components in the previous phase. These two steps are described as follows:





  1. Train a set of k classifiers (ensemble components) on the training data subset A. These k ensemble components can be generated in various ways, such as drawing k bootstrapped samples (bagging) from data subset A, performing k rounds of boosting on data subset A, building k different random decision trees on data subset A, or simply training k heterogeneous classifiers on data subset A.

  2. Determine the output of each of the k classifiers on the training data subset B. Create a new set of k features, in which each feature value is the output of one of these k classifiers. Thus, each point in training data subset B is transformed to this k-dimensional space based on the predictions of the k first-level classifiers. Its class label is its (known) ground-truth value. The second-level classifier is trained on this new representation of subset B (see the sketch after this list).
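A minimal sketch of this two-level procedure, assuming scikit-learn; the 50-50 split, the three heterogeneous first-level classifiers, and the logistic-regression combiner are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def train_stacking(X, y, first_level, combiner, seed=0):
    # Divide the training data into subsets A and B.
    X_A, X_B, y_A, y_B = train_test_split(X, y, test_size=0.5, random_state=seed)
    # First level: train the k ensemble components on subset A.
    for clf in first_level:
        clf.fit(X_A, y_A)
    # Second level: represent each point of subset B by the k predictions
    # of the first-level classifiers, and train the combiner on that space.
    Z_B = np.column_stack([clf.predict(X_B) for clf in first_level])
    combiner.fit(Z_B, y_B)
    return first_level, combiner

def predict_stacking(X, first_level, combiner):
    Z = np.column_stack([clf.predict(X) for clf in first_level])
    return combiner.predict(Z)

# k heterogeneous first-level classifiers and a logistic-regression combiner.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
first_level = [DecisionTreeClassifier(max_depth=4, random_state=0),
               GaussianNB(),
               KNeighborsClassifier(n_neighbors=5)]
combiner = LogisticRegression(max_iter=1000)
first_level, combiner = train_stacking(X, y, first_level, combiner)
```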

The result is a set of k first-level models used to transform the feature space, and a combiner classifier at the second level. For a test instance, the first-level models are used to create a new k-dimensional representation, and the second-level classifier is then used to predict the test instance. In many implementations of stacking, the original features of data subset B are retained along with the k new features for learning the second-level classifier. It is also possible to use class probabilities as features rather than the class label predictions. To prevent loss of training data in the first-level and second-level models, this method can be combined with m-way cross-validation. In this approach, a new feature set is derived for each training data point by iteratively using (m − 1) segments for training the first-level classifiers, and using them to derive the features of the remaining segment. The second-level classifier is trained on the newly created data set, which represents all the training data points. Furthermore, the first-level classifiers are re-trained on the full training data in order to enable more robust feature transformations of test instances during classification.
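The m-way cross-validation variant can be sketched as follows, again assuming scikit-learn. Here `first_level` and `combiner` are the same objects as in the previous sketch, the out-of-fold predictions are obtained with `cross_val_predict`, and retaining the original features next to the k new ones is one of the optional choices mentioned above.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict

def train_stacking_cv(X, y, first_level, combiner, m=5):
    # Out-of-fold predictions: the new feature for each training point comes
    # from a first-level model that did not see that point during training.
    meta = np.column_stack([cross_val_predict(clf, X, y, cv=m)
                            for clf in first_level])
    # Retain the original features alongside the k new ones (optional).
    combiner.fit(np.hstack([X, meta]), y)
    # Re-train the first-level classifiers on the full training data for
    # more robust feature transformations at prediction time.
    for clf in first_level:
        clf.fit(X, y)
    return first_level, combiner

def predict_stacking_cv(X, first_level, combiner):
    meta = np.column_stack([clf.predict(X) for clf in first_level])
    return combiner.predict(np.hstack([X, meta]))
```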


The stacking approach is able to reduce both bias and variance, because its combiner learns from the errors of different ensemble components. Many other ensemble methods can be viewed as special cases of stacking in which a data-independent model combination algorithm, such as a majority vote, is used. The main advantage of stacking is the flexible learning approach of its combiner, which makes it potentially more powerful than other ensemble methods.


11.9 Summary

In this chapter, we studied several advanced topics in data classification, such as multiclass learning, scalable learning, and rare class learning. These are more challenging scenarios for data classification that require dedicated methods. Classification can often be enhanced with additional unlabeled data, as in semisupervised learning, or by selective acquisition of labels from the user, as in active learning. Ensemble methods can also be used to significantly improve classification accuracy.



