Data Mining: The Textbook


376 CHAPTER 11. DATA CLASSIFICATION: ADVANCED CONCEPTS

Variance: Random variations in the choice of training data will lead to different models. Consider the example illustrated in Fig. 11.5b. In this case, the true decision boundary is linear. A sufficiently deep univariate decision tree can approximate a linear boundary quite well with axis-parallel piecewise approximations. However, with limited training data, even when the trees are grown to full depth without pruning, the piecewise approximations will be coarse, like the boundaries illustrated for hypothetical decision trees A and B in Fig. 11.5b. Different choices of training data might lead to different split choices, and as a result the decision boundaries of trees A and B are very different. Therefore, (test) instances such as X are inconsistently classified by decision trees that were created from different training data sets. This is a manifestation of model variance. Model variance is closely related to overfitting. When a classifier has an overfitting tendency, it will make inconsistent predictions for the same test instance over different training data sets.
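To make model variance concrete, here is a minimal sketch, assuming a hypothetical data source whose true boundary is the line x + y = 1 (a stand-in for Fig. 11.5b). It grows unpruned axis-parallel trees, using Gini impurity as the split criterion (one common choice, picked here for illustration), on many independently drawn training sets. It then reports how often the prediction for a fixed test point X disagrees with the majority prediction. All function names and parameters are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def grow_tree(points, labels):
    """Grow an unpruned univariate (axis-parallel) tree until every leaf is pure.

    A leaf is a class label; an internal node is (feature, threshold, left, right).
    """
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            if not left or not right:
                continue  # guard against degenerate floating-point thresholds
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # duplicate points with conflicting labels: majority leaf
        return Counter(labels).most_common(1)[0][0]
    _, f, t = best
    go_left = [p[f] <= t for p in points]
    left = grow_tree([p for p, g in zip(points, go_left) if g],
                     [y for y, g in zip(labels, go_left) if g])
    right = grow_tree([p for p, g in zip(points, go_left) if not g],
                      [y for y, g in zip(labels, go_left) if not g])
    return (f, t, left, right)

def predict(tree, x):
    """Follow the threshold tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def sample_training_set(n, rng):
    """Hypothetical data source: label 1 iff x + y > 1 (a linear true boundary)."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return pts, [int(a + b > 1) for a, b in pts]

def prediction_variance(x, n_sets=200, n_points=20, seed=0):
    """Fraction of training sets whose tree disagrees with the majority prediction at x."""
    rng = random.Random(seed)
    preds = [predict(grow_tree(*sample_training_set(n_points, rng)), x)
             for _ in range(n_sets)]
    share_of_ones = sum(preds) / n_sets
    return min(share_of_ones, 1 - share_of_ones)

print(prediction_variance((0.52, 0.50)))  # test point near the true boundary
print(prediction_variance((0.05, 0.05)))  # test point deep inside class 0
```

A point close to the line, such as (0.52, 0.50), would typically show a noticeably larger disagreement fraction than a point deep inside one class region; the exact values depend on the seed and the sample sizes. Because every tree is grown until its leaves are pure, each one fits its own training set perfectly, so the disagreement comes entirely from the choice of training data.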

Noise: Noise refers to the intrinsic errors in the target class labeling. Because this is an intrinsic aspect of data quality, there is little that one can do to correct it. Therefore, the focus of ensemble analysis is generally on reducing bias and variance.
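For squared-error loss, the three sources of error above combine additively. A standard statement of this decomposition is given below. It assumes targets y = f(x) + ε with zero-mean noise ε of variance σ², independent of a model f̂ trained on a random training set 𝒟; this notation is introduced here for illustration and does not come from the text.

```latex
\mathbb{E}_{\mathcal{D},\epsilon}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}(x)]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\left(\hat{f}(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The noise term does not depend on f̂ at all, which is why an ensemble can act only on the first two terms. For 0/1 classification loss the decomposition does not take this exact additive form, so it is best read as an intuition aid for the classification setting.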

Note that the design choices of a classifier often reflect a trade-off between bias and variance. For example, pruning a decision tree results in a more stable classifier and therefore reduces the variance. On the other hand, because the pruned decision tree makes stronger assumptions about the simplicity of the decision boundary than the unpruned tree, the former leads to greater bias. Similarly, using a larger number of neighbors for a nearest-neighbor classifier will lead to larger bias but lower variance. In general, simplified assumptions about the decision boundary lead to greater bias but lower variance. Conversely, more complex assumptions reduce bias but are harder to estimate robustly from limited data. Bias and variance are affected by virtually every design choice of the model, such as the choice of the base algorithm or the choice of model parameters.
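The nearest-neighbor trade-off can be probed in a similar way. The sketch below estimates, at a single point, the squared bias and the variance of a k-nearest-neighbor classifier's 0/1 prediction across many random training sets, again assuming a hypothetical noise-free linear boundary x + y = 1. Treating the 0/1 prediction as a number lets the squared-error decomposition apply directly; with a noise-free target, squared bias plus variance equals the misclassification rate at that point. The function names, data source, and defaults are assumptions made for illustration.

```python
import random

def knn_predict(train, x, k):
    """Majority 0/1 label among the k training points closest to x (squared Euclidean)."""
    nearest = sorted(train, key=lambda t: (t[0] - x[0]) ** 2 + (t[1] - x[1]) ** 2)[:k]
    votes = sum(label for _, _, label in nearest)
    return int(2 * votes > k)  # use odd k so that ties cannot occur

def bias_and_variance(x, k, n_sets=300, n_points=40, seed=0):
    """Estimate (squared bias, variance) of the k-NN prediction at x over random training sets."""
    rng = random.Random(seed)
    truth = int(x[0] + x[1] > 1)  # hypothetical noise-free linear boundary
    preds = []
    for _ in range(n_sets):
        pts = [(rng.random(), rng.random()) for _ in range(n_points)]
        train = [(a, b, int(a + b > 1)) for a, b in pts]
        preds.append(knn_predict(train, x, k))
    mean = sum(preds) / n_sets
    variance = sum((p - mean) ** 2 for p in preds) / n_sets
    return (mean - truth) ** 2, variance

for k in (1, 15):
    print(k, bias_and_variance((0.55, 0.50), k))
```

Comparing k = 1 with k = 15 at several points, both near the boundary and near the corners of the unit square, is one way to observe the trade-off described above; the exact numbers depend on the point, the seed, and the sample sizes.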

Ensemble analysis can often be used to reduce both the bias and the variance of the classification process. For example, consider the example illustrated in Fig. 11.5a, in which the decision boundary is not linear, and therefore no linear SVM classifier can find the correct decision boundary. However, by using different choices of model parameters, or data subset selection, it is possible to create three different linear SVM hyperplanes A, B, and C, as illustrated in Fig. 11.6a. Note that these different classifiers tend to work well in different parts of the data and have different directions of bias in any particular part of the data. This kind of differential performance on different parts of the data is sometimes artificially induced in ensemble components in some methods, such as boosting. In other cases, it may be a natural result of using ensemble model components that are very different from one another (e.g., decision trees and Bayes classifiers). Now consider a new ensemble classifier that is created using the majority vote of the three aforementioned classifiers corresponding to hyperplanes A, B, and C. The decision boundary of this ensemble classifier is also illustrated in Fig. 11.6a. This decision boundary is not linear and has lower bias with respect to the true decision boundary. The reason is that different classifiers have different levels and directions of bias in different parts of the training data, and the majority vote across the classifiers obtains results that are generally less biased in any specific region than those of each component classifier.
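The majority-vote construction can be sketched in a few lines. The three hyperplanes below are hypothetical stand-ins for A, B, and C in Fig. 11.6a (the figure's actual coefficients are not given in the text), and each is treated as a fixed, already-trained linear classifier that outputs +1 or -1. The check at the end exhibits two points labeled +1 whose midpoint is labeled -1. Since the positive side of any single hyperplane is convex, this shows that the voted decision boundary cannot be linear.

```python
from typing import Sequence, Tuple

# A linear classifier as ((w1, w2), b); these coefficients are hypothetical.
Hyperplane = Tuple[Tuple[float, float], float]

HYPERPLANES = {
    "A": ((1.0, 0.0), 0.0),   # predicts +1 when x > 0
    "B": ((0.0, 1.0), 0.0),   # predicts +1 when y > 0
    "C": ((1.0, 1.0), -1.0),  # predicts +1 when x + y > 1
}

def linear_predict(h: Hyperplane, point: Sequence[float]) -> int:
    """Return +1 on the positive side of the hyperplane, -1 otherwise."""
    (w1, w2), b = h
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else -1

def majority_vote(point: Sequence[float]) -> int:
    """Combine the three linear classifiers by unweighted majority vote."""
    votes = sum(linear_predict(h, point) for h in HYPERPLANES.values())
    return 1 if votes > 0 else -1  # three +/-1 votes can never tie

p, q = (-0.6, 1.7), (0.4, 0.1)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
print(majority_vote(p), majority_vote(q), majority_vote(mid))  # prints: 1 1 -1
```

Each component is linear, yet the voted positive region is the union of the regions where at least two components agree, which is piecewise linear and, in this instance, not convex.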

A similar argument applies to the variance example illustrated in Fig. 11.5b. Although instances such as X are inconsistently classified because of model variance, they will often be classified correctly when the model bias is low. As a result, by aggregating over sufficiently independent classifiers, it becomes increasingly likely that instances close to the decision boundary, such as X, will be correctly classified. For example, a majority vote of just three independent trees, each of which classifies X correctly with 80% probability,
