Data Mining: The Textbook



11.8. ENSEMBLE METHODS

[Figure 11.6: two panels on unit axes. Panel (a), bias: the linear boundaries SVM A, SVM B, and SVM C, a test instance X, and the ensemble boundary, relative to the true boundary between class A and class B. Panel (b), variance: the ensemble boundary relative to the true boundary between class A and class B.]

Figure 11.6: Ensemble decision boundaries are more refined than those of component classifiers


will be correct (0.8³ + 3 × 0.8² × 0.2) × 100 ≈ 90% of the time. In other words, the ensemble decision boundary of the majority classifier will be much closer to the true decision boundary than that of any of its component classifiers. In fact, a realistic example of what an ensemble boundary might look like after combining a set of relatively coarse decision trees is illustrated in Fig. 11.6b. Note that the ensemble boundary is much closer to the true boundary because it is not affected by the unpredictable variations in decision-tree behavior on a training data set of limited size. Such an ensemble is, therefore, better able to make use of the knowledge in the training data.
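This majority-vote arithmetic generalizes to any odd number of classifiers with independent errors: the vote is correct whenever more than half of the components are correct. The following is a minimal Python sketch (an illustration under the independence assumption, which real component classifiers only approximate):

from math import comb

def majority_vote_accuracy(p: float, m: int) -> float:
    """Probability that a majority vote of m independent classifiers,
    each correct with probability p, is correct (m odd, so no ties)."""
    k_min = m // 2 + 1  # smallest winning number of correct votes
    return sum(comb(m, k) * p**k * (1 - p)**(m - k)
               for k in range(k_min, m + 1))

# The example from the text: three classifiers, each correct 80% of the time.
print(majority_vote_accuracy(0.8, 3))   # 0.896, i.e., roughly 90%
# Larger ensembles push the vote closer to certainty (independence assumed):
print(majority_vote_accuracy(0.8, 25))  # about 0.9999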

In general, different classification models have different sources of bias and variance. Models that are too simple (such as a linear SVM or a shallow decision tree) make too many assumptions about the shape of the decision boundary and will therefore have high bias. Models that are too complex (such as a deep decision tree) will overfit the data and will therefore have high variance. Sometimes different parameter settings in the same classifier will favor different parts of the bias-variance trade-off curve. For example, a small value of k in a nearest-neighbor classifier will result in lower bias but higher variance. Because different kinds of ensemble learners have different impacts on bias and variance, it is important to choose the component classifiers so as to optimize the impact on the bias-variance trade-off. An overview of the impact of different models on bias and variance is provided in Table 11.1.
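As a concrete illustration of the effect of k, the following sketch (an illustrative example on synthetic data using scikit-learn, not code from the book) trains nearest-neighbor classifiers with increasing k and compares training and test accuracy:

# Sketch: how k in a nearest-neighbor classifier moves along the
# bias-variance trade-off. Data set and parameter values are illustrative.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 101):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k:3d}  train={knn.score(X_train, y_train):.2f}  "
          f"test={knn.score(X_test, y_test):.2f}")

# Typical pattern: k=1 scores perfectly on training data but worse on test
# data (low bias, high variance); very large k underfits both (high bias,
# low variance); an intermediate k usually does best on the test data.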


11.8.2 Formal Statement of Bias-Variance Trade-off


In the following, a formal statement of the bias-variance trade-off will be provided. Consider
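For reference, the standard squared-error form of this trade-off can be written as follows (generic notation; stated as an assumption about the form, since the book's own statement and notation may differ in detail):

% Standard bias-variance decomposition for squared error (generic notation).
% Target y = f(x) + \epsilon with E[\epsilon] = 0 and Var(\epsilon) = \sigma^2;
% \hat{f}_D(x) is the prediction of a model trained on a random data set D.
\mathbb{E}_{D,\epsilon}\big[(y - \hat{f}_D(x))^2\big]
  = \big(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2                                % squared bias
  + \mathbb{E}_D\big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\big]  % variance
  + \sigma^2                                                                     % irreducible noise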








