Data Mining: The Textbook




Algorithm EnsembleClassify(Training Data Set: D, Base Algorithms: A1 . . . Ar, Test Instances: T)
begin
  j = 1;
  repeat
    Select an algorithm Qj from A1 . . . Ar;
    Create a new training data set fj(D) from D;
    Apply Qj to fj(D) to learn model Mj;
    j = j + 1;
  until(termination);
  report labels of each T ∈ T based on combination of predictions from all learned models Mj;
end

Figure 11.4: The generic ensemble framework


The basic approach of ensemble analysis is to apply the base ensemble learners multiple times, either by using different models, or by using the same model on different subsets of the training data. The results from the different classifiers are then combined into a single robust prediction.


Although there are significant differences in how the individual learners are constructed and combined by various ensemble models, we start with a very generic description of ensemble algorithms. Later in this section, we will discuss specific instantiations of this broad framework, such as bagging, boosting, and random decision trees. The ensemble approach uses a set of base classification algorithms A1 . . . Ar. Note that these learners might be completely different algorithms, such as decision trees, SVMs, or the Bayes classifier. In some types of ensembles, such as boosting and bagging, a single learning algorithm is used, but with different choices of training data. Different learners are used to leverage the greater robustness of different algorithms in different regions of the data. Let the learning algorithm selected in the jth iteration be denoted by Qj. It is assumed that Qj is selected from the base learners. At this point, a derivative training data set fj(D) is selected from the base training data. This may be a random sample of the training data, as in bagging, or it may be based on the results of past executions of ensemble components, as in boosting. A model Mj is learned in the jth iteration by applying the selected learning algorithm Qj to fj(D). For each test instance T, a prediction is made by combining the results of the different models Mj on T. This combination may be performed in various ways. Examples include the use of simple averaging, the use of a weighted vote, or the treatment of the model combination process as a learning problem. The overall ensemble framework is illustrated in Fig. 11.4.
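To make the control flow of Fig. 11.4 concrete, the following is a minimal Python sketch assuming scikit-learn-style base learners with fit/predict interfaces. The round-robin selection of Qj, the bootstrap construction of fj(D), and the majority-vote combination are illustrative assumptions; the framework itself permits any selection, derivation, and combination scheme.

# Minimal sketch of the generic ensemble framework of Figure 11.4.
# Assumptions (not prescribed by the figure): Q_j is chosen round-robin,
# f_j(D) is a bootstrap sample, and predictions are combined by majority vote.
import numpy as np
from collections import Counter
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

BASE_LEARNERS = [DecisionTreeClassifier, SVC, GaussianNB]  # A_1 ... A_r

def ensemble_classify(X, y, X_test, n_components=9, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for j in range(n_components):
        Q_j = BASE_LEARNERS[j % len(BASE_LEARNERS)]   # select Q_j from A_1 ... A_r
        idx = rng.integers(0, len(X), size=len(X))    # f_j(D): bootstrap sample of D
        M_j = Q_j().fit(X[idx], y[idx])               # learn model M_j on f_j(D)
        models.append(M_j)
    # Combine the predictions of all learned models M_j by majority vote.
    votes = np.array([M.predict(X_test) for M in models])
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

Replacing the bootstrap step with reweighting of previously misclassified instances would move this sketch toward boosting, and the majority vote can be replaced by weighted voting or a learned combiner without changing the loop structure.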


The description of Fig. 11.4 is very generic and allows significant flexibility in how the ensemble components are learned and how their predictions are combined. The two primary types of ensembles are special cases of this description:





  1. Data-centered ensembles: A single base learning algorithm (e.g., an SVM or a decision tree) is used, and the primary variation is in how the derivative data set fj(D) for the jth ensemble component is constructed. In this case, the input to the algorithm contains only a single learning algorithm A1. The data set fj(D) for the jth component may be constructed by sampling the data, by focusing on incorrectly classified portions of the training data in previously executed ensemble components, by manipulating the features of the data, or by manipulating the class labels in the data, as illustrated in the sketch below.
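As a concrete illustration of a data-centered ensemble, the following hedged sketch fixes a single base learner (a decision tree) and varies only the construction of fj(D), here by drawing a random subset of features for each component (the feature-manipulation option above). The function names and the 50% feature fraction are assumptions for illustration, not prescribed by the text.

# Sketch of a data-centered ensemble: a single base algorithm A_1 (a decision
# tree), with variation coming only from the construction of f_j(D).
# Here f_j(D) keeps all rows but a random subset of features; the
# feature_frac=0.5 default is an arbitrary illustrative choice.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def fit_data_centered_ensemble(X, y, n_components=25, feature_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(feature_frac * d))
    components = []
    for _ in range(n_components):
        feats = rng.choice(d, size=k, replace=False)      # f_j(D): feature subset
        model = DecisionTreeClassifier().fit(X[:, feats], y)
        components.append((feats, model))
    return components

def predict_majority(components, X_test):
    # Each component votes using only the features it was trained on.
    votes = np.array([m.predict(X_test[:, f]) for f, m in components])
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])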

