E(X, V) = Σ_{i=1}^{k} pi(X) Σ_{j=1}^{k} Σ_{Z∈V} ||Pj(X,i)(Z) − 0.5||        (11.28)
Here, pi(X) denotes the posterior probability of label i for the candidate instance X under the current model, and Pj(X,i)(Z) denotes the posterior probability of label j for instance Z after the hypothetical instance-label pair (X, i) has been added to the labeled set. The objective function can thus be interpreted as the expected label certainty of the remaining test instances. Therefore, unlike the criteria used in uncertainty-based models, this objective function is maximized rather than minimized.
This result can easily be extended to the case of k-way models by using the same entropy criterion that was discussed for uncertainty-based models. In that case, the aforementioned
expression is modified to replace ||Pj(X,i)(Z) − 0.5|| with the class-specific entropy term −Pj(X,i)(Z)log(Pj(X,i)(Z)). Furthermore, this criterion needs to be minimized.
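To make the criterion concrete, the following Python sketch scores a single candidate X by Eq. 11.28. It assumes a scikit-learn-style classifier with fit/predict_proba (an illustrative assumption, not part of the original formulation), and the helper name is hypothetical:

import numpy as np
from sklearn.base import clone

def error_reduction_score(model, L_X, L_y, X, V):
    # p_i(X): posterior probabilities of each label i for the candidate X
    p = model.predict_proba(X.reshape(1, -1))[0]
    score = 0.0
    for i, label in enumerate(model.classes_):
        # Hypothetically add the pair (X, i) and retrain a copy of the model
        m = clone(model).fit(np.vstack([L_X, X]), np.append(L_y, label))
        # Certainty over V: sum over Z in V and labels j of ||Pj(X,i)(Z) - 0.5||
        P = m.predict_proba(V)
        score += p[i] * np.abs(P - 0.5).sum()
    return score

The unlabeled instance with the largest score would be queried. For the k-way entropy variant described above, np.abs(P - 0.5).sum() would be replaced by the total entropy (-(P * np.log(P + 1e-12))).sum(), and the instance minimizing the score would be selected instead.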
11.7.2.2 Expected Variance Reduction
One observation about the aforementioned error-reduction method of Eq. 11.28 is that it needs to be computed over the entire set of unlabeled instances in V, and a new model needs to be trained incrementally to test the effect of adding each candidate instance. This can be computationally expensive. It should be pointed out that when the error of an instance set reduces, the corresponding variance also typically reduces. The overall generalization error can be expressed as a sum of the true label noise, model bias, and variance, a decomposition that is discussed in detail in the next section. Of these three terms, only the variance is highly dependent on the choice of selected instances. Therefore, it is possible to reduce the variance instead of the error. The main advantage of doing so is that the variance can often be expressed in closed form, which avoids incremental retraining and therefore achieves greater computational efficiency. A detailed description of this class of methods is beyond the scope of this book; refer to the bibliographic notes.
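For orientation, this decomposition (developed in detail in the next section) can be written for a learned prediction g(X) of a numeric target y as:

E[(y − g(X))²] = Noise + Bias² + Variance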
11.7.3 Representativeness-Based Models
The main advantage of performance-based models over heterogeneity-based models is that they aim to improve the error behavior on the aggregate set of unlabeled instances, rather than evaluating the uncertainty behavior of the queried instance. Therefore, unrepresentative or outlier-like queries are avoided. In some models, the representativeness itself becomes a part of the criterion for querying. One way of measuring representativeness is with the use of a density-based criterion, in which the density of a region in the space is used to weight the querying criterion. This weight is combined with a heterogeneity-based query criterion. Therefore, such methods can be considered a variation of the heterogeneity-based model, but with a representativeness weighting to ensure that outliers are not selected.
These methods therefore combine the heterogeneity behavior of the queried instance with a representativeness function over the unlabeled set V in order to decide on the queried instance. The representativeness function weights dense regions of the input space. The objective function O(X, V) of such a model is expressed as the product of a heterogeneity component H(X) and a representativeness component R(X, V):
O(X, V) = H(X) · R(X, V)
The value of H(X) (assumed to be a maximization function) can be any of the heterogeneity criteria (transformed appropriately for maximization), such as the entropy criterion from uncertainty sampling, or the expected model change criterion. The representativeness criterion R(X, V) is simply a measure of the density of X with respect to the instances in V. A simple version of this density is the average similarity of X to the instances in V. Many other sophisticated variations of this simple measure are used. The reader is referred to the bibliographic notes for a discussion of the available measures.
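As a minimal illustration, the following Python sketch instantiates O(X, V) with the entropy criterion as H(X) and the average cosine similarity to V as R(X, V); both concrete choices are assumptions for illustration, not the only options discussed above:

import numpy as np

def heterogeneity(posteriors):
    # H(X): entropy of the current model's posterior probabilities for X
    p = np.asarray(posteriors)
    return -(p * np.log(p + 1e-12)).sum()

def representativeness(x, V):
    # R(X, V): average cosine similarity of X to the unlabeled instances in V
    V = np.asarray(V)
    sims = (V @ x) / (np.linalg.norm(V, axis=1) * np.linalg.norm(x) + 1e-12)
    return sims.mean()

def objective(posteriors, x, V):
    # O(X, V) = H(X) * R(X, V); the candidate maximizing this product is queried
    return heterogeneity(posteriors) * representativeness(x, V)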
11.8 Ensemble Methods
Ensemble methods are motivated by the fact that different classifiers may make different predictions on test instances due to the specific characteristics of each classifier, or its sensitivity to random artifacts in the training data. An ensemble method is an approach that increases prediction accuracy by combining the results from multiple classifiers. The
Algorithm EnsembleClassify(Training Data Set: D, Base Algorithms: A1 ... Ar, Test Instances: T)
begin