Data Mining: The Textbook




$$\delta g_i(X) = \left\| \nabla G(L \cup (X, i)) - \nabla G(L) \right\| \qquad (11.27)$$

Of course, we do not yet know the training label of X, but we can estimate the posterior probability of each label with a Bayes classifier. Let p_i be the posterior probability of class i with respect to the current set of labeled instances in the training data. Then, the expected model change C(X) with respect to the instance X is defined as follows:




$$C(X) = \sum_{i=1}^{k} p_i \cdot \delta g_i(X)$$

The instance X with the largest value of C(X) is queried for its label.
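As a concrete illustration, the following is a minimal sketch of this computation for a binary logistic-regression model, where the objective function G is taken to be the training log-likelihood so that its gradient is available in closed form. The choice of logistic regression and all function names here are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loglik_gradient(w, X, y):
    # Gradient of the logistic log-likelihood G over the labeled set (X, y):
    # grad G = X^T (y - sigmoid(Xw))
    return X.T @ (y - sigmoid(X @ w))

def expected_model_change(w, X_lab, y_lab, x):
    # C(x) = sum_i p_i * || grad G(L u {(x, i)}) - grad G(L) ||, binary labels 0/1.
    g_L = loglik_gradient(w, X_lab, y_lab)
    p1 = sigmoid(x @ w)  # estimated posterior of class 1 under the current model
    change = 0.0
    for label, p_i in ((0.0, 1.0 - p1), (1.0, p1)):
        X_aug = np.vstack([X_lab, x])        # hypothetically add (x, label)
        y_aug = np.append(y_lab, label)
        g_aug = loglik_gradient(w, X_aug, y_aug)
        change += p_i * np.linalg.norm(g_aug - g_L)
    return change
```

In use, one would evaluate expected_model_change on every unlabeled candidate and query the maximizer, e.g. max(pool, key=lambda x: expected_model_change(w, X_lab, y_lab, x)).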


11.7.2 Performance-Based Models


Although the motivation of heterogeneity-based models is that uncertain regions are the most informative by virtue of their proximity to decision boundaries, they have a drawback as well. Querying uncertain regions can inadvertently lead to the addition of unrepresentative outliers to the training data. Performance-based models therefore focus directly on the classification objective function, and evaluate the accuracy of classification on the remaining unlabeled instances.


11.7.2.1 Expected Error Reduction


For the purpose of discussion, the remaining set of instances that has not yet been labeled is denoted by V. This set is used as the validation set on which the expected error reduction is computed. This approach is related to uncertainty sampling in a complementary way. Whereas uncertainty sampling maximizes the label uncertainty of the queried instance, expected error reduction minimizes the expected label uncertainty of the remaining instances in V when the queried instance is added to the training data. Thus, in the case of a binary classification problem, the predicted posterior probabilities of the instances in V should be as far away from 0.5 as possible after the queried instance is added. The idea here is that greater certainty in the predicted class labels of the remaining unlabeled instances will eventually result in a lower error rate on an unseen test set as well. Thus, error-reduction models can also be considered greatest-certainty models, except that the certainty criterion is applied to the instances in V rather than to the query instance itself. Let p_i(X) denote the posterior probability of the label i for the query candidate instance X with a Bayes model trained on the current set of labeled instances. Let P_j^{(X,i)}(Z) be the posterior probability of class label j when the instance-label combination (X, i) is added to the current set of labeled instances. Then, the error objective function E(X, V) for the binary-class problem (i.e., k = 2) is defined as follows:

$$E(X, V) = \sum_{i=1}^{k} p_i(X) \sum_{Z \in V} \sum_{j=1}^{k} \left\| P_j^{(X,i)}(Z) - 0.5 \right\| \qquad (11.28)$$

The value of E(X, V) is maximized rather than minimized, because larger deviations of the posterior probabilities from 0.5 correspond to greater certainty in the predictions on V. The instance X with the largest value of E(X, V) is queried for its label.
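A corresponding sketch of this criterion follows, using scikit-learn's GaussianNB as a stand-in for the Bayes model; this particular classifier, and the function and variable names, are assumptions made for illustration, since any model that produces posterior probabilities would serve.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def expected_error_objective(X_lab, y_lab, x, V):
    # E(x, V) = sum_i p_i(x) * sum_{Z in V} sum_j |P_j^{(x,i)}(Z) - 0.5|
    base = GaussianNB().fit(X_lab, y_lab)
    posteriors = base.predict_proba(x.reshape(1, -1))[0]   # p_i(x)
    E = 0.0
    for label, p_i in zip(base.classes_, posteriors):
        # Hypothetically add (x, label) to the labeled set and retrain.
        aug = GaussianNB().fit(np.vstack([X_lab, x]),
                               np.append(y_lab, label))
        P = aug.predict_proba(V)   # P_j^{(x,i)}(Z) for every Z in V
        E += p_i * np.abs(P - 0.5).sum()
    return E

# The candidate maximizing E(x, V) would be queried, e.g.:
# x_best = max(pool, key=lambda x: expected_error_objective(X_lab, y_lab, x, V))
```

Note that each candidate evaluation retrains the classifier k times, once per hypothesized label, which is why this family of methods is more expensive than uncertainty sampling.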