Data Mining: The Textbook




Certain(X) = \sum_{i=1}^{k} |p_i - 0.5|.    (11.25)


The value of Certain(X) lies in the range (0, 1), and lower values are indicative of greater uncertainty. In the multiclass scenario, a formal entropy measure may be used to quantify uncertainty. If the Bayes posterior probabilities of the k classes are p_1 ... p_k, respectively, based on the current set of labeled instances, then the entropy measure Entropy(X) is defined as follows:





Entropy(X) = -\sum_{i=1}^{k} p_i \log(p_i).    (11.26)

In this case, larger values of the entropy indicate greater uncertainty and are more desirable for label acquisition.
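
A minimal Python sketch of both uncertainty measures follows, assuming a scikit-learn-style classifier that exposes predict_proba; the names model, X_unlabeled, and most_uncertain are illustrative assumptions, not from the text:

import numpy as np

def certainty(posteriors):
    # Eq. 11.25: sum of |p_i - 0.5|; lower values mean greater uncertainty.
    p = np.asarray(posteriors, dtype=float)
    return np.abs(p - 0.5).sum()

def entropy(posteriors):
    # Eq. 11.26: -sum of p_i * log(p_i); higher values mean greater
    # uncertainty and are more desirable for label acquisition.
    p = np.asarray(posteriors, dtype=float)
    p = p[p > 0]  # skip zero probabilities to avoid log(0)
    return -(p * np.log(p)).sum()

def most_uncertain(model, X_unlabeled):
    # Query the unlabeled instance whose posterior distribution has
    # the highest entropy under the current model.
    probs = model.predict_proba(X_unlabeled)
    scores = np.apply_along_axis(entropy, 1, probs)
    return int(np.argmax(scores))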


11.7.1.2 Query-by-Committee


In this case, the heterogeneity is measured in terms of the disagreement of different classifiers rather than the posterior probabilities of a single classifier over different labels. This criterion tries to achieve the same intuitive goal, but in a different way. Intuitively, when the posterior probability of a Bayes classifier is the same across different classes, a significant disagreement may exist between different classification models about the predicted label. Therefore, this approach uses a committee of different classifiers that are trained on the current set of labeled instances. These classifiers are then used to predict the class label of each unlabeled instance. The instance for which the classifiers disagree the most is selected as the relevant one in this scenario.


At an intuitive level, the query-by-committee method achieves heterogeneity goals similar to those of the uncertainty sampling method. Different classifiers are more likely to disagree on the class label for instances near the true decision boundary. The mathematical formula for quantifying the disagreement is also the same as in uncertainty sampling. In particular, the posterior probability p_i of each class i in Eq. 11.26 is replaced with the fraction of committee votes received by class i. It is particularly beneficial to use diverse classifiers that are based on fundamentally different modeling methodologies.
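
A sketch of this vote-entropy computation in Python, assuming scikit-learn is available; the particular committee members chosen here (logistic regression, a decision tree, and naive Bayes) are illustrative examples of diverse methodologies, not prescribed by the text:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def committee_disagreement(X_labeled, y_labeled, X_unlabeled):
    # Train a committee of classifiers with diverse modeling methodologies.
    committee = [LogisticRegression(max_iter=1000),
                 DecisionTreeClassifier(max_depth=5),
                 GaussianNB()]
    for clf in committee:
        clf.fit(X_labeled, y_labeled)
    # One predicted label per committee member per unlabeled instance.
    votes = np.stack([clf.predict(X_unlabeled) for clf in committee])
    scores = []
    for j in range(votes.shape[1]):
        _, counts = np.unique(votes[:, j], return_counts=True)
        frac = counts / len(committee)  # vote fractions replace p_i in Eq. 11.26
        scores.append(-(frac * np.log(frac)).sum())  # vote entropy
    return int(np.argmax(scores))  # index of the most disputed instance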


11.7.1.3 Expected Model Change


In this approach, the instance whose addition to the training data would result in the greatest expected change in the current classification model is selected. In many optimization-based classification models, such as discriminative probabilistic models, the gradient of the model objective function with respect to the model parameters can be quantified. When a queried instance is added to the training data, the gradient will change as well. The instance with the greatest change in the gradient, when added to the set of labeled instances, is selected. The intuition is that such an instance is likely to be very different from the instances used to construct the current model. Let δg_i(X) be the change in the gradient with respect to the model parameters, conditional on the fact that the correct training label of the candidate instance X is the ith class. In other words, if the current labeled training set is L and ∇G(L) is the gradient of the objective function with respect to the model parameters, we have:

\delta g_i(X) = \nabla G(L \cup \{(X, i)\}) - \nabla G(L).    (11.27)

Because the true label of X is not yet known, the conditional changes can be weighted with the current Bayes posterior probabilities p_1 ... p_k, and the unlabeled instance maximizing the expected change \sum_{i=1}^{k} p_i \, \|\delta g_i(X)\| is queried.
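A minimal sketch of this criterion for binary logistic regression follows; the gradient here is that of the summed log-loss, and all names (grad_log_loss, expected_model_change, posterior) are illustrative assumptions rather than notation from the text:

import numpy as np

def grad_log_loss(w, X, y):
    # Gradient of the summed logistic log-loss with respect to w;
    # y is a vector of labels in {0, 1}.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y)

def expected_model_change(w, X_labeled, y_labeled, x_candidate, posterior):
    # Compute delta g_i(X) from Eq. 11.27 for each hypothetical label i,
    # then weight its norm by the posterior probability of that label.
    g_current = grad_log_loss(w, X_labeled, y_labeled)
    change = 0.0
    for label in (0, 1):
        X_aug = np.vstack([X_labeled, x_candidate])
        y_aug = np.append(y_labeled, label)
        delta = grad_log_loss(w, X_aug, y_aug) - g_current
        change += posterior[label] * np.linalg.norm(delta)
    return change

In use, expected_model_change would be evaluated at the current parameter vector w for every candidate unlabeled instance, and the instance with the largest value would be queried for its label.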