$$P(C = c \mid X) \propto P(C = c) \prod_{j=1}^{d} P(x_j = a_j \mid C = c) \qquad (11.21)$$
(M-step) Estimate the conditional distributions of the features for the different clusters (classes), using the current estimated posterior probabilities (unlabeled data) and the known memberships (labeled data) of data points to clusters (classes).
One challenge with this approach is that the clustering structure may sometimes not correspond well to the class distribution. In such cases, the use of unlabeled data can harm the classification accuracy, as the clusters found by the EM algorithm
drift away from the true class structure. After all, unlabeled data are plentiful compared to labeled data, and therefore the estimation of P(x_j = a_j | C = c) in Eq. 11.20 will be dominated by the unlabeled data. To ameliorate this effect, the labeled and unlabeled data are weighted differently during the estimation of P(x_j = a_j | C = c). The unlabeled data are weighted down by a predefined discount factor μ < 1 to ensure better correspondence between the clustering structure and the class distribution. In other words, the value of w(X, c) is multiplied by μ for only the unlabeled examples before estimating P(x_j = a_j | C = c) in Eq. 11.20. The EM approach to semisupervised classification is particularly remarkable because it demonstrates the link between semisupervised clustering and semisupervised classification, even though these two kinds of semisupervision are motivated by different application scenarios.
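As a concrete illustration of the E-step/M-step interplay and the discount factor μ, the following is a minimal sketch of the procedure for categorical features. The function name, the Laplace smoothing, and the default values (e.g., mu=0.2, n_iters=20) are illustrative assumptions rather than choices prescribed by the text.

```python
import numpy as np

def semisupervised_nb_em(X_lab, y_lab, X_unlab, n_classes, mu=0.2,
                         n_iters=20, smoothing=1.0):
    """Minimal sketch of EM-based semisupervised naive Bayes.

    X_lab, X_unlab: integer-coded categorical feature matrices.
    y_lab: class labels of the labeled examples.
    mu: discount factor (< 1) that down-weights unlabeled examples
        when estimating P(x_j = a_j | C = c).
    """
    y_lab = np.asarray(y_lab)
    X_all = np.vstack([X_lab, X_unlab])
    n_lab, n_unlab = len(X_lab), len(X_unlab)
    n_feat = X_all.shape[1]
    n_vals = X_all.max(axis=0) + 1          # number of values per feature

    # Weight matrix w(X, c): fixed one-hot rows for labeled points,
    # soft posterior rows (updated by the E-step) for unlabeled points.
    W = np.zeros((n_lab + n_unlab, n_classes))
    W[np.arange(n_lab), y_lab] = 1.0
    W[n_lab:] = 1.0 / n_classes             # uninformative initialization

    for _ in range(n_iters):
        # ---- M-step: re-estimate priors and P(x_j = a_j | C = c) ----
        W_eff = W.copy()
        W_eff[n_lab:] *= mu                 # discount unlabeled examples
        priors = W_eff.sum(axis=0) / W_eff.sum()
        cond = []                           # cond[j][a, c] = P(x_j = a | C = c)
        for j in range(n_feat):
            counts = np.full((n_vals[j], n_classes), smoothing)
            for a in range(n_vals[j]):
                counts[a] += W_eff[X_all[:, j] == a].sum(axis=0)
            cond.append(counts / counts.sum(axis=0, keepdims=True))

        # ---- E-step: posterior P(C = c | X) for unlabeled points (Eq. 11.21) ----
        log_post = np.log(priors)[None, :].repeat(n_unlab, axis=0)
        for j in range(n_feat):
            log_post += np.log(cond[j][X_unlab[:, j]])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        W[n_lab:] = post / post.sum(axis=1, keepdims=True)

    return priors, cond, W[n_lab:]
```

In this sketch, setting mu=0 recovers a purely supervised naive Bayes classifier, while mu=1 weights labeled and unlabeled examples equally in the M-step.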
11.6.2.2 Transductive Support Vector Machines
The general assumption of most semisupervised methods is that the label values of unlabeled examples do not vary abruptly in densely populated regions of the data. In transductive support vector machines, this assumption is implicitly encoded by assigning labels to unlabeled examples so as to maximize the margin of the support vector machine. To understand this point, consider the example of Fig. 11.2b. In this case, the margin of the SVM will be optimized only when the labels of the examples in the cluster containing the single example for class A are also set to the same value A. The same is true for the unlabeled examples in the cluster containing the single label for class B. Therefore, the SVM formulation now needs to be modified to incorporate additional margin constraints, along with a binary decision variable for each unlabeled example. Recall from the discussion in Sect. 10.6 of Chap. 10 that the original SVM formulation minimizes the objective
function $\frac{||W||^2}{2} + C \sum_{i=1}^{n} \xi_i$, subject to the following constraints:

$$y_i (W \cdot X_i + b) \geq 1 - \xi_i \quad \forall i \in \{1, \ldots, n\}$$
$$\xi_i \geq 0 \quad \forall i \in \{1, \ldots, n\}$$
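Solving the transductive SVM exactly is combinatorial because of the binary decision variable attached to each unlabeled example. The sketch below is therefore not the formulation above but a simple approximation in its spirit: unlabeled points enter the training set with a small sample weight, their guessed labels are refreshed from each refitted decision boundary, and the weight is increased gradually. scikit-learn's SVC is used as the underlying solver; the function name and parameters such as w_unlab and growth are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def transductive_svm_sketch(X_lab, y_lab, X_unlab, C=1.0,
                            w_unlab=0.05, growth=2.0, n_outer=10):
    """Rough heuristic in the spirit of a transductive SVM.

    The label of each unlabeled example is treated as a decision
    variable: start from the labels predicted by a purely supervised
    SVM, refit with the unlabeled points included at a small sample
    weight, re-guess their labels from the new decision boundary, and
    slowly increase their weight. The guessed labels that survive are
    those that let the SVM keep a wide margin through sparse regions.
    """
    y_lab = np.asarray(y_lab)
    clf = SVC(kernel="linear", C=C).fit(X_lab, y_lab)
    y_guess = clf.predict(X_unlab)              # initial label assignment

    for _ in range(n_outer):
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_guess])
        sw = np.concatenate([np.ones(len(y_lab)),
                             np.full(len(y_guess), w_unlab)])
        clf = SVC(kernel="linear", C=C).fit(X_all, y_all, sample_weight=sw)

        y_new = clf.predict(X_unlab)            # re-guess unlabeled labels
        if np.array_equal(y_new, y_guess) and w_unlab >= 1.0:
            break                               # assignment has stabilized
        y_guess = y_new
        w_unlab = min(w_unlab * growth, 1.0)    # trust unlabeled data more

    return clf, y_guess
```

A full transductive SVM would search over label assignments more carefully, for example by swapping pairs of conflicting labels between refits, but the loop above already illustrates how the unlabeled examples influence the final margin.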