$$
f^{\mathrm{point}}(X_j \mid M) = \sum_{i=1}^{k} \alpha_i \cdot f^{i}(X_j). \tag{6.10}
$$
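As an illustration (with hypothetical numbers), consider $k = 2$ components with $\alpha_1 = 0.6$ and $\alpha_2 = 0.4$, and a point $X_j$ at which the component densities evaluate to $f^{1}(X_j) = 0.50$ and $f^{2}(X_j) = 0.25$. Then:

$$
f^{\mathrm{point}}(X_j \mid M) = 0.6 \cdot 0.50 + 0.4 \cdot 0.25 = 0.40.
$$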
Then, for a data set D containing n data points, denoted by X1 . . . Xn , the probability density of the data set being generated by the model M is the product of all the point-specific probability densities:
$$
f^{\mathrm{data}}(D \mid M) = \prod_{j=1}^{n} f^{\mathrm{point}}(X_j \mid M). \tag{6.11}
$$
The log-likelihood fit L(D|M) of the data set D with respect to model M is the logarithm of the aforementioned expression and can be (more conveniently) represented as a sum of values over the different data points. The log-likelihood fit is preferred for computational reasons: the product of a large number of small probability values can cause numerical underflow, whereas the logarithm converts the product into a numerically stable sum.
$$
L(D \mid M) = \log\!\left(\prod_{j=1}^{n} f^{\mathrm{point}}(X_j \mid M)\right) = \sum_{j=1}^{n} \log\!\left(\sum_{i=1}^{k} \alpha_i\, f^{i}(X_j)\right). \tag{6.12}
$$
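As a concrete illustration, the following sketch evaluates Eq. 6.12 for a one-dimensional Gaussian mixture. It is a minimal sketch, assuming NumPy and SciPy are available; the function name `log_likelihood` and the restriction to one-dimensional Gaussian components are illustrative choices, not part of the text.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(X, alphas, means, stds):
    """Evaluate the log-likelihood fit L(D|M) of Eq. 6.12 for a
    one-dimensional Gaussian mixture with k components."""
    # weighted[j, i] holds alpha_i * f^i(X_j)
    weighted = np.array([
        a * norm.pdf(X, loc=m, scale=s)
        for a, m, s in zip(alphas, means, stds)
    ]).T
    # Sum over the k components, take the logarithm, then sum over the n points
    return np.sum(np.log(weighted.sum(axis=1)))

# Example: two components with prior generative probabilities 0.6 and 0.4
X = np.array([1.0, 1.2, 4.9, 5.1])
print(log_likelihood(X, alphas=[0.6, 0.4], means=[1.0, 5.0], stds=[0.5, 0.5]))
```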
This log-likelihood fit needs to be maximized to determine the model parameters. A salient observation is that if the probabilities of the data points being generated from the different clusters were known, then it would become relatively easy to determine the optimal model parameters separately for each component of the mixture. At the same time, the probabilities of the data points being generated from the different components depend on these optimal model parameters. This circularity is reminiscent of a similar circularity in optimizing the objective function of partitioning algorithms in Sect. 6.3. In that case, the knowledge of a hard assignment of data points to clusters provides the ability to determine optimal cluster representatives locally for each cluster. In this case, the knowledge of a soft assignment provides the ability to estimate the optimal (maximum likelihood) model parameters locally for each cluster. This naturally suggests an iterative EM algorithm, in which the model parameters and probabilistic assignments are iteratively estimated from one another, as in the sketch below.
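The following is a minimal sketch of this iteration for a one-dimensional Gaussian mixture. The initialization scheme, the fixed iteration count, and the use of SciPy's `norm` are assumptions made for illustration, not the book's prescribed procedure.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(X, k, n_iter=100, seed=0):
    """Illustrative EM loop for a 1-D Gaussian mixture with k components.
    X is a NumPy array of n data points."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Heuristic initialization (an assumption; any reasonable start works)
    alphas = np.full(k, 1.0 / k)
    means = rng.choice(X, size=k, replace=False)
    stds = np.full(k, X.std())
    for _ in range(n_iter):
        # E-step: soft assignment P(component i | X_j) via Bayes' rule
        weighted = np.array([
            a * norm.pdf(X, loc=m, scale=s)
            for a, m, s in zip(alphas, means, stds)
        ]).T                                          # shape (n, k)
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood parameters from the soft assignments
        weight = resp.sum(axis=0)                     # effective points per component
        alphas = weight / n
        means = (resp * X[:, None]).sum(axis=0) / weight
        stds = np.sqrt((resp * (X[:, None] - means) ** 2).sum(axis=0) / weight)
    return alphas, means, stds
```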
Let Θ be a vector representing the entire set of parameters describing all components of the mixture model. For example, in the case of the Gaussian mixture model, Θ contains all the component mixture means, variances, covariances, and the prior generative probabilities α1 . . . αk. Then, the EM algorithm starts with an initial set of values of Θ (possibly