Data Mining: The Textbook




f^{\text{point}}(X_j | M) = \sum_{i=1}^{k} \alpha_i \cdot f^{i}(X_j).        (6.10)

Then, for a data set D containing n data points, denoted by X_1 ... X_n, the probability density of the data set being generated by the model M is the product of all the point-specific probability densities:





f^{\text{data}}(D | M) = \prod_{j=1}^{n} f^{\text{point}}(X_j | M).        (6.11)

The log-likelihood fit L(D|M) of the data set D with respect to model M is the logarithm of the aforementioned expression and can be (more conveniently) represented as a sum of values over the different data points. The log-likelihood fit is preferred for computational reasons, because the product of a large number of small density values would otherwise cause numerical underflow.





L(D | M) = \sum_{j=1}^{n} \log\left( f^{\text{point}}(X_j | M) \right) = \sum_{j=1}^{n} \log\left( \sum_{i=1}^{k} \alpha_i f^{i}(X_j) \right).        (6.12)
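For concreteness, the sketch below evaluates the log-likelihood fit of Eq. 6.12 for the special case in which the component densities f^i are Gaussians, using SciPy's multivariate normal density and the log-sum-exp trick so that the inner sum is computed without numerical underflow. The function name and parameter layout are illustrative choices rather than part of the model's definition.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood_fit(D, alphas, means, covs):
    # D: n x d array of data points X_1 ... X_n
    # alphas: prior generative probabilities alpha_1 ... alpha_k
    # means, covs: parameters of the k Gaussian components (the assumed form of f^i)
    n, k = D.shape[0], len(alphas)
    log_terms = np.empty((n, k))
    for i in range(k):
        # log(alpha_i * f^i(X_j)) for every point X_j
        log_terms[:, i] = np.log(alphas[i]) + multivariate_normal.logpdf(D, means[i], covs[i])
    # Inner sum of Eq. 6.12, evaluated stably in log space (log-sum-exp)
    point_log_densities = logsumexp(log_terms, axis=1)
    # Outer sum over the n data points gives L(D|M)
    return point_log_densities.sum()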







This log-likelihood fit needs to be maximized to determine the model parameters. A salient observation is that if the probabilities of data points being generated from different clusters were known, then it becomes relatively easy to determine the optimal model parameters separately for each component of the mixture. At the same time, the probabilities of data points being generated from different components depend on these optimal model parameters. This circularity is reminiscent of a similar circularity in optimizing the objective function of partitioning algorithms in Sect. 6.3. In that case, the knowledge of a hard assignment of data points to clusters provides the ability to determine optimal cluster representatives locally for each cluster. In this case, the knowledge of a soft assignment provides the ability to estimate the optimal (maximum likelihood) model parameters locally for each cluster. This naturally suggests an iterative EM algorithm, in which the model parameters and probabilistic assignments are iteratively estimated from one another.
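A minimal sketch of this alternating scheme for the Gaussian case is shown below; the initialization strategy, the fixed iteration count, and the small regularization term added to each covariance matrix are illustrative choices, not prescriptions from the text. Each pass estimates soft assignments from the current parameters (E-step) and then re-estimates the parameters locally for each component from those assignments (M-step).

import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(D, k, num_iters=100):
    n, d = D.shape
    rng = np.random.default_rng(0)
    # Initial guess for the parameter vector Theta: priors, means, covariances
    alphas = np.full(k, 1.0 / k)
    means = D[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.cov(D, rowvar=False) + 1e-6 * np.eye(d) for _ in range(k)])

    for _ in range(num_iters):
        # E-step: soft assignment of each point X_j to each component i
        resp = np.empty((n, k))
        for i in range(k):
            resp[:, i] = alphas[i] * multivariate_normal.pdf(D, means[i], covs[i])
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: maximum-likelihood re-estimation of Theta from the soft assignments
        weights = resp.sum(axis=0)                      # effective cluster sizes
        alphas = weights / n                            # prior generative probabilities
        means = (resp.T @ D) / weights[:, None]         # weighted component means
        for i in range(k):
            diff = D - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / weights[i] + 1e-6 * np.eye(d)

    return alphas, means, covs, resp

In practice, the iterations are typically terminated when the log-likelihood fit of Eq. 6.12 no longer improves significantly, rather than after a fixed number of passes.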


Let Θ be a vector representing the entire set of parameters describing all components of the mixture model. For example, in the case of the Gaussian mixture model, Θ contains all the component mixture means, variances, covariances, and the prior generative probabilities α_1 ... α_k. Then, the EM algorithm starts with an initial set of values of Θ (possibly



