Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	119/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 115 116 117 118 119 120 121 122 ... 423

1-Data Mining tarjima

n}, ∀j ∈ {1 . . .	k}
(6.38)
(D^T U )_ij v_ij − (V U ^T U )_ij v_ij = 0	∀i ∈ {1 . . .	d}, ∀j ∈ {1 . . .	k}	(6.39)

These conditions are independent of P_α and P_β , and they provide a system of equations in terms of the entries of U and V . Such systems of equations are often solved using iterative methods. It can be shown that this particular system can be solved by using the following multiplicative update rules for u_ij and v_ij , respectively:

u_ij =		(DV )_ij u_ij	∀i ∈ {1 . . . n}, ∀j ∈ {1 . . . k}		(6.40)
		(U V ^T V )_ij
v_ij =	(D^T U )_ij v_ij			∀i ∈ {1 . . . d}, ∀j ∈ {1 . . . k}	(6.41)
		(V U ^T U )_ij

The entries of U and V are initialized to random values in (0, 1), and the iterations are executed to convergence.

One interesting observation about the matrix factorization technique is that it can also be used to determine word-clusters instead of document clusters. Just as the columns of V provide a basis that can be used to discover document clusters, one can use the columns of U to discover a basis that corresponds to word clusters. Thus, this approach provides complementary insights into spaces where the dimensionality is very large.

6.8.1 Comparison with Singular Value Decomposition

Singular value decomposition (cf. Sect. 2.4.3.2 of Chap. 2) is a matrix factorization method. SVD factorizes the data matrix into three matrices instead of two. Equation 2.12 of Chap. 2 is replicated here:

D ≈ Q_kΣ_kP_k^T .

(6.42)

It is instructive to compare this factorization to that of Eq. 6.30 for non-negative matrix factorization. The n × k matrix Q_kΣ_k is analogous to the n × k matrix U in non-negative matrix factorization. The d × k matrix P_k is analogous to the d × k matrix V in matrix factorization. Both representations minimize the squared-error of data representation. The main diﬀerences between SVD and NMF arise from the diﬀerent constraints in the corre-sponding optimization formulations. SVD can be viewed as a matrix-factorization in which the objective function is the same, but the optimization formulation imposes orthogonality constraints on the basis vectors rather than non-negativity constraints. Many other kinds of constraints can be used to design diﬀerent forms of matrix factorization. Furthermore,

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 115 116 117 118 119 120 121 122 ... 423