Each n × d matrix U_i V_i^T is a rank-1 matrix, which corresponds to a latent component in the data. Because of the interpretable nature of the non-negative decomposition, it is easy
6.8. NON-NEGATIVE MATRIX FACTORIZATION
Figure 6.23: The interpretable matrix decomposition of NMF
to map these latent components to clusters. For example, the two latent components of the aforementioned example corresponding to cats and cars, respectively, are illustrated in Fig. 6.23.
It remains to be explained how the aforementioned optimization problem for J is solved.
The squared Frobenius norm of any matrix Q can be expressed as the trace of the matrix QQ^T.
Therefore, the objective function J can be expressed as follows:
$$
J = \frac{1}{2}\,\mathrm{tr}\!\left[(D - UV^T)(D - UV^T)^T\right] \qquad (6.33)
$$
$$
= \frac{1}{2}\left[\mathrm{tr}(DD^T) - \mathrm{tr}(DVU^T) - \mathrm{tr}(UV^TD^T) + \mathrm{tr}(UV^TVU^T)\right] \qquad (6.34)
$$
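As a quick numerical sanity check, the trace-based expressions in Eqs. 6.33 and 6.34 can be verified against the direct Frobenius-norm definition of J. The following numpy sketch (matrix names and dimensions follow the text; the random data is purely illustrative) confirms that all three forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 6, 5, 3
D = rng.random((n, d))   # non-negative n x d data matrix
U = rng.random((n, k))   # non-negative n x k factor
V = rng.random((d, k))   # non-negative d x k factor

R = D - U @ V.T          # residual matrix

# J as half the squared Frobenius norm of the residual
J_frob = 0.5 * np.linalg.norm(R, "fro") ** 2

# J via the trace identity of Eq. 6.33
J_trace = 0.5 * np.trace(R @ R.T)

# J via the expanded form of Eq. 6.34
J_expanded = 0.5 * (np.trace(D @ D.T) - np.trace(D @ V @ U.T)
                    - np.trace(U @ V.T @ D.T) + np.trace(U @ V.T @ V @ U.T))

assert np.isclose(J_frob, J_trace) and np.isclose(J_trace, J_expanded)
```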
This is an optimization problem with respect to the matrices U = [u_ij] and V = [v_ij]. Therefore, the matrix entries u_ij and v_ij are the optimization variables. In addition, the constraints u_ij ≥ 0 and v_ij ≥ 0 ensure non-negativity. This is a typical constrained nonlinear optimization problem, and it can be solved using Lagrangian relaxation, which relaxes the non-negativity constraints and replaces them in the objective function with constraint-violation penalties. The Lagrange parameters are the multipliers of these new penalty terms. Let P_α = [α_ij]_{n×k} and P_β = [β_ij]_{d×k} be matrices with the same dimensions as U and V, respectively. The elements of the matrices P_α and P_β are the corresponding Lagrange multipliers for the non-negativity conditions on the different elements of U and V, respectively. Furthermore, note that tr(P_α U^T) is equal to Σ_{i,j} α_ij u_ij, and tr(P_β V^T) is equal to Σ_{i,j} β_ij v_ij. These correspond to the Lagrangian penalties for the non-negativity constraints on U and V, respectively. Then, the augmented objective function with constraint penalties can be expressed as follows:
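The trace identity used for these penalty terms, tr(P_α U^T) = Σ_{i,j} α_ij u_ij, is easy to confirm numerically. A minimal sketch (with an arbitrary illustrative multiplier matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 3
U = rng.random((n, k))         # factor matrix
P_alpha = rng.random((n, k))   # illustrative matrix of multipliers alpha_ij

lhs = np.trace(P_alpha @ U.T)  # tr(P_alpha U^T)
rhs = np.sum(P_alpha * U)      # sum over i,j of alpha_ij * u_ij
assert np.isclose(lhs, rhs)
```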
$$
L = J + \mathrm{tr}(P_\alpha U^T) + \mathrm{tr}(P_\beta V^T) \qquad (6.35)
$$
To solve this optimization problem, the partial derivatives of L with respect to U and V are computed and set to 0. Matrix calculus on the trace-based objective function yields the following:
$$
\frac{\partial L}{\partial U} = -DV + UV^TV + P_\alpha = 0 \qquad (6.36)
$$
$$
\frac{\partial L}{\partial V} = -D^TU + VU^TU + P_\beta = 0 \qquad (6.37)
$$
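The multiplier-free parts of Eqs. 6.36 and 6.37 are exactly the gradients of J itself, which can be checked against a finite-difference approximation. A minimal numpy sketch (the perturbed entry and step size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 5, 4, 2
D = rng.random((n, d))
U = rng.random((n, k))
V = rng.random((d, k))

def J(U, V):
    """Objective of Eq. 6.33 as half the squared Frobenius norm."""
    R = D - U @ V.T
    return 0.5 * np.sum(R * R)

# Closed-form gradients: the multiplier-free parts of Eqs. 6.36 and 6.37
grad_U = -D @ V + U @ (V.T @ V)
grad_V = -D.T @ U + V @ (U.T @ U)

# Finite-difference check of one entry of grad_U
eps = 1e-6
U_pert = U.copy()
U_pert[0, 0] += eps
numeric = (J(U_pert, V) - J(U, V)) / eps
assert np.isclose(numeric, grad_U[0, 0], atol=1e-4)
```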
The aforementioned expressions provide two matrices of conditions. The (i, j)th entries of these two matrices of conditions correspond to the partial derivatives of L with respect to u_ij and v_ij, respectively. These conditions are multiplied by u_ij and v_ij, respectively. By using the Kuhn–Tucker optimality conditions α_ij u_ij = 0 and β_ij v_ij = 0, the (i, j)th pair of conditions can be written as follows:
$$
(DV)_{ij}\, u_{ij} - (UV^TV)_{ij}\, u_{ij} = 0 \qquad \forall i \in \{1 \ldots
$$
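Conditions of this form are what give rise to the well-known multiplicative update rules for NMF, in which each entry of U is rescaled by the ratio (DV)_ij / (UV^TV)_ij, and each entry of V symmetrically. A minimal numpy sketch under that standard scheme (the small constant added to the denominators is an implementation choice for numerical stability, not part of the derivation):

```python
import numpy as np

def nmf(D, k, n_iter=200, eps=1e-10, seed=0):
    """Sketch of NMF via the standard multiplicative update rules,
    which follow from optimality conditions of the form shown above."""
    rng = np.random.default_rng(seed)
    n, d = D.shape
    U = rng.random((n, k))
    V = rng.random((d, k))
    for _ in range(n_iter):
        # u_ij <- u_ij * (D V)_ij / (U V^T V)_ij
        U *= (D @ V) / (U @ (V.T @ V) + eps)
        # v_ij <- v_ij * (D^T U)_ij / (V U^T U)_ij
        V *= (D.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

D = np.random.default_rng(3).random((8, 6))  # illustrative non-negative data
U, V = nmf(D, k=2)
assert U.min() >= 0 and V.min() >= 0         # updates preserve non-negativity
```

Because every update multiplies a non-negative entry by a non-negative ratio, the factors remain non-negative throughout without any explicit projection step.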