6.8 Non-negative Matrix Factorization
Non-negative matrix factorization (NMF) is a dimensionality reduction method that is tailored to clustering. In other words, it embeds the data into a latent space that makes it more amenable to clustering. This approach is suitable for data matrices that are non-negative and sparse. For example, the n × d document-term matrix in text applications always contains non-negative entries. Furthermore, because most word frequencies are zero, this matrix is also sparse.
Non-negative matrix factorization creates a new basis system for data representation, as in all dimensionality reduction methods. However, a distinguishing feature of NMF compared to many other dimensionality reduction methods is that the basis system does not necessarily contain orthonormal vectors. Furthermore, both the basis vectors and the coordinates of the data records in this system are non-negative. The non-negativity of the representation makes it highly interpretable and well-suited to clustering. Therefore, non-negative matrix factorization is one of the dimensionality reduction methods that serves the dual purpose of enabling data clustering.
Consider the common use-case of NMF in the text domain, where the n × d data matrix D is a document-term matrix. In other words, there are n documents defined on a lexicon of size d. NMF transforms the data to a reduced k-dimensional basis system, in which each basis vector is a topic. Each such basis vector is a vector of non-negatively weighted words that define that topic. Each document has a non-negative coordinate with respect to each basis vector. Therefore, the cluster membership of a document may be determined by examining the largest coordinate of the document along any of the k vectors. This provides the “topic” to which the document is most related and therefore defines its cluster. An alternative way of performing the clustering is to apply another clustering method such as k-means on the transformed representation. Because the transformed representation better discriminates between the clusters, the k-means approach will be more effective. The expression of each document as an additive and non-negative combination of the underlying topics also provides semantic interpretability to this representation. This is why the non-negativity of matrix factorization is so desirable.
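The argmax-based cluster assignment described above can be sketched as follows. This is a minimal illustration with a hypothetical coordinate matrix U (one row per document, one column per topic); the numbers are invented for demonstration, not drawn from any real corpus.

```python
import numpy as np

# Hypothetical non-negative topic coordinates produced by NMF:
# 3 documents, k = 3 topics. Each row gives a document's (non-negative)
# coordinates with respect to the k topic basis vectors.
U = np.array([
    [0.9, 0.1, 0.0],   # dominated by topic 0
    [0.0, 0.2, 0.7],   # dominated by topic 2
    [0.1, 0.8, 0.1],   # dominated by topic 1
])

# Assign each document to the topic with its largest coordinate.
clusters = U.argmax(axis=1)
print(clusters)  # -> [0 2 1]
```

Alternatively, the rows of U could be fed directly to a k-means implementation, as discussed above.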
So how are the basis system and the coordinate system determined? The non-negative matrix factorization method attempts to determine the matrices U and V that minimize the following objective function:
J = (1/2) ||D − UV^T||²    (6.29)
Here, || · ||² represents the (squared) Frobenius norm, which is the sum of the squares of all the elements in the matrix, U is an n × k non-negative matrix, and V is a d × k non-negative matrix. The value of k is the dimensionality of the embedding. The matrix U provides the new k-dimensional coordinates of the rows of D in the transformed basis system, and the matrix V provides the basis vectors in terms of the original lexicon. Specifically, the rows of U provide the k-dimensional coordinates for each of the n documents, and the columns of V provide the k d-dimensional basis vectors.
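One standard way to minimize this objective is with the Lee–Seung multiplicative update rules, which preserve non-negativity at every step. The sketch below is a minimal NumPy implementation of that approach (the function name `nmf_factorize`, the toy matrix, and all parameter values are illustrative assumptions, not part of the text):

```python
import numpy as np

def nmf_factorize(D, k, n_iter=200, eps=1e-9, seed=0):
    """Approximately factor a non-negative n x d matrix D as U @ V.T,
    with U (n x k) and V (d x k) non-negative, by minimizing
    (1/2) ||D - U V^T||^2 via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, d = D.shape
    U = rng.random((n, k))  # random non-negative initialization
    V = rng.random((d, k))
    for _ in range(n_iter):
        # Multiplicative updates: each factor is scaled elementwise,
        # so entries stay non-negative; eps guards against division by zero.
        U *= (D @ V) / (U @ (V.T @ V) + eps)
        V *= (D.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Toy "document-term" matrix with two clear topic blocks.
D = np.array([[3., 1., 0., 0.],
              [2., 1., 0., 0.],
              [0., 0., 2., 3.],
              [0., 0., 1., 2.]])
U, V = nmf_factorize(D, k=2)
```

On this toy matrix, the rows of U concentrate their weight on one of the two factors, reflecting the two document groups.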
What is the significance of the aforementioned optimization problem? Note that by minimizing J, the goal is to factorize the document-term matrix D as follows: