$X$, to decompose $X$ into a product of three matrices $U \Sigma V^T$, where $U$ and $V$ are in column orthonormal form (i.e., the columns are orthogonal and have unit length: $U^T U = V^T V = I$) and $\Sigma$ is a diagonal matrix of singular values (hence SVD). If $X$ is of rank $r$, then $\Sigma$ is also of rank $r$. Let $\Sigma_k$, where $k < r$, be the diagonal matrix formed from the top $k$ singular values, and let $U_k$ and $V_k$ be the matrices produced by selecting the corresponding columns from $U$ and $V$. The matrix $U_k \Sigma_k V_k^T$ is the matrix of rank $k$ that best approximates the original matrix $X$, in the sense that it minimizes the approximation errors.
That is, $\hat{X} = U_k \Sigma_k V_k^T$ minimizes $\|X - \hat{X}\|_F$ over all matrices $\hat{X}$ of rank $k$, where $\|\cdot\|_F$ denotes the Frobenius norm [Golub and Van Loan 1996; Bartell et al. 1992]. We may think of this matrix $U_k \Sigma_k V_k^T$ as a "smoothed" or "compressed" version of the original matrix $X$.
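Continuing the sketch above, the truncation and its optimality (the Eckart-Young theorem: the minimal Frobenius error equals the norm of the discarded singular values) can be checked directly; the choice $k = 2$ is arbitrary.

# Keep only the top k singular values (k < r).
k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# X_hat = U_k @ Sigma_k @ V_k^T is the best rank-k approximation to X
# in the Frobenius norm.
X_hat = U_k @ S_k @ Vt_k

# The minimal error equals the 2-norm of the discarded singular values.
err = np.linalg.norm(X - X_hat, ord='fro')
assert np.isclose(err, np.linalg.norm(s[k:]))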
LSA is similar to principal components analysis. LSA works by measuring the similarity of words using the smoothed matrix $U_k \Sigma_k V_k^T$ instead of the original matrix $X$. The similarity of two words, LSA($word_1$, $word_2$), is measured by the cosine of the angle between their corresponding row vectors in the smoothed matrix $\hat{X} = U_k \Sigma_k V_k^T$.
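A minimal sketch of this similarity measure, reusing X_hat from above; the row indices 0 and 1 standing for $word_1$ and $word_2$ are a hypothetical assignment for illustration.

def lsa_similarity(i, j, X_hat):
    """Cosine of the angle between rows i and j of the smoothed matrix."""
    a, b = X_hat[i], X_hat[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: suppose rows 0 and 1 of X correspond to word_1 and word_2.
print(lsa_similarity(0, 1, X_hat))

Because $V_k$ has orthonormal columns, the rows of $\hat{X} = (U_k \Sigma_k) V_k^T$ have the same pairwise cosines as the rows of $U_k \Sigma_k$, so implementations often compare the shorter $k$-dimensional row vectors of $U_k \Sigma_k$ directly.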