$X$, to decompose $X$ into a product of three matrices $U \Sigma V^T$, where $U$ and $V$ are in column orthonormal form (i.e., the columns are orthogonal and have unit length: $U^T U = V^T V = I$) and $\Sigma$ is a diagonal matrix of singular values (hence SVD). If $X$ is of rank $r$, then $\Sigma$ is also of rank $r$. Let $\Sigma_k$, where $k < r$, be the diagonal matrix formed from the top $k$ singular values, and let $U_k$ and $V_k$ be the matrices produced by selecting the corresponding columns from $U$ and $V$. The matrix $U_k \Sigma_k V_k^T$ is the matrix of rank $k$ that best approximates the original matrix $X$, in the sense that it minimizes the approximation errors.
That is, $\hat{X} = U_k \Sigma_k V_k^T$ minimizes $\|X - \hat{X}\|_F$ over all matrices $\hat{X}$ of rank $k$, where $\|\cdot\|_F$ denotes the Frobenius norm [Golub and Van Loan 1996; Bartell et al. 1992]. We may think of this matrix $U_k \Sigma_k V_k^T$ as a "smoothed" or "compressed" version of the original matrix $X$.
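Continuing the sketch above, the truncation and its optimality (the Eckart-Young theorem: the minimal Frobenius error equals the norm of the discarded singular values) can be checked directly; the choice $k = 2$ is arbitrary.

# Keep only the top k singular values (k < r).
k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# X_hat = U_k @ Sigma_k @ V_k^T is the best rank-k approximation to X
# in the Frobenius norm.
X_hat = U_k @ S_k @ Vt_k

# The minimal error equals the 2-norm of the discarded singular values.
err = np.linalg.norm(X - X_hat, ord='fro')
assert np.isclose(err, np.linalg.norm(s[k:]))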
LSA is similar to principal components analysis. LSA works by measuring the similarity of words using the smoothed matrix $U_k \Sigma_k V_k^T$ instead of the original matrix $X$. The similarity of two words, LSA($word_1$, $word_2$), is measured by the cosine of the angle between their corresponding row vectors in the smoothed matrix $\hat{X} = U_k \Sigma_k V_k^T$.
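A minimal sketch of this similarity measure, reusing X_hat from above; the row indices 0 and 1 standing for $word_1$ and $word_2$ are a hypothetical assignment for illustration.

def lsa_similarity(i, j, X_hat):
    """Cosine of the angle between rows i and j of the smoothed matrix."""
    a, b = X_hat[i], X_hat[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: suppose rows 0 and 1 of X correspond to word_1 and word_2.
print(lsa_similarity(0, 1, X_hat))

Because $V_k$ has orthonormal columns, the rows of $\hat{X} = (U_k \Sigma_k) V_k^T$ have the same pairwise cosines as the rows of $U_k \Sigma_k$, so implementations often compare the shorter $k$-dimensional row vectors of $U_k \Sigma_k$ directly.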