Data Mining: The Textbook



Figure 2.4: Complementary basis properties of matrix factorization in SVD. [The figure depicts the $n \times d$ original data matrix $D$ factorized as $D \approx Q_k \Sigma_k P_k^T$: the $n \times k$ matrix $Q_k$ contains the top-$k$ basis vectors of the rows of $D^T$ (the latent components of the data points), the $k \times k$ diagonal matrix $\Sigma_k$ encodes the importance of the latent components, and the $k \times d$ matrix $P_k^T$ contains the top-$k$ basis vectors of the rows of $D$.]




Let $v$ be a $d$-dimensional column vector and $Dv$ be the projection of the data set $D$ on $v$. Consider the problem of determining the unit vector $v$ such that the sum of squared Euclidean distances $(Dv)^T(Dv)$ of the projected data points from the origin is maximized. Setting the gradient of the Lagrangian relaxation $v^T D^T D v - \lambda(\|v\|^2 - 1)$ to 0 is equivalent to the eigenvector condition $D^T D v - \lambda v = 0$. Because the right singular vectors are eigenvectors of $D^T D$, it follows that the eigenvectors (right singular vectors) with the $k$ largest eigenvalues (squared singular values) provide a basis that maximizes the preserved energy in the transformed and reduced data matrix $D_k = DP_k = Q_k\Sigma_k$. Because the energy, which is the sum of squared Euclidean distances from the origin, is invariant to axis rotation, the energy in $D_k$ is the same as that in $D_k P_k^T = Q_k\Sigma_k P_k^T$. Therefore, $k$-rank SVD is a maximum energy-preserving factorization. This result is known as the Eckart–Young theorem.
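
As a concrete illustration, the following minimal sketch (assuming Python with NumPy and a randomly generated toy data matrix; all sizes and variable names are hypothetical, not from the text) checks numerically that $DP_k = Q_k\Sigma_k$, that the rank-$k$ truncation preserves exactly the energy $\sum_{i=1}^{k}\sigma_i^2$, and that a random rank-$k$ orthonormal projection preserves no more energy than the SVD basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10, 3              # hypothetical sizes: n points, d dimensions, rank k
D = rng.standard_normal((n, d))   # toy data matrix with rows as data points

# Full SVD: D = Q diag(sigma) P^T
Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)

Pk = Pt[:k].T                     # d x k matrix of top-k right singular vectors
Qk = Q[:, :k]                     # n x k matrix of top-k left singular vectors
Dk = D @ Pk                       # n x k reduced representation D P_k

# D P_k coincides with Q_k Sigma_k
print(np.allclose(Dk, Qk * sigma[:k]))                       # True

# Preserved energy equals the sum of the top-k squared singular values
preserved = np.sum(Dk ** 2)
print(np.isclose(preserved, np.sum(sigma[:k] ** 2)))         # True

# Energy is invariant to the rotation back into the original axis system
approx = Dk @ Pk.T                # rank-k approximation Q_k Sigma_k P_k^T
print(np.isclose(np.sum(approx ** 2), preserved))            # True

# No other rank-k orthonormal projection preserves more energy
# (here: a random orthonormal basis obtained from a QR decomposition).
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
print(np.sum((D @ B) ** 2) <= preserved + 1e-9)              # True
```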

The total preserved energy of the projection $Dv$ of the data set $D$ along a unit right singular vector $v$ with singular value $\sigma$ is given by $(Dv)^T(Dv)$, which can be simplified as follows:

$$(Dv)^T(Dv) = v^T(D^T D v) = v^T(\sigma^2 v) = \sigma^2$$


Because the energy is defined as a linearly separable sum along orthonormal directions, the preserved energy in the data projection along the top-$k$ singular vectors is equal to the sum of the squares of the top-$k$ singular values. Note that the total energy in the data set $D$ is always equal to the sum of the squares of all the nonzero singular values. It can be shown that maximizing the preserved energy is the same as minimizing the squared error³ (or lost energy) of the $k$-rank approximation. This is because the sum of the energy in the preserved subspace and the lost energy in the complementary (discarded) subspace is always a constant, which is equal to the energy in the original data set $D$.
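
The following sketch (again assuming NumPy and a hypothetical random matrix) checks that the total energy equals the sum of all squared singular values and that, for every choice of $k$, the preserved and lost energy add up to this same constant.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((50, 8))                 # hypothetical 50 x 8 data matrix

_, sigma, Pt = np.linalg.svd(D, full_matrices=False)
total_energy = np.sum(D ** 2)                    # sum of squared entries of D

# Total energy equals the sum of all squared singular values
print(np.isclose(total_energy, np.sum(sigma ** 2)))          # True

for k in range(1, D.shape[1] + 1):
    Pk = Pt[:k].T
    approx = (D @ Pk) @ Pk.T                     # rank-k approximation Q_k Sigma_k P_k^T
    preserved = np.sum(approx ** 2)              # energy in the retained subspace
    lost = np.sum((D - approx) ** 2)             # squared error (lost energy)
    # Preserved and lost energy always sum to the constant total energy, and the
    # lost energy equals the sum of the discarded squared singular values.
    print(k,
          np.isclose(preserved + lost, total_energy),
          np.isclose(lost, np.sum(sigma[k:] ** 2)))
```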


When viewed purely in terms of eigenvector analysis, SVD provides two different perspectives for understanding the transformed and reduced data. The transformed data matrix can either be viewed as the projection $DP_k$ of the data matrix $D$ on the top-$k$ basis eigenvectors $P_k$ of the $d \times d$ scatter matrix $D^T D$, or it can directly be viewed as the scaled eigenvectors $Q_k\Sigma_k = DP_k$ of the $n \times n$ dot-product similarity matrix $DD^T$. While it is generally computationally expensive to extract the eigenvectors of an $n \times n$ similarity matrix, such an approach also generalizes to nonlinear dimensionality reduction methods where notions of linear basis vectors do not exist in the original space. In such cases, the dot-product similarity matrix is replaced with a more complex similarity matrix in order to extract a nonlinear embedding (cf. Table 2.3).
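
A minimal sketch of these two perspectives, assuming NumPy and a small random matrix: the reduced representation is computed once from the eigenvectors of the $d \times d$ scatter matrix $D^T D$ and once from the scaled eigenvectors of the $n \times n$ dot-product matrix $DD^T$; the two agree up to the sign of individual columns.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 30, 6, 2                               # hypothetical sizes
D = rng.standard_normal((n, d))

# Perspective 1: top-k eigenvectors of the d x d scatter matrix D^T D
# (the right singular vectors), used to project the data.
evals_r, evecs_r = np.linalg.eigh(D.T @ D)
order_r = np.argsort(evals_r)[::-1]              # eigh returns eigenvalues in ascending order
Pk = evecs_r[:, order_r[:k]]                     # d x k
rep1 = D @ Pk                                    # D P_k

# Perspective 2: top-k eigenvectors of the n x n dot-product matrix D D^T
# (the left singular vectors), scaled by the singular values.
evals_l, evecs_l = np.linalg.eigh(D @ D.T)
order_l = np.argsort(evals_l)[::-1]
Qk = evecs_l[:, order_l[:k]]                     # n x k
rep2 = Qk * np.sqrt(evals_l[order_l[:k]])        # Q_k Sigma_k

# The two representations agree up to a sign flip of individual columns
# (eigenvectors are only defined up to sign).
for j in range(k):
    sign = np.sign(rep1[:, j] @ rep2[:, j])
    print(np.allclose(rep1[:, j], sign * rep2[:, j]))        # True
```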




SVD is more general than PCA and can be used to simultaneously determine a subset of $k$ basis vectors for the data matrix and its transpose with the maximum energy. The latter can be useful in understanding complementary transformation properties of $D^T$.
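
A short sketch of this complementary property, again on a hypothetical random matrix: a single SVD yields $P_k$, the dominant basis for the rows of $D$, and simultaneously $Q_k$, the dominant basis for the rows of $D^T$ (the columns of $D$), and both projections retain the same top-$k$ energy.

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((40, 7))                 # hypothetical data matrix
k = 3

Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)
Pk, Qk = Pt[:k].T, Q[:, :k]

# P_k is the dominant k-dimensional basis for the rows of D ...
row_energy = np.sum((D @ Pk) ** 2)
# ... and Q_k is the dominant k-dimensional basis for the rows of D^T (columns of D).
col_energy = np.sum((D.T @ Qk) ** 2)

# Both projections retain the same energy: the sum of the top-k squared singular values.
print(np.isclose(row_energy, np.sum(sigma[:k] ** 2)))        # True
print(np.isclose(col_energy, np.sum(sigma[:k] ** 2)))        # True
```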



³The squared error is the sum of squares of the entries in the error matrix $D - Q_k\Sigma_k P_k^T$.



