Figure 2.4: Complementary basis properties of matrix factorization in SVD
Let $v$ be a $d$-dimensional column vector and $Dv$ be the projection of the data set $D$ on $v$. Consider the problem of determining the unit vector $v$ such that the sum of squared Euclidean distances $(Dv)^T(Dv)$ of the projected data points from the origin is maximized. Setting the gradient of the Lagrangian relaxation $v^T D^T D v - \lambda(\|v\|^2 - 1)$ to 0 is equivalent to the eigenvector condition $D^T D v - \lambda v = 0$. Because the right singular vectors are eigenvectors of $D^T D$, it follows that the eigenvectors (right singular vectors) with the $k$ largest eigenvalues (squared singular values) provide a basis that maximizes the preserved energy in the transformed and reduced data matrix $D_k = D P_k = Q_k \Sigma_k$. Because the energy, which is the sum of squared Euclidean distances from the origin, is invariant to axis rotation, the energy in $D_k$ is the same as that in $D_k P_k^T = Q_k \Sigma_k P_k^T$. Therefore, $k$-rank SVD is a maximum energy-preserving factorization. This result is known as the Eckart–Young theorem.
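As a sanity check on these identities, the following minimal NumPy sketch verifies numerically that $D_k = DP_k = Q_k\Sigma_k$ and that no other $k$-dimensional orthonormal basis preserves more energy than the top-$k$ right singular vectors. The random data matrix, its dimensions, and the variable names are illustrative assumptions, not part of the text.

```python
import numpy as np

# Illustrative setup: a random matrix stands in for the data set D.
rng = np.random.default_rng(0)
n, d, k = 100, 6, 2
D = rng.standard_normal((n, d))

# Full SVD: D = Q Sigma P^T, with right singular vectors as rows of Pt.
Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)

Pk = Pt[:k, :].T            # top-k right singular vectors (basis P_k)
Qk = Q[:, :k]               # top-k left singular vectors Q_k
Sk = np.diag(sigma[:k])     # top-k singular values Sigma_k

# The reduced representation can be computed either way: D_k = D P_k = Q_k Sigma_k.
Dk = D @ Pk
assert np.allclose(Dk, Qk @ Sk)

# Energy preserved by the top-k basis equals the sum of the top-k squared singular values.
preserved = np.sum(Dk ** 2)
assert np.isclose(preserved, np.sum(sigma[:k] ** 2))

# A random k-dimensional orthonormal basis never preserves more energy.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
assert np.sum((D @ B) ** 2) <= preserved + 1e-9
```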
The total preserved energy of the projection $Dv$ of the data set $D$ along a unit right singular vector $v$ with singular value $\sigma$ is given by $(Dv)^T(Dv)$, which can be simplified as follows:

$$(Dv)^T (Dv) = v^T (D^T D v) = v^T (\sigma^2 v) = \sigma^2$$
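For concreteness, the short sketch below checks this identity for each right singular vector of a small random matrix; the data and variable names are assumptions made for illustration only.

```python
import numpy as np

# Illustrative data; rows of Pt are the unit right singular vectors of D.
rng = np.random.default_rng(1)
D = rng.standard_normal((50, 4))
Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)

# For each unit right singular vector v with singular value s,
# the energy of the projection Dv is (Dv)^T (Dv) = s^2.
for s, v in zip(sigma, Pt):
    energy = (D @ v) @ (D @ v)
    assert np.isclose(energy, s ** 2)
```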
Because the energy is defined as a linearly separable sum along orthonormal directions, the preserved energy in the data projection along the top-$k$ singular vectors is equal to the sum of the squares of the top-$k$ singular values. Note that the total energy in the data set $D$ is always equal to the sum of the squares of all the nonzero singular values. It can be shown that maximizing the preserved energy is the same as minimizing the squared error (or lost energy) of the $k$-rank approximation. This is because the sum of the energy in the preserved subspace and the lost energy in the complementary (discarded) subspace is always a constant, which is equal to the energy in the original data set $D$.
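The following sketch illustrates this decomposition: the total energy splits exactly into the energy preserved by the top-$k$ singular directions and the squared error of the rank-$k$ approximation. Again, the random data and the chosen dimensions are illustrative assumptions.

```python
import numpy as np

# Illustrative data set and rank.
rng = np.random.default_rng(2)
D = rng.standard_normal((80, 5))
k = 2
Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)

total_energy = np.sum(D ** 2)                        # ||D||_F^2
assert np.isclose(total_energy, np.sum(sigma ** 2))  # sum of all squared singular values

# Rank-k approximation Q_k Sigma_k P_k^T of the original data matrix.
Dk_approx = Q[:, :k] @ np.diag(sigma[:k]) @ Pt[:k, :]

preserved = np.sum(sigma[:k] ** 2)                   # energy kept by the top-k basis
lost = np.sum((D - Dk_approx) ** 2)                  # squared error of the approximation
assert np.isclose(preserved + lost, total_energy)    # preserved + lost = total energy
assert np.isclose(lost, np.sum(sigma[k:] ** 2))      # lost energy = discarded sigma_i^2
```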
When viewed purely in terms of eigenvector analysis, SVD provides two different perspectives for understanding the transformed and reduced data. The transformed data matrix can either be viewed as the projection $DP_k$ of the data matrix $D$ on the top-$k$ basis eigenvectors $P_k$ of the $d \times d$ scatter matrix $D^T D$, or it can directly be viewed as the scaled eigenvectors $Q_k\Sigma_k = DP_k$ of the $n \times n$ dot-product similarity matrix $DD^T$. While it is generally computationally expensive to extract the eigenvectors of an $n \times n$ similarity matrix, such an approach also generalizes to nonlinear dimensionality reduction methods where notions of linear basis vectors do not exist in the original space. In such cases, the dot-product similarity matrix is replaced with a more complex similarity matrix in order to extract a nonlinear embedding (cf. Table 2.3).
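The sketch below compares the two views on a small random example: projecting on the top-$k$ eigenvectors of $D^T D$ versus scaling the top-$k$ eigenvectors of $DD^T$ by the corresponding singular values. The setup is an assumption for illustration; the comparison is made up to the inherent sign ambiguity of eigenvectors.

```python
import numpy as np

# Illustrative data set and rank.
rng = np.random.default_rng(3)
D = rng.standard_normal((30, 5))
k = 2
Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)

# View 1: project D on the top-k eigenvectors of the d x d scatter matrix D^T D.
view1 = D @ Pt[:k, :].T

# View 2: scale the top-k eigenvectors of the n x n similarity matrix D D^T
# by the corresponding singular values (square roots of its eigenvalues).
evals, evecs = np.linalg.eigh(D @ D.T)   # eigenvalues in ascending order
order = np.argsort(evals)[::-1][:k]      # indices of the top-k eigenvalues
view2 = evecs[:, order] * np.sqrt(evals[order])

# The two views agree column by column, up to sign.
for j in range(k):
    assert np.allclose(view1[:, j], view2[:, j]) or np.allclose(view1[:, j], -view2[:, j])
```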
SVD is more general than PCA and can be used to simultaneously determine a subset of $k$ basis vectors for the data matrix and its transpose with the maximum energy. The latter can be useful in understanding complementary transformation properties of $D^T$.
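One way to see this simultaneity is that transposing $D = Q\Sigma P^T$ gives $D^T = P\Sigma Q^T$, so a single SVD supplies the top-$k$ basis $Q_k$ for the transpose as well. The brief sketch below (illustrative random data; the setup is an assumption) checks this relationship.

```python
import numpy as np

# Illustrative data; one SVD of D serves both D and D^T.
rng = np.random.default_rng(4)
D = rng.standard_normal((40, 6))
k = 2
Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)

# Transposing D = Q Sigma P^T gives D^T = P Sigma Q^T: the left and right
# singular vectors swap roles, while the singular values are shared.
assert np.allclose(D.T, Pt.T @ np.diag(sigma) @ Q.T)

# Hence the top-k basis for the rows of D^T is Q_k, and the reduced
# representation of D^T is D^T Q_k = P_k Sigma_k.
assert np.allclose(D.T @ Q[:, :k], Pt[:k, :].T @ np.diag(sigma[:k]))
```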