SVD provides the same basis vectors and data transformation as PCA for data sets in which the mean of each attribute is 0.
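To make this concrete, the following minimal numpy sketch (not from the text; the matrix values are arbitrary) checks the equivalence on mean-centered data: the right singular vectors of the centered matrix match the eigenvectors of its covariance matrix up to sign, and the squared singular values are n times the corresponding eigenvalues.

import numpy as np

# Toy 6 x 3 data matrix (illustrative values, not from the text).
D = np.array([[2.0, 0.5, 1.0],
              [1.0, 1.5, 0.0],
              [0.0, 2.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 1.0, 1.0],
              [2.0, 2.0, 0.0]])
Dc = D - D.mean(axis=0)                 # mean-center each attribute
n = Dc.shape[0]

# PCA basis: eigenvectors of the covariance matrix Dc^T Dc / n.
eigvals, eigvecs = np.linalg.eigh(Dc.T @ Dc / n)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # nonincreasing order

# SVD basis: right singular vectors of the mean-centered matrix.
_, sigma, PT = np.linalg.svd(Dc, full_matrices=False)

# The bases agree up to a sign flip per column, and the squared
# singular values are n times the PCA eigenvalues.
assert np.allclose(np.abs(PT.T), np.abs(eigvecs))
assert np.allclose(sigma ** 2, n * eigvals)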

The basis vectors of PCA are invariant to mean-translation, whereas those of SVD are not. When the data are not mean centered, the basis vectors of SVD and PCA will not be the same, and qualitatively different results may be obtained. SVD is often applied without mean centering to sparse nonnegative data such as document-term matrices. A formal way of defining SVD is as a decomposable product of (or factorization into) three matrices:





D = QΣP^T        (2.11)

Here, Q is an n × n matrix with orthonormal columns, which are the left singular vectors. Σ is an n × d diagonal matrix containing the singular values, which are always nonnegative and, by convention, arranged in nonincreasing order. Furthermore, P is a d × d matrix with orthonormal columns, which are the right singular vectors. Note that the diagonal matrix Σ is rectangular rather than square, but it is referred to as diagonal because only entries of the form Σ_ii are nonzero. It is a fundamental fact of linear algebra that such a decomposition always exists, and a proof may be found in [480]. The number of nonzero diagonal entries of Σ is equal to the rank of the matrix D, which is at most min{n, d}. Furthermore, because of the orthonormality of the singular vectors, both P^T P and Q^T Q are identity matrices. We make the following observations:




    1. The columns of matrix Q, which are also the left singular vectors, are the orthonormal eigenvectors of DD^T. This is because DD^T = QΣ(P^T P)Σ^T Q^T = QΣΣ^T Q^T. Therefore, the squares of the nonzero singular values, which are the diagonal entries of the n × n diagonal matrix ΣΣ^T, represent the nonzero eigenvalues of DD^T.




    2. The columns of matrix P, which are also the right singular vectors, are the orthonormal eigenvectors of D^T D. The squares of the nonzero singular values, which are represented in the diagonal entries of the d × d diagonal matrix Σ^T Σ, are the nonzero eigenvalues of D^T D. Note that the nonzero eigenvalues of DD^T and D^T D are the same. The matrix P is particularly important because it provides the basis vectors, which are analogous to the eigenvectors of the covariance matrix in PCA.

    3. Because the covariance matrix of mean-centered data is D^T D / n (cf. Eq. 2.7) and the right singular vectors of SVD are eigenvectors of D^T D, it follows that the eigenvectors of PCA are the same as the right singular vectors of SVD for mean-centered data. Furthermore, the squared singular values in SVD are n times the eigenvalues of PCA. This equivalence shows why SVD and PCA can provide the same transformation for mean-centered data.



    4. Without loss of generality, it can be assumed that the diagonal entries of Σ are arranged in decreasing order, and the columns of matrices P and Q are ordered accordingly. Let P_k and Q_k be the truncated d × k and n × k matrices obtained by selecting the first k columns of P and Q, respectively. Let Σ_k be the k × k square matrix containing the top k singular values. Then, the SVD factorization yields an approximate d-dimensional data representation of the original data set D:




D ≈ Q_k Σ_k P_k^T        (2.12)

The columns of P_k represent a k-dimensional basis system for a reduced representation of the data set. The dimensionality-reduced data set in this k-dimensional basis system is given by the n × k data set D_k = DP_k = Q_k Σ_k, as in Eq. 2.10 of PCA. Each of the n rows of D_k contains the k coordinates of the corresponding transformed data point in this new axis system. Typically, the value of k is much smaller than both n and d. Furthermore, unlike PCA, the rightmost (d − k) columns of the full d-dimensional transformed data matrix D′ = DP will be approximately 0 (rather than the data mean), whether the data are mean centered or not. In general, PCA projects the data on a low-dimensional hyperplane passing through the data mean, whereas SVD projects the data on a low-dimensional hyperplane passing through the origin. PCA captures as much of the variance (or, squared Euclidean distance about the mean) of the data as possible, whereas SVD captures as much of the aggregate squared Euclidean distance about the origin as possible. This method of approximating a data matrix is referred to as truncated SVD.
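The following numpy sketch illustrates truncated SVD as described above; the 5 × 4 nonnegative matrix and the choice k = 2 are illustrative assumptions, not values from the text.

import numpy as np

# Illustrative document-term-style nonnegative matrix (n = 5, d = 4).
D = np.array([[3.0, 1.0, 0.0, 2.0],
              [1.0, 0.0, 2.0, 0.0],
              [0.0, 2.0, 1.0, 1.0],
              [2.0, 1.0, 0.0, 3.0],
              [0.0, 0.0, 1.0, 1.0]])

# Full SVD without mean centering: D = Q Sigma P^T.
Q, sigma, PT = np.linalg.svd(D, full_matrices=False)

k = 2                                      # target dimensionality
Qk, Sk, Pk = Q[:, :k], np.diag(sigma[:k]), PT[:k, :].T

# Rank-k approximation of D (Eq. 2.12): D ~= Qk Sk Pk^T.
D_approx = Qk @ Sk @ Pk.T

# Reduced n x k representation: Dk = D Pk = Qk Sk.
Dk = D @ Pk
assert np.allclose(Dk, Qk @ Sk)

print("approximation error:", np.linalg.norm(D - D_approx))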

In the following, we will show that truncated SVD maximizes the aggregate squared Euclidean distances (or energy) of the transformed data points about the origin.
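As a numerical sanity check of this claim (with arbitrary random data, not an example from the text), the sketch below verifies that the energy retained by the top-k right singular vectors equals the sum of the top-k squared singular values, and that a random orthonormal basis of the same dimensionality retains no more.

import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 8))        # illustrative data
Q, sigma, PT = np.linalg.svd(D, full_matrices=False)

k = 3
Pk = PT[:k, :].T
Dk = D @ Pk

# Energy retained by truncated SVD equals the sum of the top-k squared
# singular values (squared Frobenius norm of the projected data).
assert np.allclose(np.linalg.norm(Dk) ** 2, np.sum(sigma[:k] ** 2))

# A random orthonormal k-dimensional basis retains no more energy.
V, _ = np.linalg.qr(rng.standard_normal((8, k)))
assert np.linalg.norm(D @ V) ** 2 <= np.sum(sigma[:k] ** 2) + 1e-9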

