[Table fragment: Kernel PCA — centered kernel matrix S = (I − U/n) K (I − U/n) (cf. Sect. 10.6.4.1 of Chap. 10)]
The objective function O of Eq. 2.16 is usually divided by Σ_{i,j: i<j} δ_ij² to yield a value in (0, 1). The square root of this value is referred to as Kruskal stress.
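For concreteness, the following minimal sketch (not from the text; the function name and variables are illustrative) computes the Kruskal stress of a candidate embedding against a target distance matrix:

```python
# Illustrative sketch: Kruskal stress of an embedding X (n × k) with respect
# to a target distance matrix delta (n × n). Names are hypothetical.
import numpy as np

def kruskal_stress(X, delta):
    # Pairwise Euclidean distances of the embedded points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    i, j = np.triu_indices(len(X), k=1)          # count each pair (i < j) once
    # Sum of squared errors (cf. Eq. 2.16) divided by the sum of squared
    # target distances, then square-rooted.
    return np.sqrt(((d[i, j] - delta[i, j]) ** 2).sum() / (delta[i, j] ** 2).sum())
```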
The basic assumption in classical MDS is that the distance matrix Δ = [δ_ij²]_{n×n} is generated by computing pairwise Euclidean distances in some hypothetical data matrix D for which the entries and dimensionality are unknown. The matrix D can never be recovered completely in classical MDS because Euclidean distances are invariant to mean translation and axis rotation. The appropriate conventions for the data mean and axis orientation will be discussed later. While the optimization of Eq. 2.16 requires numerical techniques, a direct solution to classical MDS can be obtained by eigen decomposition under the assumption that the specified distance matrix is Euclidean:
Any pairwise (squared) distance matrix Δ = [δ_ij²]_{n×n} can be converted into a symmetric dot-product matrix S_{n×n} with the help of the cosine law in Euclidean space. In particular, if X_i and X_j are the embedded representations of the ith and jth nodes, the dot product between X_i and X_j can be related to the distances as follows:
\[
X_i \cdot X_j = -\frac{1}{2}\Big(\|X_i - X_j\|^2 - \big(\|X_i\|^2 + \|X_j\|^2\big)\Big) \qquad \forall i, j \in \{1 \ldots n\} \qquad (2.17)
\]
For a mean-centered embedding, the value of ||X_i||² + ||X_j||² can be expressed (see Exercise 9) in terms of the entries of the distance matrix Δ as follows:
\[
\|X_i\|^2 + \|X_j\|^2 = \sum_{p=1}^{n} \frac{\|X_i - X_p\|^2}{n} + \sum_{q=1}^{n} \frac{\|X_j - X_q\|^2}{n} - \sum_{p=1}^{n} \sum_{q=1}^{n} \frac{\|X_p - X_q\|^2}{n^2} \qquad (2.18)
\]
A mean-centering assumption is necessary because the Euclidean distance is mean invariant, whereas the dot product is not. By substituting Eq. 2.18 into Eq. 2.17, it is possible to express the dot product X_i · X_j fully in terms of the entries of the distance matrix Δ. Because this condition is true for all possible values of i and j, we can conveniently express it in n × n matrix form. Let U be the n × n matrix of all 1s, and let I be the identity matrix. Then, our argument above shows that the dot-product matrix S is equal to −(1/2)(I − U/n) Δ (I − U/n). Under the Euclidean assumption, the matrix S is always positive semidefinite because it is equal to the n × n dot-product matrix D D^T of the unobserved data matrix D, which has unknown dimensionality. Therefore, it is desired to determine a high-quality factorization of S into the form D_k D_k^T, where D_k is an n × k matrix of dimensionality k.
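As a minimal sketch of this double-centering step (assuming NumPy and a squared-distance matrix as input; the function and variable names are illustrative, not from the text):

```python
# Illustrative sketch: recover the dot-product matrix S from a squared
# Euclidean distance matrix, S = -1/2 (I - U/n) Δ (I - U/n).
import numpy as np

def double_center(delta_sq):
    n = delta_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix I - U/n
    return -0.5 * J @ delta_sq @ J          # symmetric; PSD if Δ is Euclidean
```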
Such a factorization can be obtained with eigen decomposition. Let S ≈ Q_k Σ_k² Q_k^T = (Q_k Σ_k)(Q_k Σ_k)^T represent the approximate diagonalization of S, where Q_k is an n × k matrix containing the largest k eigenvectors of S, and Σ_k² is a k × k diagonal matrix containing the eigenvalues. The embedded representation is given by D_k = Q_k Σ_k. Note that SVD also derives the optimal embedding as the scaled eigenvectors of the dot-product matrix of the original data. Therefore, the squared error of representation is minimized by this approach. This can also be shown to be equivalent to minimizing the Kruskal stress.
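A sketch of this embedding step (assuming a centered dot-product matrix S such as the output of the earlier snippet; names are illustrative) could look as follows:

```python
# Illustrative sketch: classical MDS embedding D_k = Q_k Σ_k from the top-k
# eigenvectors and eigenvalues of the symmetric dot-product matrix S.
import numpy as np

def classical_mds_embedding(S, k):
    evals, evecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]             # indices of the k largest eigenvalues
    Q_k = evecs[:, top]
    sigma_k = np.sqrt(np.maximum(evals[top], 0))  # clip tiny negatives from round-off
    return Q_k * sigma_k                          # n × k embedded representation
```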
The optimal solution is not unique, because we can multiply Q_k Σ_k by any k × k matrix with orthonormal columns without affecting the pairwise Euclidean distances. In other words, any representation of Q_k Σ_k in a rotated axis system is optimal as well. Like PCA, MDS finds an axis system in which the individual attributes are uncorrelated. In fact, if classical MDS is applied to a distance matrix Δ that is constructed by computing the pairwise Euclidean distances in an actual data set, then it will yield the same embedding as the application of PCA on that data set. MDS is useful when such a data set is not available to begin with, and only the distance matrix Δ is available.
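The rotation invariance is easy to verify numerically; the following sketch (illustrative, not from the text) rotates an embedding by a random orthogonal matrix and checks that all pairwise distances are unchanged:

```python
# Illustrative sketch: rotating an embedding by any k × k orthogonal matrix
# leaves pairwise Euclidean distances unchanged, so the MDS solution is
# unique only up to rotation/reflection.
import numpy as np

rng = np.random.default_rng(0)
D_k = rng.standard_normal((30, 2))                 # some k = 2 embedding
R, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # random orthogonal matrix
dist = lambda Z: np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
print(np.allclose(dist(D_k), dist(D_k @ R)))       # True
```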
As in all dimensionality reduction methods, the value of the dimensionality k provides the trade-off between representation size and accuracy. Larger values of the dimensionality k will lead to lower stress. A larger number of data points typically requires a larger dimensionality of representation to achieve the same stress. The most crucial element is, however, the inherent structure of the distance matrix. For example, if a 10,000 × 10,000 distance matrix contains the pairwise driving distances between 10,000 cities, it can usually be approximated quite well with just a 2-dimensional representation. This is because driving distances are an approximation of Euclidean distances in 2-dimensional space. On the other hand, an arbitrary distance matrix may not be Euclidean, and the distances may not even satisfy the triangle inequality. As a result, the matrix S might not be positive semidefinite. In such cases, it is sometimes still possible to use the metric assumption to obtain a high-quality embedding. Specifically, only those positive eigenvalues whose magnitude exceeds that of the most negative eigenvalue may be used. This approach will work reasonably well if the negative eigenvalues have small magnitude.
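The eigenvalue-filtering heuristic described above can be sketched as follows (assuming a possibly indefinite matrix S; the function name is hypothetical):

```python
# Illustrative sketch: when S is not positive semidefinite, retain only the
# positive eigenvalues whose magnitude exceeds that of the most negative one.
import numpy as np

def filtered_embedding(S):
    evals, evecs = np.linalg.eigh(S)
    threshold = max(-evals.min(), 0.0)           # magnitude of most negative eigenvalue
    keep = np.where(evals > threshold)[0]
    keep = keep[np.argsort(evals[keep])[::-1]]   # retained eigenvalues, largest first
    return evecs[:, keep] * np.sqrt(evals[keep])
```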