Data Mining: The Textbook



Figure 19.8: Scaling relationships between random walk and symmetric versions of spectral clustering. The random walk version minimizes trace(Y^T L Y) subject to Y^T Λ Y = I; neither the rows nor the columns of the matrix Y have unit norm. The symmetric version minimizes trace(Z^T Λ^{-1/2} L Λ^{-1/2} Z) subject to Z^T Z = I; the rows of the matrix Z do not have unit norm. The two embeddings are related by the scaling Z = Λ^{1/2} Y.


This cut-based interpretation does not intuitively generalize to the relaxed version of the problem because eigenvectors have both positive and negative components.
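The scaling relationship in Figure 19.8 can be checked numerically. The following is a minimal sketch, assuming a small illustrative toy graph and the numpy/scipy libraries; the adjacency matrix and all variable names are ours, not from the chapter. It computes the random walk embedding Y from the smallest generalized eigenvectors of L y = λΛy (which eigh normalizes so that Y^T ΛY = I), computes the symmetric embedding Z from the smallest eigenvectors of Λ^{-1/2} L Λ^{-1/2}, and confirms that Z = Λ^{1/2} Y up to the sign of each eigenvector column.

    import numpy as np
    from scipy.linalg import eigh

    # Toy weighted adjacency matrix W (undirected, hence symmetric); illustrative only
    W = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    deg = W.sum(axis=1)
    Lam = np.diag(deg)                     # degree matrix Lambda
    L = Lam - W                            # unnormalized Laplacian

    # Random walk version: minimize trace(Y^T L Y) s.t. Y^T Lam Y = I.
    # Equivalently, the k smallest generalized eigenvectors of L y = lam * Lam y.
    k = 2
    _, Y_full = eigh(L, Lam)               # eigh normalizes so that Y^T Lam Y = I
    Y = Y_full[:, :k]

    # Symmetric version: minimize trace(Z^T Lam^{-1/2} L Lam^{-1/2} Z) s.t. Z^T Z = I.
    Lam_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_sym = Lam_inv_sqrt @ L @ Lam_inv_sqrt
    _, Z_full = eigh(L_sym)
    Z = Z_full[:, :k]

    # Scaling relationship of Figure 19.8: Z = Lam^{1/2} Y (up to per-column sign).
    Z_from_Y = np.diag(np.sqrt(deg)) @ Y
    for j in range(k):
        print(f"column {j} matches up to sign:",
              np.allclose(np.abs(Z_from_Y[:, j]), np.abs(Z[:, j])))

The per-column sign ambiguity is the usual eigenvector sign freedom; the check also assumes distinct eigenvalues, since repeated eigenvalues would allow the two solvers to return different bases of the same eigenspace.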


19.3.4.1 Important Observations and Intuitions


A few observations are noteworthy about the relationships between spectral clustering, PageRank, and eigenvector analysis:



  1. Normalized random walk Laplacian: The smallest right eigenvectors of Λ^{-1}L = Λ^{-1}(Λ − W) = I − P are used for the random walk embedding, where P is the stochastic transition matrix of the graph. The smallest right eigenvectors of I − P are the same as the largest right eigenvectors of P. The largest right eigenvector of P has eigenvalue 1. It is noteworthy that the largest left eigenvector of P, which also has eigenvalue 1, yields the PageRank of the graph. Therefore, both the left and the right eigenvectors of the stochastic transition matrix P yield different insights about the network (see the sketch after this list).




  2. Normalized symmetric Laplacian: The smallest eigenvectors of the symmetric Laplacian Λ^{-1/2}(Λ − W)Λ^{-1/2} are the same as the largest eigenvectors of the symmetric matrix Λ^{-1/2}W Λ^{-1/2}. The matrix Λ^{-1/2}W Λ^{-1/2} can be viewed as a normalized and sparsified similarity matrix of the graph (also verified in the sketch after this list). Most forms of nonlinear embeddings such as SVD, Kernel PCA, and ISOMAP are extracted as large eigenvectors of similarity matrices (cf. Table 2.3 of Chap. 2). It is the choice of the similarity matrix that regulates the varying properties of these different embeddings.




  3. Goal of normalization: Spectral clustering is more effective when the unnormalized Laplacian L is normalized with the node-degree matrix Λ. While it is possible to explain this behavior with a cut-interpretation of spectral clustering, the intuition does not generalize easily to continuous embeddings with both positive and negative eigenvector components. A simpler way of understanding normalization is by examining the similarity matrix Λ^{-1/2}W Λ^{-1/2}, whose large eigenvectors yield the normalized spectral embedding. In this matrix, the edge similarities are normalized by the geometric mean of the node degrees at their end points. This can be viewed as a normalization of the edge similarities with a local measure of the network density. As discussed in Chap. 3, normalizing similarity and distance functions with local density is helpful even in the case of multidimensional data mining applications. One of the most well-known algorithms for outlier analysis in multidimensional data, referred to as LOF, is based on the same principle of local density normalization.
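The eigenvector relationships in the first two observations can be verified directly, as referenced above. The following is a minimal sketch under the same assumptions as the earlier one: a toy undirected graph, numpy/scipy, and illustrative names of our own choosing. It checks that the largest right eigenvalue of P is 1 with a constant eigenvector, that the largest left eigenvector of P is the stationary distribution (PageRank without teleportation), which is proportional to the node degrees for an undirected graph, and that the normalized similarity matrix S = Λ^{-1/2}W Λ^{-1/2} and the symmetric Laplacian sum to the identity, so the large eigenvectors of one are exactly the small eigenvectors of the other.

    import numpy as np
    from scipy.linalg import eig

    # Same illustrative toy graph as in the earlier sketch
    W = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    deg = W.sum(axis=1)
    P = W / deg[:, None]                   # stochastic transition matrix Lambda^{-1} W

    # Right eigenvectors of P: the largest eigenvalue is 1, with a constant
    # eigenvector. The smallest eigenvectors of I - P are the largest of P,
    # since (I - P) v = (1 - lam) v whenever P v = lam v.
    vals, vecs = eig(P)
    top = np.argmax(vals.real)
    v1 = vecs[:, top].real
    print("largest right eigenvalue of P:", np.round(vals[top].real, 6))
    print("its eigenvector is constant:  ", np.allclose(v1, v1[0]))

    # Largest left eigenvector of P (eigenvalue 1): the stationary distribution,
    # i.e. PageRank without teleportation. For an undirected graph it is
    # proportional to the node degrees.
    lvals, lvecs = eig(P.T)
    pi = np.abs(lvecs[:, np.argmax(lvals.real)].real)
    pi /= pi.sum()
    print("stationary distribution:", np.round(pi, 3))
    print("degrees, normalized:    ", np.round(deg / deg.sum(), 3))

    # Observation 2: with S = Lambda^{-1/2} W Lambda^{-1/2} (the normalized
    # similarity matrix) and the symmetric Laplacian L_sym, S + L_sym = I,
    # so the large eigenvectors of S are the small eigenvectors of L_sym.
    D = np.diag(1.0 / np.sqrt(deg))
    S = D @ W @ D
    L_sym = D @ (np.diag(deg) - W) @ D
    print("S + L_sym = I:", np.allclose(S + L_sym, np.eye(len(deg))))

Note that each entry of S divides the edge weight w_ij by sqrt(d_i * d_j), the geometric mean of the endpoint degrees, which is precisely the local density normalization described in the third observation.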




