Data Mining: The Textbook




j∈N [P ]ij . Let Yc be a column vector with n entries such that the jth entry is 1, if node
c
j belongs to class c, and 0, otherwise. Then, it is easy to see that the ith entry of the column

vector




= P




is equivalent to

jNc [P ]ij , which is the sum of the probabilities of a




Zc

Yc




walk starting at unlabeled node i terminating at various nodes belonging to class c.




Therefore, we need to compute Zc for each class c ∈ {1 . . . k}. Let Y be an n × k matrix for which the cth column is Yc. Similarly, let Z be an n × k matrix for which the cth column is Zc. Then Z can be obtained with simple matrix multiplication between P ∞ and Y .



Z = P ∞Y

(19.34)

646 CHAPTER 19. SOCIAL NETWORK ANALYSIS


The class with the maximum probability in Z for unlabeled node (row) i may be reported as its class label. This approach is also referred to as the rendezvous approach to label propagation.


We make a few important observations. If node i is absorbing, then the ith row of P is the same as the ith row of the identity matrix. Therefore, premultiplying Y by P any number of times will not change the ith row of Y . In other words, the rows of Z that correspond to labeled nodes are fixed to the corresponding rows of Y . Therefore, the predictions of labeled nodes are fixed to their training labels. For unlabeled nodes, the rows of Z will always sum to 1 in label-connected networks. This is because the sum of the values in row i of Z is equal to the probability that a random walk starting at node i reaches an absorbing state. In label-connected networks, every random walk will eventually reach an absorbing state.
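The rendezvous computation of Eq. 19.34 can be sketched numerically. The following NumPy sketch uses a small hypothetical 5-node graph (the graph, its labeling, and all variable names are illustrative assumptions, not from the text); P ∞ is approximated by a large matrix power.

```python
import numpy as np

# Hypothetical undirected toy graph on 5 nodes; nodes 0 and 1 are
# labeled (classes 0 and 1, respectively), nodes 2-4 are unlabeled.
A = np.array([[0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)

# Random-walk transition matrix: row-normalized adjacency matrix.
P = A / A.sum(axis=1, keepdims=True)

# Make the labeled nodes absorbing: their rows become identity rows.
labeled = [0, 1]
for i in labeled:
    P[i, :] = 0.0
    P[i, i] = 1.0

# Y is n x k; Y[j, c] = 1 if labeled node j belongs to class c.
Y = np.zeros((5, 2))
Y[0, 0] = 1.0   # node 0 has class 0
Y[1, 1] = 1.0   # node 1 has class 1

# Z = P^infinity Y, approximated here by a large matrix power.
Z = np.linalg.matrix_power(P, 100) @ Y

# Labeled rows of Z remain fixed to the corresponding rows of Y, and
# every row sums to 1 because this toy network is label-connected.
predictions = Z.argmax(axis=1)
```

Each entry Z[i, c] is the probability that a walk starting at node i is absorbed at a node of class c, so reporting the row-wise maximum yields the predicted labels.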


19.4.2.1 Iterative Label Propagation: The Spectral Interpretation


Equation 19.34 suggests a simple iterative approach for computing the label probabilities in Z, rather than computing P ∞. One can initialize Z(0) = Y and then repeatedly apply the following update for increasing values of the iteration index t.



Z(t+1) = P Z(t)

(19.35)

It is easy to see that Z(∞) is the same as the value of Z in Eq. 19.34. For a labeled (absorbing) node i, the ith row of Z will always be unaffected by the update because the ith row of P is the same as that of the identity matrix. The label-propagation update is executed to convergence. In practice, a relatively small number of iterations is required to reach convergence.
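The iterative update of Eq. 19.35 can be sketched directly. The transition matrix, tolerance, and iteration cap below are illustrative assumptions on a hypothetical 4-node graph with two absorbing labeled nodes.

```python
import numpy as np

# Transition matrix of a hypothetical toy graph in which node 0
# (class 0) and node 1 (class 1) are absorbing; nodes 2 and 3 are
# unlabeled. Absorbing rows equal the corresponding identity rows.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.5, 0.5, 0.0]])

# Initialize Z(0) = Y and iterate Z(t+1) = P Z(t) to convergence.
Z = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
for t in range(1000):
    Z_next = P @ Z
    if np.abs(Z_next - Z).max() < 1e-9:   # illustrative tolerance
        break
    Z = Z_next

# Rows of absorbing nodes never change across iterations, and the
# unlabeled rows sum to 1 once the updates have converged.
labels = Z.argmax(axis=1)
```

On this toy instance the updates converge in a few dozen iterations, consistent with the observation that convergence is typically fast in practice.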

The label propagation update can be rearranged to show that the final solution Z will satisfy the following relationship at convergence:



(I − P )Z = 0

(19.36)

Note that I − P is simply the normalized (random walk) Laplacian of the adjacency matrix of the network G with absorbing states. Furthermore, each column of Z is an eigenvector of this Laplacian with eigenvalue 0. In unsupervised spectral clustering, the first eigenvector with eigenvalue 0 is discarded because it is not informative. However, in collective classification, there are additional eigenvectors of (I − P ) with eigenvalue 0 because of the presence of absorbing states. Each class-specific column of Z contains a different eigenvector with eigenvalue 0. In fact, the label-propagation solution can also be derived with an optimization formulation similar to that of spectral clustering on the original undirected graph G. In this case, the optimization formulation uses an objective function similar to that of spectral clustering, with the additional constraint that the embedded values of all labeled nodes are fixed to 1 for a particular (say, the cth) class, and fixed to 0 for the remaining classes. The embedded values of only the unlabeled nodes are unconstrained decision variables. The solution to each such optimization problem can be shown to be an eigenvector of (I − P ), with eigenvalue 0. Iterative label propagation converges to these eigenvectors.
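The eigenvector property of Eq. 19.36 is easy to verify numerically. The sketch below uses a hypothetical 4-node transition matrix with two absorbing labeled nodes; the converged Z is obtained by repeated label-propagation updates.

```python
import numpy as np

# Hypothetical transition matrix with absorbing labeled nodes 0 and 1.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.5, 0.5, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])

# Converged label-propagation solution Z (enough iterations that the
# transient component has decayed to machine precision).
Z = Y.copy()
for _ in range(200):
    Z = P @ Z

# Normalized (random-walk) Laplacian of the graph with absorbing states.
L = np.eye(4) - P

# Each column of Z satisfies (I - P) z = 0, i.e., it is an eigenvector
# of the Laplacian with eigenvalue 0; the residual is numerically zero.
residual = np.abs(L @ Z).max()
```

A residual at machine precision confirms that the class-specific columns of the converged Z lie in the null space of I − P, exactly as the spectral interpretation predicts.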


19.4.3 Supervised Spectral Methods



Spectral methods can be used in two different ways for collective classification of graphs. The first method directly transforms the graph to multidimensional data to facilitate the use of a multidimensional classifier such as a k-nearest neighbor classifier. The embedding



