Data Mining: The Textbook




  • In other words, the underlying Markov chain is not strongly connected, and therefore not ergodic. See the description of the PageRank algorithm in Chap. 18.


By recursively applying the aforementioned condition t times, and then setting t = ∞, it is possible to show the following:

π(t) = π(0) P^t                                (19.30)

π(∞) = π(0) P^∞                                (19.31)
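To make the recursion concrete, here is a minimal NumPy sketch; the 3-node chain, in which node 2 is the only labeled (absorbing) node, is a hypothetical example rather than one from the text. It checks that iterating π ← πP for t steps reproduces π(0)P^t:

    import numpy as np

    # Hypothetical 3-node chain: node 2 is labeled, so it absorbs (self-loop).
    P = np.array([[0.5, 0.3, 0.2],
                  [0.4, 0.4, 0.2],
                  [0.0, 0.0, 1.0]])

    pi = np.array([1.0, 0.0, 0.0])           # pi(0): walk starts at node 0
    for _ in range(50):
        pi = pi @ P                          # one application of the recursion
    # pi now equals pi(0) P^50; compare with the closed form of Eq. 19.30:
    assert np.allclose(pi, np.array([1.0, 0.0, 0.0]) @ np.linalg.matrix_power(P, 50))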




How can the steady-state transition matrix P^∞ be computed? A key observation is that the largest eigenvalue magnitude of a stochastic matrix is always 1 (see Exercise 7 of Chap. 18). Therefore, P may be expressed as follows:



P = V Δ V^{-1}                                 (19.32)

Here, V is an n × n matrix whose columns contain the eigenvectors of P, and Δ is a diagonal matrix containing the eigenvalues, all of which have magnitude no larger than 1. Note that stochastic matrices with absorbing components will have an eigenvector with unit eigenvalue for each absorbing component. Then, by multiplying P with itself (t − 1) times, we get:





P^t = V Δ^t V^{-1}                             (19.33)

In the limit as t approaches infinity, Δ^t will contain diagonal values of only 0 or 1. Any eigenvalue in the original matrix Δ with magnitude less than 1 will approach 0 in Δ^∞. In other words, Δ^∞ can be computed easily from Δ. Therefore, if V has been computed, then P^∞ = V Δ^∞ V^{-1} can be computed easily as well. A further optimization is that the steady-state transition matrix P^∞ can be efficiently computed by determining only the l leading eigenvectors of P, where l is the number of labeled (absorbing) nodes. Refer to the bibliographic notes for more details of this optimization.
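As a concrete illustration, the following sketch computes P^∞ with NumPy under the assumptions stated above, namely that P is diagonalizable and that every unit-magnitude eigenvalue equals 1 (one per absorbing component); the function name and tolerance are choices of this sketch, not of the text:

    import numpy as np

    def steady_state_matrix(P, tol=1e-8):
        # P = V Delta V^{-1}  (Eq. 19.32); eig returns eigenvectors as columns.
        eigvals, V = np.linalg.eig(P)
        # Delta^infinity: eigenvalues with |lambda| < 1 decay to 0, while the
        # unit eigenvalues (one per absorbing component) survive as exactly 1.
        limit = np.where(np.abs(eigvals - 1.0) < tol, 1.0, 0.0)
        # P^infinity = V Delta^infinity V^{-1}  (limit of Eq. 19.33)
        return (V @ np.diag(limit) @ np.linalg.inv(V)).real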


After P^∞ has been computed, it is relatively straightforward to compute the n-dimensional node probability vector π(∞) that results from starting the random walk at node i. When the starting state is (unlabeled) node i, the n-dimensional starting vector π(0) contains 1 in the ith component, and 0 otherwise. According to our earlier discussion, one can compute π(∞) = π(0) P^∞. Note that π(∞) will contain positive probabilities only for the labeled nodes, which are also absorbing states. By summing up the probabilities in π(∞) of the labeled nodes belonging to each class, one can obtain the probability of each class for unlabeled node i. The class with the maximum probability is reported as the relevant label.
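A hedged sketch of this per-node readout follows; `labels` (a mapping from each labeled node index to its class index) and `num_classes` are assumed inputs of the sketch, not names from the text:

    import numpy as np

    def classify_node(P_inf, i, labels, num_classes):
        pi_inf = P_inf[i]                    # pi(infinity) = pi(0) P^infinity with pi(0) = e_i
        class_prob = np.zeros(num_classes)
        for node, c in labels.items():       # all mass lies on absorbing (labeled) nodes
            class_prob[c] += pi_inf[node]
        return int(np.argmax(class_prob)), class_prob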


There is, however, a simpler way of computing the class probability distributions of all unlabeled nodes in one shot, rather than having to explicitly compute P^∞ and then try a different starting vector π(0) for each node. For each class c ∈ {1 . . . k}, let N_c ⊆ N be the set of labeled nodes belonging to that class. In order for unlabeled node i to belong to class c, a walk starting at node i must end at a node in N_c. The probability of this event is given by the total absorption probability of the nodes in N_c, that is, Σ_{j ∈ N_c} (P^∞)_{ij}.
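One plausible realization of this one-shot computation, under the same assumed inputs as the previous sketch, propagates the per-class indicator vectors through P until convergence instead of forming P^∞ explicitly; the iteration cap and tolerance are assumptions of the sketch:

    import numpy as np

    def class_distributions(P, labels, num_classes, max_iter=1000, tol=1e-9):
        n = P.shape[0]
        Y = np.zeros((n, num_classes))
        for node, c in labels.items():
            Y[node, c] = 1.0                 # column c is the indicator of N_c
        for _ in range(max_iter):
            Y_next = P @ Y                   # advance every walk by one step
            if np.max(np.abs(Y_next - Y)) < tol:
                break
            Y = Y_next
        return Y  # row i: probability of absorption in each class's labeled nodes

Because each labeled node carries a self-loop with probability 1, the rows of Y corresponding to labeled nodes remain fixed throughout the iteration, so no explicit clamping step is needed.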




