Data Mining: The Textbook




  • In other words, the underlying Markov chain is not strongly connected, and therefore not ergodic. See the description of the PageRank algorithm in Chap. 18.


By recursively applying the aforementioned condition t times, and then setting t = ∞, it is possible to show the following:

π(t) = π(0) P^t                                (19.30)

π(∞) = π(0) P^∞                                (19.31)
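To make the recursion concrete, here is a minimal NumPy sketch; the 3-node chain, in which node 2 is the only labeled (absorbing) node, is a hypothetical example rather than one from the text. It checks that iterating π ← πP for t steps reproduces π(0)P^t:

    import numpy as np

    # Hypothetical 3-node chain: node 2 is labeled, so it absorbs (self-loop).
    P = np.array([[0.5, 0.3, 0.2],
                  [0.4, 0.4, 0.2],
                  [0.0, 0.0, 1.0]])

    pi = np.array([1.0, 0.0, 0.0])           # pi(0): walk starts at node 0
    for _ in range(50):
        pi = pi @ P                          # one application of the recursion
    # pi now equals pi(0) P^50; compare with the closed form of Eq. 19.30:
    assert np.allclose(pi, np.array([1.0, 0.0, 0.0]) @ np.linalg.matrix_power(P, 50))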




How can the steady-state transition matrix P^∞ be computed? A key observation is that the largest eigenvalue magnitude of a stochastic matrix is always 1 (see Exercise 7 of Chap. 18). Therefore, P may be expressed as follows:



P = V Δ V^{-1}                                 (19.32)

Here, V is an n × n matrix whose columns contain the eigenvectors of P, and Δ is a diagonal matrix containing the eigenvalues, all of which have magnitude no larger than 1. Note that stochastic matrices with absorbing components will have an eigenvector with unit eigenvalue for each absorbing component. Then, by multiplying P with itself (t − 1) times, we get:





P^t = V Δ^t V^{-1}                             (19.33)

In the limit as t approaches infinity, Δ^t will contain diagonal values of only 0 or 1. Any eigenvalue in the original matrix Δ with magnitude less than 1 will approach 0 in Δ^∞. In other words, Δ^∞ can be computed easily from Δ. Therefore, if V has been computed, then P^∞ = V Δ^∞ V^{-1} can be computed easily as well. A further optimization is that the steady-state transition matrix P^∞ can be efficiently computed by determining only the l leading eigenvectors of P, where l is the number of labeled (absorbing) nodes. Refer to the bibliographic notes for more details of this optimization.
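As a concrete illustration, the following sketch computes P^∞ with NumPy under the assumptions stated above, namely that P is diagonalizable and that every unit-magnitude eigenvalue equals 1 (one per absorbing component); the function name and tolerance are choices of this sketch, not of the text:

    import numpy as np

    def steady_state_matrix(P, tol=1e-8):
        # P = V Delta V^{-1}  (Eq. 19.32); eig returns eigenvectors as columns.
        eigvals, V = np.linalg.eig(P)
        # Delta^infinity: eigenvalues with |lambda| < 1 decay to 0, while the
        # unit eigenvalues (one per absorbing component) survive as exactly 1.
        limit = np.where(np.abs(eigvals - 1.0) < tol, 1.0, 0.0)
        # P^infinity = V Delta^infinity V^{-1}  (limit of Eq. 19.33)
        return (V @ np.diag(limit) @ np.linalg.inv(V)).real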


After P^∞ has been computed, it is relatively straightforward to compute the n-dimensional node probability vector π(∞) that results from starting the random walk at node i. When the starting state is (unlabeled) node i, the n-dimensional starting vector π(0) contains 1 in the ith component, and 0 otherwise. According to our earlier discussion, one can compute π(∞) = π(0) P^∞. Note that π(∞) will contain positive probabilities only for the labeled nodes, which are also absorbing states. By summing up the probabilities in π(∞) of the labeled nodes belonging to each class, one can obtain the probability of each class for unlabeled node i. The class with the maximum probability is reported as the relevant label.
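A hedged sketch of this per-node readout follows; `labels` (a mapping from each labeled node index to its class index) and `num_classes` are assumed inputs of the sketch, not names from the text:

    import numpy as np

    def classify_node(P_inf, i, labels, num_classes):
        pi_inf = P_inf[i]                    # pi(infinity) = pi(0) P^infinity with pi(0) = e_i
        class_prob = np.zeros(num_classes)
        for node, c in labels.items():       # all mass lies on absorbing (labeled) nodes
            class_prob[c] += pi_inf[node]
        return int(np.argmax(class_prob)), class_prob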


There is, however, a simpler way of computing the class probability distributions of all unlabeled nodes in one shot, rather than having to explicitly compute P^∞ and then try a different starting vector π(0) for each node. For each class c ∈ {1 . . . k}, let N_c ⊆ N be the set of labeled nodes belonging to that class. In order for unlabeled node i to belong to class c, a walk starting at node i must end at a node in N_c. The probability of this event is given by the total absorption probability of the nodes in N_c, that is, Σ_{j ∈ N_c} (P^∞)_{ij}.
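One plausible realization of this one-shot computation, under the same assumed inputs as the previous sketch, propagates the per-class indicator vectors through P until convergence instead of forming P^∞ explicitly; the iteration cap and tolerance are assumptions of the sketch:

    import numpy as np

    def class_distributions(P, labels, num_classes, max_iter=1000, tol=1e-9):
        n = P.shape[0]
        Y = np.zeros((n, num_classes))
        for node, c in labels.items():
            Y[node, c] = 1.0                 # column c is the indicator of N_c
        for _ in range(max_iter):
            Y_next = P @ Y                   # advance every walk by one step
            if np.max(np.abs(Y_next - Y)) < tol:
                break
            Y = Y_next
        return Y  # row i: probability of absorption in each class's labeled nodes

Because each labeled node carries a self-loop with probability 1, the rows of Y corresponding to labeled nodes remain fixed throughout the iteration, so no explicit clamping step is needed.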




