Data Mining: The Textbook




p_{ij} = \frac{w_{ij}}{\sum_{k=1}^{n} w_{ik}} \qquad (19.27)

The transition probability pji of edge (j, i) is defined as follows:

p_{ji} = \frac{w_{ji}}{\sum_{k=1}^{n} w_{jk}} \qquad (19.28)

For example, the directed transition graph created from the undirected graph of Fig. 19.9 is illustrated in Fig. 19.11a.
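The row normalization of Eqs. 19.27 and 19.28 can be sketched as follows. The weight matrix W here is a small hypothetical example, not one from the book; for an undirected graph, W is symmetric, and each edge (i, j) yields two directed transition probabilities by normalizing over the respective source node's row.

```python
import numpy as np

# Hypothetical symmetric weight matrix for a small undirected graph;
# W[i, j] is the edge weight w_ij (0 means no edge).
W = np.array([
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 3.0],
    [1.0, 3.0, 0.0],
])

# Eqs. 19.27/19.28: p_ij = w_ij / sum_k w_ik, i.e., row-normalize W.
# Each row of P then sums to 1 (a valid transition distribution).
P = W / W.sum(axis=1, keepdims=True)
```

Note that even though W is symmetric, P generally is not: p_ij and p_ji are normalized by different row sums, which is why the undirected graph becomes a directed transition graph.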



  2. All outgoing edges from labeled nodes are removed from the graph G constructed in the previous step and replaced with a self-loop of transition probability 1. Such nodes are referred to as absorbing nodes because they trap the random walk after an incoming transition. An example of the final transition graph is illustrated in Fig. 19.11b. Therefore, for each absorbing node i, the ith row of P is replaced with the ith row of the identity matrix.

Assume that the final n × n transition matrix is denoted by P = [pij ]. For any absorbing node i, the value of pik is 1 only when i = k, and 0 otherwise. The transition matrix P does not have a unique steady-state probability distribution (or PageRank vector) because of the presence of absorbing components. The steady-state probability distribution depends on the starting state of the random walk. For example, a random walk starting at test node X in Fig. 19.11b will always eventually end at label A, whereas a walk starting at node Y might end at either label A or B. It is noteworthy that the PageRank computation of Sect. 18.4.1 in Chap. 18 ensures unique steady-state probabilities by using teleportation to implicitly create a strongly connected transition graph. Interestingly, the modifications that create absorbing nodes have exactly the opposite effect, because the steady-state probability distribution then depends on the starting state. Strong connectivity of the transition graph is required to ensure a unique steady-state distribution. However, if the starting state is fixed, then the walk does have a well-defined steady-state probability distribution.


For any given starting node i, the steady-state probability distribution has positive values only at labeled nodes. This is because a random walk will eventually reach an absorbing node in a label-connected graph, and it will never emerge from that node. Therefore, if one can estimate the steady-state probability distribution for starting node i, then the probability values of the labeled nodes in each class can be aggregated. The class with the highest probability is reported as the relevant label of the node i.


How can the steady-state probability be computed for a particular starting node i? Let π(t) represent the n-dimensional (row) probability vector after t steps, starting with a particular initial state π(0). When the starting state is node i, the value of π(0) is 1 for the ith component in this vector, and 0 otherwise. Then, we have:








\pi^{(t)} = \pi^{(t-1)} P \qquad (19.29)
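Iterating this recurrence until the probability mass has been absorbed, and then aggregating the mass at the labeled nodes of each class, yields the classification described above. The following is a minimal sketch on a hypothetical 4-node graph, where node 0 carries label A, node 1 carries label B, and the walk starts at test node 2; the graph, classes, and iteration count are illustrative assumptions.

```python
import numpy as np

# Hypothetical graph: node 0 is labeled A, node 1 is labeled B,
# nodes 2 and 3 are unlabeled test nodes.
W = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 0.0],
])
P = W / W.sum(axis=1, keepdims=True)
P[[0, 1], :] = 0.0
P[0, 0] = P[1, 1] = 1.0  # absorbing labeled rows

# pi^(0): probability 1 at starting test node 2.
pi = np.zeros(4)
pi[2] = 1.0

# Eq. 19.29: pi^(t) = pi^(t-1) P, iterated to (near) convergence.
for _ in range(1000):
    pi = pi @ P

# At steady state, the mass sits only at absorbing labeled nodes;
# aggregate it per class and report the larger one.
prob_A, prob_B = pi[0], pi[1]
label = 'A' if prob_A > prob_B else 'B'
```

In practice, one would stop when the unabsorbed mass falls below a tolerance rather than running a fixed number of iterations; for this small example the transient decays geometrically, so 1000 steps far exceed what is needed.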




