Data Mining: The Textbook




p_{ij} = \frac{w_{ij}}{\sum_{k=1}^{n} w_{ik}} \qquad (19.27)

The transition probability pji of edge (j, i) is defined as follows:

p_{ji} = \frac{w_{ji}}{\sum_{k=1}^{n} w_{jk}} \qquad (19.28)

For example, the directed transition graph created from the undirected graph of Fig. 19.9 is illustrated in Fig. 19.11a.
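The row normalization of Eqs. 19.27 and 19.28 can be sketched as follows. The weight matrix W here is a small hypothetical example, not one from the book; for an undirected graph, W is symmetric, and each edge (i, j) yields two directed transition probabilities by normalizing over the respective source node's row.

```python
import numpy as np

# Hypothetical symmetric weight matrix for a small undirected graph;
# W[i, j] is the edge weight w_ij (0 means no edge).
W = np.array([
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 3.0],
    [1.0, 3.0, 0.0],
])

# Eqs. 19.27/19.28: p_ij = w_ij / sum_k w_ik, i.e., row-normalize W.
# Each row of P then sums to 1 (a valid transition distribution).
P = W / W.sum(axis=1, keepdims=True)
```

Note that even though W is symmetric, P generally is not: p_ij and p_ji are normalized by different row sums, which is why the undirected graph becomes a directed transition graph.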



  2. All outgoing edges from labeled nodes are removed from the graph G constructed in the previous step and replaced with a self-loop of transition probability 1. Such nodes are referred to as absorbing nodes because they trap the random walk after an incoming transition. An example of the final transition graph is illustrated in Fig. 19.11b. Therefore, for each absorbing node i, the ith row of P is replaced with the ith row of the identity matrix.

Assume that the final n × n transition matrix is denoted by P = [pij ]. For any absorbing node i, the value of pik is 1 only when i = k, and 0 otherwise. The transition matrix P does not have a unique steady-state probability distribution (or PageRank vector) because of the presence of absorbing components. The steady-state probability distribution depends on the starting state of the random walk. For example, a random walk starting at test node X in Fig. 19.11b will always eventually end at label A, whereas a walk starting at node Y might end at either label A or B. It is noteworthy that the PageRank computation of Sect. 18.4.1 in Chap. 18 ensures unique steady-state probabilities by using teleportation to implicitly create a strongly connected transition graph. Interestingly, the modifications that create absorbing nodes have exactly the opposite effect, because the steady-state probability distribution then depends on the starting state. Strong connectivity of the transition graph is required to ensure a unique steady-state distribution. However, if the starting state is fixed, then the walk does have a well-defined steady-state probability distribution.


For any given starting node i, the steady-state probability distribution has positive values only at labeled nodes. This is because a random walk will eventually reach an absorbing node in a label-connected graph, and it will never emerge from that node. Therefore, if one can estimate the steady-state probability distribution for starting node i, then the probability values of the labeled nodes in each class can be aggregated. The class with the highest probability is reported as the relevant label of the node i.


How can the steady-state probability be computed for a particular starting node i? Let π(t) represent the n-dimensional (row) probability vector after t steps, starting with a particular initial state π(0). When the starting state is node i, the value of π(0) is 1 for the ith component in this vector, and 0 otherwise. Then, we have:








\pi^{(t)} = \pi^{(t-1)} P \qquad (19.29)
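Iterating this recurrence until the probability mass has been absorbed, and then aggregating the mass at the labeled nodes of each class, yields the classification described above. The following is a minimal sketch on a hypothetical 4-node graph, where node 0 carries label A, node 1 carries label B, and the walk starts at test node 2; the graph, classes, and iteration count are illustrative assumptions.

```python
import numpy as np

# Hypothetical graph: node 0 is labeled A, node 1 is labeled B,
# nodes 2 and 3 are unlabeled test nodes.
W = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 0.0],
])
P = W / W.sum(axis=1, keepdims=True)
P[[0, 1], :] = 0.0
P[0, 0] = P[1, 1] = 1.0  # absorbing labeled rows

# pi^(0): probability 1 at starting test node 2.
pi = np.zeros(4)
pi[2] = 1.0

# Eq. 19.29: pi^(t) = pi^(t-1) P, iterated to (near) convergence.
for _ in range(1000):
    pi = pi @ P

# At steady state, the mass sits only at absorbing labeled nodes;
# aggregate it per class and report the larger one.
prob_A, prob_B = pi[0], pi[1]
label = 'A' if prob_A > prob_B else 'B'
```

In practice, one would stop when the unabsorbed mass falls below a tolerance rather than running a fixed number of iterations; for this small example the transient decays geometrically, so 1000 steps far exceed what is needed.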




