Data Mining: The Textbook


TEST NODE X STRONGLY CONNECTED NETWORK



Yüklə 17,13 Mb.
səhifə376/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   372   373   374   375   376   377   378   379   ...   423
1-Data Mining tarjima

TEST NODE X
STRONGLY CONNECTED NETWORK


TEST NODE Y
LABELED NODES EVENTUALLY TRAP ALL RANDOM WALKS TEST NODE X TEST NODE Y



A


B


A


B



A


B


A


B



UNIQUE PAGE RANK VECTOR


NO UNIQUE PAGE RANK VECTOR



(a)

No absorbing state

(b)

With absorbing states



Figure 19.11: Creating directed transition graphs from undirected graph of Fig. 19.9


nodes can then be added to the training data, and the classifier is retrained by extracting the link features again with the augmented training data set. The approach is repeated until the labels of all nodes have been made final. Because the labels of nt/T nodes are finalized in each iteration, the entire process terminates in exactly T iterations. The overall pseudocode is illustrated in Fig. 19.10.

One advantage of the ICA is that it can seamlessly use content and structure in the classification process. The classifier can automatically select the most relevant features using off-the-shelf feature selection algorithms discussed in Chap. 10. This approach also has the merit that it is not strongly dependent on the notion of homophily, and can, therefore, be used for domains beyond social network analysis. Consider an adversarial relationship network in which nodes connected by links might have different labels. In such cases, the ICA algorithm will automatically learn the correct importance of adjacent class distributions, and therefore it will yield accurate results. This property is not true of most of the other collective classification methods, which are explicitly dependent on the notion of homophily. On the other hand, the errors made in the earlier phases of iterative classification can propagate and multiply in later phases because of augmented training examples with incorrect labels. This can increase the cumulative error in noisy training data sets.


19.4.2 Label Propagation with Random Walks


The label propagation method directly uses random walks on the undirected network struc-ture G = (N, A). The weight of edge (i, j) is denoted by wij = wji. To classify an unlabeled node i, a random walk is executed starting at node i and terminated at the first labeled node encountered. The class at which the random walk has the highest probability of ter-mination is reported as the predicted label of node i. The intuition for this approach is that the walk is more likely to terminate at labeled nodes in the proximity of node i. Therefore, when many nodes of a particular class are located in its proximity, then the node i is more likely to be labeled with that class.


An important assumption is that the graph needs to be label connected. In other words, every unlabeled node needs to be able to reach a labeled node in the random walk. For undirected graphs G = ( N, A), this means that every connected component of the graph needs to contain at least one labeled node. In the following discussion, it will be assumed that the graph G = (N, A) is undirected and label-connected.


The first step is to model the random walks in such a way that they always terminate at their first arrival at labeled nodes. This can be achieved by removing outgoing edges from labeled nodes and replacing them with self-loops. Furthermore, to use a random walk approach, we need to convert the undirected graph G = (N, A) into a directed graph G = (N, A ) with an n × n transition matrix P = [pij ]:



644 CHAPTER 19. SOCIAL NETWORK ANALYSIS



  1. For each undirected edge (i, j) ∈ A, directed edges (i, j) and (j, i) are added to A between the corresponding nodes. The transition probability pij of edge (i, j) is defined

as follows:

wij








Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   372   373   374   375   376   377   378   379   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin