Data Mining: The Textbook


SOURCE (SP(i), N(i)) = (0, 1)



Yüklə 17,13 Mb.
səhifə372/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   368   369   370   371   372   373   374   375   ...   423
1-Data Mining tarjima

SOURCE




(SP(i), N(i)) = (0, 1)

























0




SOURCE










0




NODE




1

3

NODE




1

3
















6










2
















2







(1, 1)

1

2

(3, 2)




1







2



















4




2







4




2



















3 (5, 3)







3




























1




2







1




2
















(6, 3)

4




5

(7, 3)




4




3

5




























2




1




2

6

1







6


































(8, 6)











































  1. Original graph(b) Tight-edge subgraph

Figure 19.5: Original graph and subgraph of tight edges


between nodes s and k, that pass through edge (i, j). The corresponding components of node betweenness centrality and edge betweenness centrality, specific to node s, are denoted by Bs(i) and bs(i, j), and they are defined as follows:



Bs(i) =

fsk(i)

(19.18)

k=s




bs(i, j) =

Fsk(i, j)

(19.19)




k=s




It is easy to see that the unnormalized values3 of the node betweenness centrality of i and the edge betweenness centrality of (i, j) may be obtained by respectively summing up each of Bs(i) and bs(i, j) over the different source nodes s.


The graph Gs of tight edges is used to compute these values. The key is to set up recursive relationships between Bs(i) and bs(i, j) as follows:





Bs(j) =

bs(i, j)

(19.20)

i:(i,j)∈As




Bs(i) = 1 +

bs(i, j)

(19.21)




j:(i,j)∈As




These relationships follow from the fact that shortest paths through a particular node always pass through exactly one of its incoming and outgoing edges, unless they end at that node. The second equation has an additional credit of 1 to account for the paths ending at node i, for which the full fractional credit fsi(i) = 1 is given to Bs(i).


The source node s is always assigned a betweenness score of Bs(s) = 0. The nodes and edges of the directed acyclic tight graph Gs are processed “bottom up,” starting at the nodes without any outgoing edges. The score Bs(i) of a node i is finalized, only after the








  • The normalized values, such as those in Eq. 19.13, may be obtained by dividing the unnormalized values by n · (n − 1) for a network with n nodes. The constant of proportionality is irrelevant because the Girvan–Newman algorithm requires only the identification of the edge with the largest betweenness.

634 CHAPTER 19. SOCIAL NETWORK ANALYSIS

scores on all its outgoing edges have been finalized. Similarly, the score bs (i, j) of an edge (i, j) is finalized only after the score Bs(j) of node j has been finalized. The algorithm starts by setting all nodes j without any outgoing edges to have a score of Bs(j) = fsj (j ) = 1. This is because such a node j, without outgoing edges, is (trivially) a intermediary between s and j, but it cannot be an intermediary between s and any other node. Then, the algorithm iteratively updates scores of nodes and edges in the bottom-up traversal as follows:





  • Edge Betweenness Update: Each edge (i, j) is assigned a score bs(i, j) that is based on partitioning the score Bs(j) into all the incoming edges (i, j) based on Eq. 19.20. The value of bs(i, j) is proportional to Ns(i) that was computed earlier. Therefore, bs(i, j) is computed as follows.

bs(i, j) =

Ns(i) · Bs(j)

(19.22)




k:(k,j)As Ns(k)
















  • Node Betweenness Update: The value of Bs(i) is computed by summing up the values of bs(i, j) of all its outgoing edges and then adding 1, according to Eq. 19.21.

This entire procedure is repeated over all source nodes, and the values are added up. Note that this provides unscaled values of the node and edge betweenness, which may range from 0 to n · (n − 1). The (aggregated) value of Bs(i) over all source nodes s can be converted to CB (i) of Eq. 19.13 by dividing it with n · (n − 1).


The betweenness values can be computed more efficiently incrementally after edge removals in the Girvan–Newman algorithm. This is because the graphs of tight edges can be computed more efficiently by the use of the incremental shortest path algorithm. The bibliographic notes contain pointers to these methods. Because most of the betweenness computations are incremental, they do not need to be performed from scratch, which makes the algorithm more efficient. However, the algorithm is still quite expensive in practice.


19.3.3 Multilevel Graph Partitioning: METIS


Most of the aforementioned algorithms are quite slow in practice. Even the spectral algo-rithm, discussed later in this section, is quite slow. The METIS algorithm was designed to provide a fast alternative for obtaining high-quality solutions. The METIS algorithm allows the specification of weights on both the nodes and edges in the clustering process. Therefore, it will be assumed that the weight on each edge (i, j) of the graph G = (N, A) is denoted by wij , and the weight on node i is denoted by vi.


The METIS algorithm can be used to perform either k-way partitioning or 2-way par-titioning. The k-way multilevel graph-partitioning method is based on top-down 2-way recursive bisection of the graph to create k-way partitionings, although variants to perform direct k-way partitioning also exist. Therefore, the following discussion will focus on the 2-way bisection of the graph.


The METIS algorithm uses the principle that the partitioning of a coarsened represen-tation of a graph can be used to efficiently derive an approximate partition of the original graph. The coarsened representation of a graph is obtained by contracting some of the adja-cent nodes into a single node. The contraction may result in self-loops that are removed. Such self-loops are also referred to as collapsed edges. The weights of the contracted nodes are equal to the sum of the weights of the constituent nodes in the original graph. Simi-larly, the parallel edges across contracted nodes are consolidated into a single edge with the weights of the constituent edges added together. An example of a coarsened representation of a graph, in which some pairs of adjacent nodes are contracted, is illustrated in Fig. 19.6.

19.3. COMMUNITY DETECTION










635





Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   368   369   370   371   372   373   374   375   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin