Data Mining: The Textbook

SOURCE (SP(i), N(i)) = (0, 1)

Yüklə 17,13 Mb.

səhifə	372/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 368 369 370 371 372 373 374 375 ... 423

1-Data Mining tarjima

SOURCE	(SP(i), N(i)) = (0, 1)

						0		SOURCE
		0		NODE			1	3	NODE
1			3
		6					2
		2			(1, 1)	1		2	(3, 2)
1				2
						4		2
4			2
							3 (5, 3)
		3
							1		2
1			2
					(6, 3)	4		5		(7, 3)
4		3	5

2			1			2	6	1
		6
							(8, 6)

Original graph(b) Tight-edge subgraph

Figure 19.5: Original graph and subgraph of tight edges

between nodes s and k, that pass through edge (i, j). The corresponding components of node betweenness centrality and edge betweenness centrality, specific to node s, are denoted by B_s(i) and b_s(i, j), and they are defined as follows:

B_s(i) =	f_sk(i)	(19.18)
k=s
b_s(i, j) =	F_sk(i, j)	(19.19)
	k=s

It is easy to see that the unnormalized values3 of the node betweenness centrality of i and the edge betweenness centrality of (i, j) may be obtained by respectively summing up each of B_s(i) and b_s(i, j) over the diﬀerent source nodes s.

The graph G_s of tight edges is used to compute these values. The key is to set up recursive relationships between B_s(i) and b_s(i, j) as follows:

B_s(j) =	b_s(i, j)	(19.20)
i:(i,j)∈A^s
B_s(i) = 1 +	b_s(i, j)	(19.21)
	j:(i,j)∈A^s

These relationships follow from the fact that shortest paths through a particular node always pass through exactly one of its incoming and outgoing edges, unless they end at that node. The second equation has an additional credit of 1 to account for the paths ending at node i, for which the full fractional credit f_si(i) = 1 is given to B_s(i).

The source node s is always assigned a betweenness score of B_s(s) = 0. The nodes and edges of the directed acyclic tight graph Gs are processed “bottom up,” starting at the nodes without any outgoing edges. The score B_s(i) of a node i is finalized, only after the

The normalized values, such as those in Eq. 19.13, may be obtained by dividing the unnormalized values by n · (n − 1) for a network with n nodes. The constant of proportionality is irrelevant because the Girvan–Newman algorithm requires only the identification of the edge with the largest betweenness.

634 CHAPTER 19. SOCIAL NETWORK ANALYSIS

scores on all its outgoing edges have been finalized. Similarly, the score b_s (i, j) of an edge (i, j) is finalized only after the score B_s(j) of node j has been finalized. The algorithm starts by setting all nodes j without any outgoing edges to have a score of B_s(j) = f_sj (j ) = 1. This is because such a node j, without outgoing edges, is (trivially) a intermediary between s and j, but it cannot be an intermediary between s and any other node. Then, the algorithm iteratively updates scores of nodes and edges in the bottom-up traversal as follows:

Edge Betweenness Update: Each edge (i, j) is assigned a score b_s(i, j) that is based on partitioning the score B_s(j) into all the incoming edges (i, j) based on Eq. 19.20. The value of b_s(i, j) is proportional to N_s(i) that was computed earlier. Therefore, b_s(i, j) is computed as follows.

b_s(i, j) =	N_s(i) · B_s(j)	(19.22)
	_k_:(_k,j₎_∈_As ^Ns(^k)

Node Betweenness Update: The value of B_s(i) is computed by summing up the values of b_s(i, j) of all its outgoing edges and then adding 1, according to Eq. 19.21.

This entire procedure is repeated over all source nodes, and the values are added up. Note that this provides unscaled values of the node and edge betweenness, which may range from 0 to n · (n − 1). The (aggregated) value of B_s(i) over all source nodes s can be converted to C_B (i) of Eq. 19.13 by dividing it with n · (n − 1).

The betweenness values can be computed more eﬃciently incrementally after edge removals in the Girvan–Newman algorithm. This is because the graphs of tight edges can be computed more eﬃciently by the use of the incremental shortest path algorithm. The bibliographic notes contain pointers to these methods. Because most of the betweenness computations are incremental, they do not need to be performed from scratch, which makes the algorithm more eﬃcient. However, the algorithm is still quite expensive in practice.

19.3.3 Multilevel Graph Partitioning: METIS

Most of the aforementioned algorithms are quite slow in practice. Even the spectral algo-rithm, discussed later in this section, is quite slow. The METIS algorithm was designed to provide a fast alternative for obtaining high-quality solutions. The METIS algorithm allows the specification of weights on both the nodes and edges in the clustering process. Therefore, it will be assumed that the weight on each edge (i, j) of the graph G = (N, A) is denoted by w_ij , and the weight on node i is denoted by v_i.

The METIS algorithm can be used to perform either k-way partitioning or 2-way par-titioning. The k-way multilevel graph-partitioning method is based on top-down 2-way recursive bisection of the graph to create k-way partitionings, although variants to perform direct k-way partitioning also exist. Therefore, the following discussion will focus on the 2-way bisection of the graph.

The METIS algorithm uses the principle that the partitioning of a coarsened represen-tation of a graph can be used to eﬃciently derive an approximate partition of the original graph. The coarsened representation of a graph is obtained by contracting some of the adja-cent nodes into a single node. The contraction may result in self-loops that are removed. Such self-loops are also referred to as collapsed edges. The weights of the contracted nodes are equal to the sum of the weights of the constituent nodes in the original graph. Simi-larly, the parallel edges across contracted nodes are consolidated into a single edge with the weights of the constituent edges added together. An example of a coarsened representation of a graph, in which some pairs of adjacent nodes are contracted, is illustrated in Fig. 19.6.

19.3. COMMUNITY DETECTION

635

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 368 369 370 371 372 373 374 375 ... 423