Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə368/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   364   365   366   367   368   369   370   371   ...   423
1-Data Mining tarjima

COMMUNITY 1
COMMUNITY 2


A B

C
COMMUNITY 4
HUBS



COMMUNITY 3

Figure 19.2: Impact of hubs on communities





  • Different parts of the social network have different edge densities. In other words, the local clustering coefficients in distinct parts of the social network are typically quite different. As a result, when specific choices of parameters are used to quantify the clusters globally, it leads to unbalanced clusters because a single global parameter choice is not relevant in many network localities.




  • Real social networks often have a giant component that is densely connected. This contributes further to the tendency of community detection algorithms to create imbal-anced clusters, where a single cluster is the beneficiary of most of the nodes in the network.

Many network clustering algorithms have built-in mechanisms to address such issues. In the following, a discussion of some of the most well-known network clustering algorithms will be provided.


Assume that the undirected network is denoted by G = (N, A). The weight of the edge (i, j) between nodes i and j, is denoted by wij = wji . In some cases, the inverse concept of edge costs (or lengths) is specified instead of weights. In such cases, we assume that the edge cost is denoted by cij . These values can be converted to one another by using wij = 1/cij , or a suitably chosen kernel function.


The problem of network clustering, or community detection, is that of partitioning the network into k sets of nodes, such that the sum of the weights of the edges with end points in different partitions is minimized. Many variations of this basic objective function are used in practice and are able to achieve different application-specific goals, such as partition balancing in which different clusters have similar numbers of nodes.


In the special case, where wij = 1, and there are no balancing constraints on partitions, the 2-way cut problem is polynomially solvable. The reader is advised to refer to the biblio-graphic notes for pointers to Karger’s randomized minimum cut algorithm. This algorithm can determine the minimum cut in O(n2logr (n)) time for a network containing n nodes, where r is a constant regulating the desired level of probabilistic accuracy. However, the resulting cut is usually not balanced. Incorporating arbitrary edge weights or balancing con-straints makes the problem NP-hard. Many network clustering algorithms focus on balanced 2-way partitioning of graphs. A 2-way partitioning can be recursively used to generate a k-way partitioning.



19.3. COMMUNITY DETECTION

629

19.3.1 Kernighan–Lin Algorithm


The Kernighan–Lin algorithm is a classical method for balanced 2- way graph partitioning. The basic idea is to start with an initial partitioning of the graph into two equal1 subsets of nodes. The algorithm then iteratively improves this partitioning, until it converges to an optimal solution. This solution is not guaranteed to be the global optimum, but it is usually a good heuristic approximation. This iterative improvement is performed by determining sequences of exchanges of nodes between partitions that improve the clustering objective function as much as possible. To evaluate the improvement in the clustering objective func-tion by performing an exchange between a pair of nodes, some carefully chosen measures need to be continuously tracked maintained at each node. These will be discussed below.


The internal cost I i of node i is the sum of the weights of edges incident on i, whose other end is present in the same partition as node i. The external cost Ei of node i, is the sum of the weights of the edges incident on i, whose other end is in a different partition than node i. Moving a node from one partition to the other changes its external cost to its internal cost and vice versa. Therefore, the gain Di by moving a node i from one partition to the other is given by the difference between the external and the internal cost.






Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   364   365   366   367   368   369   370   371   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin