Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	387/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 383 384 385 386 387 388 389 390 ... 423

1-Data Mining tarjima

19.8. BIBLIOGRAPHIC NOTES

659

19.8 Bibliographic Notes

Social network analysis has been studied extensively in the context of the field of sociol-ogy [508 ], though more recent work has focused on online social networks [6, 192, 532]. A detailed discussion on proximity and centrality measures may be found in [6, 192 , 508, 532]. The dynamics of social network formation may be found in the excellent survey paper [69]. The derivation of the power-law with the use of the scale-free model is provided in [70]. A detailed study of the power-law in the context of the Internet topology is provided in [201]. A study of graph densification and shrinking diameters is provided in [342]. Other random graph models such as the Erdos–Renyi model and the Watts–Strogatz small-world model are discussed in [196, 509].

A detailed survey on community detection methods may be found in [212]. The minimum cut problem is polynomially solvable for some special cases. For example, the unweighted 2-way cut problem is polynomially solvable without balancing constraints [299]. The original Kernighan–Lin algorithm is presented in [312]. The enhancements to the Kernighan–Lin algorithm was discussed in [206, 301]. The Girvan–Newman algorithm discussed in this chapter is adapted from [230]. The METIS algorithm is presented in [ 301]. The normalized cut method for spectral clustering was discussed in this chapter [466]. The normalized sym-metric version was proposed in [405]. More details on spectral graph theory and clustering methods may be found in [152, 371]. This chapter uses the Laplacian eigenmap interpre-tation [90] of spectral clustering, rather than the more commonly used cut interpretation, because of its comprehensive explanation of the non-integer and possibly negative eigenvec-tor components.

ICA has been presented in the context of many diﬀerent data domains, such as docu-ment data [128], and relational data [404]. Several base classifiers have been used within this framework, such as logistic regression [370] and a weighted voting classifier [373]. The discussion in this chapter is based on [404]. The iterative label propagation method was proposed in [554], and the absorbing random walk interpretation is adapted from [78 ]. The iterative label propagation approach [554] was originally proposed with a spectral inter-pretation although the random walk interpretation is also briefly discussed in the same work. Most random walk methods can also be formulated as supervised versions of spectral embeddings [530, 551, 554 ]. The regularization framework for collective classification is dis-cussed in [551]. Collective classification of directed graphs is discussed in [552]. A method for incorporating content within the random walk framework is discussed in [44]. Detailed surveys on node classification methods may be found in [93, 368]. A toolkit for collective classification may be found in [427].

The link-prediction problem for social networks was proposed in [353 ]. The measures discussed in this chapter are based on this work. Since then, a significant amount of work has been done on incorporating content into the link prediction process. Methods that use content for link prediction may be found in [49, 64, 354, 484, 489]. The merits of supervised methods are discussed in [354], and matrix factorization methods are discussed in [383]. Recently, it has been shown how to use link prediction across multiple networks in [428]. A survey on link-prediction methods for social network analysis may be found in [63].

The problem of influence analysis in social networks was proposed in [ 304]. The linear threshold and independent cascade models are presented in this work. The degree-discount heuristic was proposed in [142]. A discussion of the submodularity property may be found in [403]. Other recent models for influence analysis in social networks are discussed in [45, 143, 144 , 362, 488]. One of the main problems in social influence models is a diﬃculty in learning the influence propagation probabilities, though there has been some recent focus

660 CHAPTER 19. SOCIAL NETWORK ANALYSIS

on this issue [235 ]. Recent work has also shown how influence analysis can be performed directly from the social stream [234, 482]. A survey on models and algorithms for social influence analysis may be found in [483].

19.9 Exercises

For the figure in Example 19.1a, compute the highest-degree centrality, closeness cen-trality and betweenness centrality. The nodes that take on these highest values are already marked in the figure.

Implement the algorithms for determining the degree centrality, closeness centrality, and betweenness centrality.

Implement the Kernighan–Lin algorithm.

Why is the balancing constraint more important in community detection algorithms, as compared to multidimensional clustering algorithms? What would the uncon-strained minimum 2-way cut look like in a typical real network?

Consider a variation of Girvan–Newman algorithm in which edges are randomly dis-connected from a network, as opposed to those with high betweenness centrality. Explain the negative impact of this change on the algorithm. Can you make minor changes to the disconnection criterion to ameliorate this impact?

Write an integer-programming formulation for the minimum 2-way cut problem, so that the cut is balanced in terms of the number of nodes.

For the random walk formulation of spectral clustering algorithm, show why the fol-lowing are true:

All nontrivial eigenvectors y have both positive and negative components.

Provide an intuitive explanation why the normalization factor Λ in the constraint yT Λy = 1, increases the propensity of low-degree nodes to be embedded away from the origin, and the high-degree nodes to be embedded near the origin.

Suppose that all edge weights w_ij are discounted by the geometric mean of the weighted node-degrees at their endpoints. Write an unnormalized formulation of spec-tral clustering in terms of these normalized weights for discovering a 1-dimensional embedding. What eﬀect would the weight normalization have on the embedding? Describe the algebraic similarities and diﬀerences of this formulation from the sym-metric normalized formulation of spectral clustering. Discuss why the resulting eigen-vectors will often be heuristically similar to those obtained with the symmetric for-mulation of spectral clustering.

Explain the relationship between the random walk label propagation and the graph regularization algorithm.

Discuss the connections between the link-prediction problem and network clustering.

Create a link prediction measure that can perform the degree normalizations per-formed both by the Jaccard measure and the Adamic–Adar measure.

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 383 384 385 386 387 388 389 390 ... 423