Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə366/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   362   363   364   365   366   367   368   369   ...   423
1-Data Mining tarjima

Shrinking diameters: In most real-world networks, as the network densifies, the average distances between the nodes shrink over time. This experimental observation is in contrast to conventional models that suggest that the diameters should increase as log(n(t)). This unexpected behavior is a consequence of the fact that the addition of new edges dominates the addition of new nodes. Note that if the impact of adding new nodes were to dominate, then the average distances between nodes would increase over time.




  1. Giant connected component: As the network densifies over time, a giant connected component emerges. The emergence of a giant connected component is consistent with the principle of preferential attachment, in which newly incoming edges are more likely to attach themselves to the densely connected and high-degree nodes in the network. This property also has a confounding impact on network clustering algorithms, because it typically leads to unbalanced clusters, unless the algorithms are carefully designed.

Preferential attachment also has a significant impact on the typical structure of online networks. It results in a small number of very high-degree nodes that are also referred to as hubs. The hub nodes are usually connected to many different regions of the network and, therefore, have a confounding impact on many network clustering algorithms. The notion of hubs, as discussed here, is subtly different from the notion of hubs, as discussed in the HITS algorithm, because it is not specific to a query or topic. Nevertheless, the intuitive notion of nodes being central points of connectivity in a network, is retained in both cases.


19.2.4 Power-Law Degree Distributions


A consequence of preferential attachment is that a small minority of high-degree nodes continue to attract most of the newly added nodes. It can be shown that the number of nodes P (k) with degree k, is regulated by the following power-law degree distribution:





P (k) ∝ k−γ

(19.4)

The value of the parameter γ ranges between 2 and 3. It is noteworthy that larger values of





  • lead to more small degree nodes. For example, when the value of γ is 3, the vast majority of the nodes in the network will have a degree of 1. On the other hand, when the value of

  • is small, the degree distribution is less skewed.

19.2.5 Measures of Centrality and Prestige


Nodes that are central to the network have a significant impact on the properties of the network, such as its density, pairwise shortest path distances, connectivity, and clustering behavior. Many of these nodes are hub nodes, with high degrees that are a natural result of the dynamical processes of large network generation. Such actors are often more prominent because they have ties to many actors and are in a position of better influence. Their impact on network mining algorithms is also very significant. A related notion of centrality is prestige, which is relevant for directed networks. For example, on Twitter, an actor with a larger number of followers has greater prestige. On the other hand, following a large number of individuals does not bring any prestige but is indicative of the gregariousness of an actor.


624 CHAPTER 19. SOCIAL NETWORK ANALYSIS


The notion of PageRank, discussed in the previous chapter, is often used as a measure of prestige.


Measures of centrality are naturally defined for undirected networks, whereas measures of prestige are designed for directed networks. However, it is possible to generalize centrality measures to directed networks. In the following, centrality measures will be defined for undirected networks, whereas prestige measures will be defined for directed networks.

19.2.5.1 Degree Centrality and Prestige


The degree centrality CD(i) of a node i of an undirected network is equal to the degree of the node, divided by the maximum possible degree of the nodes. The maximum possible degree of a node in the network is one less than the number of nodes in the network. Therefore, if Degree(i) is the degree of node i, then the degree centrality CD(i) of node i is defined as follows:


Degree(i)
CD(i) = 1 (19.5)


n −

Because nodes with higher degree are often hub nodes, they tend to be more central to the network and bring distant parts of the network closer together. The major problem with degree centrality is that it is rather myopic in that it does not consider nodes beyond the immediate neighborhood of a given node i. Therefore, the overall structure of the network is ignored to some extent. For example, in Fig. 19.1a, node 1 has the highest degree centrality, but it cannot be viewed as central to the network itself. In fact, node 1 is closer to the periphery of the network.


Degree prestige is defined for directed networks only, and uses the indegree of the node, rather than its degree. The idea is that only a high indegree contributes to the prestige because the indegree of a node can be viewed as a vote for the popularity of the node, similar to PageRank. Therefore, the degree prestige PD(i) of node i is defined as follows:





PD(i) =

Indegree(i)

(19.6)




n − 1













For example, node 1 has the highest degree prestige in Fig. 19.1b. It is possible to generalize this notion recursively by taking into account the prestige of nodes pointing to a node, rather than simply the number of nodes. This corresponds to the rank prestige, which will be discussed later in this section.

The notion of centrality can also be extended to the node outdegree. This is defined as the gregariousness of a node. Therefore, the gregariousness GD(i) of a node i is defined as follows:


Outdegree(i)


GD(i) = 1 (19.7)


n −

The gregariousness of a node defines a different qualitative notion than prestige because it quantifies the propensity of an individual to seek out new connections (such as following many other actors in Twitter), rather than his or her popularity with respect to other actors.


19.2.5.2 Closeness Centrality and Proximity Prestige


The example of Fig. 19.1a shows that the degree centrality criterion is susceptible to picking nodes on the periphery of the network with no regard to their indirect relationships to other nodes. In this context, closeness centrality is more effective.



19.2. SOCIAL NETWORKS: PRELIMINARIES AND PROPERTIES

625
































Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   362   363   364   365   366   367   368   369   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin