Os = Σ_{c=1}^{k} Zc^T Ls Zc = Σ_{c=1}^{k} Zc^T (I − S) Zc          (19.38)
This term is referred to as the smoothness term because it ensures that the predicted label propensities Z vary smoothly along edges, especially if the weights of the edges are large. This term can also be viewed as a local consistency term.
Label-fitting objective: Because the embedding Z is designed to mimic Y as closely as possible, the value of ||Yc − Zc||^2 should be as small as possible for each class c. Note that unlabeled nodes are included within ||Yc − Zc||^2, and for those nodes, this term serves as a regularizer. The goal of a regularizer is to avoid^7 ill-conditioned solutions and overfitting in optimization models.
Of = Σ_{c=1}^{k} ||Yc − Zc||^2          (19.39)
This term can also be viewed as a global consistency term.
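The smoothness interpretation of Os in Eq. 19.38 can be made precise. Assuming that S is the symmetric normalized weight matrix Λ^{−1/2} W Λ^{−1/2} used by the spectral methods discussed earlier (this assumption is stated here only for concreteness), a standard identity gives

Zc^T (I − S) Zc = (1/2) Σ_{i,j} w_{ij} ( z_{ic}/√Λ_{ii} − z_{jc}/√Λ_{jj} )^2

where z_{ic} denotes the ith entry of Zc. Each term penalizes differences between the degree-normalized propensities of adjacent nodes, and the penalty grows with the edge weight w_{ij}, which is precisely the smoothness behavior described above.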
The overall objective function may be constructed as O = Os + μOf , where μ defines the weight of the label-fitting term. The parameter μ reflects the trade-off between the two criteria. Therefore, the overall objective function may be written as follows:
O = Σ_{c=1}^{k} Zc^T (I − S) Zc + μ Σ_{c=1}^{k} ||Yc − Zc||^2          (19.40)
To optimize this objective function, one must compute the partial derivative of O with respect to the decision variables in each Zc and set it to zero. Because (I − S) is symmetric, the gradient of Zc^T (I − S) Zc with respect to Zc is 2(I − S)Zc, and the gradient of ||Yc − Zc||^2 is 2(Zc − Yc). Dropping the common factor of 2 yields the following condition:
(I − S) Zc + μ (Zc − Yc) = 0     ∀c ∈ {1 . . . k}          (19.41)
Because this condition is true for each class c ∈ {1 . . . k}, one can write the aforementioned condition in matrix form as well:
(I − S)Z + μ(Z − Y) = 0          (19.42)
One can rearrange this optimization condition as follows:
Z = SZ/(1 + μ) + μY/(1 + μ)          (19.43)
The goal is to determine a solution Z of this optimization condition. This can be achieved iteratively by initializing Z(0) = Y and then iteratively updating Z(t+1) from Z(t) as follows:
Z(t+1) = SZ(t)/(1 + μ) + μY/(1 + μ)          (19.44)
7In this case, the regularizer ensures that no single entry in Zc for unlabeled nodes is excessively large.
This solution is iterated to convergence. It can be shown that the approach converges to the following solution:
Z(∞) = μ/(1 + μ) [ I + S/(1 + μ) + (S/(1 + μ))^2 + . . . ] Y = μ/(1 + μ) [ I − S/(1 + μ) ]^{−1} Y          (19.45)
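To see why this is the fixed point, note that rearranging Eq. 19.43 gives (I − S/(1 + μ)) Z = μY/(1 + μ), so that Z = μ/(1 + μ) [ I − S/(1 + μ) ]^{−1} Y. Expanding the inverse as the series I + S/(1 + μ) + (S/(1 + μ))^2 + . . . yields the first form above. The series converges because the eigenvalues of the symmetric normalized matrix S lie in [−1, 1] (a property assumed here from its spectral construction), so the eigenvalues of S/(1 + μ) have magnitude strictly less than 1 for any μ > 0.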
Intuitively, the matrix [ I − S/(1 + μ) ]^{−1} = I + S/(1 + μ) + (S/(1 + μ))^2 + . . . is an n × n matrix of
pairwise weighted Katz coefficients (cf. Definition 19.5.4) between nodes. In other words, the propensity of node i to belong to class j is predicted as a sum of its weighted Katz coefficients with respect to labeled nodes of class j. Because the Katz measure predicts links (cf. Sect. 19.5) between nodes, this approach illustrates the connections between collective classification and link prediction.
It is possible to learn the optimal value of μ with the use of cross-validation. It is noteworthy that, unlike the aforementioned label propagation algorithm with absorbing states, this approach only biases Z with the labels, and it does not constrain the rows in Z to be the same as the corresponding rows of Y for labeled nodes. In fact, the matrix Z can provide a label prediction for an already labeled node that is different from its original training label. Such cases are likely to occur when the original labeling in the training data is error-prone and noisy. The regularization approach is, therefore, more flexible and robust for networks containing noisy and error-prone training labels.
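As a concrete illustration, the following is a minimal sketch of the iterative update in Eq. 19.44 using NumPy. It assumes that the symmetric matrix S and the n × k binary label matrix Y have already been constructed as described above; the function name, convergence tolerance, and iteration cap are illustrative choices rather than part of the method itself.

```python
import numpy as np

def graph_regularization(S, Y, mu=1.0, tol=1e-6, max_iter=1000):
    """Iterate Z(t+1) = S Z(t)/(1 + mu) + mu Y/(1 + mu) until convergence.

    S  : (n, n) symmetric normalized weight matrix
    Y  : (n, k) binary label matrix (rows of unlabeled nodes are all zero)
    mu : weight of the label-fitting term
    """
    Z = Y.astype(float)                          # initialize Z(0) = Y
    for _ in range(max_iter):
        Z_next = (S @ Z) / (1.0 + mu) + (mu / (1.0 + mu)) * Y
        if np.linalg.norm(Z_next - Z) < tol:     # stop once the update stabilizes
            return Z_next
        Z = Z_next
    return Z

# Each node is then assigned the class with the largest predicted propensity:
# predicted_labels = np.argmax(graph_regularization(S, Y, mu=0.5), axis=1)
```

The closed form of Eq. 19.45 could instead be obtained by solving the linear system (I − S/(1 + μ))Z = μY/(1 + μ) directly, but the iterative form is usually preferable for large, sparse networks.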
19.4.3.3 Connections with Random Walk Methods
Even though the graph regularization approach is derived using spectral methods, it is also related to random walk methods. The n × k matrix-based update Eq. 19.44 can be decomposed into k different vector-based update equations, one for each n-dimensional column Zc of Z:
Zc = SZc/(1 + μ) + μYc/(1 + μ)     ∀c ∈ {1 . . . k}          (19.46)
Each of these update equations is algebraically similar to a personalized PageRank equation in which S replaces the transition matrix and the restart probability at labeled nodes belonging to a particular class c is μ/(1 + μ). The vector Yc is analogous to the personalized restart vector for class c multiplied with the number of training nodes in class c. Similarly, the vector Zc is analogous to the personalized PageRank vector of class c multiplied with the number of training nodes in class c. Therefore, the class-specific Eq. 19.46 can be viewed as a personalized PageRank equation, scaled in proportion to the prior probability of class c. Of course, the symmetric matrix S is not truly a stochastic transition matrix because its columns do not sum to 1. Therefore, the results cannot formally be viewed as personalized PageRank probabilities.
Nevertheless, this algebraic similarity to personalized PageRank suggests the possibility of a closely related family of random walk methods, similar to label propagation. For example, instead of using the nonstochastic matrix S derived from spectral clustering, one might use a stochastic transition matrix P. In other words, Eqs. 19.27 and 19.28 are used to derive P = Λ^{−1}W. However, one difference from the transition matrix P used in label-propagation methods is that the network structure is not altered to create absorbing states. In other words, the directed transition graph of Fig. 19.11a is used, rather than that of Fig. 19.11b, to derive P. Replacing S with P in Eq. 19.46 leads to a variant of the label propagation
update (cf. Eq. 19.35) in which labeled nodes are no longer constrained to be predicted to their original label.
Replacing S with P^T in Eq. 19.46 leads to the (class-prior scaled) personalized PageRank equations. This is equivalent to executing the personalized PageRank algorithm k times, where the personalization vector for the cth execution restarts at labeled nodes belonging to the cth class. Each class-specific personalized PageRank probability is multiplied with the prior probability of that class, or, equivalently, the number of labeled training nodes in that class. For each node, the class index that yields the highest (prior-scaled) personalized PageRank probability is reported. The performance of these alternative methods is dependent on the data set at hand.
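The following sketch illustrates this prior-scaled variant under the same assumptions as the earlier snippet, with P assumed to be the row-stochastic matrix Λ^{−1}W; the helper name, restart formulation, and fixed iteration count are illustrative.

```python
import numpy as np

def prior_scaled_ppr(P, Y, mu=1.0, n_iter=100):
    """Class-prior-scaled personalized PageRank for collective classification.

    P : (n, n) row-stochastic transition matrix
    Y : (n, k) binary label matrix; column c marks the labeled nodes of class c
    """
    n, k = Y.shape
    alpha = mu / (1.0 + mu)                     # restart probability
    scores = np.zeros((n, k))
    for c in range(k):
        n_c = Y[:, c].sum()                     # number of labeled nodes in class c
        restart = Y[:, c] / max(n_c, 1)         # personalization (restart) vector of class c
        pi = restart.copy()
        for _ in range(n_iter):                 # power-iteration form of personalized PageRank
            pi = (1 - alpha) * (P.T @ pi) + alpha * restart
        scores[:, c] = pi * n_c                 # scale by the class prior
    return np.argmax(scores, axis=1)            # predicted class index for each node
```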
19.5 Link Prediction
In many social networks, it is desirable to predict future links between pairs of nodes in the network. For example, commercial social networks, such as Facebook, often recommend users as potential friends. In general, both structure and content similarity may be used to predict links between pairs of nodes. These criteria are discussed below:
Structural measures: Structural measures typically use the principle of triadic closure to make predictions. The idea is that two nodes that share similar nodes in their neighborhoods are more likely to become connected in the future, if they are not already connected.
Content-based measures: In these cases, the principle of homophily is used to make predictions. The idea is that nodes that have similar content are more likely to become linked. For example, in a bibliographic network containing scientific co-author relations, a node containing the keyword “data mining” is more likely to be connected to another node containing the keyword “machine learning.”
While content-based measures have been shown to have potential in enhancing link prediction, the results are rather sensitive to the network at hand. For example, in a network such as Twitter, where the content is in the form of short and noisy tweets with many nonstandard acronyms, content-based measures are not particularly effective. Furthermore, while structural connectivity usually implies content-based homophily, the reverse is not always true. Therefore, the use of content similarity has mixed results in different network domains. On the other hand, structural measures are almost always effective in different types of networks. This is because triadic closure is ubiquitous across different network domains and has more direct applicability to link prediction.
19.5.1 Neighborhood-Based Measures
Neighborhood-based measures use the number of common neighbors between a pair of nodes i and j, in different ways, to quantify the likelihood of a link between them in the future. For example, in Fig. 19.12a, Alice and Bob share four common neighbors. Therefore, it is reasonable to conjecture that a link might eventually form between them. In addition to their common neighbors, they also have their own disjoint sets of neighbors. There are different ways of normalizing neighborhood-based measures to account for the number and relative importance of different neighbors. These are discussed below.
Definition 19.5.1 (Common Neighbor Measure) The common-neighbor measure between nodes i and j is equal to the number of common neighbors between nodes i and j. In other words, if Si is the neighbor set of node i, and Sj is the neighbor set of node j, the common-neighbor measure is defined as follows:
CommonNeighbors(i, j) = |Si ∩ Sj|          (19.47)
The major weakness of the common-neighbor measure is that it does not account for the relative number of common neighbors as compared to the number of other connections of the two nodes. In the example of Fig. 19.12a, Alice and Bob each have a relatively small node degree. Consider a different case in which Alice and Bob are either spammers or very popular public figures who are connected to a large number of other actors. In such a case, Alice and Bob might easily have many neighbors in common, just by chance. The Jaccard measure is designed to normalize for varying degree distributions.
Definition 19.5.2 (Jaccard Measure) The Jaccard-based link prediction measure between nodes i and j is equal to the Jaccard coefficient between their neighbor sets Si and Sj, respectively.
JaccardPredict(i, j) = |Si ∩ Sj| / |Si ∪ Sj|          (19.48)
The Jaccard measure between Alice and Bob in Fig. 19.12(a) is 4/9. If the degrees of either Alice or Bob were to increase, it would result in a lower Jaccard coefficient between them. This kind of normalization is important, because of the power-law degree distributions of nodes.
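For instance, in a hypothetical variation of this example (not shown in the figure), if Bob acquired five additional neighbors that are not shared with Alice, the union of the neighbor sets would grow from 9 to 14 nodes while the intersection remained at 4, and the Jaccard measure would drop from 4/9 to 4/14 = 2/7.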
The Jaccard measure adjusts much better to the variations in the degrees of the nodes between which the link prediction is measured. However, it does not adjust well to the degrees of their intermediate neighbors. For example, in Fig. 19.12a, the common neighbors of Alice and Bob are Jack, John, Jill, and Mary. However, all these common neighbors could be very popular public figures with very high degrees. Therefore, these nodes are statistically more likely to occur as common neighbors of many pairs of nodes. This makes them less important in the link prediction measure. The Adamic–Adar measure is designed to account for the varying importance of the different common neighbors. It can be viewed as a weighted version of the common-neighbor measure, where the weight of a common neighbor is a decreasing function of its node degree. The typical function used in the case of the Adamic–Adar measure is the inverse logarithm. In this case, the weight of the common neighbor with index k is set to 1/log(|Sk|), where Sk is the neighbor set of node k.
Definition 19.5.3 (Adamic–Adar Measure) The Adamic–Adar measure between nodes i and j is equal to the weighted number of common neighbors between nodes i and j. The weight of node k is defined as 1/log(|Sk|).
AdamicAdar(i, j) = Σ_{k ∈ Si ∩ Sj} 1/log(|Sk|)          (19.49)
The base of the logarithm does not matter in the previous definition, as long as it is chosen consistently for all pairs of nodes. In Fig. 19.12a, the Adamic–Adar measure between Alice and Bob is the sum of four such terms, one for each of the common neighbors Jack, John, Jill, and Mary.
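The three neighborhood-based measures are straightforward to compute from an adjacency structure. The following is a small, self-contained sketch in which the graph is assumed to be stored as a dictionary mapping each node to the set of its neighbors; the data structure and function names are illustrative rather than prescribed by the text.

```python
import math

def common_neighbors(adj, i, j):
    """|Si ∩ Sj|, as in Eq. 19.47."""
    return len(adj[i] & adj[j])

def jaccard_predict(adj, i, j):
    """|Si ∩ Sj| / |Si ∪ Sj|, as in Eq. 19.48."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def adamic_adar(adj, i, j):
    """Sum of 1/log(|Sk|) over common neighbors k, as in Eq. 19.49.

    Any common neighbor of i and j has degree at least 2, so the logarithm
    (natural log here; the base is immaterial) is strictly positive.
    """
    return sum(1.0 / math.log(len(adj[k])) for k in adj[i] & adj[j])

# Example on a toy graph stored as node -> set of neighbors:
# adj = {"Alice": {"Jack", "John"}, "Bob": {"Jack", "John"},
#        "Jack": {"Alice", "Bob"}, "John": {"Alice", "Bob"}}
# common_neighbors(adj, "Alice", "Bob")   -> 2
# jaccard_predict(adj, "Alice", "Bob")    -> 1.0
# adamic_adar(adj, "Alice", "Bob")        -> 2/log(2) ≈ 2.885
```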
Dostları ilə paylaş: |