Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	385/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 381 382 383 384 385 386 387 388 ... 423

1-Data Mining tarjima

f (S ∪ {e}) − f (S) ≥ f (T ∪ {e}) − f (T )

(19.52)

Virtually all natural models for quantifying influence turn out to be submodular. Submod-ularity is algorithmically convenient because a very eﬃcient greedy optimization algorithm exists for maximizing submodular functions, as long as f (S) can be evaluated for a given value of S. This algorithm starts by setting S = {} and incrementally adds nodes to S that increase the value of f (S) as much as possible. This procedure is repeated until the set S contains the required number of influencers k. The approximation level of this heuristic is based on a well-known classical result on optimization of submodular functions.

Lemma 19.6.1 The greedy algorithm for maximizing submodular functions provides a solu-

tion with an objective function value that is at least a fraction of the optimal value.

Here, e is the base of the natural logarithm.

Thus, these results show that it is possible to optimize f (S) eﬀectively, as long as an appropriate submodular influence function f (S) can be defined for a given set of nodes S.

Two common approaches for defining the influence function f (S ) of a set of nodes S are the Linear Threshold Model and the Independent Cascade Model. Both these diﬀusion models were proposed in one of the earliest works on social influence analysis. The general operational assumption in these diﬀusion models is that nodes are either in an active or inactive state. Intuitively, an active node is one which has already been influenced by the set of desired behaviors. Once a node moves to an active state, it never deactivates. Depending on the model, an active node may trigger activation of neighboring nodes either for a single time, or over longer periods. Nodes are successively activated until no more nodes are activated in a given iteration. The value of f (S) is evaluated as the total number of activated nodes at termination.

19.6.1 Linear Threshold Model

In this model, the algorithm initially starts with an active set of seed nodes S and iteratively increases the number of active nodes based on the influence of neighboring active nodes. Active nodes are allowed to influence their neighbors over multiple iterations throughout the execution of the algorithm until no more nodes can be activated. The influence of

19.6. SOCIAL INFLUENCE ANALYSIS

657

neighboring nodes is quantified with the use of a linear function of the edge-specific weights b_ij . For each node i in the network G = (N, A), the following is assumed to be true:

b_ij ≤ 1

(19.53)

j:(i,j)∈A

Each node i is associated with a random threshold θ_i ∼ U [0, 1] which is fixed up front and stays constant over the course of the algorithm. The total influence I(i) of the active neighbors of node i on it, at a given time-instant, is computed as the sum of the weights b_ij of all active neighbors of i.

I(i) = b_ij (19.54)

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 381 382 383 384 385 386 387 388 ... 423