LCSS(i
|
− 1, j − 1) + 1
|
only if xi = yj
|
|
−
|
|
|
LCSS(i, j) = max LCSS(i
|
|
1, j)
|
otherwise (no match on xi) .(3.21)
|
LCSS(i, j − 1)
|
otherwise (no match on yj )
|
|
|
|
|
Furthermore, the boundary conditions need to be set up. The values of LCSS(i, 0) and LCSS(0, j) are always equal to 0 for any value of i and j . As in the case of the DTW and edit-distance computations, a nested loop can be set up to compute the final value. A recursive computer program can also be implemented that uses the aforementioned recursive relationship. Although the LCSS approach is defined for a discrete sequence, it can also be applied to a continuous time series after discretizing the time-series values into a sequence of categorical values. Alternatively, one can discretize the time-series movement between two contiguous time stamps. The particular choice of discretization depends on the goals of the application at hand.
3.5. GRAPH SIMILARITY MEASURES
|
85
|
Figure 3.10: Shortest path versus homophily
3.5 Graph Similarity Measures
The similarity in graphs can be measured in different ways, depending on whether the similarity is being measured between two graphs, or between two nodes in a single graph. For simplicity, undirected networks are assumed, though the measures can be easily generalized to directed networks.
3.5.1 Similarity between Two Nodes in a Single Graph
Let G = (N, A) be an undirected network with node set N and edge set A. In some domains, costs are associated with nodes, whereas in others, weights are associated with nodes. For example, in domains such as bibliographic networks, the edges are naturally weighted, and in road networks, the edges naturally have costs. Typically, distance functions work with costs, whereas similarity functions work with weights. Therefore, it may be assumed that either the cost cij , or the weight wij of the edge (i, j) is specified. It is often possible to convert costs into weights (and vice versa) using simple heuristic kernel functions that are
chosen in an application-specific way. An example is the heat kernel K(x) = e−x2/t2 .
It is desired to measure the similarity between any pair of nodes i and j. The principle of similarity between two nodes in a single graph is based on the concept of homophily in real networks. The principle of homophily is that nodes are typically more similar in a network when they are connected to one another with edges. This is common in many domains such as the Web and social networks. Therefore, nodes that are connected via short paths and many paths should be considered more similar. The latter criterion is closely related to the concept of connectivity between nodes. The first criterion is relatively easy to implement with the use of the shortest-path algorithm in networks.
3.5.1.1 Structural Distance-Based Measure
The goal here is to measure the distances from any source node s to any other node in the network. Let SP (s, j ) be the shortest-path distance from source node s to any node j. The value of SP (s, j) is initialized to 0 for j = s and ∞ otherwise. Then, the distance computation of s to all other nodes in the network may be summarized in a single step that is performed exactly once for each node in the network in a certain order:
Among all nodes not examined so far, select the node i with the smallest value of SP (s, i) and update the distance labels of each of its neighbors j as follows:
Dostları ilə paylaş: |