Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	57/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 53 54 55 56 57 58 59 60 ... 423

1-Data Mining tarjima

LCSS(i	− 1, j − 1) + 1
only if x_i = y_j
	−
LCSS(i, j) = max LCSS(i		1, j)	otherwise (no match on x_i) .(3.21)
LCSS(i, j − 1)			otherwise (no match on y_j )

Furthermore, the boundary conditions need to be set up. The values of LCSS(i, 0) and LCSS(0, j) are always equal to 0 for any value of i and j . As in the case of the DTW and edit-distance computations, a nested loop can be set up to compute the final value. A recursive computer program can also be implemented that uses the aforementioned recursive relationship. Although the LCSS approach is defined for a discrete sequence, it can also be applied to a continuous time series after discretizing the time-series values into a sequence of categorical values. Alternatively, one can discretize the time-series movement between two contiguous time stamps. The particular choice of discretization depends on the goals of the application at hand.

3.5. GRAPH SIMILARITY MEASURES

Figure 3.10: Shortest path versus homophily

3.5 Graph Similarity Measures

The similarity in graphs can be measured in diﬀerent ways, depending on whether the similarity is being measured between two graphs, or between two nodes in a single graph. For simplicity, undirected networks are assumed, though the measures can be easily generalized to directed networks.

3.5.1 Similarity between Two Nodes in a Single Graph

Let G = (N, A) be an undirected network with node set N and edge set A. In some domains, costs are associated with nodes, whereas in others, weights are associated with nodes. For example, in domains such as bibliographic networks, the edges are naturally weighted, and in road networks, the edges naturally have costs. Typically, distance functions work with costs, whereas similarity functions work with weights. Therefore, it may be assumed that either the cost c_ij , or the weight w_ij of the edge (i, j) is specified. It is often possible to convert costs into weights (and vice versa) using simple heuristic kernel functions that are

chosen in an application-specific way. An example is the heat kernel K(x) = e−x²/t² .

It is desired to measure the similarity between any pair of nodes i and j. The principle of similarity between two nodes in a single graph is based on the concept of homophily in real networks. The principle of homophily is that nodes are typically more similar in a network when they are connected to one another with edges. This is common in many domains such as the Web and social networks. Therefore, nodes that are connected via short paths and many paths should be considered more similar. The latter criterion is closely related to the concept of connectivity between nodes. The first criterion is relatively easy to implement with the use of the shortest-path algorithm in networks.

3.5.1.1 Structural Distance-Based Measure

The goal here is to measure the distances from any source node s to any other node in the network. Let SP (s, j ) be the shortest-path distance from source node s to any node j. The value of SP (s, j) is initialized to 0 for j = s and ∞ otherwise. Then, the distance computation of s to all other nodes in the network may be summarized in a single step that is performed exactly once for each node in the network in a certain order:

Among all nodes not examined so far, select the node i with the smallest value of SP (s, i) and update the distance labels of each of its neighbors j as follows:

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 53 54 55 56 57 58 59 60 ... 423