M (
|
|
,
|
|
) = |xi − yi| + M (
|
|
,
|
|
).
|
(3.17)
|
|
Xi
|
Yi
|
Xi−1
|
Yi−1
|
|
Note that the indices of both series are reduced by 1 in the right- hand side because of the one-to-one matching. In DTW, both indices need not reduce by 1 unit because a many-to-one mapping is allowed. Rather, any one or both indices may reduce by 1, depending on the best match between the two time series (or sequences). The index that did not reduce by 1 corresponds to the repeated element. The choice of index reduction is naturally defined, recursively, as an optimization over the various options.
Let DT W (i, j) be the optimal distance between the first i and first j elements of two time series X = (x1 . . . xm) and Y = (y1 . . . yn), respectively. Note that the two time series are of lengths m and n, which may not be the same. Then, the value of DT W (i, j) is defined recursively as follows:
DT W (i, j) = distance(xi, yj ) + min
|
|
DT W (i, j − 1)
|
repeat xi
|
.(3.18)
|
|
|
DT W (i
|
−
|
1, j)
|
repeat yj
|
|
|
|
|
1, j − 1)
|
repeat neither
|
|
|
|
|
DT W (i −
|
|
|
|
|
|
|
|
|
|
|
The value of distance(xi, yj ) may be defined in a variety of ways, depending on the appli-cation domain. For example, for continuous time series, it may be defined as |xi − yi|p, or by a distance that accounts for (behavioral attribute) scaling and translation. For discrete sequences, it may be defined using a categorical measure. The DTW approach is primarily focused on warping the contextual attribute, and has little to do with the nature of the behavioral attribute or distance function. Because of this fact, time warping can easily be extended to multiple behavioral attributes by simply using the distances along multiple attributes in the recursion.
Equation 3.18 yields a natural iterative approach. The approach starts by initializing DT W (0, 0) to 0, DT W (0, j) to ∞ for j ∈ {1 . . . n}, and DT W (i, 0) to ∞ for i ∈ {1 . . . m}. The algorithm computes DT W (i, j) by repeatedly executing Eq. 3.18 with increasing index values of i and j. This can be achieved by a simple nested loop in which the indices i and j increase from 1 to m and 1 to n, respectively:
3.4. TEMPORAL SIMILARITY MEASURES
|
81
|
Figure 3.9: Illustration of warping paths
for i = 1 to m
for j = 1 to n
compute DT W (i, j) using Eq. 3.18
The aforementioned code snippet is a nonrecursive and iterative approach. It is also possible to implement a recursive computer program by directly using Eq. 3.18. Therefore, the approach requires the computation of all values of DT W (i, j) for every i ∈ [1, m] and every j ∈ [1, n]. This is a m × n grid of values, and therefore the approach may require O(m · n) iterations, where m and n are lengths of the series.
The optimal warping can be understood as an optimal path through different values of i and j in the m × n grid of values, as illustrated in Fig. 3.9. Three possible paths, denoted by A, B, and C, are shown in the figure. These paths only move to the right (increasing i and repeating yj ), upward (increasing j and repeating xi), or both (repeating neither).
A number of practical constraints are often added to the DTW computation. One com-monly used constraint is the window constraint that imposes a minimum level w of positional alignment between matched elements. The window constraint requires that DT W (i, j ) be computed only when |i − j | ≤ w. Otherwise, the value may be set to ∞ by default. For example, the paths B and C in Fig. 3.9 no longer need to be computed. This saves the computation of many values in the dynamic programming recursion. Correspondingly, the computations in the inner variable j of the nested loop above can be saved by constraining the index j, so that it is never more than w units apart from the outer loop variable i. Therefore, the inner loop index j is varied from max{0, i − w} to min{n, i + w}.
The DTW distance can be extended to multiple behavioral attributes easily, if it is assumed that the different behavioral attributes have the same time warping. In this case, the recursion is unchanged, and the only difference is that distance(xi, yj ) is computed using a vector-based distance measure. We have used a bar on xi and yj to denote that these are vectors of multiple behavioral attributes. This multivariate extension is discussed in Sect. 16.3.4.1 of Chap. 16 for measuring distances between 2-dimensional trajectories.
82 CHAPTER 3. SIMILARITY AND DISTANCES
3.4.1.4 Window-Based Methods
The example in Fig. 3.7 illustrates a case where dropped readings may cause a gap in the matching. Window-based schemes attempt to decompose the two series into windows and then “stitch” together the similarity measure. The intuition here is that if two series have many contiguous matching segments, they should be considered similar. For long time series, a global match becomes increasingly unlikely. The only reasonable choice is the use of windows for measurement of segment-wise similarity.
Consider two time series X and Y , and let X1 . . . Xr and Y1 . . . Yr be temporally ordered and nonoverlapping windows extracted from the respective series. Note that some windows from the base series may not be included in these segments at all. These correspond to the noise segments that are dropped. Then, the overall similarity between X and Y can be computed as follows:
Dostları ilə paylaş: |