$$y^j_t = \sum_{r=1}^{p} a_r \cdot y^i_{t-r} + c + \epsilon_t$$
This is similar to the AR(p) model, except that the elements of stream i are being used to predict those of stream j. As in the case of the AR(p) model, least-squares regression
Algorithm UpdateClusters(Multivariate Stream: Y1 . . . Yt . . .,
    Current Set of Representatives: J)
begin
  Receive next time-stamp Yt of multivariate stream;
  repeat
    Add the stream to J that leads to the maximum decrease in regression error of the clustering;
    Drop the stream from J that leads to the least increase in regression error of the clustering;
    Assign each series to its closest representative in J to create the clustering C;
  until (J did not change in previous iteration);
  return (J, C);
end

Figure 14.9: Dynamically maintaining cluster representatives
can be used to learn the p coefficients. Furthermore, the training data is restricted to a window of size w > p to allow for stream evolution. The squared average of the white noise error terms, or the R²-statistic over the window of size w > p, can be used as the distance (similarity) between the two streams. Note that the regression coefficients can also be maintained incrementally, because they are already known at the previous timestamp and the model simply needs to be updated with the addition of a single data point. Most iterative optimization methods for least-squares regression, such as gradient descent, converge very quickly when starting with a near-optimal solution.
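As an illustrative sketch (not the book's exact formulation), the windowed least-squares similarity can be computed as follows. The lag order `p`, the window size `w`, and the use of NumPy's batch `lstsq` solver in place of the incremental update described above are assumptions made for clarity.

```python
import numpy as np

def regression_error(x, y, p=2, w=50):
    """Asymmetric distance between streams: mean squared error of
    predicting the last w values of stream y from the p preceding
    values of stream x (plus an intercept c) via least squares.
    Requires len(y) >= w + p."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    t = len(y)
    rows = range(t - w, t)  # training restricted to a window of size w > p
    # Design matrix: p lagged values of x, plus a column of ones for c.
    X = np.array([[x[s - r] for r in range(1, p + 1)] + [1.0] for s in rows])
    target = y[t - w:t]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(np.mean((target - X @ coef) ** 2))
```

A small error means that stream i predicts stream j well, so the value can be used directly as the distance in the clustering step.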
This regression-based similarity function is not symmetric, because the error of predicting stream j from stream i is different from the error of predicting stream i from stream j. The representative set J can also be used to create a clustering C of the streams, by assigning each stream to the representative that best predicts it. Thus, at each step, the set of representatives J and the clustering C can be incrementally reported after updating the model. The pseudocode for the online stream clustering approach is illustrated in Fig. 14.9. This approach can be useful for trend analysis in financial markets, where a representative set of stocks needs to be tracked from the vast universe of stocks. Another relevant application is sensor selection, where a subset of representative sensors needs to be determined to lower the operational costs of sensor networks. The description in this section is based on a simplification of a cost-based dynamic sensor selection algorithm [50].
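A direct, non-incremental sketch of the add/drop heuristic of Fig. 14.9 follows, assuming a pairwise predictive-error function `dist`; the exhaustive error recomputation here is for exposition only, whereas the actual algorithm maintains the regression models incrementally.

```python
def clustering_error(streams, J, dist):
    """Total error of assigning every stream to the representative in J
    that predicts it best; also returns the assignment (the clustering C)."""
    total, assignment = 0.0, {}
    for j in range(len(streams)):
        best = min(J, key=lambda r: dist(streams[r], streams[j]))
        assignment[j] = best
        total += dist(streams[best], streams[j])
    return total, assignment

def update_clusters(streams, J, dist):
    """Repeat the add/drop steps until the representative set J is stable."""
    J = set(J)
    changed = True
    while changed:
        changed = False
        candidates = [s for s in range(len(streams)) if s not in J]
        if not candidates:
            break
        # Add the stream whose inclusion most decreases the clustering error,
        best_add = min(candidates,
                       key=lambda s: clustering_error(streams, J | {s}, dist)[0])
        grown = J | {best_add}
        # ... then drop the representative whose removal least increases it.
        best_drop = min(grown,
                        key=lambda s: clustering_error(streams, grown - {s}, dist)[0])
        new_J = grown - {best_drop}
        if new_J != J:
            J, changed = new_J, True
    return J, clustering_error(streams, J, dist)[1]
```

Because the add and drop are performed as a pair, the size of J stays fixed while its membership adapts to stream evolution.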
14.5.2 Shape-Based Clustering
The second type of clustering is shape-based clustering. In this case, the different time series may not be synchronized in time; the series are clustered on the basis of the similarity of the shape of the overall series. A first step is the design of a shape-based similarity function. A major challenge in this case is that the different series may be scaled, translated, or stretched differently. This issue was discussed in Sect. 3.4.1 of Chap. 3, and the illustration of Fig. 3.7 is replicated in Fig. 14.10. This figure illustrates different hypothetical stock tickers. In these cases, the three stocks show similar patterns, but with different scaling and random variations. Furthermore, in some cases, the time dimension may also be warped. For example, in Fig. 14.10, the entire set of values for stock A was stretched because the time-granularity information was not available to the analyst. This is referred to as time warping. Fortunately, the dynamic time warping (DTW) similarity function, discussed in Sect. 3.4.1 of Chap. 3, can address these issues. The design of an effective similarity function is, therefore, one of the most crucial steps in time series clustering.

[Figure 14.10 plot: four series labeled STOCK A, STOCK B (DROPPED READINGS), STOCK C, and STOCK A (WARPED); VALUE on the vertical axis versus TIME on the horizontal axis.]

Figure 14.10: Impact of scaling, translation and noise on clustering (revisiting Fig. 3.7)
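As a reminder of the tool referenced here, a minimal version of the DTW recurrence from Sect. 3.4.1 can be sketched as:

```python
import math

def dtw(a, b):
    """Dynamic time warping distance between series of possibly
    different lengths: the classic O(len(a) * len(b)) recurrence."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # repeat an element of b
                                 D[i][j - 1],      # repeat an element of a
                                 D[i - 1][j - 1])  # advance both series
    return D[n][m]
```

Because the recurrence allows either series to be locally stretched, a uniformly time-warped copy of a series has distance zero from the original.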
Many existing methods can be adapted to shape-based time series clustering with the use of different time series similarity functions. The k-medoids and graph-based methods can be used with almost any similarity function. Methods such as k-means can also be used, though in a more limited way, because the different time series need to be of the same length in order for the mean of a cluster to be defined meaningfully.
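For instance, a bare-bones k-medoids loop (a sketch, with random initialization as an assumption) works with any pairwise distance, including DTW, because it never needs to average series:

```python
import random

def k_medoids(objs, k, dist, iters=10, seed=0):
    """Alternate between assigning objects to their closest medoid and
    replacing each medoid by the cluster member with the smallest total
    in-cluster distance. Only pairwise distances are ever computed."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(objs)), k)
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for i, o in enumerate(objs):
            closest = min(medoids, key=lambda m: dist(o, objs[m]))
            clusters[closest].append(i)
        medoids = [min(members,
                       key=lambda c: sum(dist(objs[c], objs[i]) for i in members))
                   for members in clusters.values() if members]
    return medoids
```

The distance argument can be any function of two series, which is exactly the flexibility that k-means lacks.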
14.5.2.1 k-Means
The k-means method for multidimensional data is discussed in Sect. 6.3.1 of Chap. 6. This method can be adapted to time series data, by changing the similarity function and the computation of the means of the time series. The computation of the similarity function can be adapted from Sect. 3.4.1 of Chap. 3. The precise choice of the similarity function may depend on the application at hand, though the k-means approach is optimized for the Euclidean distance function. This is because the k-means approach can be viewed as an iterative solution to an optimization problem, in which the objective function is constructed with the Euclidean distance. This aspect is discussed in more detail in Chap. 6.
The Euclidean distance function on a time series is defined in the same way as for multidimensional data. The means of the different time series are also defined in a similar way to multidimensional data. The k-means method is best used for databases of series of the same length, with a one-to-one correspondence between the time points. Thus, the centroid for each time point can be defined using this correspondence. Time warping is typically difficult to use meaningfully in k-means algorithms, because of the assumed one-to-one correspondence between the time series data points. For more generic distance functions such as DTW, other methods for time series clustering are more appropriate.
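A minimal sketch of this adaptation, assuming same-length series, the Euclidean distance, and point-wise centroid computation (the initialization scheme and iteration count are illustrative choices):

```python
import numpy as np

def ts_kmeans(series, k, iters=20, seed=0):
    """k-means for same-length time series: Euclidean distance over the
    one-to-one corresponding time points; each centroid value is the mean
    of the cluster's series at that time point."""
    X = np.asarray(series, float)            # shape: (num_series, length)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each series to the closest centroid.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids point by point (skip empty clusters).
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers
```

Note that the centroid computation averages values at each time point across the cluster's series, which is only meaningful because all series have the same length.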
14.5.2.2 k-Medoids
The main problem with the k-means approach is that it cannot incorporate arbitrary similarity (or distance) functions. The k-medoids approach can be used more effectively in