14.7. TIME SERIES CLASSIFICATION
(Offline Batch) Learn the coefficients α1 . . . αd that best distinguish between the event periods and the normal periods. The details of this step are discussed later in this section.
(Real Time) Determine the (absolute) deviation level for each time series data stream, with the use of any forecasting method discussed in Sect. 14.3. These correspond to the absolute values of the white noise error terms. Let the absolute deviation level of stream j at timestamp n be denoted by znj.
(Real Time) Combine the deviation levels for the different streams as follows, to create the composite alarm level:

Z_n = \sum_{i=1}^{d} \alpha_i \cdot z_{ni} \qquad (14.24)
The value of Zn is reported as the alarm level at timestamp n. Thresholding can be used on the alarm level to generate discrete labels.
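The real-time scoring step above can be sketched directly. The code below is a minimal illustration, not the book's implementation: it assumes the coefficients α1 . . . αd are already available from the offline training phase, and the function and variable names are illustrative.

```python
def composite_alarm_level(z, alpha):
    """Composite alarm level Z_n = sum_i alpha_i * z_ni (Eq. 14.24).
    `z` holds the absolute deviation levels of the d streams at one timestamp."""
    return sum(a * dev for a, dev in zip(alpha, z))

def alarm_labels(deviations, alpha, threshold):
    """Threshold the alarm level to generate discrete labels.
    `deviations` is a list of per-timestamp deviation vectors; returns 1 at
    timestamps whose composite alarm level exceeds the threshold, else 0."""
    return [1 if composite_alarm_level(z, alpha) > threshold else 0
            for z in deviations]
```

Raising the threshold yields fewer but more precise alarms; lowering it improves recall at the cost of precision, as discussed at the end of this section.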
The one step in the above description that has not yet been discussed is the determination of the discrimination coefficients α1 . . . αd. These should be selected in the training phase so as to maximize the differences in the alarm level between the primary events and the normal periods.
To learn the coefficients α1 . . . αd in the training phase, the composite alarm level is averaged over the timestamps T1 . . . Tr of all primary events of interest. Note that the composite alarm level at each timestamp Ti is an algebraic expression that is a linear function of the coefficients α1 . . . αd according to Eq. 14.24. These expressions are averaged over the timestamps T1 . . . Tr to create an event alarm level Qp(α1 . . . αd), which is a function of (α1 . . . αd).
Q_p(\alpha_1 \ldots \alpha_d) = \frac{\sum_{i=1}^{r} Z_{T_i}}{r} \qquad (14.25)
A similar algebraic expression for the normal alarm level Qn(α1 . . . αd) is also computed by using all of the available timestamps, the majority of which are assumed to be normal.
Q_n(\alpha_1 \ldots \alpha_d) = \frac{\sum_{i=1}^{n} Z_i}{n} \qquad (14.26)
As in the case of the event signature, the normal alarm level is also a linear function of α1 . . . αd. Then, the optimization problem is that of determining the optimal values of αi that increase the differential signature between the primary events and the normal alarm level. This optimization problem is as follows:
Maximize Q_p(\alpha_1 \ldots \alpha_d) - Q_n(\alpha_1 \ldots \alpha_d)

subject to: \sum_{i=1}^{d} \alpha_i^2 = 1
This optimization problem can be solved using any off-the-shelf iterative optimization solver. In practice, the online event detection and offline learning processes are executed simultaneously, as new events are encountered. In such cases, the values of αi can be updated incrementally within the iterative optimization solver. The composite alarm level can be reported as an event score. Alternatively, thresholding on the alarm level can be used to generate discrete timestamps at which events are predicted. The choice of threshold regulates the trade-off between the precision and recall of the predicted events.
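As a concrete sketch of the training phase: if a unit-norm constraint is imposed on the coefficients (assumed here, since it keeps the linear objective bounded), the maximizer of Qp − Qn has a closed form, namely the normalized difference between the mean deviation vector at event timestamps and the mean deviation vector over all timestamps. The names below (`deviations`, `event_times`) are illustrative, not from the text.

```python
import math

def q_p(deviations, event_times, alpha):
    """Average composite alarm level over the event timestamps (Eq. 14.25)."""
    total = sum(sum(a * z for a, z in zip(alpha, deviations[t]))
                for t in event_times)
    return total / len(event_times)

def q_n(deviations, alpha):
    """Average composite alarm level over all timestamps (Eq. 14.26)."""
    total = sum(sum(a * z for a, z in zip(alpha, row)) for row in deviations)
    return total / len(deviations)

def learn_alpha(deviations, event_times):
    """Maximize q_p - q_n subject to sum(alpha_i^2) = 1.
    Because the objective is linear in alpha, the optimum is the normalized
    gradient: the difference between the two mean deviation vectors."""
    d = len(deviations[0])
    r, n = len(event_times), len(deviations)
    mean_event = [sum(deviations[t][j] for t in event_times) / r
                  for j in range(d)]
    mean_all = [sum(row[j] for row in deviations) / n for j in range(d)]
    g = [e - a for e, a in zip(mean_event, mean_all)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]
```

The closed form stands in for the iterative solver mentioned above; an iterative method becomes necessary only when additional constraints (e.g., nonnegativity of the αi) are imposed.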
14.7.2 Whole Series Classification
In whole-series classification, the labels are associated with the entire series, rather than events associated with individual timestamps. It is assumed that a database of N different series is available, and each series has a length of n. Each of the series is associated with a class label drawn from {1 . . . k}.
Many proximity-based classifiers are designed with the help of time series similarity functions. Thus, the effective design of similarity functions is crucial in classification, as is the case in many other time series data mining applications.
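A minimal sketch of such a proximity-based classifier is a nearest-neighbor rule built on a similarity function. Euclidean distance is used here purely for illustration; any of the time series similarity measures discussed earlier in the chapter could be substituted.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length series (illustrative choice)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nn_classify(train_series, train_labels, query):
    """1-nearest-neighbor whole-series classification: return the label of the
    training series closest to the query series."""
    distances = [euclidean(s, query) for s in train_series]
    return train_labels[distances.index(min(distances))]
```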
In the following, three classification methods will be discussed. Two of these methods are inductive methods, in which only the training instances are used to build a model. These are then used for classification. The third method is a transductive semisupervised method, in which the training and test instances are used together for classification. The semisupervised approach is a graph-based method in which the unlabeled test instances are leveraged for more effective classification.
14.7.2.1 Wavelet-Based Rules
A major challenge in time series classification is that much of the series may be noisy and irrelevant. The classification properties may be exhibited only in temporal segments of varying length in the series. For example, consider the scenario where the series in Fig. 14.11 are presented to a learner with labels. In the case where the label corresponds to a recession (Fig. 14.11a), a learner must analyze the trends over a period of a few weeks or months in order to determine the correct labels. On the other hand, when the label corresponds to the occurrence of a flash crash (Fig. 14.11b), the learner must extract the trends over the period of a day.
For a given learning problem, it may not be known a priori what level of granularity should be used for the learning process. The Haar wavelet method provides a multigranularity decomposition of the time series data to handle such scenarios. As discussed in Sect. 14.4 on time series motifs, wavelets are an effective way to determine frequent trends over varying levels of granularity. It is therefore natural to combine multigranular motif discovery with associative classifiers.
Readers are advised to refer to Sect. 2.4.4.1 of Chap. 2 for a discussion of wavelet decomposition methods. The Haar wavelet coefficient of order i analyzes trends over a time period that is proportional to 2−i · n, where n is the full length of the series. Specifically, the coefficient value is equal to half the difference between the average values of the first half and second half of the time period of length 2−i · n. Because the Haar wavelet represents the coefficients of different orders in the transformation, it automatically accounts for trends of different granularity. In fact, an arbitrary shape in any particular window of the series can usually be well approximated by an appropriate subset of wavelet coefficients. Such combinations of coefficients can be considered signatures that are specific to a particular class label, and the goal of the rule-based method is to discover them. Therefore, the overall training approach in the rule-based method is as follows:
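The decomposition described above can be sketched as follows, for a series whose length is assumed to be a power of two. At each level, the detail coefficient is half the difference between adjacent averages, matching the definition given in the text.

```python
def haar_coefficients(series):
    """Haar wavelet transform of a series whose length is a power of two.
    Returns [overall average, coarsest detail, ..., finest details], where
    each detail coefficient is half the difference between the averages of
    the first and second halves of its window."""
    level = list(series)
    coeffs = []
    while len(level) > 1:
        half = len(level) // 2
        averages = [(level[2 * k] + level[2 * k + 1]) / 2 for k in range(half)]
        details = [(level[2 * k] - level[2 * k + 1]) / 2 for k in range(half)]
        coeffs = details + coeffs  # prepend so coarser levels come first
        level = averages
    return level + coeffs
```

For the series (8, 6, 2, 4), the transform is (5, 2, 1, −1): the overall average 5, a coarse coefficient 2 (the two halves average 7 and 3), and two fine coefficients describing the within-half trends.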
1. Generate the wavelet representation of each of the N time series to create N numeric multidimensional representations.
2. Discretize the wavelet representation to create categorical representations of the time series wavelet transformation. Thus, each categorical attribute value represents a range of numeric values of the wavelet coefficients.
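The discretization step can be sketched as below. Equi-width binning and the bin count are assumptions for illustration; the text fixes neither the binning strategy nor the number of ranges.

```python
def discretize(column, num_bins=4):
    """Map one wavelet coefficient (its values across all N series) to
    categorical values by equi-width binning. Each returned bin index in
    [0, num_bins - 1] represents a range of numeric coefficient values."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / num_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), num_bins - 1) for v in column]
```

Applying this to every coefficient position independently converts each series into a categorical record, on which standard rule-based (associative) classifiers can then be trained.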