Data Mining: The Textbook

Date: 07.01.2024

14.7. TIME SERIES CLASSIFICATION


(Offline Batch) Learn the coefficients α₁ . . . α_d that best distinguish between the true events and the normal periods. The details of this step are discussed later in this section.

(Real Time) Determine the (absolute) deviation level for each time series data stream, using any forecasting method discussed in Sect. 14.3. These deviation levels correspond to the absolute values of the white noise error terms. Let the absolute deviation level of stream j at timestamp n be denoted by z_n^j.

(Real Time) Combine the deviation levels for the different streams as follows, to create the composite alarm level:

$$Z_n = \sum_{i=1}^{d} \alpha_i z_n^i \qquad (14.24)$$

The value of Z_n is reported as the alarm level at timestamp n. Thresholding can be used on the alarm level to generate discrete labels.
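The two real-time steps above can be sketched as follows. This is a minimal illustration, assuming the absolute deviation levels have already been produced by some forecasting method and are stored as an array with one row per timestamp and one column per stream. The function names `composite_alarm` and `threshold_alarms`, the toy data, and the threshold value are all hypothetical.

```python
import numpy as np

def composite_alarm(deviations, alpha):
    """Eq. 14.24: Z_n = sum_i alpha_i * z_n^i, evaluated for every timestamp n.

    deviations: shape (n_timestamps, d), absolute deviation levels z_n^i
    alpha:      shape (d,), discrimination coefficients
    """
    deviations = np.asarray(deviations, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return deviations @ alpha

def threshold_alarms(alarm_levels, threshold):
    """Turn alarm levels into discrete 0/1 labels (1 = predicted event)."""
    return (np.asarray(alarm_levels) >= threshold).astype(int)

# Toy data (assumed): 4 timestamps, 2 streams.
z = np.array([[0.1, 0.2],
              [0.0, 0.1],
              [2.0, 1.5],
              [0.2, 0.1]])
alpha = np.array([0.6, 0.8])   # assumed coefficients with unit norm

Z = composite_alarm(z, alpha)
labels = threshold_alarms(Z, threshold=1.0)
print(Z, labels)
```

Only the third timestamp, where both streams deviate strongly from their forecasts, exceeds the assumed threshold.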

The main step that has not yet been discussed is the determination of the discrimination coefficients α₁ . . . α_d. These should be selected in the training phase so as to maximize the difference in the alarm level between the primary events and the normal periods.

To learn the coefficients α₁ . . . α_d in the training phase, the composite alarm level is averaged at the timestamps T₁ . . . T_r for all primary events of interest. Note that the composite alarm level at each timestamp T_i is an algebraic expression, which is a linear function of the coefficients α₁ . . . α_d according to Eq. 14.24. These expressions are added up over the timestamps T₁ . . . T_r and divided by r to create an average alarm level Q^p(α₁ . . . α_d), which is a function of (α₁ . . . α_d).

$$Q^p(\alpha_1 \ldots \alpha_d) = \frac{\sum_{i=1}^{r} Z_{T_i}}{r} \qquad (14.25)$$

A similar algebraic expression for the normal alarm level Q^n(α₁ . . . α_d) is also computed by using all of the available timestamps, the majority of which are assumed to be normal.

$$Q^n(\alpha_1 \ldots \alpha_d) = \frac{\sum_{i=1}^{n} Z_i}{n} \qquad (14.26)$$

As in the case of the event signature, the normal alarm level is also a linear function of α₁ . . . α_d. Then, the optimization problem is that of determining the optimal values of α_i that increase the differential signature between the primary events and the normal alarm level. This optimization problem is as follows:

Maximize $Q^p(\alpha_1 \ldots \alpha_d) - Q^n(\alpha_1 \ldots \alpha_d)$

subject to: $\sum_{i=1}^{d} \alpha_i^2 = 1$

This optimization problem can be solved using any off-the-shelf iterative optimization solver. In practice, the online event detection and offline learning processes are executed simultaneously, as new events are encountered. In such cases, the values of α_i can be updated incrementally within the iterative optimization solver. The composite alarm level can be reported as an event score. Alternatively, thresholding on the alarm level can be used to generate discrete timestamps at which the events are predicted. The choice of threshold will regulate the trade-off between the precision and recall of the predicted events.
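As an illustration of the training step, note that Q^p and Q^n are both linear in the coefficients, so the objective can be written as a dot product between (α₁ . . . α_d) and a fixed vector c: the mean deviation vector at the event timestamps minus the mean deviation vector over all timestamps. Under the unit-norm constraint, that dot product is largest when the coefficient vector points along c, which follows from the Cauchy-Schwarz inequality. The sketch below uses this closed form instead of an iterative solver. This is an observation about the stated problem, not a procedure given in the text. The function names, toy data, and threshold are hypothetical.

```python
import numpy as np

def learn_alpha(deviations, event_idx):
    """Maximize Q^p - Q^n subject to sum_i alpha_i^2 = 1 (closed form)."""
    deviations = np.asarray(deviations, dtype=float)
    event_mean = deviations[event_idx].mean(axis=0)  # linear coefficients of Q^p
    overall_mean = deviations.mean(axis=0)           # linear coefficients of Q^n
    c = event_mean - overall_mean
    norm = np.linalg.norm(c)
    if norm == 0.0:
        raise ValueError("event and overall deviation profiles are identical")
    return c / norm

def precision_recall(alarm_levels, event_idx, threshold):
    """Precision and recall of the timestamps flagged at a given threshold."""
    predicted = set(np.flatnonzero(np.asarray(alarm_levels) >= threshold).tolist())
    actual = {int(i) for i in event_idx}
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

# Toy data (assumed): stream 0 spikes at the events, stream 1 does not.
z = np.array([[0.1, 0.5], [0.2, 0.4], [2.0, 0.5],
              [0.1, 0.6], [1.8, 0.4], [0.2, 0.6]])
events = [2, 4]                      # timestamps T_1 ... T_r of known events

alpha = learn_alpha(z, events)
Z = z @ alpha                        # composite alarm levels (Eq. 14.24)
print(alpha, precision_recall(Z, events, threshold=1.0))
```

On this toy data nearly all of the weight goes to stream 0, the only stream whose deviations differ between event and normal timestamps. Raising the threshold in `precision_recall` tends to trade recall for precision, and lowering it does the opposite.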


14.7.2 Whole Series Classification

In whole-series classification, the labels are associated with the entire series, rather than events associated with individual timestamps. It is assumed that a database of N different series is available, and each series has a length of n. Each of the series is associated with a class label drawn from {1 . . . k}.

Many proximity-based classifiers are designed with the help of time series similarity functions. Thus, the effective design of similarity functions is crucial in classification, as is the case in many other time series data mining applications.
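To make the role of the similarity function concrete, the following is a minimal sketch of a proximity-based classifier: a 1-nearest-neighbor rule over N labeled series of equal length n. The use of Euclidean distance here is an assumption made for brevity, and any other time series similarity function could be substituted for it. The function name and the toy data are hypothetical.

```python
import numpy as np

def nearest_neighbor_label(train_series, train_labels, query):
    """Return the label of the training series closest to the query.

    train_series: shape (N, n), one series per row
    train_labels: length-N sequence of class labels from {1 ... k}
    query:        shape (n,), the series to classify
    """
    train_series = np.asarray(train_series, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training series (assumed choice).
    distances = np.linalg.norm(train_series - query, axis=1)
    return train_labels[int(np.argmin(distances))]

# Toy data (assumed): one rising series (class 1), one falling series (class 2).
train = [[0.0, 1.0, 2.0, 3.0],
         [3.0, 2.0, 1.0, 0.0]]
labels = [1, 2]
predicted = nearest_neighbor_label(train, labels, [0.1, 0.9, 2.2, 2.8])
print(predicted)
```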

In the following, three classification methods will be discussed. Two of these methods are inductive methods, in which only the training instances are used to build a model. The model is then used for classification. The third method is a transductive semisupervised method, in which the training and test instances are used together for classification. The semisupervised approach is a graph-based method in which the unlabeled test instances are leveraged for more effective classification.

14.7.2.1 Wavelet-Based Rules

A major challenge in time series classification is that much of the series may be noisy and irrelevant. The classification properties may be exhibited only in temporal segments of varying length in the series. For example, consider the scenario where the series in Fig. 14.11 are presented to a learner with labels. In the case where the label corresponds to a recession (Fig. 14.11a), it is important for a learner to analyze the trends for a period of a few weeks or months in order to determine the correct labels. On the other hand, where the label corresponds to the occurrence of a flash crash (Fig. 14.11b), it is important for a learner to be able to extract the trends over the period of a day.

For a given learning problem, it may not be known a priori what level of granularity should be used for the learning process. The Haar wavelet method provides a multigranularity decomposition of the time series data to handle such scenarios. As discussed in Sect. 14.4 on time series motifs, wavelets are an effective way to determine frequent trends over varying levels of granularity. It is therefore natural to combine multigranular motif discovery with associative classifiers.

Readers are advised to refer to Sect. 2.4.4.1 of Chap. 2 for a discussion of wavelet decomposition methods. The Haar wavelet coefficient of order i analyzes trends over a time period, which is proportional to 2^{-i} · n, where n is the full length of the series. Specifically, the coefficient value is equal to half the difference between the average values of the first half and second half of the time period of length 2^{-i} · n. Because the Haar wavelet represents the coefficients of different orders in the transformation, it automatically accounts for trends of different granularity. In fact, an arbitrary shape in any particular window of the series can usually be well approximated by an appropriate subset of wavelet coefficients. These can be considered signatures that are specific to a particular class label. The goal of the rule-based method is to discover signatures that are specific to particular class labels. Therefore, the overall training approach in the rule-based method is as follows:

Generate the wavelet representation of each of the N time series to create N numeric multidimensional representations.

Discretize the wavelet representation to create categorical representations of the time series wavelet transformation. Thus, each categorical attribute value represents a range of numeric values of the wavelet coefficients.
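The two steps above can be sketched as follows for a single series. This is a minimal illustration under stated assumptions: the series length is a power of two, coefficients are keyed by the length of the window they analyze rather than by an order index, and discretization uses fixed range boundaries chosen by hand. The function names, the toy series, and the bin edges are hypothetical.

```python
import numpy as np

def haar_coefficients(series):
    """Return (global_average, {window_length: coefficient array}).

    Each coefficient is half the difference between the average of the
    first half and the average of the second half of its window.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    coeffs = {}
    averages = x          # averages over blocks of length window / 2
    window = 2
    while len(averages) > 1:
        first, second = averages[0::2], averages[1::2]
        coeffs[window] = (first - second) / 2.0
        averages = (first + second) / 2.0
        window *= 2
    return float(averages[0]), coeffs

def discretize(values, bin_edges):
    """Map each numeric coefficient to the index of the range containing it."""
    return np.digitize(values, bin_edges)

series = [8, 6, 2, 3, 4, 6, 6, 5]          # toy series (assumed)
average, coeffs = haar_coefficients(series)

# Numeric representation: coarsest window first, finest window last.
numeric = np.concatenate([coeffs[w] for w in sorted(coeffs, reverse=True)])
symbols = discretize(numeric, bin_edges=[-1.0, 0.0, 1.0])   # assumed ranges
print(average, numeric, symbols)
```

Each entry of `symbols` is a categorical value standing for a range of coefficient values, which is the form of representation that the discretization step calls for.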
