Real-time analysis: In real-time analysis, the data points in one or more series are analyzed in real time to make predictions. Typically, the analysis uses a small window of recent history over the different data streams. Examples of such analysis include forecasting, deviation detection, or event detection. When multiple series are available, they are typically analyzed in a temporally synchronized way. Even in cases where data mining applications such as clustering are applied to these problems, the analysis is typically performed in real time.
Retrospective analysis: In retrospective analysis, the time series data is already available, and subsequently analyzed. The analysis of different time series within a database is sometimes not synchronized over time. For example, in a time series database of ECG readings, the data may have been recorded over different periods.
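The small window of recent history used in real-time analysis can be sketched as a fixed-size buffer over temporally synchronized streams. This is a minimal illustration under stated assumptions. The class `RecentWindow`, the window length `w`, and the per-stream window mean are all hypothetical and not taken from this chapter. The window mean is only a placeholder for a real forecasting or deviation-detection model.

```python
from collections import deque
from typing import Deque, List, Sequence


class RecentWindow:
    """Hypothetical helper: keeps the last w synchronized observations of d streams."""

    def __init__(self, w: int, d: int) -> None:
        if w <= 0 or d <= 0:
            raise ValueError("window length and dimensionality must be positive")
        self.d = d
        # A deque with maxlen discards the oldest observation once the window is full
        self.buffer: Deque[List[float]] = deque(maxlen=w)

    def push(self, values: Sequence[float]) -> None:
        # Each arrival must carry one value per stream (temporal synchronization)
        if len(values) != self.d:
            raise ValueError(f"expected {self.d} values, got {len(values)}")
        self.buffer.append(list(values))

    def window_means(self) -> List[float]:
        # Placeholder analysis (an assumption): per-stream mean over the current window
        if not self.buffer:
            raise ValueError("window is empty")
        n = len(self.buffer)
        return [sum(row[j] for row in self.buffer) / n for j in range(self.d)]


if __name__ == "__main__":
    win = RecentWindow(w=3, d=2)
    for obs in [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]:
        win.push(obs)
    # Only the last three observations remain: (2, 20), (3, 30), (4, 40)
    print(win.window_means())  # [3.0, 30.0]
```

Because `collections.deque` with `maxlen` evicts the oldest row in constant time, memory stays bounded by `w` rows no matter how long the streams run.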
Both forms of analysis are useful, in different kinds of applications. Furthermore, the same problem, such as clustering or outlier detection, may have a different interpretation in each of these two scenarios. These issues are discussed in more detail in later sections.
This chapter is organized as follows. The next section presents methods for time series preparation and similarity. Because the methods for time series similarity have already been discussed in detail in Chap. 3, they are summarized only briefly in this chapter. The reader is referred to the relevant sections of Chap. 3 for the different time series similarity measures. The problem of time series forecasting is discussed in Sect. 14.3. Time series motif discovery is discussed in Sect. 14.4. Section 14.5 addresses the problem of clustering time series. Outlier detection is discussed in Sect. 14.6. Time series classification is discussed in Sect. 14.7. The summary of the chapter is presented in Sect. 14.8.
14.2 Time Series Preparation and Similarity
Time series data may be either univariate or multivariate. In univariate time series data, a single behavioral attribute is associated with each time instant. In multivariate time series data, multiple behavioral attributes are associated with each time instant. The dimensionality of the time series, therefore, refers to the number of behavioral attributes being tracked.
Definition 14.2.1 (Multivariate Time Series Data) A time series of length $n$ and dimensionality $d$ contains $d$ numeric features at each of $n$ timestamps $t_1 \ldots t_n$. Each timestamp contains a component for each of the $d$ series. Therefore, the set of values received at timestamp $t_i$ is $Y_i = (y_i^1 \ldots y_i^d)$. The value of the $j$th series at timestamp $t_i$ is $y_i^j$.
In a univariate time series, the value of $d$ is 1. In such cases, a series of length $n$ is represented as a set of scalar behavioral values $y_1 \ldots y_n$, associated with the timestamps $t_1 \ldots t_n$.
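A minimal sketch of a container that mirrors Definition 14.2.1 follows. It stores one timestamp $t_i$ and one vector $Y_i$ of $d$ values per row. The class `MultivariateSeries` and the helper `univariate` are hypothetical names chosen for illustration. The accessors take 1-based indices to match the notation $y_i^j$, which is an assumed convention. The univariate case is simply $d = 1$.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MultivariateSeries:
    """Hypothetical container: timestamps t_1..t_n and rows Y_1..Y_n of d values each."""

    timestamps: List[float]
    rows: List[List[float]]

    def __post_init__(self) -> None:
        if len(self.timestamps) != len(self.rows):
            raise ValueError("need exactly one row of values per timestamp")
        if self.rows and any(len(r) != len(self.rows[0]) for r in self.rows):
            raise ValueError("every timestamp must carry all d components")

    @property
    def length(self) -> int:
        # n: number of timestamps
        return len(self.timestamps)

    @property
    def dimensionality(self) -> int:
        # d: number of behavioral attributes tracked at each timestamp
        return len(self.rows[0]) if self.rows else 0

    def value(self, i: int, j: int) -> float:
        # y_i^j (1-based, as in the text): the jth series at timestamp t_i
        if not (1 <= i <= self.length and 1 <= j <= self.dimensionality):
            raise IndexError("indices are 1-based and must be in range")
        return self.rows[i - 1][j - 1]

    def series(self, j: int) -> List[float]:
        # Univariate view of the jth series: y_1^j ... y_n^j
        if not 1 <= j <= self.dimensionality:
            raise IndexError("series index is 1-based and must be in range")
        return [row[j - 1] for row in self.rows]


def univariate(timestamps: Sequence[float], values: Sequence[float]) -> MultivariateSeries:
    # A univariate series is the special case d = 1
    return MultivariateSeries(list(timestamps), [[v] for v in values])


if __name__ == "__main__":
    # Two behavioral attributes (d = 2) observed at three timestamps (n = 3)
    s = MultivariateSeries([0.0, 1.0, 2.0], [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]])
    print(s.length, s.dimensionality)  # 3 2
    print(s.value(2, 2))               # 6.0
    print(s.series(1))                 # [1.0, 2.0, 3.0]
```

Storing rows by timestamp keeps each $Y_i$ together, which suits synchronized processing. The `series` view recovers each behavioral attribute as its own univariate series.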
14.2.1 Handling Missing Values
It is common for time series data to contain missing values. Furthermore, when values are collected by independent sensors, the different series may not be synchronized in time. For data processing, it is often convenient for time series values to be equally spaced and synchronized across the different behavioral attributes. The most common methodology for handling missing, unequally spaced, or unsynchronized values is linear interpolation. The idea is to create estimated values at the desired timestamps. These estimates can be used to generate multivariate time series that are synchronized, equally spaced, and free of missing values.
Consider the scenario where $y_i$ and $y_j$ are values of the time series at times $t_i$ and $t_j$, respectively, where $i < j$. Let $t$ be a time drawn from the interval $(t_i, t_j)$. Then, the interpolated value of the series is given by:
$$y = y_i + \left(\frac{t - t_i}{t_j - t_i}\right) \cdot (y_j - y_i)$$
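The interpolation rule $y = y_i + \frac{t - t_i}{t_j - t_i}(y_j - y_i)$ can be applied to resample an irregular or gappy series onto a common, equally spaced grid. This is a minimal sketch under stated assumptions. The function names `interpolate` and `resample` are hypothetical. Observation times are assumed to be strictly increasing. Grid times outside the observed range are rejected rather than extrapolated, which is a design choice of this sketch and not of the text.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate(t: float, t_i: float, y_i: float, t_j: float, y_j: float) -> float:
    """Linearly interpolate the value at time t between (t_i, y_i) and (t_j, y_j)."""
    if not t_i < t_j:
        raise ValueError("require t_i < t_j")
    return y_i + (t - t_i) / (t_j - t_i) * (y_j - y_i)


def resample(times: Sequence[float], values: Sequence[float],
             grid: Sequence[float]) -> List[float]:
    """Estimate a series at each grid time from its two bracketing observations.

    Assumes times is strictly increasing; missing readings are simply absent.
    Grid times outside [times[0], times[-1]] are rejected (no extrapolation).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (time, value) observations")
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    out: List[float] = []
    for t in grid:
        if t < times[0] or t > times[-1]:
            raise ValueError(f"grid time {t} is outside the observed range")
        k = bisect_right(times, t)  # index of the first observation strictly after t
        if k == len(times):
            # t coincides with the last observation
            out.append(float(values[-1]))
        else:
            # times[k-1] <= t < times[k]; at t == times[k-1] this returns values[k-1]
            out.append(interpolate(t, times[k - 1], values[k - 1], times[k], values[k]))
    return out


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0, 3.0, 4.0]  # common, equally spaced timestamps
    # Sensor A is missing its readings at t = 1 and t = 3
    a = resample([0.0, 2.0, 4.0], [10.0, 14.0, 6.0], grid)
    # Sensor B reports at different, unequally spaced times
    b = resample([0.0, 1.0, 4.0], [2.0, 4.0, 1.0], grid)
    print(a)  # [10.0, 12.0, 14.0, 10.0, 6.0]
    # Pairing the two resampled series yields a synchronized series with d = 2
    print(list(zip(a, b)))
```

Resampling every sensor onto the same grid is what makes the components line up. After that step, the $j$th entry of each resampled list can be read as $y_i^j$ at the shared timestamp $t_i$.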