Data Mining: The Textbook

Date: 07.01.2024

$$y'_t = \sum_{i=1}^{p} a_i \cdot y'_{t-i} + \sum_{i=1}^{q} b_i \cdot \epsilon_{t-i} + c + \epsilon_t$$


Thus, this model is virtually identical to the ARMA(p, q) model, except that differencing is used within the model. If the order of the differencing is d, then this model is referred to as the ARIMA(p, d, q) model.
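To make the role of differencing concrete, here is a minimal sketch in Python with NumPy. It is not taken from the text: the helper names `difference` and `undifference_once` and the toy series are assumptions chosen for illustration, and the forecast value of the next difference is simply assumed rather than produced by a fitted ARMA model.

```python
import numpy as np

def difference(series, d=1):
    """Apply first-order differencing d times (y'_t = y_t - y_{t-1})."""
    return np.diff(np.asarray(series, dtype=float), n=d)

def undifference_once(last_value, diffs):
    """Invert one level of differencing, starting from the last observed value."""
    return last_value + np.cumsum(diffs)

# Toy series with a quadratic trend (hypothetical data).
y = np.array([3.0, 5.0, 8.0, 12.0, 17.0])

d1 = difference(y, d=1)   # first-order differences: 2, 3, 4, 5
d2 = difference(y, d=2)   # second-order differences: 1, 1, 1 (constant)

# Suppose some ARMA model forecasts the next first-order difference as 6.0
# (an assumed value). Mapping it back to the original scale:
next_y = undifference_once(y[-1], np.array([6.0]))   # 17 + 6 = 23
```

In this sketch an ARIMA(p, d, q) forecast would amount to fitting ARMA(p, q) on the output of `difference(y, d)` and then inverting the differencing d times to return to the original scale.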

14.3.3 Multivariate Forecasting with Hidden Variables

All the aforementioned models are designed for a single time series. In practice, a given application may have thousands of time series, and there may be significant correlations both across different series and across time. Therefore, models are required that can combine the autoregressive correlations with the cross-series correlations for making forecasts.

Although multivariate forecasting can be approached in many different ways, hidden variables are often used to achieve this goal. This is because the hidden variable approach is able to cleanly separate the cross-series correlations from the autoregressive correlations in the modeling process. The idea in hidden variable modeling is to transform the large number of cross-correlated time series into a small number of uncorrelated time series. Typically, principal component analysis (PCA) is used for this transformation. Because these hidden series are uncorrelated with one another, it is possible to apply any of the AR, ARMA, or ARIMA models to each series individually to predict the hidden values. Then, the predicted values are mapped back to their original representation. This provides the forecasted values for all the different series from a small number of hidden variable predictions. Readers are advised to revisit Sect. 2.4.3.1 of Chap. 2 for the discussion on PCA before reading further.
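The overall idea can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the book's implementation: the function names (`fit_ar`, `predict_next_ar`, `forecast_next`) are hypothetical, the data are mean-centered before projection (a common PCA convention that is assumed here), a plain least-squares AR model stands in for "any of the AR, ARMA, or ARIMA models", and the input is synthetic noise-free data built from two hidden sinusoids.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares fit of an AR(order) model with an intercept term."""
    n = len(series)
    # Column i holds the lag-(i+1) values aligned with targets series[order:].
    lagged = [series[order - i - 1 : n - i - 1] for i in range(order)]
    X = np.column_stack(lagged + [np.ones(n - order)])
    w, *_ = np.linalg.lstsq(X, series[order:], rcond=None)
    return w[:-1], w[-1]          # (lag coefficients, intercept)

def predict_next_ar(series, coeffs, intercept):
    """One-step-ahead AR prediction from the most recent values."""
    order = len(coeffs)
    lags = series[-1 : -order - 1 : -1]   # most recent value first
    return float(coeffs @ lags + intercept)

def forecast_next(Y, p, order):
    """Forecast the next d-dimensional point from an n x d array Y."""
    mean = Y.mean(axis=0)                 # centering is an assumption here
    centered = Y - mean
    C = np.cov(centered, rowvar=False)    # d x d cross-series covariance
    eigvals, P = np.linalg.eigh(C)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:p]   # indices of the p largest
    P_trunc = P[:, top]                   # d x p basis
    Z = centered @ P_trunc                # n x p hidden series
    z_next = np.array([
        predict_next_ar(Z[:, j], *fit_ar(Z[:, j], order)) for j in range(p)
    ])
    return mean + P_trunc @ z_next        # map back to the d original series

# Synthetic example: 6 observed series driven by 2 hidden sinusoids.
t = np.arange(201)
H = np.column_stack([np.sin(0.5 * t), np.cos(1.3 * t)])
M = np.random.default_rng(0).normal(size=(2, 6))   # hypothetical mixing matrix
Y_all = H @ M

forecast = forecast_next(Y_all[:200], p=2, order=4)
max_error = np.max(np.abs(forecast - Y_all[200]))  # error on the held-out point
```

Only p univariate models are fitted, regardless of how many observed series there are. On this noise-free toy data the held-out error should be close to zero, because a mixture of two sinusoids plus a constant follows an exact fourth-order linear recurrence with an intercept. Real data would not be forecast this cleanly.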

It is assumed that there are d synchronized time series of length n. The d different time series values received at the ith timestamp are denoted by Y_i = (y_i^1 . . . y_i^d). The goal is to predict Y_{n+1} from Y_1 . . . Y_n. The steps of the multivariate forecasting approach are as follows:

Construct the d × d covariance matrix of the multidimensional time series. Let the d × d covariance matrix be denoted by C. The (i, j)th entry of C is the covariance between the ith and jth series. This step is identical to the case of multidimensional data, and the temporal ordering among the different values of Y_i is not used at this stage. Thus, the covariance matrix only captures information about correlations across series, rather than correlations across time. Note that covariance matrices can also be maintained incrementally in the streaming setting, using an approach discussed in Sect. 20.3.1.4 of Chap. 20.

Determine the eigenvectors of the covariance matrix C as follows:

$$C = P \Lambda P^T \qquad (14.17)$$

Here, P is a d × d matrix, whose d columns contain the orthonormal eigenvectors. The matrix Λ is a diagonal matrix containing the eigenvalues. Let P_truncated be a d × p matrix obtained by selecting the p ≪ d columns of P corresponding to the largest eigenvalues. Typically, the value of p is much smaller than d. This represents a basis for the hidden series with the greatest variability.

A new multivariate time series with p hidden time series variables is created. Each d-dimensional time series data point Y_i at the ith timestamp is expressed in terms of a p-dimensional hidden series data point. This is achieved by using the p basis vectors
