Data Mining: The Textbook




$$y_t = \sum_{i=1}^{p} a_i \cdot y_{t-i} + c + \epsilon_t \tag{14.15}$$

A model that uses the preceding window of length p is referred to as an AR(p) model. The values of the regression coefficients a1, . . . , ap, c need to be learned from the training data. The larger the value of p, the greater the lag that one is willing to incorporate in the autocorrelations. The choice of p should be guided by the level of autocorrelation in Eq. 14.14. Because the autocorrelation often decreases with increasing values of the lag L, a value of p should be selected so that the autocorrelation at lag L = p is small. In such cases, increasing the regression window further may not improve the accuracy of the modeling process, and may sometimes result in overfitting. Typically, the autocorrelation plot (Fig. 14.4) is used to identify the window.











[Figure 14.4: Autocorrelation plots for various series. (a) IBM stock: autocorrelation versus lag (0 to 250). (b) Sine wave: autocorrelation versus lag in degrees (0 to 1000). The vertical axis in both panels shows the autocorrelation, ranging from −1 to 1.]


Instead of using a window of coefficients as in Eq. 14.15, it is also possible to select coefficients with specific lag values. In particular, lag values with high absolute autocorrelation in the autocorrelation plot may be selected. Such an approach is also helpful for forecasting periodic series.
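For concreteness, the following Python sketch computes the sample autocorrelation at each lag and picks the window size p as the smallest lag at which the autocorrelation becomes small. The threshold of 0.2 is an arbitrary illustrative cutoff, and the function names are assumptions for this sketch rather than anything prescribed by the text.

```python
import numpy as np

def autocorrelation(y, lag):
    """Sample autocorrelation of series y at the given lag (cf. Eq. 14.14)."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    # Covariance between the series and its lag-shifted copy,
    # normalized by the overall variance.
    return np.dot(y[:-lag], y[lag:]) / np.dot(y, y)

def choose_window(y, max_lag=50, threshold=0.2):
    """Smallest lag at which the autocorrelation falls below the threshold."""
    for lag in range(1, max_lag + 1):
        if abs(autocorrelation(y, lag)) < threshold:
            return lag
    return max_lag
```

For a periodic series, one might instead retain only the individual lags whose absolute autocorrelation exceeds the threshold, in line with the lag-selection approach described above.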

Each timestamp in the past history of the time series creates a linear equation between the time series variables. A set of linear equations in the coefficients can be created by using the value at each timestamp in the training data, along with its immediately preceding window of length p. When the number of available timestamps is much larger than p, this is an over-determined system of equations with no exact solution, and any particular solution will have an error associated with it. The coefficients a1, . . . , ap, c can be approximated with least-squares regression, which minimizes the squared error of the over-determined system (cf. Sect. 11.5 of Chap. 11). Note that the model can be used effectively for forecasting future values only if the key properties of the time series, such as the mean, variance, and autocorrelation, do not change significantly with time. Many off-the-shelf commercial solvers are available for these models. The effectiveness of the forecasting model may be quantified by using the level of noise in the fitted model. Specifically, the R2-value, which is also referred to as the coefficient of determination, compares the white noise to the series variance:










$$R^2 = 1 - \frac{\text{Mean}_t(\epsilon_t^2)}{\text{Variance}_t(y_t)} \tag{14.16}$$




The coefficient of determination quantifies the fraction of variability in the series that is explained by the regression, as opposed to random noise. It is therefore desirable for this coefficient to be as close to 1 as possible.
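A minimal sketch of this least-squares procedure, assuming a plain numpy fit (the text does not prescribe a particular solver, and the function name fit_ar is an illustrative choice):

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model: y_t = sum_i a_i * y_{t-i} + c.

    Returns the coefficients a_1..a_p, the constant c, and the R^2
    value of Eq. 14.16.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # One row per timestamp t = p..n-1: the p preceding values,
    # plus a column of ones for the constant term c.
    X = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)]
                        + [np.ones(n - p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    a, c = coef[:-1], coef[-1]
    residuals = target - X @ coef
    r2 = 1.0 - np.mean(residuals ** 2) / np.var(target)
    return a, c, r2
```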


14.3.2 Autoregressive Moving Average Models


While autocorrelation is a useful predictive property of time series, it does not always explain all the variations. In fact, the unexpected component of the variations (shocks) does impact future values of the time series. This component can be captured with the use of a moving average model (MA). The autoregressive model can therefore be made more robust by combining it with a moving average model. Before discussing the combined autoregressive moving average model (ARMA), the moving average model will be introduced.




The moving average model predicts subsequent series values on the basis of the past history of deviations from predicted values. A deviation from a predicted value can be viewed as white noise, or a shock. This model is best used in scenarios where the behavioral attribute value at a timestamp is dependent on the history of shocks in the time series, rather than the actual series values. The moving average model is defined as follows:




$$y_t = \sum_{i=1}^{q} b_i \cdot \epsilon_{t-i} + c + \epsilon_t$$

The aforementioned model is also referred to as MA(q). The parameter c is the mean of the time series. The values of b1 . . . bq are the coefficients that need to be learned from the data. The moving average model is quite different from the autoregressive model, in that it relates the current value to the mean of the series and the previous history of deviations from forecasts, rather than to the actual values. Here, the values of εt are assumed to be white noise error terms that are uncorrelated with one another. A complication is that the error terms εt are not part of the observed data, but must themselves be derived from the forecasting model. This circularity implies that the system of equations is inherently nonlinear when expressed purely in terms of the coefficients and the observed values yi. Typically, iterative nonlinear fitting procedures, rather than the linear least-squares approach, are used to determine a solution to the moving average model. It is rare that the series values can be predicted in terms of only the shocks, and not the autocorrelations. Autocorrelations are extremely important in time series analysis because of the inherent temporal continuity of time series data. At the same time, the history of shocks does impact the future values of the series. Therefore, neither the autoregressive model nor the moving average model can capture all the correlations needed for forecasting in isolation.
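Because of this nonlinearity, MA models are in practice fitted with iterative routines from a library. The sketch below assumes the statsmodels package, whose ARIMA interface treats an MA(q) model as order (0, 0, q); the synthetic series and all parameter values are purely illustrative.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic MA(2) series: y_t = 10 + e_t + 0.6*e_{t-1} - 0.3*e_{t-2}
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
y = 10 + e[2:] + 0.6 * e[1:-1] - 0.3 * e[:-2]

# order=(p, d, q) = (0, 0, 2): no autoregressive terms, no differencing,
# two moving average terms; fitting is iterative (maximum likelihood).
fit = ARIMA(y, order=(0, 0, 2)).fit()
print(fit.params)             # estimated mean c and coefficients b_1, b_2
print(fit.forecast(steps=5))  # forecast the next five values
```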


A more general model may be obtained by combining the power of both the autoregressive model and the moving average model. The idea is to learn the appropriate impact of both the autocorrelations and the shocks in predicting time series values. The two models can be combined with p autoregressive terms and q moving average terms. This model is referred to as the ARMA model. In this case, the relationships between the different terms may be expressed as follows:





$$y_t = \sum_{i=1}^{p} a_i \cdot y_{t-i} + \sum_{i=1}^{q} b_i \cdot \epsilon_{t-i} + c + \epsilon_t$$

The aforementioned model is the ARMA(p, q) model. A key question here is the choice of the parameters p and q. If the values of p and q are set too small, then the model will not fit the data well. On the other hand, if they are set too large, then the model is likely to overfit the data. In general, it is advisable to select the smallest values of p and q for which the model still fits the data well. As in the previous case, autoregressive moving average models are best used with stationary data.
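One common heuristic for keeping p and q small, not prescribed by the text, is to search a small grid of (p, q) values and retain the pair with the lowest Akaike information criterion (AIC), which penalizes model complexity. A sketch, again assuming statsmodels:

```python
from itertools import product
from statsmodels.tsa.arima.model import ARIMA

def select_arma_order(y, max_p=3, max_q=3):
    """Return the (p, q) pair with the lowest AIC over a small grid."""
    best_order, best_aic = None, float("inf")
    for p, q in product(range(max_p + 1), range(max_q + 1)):
        if p == 0 and q == 0:
            continue  # skip the degenerate constant-only model
        fit = ARIMA(y, order=(p, 0, q)).fit()
        if fit.aic < best_aic:
            best_order, best_aic = (p, q), fit.aic
    return best_order
```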


In many cases, nonstationary data can be addressed by combining differencing with the autoregressive moving average model. This results in the autoregressive integrated moving average (ARIMA) model. In principle, differences of any order may be used, although first- and second-order differences are most commonly used. Consider the case where the first-order differenced value y′t is used. Then, the ARIMA model can be expressed as follows:

$$y_t' = \sum_{i=1}^{p} a_i \cdot y_{t-i}' + \sum_{i=1}^{q} b_i \cdot \epsilon_{t-i} + c + \epsilon_t$$
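The sketch below illustrates the idea, again assuming statsmodels: differencing manually and fitting an ARMA model on the result corresponds to setting d = 1 in the ARIMA order. The random-walk series and the order (1, 1, 1) are illustrative choices.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# A nonstationary series: a random walk with drift.
rng = np.random.default_rng(1)
y = np.cumsum(0.5 + rng.standard_normal(400))

# Option 1: difference manually, then fit an ARMA model on the
# (now stationary) differenced series.
y_diff = np.diff(y)
arma_fit = ARIMA(y_diff, order=(1, 0, 1)).fit()

# Option 2: let the model difference internally; setting d = 1 in
# order=(p, d, q) makes this an ARIMA(1, 1, 1) model.
arima_fit = ARIMA(y, order=(1, 1, 1)).fit()
print(arima_fit.forecast(steps=3))  # forecasts on the original scale
```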



