Data Mining: The Textbook




$$y_t = \sum_{i=1}^{p} a_i \cdot y_{t-i} + c + \epsilon_t \tag{14.15}$$

A model that uses the preceding window of length p is referred to as an AR(p) model. The values of the regression coefficients a1, . . . , ap, c need to be learned from the training data. The larger the value of p, the greater the lag that one is willing to incorporate in the autocorrelations. The choice of p should be guided by the level of autocorrelation in Eq. 14.14. Because the autocorrelation often decreases with increasing values of the lag L, a value of p should be selected so that the autocorrelation at lag L = p is small. In such cases, increasing the regression window further may not improve the accuracy of the modeling process, and may sometimes result in overfitting. Typically, the autocorrelation plot (Fig. 14.4) is used to identify the window.











[Figure 14.4: Autocorrelation plots for various series. (a) IBM stock: autocorrelation versus lag (0 to 250). (b) Sine wave: autocorrelation versus lag in degrees (0 to 1000). The vertical axis in both panels shows the autocorrelation, ranging from −1 to 1.]


Instead of using a window of coefficients as in Eq. 14.15, it is also possible to select coefficients with specific lag values. In particular, lag values with high absolute autocorrelation in the autocorrelation plot may be selected. Such an approach is also helpful for forecasting periodic series.
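For concreteness, the following Python sketch computes the sample autocorrelation at each lag and picks the window size p as the smallest lag at which the autocorrelation becomes small. The threshold of 0.2 is an arbitrary illustrative cutoff, and the function names are assumptions for this sketch rather than anything prescribed by the text.

```python
import numpy as np

def autocorrelation(y, lag):
    """Sample autocorrelation of series y at the given lag (cf. Eq. 14.14)."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    # Covariance between the series and its lag-shifted copy,
    # normalized by the overall variance.
    return np.dot(y[:-lag], y[lag:]) / np.dot(y, y)

def choose_window(y, max_lag=50, threshold=0.2):
    """Smallest lag at which the autocorrelation falls below the threshold."""
    for lag in range(1, max_lag + 1):
        if abs(autocorrelation(y, lag)) < threshold:
            return lag
    return max_lag
```

For a periodic series, one might instead retain only the individual lags whose absolute autocorrelation exceeds the threshold, in line with the lag-selection approach described above.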

Each timestamp in the past history of the time series creates a linear equation between the time series variables. A set of linear equations in the coefficients can be created by using the value at each timestamp in the training data, along with its immediately preceding window of length p. When the number of available timestamps is much larger than p, this is an over-determined system of equations with no exact solution, and any particular solution will have an error associated with it. The coefficients a1, . . . , ap, c can be approximated with least-squares regression, which minimizes the squared error of the over-determined system (cf. Sect. 11.5 of Chap. 11). Note that the model can be used effectively for forecasting future values only if the key properties of the time series, such as the mean, variance, and autocorrelation, do not change significantly with time. Many off-the-shelf commercial solvers are available for these models. The effectiveness of the forecasting model may be quantified by using the level of noise in the fitted model. Specifically, the R2-value, which is also referred to as the coefficient of determination, compares the white noise to the series variance:










$$R^2 = 1 - \frac{\text{Mean}_t(\epsilon_t^2)}{\text{Variance}_t(y_t)} \tag{14.16}$$




The coefficient of determination quantifies the fraction of variability in the series that is explained by the regression, as opposed to random noise. It is therefore desirable for this coefficient to be as close to 1 as possible.
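A minimal sketch of this least-squares procedure, assuming a plain numpy fit (the text does not prescribe a particular solver, and the function name fit_ar is an illustrative choice):

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of an AR(p) model: y_t = sum_i a_i * y_{t-i} + c.

    Returns the coefficients a_1..a_p, the constant c, and the R^2
    value of Eq. 14.16.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # One row per timestamp t = p..n-1: the p preceding values,
    # plus a column of ones for the constant term c.
    X = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)]
                        + [np.ones(n - p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    a, c = coef[:-1], coef[-1]
    residuals = target - X @ coef
    r2 = 1.0 - np.mean(residuals ** 2) / np.var(target)
    return a, c, r2
```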


14.3.2 Autoregressive Moving Average Models


While autocorrelation is a useful predictive property of time series, it does not always explain all the variations. In fact, the unexpected component of the variations (shocks) does impact future values of the time series. This component can be captured with the use of a moving average model (MA). The autoregressive model can therefore be made more robust by combining it with a moving average model. Before discussing the combined autoregressive moving average model (ARMA), the moving average model will be introduced.




The moving average model predicts subsequent series values on the basis of the past history of deviations from predicted values. A deviation from a predicted value can be viewed as white noise, or a shock. This model is best used in scenarios where the behavioral attribute value at a timestamp is dependent on the history of shocks in the time series, rather than the actual series values. The moving average model is defined as follows:




$$y_t = \sum_{i=1}^{q} b_i \cdot \epsilon_{t-i} + c + \epsilon_t$$

The aforementioned model is also referred to as MA(q). The parameter c is the mean of the time series. The values of b1 . . . bq are the coefficients that need to be learned from the data. The moving average model is quite different from the autoregressive model, in that it relates the current value to the mean of the series and the previous history of deviations from forecasts, rather than to the actual values. Here, the values of εt are assumed to be white noise error terms that are uncorrelated with one another. A complication is that the error terms εt are not part of the observed data, but must themselves be derived from the forecasting model. This circularity implies that the system of equations is inherently nonlinear when expressed purely in terms of the coefficients and the observed values yi. Typically, iterative nonlinear fitting procedures, rather than the linear least-squares approach, are used to determine a solution to the moving average model. It is rare that the series values can be predicted in terms of only the shocks, and not the autocorrelations. Autocorrelations are extremely important in time series analysis because of the inherent temporal continuity of time series data. At the same time, the history of shocks does impact the future values of the series. Therefore, neither the autoregressive model nor the moving average model can capture all the correlations needed for forecasting in isolation.
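Because of this nonlinearity, MA models are in practice fitted with iterative routines from a library. The sketch below assumes the statsmodels package, whose ARIMA interface treats an MA(q) model as order (0, 0, q); the synthetic series and all parameter values are purely illustrative.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic MA(2) series: y_t = 10 + e_t + 0.6*e_{t-1} - 0.3*e_{t-2}
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
y = 10 + e[2:] + 0.6 * e[1:-1] - 0.3 * e[:-2]

# order=(p, d, q) = (0, 0, 2): no autoregressive terms, no differencing,
# two moving average terms; fitting is iterative (maximum likelihood).
fit = ARIMA(y, order=(0, 0, 2)).fit()
print(fit.params)             # estimated mean c and coefficients b_1, b_2
print(fit.forecast(steps=5))  # forecast the next five values
```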


A more general model may be obtained by combining the power of both the autoregressive model and the moving average model. The idea is to learn the appropriate impact of both the autocorrelations and the shocks in predicting time series values. The two models can be combined with p autoregressive terms and q moving average terms. This model is referred to as the ARMA model. In this case, the relationships between the different terms may be expressed as follows:





$$y_t = \sum_{i=1}^{p} a_i \cdot y_{t-i} + \sum_{i=1}^{q} b_i \cdot \epsilon_{t-i} + c + \epsilon_t$$

The aforementioned model is the ARMA(p, q) model. A key question here is the choice of the parameters p and q. If the values of p and q are set too small, then the model will not fit the data well. On the other hand, if they are set too large, then the model is likely to overfit the data. In general, it is advisable to select the smallest values of p and q for which the model still fits the data well. As in the previous case, autoregressive moving average models are best used with stationary data.
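One common heuristic for keeping p and q small, not prescribed by the text, is to search a small grid of (p, q) values and retain the pair with the lowest Akaike information criterion (AIC), which penalizes model complexity. A sketch, again assuming statsmodels:

```python
from itertools import product
from statsmodels.tsa.arima.model import ARIMA

def select_arma_order(y, max_p=3, max_q=3):
    """Return the (p, q) pair with the lowest AIC over a small grid."""
    best_order, best_aic = None, float("inf")
    for p, q in product(range(max_p + 1), range(max_q + 1)):
        if p == 0 and q == 0:
            continue  # skip the degenerate constant-only model
        fit = ARIMA(y, order=(p, 0, q)).fit()
        if fit.aic < best_aic:
            best_order, best_aic = (p, q), fit.aic
    return best_order
```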


In many cases, nonstationary data can be addressed by combining differencing with the autoregressive moving average model. This results in the autoregressive integrated moving average (ARIMA) model. In principle, differences of any order may be used, although first- and second-order differences are most commonly used. Consider the case where the first-order differenced value y′t is used. Then, the ARIMA model can be expressed as follows:

$$y_t' = \sum_{i=1}^{p} a_i \cdot y_{t-i}' + \sum_{i=1}^{q} b_i \cdot \epsilon_{t-i} + c + \epsilon_t$$
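The sketch below illustrates the idea, again assuming statsmodels: differencing manually and fitting an ARMA model on the result corresponds to setting d = 1 in the ARIMA order. The random-walk series and the order (1, 1, 1) are illustrative choices.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# A nonstationary series: a random walk with drift.
rng = np.random.default_rng(1)
y = np.cumsum(0.5 + rng.standard_normal(400))

# Option 1: difference manually, then fit an ARMA model on the
# (now stationary) differenced series.
y_diff = np.diff(y)
arma_fit = ARIMA(y_diff, order=(1, 0, 1)).fit()

# Option 2: let the model difference internally; setting d = 1 in
# order=(p, d, q) makes this an ARIMA(1, 1, 1) model.
arima_fit = ARIMA(y, order=(1, 1, 1)).fit()
print(arima_fit.forecast(steps=3))  # forecasts on the original scale
```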



