Data Mining: The Textbook




14.3. TIME SERIES FORECASTING









[Figure 14.5, panel (a) Correlated stock prices: relative stock prices of the Gold ETF (GLD), Silver ETF (SLV), Platinum ETF (PPLT), and Gold Miner ETF (GDX), plotted against the number of trading days. Panel (b) Uncorrelated hidden variables: hidden series 1 and hidden series 2, each plotted against the number of trading days.]

Figure 14.5: Normalized prices of four precious metal exchange traded funds (ETFs) from September 5, 2013 to September 4, 2014 and corresponding uncorrelated hidden variables


derived in the previous step. Therefore, the $p$-dimensional hidden value $Z_i = (z_i^1 \ldots z_i^p)$ is obtained as follows:

$$Z_i = Y_i P_{\text{truncated}} \qquad (14.18)$$




The value of $Z_i$ represents the $p$ different values of the hidden series variables at the $i$th timestamp. Thus, this step creates $p$ different hidden variable time series that are approximately independent of one another. Note that the other $(d - p)$ hidden variables in $Y_i P$ are approximately constant over time because of their small eigenvalues (variance). The means of these $(d - p)$ approximately constant values are noted as well. No predictive modeling is required for the vast majority of these hidden variables with constant values.

In Fig. 14.5a, the stock prices of four precious metal-related exchange traded funds (ETFs) are illustrated for a period of 1 year. Each series was multiplicatively scaled to a relative value starting at 1. The top two hidden variable series are illustrated in Fig. 14.5b. Note that these derived series are uncorrelated and the first hidden variable has much higher variance than the second. The remaining two hidden variables are not shown because their variance is even smaller. In fact, each of the four correlated series in Fig. 14.5a can be approximately expressed as a different linear combination of the two hidden-variable series in Fig. 14.5b. Therefore, forecasting the hidden variables yields approximate forecasts of the original series.



  4. For each of the $p$ uncorrelated and high-variance series, use any univariate forecasting model to predict the values of the $p$ hidden variables at the $(n + 1)$th timestamp. A univariate approach can be used effectively because the different hidden variables are uncorrelated by design. This provides a set of values $Z_{n+1} = (z_{n+1}^1 \ldots z_{n+1}^p)$. Append the means of the approximately constant values of the remaining $(d - p)$ hidden series to $Z_{n+1}$ to create a new $d$-dimensional hidden variable vector $W_{n+1}$.




  5. Transform the predicted hidden variables $W_{n+1}$ back to the original $d$-dimensional representation by using the reverse transformation. This provides the forecasted values of the original series:

$$Y_{n+1} = W_{n+1} P^T \qquad (14.19)$$
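The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SPIRIT implementation. The function names `forecast_ar1` and `hidden_variable_forecast` are hypothetical. The AR(1) least-squares forecaster is only one possible choice of the "any univariate forecasting model" mentioned in the text. The data are synthetic rather than the ETF prices of Fig. 14.5. The covariance and eigenvector computation at the start stands in for the earlier steps that Eq. 14.18 refers to.

```python
import numpy as np

def forecast_ar1(series):
    """One-step AR(1) forecast with intercept, fitted by least squares.
    Illustrative choice only; any univariate forecaster could be used."""
    x, y = series[:-1], series[1:]
    A = np.column_stack([x, np.ones_like(x)])
    (a, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a * series[-1] + c

def hidden_variable_forecast(Y, p, forecaster=forecast_ar1):
    """Forecast row n+1 of the n x d matrix Y (rows are timestamps)
    by forecasting only the p highest-variance hidden series."""
    C = np.cov(Y, rowvar=False)               # d x d covariance matrix
    eigvals, P = np.linalg.eigh(C)            # orthonormal eigenvectors in columns
    P = P[:, np.argsort(eigvals)[::-1]]       # sort columns by decreasing eigenvalue
    hidden = Y @ P                            # all d hidden series; first p columns are Eq. 14.18
    z_next = np.array([forecaster(hidden[:, j]) for j in range(p)])
    rest_means = hidden[:, p:].mean(axis=0)   # means of the (d - p) near-constant series
    w_next = np.concatenate([z_next, rest_means])
    return w_next @ P.T                       # back-transform, Eq. 14.19

# Synthetic example (assumed data): 4 correlated series driven by 2 latent random walks.
rng = np.random.default_rng(0)
n, d = 250, 4
latent = np.cumsum(rng.normal(scale=0.01, size=(n, 2)), axis=0)
mixing = rng.normal(size=(2, d))
Y = 1.0 + latent @ mixing + rng.normal(scale=1e-3, size=(n, d))

y_next = hidden_variable_forecast(Y, p=2)
print(y_next.shape)
```

Only two univariate models are fitted here even though four series are forecast. With `p = d` and a forecaster that simply repeats the last value, the back-transformation returns the last observed row exactly, which is a convenient sanity check that the projection and its reverse are consistent.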




CHAPTER 14. MINING TIME SERIES DATA






[Figure 14.6 plot: value versus time index, with the repeated motifs marked.]

Figure 14.6: Repeated motif in a single time series


The aforementioned description is a simplified version of the SPIRIT framework. It reduces the computational effort of prediction because simplified univariate modeling is performed only on a small number $p \ll d$ of independent time series. On the other hand, it does incur the overhead of computing eigenvectors. Each hidden-variable series is a linear combination of many different series. Therefore, the noise effects of individual series are often smoothed out within the hidden variables, which increases the robustness of the forecasting process.
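A short derivation shows why univariate models suffice for the hidden series. It is a sketch that assumes the covariance matrix of the original series, written here as $C$, has the eigendecomposition $C = P \Lambda P^T$ with orthonormal eigenvectors in the columns of $P$, and that $P_{\text{truncated}}$ holds the $p$ columns with the largest eigenvalues $\lambda_1 \geq \ldots \geq \lambda_p$. The symbols $C$, $\Lambda$, $\Lambda_p$, and $\lambda_j$ are assumed notation, not defined in this section. Because the hidden values are obtained as $Z_i = Y_i P_{\text{truncated}}$, their covariance matrix is:

```latex
\begin{aligned}
\operatorname{Cov}(Z) &= P_{\text{truncated}}^{T} \, C \, P_{\text{truncated}} \\
                      &= P_{\text{truncated}}^{T} \, P \, \Lambda \, P^{T} \, P_{\text{truncated}} \\
                      &= \Lambda_p = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)
\end{aligned}
```

The off-diagonal entries are zero, so the hidden series are pairwise uncorrelated, and their variances are the $p$ largest eigenvalues. Zero correlation is weaker than statistical independence in general, which is consistent with the hidden series being described as only approximately independent.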

14.4 Time Series Motifs


A motif is a frequently occurring pattern or shape in the time series. Motif discovery can be formulated in a wide variety of ways, depending on application-specific requirements. These different formulations vary in terms of the input data and the nature of the motifs discovered. These variations are as follows:





