Data Mining: The Textbook





14.3. TIME SERIES FORECASTING


[Figure 14.5a plot, "(a) Correlated stock prices": RELATIVE STOCK PRICES (y-axis) against NUMBER OF TRADING DAYS (x-axis); legend: GOLD ETF (GLD), SILVER ETF (SLV), PLATINUM ETF (PPLT), GOLD MINER ETF (GDX)]

[Figure 14.5b plot, "(b) Uncorrelated hidden variables": HIDDEN SERIES 1 and HIDDEN SERIES 2 (y-axes) against NUMBER OF TRADING DAYS (x-axis)]

Figure 14.5: Normalized prices of four precious metal exchange traded funds (ETFs) from September 5, 2013 to September 4, 2014 and corresponding uncorrelated hidden variables

derived in the previous step. Therefore, the p-dimensional hidden value Z_i = (z_i^1 . . . z_i^p) is obtained as follows:

	Z_i = Y_i P_{truncated}	(14.18)

The value of Z_i represents the p different values for the hidden series variables at the ith timestamp. Thus, this step creates p different hidden variable time series that are approximately independent of one another. Note that the other (d − p) hidden variables in Y_i P are approximately constant over time because of their small eigenvalues (variance). The means of these (d − p) approximately constant values are noted as well. No predictive modeling is required for the vast majority of these hidden variables with constant values.

In Fig. 14.5a, the stock prices of four precious metal-related exchange traded funds (ETFs) are illustrated for a period of 1 year. Each series was multiplicatively scaled to a relative value starting at 1. The top two hidden variable series are illustrated in Fig. 14.5b. Note that these derived series are uncorrelated and the first hidden variable has much higher variance than the second. The remaining two hidden variables are not shown because their variance is even smaller. In fact, each of the four correlated series in Fig. 14.5a can be approximately expressed as a different linear combination of the two hidden-variable series in Fig. 14.5b. Therefore, forecasting the hidden variables yields approximate forecasts of the original series.
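The projection of Eq. 14.18 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the SPIRIT implementation: the function name `to_hidden_series`, the synthetic random-walk data, and the choice p = 2 are assumptions made only for the example, and the data are not the ETF prices of Fig. 14.5.

```python
import numpy as np

def to_hidden_series(Y, p):
    """Map an n x d series matrix Y to its top-p hidden series (Eq. 14.18)."""
    C = np.cov(Y, rowvar=False)               # d x d covariance of the d series
    eigvals, P = np.linalg.eigh(C)            # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    eigvals, P = eigvals[order], P[:, order]
    Z = Y @ P[:, :p]                          # Z_i = Y_i P_truncated, one row per timestamp
    rest_means = (Y @ P[:, p:]).mean(axis=0)  # means of the (d - p) near-constant series
    return Z, P, eigvals, rest_means

# Synthetic stand-in data (an assumption): d = 4 correlated series
# driven by p = 2 random walks plus a small amount of noise.
rng = np.random.default_rng(0)
n, d, p = 250, 4, 2
drivers = np.cumsum(rng.normal(size=(n, p)), axis=0)
mixing = rng.normal(size=(p, d))
Y = 1.0 + 0.01 * drivers @ mixing + 0.0005 * rng.normal(size=(n, d))

Z, P, eigvals, rest_means = to_hidden_series(Y, p)
cov_Z = np.cov(Z, rowvar=False)  # off-diagonal entries are numerically zero
```

Because the columns of P are eigenvectors of the covariance matrix, the covariance of the hidden series is diagonal, which is the sense in which the derived series are uncorrelated.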

For each of the p uncorrelated and high-variance series, use any univariate forecasting model to predict the values of the p hidden variables at the (n + 1)th timestamp. A univariate approach can be used effectively because the different hidden variables are uncorrelated by design. This provides a set of values Z_{n+1} = (z_{n+1}^1 . . . z_{n+1}^p). Append the means of the approximately constant values of the remaining (d − p) hidden series to Z_{n+1} to create a new d-dimensional hidden variable vector W_{n+1}.
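As a rough sketch of this step, the code below forecasts each hidden series independently and then appends the stored means. The text allows any univariate model; an AR(1) model with intercept, fitted by least squares, is used here purely as a stand-in. The function names, the toy hidden series, and the `rest_means` values are hypothetical.

```python
import numpy as np

def ar1_forecast(z):
    """One-step forecast from z_t = a + b * z_{t-1}, fitted by least squares."""
    X = np.column_stack([np.ones(len(z) - 1), z[:-1]])
    (a, b), *_ = np.linalg.lstsq(X, z[1:], rcond=None)
    return a + b * z[-1]

def forecast_hidden_vector(Z, rest_means):
    """Build W_{n+1}: p univariate forecasts followed by the (d - p) stored means."""
    z_next = np.array([ar1_forecast(Z[:, j]) for j in range(Z.shape[1])])
    return np.concatenate([z_next, rest_means])

# Toy hidden series (assumption): one exact AR(1) recursion and one linear trend.
z1 = np.empty(50)
z1[0] = 1.0
for i in range(1, 50):
    z1[i] = 0.5 + 0.8 * z1[i - 1]
z2 = 0.1 * np.arange(50)
Z = np.column_stack([z1, z2])

rest_means = np.array([0.3, -0.2])  # hypothetical means of the near-constant series
W_next = forecast_hidden_vector(Z, rest_means)
```

Each column is modeled on its own, which is only reasonable because the hidden series are uncorrelated by construction. Applying the same per-column loop to the original correlated series would ignore the dependencies between them.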

Transform the predicted hidden variables W_{n+1} back to the original d-dimensional representation by using the reverse transformation. This provides the forecasted values of the original series:

	Y_{n+1} = W_{n+1} P^T	(14.19)
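The reverse transformation of Eq. 14.19 is a single matrix product, because the eigenvector matrix P is orthonormal and its transpose is therefore its inverse. The sketch below is illustrative only: the data are synthetic, the helper name `to_original_space` is hypothetical, and the "forecast" of the top p hidden variables is simply their last observed value, used as a placeholder for a real univariate model.

```python
import numpy as np

def to_original_space(W_next, P):
    """Eq. 14.19: map a d-dimensional hidden vector back to the original series."""
    return W_next @ P.T

# Synthetic stand-in data (assumption): 4 series driven by 2 random walks.
rng = np.random.default_rng(1)
n, d, p = 200, 4, 2
drivers = np.cumsum(rng.normal(size=(n, p)), axis=0)
Y = 1.0 + 0.01 * drivers @ rng.normal(size=(p, d)) + 0.0005 * rng.normal(size=(n, d))

# Eigenvectors of the covariance matrix, sorted by descending eigenvalue.
eigvals, P = np.linalg.eigh(np.cov(Y, rowvar=False))
P = P[:, np.argsort(eigvals)[::-1]]

hidden = Y @ P                           # all d hidden series
rest_means = hidden[:, p:].mean(axis=0)  # means of the (d - p) near-constant series

# Placeholder forecast: carry the last observed top-p hidden values forward.
W_next = np.concatenate([hidden[-1, :p], rest_means])
Y_next = to_original_space(W_next, P)

# Replacing the low-variance components by their means loses very little.
max_error = np.max(np.abs(Y_next - Y[-1]))
```

With the placeholder forecast, `Y_next` differs from the last observed row of `Y` only through the components that were replaced by their means, so `max_error` shows how little is lost by not modeling the (d − p) low-variance hidden series.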

CHAPTER 14. MINING TIME SERIES DATA

[Figure 14.6 plot: VALUE (y-axis) against TIME INDEX (x-axis), with the REPEATED MOTIFS marked]

Figure 14.6: Repeated motif in a single time series

The aforementioned description is a simplified version of the SPIRIT framework. It reduces the computational effort of prediction because simplified univariate modeling is performed only on a small number p ≪ d of independent time series. On the other hand, it does incur the overhead of computing eigenvectors. The hidden-variable series is a linear combination of many different series. Therefore, the noise effects of individual series are often smoothed out within the hidden variables, which increases the robustness of the forecasting process.

14.4 Time Series Motifs

A motif is a frequently occurring pattern or shape in the time series. Motif discovery can be formulated in a wide variety of ways, depending on application-specific requirements. These different formulations vary in terms of the input data and the nature of the motifs discovered. These variations are as follows:
