Data Mining: The Textbook




14.10. EXERCISES





2. For the time series of Exercise 1, construct the rolling average series for a window size of 2 units. Compare the results to those obtained in the previous exercise.




3. For the time series of Exercise 1, construct the exponentially smoothed series, with a smoothing parameter α = 0.5. Set the initial smoothed value y_0 to the first point in the series.




4. Implement the binning, moving average, and exponential smoothing methods.
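A minimal NumPy sketch of the three methods in Exercise 4 follows. The function names and the sample series are illustrative (the series of Exercise 1 is not reproduced on this page); the smoothing is seeded with the first point, as in Exercise 3.

```python
import numpy as np

def binning(x, w):
    """Average each non-overlapping window of w values."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // w) * w                      # drop a trailing partial window
    return x[:n].reshape(-1, w).mean(axis=1)

def moving_average(x, w):
    """Rolling (overlapping) average with window size w."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(w) / w, mode="valid")

def exponential_smoothing(x, alpha):
    """y_i = alpha * x_i + (1 - alpha) * y_{i-1}, seeded with y_0 = x_0."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
    return y

# Illustrative series (not the series of Exercise 1):
series = [3, 7, 1, 4, 2, 2, 5, 8]
print(binning(series, 2))
print(moving_average(series, 2))
print(exponential_smoothing(series, alpha=0.5))
```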




5. Consider a series in which consecutive values are related as follows:

   y_{i+1} = y_i · (1 + R_i)    (14.27)

Here R_i is a random variable drawn from [0.01, 0.05]. What transformation would you apply to make this series stationary?
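As an illustration (not part of the exercise statement), the sketch below generates such a series and applies a log-difference transformation: since log y_{i+1} − log y_i = log(1 + R_i), the transformed values fluctuate in a fixed range with no trend. The seed and series length are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
R = rng.uniform(0.01, 0.05, size=n)          # R_i drawn uniformly from [0.01, 0.05]
y = np.empty(n + 1)
y[0] = 1.0
for i in range(n):
    y[i + 1] = y[i] * (1 + R[i])             # Equation 14.27

# Log-differencing: log(y_{i+1}) - log(y_i) = log(1 + R_i), which has no trend.
z = np.diff(np.log(y))
print(z.min(), z.max())                       # stays inside [log 1.01, log 1.05]
```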





6. Consider the series in which y_i is defined as follows:

   y_i = 1 + i + i^2 + R_i    (14.28)

Here R_i is a random variable drawn from [0.01, 0.05]. What transformation would you apply to make this series stationary?
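For comparison, a quadratic trend of this kind can be removed by differencing twice; a short sketch (arbitrary length and seed) is shown below.

```python
import numpy as np

rng = np.random.default_rng(0)
i = np.arange(200)
R = rng.uniform(0.01, 0.05, size=i.size)
y = 1 + i + i**2 + R                          # Equation 14.28

# One difference leaves a linear trend (2i + noise); a second difference
# removes the quadratic trend, leaving 2 plus a bounded noise term.
z = np.diff(y, n=2)
print(z[:5])
```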





7. For a real-valued time series x_0 ... x_{n−1} with Fourier coefficients X_0 ... X_{n−1}, show that X_k + X_{n−k} is real-valued for each k ∈ {1 ... n − 1}.
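A quick numerical spot check of this property (a check, not a proof) can be run with NumPy's FFT, which stores the coefficients in exactly this X_0 ... X_{n−1} order:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(16)   # a real-valued series
X = np.fft.fft(x)                                   # Fourier coefficients X_0 ... X_{n-1}

# For a real series, X_{n-k} is the complex conjugate of X_k,
# so X_k + X_{n-k} = 2 * Re(X_k) has a vanishing imaginary part.
for k in range(1, len(x)):
    assert abs((X[k] + X[-k]).imag) < 1e-9          # X[-k] is X_{n-k}
print("all sums real-valued (up to floating-point error)")
```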




8. Suppose that you wanted to implement the k-means algorithm for a set of time series, and you were given the same subset of complex Fourier coefficients for each dimensionality-reduced series. How would the implementation differ from using k-means on the original time series?
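The essential change is that distance computations and centroid updates operate on complex coefficient vectors, with the real and imaginary parts acting as separate dimensions; the assignment and iteration logic of k-means is otherwise unchanged. A hypothetical sketch of those two pieces (function names are mine) follows.

```python
import numpy as np

def complex_distance(a, b):
    """Euclidean distance between two vectors of complex Fourier coefficients;
    real and imaginary parts contribute exactly like separate dimensions."""
    d = np.asarray(a) - np.asarray(b)
    return np.sqrt((d.real ** 2 + d.imag ** 2).sum())

def complex_centroid(coeff_vectors):
    """Centroid of a cluster: the component-wise mean of the complex vectors."""
    return np.mean(np.asarray(coeff_vectors), axis=0)

# Example with two tiny coefficient vectors:
a = np.array([1 + 2j, 0 - 1j])
b = np.array([2 + 2j, 1 + 0j])
print(complex_distance(a, b), complex_centroid([a, b]))
```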




9. Use Parseval’s theorem and additivity to show that the dot product of two series is proportional to the sum of the dot products of the real parts and the dot products of the imaginary parts of the Fourier coefficients of the two series. What is the proportionality factor?
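The exact factor depends on how the transform is normalized. With the unnormalized forward DFT used by numpy.fft, the factor works out to 1/n, which the spot check below is consistent with (a check, not a derivation); with a 1/√n-normalized transform the factor would be 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
x, y = rng.standard_normal(n), rng.standard_normal(n)
X, Y = np.fft.fft(x), np.fft.fft(y)

dot_time = np.dot(x, y)
dot_freq = np.dot(X.real, Y.real) + np.dot(X.imag, Y.imag)
print(np.isclose(dot_time, dot_freq / n))    # True: factor 1/n for this convention
```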




10. Implement a shape-based k-nearest neighbor classifier for time series data.
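A minimal sketch of such a classifier, using Euclidean distance on z-normalized series as the notion of shape (dynamic time warping is a common alternative). The names and the toy data are illustrative.

```python
import numpy as np

def znorm(x):
    """Z-normalize a series so comparisons depend on shape, not offset or scale."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-12)

def knn_classify(train_series, train_labels, query, k=1):
    """Assign the majority label among the k nearest training series."""
    dists = [np.linalg.norm(znorm(s) - znorm(query)) for s in train_series]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy example:
train = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8]]
labels = ["up", "down", "up"]
print(knn_classify(train, labels, query=[0, 1, 2, 3], k=1))   # -> "up"
```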




11. Generalize the distance-based motif discovery algorithm discussed in this chapter to the case where motifs are allowed to be of any length in [a, b], and the Manhattan segmental distance is used for distance comparison. The Manhattan segmental distance between a pair of series is the same as the Manhattan distance, except that it divides the distance by the motif length for normalization.
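The normalized distance itself is simple to state in code; the sketch below shows it together with a brute-force matching pass for one motif length, which the full generalization would repeat for every length in [a, b]. The threshold and toy data are arbitrary.

```python
import numpy as np

def manhattan_segmental(a, b):
    """Manhattan distance between two equal-length windows, divided by the length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.abs(a - b).sum() / len(a)

def count_matches(series, motif, eps):
    """Count windows of the series within distance eps of the motif."""
    w = len(motif)
    return sum(
        manhattan_segmental(series[i:i + w], motif) <= eps
        for i in range(len(series) - w + 1)
    )

series = [1, 2, 1, 2, 1, 2, 5, 6]
print(count_matches(series, motif=[1, 2, 1], eps=0.5))   # 2 matching windows
```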




12. Suppose you have a database of N series, and the frequency of each motif is counted so that its occurrence in any series is given a credit of one. Discuss the details of an algorithm that can use wavelets to determine motifs at different resolutions.
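One natural building block for such an algorithm is the pyramid of Haar averages: pairwise averaging halves the series at every level, and motif counting can then be run independently at each resolution, crediting each series at most once per motif. The sketch below shows only the decomposition step (the function name is mine).

```python
import numpy as np

def haar_average_pyramid(x):
    """Return the series together with its successive pairwise-averaged versions,
    i.e. the Haar approximation at each coarser resolution."""
    x = np.asarray(x, dtype=float)
    levels = [x]
    while len(x) >= 2:
        n = (len(x) // 2) * 2                 # drop a trailing odd element
        x = x[:n].reshape(-1, 2).mean(axis=1)
        levels.append(x)
    return levels

for level in haar_average_pyramid([1, 2, 1, 2, 5, 6, 5, 6]):
    print(level)
```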



Chapter 15


Mining Discrete Sequences

“I am above the weakness of seeking to establish a sequence of cause and effect.”—Edgar Allan Poe


15.1 Introduction


Discrete sequence data can be considered the categorical analog of time series data. As in the case of time series data, it contains a single contextual attribute that typically corresponds to time. However, the behavioral attribute is categorical. Some examples of relevant applications are as follows:





