Data Mining: The Textbook




14.10. EXERCISES





2. For the time series of Exercise 1, construct the rolling average series for a window size of 2 units. Compare the results to those obtained in the previous exercise.




3. For the time series of Exercise 1, construct the exponentially smoothed series, with a smoothing parameter α = 0.5. Set the initial smoothed value y_0 to the first point in the series.




4. Implement the binning, moving average, and exponential smoothing methods.
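A minimal NumPy sketch of the three methods in Exercise 4 follows. The function names and the sample series are illustrative (the series of Exercise 1 is not reproduced on this page); the smoothing is seeded with the first point, as in Exercise 3.

```python
import numpy as np

def binning(x, w):
    """Average each non-overlapping window of w values."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // w) * w                      # drop a trailing partial window
    return x[:n].reshape(-1, w).mean(axis=1)

def moving_average(x, w):
    """Rolling (overlapping) average with window size w."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(w) / w, mode="valid")

def exponential_smoothing(x, alpha):
    """y_i = alpha * x_i + (1 - alpha) * y_{i-1}, seeded with y_0 = x_0."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
    return y

# Illustrative series (not the series of Exercise 1):
series = [3, 7, 1, 4, 2, 2, 5, 8]
print(binning(series, 2))
print(moving_average(series, 2))
print(exponential_smoothing(series, alpha=0.5))
```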




5. Consider a series in which consecutive values are related as follows:

   y_{i+1} = y_i · (1 + R_i)    (14.27)

Here R_i is a random variable drawn from [0.01, 0.05]. What transformation would you apply to make this series stationary?
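As an illustration (not part of the exercise statement), the sketch below generates such a series and applies a log-difference transformation: since log y_{i+1} − log y_i = log(1 + R_i), the transformed values fluctuate in a fixed range with no trend. The seed and series length are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
R = rng.uniform(0.01, 0.05, size=n)          # R_i drawn uniformly from [0.01, 0.05]
y = np.empty(n + 1)
y[0] = 1.0
for i in range(n):
    y[i + 1] = y[i] * (1 + R[i])             # Equation 14.27

# Log-differencing: log(y_{i+1}) - log(y_i) = log(1 + R_i), which has no trend.
z = np.diff(np.log(y))
print(z.min(), z.max())                       # stays inside [log 1.01, log 1.05]
```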





6. Consider the series in which y_i is defined as follows:

   y_i = 1 + i + i^2 + R_i    (14.28)

Here R_i is a random variable drawn from [0.01, 0.05]. What transformation would you apply to make this series stationary?
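For comparison, a quadratic trend of this kind can be removed by differencing twice; a short sketch (arbitrary length and seed) is shown below.

```python
import numpy as np

rng = np.random.default_rng(0)
i = np.arange(200)
R = rng.uniform(0.01, 0.05, size=i.size)
y = 1 + i + i**2 + R                          # Equation 14.28

# One difference leaves a linear trend (2i + noise); a second difference
# removes the quadratic trend, leaving 2 plus a bounded noise term.
z = np.diff(y, n=2)
print(z[:5])
```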





7. For a real-valued time series x_0 ... x_{n−1} with Fourier coefficients X_0 ... X_{n−1}, show that X_k + X_{n−k} is real-valued for each k ∈ {1 ... n − 1}.
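A quick numerical spot check of this property (a check, not a proof) can be run with NumPy's FFT, which stores the coefficients in exactly this X_0 ... X_{n−1} order:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(16)   # a real-valued series
X = np.fft.fft(x)                                   # Fourier coefficients X_0 ... X_{n-1}

# For a real series, X_{n-k} is the complex conjugate of X_k,
# so X_k + X_{n-k} = 2 * Re(X_k) has a vanishing imaginary part.
for k in range(1, len(x)):
    assert abs((X[k] + X[-k]).imag) < 1e-9          # X[-k] is X_{n-k}
print("all sums real-valued (up to floating-point error)")
```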




8. Suppose that you wanted to implement the k-means algorithm for a set of time series, and you were given the same subset of complex Fourier coefficients for each dimensionality-reduced series. How would the implementation differ from using k-means on the original time series?
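The essential change is that distance computations and centroid updates operate on complex coefficient vectors, with the real and imaginary parts acting as separate dimensions; the assignment and iteration logic of k-means is otherwise unchanged. A hypothetical sketch of those two pieces (function names are mine) follows.

```python
import numpy as np

def complex_distance(a, b):
    """Euclidean distance between two vectors of complex Fourier coefficients;
    real and imaginary parts contribute exactly like separate dimensions."""
    d = np.asarray(a) - np.asarray(b)
    return np.sqrt((d.real ** 2 + d.imag ** 2).sum())

def complex_centroid(coeff_vectors):
    """Centroid of a cluster: the component-wise mean of the complex vectors."""
    return np.mean(np.asarray(coeff_vectors), axis=0)

# Example with two tiny coefficient vectors:
a = np.array([1 + 2j, 0 - 1j])
b = np.array([2 + 2j, 1 + 0j])
print(complex_distance(a, b), complex_centroid([a, b]))
```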




9. Use Parseval’s theorem and additivity to show that the dot product of two series is proportional to the sum of the dot products of the real parts and the dot products of the imaginary parts of the Fourier coefficients of the two series. What is the proportionality factor?
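The exact factor depends on how the transform is normalized. With the unnormalized forward DFT used by numpy.fft, the factor works out to 1/n, which the spot check below is consistent with (a check, not a derivation); with a 1/√n-normalized transform the factor would be 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
x, y = rng.standard_normal(n), rng.standard_normal(n)
X, Y = np.fft.fft(x), np.fft.fft(y)

dot_time = np.dot(x, y)
dot_freq = np.dot(X.real, Y.real) + np.dot(X.imag, Y.imag)
print(np.isclose(dot_time, dot_freq / n))    # True: factor 1/n for this convention
```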




10. Implement a shape-based k-nearest neighbor classifier for time series data.
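A minimal sketch of such a classifier, using Euclidean distance on z-normalized series as the notion of shape (dynamic time warping is a common alternative). The names and the toy data are illustrative.

```python
import numpy as np

def znorm(x):
    """Z-normalize a series so comparisons depend on shape, not offset or scale."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-12)

def knn_classify(train_series, train_labels, query, k=1):
    """Assign the majority label among the k nearest training series."""
    dists = [np.linalg.norm(znorm(s) - znorm(query)) for s in train_series]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy example:
train = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8]]
labels = ["up", "down", "up"]
print(knn_classify(train, labels, query=[0, 1, 2, 3], k=1))   # -> "up"
```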




11. Generalize the distance-based motif discovery algorithm discussed in this chapter to the case where motifs are allowed to be of any length in [a, b], and the Manhattan segmental distance is used for distance comparison. The Manhattan segmental distance between a pair of series is the same as the Manhattan distance, except that it divides the distance by the motif length for normalization.
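The normalized distance itself is simple to state in code; the sketch below shows it together with a brute-force matching pass for one motif length, which the full generalization would repeat for every length in [a, b]. The threshold and toy data are arbitrary.

```python
import numpy as np

def manhattan_segmental(a, b):
    """Manhattan distance between two equal-length windows, divided by the length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.abs(a - b).sum() / len(a)

def count_matches(series, motif, eps):
    """Count windows of the series within distance eps of the motif."""
    w = len(motif)
    return sum(
        manhattan_segmental(series[i:i + w], motif) <= eps
        for i in range(len(series) - w + 1)
    )

series = [1, 2, 1, 2, 1, 2, 5, 6]
print(count_matches(series, motif=[1, 2, 1], eps=0.5))   # 2 matching windows
```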




12. Suppose you have a database of N series, and the frequency of each motif is counted so that its occurrence in any series is given a credit of one. Discuss the details of an algorithm that can use wavelets to determine motifs at different resolutions.
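One natural building block for such an algorithm is the pyramid of Haar averages: pairwise averaging halves the series at every level, and motif counting can then be run independently at each resolution, crediting each series at most once per motif. The sketch below shows only the decomposition step (the function name is mine).

```python
import numpy as np

def haar_average_pyramid(x):
    """Return the series together with its successive pairwise-averaged versions,
    i.e. the Haar approximation at each coarser resolution."""
    x = np.asarray(x, dtype=float)
    levels = [x]
    while len(x) >= 2:
        n = (len(x) // 2) * 2                 # drop a trailing odd element
        x = x[:n].reshape(-1, 2).mean(axis=1)
        levels.append(x)
    return levels

for level in haar_average_pyramid([1, 2, 1, 2, 5, 6, 5, 6]):
    print(level)
```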



Chapter 15


Mining Discrete Sequences

“I am above the weakness of seeking to establish a sequence of cause and effect.”—Edgar Allan Poe


15.1 Introduction


Discrete sequence data can be considered the categorical analog of time series data. As in the case of time series data, it contains a single contextual attribute that typically corresponds to time. However, the behavioral attribute is categorical. Some examples of relevant applications are as follows:





