Data Mining: The Textbook




Temporal (contextual) attribute scaling: In this case, the series may need to be stretched or compressed along the temporal axis to allow more effective matching. This is referred to as time warping. An additional complication is that different temporal segments of the series may need to be warped differently to allow for better matching. Fig. 3.7 shows the simplest case of warping, in which the entire set of values for stock A has been stretched. In general, time warping can be more complex, with different windows of the same series stretched or compressed by different amounts. This is referred to as dynamic time warping (DTW).




  1. Noncontiguity in matching: Long time series may have noisy segments that do not match very well with one another. For example, one of the series in Fig. 3.7 has a window of dropped readings because of data collection limitations. This is common in sensor data. The distance function may need to be robust to such noise.

Some of these issues can be addressed by attribute normalization during preprocessing.
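The time-warping idea described above can be made concrete with a small sketch. The code below is a minimal illustration, not the text's own formulation: the function name dtw_distance, the absolute-difference cost, and the sample series are all assumptions chosen for this example. It uses the common dynamic-programming recurrence in which each element of one series may be matched to one or more consecutive elements of the other, which has the same effect as repeating elements to stretch a window.

```python
def dtw_distance(x, y):
    """Dynamic time warping cost between two numeric sequences.

    Assumption for this sketch: the local cost is |x_i - y_j| and an
    element may be repeated any number of times (no warping-window limit).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat y[j-1] (stretch y)
                cost[i][j - 1],      # repeat x[i-1] (stretch x)
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# Hypothetical data: b is a uniformly stretched copy of a.
a = [1, 2, 3, 2, 1]
b = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
print(dtw_distance(a, b))  # 0.0, because the stretch is fully absorbed by warping
```

Because the two series have different lengths, a point-by-point comparison is not even defined for them, whereas the warped alignment reports that they have the same shape.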


3.4.1.1 Impact of Behavioral Attribute Normalization


The translation and scaling issues are often easier to address for the behavioral attributes than for the contextual attributes, because they can be handled by normalization during preprocessing:





  1. Behavioral attribute translation: The behavioral attribute is mean centered during preprocessing.




  2. Behavioral attribute scaling: The standard deviation of the behavioral attribute is scaled to 1 unit.
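The two normalization steps listed above can be sketched in a few lines. This is an illustrative sketch only: the function name normalize, its center and scale flags, the use of the population standard deviation (dividing by n), and the sample values are assumptions, not something the text prescribes.

```python
import math


def normalize(series, center=True, scale=True):
    """Optionally mean-center a series and scale it to unit standard deviation.

    Assumption for this sketch: the population standard deviation
    (divide by n) is used; a constant series is left unscaled.
    """
    n = len(series)
    mean = sum(series) / n
    values = [v - mean for v in series] if center else list(series)
    if scale:
        std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
        if std > 0:
            values = [v / std for v in values]
    return values


# Hypothetical data: stock_b equals 10 * stock_a + 10, so the two series
# differ only by translation and scaling of the behavioral attribute.
stock_a = [10, 12, 11, 13]
stock_b = [110, 130, 120, 140]
print(normalize(stock_a))
print(normalize(stock_b))  # matches the line above up to floating-point rounding
```

The center and scale flags are kept separate so that an application can apply only translation, only scaling, both, or neither.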

It is important to remember that these normalization issues may not be relevant to every application. Some applications may require only translation, only scaling, or neither of the two. Other applications may require both. In fact, in some cases, the wrong choice of normalization may have detrimental effects on the interpretability of the results. Therefore, an analyst needs to judiciously select a normalization approach depending on application-specific needs.

Figure 3.8: Illustration of dynamic time warping by repeating elements

3.4.1.2 Lp-Norm


The Lp-norm may be defined for two series $X = (x_1 \ldots x_n)$ and $Y = (y_1 \ldots y_n)$. This measure treats a time series as a multidimensional data point in which each time stamp is a dimension.

$$\mathrm{Dist}(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
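As a small illustration, the Lp-norm between two equal-length series is the p-th root of the sum of |x_i - y_i|^p over all time stamps. The sketch below computes that quantity; the function name lp_distance, the default p = 2, and the sample values are assumptions made for this example.

```python
def lp_distance(x, y, p=2):
    """Lp-norm distance between two series of equal length.

    Each time stamp is treated as one dimension, so the series must be
    aligned one-to-one; no warping or normalization is applied here.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Hypothetical values: two short series that differ at both time stamps.
x = [0, 0]
y = [3, 4]
print(lp_distance(x, y, p=1))  # 7.0 (sum of absolute differences)
print(lp_distance(x, y, p=2))  # 5.0 (Euclidean distance)
```

Since the measure compares values time stamp by time stamp, it is typically applied after any needed translation and scaling of the behavioral attribute.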


