Data Mining: The Textbook




16.2. MINING WITH CONTEXTUAL SPATIAL ATTRIBUTES


Figure 16.2: PET Scans of the brain of a cognitively healthy person versus an Alzheimer’s patient. (Image courtesy of the National Institute on Aging/National Institutes of Health)





Figure 16.3: Rotation and mirror image effects on shape matching



536 CHAPTER 16. MINING SPATIAL DATA

[Figure 16.4 panels. (a) Elliptical shape. (b) Distance from centroid for (a); x-axis: degrees from start of sweep, y-axis: distance from centroid. (c) Elliptical shape. (d) Distance from centroid for (c); same axes as (b).]

Figure 16.4: Conversion from shapes to time series


shape of Fig. 16.3a. The shape in Fig. 16.3d is a mirror image of the shape of Fig. 16.3a.


While rotations result in cyclic translations, mirror images result in a reversal of the series.
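The two effects can be checked numerically. The following is a minimal sketch, not the book's procedure: it assumes an ellipse centered on its centroid, uses the closed-form polar radius of an ellipse, and samples one value per degree of sweep. The function name centroid_distance_series and the semi-axis values 3.0 and 1.5 are hypothetical choices made for illustration.

```python
import numpy as np

def centroid_distance_series(a, b, rotation_deg=0.0, mirror=False, n=360):
    """Distance from the centroid to the boundary of an ellipse with
    semi-axes a and b, sampled at n equally spaced sweep angles.

    rotation_deg rotates the shape counterclockwise; mirror reflects the
    (already rotated) shape across the horizontal axis.
    """
    sweep = np.deg2rad(np.arange(n) * (360.0 / n))
    if mirror:
        sweep = -sweep  # a reflection traverses the boundary in the opposite direction
    phi = sweep - np.deg2rad(rotation_deg)  # angle in the ellipse's own frame
    # Polar form of an ellipse centered at the origin.
    return (a * b) / np.sqrt((b * np.cos(phi)) ** 2 + (a * np.sin(phi)) ** 2)

base = centroid_distance_series(3.0, 1.5)
rotated = centroid_distance_series(3.0, 1.5, rotation_deg=45)
mirrored = centroid_distance_series(3.0, 1.5, rotation_deg=45, mirror=True)

# A 45-degree rotation is a cyclic shift by 45 samples (one sample per degree).
rotation_is_cyclic_shift = np.allclose(rotated, np.roll(base, 45))

# The mirror image is the reversed series, up to a one-sample cyclic shift
# caused by where the sweep starts.
mirror_is_reversal = np.allclose(mirrored, np.roll(rotated[::-1], 1))
```

Because an ellipse is itself symmetric, its mirror image also happens to be a rotation of it; for an asymmetric contour the reversal would not be expressible as a cyclic shift alone.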


Figure 16.4c represents a rotation of the shape of Fig. 16.4a by 45°. Correspondingly, the time series representation in Fig. 16.4d is a (cyclic) translation of the time series representation in Fig. 16.4b. Similarly, the mirror image of a shape corresponds to a reversal of the time series. It will be evident later that the impact of rotation or mirror images needs to be explicitly incorporated into the distance or similarity function for the application at hand. After the time series has been extracted, it may be normalized in different ways, depending on the needs of the application:





  • If no normalization is performed, then the data mining approach is sensitive to the absolute sizes of the underlying objects. This may be the case in many medical images such as MRI scans, in which all spatial objects are drawn to the same scale.




  • If all time series values are multiplicatively scaled down by the same factor to unit mean, such an approach will allow the matching of shapes of different sizes, but discriminate between different levels of relative variations in the shapes. For example, two ellipses with very different ratios of the major and minor axes will be discriminated well.




  • If all time series are standardized to zero mean and unit variance, then such an approach will match shapes where relative local variations in the shape are similar, but the overall shape may be quite different. For example, such an approach will not discriminate very well between two ellipses with very different ratios of the major and minor axes, but will discriminate between two such shapes with different relative local deviations in the boundaries. The only exception is a circular shape that appears as a straight line. Furthermore, noise effects in the contour will be differentially enhanced in shapes that are less elongated. For example, for two ellipses with similar noisy deviations at the boundaries, but different levels of elongation (major to minor axis ratio), the overall shape of the time series will be similar, but the local noisy deviations in the extracted time series will be differentially suppressed in the elongated shape. This can sometimes provide a distorted picture from the perspective of shape analysis. A perfectly circular shape may show unstable and large noisy deviations in the extracted time series because of trivial variations such as image rasterization effects. Thus, the usual mean and variance normalization of time series analysis often leads to unintended results.
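The three normalization options above can be compared on synthetic ellipses. This is a minimal sketch under stated assumptions: the series are generated from the closed-form polar radius of a centered ellipse, the helper names (ellipse_series, scale_to_unit_mean, standardize, rms_gap) are hypothetical, and returning a flat series for zero variance is a choice of this sketch rather than something the text prescribes.

```python
import numpy as np

def ellipse_series(a, b, n=360):
    """Centroid-to-boundary distance of an ellipse with semi-axes a and b."""
    theta = np.deg2rad(np.arange(n) * (360.0 / n))
    return (a * b) / np.sqrt((b * np.cos(theta)) ** 2 + (a * np.sin(theta)) ** 2)

def scale_to_unit_mean(series):
    """Multiplicative scaling: removes absolute size, keeps relative variation."""
    return series / series.mean()

def standardize(series):
    """Zero mean and unit variance; a constant series is returned as all zeros."""
    std = series.std()
    if std == 0.0:
        return np.zeros_like(series)
    return (series - series.mean()) / std

def rms_gap(x, y):
    """Root-mean-square difference between two equally long series."""
    return float(np.sqrt(np.mean((x - y) ** 2)))

small = ellipse_series(2.0, 1.0)
large = ellipse_series(6.0, 3.0)      # same 2:1 axis ratio, three times the size
elongated = ellipse_series(4.0, 1.0)  # 4:1 axis ratio

# No normalization: the size difference dominates.
raw_gap = rms_gap(small, large)

# Unit-mean scaling: same-ratio ellipses coincide, different ratios do not.
unit_mean_gap_same_ratio = rms_gap(scale_to_unit_mean(small), scale_to_unit_mean(large))
unit_mean_gap_diff_ratio = rms_gap(scale_to_unit_mean(small), scale_to_unit_mean(elongated))

# Standardization: compare the 2:1 and 4:1 ellipses after z-scoring.
zscore_gap_diff_ratio = rms_gap(standardize(small), standardize(elongated))
```

Printing the four gaps side by side gives a feel for how much each normalization discounts size and elongation on a given data set.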


In general, it is advisable to select the normalization method in an application-specific way. After the shapes have been converted to time series, they can be used in the context of a wide variety of applications. For example, motifs in the time series correspond to frequent contours in the spatial shapes. Similarly, clusters of similar shapes may be discovered by determining clusters in the time series. Similar observations apply to the problems of outlier detection and classification.
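Before such series are clustered or matched, the rotation and mirror-image effects noted earlier have to be absorbed by the distance function. The sketch below shows one simple way to do that, assumed here for illustration rather than taken from the text: take the minimum Euclidean distance over all cyclic shifts of one series and, optionally, of its reversal. The name invariant_distance and the sample values are hypothetical.

```python
import numpy as np

def invariant_distance(x, y, allow_mirror=True):
    """Smallest Euclidean distance between x and any cyclic shift of y
    (and of reversed y when allow_mirror is True).

    Brute force: O(n^2) work for series of length n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-dimensional and equally long")
    candidates = [y, y[::-1]] if allow_mirror else [y]
    best = np.inf
    for candidate in candidates:
        for shift in range(len(candidate)):
            distance = np.linalg.norm(x - np.roll(candidate, shift))
            best = min(best, distance)
    return float(best)

series = np.array([1.0, 2.0, 4.0, 7.0, 3.0])

d_rotated = invariant_distance(series, np.roll(series, 2))    # rotated copy
d_mirrored = invariant_distance(series, series[::-1])         # mirror image
d_mirror_ignored = invariant_distance(series, series[::-1], allow_mirror=False)
```

Whether mirror images should count as matches is application-specific, which is why the reversal is controlled by a flag in this sketch.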


16.2.2 Spatial to Multidimensional Transformation with Wavelets


For data types such as meteorological data in which behavioral attribute values vary across the entire spatial domain, a contour-based shape may not be available for analysis. Therefore, the shape to time series transformation is not appropriate in these cases.


Wavelets are a popular method for the transformation of time series data to multidimensional data. Spatial data shares a number of similarities with time series data. Time series data has a single contextual attribute (time) along which a behavioral attribute (e.g., temperature) may exhibit a smooth variation. Correspondingly, spatial data has two contextual attributes (spatial coordinates), along which a behavioral attribute (e.g., sea surface temperature) may exhibit a smooth variation. Because of this analogy, it is possible to generalize the wavelet-based approach to the case of multiple contextual attributes with appropriate modifications.
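One such generalization can be sketched in a few lines. The sketch assumes a square grid whose side is a power of 2, divides regions alternately along rows and columns, and defines each coefficient as half the difference between the averages of the two halves of a region. These conventions, the function name spatial_haar, and the sample grid are assumptions of this illustration; the book's own construction may normalize or order the coefficients differently.

```python
import numpy as np

def spatial_haar(grid):
    """Hierarchical averaging and differencing over a q x q grid (q a power of 2).

    Returns the global mean and a list of (region, axis, coefficient) entries,
    where region = (row_start, row_end, col_start, col_end) and axis is the
    axis along which that region was divided (0 = rows, 1 = columns).
    """
    grid = np.asarray(grid, dtype=float)
    q = grid.shape[0]
    if grid.shape != (q, q) or q < 1 or q & (q - 1):
        raise ValueError("expected a q x q grid with q a power of 2")
    coefficients = []

    def divide(r0, r1, c0, c1, axis):
        if (r1 - r0) * (c1 - c0) == 1:
            return  # a single cell cannot be divided further
        if axis == 0:
            mid = (r0 + r1) // 2
            first, second = (r0, mid, c0, c1), (mid, r1, c0, c1)
        else:
            mid = (c0 + c1) // 2
            first, second = (r0, r1, c0, mid), (r0, r1, mid, c1)
        mean_first = grid[first[0]:first[1], first[2]:first[3]].mean()
        mean_second = grid[second[0]:second[1], second[2]:second[3]].mean()
        coefficients.append(((r0, r1, c0, c1), axis, (mean_first - mean_second) / 2.0))
        # Alternate the division axis at the next level of the hierarchy.
        divide(*first, 1 - axis)
        divide(*second, 1 - axis)

    divide(0, q, 0, q, 0)
    return float(grid.mean()), coefficients

temperatures = np.array([[1.0, 3.0],
                         [5.0, 7.0]])
global_mean, coeffs = spatial_haar(temperatures)
coefficient_values = [value for _, _, value in coeffs]
```

Together with the global mean, the sketch yields one number per grid cell, and each cell can be recovered by adding or subtracting the coefficients of the regions that contain it. For example, the top-left cell is 4 + (−2) + (−1) = 1.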


Assume that the spatial data is represented in the form of a 2-dimensional grid of size q × q. Thus, each coordinate of the grid contains an instance of the behavioral attribute, such as the temperature. As discussed for the time series case in Sect. 2.4.4.1 of Chap. 2, differencing operations are applied over contiguous segments of the time series by successive division of the time series in hierarchical fashion. The corresponding basis vectors have +1 and −1 at the relevant positions. The 2-dimensional case is completely analogous, where contiguous areas of the spatial grid are used for successive divisions. These divisions are alternately performed along the different axes. The corresponding basis vectors are 2-dimensional matrices of size q × q that regulate how the differencing operations are performed. An example of how sea surface temperatures in a spatial data set may be converted to a multidimensional representation is provided in Fig. 16.5. This will result in a total of q² wavelet coefficients, though only the large coefficients need to be retained for analysis. A more detailed description of the generation of the spatial wavelet coefficients may be found in Sect. 2.4.4.1 of Chap. 2. The aforementioned description is for the case of a single behavioral attribute and multiple contextual attributes (spatial coordinates). Multiple behavioral attributes can also




