Figure 2.7: Illustration of the top levels of the wavelet decomposition for spatial data in a grid containing sea-surface temperatures. [Figure shows the 4 × 4 base grid of temperatures (overall average = 75); the first cut along the X-axis, with an average temperature difference of 7/4 between the left and right blocks (coefficient = 7/8); the subsequent cuts along the Y-axis, with average temperature differences of 9/4 and 19/4 between the top and bottom blocks (coefficients 9/8 and 19/8); and the 2-dimensional binary basis matrices that represent these cuts.]
The wavelet coefficients that have the largest average normalized coefficient across the N different series are selected.
The full dimensionality of the wavelet coefficient representation is retained. However, for each time series, the largest normalized coefficients (in magnitude) are selected individually. The remaining values are set to 0. This results in a sparse database of high dimensionality, in which many values are 0. A method such as SVD can be applied as a second step to further reduce the dimensionality. The second step of this approach has the disadvantage of losing interpretability of the features of the wavelet transform. Recall that the Haar wavelet is one of the few dimensionality reduction transforms where the coefficients do have some interpretability in terms of specific trends across particular time-series segments.
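The per-series selection step can be sketched as follows. This is a minimal illustration, not code from the text; it assumes that each row of the hypothetical array coeff_db already contains the normalized wavelet coefficients of one series, and the retained count k is an illustrative parameter:

```python
import numpy as np

def retain_top_coefficients(coeff_db, k):
    """For each row (one series' normalized wavelet coefficients), keep
    the k largest coefficients in magnitude and set all others to 0.
    Returns a sparse (mostly zero) database of the same shape."""
    sparse_db = np.zeros_like(coeff_db, dtype=float)
    for i, coeffs in enumerate(coeff_db):
        top = np.argsort(np.abs(coeffs))[-k:]   # indices of the k largest magnitudes
        sparse_db[i, top] = coeffs[top]
    return sparse_db

# Example: 3 series with 8 coefficients each; keep the 3 largest per series.
rng = np.random.default_rng(0)
db = rng.normal(size=(3, 8))
print(retain_top_coefficients(db, k=3))
```

A second dimensionality reduction step such as SVD would then be applied to this sparse database.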
The wavelet decomposition approach provides a natural way to achieve dimensionality reduction (and data-type transformation) by retaining only a small number of coefficients.
Wavelet Decomposition with Multiple Contextual Attributes
Time-series data contain a single contextual attribute, corresponding to the time value. This simplifies the wavelet decomposition. However, in some cases, such as spatial data, there may be two contextual attributes corresponding to the X-coordinate and the Y-coordinate. For example, sea-surface temperatures are measured at spatial locations that are described with the use of two coordinates. How can wavelet decomposition be performed in such cases? In this section, a brief overview of the extension of wavelets to multiple contextual attributes is provided.
Assume that the spatial data is represented in the form of a 2-dimensional grid of size q × q. Recall that in the 1-dimensional case, differencing operations were applied over contiguous segments of the time series by successive hierarchical division of the series. The corresponding basis vectors have +1 and −1 at the relevant positions. The 2-dimensional case is completely analogous: contiguous areas of the spatial grid are used by successive divisions, which are performed alternately along the different axes. The corresponding basis vectors are 2-dimensional matrices of size q × q that regulate how the differencing operations are performed.
An example of the strategy for 2-dimensional decomposition is illustrated in Fig. 2.7. Only the top two levels of the decomposition are shown in the figure. Here, a 4 × 4 grid of spatial temperatures is used as an example. The first division along the X-axis divides the spatial area into two blocks of size 4 × 2 each. The corresponding 2-dimensional binary basis matrix is illustrated in the same figure. The next phase divides each of these 4 × 2 blocks into blocks of size 2 × 2 during the hierarchical decomposition process. As in the case of 1-dimensional time series, the wavelet coefficient is half the difference in the average temperatures between the two halves of the relevant block being decomposed. The alternating process of division along the X-axis and the Y-axis can be carried down to the individual data entries. This creates a hierarchical wavelet error tree, which has many properties in common with the tree created in the 1-dimensional case. The overall principles of this decomposition are almost identical to the 1-dimensional case, with the major difference that the cuts along the different dimensions alternate at different levels. The approach can be extended to the case of k > 2 contextual attributes by using a round-robin rotation of the axes selected at different levels of the tree for the differencing operation.
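As an illustrative sketch (not the book's code), the following recursion computes this alternating decomposition for a q × q grid with q a power of 2; the function name and interface are assumptions for the example:

```python
import numpy as np

def wavelet_2d(block, axis=1, coeffs=None):
    """Recursively decompose a 2-d block, alternating cuts along the
    X-axis (axis=1, columns) and the Y-axis (axis=0, rows).  Each
    coefficient is half the difference between the average values of
    the two halves of the block being cut."""
    if coeffs is None:
        coeffs = [block.mean()]                 # overall average sits at the tree root
    if block.size == 1:                         # single entry: nothing left to cut
        return coeffs
    if block.shape[axis] == 1:                  # this axis is exhausted; cut the other
        return wavelet_2d(block, 1 - axis, coeffs)
    first, second = np.split(block, 2, axis=axis)
    coeffs.append((first.mean() - second.mean()) / 2.0)
    wavelet_2d(first, 1 - axis, coeffs)         # recurse on each half,
    wavelet_2d(second, 1 - axis, coeffs)        # alternating the cut axis
    return coeffs

# Example: the 4 x 4 sea-surface temperature grid of Figure 2.7.
grid = np.array([[75, 76, 75, 72],
                 [77, 73, 73, 74],
                 [72, 71, 78, 80],
                 [74, 75, 79, 76]], dtype=float)
coeffs = wavelet_2d(grid)                       # 16 coefficients for a 4 x 4 grid
print(coeffs[0], abs(coeffs[1]))                # 75.0 and 7/8 = 0.875, as in the figure
```

Running this on the grid of Fig. 2.7 reproduces the overall average of 75 and the first X-axis coefficient of magnitude 7/8.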
2.4.4.2 Multidimensional Scaling
Graphs are a powerful mechanism for representing relationships between objects. In some data mining scenarios, the data type of an object may be very complex and heterogeneous, such as a time series annotated with text and other numeric attributes. However, a crisp notion of distance between several pairs of data objects may be available based on application-specific goals. How can one visualize the inherent similarity between these objects? How can one visualize the "nearness" of two individuals connected in a social network? A natural way of doing so is the concept of multidimensional scaling (MDS). Although MDS was originally proposed in the context of spatial visualization of graph-structured distances, it has much broader applicability for embedding data objects of arbitrary types in multidimensional space. Such an approach also enables the use of multidimensional data mining algorithms on the embedded data.
For a graph with n nodes, let $\delta_{ij} = \delta_{ji}$ denote the specified distance between nodes i and j. It is assumed that all $\binom{n}{2}$ pairwise distances between nodes are specified. It is desired to map the n nodes to n different k-dimensional vectors denoted by $X_1 \ldots X_n$, so that the distances in multidimensional space closely correspond to the $\binom{n}{2}$ distance values in the distance graph. In MDS, the k coordinates of each of these n points are treated as variables that need to be optimized, so that they fit the given set of pairwise distances. Metric MDS, also referred to as classical MDS, attempts to solve the following optimization (minimization) problem:
$$O = \sum_{i,j \,:\, i<j} \left( \left\| X_i - X_j \right\| - \delta_{ij} \right)^2 \qquad (2.16)$$
Here, $\| \cdot \|$ represents the Euclidean norm. In other words, each node is represented by a multidimensional data point, such that the Euclidean distances between these points reflect the graph distances as closely as possible. In other forms of nonmetric MDS, this objective function might be different. This optimization problem therefore has n · k variables, and it scales with both the size of the data n and the desired dimensionality k of the embedding.
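As a concrete illustration (not a solver prescribed by the text), the objective of Eq. 2.16 can be minimized directly by plain gradient descent over the n · k coordinates; the function below is a hypothetical sketch, and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def mds_gradient_descent(delta, k=2, steps=500, lr=0.01, seed=0):
    """Minimize O = sum_{i<j} (||X_i - X_j|| - delta_ij)^2 over an
    n x k embedding X by plain gradient descent.
    delta: symmetric n x n matrix of target distances delta_ij."""
    n = delta.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, k))                    # random initial coordinates
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]       # pairwise coordinate differences
        dist = np.linalg.norm(diff, axis=2)        # current Euclidean distances
        np.fill_diagonal(dist, 1.0)                # avoid division by zero on the diagonal
        factor = (dist - delta) / dist             # derivative of O w.r.t. each distance
        np.fill_diagonal(factor, 0.0)
        grad = 2.0 * (factor[:, :, None] * diff).sum(axis=1)
        X -= lr * grad                             # descend on all n * k variables at once
    return X

# Example: recover a unit square (up to rotation/translation) from its distances.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
delta = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
X = mds_gradient_descent(delta, k=2)
```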
Table 2.3: Scaled eigenvectors of various similarity matrices yield embeddings with different properties

| Method | Relevant similarity matrix |
|---|---|
| PCA | Dot product matrix $D D^T$ after mean centering of $D$ |
| SVD | Dot product matrix $D D^T$ |
| Spectral embedding (symmetric version) | Sparsified/normalized similarity matrix $\Lambda^{-1/2} W \Lambda^{-1/2}$ (cf. Sect. 19.3.4 of Chap. 19) |
| MDS/ISOMAP | Similarity matrix derived from distance matrix $\Delta$ with cosine law $S = -\frac{1}{2}\left(I - \frac{U}{n}\right) \Delta \left(I - \frac{U}{n}\right)$ |
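For instance, the MDS/ISOMAP row of the table can be turned into an embedding as follows. This is a sketch under stated assumptions: the helper name is hypothetical, and $\Delta$ is assumed, as in classical MDS, to be built from squared pairwise distances, with $U$ the n × n all-ones matrix. The coordinates are given by the top-k eigenvectors of S scaled by the square roots of the corresponding eigenvalues:

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed points from a matrix of pairwise distances using the
    cosine-law similarity matrix S = -1/2 (I - U/n) Delta (I - U/n),
    where Delta holds SQUARED distances and U is the all-ones matrix.
    The embedding uses the top-k scaled eigenvectors of S."""
    n = dist.shape[0]
    Delta = dist ** 2                              # squared distance matrix
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix I - U/n
    S = -0.5 * J @ Delta @ J                       # double centering (cosine law)
    evals, evecs = np.linalg.eigh(S)               # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]              # pick the k largest eigenvalues
    lam = np.clip(evals[idx], 0.0, None)           # guard against tiny negatives
    return evecs[:, idx] * np.sqrt(lam)            # scale eigenvectors by sqrt(eigenvalue)

# Example: three collinear points at positions 0, 3, and 5 on a line.
dist = np.array([[0., 3., 5.],
                 [3., 0., 2.],
                 [5., 2., 0.]])
print(classical_mds(dist, k=1).ravel())            # positions recovered up to shift/sign
```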