Data Mining: The Textbook



  1. Linear equations: Many data mining applications are optimization problems in which the solution is recast into a system of linear equations. For any linear system Ay = 0, any right singular vector of A with zero singular value will satisfy the system of equations (see Exercise 14). Therefore, any linear combination of the right singular vectors with zero singular value will provide a solution.




  2. Matrix inversion: SVD can be used for the inversion of a square d × d matrix D. Let the decomposition of D be given by QΣP^T. Then, the inverse of D is D^{-1} = PΣ^{-1}Q^T. Note that Σ^{-1} can be computed trivially from Σ by inverting its diagonal entries. The approach can also be generalized to the determination of the Moore–Penrose pseudoinverse D^+ of a rank-k matrix D by inverting only the nonzero diagonal entries of Σ. The approach can even be generalized to non-square matrices by performing the additional operation of transposing Σ. Such matrix inversion operations are required in many data mining applications such as least-squares regression (cf. Sect. 11.5 of Chap. 11) and social network analysis (cf. Chap. 19).




  3. Matrix algebra: Many network mining applications require the application of algebraic operations such as the computation of the powers of a matrix. This is common in random-walk methods (cf. Chap. 19), where the kth power of the symmetric adjacency matrix of an undirected network may need to be computed. Such symmetric adjacency matrices can be decomposed into the form QΔQ^T. The kth power of this decomposition can be efficiently computed as D^k = QΔ^kQ^T. In fact, any polynomial function of the matrix can be computed efficiently.



SVD and PCA are extraordinarily useful because matrix and linear algebra operations are ubiquitous in data mining. SVD and PCA facilitate such matrix operations by providing convenient decompositions and basis representations. SVD has rightly been referred to [481] as “absolutely a high point of linear algebra.”
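As a concrete (and purely illustrative) sketch of the three operations above, the following NumPy snippet applies them to small made-up matrices; the matrices and variable names are assumptions for the example rather than anything prescribed by the text.

import numpy as np

# (i) Linear systems Ay = 0: right singular vectors with zero singular value
# span the solution space.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])                    # illustrative rank-1 matrix
U, s, Vt = np.linalg.svd(A)
zero_mask = np.isclose(s, 0.0)
null_basis = np.vstack([Vt[:len(s)][zero_mask],    # vectors whose singular value is 0
                        Vt[len(s):]])              # remaining rows of V^T (also in the null space)
print(np.allclose(A @ null_basis.T, 0.0))          # True: each such vector solves Ay = 0

# (ii) Inversion via SVD: D = Q Sigma P^T  =>  D^{-1} = P Sigma^{-1} Q^T.
D = np.array([[4.0, 0.0],
              [3.0, -5.0]])
Q, sigma, Pt = np.linalg.svd(D)
D_inv = Pt.T @ np.diag(1.0 / sigma) @ Q.T          # invert only the diagonal entries of Sigma
print(np.allclose(D_inv, np.linalg.inv(D)))        # True

# (iii) Powers of a symmetric adjacency matrix: S = Q Delta Q^T  =>  S^k = Q Delta^k Q^T.
S = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
evals, Qe = np.linalg.eigh(S)
S_cubed = Qe @ np.diag(evals ** 3) @ Qe.T
print(np.allclose(S_cubed, np.linalg.matrix_power(S, 3)))   # True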

2.4.4 Dimensionality Reduction with Type Transformation


In these methods, dimensionality reduction is coupled with type transformation. In most cases, the data is transformed from a more complex type to a less complex type, such as multidimensional data. Thus, these methods serve the dual purpose of data reduction and type portability. This section will study two such transformation methods:





  1. Time series to multidimensional: A number of methods, such as the discrete Fourier transform and the discrete wavelet transform, are used. While these methods can also be viewed as a rotation of an axis system defined by the various time stamps of the contextual attribute, the data are no longer dependency oriented after the rotation. Therefore, the resulting data set can be processed in a similar way to multidimensional data. We will study the Haar wavelet transform because of its intuitive simplicity.




  2. Weighted graphs to multidimensional: Multidimensional scaling and spectral methods are used to embed weighted graphs in multidimensional spaces, so that the similarity or distance values on the edges are captured by a multidimensional embedding.
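As a rough sketch of this second idea, the snippet below applies classical multidimensional scaling to a hypothetical 4-node distance matrix; the distances are invented for illustration, and this is only one of several possible embedding procedures.

import numpy as np

# Hypothetical pairwise distances among 4 nodes of a weighted graph.
D = np.array([[0.0, 1.0, 2.0, 2.2],
              [1.0, 0.0, 1.5, 2.0],
              [2.0, 1.5, 0.0, 1.0],
              [2.2, 2.0, 1.0, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
evals, evecs = np.linalg.eigh(B)              # eigenvalues in ascending order
top = np.argsort(evals)[::-1][:2]             # indices of the two largest eigenvalues
X = evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))
print(X)  # rows are 2-D coordinates whose Euclidean distances approximate D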



Table 2.2: An example of wavelet coefficient computation

Granularity (order k) | Averages (Φ values)      | DWT coefficients (ψ values)
k = 4                 | (8, 6, 2, 3, 4, 6, 6, 5) | -
k = 3                 | (7, 2.5, 5, 5.5)         | (1, -0.5, -1, 0.5)
k = 2                 | (4.75, 5.25)             | (2.25, -0.25)
k = 1                 | (5)                      | (-0.25)

This section will discuss each of these techniques.


2.4.4.1 Haar Wavelet Transform


Wavelets are a well-known technique that can be used for multigranularity decomposition and summarization of time-series data into a multidimensional representation. The Haar wavelet is a particularly popular form of wavelet decomposition because of its intuitive nature and ease of implementation. To understand the intuition behind wavelet decomposition, an example of sensor temperatures will be used.


Suppose that a sensor measured the temperatures over the course of 12 h from the morning until the evening. Assume that the sensor samples temperatures at the rate of 1 sample/s. Thus, over the course of a single day, a sensor will collect 12 × 60 × 60 = 43,200 readings. Clearly, this will not scale well over many days and many sensors. An important observation is that many adjacent sensor readings will be very similar, causing this representation to be very wasteful. So, how can we represent this data approximately in a small amount of space? How can we determine the key regions where “variations” in readings occur, and store these variations instead of repeating values?


Suppose we only stored the average over the entire day. This provides some idea of the temperature but not much else about the variation over the day. Now, if the difference in average temperature between the first half and second half of the day is also stored, we can derive the averages for both the first and second half of the day from these two values. This principle can be applied recursively because the first half of the day can be divided into the first quarter of the day and the second quarter of the day. Thus, with four stored values, we can perfectly reconstruct the averages in the four quarters of the day. This process can be applied recursively right down to the level of granularity of the sensor readings. These “difference values” are used to derive wavelet coefficients. Of course, we have not yet achieved any data reduction because the number of such coefficients can be shown to be exactly equal to the length of the original time series.
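To make the recursive averaging and differencing concrete, the short sketch below (plain Python written for illustration, not code from the book) reproduces the Φ and ψ values of Table 2.2 for the series (8, 6, 2, 3, 4, 6, 6, 5):

def haar_decompose(series):
    """Recursively halve a length-2^q series into averages and difference coefficients."""
    averages = [list(series)]     # averages at order k = q down to k = 1
    coefficients = []             # one list of psi values per level
    current = list(series)
    while len(current) > 1:
        avg  = [(current[i] + current[i + 1]) / 2.0 for i in range(0, len(current), 2)]
        diff = [(current[i] - current[i + 1]) / 2.0 for i in range(0, len(current), 2)]
        averages.append(avg)
        coefficients.append(diff)
        current = avg
    return averages, coefficients

avgs, coeffs = haar_decompose([8, 6, 2, 3, 4, 6, 6, 5])
print(avgs)    # [[8, 6, 2, 3, 4, 6, 6, 5], [7.0, 2.5, 5.0, 5.5], [4.75, 5.25], [5.0]]
print(coeffs)  # [[1.0, -0.5, -1.0, 0.5], [2.25, -0.25], [-0.25]]

Note that the final average (5.0) together with the 4 + 2 + 1 = 7 difference coefficients amounts to exactly eight stored numbers, matching the length of the original series; this is why no reduction is achieved until the less important coefficients are discarded.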


It is important to understand that large difference values tell us more about the variations in the temperature values than the small ones, and they are therefore more important to store. Therefore, larger coefficient values are stored after a normalization for the level of granularity. This normalization, which is discussed later, has a bias towards storing coefficients representing longer time scales because trends over longer periods of time are more informative for (global) series reconstruction.


More formally, the wavelet technique creates a decomposition of the time series into a set of coefficient-weighted wavelet basis vectors. Each of the coefficients represents the rough variation of the time series between the two halves of a particular time range. The





yükləyin