Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə330/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   326   327   328   329   330   331   332   333   ...   423
1-Data Mining tarjima

16.3. TRAJECTORY MINING



















547

PQRS

T

P

Q




R

S

T

A




A AP

AQ




AR

AS

AT


































B

B

BP




C

C

CP




D

D

DP
















E

E

EP






















BQ BR


CQ CR


DQ DR


EQ ER
BS BT


CS CT


DS DT


ES ET



(a) Trajectory (b) Relevant grid regions


Figure 16.8: Grid-based discretization of trajectories


sequence. Once this conversion has been performed, any sequential pattern mining algorithm can be applied to the transformed data.

The most effective way to convert a multidimensional trajectory to a discrete sequence is to use grid-based discretization. In Fig. 16.8a, a trajectory has been illustrated, together with a 4×4 grid representation of the underlying data space. The grid ranges along one of the dimensions are denoted by A, B, C , D, and E. The grid ranges along the other dimension are denoted by P , Q, R, S , and T . The 2-dimensional tiles are denoted by a combination of the ranges along each of the dimensions. For example, the tile AP represents the inter-section of the grid range A with the grid range P . Thus, each tile has a distinct (discrete) identifier, and a trajectory can be expressed in terms of the sequence of discrete identifiers through which it passes. The shaded tiles for the trajectory in Fig. 16.8a are illustrated in Fig. 16.8b. The corresponding 1-dimensional sequential pattern is as follows:




EP, DQ, CQ, BQ, BR, CS, BT

This transformation is referred to as the spatial tile transformation. In principle, it is pos-sible to enhance the discretization further, by imposing a minimum time spent in each grid square, though this is not considered here. Consider a database containing N different tra-jectories. The frequent sequential paths can be determined from these trajectories by using a two-step approach:





  1. Convert each of the N trajectories into a discrete sequence, using grid-based dis-cretization.




  1. Apply the sequential pattern mining algorithms discussed in Sect. 15.2 of Chap. 15 to discover frequent sequential patterns from the resulting data set.

By incorporating different types of constraints on the sequential pattern mining process, such as time-gap constraints, it is also possible to apply these constraints on the trajectories. One advantage of this transformation -based approach is that it can take advantage of the power of all the different variations of sequential pattern mining. Sequential pattern mining has a rich body of literature on constraint-based methods.



548 CHAPTER 16. MINING SPATIAL DATA

Another interesting aspect is that this formulation can be modified to address situations in which the patterns of movements occur at the same period in time. The time period over which the movement occurs is discretized into m periods denoted by 1 . . . m. For example, the intervals could be [8AM, 9AM ], [9AM, 10AM ], [10AM, 11AM ], and so on. Thus, for each time-interval, the grid identifier is tagged with the relevant time-period identifier. A time period identifier is tagged to a grid region if a minimum amount of time from that time period range was spent in that region by the trajectory. This results in a set of patterns defined on a new set of discrete symbols of the form < GridId >:< T imeId >. In the tra-jectory of Fig. 16.8a, a possible sequence that appends the time period identifier is as follows:




EP : 1,EP : 2,DQ : 2,DQ : 3,DQ : 4,CQ : 5,BQ : 5,BR : 5,CS : 6,CS : 7,BT : 7

This transformation is referred to as the spatiotemporal tile transformation. Note that this sequence is longer than that in Fig. 16.8b because the trajectory may spend more than one interval in the same grid region. A set of N different sequences are extracted, corresponding to the N different trajectories. The sequential pattern mining can be performed on this new representation. Because of the addition of the time identifiers, the resulting patterns will correspond to simultaneous movements in time. Thus, the sequential pattern mining approach has significant flexibility in terms of either detecting patterns of similar shapes, or patterns of simultaneous movements. Furthermore, because the sequential pattern min-ing formulation does not require successive symbols in a frequent sequential pattern to be contiguous in the original sequence, it can ignore noisy gaps in the underlying trajecto-ries, while mining patterns. Furthermore, by using different constrained sequential pattern mining formulations, different kinds of constrained trajectory patterns can be discovered.


One drawback of the approach is that the granularity level of the discretization may affect the quality of the results. It is possible to address this issue by using a multigranu-larity discretization of the spatial regions. A different approach for conversion to symbolic representation is the use of spatial clustering on the different temporal snapshots of the object positions. The cluster identifiers of each object over different snapshots may be used to construct its sequence representation. The bibliographic notes contain pointers to several algorithms for transformation and pattern discovery from trajectories. The broader idea of many of these methods is to convert to a symbolic sequence representation for more effective pattern mining.


16.3.3.2 Colocation Patterns

Colocation patterns are designed to discover social connections between the trajectories of different individuals. The basic idea of colocation patterns is that individuals who frequently appear at the same point at the same time are likely to be related to one another. Colo-cation pattern mining attempts to discover patterns of individuals, rather than patterns of spatial trajectory paths. Because of the complementary nature of this analysis, a vertical representation of the sequence database is particularly convenient.


A similar grid discretization (as designed for the case of frequent trajectory patterns) can be used for preprocessing. However, in this case, a somewhat different (vertical) repre-sentation is used for the locations of the different individuals in the grid regions at different times. For each grid region and time-interval pair, a list of person identifiers (or trajectory identifiers) is determined. Thus, for the grid region EP and time interval 5, if the persons 3, 9, and 11 are present, then the corresponding set is constructed:




Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   326   327   328   329   330   331   332   333   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin