Event logs: A wide variety of computer systems, Web servers, and Web applications create event logs on the basis of user activity. An example of an event log is a sequence of user actions at a financial Web site:
Login Password Login Password Login Password ...
This particular sequence may represent a scenario where a user is attempting to break into a password-protected system, and it may be interesting from the perspective of anomaly detection.
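A minimal sketch of how such a log might be screened follows. It assumes that each repeated Login-then-Password pair corresponds to one login attempt, and it flags the sequence when the number of attempts exceeds a threshold. The function names and the default threshold of 3 are hypothetical choices for illustration only, not a method from the text.

```python
from typing import List


def count_repeated_login_attempts(events: List[str]) -> int:
    """Count consecutive 'Login' -> 'Password' pairs in an event sequence."""
    count = 0
    # Walk over adjacent pairs of events; the ordering is what matters here.
    for prev, curr in zip(events, events[1:]):
        if prev == "Login" and curr == "Password":
            count += 1
    return count


def is_suspicious(events: List[str], max_attempts: int = 3) -> bool:
    # max_attempts is a hypothetical threshold chosen only for illustration.
    return count_repeated_login_attempts(events) > max_attempts


log = ["Login", "Password"] * 5
print(count_repeated_login_attempts(log))  # 5
print(is_suspicious(log))  # True
```

Because the check depends on which event follows which, the same multiset of events in a different order would give a different answer; this order dependence is what distinguishes discrete sequence data from unordered categorical data.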
Biological data: In this case, the sequences may correspond to strings of nucleotides or amino acids. The ordering of such units provides information about the characteristics of protein function. Therefore, the data mining process can be used to determine interesting patterns that are reflective of different biological properties.
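One simple way to look for recurring patterns in a nucleotide string is to count its contiguous substrings of a fixed length k (often called k-mers). The sketch below is an assumed illustration of that idea only; the helper name `kmer_counts` and the toy sequence are hypothetical and are not drawn from real biological data or from the methods of Chap. 15.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every contiguous substring of length k (a k-mer) in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Sliding window over the sequence; yields nothing if len(sequence) < k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


dna = "ATGCGATGCA"  # toy sequence, made up for illustration
counts = kmer_counts(dna, 3)
print(counts.most_common(2))  # [('ATG', 2), ('TGC', 2)]
```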
Discrete sequences are often more challenging for mining algorithms because they do not have the smooth value continuity of time-series data. Methods for sequence mining are discussed in Chap. 15.
1.3.2.3 Spatial Data
In spatial data, many nonspatial attributes (e.g., temperature, pressure, image pixel color intensity) are measured at spatial locations. For example, sea-surface temperatures are often collected by meteorologists to forecast the occurrence of hurricanes. In such cases, the spatial coordinates correspond to contextual attributes, whereas attributes such as the temperature correspond to the behavioral attributes. Typically, there are two spatial attributes. As in the case of time-series data, it is also possible to have multiple behavioral attributes. For example, in the sea-surface temperature application, one might also measure other behavioral attributes such as the pressure.
Definition 1.3.4 (Spatial Data) A d-dimensional spatial data record contains d behavioral attributes and one or more contextual attributes containing the spatial location. Therefore, a d-dimensional spatial data set is a set of d-dimensional records X1, ..., Xn, together with a set of n locations L1, ..., Ln, such that the record Xi is associated with the location Li.
12 CHAPTER 1. AN INTRODUCTION TO DATA MINING
The aforementioned definition provides broad flexibility in terms of how record Xi and location Li may be defined. For example, the behavioral attributes in record Xi may be numeric or categorical, or a mixture of the two. In the meteorological application, Xi may contain the temperature and pressure attributes at location Li. Furthermore, Li may be specified in terms of precise spatial coordinates, such as latitude and longitude, or in terms of a logical location, such as the city or state.
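The flexibility described above can be captured in a small data structure. In the sketch below, a location is either a (latitude, longitude) pair or a logical name, and the behavioral attributes are a mapping that may hold numeric or categorical values. The names `SpatialRecord`, `Location`, and `records_at` are hypothetical, and the readings are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# A location Li is either precise coordinates (latitude, longitude)
# or a logical location such as a state name.
Location = Union[Tuple[float, float], str]


@dataclass
class SpatialRecord:
    location: Location                        # contextual attribute(s): Li
    behavioral: Dict[str, Union[float, str]]  # record Xi: numeric or categorical


def records_at(data: List[SpatialRecord], location: Location) -> List[SpatialRecord]:
    """Return all records associated with the given location."""
    return [r for r in data if r.location == location]


# Invented example values, not real measurements.
data = [
    SpatialRecord((25.8, -80.2), {"temperature": 29.1, "pressure": 1008.0}),
    SpatialRecord((26.1, -79.9), {"temperature": 28.7, "pressure": 1010.5}),
    SpatialRecord("Florida", {"temperature": 28.9, "sky": "cloudy"}),
]
print(len(records_at(data, "Florida")))  # 1
```

Keeping the contextual location separate from the behavioral mapping mirrors the definition: the location is used to index and relate records, while analysis is performed on the behavioral values.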
Spatial data mining is closely related to time-series data mining, in that the behavioral attributes in most commonly studied spatial applications are continuous, although some applications may use categorical attributes as well. Therefore, value continuity is observed across contiguous spatial locations, just as value continuity is observed across contiguous time stamps in time-series data.
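Value continuity across contiguous locations can be checked with a simple statistic. The sketch below assumes the measurements lie on a regular grid and treats horizontally and vertically adjacent cells as neighbors; it reports the mean absolute difference between neighbors, which is small when values vary smoothly. The function name and both grids are hypothetical illustrations, and this is only one of many possible continuity measures.

```python
from typing import List


def mean_neighbor_difference(grid: List[List[float]]) -> float:
    """Average absolute difference between horizontally or vertically adjacent cells."""
    diffs = []
    rows, cols = len(grid), len(grid[0])
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # right-hand neighbor
                diffs.append(abs(grid[i][j] - grid[i][j + 1]))
            if i + 1 < rows:  # neighbor below
                diffs.append(abs(grid[i][j] - grid[i + 1][j]))
    if not diffs:
        raise ValueError("grid needs at least two cells")
    return sum(diffs) / len(diffs)


smooth = [[20.0, 20.5, 21.0], [20.5, 21.0, 21.5]]  # gradual change across space
rough = [[20.0, 30.0, 20.0], [30.0, 20.0, 30.0]]   # abrupt change between neighbors
print(mean_neighbor_difference(smooth))  # 0.5
print(mean_neighbor_difference(rough))   # 10.0
```

The same idea applied to a one-dimensional list of consecutive readings would measure continuity across contiguous time stamps, which is the analogy to time-series data drawn in the paragraph above.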
Spatiotemporal Data
A particular form of spatial data is spatiotemporal data, which contains both spatial and temporal attributes. The precise nature of the data also depends on which of the attributes are contextual and which are behavioral. Two kinds of spatiotemporal data are most common:
Both spatial and temporal attributes are contextual: This kind of data can be viewed as a direct generalization of both spatial data and temporal data. It is particularly useful when the spatial and temporal dynamics of particular behavioral attributes are measured simultaneously. For example, consider the case where variations in sea-surface temperature need to be measured over time. In such cases, the temperature is the behavioral attribute, whereas the spatial and temporal attributes are contextual.
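To see why this kind of data generalizes both spatial and temporal data, it can help to store each measurement under a combined (location, time) key. Fixing the location then yields an ordinary time series, and fixing the time yields an ordinary spatial snapshot. The sketch below assumes discrete integer time steps and exact coordinate matching; the helper names and the temperature values are hypothetical.

```python
from typing import Dict, List, Tuple

# Key: (latitude, longitude, time step); value: sea-surface temperature.
# Location and time are contextual; temperature is the behavioral attribute.
Observations = Dict[Tuple[float, float, int], float]


def time_series_at(obs: Observations, lat: float, lon: float) -> List[Tuple[int, float]]:
    """Fix the location and vary time: a temporal view of the data."""
    return sorted((t, v) for (la, lo, t), v in obs.items() if (la, lo) == (lat, lon))


def snapshot_at(obs: Observations, t: int) -> Dict[Tuple[float, float], float]:
    """Fix the time and vary the location: a spatial view of the data."""
    return {(la, lo): v for (la, lo, tt), v in obs.items() if tt == t}


# Invented readings for two locations at two time steps.
obs: Observations = {
    (25.8, -80.2, 0): 29.1,
    (25.8, -80.2, 1): 29.4,
    (26.1, -79.9, 0): 28.7,
    (26.1, -79.9, 1): 28.9,
}
print(time_series_at(obs, 25.8, -80.2))  # [(0, 29.1), (1, 29.4)]
print(snapshot_at(obs, 1))  # {(25.8, -80.2): 29.4, (26.1, -79.9): 28.9}
```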