CCCCDDDD.
Implement the GSP algorithm for sequential pattern mining.
Consider a special case of the sequential pattern mining problem where elements are always singleton items. What difference would it make to (a) GSP algorithm, and (b) algorithms based on the candidate tree.
Discuss the generalizability of k-medoids and graph-based methods for clustering of arbitrary data types.
The chapter introduces a number of string kernels for classification with SVMs. Discuss some other data mining applications you can implement with string kernels.
Discuss the similarity and differences between Markovian models for discovering posi-tion outliers in sequential data, with autoregressive models for discovering point out-liers in timeseries data.
Write a computer program to determine all maximal frequent subsequences from a collection using GSP. Implement a program to express the sequences in a database in terms of these subsequences, in vector space representation. Implement a k-means algorithm on this representation.
Write a computer program to determine position outliers using order-1 Markovian Models.
Consider the discrete sequence ACGTACGTACGTACGTATGT. Construct an order-1 Markovian model to determine the position outliers. Which positions are found as outliers?
For the discrete sequence of Exercise 9, determine all subsequences of length 2. Use a frequency-based approach to assign combination outlier scores to subsequences. Which subsequences should be considered combination outliers?
Write a computer program to learn the state transition and symbol emission prob-abilities for a Hidden Markov Model. Execute your program using the sequence of Exercise 9.
Compute the kernel similarity between the two sequences in Exercise 1 with a bag-of-words kernel and a spectrum kernel in which sequences of length 2 are used.
15.9. EXERCISES 529
What is the maximum number of possible sequential patterns of length at most k, where the alphabet size is |Σ|. Compare this with frequent pattern mining. Which is larger?
Suppose that the speed of an athlete on a racetrack probabilistically depends upon whether the day is cold, moderate, or hot. Accordingly, the athlete runs a race that is graded either Fast (F), Slow (S), or Average (A). The weather on a particular day probabilistically depends on the weather on the previous day. Suppose that you have a sequence of performances of the athlete on successive days in the form of a string, such as F SF AAF . Construct a Hidden Markov Model that explains the athlete’s performance, without any knowledge of the weather on those days.
Chapter 16
Mining Spatial Data
“Time and space are modes by which we think and not conditions in which we live.”—Albert Einstein
16.1 Introduction
Spatial data arises commonly in geographical data mining applications. Numerous appli-cations related to meteorological data, earth science, image analysis, and vehicle data are spatial in nature. In many cases, spatial data is integrated with temporal components. Such data is referred to as spatiotemporal data. Some examples of applications in which spatial data arise, are as follows:
Dostları ilə paylaş: |