CONTENTS
|
|
|
xvii
|
|
13.4.1
|
Use in Dimensionality Reduction and Comparison with Latent
|
|
|
|
Semantic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
443
|
|
13.4.2
|
Use in Clustering and Comparison with Probabilistic
|
|
|
|
Clustering
|
...............................
|
445
|
|
13.4.3
|
Limitations of PLSA . . . . . . . . . . . . . . . . . . . . . . . . .
|
446
|
13.5
|
Specialized Classification Methods for Text . . . . . . . . . . . . . . . . . .
|
446
|
|
13.5.1
|
Instance-Based Classifiers . . . . . . . . . . . . . . . . . . . . . .
|
447
|
|
|
13.5.1.1 Leveraging Latent Semantic Analysis . . . . . . . . . .
|
447
|
|
|
13.5.1.2
|
Centroid-Based Classification . . . . . . . . . . . . . .
|
447
|
|
|
13.5.1.3
|
Rocchio Classification . . . . . . . . . . . . . . . . . . .
|
448
|
|
13.5.2
|
Bayes Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
448
|
|
|
13.5.2.1
|
Multinomial Bayes Model . . . . . . . . . . . . . . . . .
|
449
|
|
13.5.3
|
SVM Classifiers for High-Dimensional and Sparse Data . . . . . .
|
451
|
13.6
|
Novelty and First Story Detection . . . . . . . . . . . . . . . . . . . . . . .
|
453
|
|
13.6.1
|
Micro-clustering Method . . . . . . . . . . . . . . . . . . . . . . .
|
453
|
13.7
|
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
454
|
13.8
|
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
454
|
13.9
|
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
455
|
14 Mining Time Series Data
|
457
|
14.1
|
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
457
|
14.2
|
Time Series Preparation and Similarity . . . . . . . . . . . . . . . . . . . .
|
459
|
|
14.2.1
|
Handling Missing Values . . . . . . . . . . . . . . . . . . . . . . .
|
459
|
|
14.2.2
|
Noise Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
460
|
|
14.2.3
|
Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
461
|
|
14.2.4
|
Data Transformation and Reduction . . . . . . . . . . . . . . . .
|
462
|
|
|
14.2.4.1
|
Discrete Wavelet Transform . . . . . . . . . . . . . . .
|
462
|
|
|
14.2.4.2
|
Discrete Fourier Transform . . . . . . . . . . . . . . . .
|
462
|
|
|
14.2.4.3 Symbolic Aggregate Approximation (SAX) . . . . . . .
|
464
|
|
14.2.5
|
Time Series Similarity Measures . . . . . . . . . . . . . . . . . . .
|
464
|
14.3
|
Time Series Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
464
|
|
14.3.1
|
Autoregressive Models . . . . . . . . . . . . . . . . . . . . . . . .
|
467
|
|
14.3.2
|
Autoregressive Moving Average Models . . . . . . . . . . . . . . .
|
468
|
|
14.3.3
|
Multivariate Forecasting with Hidden Variables . . . . . . . . . .
|
470
|
14.4
|
Time Series Motifs
|
...............................
|
472
|
|
14.4.1
|
Distance-Based Motifs . . . . . . . . . . . . . . . . . . . . . . . .
|
473
|
|
14.4.2
|
Transformation to Sequential Pattern Mining . . . . . . . . . . .
|
475
|
|
14.4.3
|
Periodic Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
476
|
14.5
|
Time Series Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
476
|
|
14.5.1
|
Online Clustering of Coevolving Series . . . . . . . . . . . . . . .
|
477
|
|
14.5.2
|
Shape-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . .
|
479
|
|
|
14.5.2.1 k-Means . . . . . . . . . . . . . . . . . . . . . . . . . .
|
480
|
|
|
14.5.2.2 k-Medoids . . . . . . . . . . . . . . . . . . . . . . . . .
|
480
|
|
|
14.5.2.3
|
Hierarchical Methods . . . . . . . . . . . . . . . . . . .
|
481
|
|
|
14.5.2.4
|
Graph-Based Methods . . . . . . . . . . . . . . . . . .
|
481
|
14.6
|
Time Series Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . .
|
481
|
|
14.6.1
|
Point Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
482
|
|
14.6.2
|
Shape Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
483
|
14.7
|
Time Series Classification . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
485
|
xviii
|
|
|
|
CONTENTS
|
|
|
14.7.1
|
Supervised Event Detection . . . . . . . . . . . . . . . . .
|
.... 485
|
|
|
14.7.2
|
Whole Series Classification . . . . . . . . . . . . . . . . . .
|
.... 488
|
|
|
|
14.7.2.1
|
Wavelet-Based Rules . . . . . . . . . . . . . . .
|
.... 488
|
|
|
|
14.7.2.2
|
Nearest Neighbor Classifier . . . . . . . . . . . .
|
.... 489
|
|
|
|
14.7.2.3
|
Graph-Based Methods . . . . . . . . . . . . . .
|
.... 489
|
|
14.8
|
Summary . . . .
|
............................
|
.... 489
|
|
14.9
|
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
.... 490
|
|
14.10
|
Exercises . . . . .
|
............................
|
.... 490
|
|
15 Mining Discrete Sequences
|
493
|
|
15.1
|
Introduction . . .
|
............................
|
.... 493
|
|
15.2
|
Sequential Pattern Mining . . . . . . . . . . . . . . . . . . . . . . .
|
.... 494
|
|
|
15.2.1
|
Frequent Patterns to Frequent Sequences . . . . . . . . . .
|
.... 497
|
|
|
15.2.2
|
Constrained Sequential Pattern Mining . . . . . . . . . . .
|
.... 500
|
|
15.3
|
Sequence Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
.... 501
|
|
|
15.3.1
|
Distance-Based Methods . . . . . . . . . . . . . . . . . . .
|
.... 502
|
|
|
15.3.2
|
Graph-Based Methods . . . . . . . . . . . . . . . . . . . .
|
.... 502
|
|
|
15.3.3
|
Subsequence-Based Clustering . . . . . . . . . . . . . . . .
|
.... 503
|
|
|
15.3.4
|
Probabilistic Clustering . . . . . . . . . . . . . . . . . . . .
|
.... 504
|
|
|
|
15.3.4.1
|
Markovian Similarity-Based Algorithm: CLUSEQ . . . 504
|
|
|
|
15.3.4.2 Mixture of Hidden Markov Models . . . . . . .
|
.... 506
|
|
15.4
|
Outlier Detection in Sequences . . . . . . . . . . . . . . . . . . . .
|
.... 507
|
|
|
15.4.1
|
Position Outliers . . . . . . . . . . . . . . . . . . . . . . .
|
.... 508
|
|
|
|
15.4.1.1 Efficiency Issues: Probabilistic Suffix Trees . . .
|
.... 510
|
|
|
15.4.2
|
Combination Outliers . . . . . . . . . . . . . . . . . . . . .
|
.... 512
|
|
|
|
15.4.2.1
|
Distance-Based Models . . . . . . . . . . . . . .
|
.... 513
|
|
|
|
15.4.2.2
|
Frequency-Based Models . . . . . . . . . . . . .
|
.... 514
|
|
15.5
|
Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . .
|
.... 514
|
|
|
15.5.1
|
Formal Definition and Techniques for HMMs.
|
.... 517
|
|
|
15.5.2
|
Evaluation:...... .Computing...... the.. Fit.. .Probability...... for.. Observed......
|
|
|
|
|
|
|
Sequence
|
............................
|
.... 518
|
|
|
15.5.3
|
Explanation: Determining the Most Likely State Sequence
|
|
|
|
|
for Observed Sequence . . . . . . . . . . . . . . . . . . . .
|
.... 519
|
|
|
15.5.4
|
Training: Baum–Welch Algorithm . . . . . . . . . . . . . .
|
.... 520
|
|
|
15.5.5
|
Applications . . . . . . . . . . . . . . . . . . . . . . . . . .
|
.... 521
|
|
15.6
|
Sequence Classification . . . . . . . . . . . . . . . . . . . . . . . . .
|
.... 521
|
|
|
15.6.1
|
Nearest Neighbor Classifier . . . . . . . . . . . . . . . . . .
|
.... 522
|
|
|
15.6.2
|
Graph-Based Methods . . . . . . . . . . . . . . . . . . . .
|
.... 522
|
|
|
15.6.3
|
Rule-Based Methods . . . . . . . . . . . . . . . . . . . . .
|
.... 523
|
|
|
15.6.4
|
Kernel Support Vector Machines . . . . . . . . . . . . . . .
|
.... 524
|
|
|
|
15.6.4.1
|
Bag-of-Words Kernel . . . . . . . . . . . . . . .
|
.... 524
|
|
|
|
15.6.4.2
|
Spectrum Kernel . . . . . . . . . . . . . . . . .
|
.... 524
|
|
|
|
15.6.4.3
|
Weighted Degree Kernel . . . . . . . . . . . . .
|
.... 525
|
|
|
15.6.5
|
Probabilistic Methods: Hidden Markov Models . . . . . . .
|
.... 525
|
|
15.7
|
Summary . . . .
|
............................
|
.... 526
|
|
15.8
|
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
.... 527
|
|
15.9
|
Exercises . . . . .
|
............................
|
.... 528
|
|