CONTENTS
|
|
|
xix
|
16 Mining Spatial Data
|
|
531
|
16.1
|
Introduction . . .
|
................................
|
531
|
16.2 Mining with Contextual Spatial Attributes . . . . . . . . . . . . . . . . . .
|
532
|
|
16.2.1
|
Shape to Time Series Transformation . . . . . . . . . . . . . . . .
|
533
|
|
16.2.2
|
Spatial to Multidimensional Transformation with Wavelets . . . .
|
537
|
|
16.2.3
|
Spatial Colocation Patterns . . . . . . . . . . . . . . . . . . . . .
|
538
|
|
16.2.4
|
Clustering Shapes . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
539
|
|
16.2.5
|
Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
540
|
|
|
16.2.5.1
|
Point Outliers . . . . . . . . . . . . . . . . . . . . . . .
|
541
|
|
|
16.2.5.2
|
Shape Outliers . . . . . . . . . . . . . . . . . . . . . . .
|
543
|
|
16.2.6
|
Classification of Shapes . . . . . . . . . . . . . . . . . . . . . . . .
|
544
|
16.3
|
Trajectory Mining
|
................................
|
544
|
|
16.3.1
|
Equivalence of Trajectories and Multivariate Time Series . . . . .
|
545
|
|
16.3.2
|
Converting Trajectories to Multidimensional Data . . . . . . . . .
|
545
|
|
16.3.3
|
Trajectory Pattern Mining . . . . . . . . . . . . . . . . . . . . . .
|
546
|
|
|
16.3.3.1
|
Frequent Trajectory Paths . . . . . . . . . . . . . . . .
|
546
|
|
|
16.3.3.2
|
Colocation Patterns . . . . . . . . . . . . . . . . . . . .
|
548
|
|
16.3.4
|
Trajectory Clustering . . . . . . . . . . . . . . . . . . . . . . . . .
|
549
|
|
|
16.3.4.1
|
Computing Similarity Between Trajectories . . . . . . .
|
549
|
|
|
16.3.4.2
|
Similarity-Based Clustering Methods . . . . . . . . . .
|
550
|
|
|
16.3.4.3
|
Trajectory Clustering as a Sequence Clustering
|
|
|
|
|
Problem . . . . . . . . . . . . . . . . . . . . . . . . . .
|
551
|
|
16.3.5
|
Trajectory Outlier Detection . . . . . . . . . . . . . . . . . . . . .
|
551
|
|
|
16.3.5.1
|
Distance-Based Methods . . . . . . . . . . . . . . . . .
|
551
|
|
|
16.3.5.2
|
Sequence-Based Methods . . . . . . . . . . . . . . . . .
|
552
|
|
16.3.6
|
Trajectory Classification . . . . . . . . . . . . . . . . . . . . . . .
|
553
|
|
|
16.3.6.1
|
Distance-Based Methods . . . . . . . . . . . . . . . . .
|
553
|
|
|
16.3.6.2
|
Sequence-Based Methods . . . . . . . . . . . . . . . . .
|
553
|
16.4
|
Summary . . . .
|
................................
|
554
|
16.5
|
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
554
|
16.6
|
Exercises . . . . .
|
................................
|
555
|
17 Mining Graph Data
|
|
557
|
17.1
|
Introduction . . .
|
................................
|
557
|
17.2 Matching and Distance Computation in Graphs . . . . . . . . . . . . . . .
|
559
|
|
17.2.1
|
Ullman’s Algorithm for Subgraph Isomorphism . . . . . . . . . .
|
562
|
|
|
17.2.1.1
|
Algorithm Variations and Refinements . . . . . . . . .
|
563
|
|
17.2.2
|
Maximum Common Subgraph (MCG) Problem . . . . . . . . . .
|
564
|
|
17.2.3
|
Graph Matching Methods for Distance Computation . . . . . . .
|
565
|
|
|
17.2.3.1
|
MCG-based Distances . . . . . . . . . . . . . . . . . . .
|
565
|
|
|
17.2.3.2
|
Graph Edit Distance . . . . . . . . . . . . . . . . . . .
|
567
|
17.3
|
Transformation-Based Distance Computation . . . . . . . . . . . . . . . .
|
570
|
|
17.3.1
|
Frequent Substructure-Based Transformation and Distance
|
|
|
|
Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
|
570
|
|
17.3.2
|
Topological Descriptors . . . . . . . . . . . . . . . . . . . . . . . .
|
571
|
|
17.3.3
|
Kernel-Based Transformations and Computation . . . . . . . . . .
|
573
|
|
|
17.3.3.1
|
Random Walk Kernels . . . . . . . . . . . . . . . . . .
|
573
|
|
|
17.3.3.2
|
Shortest-Path Kernels . . . . . . . . . . . . . . . . . . .
|
575
|
17.4 Frequent Substructure Mining in Graphs . . . . . . . . . . . . . . . . . . .
|
575
|
|
17.4.1
|
Node-Based Join Growth . . . . . . . . . . . . . . . . . . . . . . .
|
578
|
xx
|
|
|
|
CONTENTS
|
|
17.4.2
|
Edge-Based Join Growth . . . . . . . . . . . . . . . . . .
|
..... 578
|
|
17.4.3
|
Frequent Pattern Mining to Graph Pattern Mining . . .
|
..... 578
|
17.5
|
Graph Clustering
|
...........................
|
..... 579
|
|
17.5.1
|
Distance-Based Methods . . . . . . . . . . . . . . . . . .
|
..... 579
|
|
17.5.2
|
Frequent Substructure-Based Methods . . . . . . . . . .
|
..... 580
|
|
|
17.5.2.1
|
Generic Transformational Approach . . . . . .
|
..... 580
|
|
|
17.5.2.2 XProj: Direct Clustering with Frequent Subgraph
|
|
|
|
Discovery . . . . . . . . . . . . . . . . . . . . .
|
..... 581
|
17.6
|
Graph Classification . . . . . . . . . . . . . . . . . . . . . . . . .
|
..... 582
|
|
17.6.1
|
Distance-Based Methods . . . . . . . . . . . . . . . . . .
|
..... 583
|
|
17.6.2
|
Frequent Substructure-Based Methods . . . . . . . . . .
|
..... 583
|
|
|
17.6.2.1
|
Generic Transformational Approach . . . . . .
|
..... 583
|
|
|
17.6.2.2 XRules: A Rule-Based Approach . . . . . . . .
|
..... 584
|
|
17.6.3
|
Kernel SVMs . . . . . . . . . . . . . . . . . . . . . . . .
|
..... 585
|
17.7
|
Summary . . . .
|
...........................
|
..... 585
|
17.8
|
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . .
|
..... 586
|
17.9
|
Exercises . . . . .
|
...........................
|
..... 586
|
18 Mining Web Data
|
|
589
|
18.1
|
Introduction . . .
|
...........................
|
..... 589
|
18.2
|
Web Crawling and Resource Discovery . . . . . . . . . . . . . . .
|
..... 591
|
|
18.2.1
|
A Basic Crawler Algorithm . . . . . . . . . . . . . . . . .
|
..... 591
|
|
18.2.2
|
Preferential Crawlers . . . . . . . . . . . . . . . . . . . .
|
..... 593
|
|
18.2.3
|
Multiple Threads . . . . . . . . . . . . . . . . . . . . . .
|
..... 593
|
|
18.2.4
|
Combatting Spider Traps . . . . . . . . . . . . . . . . . .
|
..... 593
|
|
18.2.5
|
Shingling for Near Duplicate Detection . . . . . . . . . .
|
..... 594
|
18.3
|
Search Engine Indexing and Query Processing . . . . . . . . . . .
|
..... 594
|
18.4
|
Ranking Algorithms . . . . . . . . . . . . . . . . . . . . . . . . .
|
..... 597
|
|
18.4.1
|
PageRank
|
..........................
|
..... 598
|
|
|
18.4.1.1
|
Topic-Sensitive PageRank . . . . . . . . . . .
|
..... 601
|
|
|
18.4.1.2
|
SimRank . . . . . . . . . . . . . . . . . . . . .
|
..... 601
|
|
18.4.2
|
HITS . .
|
...........................
|
..... 602
|
18.5
|
Recommender Systems . . . . . . . . . . . . . . . . . . . . . . . .
|
..... 604
|
|
18.5.1
|
Content-Based Recommendations . . . . . . . . . . . . .
|
..... 606
|
|
18.5.2
|
Neighborhood-Based Methods for Collaborative Filtering
|
..... 607
|
|
|
18.5.2.1 User-Based Similarity with Ratings . . . . . .
|
..... 607
|
|
|
18.5.2.2 Item-Based Similarity with Ratings . . . . . .
|
..... 608
|
|
18.5.3
|
Graph-Based Methods . . . . . . . . . . . . . . . . . . .
|
..... 608
|
|
18.5.4
|
Clustering Methods . . . . . . . . . . . . . . . . . . . . .
|
..... 609
|
|
|
18.5.4.1
|
Adapting k-Means Clustering . . . . . . . . .
|
..... 610
|
|
|
18.5.4.2
|
Adapting Co-Clustering . . . . . . . . . . . . .
|
..... 610
|
|
18.5.5
|
Latent Factor Models . . . . . . . . . . . . . . . . . . . .
|
..... 611
|
|
|
18.5.5.1
|
Singular Value Decomposition . . . . . . . . .
|
..... 612
|
|
|
18.5.5.2
|
Matrix Factorization . . . . . . . . . . . . . .
|
..... 612
|
18.6
|
Web Usage Mining
|
..........................
|
..... 613
|
|
18.6.1
|
Data Preprocessing . . . . . . . . . . . . . . . . . . . . .
|
..... 614
|
|
18.6.2
|
Applications . . . . . . . . . . . . . . . . . . . . . . . . .
|
..... 614
|
18.7
|
Summary . . . .
|
...........................
|
..... 615
|
18.8
|
Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . .
|
..... 616
|
18.9
|
Exercises . . . . .
|
...........................
|
..... 616
|