Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə7/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   2   3   4   5   6   7   8   9   10   ...   423
1-Data Mining tarjima

CONTENTS







xix

16 Mining Spatial Data




531

16.1

Introduction . . .

................................

531

16.2 Mining with Contextual Spatial Attributes . . . . . . . . . . . . . . . . . .

532




16.2.1

Shape to Time Series Transformation . . . . . . . . . . . . . . . .

533




16.2.2

Spatial to Multidimensional Transformation with Wavelets . . . .

537




16.2.3

Spatial Colocation Patterns . . . . . . . . . . . . . . . . . . . . .

538




16.2.4

Clustering Shapes . . . . . . . . . . . . . . . . . . . . . . . . . . .

539




16.2.5

Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . .

540







16.2.5.1

Point Outliers . . . . . . . . . . . . . . . . . . . . . . .

541







16.2.5.2

Shape Outliers . . . . . . . . . . . . . . . . . . . . . . .

543




16.2.6

Classification of Shapes . . . . . . . . . . . . . . . . . . . . . . . .

544

16.3

Trajectory Mining

................................

544




16.3.1

Equivalence of Trajectories and Multivariate Time Series . . . . .

545




16.3.2

Converting Trajectories to Multidimensional Data . . . . . . . . .

545




16.3.3

Trajectory Pattern Mining . . . . . . . . . . . . . . . . . . . . . .

546







16.3.3.1

Frequent Trajectory Paths . . . . . . . . . . . . . . . .

546







16.3.3.2

Colocation Patterns . . . . . . . . . . . . . . . . . . . .

548




16.3.4

Trajectory Clustering . . . . . . . . . . . . . . . . . . . . . . . . .

549







16.3.4.1

Computing Similarity Between Trajectories . . . . . . .

549







16.3.4.2

Similarity-Based Clustering Methods . . . . . . . . . .

550







16.3.4.3

Trajectory Clustering as a Sequence Clustering













Problem . . . . . . . . . . . . . . . . . . . . . . . . . .

551




16.3.5

Trajectory Outlier Detection . . . . . . . . . . . . . . . . . . . . .

551







16.3.5.1

Distance-Based Methods . . . . . . . . . . . . . . . . .

551







16.3.5.2

Sequence-Based Methods . . . . . . . . . . . . . . . . .

552




16.3.6

Trajectory Classification . . . . . . . . . . . . . . . . . . . . . . .

553







16.3.6.1

Distance-Based Methods . . . . . . . . . . . . . . . . .

553







16.3.6.2

Sequence-Based Methods . . . . . . . . . . . . . . . . .

553

16.4

Summary . . . .

................................

554

16.5

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

554

16.6

Exercises . . . . .

................................

555

17 Mining Graph Data




557

17.1

Introduction . . .

................................

557

17.2 Matching and Distance Computation in Graphs . . . . . . . . . . . . . . .

559




17.2.1

Ullman’s Algorithm for Subgraph Isomorphism . . . . . . . . . .

562







17.2.1.1

Algorithm Variations and Refinements . . . . . . . . .

563




17.2.2

Maximum Common Subgraph (MCG) Problem . . . . . . . . . .

564




17.2.3

Graph Matching Methods for Distance Computation . . . . . . .

565







17.2.3.1

MCG-based Distances . . . . . . . . . . . . . . . . . . .

565







17.2.3.2

Graph Edit Distance . . . . . . . . . . . . . . . . . . .

567

17.3

Transformation-Based Distance Computation . . . . . . . . . . . . . . . .

570




17.3.1

Frequent Substructure-Based Transformation and Distance










Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

570




17.3.2

Topological Descriptors . . . . . . . . . . . . . . . . . . . . . . . .

571




17.3.3

Kernel-Based Transformations and Computation . . . . . . . . . .

573







17.3.3.1

Random Walk Kernels . . . . . . . . . . . . . . . . . .

573







17.3.3.2

Shortest-Path Kernels . . . . . . . . . . . . . . . . . . .

575

17.4 Frequent Substructure Mining in Graphs . . . . . . . . . . . . . . . . . . .

575




17.4.1

Node-Based Join Growth . . . . . . . . . . . . . . . . . . . . . . .

578




xx










CONTENTS




17.4.2

Edge-Based Join Growth . . . . . . . . . . . . . . . . . .

..... 578




17.4.3

Frequent Pattern Mining to Graph Pattern Mining . . .

..... 578

17.5

Graph Clustering

...........................

..... 579




17.5.1

Distance-Based Methods . . . . . . . . . . . . . . . . . .

..... 579




17.5.2

Frequent Substructure-Based Methods . . . . . . . . . .

..... 580







17.5.2.1

Generic Transformational Approach . . . . . .

..... 580







17.5.2.2 XProj: Direct Clustering with Frequent Subgraph










Discovery . . . . . . . . . . . . . . . . . . . . .

..... 581

17.6

Graph Classification . . . . . . . . . . . . . . . . . . . . . . . . .

..... 582




17.6.1

Distance-Based Methods . . . . . . . . . . . . . . . . . .

..... 583




17.6.2

Frequent Substructure-Based Methods . . . . . . . . . .

..... 583







17.6.2.1

Generic Transformational Approach . . . . . .

..... 583







17.6.2.2 XRules: A Rule-Based Approach . . . . . . . .

..... 584




17.6.3

Kernel SVMs . . . . . . . . . . . . . . . . . . . . . . . .

..... 585

17.7

Summary . . . .

...........................

..... 585

17.8

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . .

..... 586

17.9

Exercises . . . . .

...........................

..... 586

18 Mining Web Data




589

18.1

Introduction . . .

...........................

..... 589

18.2

Web Crawling and Resource Discovery . . . . . . . . . . . . . . .

..... 591




18.2.1

A Basic Crawler Algorithm . . . . . . . . . . . . . . . . .

..... 591




18.2.2

Preferential Crawlers . . . . . . . . . . . . . . . . . . . .

..... 593




18.2.3

Multiple Threads . . . . . . . . . . . . . . . . . . . . . .

..... 593




18.2.4

Combatting Spider Traps . . . . . . . . . . . . . . . . . .

..... 593




18.2.5

Shingling for Near Duplicate Detection . . . . . . . . . .

..... 594

18.3

Search Engine Indexing and Query Processing . . . . . . . . . . .

..... 594

18.4

Ranking Algorithms . . . . . . . . . . . . . . . . . . . . . . . . .

..... 597




18.4.1

PageRank

..........................

..... 598







18.4.1.1

Topic-Sensitive PageRank . . . . . . . . . . .

..... 601







18.4.1.2

SimRank . . . . . . . . . . . . . . . . . . . . .

..... 601




18.4.2

HITS . .

...........................

..... 602

18.5

Recommender Systems . . . . . . . . . . . . . . . . . . . . . . . .

..... 604




18.5.1

Content-Based Recommendations . . . . . . . . . . . . .

..... 606




18.5.2

Neighborhood-Based Methods for Collaborative Filtering

..... 607







18.5.2.1 User-Based Similarity with Ratings . . . . . .

..... 607







18.5.2.2 Item-Based Similarity with Ratings . . . . . .

..... 608




18.5.3

Graph-Based Methods . . . . . . . . . . . . . . . . . . .

..... 608




18.5.4

Clustering Methods . . . . . . . . . . . . . . . . . . . . .

..... 609







18.5.4.1

Adapting k-Means Clustering . . . . . . . . .

..... 610







18.5.4.2

Adapting Co-Clustering . . . . . . . . . . . . .

..... 610




18.5.5

Latent Factor Models . . . . . . . . . . . . . . . . . . . .

..... 611







18.5.5.1

Singular Value Decomposition . . . . . . . . .

..... 612







18.5.5.2

Matrix Factorization . . . . . . . . . . . . . .

..... 612

18.6

Web Usage Mining

..........................

..... 613




18.6.1

Data Preprocessing . . . . . . . . . . . . . . . . . . . . .

..... 614




18.6.2

Applications . . . . . . . . . . . . . . . . . . . . . . . . .

..... 614

18.7

Summary . . . .

...........................

..... 615

18.8

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . .

..... 616

18.9

Exercises . . . . .

...........................

..... 616





Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9   10   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin