Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə3/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   2   3   4   5   6   7   8   9   ...   423
1-Data Mining tarjima

CONTENTS







xi







6.2.1.3

Entropy . . . . . . . . . . . . . . . . . . . . . . . . . .

156







6.2.1.4

Hopkins Statistic . . . . . . . . . . . . . . . . . . . . .

157




6.2.2

Wrapper Models . . . . . . . . . . . . . . . . . . . . . . . . . . . .

158

6.3

Representative-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . .

159




6.3.1

The k-Means Algorithm . . . . . . . . . . . . . . . . . . . . . . .

162




6.3.2

The Kernel k-Means Algorithm . . . . . . . . . . . . . . . . . . .

163




6.3.3

The k-Medians Algorithm . . . . . . . . . . . . . . . . . . . . . .

164




6.3.4

The k-Medoids Algorithm . . . . . . . . . . . . . . . . . . . . . .

164

6.4

Hierarchical Clustering Algorithms . . . . . . . . . . . . . . . . . . . . . .

166




6.4.1

Bottom-Up Agglomerative Methods . . . . . . . . . . . . . . . . .

167







6.4.1.1

Group-Based Statistics . . . . . . . . . . . . . . . . . .

169




6.4.2

Top-Down Divisive Methods . . . . . . . . . . . . . . . . . . . . .

172







6.4.2.1

Bisecting k-Means . . . . . . . . . . . . . . . . . . . . .

173

6.5

Probabilistic Model-Based Algorithms . . . . . . . . . . . . . . . . . . . .

173




6.5.1

Relationship of EM to k-means and Other Representative










Methods

................................

176

6.6

Grid-Based and Density-Based Algorithms . . . . . . . . . . . . . . . . . .

178




6.6.1

Grid-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . .

179




6.6.2

DBSCAN

...............................

181




6.6.3

DENCLUE...............................

184

6.7

Graph-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . .

187




6.7.1

Properties of Graph-Based Algorithms . . . . . . . . . . . . . . .

189

6.8

Non-negative Matrix Factorization . . . . . . . . . . . . . . . . . . . . . .

191




6.8.1

Comparison with Singular Value Decomposition . . . . . . . . . .

194

6.9

Cluster Validation

................................

195




6.9.1

Internal Validation Criteria . . . . . . . . . . . . . . . . . . . . . .

196







6.9.1.1

Parameter Tuning with Internal Measures . . . . . . .

198




6.9.2

External Validation Criteria . . . . . . . . . . . . . . . . . . . . .

198




6.9.3

General Comments . . . . . . . . . . . . . . . . . . . . . . . . . .

201

6.10

Summary . . . .

................................

201

6.11

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

201

6.12

Exercises . . . . .

................................

202

7 Cluster Analysis: Advanced Concepts

205

7.1

Introduction . . .

................................

205

7.2

Clustering Categorical Data . . . . . . . . . . . . . . . . . . . . . . . . . .

206




7.2.1

Representative-Based Algorithms . . . . . . . . . . . . . . . . . .

207







7.2.1.1

k-Modes Clustering . . . . . . . . . . . . . . . . . . . .

208







7.2.1.2

k-Medoids Clustering . . . . . . . . . . . . . . . . . . .

209




7.2.2

Hierarchical Algorithms . . . . . . . . . . . . . . . . . . . . . . . .

209







7.2.2.1

ROCK ...........................

209




7.2.3

Probabilistic Algorithms . . . . . . . . . . . . . . . . . . . . . . .

211




7.2.4

Graph-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . .

212

7.3

Scalable Data Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . .

212




7.3.1

CLARANS...............................

213




7.3.2

BIRCH .

................................

214




7.3.3

CURE .

................................

216

7.4

High-Dimensional Clustering . . . . . . . . . . . . . . . . . . . . . . . . . .

217




7.4.1

CLIQUE

................................

219




7.4.2

PROCLUS...............................

220




xii










CONTENTS




7.4.3

ORCLUS

...........................

..... 222

7.5

Semisupervised Clustering . . . . . . . . . . . . . . . . . . . . . .

..... 224




7.5.1

Pointwise Supervision . . . . . . . . . . . . . . . . . . . .

..... 225




7.5.2

Pairwise Supervision . . . . . . . . . . . . . . . . . . . .

..... 226

7.6

Human and Visually Supervised Clustering . . . . . . . . . . . .

..... 227




7.6.1

Modifications of Existing Clustering Algorithms . . . . .

..... 228




7.6.2

Visual Clustering . . . . . . . . . . . . . . . . . . . . . .

..... 228

7.7

Cluster Ensembles

..........................

..... 231




7.7.1

Selecting Different Ensemble Components . . . . . . . .

..... 231




7.7.2

Combining Different Ensemble Components . . . . . . .

..... 232







7.7.2.1

Hypergraph Partitioning Algorithm . . . . . .

..... 232







7.7.2.2

Meta-clustering Algorithm . . . . . . . . . . .

..... 232

7.8

Putting Clustering to Work: Applications . . . . . . . . . . . . .

..... 233




7.8.1

Applications to Other Data Mining Problems . . . . . .

..... 233







7.8.1.1

Data Summarization . . . . . . . . . . . . . .

..... 233







7.8.1.2

Outlier Analysis . . . . . . . . . . . . . . . . .

..... 233







7.8.1.3

Classification . . . . . . . . . . . . . . . . . . .

..... 233







7.8.1.4

Dimensionality Reduction . . . . . . . . . . .

..... 234







7.8.1.5

Similarity Search and Indexing . . . . . . . . .

..... 234




7.8.2

Customer Segmentation and Collaborative Filtering . . .

..... 234




7.8.3

Text Applications . . . . . . . . . . . . . . . . . . . . . .

..... 234




7.8.4

Multimedia Applications . . . . . . . . . . . . . . . . . .

..... 234




7.8.5

Temporal and Sequence Applications . . . . . . . . . . .

..... 234




7.8.6

Social Network Analysis . . . . . . . . . . . . . . . . . .

..... 235

7.9

Summary . . . .

...........................

..... 235

7.10

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . .

..... 235

7.11

Exercises . . . . .

...........................

..... 236

8 Outlier Analysis




237

8.1

Introduction . . .

...........................

..... 237

8.2

Extreme Value Analysis . . . . . . . . . . . . . . . . . . . . . . .

..... 239




8.2.1

Univariate Extreme Value Analysis . . . . . . . . . . . .

..... 240




8.2.2

Multivariate Extreme Values . . . . . . . . . . . . . . . .

..... 242




8.2.3

Depth-Based Methods . . . . . . . . . . . . . . . . . . .

..... 243

8.3

Probabilistic Models . . . . . . . . . . . . . . . . . . . . . . . . .

..... 244

8.4

Clustering for Outlier Detection . . . . . . . . . . . . . . . . . . .

..... 246

8.5

Distance-Based Outlier Detection . . . . . . . . . . . . . . . . . .

..... 248




8.5.1

Pruning Methods . . . . . . . . . . . . . . . . . . . . . .

..... 249







8.5.1.1

Sampling Methods . . . . . . . . . . . . . . . .

..... 249







8.5.1.2

Early Termination Trick with Nested Loops .

..... 250




8.5.2

Local Distance Correction Methods . . . . . . . . . . . .

..... 251







8.5.2.1

Local Outlier Factor (LOF) . . . . . . . . . .

..... 252







8.5.2.2

Instance-Specific Mahalanobis Distance . . . .

..... 254

8.6

Density-Based Methods . . . . . . . . . . . . . . . . . . . . . . .

..... 255




8.6.1

Histogram- and Grid-Based Techniques . . . . . . . . . .

..... 255




8.6.2

Kernel Density Estimation . . . . . . . . . . . . . . . . .

..... 256

8.7

Information-Theoretic Models . . . . . . . . . . . . . . . . . . . .

..... 256

8.8

Outlier Validity .

...........................

..... 258




8.8.1

Methodological Challenges . . . . . . . . . . . . . . . . .

..... 258





Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin