Data Mining: The Textbook




CONTENTS











       8.8.2   Receiver Operating Characteristic . . . . . . . . . . . . . . 259
       8.8.3   Common Mistakes . . . . . . . . . . . . . . . . . . . . . . . 261
8.9    Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
8.10   Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 262
8.11   Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

9  Outlier Analysis: Advanced Concepts                                       265
9.1    Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
9.2    Outlier Detection with Categorical Data . . . . . . . . . . . . . . . 266
       9.2.1   Probabilistic Models . . . . . . . . . . . . . . . . . . . . . 266
       9.2.2   Clustering and Distance-Based Methods . . . . . . . . . . . . 267
       9.2.3   Binary and Set-Valued Data . . . . . . . . . . . . . . . . . . 268
9.3    High-Dimensional Outlier Detection . . . . . . . . . . . . . . . . . . 268
       9.3.1   Grid-Based Rare Subspace Exploration . . . . . . . . . . . . . 270
               9.3.1.1   Modeling Abnormal Lower Dimensional Projections . . 271
               9.3.1.2   Grid Search for Subspace Outliers . . . . . . . . . 271
       9.3.2   Random Subspace Sampling . . . . . . . . . . . . . . . . . . . 273
9.4    Outlier Ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . 274
       9.4.1   Categorization by Component Independence . . . . . . . . . . . 275
               9.4.1.1   Sequential Ensembles . . . . . . . . . . . . . . . . 275
               9.4.1.2   Independent Ensembles . . . . . . . . . . . . . . . 276
       9.4.2   Categorization by Constituent Components . . . . . . . . . . . 277
               9.4.2.1   Model-Centered Ensembles . . . . . . . . . . . . . . 277
               9.4.2.2   Data-Centered Ensembles . . . . . . . . . . . . . . 278
       9.4.3   Normalization and Combination . . . . . . . . . . . . . . . . 278
9.5    Putting Outliers to Work: Applications . . . . . . . . . . . . . . . . 279
       9.5.1   Quality Control and Fault Detection . . . . . . . . . . . . . 279
       9.5.2   Financial Fraud and Anomalous Events . . . . . . . . . . . . . 280
       9.5.3   Web Log Analytics . . . . . . . . . . . . . . . . . . . . . . 280
       9.5.4   Intrusion Detection Applications . . . . . . . . . . . . . . . 280
       9.5.5   Biological and Medical Applications . . . . . . . . . . . . . 281
       9.5.6   Earth Science Applications . . . . . . . . . . . . . . . . . . 281
9.6    Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
9.7    Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 281
9.8    Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

10  Data Classification                                                      285
10.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
10.2   Feature Selection for Classification . . . . . . . . . . . . . . . . . 287
       10.2.1  Filter Models . . . . . . . . . . . . . . . . . . . . . . . . 288
               10.2.1.1  Gini Index . . . . . . . . . . . . . . . . . . . . . 288
               10.2.1.2  Entropy . . . . . . . . . . . . . . . . . . . . . . 289
               10.2.1.3  Fisher Score . . . . . . . . . . . . . . . . . . . . 290
               10.2.1.4  Fisher's Linear Discriminant . . . . . . . . . . . . 290
       10.2.2  Wrapper Models . . . . . . . . . . . . . . . . . . . . . . . . 292
       10.2.3  Embedded Models . . . . . . . . . . . . . . . . . . . . . . . 292
10.3   Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
       10.3.1  Split Criteria . . . . . . . . . . . . . . . . . . . . . . . . 294
       10.3.2  Stopping Criterion and Pruning . . . . . . . . . . . . . . . . 297
       10.3.3  Practical Issues . . . . . . . . . . . . . . . . . . . . . . . 298
10.4   Rule-Based Classifiers . . . . . . . . . . . . . . . . . . . . . . . . 298
       10.4.1  Rule Generation from Decision Trees . . . . . . . . . . . . . 300
       10.4.2  Sequential Covering Algorithms . . . . . . . . . . . . . . . . 301
               10.4.2.1  Learn-One-Rule . . . . . . . . . . . . . . . . . . . 302
       10.4.3  Rule Pruning . . . . . . . . . . . . . . . . . . . . . . . . . 304
       10.4.4  Associative Classifiers . . . . . . . . . . . . . . . . . . . 305
10.5   Probabilistic Classifiers . . . . . . . . . . . . . . . . . . . . . . 306
       10.5.1  Naive Bayes Classifier . . . . . . . . . . . . . . . . . . . . 306
               10.5.1.1  The Ranking Model for Classification . . . . . . . . 309
               10.5.1.2  Discussion of the Naive Assumption . . . . . . . . . 310
       10.5.2  Logistic Regression . . . . . . . . . . . . . . . . . . . . . 310
               10.5.2.1  Training a Logistic Regression Classifier . . . . . 311
               10.5.2.2  Relationship with Other Linear Models . . . . . . . 312
10.6   Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . 313
       10.6.1  Support Vector Machines for Linearly Separable Data . . . . . 313
               10.6.1.1  Solving the Lagrangian Dual . . . . . . . . . . . . 318
       10.6.2  Support Vector Machines with Soft Margin
               for Nonseparable Data . . . . . . . . . . . . . . . . . . . . 319
               10.6.2.1  Comparison with Other Linear Models . . . . . . . . 321
       10.6.3  Nonlinear Support Vector Machines . . . . . . . . . . . . . . 321
       10.6.4  The Kernel Trick . . . . . . . . . . . . . . . . . . . . . . . 323
               10.6.4.1  Other Applications of Kernel Methods . . . . . . . . 325
10.7   Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
       10.7.1  Single-Layer Neural Network: The Perceptron . . . . . . . . . 326
       10.7.2  Multilayer Neural Networks . . . . . . . . . . . . . . . . . . 328
       10.7.3  Comparing Various Linear Models . . . . . . . . . . . . . . . 330
10.8   Instance-Based Learning . . . . . . . . . . . . . . . . . . . . . . . 331
       10.8.1  Design Variations of Nearest Neighbor Classifiers . . . . . . 332
               10.8.1.1  Unsupervised Mahalanobis Metric . . . . . . . . . . 332
               10.8.1.2  Nearest Neighbors with Linear Discriminant Analysis  332
10.9   Classifier Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 334
       10.9.1  Methodological Issues . . . . . . . . . . . . . . . . . . . . 335
               10.9.1.1  Holdout . . . . . . . . . . . . . . . . . . . . . . 336
               10.9.1.2  Cross-Validation . . . . . . . . . . . . . . . . . . 336
               10.9.1.3  Bootstrap . . . . . . . . . . . . . . . . . . . . . 337
       10.9.2  Quantification Issues . . . . . . . . . . . . . . . . . . . . 337
               10.9.2.1  Output as Class Labels . . . . . . . . . . . . . . . 338
               10.9.2.2  Output as Numerical Score . . . . . . . . . . . . . 339
10.10  Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
10.11  Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . 342
10.12  Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

11  Data Classification: Advanced Concepts                                   345
11.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
11.2   Multiclass Learning . . . . . . . . . . . . . . . . . . . . . . . . . 346
11.3   Rare Class Learning . . . . . . . . . . . . . . . . . . . . . . . . . 347
       11.3.1  Example Reweighting . . . . . . . . . . . . . . . . . . . . . 348
       11.3.2  Sampling Methods . . . . . . . . . . . . . . . . . . . . . . . 349




