CONTENTS

8.8.2  Receiver Operating Characteristic  259
8.8.3  Common Mistakes  261
8.9  Summary  261
8.10  Bibliographic Notes  262
8.11  Exercises  262

9  Outlier Analysis: Advanced Concepts  265
9.1  Introduction  265
9.2  Outlier Detection with Categorical Data  266
    9.2.1  Probabilistic Models  266
    9.2.2  Clustering and Distance-Based Methods  267
    9.2.3  Binary and Set-Valued Data  268
9.3  High-Dimensional Outlier Detection  268
    9.3.1  Grid-Based Rare Subspace Exploration  270
        9.3.1.1  Modeling Abnormal Lower Dimensional Projections  271
        9.3.1.2  Grid Search for Subspace Outliers  271
    9.3.2  Random Subspace Sampling  273
9.4  Outlier Ensembles  274
    9.4.1  Categorization by Component Independence  275
        9.4.1.1  Sequential Ensembles  275
        9.4.1.2  Independent Ensembles  276
    9.4.2  Categorization by Constituent Components  277
        9.4.2.1  Model-Centered Ensembles  277
        9.4.2.2  Data-Centered Ensembles  278
    9.4.3  Normalization and Combination  278
9.5  Putting Outliers to Work: Applications  279
    9.5.1  Quality Control and Fault Detection  279
    9.5.2  Financial Fraud and Anomalous Events  280
    9.5.3  Web Log Analytics  280
    9.5.4  Intrusion Detection Applications  280
    9.5.5  Biological and Medical Applications  281
    9.5.6  Earth Science Applications  281
9.6  Summary  281
9.7  Bibliographic Notes  281
9.8  Exercises  283

10  Data Classification  285
10.1  Introduction  285
10.2  Feature Selection for Classification  287
    10.2.1  Filter Models  288
        10.2.1.1  Gini Index  288
        10.2.1.2  Entropy  289
        10.2.1.3  Fisher Score  290
        10.2.1.4  Fisher’s Linear Discriminant  290
    10.2.2  Wrapper Models  292
    10.2.3  Embedded Models  292
10.3  Decision Trees  293
    10.3.1  Split Criteria  294
    10.3.2  Stopping Criterion and Pruning  297
    10.3.3  Practical Issues  298
10.4  Rule-Based Classifiers  298
    10.4.1  Rule Generation from Decision Trees  300
    10.4.2  Sequential Covering Algorithms  301
        10.4.2.1  Learn-One-Rule  302
    10.4.3  Rule Pruning  304
    10.4.4  Associative Classifiers  305
10.5  Probabilistic Classifiers  306
    10.5.1  Naive Bayes Classifier  306
        10.5.1.1  The Ranking Model for Classification  309
        10.5.1.2  Discussion of the Naive Assumption  310
    10.5.2  Logistic Regression  310
        10.5.2.1  Training a Logistic Regression Classifier  311
        10.5.2.2  Relationship with Other Linear Models  312
10.6  Support Vector Machines  313
    10.6.1  Support Vector Machines for Linearly Separable Data  313
        10.6.1.1  Solving the Lagrangian Dual  318
    10.6.2  Support Vector Machines with Soft Margin for Nonseparable Data  319
        10.6.2.1  Comparison with Other Linear Models  321
    10.6.3  Nonlinear Support Vector Machines  321
    10.6.4  The Kernel Trick  323
        10.6.4.1  Other Applications of Kernel Methods  325
10.7  Neural Networks  326
    10.7.1  Single-Layer Neural Network: The Perceptron  326
    10.7.2  Multilayer Neural Networks  328
    10.7.3  Comparing Various Linear Models  330
10.8  Instance-Based Learning  331
    10.8.1  Design Variations of Nearest Neighbor Classifiers  332
        10.8.1.1  Unsupervised Mahalanobis Metric  332
        10.8.1.2  Nearest Neighbors with Linear Discriminant Analysis  332
10.9  Classifier Evaluation  334
    10.9.1  Methodological Issues  335
        10.9.1.1  Holdout  336
        10.9.1.2  Cross-Validation  336
        10.9.1.3  Bootstrap  337
    10.9.2  Quantification Issues  337
        10.9.2.1  Output as Class Labels  338
        10.9.2.2  Output as Numerical Score  339
10.10  Summary  342
10.11  Bibliographic Notes  342
10.12  Exercises  343

11  Data Classification: Advanced Concepts  345
11.1  Introduction  345
11.2  Multiclass Learning  346
11.3  Rare Class Learning  347
    11.3.1  Example Reweighting  348
    11.3.2  Sampling Methods  349