Data Mining: The Textbook




χ² Measure, 123

ℓ-diversity, 682




k-anonymity, 670, 671


t-closeness, 684

AdaBoost, 381


Agglomerative Clustering, 167


Aggregate Change Points, 419


Almost Closed Sets, 139


AMS Sketch, 406


Approximate Frequent Patterns, 139


Apriori Algorithm, 100


AR Model, 467


ARIMA Model, 469


ARMA Model, 469


Association Pattern Mining, 15, 93


Association Rule Hiding, 688


Association Rules, 98


Associative Classifiers, 305


Authorities, 602


Autoregressive Integrated Moving Average Model, 469


Autoregressive Model, 467


Autoregressive Moving Average Model, 469


AVC-set, 351


Bag-of-Words Kernel, 524


Bagging, 379


Balaban Index, 573


Barabasi-Albert Model, 622


Baum-Welch Algorithm, 520


Bayes Classifier, 306


Bayes Optimal Privacy, 684


Bayes Reconstruction Method, 665


Bayes Text Classifier, 448


Behavioral Attributes, 10, 458, 532


Bernoulli Bayes Model, 309


Between-Class Scatter Matrix, 291


Betweenness Centrality, 626


Bias Term in SVMs, 314


Biased Sampling, 38


Big Data, 389


Binarization, 31


Binning of Time Series, 460


Biological Sequences, 493


BIRCH, 214


Bisecting K-Means, 173


Bloom Filter, 399


BOAT, 351


Boosting, 381


Bootstrap, 337


Bootstrapped Aggregating, 379


Bucket of Models, 383


Buckshot, 435


C4.5rules, 300


Candidate Distribution Algorithm, 112


Cascade, 655


Categorical Data Clustering, 206


CBA, 148, 305


Centrality, 623


Centroid Distance Signature, 533


Centroid-based Text Classification, 447


Chebychev Inequality, 394


Chernoff Bound (Lower-Tail), 395


Chernoff Bound (Upper-Tail), 396


Circuit Rank, 573


CLARA, 213


CLARANS, 213


Classification, 285








C. C. Aggarwal, Data Mining: The Textbook, DOI 10.1007/978-3-319-14142-8


© Springer International Publishing Switzerland 2015




Classification Based on Associations, 305

Classification of Time Series, 488

Classifier Evaluation, 334

Classifying Graphs, 582

Cleaning Data, 34

CLIQUE, 219

Closed Itemsets, 137

Closed Patterns, 137

Closeness Centrality, 624

CLUSEQ, 504

Cluster Digest for Text, 434

Cluster Validation, 195

Clustering, 153

Clustering Coefficient, 621

Clustering Data Streams, 411

Clustering Graphs, 579

Clustering Tendency, 154

Clustering Text, 434

Clustering Time Series, 476

Clusters and Outliers, 246

CluStream, 413

Co-clustering, 438

Co-clustering for Recommendations, 610

Co-location Patterns, 548

Co-Training, 363

Coefficient of Determination, 361, 468

Collaborative Filtering, 149, 234, 605

Collective Classification, 367, 641

Combination Outliers in Sequences, 508

Community Detection, 627

Compression-based Dissimilarity Measure, 513

Concept Drift, 22, 390

Condensation-based Anonymization, 680

Confidence, 97

Confidence Monotonicity, 98


Constrained Clustering, 225


Constrained Pattern Mining, 146


Constrained Sequential Patterns, 500


Content-based Recommendations, 605


Contextual Attributes, 10, 458, 532


CONTOUR, 504


Coordinate Descent, 355


Core of Joined Subgraphs, 578


Count-Min Sketch, 403


Cross-Validation, 336


CSketch, 417


CURE, 216


CVFDT, 423




INDEX

Cyclomatic Number, 573


Data Classification, 18, 285


Data Cleaning, 34


Data Clustering, 16, 153


Data Reduction, 37


Data Streams, 389


Data Type Portability, 30


Data Types, 6


Data-centered Ensembles, 278


DBSCAN, 181


Decision List, 300


Decision Trees, 293


Degree Centrality, 624


Degree Prestige, 624


DENCLUE, 184


Dendrogram, 168


Densification, 622


Density Attractors, 185


DepthProject Algorithm, 106


Differencing Time Series, 466


Diffusion Models, 655


Dijkstra Algorithm, 86


Dimensionality Curse in Privacy, 687


Dimensionality Reduction, 41


Discrete Cosine Transform, 464


Discrete Fourier Transform, 462


Discrete Sequence Similarity Measures, 82


Discretization, 30


Discriminative Classifier, 306


Distance-based Clustering, 159


Distance-based Entropy, 156


Distance-based Motifs, 473


Distance-based Outlier Detection, 248


Distance-based Sequence Clustering, 502


Distance-based Sequence Outliers, 513


Distributed Privacy, 689


Document Preparation, 431


Document-Term Matrix, 8


Domain Generalization Hierarchy, 670


Downward Closure Property, 96


DWT, 50

Dynamic Programming in HMM, 520

Dynamic Time Warping Distance, 79


Dynamics of Network Formation, 622


Early Termination Trick, 250

Earth Mover Distance, 685

Eckart-Young Theorem, 46






Eclat, 110


Edit Distance, 82, 513


Edit Distance in Graphs, 567


Eigenvector Centrality, 627


EM Algorithm for Continuous Data, 173, 244

EM Algorithm for Data Clustering, 175

Embedded Models, 292

Energy of a Data Set, 46

Ensemble Classification, 373

Ensemble Clustering, 231

Ensemble-based Streaming Classification, 424

Entropy, 156, 289

Entropy ℓ-diversity, 683


Enumeration Tree, 103


Equivalence Class in Privacy, 671


Error Tree of Wavelet Representation, 52


Estrada Index, 572


Euclidean Metric, 64


Event Detection, 485


Evolutionary Outlier Algorithms, 271

Example Re-weighting, 348

Expected Error Reduction, 372

Expected Model Change, 371

Expected Variance Reduction, 373

Explaining Sequence Anomalies, 519

Exponential Smoothing, 461

Extreme Value Analysis, 239


Feature Bagging, 274

Feature Selection, 40


Feature Selection for Classification, 287

Feature Selection for Clustering, 154

Filter Models, 155, 288

Finite State Automaton, 509

First Story Detection, 418, 453

Fisher Score, 290


Fisher’s Linear Discriminant, 290


Flajolet-Martin Algorithm, 408


FOIL’s Information Gain, 304


Forward Algorithm, 519


Forward-backward Algorithm, 520


Fowlkes-Mallows Measure, 201


Fractionation, 435


Frequency-based Sequence Outliers, 514


Frequent Itemset, 93


Frequent Pattern Mining, 15, 93

Frequent Pattern Mining in Streams, 409

Frequent Substructure Mining, 575



Frequent Trajectory Paths, 546

Frequent Traversal Patterns, 615

Full-Domain Generalization, 673

Generalization in Privacy, 670


Generalization Property, 675


Generalized Linear Models, 357


Generative Classifier, 306


Geodesic Distances, 71


Gini Index, 288


Girvan-Newman Algorithm, 631


GLM, 357

Global Recoding, 672

Global Statistical Similarity, 74


Goodall Measure, 75


Graph Classification, 582


Graph Clustering, 579


Graph Database, 557


Graph Distances and Matching, 565


Graph Edit Distance, 567


Graph Isomorphism, 559


Graph Kernels, 573


Graph Matching, 559


Graph Similarity Measures, 85


Graph-based Algorithms, 187


Graph-based Collaborative Filtering, 608


Graph-based Methods, 522


Graph-based Semisupervised Learning, 367


Graph-based Sequence Clustering, 502


Graph-based Spatial Neighborhood, 541


Graph-based Spatial Outliers, 542


Graph-based Time-Series Clustering, 481


Gregariousness in Social Networks, 624


Grid-based Outliers, 255


Grid-based Projected Outliers, 270


GSP Algorithm, 495


Haar Wavelets, 50


Heavy Hitters, 405


Hidden Markov Model Clustering, 506


Hidden Markov Models, 514


Hierarchical Clustering Algorithms, 166


High Dimensional Privacy, 687


Hinge Loss, 319


Histogram-based Outliers, 255


HITS, 602


HMETIS, 232


HMM, 514

HMM Applications, 521



Hoeffding Inequality, 397


Hoeffding Trees, 421


Holdout, 336


Homophily, 58, 621


Hopkins Statistic, 157


Hosoya Index, 572


HOTSAX, 483


Hubs, 602


Hybrid Feature Selection, 159


Imputation, 49


Incognito, 675


Incognito Super-roots, 678


Inconsistent Data, 36


Independent Cascade Model, 656


Independent Ensembles, 276


Inductive Classifiers, 362


Influence Analysis, 655


Information Gain, 289


Information Theoretic Measures, 513


Instance-based Learning, 331


Instance-based Text Classification, 447


Interest Ratio, 124


Internal Validation Criteria, 196


Intrinsic Dimensionality, 41


Inverse Document Frequency, 74


Inverse Occurrence Frequency, 74


Inverted Index, 143


ISOMAP, 57, 71


Item-based Recommendations, 608


Itemset, 94


Iterative Classification Algorithm, 641


Jaccard Coefficient, 76, 432


Jaccard for Multiway Similarity, 125


K-Means, 162, 480


K-Medians, 164


K-Medoids, 164, 480, 579


K-Modes, 208


Katz Centrality, 653


Kernel Density Estimation, 256


Kernel Fisher’s Discriminant, 360


Kernel K-Means, 163, 325


Kernel Logistic Regression, 360


Kernel PCA, 44, 325


Kernel Ridge Regression, 359


Kernel SVM, 323, 524, 585


Kernel Trick, 323, 359





Kernels in Graphs, 573

Kernighan-Lin Algorithm, 629

Keyword-based Sequence Similarity, 502

Kruskal Stress, 56


Label Propagation Algorithm, 643

Lagrangian Optimization in NMF, 193


Large Itemset, 93


Lasso, 355


Latent Components of NMF, 192

Latent Components of SVD, 47

Latent Factor Models, 611

Latent Semantic Indexing, 447

Law Enforcement, 18

Lazy Learners, 331

Learn-One-Rule, 302

Leave-One-Out Bootstrap, 337

Leave-One-Out Cross-Validation, 336

Left Eigenvector, 600

Level-wise Algorithms, 100

Levenshtein Distance, 82

Lexicographic Tree, 103

Likelihood Ratio Statistic, 304

Linear Discriminant Analysis, 291

Linear Threshold Model, 656

Link Prediction, 650

Link Prediction for Recommendations, 608

Loadshedding, 390


Local Outlier Factor, 252


Local Recoding, 672


LOF, 252

Logistic Regression, 310, 358

Longest Common Subsequence, 84

Lookahead-based Pruning, 110

Lossy Counting Algorithm, 410

LSA, 47, 447

MA Model, 468


Macro-clustering, 413


Mahalanobis k-means, 163


Mahalanobis Distance, 70, 242


Manhattan Metric, 64


Margin, 314


Margin Constraints, 315


Markov Inequality, 394


Massive-Domain Stream Clustering, 417


Massive-Domain Streaming Classification, 425





Match-based Distance Measures in Graphs, 565

Maximal Frequent Itemsets, 96, 136

Maximum Common Subgraph, 561


Maximum Common Subgraph Problem, 564


Mean-Shift Clustering, 186


Mercer Kernel Map, 324


Mercer’s Theorem, 323


METIS, 634


Metric, 565


Micro-clustering, 413


Min-Max Scaling, 37


Minkowski Distance, 65


Missing Data, 35


Missing Time-Series Values, 459


Mixture Modeling, 173, 244


Model Selection, 383


Model-centered Ensembles, 277


Mondrian Algorithm, 678


Moore-Penrose Pseudoinverse, 49


Morgan Index, 572


Motif Discovery, 472


Moving Average Model, 468


Moving Average Smoothing, 460


Multiclass Learning, 346


Multidimensional Change Points, 419


Multidimensional Scaling, 55


Multidimensional Spatial Neighborhood, 541


Multidimensional Spatial Outliers, 542


Multilayer Neural Network, 328


Multinomial Bayes Model, 309, 448, 449


Multivariate Extreme Values, 242


Multivariate Time Series, 10, 458, 459


Multivariate Time-Series Forecasting, 470


Multiview Clustering, 231


Naive Bayes Classifier, 306

NCSA Common Log Format, 613

Near Duplicate Detection, 594

Nearest Neighbor Classifier, 522

Neighborhood-based Collaborative Filtering, 607

Network Data, 12

Neural Networks, 326


NMF, 191

Node-Induced Subgraph, 560

Noise Removal from Time Series, 460

Non-stationary Time Series, 465

Nonlinear Regression, 359



Nonlinear Support Vector Machines, 321

Nonnegative Matrix Factorization, 191


Normalization, 37


Normalization of Time Series, 461


Normalized Wavelet Basis, 52


Novelties in Text, 453


Oblivious Transfer Protocol, 690


One-Against-One Multiclass Learning, 347


One-Against-Rest Multiclass Learning, 347


Online Novelty Detection, 419


Online Time-Series Clustering, 477


ORCLUS, 222


Ordered Probit Regression, 359


Outlier Analysis, 17


Outlier Detection, 17


Outlier Ensembles, 274


Outlier Validity, 258


Output Privacy, 688


Overfitting, 287


PAA, 460

PageRank, 86, 592, 598

Partial Periodic Patterns, 476

Partition Algorithm, 110, 128

Partition-1, 111

PCA, 42

Perceptron, 326


Periodic Patterns, 476


Perturbation for Privacy, 664


Pessimistic Error Rate, 304


Piecewise Aggregate Approximation, 460

PLSA, 440

Point Outliers in Time Series, 482

Poisson Regression, 359

Polynomial Regression, 359

Pool-based Active Learning, 369

Position Outliers in Sequences, 507

Power-Iteration Method, 600

Power-Law Degree Distribution, 623

Predictive Attribute Dependence, 155

Preferential Attachment, 622

Preferential Crawlers, 591

Prestige, 623

Principal Component Analysis, 42

Principal Components Regression, 356

Privacy-Preserving Data Mining, 663

Privacy-Preserving Data Publishing, 667

Probabilistic Classifiers, 306


Probabilistic Clustering, 173


Probabilistic Latent Semantic Analysis, 440

Probabilistic Outlier Detection, 244

Probabilistic Suffix Trees, 510

Probabilistic Text Clustering, 436

Probit Regression, 359


PROCLUS, 220


Product Graph, 574


Profile Association Rules, 148

Projected Outliers, 270

Projection-based Reuse, 107

Projection-based Reuse of Support Counting, 107

Proximal Gradient Methods, 355

Proximity Models for Mixed Data, 75

Proximity Prestige, 624

PST, 510

Pyramidal Time Frame, 415


Query Auditing, 688

Query-by-Committee, 371

Querying Patterns, 141

QuickSI Algorithm, 564


RainForest, 351


Randic Index, 573


Random Forests, 380


Random Subspace Ensemble, 274

Random Subspace Sampling, 273

Random Walks, 86, 598

Random-Walk Kernels, 573

Randomization for Privacy, 664

Rank Prestige, 627

Ranking Algorithms, 597


Rare Class Learning, 347


Ratings Matrix, 604


Recommendations, 149


Recommender Systems, 604


Recursive (c, ℓ)-diversity, 683


Regression Modeling, 353


Regularization, 312, 355, 613


Regularization in Collective Classification, 647

Rendezvous Label Propagation, 646

Representative-based Clustering, 159

Representativeness-based Active Learning, 373

Reservoir Sampling, 39, 391

Response Variable, 353



Ridge Regression, 355


Right Eigenvector, 600


RIPPER, 300


Rocchio Classification, 448


ROCK, 209


Samarati’s Algorithm, 673


Sampling, 38


SAX, 32, 464


Scalable Classification, 350

Scalable Clustering, 212

Scalable Decision Trees, 351

Scale-Free Networks, 622

Scaling, 37

Scatter Gather Text Clustering, 434

Secure Multi-party Computation, 690

Secure Set Union Protocol, 690

Selective Sampling, 369

Self Training, 363

Semisupervised Bayes Classification, 364

Semisupervised Clustering, 224

Semisupervised Learning, 361

Sensor-Selection, 479

Sequence Classification, 521

Sequence Data, 10

Sequence Outlier Detection, 507

Sequential Covering Algorithms, 301

Sequential Ensembles, 275

Sequential Pattern Mining, 494

Shape Analysis, 533

Shape Clustering, 539


Shape Outliers, 543


Shape-based Time-Series Clustering, 479


Shared Nearest Neighbors, 73


Shingling, 594


Short Memory Property, 509

Shortest Path Kernels, 575

Shrinking Diameters, 623

Signature Table, 144

Similarity Computation with Mixed Data, 75

Simple Matching Coefficient, 513

Simple Redundancy, 143

SimRank, 86, 601

Singular Value Decomposition, 44

Small World Networks, 622

SMOTE, 350

Social Influence Analysis, 655

Soft SVM, 319


Spatial Co-location Patterns, 538






Spatial Data, 11


Spatial Data Mining, 531

Spatial Outliers, 540

Spatial Tile Transformation, 547

Spatial Wavelets, 537

Spatiotemporal Data, 12

Spectral Clustering, 637

Spectral Decomposition, 47

Spectral Methods in Collective Classification, 646


Spectrum Kernel, 524


Spider Traps, 593


Spiders, 591


SPIRIT, 472


Stacking, 384


Standardization, 37, 354, 462


Stationary Time Series, 465


Stop-word Removal, 431


STORM, 426


Stratified Cross-Validation, 336

Stratified Sampling, 39

STREAM Algorithm, 411

Streaming Classification, 421

Streaming Data, 389

Streaming Frequent Pattern Mining, 409

Streaming Novelty Detection, 419

Streaming Outlier Detection, 417

Streaming Privacy, 681

Streaming Synopsis, 391


Strict Redundancy, 143


String Data, 10


Subgraph Isomorphism, 560


Subgraph Matching, 560


Subsequence, 495


Subsequence-based Clustering, 503


Superset-based Pruning, 110


Supervised Feature Selection, 41


Supervised Micro-clusters for Classification, 424

Support, 95

Support Vector Machines, 313

Support Vectors, 314

Suppression in Privacy, 670

SVD, 44


SVM for Text, 451


SVMLight, 352


SVMPerf, 451


Symbolic Aggregate Approximation, 32, 464

Symmetric Confidence Measure, 124



Synopsis for Streams, 391

Synthetic Data for Anonymization, 680

Synthetic Over-sampling, 350

System Diagnosis, 493


Tag Trees, 433


TARZAN, 514


Temporal Similarity Measures, 77


Term Strength, 155


Text Classification, 446


Text Clustering, 434


Text SVM, 451


Tikhonov Regularization, 355


Time Series Similarity Measures, 77


Time Warping, 78


Time-Series Classification, 485


Time-Series Correlation Clustering, 477


Time-Series Data, 9


Time-Series Data Mining, 457


Time-Series Forecasting, 464


Time-Series Preparation, 459


Topic Modeling, 440


Topic-Sensitive PageRank, 601


Topological Descriptors, 571


Trajectory Classification, 553


Trajectory Clustering, 549


Trajectory Mining, 544


Trajectory Outlier Detection, 551


Trajectory Pattern Mining, 546


Transductive Classifiers, 362, 583


Transductive Support Vector Machines, 366


TreeProjection Algorithm, 106


Triadic Closure, 621


Ullman’s Isomorphism Algorithm, 562

Uncertainty Sampling, 370

Universal Crawlers, 591

Unsupervised Feature Selection, 40

User-based Recommendations, 607

Utility in Privacy, 664, 674, 687, 691

Utility Matrix, 604

Value Generalization Hierarchy, 670

Velocity Density Estimation, 419

Vertical Counting Methods, 110

VF2 Algorithm, 564

Viterbi Algorithm, 519

Ward’s Method, 171

Wavelet-based Rules, 523


Wavelets, 50


Web Crawling, 591


Web Document Processing, 433


Web Resource Discovery, 591


Web Server Logs, 613


Web Usage Mining, 613


Weighted Degree Kernel, 525


Wiener Index, 572





Within-Class Scatter Matrix, 291

Wrapper Models, 158, 292


XProj, 581


XRules, 584


Z-Index, 572


