Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə2/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   2   3   4   5   6   7   8   9   ...   423
1-Data Mining tarjima

CONTENTS







ix







3.2.1.7

Nonlinear Distributions: ISOMAP . . . . . . . . . . . .

70







3.2.1.8

Impact of Local Data Distribution . . . . . . . . . . . .

72







3.2.1.9

Computational Considerations . . . . . . . . . . . . . .

73




3.2.2

Categorical Data . . . . . . . . . . . . . . . . . . . . . . . . . . .

74




3.2.3

Mixed Quantitative and Categorical Data . . . . . . . . . . . . .

75

3.3

Text Similarity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . .

75




3.3.1

Binary and Set Data . . . . . . . . . . . . . . . . . . . . . . . . .

77

3.4

Temporal Similarity Measures . . . . . . . . . . . . . . . . . . . . . . . . .

77




3.4.1

Time-Series Similarity Measures . . . . . . . . . . . . . . . . . . .

77







3.4.1.1

Impact of Behavioral Attribute Normalization . . . . .

78







3.4.1.2

Lp-Norm . . . . . . . . . . . . . . . . . . . . . . . . . .

79







3.4.1.3

Dynamic Time Warping Distance . . . . . . . . . . . .

79







3.4.1.4

Window-Based Methods . . . . . . . . . . . . . . . . .

82




3.4.2

Discrete Sequence Similarity Measures . . . . . . . . . . . . . . .

82







3.4.2.1

Edit Distance . . . . . . . . . . . . . . . . . . . . . . .

82







3.4.2.2

Longest Common Subsequence . . . . . . . . . . . . . .

84

3.5

Graph Similarity Measures . . . . . . . . . . . . . . . . . . . . . . . . . . .

85




3.5.1

Similarity between Two Nodes in a Single Graph . . . . . . . . .

85







3.5.1.1

Structural Distance-Based Measure . . . . . . . . . . .

85







3.5.1.2

Random Walk-Based Similarity . . . . . . . . . . . . .

86




3.5.2

Similarity Between Two Graphs . . . . . . . . . . . . . . . . . . .

86

3.6

Supervised Similarity Functions . . . . . . . . . . . . . . . . . . . . . . . .

87

3.7

Summary . . . .

................................

88

3.8

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

89

3.9

Exercises . . . . .

................................

90

4 Association Pattern Mining

93

4.1

Introduction . . .

................................

93

4.2

The Frequent Pattern Mining Model . . . . . . . . . . . . . . . . . . . . .

94

4.3

Association Rule Generation Framework . . . . . . . . . . . . . . . . . . .

97

4.4

Frequent Itemset Mining Algorithms . . . . . . . . . . . . . . . . . . . . .

99




4.4.1

Brute Force Algorithms . . . . . . . . . . . . . . . . . . . . . . . .

99




4.4.2

The Apriori Algorithm . . . . . . . . . . . . . . . . . . . . . . . .

100







4.4.2.1

Efficient Support Counting . . . . . . . . . . . . . . . .

102




4.4.3

Enumeration-Tree Algorithms . . . . . . . . . . . . . . . . . . . .

103







4.4.3.1

Enumeration-Tree-Based Interpretation of Apriori . . .

105







4.4.3.2

TreeProjection and DepthProject . . . . . . . . . . . .

106







4.4.3.3

Vertical Counting Methods . . . . . . . . . . . . . . . .

110




4.4.4

Recursive Suffix-Based Pattern Growth Methods . . . . . . . . . .

112







4.4.4.1

Implementation with Arrays but No Pointers . . . . . .

114







4.4.4.2

Implementation with Pointers but No FP-Tree . . . . .

114







4.4.4.3

Implementation with Pointers and FP-Tree . . . . . . .

116







4.4.4.4

Trade-offs with Different Data Structures . . . . . . . .

118







4.4.4.5

Relationship Between FP-Growth and Enumeration-













Tree Methods . . . . . . . . . . . . . . . . . . . . . . .

119

4.5

Alternative Models: Interesting Patterns . . . . . . . . . . . . . . . . . . .

122




4.5.1

Statistical Coefficient of Correlation . . . . . . . . . . . . . . . . .

123




4.5.2

χ2 Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

123




4.5.3

Interest Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

124




x










CONTENTS




4.5.4

Symmetric Confidence Measures . . . . . . . . . . . . . .

..... 124




4.5.5

Cosine Coefficient on Columns . . . . . . . . . . . . . . .

..... 125




4.5.6

Jaccard Coefficient and the Min-hash Trick . . . . . . . .

..... 125




4.5.7

Collective Strength . . . . . . . . . . . . . . . . . . . . .

..... 126




4.5.8

Relationship to Negative Pattern Mining . . . . . . . . .

..... 127

4.6

Useful Meta-algorithms . . . . . . . . . . . . . . . . . . . . . . . .

..... 127




4.6.1

Sampling Methods . . . . . . . . . . . . . . . . . . . . .

..... 128




4.6.2

Data Partitioned Ensembles . . . . . . . . . . . . . . . .

..... 128




4.6.3

Generalization to Other Data Types . . . . . . . . . . .

..... 129







4.6.3.1

Quantitative Data . . . . . . . . . . . . . . . .

..... 129







4.6.3.2

Categorical Data . . . . . . . . . . . . . . . .

..... 129

4.7

Summary . . . .

...........................

..... 129

4.8

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . .

..... 130

4.9

Exercises . . . . .

...........................

..... 132

5 Association Pattern Mining: Advanced Concepts

135

5.1

Introduction . . .

...........................

..... 135

5.2

Pattern Summarization . . . . . . . . . . . . . . . . . . . . . . . .

..... 136




5.2.1

Maximal Patterns . . . . . . . . . . . . . . . . . . . . . .

..... 136




5.2.2

Closed Patterns . . . . . . . . . . . . . . . . . . . . . . .

..... 137




5.2.3

Approximate Frequent Patterns . . . . . . . . . . . . . .

..... 139







5.2.3.1

Approximation in Terms of Transactions . . .

..... 139







5.2.3.2

Approximation in Terms of Itemsets . . . . . .

..... 140

5.3

Pattern Querying

...........................

..... 141




5.3.1

Preprocess-once Query-many Paradigm . . . . . . . . . .

..... 141







5.3.1.1

Leveraging the Itemset Lattice . . . . . . . . .

..... 142







5.3.1.2

Leveraging Data Structures for Querying . . .

..... 143




5.3.2

Pushing Constraints into Pattern Mining . . . . . . . . .

..... 146

5.4

Putting Associations to Work: Applications . . . . . . . . . . . .

..... 147




5.4.1

Relationship to Other Data Mining Problems . . . . . .

..... 147







5.4.1.1

Application to Classification . . . . . . . . . .

..... 147







5.4.1.2

Application to Clustering . . . . . . . . . . . .

..... 148







5.4.1.3

Applications to Outlier Detection . . . . . . .

..... 148




5.4.2

Market Basket Analysis . . . . . . . . . . . . . . . . . . .

..... 148




5.4.3

Demographic and Profile Analysis . . . . . . . . . . . . .

..... 148




5.4.4

Recommendations and Collaborative Filtering . . . . . .

..... 149




5.4.5

Web Log Analysis . . . . . . . . . . . . . . . . . . . . . .

..... 149




5.4.6

Bioinformatics . . . . . . . . . . . . . . . . . . . . . . . .

..... 149




5.4.7

Other Applications for Complex Data Types . . . . . . .

..... 150

5.5

Summary . . . .

...........................

..... 150

5.6

Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . .

..... 151

5.7

Exercises . . . . .

...........................

..... 152

6 Cluster Analysis




153

6.1

Introduction . . .

...........................

..... 153

6.2

Feature Selection for Clustering . . . . . . . . . . . . . . . . . . .

..... 154




6.2.1

Filter Models . . . . . . . . . . . . . . . . . . . . . . . .

..... 155







6.2.1.1

Term Strength . . . . . . . . . . . . . . . . . .

..... 155







6.2.1.2

Predictive Attribute Dependence . . . . . . .

..... 155





Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin