Data Mining: The Textbook





Charu C. Aggarwal



IBM T.J. Watson Research Center
Yorktown Heights, New York, USA

A solution manual for this book is available on Springer.com.


ISBN 978-3-319-14141-1 ISBN 978-3-319-14142-8 (eBook)


DOI 10.1007/978-3-319-14142-8


Library of Congress Control Number: 2015930833


Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.


The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.


The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.


Printed on acid-free paper


Springer is part of Springer Science+Business Media (www.springer.com)

To my wife Lata,

and my daughter Sayani




Contents




1 An Introduction to Data Mining
1.1 Introduction
1.2 The Data Mining Process
1.2.1 The Data Preprocessing Phase
1.2.2 The Analytical Phase
1.3 The Basic Data Types
1.3.1 Nondependency-Oriented Data
1.3.1.1 Quantitative Multidimensional Data
1.3.1.2 Categorical and Mixed Attribute Data
1.3.1.3 Binary and Set Data
1.3.1.4 Text Data
1.3.2 Dependency-Oriented Data
1.3.2.1 Time-Series Data
1.3.2.2 Discrete Sequences and Strings
1.3.2.3 Spatial Data
1.3.2.4 Network and Graph Data
1.4 The Major Building Blocks: A Bird’s Eye View
1.4.1 Association Pattern Mining
1.4.2 Data Clustering
1.4.3 Outlier Detection
1.4.4 Data Classification
1.4.5 Impact of Complex Data Types on Problem Definitions
1.4.5.1 Pattern Mining with Complex Data Types
1.4.5.2 Clustering with Complex Data Types
1.4.5.3 Outlier Detection with Complex Data Types
1.4.5.4 Classification with Complex Data Types
1.5 Scalability Issues and the Streaming Scenario
1.6 A Stroll Through Some Application Scenarios
1.6.1 Store Product Placement
1.6.2 Customer Recommendations
1.6.3 Medical Diagnosis
1.6.4 Web Log Anomalies
1.7 Summary
1.8 Bibliographic Notes
1.9 Exercises

2 Data Preparation
2.1 Introduction
2.2 Feature Extraction and Portability
2.2.1 Feature Extraction
2.2.2 Data Type Portability
2.2.2.1 Numeric to Categorical Data: Discretization
2.2.2.2 Categorical to Numeric Data: Binarization
2.2.2.3 Text to Numeric Data
2.2.2.4 Time Series to Discrete Sequence Data
2.2.2.5 Time Series to Numeric Data
2.2.2.6 Discrete Sequence to Numeric Data
2.2.2.7 Spatial to Numeric Data
2.2.2.8 Graphs to Numeric Data
2.2.2.9 Any Type to Graphs for Similarity-Based Applications
2.3 Data Cleaning
2.3.1 Handling Missing Entries
2.3.2 Handling Incorrect and Inconsistent Entries
2.3.3 Scaling and Normalization
2.4 Data Reduction and Transformation
2.4.1 Sampling
2.4.1.1 Sampling for Static Data
2.4.1.2 Reservoir Sampling for Data Streams
2.4.2 Feature Subset Selection
2.4.3 Dimensionality Reduction with Axis Rotation
2.4.3.1 Principal Component Analysis
2.4.3.2 Singular Value Decomposition
2.4.3.3 Latent Semantic Analysis
2.4.3.4 Applications of PCA and SVD
2.4.4 Dimensionality Reduction with Type Transformation
2.4.4.1 Haar Wavelet Transform
2.4.4.2 Multidimensional Scaling
2.4.4.3 Spectral Transformation and Embedding of Graphs
2.5 Summary
2.6 Bibliographic Notes
2.7 Exercises

3 Similarity and Distances
3.1 Introduction
3.2 Multidimensional Data
3.2.1 Quantitative Data
3.2.1.1 Impact of Domain-Specific Relevance
3.2.1.2 Impact of High Dimensionality
3.2.1.3 Impact of Locally Irrelevant Features
3.2.1.4 Impact of Different Lp-Norms
3.2.1.5 Match-Based Similarity Computation
3.2.1.6 Impact of Data Distribution




