Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	348/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 344 345 346 347 348 349 350 351 ... 423

1-Data Mining tarjima

A A B A A A A A A B

17.9. EXERCISES

587

Consider two graphs containing 2 · n nodes and n distinct labels, each of which occurs twice. What is the maximum number of isomorphic matchings between the two graphs?

Implement the basic algorithm for subgraph isomorphism with no pruning optimiza-tions. Test it by trying to match pairs of randomly generated graphs, containing a varying number of nodes. How does the running time vary with the size of the graph?

Compute the Morgan indices of order 1 and 2, for each node of the acetaminophen graph of Fig. 17.1. How does the Morgan index vary with the labels (corresponding to chemical elements)?

Write a computer program to compute each of the topological descriptors of a graph discussed in this chapter.

Write a computer program to execute the node-based candidate growth for frequent subgraph discovery. Refer to the bibliographic notes, if needed, for the paper describing specific details of the algorithm.

Write a computer program to execute the edge-based candidate growth for frequent subgraph discovery. Refer to the bibliographic notes for the paper describing specific details of the algorithm.

Show the diﬀerent node-based joins that can be performed between the two graphs

below, while accounting for isomorphism.

A A B A A

A A A A B

Show the diﬀerent edge-based joins that can performed between the two graphs of Exercise 8, while accounting for isomorphism.

Determine the maximum number of candidates that can be generated with node-based join growth using a single pair of graphs, while accounting for isomorphism. Assume that the matching core of these graphs is a cycle of size k. What conditions in the core of the joined portion result in this scenario?

Discuss how the node-based growth and edge-based growth strategies translate into a candidate tree structure that is analogous to the enumeration tree in frequent pattern mining.

Implement a computer program to construct a text-like representation for a database of graphs, as discussed in the chapter. Use any feature selection approach of your choice of minimize redundancy. Implement a k-means clustering algorithm with this representation.

Repeat Exercise 12 for the classification problem. Use a naive Bayes classifier, as discussed in Chapter 10, for the final classification step and an appropriately chosen supervised feature selection method from the same chapter.

What changes would be require in the subgraph isomorphism algorithm for cases in which the query graph is disconnected?

Chapter 18

Mining Web Data

“Data is a precious thing, and will last longer than the systems themselves.”—Tim Berners-Lee

18.1 Introduction

The Web is an unique phenomenon in many ways, in terms of its scale, the distributed and uncoordinated nature of its creation, the openness of the underlying platform, and the resulting diversity of applications it has enabled. Examples of such applications include e-commerce, user collaboration, and social network analysis. Because of the distributed and uncoordinated nature in which the Web is both created and used, it is a rich treasure trove of diverse types of data. This data can be either a source of knowledge about various subjects, or personal information about users.

Aside from the content available in the documents on the Web, the usage of the Web results in a significant amount of data in the form of user logs or Web transactions. There are two primary types of data available on the Web that are used by mining algorithms.

Yüklə 17,13 Mb.

Dostları ilə paylaş: