Data Mining: The Textbook



19.9 Exercises

Implement the linear threshold and independent cascade models for influence analysis.

The chapter provides a 1-dimensional formulation for the symmetric version using the column vector z. Set up a generalized formulation for the symmetric version using an n × k matrix Z.

Let Y be the decision variables for the random walk formulation discussed in the chapter. Show that Z = Λ^{1/2} Y.

Show that the unit-norm scaled rows of Y and Z are the same.

It is well known that a symmetric matrix always has real eigenvalues. Use this result to show that the stochastic transition matrix of an undirected graph always has real eigenvalues.

Show that if (y, λ) is an eigenvector–eigenvalue pair of the normalized Laplacian Λ^{-1}(Λ − W), then (y, 1 − λ) is an eigenvector–eigenvalue pair of the normalized weight matrix Λ^{-1}W. Here, Λ is a diagonal matrix containing the sum of each row in the weighted adjacency matrix W.
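Before attempting the proof in the last exercise, it can help to check the claim numerically on a small graph. The sketch below is only an illustration, not a proof: the 3-node path graph, the vector y, and the helper names `normalized_matrices` and `matvec` are all assumptions chosen for this example. It uses exact fractions so that the equalities hold without rounding error.

```python
from fractions import Fraction


def normalized_matrices(W):
    """Return (L, P) with L = Λ^{-1}(Λ - W) and P = Λ^{-1} W,
    where Λ is the diagonal matrix of row sums of W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    L = [[Fraction((degrees[i] if i == j else 0) - W[i][j]) / degrees[i]
          for j in range(n)] for i in range(n)]
    P = [[Fraction(W[i][j]) / degrees[i] for j in range(n)]
         for i in range(n)]
    return L, P


def matvec(M, v):
    """Multiply the matrix M (list of rows) by the vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


# Hypothetical example: an unweighted path graph on three nodes (0 - 1 - 2).
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L, P = normalized_matrices(W)

# y is an eigenvector of the normalized Laplacian L with eigenvalue lam = 2.
y = [1, -1, 1]
lam = 2
print(matvec(L, y) == [lam * v for v in y])        # L y = lam * y
print(matvec(P, y) == [(1 - lam) * v for v in y])  # P y = (1 - lam) * y
```

Both comparisons hold for this graph. Checking a single example does not establish the general result, but it shows which quantities the proof has to relate.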

Chapter 20

Privacy-Preserving Data Mining

“Civilization is the progress toward a society of privacy. The savage’s whole existence is public, ruled by the laws of his tribe. Civilization is the process of setting man free from men.” (Ayn Rand)

20.1 Introduction

A significant amount of application data is of a personal nature. Such data sets may contain sensitive information about an individual, such as his or her financial status, political beliefs, sexual orientation, and medical history. Knowledge of such personal information can compromise the privacy of individuals. Therefore, it is crucial to design data collection, dissemination, and mining techniques so that individuals are assured of their privacy. Privacy-preservation methods can generally be executed at different steps of the data mining process:

Data collection and publication: The privacy-driven modification of a data set may be done at either data collection time or data publication time. In anonymous data collection, a modified version of the data is collected using a software plugin within the collection platform. Therefore, the contributors of the data are assured that their data is not available even to the entity collecting the data. The implicit assumption in the collection-oriented model is that the data collector is not trusted, and therefore privacy must be preserved at collection time. In anonymous data publication, the entire data set is available to a trusted entity, which has usually collected the data in the normal course of business. An example is a hospital that has collected data about its patients. Eventually, the entity may wish to release or publish the data to one or more third parties for data analysis. For example, a hospital may want to use the data to study the long-term impact of various treatment alternatives. A real-world example is the Netflix prize data set [559], in which the anonymized movie ratings of users were published to advance studies on collaborative filtering algorithms. During data publication, identifying or sensitive attribute values need to either be removed or be specified approximately to preserve privacy. Generally, such publication algorithms can control the level of privacy much better than collection algorithms, because of their access to the entire data set on a trusted server.

C. C. Aggarwal, Data Mining: The Textbook, DOI 10.1007/978-3-319-14142-8_20, © Springer International Publishing Switzerland 2015
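The two publication-time operations mentioned above, removing identifying attributes and specifying values approximately, can be sketched as follows. This is a minimal illustration under assumed conditions: the record fields (`name`, `age`, `diagnosis`), the function name `anonymize_for_publication`, and the 10-year age ranges are hypothetical, and such a transformation on its own does not provide any formal privacy guarantee.

```python
def anonymize_for_publication(records, identifiers=("name",), age_bucket=10):
    """Return a copy of the records suitable for release:
    identifying attributes are dropped and ages are replaced by ranges."""
    published = []
    for rec in records:
        # Remove directly identifying attributes.
        out = {k: v for k, v in rec.items() if k not in identifiers}
        # Specify the age approximately, as a range instead of an exact value.
        if "age" in out:
            low = (out["age"] // age_bucket) * age_bucket
            out["age"] = f"{low}-{low + age_bucket - 1}"
        published.append(out)
    return published


# Hypothetical hospital records held by the trusted entity.
records = [
    {"name": "Alice", "age": 34, "diagnosis": "flu"},
    {"name": "Bob", "age": 58, "diagnosis": "diabetes"},
]
for row in anonymize_for_publication(records):
    print(row)
```

Because the trusted entity sees the whole data set before release, it could in principle choose the ranges after inspecting the distribution of values, which is one way to read the remark that publication algorithms can control the level of privacy better than collection algorithms.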
