Data Mining: The Textbook



Date: 07.01.2024

Output privacy of data mining algorithms: Privacy can also be violated by the output of data mining algorithms. For example, consider a scenario where a user is allowed to determine association patterns, or otherwise query the data through a Web service, but is not provided access to the data set. In such a case, the output of the data mining and query processing algorithms provides valuable information, some of which may be private.

In some applications, organizations may wish to share their data in a private way, so that only patterns in the shared data may be mined, but the statistics of the local databases are not revealed to the participants. This problem is referred to as distributed privacy preservation.


In general, most forms of privacy-preserving data mining reduce the representation accuracy of the data in order to preserve privacy. This accuracy reduction is performed in a variety of ways, such as data distortion, approximation (generalization), suppression, attribute value swapping, or microaggregation. Clearly, since the data is no longer specified exactly, this will have a detrimental impact on the quality of the data mining results. The effectiveness of the released data for mining applications is often quantified explicitly, and is referred to as its utility. A natural trade-off exists between privacy and utility. For example, in a case where data values are suppressed, one might simply choose to suppress all entries. While such a solution provides perfect privacy, it offers no utility. This observation is also true for privacy-preserving publication algorithms in which noise is added to the data. When a greater amount of noise is added, a higher level of privacy is achieved, but utility is reduced. The goal of privacy-preservation methods is to maximize utility at a fixed level of privacy.
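The effect of the noise level on utility can be illustrated with a small sketch. It is only an illustration under stated assumptions: the text does not define a specific utility measure, so the mean absolute distortion per record is used here as a crude stand-in (lower distortion meaning higher utility), and the names `perturb`, `mean_abs_distortion`, and the synthetic `ages` data are hypothetical.

```python
import random

def perturb(values, sigma, rng):
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def mean_abs_distortion(original, released):
    """Assumed utility proxy: average absolute change per record."""
    return sum(abs(o - r) for o, r in zip(original, released)) / len(original)

rng = random.Random(0)  # fixed seed so the run is repeatable
ages = [rng.uniform(20, 70) for _ in range(2000)]  # synthetic data, not from the text

# More noise means more privacy but a larger distortion of each record.
distortion = {
    sigma: mean_abs_distortion(ages, perturb(ages, sigma, rng))
    for sigma in (0.0, 1.0, 10.0)
}

for sigma, d in distortion.items():
    print(f"sigma={sigma:5.1f}  mean absolute distortion={d:.3f}")
```

With no noise the records are released exactly (full utility, no privacy), and the distortion grows as the noise standard deviation grows, which mirrors the trade-off described above.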


This chapter is organized as follows. Methods for privacy-preserving data collection are addressed in Sect. 20.2. Section 20.3 addresses the problem of privacy-preserving data publishing. This section includes several models such as the k-anonymity model, the ℓ-diversity model, and the t-closeness model. The problem of output privacy is addressed in Sect. 20.4. Methods for distributed and cryptographic privacy are discussed in Sect. 20.5. A summary is given in Sect. 20.6.


20.2 Privacy During Data Collection


The randomization method is designed for privacy-preservation at data collection time. The implicit assumption is that the data collector is not trusted, and therefore the privacy must be preserved at data collection time. The basic idea of the approach is to allow users to enter the data through a software platform that is able to add random perturbations to the data. This approach is one of the most conservative models for ensuring data privacy, because the original data records are never stored on any single server.


The random perturbations are added using a publicly available distribution. Examples of commonly used perturbing distributions include the uniform and the Gaussian distributions. In other words, the probability distribution used to perturb the data is specified together with the data set if and when the data collector releases the data for public use. This additional distribution information is needed to use the data effectively in the context of data mining algorithms. The basic idea is to reconstruct the distribution of the original data, by “subtracting out” the noise distribution. This aggregate distribution is then used for mining purposes. The overall approach is as follows:







  1. Privacy-preserving data collection: In this step, random noise is added to the data while collecting data from users, with the use of a software plugin. The collected data is publicly released along with the probability distribution function (and parameters) used to add the random noise.
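The collection step, and the idea of "subtracting out" the noise, can be sketched as follows. This is a deliberately simplified illustration and not the distribution reconstruction procedure of the chapter: it recovers only the mean and variance of the original data, using the fact that for independent zero-mean noise the variance of the perturbed values is the sum of the data variance and the noise variance. The function `collect`, the constant `HALF_WIDTH`, and the synthetic `true_ages` are assumptions made for the example.

```python
import random
import statistics

def collect(true_value, half_width, rng):
    """Client-side step: add Uniform[-half_width, half_width] noise
    before the value leaves the user, so the raw value is never stored."""
    return true_value + rng.uniform(-half_width, half_width)

rng = random.Random(42)  # fixed seed so the run is repeatable
HALF_WIDTH = 20.0        # publicly released noise parameter (assumed value)

# Synthetic "true" values; in the real setting the collector never sees these.
true_ages = [rng.gauss(40.0, 8.0) for _ in range(50000)]
released = [collect(a, HALF_WIDTH, rng) for a in true_ages]

# The noise distribution is public, so its variance is known: a^2 / 3
# for a uniform distribution on [-a, a].
noise_var = HALF_WIDTH ** 2 / 3.0

# Zero-mean noise leaves the mean unchanged; the noise variance is subtracted out.
est_mean = statistics.fmean(released)
est_var = statistics.pvariance(released) - noise_var

print(f"estimated mean     = {est_mean:.2f}")
print(f"estimated variance = {est_var:.2f}")
```

Individual released values can be far from the true ones, yet the aggregate estimates stay close to the statistics of the original data. This is why the noise distribution and its parameters have to be released together with the perturbed data.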




