Data Mining: The Textbook




patients has a much higher expected chance of having HIV than the base population.
In this context, a notion of Bayes optimal privacy exists, which ensures that the additional posterior information gained after the release of information is as small as possible. Unfortunately, Bayes optimal privacy is difficult to implement, both practically and computationally. The t-closeness model may be viewed as a practical, heuristic approach that attempts to achieve goals similar to those of Bayes optimal privacy. It does so by using distance functions between distributions. Informally, the goal is to create an anonymization such that the distance between the sensitive attribute distribution of each anonymized group and that of the base data is bounded by a user-defined threshold.

20.3. PRIVACY-PRESERVING DATA PUBLISHING


Definition 20.3.5 (t-closeness Principle) Let $\overline{P} = (p_1 \ldots p_r)$ be a vector representing the fraction of the data records belonging to the r different values of the sensitive attribute in an equivalence class. Let $\overline{Q} = (q_1 \ldots q_r)$ be the corresponding fractional distribution in the full data set. Then, the equivalence class is said to satisfy t-closeness if the following is true, for an appropriately chosen distance function Dist(·, ·):

$$\mathrm{Dist}(\overline{P}, \overline{Q}) \leq t \tag{20.12}$$


An anonymized table is said to satisfy t-closeness if all of its equivalence classes satisfy t-closeness.
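
The definition can be turned into a table-level check. The following is a minimal sketch rather than a reference implementation. It assumes each equivalence class is given as a non-empty list of sensitive attribute values and that the caller supplies the distance function. The names `value_fractions`, `table_satisfies_t_closeness`, `classes`, and `dist` are hypothetical and do not come from the text.

```python
from collections import Counter


def value_fractions(values, domain):
    """Fraction of records taking each sensitive value listed in `domain`.

    Assumes `values` is non-empty.
    """
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]


def table_satisfies_t_closeness(classes, dist, t):
    """Check Dist(P, Q) <= t for every equivalence class.

    classes: list of lists of sensitive values, one list per equivalence class
             (illustrative representation, not prescribed by the definition).
    dist:    caller-supplied distance function taking two equal-length vectors.
    t:       user-defined threshold.
    """
    # Q: distribution of the sensitive attribute over the full data set.
    all_values = [v for group in classes for v in group]
    domain = sorted(set(all_values))
    q = value_fractions(all_values, domain)
    # P: distribution inside each equivalence class, over the same domain.
    return all(dist(value_fractions(group, domain), q) <= t for group in classes)
```

Passing `dist` as a parameter mirrors the definition, which leaves the choice of distance function open.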

The previous definition does not specify any particular distance function. There are many different ways to instantiate the distance function, depending on application-specific goals. Two common instantiations of the distance function are as follows:





  1. Variational distance: This is simply equal to half the Manhattan distance between the two distribution vectors:

$$\mathrm{Dist}(\overline{P}, \overline{Q}) = \frac{\sum_{i=1}^{r} |p_i - q_i|}{2} \tag{20.13}$$
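
As a small worked illustration of the variational distance, the sketch below computes half the Manhattan distance between two distribution vectors and compares it with a threshold. The function name `variational_distance` and the numbers used for `p`, `q`, and `t` are hypothetical and chosen only for illustration.

```python
def variational_distance(p, q):
    """Half the Manhattan (L1) distance between two distribution vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical numbers, for illustration only.
p = [0.5, 0.5, 0.0]    # sensitive-value fractions in one equivalence class
q = [0.25, 0.25, 0.5]  # corresponding fractions in the full data set
t = 0.4                # user-defined threshold

d = variational_distance(p, q)
print(d, d <= t)  # prints: 0.5 False
```

Here the absolute differences sum to 1.0, so the distance is 0.5 and this class would violate t-closeness for t = 0.4. When both vectors are valid distributions (non-negative entries summing to 1), the variational distance always lies between 0 and 1.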




