patients has a much higher expected chance of having HIV than the base population.
In this context, a notion of Bayes optimal privacy exists, which ensures that the additional posterior information gained after release of information is as small as possible. Unfortunately, Bayes optimal privacy is difficult to implement, both practically and computationally. The t-closeness model may be viewed as a practical heuristic that attempts to achieve similar goals. It does so by using distance functions between distributions. Informally, the goal is to create an anonymization such that the distance between the sensitive attribute distribution of each anonymized group and that of the base data is bounded by a user-defined threshold.
Definition 20.3.5 (t-closeness Principle) Let $P = (p_1 \ldots p_r)$ be a vector representing the fraction of the data records belonging to the r different values of the sensitive attribute in an equivalence class. Let $Q = (q_1 \ldots q_r)$ be the corresponding fractional distribution in the full data set. Then, the equivalence class is said to satisfy t-closeness if the following is true for an appropriately chosen distance function Dist(·, ·):
$$\mathrm{Dist}(P, Q) \leq t \tag{20.12}$$
An anonymized table is said to satisfy t-closeness, if all equivalence classes in it satisfy t-closeness.
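As an illustration of the two vectors in the definition, the following minimal sketch computes the fractional distributions P and Q from raw sensitive attribute values. The helper name `distribution`, the three-value domain, and the toy records are assumptions made for this example only; they do not come from the text.

```python
from collections import Counter


def distribution(values, domain):
    """Return the fraction of records taking each sensitive value in `domain`.

    `values` is a list of sensitive attribute values, one per record;
    `domain` fixes the order of the r possible values.
    """
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]


# Hypothetical sensitive attribute domain and toy records (illustrative only).
domain = ["flu", "hiv", "cancer"]
full_data = ["flu", "flu", "hiv", "cancer", "flu", "hiv", "flu", "cancer"]
group = ["flu", "hiv", "hiv", "cancer"]  # one equivalence class

Q = distribution(full_data, domain)  # distribution in the full data set
P = distribution(group, domain)      # distribution in the equivalence class

print(Q)  # [0.5, 0.25, 0.25]
print(P)  # [0.25, 0.5, 0.25]
```

Fixing the order of `domain` keeps the i-th entries of P and Q aligned to the same sensitive value, which any distance function between the two vectors relies on.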
The previous definition does not specify any particular distance function. There are many different ways to instantiate the distance function, depending on application-specific goals. Two common instantiations of the distance function are as follows:
Variational distance: This is simply equal to half the Manhattan distance between the two distribution vectors:
$$\mathrm{Dist}(P, Q) = \frac{1}{2} \sum_{i=1}^{r} |p_i - q_i| \tag{20.13}$$
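The variational distance and the resulting t-closeness test can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a full anonymization procedure: the function names `variational_distance` and `satisfies_t_closeness`, the example distributions, and the thresholds are all hypothetical choices made for this example.

```python
def variational_distance(p, q):
    """Half the Manhattan (L1) distance between two distribution vectors."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def satisfies_t_closeness(class_distributions, base_distribution, t):
    """True if every equivalence class lies within distance t of the base data."""
    return all(
        variational_distance(p, base_distribution) <= t
        for p in class_distributions
    )


# Hypothetical distributions over r = 3 sensitive values (illustrative only).
base = [0.5, 0.25, 0.25]            # full data set
classes = [
    [0.25, 0.5, 0.25],              # equivalence class 1
    [0.5, 0.25, 0.25],              # equivalence class 2
]

print(variational_distance(classes[0], base))      # 0.25
print(satisfies_t_closeness(classes, base, t=0.3))  # True
print(satisfies_t_closeness(classes, base, t=0.2))  # False
```

Because both inputs are probability vectors, the factor of one half keeps the distance in the range [0, 1], which makes the threshold t easier to interpret.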