Johannes Gehrke
Michael Hay
Edward Lui
Abstract:
We introduce a new definition of privacy called crowd-blending privacy that strictly relaxes the notion of
differential privacy. Roughly speaking, $k$-crowd-blending private
sanitization of a database requires that each individual $i$
in the database “blends” with $k$ other individuals $j$ in the
database, in the sense that the output of the sanitizer is
“indistinguishable” if $i$'s data is
replaced by $j$'s.
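The rough condition above can be checked mechanically on a small example. The Python sketch below tests it on one fixed database only. It assumes a deterministic mechanism with no slack in the indistinguishability, so "indistinguishable" reduces to "produces exactly the same output". The actual definition covers randomized sanitizers and must hold for every database, not just one. The names blends_with, every_individual_blends and exact_histogram are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Hashable, List

# A mechanism maps a database (one data value per individual) to an output.
Mechanism = Callable[[List[Hashable]], object]


def blends_with(san: Mechanism, db: List[Hashable], i: int, j: int) -> bool:
    """True if replacing individual i's data by individual j's leaves the output unchanged.

    Simplifying assumption: san is deterministic and indistinguishability has
    no slack, so "indistinguishable" means "exactly equal output".
    """
    swapped = list(db)
    swapped[i] = db[j]
    return san(swapped) == san(db)


def every_individual_blends(san: Mechanism, db: List[Hashable], k: int) -> bool:
    """Check, on this one database, that each i blends with at least k others j != i."""
    n = len(db)
    for i in range(n):
        partners = sum(1 for j in range(n) if j != i and blends_with(san, db, i, j))
        if partners < k:
            return False
    return True


def exact_histogram(db: List[Hashable]) -> dict:
    """Hypothetical example mechanism: the exact count of each distinct value."""
    return dict(Counter(db))


print(every_individual_blends(exact_histogram, ["a", "a", "a", "b", "b", "b"], 2))  # True
print(every_individual_blends(exact_histogram, ["a", "a", "a", "b"], 2))            # False
```

Under the exact histogram, individual $i$ blends with $j$ precisely when the two share the same value. The lone "b" record in the second database therefore has no one to blend with, and the check fails.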
We demonstrate crowd-blending private mechanisms for histograms and for
releasing synthetic data points, achieving strictly better utility than what
is possible using differentially private mechanisms. Additionally, we
demonstrate that if a crowd-blending private mechanism is combined with a
“pre-sampling” step, where the individuals in the database are
randomly drawn from some underlying population (as is often the case during
data collection), then the combined mechanism satisfies not only differential
privacy, but also the stronger notion of zero-knowledge privacy. This holds
even if the pre-sampling is slightly biased and an adversary knows whether
certain individuals were sampled or not. Taken together, our results yield a
practical approach for collecting and privately releasing data while ensuring
higher utility than previous approaches.
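The pipeline described above first draws the database from a population and then applies a crowd-blending mechanism. The Python sketch below shows how the two stages compose. It does not verify any privacy guarantee. The abstract does not specify the sampling model or the histogram mechanism, so both are assumptions here. Each population member is included independently with probability p. As the base mechanism, counts for bins holding fewer than k rows are reported as 0, which is one illustrative way to avoid releasing counts for individuals with few others to blend with. The names pre_sample, suppressed_histogram and sample_then_release are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Sequence


def pre_sample(population: Sequence[Hashable], p: float, rng: random.Random) -> List[Hashable]:
    """Assumed pre-sampling model: keep each population member independently with probability p."""
    return [x for x in population if rng.random() < p]


def suppressed_histogram(db: Iterable[Hashable], bins: Sequence[Hashable], k: int) -> Dict[Hashable, int]:
    """Hypothetical base mechanism: report each bin's count, but 0 for bins with fewer than k rows."""
    counts = Counter(db)  # missing bins count as 0
    return {b: (counts[b] if counts[b] >= k else 0) for b in bins}


def sample_then_release(population: Sequence[Hashable], bins: Sequence[Hashable],
                        p: float, k: int, seed: Optional[int] = None) -> Dict[Hashable, int]:
    """Combined mechanism: draw the database by pre-sampling, then run the base mechanism on it."""
    rng = random.Random(seed)
    return suppressed_histogram(pre_sample(population, p, rng), bins, k)


# Usage with a made-up population of diagnosis labels.
population = ["flu"] * 50 + ["cold"] * 30 + ["rare"] * 2
print(sample_then_release(population, ["flu", "cold", "rare"], p=0.5, k=5, seed=1))
```

The "rare" bin has at most two sampled rows, so it is always reported as 0 when k = 5. Larger bins are reported with their exact sampled counts.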