|
 |
On the design and quantification of privacy preserving data mining algorithms
Written by:
Dakshi Agrawal
and
Charu C Aggarwal.
Citation:
Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pages 247-255. ACM Press, 2001.
Copyright © (2001) by Association for Computing Machinery,
Inc. Permission to make digital or hard copies of part of all of this
work for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage. To
copy otherwise, to republish, to post on servers, or to redistribute to
lists, requires prior specific permission and/or a fee.
Abstract:
The increasing ability to track and collect large amounts of data with
the use of current hardware technology has lead to an interest in the
development of data mining algorithms which preserve user privacy. A
recently proposed technique addresses the issue of privacy preservation by
perturbing the data and reconstructing distributions at an aggregate level
in order to perform the mining. This method is able to retain privacy
while accessing the information implicit in the original attributes. The
distribution reconstruction process naturally leads to some loss of
information which is acceptable in many practical situations. This paper
discusses an Expectation Maximization (EM) algorithm for distribution
reconstruction which is more effective than the currently available
method in terms of the level of information loss. Specifically, we prove
that the EM algorithm converges to the maximum likelihood estimate of
the original distribution based on the perturbed data. We show that
when a large amount of data is available, the EM algorithm provides
robust estimates of the original distribution. We propose metrics
for quantification and measurement of privacy-preserving data mining
algorithms. Thus, this paper provides the foundations for measurement
of the effectiveness of privacy preserving data mining algorithms. Our
privacy metrics illustrate some interesting results on the relative
effectiveness of different perturbing distributions.
|
|