IBM Skip to main contentUnited States
     Home  |  Products & services  |  Support & downloads  |  My account
 Select a country
 IBM Research
Dakshi Agrawal's homepage
Research interests
Publications
Presentations
Page Contact

 
Dakshi Agrawal's homepage   >   Publications   >
On the design and quantification of privacy preserving data mining algorithms

Written by: Dakshi Agrawal and Charu C Aggarwal.

Citation: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pages 247-255. ACM Press, 2001.

Copyright © (2001) by Association for Computing Machinery, Inc. Permission to make digital or hard copies of part of all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

PDF  Preprint version (202KB)
 Get Adobe® Reader®
Abstract:
The increasing ability to track and collect large amounts of data with the use of current hardware technology has lead to an interest in the development of data mining algorithms which preserve user privacy. A recently proposed technique addresses the issue of privacy preservation by perturbing the data and reconstructing distributions at an aggregate level in order to perform the mining. This method is able to retain privacy while accessing the information implicit in the original attributes. The distribution reconstruction process naturally leads to some loss of information which is acceptable in many practical situations. This paper discusses an Expectation Maximization (EM) algorithm for distribution reconstruction which is more effective than the currently available method in terms of the level of information loss. Specifically, we prove that the EM algorithm converges to the maximum likelihood estimate of the original distribution based on the perturbed data. We show that when a large amount of data is available, the EM algorithm provides robust estimates of the original distribution. We propose metrics for quantification and measurement of privacy-preserving data mining algorithms. Thus, this paper provides the foundations for measurement of the effectiveness of privacy preserving data mining algorithms. Our privacy metrics illustrate some interesting results on the relative effectiveness of different perturbing distributions.
  About IBM  |  Privacy  |  Terms of use  |  Contact