FRAPP: a framework for high-accuracy privacy-preserving mining

Abstract

To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric positive-definite perturbation matrix with minimal condition number can be identified, substantially enhancing the accuracy even under strict privacy requirements. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal reduction in accuracy. The quantitative utility of FRAPP, which is a general-purpose random-perturbation-based privacy-preserving mining technique, is evaluated specifically with regard to association and classification rule mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, either substantially lower modeling errors are incurred as compared to the prior techniques, or the errors are comparable to those of direct mining on the true database.
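
As a rough illustration of the matrix-theoretic view described in the abstract, the following Python sketch treats perturbation as a column-stochastic matrix A, where A[v, u] is the probability that a record with true value u is reported as v. It then estimates the original value counts by solving A x = y for the observed counts y. The constant-diagonal matrix form, the parameter name `gamma`, and all function names are assumptions chosen for illustration; the abstract itself states only that a symmetric positive-definite matrix with minimal condition number is used.

```python
import numpy as np


def perturbation_matrix(n, gamma):
    """Build an n x n column-stochastic matrix (illustrative assumption).

    Diagonal entries are gamma * x and off-diagonal entries are x,
    with x = 1 / (gamma + n - 1) so that every column sums to 1.
    For gamma > 1 the matrix is symmetric and positive definite.
    """
    x = 1.0 / (gamma + n - 1)
    A = np.full((n, n), x)
    np.fill_diagonal(A, gamma * x)
    return A


def perturb(records, A, rng):
    """Replace each true value u by a value v drawn from column u of A."""
    n = A.shape[0]
    return np.array([rng.choice(n, p=A[:, u]) for u in records])


def reconstruct_counts(observed_counts, A):
    """Estimate the true value counts x by solving A x = y."""
    return np.linalg.solve(A, np.asarray(observed_counts, dtype=float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = perturbation_matrix(4, gamma=19.0)

    # Hypothetical database: 1000 records over 4 categorical values.
    records = np.repeat(np.arange(4), [500, 300, 150, 50])
    noisy = perturb(records, A, rng)

    y = np.bincount(noisy, minlength=4)
    print("condition number:", np.linalg.cond(A))
    print("estimated counts:", reconstruct_counts(y, A))
```

A condition number close to 1 means that sampling noise in y is amplified only mildly when solving for x, which is the intuition behind preferring a well-conditioned perturbation matrix.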

Author information

Correspondence to Jayant R. Haritsa.

Additional information

Responsible editor: Johannes Gehrke.

A partial and preliminary version of this paper appeared in the Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE), Tokyo, Japan, 2005, pp. 193–204.

About this article

Cite this article

Agrawal, S., Haritsa, J.R. & Prakash, B.A. FRAPP: a framework for high-accuracy privacy-preserving mining. Data Min Knowl Disc 18, 101–139 (2009). https://doi.org/10.1007/s10618-008-0119-9

  • DOI: https://doi.org/10.1007/s10618-008-0119-9

Keywords

  • Privacy
  • Data mining