Data Mining and Knowledge Discovery, Volume 18, Issue 1, pp 101–139

FRAPP: a framework for high-accuracy privacy-preserving mining

  • Shipra Agrawal
  • Jayant R. Haritsa
  • B. Aditya Prakash

Abstract

To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric positive-definite perturbation matrix with minimal condition number can be identified, substantially enhancing the accuracy even under strict privacy requirements. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal reduction in accuracy. The quantitative utility of FRAPP, which is a general-purpose random-perturbation-based privacy-preserving mining technique, is evaluated specifically with regard to association and classification rule mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, either substantially lower modeling errors are incurred as compared to the prior techniques, or the errors are comparable to those of direct mining on the true database.
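The matrix-theoretic view in the abstract can be illustrated with a minimal sketch. It assumes a single categorical attribute with n values and a perturbation matrix that has one constant on the diagonal and another constant everywhere else, which is one simple symmetric positive-definite choice; the abstract itself does not specify the matrix. The parameter `gamma` (the ratio of diagonal to off-diagonal entries) and all function names are hypothetical, chosen only for illustration.

```python
import random

def make_matrix(n, gamma):
    """Build an n x n matrix A with A[v][u] = P(original u is reported as v).

    Assumed form: gamma * x on the diagonal and x elsewhere, with
    x = 1 / (gamma + n - 1) so that every column sums to 1.
    """
    x = 1.0 / (gamma + n - 1)
    return [[gamma * x if i == j else x for j in range(n)] for i in range(n)]

def perturb(value, matrix, rng):
    """Client side: replace `value` by v drawn with probability matrix[v][value]."""
    n = len(matrix)
    weights = [matrix[v][value] for v in range(n)]
    return rng.choices(range(n), weights=weights, k=1)[0]

def reconstruct(perturbed_counts, gamma):
    """Miner side: estimate the original counts by solving A x = y.

    For the assumed matrix the solution has a closed form, because
    A = (gamma - 1) * x * I + x * J, where J is the all-ones matrix.
    """
    n = len(perturbed_counts)
    total = sum(perturbed_counts)
    x = 1.0 / (gamma + n - 1)
    return [(y - x * total) / ((gamma - 1) * x) for y in perturbed_counts]

def condition_number(n, gamma):
    """Ratio of the largest to the smallest eigenvalue of the assumed matrix.

    Its eigenvalues are 1 (once) and (gamma - 1) * x (n - 1 times), for gamma > 1.
    """
    return (gamma + n - 1) / (gamma - 1)

if __name__ == "__main__":
    rng = random.Random(0)
    n, gamma = 4, 19.0
    A = make_matrix(n, gamma)
    # Hypothetical true data: 1000 records over 4 categories.
    true_counts = [500, 300, 150, 50]
    records = [u for u, c in enumerate(true_counts) for _ in range(c)]
    observed = [0] * n
    for u in records:
        observed[perturb(u, A, rng)] += 1
    print("observed counts:     ", observed)
    print("reconstructed counts:", [round(e, 1) for e in reconstruct(observed, gamma)])
    print("condition number:    ", condition_number(n, gamma))
```

In this sketch, a larger `gamma` moves the matrix closer to the identity, which lowers the condition number and the reconstruction error but also perturbs the records less. That is the accuracy versus privacy trade-off the abstract refers to, shown here only for the assumed matrix form.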

Keywords

Privacy, Data mining

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Shipra Agrawal (1, 2)
  • Jayant R. Haritsa (1)
  • B. Aditya Prakash (3)
  1. Indian Institute of Science, Bangalore, India
  2. Stanford University, Stanford, USA
  3. Indian Institute of Technology, Mumbai, India
