Mining Association Rules under Privacy Constraints

  • Jayant R. Haritsa
Part of the Advances in Database Systems book series (ADBS, volume 34)

Data mining services require accurate input data for their results to be meaningful, but privacy concerns may impel users to provide spurious information. In this chapter, we study whether users can be encouraged to provide correct information by ensuring that the mining process cannot, with any reasonable degree of certainty, violate their privacy. Our analysis is in the context of extracting association rules from large historical databases, a popular mining process that identifies interesting correlations between database attributes. We analyze the various schemes that have been proposed for this purpose with regard to a variety of parameters including the degree of trust, privacy metric, model accuracy and mining efficiency.

Keywords

Privacy, data mining, association rules

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Jayant R. Haritsa, Database Systems Lab, Indian Institute of Science, Bangalore, India
