Abstract
The collection of large amounts of users’ sensitive data by service providers such as Google, Yahoo, or Facebook raises several relevant and challenging issues. A particularly important problem is how to enable meaningful statistical analysis of such data without compromising user privacy. To this end, several anonymization and privacy-preserving data mining techniques have been proposed in the past few years. In this work, we present a survey of these methodologies and techniques, with a particular focus on advanced topics such as the privacy-preserving management of time-varying anonymized data and privacy-preserving data mining over distributed data.
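The abstract refers to anonymization techniques only in general terms. As one concrete illustration, the following checks k-anonymity (Sweeney 2002, in the reference list): a released table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. This is a minimal sketch under assumed inputs, not the authors' method; the toy table, the choice of ZIP code and age as quasi-identifiers, and the helpers `generalize_zip`, `generalize_age`, and `anonymize` are all hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def generalize_zip(zip_code: str, keep: int) -> str:
    """Hypothetical rule: keep the first `keep` digits of a ZIP code, mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_age(age: int, width: int) -> str:
    """Hypothetical rule: replace an exact age with the `width`-year range containing it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(table: Sequence[Record], quasi_ids: Sequence[str], k: int) -> bool:
    """Return True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in table)
    return all(count >= k for count in groups.values())


def anonymize(table: Sequence[Record]) -> List[Record]:
    """Generalize the quasi-identifiers; the sensitive attribute is released unchanged."""
    return [
        {
            "zip": generalize_zip(row["zip"], keep=3),
            "age": generalize_age(row["age"], width=10),
            "disease": row["disease"],
        }
        for row in table
    ]


# Toy microdata: "zip" and "age" act as quasi-identifiers, "disease" is sensitive.
raw = [
    {"zip": "47906", "age": 34, "disease": "flu"},
    {"zip": "47907", "age": 36, "disease": "cold"},
    {"zip": "47901", "age": 31, "disease": "flu"},
    {"zip": "47302", "age": 52, "disease": "asthma"},
    {"zip": "47305", "age": 58, "disease": "flu"},
]

anon = anonymize(raw)
print(is_k_anonymous(raw, ["zip", "age"], k=2))   # False: every raw record is unique
print(is_k_anonymous(anon, ["zip", "age"], k=2))  # True: groups of sizes 3 and 2
print(is_k_anonymous(anon, ["zip", "age"], k=3))  # False: the "473**" group has only 2 rows
```

Verifying a given generalization takes a single pass over the table, whereas finding an optimal k-anonymous generalization is NP-hard (Meyerson and Williams), which motivates algorithms such as Incognito, Mondrian, and the approximations of Park and Shim. k-anonymity alone also does not prevent attribute disclosure when a group shares a single sensitive value; l-diversity and t-closeness address that weakness.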
References
Agrawal R. and Srikant R. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, 12–15 September, Santiago, Chile, pp. 487–499, 1994.
Agrawal R. and Srikant R. Privacy preserving data mining. In: Proceedings of the ACM SIGMOD Conference, Dallas, Texas, USA, 2000.
Atzori M., Bonchi F., Giannotti F., and Pedreschi D. Blocking anonymity threats raised by frequent itemset mining. In: Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 27–30 November 2005, Houston, Texas, USA, IEEE Computer Society, Washington, DC, pp. 561–564, 2005.
Atzori M., Bonchi F., Giannotti F., and Pedreschi D. k-anonymous patterns. In: Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3–7, 2005, Proceedings, Lecture Notes in Computer Science, Springer, Berlin, pp. 10–21, 2005.
Bellare M. and Rogaway P. Random oracles are practical: A paradigm for designing efficient protocols. In: Proceedings of the First ACM Conference on Computer and Communications Security, Fairfax, Virginia, USA, pp. 62–73, 1993.
Byun J.W., Li T., Bertino E., Li N., and Sohn Y. Privacy-preserving incremental data dissemination. Journal of Computer Security 17(1):43–68, 2009.
Cao J., Carminati B., Ferrari E., and Tan K.L. CASTLE: A delay-constrained scheme for k-anonymizing data streams. In: Proceedings of IEEE ICDE Conference, Penang, Malaysia, 2008.
Cheung D.W-L., Han J., Ng V., Fu A.W-C., and Fu Y. A fast distributed algorithm for mining association rules. In: Proceedings of the 1996 International Conference on Parallel and Distributed Information Systems (PDIS’96), December 1996, Miami Beach, Florida, USA, IEEE, pp. 31–42.
Chin F. and Ozsoyoglu G. Auditing for secure statistical databases. In: Proceedings of the ACM’81 Conference, Los Angeles, California, USA, 1981.
Fienberg S. and McIntyre J. Data swapping: Variations on a theme by Dalenius and Reiss. In: Proceedings of Privacy in Statistical Databases, Barcelona, Spain, 2004.
Freedman M.J., Nissim K., and Pinkas B. Efficient private matching and set intersection. In: Eurocrypt 2004, Interlaken, Switzerland, 2–6 May, International Association for Cryptologic Research (IACR), 2004.
Fung B.C.M., Wang K., Fu A.W.C., and Pei J. Anonymity for continuous data publishing. In: Proceedings of EDBT, ACM, ACM International Conference Proceeding Series, Nantes, France, vol. 261, pp. 264–275, 2008.
Goethals B., Laur S., Lipmaa H., and Mielikainen T. On secure scalar product computation for privacy-preserving data mining. In: The 7th Annual International Conference in Information Security and Cryptology (ICISC 2004), Seoul, Korea, 2–3 December (eds. C. Park and S. Chee), pp. 104–120, 2004.
Goldreich O., Micali S., and Wigderson A. Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems. Journal of the ACM, 38:690–728, 1991.
Goldwasser S., Micali S., and Rackoff C. The knowledge complexity of interactive proof systems. In: Proceedings of the 17th Annual ACM Symposium on Theory of Computing, Providence, Rhode Island, USA, 6–8 May, pp. 291–304, 1985.
Han Y., Pei J., Jiang B., Tao Y., and Jia Y. Continuous privacy preserving publishing of data streams. In: Proceedings of EDBT, ACM, ACM International Conference Proceeding Series, Saint Petersburg, Russia, vol. 360, pp. 648–659, 2009.
Jagannathan G., Pillaipakkamnatt K., and Wright R.N. A new privacy preserving distributed k-clustering algorithm. In: Proceedings of the 2006 SIAM International Conference on Data Mining (SDM06), Bethesda, Maryland, USA, 2006.
Jagannathan G. and Wright R.N. Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In: Proceedings of the 2005 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, 21–24 August, pp. 593–599, 2005.
Jiang W. and Atzori M. Secure distributed k-anonymous pattern mining. In: Sixth IEEE International Conference on Data Mining (ICDM06), Hong Kong, China, 18–22 December, pp. 319–329, 2006.
Jiang W. and Clifton C. A secure distributed framework for achieving k-anonymity. Special Issue of the VLDB Journal on Privacy-Preserving Data Management, September 2006.
Jiang W. and Clifton C. AC-framework for privacy-preserving collaboration. In: SIAM International Conference on Data Mining, Minneapolis, Minnesota, 26–28 April, 2007.
Jiang W., Clifton C., and Kantarcioglu M. Transforming semi-honest protocols to ensure accountability. Data and Knowledge Engineering, 65(1):57–74, 2008.
Johnson W. and Lindenstrauss J. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
Kantarcıoglu M. and Clifton C. Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Transactions on Knowledge and Data Engineering, 16(9):1026–1037, September 2004.
LeFevre K., DeWitt D., and Ramakrishnan R. Incognito: Efficient full-domain k-anonymity. In: Proceedings of ACM SIGMOD Conference, Baltimore, Maryland, USA, 2005.
LeFevre K., DeWitt D., and Ramakrishnan R. Mondrian multidimensional k-anonymity. In: Proceedings of IEEE ICDE Conference, Atlanta, Georgia, USA, 2006.
Li J., Ooi B.C., and Wang W. Anonymizing streaming data for privacy protection. In: Proceedings of IEEE ICDE Conference, 2008.
Li N., Li T., and Venkatasubramanian S. t-closeness: Privacy beyond k-anonymity and l-diversity. In: Proceedings of IEEE ICDE Conference, 2007.
Lindell Y. and Pinkas B. Privacy preserving data mining. In: Advances in Cryptology – CRYPTO 2000, Springer-Verlag, Berlin, 20–24 August, pp. 36–54, 2000.
Lindell Y. and Pinkas B. Privacy preserving data mining. Journal of Cryptology, 15(3): 177–206, 2002.
Machanavajjhala A., Gehrke J., Kifer D., and Venkitasubramaniam M. l-diversity: Privacy beyond k-anonymity. In: Proceedings of IEEE ICDE Conference, 2006.
Meyerson A. and Williams R. On the complexity of optimal k-anonymity. In: Proceedings of ACM PODS Conference, 2004.
Mitchell T. Machine Learning. McGraw-Hill Science/Engineering/Math, New York, NY, 1st edition, 1997.
Paillier P. Public key cryptosystems based on composite degree residuosity classes. In: Advances in Cryptology – Eurocrypt ’99 Proceedings, Prague, Czech Republic, 2–6 May, LNCS 1592, pp. 223–238, Springer-Verlag, Berlin, 1999.
Park H. and Shim K. Approximate algorithms for k-anonymity. In: Proceedings of ACM SIGMOD Conference, Beijing, China, 2007.
Quinlan J.R. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
Schneier B. Applied Cryptography. Wiley, New York, NY, 2nd edition, 1995.
Sweeney L. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571–588, 2002.
Trombetta A., Jiang W., Bertino E., and Bossi L. Privately updating suppression and generalization based k-anonymous databases. In: Proceedings of ICDE, IEEE, Cancun, Mexico, pp. 1370–1372, 2008.
Vaidya J. and Clifton C. Privacy preserving association rule mining in vertically partitioned data. In: The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 23–26 July, pp. 639–644, 2002.
Vaidya J. and Clifton C. Privacy-preserving k-means clustering over vertically partitioned data. In: The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, 24–27 August, pp. 206–215, 2003.
Vaidya J. and Clifton C. Privacy preserving naïve Bayes classifier for vertically partitioned data. In: 2004 SIAM International Conference on Data Mining, Lake Buena Vista, Florida, 22–24 April, pp. 522–526, 2004.
Vaidya J. and Clifton C. Privacy-preserving decision trees over vertically partitioned data. In: The 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security, Storrs, Connecticut, 7–10 August, Springer, Berlin, 2005.
Vaidya J. and Clifton C. Secure set intersection cardinality with application to association rule mining. Journal of Computer Security, 13(4):593–622, November 2005.
Vaidya J., Clifton C., Kantarcioglu M., and Patterson A.S. Privacy-preserving decision trees over vertically partitioned data. ACM Transactions on Knowledge Discovery from Data, 2(3):1–27, October 2008.
Vaidya J., Kantarcioglu M., and Clifton C. Privacy preserving naive Bayes classification. International Journal on Very Large Data Bases, 17(4):879–898, July 2008.
Vaidya J., Yu H., and Jiang X. Privacy preserving SVM classification. Knowledge and Information Systems, 14(2):161–178, February 2008.
Wang K. and Fung B. Anonymizing sequential releases. In: Proceedings of ACM KDD Conference, Philadelphia, Pennsylvania, USA, 2006.
Xiao X. and Tao Y. m-Invariance: Towards privacy preserving re-publication of dynamic datasets. In: Proceedings of ACM SIGMOD Conference, pp. 689–700, 2007.
Yao C., Wang X.S., and Jajodia S. Checking for k-anonymity violation by views. In: Proceedings of VLDB Conference, Trondheim, Norway, 2005.
Yu H., Jiang X., and Vaidya J. Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data. In: SAC ’06: Proceedings of the 2006 ACM Symposium on Applied Computing, ACM Press, New York, NY, USA, pp. 603–610, 2006.
Yu H., Vaidya J., and Jiang X. Privacy-preserving SVM classification on vertically partitioned data. In: Proceedings of PAKDD ’06, volume 3918 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, January, pp. 647–656, 2006.
Zhan J., Matwin S., and Chang L.W. Privacy-preserving collaborative association rule mining. In: Proceedings of the 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security, Storrs, Connecticut, 7–10 August, 2005.
Copyright information
© 2010 Springer London
About this chapter
Cite this chapter
Trombetta, A., Jiang, W., Bertino, E. (2010). Advanced Privacy-Preserving Data Management and Analysis. In: Nin, J., Herranz, J. (eds) Privacy and Anonymity in Information Management Systems. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84996-238-4_2
DOI: https://doi.org/10.1007/978-1-84996-238-4_2
Publisher Name: Springer, London
Print ISBN: 978-1-84996-237-7
Online ISBN: 978-1-84996-238-4
eBook Packages: Computer Science, Computer Science (R0)