Privacy-preserving clustering with distributed EM mixture modeling

Knowledge and Information Systems

Abstract

Privacy and security considerations can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery can alleviate this problem. We present a technique that uses EM mixture modeling to perform clustering on distributed data. This method controls data sharing, preventing disclosure of individual data items or any results that can be traced to an individual site.
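
The preview does not give the protocol itself, so the following is only a minimal sketch of the general idea, assuming horizontally partitioned one-dimensional data and a Gaussian mixture. Each site runs the E-step on its own records and releases only per-component sums; the M-step then uses the totals across sites. The function names (`local_stats`, `global_update`, `distributed_em`) are hypothetical, and plain addition stands in for whatever secure aggregation the full method uses, so the sketch by itself does not hide any single site's statistics.

```python
import math


def local_stats(data, weights, means, variances):
    """E-step at one site: return per-component sufficient statistics only."""
    k = len(weights)
    n_k = [0.0] * k  # sum of responsibilities
    s_k = [0.0] * k  # sum of responsibility * x
    q_k = [0.0] * k  # sum of responsibility * x^2
    for x in data:
        dens = [
            weights[j]
            * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
            / math.sqrt(2 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for x
            n_k[j] += r
            s_k[j] += r * x
            q_k[j] += r * x * x
    return n_k, s_k, q_k, len(data)


def global_update(site_stats):
    """M-step from the element-wise sums of the site statistics.

    Assumption: plain addition is used here; a secure-sum protocol
    would replace it so that only the totals are revealed.
    """
    k = len(site_stats[0][0])
    n = [sum(s[0][j] for s in site_stats) for j in range(k)]
    sx = [sum(s[1][j] for s in site_stats) for j in range(k)]
    sq = [sum(s[2][j] for s in site_stats) for j in range(k)]
    total = sum(s[3] for s in site_stats)
    weights = [n[j] / total for j in range(k)]
    means = [sx[j] / n[j] for j in range(k)]
    # Floor the variance to keep the densities well defined.
    variances = [max(sq[j] / n[j] - means[j] ** 2, 1e-6) for j in range(k)]
    return weights, means, variances


def distributed_em(sites, weights, means, variances, iters=50):
    """Alternate local E-steps and a global M-step for a fixed number of rounds."""
    for _ in range(iters):
        stats = [local_stats(d, weights, means, variances) for d in sites]
        weights, means, variances = global_update(stats)
    return weights, means, variances
```

Because the M-step depends only on the summed statistics, running `distributed_em` over several sites gives the same parameters as running it on the pooled data, up to floating-point rounding, while no site ever sends its raw records.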

Correspondence to Xiaodong Lin.

Cite this article

Lin, X., Clifton, C. & Zhu, M. Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8, 68–81 (2005). https://doi.org/10.1007/s10115-004-0148-7

  • DOI: https://doi.org/10.1007/s10115-004-0148-7