Privacy-preserving clustering with distributed EM mixture modeling

Lin, Xiaodong; Clifton, Chris; Zhu, Michael

doi:10.1007/s10115-004-0148-7

Published: 01 July 2005

Knowledge and Information Systems, Volume 8, pages 68–81 (2005)

409 Accesses
90 Citations

Abstract

Privacy and security considerations can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery can alleviate this problem. We present a technique that uses EM mixture modeling to perform clustering on distributed data. This method controls data sharing, preventing disclosure of individual data items or any results that can be traced to an individual site.
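The abstract describes the approach only at a high level. The following is a minimal sketch of one common way to organize distributed EM so that no site discloses its own contribution. It is not taken from the paper. It assumes horizontally partitioned one-dimensional data, a Gaussian mixture, and a simulated ring-based secure sum over fixed-point encoded values. The names `secure_sum`, `local_stats`, `distributed_em`, `MOD` and `SCALE` are hypothetical. Each site computes per-component sufficient statistics locally, and only their masked totals are combined. The global parameters can therefore be updated without any site revealing its own sums.

```python
import math
import random

# Hypothetical parameters for this sketch (not from the paper).
MOD = 2 ** 64       # ring modulus for the masked secure sum
SCALE = 10 ** 6     # fixed-point scale used to encode floats as integers


def encode(v):
    """Fixed-point encode a float into the ring Z_MOD."""
    return round(v * SCALE) % MOD


def decode(s):
    """Map a ring element back to a signed float."""
    if s >= MOD // 2:
        s -= MOD
    return s / SCALE


def secure_sum(local_values, rng):
    """Simulated ring secure sum (assumed protocol): the initiator masks its
    value with a random R, each site adds its encoded value modulo MOD, and
    only the initiator removes R. Every intermediate message is uniformly
    distributed, so it reveals nothing about any single site's value."""
    mask = rng.randrange(MOD)
    running = (mask + encode(local_values[0])) % MOD
    for v in local_values[1:]:
        running = (running + encode(v)) % MOD  # message passed to the next site
    return decode((running - mask) % MOD)


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def local_stats(xs, weights, means, variances):
    """E-step at one site: per-component sums of responsibilities r, r*x
    and r*x^2. Only these aggregates leave the site, and only via secure_sum."""
    k = len(weights)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in xs:
        dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        for j in range(k):
            r = dens[j] / total
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats


def distributed_em(sites, means, iters=50, seed=0):
    """EM for a 1-D Gaussian mixture whose M-step uses only securely summed
    statistics; the resulting global parameters are shared with all sites."""
    rng = random.Random(seed)
    k = len(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    means = list(means)
    for _ in range(iters):
        per_site = [local_stats(xs, weights, means, variances) for xs in sites]
        sums = []
        for j in range(k):
            s0 = secure_sum([st[j][0] for st in per_site], rng)
            s1 = secure_sum([st[j][1] for st in per_site], rng)
            s2 = secure_sum([st[j][2] for st in per_site], rng)
            sums.append((s0, s1, s2))
        n_total = sum(s[0] for s in sums)
        for j, (s0, s1, s2) in enumerate(sums):
            weights[j] = s0 / n_total
            means[j] = s1 / s0
            variances[j] = max(s2 / s0 - means[j] ** 2, 1e-6)  # floor avoids collapse
    return weights, means, variances


# Usage: three sites, each holding draws from two clusters centred at -5 and 5.
data_rng = random.Random(1)
sites = [[data_rng.gauss(-5, 1) for _ in range(60)]
         + [data_rng.gauss(5, 1) for _ in range(40)] for _ in range(3)]
w, mu, var = distributed_em(sites, means=[-1.0, 1.0])
print([round(m, 1) for m in mu])
```

Because the sums of responsibilities and moments are additive across sites, the securely combined EM run should match EM on the pooled data up to fixed-point rounding. This additivity is what makes a secure-sum formulation natural for horizontally partitioned data.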

Author information

Xiaodong Lin: Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221-0025, USA
Chris Clifton: Department of Computer Science, Purdue University, West Lafayette, IN, USA
Michael Zhu: Department of Statistics, Purdue University, West Lafayette, IN, USA

Corresponding author

Correspondence to Xiaodong Lin.

About this article

Cite this article

Lin, X., Clifton, C. & Zhu, M. Privacy-preserving clustering with distributed EM mixture modeling. Knowl Inf Syst 8, 68–81 (2005). https://doi.org/10.1007/s10115-004-0148-7

Received: 18 January 2003
Revised: 16 August 2003
Accepted: 28 October 2003
Published: 01 July 2005
Issue Date: July 2005
DOI: https://doi.org/10.1007/s10115-004-0148-7
