A distributed EM algorithm to estimate the parameters of a finite mixture of components

Safarinejadian, Behrooz; Menhaj, Mohammad B.; Karrari, Mehdi

doi:10.1007/s10115-009-0218-y


Regular Paper
Published: 04 June 2009

Knowledge and Information Systems, Volume 23, pages 267–292 (2010)



Abstract

In this paper, a distributed expectation maximization (DEM) algorithm is first introduced in a general form for estimating the parameters of a finite mixture of components. This algorithm is used for density estimation and clustering of data distributed over the nodes of a network. Then, a distributed incremental EM (DIEM) algorithm with a higher convergence rate is proposed. After a full derivation of the distributed EM algorithms, their convergence is analyzed based on the negative free energy concept used in statistical physics. An analytical approach is also developed for evaluating the convergence rate of both the incremental and the distributed incremental EM algorithms. It is shown analytically that DIEM converges much faster than DEM. Finally, simulation results confirm that DIEM clearly outperforms DEM on both synthetic and real data sets.
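The abstract describes the two schemes only at a high level. The following is a minimal sketch, not the authors' exact algorithm, assuming a one-dimensional Gaussian mixture, NumPy, and nodes visited in a fixed ring order; the function names (`local_stats`, `m_step`, `dem`, `diem`, `log_likelihood`) are hypothetical. Each node runs a local E-step and reports additive sufficient statistics. In the DEM-style loop, the global statistics are accumulated over a full cycle before a single M-step. In the DIEM-style loop, each node swaps its old contribution for a fresh one and the parameters are updated immediately, in the manner of incremental EM.

```python
import numpy as np

# Parameters are a tuple (weights, means, variances) of length-K arrays.
# A 1-D Gaussian mixture is an assumption made here for brevity.

def local_stats(x, weights, means, variances):
    """Local E-step on one node: return sufficient statistics and log-likelihood."""
    # log(pi_k) + log N(x_n | mu_k, var_k), shape (n, K)
    log_p = (np.log(weights)
             - 0.5 * np.log(2 * np.pi * variances)
             - 0.5 * (x[:, None] - means) ** 2 / variances)
    # Numerically stable log-sum-exp over the components
    m = log_p.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
    r = np.exp(log_p - log_norm)  # responsibilities; each row sums to 1
    # Per-component statistics: sum r, sum r*x, sum r*x^2 (shape 3 x K)
    stats = np.stack([r.sum(axis=0), r.T @ x, r.T @ x ** 2])
    return stats, float(log_norm.sum())


def m_step(S):
    """M-step from the global sufficient statistics S."""
    nk = S[0]
    means = S[1] / nk
    variances = S[2] / nk - means ** 2
    return nk / nk.sum(), means, variances


def log_likelihood(nodes, params):
    """Total log-likelihood of the data held by all nodes."""
    return sum(local_stats(x, *params)[1] for x in nodes)


def dem(nodes, params, cycles):
    """DEM-style sketch: accumulate statistics around the ring, one M-step per cycle."""
    for _ in range(cycles):
        S = np.zeros((3, len(params[0])))
        for x in nodes:            # the message visits each node in ring order
            S = S + local_stats(x, *params)[0]
        params = m_step(S)         # parameters change only after a full cycle
    return params


def diem(nodes, params, cycles):
    """DIEM-style sketch: each node refreshes its statistics and updates at once."""
    local = [local_stats(x, *params)[0] for x in nodes]
    S = sum(local)
    for _ in range(cycles):
        for i, x in enumerate(nodes):
            new = local_stats(x, *params)[0]
            S = S - local[i] + new  # replace node i's previous contribution
            local[i] = new
            params = m_step(S)      # M-step at every node visit
    return params


# Usage: three hypothetical nodes holding samples from two well-separated components
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(4.0, 1.0, 300)])
rng.shuffle(data)
nodes = np.array_split(data, 3)
init = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
print(dem(nodes, init, cycles=20)[1], diem(nodes, init, cycles=20)[1])
```

Because the statistics are additive, the DEM-style loop reproduces centralized EM on the pooled data. The incremental loop lets later nodes in a cycle use parameters that earlier nodes have already updated, which is the intuition behind its faster convergence. This sketch does not reproduce the paper's convergence-rate analysis.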



Author information

Authors and Affiliations

Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
Behrooz Safarinejadian, Mohammad B. Menhaj & Mehdi Karrari


Corresponding author

Correspondence to Behrooz Safarinejadian.


About this article

Cite this article

Safarinejadian, B., Menhaj, M.B. & Karrari, M. A distributed EM algorithm to estimate the parameters of a finite mixture of components. Knowl Inf Syst 23, 267–292 (2010). https://doi.org/10.1007/s10115-009-0218-y


Received: 14 June 2008
Revised: 18 April 2009
Accepted: 02 May 2009
Published: 04 June 2009
Issue Date: June 2010
DOI: https://doi.org/10.1007/s10115-009-0218-y

