Gaussian mixture modeling and model-based clustering under measurement inconsistency

Sarkar, Shuchismita; Melnykov, Volodymyr; Zheng, Rong

doi:10.1007/s11634-020-00393-9

Regular Article
Published: 12 May 2020

Volume 14, pages 379–413 (2020)

Advances in Data Analysis and Classification

Shuchismita Sarkar¹,
Volodymyr Melnykov² &
Rong Zheng³

Abstract

Finite mixtures present a powerful tool for modeling complex heterogeneous data. One of their most important applications is model-based clustering. It assumes that each data group can be reasonably described by one mixture model component. This establishes a one-to-one relationship between mixture components and clusters. In some cases, however, this relationship can be broken due to the presence of observations from the same class recorded in different ways. This effect can occur because of recording inconsistencies due to the use of different scales, operator errors, or simply various recording styles. The idea presented in this paper aims to alleviate this issue through modifications incorporated into mixture models. While the proposed methodology is applicable to a broad class of mixture models, in this paper it is illustrated on Gaussian mixtures. Several simulation studies and an application to a real-life data set are considered, yielding promising results.
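
As a rough illustration of the problem described above, and not of the methodology proposed in the paper, the following sketch simulates a single class of measurements in which half of the records are stored in a different unit. A plain two-component Gaussian mixture fitted by EM then splits the one class into two components, so the one-to-one relationship between components and clusters no longer holds. The function `fit_gmm_1d`, the height data, and the inch-to-centimeter conversion are all assumptions chosen for illustration.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def fit_gmm_1d(data, init_means, n_iter=100):
    """Plain EM for a univariate Gaussian mixture (illustrative sketch only)."""
    k = len(init_means)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [grand_var] * k  # start every component at the overall variance
    resp = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in the log domain
        # so that far-away points do not cause a division by zero.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted updates of mixing proportions, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances, resp


# Hypothetical data: ONE class of heights, true values in inches.
rng = random.Random(0)
inches = [rng.gauss(65.0, 3.0) for _ in range(200)]
# Assumed inconsistency: the last 100 records were entered in centimeters.
recorded = inches[:100] + [2.54 * h for h in inches[100:]]

weights, means, variances, resp = fit_gmm_1d(
    recorded, init_means=[min(recorded), max(recorded)]
)
# Hard assignment: each observation goes to its most probable component.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]

print("mixing proportions:", [round(w, 3) for w in weights])
print("component means:   ", [round(m, 2) for m in means])
print("ratio of means:    ", round(means[1] / means[0], 3))
```

In this toy setting the two fitted components correspond to the two recording styles rather than to two real groups, and the ratio of the component means comes out close to the unit conversion factor. A model that accounts for such inconsistency would need to treat both components as belonging to the same cluster.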

Author information

Authors and Affiliations

Bowling Green State University, Bowling Green, OH, 43402, USA
Shuchismita Sarkar
The University of Alabama, Tuscaloosa, AL, 35487, USA
Volodymyr Melnykov
Western Illinois University, Macomb, IL, 61455, USA
Rong Zheng

Corresponding author

Correspondence to Volodymyr Melnykov.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary material 1 (pdf 52 KB)

About this article

Cite this article

Sarkar, S., Melnykov, V. & Zheng, R. Gaussian mixture modeling and model-based clustering under measurement inconsistency. Adv Data Anal Classif 14, 379–413 (2020). https://doi.org/10.1007/s11634-020-00393-9

Received: 21 October 2018
Revised: 14 December 2019
Accepted: 06 March 2020
Published: 12 May 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11634-020-00393-9

Mathematics Subject Classification

62H30
