Finite Mixture Model Clustering of SNP Data

Bargary, Norma; Hinde, J.; Garcia, A. Augusto F.

doi:10.1007/978-3-319-04579-5_11

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Abstract

Finite mixture models have been used extensively in clustering applications, where each component of the mixture distribution is assumed to represent an individual cluster. In the simplest case, each cluster is described by a multivariate Gaussian density, with a choice of covariance structures. However, finite mixture models are a highly flexible clustering tool and allow a wide range of statistical models to be specified for the data within each cluster, including linear regression models, mixed effects models, and generalized linear models. This paper investigates the use of mixtures of orthogonal regression models to cluster biological data arising from a study of the sugarcane plant.
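
As a rough illustration of the simplest case mentioned in the abstract (Gaussian components, one per cluster), the sketch below fits a univariate Gaussian mixture with the EM algorithm and assigns each observation to the component with the highest posterior probability. It is not the mixture of orthogonal regression models studied in the chapter. The function names (`fit_gaussian_mixture`, `assign_clusters`), the quantile-based initialisation, and the fixed iteration count are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gaussian_mixture(data, k, n_iter=200):
    """Fit a k-component univariate Gaussian mixture by EM (illustrative only).

    Returns (weights, means, sds, responsibilities).
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced sample quantiles,
    # a common standard deviation, and equal mixing weights.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in data) / n)
    sds = [max(spread, 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            if total > 0.0:
                resp.append([d / total for d in dens])
            else:
                # Numerical underflow: fall back to the current weights.
                resp.append(list(weights))
        # M-step: responsibility-weighted updates of weights, means and sds.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component is effectively empty; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, sds, resp


def assign_clusters(resp):
    """Hard clustering: index of the most probable component per observation."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


if __name__ == "__main__":
    # Simulated data from two well-separated groups (not data from the chapter).
    rng = random.Random(1)
    sample = ([rng.gauss(0.0, 1.0) for _ in range(100)]
              + [rng.gauss(6.0, 1.0) for _ in range(100)])
    w, m, s, r = fit_gaussian_mixture(sample, 2)
    print("weights:", w)
    print("means:", m)
    print("sds:", s)
```

Replacing the Gaussian density in the E-step with the likelihood of a regression model, and the M-step with a weighted fit of that model, gives the kind of model-per-cluster extension the abstract describes.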

Acknowledgements

Norma Bargary (née Coffey) carried out this work at the National University of Ireland, Galway, while supported by Science Foundation Ireland under Grant No. 07/MI/012 (BIO-SI project). The authors would also like to thank Dr. Anete P. Souza, who carried out the laboratory work and is the co-ordinator of the project that produced the data analyzed.

Author information

Authors and Affiliations

Department of Mathematics & Statistics, University of Limerick, Limerick, Ireland
Norma Bargary
School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Ireland
J. Hinde
Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo, São Paulo, Brazil
A. Augusto F. Garcia

Corresponding author

Correspondence to Norma Bargary.

Editor information

Editors and Affiliations

Centre of Biostatistics, University of Limerick, Limerick, Ireland
Gilbert MacKenzie
Centre of Biostatistics, University of Limerick, Limerick, Ireland
Defen Peng

About this chapter

Cite this chapter

Bargary, N., Hinde, J., Garcia, A.A.F. (2014). Finite Mixture Model Clustering of SNP Data. In: MacKenzie, G., Peng, D. (eds) Statistical Modelling in Biostatistics and Bioinformatics. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-04579-5_11

DOI: https://doi.org/10.1007/978-3-319-04579-5_11
Published: 13 March 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04578-8
Online ISBN: 978-3-319-04579-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)