Finite Mixture Model Clustering of SNP Data

Statistical Modelling in Biostatistics and Bioinformatics

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Abstract

Finite mixture models are used extensively in clustering applications, where each component of the mixture distribution is assumed to represent an individual cluster. In the simplest case, each cluster is described by a multivariate Gaussian density, with a choice of covariance structures. The approach is far more flexible than this, however: a wide range of statistical models can be specified for the data within each cluster, including linear regression models, mixed effects models and generalized linear models. This paper investigates the use of mixtures of orthogonal regression models to cluster biological data arising from a study of the sugarcane plant.
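
As a rough illustration of the simplest case mentioned in the abstract, the sketch below fits a two-component univariate Gaussian mixture with the EM algorithm and then assigns each observation to its most probable component. It is not the chapter's orthogonal regression method, only a minimal example of mixture-based clustering. The function names (`normal_pdf`, `fit_gaussian_mixture`), the toy data and the starting means are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gaussian_mixture(data, means, n_iter=50, min_var=1e-6):
    """Fit a univariate Gaussian mixture by EM, starting from the given means.

    Hypothetical helper for illustration; returns the mixing weights,
    component means, component variances and hard cluster labels.
    """
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    # Start every component from the overall sample variance.
    grand_mean = sum(data) / len(data)
    overall_var = sum((x - grand_mean) ** 2 for x in data) / len(data)
    variances = [max(overall_var, min_var)] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Floor the variance so a component cannot collapse onto one point.
            variances[j] = max(var, min_var)
    # Hard clustering: assign each point to its most probable component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Toy data (assumed): two well-separated groups of measurements.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
weights, means, variances, labels = fit_gaussian_mixture(data, means=[0.0, 6.0])
```

Replacing the Gaussian density in the E-step and the weighted mean and variance updates in the M-step with another within-cluster model (for example a regression fit) gives the more general kind of mixture the abstract refers to; the overall EM structure stays the same.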

Acknowledgements

Norma Bargary (née Coffey) carried out this work at the National University of Ireland, Galway, while supported by Science Foundation Ireland under Grant No. 07/MI/012 (BIO-SI project). The authors would also like to thank Dr. Anete P. Souza, who carried out the laboratory work and is the co-ordinator of the project that produced the data analyzed.

Corresponding author

Correspondence to Norma Bargary.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Bargary, N., Hinde, J., Garcia, A.A.F. (2014). Finite Mixture Model Clustering of SNP Data. In: MacKenzie, G., Peng, D. (eds) Statistical Modelling in Biostatistics and Bioinformatics. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-04579-5_11
