Abstract
Finite mixture models have been used extensively in clustering applications, where each component of the mixture distribution is assumed to represent an individual cluster. The simplest example describes each cluster by a multivariate Gaussian density, with various possible covariance structures. Finite mixture models are, however, a far more flexible clustering tool than this: a wide range of statistical models can be specified to describe the data within each cluster, including linear regression models, mixed effects models and generalized linear models. This paper investigates the use of mixtures of orthogonal regression models to cluster biological data arising from a study of the sugarcane plant.
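To make the idea of a mixture of orthogonal regression models concrete, the following is a minimal sketch and not the chapter's own implementation. It assumes that each cluster is a line (or hyperplane) fitted by orthogonal (total least squares) regression, that the perpendicular distances to that line are Gaussian, and that the mixture is fitted with an EM-style iteration. The function name `fit_orthogonal_mixture` and all of its parameters are hypothetical choices made for illustration.

```python
import numpy as np


def fit_orthogonal_mixture(X, n_components=2, n_iter=100, seed=0, var_floor=1e-8):
    """Illustrative EM fit of a mixture of orthogonal regression hyperplanes.

    Assumption (not from the chapter): component k is the hyperplane
    normal_k . x = offset_k, and the orthogonal residual
    normal_k . x - offset_k is modelled as N(0, variance_k).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Random soft initial assignment; each row is normalised to sum to one.
    resp = rng.random((n, n_components)) + 1e-3
    resp /= resp.sum(axis=1, keepdims=True)

    normals = np.empty((n_components, d))
    offsets = np.empty(n_components)
    variances = np.empty(n_components)

    for _ in range(n_iter):
        # M-step: weighted total least squares fit for each component.
        totals = np.maximum(resp.sum(axis=0), 1e-12)
        proportions = totals / n
        for k in range(n_components):
            w = resp[:, k]
            mean = w @ X / totals[k]
            centred = X - mean
            scatter = (centred * w[:, None]).T @ centred / totals[k]
            # eigh returns eigenvalues in ascending order, so column 0 is the
            # direction of least weighted variance: the hyperplane normal.
            eigvals, eigvecs = np.linalg.eigh(scatter)
            normals[k] = eigvecs[:, 0]
            offsets[k] = normals[k] @ mean
            variances[k] = max(eigvals[0], var_floor)

        # E-step: posterior membership probabilities from orthogonal residuals.
        resid = X @ normals.T - offsets
        log_dens = (np.log(proportions)
                    - 0.5 * np.log(2.0 * np.pi * variances)
                    - 0.5 * resid ** 2 / variances)
        log_dens -= log_dens.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)

    return {
        "proportions": proportions,
        "normals": normals,
        "offsets": offsets,
        "variances": variances,
        "responsibilities": resp,
    }
```

Each observation can then be assigned to the component with the largest entry in its row of `responsibilities`. As with any EM fit, the result depends on the starting values, so in practice several seeds would be tried and the best-fitting solution kept.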
Acknowledgements
Norma Bargary (née Coffey) carried out this work at the National University of Ireland, Galway while supported by Science Foundation Ireland under Grant No. 07/MI/012 (BIO-SI project). The authors would also like to thank Dr. Anete P. Souza, who carried out the laboratory work and is the co-ordinator of the project that produced the data analyzed.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Bargary, N., Hinde, J., Garcia, A.A.F. (2014). Finite Mixture Model Clustering of SNP Data. In: MacKenzie, G., Peng, D. (eds) Statistical Modelling in Biostatistics and Bioinformatics. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-04579-5_11
DOI: https://doi.org/10.1007/978-3-319-04579-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04578-8
Online ISBN: 978-3-319-04579-5
eBook Packages: Mathematics and Statistics (R0)