Statistical methods for classifying genotypes

Crossa, José; Franco, Jorge

doi:10.1023/B:EUPH.0000040500.86428.e8

Published: June 2004

Euphytica, Volume 137, pages 19–37 (2004)

Abstract

In genetic resource conservation and plant breeding, multivariate data on continuous and categorical traits are collected with the objective of selecting genotypes and accessions that best represent the entire population or gene collection with the minimum loss of genetic diversity. The best numerical classification strategy is therefore the one that produces the most compact and well-separated groups, that is, minimum variability within each group and maximum variability among groups. In this study, we review geometric classification techniques as well as statistical methods based on mixture distribution models. The two-stage sequential clustering strategy uses all variables, continuous and categorical, and tends to form more homogeneous groups of individuals than other clustering strategies. The sequential clustering strategy can also be applied to three-way data comprising genotypes × environments × attributes; this approach groups genotypes with consistent responses for most of the continuous and categorical traits across environments.
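The two ideas in the abstract, judging a classification by within-group versus among-group variability and using continuous and categorical traits together, can be illustrated with a small sketch. It assumes Gower's (1971) general dissimilarity for mixed traits and the standard split of the total sum of squares into within-group (W) and among-group (B) parts for continuous traits. The function names `gower_distance` and `variance_decomposition` and the toy data are hypothetical; this is not the authors' two-stage procedure, only an illustration of the quantities it relies on.

```python
"""Sketch: Gower dissimilarity for mixed traits and the W/B split.

Hypothetical helpers for illustration only; not the authors' implementation.
"""
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[float, str]


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   kinds: Sequence[str], ranges: Sequence[float]) -> float:
    """Gower-style dissimilarity: mean of per-trait dissimilarities.

    kinds[k] is "num" (continuous) or "cat" (categorical).
    ranges[k] is the observed range of continuous trait k (ignored for "cat").
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-scaled absolute difference, in [0, 1].
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            # Simple mismatch: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


def variance_decomposition(data: List[List[float]],
                           labels: List[int]) -> Tuple[float, float]:
    """Return (W, B): within-group and among-group sums of squares.

    For any partition, W + B equals the total sum of squares, so
    compact, well-separated groups give small W and large B.
    """
    n, p = len(data), len(data[0])
    grand = [sum(row[j] for row in data) / n for j in range(p)]
    groups: Dict[int, List[List[float]]] = {}
    for row, g in zip(data, labels):
        groups.setdefault(g, []).append(row)
    within = among = 0.0
    for rows in groups.values():
        m = len(rows)
        mean = [sum(r[j] for r in rows) / m for j in range(p)]
        within += sum((r[j] - mean[j]) ** 2 for r in rows for j in range(p))
        among += m * sum((mean[j] - grand[j]) ** 2 for j in range(p))
    return within, among


if __name__ == "__main__":
    # Toy continuous traits for four hypothetical accessions in two groups.
    traits = [[10.0, 1.0], [11.0, 1.2], [20.0, 3.0], [21.0, 3.1]]
    w, b = variance_decomposition(traits, [0, 0, 1, 1])
    print(f"W={w:.3f}  B={b:.3f}  B/(W+B)={b / (w + b):.3f}")
    # One continuous and one categorical (hypothetical kernel type) trait.
    d = gower_distance([10.0, "flint"], [20.0, "dent"], ["num", "cat"], [20.0, 0.0])
    print(f"Gower dissimilarity = {d:.2f}")
```

Comparing B/(W+B) across candidate partitions with the same number of groups is one simple way to see which is more compact and better separated; in practice the number of groups also has to be chosen, which this sketch does not address.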

Author information

Authors and Affiliations

Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600, México DF, México
José Crossa
J. Franco, Facultad de Agronomía, Universidad de la República Oriental del Uruguay, Garzó
Jorge Franco

About this article

Cite this article

Crossa, J., Franco, J. Statistical methods for classifying genotypes. Euphytica 137, 19–37 (2004). https://doi.org/10.1023/B:EUPH.0000040500.86428.e8

Issue Date: June 2004
DOI: https://doi.org/10.1023/B:EUPH.0000040500.86428.e8

Similar content being viewed by others

Principal components analysis - K-means transposon element based foxtail millet core collection selection method

Comparison of Hierarchical Clustering Methods for Binary Data From SSR and ISSR Molecular Markers

DNA Markers in Diversity Analysis