Statistical methods for classifying genotypes

Euphytica

Abstract

In genetic resource conservation and plant breeding, multivariate data on continuous and categorical traits are collected in order to select the genotypes and accessions that best represent the entire population or gene collection with minimum loss of genetic diversity. The best numerical classification strategy is therefore the one that produces the most compact and well-separated groups, that is, minimum variability within each group and maximum variability among groups. In this study, we review geometric classification techniques as well as statistical models based on mixture distributions. The two-stage sequential clustering strategy uses all variables, continuous and categorical, and tends to form more homogeneous groups of individuals than other clustering strategies. It can also be applied to three-way data comprising genotypes × environments × attributes, where it groups genotypes with consistent responses for most of the continuous and categorical traits across environments.
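The compactness and separation criterion described in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it only decomposes the total sum of squares of continuous traits into a within-group and an among-group part for a given partition, so that two candidate classifications can be compared. The function name `sums_of_squares`, the trait values, and the two labelings are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def sums_of_squares(points, labels):
    """Return (within_ss, between_ss) for a partition of numeric points."""
    n_vars = len(points[0])

    # Collect the observations that belong to each group.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    def mean(rows):
        # Column-wise mean of a list of equal-length rows.
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]

    grand = mean(points)
    within = 0.0
    between = 0.0
    for rows in groups.values():
        centre = mean(rows)
        # Squared deviations of each observation from its group centroid.
        within += sum((r[j] - centre[j]) ** 2 for r in rows for j in range(n_vars))
        # Squared deviation of the group centroid from the grand mean,
        # weighted by group size.
        between += len(rows) * sum((centre[j] - grand[j]) ** 2 for j in range(n_vars))
    return within, between


# Hypothetical data: four accessions measured on two continuous traits.
traits = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.2]]
good = [0, 0, 1, 1]  # groups similar accessions together
poor = [0, 1, 0, 1]  # mixes dissimilar accessions

w_good, b_good = sums_of_squares(traits, good)
w_poor, b_poor = sums_of_squares(traits, poor)
print(f"good partition: within={w_good:.2f}, among={b_good:.2f}")
print(f"poor partition: within={w_poor:.2f}, among={b_poor:.2f}")
```

Because the within-group and among-group parts always add up to the same total, a partition that lowers within-group variability necessarily raises among-group variability. The sketch handles continuous traits only; categorical traits would need a different measure of dissimilarity.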

Cite this article

Crossa, J., Franco, J. Statistical methods for classifying genotypes. Euphytica 137, 19–37 (2004). https://doi.org/10.1023/B:EUPH.0000040500.86428.e8

  • DOI: https://doi.org/10.1023/B:EUPH.0000040500.86428.e8
