Abstract
In genetic resource conservation and plant breeding, multivariate data on continuous and categorical traits are collected to select the genotypes and accessions that best represent the entire population or gene collection with minimum loss of genetic diversity. The best numerical classification strategy is therefore the one that produces the most compact and well-separated groups, that is, groups with minimum variability within each group and maximum variability among groups. In this study, we review geometric classification techniques as well as statistical models based on mixture distribution models. The two-stage sequential clustering strategy uses all variables, continuous and categorical, and tends to form more homogeneous groups of individuals than other clustering strategies. The sequential clustering strategy can also be applied to three-way data comprising genotypes × environments × attributes, where it groups genotypes with consistent responses for most of the continuous and categorical traits across environments.
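The compactness and separation criterion stated in the abstract can be illustrated with a small sketch. It assumes continuous traits only and measures variability with Euclidean sums of squares, which is one common choice and not necessarily the criterion the authors use. The function name `sum_of_squares`, the six accessions, and both partitions are hypothetical and serve only to show that a better partition lowers the within-group term and raises the among-group term.

```python
from collections import defaultdict


def sum_of_squares(rows, labels):
    """Return (within, among) sums of squares for one partition of the rows."""
    n_vars = len(rows[0])
    # Mean of every trait over all accessions
    grand_mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]

    # Collect the accessions assigned to each group
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    within = 0.0
    among = 0.0
    for members in groups.values():
        centroid = [sum(r[j] for r in members) / len(members)
                    for j in range(n_vars)]
        # Spread of the members around their own group centroid
        within += sum((r[j] - centroid[j]) ** 2
                      for r in members for j in range(n_vars))
        # Spread of the group centroid around the grand mean, weighted by size
        among += len(members) * sum((centroid[j] - grand_mean[j]) ** 2
                                    for j in range(n_vars))
    return within, among


# Hypothetical data: six accessions measured on two continuous traits
accessions = [
    [1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0],
]
good_partition = [0, 0, 0, 1, 1, 1]   # follows the visible structure
poor_partition = [0, 1, 0, 1, 0, 1]   # mixes the two clouds of points

w_good, a_good = sum_of_squares(accessions, good_partition)
w_poor, a_poor = sum_of_squares(accessions, poor_partition)
print(f"good: within={w_good:.3f} among={a_good:.3f}")
print(f"poor: within={w_poor:.3f} among={a_poor:.3f}")
```

Because the within-group and among-group terms add up to the same total for any partition of the same data, minimizing one is equivalent to maximizing the other. Categorical traits would need a different measure of variability and are left out of this sketch.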
Crossa, J., Franco, J. Statistical methods for classifying genotypes. Euphytica 137, 19–37 (2004). https://doi.org/10.1023/B:EUPH.0000040500.86428.e8