Abstract
We examined the predictive validity of discriminant analysis when it is used to distinguish statistically among two or more populations on the basis of a large number of random amplified polymorphic DNA (RAPD) loci but only a small sample of genotypes from each population. Using real data from three studies, we compared the results with those from randomized data, generated by shuffling genotypes among the populations 100 times. For several characteristics of the discriminant analysis, results from the randomized data generally differed substantially from those of the real data. A high percentage of correctly classified genotypes was also obtainable from randomized data, mainly when the number of populations was low. However, the correct classification percentage obtained from the real data was generally significantly higher than that obtained from the randomized data. We suggest that large real differences in allele frequencies at the polymorphic RAPD loci clearly distinguished the populations, and that the populations differ significantly in their RAPD contents in accordance with ecological heterogeneity. The correct classification rate obtained by the leave-one-out procedure differed little or not at all from that obtained from the original data, which we attribute to the low number of loci selected by the stepwise method. These results strengthen and support our conclusion and lead us to focus the discriminant analysis on only a small number of discriminating variables.
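The comparison described above can be sketched in a few lines. The sketch makes the logic concrete: compute the percentage of correctly classified genotypes on the real population labels, then repeat the calculation after shuffling genotypes among populations 100 times, and contrast the resubstitution rate with the leave-one-out rate. It relies on several assumptions. The classifier is a simple nearest-centroid rule standing in for the linear discriminant function with stepwise locus selection, so it is not the authors' procedure. The data are simulated band-presence (0/1) matrices rather than data from the three studies. All function names and parameters (`simulate`, `shift`, `seed`, and so on) are hypothetical. The p-value convention, which counts the real labelling as one arrangement, is a common choice and is also an assumption here.

```python
import random


def simulate(n_pops=3, n_per_pop=8, n_loci=60, shift=0.25, seed=0):
    """Hypothetical binary RAPD matrix (1 = band present); NOT data from the study."""
    rng = random.Random(seed)
    base = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    X, y = [], []
    for pop in range(n_pops):
        # Each population perturbs the shared band frequencies by up to +/- shift.
        freqs = [min(0.95, max(0.05, f + rng.uniform(-shift, shift))) for f in base]
        for _ in range(n_per_pop):
            X.append([1 if rng.random() < q else 0 for q in freqs])
            y.append(pop)
    return X, y


def centroids(X, y):
    """Band-presence frequency at each locus, per population label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {g: [sum(col) / len(rows) for col in zip(*rows)] for g, rows in groups.items()}


def classify(row, cents):
    """Assign a genotype to the population whose centroid is nearest (squared Euclidean)."""
    return min(cents, key=lambda g: sum((a - b) ** 2 for a, b in zip(row, cents[g])))


def resubstitution_rate(X, y):
    """Percent correct when the same genotypes both build and test the rule."""
    cents = centroids(X, y)
    hits = sum(classify(row, cents) == label for row, label in zip(X, y))
    return 100.0 * hits / len(y)


def leave_one_out_rate(X, y):
    """Percent correct when each genotype is classified by a rule built without it."""
    hits = 0
    for i in range(len(y)):
        cents = centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += classify(X[i], cents) == y[i]
    return 100.0 * hits / len(y)


def randomization_test(X, y, n_shuffles=100, seed=1):
    """Compare the real rate with rates after shuffling genotypes among populations."""
    rng = random.Random(seed)
    observed = resubstitution_rate(X, y)
    labels = list(y)  # shuffling permutes labels, so population sizes are kept
    null_rates = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        null_rates.append(resubstitution_rate(X, labels))
    # One-sided p-value, counting the real labelling as one of the arrangements.
    p = (1 + sum(r >= observed for r in null_rates)) / (n_shuffles + 1)
    return observed, null_rates, p


if __name__ == "__main__":
    X, y = simulate()
    obs, null, p = randomization_test(X, y)
    print(f"real data: resubstitution {obs:.1f}%, leave-one-out {leave_one_out_rate(X, y):.1f}%")
    print(f"randomized: mean {sum(null) / len(null):.1f}%, max {max(null):.1f}%, p = {p:.3f}")
```

With many loci and few genotypes per population, shuffled labels can still produce resubstitution rates well above the chance level (1/3 for three populations). This happens because each genotype pulls its own group's centroid toward itself. The leave-one-out rate removes that self-inclusion bias, which is why comparing the two rates is informative.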
Cite this article
Beharav, A., Nevo, E. Predictive Validity of Discriminant Analysis for Genetic Data. Genetica 119, 259–267 (2003). https://doi.org/10.1023/B:GENE.0000003666.33328.22