Predictive Validity of Discriminant Analysis for Genetic Data

Genetica

Abstract

We examined the predictive validity of discriminant analysis used to distinguish statistically among two or more populations when the number of random amplified polymorphic DNA (RAPD) loci is large but the number of genotypes sampled from each population is small. Using the data of three studies, we compared results from the real data with results from randomized data generated by 100 random shufflings of genotypes among the populations. In several characteristics of the discriminant analysis, results from the randomized data generally differed substantially from those from the real data. A high percentage of correct classification can also be obtained with randomized data, mainly when the number of populations is low. However, the percentage of correct classification obtained from the real data was generally significantly higher than that obtained from the randomized data. We suggest that large real differences in allele frequencies at the polymorphic RAPD loci clearly distinguish the populations, and that the populations differ significantly in their RAPD content in accordance with ecological heterogeneity. The correct classification rate obtained by the leaving-one-out procedure differed little or not at all from that obtained from the original data, which we attribute to the low number of loci selected by the stepwise method. These results support our conclusion and lead us to focus on discriminant analysis that selects only a small number of discriminating variables.
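The randomization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the stepwise discriminant analysis is replaced here by a simple nearest-centroid rule, the data are simulated band presence/absence scores (RAPD markers are scored as 1/0), and every name, sample size, and band frequency below is a hypothetical choice made for illustration. The sketch computes a leave-one-out correct classification rate on the real labels, then repeats the calculation after each of 100 random shufflings of genotypes among populations.

```python
import random


def loo_correct_rate(X, labels):
    """Leave-one-out correct classification rate of a nearest-centroid rule.

    Stand-in for discriminant analysis (assumption, not the paper's method).
    """
    n, n_loci = len(X), len(X[0])
    correct = 0
    for i in range(n):
        # Per-population band-frequency centroids, excluding genotype i.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            g = labels[j]
            if g not in sums:
                sums[g] = [0.0] * n_loci
                counts[g] = 0
            counts[g] += 1
            for k in range(n_loci):
                sums[g][k] += X[j][k]
        # Assign genotype i to the population with the closest centroid.
        best, best_dist = None, float("inf")
        for g in sorted(sums):
            dist = sum((X[i][k] - sums[g][k] / counts[g]) ** 2
                       for k in range(n_loci))
            if dist < best_dist:
                best, best_dist = g, dist
        correct += (best == labels[i])
    return correct / n


def randomization_test(X, labels, n_shuffles=100, seed=0):
    """Compare the real classification rate with rates from shuffled labels."""
    rng = random.Random(seed)
    observed = loo_correct_rate(X, labels)
    shuffled = list(labels)
    null_rates = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # reassign genotypes to populations at random
        null_rates.append(loo_correct_rate(X, shuffled))
    # Share of shuffles classifying at least as well as the real data,
    # with the usual +1 correction so the estimate is never exactly zero.
    p_value = (1 + sum(r >= observed for r in null_rates)) / (n_shuffles + 1)
    return observed, null_rates, p_value


# Hypothetical toy data: 2 populations, 10 genotypes each, 20 loci,
# with band frequencies of 0.9 in population A and 0.1 in population B.
data_rng = random.Random(1)
X, labels = [], []
for pop, freq in (("A", 0.9), ("B", 0.1)):
    for _ in range(10):
        X.append([1 if data_rng.random() < freq else 0 for _ in range(20)])
        labels.append(pop)

observed, null_rates, p_value = randomization_test(X, labels)
print(observed, max(null_rates), p_value)
```

With few genotypes per population, individual shuffles can still classify well by chance, which is why the comparison is made against the whole distribution of shuffled rates rather than against a fixed threshold.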

Cite this article

Beharav, A., Nevo, E. Predictive Validity of Discriminant Analysis for Genetic Data. Genetica 119, 259–267 (2003). https://doi.org/10.1023/B:GENE.0000003666.33328.22

  • DOI: https://doi.org/10.1023/B:GENE.0000003666.33328.22