Sampling Issues Affecting Accuracy of Likelihood-based Classification Using Genetical Data

Environmental Biology of Fishes

Abstract

We demonstrate the effectiveness of a genetic algorithm for discovering multi-locus combinations that provide accurate individual assignment decisions and estimates of mixture composition based on likelihood classification. Using simulated data representing different levels of inter-population differentiation (Fst ≈ 0.01 and 0.10), genetic diversities (four or eight alleles per locus), and baseline population sample sizes (20, 40, or 100 individuals), we show that subsets of loci can be identified that provide classification accuracy comparable to that of the entire multi-locus data sets, which comprised 5, 10, or 20 loci. Microsatellite data sets from hatchery strains of lake trout, Salvelinus namaycush, representing a comparable range of inter-population differentiation in allele frequencies, confirmed the simulation results. For both simulated and empirical data sets, comparable assignment accuracy was achieved using fewer loci (e.g., three or four loci out of eight for the empirical lake trout studies). Simulation results were also used to investigate properties of the ‘leave-one-out’ (L1O) method for estimating assignment error rates. Estimates of population assignment accuracy based on L1O methods should be viewed with caution under certain conditions, particularly when baseline population sample sizes are low (<50).
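
To make the leave-one-out idea concrete, the following is a minimal sketch of likelihood-based assignment evaluated on a chosen subset of loci. It is not the authors' implementation and does not include their genetic algorithm; it assumes Hardy-Weinberg proportions, independent loci, and an add-one pseudocount on allele counts. All function names (`allele_freqs`, `log_likelihood`, `loo_accuracy`) and the toy data are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter

def allele_freqs(genotypes, locus, n_alleles, pseudo=1.0):
    """Smoothed allele frequencies at one locus (assumed add-one pseudocount)."""
    counts = Counter()
    for g in genotypes:
        counts.update(g[locus])          # each genotype holds (allele_a, allele_b) per locus
    total = 2 * len(genotypes) + pseudo * n_alleles
    return [(counts[a] + pseudo) / total for a in range(n_alleles)]

def log_likelihood(ind, pop, loci, n_alleles):
    """Log-likelihood of a diploid genotype in a baseline sample,
    assuming Hardy-Weinberg proportions and independent loci."""
    ll = 0.0
    for locus in loci:
        f = allele_freqs(pop, locus, n_alleles)
        a, b = ind[locus]
        ll += math.log(f[a] * f[b] * (1 if a == b else 2))
    return ll

def loo_accuracy(baselines, loci, n_alleles):
    """Leave-one-out estimate of assignment accuracy using only the given loci."""
    correct = total = 0
    for k, pop in enumerate(baselines):
        for i, ind in enumerate(pop):
            scores = []
            for j, other in enumerate(baselines):
                # Remove the individual from its own baseline before scoring it.
                ref = other[:i] + other[i + 1:] if j == k else other
                scores.append(log_likelihood(ind, ref, loci, n_alleles))
            if scores.index(max(scores)) == k:
                correct += 1
            total += 1
    return correct / total

# Toy baselines (hypothetical): locus 0 separates the populations, locus 1 does not.
pop_a = [((0, 0), (0, 1))] * 4
pop_b = [((1, 1), (0, 1))] * 4
print(loo_accuracy([pop_a, pop_b], loci=[0], n_alleles=2))
print(loo_accuracy([pop_a, pop_b], loci=[0, 1], n_alleles=2))
```

A locus-selection search, whether a genetic algorithm or a simpler heuristic, could use a score such as `loo_accuracy` as its objective when comparing candidate subsets of loci. With small baseline samples, removing one individual changes the estimated allele frequencies noticeably, which is one reason leave-one-out estimates deserve the caution noted in the abstract.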

Cite this article

Guinand, B., Scribner, K., Topchy, A. et al. Sampling Issues Affecting Accuracy of Likelihood-based Classification Using Genetical Data. Environmental Biology of Fishes 69, 245–259 (2004). https://doi.org/10.1023/B:EBFI.0000022869.72448.cd
