Collision probabilities for AFLP bands, with an application to simple measures of genetic similarity

  • Gerrit Gort
  • Wim J. M. Koopman
  • Alfred Stein
  • Fred A. van Eeuwijk

Abstract

AFLP is a frequently used DNA fingerprinting technique that is popular in the plant sciences. A problem encountered in the interpretation and comparison of individual plant profiles, which consist of band presence-absence patterns, is that multiple DNA fragments of the same length can be generated and then show up as a single band on a gel. The phenomenon of two or more fragments coinciding in a band within an individual profile is a type of homoplasy that we call a collision. Homoplasy biases estimates of genetic similarity. In this study, we show how to calculate collision probabilities for bands as a function of band length, given the fragment count, the band count, or the band lengths. We also determine probabilities of higher order collisions, and estimate the total number of collisions for a profile. Since short fragments occur more often, short bands are more likely to contain collisions. For a typical plant genome and AFLP procedure, the collision probability for the shortest band is 25 times larger than for the longest. In a profile with 100 bands, a quarter of the bands may contain collisions, concentrated at the shorter band lengths. All calculations require a careful estimate of the monotonically decreasing fragment length distribution. Modifications of the Dice and Jaccard coefficients are proposed. The principles are illustrated on data from a phylogenetic study in lettuce.
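The idea of a length-dependent collision probability can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple occupancy model in which a known number of fragments fall independently on integer lengths, so the count at each length is binomial. The exponentially decaying length distribution, its parameters, and all function names are hypothetical choices made only for illustration.

```python
import math


def length_distribution(min_len, max_len, decay):
    """Hypothetical monotonically decreasing distribution over integer lengths.

    The exponential shape and the `decay` parameter are assumptions, not
    estimates from any real genome.
    """
    lengths = range(min_len, max_len + 1)
    weights = [math.exp(-decay * (length - min_len)) for length in lengths]
    total = sum(weights)
    return {length: w / total for length, w in zip(lengths, weights)}


def collision_probability(p, n_fragments):
    """P(two or more fragments at a length | at least one fragment there).

    Assumes the fragment count at that length is Binomial(n_fragments, p).
    """
    p_none = (1.0 - p) ** n_fragments
    p_one = n_fragments * p * (1.0 - p) ** (n_fragments - 1)
    return (1.0 - p_none - p_one) / (1.0 - p_none)


def expected_counts(dist, n_fragments):
    """Expected number of bands and of bands that contain a collision."""
    bands = 0.0
    collisions = 0.0
    for p in dist.values():
        p_none = (1.0 - p) ** n_fragments
        p_one = n_fragments * p * (1.0 - p) ** (n_fragments - 1)
        bands += 1.0 - p_none              # length occupied by >= 1 fragment
        collisions += 1.0 - p_none - p_one  # length occupied by >= 2 fragments
    return bands, collisions


# Illustrative values only: lengths 50..500 and 120 fragments are assumptions.
dist = length_distribution(50, 500, 0.005)
n_fragments = 120
shortest = collision_probability(dist[50], n_fragments)
longest = collision_probability(dist[500], n_fragments)
bands, collisions = expected_counts(dist, n_fragments)
print(f"collision probability, shortest band: {shortest:.3f}")
print(f"collision probability, longest band:  {longest:.3f}")
print(f"expected bands: {bands:.1f}, expected collisions: {collisions:.1f}")
```

Under these assumptions the shortest length has the largest collision probability, because it carries the largest share of the fragment length distribution. The numbers printed depend entirely on the assumed distribution and fragment count.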

Key Words

Dice; Fragment length distribution; Jaccard; Occupancy distribution; Saddlepoint approximation; Size homoplasy

Copyright information

© International Biometric Society 2008

Authors and Affiliations

  • Gerrit Gort (1)
  • Wim J. M. Koopman (2)
  • Alfred Stein (3)
  • Fred A. van Eeuwijk (1)

  1. Wageningen University, Wageningen, The Netherlands
  2. Biosystematics Group, National Herbarium Nederland, Wageningen University branch, Wageningen, The Netherlands
  3. Department of Earth Observation Science, ITC, Enschede, The Netherlands