The Practical Evaluation of DNA Barcode Efficacy

DNA Barcodes

Part of the book series: Methods in Molecular Biology (MIMB, volume 858)

Abstract

This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman–Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, “the probability of correct identification” (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
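As a minimal sketch of the workflow summarized above, the following Python code assigns each specimen to the species of its nearest neighbor and then averages the species PCIs over all species, as the abstract states. Several details are assumptions made for illustration and are not taken from the chapter: the function names are hypothetical; sequences are assumed to be pre-aligned to equal length, with the fraction of mismatched positions standing in for a global alignment distance; and the species PCI is assumed to be the leave-one-out fraction of a species' specimens whose nearest neighbors are all conspecific (the chapter discusses several definitions of species PCI, which this sketch does not reproduce).

```python
from collections import defaultdict


def p_distance(a, b):
    """Fraction of mismatched positions between two pre-aligned sequences.

    ASSUMPTION: a simple stand-in for a global-alignment distance.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def species_pci(specimens):
    """Return {species: PCI} for a list of (species, sequence) pairs.

    ASSUMED definition: a specimen is correctly identified when every one
    of its nearest neighbors (leave-one-out) belongs to its own species.
    Ties with another species therefore count as failures.
    """
    if len(specimens) < 2:
        raise ValueError("need at least two specimens")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for i, (species, seq) in enumerate(specimens):
        # Distances from this specimen to every other specimen in the database.
        dists = [
            (p_distance(seq, other_seq), other_species)
            for j, (other_species, other_seq) in enumerate(specimens)
            if j != i
        ]
        best = min(d for d, _ in dists)
        nearest_species = {s for d, s in dists if d == best}
        totals[species] += 1
        if nearest_species == {species}:
            hits[species] += 1
    return {s: hits[s] / totals[s] for s in totals}


def overall_pci(specimens):
    """Overall PCI: the average of the species PCIs over all species."""
    per_species = species_pci(specimens)
    return sum(per_species.values()) / len(per_species)


# Hypothetical toy database for one marker: (species label, aligned sequence).
example = [
    ("A", "AAAA"), ("A", "AAAT"),
    ("B", "GGGG"), ("B", "GGGC"),
    ("C", "CCTT"),
]
```

Because the average is taken over species rather than over specimens, a species represented by a single specimen (species C in the toy data) contributes a species PCI of zero under this assumed definition, however many specimens the other species have.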

Acknowledgment

This research was supported in part by the Intramural Research Program of the NIH, NLM, NCBI.

Corresponding author

Correspondence to John L. Spouge.

Appendix

For a barcode with several markers, each of which can have a failed PCR, specimen identification ultimately relies on the markers with a successful PCR. To quantify the identification process, number the markers \( \{1,2,\ldots,m\} \), and consider any subset \( M \) of \( \{1,2,\ldots,m\} \). For a particular specimen, let \( s_{M} \) denote the probability that \( M \) is the subset of markers with PCR success, and let \( p_{M} \) denote the PCI for the barcode based on the marker subset \( M \). A species PCI \( p \) can then be calculated from the values of \( s_{M} \) and \( p_{M} \), although the calculation depends on the definition of species PCI (see Section 2.3 for various definitions).

One very reasonable definition of the PCR-adjusted species PCI is the average \( p=\sum_{M} p_{M} s_{M} \), where the sum runs over all subsets \( M \) of \( \{1,2,\ldots,m\} \). For a barcode based on a single marker (\( m=1 \)), \( M \) is a subset of \( \{1\} \), i.e., either the empty set \( \{\} \) or \( \{1\} \). Because the empty set \( \{\} \) corresponds to a complete absence of information about a specimen, the corresponding PCI is \( p_{\{\}}=0 \), so \( p=p_{\{\}}s_{\{\}}+p_{\{1\}}s_{\{1\}}=p_{\{1\}}s_{\{1\}} \), which agrees with the formula for the PCR-adjusted PCI in the main text for a barcode based on a single marker.
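The average \( p=\sum_{M} p_{M} s_{M} \) can be sketched in Python as follows. The function names are hypothetical, and the helper `independent_success_probs` rests on an assumption that the appendix does not make: that PCR success is independent across markers, so that each \( s_{M} \) factors into per-marker success and failure probabilities. If the \( s_{M} \) are estimated directly from data, they can be passed to `pcr_adjusted_pci` without that helper.

```python
from itertools import combinations
from math import isclose, prod


def all_subsets(markers):
    """Every subset M of the marker labels, including the empty set."""
    items = sorted(markers)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def independent_success_probs(rates):
    """Build {M: s_M} from per-marker PCR success rates.

    ASSUMPTION (not from the chapter): PCR outcomes are independent
    across markers, so s_M is a product of per-marker probabilities.
    """
    return {
        M: prod(rates[i] if i in M else 1.0 - rates[i] for i in rates)
        for M in all_subsets(rates)
    }


def pcr_adjusted_pci(p, s):
    """PCR-adjusted species PCI: the sum over subsets M of p_M * s_M.

    p maps each non-empty subset M to its PCI p_M; the empty set carries
    no information about a specimen, so its PCI is taken to be 0.
    s maps every subset M to its probability s_M.
    """
    if not isclose(sum(s.values()), 1.0, abs_tol=1e-9):
        raise ValueError("the probabilities s_M must sum to 1")
    return sum(s_M * (p[M] if M else 0.0) for M, s_M in s.items())


# Hypothetical two-marker example: success rates and PCIs are made up.
rates = {1: 0.9, 2: 0.5}
pci_by_subset = {
    frozenset({1}): 0.6,
    frozenset({2}): 0.5,
    frozenset({1, 2}): 0.8,
}
adjusted = pcr_adjusted_pci(pci_by_subset, independent_success_probs(rates))
```

With a single marker, the sum reduces to \( p_{\{1\}}s_{\{1\}} \), matching the single-marker case described above.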

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Spouge, J.L., Mariño-Ramírez, L. (2012). The Practical Evaluation of DNA Barcode Efficacy. In: Kress, W., Erickson, D. (eds) DNA Barcodes. Methods in Molecular Biology, vol 858. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-591-6_17

  • DOI: https://doi.org/10.1007/978-1-61779-591-6_17

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-590-9

  • Online ISBN: 978-1-61779-591-6

  • eBook Packages: Springer Protocols
