AntCaller: an accurate variant caller incorporating ancient DNA damage

  • Methods Paper
Molecular Genetics and Genomics

Abstract

Ancient DNA obtained from ancient samples, such as sediments, bones, and teeth, is an important genetic resource that can be used to reconstruct the evolutionary history of humans, animals, and plants. High-throughput sequencing allows ancient DNA to be studied at whole-genome scale. However, post-mortem DNA damage, caused mainly by deamination of cytosine to uracil (or of methylated cytosine to thymine), may confound variant calling and downstream analysis. In this article, we develop a Python program implementing a new variant caller, “AntCaller”, which extracts information on nucleotide substitutions from sequencing data and calculates the probability of each genotype based on Bayes' rule. In both simulation studies and real data analyses, our method reduced the false discovery rate caused by nucleotide misincorporations and outperformed two mainstream variant callers (GATK and SAMtools) in calling accuracy. In a real application with serious DNA damage, AntCaller still outperformed GATK and SAMtools combined with quality score recalling.
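
To make the idea of damage-aware Bayesian genotype calling concrete, the following is a minimal sketch and not the AntCaller implementation. It assumes a diploid site, a uniform genotype prior, a per-read sequencing error rate, and a per-read C-to-T damage rate (for example, estimated from the read position); all function names and parameter values are hypothetical and chosen only for illustration. Only C-to-T damage is modelled.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def p_obs_given_allele(obs, allele, err, dmg):
    """P(observed base | true allele) under C->T damage and sequencing error.

    `err` and `dmg` are illustrative per-read rates (assumptions of this sketch).
    """
    # Step 1: damage. Only C -> T is modelled; other alleles are unchanged.
    if allele == "C":
        after_damage = {"C": 1.0 - dmg, "T": dmg}
    else:
        after_damage = {allele: 1.0}
    # Step 2: sequencing error. A base is read correctly with probability
    # 1 - err, otherwise as one of the three other bases with equal probability.
    return sum(
        p * ((1.0 - err) if base == obs else err / 3.0)
        for base, p in after_damage.items()
    )


def genotype_posteriors(reads, priors=None):
    """Posterior probability of each diploid genotype at one site.

    reads: list of (observed_base, error_rate, damage_rate) tuples.
    priors: optional dict mapping genotype tuples to prior probabilities;
            a uniform prior is assumed when omitted.
    """
    genotypes = list(combinations_with_replacement(BASES, 2))
    if priors is None:
        priors = {g: 1.0 / len(genotypes) for g in genotypes}
    log_post = {}
    for g in genotypes:
        total = math.log(priors[g])
        for obs, err, dmg in reads:
            # Each read is drawn from either chromosome with probability 1/2.
            total += math.log(
                0.5 * p_obs_given_allele(obs, g[0], err, dmg)
                + 0.5 * p_obs_given_allele(obs, g[1], err, dmg)
            )
        log_post[g] = total
    # Normalise in log space (Bayes' rule) to avoid numerical underflow.
    peak = max(log_post.values())
    norm = sum(math.exp(v - peak) for v in log_post.values())
    return {g: math.exp(v - peak) / norm for g, v in log_post.items()}


# Eight reads show C; two reads show T at positions with a high damage rate.
damaged = [("C", 0.01, 0.02)] * 8 + [("T", 0.01, 0.30)] * 2
# The same observations, but ignoring damage entirely.
undamaged = [(base, err, 0.0) for base, err, _ in damaged]

best_with_damage = max(genotype_posteriors(damaged).items(), key=lambda kv: kv[1])[0]
best_without_damage = max(genotype_posteriors(undamaged).items(), key=lambda kv: kv[1])[0]
print(best_with_damage, best_without_damage)
```

In this toy example, the two T observations are explained by damage when the damage rate is taken into account, so the homozygous C/C genotype is preferred; when damage is ignored, the same reads favour the heterozygous C/T genotype. This illustrates the kind of false positive a damage-aware caller is designed to avoid.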

Author information

Corresponding author

Correspondence to Hong Zhang.

Ethics declarations

Funding

This work was funded by the National Natural Science Foundation of China (Nos. 11371101, 31671297, and 81671874) and the MOE Scientific Research Project (No. 113022A).

Conflict of interest

The authors declare no potential conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Data availability

VCF files of the Altai Neandertal: http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/. Data for a Neanderthal (Vindija 33.16): http://cdna.eva.mpg.de/neandertal/Vindija/bam/Green_etal_2010/. Data for a Stone Age Scandinavian (Gökhem2): https://export.uppmax.uu.se/b2013240/neolithic2/. Data for a Late Pleistocene Native American (Anzick-1): http://www.cbs.dtu.dk/public/clovis/Anzick-1/bams/.

Additional information

Communicated by S. Hohmann.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 450 kb)

About this article

Cite this article

Zhou, B., Wen, S., Wang, L. et al. AntCaller: an accurate variant caller incorporating ancient DNA damage. Mol Genet Genomics 292, 1419–1430 (2017). https://doi.org/10.1007/s00438-017-1358-5

  • DOI: https://doi.org/10.1007/s00438-017-1358-5
