Abstract
Ancient DNA obtained from ancient samples such as sediments, bones, and teeth is an important genetic resource for reconstructing the evolutionary history of humans, animals, and plants. High-throughput sequencing has made it possible to study ancient DNA at whole-genome scale. However, post-mortem DNA damage, mainly caused by deamination of cytosine to uracil (or of methylated cytosine to thymine), can confound variant calling and downstream analysis. In this article, we develop a Python program implementing a new variant caller, “AntCaller”, which extracts information on nucleotide substitutions from sequencing data and calculates the probability of each genotype using Bayes' rule. Both simulation studies and real data analyses showed that our method reduced the false discovery rate caused by nucleotide misincorporations and outperformed two mainstream variant callers, GATK and SAMtools, in calling accuracy. In a real application with severe DNA damage, AntCaller still outperformed GATK and SAMtools combined with base quality score recalibration.
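The abstract only outlines the method, so the following is an illustrative sketch under stated assumptions, not AntCaller's implementation. It shows how a deamination term can enter a Bayesian genotype calculation at a single site. Assumptions: each observed base carries a Phred quality, a strand flag, and a per-observation deamination rate `dmg` (in practice this would depend on the distance to the read end); on a '+' observation a true C may appear as T, and on a '-' observation a true G may appear as A, in reference coordinates; the prior is a toy one with a hypothetical heterozygosity parameter `theta`. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The 10 unordered diploid genotypes: ('A', 'A'), ('A', 'C'), ..., ('T', 'T')
GENOTYPES = list(combinations_with_replacement(BASES, 2))


def base_prob(obs, true, qual, strand, dmg):
    """P(observed base | true base): optional deamination, then sequencing error.

    Assumed damage model (reference coordinates): on a '+' observation a true C
    may appear as T with probability dmg; on a '-' observation a true G may
    appear as A with probability dmg.
    """
    err = 10 ** (-qual / 10)  # Phred quality -> error probability

    def seq(m):
        # Sequencing error spreads evenly over the three other bases
        return 1 - err if obs == m else err / 3

    # Distribution of the base actually present on the (possibly damaged) molecule
    if strand == "+" and true == "C":
        molecule = {"C": 1 - dmg, "T": dmg}
    elif strand == "-" and true == "G":
        molecule = {"G": 1 - dmg, "A": dmg}
    else:
        molecule = {true: 1.0}
    return sum(p * seq(m) for m, p in molecule.items())


def genotype_log_likelihood(genotype, reads):
    """log P(reads | genotype); each read comes from either allele with prob 1/2."""
    a1, a2 = genotype
    total = 0.0
    for obs, qual, strand, dmg in reads:
        p = 0.5 * base_prob(obs, a1, qual, strand, dmg)
        p += 0.5 * base_prob(obs, a2, qual, strand, dmg)
        total += math.log(p)
    return total


def genotype_posteriors(reads, ref, theta=0.001):
    """Bayes' rule: P(G | reads) is proportional to P(reads | G) * P(G).

    Toy prior (hypothetical): ref/ref gets 1 - theta, the other 9 share theta.
    """
    log_post = {}
    for g in GENOTYPES:
        prior = 1 - theta if g == (ref, ref) else theta / 9
        log_post[g] = math.log(prior) + genotype_log_likelihood(g, reads)
    # Normalise in log space (log-sum-exp) to avoid underflow at high depth
    top = max(log_post.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {g: math.exp(v - log_norm) for g, v in log_post.items()}


# Hypothetical pileup at a reference C: 8 reads show C; 3 show T close to a
# read end, where the assumed deamination rate is high (0.3).
reads = [("C", 30, "+", 0.02)] * 8 + [("T", 30, "+", 0.3)] * 3
aware = genotype_posteriors(reads, ref="C")
naive = genotype_posteriors([(b, q, s, 0.0) for b, q, s, _ in reads], ref="C")
print(max(aware, key=aware.get))  # ('C', 'C')
print(max(naive, key=naive.get))  # ('C', 'T')
```

With the damage term set to zero, each T read at a true C can only be explained as a sequencing error (probability about err/3), so the naive model prefers a heterozygous C/T call. Allowing for deamination explains the same reads as post-mortem damage and keeps the homozygous C/C call, which is the kind of false discovery the abstract describes.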
Ethics declarations
Funding
This work was funded by the National Natural Science Foundation of China (Nos. 11371101, 31671297, and 81671874) and the MOE Scientific Research Project (No. 113022A).
Conflict of interest
The authors declare no potential conflicts of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Data availability
VCF files of the Altai Neandertal: http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/. Data for a Neanderthal (Vindija 33.16): http://cdna.eva.mpg.de/neandertal/Vindija/bam/Green_etal_2010/. Data for a Stone Age Scandinavian (Gökhem2): https://export.uppmax.uu.se/b2013240/neolithic2/. Data for a Late Pleistocene Native American (Anzick-1): http://www.cbs.dtu.dk/public/clovis/Anzick-1/bams/.
Additional information
Communicated by S. Hohmann.
About this article
Cite this article
Zhou, B., Wen, S., Wang, L. et al. AntCaller: an accurate variant caller incorporating ancient DNA damage. Mol Genet Genomics 292, 1419–1430 (2017). https://doi.org/10.1007/s00438-017-1358-5