Inference of Population Structure from Ancient DNA

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10812)

Abstract

Methods for inferring population structure from genetic information traditionally assume that samples are contemporary. Yet the increasing availability of ancient DNA sequences calls for a revision of this paradigm. We present Dystruct (Dynamic Structure), a framework and toolbox for inference of shared ancestry from data that include ancient DNA. By explicitly modeling population history and genetic drift as a time series, Dystruct more accurately and realistically discovers shared ancestry from ancient and contemporary samples. Formally, we use a normal approximation of drift, which allows a novel, efficient algorithm for optimizing model parameters using stochastic variational inference. We show that Dystruct outperforms the state of the art when individuals are sampled over time, as is common in ancient DNA datasets. We further demonstrate the utility of our method on a dataset of 92 ancient samples alongside 1,941 modern ones genotyped at 222,755 loci. Our model tends to present modern samples as the mixtures of ancestral populations they really are, rather than the artifactual converse of presenting ancestral samples as mixtures of contemporary groups.
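
As a rough illustration of what a normal approximation of drift can look like, the sketch below replaces the binomial sampling step of a Wright-Fisher model with Gaussian noise of matching variance, f(1 - f)/(2N) per generation for a diploid population of size N. The function name, its parameters, and the clipping to [0, 1] are assumptions made for this example only; this is not Dystruct's implementation, and the abstract does not specify the exact parameterization used.

```python
import random


def drift_normal(freq0, pop_size, generations, rng):
    """Simulate allele-frequency drift as a Gaussian random walk.

    Hypothetical helper for illustration. Each generation adds noise with
    variance f * (1 - f) / (2 * pop_size), the variance of binomial
    sampling in a diploid Wright-Fisher population.
    """
    freqs = [freq0]
    f = freq0
    for _ in range(generations):
        sd = (f * (1.0 - f) / (2.0 * pop_size)) ** 0.5
        # Clip to [0, 1] so the value remains a valid frequency (assumption).
        f = min(1.0, max(0.0, rng.gauss(f, sd)))
        freqs.append(f)
    return freqs


rng = random.Random(0)  # fixed seed so the run is repeatable
trajectory = drift_normal(freq0=0.3, pop_size=1000, generations=50, rng=rng)
```

Under this sketch, frequencies observed at different times are linked by Gaussian transitions, which is the kind of structure that time-series tools such as Kalman filtering are designed to handle.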

Keywords

Population genetics; Population structure; Ancient DNA; Time-series; Variational inference; Kalman filtering

Notes

Acknowledgements

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. DGE 16-44869, and by the NSF under Grant No. DGE-1144854 and Grant No. CCF 1547120. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Columbia University, New York, USA
  2. Department of Systems Biology, Columbia University, New York, USA
  3. Data Science Institute, Columbia University, New York, USA
