Advertisement

Human Genetics

, Volume 135, Issue 7, pp 727–740 | Cite as

Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads

  • Joshua J. Faber-Hammond
  • Kim H. BrownEmail author
Original Investigation

Abstract

The human genome reference (HGR) completion marked the genomics era beginning, yet despite its utility universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads failing to map within the HGR. Sequences failing to map generally represent 2–5 % of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. 30,879 contigs are represented in multiple individuals with ~40 % showing high sequence complexity. Genomic coordinates were generated for 99.9 %, with 52.5 % exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low coverage (~10–20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine.

Keywords

Alignment Score Human Genome Reference Assembly Contigs Genome Resequencing Secondary Assembly 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

This work was supported by start-up funds from the Portland State University Department of Biology and NIEHS grant R00ES018892 to KHB.

Compliance with ethical standards

Competing interests

The authors declare no competing interest, financial or otherwise, with the publication of this manuscript.

Supplementary material

439_2016_1667_MOESM1_ESM.docx (1.9 mb)
Supplementary material 1 (DOCX 1898 kb)

References

  1. Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, Bourexis D, Brister JR, Bryant SH, Canese K, Clark K, DiCuccio M, Dondoshansky I, Federhen S, Feolo M, Funk K, Geer LY, Gorelenkov V, Hoeppner M, Holmes B, Johnson M, Khotomlianski VE, Kimchi A, Kimelman M, Kitts P, Klimke W, Krasnov S, Kuznetsov A, Landrum MJ, Landsman D, Lee JM, Lipman DJ, Lu ZY, Madden TL, Madej T, Marchler-Bauer A, Karsch-Mizrachi I, Murphy T, Orris R, Ostell J, O’Sullivan C, Panchenko A, Phan L, Preuss D, Pruitt KD, Rubinstein W, Sayers EW, Schneider V, Schuler GD, Sherry ST, Sirotkin K, Siyan K, Slotta D, Soboleva A, Soussov V, Starchenko G, Tatusova TA, Trawick BW, Vakatov D, Wang YL, Ward M, Wilbur WJ, Yaschenko E, Zbicz K, NCBI Resource Coordinators (2015) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 43:D6–D17. doi: 10.1093/nar/gku1130 CrossRefGoogle Scholar
  2. Alkan C, Sajjadian S, Eichler EE (2010) Limitations of next-generation genome sequence assembly. Nat Methods 8:61–65. doi: 10.1038/nmeth.1527 CrossRefPubMedPubMedCentralGoogle Scholar
  3. Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genotyping. Nat Rev Genet 12:363–376. doi: 10.1038/nrg2958 CrossRefPubMedPubMedCentralGoogle Scholar
  4. Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, Flicek P, Gabriel SB, Gibbs RA, Green ED, Hurles ME, Knoppers BM, Korbel JO, Lander ES, Lee C, Lehrach H, Mardis ER, Marth GT, McVean GA, Nickerson DA, Schmidt JP, Sherry ST, Wang J, Wilson RK, Dinh H, Kovar C, Lee S, Lewis L, Muzny D, Reid J, Wang M, Fang XD, Guo XS, Jian M, Jiang H, Jin X, Li GQ, Li JX, Li YR, Li Z, Liu X, Lu Y, Ma XD, Su Z, Tai SS, Tang MF, Wang B, Wang GB, Wu HL, Wu RH, Yin Y, Zhang WW, Zhao J, Zhao MR, Zheng XL, Zhou Y, Gupta N, Clarke L, Leinonen R, Smith RE, Zheng-Bradley X, Grocock R, Humphray S, James T, Kingsbury Z, Sudbrak R, Albrecht MW, Amstislavskiy VS, Borodina TA, Lienhard M, Mertes F, Sultan M, Timmermann B, Yaspo ML, Fulton L, Fulton R, Weinstock GM, Balasubramaniam S, Burton J, Danecek P, Keane TM, Kolb-Kokocinski A, McCarthy S, Stalker J, Quail M, Davies CJ, Gollub J, Webster T, Wong B, Zhan YP, Auton A, Yu F, Bainbridge M, Challis D, Evani US, Lu J, Nagaswamy U, Sabo A et al (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65. doi: 10.1038/nature11632 CrossRefGoogle Scholar
  5. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J (2010) Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19:Unit 19.10 1–21Google Scholar
  6. Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, Nusbaum C, Jaffe DB (2008) ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res 18:810–820. doi: 10.1101/gr.7337908 CrossRefPubMedPubMedCentralGoogle Scholar
  7. Cann RL, Stoneking M, Wilson AC (1987) Mitochondrial DNA and human-evolution. Nature 325:31–36. doi: 10.1038/325031a0 CrossRefPubMedGoogle Scholar
  8. Carey LA, Perou CM, Livasy CA, Dressler LG, Cowan D, Conway K, Karaca G, Troester MA, Tse CK, Edmiston S, Deming SL, Geradts J, Cheang MCU, Nielsen TO, Moorman PG, Earp HS, Millikan RC (2006) Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA J Am Med Assoc 295:2492–2502. doi: 10.1001/jama.295.21.2492 CrossRefGoogle Scholar
  9. Cavalli-Sforza LL (2005) Opinion—the human genome diversity project: past, present and future. Nat Rev Genet 6:333–340. doi: 10.1038/nrg1579 PubMedGoogle Scholar
  10. Chevreux B, Wetter T, Suhai S (1999) Genome sequence assembly using trace signals and additional sequence information. Comput Sci Biol Proc German Conf Bioinf (GCB) 99:45–56Google Scholar
  11. Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, Chen HC, Agarwala R, McLaren WM, Ritchie GRS, Albracht D, Kremitzki M, Rock S, Kotkiewicz H, Kremitzki C, Wollam A, Trani L, Fulton L, Fulton R, Matthews L, Whitehead S, Chow W, Torrance J, Dunn M, Harden G, Threadgold G, Wood J, Collins J, Heath P, Griffiths G, Pelan S, Grafham D, Eichler EE, Weinstock G, Mardis ER, Wilson RK, Howe K, Flicek P, Hubbard T (2011) Modernizing reference genome assemblies. PLoS Biol. doi: 10.1371/journal.pbio.1001091 PubMedPubMedCentralGoogle Scholar
  12. Collins FS, Lander ES, Rogers J, Waterston RH, International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931–945. doi: 10.1038/nature03001 CrossRefGoogle Scholar
  13. Colonna V, Ayub Q, Chen Y, Pagani L, Luisi P, Pybus M, Garrison E, Xue Y, Tyler-Smith C (2014) Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences. Genome Biol 15:R88CrossRefPubMedPubMedCentralGoogle Scholar
  14. Dogan H, Can H, Otu HH (2014) Whole genome sequence of a Turkis individual. PLoS One 9:e85233. doi: 10.1371/journal.pone.0085233 CrossRefPubMedPubMedCentralGoogle Scholar
  15. Eichler EE, Clark RA, She XW (2004) An assessment of the sequence gaps: unfinished business in a finished human genome. Nat Rev Genet 5:345–354. doi: 10.1038/nrg1322 CrossRefPubMedGoogle Scholar
  16. Faber-Hammond JJ, Brown KH (2016) Pseudo-de novo assembly and analysis of unmapped genome sequence reads in wild zebrafish reveals novel gene content. Zebrafish 13:95–102. doi: 10.1089/zeb.2015.1154 CrossRefPubMedGoogle Scholar
  17. Fujimoto A, Nakagawa H, Hosono N, Nakano K, Abe T, Boroevich KA, Nagasaki M, Yamaguchi R, Shibuya T, Kubo M, Miyano S, Nakamura Y, Tsunoda T (2010) Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massibely parallel sequencing. Nat Genet 42:931–936. doi: 10.1038/ng.691 CrossRefPubMedGoogle Scholar
  18. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15:1451–1455. doi: 10.1101/gr.4086505 CrossRefPubMedPubMedCentralGoogle Scholar
  19. Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, Berlin AM, Aird D, Costello M, Daza R, Williams L, Nicol R, Gnirke A, Nusbaum C, Lander ES, Jaffe DB (2010) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci 108:1513–1518. doi: 10.1073/pnas.1017351108 CrossRefPubMedPubMedCentralGoogle Scholar
  20. Goecks J, Nekrutenko A, Taylor J, Galaxy T (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. doi: 10.1186/gb-2010-11-8-r86 PubMedPubMedCentralGoogle Scholar
  21. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, Chen ZH, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652. doi: 10.1038/nbt.1883 CrossRefPubMedPubMedCentralGoogle Scholar
  22. Green RE, Krause J, Ptak SE, Briggs AW, Ronan MT, Simons JF, Du L, Egholm M, Rothberg JM, Paunovic M, Paabo S (2006) Analysis of one million base pairs of Neanderthal DNA. Nature 444:330–336. doi: 10.1038/nature05336 CrossRefPubMedGoogle Scholar
  23. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai WW, Fritz MHY, Hansen NF, Durand EY, Malaspinas AS, Jensen JD, Marques-Bonet T, Alkan C, Prufer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Hober B, Hoffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Z, Gusic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, de la Rasilla M, Fortea J, Rosas A, Schmitz RW, Johnson PLF, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, Paabo S (2010) A draft sequence of the neandertal genome. Science 328:710–722. doi: 10.1126/science.1188021 CrossRefPubMedGoogle Scholar
  24. Gudbjartsson DF, Helgason H, Gudjonsson SA, Zink F, Oddson A, Gylfason A, Besenbacher S, Magnusson G, Halldorsson BV, Hjartarson E, Sigurdsson GT, Stacey SN, Frigge ML, Holm H, Saemundsdottir J, Helgadottir HT, Johannsdottir H, Sigfusson G, Thorgeirsson G, Sverrisson JT, Gretarsdottir S, Walters GB, Rafnar T, Thjodleifsson B, Bjornsson ES, Olafsson S, Thorarinsdottir H, Steingrimsdottir T, Gudmundsdottir TS, Theodors A, Jonasson JG, Sigurdsson A, Bjornsdottir G, Jonsson JJ, Thorarensen O, Ludvigsson P, Gudbjartsson H, Eyjolfsson GI, Sigurdardottir O, Olafsson I, Arnar DO, Magnusson OT, Kong A, Masson G, Thorsteinsdottir U, Helgason A, Sulem P, Stefansson K (2015) Large-scale whole-genome sequencing of the Icelandic population. Nat Genet. doi: 10.1038/ng.3247 PubMedCentralGoogle Scholar
  25. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen JM, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Guemues ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GRS, Rosenfeld JA, Sisu C, Wei XM, Wilson M, Xue YL, Yu FL, Dermitzakis ET, Yu HY, Rubin MA, Tyler-Smith C, Gerstein M, 1000 Genomes Project Consortium (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342:84. doi: 10.1126/science.1235587 CrossRefGoogle Scholar
  26. Kidd JM, Sampas N, Antonacci F, Graves T, Fulton R, Hayden HS, Alkan C, Malig M, Ventura M, Giannuzzi G, Kallicki J, Anderson P, Tsalenko A, Yamada NA, Tsang P, Kaul R, Wilson RK, Bruhn L, Eichler EE (2010) Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat Methods 7:365–371. doi: 10.1038/nmeth.1451 CrossRefPubMedPubMedCentralGoogle Scholar
  27. Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, Taillon BE, Chen Z, Tanzer A, Saunders ACE, Chi J, Yang F, Carter NP, Hurles ME, Weissman SM, Harkins TT, Gerstein MB, Egholm M, Snyder M (2007) Paired-end mapping reveals extensive structural variation in the human genome. Science 318:420–426CrossRefPubMedPubMedCentralGoogle Scholar
  28. Lander ES, International Human Genome Sequencing Consortium, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921. doi: 10.1038/35057062 CrossRefPubMedGoogle Scholar
  29. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi: 10.1038/nmeth.1923 CrossRefPubMedPubMedCentralGoogle Scholar
  30. Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, Qian W, Ren Y, Tian G, Li J, Zhou G, Zhu X, Wu H, Qin J, Jin X, Li D, Cao H, Hu X, Blanche H, Cann H, Zhang X, Li S, Bolund L, Kristiansen K, Yang H, Wang J, Wang J (2010) Building the sequence map of the human pan-genome. Nat Biotechnol 28:57–63. doi: 10.1038/nbt.1596 CrossRefPubMedGoogle Scholar
  31. Liu Y, Koyutürk M, Maxwell S, Xiang M, Veigl M, Cooper RS, Tayo BO, Li L, LaFramboise T, Wang Z, Zhu X, Chance MR (2014) Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing. BMC Genom 15:685. doi: 10.1186/1471-2164-15-685 CrossRefGoogle Scholar
  32. Meyer M, Kircher M, Gansauge MT, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prufer K, de Filippo C, Sudmant PH, Alkan C, Fu QM, Do R, Rohland N, Tandon A, Siebauer M, Green RE, Bryc K, Briggs AW, Stenzel U, Dabney J, Shendure J, Kitzman J, Hammer MF, Shunkov MV, Derevianko AP, Patterson N, Andres AM, Eichler EE, Slatkin M, Reich D, Kelso J, Paabo S (2012) A high-coverage genome sequence from an archaic Denisovan individual. Science 338:222–226. doi: 10.1126/science.1224344 CrossRefPubMedPubMedCentralGoogle Scholar
  33. Miga KH, Eisenhart C, Kent WJ (2015) Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Res. doi: 10.1093/nar/gkv671 PubMedCentralGoogle Scholar
  34. Mills RE, Pittard WS, Mullaney JM, Farooq U, Creasy TH, Mahurkar AA, Kemeza DM, Strassler DS, Ponting CP, Webber C, Devine SE (2011a) Natural genetic variation caused by small insertions and deletions in the human genome. Genome Res 21:830–839. doi: 10.1101/gr.115907.110 CrossRefPubMedPubMedCentralGoogle Scholar
  35. Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, Chinwalla A, Conrad DF, Fu Y, Grubert F, Hajirasouliha I, Hormozdiari F, Iakoucheva LM, Iqbal Z, Kang S, Kidd JM, Konkel MK, Korn J, Khurana E, Kural D, Lam HYK, Leng J, Li R, Li Y, Lin CY, Luo R, Mu XJ, Nemesh J, Peckham HE, Rausch T, Scally A, Shi X, Stromberg MP, Stütz AM, Urban AE, Walker JA, Wu J, Zhang Y, Zhang ZD, Batzer MA, Ding L, Marth GT, McVean G, Sebat J, Snyder M, Wang J, Ye K, Eichler EE, Gerstein MB, Hurles ME, Lee C, McCarroll SA, Korbel JO, 1000 Genomes Project (2011b) Mapping copy number variation by population scale genome sequencing. Nature 470:59–65CrossRefPubMedPubMedCentralGoogle Scholar
  36. Montgomery SB, Goode DL, Kvikstad E, Albers CA, Zhang ZDD, Mu XJ, Ananda G, Howie B, Karczewski KJ, Smith KS, Anaya V, Richardson R, Davis J, MacArthur DG, Sidow A, Duret L, Gerstein M, Makova KD, Marchini J, McVean G, Lunter G, 1000 Genomes Project Consortium (2013) The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome Res 23:749–761. doi: 10.1101/gr.148718.112 CrossRefPubMedPubMedCentralGoogle Scholar
  37. Morgulis A, Gertz EM, Schaffer AA, Agarwala R (2006) WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22:134–141. doi: 10.1093/bioinformatics/bit774 CrossRefPubMedGoogle Scholar
  38. Mundry M, Bornberg-Bauer E, Sammeth M, Feulner PGD (2012) Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach. PLoS One. doi: 10.1371/journal.pone.0031410 Google Scholar
  39. Pendleton M, Sebra R, Pang AWC, Ummat A, Franzen O, Rausch T, Stutz AM, Stedman W, Anantharaman T, Hastie A, Dai H, Fritz MHY, Cao H, Cohainl A, Deikusl G, Durrett RE, Blanchard SC, Altman R, Chin CS, Guo Y, Paxinos EE, Korbe JO, Darne RB, McCombiemii WR, Kwok PY, Mason CE, Schadt EE, Bashirl A (2015) Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat Methods 12:780–786. doi: 10.1038/nmeth.3454 CrossRefPubMedPubMedCentralGoogle Scholar
  40. Ramos RTJ, Carneiro A, Azevedo RV, Schneider MP, Barh D, Silva A (2012) Simplifier: a web tool to eliminate redundant NGS contigs. Bioinformation 8:996–999CrossRefPubMedPubMedCentralGoogle Scholar
  41. Reich D, Nalls MA, Kao WH, Akylbekova EL, Tandon A, Patterson N, Mullikin J, Hsueh WC, Cheng CY, Coresh J, Boerwinkle E, Li M, Waliszewska A, Neubauer J, Li R, Leak TS, Ekunwe L, Files JC, Hardy CL, Zmuda JM, Taylor HA, Ziv E, Harris TB, Wilson JG (2009) Reduced neutrophil count in people of African descent is due to a regulatory variant in the Duffy antigen receptor for chemokines gene. PLoS Genet 5:e1000360. doi: 10.1371/journal.pgen.1000360 CrossRefPubMedPubMedCentralGoogle Scholar
  42. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP (2011) Integrative genomics viewer. Nat Biotechnol 29:24–26. doi: 10.1038/nbt.1754 CrossRefPubMedPubMedCentralGoogle Scholar
  43. Schmieder R, Edwards R (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics 27:863–864. doi: 10.1093/bioinformatics/btr026 CrossRefPubMedPubMedCentralGoogle Scholar
  44. Stark A, Kleer CG, Martin I, Awuah B, Nsiah-Asare A, Takyi V, Braman M, Quayson SE, Zarbo R, Wicha M, Newman L (2010) African ancestry and higher prevalence of triple-negative breast cancer findings from an International Study. Cancer 116:4926–4932. doi: 10.1002/cncr.25276 CrossRefPubMedPubMedCentralGoogle Scholar
  45. Stewart C, Kural D, Stromberg MP, Walker JA, Konkel MK, Stutz AM, Urban AE, Grubert F, Lam HYK, Lee WP, Busby M, Indap AR, Garrison E, Huff C, Xing JC, Snyder MP, Jorde LB, Batzer MA, Korbel JO, Marth GT, Genomes P (2011) A comprehensive map of mobile element insertion polymorphisms in humans. PLoS Genet. doi: 10.1371/journal.pgen.1002236 Google Scholar
  46. Stringer C, McKie R (1996) African exodus: the origins of modern humanity. Henery Holt and Company, New YorkGoogle Scholar
  47. Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J, Eichler EE (2010) Diversity of human copy number variation and multicopy genes. Science 330:641–646. doi: 10.1126/science.1197005 CrossRefPubMedPubMedCentralGoogle Scholar
  48. Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, Coe BP, Baker C, Nordenfelt S, Bamshad M, Jorde LB, Posukh OL, Sahakyan H, Watkins WS, Yepiskoposyan L, Abdullah MS, Bravi CM, Capelli C, Hervig T, Wee JTS, Tyler-Smith C, Driem G, Romero IG, Jha AR, Karachanak-Yankova S, Toncheva D, Comas D, Henn B, Kivisild T, Ruiz-Linares A, Sajantila A, Metspalu E, Parik J, Villems R, Starikovskaya EB, Ayodo G, Beall CM, Rienzo AD, Hammer M, Khusainova R, Khusnutdinova E, Klitz W, Winkler C, Labuda D, Metspalu M, Tishkoff SA, Dryomov S, Sukernik R, Patterson N, Reich D, Eichler EE (2015) Global diversity, population stratification, and selection of human copy number variation. Science http://sciencemag.org/content/early/recent/6August2015/Page2/. doi: 10.1126/science.aab3761
  49. Templeton AR (2002) Out of Africa again and again. Nature 416:45–51. doi: 10.1038/416045a CrossRefPubMedGoogle Scholar
  50. Udpa N, Ronen R, Zhou D, Liang J, Stobdan T, Appenzeller O, Yin Y, Du Y, Guo L, Cao R, Wang Y, Jin X, Huang C, Jia W, Cao D, Guo G, Claydon VE, Hainsworth R, Gamboa JL, Zibenigus M, Zenebe G, Xue J, Liu S, Frazer KA, Li Y, Bafna V, Haddad GG (2014) Whole genome sequencing of Ethiopian highlanders reveals conserved hypoxia tolerance genes. Genome Biol 15:R36CrossRefPubMedPubMedCentralGoogle Scholar
  51. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XQH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang JH, Miklos GLG, Nelson C, Broder S, Clark AG, Nadeau C, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng ZM, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge WM, Gong FC, Gu ZP, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke ZX, Ketchum KA, Lai ZW, Lei YD, Li ZY, Li JY, Liang Y, Lin XY, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue BX, Sun JT, Wang ZY, Wang AH, Wang X, Wang J, Wei MH, Wides R, Xiao CL, Yan CH et al (2001) The sequence of the human genome. Science 291:1304. doi: 10.1126/science.1058040 CrossRefPubMedGoogle Scholar
  52. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski JR, Chinault C, Song XZ, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M, Weinstock GM, Gibbs RA, Rothberg JM (2008) The complete genome of an individual by massively parallel DNA sequencing. Nature 452:872–876. doi: 10.1038/nature06884 CrossRefPubMedGoogle Scholar
  53. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. doi: 10.1101/gr.074492.107 CrossRefPubMedPubMedCentralGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. 1.Department of BiologyPortland State UniversityPortlandUSA

Personalised recommendations