A Step-by-Step Guide to Assemble a Reptilian Genome

  • Asier Ullate-Agote
  • Yingguang Frank Chan
  • Athanasia C. Tzika (corresponding author)
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1650)

Abstract

Multiple sequencing technologies and software tools now facilitate the de novo sequencing and assembly of any vertebrate genome. Yet the quality of most available genome assemblies remains substantially poorer than that of the gold standard in the field, the human genome. Here, we present a step-by-step protocol for sequencing and assembling a high-quality snake genome; the protocol can be applied to any other reptilian or avian species. We combine the high sequencing depth and accuracy of short reads with libraries of different insert sizes for extended scaffolding, followed by optical mapping. We show that this procedure improved the corn snake scaffold N50 from 3.7 kbp to 1.4 Mbp, currently making it one of the snake genomes with the longest scaffolds. We also give short guidelines on extracting long DNA molecules from reptilian blood and on the modifications this requires in DNA extraction protocols. This chapter is accompanied by a website (www.reptilomics.org/stepbystep.html), where we provide links to the suggested software, examples of input and output files, and running parameters.
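
For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal Python sketch shows how it is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running sum first reaches half of the total assembly size. The function name `n50` and the toy length lists are illustrative assumptions, not part of the chapter's pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest sequences first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This sequence is the one that crosses the 50% mark.
            return length


# Hypothetical toy assemblies of the same total size (20 bp); these are
# NOT corn snake data. The second list joins some contigs of the first
# (5+4 and 3+2+2), as a scaffolding step would.
fragmented = [5, 4, 3, 3, 2, 2, 1]
scaffolded = [9, 7, 3, 1]

print(n50(fragmented), n50(scaffolded))  # prints "3 7"
```

The total assembly size is identical in both toy cases, so the higher N50 reflects contiguity alone. That is the sense in which the increase from 3.7 kbp to 1.4 Mbp is reported.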

Key words

Genome sequencing; Genome assembly; Genomics; Corn snake; Optical mapping; Snake genome; DNA extraction; Transcriptome

Notes

Acknowledgments

We would like to thank Adrien Debry for blood collection, William H. Beluch for the sample preparation of the HiSeq2500 run, and Carine Langrez for the isolation of megabase DNA. We thank the Lucigen team for their support and results, and the VIB Nucleomics Core team for the preparation of the BioNano samples and initial downstream analyses. Most computations were performed at the Vital-IT Centre for high-performance computing (www.vital-it.ch) of the SIB Swiss Institute of Bioinformatics and on the Baobab cluster of the University of Geneva. This work was supported by grants from the University of Geneva (Switzerland), the Swiss National Science Foundation (FNSNF, grants 31003A_140785 and SINERGIA CRSII3_132430), and the SystemsX.ch initiative (project EpiPhysX). The authors thank Michel Milinkovitch for comments on an earlier version of this manuscript. Yingguang Frank Chan was supported by the Max Planck Society. An iGE3 PhD award was granted to Asier Ullate-Agote.

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  • Asier Ullate-Agote (1, 2, 3)
  • Yingguang Frank Chan (4)
  • Athanasia C. Tzika (1, 2, 3), corresponding author

  1. Laboratory of Artificial and Natural Evolution (LANE), Department of Genetics and Evolution, University of Geneva, Geneva, Switzerland
  2. SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
  3. Institute of Genetics and Genomics of Geneva (iGE3), University of Geneva, Geneva, Switzerland
  4. Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
