From Sequence Mapping to Genome Assemblies

Part of the book series: Methods in Molecular Biology (MIMB, volume 1201)

Abstract

The development of “next-generation” high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse (e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics), the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are built into a new sequence (“de novo assembly”). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer, and ab initio gene annotation. The results are high-quality draft assemblies that facilitate informative downstream analyses.
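The following toy example makes the idea of de novo assembly more concrete. It builds contigs from reads using a de Bruijn graph, which is one widely used approach for short reads. This is a minimal sketch under stated assumptions and not the procedure from this chapter. The reads are assumed error-free and drawn from one strand only, and coverage is ignored. The function names `kmers`, `build_de_bruijn`, and `contigs`, along with the example reads, are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every overlapping k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Every distinct k-mer seen in the reads becomes one edge, from its
    prefix (k-1)-mer to its suffix (k-1)-mer. Duplicate k-mers collapse
    into a single edge, so coverage depth is ignored in this toy version.
    Assumption: reads are error-free and come from a single strand.
    """
    edges = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            edges[kmer[:-1]].add(kmer[1:])
    return edges


def contigs(edges):
    """Spell out every maximal non-branching path as a contig.

    A node with exactly one incoming and one outgoing edge lies inside a
    contig. Any other node (a start, an end, or a branch caused by a
    repeat) forces the contig to stop. Isolated cycles made only of
    one-in/one-out nodes are not reported by this simplified walk.
    """
    indegree = defaultdict(int)
    nodes = set(edges)
    for successors in edges.values():
        for dst in successors:
            indegree[dst] += 1
            nodes.add(dst)

    def one_in_one_out(node):
        return indegree[node] == 1 and len(edges.get(node, ())) == 1

    result = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: reached from its path's start node
        for nxt in sorted(edges.get(node, ())):
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(edges[nxt]))
                path += nxt[-1]
            result.append(path)
    return result


if __name__ == "__main__":
    # Hypothetical error-free reads tiling the sequence AGCTTACGGA.
    reads = ["AGCTTA", "CTTACG", "TACGGA"]
    print(contigs(build_de_bruijn(reads, k=4)))  # → ['AGCTTACGGA']

    # Two reads that diverge after ACGT: the branch splits the assembly.
    print(contigs(build_de_bruijn(["ACGTA", "ACGTC"], k=3)))
    # → ['ACGT', 'GTA', 'GTC']
```

The second call shows why a de novo assembly usually ends up as several contigs rather than one sequence. A shared (k-1)-mer makes the graph branch, and the walk stops at that point. Real assemblers must also handle sequencing errors, both DNA strands, and uneven coverage. Their output contigs are the starting point for the scaffolding, contig ordering, and gap closure steps listed above.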

An erratum to this chapter is available at http://dx.doi.org/10.1007/978-1-4939-1438-8_21

Acknowledgements

I would like to thank Adam Reid, Martin Hunt, and Bernardo Foth for proofreading the chapter.

Author information

Correspondence to Thomas D. Otto.

Copyright information

© 2015 Springer Science+Business Media New York

About this protocol

Cite this protocol

Otto, T.D. (2015). From Sequence Mapping to Genome Assemblies. In: Peacock, C. (eds) Parasite Genomics Protocols. Methods in Molecular Biology, vol 1201. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1438-8_2

  • DOI: https://doi.org/10.1007/978-1-4939-1438-8_2

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-1437-1

  • Online ISBN: 978-1-4939-1438-8

  • eBook Packages: Springer Protocols
