From Sequence Mapping to Genome Assemblies

Otto, Thomas D.

doi:10.1007/978-1-4939-1438-8_2

Protocol
First Online: 01 January 2014

Part of the book series: Methods in Molecular Biology (MIMB, volume 1201)

Abstract

The development of “next-generation” high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse (e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics), the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are assembled into a new sequence (“de novo assembly”). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer, and ab initio gene annotation. The result is a high-quality draft assembly that will facilitate informative downstream analyses.
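
To make the two processing routes concrete, the following is a minimal Python sketch and not the chapter's method. It assumes error-free toy reads, exact matching for the mapping step, and a simple greedy suffix-prefix merge for the assembly step; real mappers and assemblers tolerate sequencing errors and use indexed or graph-based data structures. The function names `map_read`, `overlap`, and `greedy_assemble` are hypothetical and chosen only for illustration.

```python
def map_read(reference, read):
    """Mapping: return every 0-based position where the read matches exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping hits
    return positions


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """De novo assembly (toy): repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best = (0, -1, -1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:  # nothing left to merge
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Toy data (assumed for illustration): three overlapping reads from one sequence.
reference = "ATGCGTACGTTAG"
reads = ["ATGCGTA", "GTACGT", "CGTTAG"]

mapped_positions = {read: map_read(reference, read) for read in reads}
contigs = greedy_assemble(reads)
```

With a reference available, each read is simply placed at its matching position; without one, the same reads are merged through their overlaps into contigs. In this toy case the single resulting contig happens to reconstruct the reference, which is not guaranteed for real data containing repeats or errors.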

An erratum to this chapter is available at http://dx.doi.org/10.1007/978-1-4939-1438-8_21

Acknowledgements

I would like to thank Adam Reid, Martin Hunt, and Bernardo Foth for proofreading the chapter.

Author information

Authors and Affiliations

Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
Thomas D. Otto

Corresponding author

Correspondence to Thomas D. Otto.

Editor information

Editors and Affiliations

University of Western Australia, Nedlands, West Australia, Australia
Christopher Peacock

About this protocol

Cite this protocol

Otto, T.D. (2015). From Sequence Mapping to Genome Assemblies. In: Peacock, C. (ed.) Parasite Genomics Protocols. Methods in Molecular Biology, vol 1201. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1438-8_2

DOI: https://doi.org/10.1007/978-1-4939-1438-8_2
Published: 07 October 2014
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-1437-1
Online ISBN: 978-1-4939-1438-8
eBook Packages: Springer Protocols
