Short-Read Mapping

Ribeca, Paolo

doi:10.1007/978-1-4614-0782-9_7

Abstract

Present-day high-throughput sequencing techniques routinely produce a flood of genomic information (as high as 540-600 Gbases/machine/week for some technologies). The output takes the form of short sequence reads; in a typical resequencing application (where a reference genome for the organism being studied is assumed to be known), the sequence reads need to be aligned to the reference.

Such high yields make traditional alignment programs like BLAST impractical. In resequencing, on the other hand, one is usually interested only in matches showing very high sequence similarity to the original read. This new working setup required the development of a new generation of high-throughput, lower-sensitivity alignment programs, called short-read mappers.

Influenced by the standpoint of the algorithm designer, the published literature tends to overemphasize speed and standard working conditions at the expense of accuracy. In this chapter we attempt to review the state of the art of short-read alignment technology, focusing on the user’s standpoint and on what one needs to know to design a high-quality mapping analysis workflow, rather than on purely technical issues.

Author information

Authors and Affiliations

Centro Nacional de Análisis Genómico, Baldiri Reixac 4, Barcelona, Spain
Paolo Ribeca

Corresponding author

Correspondence to Paolo Ribeca.

Editor information

Editors and Affiliations

CIC bioGUNE, Genome Analysis Platform, Derio, 48160, Spain
Naiara Rodríguez-Ezpeleta
Computational Genomics and Bioinformatics, University of Granada, Genetics Department & Biomedical Research Center (CIBM), Granada, Spain
Michael Hackenberg
CIC bioGUNE, Genome Analysis Platform, Derio, 48160, Spain
Ana M. Aransay

About this chapter

Cite this chapter

Ribeca, P. (2012). Short-Read Mapping. In: Rodríguez-Ezpeleta, N., Hackenberg, M., Aransay, A. (eds) Bioinformatics for High Throughput Sequencing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0782-9_7

DOI: https://doi.org/10.1007/978-1-4614-0782-9_7
Published: 22 September 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0781-2
Online ISBN: 978-1-4614-0782-9
eBook Packages: Biomedical and Life Sciences (R0)