Abstract
Present-day high-throughput sequencing techniques routinely produce a flood of genomic information (as much as 540-600 Gbases per machine per week for some technologies). The output comes in the form of short sequence reads; in a typical resequencing application, where a reference genome for the organism under study is assumed to be known, the reads must be aligned to that reference.
Such high yields make traditional alignment programs like BLAST impractical. In resequencing, on the other hand, one is usually interested only in matches that show very high sequence similarity to the original read. This new working setup has required the development of a new generation of high-throughput, lower-sensitivity alignment programs, called short-read mappers.
Influenced by the standpoint of the algorithm designer, the published literature tends to overemphasize speed and standard working conditions at the expense of accuracy. In this chapter we review the state of the art in short-read alignment technology, focusing on the user’s standpoint and on what one needs to know to design a high-quality mapping analysis workflow, rather than on purely technical issues.
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Ribeca, P. (2012). Short-Read Mapping. In: Rodríguez-Ezpeleta, N., Hackenberg, M., Aransay, A. (eds) Bioinformatics for High Throughput Sequencing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0782-9_7
DOI: https://doi.org/10.1007/978-1-4614-0782-9_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0781-2
Online ISBN: 978-1-4614-0782-9
eBook Packages: Biomedical and Life Sciences (R0)