Short-Read Mapping

Bioinformatics for High Throughput Sequencing

Abstract

Present-day high-throughput sequencing techniques routinely produce a flood of genomic information (up to 540-600 Gbases per machine per week for some technologies). The output comes in the form of short sequence reads; in a typical resequencing application (where a reference genome for the organism under study is assumed to be known) the sequence reads need to be aligned to the reference.

Such high yields make the use of traditional alignment programs like BLAST impractical; on the other hand, when resequencing one is usually interested only in matches showing very high sequence similarity with the original read. This new working setup required the development of a new generation of high-throughput, lower-sensitivity alignment programs, called short-read mappers.

Influenced by the standpoint of the algorithm designer, the published literature tends to overemphasize speed and standard working conditions at the expense of accuracy. In this chapter we attempt to review the state of the art of short-read alignment technology, focusing on the user's standpoint and on what one needs to know to design a high-quality mapping analysis workflow, rather than on purely technical issues.

Author information

Corresponding author

Correspondence to Paolo Ribeca.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Ribeca, P. (2012). Short-Read Mapping. In: Rodríguez-Ezpeleta, N., Hackenberg, M., Aransay, A. (eds) Bioinformatics for High Throughput Sequencing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0782-9_7
