Base-Calling for Bioinformaticians

Bioinformatics for High Throughput Sequencing

Abstract

High-throughput platforms execute billions of simultaneous sequencing reactions. Base-calling is the process of decoding the output signals of these reactions into sequence reads. In this chapter, we detail the facets of base-calling from the perspective of signal communication. We focus primarily on the Illumina high-throughput sequencing platform and review several third-party base-calling implementations.
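
As a rough illustration of what "decoding output signals into sequence reads" means, the sketch below calls one base per cycle by picking the strongest of four channel intensities. It is a toy under stated assumptions, not the method of this chapter or of any caller it reviews: the four-channel (A, C, G, T) layout, the function name `call_bases`, and the example intensities are all hypothetical, and the sketch ignores the signal distortions that practical base callers must model.

```python
# Hypothetical illustration: naive per-cycle base-calling by strongest channel.
# Assumes each cycle yields one intensity per base, ordered A, C, G, T.
BASES = "ACGT"


def call_bases(intensities):
    """Return a read from a sequence of per-cycle (A, C, G, T) intensities."""
    read = []
    for cycle in intensities:
        if len(cycle) != len(BASES):
            raise ValueError("each cycle needs four channel intensities")
        # Pick the index of the channel with the largest signal in this cycle.
        best = max(range(len(BASES)), key=lambda i: cycle[i])
        read.append(BASES[best])
    return "".join(read)


# Made-up intensities for three sequencing cycles.
example = [
    (0.9, 0.1, 0.0, 0.1),
    (0.2, 0.1, 0.8, 0.1),
    (0.1, 0.1, 0.2, 0.7),
]
print(call_bases(example))
```

A caller of this kind treats every cycle independently and reports no confidence, which is one reason real implementations tend to use explicit signal models and per-base quality estimates instead.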

Acknowledgments

The authors would like to thank Fabian Menges, Giuseppe Narzisi, and Bud Mishra for sharing their early TotalReCaller results, Dan Valente for formulating the unified distortion model, and Dina Esposito for useful comments on the chapter. Yaniv Erlich is an Andria and Paul Heafy family fellow.

Author information

Corresponding author

Correspondence to Yaniv Erlich.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Sheikh, M.A., Erlich, Y. (2012). Base-Calling for Bioinformaticians. In: Rodríguez-Ezpeleta, N., Hackenberg, M., Aransay, A. (eds) Bioinformatics for High Throughput Sequencing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0782-9_5
