Abstract
High-throughput sequencing platforms execute billions of simultaneous sequencing reactions. Base-calling is the process of decoding the output signals of these reactions into sequence reads. In this chapter, we detail the facets of base-calling from the perspective of signal communication. We focus primarily on the Illumina high-throughput sequencing platform and review several third-party base-calling implementations.
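As a rough illustration of what "decoding signals into reads" means, the sketch below assumes a four-channel platform in which every sequencing cycle yields one intensity per base, and it simply calls the brightest channel. The function name `call_bases`, the channel order (A, C, G, T), and the toy intensities are assumptions made for this example; they are not taken from the chapter, and real base callers must also correct for signal distortions that this sketch ignores.

```python
# Hypothetical illustration only: names, channel order, and data are assumptions.
BASES = "ACGT"


def call_bases(intensities):
    """Call one base per cycle by picking the channel with the highest intensity.

    intensities: sequence of per-cycle 4-tuples ordered as (A, C, G, T).
    """
    read = []
    for cycle in intensities:
        if len(cycle) != len(BASES):
            raise ValueError("each cycle needs exactly one intensity per base")
        # Index of the brightest channel in this cycle.
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)


# Toy signal for a single cluster over three cycles (made-up numbers).
toy_signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.1),
    (0.0, 0.1, 0.2, 0.7),
]
print(call_bases(toy_signal))  # AGT
```

Picking the maximum is the simplest possible decoder; it treats each cycle and each channel as independent, which is the assumption that more elaborate base callers relax.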
Acknowledgments
The authors would like to thank Fabian Menges, Giuseppe Narzisi, and Bud Mishra for sharing their early TotalReCaller results, Dan Valente for formulating the unified distortion model, and Dina Esposito for useful comments on the chapter. Yaniv Erlich is an Andria and Paul Heafy family fellow.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Sheikh, M.A., Erlich, Y. (2012). Base-Calling for Bioinformaticians. In: Rodríguez-Ezpeleta, N., Hackenberg, M., Aransay, A. (eds) Bioinformatics for High Throughput Sequencing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0782-9_5
DOI: https://doi.org/10.1007/978-1-4614-0782-9_5
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0781-2
Online ISBN: 978-1-4614-0782-9
eBook Packages: Biomedical and Life Sciences (R0)