The Role of Spike-In Standards in the Normalization of RNA-seq

Risso, Davide; Ngai, John; Speed, Terence P.; Dudoit, Sandrine

doi:10.1007/978-3-319-07212-8_9

Chapter
First Online: 01 January 2014

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

Normalization of RNA-seq data is essential to ensure accurate inference of expression levels, by adjusting for sequencing depth and other more complex nuisance effects, both within and between samples. Recently, the External RNA Controls Consortium (ERCC) developed a set of 92 synthetic spike-in standards that are commercially available and relatively easy to add to a typical library preparation. In this chapter, we compare the performance of several state-of-the-art normalization methods, including adaptations that directly use spike-in sequences as controls. We show that although the ERCC spike-ins could in principle be valuable for assessing accuracy in RNA-seq experiments, their read counts are not stable enough to be used for normalization purposes. We propose a novel approach to normalization that can successfully make use of control sequences to remove unwanted effects and lead to accurate estimation of expression fold-changes and tests of differential expression.

Notes

1. Throughout this chapter, we shall use the term sample to refer to an observational unit of interest, i.e., a set of reads from a given lane for a particular library. Thus, as indicated in Fig. 9.1b, there are 128 samples in total for the SEQC dataset, 64 of the reference Sample A type and 64 of the reference Sample B type.

Acknowledgements

We thank Leming Shi for providing the SEQC pilot data and Laurent Jacob for his help with the software implementation of the RUV method.

Author information

Authors and Affiliations

Department of Statistics, University of California, Berkeley, CA, USA
Davide Risso & Terence P. Speed
Department of Molecular and Cell Biology, Helen Wills Neuroscience Institute, and Functional Genomics Laboratory, University of California, Berkeley, CA, USA
John Ngai
Bioinformatics Division, Walter and Eliza Hall Institute, Melbourne, VIC, Australia
Terence P. Speed
Department of Mathematics and Statistics, The University of Melbourne, Victoria, Australia
Terence P. Speed
Division of Biostatistics and Department of Statistics, University of California, Berkeley, CA, USA
Sandrine Dudoit

Corresponding author

Correspondence to Davide Risso.

Editor information

Editors and Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, USA
Somnath Datta
Department of Statistics, Iowa State University, Ames, Iowa, USA
Dan Nettleton

About this chapter

Cite this chapter

Risso, D., Ngai, J., Speed, T.P., Dudoit, S. (2014). The Role of Spike-In Standards in the Normalization of RNA-seq. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_9

DOI: https://doi.org/10.1007/978-3-319-07212-8_9
Published: 17 June 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07211-1
Online ISBN: 978-3-319-07212-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
