The Role of Spike-In Standards in the Normalization of RNA-seq

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

Normalization of RNA-seq data is essential to ensure accurate inference of expression levels, by adjusting for sequencing depth and other more complex nuisance effects, both within and between samples. Recently, the External RNA Controls Consortium (ERCC) developed a set of 92 synthetic spike-in standards that are commercially available and relatively easy to add to a typical library preparation. In this chapter, we compare the performance of several state-of-the-art normalization methods, including adaptations that directly use spike-in sequences as controls. We show that although the ERCC spike-ins could in principle be valuable for assessing accuracy in RNA-seq experiments, their read counts are not stable enough to be used for normalization purposes. We propose a novel approach to normalization that can successfully make use of control sequences to remove unwanted effects and lead to accurate estimation of expression fold-changes and tests of differential expression.
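
To make the idea of adjusting for sequencing depth concrete, the following is a minimal sketch of global scaling normalization, computed once from all features and once from spike-in rows only. It is not the approach proposed in the chapter: the function names (`scaling_factors`, `normalize`), the toy count matrix, and the choice of which rows are spike-ins are all assumptions made for illustration. The toy data are idealized so that both sets of factors agree, whereas the abstract's point is that real spike-in read counts are not stable enough for this direct use.

```python
from statistics import mean


def scaling_factors(counts, rows=None):
    """Per-sample scaling factors from column totals over the chosen rows.

    counts: list of rows (features), each a list of per-sample read counts.
    rows:   indices of the rows to use (e.g. hypothetical spike-in rows);
            None means use every row (total-count scaling).
    """
    use = range(len(counts)) if rows is None else rows
    n_samples = len(counts[0])
    totals = [sum(counts[i][j] for i in use) for j in range(n_samples)]
    avg = mean(totals)
    # Factors are relative to the average total, so they average to 1.
    return [t / avg for t in totals]


def normalize(counts, factors):
    """Divide each sample (column) by its scaling factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (assumed): 3 genes and 2 spike-ins measured in 2 samples,
# where the second sample was sequenced exactly twice as deeply.
counts = [
    [10, 20],
    [30, 60],
    [60, 120],
    [5, 10],    # hypothetical spike-in
    [15, 30],   # hypothetical spike-in
]
spike_rows = [3, 4]

depth_factors = scaling_factors(counts)                   # all features
spike_factors = scaling_factors(counts, rows=spike_rows)  # spike-ins only
normalized = normalize(counts, spike_factors)
```

In this idealized case each gene has the same normalized count in both samples. If the spike-in counts varied between libraries for reasons unrelated to depth, `spike_factors` would drift away from `depth_factors` and the scaling would introduce error rather than remove it.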

Notes

  1. Throughout this chapter, we shall use the term sample to refer to an observational unit of interest, i.e., a set of reads from a given lane for a particular library. Thus, as indicated in Fig. 9.1b, there are 128 samples in total for the SEQC dataset, 64 of the reference Sample A type and 64 of the reference Sample B type.

Acknowledgements

We thank Leming Shi for providing the SEQC pilot data and Laurent Jacob for his help with the software implementation of the RUV method.

Author information

Corresponding author

Correspondence to Davide Risso.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Risso, D., Ngai, J., Speed, T.P., Dudoit, S. (2014). The Role of Spike-In Standards in the Normalization of RNA-seq. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_9