Statistical Analysis of Next Generation Sequencing Data

Part of the series Frontiers in Probability and the Statistical Sciences, pp. 169-190


The Role of Spike-In Standards in the Normalization of RNA-seq

  • Davide Risso (corresponding author): Department of Statistics, University of California
  • John Ngai: Department of Molecular and Cell Biology, Helen Wills Neuroscience Institute, and Functional Genomics Laboratory, University of California
  • Terence P. Speed: Department of Statistics, University of California; Bioinformatics Division, Walter and Eliza Hall Institute; Department of Mathematics and Statistics, The University of Melbourne
  • Sandrine Dudoit: Division of Biostatistics and Department of Statistics, University of California


Normalization of RNA-seq data is essential to ensure accurate inference of expression levels, by adjusting for sequencing depth and other more complex nuisance effects, both within and between samples. Recently, the External RNA Control Consortium (ERCC) developed a set of 92 synthetic spike-in standards that are commercially available and relatively easy to add to a typical library preparation. In this chapter, we compare the performance of several state-of-the-art normalization methods, including adaptations that directly use spike-in sequences as controls. We show that although the ERCC spike-ins could in principle be valuable for assessing accuracy in RNA-seq experiments, their read counts are not stable enough to be used for normalization purposes. We propose a novel approach to normalization that can successfully make use of control sequences to remove unwanted effects and lead to accurate estimation of expression fold-changes and tests of differential expression.
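
To make the idea of "directly using spike-in sequences as controls" concrete, the following is a minimal sketch of the simplest such adaptation: a global scaling normalization whose per-sample factors are computed from the spike-in counts alone rather than from all reads. It is not the approach proposed in the chapter, and the function names, the toy count table, and the gene labels are illustrative assumptions only. The abstract's caution applies: if spike-in counts are unstable across libraries, factors derived this way will be unstable too.

```python
from typing import Dict, List, Optional

Counts = Dict[str, List[float]]  # gene name -> read counts, one per sample


def scaling_factors(counts: Counts, controls: Optional[List[str]] = None) -> List[float]:
    """Per-sample scaling factors, centered so that their mean is 1.

    If `controls` is given, only those rows (e.g. spike-ins) are summed;
    otherwise all rows are used, which gives plain sequencing-depth scaling.
    """
    genes = controls if controls is not None else list(counts)
    n_samples = len(next(iter(counts.values())))
    totals = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


def normalize(counts: Counts, factors: List[float]) -> Counts:
    """Divide every count by the scaling factor of its sample."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


# Hypothetical toy data: two genes and two spike-ins measured in two samples.
toy_counts: Counts = {
    "geneA": [100, 200],
    "geneB": [50, 100],
    "ERCC-00002": [10, 30],
    "ERCC-00003": [10, 30],
}
spike_ins = ["ERCC-00002", "ERCC-00003"]

depth_factors = scaling_factors(toy_counts)             # uses all reads
spike_factors = scaling_factors(toy_counts, spike_ins)  # uses spike-ins only
spike_normalized = normalize(toy_counts, spike_factors)
```

In this toy table the two sets of factors disagree (the spike-ins triple between samples while the genes only double), so the choice of control set changes the estimated fold-changes. That sensitivity is the reason the stability of the spike-in read counts matters.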