Combining Single and Paired End RNA-seq Data for Differential Expression Analyses

  • Zhi-Ping Feng
  • Francois Collin
  • Terence P. Speed
Conference paper
Part of the Abel Symposia book series (ABEL, volume 11)


Combining RNA-seq data from different platforms should increase the power to detect differentially expressed genes, but may not be straightforward. Here we show how RUVs, a recently published method for removing unwanted variation and normalizing RNA-seq data, can combine the counts of single and paired end read libraries from formalin fixed, paraffin embedded tumor samples to permit differential expression analysis. Seven other intra- or inter-platform normalization methods are also described and the results are compared with those from RUVs.


Keywords (added by machine, not by the authors): Differential Expression Analysis; Volcano Plot; Unwanted Variation; Read Library; Negative Control Gene



We thank Dr. Johann Gagnon-Bartsch and Dr. Davide Risso for providing the latest version of the RUV packages, and Prof. Gordon Smyth for explaining variance estimation in edgeR and voom-limma.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Zhi-Ping Feng (1)
  • Francois Collin (2)
  • Terence P. Speed (1)
  1. The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
  2. Genomic Health, Inc., Redwood City, USA
