A New Bioinformatic Pipeline to Address the Most Common Requirements in RNA-seq Data Analysis

  • Osvaldo Graña
  • Miriam Rubio-Camarillo
  • Florentino Fdez-Riverola
  • David G. Pisano
  • Daniel Glez-Peña
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 375)


Many bioinformatic programs have been developed to analyze data from RNA-seq experiments. These programs are widely used and often included in computational pipelines. Nevertheless, there does not seem to be a precise definition of what constitutes a proper workflow for this kind of data. We present a new workflow that addresses the most common requirements of RNA-seq analysis and is implemented as an automated pipeline for efficient and complete evaluation.


Keywords: RNA-seq, NGS, Pipeline, Transcriptomics



This work was partially funded by the [14VI05] Contract-Programme from the University of Vigo. Also, it was supported by the European Union’s Seventh Framework Programme FP7/REGPOT-2012-2013.1 under grant agreement n° 316265 (BIOCAPS), the Agrupamento INBIOMED from DXPCTSUG-FEDER “unha maneira de facer Europa” (2012/273) and the “Platform of integration of intelligent techniques for analysis of biomedical information” project (TIN2013-47153-C3-3-R) from the Spanish Ministry of Economy and Competitiveness.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Osvaldo Graña (1)
  • Miriam Rubio-Camarillo (2)
  • Florentino Fdez-Riverola (3)
  • David G. Pisano (1)
  • Daniel Glez-Peña (3, corresponding author)

  1. Bioinformatics Unit, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  2. Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
  3. ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario as Lagoas S/N, Universidad de Vigo, Ourense, Spain
