RNA-Seq Data Analysis Protocol: Combining In-House and Publicly Available Data

  • Marc W. Schmid
Part of the Methods in Molecular Biology book series (MIMB, volume 1669)


Comparing gene expression profiles measured in a wide range of different tissue types, at different developmental stages, or under different environmental conditions can yield valuable insights into the mechanisms of cell/tissue specification and differentiation, or identify cell/tissue-type specific responses to environmental stimuli. Critical for such comparisons is the identical processing of data from different sources. This may also include the integration of a novel data set into an existing collection of data sets (e.g., in-house and publicly available data). Here, I describe a complete workflow for RNA-Seq data, from data processing steps to the comparison of gene expression profiles measured with RNA-Seq. I use publicly available data for demonstration purposes, but I also describe how to integrate your own data sets. The workflow runs on all three major operating systems (Linux, MacOS, and Windows). The scripts and the tutorial can be accessed on

Key words

RNA-Seq, Public data, Data integration, Analysis, Differential expression, Multigroup comparisons, Gene expression, Transcriptome, Workflow



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zürich, Switzerland
  2. Department of Plant and Microbial Biology, University of Zurich, Zürich, Switzerland
  3. URPP Global Change and Biodiversity, University of Zurich, Zürich, Switzerland
  4. S3IT, University of Zurich, Zürich, Switzerland
