A Method for Cross-Species Visualization and Analysis of RNA-Sequence Data

  • Stephen A. Ramsey
Part of the Methods in Molecular Biology book series (MIMB, volume 1702)


In this methods article, I describe a computational workflow for cross-species visualization and comparison of mRNA-seq transcriptome profiling data. The workflow is based on gene set variation analysis (GSVA) and is illustrated using commands in the R programming language. I provide a complete step-by-step procedure for the workflow using mRNA-seq data sets from dog and human bladder cancer as an example.

Key words

mRNA-seq, Cross-species, Transcriptome, Bioinformatics, Gene function



This work was supported by the National Science Foundation (award 1553728-DBI), the PhRMA Foundation (Research Starter Grant in Informatics), the Medical Research Foundation of Oregon (New Investigator Grant), and the Animal Cancer Foundation (Comparative Oncology Award). S.A.R. thanks Shay Bracha and Cheri Goodall for kindly providing the dog bladder RNA samples that were used in the transcriptome profiling study [3], Tanjin Xu for assistance with the mRNA-seq data processing, Brent Kronmiller for help with designing the dog mRNA-seq study, and Ilya Shmulevich, Sheila Reynolds, and Matti Nykter for advice.


  1. Mortazavi A, Williams BA, McCue K et al (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–628
  2. Lister R, O'Malley RC, Tonti-Filippini J et al (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133:523–536
  3. Ramsey SA, Xu T, Goodall C et al (2017) Cross-species analysis of the canine and human bladder cancer transcriptome and exome. Genes Chrom Cancer 56:328–343
  4. Fowles JS, Brown KC, Hess AM et al (2016) Intra- and interspecies gene expression models for predicting drug response in canine osteosarcoma. BMC Bioinformatics 17:93
  5. Dhawan D, Paoloni M, Shukradas S et al (2015) Comparative gene expression analyses identify luminal and basal subtypes of canine invasive urothelial carcinoma that mimic patterns in human invasive bladder cancer. PLoS One 10:e0136688
  6. Seok J, Warren HS, Cuenca AG et al (2013) Genomic responses in mouse models poorly mimic human inflammatory diseases. Proc Natl Acad Sci 110:3507–3512
  7. Shay T, Jojic V, Zuk O et al (2013) Conservation and divergence in the transcriptional programs of the human and mouse immune systems. Proc Natl Acad Sci 110:2946–2951
  8. Chan ET, Quon GT, Chua G et al (2009) Conservation of core gene expression in vertebrate tissues. J Biol 8:33
  9. Brawand D, Soumillon M, Necsulea A et al (2011) The evolution of gene expression levels in mammalian organs. Nature 478:343–348
  10. Lin S, Lin Y, Nery JR et al (2014) Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci 111:17224–17229
  11. Gilad Y, Mizrahi-Man O (2015) A reanalysis of mouse ENCODE comparative gene expression data. F1000Research 4:121. doi:10.12688/f1000research.6536.1
  12. Sudmant PH, Alexis MS, Burge CB (2015) Meta-analysis of RNA-seq expression data across species, tissues and studies. Genome Biol 16:287
  13. Hänzelmann S, Castelo R, Guinney J (2013) GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14:7
  14. NIH Genomic Data Commons Data Portal (2016) v. 4.0
  15. Ripley BD (2001) The R project in statistical computing. MSOR Connections. Newsl LTSN Maths Stat OR Network 1:23–25
  16. Ihaka R, Gentleman R (1995) R: a language for data analysis and graphics. J Comp Graph Stat 5:299–314
  17. Hornik K (2012) The comprehensive R archive network. Comput Stat 4:394–398
  18. Wickham H (2007) Reshaping data with the {reshape} package. J Stat Software 21:1–20
  19. Wickham H (2009) ggplot2: elegant graphics for data analysis. Springer, New York, NY
  20. Love MI, Huber W, Anders S (2013) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550
  21. Smedley D, Haider S, Ballester B et al (2009) BioMart – biological queries made easy. BMC Genomics 10:22
  22. Cunningham F, Amode MR, Barrell D et al (2015) Ensembl 2015. Nucleic Acids Res 43:D662–D669
  23. Ashburner M, Ball CA, Blake JA et al (2000) Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet 25:25–29
  24. Subramanian A, Tamayo P, Mootha VK et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci 102:15545–15550
  25. Liberzon A (2014) A description of the Molecular Signatures Database (MSigDB) Web site. Methods Mol Biol 1150:153–160
  26. Molecular Signatures Database (MSigDB) (2016) v. 5.2
  27. Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–140
  28. Wickham H (2014) Tidy data. J Stat Software 59:10. doi:10.18637/jss.v059.i10
  29. Lin Y, Golovnina K, Chen Z-X et al (2016) Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genomics 17:28
  30. George NI, Chang C-W (2014) DAFS: a data-adaptive flag method for RNA-sequencing data to differentiate genes with low and high expression. BMC Bioinformatics 15:92
  31. Cox MAA, Cox TF (2001) Multidimensional scaling, 2nd edn. Chapman and Hall, Boca Raton, FL
  32. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289–300

Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  1. Oregon State University, Corvallis, USA
