TROM: A Testing-Based Method for Finding Transcriptomic Similarity of Biological Samples
Comparative transcriptomics has become increasingly popular in genomic research, thanks to high-throughput technologies such as microarrays and next-generation RNA sequencing, which have generated large amounts of transcriptomic data. An important question is how biological processes are conserved or have diverged across species. We propose a testing-based method, TROM (Transcriptome Overlap Measure), for comparing transcriptomes within or between species. In contrast to traditional correlation analyses, it offers a different perspective on capturing transcriptomic similarity. Specifically, TROM first identifies associated genes that capture the molecular characteristics of each biological sample and then compares samples by testing the overlap of their associated genes. Using simulation and real data studies, we demonstrate that TROM is more powerful in identifying similar transcriptomes and more robust to stochastic gene expression noise than Pearson and Spearman correlations. We apply TROM to compare the developmental stages of six Drosophila species, C. elegans, S. purpuratus, D. rerio and mouse liver, and find interesting correspondence patterns that imply conserved gene expression programs in the development of these species. The TROM method is available as an R package on CRAN (https://cran.r-project.org/package=TROM), with manuals and source code available at http://www.stat.ucla.edu/~jingyi.li/software-and-data/trom.html.
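To make the two steps described above concrete (select associated genes per sample, then test the overlap between samples), here is a minimal Python sketch. It is not the TROM R package and does not reproduce its API. The abstract does not specify the selection rule or the test, so the z-score cutoff, the one-sided hypergeometric test, the Bonferroni correction, and all function names (`associated_genes`, `overlap_pvalue`, `overlap_score`) are assumptions made for illustration only.

```python
from math import comb, log10
from statistics import mean, pstdev


def associated_genes(expr, sample, z_cutoff=1.5):
    """Return genes whose expression in `sample` stands out across samples.

    `expr` maps gene name -> list of expression values, one per sample.
    ASSUMPTION: a gene is "associated" with a sample when its z-score
    (computed across all samples) is at least `z_cutoff`.
    """
    selected = set()
    for gene, values in expr.items():
        sd = pstdev(values)
        if sd == 0:  # flat genes carry no sample-specific signal
            continue
        z = (values[sample] - mean(values)) / sd
        if z >= z_cutoff:
            selected.add(gene)
    return selected


def overlap_pvalue(set_a, set_b, population_size):
    """One-sided hypergeometric p-value for the overlap of two gene sets.

    ASSUMPTION: both sets are drawn from the same population of
    `population_size` genes. Returns P(X >= |A & B|) under random draws.
    """
    k = len(set_a & set_b)
    a, b, n = len(set_a), len(set_b), population_size
    tail = sum(comb(a, i) * comb(n - a, b - i)
               for i in range(k, min(a, b) + 1))
    return tail / comb(n, b)


def overlap_score(p_value, n_tests=1):
    """-log10 of the Bonferroni-corrected p-value (larger = more similar)."""
    p_adj = min(1.0, p_value * n_tests)
    return float("inf") if p_adj == 0 else -log10(p_adj)


# Toy data (hypothetical): 4 genes measured in 4 samples.
toy_expr = {
    "g1": [0, 0, 0, 10],
    "g2": [10, 0, 0, 0],
    "g3": [0, 0, 1, 9],
    "g4": [5, 5, 5, 5],
}
genes_in_last_sample = associated_genes(toy_expr, sample=3)
p = overlap_pvalue({"g1", "g2", "g3"}, {"g2", "g3", "g4"}, population_size=10)
print(sorted(genes_in_last_sample), p, overlap_score(p, n_tests=6))
```

Because the comparison depends only on which genes are selected for each sample, and not on the magnitude of every gene's expression, a score of this kind would be expected to be less sensitive to noise in weakly expressed genes than a correlation computed over all genes.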
Keywords: Transcriptomic similarity measure; Multi-species developmental stages; Robustness to platform differences; Comparative transcriptomics; Microarray vs. RNA-seq; Pearson correlation coefficient; Spearman correlation coefficient; Overlap test