Statistics in Biosciences, Volume 9, Issue 1, pp 105–136

TROM: A Testing-Based Method for Finding Transcriptomic Similarity of Biological Samples

Abstract

Comparative transcriptomics has gained increasing popularity in genomic research thanks to high-throughput technologies, including microarray and next-generation RNA sequencing, that have generated large amounts of transcriptomic data. An important goal is to understand the conservation and divergence of biological processes in different species. We propose a testing-based method, TROM (Transcriptome Overlap Measure), for comparing transcriptomes within or between species; it offers a perspective on transcriptomic similarity that differs from traditional correlation analyses. Specifically, TROM first identifies associated genes that capture the molecular characteristics of each biological sample, and then compares samples by testing the overlap of their associated genes. We use simulation and real data studies to demonstrate that TROM is more powerful in identifying similar transcriptomes and more robust to stochastic gene expression noise than Pearson and Spearman correlations. We apply TROM to compare the developmental stages of six Drosophila species, C. elegans, S. purpuratus, D. rerio and mouse liver, and find correspondence patterns that imply conserved gene expression programs in the development of these species. The TROM method is available as an R package on CRAN (https://cran.r-project.org/package=TROM), with manuals and source code available at http://www.stat.ucla.edu/~jingyi.li/software-and-data/trom.html.
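
The idea of comparing two samples by testing the overlap of their associated genes can be illustrated with a minimal sketch. The abstract does not say which test TROM uses, so the sketch assumes a one-sided hypergeometric test, a common choice for gene-set overlap. The function name `overlap_pvalue`, the parameter `population_size`, and the toy gene identifiers are hypothetical and are not part of the TROM R package.

```python
from math import comb


def overlap_pvalue(genes_a, genes_b, population_size):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    Assumption (not from the paper): under the null, the genes in set B are
    drawn at random without replacement from `population_size` genes, of
    which len(genes_a) belong to set A.
    """
    a, b = set(genes_a), set(genes_b)
    k = len(a & b)                 # observed overlap
    n_a, n_b = len(a), len(b)
    total = comb(population_size, n_b)
    # P(X >= k): sum the hypergeometric probabilities from k up to the
    # largest possible overlap.
    tail = sum(
        comb(n_a, i) * comb(population_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


# Toy associated-gene sets for three hypothetical samples.
sample_1 = {"g1", "g2", "g3", "g4", "g5"}
sample_2 = {"g3", "g4", "g5", "g6", "g7"}      # shares 3 genes with sample_1
sample_3 = {"g8", "g9", "g10", "g11", "g12"}   # shares none with sample_1

N = 20  # hypothetical size of the gene population
p_12 = overlap_pvalue(sample_1, sample_2, N)
p_13 = overlap_pvalue(sample_1, sample_3, N)
print(f"sample 1 vs 2: p = {p_12:.4f}")
print(f"sample 1 vs 3: p = {p_13:.4f}")
```

A smaller p-value indicates an overlap larger than expected by chance, so in this toy example samples 1 and 2 would be scored as more similar than samples 1 and 3. How the associated genes are selected, and how p-values are turned into a similarity score, is described in the paper itself and is not reproduced here.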

Keywords

Transcriptomic similarity measure; Multi-species developmental stages; Robustness to platform differences; Comparative transcriptomics; Microarray vs. RNA-seq; Pearson correlation coefficient; Spearman correlation coefficient; Overlap test

Copyright information

© International Chinese Statistical Association 2016

Authors and Affiliations

  1. Department of Statistics, University of California, Los Angeles, USA
  2. Department of Human Genetics, University of California, Los Angeles, USA
