Statistics in Biosciences, Volume 8, Issue 2, pp 374–394

Pathway crosstalk effects: shrinkage and disentanglement using a Bayesian hierarchical model

  • Alin Tomoiaga
  • Peter Westfall
  • Michele Donato
  • Sorin Draghici
  • Sonia Hassan
  • Roberto Romero
  • Paola Tellaroli


Identifying the biological pathways that are related to various clinical phenotypes is an important concern in biomedical research. Based on estimated expression levels and/or p values, overrepresentation analysis (ORA) methods provide rankings of pathways, but these rankings are distorted because pathways overlap. This crosstalk phenomenon has not been rigorously studied, and classical ORA does not take into consideration that: (1) crosstalk effects in overlapping pathways can cause incorrect rankings of pathways, (2) crosstalk effects can cause both excess type I and type II errors, (3) rankings of small pathways are unreliable, and (4) type I error rates can be inflated due to multiple comparisons of pathways. We develop a Bayesian hierarchical model that addresses these problems, providing sensible estimates and rankings and reducing error rates. We show, on both real and simulated data, that the results of our method are more accurate than those produced by classical overrepresentation analysis, providing a better understanding of the underlying biological phenomena involved in the phenotypes under study. The R code and the binary datasets for implementing the analyses described in this article are available online at:


Bayes model, hierarchical modeling, data augmentation, genomic pathway analysis, gene expression
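
To make the crosstalk problem described in the abstract concrete, the following is a minimal Python sketch of classical ORA using an upper-tail hypergeometric test. It is an illustration only, not the authors' R code and not their Bayesian hierarchical model. The gene universe, the two overlapping pathways, the list of differentially expressed genes, and the function name `ora_p_value` are all hypothetical. The final step, which simply drops the shared genes from pathway B, is a naive check used here only to show where B's signal comes from.

```python
from math import comb


def ora_p_value(n_universe, pathway, de_genes):
    """Upper-tail hypergeometric p value P(X >= k) for one pathway.

    n_universe: total number of genes measured (hypothetical here)
    pathway:    set of gene ids annotated to the pathway
    de_genes:   set of differentially expressed gene ids
    """
    m = len(pathway)              # pathway size
    n = len(de_genes)             # number of DE genes drawn from the universe
    k = len(pathway & de_genes)   # DE genes that fall in the pathway
    total = comb(n_universe, n)
    tail = sum(
        comb(m, i) * comb(n_universe - m, n - i)
        for i in range(k, min(m, n) + 1)
    )
    return tail / total


# Hypothetical toy data: 1000 genes, two pathways sharing 10 genes.
N_UNIVERSE = 1000
pathway_a = set(range(0, 40))     # genes 0..39
pathway_b = set(range(30, 70))    # genes 30..69, overlaps A on 30..39

# All pathway-related DE genes lie in A (20..39); 900..929 are background hits.
de_genes = set(range(20, 40)) | set(range(900, 930))

p_a = ora_p_value(N_UNIVERSE, pathway_a, de_genes)
p_b = ora_p_value(N_UNIVERSE, pathway_b, de_genes)

# Naive check (not the paper's method): remove the genes B shares with A.
p_b_only = ora_p_value(N_UNIVERSE, pathway_b - pathway_a, de_genes)

print(f"A: {p_a:.3g}  B: {p_b:.3g}  B without shared genes: {p_b_only:.3g}")
```

In this toy setup, pathway B looks strongly enriched even though every one of its differentially expressed genes is shared with pathway A. Once the shared genes are set aside, B contains no differentially expressed genes at all. This is the kind of overlap-driven ranking distortion that the abstract attributes to crosstalk.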



This work has been partially supported by the following Grants: NIH RO1 RDK089167, R42 GM087013, and NSF DBI-0965741 (to S.D.), by PARO112419, by the Robert J. Sokol Endowment in Systems Biology, by the Wayne State University Perinatal Initiative, and by the Perinatology Research Branch, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, DHHS. The authors gratefully acknowledge the comments of a reviewer to improve this manuscript. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF, NIH, or any other of the funding agencies.



Copyright information

© International Chinese Statistical Association 2016

Authors and Affiliations

  • Alin Tomoiaga (1)
  • Peter Westfall (1)
  • Michele Donato (2)
  • Sorin Draghici (2)
  • Sonia Hassan (3)
  • Roberto Romero (3)
  • Paola Tellaroli (4)

  1. Center for Advanced Analytics and Business Intelligence, Texas Tech University, Lubbock, USA
  2. Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, USA
  3. Perinatology Research Branch, Program for Perinatal Research and Obstetrics, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Detroit, USA
  4. Department of Statistical Sciences, University of Padua, Padua, Italy
