Bayesian Semiparametric Model for Pathway-Based Analysis with Zero-Inflated Clinical Outcomes

  • Lulu Cheng
  • Inyoung Kim (corresponding author)
  • Herbert Pang


In this paper, we propose a semiparametric regression approach for identifying pathways related to zero-inflated clinical outcomes, where a pathway is a gene set derived from prior biological knowledge. Our approach is developed within a Bayesian hierarchical framework. We incorporate the pathway effect nonparametrically into a zero-inflated Poisson hierarchical regression model with an unknown link function. The nonparametric pathway effect is estimated via a kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative distribution functions. Our approach provides flexible nonparametric settings to describe the complicated association between gene expressions and zero-inflated clinical outcomes. A Metropolis-within-Gibbs sampling algorithm and Bayes factors are adopted to make statistical inferences. Our simulation results indicate that our semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function, especially when the number of genes is large. The usefulness of our approach is demonstrated through its application to the canine data set from Enerson et al. (Toxicol Pathol 34:27–32, 2006). Our approach can also be applied to other settings where a large number of highly correlated predictors are present.
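As a rough illustration of the two building blocks named in the abstract, the following sketch evaluates a standard zero-inflated Poisson log-probability and a Gaussian kernel between two expression vectors. It is not the authors' implementation: the function names, the parameter names `pi`, `lam`, and `rho`, and the choice of a Gaussian kernel are assumptions made for illustration, and the unknown link function and the hierarchical priors are not shown.

```python
import math

def zip_log_pmf(y, pi, lam):
    """Log-probability of count y under a zero-inflated Poisson.

    pi  : probability of a structural zero (assumed 0 <= pi < 1)
    lam : mean of the Poisson component (assumed lam > 0)
    """
    if y == 0:
        # A zero arises from the structural-zero part or from the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # Positive counts can only come from the Poisson component.
    return math.log1p(-pi) - lam + y * math.log(lam) - math.lgamma(y + 1)

def gaussian_kernel(x1, x2, rho):
    """Gaussian kernel exp(-||x1 - x2||^2 / rho) between two expression vectors.

    rho is a hypothetical positive scale parameter.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / rho)
```

In a kernel machine formulation, a matrix of such kernel values over all pairs of subjects would typically serve as the covariance of a Gaussian process prior on the pathway effect, which is one way to read the "Gaussian process" and "Mixed model" keywords.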

Supplementary materials accompanying this paper appear on-line.


Keywords: Gaussian process; Marginal likelihood; Mixed model; Unknown link; Pathway-based analysis; Zero-inflated Poisson



This study was partially supported by grants from the National Science Foundation (CNS-096480 and CNS-1115839).

Supplementary material

Supplementary material 1: 13253_2016_264_MOESM1_ESM.pdf (PDF, 581 KB)


  1. Ali, Z.A., Bursill, C.A., Douglas, G., McNeill, E., Papaspyridonos, M., Tatham, A.L., Bendall, J.K., Akhtar, A.M., Alp, N.J., Greaves, D.R., and Channon, K.M. (2008). CCR2-mediated anti-inflammatory effects of endothelial tetrahydrobiopterin inhibit vascular injury-induced accelerated atherosclerosis. Circulation, 118, S71–S77.
  2. Bai, X., Margariti, A., Hu, Y., Sato, Y., Zeng, L., Ivetic, A., Habi, O., Mason, J.C., Wang, X., and Xu, Q. (2010). Protein kinase Cdelta deficiency accelerates neointimal lesions of mouse injured artery involving delayed reendothelialization and vasohibin-1 accumulation. Arteriosclerosis, Thrombosis, and Vascular Biology, 30, 2467–2474.
  3. Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96, 270–281.
  4. Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association, 91, 883–904.
  5. Dettling, M. (2004). BagBoosting for Tumor Classification with Gene Expression Data. Bioinformatics, 20, 18, 3583–3593.
  6. Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior opinion (with discussions). Bayesian Statist, North-Holland, Amsterdam, 133–156.
  7. Enerson, B.E., Lin, A., Lu, B., Zhao, H., Lawton, M.P., and Floyd, E. (2006). Acute Drug-Induced Vascular Injury in Beagle Dogs: Pathology and Correlating Genomic Expression. Toxicologic Pathology, 34, 27–32.
  8. Fang, Z., Kim, I., and Schaumont, P. (2016). Flexible variable selection for recovering sparsity in nonadditive nonparametric model. Biometrics. doi: 10.1111/biom.12518
  9. Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
  10. Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 7, 473–483.
  11. Goeman, J.J., van de Geer, S.A., de Kort, F., and van Houwelingen, H.C. (2004). A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20, 1, 93–99.
  12. Harris, M.A. et al. (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 32, D258–261.
  13. Hilbe, J. M. (2009). Logistic Regression Models, Boca Raton, FL: Chapman & Hall/CRC.
  14. Hilbe, J. M. (2011). Negative Binomial Regression Extensions, Cambridge University, UK.
  15. Jeffreys, H. (1961). The Theory of Probability, Oxford, New York.
  16. Kaminska, B. (2005). MAPK signalling pathways as molecular targets for anti-inflammatory therapy–from molecular mechanisms to therapeutic benefits. Biochimica et Biophysica Acta, 1754, 253–262.
  17. Kim, I., Pang, H., and Zhao, H. (2012). Bayesian Semiparametric Regression Models for Evaluating Pathway Effects on Clinical Continuous and Binary Outcomes. Statistics in Medicine, 31, 15, 1633–1651.
  18. Laud, P. and Ibrahim, J. (1995). Predictive model selection. Journal of the Royal Statistical Society Series B, 57, 247–262.
  19. Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric Regression of Multidimensional Genetic Pathway Data: Least-Squares Kernel Machines and Linear Mixed Models. Biometrics, 63, 4, 1079–1088.
  20. Maity, A. and Lin, X. (2011). Powerful tests for detecting a gene effect in the presence of possible gene-gene interactions using garrote kernel machines. Biometrics, 67, 1271–1284.
  21. Mallick, B.K. and Gelfand, A.E. (1994). Generalized linear models with unknown link functions. Biometrika, 81, 2, 237–245.
  22. Melaragno, M.G., Wuthrich, D.A., Poppa, V., Gill, D., Lindner, V., Berk, B.C., and Corson, M.A. (1998). Increased expression of Axl tyrosine kinase after vascular injury and regulation by G protein-coupled receptor agonists in rats. Circulation Research, 83, 697–704.
  23. Mootha, V. K., Handschin, C., Arlow, D., Xie, X., Pierre, J. S., Sihag, S., Yang, W., Altshuler, D., Puigserver, P., Patterson, N., Willy, P. J., Schulman, I. G., Heyman, R. A., Lander, E. S., and Spiegelman, B. M. (2004). Errα-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle. Proceedings of the National Academy of Sciences, 101, 6570–6575.
  24. Pang, H., Lin, A., Holford, M., Enerson, B.E., Lu, B., Lawton, M.P., Floyd, E., and Zhao, H. (2006). Pathway analysis using random forests classification and regression. Bioinformatics, 22, 2028–2036.
  25. Pettit, L. I. and Young, K. D. S. (1990). Measuring the effect of observation on Bayes factors. Biometrika, 77, 455–466.
  26. Roberts, G.O. (1999). A note on acceptance rate criteria for CLTs for Metropolis-Hastings algorithms. Journal of Applied Probability, 36, 1210–1217.
  27. Ruusalepp, A., Yan, Z.Q., Carlsen, H., Czibik, G., Hansson, G.K., Moskaug, J.Ø., Blomhoff, R., and Valen, G. (2006). Gene deletion of NF-kappaB p105 enhances neointima formation in a mouse model of carotid artery injury. Cardiovascular Drugs and Therapy, 20, 103–111.
  28. Somjen, D., Kohen, F., Jaffe, A., Amir-Zaltsman, Y., Knoll, E., and Stern, N. (1998). Effects of gonadal steroids and their antagonists on DNA synthesis in human vascular cells. Hypertension, 32, 39–45.
  29. Stingo, F.C., Chen, Y.A., Tadesse, M.G., and Vannucci, M. (2011). Incorporating Biological Information into Linear Models: A Bayesian Approach to the Selection of Pathways and Genes. Annals of Applied Statistics, 5, 1978–2002.
  30. Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., and Mesirov, J.P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102, 43, 15545–15550.
  31. Vanhoutte, P.M. (2010). Regeneration of the endothelium in vascular injury. Cardiovascular Drugs and Therapy, 24, 299–303.
  32. Vecchione, C., Aretini, A., Marino, G., Bettarini, U., Poulet, R., Maffei, A., Sbroggió, M., Pastore, L., Gentile, M.T., Notte, A., Iorio, L., Hirsch, E., Tarone, G., and Lembo, G. (2006). Selective Rac-1 inhibition protects from diabetes-induced vascular injury. Circulation Research, 98, 218–225.

Copyright information

© International Biometric Society 2016

Authors and Affiliations

  1. Department of Statistics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, USA
  2. Division of Epidemiology and Biostatistics, School of Public Health, The University of Hong Kong, Pokfulam, Hong Kong
