Bayesian Semiparametric Model for Pathway-Based Analysis with Zero-Inflated Clinical Outcomes

Journal of Agricultural, Biological and Environmental Statistics

Abstract

In this paper, we propose a semiparametric regression approach for identifying pathways related to zero-inflated clinical outcomes, where a pathway is a gene set derived from prior biological knowledge. Our approach is developed within a Bayesian hierarchical framework. We incorporate the pathway effect nonparametrically into a zero-inflated Poisson hierarchical regression model with an unknown link function. The nonparametric pathway effect is estimated via a kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative distribution functions. Our approach provides a flexible nonparametric setting for describing the complicated association between gene expression and zero-inflated clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and the Bayes factor are used for statistical inference. Our simulation results suggest that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function, especially when the number of genes is large. The usefulness of our approach is demonstrated through its application to the canine data set from Enerson et al. (Toxicol Pathol 34:27–32, 2006). Our approach can also be applied to other settings where a large number of highly correlated predictors are present.
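As a rough illustration of two ingredients named in the abstract, the sketch below evaluates a zero-inflated Poisson log-probability and builds a kernel matrix over gene expression vectors. It is not the authors' implementation: the function names, the Gaussian kernel, and the scale parameter `rho` are assumptions made for illustration, and the Poisson mean is passed in directly rather than obtained through the estimated link function described in the paper.

```python
import math

def zip_log_pmf(y, pi, lam):
    """Log-probability of count y under a zero-inflated Poisson model.

    pi  : probability of a structural zero (0 <= pi < 1)
    lam : Poisson mean (> 0)
    """
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # Positive counts can only come from the Poisson component.
    return math.log(1.0 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)

def zip_log_likelihood(ys, pi, lams):
    """Sum of ZIP log-probabilities, one Poisson mean per observation."""
    return sum(zip_log_pmf(y, pi, lam) for y, lam in zip(ys, lams))

def gaussian_kernel(X, rho):
    """Kernel matrix K[i][j] = exp(-||x_i - x_j||^2 / rho).

    The Gaussian form and the name rho are assumptions for this sketch.
    """
    n = len(X)
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / rho)
         for j in range(n)]
        for i in range(n)
    ]
```

In a kernel machine formulation, a matrix such as `gaussian_kernel(X, rho)` would typically serve as the covariance of the nonparametric pathway effect across subjects, so that subjects with similar expression profiles receive similar effects.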

Supplementary materials accompanying this paper appear online.

References

  • Ali, Z.A., Bursill, C.A., Douglas, G., McNeill, E., Papaspyridonos, M., Tatham, A.L., Bendall, J.K., Akhtar, A.M., Alp, N.J., Greaves, D.R., and Channon, K.M. (2008). CCR2-mediated anti-inflammatory effects of endothelial tetrahydrobiopterin inhibit vascular injury-induced accelerated atherosclerosis. Circulation, 118, S71–S77.

  • Bai, X., Margariti, A., Hu, Y., Sato, Y., Zeng, L., Ivetic, A., Habi, O., Mason, J.C., Wang, X., and Xu, Q. (2010). Protein kinase Cdelta deficiency accelerates neointimal lesions of mouse injured artery involving delayed reendothelialization and vasohibin-1 accumulation. Arteriosclerosis, Thrombosis, and Vascular Biology, 30, 2467–74.

  • Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96, 270–281.

  • Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association, 91, 883–904.

  • Dettling, M. (2004). BagBoosting for Tumor Classification with Gene Expression Data. Bioinformatics, 20(18), 3583–3593.

  • Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior opinion (with discussions). Bayesian Statist, North-Holland, Amsterdam, 133–156.

  • Enerson, B.E., Lin, A., Lu, B., Zhao, H., Lawton, M.P., and Floyd, E. (2006). Acute Drug-Induced Vascular Injury in Beagle Dogs: Pathology and Correlating Genomic Expression. Toxicologic Pathology, 34, 27–32.

  • Fang, Z., Kim, I., and Schaumont, P. (2016). Flexible variable selection for recovering sparsity in nonadditive nonparametric model. Biometrics. doi:10.1111/biom.12518

  • Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.

  • Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 7, 473–483.

  • Goeman, J.J., van de Geer, S.A., de Kort, F., and van Houwelingen, H.C. (2004). A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20(1), 93–99.

  • Harris, M.A. et al. (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 32, D258–261.

  • Hilbe, J. M. (2009). Logistic Regression Models, Boca Raton, FL: Chapman & Hall/CRC.

  • Hilbe, J. M. (2011). Negative Binomial Regression Extensions, Cambridge University Press, UK.

  • Jeffreys, H. (1961). The Theory of Probability, Oxford, New York.

  • Kaminska, B. (2005). MAPK signalling pathways as molecular targets for anti-inflammatory therapy–from molecular mechanisms to therapeutic benefits. Biochimica et Biophysica Acta, 1754, 253–262.

  • Kim, I., Pang, H., and Zhao, H. (2012). Bayesian Semiparametric Regression Models for Evaluating Pathway Effects on Clinical Continuous and Binary Outcomes. Statistics in Medicine, 31(15), 1633–1651.

  • Laud, P. and Ibrahim, J. (1995). Predictive model selection. Journal of the Royal Statistical Society Series B, 57, 247–262.

  • Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric Regression of Multidimensional Genetic Pathway Data: Least-Squares Kernel Machines and Linear Mixed Models. Biometrics, 63(4), 1079–1088.

  • Maity, A. and Lin, X. (2011). Powerful tests for detecting a gene effect in the presence of possible gene-gene interactions using garrote kernel machines. Biometrics, 67, 1271–1284.

  • Mallick, B.K., and Gelfand, A.E. (1994). Generalized linear models with unknown link functions. Biometrika, 81(2), 237–245.

  • Melaragno, M.G., Wuthrich, D.A., Poppa, V., Gill, D., Lindner, V., Berk, B.C., and Corson, M.A. (1998). Increased expression of Axl tyrosine kinase after vascular injury and regulation by G protein-coupled receptor agonists in rats. Circulation Research, 83, 697–704.

  • Mootha, V. K., Handschin, C., Arlow, D., Xie, X., Pierre, J. S., Sihag, S., Yang, W., Altshuler, D., Puigserver, P., Patterson, N., Willy, P. J., Schulman, I. G., Heyman, R. A., Lander, E. S., and Spiegelman, B. M. (2004). Errα-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle. Proceedings of the National Academy of Sciences, 101, 6570–6575.

  • Pang, H., Lin, A., Holford, M., Enerson, B.E., Lu, B., Lawton, M.P., Floyd, E., and Zhao, H. (2006). Pathway analysis using random forests classification and regression. Bioinformatics, 22, 2028–2036.

  • Pettit, L. I., and Young, K. D. S. (1990). Measuring the effect of observation on Bayes factors. Biometrika, 77, 455–466.

  • Roberts, G.O. (1999). A note on acceptance rate criteria for CLTs for Metropolis-Hastings algorithms. Journal of Applied Probability, 36, 1210–1217.

  • Ruusalepp, A., Yan, Z.Q., Carlsen, H., Czibik, G., Hansson, G.K., Moskaug, J.Ø., Blomhoff, R., and Valen, G. (2006). Gene deletion of NF-kappaB p105 enhances neointima formation in a mouse model of carotid artery injury. Cardiovascular Drugs and Therapy, 20, 103–111.

  • Somjen, D., Kohen, F., Jaffe, A., Amir-Zaltsman, Y., Knoll, E., and Stern, N. (1998). Effects of gonadal steroids and their antagonists on DNA synthesis in human vascular cells. Hypertension, 32, 39–45.

  • Stingo, F.C., Chen, Y.A., Tadesse, M.G. and Vannucci, M. (2011). Incorporating Biological Information into Linear Models: A Bayesian Approach to the Selection of Pathways and Genes. Annals of Applied Statistics, 5, 1978–2002.

  • Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., and Mesirov, J.P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545–15550.

  • Vanhoutte, P.M. (2010). Regeneration of the endothelium in vascular injury. Cardiovascular Drugs and Therapy, 24, 299–303.

  • Vecchione, C., Aretini, A., Marino, G., Bettarini, U., Poulet, R., Maffei, A., Sbroggió, M., Pastore, L., Gentile, M.T., Notte, A., Iorio, L., Hirsch, E., Tarone, G., and Lembo, G. (2006). Selective Rac-1 inhibition protects from diabetes-induced vascular injury. Circulation Research, 98, 218–225.

Acknowledgments

This study was partially supported by grants from the National Science Foundation (CNS-096480 and CNS-1115839).

Author information

Corresponding author

Correspondence to Inyoung Kim.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 581 KB)

About this article

Cite this article

Cheng, L., Kim, I. & Pang, H. Bayesian Semiparametric Model for Pathway-Based Analysis with Zero-Inflated Clinical Outcomes. JABES 21, 641–662 (2016). https://doi.org/10.1007/s13253-016-0264-3

  • DOI: https://doi.org/10.1007/s13253-016-0264-3
