Abstract
In this paper, we propose a semiparametric regression approach for identifying pathways related to zero-inflated clinical outcomes, where a pathway is a gene set derived from prior biological knowledge. Our approach is developed within a Bayesian hierarchical framework. We model the pathway effect nonparametrically in a zero-inflated Poisson hierarchical regression model with an unknown link function. The nonparametric pathway effect is estimated via a kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative distribution functions. Our approach provides flexible nonparametric settings to describe the complicated association between gene expressions and zero-inflated clinical outcomes. A Metropolis-within-Gibbs sampling algorithm and Bayes factors are adopted to make statistical inferences. Our simulation results indicate that our semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function, especially when the number of genes is large. The usefulness of our approach is demonstrated through its application to the canine data set from Enerson et al. (Toxicol Pathol 34:27–32, 2006). Our approach can also be applied to other settings where a large number of highly correlated predictors are present.
Supplementary materials accompanying this paper appear on-line.
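The building blocks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian form of the kernel with scale parameter rho, and the restriction of the beta shape parameters to positive integers (which allows a closed-form binomial-sum expression for the beta cumulative distribution function) are all assumptions made here for illustration. The sketch shows a zero-inflated Poisson log-probability, one kernel that a kernel machine might use to compare two gene expression profiles, and a monotone map on [0, 1] formed as a weighted mixture of beta cumulative distribution functions.

```python
import math


def zip_logpmf(y, lam, pi):
    """Log-probability of count y under a zero-inflated Poisson.

    pi is the probability of a structural zero and lam is the Poisson mean.
    """
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return (math.log(1.0 - pi) - lam + y * math.log(lam)
            - math.lgamma(y + 1))


def gaussian_kernel(x, z, rho):
    """Gaussian kernel exp(-||x - z||^2 / rho); rho is an assumed scale."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / rho)


def beta_cdf_int(u, a, b):
    """Beta(a, b) CDF at u in [0, 1] for positive integers a and b.

    Uses the identity I_u(a, b) = P(Binomial(a + b - 1, u) >= a).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * u ** j * (1.0 - u) ** (n - j)
               for j in range(a, n + 1))


def beta_mixture(u, weights, shapes):
    """Weighted mixture of beta CDFs: a monotone map from [0, 1] to [0, 1].

    weights are assumed to be nonnegative and to sum to one.
    """
    return sum(w * beta_cdf_int(u, a, b)
               for w, (a, b) in zip(weights, shapes))
```

Because each beta cumulative distribution function is nondecreasing and the weights are nonnegative, the mixture is itself nondecreasing, which is the property that makes such mixtures convenient for representing an unknown monotone link. How the mixture is transformed to and from the scale of the Poisson mean is not specified in the abstract and is left out of this sketch.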
References
Ali, Z.A., Bursill, C.A., Douglas, G., McNeill, E., Papaspyridonos, M., Tatham, A.L., Bendall, J.K., Akhtar, A.M., Alp, N.J., Greaves, D.R., and Channon, K.M. (2008). CCR2-mediated anti-inflammatory effects of endothelial tetrahydrobiopterin inhibit vascular injury-induced accelerated atherosclerosis. Circulation, 118, S71–S77.
Bai, X., Margariti, A., Hu, Y., Sato, Y., Zeng, L., Ivetic, A., Habi, O., Mason, J.C., Wang, X., and Xu, Q. (2010). Protein kinase Cdelta deficiency accelerates neointimal lesions of mouse injured artery involving delayed reendothelialization and vasohibin-1 accumulation. Arteriosclerosis, Thrombosis, and Vascular Biology, 30, 2467–74.
Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96, 270–281.
Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association, 91, 883–904.
Dettling, M. (2004). BagBoosting for Tumor Classification with Gene Expression Data. Bioinformatics, 20, 18, 3583–3593.
Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior opinion (with discussions). Bayesian Statist, North-Holland, Amsterdam, 133–156.
Enerson, B.E., Lin, A., Lu, B., Zhao, H., Lawton, M.P., and Floyd, E. (2006). Acute Drug-Induced Vascular Injury in Beagle Dogs: Pathology and Correlating Genomic Expression. Toxicologic Pathology, 34, 27–32.
Fang, Z., Kim, I., and Schaumont, P. (2016). Flexible variable selection for recovering sparsity in nonadditive nonparametric model. Biometrics. doi:10.1111/biom.12518
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511.
Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 7, 473–483.
Goeman, J.J., van de Geer, S.A., de Kort, F., and van Houwelingen, H.C. (2004). A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20, 1, 93–99.
Harris, M.A. et al (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 32, D258–261.
Hilbe, J. M. (2009). Logistic Regression Models, Boca Raton, FL: Chapman & Hall/CRC.
Hilbe, J. M. (2011). Negative Binomial Regression Extensions, Cambridge University Press, UK.
Jeffreys, H. (1961). The Theory of Probability, Oxford, New York.
Kaminska, B. (2005). MAPK signalling pathways as molecular targets for anti-inflammatory therapy: from molecular mechanisms to therapeutic benefits. Biochimica et Biophysica Acta, 1754, 253–262.
Kim, I., Pang, H., and Zhao, H. (2012). Bayesian Semiparametric Regression Models for Evaluating Pathway Effects on Clinical Continuous and Binary Outcomes. Statistics in Medicine, 31, 15, 1633–1651.
Laud, P. and Ibrahim, J. (1995). Predictive model selection. Journal of the Royal Statistical Society Series B, 57, 247–262.
Liu, D., Lin, X., and Ghosh, D. (2007). Semiparametric Regression of Multidimensional Genetic Pathway Data: Least-Squares Kernel Machines and Linear Mixed Models. Biometrics, 63, 4, 1079–1088.
Maity, A. and Lin, X. (2011). Powerful tests for detecting a gene effect in the presence of possible gene-gene interactions using garrote kernel machines. Biometrics, 67, 1271–1284.
Mallick, B.K., and Gelfand, A.E. (1994). Generalized linear models with unknown link functions. Biometrika, 81, 2, 237–245.
Melaragno, M.G., Wuthrich, D.A., Poppa, V., Gill, D., Lindner, V., Berk, B.C., and Corson, M.A. (1998). Increased expression of Axl tyrosine kinase after vascular injury and regulation by G protein-coupled receptor agonists in rats. Circulation Research, 83, 697–704.
Mootha, V. K., Handschin, C., Arlow, D., Xie, X., Pierre, J. S., Sihag, S., Yang, W., Altshuler, D., Puigserver, P., Patterson, N., Willy, P. J., Schulman, I. G., Heyman, R. A., Lander, E. S., and Spiegelman, B. M. (2004). Errα-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle. Proceedings of the National Academy of Sciences, 101, 6570–6575.
Pang, H., Lin, A., Holford, M., Enerson, B.E., Lu, B., Lawton, M.P., Floyd, E., and Zhao, H. (2006). Pathway analysis using random forests classification and regression. Bioinformatics, 22, 2028–2036.
Pettit, L. I., and Young, K. D. S. (1990). Measuring the effect of observation on Bayes factors. Biometrika, 77, 455–466.
Roberts, G.O. (1999). A note on acceptance rate criteria for CLTs for Metropolis-Hastings algorithms. Journal of Applied Probability, 36, 1210–1217.
Ruusalepp, A., Yan, Z.Q., Carlsen, H., Czibik, G., Hansson, G.K., Moskaug, J.Ø., Blomhoff, R., and Valen, G. (2006). Gene deletion of NF-kappaB p105 enhances neointima formation in a mouse model of carotid artery injury. Cardiovascular Drugs and Therapy, 20, 103–111.
Somjen, D., Kohen, F., Jaffe, A., Amir-Zaltsman, Y., Knoll, E., and Stern, N. (1998). Effects of gonadal steroids and their antagonists on DNA synthesis in human vascular cells. Hypertension, 32, 39–45.
Stingo, F.C., Chen, Y.A., Tadesse, M.G. and Vannucci, M. (2011). Incorporating Biological Information into Linear Models: A Bayesian Approach to the Selection of Pathways and Genes. Annals of Applied Statistics, 5, 1978–2002.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., and Mesirov, J.P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102, 43, 15545–15550.
Vanhoutte, P.M. (2010). Regeneration of the endothelium in vascular injury. Cardiovascular Drugs and Therapy, 24, 299–303.
Vecchione, C., Aretini, A., Marino, G., Bettarini, U., Poulet, R., Maffei, A., Sbroggió, M., Pastore, L., Gentile, M.T., Notte, A., Iorio, L., Hirsch, E., Tarone, G., and Lembo, G. (2006). Selective Rac-1 inhibition protects from diabetes-induced vascular injury. Circulation Research, 98, 218–225.
Acknowledgments
This study was partially supported by grants from the National Science Foundation (CNS-096480 and CNS-1115839).
Cite this article
Cheng, L., Kim, I. & Pang, H. Bayesian Semiparametric Model for Pathway-Based Analysis with Zero-Inflated Clinical Outcomes. JABES 21, 641–662 (2016). https://doi.org/10.1007/s13253-016-0264-3
DOI: https://doi.org/10.1007/s13253-016-0264-3