Abstract
This paper develops a Bayesian spike and slab approach for zero-inflated count models, which are commonly used in health economics. We account for model uncertainty and allow for model averaging in situations with many potential regressors. The proposed techniques are applied to a German data set on the demand for health care. An accompanying package for the free statistical software environment R is provided.
Notes
The original data are available for download from the Journal of Applied Econometrics Data Archive website (http://econ.queensu.ca/jae/). The version of the data set that is used in this application is also included in the R package that accompanies this article.
Like Greene (2005), who uses the same data set, we changed all observations on health that were recorded between 6 and 7 to 7.
References
Böhning D, Dietz E, Schlattmann P, Mendonça L, Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc Series A (Statistics in Society) 162(2):195–209
Deb P, Munkin MK, Trivedi PK (2006) Private insurance, selection, and health care use: a Bayesian analysis of a Roy-type model. J Bus Econ Stat 24:403–415
Eddelbuettel D, François R (2011) Rcpp: seamless R and C++ integration. J Stat Softw 40:1–18
Gelfand A, Dey D, Chang H (1992) Model determination using predictive distributions with implementation via sampling-based methods. In: Bernardo J, Berger J, Dawid A, Smith A (eds) Bayesian statistics, vol 4. Oxford University Press, Oxford, pp 147–168
George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88:881–889
Geweke J, Keane M (2007) Smoothly mixing regressions. J Econom 138(1):252–290
Gill J, Pang X (2009) Spike and slab prior distributions for simultaneous Bayesian hypothesis testing, model selection, and prediction, of nonlinear outcomes. Working Paper, The Society for Political Methodology
Greene W (2005) Functional form and heterogeneity in models for count data. Found Trends Econom 1:113–218
Grootendorst PV (1995) A comparison of alternative models of prescription drug utilisation. Health Econ 4:183–198
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417
Ishwaran H, Rao JS (2003) Detecting differentially expressed genes in microarrays using Bayesian model selection. J Am Stat Assoc 98(462):438–455
Ishwaran H, Rao JS (2005) Spike and slab variable selection: frequentist and Bayesian strategies. Ann Stat 33(2):730–773
Jochmann M, León-González R (2004) Estimating the demand for health care with panel data: a semiparametric Bayesian approach. Health Econ 13:1003–1014
Koop G (2003) Bayesian econometrics. Wiley, Chichester
Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34:1–14
Liu JS (2001) Monte Carlo strategies in scientific computing. Springer, New York
Madigan D, Raftery AE (1994) Model selection and accounting for model uncertainty in graphical models using Occam’s window. J Am Stat Assoc 89:1535–1546
Mitchell TJ, Beauchamp JJ (1988) Bayesian variable selection in linear regression. J Am Stat Assoc 83:1023–1032
Mullahy J (1986) Specification and testing of some modified count data models. J Econom 33:341–365
Pizer SD, Prentice JC (2011) Time is money: outpatient waiting times and health insurance choices of elderly veterans in the United States. J Health Econ 30:626–636
R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Riphahn RT, Wambach A, Million A (2003) Incentive effects in the demand for health care: a bivariate panel count data estimation. J Appl Econom 18:387–405
Robert CP, Casella G (2004) Monte Carlo statistical methods, 2nd edn. Springer, New York
Sanderson C (2010) Armadillo: an open source C++ linear algebra library for fast prototyping and computationally intensive experiments. Technical Report, NICTA
Sari N (2009) Physical inactivity and its impact on healthcare utilization. Health Econ 18:885–901
Street A, Jones A, Furuta A (1999) Cost-sharing and pharmaceutical utilisation and expenditure in Russia. J Health Econ 18:459–472
Tanner MA, Wong W (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82:528–550
Wagner GG, Frick JR, Schupp J (2007) The German socio-economic panel study (SOEP) - scope, evolution and enhancements. Schmollers Jahrbuch 127:139–169
Additional information
We thank Gary Koop and two anonymous referees for their valuable comments.
Appendix: Details of the sampling algorithm
The proposed Gibbs sampling algorithm consists of the following steps:
1. Sample \((\alpha ,\varvec{\beta }^{\prime })^{\prime }\) from \(\varphi (\alpha ,\varvec{\beta }|\varvec{\eta }^*,\sigma ^2,\varvec{\tau }^\beta ,\varvec{\kappa }^\beta )=\mathrm{N}[(\alpha ,\varvec{\beta }^{\prime })^{\prime }|\varvec{\mu },\varvec{\varSigma }]\) with variance
$$\begin{aligned} \varvec{\varSigma }=\left(\varvec{B}^{-1}+\frac{1}{\sigma ^2}\sum _{i=1}^N{\tilde{{\varvec{x}}}}_i{\tilde{{\varvec{x}}}}_i^{\prime }\right)^{-1} \end{aligned}\tag{20}$$
and mean
$$\begin{aligned} \varvec{\mu }=\varvec{\varSigma }\left(\frac{1}{\sigma ^2}\sum _{i=1}^N{\tilde{{\varvec{x}}}}_i\eta _i^*\right), \end{aligned}\tag{21}$$
where \(\varvec{B}\equiv \mathrm{diag}(a_0,\tau _1^\beta \kappa _1^\beta ,\ldots ,\tau _K^\beta \kappa _K^\beta )\) and \({\tilde{{\varvec{x}}}}_i=(1,{\varvec{{x}}}_i^{\prime })^{\prime }\).
2. Draw \(\sigma ^2\) from \(\varphi (\sigma ^2|\varvec{\eta }^*,\alpha ,\varvec{\beta })=\text{Inv-Gamma}\left(e_0+\frac{N}{2},f_0+\frac{\sum _{i=1}^N(\eta _i^*-\alpha -{\varvec{{x}}}_i^{\prime }\varvec{\beta })^2}{2}\right)\).
3. Sample \((\gamma ,\varvec{\delta }^{\prime })^{\prime }\) from \(\varphi (\gamma ,\varvec{\delta }|\varvec{d}^*,\varvec{\tau }^\delta ,\varvec{\kappa }^\delta )=\mathrm{N}[(\gamma ,\varvec{\delta }^{\prime })^{\prime }|\varvec{\mu },\varvec{\varSigma }]\) with variance
$$\begin{aligned} \varvec{\varSigma }=\left(\varvec{D}^{-1}+\sum _{i=1}^N{\tilde{{\varvec{x}}}}_i{\tilde{{\varvec{x}}}}_i^{\prime }\right)^{-1} \end{aligned}\tag{22}$$
and mean
$$\begin{aligned} \varvec{\mu }=\varvec{\varSigma }\left(\sum _{i=1}^N{\tilde{{\varvec{x}}}}_id_i^*\right), \end{aligned}\tag{23}$$
where \(\varvec{D}\equiv \mathrm{diag}(c_0,\tau _1^\delta \kappa _1^\delta ,\ldots ,\tau _K^\delta \kappa _K^\delta )\) and \({\tilde{{\varvec{x}}}}_i=(1,{\varvec{{x}}}_i^{\prime })^{\prime }\).
4. For \(i=1,\ldots ,N\) sample \((\eta _i^*,d_i^*)\). If \(y_i=0\), first draw \(\eta _i^*\) from
$$\begin{aligned} \varphi (\eta _i^*|\alpha ,\varvec{\beta },\sigma ^{2},\gamma ,\varvec{\delta })&\propto \left\{ 1-\Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })+\exp [-\exp (\eta _i^*)]\,\Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })\right\} \nonumber \\&\quad \times \mathrm{N}(\eta ^*_i|\alpha +{\varvec{{x}}}_i^{\prime }\varvec{\beta },\sigma ^2), \end{aligned}\tag{24}$$
where \(\Phi (\cdot )\) denotes the standard-normal CDF. Second, draw \(d_i^*\) from
$$\begin{aligned} \varphi (d_i^*|\eta _i^*,\alpha ,\varvec{\beta },\sigma ^{2},\gamma ,\varvec{\delta })&= \frac{\rho _0}{\rho _0+\rho _1}\mathrm{TN}^-(\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta },1,0)\nonumber \\&\quad +\frac{\rho _1}{\rho _0+\rho _1}\mathrm{TN}^+(\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta },1,0) \end{aligned}\tag{25}$$
with \(\rho _0=1-\Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })\) and \(\rho _1=\Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })\exp [-\exp (\eta _i^*)]\). Here \(\mathrm{TN}^-(\mu ,\sigma ^2,a)\) and \(\mathrm{TN}^+(\mu ,\sigma ^2,a)\) denote Normal distributions with mean \(\mu \) and variance \(\sigma ^2\) that are truncated at the right and at the left at \(a\), respectively, so that their supports are \((-\infty ,a]\) and \([a,\infty )\). If \(y_i>0\), first draw \(\eta _i^*\) from
$$\begin{aligned} \varphi (\eta _i^*|\alpha ,\varvec{\beta },\sigma ^{2},\gamma ,\varvec{\delta })\propto \frac{\exp [-\exp (\eta ^*_i)]\exp (\eta ^*_iy_i)}{y_i!}\;\mathrm{N}(\eta ^*_i|\alpha +{\varvec{{x}}}_i^{\prime }\varvec{\beta },\sigma ^2). \end{aligned}\tag{26}$$
Second, draw \(d_i^*\) from \(\varphi (d_i^*|\gamma ,\varvec{\delta })=\mathrm{TN}^+(\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta },1,0)\).
5. For \(k=1,\ldots ,K\) sample \(\kappa _k^\beta \) from \(\varphi (\kappa _k^\beta |\beta _k,\tau _k^\beta )=\text{Inv-Gamma}\left(g_0^\beta +\frac{1}{2},h_0^\beta +\frac{\beta _k^2}{2\tau _k^\beta }\right)\).
6. For \(k=1,\ldots ,K\) sample \(\tau _k^\beta \) from
$$\begin{aligned} \varphi (\tau _k^\beta |\beta _k,\kappa _k^\beta ,\omega ^\beta )=\frac{\omega _{0k}}{\omega _{0k}+\omega _{1k}}\delta _{\nu ^\beta _0}+\frac{\omega _{1k}}{\omega _{0k}+\omega _{1k}}\delta _1 \end{aligned}$$
with
$$\begin{aligned} \omega _{0k}=(1-\omega ^\beta )(\nu _0^\beta )^{-\frac{1}{2}}\exp \left(-\frac{\beta ^2_k}{2\nu _0^\beta \kappa _k^\beta }\right) \end{aligned}$$
and
$$\begin{aligned} \omega _{1k}=\omega ^\beta \exp \left(-\frac{\beta ^2_k}{2\kappa _k^\beta }\right), \end{aligned}$$
where \(\delta _a\) denotes a point mass at \(a\).
7. Sample \(\omega ^\beta \) from \(\varphi (\omega ^\beta |\varvec{\tau }^\beta )=\mathrm{Beta}\left(1+\#\{k:\tau _k^\beta =1\},1+\#\{k:\tau _k^\beta =\nu ^\beta _0\}\right)\).
8. For \(k=1,\ldots ,K\) sample \(\kappa _k^\delta \) from \(\varphi (\kappa _k^\delta |\delta _k,\tau _k^\delta )=\text{Inv-Gamma}\left(g_0^\delta +\frac{1}{2},h_0^\delta +\frac{\delta _k^2}{2\tau _k^\delta }\right)\).
9. For \(k=1,\ldots ,K\) sample \(\tau _k^\delta \) from
$$\begin{aligned} \varphi (\tau _k^\delta |\delta _k,\kappa _k^\delta ,\omega ^\delta )=\frac{\omega _{0k}}{\omega _{0k}+\omega _{1k}}\delta _{\nu ^\delta _0}+\frac{\omega _{1k}}{\omega _{0k}+\omega _{1k}}\delta _1 \end{aligned}$$
with
$$\begin{aligned} \omega _{0k}=(1-\omega ^\delta )(\nu _0^\delta )^{-\frac{1}{2}}\exp \left(-\frac{\delta ^2_k}{2\nu _0^\delta \kappa _k^\delta }\right) \end{aligned}$$
and
$$\begin{aligned} \omega _{1k}=\omega ^\delta \exp \left(-\frac{\delta ^2_k}{2\kappa _k^\delta }\right), \end{aligned}$$
where \(\delta _{\nu ^\delta _0}\) and \(\delta _1\) are point masses at \(\nu ^\delta _0\) and 1 (not the coefficients \(\delta _k\)).
10. Sample \(\omega ^\delta \) from \(\varphi (\omega ^\delta |\varvec{\tau }^\delta )=\mathrm{Beta}\left(1+\#\{k:\tau _k^\delta =1\},1+\#\{k:\tau _k^\delta =\nu ^\delta _0\}\right)\).
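To make the conjugate updates concrete, the following Python sketch illustrates steps 1, 2 and 5 to 7; steps 8 to 10 have the same form with \(\delta \) in place of \(\beta \). It is an illustration only and not the code of the accompanying R package. All function names, argument names and hyperparameter values are hypothetical, and Inv-Gamma\((a,b)\) is assumed to be parameterized by shape \(a\) and scale \(b\). Step 4 is omitted because the density of \(\eta _i^*\) is non-standard. For the demonstration loop, \(\varvec{\eta }^*\) is simulated and held fixed rather than updated.

```python
import numpy as np


def inv_gamma(rng, shape, scale):
    # Assumption: Inv-Gamma(a, b) has density proportional to x^(-a-1) exp(-b/x),
    # so a draw is the reciprocal of a Gamma variate with shape a and rate b.
    return 1.0 / rng.gamma(shape, 1.0 / scale)


def draw_coefficients(rng, X, eta_star, sigma2, prior_var):
    # Step 1: (alpha, beta')' ~ N(mu, Sigma); prior_var holds the diagonal of B.
    X_tilde = np.column_stack([np.ones(X.shape[0]), X])  # rows are (1, x_i')
    Sigma = np.linalg.inv(np.diag(1.0 / prior_var) + X_tilde.T @ X_tilde / sigma2)
    mu = Sigma @ (X_tilde.T @ eta_star) / sigma2
    return rng.multivariate_normal(mu, Sigma)


def draw_sigma2(rng, X, eta_star, alpha, beta, e0, f0):
    # Step 2: Inv-Gamma(e0 + N/2, f0 + sum of squared residuals / 2).
    resid = eta_star - alpha - X @ beta
    return inv_gamma(rng, e0 + 0.5 * len(eta_star), f0 + 0.5 * resid @ resid)


def draw_kappa(rng, coef, tau, g0, h0):
    # Step 5 (and 8): element-wise Inv-Gamma(g0 + 1/2, h0 + coef_k^2 / (2 tau_k)).
    return inv_gamma(rng, g0 + 0.5, h0 + coef**2 / (2.0 * tau))


def draw_tau(rng, coef, kappa, omega, nu0):
    # Step 6 (and 9): two-point draw on {nu0, 1}; weights are computed on the
    # log scale so that a very small nu0 does not cause underflow.
    log_w0 = np.log1p(-omega) - 0.5 * np.log(nu0) - coef**2 / (2.0 * nu0 * kappa)
    log_w1 = np.log(omega) - coef**2 / (2.0 * kappa)
    prob_slab = np.exp(-np.logaddexp(0.0, log_w0 - log_w1))  # w1 / (w0 + w1)
    return np.where(rng.random(coef.shape) < prob_slab, 1.0, nu0)


def draw_omega(rng, tau):
    # Step 7 (and 10): Beta(1 + #{tau_k = 1}, 1 + #{tau_k = nu0}).
    n_slab = int(np.sum(tau == 1.0))
    return rng.beta(1 + n_slab, 1 + tau.size - n_slab)


# Demonstration on simulated data; every numeric value below is hypothetical.
rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))
eta_star = 0.5 + X @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=N)
tau, kappa, omega, sigma2, nu0 = np.ones(K), np.ones(K), 0.5, 1.0, 0.005
for _ in range(100):
    prior_var = np.concatenate([[10.0], tau * kappa])  # a_0 = 10 (hypothetical)
    draw = draw_coefficients(rng, X, eta_star, sigma2, prior_var)
    alpha, beta = draw[0], draw[1:]
    sigma2 = draw_sigma2(rng, X, eta_star, alpha, beta, e0=2.0, f0=1.0)
    kappa = draw_kappa(rng, beta, tau, g0=5.0, h0=50.0)
    tau = draw_tau(rng, beta, kappa, omega, nu0)
    omega = draw_omega(rng, tau)
```

In a full sampler the loop would also update \((\gamma ,\varvec{\delta }^{\prime })^{\prime }\) and the latent \((\eta _i^*,d_i^*)\) in every sweep. Working with log weights in `draw_tau` is a numerical choice of this sketch, not something stated in the algorithm above.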
Cite this article
Jochmann, M. What belongs where? Variable selection for zero-inflated count models with an application to the demand for health care. Comput Stat 28, 1947–1964 (2013). https://doi.org/10.1007/s00180-012-0388-z
Keywords
- Bayesian
- Spike and slab model
- Model uncertainty
- Model averaging
- Count data
- Zero-inflation
- Demand for health care