What belongs where? Variable selection for zero-inflated count models with an application to the demand for health care

  • Original Paper
  • Computational Statistics

Abstract

This paper develops a Bayesian spike and slab model for zero-inflated count models, which are commonly used in health economics. We account for model uncertainty and allow for model averaging in situations with many potential regressors. The proposed techniques are applied to a German data set on the demand for health care. An accompanying package for the free statistical software environment R is provided.

Notes

  1. For a general introduction to Bayesian inference, see Koop (2003). Examples of Bayesian methods applied to count data in health economics include Jochmann and León-González (2004) and Deb et al. (2006).

  2. The original data are available for download from the Journal of Applied Econometrics Data Archive website (http://econ.queensu.ca/jae/). The version of the data set that is used in this application is also included in the R package that accompanies this article.

  3. Like Greene (2005), who uses the same data set, we changed all observations on health that were recorded between 6 and 7 to 7.

References

  • Böhning D, Dietz E, Schlattmann P, Mendonça L, Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc Series A (Statistics in Society) 162(2):195–209

  • Deb P, Munkin MK, Trivedi PK (2006) Private insurance, selection, and health care use: a Bayesian analysis of a Roy-type model. J Bus Econ Stat 24:403–415

  • Eddelbuettel D, François R (2011) Rcpp: seamless R and C++ integration. J Stat Softw 40:1–18

  • Gelfand A, Dey D, Chang H (1992) Model determination using predictive distributions with implementation via sampling-based methods. In: Bernardo J, Berger J, Dawid A, Smith A (eds) Bayesian statistics, vol 4. Oxford University Press, Oxford, pp 147–168

  • George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88:881–889

  • Geweke J, Keane M (2007) Smoothly mixing regressions. J Econom 138(1):252–290

  • Gill J, Pang X (2009) Spike and slab prior distributions for simultaneous Bayesian hypothesis testing, model selection, and prediction, of nonlinear outcomes. Working Paper, The Society for Political Methodology

  • Greene W (2005) Functional form and heterogeneity in models for count data. Found Trends Econom 1:113–218

  • Grootendorst PV (1995) A comparison of alternative models of prescription drug utilisation. Health Econ 4:183–198

  • Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417

  • Ishwaran H, Rao JS (2003) Detecting differentially expressed genes in microarrays using Bayesian model selection. J Am Stat Assoc 98(462):438–455

  • Ishwaran H, Rao JS (2005) Spike and slab variable selection: frequentist and Bayesian strategies. Ann Stat 33(2):730–773

  • Jochmann M, León-González R (2004) Estimating the demand for health care with panel data: a semiparametric Bayesian approach. Health Econ 13:1003–1014

  • Koop G (2003) Bayesian econometrics. Wiley, Chichester

  • Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34:1–14

  • Liu JS (2001) Monte Carlo strategies in scientific computing. Springer, New York

  • Madigan D, Raftery AE (1994) Model selection and accounting for model uncertainty in graphical models using Occam’s window. J Am Stat Assoc 89:1535–1546

  • Mitchell TJ, Beauchamp JJ (1988) Bayesian variable selection in linear regression. J Am Stat Assoc 83:1023–1032

  • Mullahy J (1986) Specification and testing of some modified count data models. J Econom 33:341–365

  • Pizer SD, Prentice JC (2011) Time is money: outpatient waiting times and health insurance choices of elderly veterans in the United States. J Health Econ 30:626–636

  • R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Riphahn RT, Wambach A, Million A (2003) Incentive effects in the demand for health care: a bivariate panel count data estimation. J Appl Econom 18:387–405

  • Robert CP, Casella G (2004) Monte Carlo statistical methods, 2nd edn. Springer, New York

  • Sanderson C (2010) Armadillo: an open source C++ linear algebra library for fast prototyping and computationally intensive experiments. Technical Report, NICTA

  • Sari N (2009) Physical inactivity and its impact on healthcare utilization. Health Econ 18:885–901

  • Street A, Jones A, Furuta A (1999) Cost-sharing and pharmaceutical utilisation and expenditure in Russia. J Health Econ 18:459–472

  • Tanner MA, Wong W (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82:528–550

  • Wagner GG, Frick JR, Schupp J (2007) The German socio-economic panel study (SOEP) - scope, evolution and enhancements. Schmollers Jahrbuch 127:139–169

Author information

Correspondence to Markus Jochmann.

Additional information

We thank Gary Koop and two anonymous referees for their valuable comments.

Appendix: Details of the sampling algorithm

The proposed Gibbs sampling algorithm consists of the following steps:

  1. Sample \((\alpha ,\varvec{\beta }^{\prime })^{\prime }\) from \(\varphi (\alpha ,\varvec{\beta }|\varvec{\eta }^*,\sigma ^2,\varvec{\tau }^\beta , \varvec{\kappa }^\beta )=\mathrm{N}[(\alpha ,\varvec{\beta }^{\prime })^{\prime }|\varvec{\mu },\varvec{\varSigma }]\) with variance

    $$\begin{aligned} \varvec{\varSigma }=\left(\varvec{B}^{-1}+\frac{1}{\sigma ^2}\sum _{i=1} ^N{\tilde{{\varvec{x}}}}_i{\tilde{{\varvec{x}}}}_i^{\prime }\right)^{-1} \end{aligned}$$
    (20)

    and mean

    $$\begin{aligned} \varvec{\mu }=\varvec{\varSigma }\left(\frac{1}{\sigma ^2} \sum _{i=1}^N{\tilde{{\varvec{x}}}}_i\eta _i^*\right), \end{aligned}$$
    (21)

    where \(\varvec{B}\equiv \mathrm{diag}(a_0,\tau _1^\beta \kappa _1^\beta , \ldots ,\tau _K^\beta \kappa _K^\beta )\) and \({\tilde{{\varvec{x}}}}_i=(1,{\varvec{{x}}}_i^{\prime })^{\prime }\).

  2. Draw \(\sigma ^2\) from \(\varphi (\sigma ^2|\varvec{\eta }^*,\alpha ,\varvec{\beta }) =\text{Inv-Gamma}\left(e_0+\frac{N}{2},f_0+\frac{\sum _{i=1} ^N(\eta _i^*-\alpha -{\varvec{{x}}}_i^{\prime }\varvec{\beta })^2}{2}\right)\).

  3. Sample \((\gamma ,\varvec{\delta }^{\prime })^{\prime }\) from \(\varphi (\gamma ,\varvec{\delta }|\varvec{d}^*,\varvec{\tau }^\delta ,\varvec{\kappa }^\delta ) =\mathrm{N}[(\gamma ,\varvec{\delta }^{\prime })^{\prime }|\varvec{\mu },\varvec{\varSigma }]\) with variance

    $$\begin{aligned} \varvec{\varSigma }=\left(\varvec{D}^{-1}+\sum _{i=1}^N{\tilde{{\varvec{x}}}}_i{\tilde{{\varvec{x}}}}_i^{\prime }\right)^{-1} \end{aligned}$$
    (22)

    and mean

    $$\begin{aligned} \varvec{\mu }=\varvec{\varSigma }\left(\sum _{i=1}^N{\tilde{{\varvec{x}}}}_id_i^*\right), \end{aligned}$$
    (23)

    where \(\varvec{D}\equiv \mathrm{diag}(c_0,\tau _1^\delta \kappa _1^\delta ,\ldots , \tau _K^\delta \kappa _K^\delta )\) and \({\tilde{{\varvec{x}}}}_i=(1,{\varvec{{x}}}_i^{\prime })^{\prime }\).

  4. For \(i=1,\ldots ,N\) sample \((\eta _i^*,d_i^*)\): If \(y_i=0\), first draw \(\eta _i^*\) from

    $$\begin{aligned} \varphi (\eta _i^*|\alpha ,\varvec{\beta },\sigma ^{2},\gamma ,\varvec{\delta })&= \left\{ 1-\Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })+\exp [-\exp (\eta _i^*)] \Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })\right\} \nonumber \\&\times \mathrm{N}(\eta ^*_i|\alpha +{\varvec{{x}}}_i^{\prime }\varvec{\beta },\sigma ^2), \end{aligned}$$
    (24)

    where \(\Phi (\cdot )\) denotes the standard-normal CDF. Second, draw \(d_i^*\) from

    $$\begin{aligned} \varphi (d_i^*|\eta _i^*,\alpha ,\varvec{\beta },\sigma ^{2},\gamma ,\varvec{\delta })&= \frac{\rho _0}{\rho _0+\rho _1}\mathrm{TN}^-(\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta },1,0)\nonumber \\&+\frac{\rho _1}{\rho _0+\rho _1}\mathrm{TN}^+(\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta },1,0) \end{aligned}$$
    (25)

    with \(\rho _0=1-\Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })\) and \(\rho _1= \Phi (\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta })\exp [-\exp (\eta _i^*)]\). \(\mathrm{TN}^-(\mu ,\sigma ^2,a)\) and \(\mathrm{TN}^+(\mu ,\sigma ^2,a)\) denote Normal distributions with mean \(\mu \) and variance \(\sigma ^2\) that are truncated from above and from below at \(a\), respectively. If \(y_i>0\), first draw \(\eta _i^*\) from

    $$\begin{aligned} \varphi (\eta _i^*|\alpha ,\varvec{\beta },\sigma ^{2},\gamma ,\varvec{\delta }) = \frac{\exp [-\exp (\eta ^*_i)]\exp (\eta ^*_iy_i)}{y_i!} \; \mathrm{N}(\eta ^*_i|\alpha +{\varvec{{x}}}_i^{\prime }\varvec{\beta },\sigma ^2). \end{aligned}$$
    (26)

    Second, draw \(d_i^*\) from \(\varphi (d_i^*|\gamma ,\varvec{\delta })=\mathrm{TN}^+(\gamma +{\varvec{{x}}}_i^{\prime }\varvec{\delta },1,0)\).

  5. For \(k=1,\ldots ,K\) sample \(\kappa _k^\beta \) from \(\varphi (\kappa _k^\beta |\beta _k,\tau _k^\beta )=\text{Inv-Gamma}\left(g_0^\beta +\frac{1}{2},h_0^\beta +\frac{\beta _k^2}{2\tau _k^\beta }\right)\).

  6. For \(k=1,\ldots ,K\) sample \(\tau _k^\beta \) from

    $$\begin{aligned} \varphi (\tau _k^\beta |\beta _k,\kappa _k^\beta ,\omega ^\beta )= \frac{\omega _{0k}}{\omega _{0k}+\omega _{1k}}\delta _{\nu ^\beta _0} +\frac{\omega _{1k}}{\omega _{0k}+\omega _{1k}}\delta _1, \end{aligned}$$

    where \(\delta _c\) denotes a point mass at \(c\), with

    $$\begin{aligned} \omega _{0k}=(1-\omega ^\beta )(\nu _0^\beta )^{-\frac{1}{2}} \exp \left(-\frac{\beta ^2_k}{2\nu _0^\beta \kappa _k^\beta }\right) \end{aligned}$$

    and

    $$\begin{aligned} \omega _{1k}=\omega ^\beta \exp \left(-\frac{\beta ^2_k}{2\kappa _k^\beta }\right). \end{aligned}$$

  7. Sample \(\omega ^\beta \) from \(\varphi (\omega ^\beta |\varvec{\tau }^\beta )=\mathrm{Beta}\left(1+\#\{k:\tau _k^\beta =1\} ,1+\#\{k:\tau _k^\beta =\nu ^\beta _0\}\right).\)

  8. For \(k=1,\ldots ,K\) sample \(\kappa _k^\delta \) from \(\varphi (\kappa _k^\delta |\delta _k,\tau _k^\delta )=\text{Inv-Gamma}\left(g_0^\delta +\frac{1}{2},h_0^\delta +\frac{\delta _k^2}{2\tau _k^\delta }\right)\).

  9. For \(k=1,\ldots ,K\) sample \(\tau _k^\delta \) from

    $$\begin{aligned} \varphi (\tau _k^\delta |\delta _k,\kappa _k^\delta ,\omega ^\delta ) =\frac{\omega _{0k}}{\omega _{0k}+\omega _{1k}}\delta _{\nu ^\delta _0} +\frac{\omega _{1k}}{\omega _{0k}+\omega _{1k}}\delta _1, \end{aligned}$$

    where \(\delta _{\nu ^\delta _0}\) and \(\delta _1\) on the right-hand side denote point masses at \(\nu ^\delta _0\) and \(1\) (not elements of \(\varvec{\delta }\)), with

    $$\begin{aligned} \omega _{0k}=(1-\omega ^\delta )(\nu _0^\delta )^{-\frac{1}{2}} \exp \left(-\frac{\delta ^2_k}{2\nu _0^\delta \kappa _k^\delta }\right) \end{aligned}$$

    and

    $$\begin{aligned} \omega _{1k}=\omega ^\delta \exp \left(-\frac{\delta ^2_k}{2\kappa _k^\delta }\right). \end{aligned}$$

  10. Sample \(\omega ^\delta \) from \(\varphi (\omega ^\delta |\varvec{\tau }^\delta )=\mathrm{Beta}\left(1+\#\{k:\tau _k^\delta =1\},1+\#\{k:\tau _k^\delta =\nu ^\delta _0\}\right).\)
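
The scalar updates in the algorithm above (step 2, the \(d_i^*\) draws of step 4, and steps 5 to 10) can be sketched with the Python standard library alone. This is an illustrative sketch, not the code of the accompanying R package. All function names are hypothetical, Inv-Gamma(a, b) is assumed to be parameterized by shape a and scale b, and the hyperparameter values in the usage lines are arbitrary. The multivariate Normal draws of steps 1 and 3 and the \(\eta _i^*\) draws of step 4, whose densities (24) and (26) are not of a standard form, are left out.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # supplies Phi (cdf) and its inverse (inv_cdf)


def draw_inv_gamma(rng, shape, scale):
    """Inv-Gamma(shape, scale) draw, taken as 1 / Gamma(shape, rate=scale)."""
    # random.gammavariate takes (shape, scale), so rate=scale becomes 1/scale.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def draw_sigma2(rng, residuals, e0, f0):
    """Step 2; residuals are eta_i^* - alpha - x_i' beta."""
    ssr = sum(r * r for r in residuals)
    return draw_inv_gamma(rng, e0 + len(residuals) / 2.0, f0 + ssr / 2.0)


def draw_kappa(rng, coef, tau, g0, h0):
    """Steps 5 and 8 for a single coefficient (beta_k or delta_k)."""
    return draw_inv_gamma(rng, g0 + 0.5, h0 + coef ** 2 / (2.0 * tau))


def draw_tau(rng, coef, kappa, omega, nu0):
    """Steps 6 and 9: return nu0 (spike) or 1.0 (slab)."""
    # Log weights avoid both weights underflowing to zero at once.
    log_w0 = (math.log1p(-omega) - 0.5 * math.log(nu0)
              - coef ** 2 / (2.0 * nu0 * kappa))
    log_w1 = math.log(omega) - coef ** 2 / (2.0 * kappa)
    diff = log_w0 - log_w1
    # P(slab) = w1 / (w0 + w1) = 1 / (1 + exp(diff)); guard against overflow.
    p_slab = 0.0 if diff > 700.0 else 1.0 / (1.0 + math.exp(diff))
    return 1.0 if rng.random() < p_slab else nu0


def draw_omega(rng, taus):
    """Steps 7 and 10: Beta(1 + #slab, 1 + #spike), assuming nu0 != 1."""
    n_slab = sum(1 for t in taus if t == 1.0)
    return rng.betavariate(1 + n_slab, 1 + len(taus) - n_slab)


def draw_truncnorm(rng, mean, positive):
    """TN^+(mean, 1, 0) if positive, else TN^-(mean, 1, 0), by inverse CDF.

    Assumes |mean| is moderate, so that Phi(mean) does not underflow to zero.
    """
    if not positive:
        # If d ~ TN^-(mean, 1, 0), then -d ~ TN^+(-mean, 1, 0).
        return -draw_truncnorm(rng, -mean, True)
    u = 1.0 - rng.random()  # uniform on (0, 1]
    p = min(u * STD_NORMAL.cdf(mean), 1.0 - 2.0 ** -53)
    # z is standard Normal truncated to z < mean, so mean - z is positive.
    return mean - STD_NORMAL.inv_cdf(p)


def draw_dstar(rng, y, lin_pred, eta_star):
    """Step 4, second part; lin_pred is gamma + x_i' delta."""
    if y > 0:
        return draw_truncnorm(rng, lin_pred, True)
    phi = STD_NORMAL.cdf(lin_pred)
    rho0 = 1.0 - phi
    rho1 = phi * math.exp(-math.exp(eta_star))
    positive = rng.random() < rho1 / (rho0 + rho1)
    return draw_truncnorm(rng, lin_pred, positive)


# One pass over the spike and slab block for three coefficients
# (all numerical values are arbitrary illustrations).
rng = random.Random(2013)
coefs, kappas, omega, nu0 = [0.8, 0.01, -1.3], [1.0, 1.0, 1.0], 0.5, 0.005
taus = [draw_tau(rng, b, k, omega, nu0) for b, k in zip(coefs, kappas)]
kappas = [draw_kappa(rng, b, t, 5.0, 50.0) for b, t in zip(coefs, taus)]
omega = draw_omega(rng, taus)
```

Because steps 5 to 7 and steps 8 to 10 have the same form, the same three functions serve both the count equation (with \(\beta _k\)) and the zero-inflation equation (with \(\delta _k\)); only the coefficients and hyperparameters passed in differ.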

About this article

Cite this article

Jochmann, M. What belongs where? Variable selection for zero-inflated count models with an application to the demand for health care. Comput Stat 28, 1947–1964 (2013). https://doi.org/10.1007/s00180-012-0388-z

  • DOI: https://doi.org/10.1007/s00180-012-0388-z
