Abstract
This paper introduces a new and computationally efficient Markov chain Monte Carlo (MCMC) estimation algorithm for the Bayesian analysis of zero-inflated, one-inflated, and zero-and-one-inflated beta regression models. The algorithm is computationally efficient in the sense that it yields low MCMC autocorrelations and short computational times. A simulation study shows that the proposed algorithm outperforms the slice sampling and random walk Metropolis–Hastings algorithms in both small and large sample settings. An empirical illustration using a loss given default banking model demonstrates the usefulness of the proposed algorithm.
Notes
Note that the CG algorithm is only applicable to NIB regression models and to the continuous portion of the proposed ZOIB model. Moreover, although the CG algorithm can be nested in the DA algorithm to create another algorithm for the first scenario, this nesting is not considered as the results from the second scenario suggest that the QL-based algorithm marginally outperforms the CG algorithm for \(\beta \) but significantly outperforms CG for \(\phi \).
The priors are set as follows. The hyperparameters \(b_0\) and \(g_0\) are appropriately-sized vectors of zeros. The matrices \(B_0\) and \(G_0\) are identity matrices multiplied by 1000. The hyperparameters \(s_0\) and \(S_0\) are respectively set to 1 and 100. For the truncated normal prior, \(p_0\) and \(P_0\) are respectively set to 2 and 2000. For the log-normal prior, denoted as \(\mathcal {LN}(\phi \vert l_0, L_0)\), \(l_0\) and \(L_0\) are respectively set to 0 and 4. These priors are set so that they are relatively uninformative. Note that the priors for \(\beta \) and \(\phi \) from the second scenario are also set this way.
Using starting values other than the maximum likelihood estimates does not affect convergence or the qualitative results.
This implementation was confirmed with MATLAB technical support.
Computational time includes the time to obtain the QL estimates.
The CG algorithm, which also samples \(\beta \) and \(\phi \) conditional on each other, is based on first-order Taylor series approximations around the most up-to-date values of the parameters. One possible reason why the CG algorithm does not perform as well as the QL algorithm is the approximation error in the first-order Taylor series expansions used by CG.
Although a model that accounts for temporal correlations can be used, the variation in this dataset is too limited to meaningfully estimate such model specifications.
References
Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679
Basel Committee on Banking Supervision (2001) Overview of the new Basel capital accord. Technical report. http://www.bis.org/publ/bcbsca02.pdf
Billio M, Casarin R (2011) Beta autoregressive transition Markov–Switching models for business cycle analysis. Stud Nonlinear Dyn Econom 15(4):1–32
Bonat WH, Ribeiro PJ Jr, Zeviani WM (2015) Likelihood analysis for a class of beta mixed models. J Appl Stat 42(2):252–266
Branscum AJ, Johnson WO, Thurmond MC (2007) Bayesian beta regression: applications to household expenditure data and genetic distance between foot-and-mouth disease viruses. Aust N Z J Stat 49(3):287–301
Calabrese R (2012) Regression Model for Proportions with Probability Masses at Zero and One. Working Papers 201209, Geary Institute, University College Dublin
Casarin R, Dalla Valle L, Leisen F et al (2012) Bayesian model selection for beta autoregressive processes. Bayesian Anal 7(2):385–410
Cepeda E, Gamerman D (2005) Bayesian methodology for modeling parameters in the two parameter exponential family. Revista Estadística 57(168–169):93–105
Cepeda-Cuervo E, Garrido L (2015) Bayesian beta regression models with joint mean and dispersion modeling. Monte Carlo Methods Appl 21(1):49–58
Cook DO, Kieschnick R, McCullough BD (2008) Regression analysis of proportions in finance with self selection. J Empir Financ 15(5):860–867
Da-Silva C, Migon H (2012) Hierarchical dynamic beta model. Technical report 253, Departmento de Metodos Estatisticos, Universidade Federal do Rio de Janeiro
Da-Silva C, Migon HS, Correia L (2011) Dynamic bayesian beta models. Comput Stat Data Anal 55(6):2074–2089
Ferrari S, Cribari-Neto F (2004) Beta regression for modelling rates and proportions. J Appl Stat 31(7):799–815
Figueroa-Zúniga JI, Arellano-Valle RB, Ferrari SL (2013) Mixed beta regression: a Bayesian perspective. Comput Stat Data Anal 61:137–147
Imai K, van Dyk DA (2005) A bayesian analysis of the multinomial probit model using marginal data augmentation. J Econom 124(2):311–334
Jeliazkov I, Graves J, Kutzbach M (2008) Fitting and comparison of models for multivariate ordinal outcomes. Adv Econom 23:115–156
Kieschnick R, McCullough BD (2003) Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Stat Model 3(3):193–213
Le Cam L, Yang GL (2012) Asymptotics in statistics: some basic concepts. Springer, Berlin
Li P, Qi M, Zhang X, Zhao X (2016) Further investigation of parametric loss given default modeling. J Credit Risk 12(4):17–47
Liu F, Eugenio EC (2016) A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression. Statistical methods in medical research, p 0962280216650699
Liu F, Kong Y (2015) zoib: an R package for bayesian inference for beta regression and zero/one inflated beta regression. R J 7(2):34–51
Liu JS, Wong WH, Kong A (1994) Covariance structure of the gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81(1):27–40
McFadden D (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka P (ed) Frontiers in econometrics. Academic Press, New York, pp 105–142
Neal RM (2003) Slice sampling. Ann Stat 31:705–741
Ospina R, Ferrari SL (2010) Inflated beta distributions. Stat Papers 51(1):111–126
Ospina R, Ferrari SL (2012) A general class of zero-or-one inflated beta regression models. Comput Stat Data Anal 56(6):1609–1623
Paolino P (2001) Maximum likelihood estimation of models with beta-distributed dependent variables. Polit Anal 9(4):325–346
Papke LE, Wooldridge JM (1996) Econometric methods for fractional response variables with an application to 401 (k) plan participation rates. J Appl Econom 11(6):619–632
Prakasa Rao BLS (1987) Asymptotic theory of statistical inference. Wiley, New York
Qi M, Zhao X (2011) Comparison of modeling methods for loss given default. J Bank Financ 35(11):2842–2855
Qi M, Zhao X (2013) Debt structure, market value of firm and recovery rate. J Credit Risk 9(1):3–37
Rocha AV, Cribari-Neto F (2009) Beta autoregressive moving average models. Test 18(3):529–545
Rydlewski JP (2007) Beta-regression model for periodic data with a trend. Univ Iagellonicae Acta Math 45:211–222
Simas AB, Barreto-Souza W, Rocha AV (2010) Improved estimators for a general class of beta regression models. Comput Stat Data Anal 54(2):348–366
Smithson M, Verkuilen J (2006) A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol Methods 11(1):54
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82(398):528–540
Verkuilen J, Smithson M (2012) Mixed and mixture regression models for continuous bounded responses using the beta distribution. J Educ Behav Stat 37(1):82–113
Wieczorek J, Nugent C, Hawala S (2012) A bayesian zero-one inflated beta model for small area shrinkage estimation. In: Proceedings, section on survey research methods. American Statistical Association, Alexandria, VA
Zimprich D (2010) Modeling change in skewed variables using mixed beta regression models. Res Hum Dev 7(1):9–26
Acknowledgements
I would like to thank Xinlei Zhao for generously providing the data and model specification. Also, Sibel Sirakaya, Alicia Lloro, Angela Vossmeyer, Jonathan Cook, and Andrew Chang provided numerous suggestions that improved this paper. I am also indebted to an anonymous referee for providing helpful suggestions. This paper was started before I joined the Office of Financial Research. The views and opinions expressed here are not necessarily those of the Department of the Treasury or of the Office of Financial Research.
Appendix
1.1 Parameterization of the inverse-gamma distribution
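If X has an inverse-gamma distribution with shape a and scale b, then the density function is parameterized as
\[ f(x \vert a, b) = \frac{1}{\Gamma (a)\, b^{a}}\, x^{-(a+1)} \exp \left( -\frac{1}{b x}\right) , \quad x > 0. \]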
The mean is \(\mathbb {E}(X) = (b(a-1))^{-1}\) for \(a>1\), and the variance is \(\mathbb {V}(X)=(b^2(a-1)^2(a-2))^{-1}\) for \(a>2\).
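The moments above can be checked numerically. The following is a minimal sketch, assuming the density is proportional to \(x^{-(a+1)}\exp (-1/(bx))\), so that \(1/X\) is gamma distributed with shape a and scale b; this is the form consistent with the stated mean and variance. The function names are hypothetical and are not taken from the paper.

```python
import math
import random


def inv_gamma_pdf(x, a, b):
    """Density of X when 1/X ~ Gamma(shape=a, scale=b) (assumed parameterization)."""
    if x <= 0:
        return 0.0
    log_pdf = (-math.lgamma(a) - a * math.log(b)
               - (a + 1) * math.log(x) - 1.0 / (b * x))
    return math.exp(log_pdf)


def inv_gamma_mean(a, b):
    """E(X) = 1 / (b (a - 1)), defined for a > 1."""
    if a <= 1:
        raise ValueError("the mean requires a > 1")
    return 1.0 / (b * (a - 1))


def inv_gamma_var(a, b):
    """V(X) = 1 / (b^2 (a - 1)^2 (a - 2)), defined for a > 2."""
    if a <= 2:
        raise ValueError("the variance requires a > 2")
    return 1.0 / (b ** 2 * (a - 1) ** 2 * (a - 2))


def sample_inv_gamma(a, b, n, seed=0):
    """Draw n values of X by inverting gamma draws with shape a and scale b."""
    rng = random.Random(seed)
    return [1.0 / rng.gammavariate(a, b) for _ in range(n)]
```

For example, averaging `sample_inv_gamma(5, 0.5, 100000)` should give a value close to `inv_gamma_mean(5, 0.5)`, which equals 0.5.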
1.2 Quasi-likelihood quantities
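The following describes the method to obtain consistent estimates of \(\beta \) and \(\phi \) (or equivalently, \(\psi \)). Following Papke and Wooldridge (1996), the first step is to maximize the following log Bernoulli quasi-likelihood function
\[ \ell (b) = \sum _{t=1}^{n_{01}} \left[ y_t \log \left( \frac{\exp (x_t^\top b)}{1+\exp (x_t^\top b)}\right) + (1-y_t) \log \left( \frac{1}{1+\exp (x_t^\top b)}\right) \right] \tag{23} \]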
with respect to the \(K_2 \times 1\) vector b. In (23), \(t=1,\ldots ,n_{01}\) indexes the \(N_{01}\) subsample, and \(n_{01}\) is the cardinality of \(N_{01}\). Note that the referenced paper expresses the log QL function in terms of an arbitrary function \(G(\cdot )\); in this paper, \(G( x_t^\top b) = \text {logit}^{-1}(x_t^\top b) = \frac{ \exp (x_t^\top b) }{ 1 + \exp (x_t^\top b)}\), which leads to (23). Papke and Wooldridge (1996) show that the maximizing vector, denoted as \(\widehat{\beta }\), is a consistent estimator of \(\beta \), as long as \(\mathbb {E}(y_t) = \frac{\exp (x_t^\top \beta )}{1+\exp (x_t^\top \beta )}\) for \(t=1,\ldots ,n_{01}\) (which holds due to (3) and (8) holding for \(i \in N_{01}\)). Next, under the assumption that \(\mathbb {V}(y_t) = \delta ^2 \mu _t (1-\mu _t)\) for \(t=1,\ldots ,n_{01}\) with \(\delta ^2 = \frac{1}{1+\phi }\) (which is true given (9) for \(i \in N_{01}\)), the authors also show that the estimator
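\[ \widehat{\delta }^2 = \frac{1}{n_{01}-K_2} \sum _{t=1}^{n_{01}} \widetilde{u}_t^{\,2} \]
is consistent for \(\delta ^2\), where
\[ \widetilde{u}_t = \frac{y_t - \widehat{\mu }_t}{\sqrt{\widehat{\mu }_t (1-\widehat{\mu }_t)}} \]
and
\[ \widehat{\mu }_t = \frac{\exp (x_t^\top \widehat{\beta })}{1+\exp (x_t^\top \widehat{\beta })}. \]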
Then, by the continuous mapping theorem, the estimator \(\widehat{\phi }= \frac{1-\widehat{\delta }^2}{\widehat{\delta }^2}\) is consistent for \(\phi \). The vector \(\widehat{\psi }\) in Sect. 3.1 is defined as \(\widehat{\psi } = (\widehat{\beta }^\top , \widehat{\phi })^\top \).
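The two-step procedure described above can be sketched in code. This is a minimal illustration rather than the paper's implementation. It assumes a Newton–Raphson maximization of the logit Bernoulli quasi-likelihood and a Pearson-type estimator of \(\delta ^2\) that divides the sum of squared weighted residuals by \(n_{01}-K_2\). The function names `fit_ql_beta` and `estimate_phi` are hypothetical, and `X` is assumed to hold only the observations in \(N_{01}\).

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ql_beta(X, y, tol=1e-10, max_iter=100):
    """Step 1: maximize the logit Bernoulli quasi-likelihood over b."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(max_iter):
        mu = [logistic(sum(xj * bj for xj, bj in zip(x, b))) for x in X]
        # Score: sum_t (y_t - mu_t) x_t
        score = [sum((yt - m) * x[j] for x, yt, m in zip(X, y, mu))
                 for j in range(k)]
        # Negative Hessian: sum_t mu_t (1 - mu_t) x_t x_t'
        info = [[sum(m * (1 - m) * x[i] * x[j] for x, m in zip(X, mu))
                 for j in range(k)] for i in range(k)]
        step = solve(info, score)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def estimate_phi(X, y, beta_hat):
    """Step 2: estimate delta^2 from weighted residuals, then map to phi."""
    n, k = len(X), len(X[0])
    mu = [logistic(sum(xj * bj for xj, bj in zip(x, beta_hat))) for x in X]
    pearson = sum((yt - m) ** 2 / (m * (1 - m)) for yt, m in zip(y, mu))
    delta2 = pearson / (n - k)  # assumed degrees-of-freedom divisor
    if delta2 <= 0:
        raise ValueError("delta^2 estimate must be positive")
    return (1.0 - delta2) / delta2
```

With an intercept-only design and responses 0.2, 0.4, 0.6, 0.8, the fitted mean is 0.5, the \(\delta ^2\) estimate is 0.8/3, and the implied precision estimate is 2.75.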
1.3 Information matrix quantities
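Partition \(\widehat{K}\) as
\[ \widehat{K} = \begin{pmatrix} \widehat{K}_{\beta \beta } & \widehat{K}_{\beta \phi } \\ \widehat{K}_{\phi \beta } & \widehat{K}_{\phi \phi } \end{pmatrix}. \tag{24} \]
Following Ferrari and Cribari-Neto (2004), the quantities in (24) are defined as
\[ \widehat{K}_{\beta \beta } = \widehat{\phi }\, X_{01}^\top \widehat{W} X_{01}, \qquad \widehat{K}_{\beta \phi } = \widehat{K}_{\phi \beta }^\top = X_{01}^\top \widehat{T}\, \widehat{c}, \qquad \widehat{K}_{\phi \phi } = \text {tr}(\widehat{D}), \]
with \(\widehat{W} = \text {diag}(\widehat{w}_1, \ldots , \widehat{w}_{n_{01}})\), \(\widehat{T} = \text {diag}(\widehat{\mu }_1(1-\widehat{\mu }_1), \ldots , \widehat{\mu }_{n_{01}}(1-\widehat{\mu }_{n_{01}}))\), \(\widehat{c} = (\widehat{c}_1, \ldots , \widehat{c}_{n_{01}})^\top \), \(\widehat{D} = \text {diag}(\widehat{d}_1, \ldots , \widehat{d}_{n_{01}})\), and
\[ \widehat{w}_t = \widehat{\phi } \left[ \psi '(\widehat{\mu }_t \widehat{\phi }) + \psi '((1-\widehat{\mu }_t) \widehat{\phi }) \right] \widehat{\mu }_t^2 (1-\widehat{\mu }_t)^2, \]
\[ \widehat{c}_t = \widehat{\phi } \left[ \psi '(\widehat{\mu }_t \widehat{\phi })\, \widehat{\mu }_t - \psi '((1-\widehat{\mu }_t) \widehat{\phi })\, (1-\widehat{\mu }_t) \right] , \]
\[ \widehat{d}_t = \psi '(\widehat{\mu }_t \widehat{\phi })\, \widehat{\mu }_t^2 + \psi '((1-\widehat{\mu }_t) \widehat{\phi })\, (1-\widehat{\mu }_t)^2 - \psi '(\widehat{\phi }), \]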
where \(t=1,\ldots , n_{01}\), \(X_{01}=(x_1, \ldots , x_{n_{01}})^\top \) is an \(n_{01} \times K_2\) matrix of covariates for the observations in \(N_{01}\), and \(\psi '(\cdot )\) denotes the trigamma function. Note that the quantities above are defined over \(N_{01}\), while the quantities in Ferrari and Cribari-Neto (2004) are defined over the full sample. Also, the analogous quantities for (25) and (26) in Ferrari and Cribari-Neto (2004) are expressed for an arbitrary link function \(g( \mu _t)\). In this paper, \(g(\mu _t) = \text {logit}(\mu _t) = \log (\frac{\mu _t}{1-\mu _t})\), and \(g'(\mu _t) = \frac{1}{\mu _t(1-\mu _t)}\) (the first derivative), so plugging these quantities into the formulas in the referenced paper results in the expressions for (25) and (26).
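As an illustration of how these quantities can be evaluated, the sketch below computes the three blocks of the information matrix for a beta regression with a logit link, using the Ferrari and Cribari-Neto (2004) expressions with \(1/g'(\mu _t) = \mu _t(1-\mu _t)\) substituted in. It is a hedged example, not the paper's code. The function names are hypothetical, `X` is assumed to contain only the rows in \(N_{01}\), and the trigamma function is approximated by a recurrence plus an asymptotic series because the Python standard library does not provide it.

```python
import math


def trigamma(x):
    """Approximate trigamma psi'(x) for x > 0 (recurrence plus asymptotic series)."""
    total = 0.0
    while x < 10.0:
        # psi'(x) = psi'(x + 1) + 1 / x^2
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return total + series


def info_matrix_blocks(X, beta, phi):
    """Return (K_bb, K_bphi, K_phiphi) evaluated at (beta, phi) under a logit link."""
    k = len(beta)
    K_bb = [[0.0] * k for _ in range(k)]
    K_bphi = [0.0] * k
    K_phiphi = 0.0
    tri_phi = trigamma(phi)
    for x in X:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        mu = 1.0 / (1.0 + math.exp(-eta))
        tp = trigamma(mu * phi)
        tq = trigamma((1.0 - mu) * phi)
        dmu = mu * (1.0 - mu)                       # 1 / g'(mu) for the logit link
        w = phi * (tp + tq) * dmu ** 2              # diagonal entry of W
        c = phi * (tp * mu - tq * (1.0 - mu))       # entry of c
        d = tp * mu ** 2 + tq * (1.0 - mu) ** 2 - tri_phi  # diagonal entry of D
        for i in range(k):
            K_bphi[i] += x[i] * dmu * c             # X' T c
            for j in range(k):
                K_bb[i][j] += phi * w * x[i] * x[j]  # phi X' W X
        K_phiphi += d                               # tr(D)
    return K_bb, K_bphi, K_phiphi
```

A full \((K_2+1) \times (K_2+1)\) matrix can be assembled from the returned blocks by placing `K_bphi` as the last column and row and `K_phiphi` in the bottom-right corner.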
Cite this article
Li, P. Efficient MCMC estimation of inflated beta regression models. Comput Stat 33, 127–158 (2018). https://doi.org/10.1007/s00180-017-0747-x
DOI: https://doi.org/10.1007/s00180-017-0747-x