Abstract
This paper presents a new class of regression models for continuous data restricted to the interval (0, 1), such as rates and proportions. The proposed class of models assumes a beta distribution for the variable of interest, with regression structures for the mean and dispersion parameters. These structures involve covariates, unknown regression parameters, and parametric link functions. The link functions depend on parameters that model the relationship between the random component and the linear predictors. The symmetric and asymmetric Aranda-Ordaz link functions are considered in detail. For particular parameter values, these link functions reduce to fixed links such as the logit and complementary log–log functions. Joint estimation of the regression and link function parameters is performed by maximum likelihood. Closed-form expressions for the score function and Fisher's information matrix are presented. Aspects of large-sample inference are discussed, and some diagnostic measures are proposed. A Monte Carlo simulation study is used to evaluate the finite-sample performance of the point estimators. Finally, a practical application that employs real data is presented and discussed.
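To make the reduction to fixed links concrete, here is a minimal Python sketch of the two Aranda-Ordaz families in the form given by Aranda-Ordaz (1981). The function names are hypothetical, and the parameterization used in the body of the paper may differ in detail, so treat this as an illustration rather than the authors' implementation.

```python
import math


def ao_asymmetric(mu, lam):
    """Asymmetric Aranda-Ordaz link, log(((1 - mu)^(-lam) - 1) / lam), for lam > 0."""
    return math.log(((1.0 - mu) ** (-lam) - 1.0) / lam)


def ao_asymmetric_inv(eta, lam):
    """Inverse of the asymmetric link: maps a linear predictor back to (0, 1)."""
    return 1.0 - (1.0 + lam * math.exp(eta)) ** (-1.0 / lam)


def ao_symmetric(mu, lam):
    """Symmetric Aranda-Ordaz link, (2 / lam) * (mu^lam - (1-mu)^lam) / (mu^lam + (1-mu)^lam)."""
    a, b = mu ** lam, (1.0 - mu) ** lam
    return (2.0 / lam) * (a - b) / (a + b)


def logit(mu):
    return math.log(mu / (1.0 - mu))


def cloglog(mu):
    return math.log(-math.log(1.0 - mu))


mu = 0.3  # arbitrary illustrative value in (0, 1)
# lam = 1 in the asymmetric family gives the logit link exactly.
print(ao_asymmetric(mu, 1.0), logit(mu))
# lam -> 0 in the asymmetric family approaches the complementary log-log link.
print(ao_asymmetric(mu, 1e-6), cloglog(mu))
# lam -> 0 in the symmetric family approaches the logit link.
print(ao_symmetric(mu, 1e-6), logit(mu))
```

Because the link parameter is estimated jointly with the regression coefficients, a fitted value near one of these special cases suggests that the corresponding fixed link would be adequate for the data.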
Notes
When the mean and dispersion are held constant, the model has no regression structures; thus, there are no estimates for \(\lambda _\delta \).
References
Adewale AJ, Xu X (2010) Robust designs for generalized linear models with possible overdispersion and misspecified link functions. Comput Stat Data Anal 54(4):875–890
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–726
Akaike H (1983) Information measures and model selection. Bull Int Stat Inst 50:277–290
Andrade ACG (2007) Efeitos da especificação incorreta da função de ligação no modelo de regressão beta. Master’s thesis, Universidade Federal de São Paulo
Aranda-Ordaz FJ (1981) On two families of transformations to additivity for binary response data. Biometrika 68(2):357–363
Atkinson A (1981) Two graphical displays for outlying and influential observations in regression. Biometrika 68(1):13–20
Atkinson AC (1985) Plots, transformations and regression: an introduction to graphical methods of diagnostic regression analysis. Oxford University Press, New York
Bayer FM, Cribari-Neto F (2017) Model selection criteria in beta regression with varying dispersion. Commun Stat Simul Comput 46(1):729–746
Colosimo EA, Chalita LVAS, Demétrio CGB (2000) Tests of proportional hazards and proportional odds models for grouped survival data. Biometrics 56(4):1233–1240
Cook RD (1977) Detection of influential observations in linear regression. Technometrics 19(1):15–18
Cox DR, Reid N (1987) Parameter orthogonality and approximate conditional inference. J R Stat Soc B 49(1):1–39
Cribari-Neto F, Souza TC (2012) Testing inference in variable dispersion beta regressions. J Stat Comput Simul 82(12):1827–1843
Cribari-Neto F, Souza TC (2013) Religious belief and intelligence: worldwide evidence. Intelligence 41(5):482–489
Czado C (1994) Parametric link modification of both tails in binary regression. Stat Pap 35(1):189–201
Czado C (1997) On selecting parametric link transformation families in generalized linear models. J Stat Plan Inference 61(1):125–139
Czado C, Raftery AE (2006) Choosing the link function and accounting for link uncertainty in generalized linear models using Bayes factors. Stat Pap 47(3):419–442
Dehbi H, Cortina-Borja M, Geraci M (2014) AOfamilies: Aranda-Ordaz transformation families. R Package. http://cran.r-project.org/package=AOfamilies
Dehbi HM, Cortina-Borja M, Geraci M (2016) Aranda-Ordaz quantile regression for student performance assessment. J Appl Stat 43(1):58–71
Espinheira P, Ferrari SLP, Cribari-Neto F (2008a) On beta regression residuals. J Appl Stat 35(4):407–419
Espinheira PL, Ferrari SLP, Cribari-Neto F (2008b) Influence diagnostics in beta regression. Comput Stat Data Anal 52(9):4417–4431
Ferrari SLP, Cribari-Neto F (2004) Beta regression for modelling rates and proportions. J Appl Stat 31(7):799–815
Ferrari SLP, Pinheiro EC (2011) Improved likelihood inference in beta regression. J Stat Comput Simul 81(4):431–443
Ferrari SLP, Espinheira PL, Cribari-Neto F (2011) Diagnostic tools in beta regression with varying dispersion. Stat Neerl 65(3):337–351
Geraci M, Jones MC (2015) Improved transformation-based quantile regression. Can J Stat 43(1):118–132
Gomes GSdS, Ludermir TB (2013) Optimization of the weights and asymmetric activation function family of neural network for time series forecasting. Expert Syst Appl 40(16):6438–6446
Guerrero VM, Johnson RA (1982) Use of the Box-Cox transformation with binary response models. Biometrika 69(2):309–314
Kaiser MS (1997) Maximum likelihood estimation of link function parameters. Comput Stat Data Anal 24(1):79–87
Koenker R, Yoon J (2009) Parametric links for binary choice models: a Fisherian-Bayesian colloquy. J Econ 152(2):120–130
McCullagh P, Nelder J (1989) Generalized linear models, 2nd edn. Chapman and Hall, Boca Raton
Morgan BJ (1992) Analysis of quantal response data. Chapman and Hall/CRC, Boca Raton
Nagelkerke NJD (1991) A note on a general definition of the coefficient of determination. Biometrika 78(3):691–692
Neyman J, Pearson ES (1928) On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika 20A(1/2):175–240
Oliveira JSC (2013) Detectando má especificação em regressão beta. Master’s thesis, Universidade Federal de Pernambuco
Ospina R, Ferrari SLP (2012) A general class of zero-or-one inflated beta regression models. Comput Stat Data Anal 56(6):1609–1623
Ospina R, Cribari-Neto F, Vasconcellos KLP (2006) Improved point and interval estimation for a beta regression model. Comput Stat Data Anal 51(2):960–981
Paolino P (2001) Maximum likelihood estimation of models with beta-distributed dependent variables. Polit Anal 9(4):325–346
Pawitan Y (2001) In all likelihood: statistical modelling and inference using likelihood. Oxford Science Publications, Oxford
Pereira TL, Cribari-Neto F (2013) Detecting model misspecification in inflated beta regressions. Commun Stat Simul Comput 43(3):631–656
Pregibon D (1980) Goodness of link tests for generalized linear models. Appl Stat 29(1):15–24
Press W, Teukolsky S, Vetterling W, Flannery B (1992) Numerical recipes in C: the art of scientific computing, 2nd edn. Cambridge University Press, London
R Development Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0
Ramalho EA, Ramalho JJ, Murteira JMR (2011) Alternative estimating and testing empirical strategies for fractional regression models. J Econ Surv 25(1):16–68
Ramsey JB (1969) Tests for specification errors in classical linear least-squares regression analysis. J R Stat Soc B 31(2):350–371
Rao C (1948) Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Math Proc Camb Philos Soc 44(1):50–57
Rigby R, Stasinopoulos D (2005) Generalized additive models for location, scale and shape (with discussion). Appl Stat 54(3):507–554
Scallan A, Gilchrist R, Green M (1984) Fitting parametric link functions in generalized linear models. Comput Stat Data Anal 2(1):37–49
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Simas AB, Barreto-Souza W, Rocha AV (2010) Improved estimators for a general class of beta regression models. Comput Stat Data Anal 54(2):348–366
Smith DM (2003) Computing single parameter transformations. Commun Stat Simul Comput 32(3):605–618
Smithson M, Verkuilen J (2006) A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol Methods 11(1):54–71
Smyth GK, Verbyla AP (1999) Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10(6):695–709
Stukel TA (1988) Generalized logistic models. J Am Stat Assoc 83(402):426–431
Taneichi N, Sekiya Y, Toyama J (2014) A new family of parametric links for binomial generalized linear models. J Jpn Stat Soc 44(2):119–133
Terrell GR (2002) The gradient statistic. Comput Sci Stat 34:206–215
Vargas TM, Ferrari SL, Lemonte AJ (2014) Improved likelihood inference in generalized linear models. Comput Stat Data Anal 74:110–124
Wald A (1943) Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans Am Math Soc 54:426–482
Zhao W, Zhang R, Lv Y, Liu J (2014) Variable selection for varying dispersion beta regression model. J Appl Stat 41(1):95–108
Zimprich D (2010) Modeling change in skewed variables using mixed beta regression models. Res Hum Dev 7(1):9–26
Acknowledgements
This research was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.
Appendix
In this appendix we obtain the score function and Fisher’s information matrix for all parameters (\(\varvec{\beta }\), \(\varvec{\gamma }\), \(\lambda _1\), \(\lambda _2\)).
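For orientation, the setup that these derivations rely on can be sketched as follows. The density is inferred from the expression for \(\partial \ell _t/\partial \mu _t\) stated in this appendix, so it should be read as a reconstruction under that assumption; the shorthand \(\phi _t\), the digamma symbol \(\psi \), and the sample size \(n\) are notation introduced here for convenience.

\[
\begin{aligned}
f(y_t;\mu _t,\sigma _t) &= \frac{\Gamma (\phi _t)}{\Gamma (\mu _t\phi _t)\,\Gamma ((1-\mu _t)\phi _t)}\, y_t^{\mu _t\phi _t-1}(1-y_t)^{(1-\mu _t)\phi _t-1}, \qquad \phi _t=\frac{1-\sigma _t^2}{\sigma _t^2},\\
g_1(\mu _t,\lambda _1) &= \eta _{1t}=\sum _{i=1}^{r}x_{ti}\beta _i, \qquad g_2(\sigma _t,\lambda _2)=\eta _{2t}=\sum _{j=1}^{s}z_{tj}\gamma _j,\\
\ell (\varvec{\beta },\varvec{\gamma },\lambda _1,\lambda _2) &= \sum _{t=1}^{n}\ell _t(\mu _t,\sigma _t), \qquad \ell _t(\mu _t,\sigma _t)=\log f(y_t;\mu _t,\sigma _t),\\
y_t^* &= \log \frac{y_t}{1-y_t}, \qquad \mu _t^*=\psi (\mu _t\phi _t)-\psi ((1-\mu _t)\phi _t).
\end{aligned}
\]

Differentiating \(\ell _t\) with respect to \(\mu _t\) under this density gives \(\phi _t(y_t^*-\mu _t^*)\), which agrees with the derivative used in the score vector.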
The elements of the score vector are given by:
for \(i=1,\ldots ,r\) and \(j=1, \ldots , s\), where \(\dfrac{\partial \ell _{t}(\mu _{t},\sigma _{t})}{\partial \mu _t} = \dfrac{1-\sigma ^2_t}{\sigma ^2_t}(y^*_t-\mu ^*_t)\), \(\dfrac{\partial \mu _t}{\partial \eta _{1t}} = \left[ \dfrac{\partial g_1(\mu _{t},\lambda _1)}{\partial \mu _t}\right] ^{-1}\), \(\dfrac{\partial \eta _{1t}}{\partial \beta _{i}}=x_{ti}\), \(\dfrac{\partial \ell _{t}(\mu _{t},\sigma _{t})}{\partial \sigma _t}=a_t\), \(\dfrac{\partial \sigma _t}{\partial \eta _{2t}} = \left[ \dfrac{\partial g_2(\sigma _{t},\lambda _2)}{\partial \sigma _t}\right] ^{-1}\) and \(\dfrac{\partial \eta _{2t}}{\partial \gamma _{j}}=z_{tj}\).
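The factors listed above combine by the chain rule, so the contribution of observation \(t\) to the score for \(\beta _i\) is \(\dfrac{\partial \ell _t}{\partial \mu _t}\dfrac{\partial \mu _t}{\partial \eta _{1t}}x_{ti}\). The following Python sketch checks that product against a finite-difference derivative for a single observation. It assumes the beta density with precision \((1-\sigma ^2_t)/\sigma ^2_t\) implied by the expression for \(\partial \ell _t/\partial \mu _t\), and it uses the asymmetric Aranda-Ordaz link for \(g_1\). All function names and numeric values are hypothetical, and the digamma function is approximated by hand to avoid external dependencies.

```python
import math


def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series


def precision(sigma):
    """Precision (1 - sigma^2) / sigma^2 implied by the dispersion sigma."""
    return (1.0 - sigma ** 2) / sigma ** 2


def loglik(y, mu, sigma):
    """Log-density of one beta observation with mean mu and dispersion sigma (assumed form)."""
    phi = precision(sigma)
    p, q = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
            + (p - 1.0) * math.log(y) + (q - 1.0) * math.log(1.0 - y))


def dl_dmu(y, mu, sigma):
    """Analytic derivative of the log-density in mu: phi * (y_star - mu_star)."""
    phi = precision(sigma)
    y_star = math.log(y / (1.0 - y))
    mu_star = digamma(mu * phi) - digamma((1.0 - mu) * phi)
    return phi * (y_star - mu_star)


def g1_inv(eta, lam):
    """Inverse asymmetric Aranda-Ordaz link."""
    return 1.0 - (1.0 + lam * math.exp(eta)) ** (-1.0 / lam)


def dg1_dmu(mu, lam):
    """Derivative of the asymmetric Aranda-Ordaz link with respect to mu."""
    u = (1.0 - mu) ** (-lam)
    return lam * u / ((1.0 - mu) * (u - 1.0))


# Hypothetical single observation, covariate, and parameter values.
y, x, beta, lam1, sigma = 0.35, 0.8, -0.4, 0.7, 0.3

mu = g1_inv(x * beta, lam1)
# Chain rule: (dl/dmu) * (dmu/deta) * (deta/dbeta), with dmu/deta = 1 / (dg1/dmu).
analytic = dl_dmu(y, mu, sigma) * (1.0 / dg1_dmu(mu, lam1)) * x

# Central finite difference of the log-density in beta.
h = 1e-6
numeric = (loglik(y, g1_inv(x * (beta + h), lam1), sigma)
           - loglik(y, g1_inv(x * (beta - h), lam1), sigma)) / (2.0 * h)

print(analytic, numeric)
```

The same pattern applies to \(\gamma _j\), with \(a_t\), \(\partial \sigma _t/\partial \eta _{2t}\), and \(z_{tj}\) in place of the mean-side factors.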
The second order derivatives of the log-likelihood function are given by:
where \(\dfrac{\partial }{\partial \lambda _2} \left( \dfrac{\partial \mu _t}{\partial \eta _{1t}} \right) =0\).
Taking the expected value of the second order derivatives given above, since \(\mathbb {E}\left( \dfrac{\partial \ell _t(\mu _t,\sigma _t)}{\partial \mu _t} \right) = 0\), we have:
Since
we arrive at the conclusion that
In relation to \(\beta _i\) and \(\lambda _1\), we have:
The expected value of the second order derivative with respect to \(\beta _i\) and \(\lambda _2\) is given by:
Since \(\mathbb {E}\left( \dfrac{\partial \ell _{t}(\mu _{t},\sigma _{t})}{\partial \sigma _t}\right) =0\), we have
where
With respect to \(\gamma _j\) and \(\lambda _1\), we have:
For \(\gamma _j\) and \(\lambda _2\), we have:
Finally, we have:
and
In matrix form, we have:
Cite this article
Canterle, D.R., Bayer, F.M. Variable dispersion beta regressions with parametric link functions. Stat Papers 60, 1541–1567 (2019). https://doi.org/10.1007/s00362-017-0885-9
DOI: https://doi.org/10.1007/s00362-017-0885-9
Keywords
- Aranda-Ordaz link function
- Maximum likelihood estimator
- Parametric link functions
- Variable dispersion beta regression