Robust beta regression modeling with errors-in-variables: a Bayesian approach and numerical applications

Abstract

Beta regression models have become a popular tool for describing and predicting continuous data on a limited range, such as rates and proportions. However, these models can be severely affected by outlying observations, which the beta distribution does not handle well. A robust alternative is to replace the beta distribution with the rectangular beta (RB) distribution, which extends it. The RB distribution can accommodate heavy tails and is therefore more flexible than the beta distribution. Covariates measured with error are a frequent issue in regression modeling across different areas. This paper derives a robust regression model for proportions with errors-in-variables, using the RB distribution under a new parametrization recently proposed in the literature. We use a Bayesian approach to estimate the model parameters, specifying prior distributions and implementing the computation via Gibbs sampling. Monte Carlo simulations are used to evaluate numerically the statistical performance of the approach. An illustration with real-world data then shows its potential uses.
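To make the heavy-tail claim concrete, the following is a minimal Python sketch. It assumes the commonly used mixture form of the RB density, namely a Uniform(0, 1) component with weight theta plus a beta component with mean mu and precision phi. This is not the new parametrization the paper works with, and the function names and parameter values are illustrative assumptions.

```python
import math


def beta_pdf(y, mu, phi):
    """Beta density on (0, 1) with mean mu and precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi  # usual shape parameters
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))


def rb_pdf(y, mu, phi, theta):
    """Assumed mixture form: theta * Uniform(0, 1) + (1 - theta) * Beta(mu, phi)."""
    if not 0.0 < y < 1.0:
        return 0.0
    return theta + (1.0 - theta) * beta_pdf(y, mu, phi)


def upper_tail_mass(pdf, cut, n=20000):
    """Midpoint-rule approximation of the integral of pdf over (cut, 1)."""
    h = (1.0 - cut) / n
    return h * sum(pdf(cut + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    mu, phi, theta = 0.2, 50.0, 0.1  # illustrative values only
    beta_tail = upper_tail_mass(lambda y: beta_pdf(y, mu, phi), 0.6)
    rb_tail = upper_tail_mass(lambda y: rb_pdf(y, mu, phi, theta), 0.6)
    print(f"P(Y > 0.6): beta = {beta_tail:.2e}, RB = {rb_tail:.2e}")
```

With a beta component concentrated near 0.2, the beta model assigns almost no probability to values above 0.6, whereas the uniform component keeps the RB density bounded below by theta everywhere on (0, 1). That floor is what lets the RB model absorb outlying proportions without distorting the fit of the bulk of the data.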

Acknowledgements

The authors thank the Editors and Reviewers for their constructive comments on an earlier version of this manuscript, which led to an improved presentation. The authors acknowledge funding from the following grants: VRID 217.014.027-1 from Universidad de Concepción, Chile (J.I. Figueroa-Zúñiga); DGI-2014-0017/0070 and DGI-2014-0077/0065 from the Dirección de Gestión de la Investigación at PUCP, Peru (C.L. Bayes); and FONDECYT 1200525 from the National Agency for Research and Development (ANID) of the Chilean government (V. Leiva).

Author information

Corresponding author

Correspondence to Víctor Leiva.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Next, we present the BUGS code used to fit the RB regression models with errors-in-variables.

[Figure a: BUGS code listing]
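As a rough companion to the model described here, the following Python sketch simulates data from an RB regression with one error-prone covariate. It is not the authors' BUGS code. It assumes a logit link for the beta-component mean, a classical additive measurement error (observed w = true x + noise), and the mixture form of the RB distribution; all names and parameter values are hypothetical.

```python
import math
import random


def simulate_rb_eiv(n, beta0, beta1, phi, theta, sigma_x, sigma_u, seed=0):
    """Simulate (y, w, x) triples; x is the true covariate, w its noisy surrogate."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma_x)        # true covariate (unobserved in practice)
        w = x + rng.gauss(0.0, sigma_u)    # assumed classical additive error
        mu = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))  # assumed logit link
        if rng.random() < theta:
            y = rng.random()               # uniform component (heavy tails)
        else:
            y = rng.betavariate(mu * phi, (1.0 - mu) * phi)  # beta component
        rows.append((y, w, x))
    return rows


def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    data = simulate_rb_eiv(n=2000, beta0=-1.0, beta1=0.8, phi=30.0,
                           theta=0.1, sigma_x=1.0, sigma_u=0.5)
    var_x = sample_variance([x for _, _, x in data])
    var_w = sample_variance([w for _, w, _ in data])
    print(f"var(x) = {var_x:.3f}, var(w) = {var_w:.3f}")
```

The surrogate w is more dispersed than the true covariate x because it carries the extra error variance. A fit that treats w as if it were x ignores this, which is why an errors-in-variables formulation models x as a latent quantity instead.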

About this article

Cite this article

Figueroa-Zúñiga, J.I., Bayes, C.L., Leiva, V. et al. Robust beta regression modeling with errors-in-variables: a Bayesian approach and numerical applications. Stat Papers (2021). https://doi.org/10.1007/s00362-021-01260-1

Keywords

  • Bayesian statistics
  • Measurement errors
  • Monte Carlo simulation
  • Regression analysis
  • Statistical software