Quasi-beta Longitudinal Regression Model Applied to Water Quality Index Data

  • Ricardo Rasmussen Petterle
  • Wagner Hugo Bonat
  • Cassius Tadeu Scarpin


We propose a new class of regression models for longitudinal continuous bounded data. The model is specified using only second-moment assumptions, and we employ an estimating function approach for parameter estimation and inference. The main advantage of the proposed approach is that it does not require a multivariate probability distribution to be assumed for the response vector. Because only the mean and covariance structure are specified, the quasi-beta longitudinal regression model can handle data on the unit interval, including exact zeros and ones. The covariance structure is defined in terms of a matrix linear predictor composed of known matrices. The fitting procedure is easily implemented using a simple and efficient Newton scoring algorithm. A simulation study was conducted to check the properties of the estimating function estimators of the regression and dispersion parameters. The NORTA algorithm (NORmal To Anything) was used to simulate correlated beta random variables. The results of this simulation study showed that the estimators are consistent and unbiased for large samples. The model is motivated by a data set on the water quality index, collected to investigate the effect of dams on the water quality index measured in power plant reservoirs. Furthermore, diagnostic techniques such as DFFITS, DFBETAS, Cook’s distance and half-normal plots with simulated envelopes were adapted to the proposed models. The R code and data set are available in the supplementary material.
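The NORTA idea mentioned above can be illustrated with a minimal sketch. It is not the authors' R code: the function names (`beta_quantile_table`, `beta_quantile`, `cholesky`, `norta_beta`), the mean and precision parameterisation Beta(mu*phi, (1-mu)*phi), and the example values are all assumptions made for illustration. The sketch draws correlated standard normals, maps them to uniforms with the normal CDF, and then applies an approximate beta quantile function. It also simplifies full NORTA, which additionally solves for the latent normal correlation that reproduces a target correlation on the beta scale; here the supplied matrix is used directly for the latent normals, so the correlation of the beta variables is only close to it.

```python
import bisect
import math
import random
from statistics import NormalDist


def beta_quantile_table(mu, phi, grid=2000):
    """Tabulate the Beta(mu*phi, (1-mu)*phi) CDF on a grid (trapezoid rule).

    Assumes both shape parameters exceed 1, so the density is 0 at 0 and 1.
    """
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    xs = [i / grid for i in range(grid + 1)]
    pdf = [0.0] + [
        math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
        for x in xs[1:-1]
    ] + [0.0]
    cdf = [0.0]
    for i in range(1, grid + 1):
        cdf.append(cdf[-1] + 0.5 * (pdf[i - 1] + pdf[i]) / grid)
    total = cdf[-1]
    return xs, [c / total for c in cdf]  # renormalise so the table ends at 1


def beta_quantile(u, table):
    """Approximate beta quantile by linear interpolation in the CDF table."""
    xs, cdf = table
    j = bisect.bisect_left(cdf, u)  # first index with cdf[j] >= u
    if j <= 0:
        return xs[0]
    lo, hi = cdf[j - 1], cdf[j]
    weight = 0.0 if hi == lo else (u - lo) / (hi - lo)
    return xs[j - 1] + weight * (xs[j] - xs[j - 1])


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low


def norta_beta(n_units, corr, mus, phi, seed=1):
    """Simulate n_units vectors of correlated beta variables (simplified NORTA)."""
    rng = random.Random(seed)
    low = cholesky(corr)
    tables = [beta_quantile_table(mu, phi) for mu in mus]
    std_normal = NormalDist()
    dim = len(mus)
    out = []
    for _ in range(n_units):
        e = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlated standard normals: z = L e
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(dim)]
        # Normal CDF gives uniforms; the beta quantile gives the bounded response
        out.append([beta_quantile(std_normal.cdf(zi), t) for zi, t in zip(z, tables)])
    return out


# Illustrative values only: three repeated measures, AR(1)-type latent correlation.
corr = [[1.0, 0.6, 0.36], [0.6, 1.0, 0.6], [0.36, 0.6, 1.0]]
sample = norta_beta(5, corr, mus=[0.3, 0.5, 0.7], phi=10.0)
```

Each row of `sample` plays the role of one unit observed at three time points. The tabulated quantile function keeps the sketch free of external dependencies; a library routine for the beta quantile would normally be preferable.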


Keywords: Unit interval; Longitudinal data; Estimating function; Diagnostic techniques; Simulation study; NORTA algorithm




Copyright information

© International Biometric Society 2019

Authors and Affiliations

  • Ricardo Rasmussen Petterle (1), email author
  • Wagner Hugo Bonat (2)
  • Cassius Tadeu Scarpin (3)

  1. Sector of Health Sciences, Medical School, Paraná Federal University, Curitiba, Brazil
  2. Department of Statistics, Paraná Federal University, Curitiba, Brazil
  3. Research Group of Technology Applied to Optimization (GTAO), Paraná Federal University (UFPR), Curitiba, Brazil
