Abstract
In the context of regression with a beta-type response variable, we propose a new method that links two methodologies: a distance-based model and a beta regression with variable dispersion. The proposed model is useful when the response variable is a rate, a proportion, or a measurement in parts per million, and is related to a mixture of continuous and categorical explanatory variables. We present the main statistical properties of the model and several measures for selecting its most predictive dimensions. The proposal only requires choosing a suitable distance for both the mean model and the variable dispersion model, depending on the type of explanatory variables. We also develop mean and precision predictions for a new individual and address the problem of missing data. Rather than removing variables or observations with missing data, we use the distance-based method to work with all the data, without needing to fill in or impute missing values. Finally, we present an application to mutual funds that uses the Gower distance for both the mean model and the variable dispersion model. The methodology is applicable to any problem where estimating distance-based beta regression coefficients for correlated explanatory variables is of interest.
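To make the role of the distance concrete, the following is a minimal sketch of a Gower-type dissimilarity for mixed continuous and categorical data in which missing values are skipped pair by pair rather than imputed. It is an illustration only and not the authors' implementation: the function name `gower_distance`, the toy data, and the choice to return one minus the Gower similarity (some authors use its square root instead) are all assumptions made here.

```python
import numpy as np

def gower_distance(num, cat):
    """Pairwise Gower-type dissimilarity for mixed data (illustrative sketch).

    num: (n, p) array of numeric variables, np.nan marks a missing value.
    cat: (n, q) array of categorical variables, None marks a missing value.
    Returns an (n, n) matrix; an entry is nan if a pair shares no observed variable.
    """
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat, dtype=object)
    n = num.shape[0]
    total = np.zeros((n, n))   # sum of per-variable dissimilarities
    weight = np.zeros((n, n))  # number of variables observed for both individuals

    # Numeric variables: absolute difference scaled by the observed range.
    for k in range(num.shape[1]):
        col = num[:, k]
        observed = ~np.isnan(col)
        both = np.outer(observed, observed)
        diff = np.abs(col[:, None] - col[None, :])
        value_range = np.nanmax(col) - np.nanmin(col)
        if value_range > 0:
            diff = diff / value_range
        total += np.where(both, diff, 0.0)
        weight += both

    # Categorical variables: 0 if the categories match, 1 otherwise.
    for k in range(cat.shape[1]):
        col = cat[:, k]
        observed = np.array([v is not None for v in col])
        both = np.outer(observed, observed)
        mismatch = (col[:, None] != col[None, :]).astype(float)
        total += np.where(both, mismatch, 0.0)
        weight += both

    # Average over the variables that both individuals have observed.
    out = np.full((n, n), np.nan)
    np.divide(total, weight, out=out, where=weight > 0)
    return out

# Hypothetical toy data: two numeric variables and one categorical variable.
num = [[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]]
cat = [["a"], ["b"], [None]]
D = gower_distance(num, cat)
print(np.round(D, 2))
```

Because each pair is compared only on the variables both individuals have observed, every row of the data contributes to the distance matrix even when some of its entries are missing.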
Acknowledgments
We sincerely thank two referees for helpful comments and suggestions that led to improvements in this paper. This work was partially funded and supported by Grant MTM2010-14961 from the Spanish Ministry of Science and Education, by the Carolina Foundation, by Applied Statistics in Experimental Research, Industry and Biotechnology (Universidad Nacional de Colombia), and by Core Spatial Data Research (Faculty of Engineering, Universidad Distrital Francisco José de Caldas).
Cite this article
Melo, O.O., Melo, C.E. & Mateu, J. Distance-based beta regression for prediction of mutual funds. AStA Adv Stat Anal 99, 83–106 (2015). https://doi.org/10.1007/s10182-014-0232-6
DOI: https://doi.org/10.1007/s10182-014-0232-6