Distance-based beta regression for prediction of mutual funds

Abstract

For regression with a beta-distributed response variable, we propose a new method that combines two methodologies: distance-based regression and beta regression with variable dispersion. The proposed model is useful when the response variable is a rate, a proportion, or a quantity expressed in parts per million, and is related to a mixture of continuous and categorical explanatory variables. We present the main statistical properties of the model and several measures for selecting its most predictive dimensions. The only modelling choice required is a suitable distance for the mean model and for the variable dispersion model, chosen according to the type of explanatory variables. We also develop mean and precision predictions for a new individual and address the problem of missing data. Rather than removing variables or observations with missing data, the distance-based method works with all the data, with no need to fill in or impute missing values. Finally, we present an application to mutual funds that uses the Gower distance for both the mean model and the variable dispersion model. The methodology is applicable to any problem where estimating distance-based beta regression coefficients for correlated explanatory variables is of interest.
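The distance stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gower_sq_distance` and `principal_coordinates`, the toy data, and the missing-value codes (NaN for continuous variables, -1 for categorical ones) are assumptions made for illustration, and the categorical part uses simple matching only. The sketch computes squared Gower distances over the variables that are observed for both individuals in a pair, then extracts principal coordinates that could serve as predictors in the mean and dispersion submodels.

```python
import numpy as np

def gower_sq_distance(num, cat):
    """Squared Gower distances d2[i, j] = 1 - s[i, j] for mixed data.

    num: (n, p) float array, NaN marks a missing value (assumed coding).
    cat: (n, q) integer-coded array, -1 marks a missing value (assumed coding).
    Each pair is compared only on variables observed for both individuals.
    A pair with no variable in common would give a division by zero.
    """
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    n = num.shape[0]
    span = np.nanmax(num, axis=0) - np.nanmin(num, axis=0)
    span[span == 0] = 1.0          # constant column: avoid dividing by zero
    score = np.zeros((n, n))       # accumulated per-variable similarities
    count = np.zeros((n, n))       # variables comparable for each pair
    for k in range(num.shape[1]):
        x = num[:, k]
        ok = ~np.isnan(x)[:, None] & ~np.isnan(x)[None, :]
        diff = np.abs(x[:, None] - x[None, :]) / span[k]
        score += np.where(ok, 1.0 - diff, 0.0)
        count += ok
    for k in range(cat.shape[1]):
        c = cat[:, k]
        ok = (c[:, None] >= 0) & (c[None, :] >= 0)
        score += np.where(ok & (c[:, None] == c[None, :]), 1.0, 0.0)
        count += ok
    return 1.0 - score / count

def principal_coordinates(d2, tol=1e-10):
    """Classical scaling: coordinates whose squared Euclidean distances
    reproduce d2 when the doubly centred matrix is positive semidefinite."""
    n = d2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * H @ d2 @ H                    # inner-product matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]           # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > tol                        # drop null and negative axes
    return vecs[:, keep] * np.sqrt(vals[keep]), vals[keep]

# Toy data: two continuous variables and one categorical variable.
num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 20.0]])
cat = np.array([[0], [1], [0], [1]])
d2 = gower_sq_distance(num, cat)
X, eigenvalues = principal_coordinates(d2)
```

The columns of `X` are ordered by decreasing eigenvalue, so a model would typically keep only the leading ones. With missing values the centred matrix may have negative eigenvalues, in which case the retained coordinates reproduce the distances only approximately.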



Acknowledgments

We sincerely thank two referees for their helpful comments and suggestions, which improved this paper. This work was partially funded and supported by Grant MTM2010-14961 from the Spanish Ministry of Science and Education, by the Carolina Foundation, by Applied Statistics in Experimental Research, Industry and Biotechnology (Universidad Nacional de Colombia), and by Core Spatial Data Research (Faculty of Engineering, Universidad Distrital Francisco José de Caldas).

Author information

Corresponding author

Correspondence to Oscar O. Melo.

About this article

Cite this article

Melo, O.O., Melo, C.E. & Mateu, J. Distance-based beta regression for prediction of mutual funds. AStA Adv Stat Anal 99, 83–106 (2015). https://doi.org/10.1007/s10182-014-0232-6


Keywords

  • Distance-based beta regression
  • Missing data
  • Mutual funds
  • Predictions
  • Principal coordinates analysis
  • Variable dispersion