Quality and Quantity, Volume 32, Issue 3, pp 229–245

Goodness of Fit in Regression Analysis – R² and G² Reconsidered

  • Curt Hagquist
  • Magnus Stenbeck
Abstract

There has been considerable debate on how important goodness of fit is as a tool in regression analysis, especially with regard to the controversy on R² in linear regression. This article reviews some of the arguments of this debate and its relationship to other goodness of fit measures. It attempts to clarify the distinction between goodness of fit measures and other model evaluation tools as well as the distinction between model test statistics and descriptive measures used to make decisions on the agreement between models and data. It also argues that the utility of goodness of fit measures depends on whether the analysis focuses on explaining the outcome (model orientation) or explaining the effect(s) of some regressor(s) on the outcome (factor orientation).

In some situations a decisive goodness of fit test statistic exists and is a central tool in the analysis. In other situations, where the goodness of fit measure is not a test statistic but a descriptive measure, it can be used as a heuristic device along with other evidence whenever appropriate. The availability of goodness of fit test statistics depends on whether the variability in the observations is restricted, as in table analysis, or whether it is unrestricted, as in OLS and logistic regression on individual data. Hence, G² is a decisive tool for measuring goodness of fit, whereas R² and SEE are heuristic tools.
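The contrast drawn above can be illustrated with a minimal sketch. It is not taken from the article: the 2×2 table, the five data points, and the helper names (`g_squared`, `independence_expected`, `ols_r2_see`) are hypothetical, and the independence model is only one possible model for a table. The sketch computes the likelihood-ratio statistic G² = 2 Σ O ln(O/E) for grouped data and compares it with the chi-square critical value for one degree of freedom (3.841 at the 5% level), then computes R² and the standard error of the estimate (SEE) for a simple OLS fit on individual data, where no comparable reference distribution is available.

```python
import math


def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def independence_expected(table):
    """Expected counts under row/column independence for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def ols_r2_see(x, y):
    """Simple OLS of y on x; returns (R^2, SEE), SEE = sqrt(SSE / (n - 2))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst, math.sqrt(sse / (n - 2))


# Hypothetical grouped data: the table fixes the cells in which counts vary.
table = [[30, 20], [20, 30]]
expected = independence_expected(table)
flat_observed = [cell for row in table for cell in row]
flat_expected = [cell for row in expected for cell in row]

g2 = g_squared(flat_observed, flat_expected)
df = (len(table) - 1) * (len(table[0]) - 1)
CHI2_CRITICAL_DF1_5PCT = 3.841  # chi-square critical value, df = 1, alpha = 0.05
rejects_independence = g2 > CHI2_CRITICAL_DF1_5PCT

# Hypothetical individual-level data: R^2 and SEE describe the fit,
# but there is no reference distribution that makes them a fit test.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2, see = ols_r2_see(x, y)

print(f"G2 = {g2:.3f} on {df} df, reject independence: {rejects_independence}")
print(f"R2 = {r2:.4f}, SEE = {see:.4f}")
```

In this sketch G² leads to a yes-or-no decision about the model, while R² and SEE only summarise how closely the fitted line follows the points and have to be weighed with other evidence.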

Keywords

Regression Analysis; Logistic Regression; Model Test; Individual Data; Decisive Goodness
These keywords were added by machine and not by the authors.

Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Curt Hagquist (1)
  • Magnus Stenbeck (2)
  1. Centre for Public Health Research, University of Karlstad, Karlstad, Sweden
  2. Centre for Epidemiology, National Board of Health and Welfare, Stockholm, Sweden
