Bias in Estimation of Misclassification Rates

Psychometrika

Abstract

When a simple random sample of size n is employed to establish a classification rule for prediction of a polytomous variable by an independent variable, the best achievable rate of misclassification is higher than the corresponding best achievable rate when the conditional probability distribution of the predicted variable given the independent variable is known. In typical cases, this increase in misclassification rate due to sampling is remarkably small relative to the sampling-induced increases in other expected measures of prediction accuracy that are typically encountered in statistical analysis.

This issue is particularly striking if a polytomous variable predicts a polytomous variable, for the excess misclassification rate due to estimation approaches 0 at an exponential rate as n increases. Even with a continuous real predictor and with simple nonparametric methods, it is typically not difficult to achieve an excess misclassification rate on the order of $n^{-1}$. Although reduced excess error is normally desirable, it may reasonably be argued that, in the case of classification, the reduction in bias is related to a more fundamental lack of sensitivity of misclassification error to the quality of the prediction. This lack of sensitivity is not an issue if criteria based on probability prediction such as logarithmic penalty or least squares are employed, but the latter measures typically involve more substantial issues of bias. With polytomous predictors, excess expected errors due to sampling are typically of order $n^{-1}$. For a continuous real predictor, the increase in expected error is typically of order $n^{-2/3}$.
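The contrast between the exponential rate and the $n^{-1}$ rate can be illustrated with a small exact calculation. The sketch below is not the paper's derivation. It assumes a deliberately simplified setting: a binary outcome with P(Y = 1) = p > 1/2 inside a single predictor category, n observations in that category, a majority-vote classification rule, and the sample proportion as the probability prediction under squared error. All function names are hypothetical and chosen only for this illustration.

```python
from math import comb, exp


def excess_misclassification(n: int, p: float) -> float:
    """Exact excess misclassification rate of a majority-vote rule.

    Assumes a binary outcome with P(Y = 1) = p > 1/2 inside one predictor
    category, n observations in that category, and ties resolved in favour
    of the wrong class 0 (a pessimistic convention chosen for illustration).
    """
    # Probability that class 1 fails to win a strict majority in the sample.
    p_wrong = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if 2 * k <= n
    )
    # Predicting 0 instead of 1 raises the error rate from 1 - p to p.
    return (2 * p - 1) * p_wrong


def excess_squared_error(n: int, p: float) -> float:
    """Exact excess expected squared error when the sample proportion
    replaces p as the predicted probability: Var(p_hat) = p(1 - p) / n."""
    return p * (1 - p) / n


def hoeffding_bound(n: int, p: float) -> float:
    """Hoeffding upper bound on the excess misclassification rate,
    which decays exponentially in n."""
    return (2 * p - 1) * exp(-2 * n * (p - 0.5) ** 2)


if __name__ == "__main__":
    p = 0.7  # illustrative value, not taken from the paper
    for n in (10, 20, 40, 80):
        print(n, excess_misclassification(n, p), excess_squared_error(n, p))
```

Under these assumptions, multiplying the excess squared error by n gives the constant p(1 - p), whereas n times the excess misclassification rate keeps shrinking, which is the qualitative pattern the abstract describes for polytomous predictors.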

Author information

Correspondence to Shelby J. Haberman.

Cite this article

Haberman, S.J. Bias in Estimation of Misclassification Rates. Psychometrika 71, 387–394 (2006). https://doi.org/10.1007/s11336-004-1145-6

  • DOI: https://doi.org/10.1007/s11336-004-1145-6
