Bias in Estimation of Misclassification Rates

Haberman, Shelby J.

doi:10.1007/s11336-004-1145-6

Published: 11 February 2017

Volume 71, pages 387–394, (2006)
Psychometrika

Shelby J. Haberman^1,2

Abstract

When a simple random sample of size n is employed to establish a classification rule for prediction of a polytomous variable by an independent variable, the best achievable rate of misclassification is higher than the corresponding best achievable rate if the conditional probability distribution is known for the predicted variable given the independent variable. In typical cases, this increased misclassification rate due to sampling is remarkably small relative to the other sampling-induced increases in expected measures of prediction accuracy that are typically encountered in statistical analysis.

This issue is particularly striking if a polytomous variable predicts a polytomous variable, for the excess misclassification rate due to estimation approaches 0 at an exponential rate as n increases. Even with a continuous real predictor and with simple nonparametric methods, it is typically not difficult to achieve an excess misclassification rate on the order of n⁻¹. Although reduced excess error is normally desirable, it may reasonably be argued that, in the case of classification, the reduction in bias is related to a more fundamental lack of sensitivity of misclassification error to the quality of the prediction. This lack of sensitivity is not an issue if criteria based on probability prediction such as logarithmic penalty or least squares are employed, but the latter measures typically involve more substantial issues of bias. With polytomous predictors, excess expected errors due to sampling are typically of order n⁻¹. For a continuous real predictor, the increase in expected error is typically of order n^(−2/3).
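The contrast between the two kinds of criteria can be illustrated with a deliberately simple special case that is not taken from the article: a binary outcome with success probability p > 1/2 and a predictor with a single category, so that the estimated rule just predicts the majority class in the sample. The function names and the choice p = 0.7 are assumptions made for illustration only. Under these assumptions, the excess misclassification rate equals (2p − 1) times the probability that the sample majority is the wrong class, which decays exponentially in n, whereas the excess expected quadratic (least-squares) penalty of the sample proportion is exactly p(1 − p)/n.

```python
from math import comb


def excess_misclassification(n: int, p: float) -> float:
    """Exact excess error of the sample-majority rule for a Bernoulli(p) outcome.

    Assumes p > 1/2, so the optimal rule always predicts 1 (error 1 - p).
    Predicting 0 instead has error p, an increase of 2p - 1.
    Ties in the sample are broken by a fair coin (an illustrative choice).
    """
    prob_wrong_rule = 0.0
    for k in range(n + 1):  # k = number of ones in the sample
        prob_k = comb(n, k) * p**k * (1 - p) ** (n - k)
        if 2 * k < n:  # the majority is 0, so the rule predicts 0
            prob_wrong_rule += prob_k
        elif 2 * k == n:  # tie: the rule predicts 0 half of the time
            prob_wrong_rule += 0.5 * prob_k
    return (2 * p - 1) * prob_wrong_rule


def excess_quadratic(n: int, p: float) -> float:
    """Excess expected squared-error penalty when p is replaced by the
    sample proportion: E[(p_hat - p)^2] = p(1 - p)/n."""
    return p * (1 - p) / n


if __name__ == "__main__":
    p = 0.7  # hypothetical value, chosen only for illustration
    for n in (11, 51, 101):
        print(n, excess_misclassification(n, p), excess_quadratic(n, p))
```

In this sketch, n times the excess quadratic penalty stays fixed at p(1 − p), while n times the excess misclassification rate shrinks toward 0 as n grows, which is the lack of sensitivity the abstract describes.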

Author information

Authors and Affiliations

Educational Testing Service, Princeton
Shelby J. Haberman
Mailstop 12T, Educational Testing Service, Rosedale Road, Princeton, NJ, 08541
Shelby J. Haberman

Corresponding author

Correspondence to Shelby J. Haberman.

About this article

Cite this article

Haberman, S.J. Bias in Estimation of Misclassification Rates. Psychometrika 71, 387–394 (2006). https://doi.org/10.1007/s11336-004-1145-6

Received: 26 May 2005
Accepted: 09 June 2006
Published: 11 February 2017
Issue Date: June 2006
DOI: https://doi.org/10.1007/s11336-004-1145-6
