Random Measurement Error

  • Gideon J. Mellenbergh


A psychological or educational test is an instrument for the measurement of a person’s maximum performance or typical response under standardized conditions, where the performance or response is assumed to reflect one or more latent variables. A test consists of a set of items. Conventional test scoring assigns a priori scores to test takers’ item responses, and a test taker’s observed test score is the sum of his or her item scores. Test scores are affected by random and systematic errors. Random errors decrease the measurement precision of tests, and systematic errors bias the measurements. Measurement precision has a within-person aspect and a between-persons aspect. The within-person aspect is the variance of a given test taker’s observed score across hypothetical replications, which assesses the precision of the measurement of the test taker’s true score. The between-persons aspect is the reliability, which is the squared product-moment correlation between observed and true test scores in a population of test takers. Measurement precision is increased by applying guidelines for test construction and administration. Classical and modern psychometric methods assess the quality of tests and items. Classical item analysis indices are the item p-value and item-rest correlation, and modern indices are the item difficulty and discrimination parameters of item response models.


Birnbaum’s two-parameter logistic item response model; Classical analysis of items and tests; Constructed-response item; Item writing guidelines; Latent variable; Reliability; Selected-response item; Test; Testlet; Within-person measurement precision



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Emeritus Professor Psychological Methods, Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
