Volume 65, Issue 4, pp 437–456

A test-theoretic approach to observed-score equating

  • Wim J. van der Linden


Observed-score equating using the marginal distributions of two tests is not necessarily the universally best approach it has been claimed to be. On the other hand, equating using the conditional distributions given the ability level of the examinee is theoretically ideal. Possible ways of dealing with the requirement of known ability are discussed, including such methods as conditional observed-score equating at point estimates or posterior expected conditional equating. The methods are generalized to the problem of observed-score equating with a multivariate ability structure underlying the scores.
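The contrast between marginal and conditional equating can be illustrated with a minimal sketch. It assumes a Rasch model for both tests and a simple discrete equipercentile rule (each score x is mapped to the smallest y whose cumulative probability reaches that of x, with no continuization). The function names, the discrete rule, and the use of a finite grid of ability values with weights for the marginal case are illustrative assumptions, not the article's procedure.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch success probabilities at ability theta for item difficulties b."""
    return 1.0 / (1.0 + np.exp(-(theta - np.asarray(b, dtype=float))))

def score_pmf(p):
    """Number-correct score distribution for independent items with success
    probabilities p, built up one item at a time."""
    pmf = np.array([1.0])
    for pi in p:
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1.0 - pi)  # item answered incorrectly
        new[1:] += pmf * pi           # item answered correctly
        pmf = new
    return pmf

def _equate(F, G):
    """Discrete equipercentile rule (assumed): smallest y with G(y) >= F(x)."""
    idx = np.searchsorted(G, F, side="left")
    return np.minimum(idx, len(G) - 1)  # guard against rounding at the top score

def conditional_equate(theta, b_x, b_y):
    """Equate X to Y using the score distributions conditional on theta."""
    F = np.cumsum(score_pmf(rasch_prob(theta, b_x)))
    G = np.cumsum(score_pmf(rasch_prob(theta, b_y)))
    return _equate(F, G)

def marginal_equate(thetas, weights, b_x, b_y):
    """Equate X to Y using marginal distributions over a discrete ability grid."""
    fx = sum(w * score_pmf(rasch_prob(t, b_x)) for t, w in zip(thetas, weights))
    gy = sum(w * score_pmf(rasch_prob(t, b_y)) for t, w in zip(thetas, weights))
    return _equate(np.cumsum(fx), np.cumsum(gy))
```

In this sketch, `conditional_equate` returns a different transformation for each ability value, whereas `marginal_equate` returns a single transformation for the whole population; when ability is unknown, the conditional version would have to be evaluated at an estimate of it.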

Key words

observed-score equating, equipercentile method, equating criteria, multidimensionality




Copyright information

© The Psychometric Society 2000

Authors and Affiliations

  1. Department of Educational Measurement and Data Analysis, University of Twente, Enschede, The Netherlands