Quantifying the Bias of Non-linear Equating and Score Transformations

von Davier, Matthias; Clauser, Brian

doi:10.1007/978-3-031-10370-4_9


Part of the book series: Methodology of Educational Measurement and Assessment (MEMA)


Abstract

This paper shows that using non-linear functions for equating and score transformations has consequences that are not commensurable with classical test theory (CTT). More specifically, a well-known theorem from calculus shows that the expected value of a non-linearly transformed variable does not, in general, equal the transformed expected value of that variable. Translated to CTT, this implies that the expectation of the transformed observed test score is biased, i.e., it differs from the transformed true score. To quantify the bias, this work uses second-order Taylor expansions to show that non-linear equating and scale transformations lead not only to variability of the standard errors of measurement (SEMs) but also to predictable bias in the expected values of the transformed observed scores. In line with Lord’s finding, often summarized as “Equating is either unnecessary or impossible,” this bias due to non-linear equating vanishes either for perfectly reliable tests or if the equating function is indeed linear, i.e., the tests are congeneric.
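
As an illustration of the argument in the abstract, the following minimal sketch compares the exact bias E[f(X)] - f(T) with the generic second-order Taylor approximation 0.5 * f''(T) * Var(E), where X = T + E is the observed score, T the true score, and E an error with mean zero. It is not taken from the chapter: the binomial error model, the transformation functions, and the names `taylor_bias` and `exact_bias_binomial` are assumptions chosen only for illustration.

```python
import math


def taylor_bias(f2, true_score, error_var):
    """Second-order Taylor approximation of E[f(X)] - f(T).

    f2 is the second derivative of the transformation f. The first-order
    term drops out because the measurement error has expectation zero.
    """
    return 0.5 * f2(true_score) * error_var


def exact_bias_binomial(f, n_items, p):
    """Exact E[f(X)] - f(T) under an ASSUMED binomial error model.

    X ~ Binomial(n_items, p) is the observed number-correct score and
    T = n_items * p is the true score (hypothetical setup for illustration).
    """
    expected = sum(
        math.comb(n_items, k) * p**k * (1 - p) ** (n_items - k) * f(k)
        for k in range(n_items + 1)
    )
    return expected - f(n_items * p)


# Hypothetical test: 40 items, success probability 0.7.
n_items, p = 40, 0.7
true_score = n_items * p
error_var = n_items * p * (1 - p)  # binomial error variance

# Linear transformation: second derivative is zero, so no bias.
linear_exact = exact_bias_binomial(lambda x: 2 * x + 10, n_items, p)

# Quadratic transformation: the second-order expansion is exact.
quad_exact = exact_bias_binomial(lambda x: x**2, n_items, p)
quad_taylor = taylor_bias(lambda t: 2.0, true_score, error_var)

# Concave transformation (square root): bias is negative.
sqrt_exact = exact_bias_binomial(math.sqrt, n_items, p)
sqrt_taylor = taylor_bias(lambda t: -0.25 * t**-1.5, true_score, error_var)

print("linear:", linear_exact)
print("quadratic:", quad_exact, quad_taylor)
print("sqrt:", sqrt_exact, sqrt_taylor)
```

In this sketch the approximate bias is zero in exactly the two cases named in the abstract: when the error variance is zero (a perfectly reliable test) or when the second derivative of the transformation is zero (a linear function).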


Notes

1. Braun (2021), personal communication.
2. The acronyms ACT and SAT are household names in the USA as well as for many international students. ACT stands for “American College Test.” SAT originally stood for “Scholastic Aptitude Test,” but as the test evolved, the acronym’s meaning was dropped, so SAT no longer has a meaning as an acronym (https://blog.collegeboard.org/difference-between-sat-and-psat).
3. Three score points are located within a closed interval of length 2 on the ACT scale, and 5 points are contained in a closed interval of length 4.

References

ACT, Inc. (2009). ACT-SAT concordance table. https://research.collegeboard.org/sites/default/files/publications/2012/7/researchnote-2009-40-act-sat-concordance-tables.pdf
Dorans, N. J. (1999). Correspondences between ACT and SAT I scores (College Board Research Reports 99-01). The College Board. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/j.2333-8504.1999.tb01800.x
Feldt, L. S., & Qualls, A. L. (1998). Approximating scale score standard error of measurement from the raw score standard error. Applied Measurement in Education, 11(2), 159–177. https://doi.org/10.1207/s15324818ame1102_3
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27–38. https://doi.org/10.2307/2336755
Forster, O. (1984). Analysis 1. Differential- und Integralrechnung einer Veränderlichen [Calculus 1. Univariate differential and integral calculus]. Vieweg & Sohn.
Hendricks, J. (1967). The Iowa tests of educational development as predictors of academic success at Utah State University. All Graduate Theses and Dissertations. https://doi.org/10.26076/9f3e-393b
Holland, P. W., & Hoskens, M. (2003). Classical Test Theory as a first-order Item Response Theory: Application to true-score prediction from a possibly nonparallel test. Psychometrika, 68, 123–149. https://doi.org/10.1007/BF02296657
Jensen, J. L. W. V. (1906). Sur les fonctions convexes et les inégalités entre les valeurs moyennes [On convex functions and inequalities between mean values]. Acta Mathematica, 30(1), 175–193. https://doi.org/10.1007/BF02418571
Jones, E., Oliphant, T., & Peterson, P. (2001). SciPy: Open source scientific tools for Python [Computer Software]. http://www.scipy.org/
Kolen, M. J., Hanson, B. A., & Brennan, R. L. (1992). Conditional standard errors of measurement for scale scores. Journal of Educational Measurement, 29(4), 285–307. http://www.jstor.org/stable/1435086
Lockwood, J. R., & McCaffrey, D. F. (2015). Should nonlinear functions of test scores be used as covariates in a regression model? In R. W. Lissitz & H. Jiao (Eds.), Value added modeling and growth modeling with particular application to teacher and school effectiveness (pp. 1–36). Information Age Publishing.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Erlbaum.
Rohatgi, A. (2015). Web plot digitizer [Computer Software]. https://automeris.io/WebPlotDigitizer
Taylor, B. (1715). Methodus incrementorum directa et inversa [Direct and inverse increments method]. William Innys.
von Davier, M. (1995). Winmira user manual. Chapter on person parameter estimation using WLE. IPN: Kiel University. http://208.76.80.46/~svfklumu/wmira/winmiramanual.pdf
von Davier, A. (2008). New results on the linear equating methods for the non-equivalent-groups design. Journal of Educational and Behavioral Statistics, 33(2), 186–203. http://www.jstor.org/stable/20172112
von Davier, M. (2017). CTT and No-DIF and ? = (Almost) Rasch Model. In M. Rosén, K. Yang Hansen, & U. Wolff (Eds.), Cognitive abilities and educational outcomes: A Festschrift in Honour of Jan-Eric Gustafsson (pp. 249–272). Springer. https://doi.org/10.1007/978-3-319-43473-5_14
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450. https://doi.org/10.1007/BF02294627
Wolter, K. (2007). Introduction to variance estimation (2nd ed.). Springer. https://doi.org/10.1007/978-0-387-35099-8
Woodruff, D., Traynor, A., Cui, Z., & Fang, Y. (2013). A comparison of three methods for computing scale score conditional standard errors of measurement (ACT Research Report No. 2013-7). American College Testing Program. https://files.eric.ed.gov/fulltext/ED555593.pdf
Yin, P., Brennan, R. L., & Kolen, M. J. (2004). Concordance between ACT and ITED scores from different populations. Applied Psychological Measurement, 28(4), 274–289. https://doi.org/10.1177/0146621604265034


Author information

Authors and Affiliations

Boston College, Chestnut Hill, MA, USA
Matthias von Davier
NBME, Philadelphia, PA, USA
Brian Clauser


Corresponding author

Correspondence to Matthias von Davier.

Editor information

Editors and Affiliations

Research Institute of Child Development and Education, University of Amsterdam, Amsterdam, The Netherlands
L. Andries van der Ark
Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
Wilco H. M. Emons
The expertise group Psychometrics and Statistics, University of Groningen, Groningen, The Netherlands
Rob R. Meijer


About this chapter

Cite this chapter

von Davier, M., Clauser, B. (2023). Quantifying the Bias of Non-linear Equating and Score Transformations. In: van der Ark, L.A., Emons, W.H.M., Meijer, R.R. (eds) Essays on Contemporary Psychometrics. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-031-10370-4_9


DOI: https://doi.org/10.1007/978-3-031-10370-4_9
Published: 16 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10369-8
Online ISBN: 978-3-031-10370-4
eBook Packages: Education, Education (R0)
