Abstract
Items on a test form may be weighted differently for various reasons, such as increasing test reliability or validity and improving measurement precision. Different weighting schemes have been used to serve different purposes in different testing situations. When items are weighted, however, the question becomes how to equate test forms that contain those weighted items. Under IRT, two equating methods are commonly used: IRT true score equating and IRT observed score equating. Applying item weights to IRT true score equating is straightforward, and the software WITSE (Chien and Shin, WITSE: A program for weighted IRT true score equating, Version 1.0. Iowa City, IA: Pearson, 2008) was developed specifically for weighted scores using IRT true score equating. Currently, however, no procedure or algorithm is available for IRT weighted observed score equating, because imposing weights on items adds considerable complexity. Regular IRT observed score equating constructs the estimated observed score distributions for two test forms, which are typically obtained using a recursive algorithm. When items have different weights, however, that recursive algorithm is no longer feasible. Therefore, this paper proposes an extended recursive algorithm, built on the original recursive algorithm, to construct the estimated observed score distribution, and illustrates it with a real data set.
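For readers unfamiliar with the recursion referred to above, the following is a minimal Python sketch of the standard recursive computation of a number-correct score distribution at one fixed ability level (the approach usually attributed to Lord and Wingersky, 1984), assuming dichotomous items. The second function is only a naive illustration of how integer weights change the set of reachable scores; it is an assumption made here for illustration and is not the extended algorithm proposed in the chapter, whose details the abstract does not give. The function names and the example probabilities are hypothetical.

```python
from collections import defaultdict


def lord_wingersky(probs):
    """Number-correct score distribution at one fixed ability level.

    probs[i] is the probability of a correct response to item i
    (dichotomous items assumed). Returns a list f with f[x] = P(score == x).
    """
    dist = [1.0]  # before any item is added, the score is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, fx in enumerate(dist):
            new[x] += fx * (1.0 - p)  # item answered incorrectly: score stays x
            new[x + 1] += fx * p      # item answered correctly: score becomes x + 1
        dist = new
    return dist


def weighted_score_dist(probs, weights):
    """Illustrative only: weighted summed score distribution.

    Assumes dichotomous items and non-negative integer weights. Scores are
    kept in a dict because weighted scores need not cover every integer.
    This is NOT the chapter's extended algorithm.
    """
    dist = {0: 1.0}
    for p, w in zip(probs, weights):
        new = defaultdict(float)
        for x, fx in dist.items():
            new[x] += fx * (1.0 - p)  # incorrect: score unchanged
            new[x + w] += fx * p      # correct: score increases by the item weight
        dist = dict(new)
    return dist


if __name__ == "__main__":
    item_probs = [0.5, 0.5]  # hypothetical probabilities at one ability level
    print(lord_wingersky(item_probs))
    print(weighted_score_dist(item_probs, [1, 2]))
```

Both functions give the distribution conditional on a single ability value. An estimated observed score distribution for a form would additionally require averaging these conditional distributions over an ability distribution, which this sketch leaves out.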
References
Brennan, R. L., Wang, T., Kim, S., & Seol, J. (2009). Equating recipes (CASMA Monograph Number 1). Iowa City, IA: Center for Advanced Studies in Measurement and Assessment, the University of Iowa. Available from the web address: http://www.uiowa.edu/~casma
Chang, S. (2009). Choice of weighting scheme in forming the composite. Bulletin of Educational Psychology, 40(3), 489–510.
Chien, Y., & Shin, D. C. (2008). WITSE: A program for weighted IRT true score equating, Version 1.0. Iowa City, IA: Pearson.
Ercikan, K., Schwarz, R. D., Julian, M. W., Burket, G. R., Weber, M. M., & Link, V. (1998). Calibration and scoring of tests with multiple-choice and constructed-response item types. Journal of Educational Measurement, 35, 137–154.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Ito, K., & Sykes, R. C. (2000, June). An evaluation of “intentional” weighting of extended-response or constructed-response items in tests with mixed item types. Paper presented at the annual national conference on large scale assessment, Snowbird, Utah.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking. New York, NY: Springer.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score equatings. Applied Psychological Measurement, 8, 453–461.
Lukhele, R., & Sireci, G. (1995). Using IRT to combine multiple-choice and free-response sections of a test onto a common scale using a priori weights. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.
McDonald, R. P. (1968). A unified treatment of the weighting problem. Psychometrika, 33, 351–381.
Rudner, L. M. (2001). Informed test component weighting. Educational Measurement: Issues and Practice, 20(1), 16–19.
Schaeffer, G. A., Henderson-Montero, D., & Julian, M. (2002). A comparison of three scoring methods for tests with selected-response and constructed-response items. Educational Assessment, 8(4), 317–340.
Stucky, B. D. (2009). Item response theory for weighted summed scores (Unpublished manuscript).
Sykes, R. C., & Hou, L. (2003). Weighting constructed-response items in IRT-based exams. Applied Measurement in Education, 16, 257–275.
Sykes, R. C., Truskosky, D., & White, H. (2001, April). Determining the representation of constructed-response items in mixed-item format exams. Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle, WA.
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103–118.
Wang, M. D., & Stanley, J. C. (1970). Differential weighting: A review of methods and empirical studies. Review of Educational Research, 40, 663–705.
Wilson, M., & Wang, W. (1995). Complex composites: Issues that arise in combining different modes of assessment. Applied Psychological Measurement, 19, 51–71.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Chien, Y., Shin, C.D. (2013). A Recursive Algorithm for IRT Weighted Observed Score Equating. In: Millsap, R.E., van der Ark, L.A., Bolt, D.M., Woods, C.M. (eds) New Developments in Quantitative Psychology. Springer Proceedings in Mathematics & Statistics, vol 66. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9348-8_24
DOI: https://doi.org/10.1007/978-1-4614-9348-8_24
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-9347-1
Online ISBN: 978-1-4614-9348-8
eBook Packages: Mathematics and Statistics (R0)