Abstract
The numbers associated with examinee performance on educational or psychological tests are defined through the process of scaling. This process produces a score scale, and the scores reported to examinees are referred to as scale scores. Kolen (2006) used the term primary score scale, which is the focus of this chapter, for the scale that underlies the psychometric properties of a test.
A key component in developing a score scale is an examinee's raw score on a test, which is a function of that examinee's item scores. Raw scores can be as simple as the sum of the item scores or as complex as a function of the entire pattern of item responses.
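To make the contrast between the two kinds of raw score concrete, the following Python sketch computes a summed score and a pattern-dependent score for the same responses. It is an illustration only, not the chapter's own method. It assumes a two-parameter logistic (2PL) IRT model with made-up item parameters and uses an expected a posteriori (EAP) ability estimate as the pattern score. The names `ITEMS`, `summed_score`, `p_correct`, and `eap_theta` are hypothetical.

```python
import math

# Hypothetical 2PL item parameters (a = discrimination, b = difficulty) for a
# five-item test; illustrative values only, not taken from any real test.
ITEMS = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]


def summed_score(responses):
    """Raw score as the simple sum of 0/1 item scores."""
    return sum(responses)


def p_correct(theta, a, b):
    """2PL probability of a correct response (logistic metric, no D = 1.7)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def eap_theta(responses, n_points=81):
    """Pattern-based score: EAP estimate of ability under a N(0, 1) prior,
    approximated with equally spaced quadrature points on [-4, 4]."""
    num = den = 0.0
    for k in range(n_points):
        theta = -4.0 + 8.0 * k / (n_points - 1)
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        for u, (a, b) in zip(responses, ITEMS):
            p = p_correct(theta, a, b)
            weight *= p if u == 1 else 1.0 - p  # multiply in the likelihood
        num += theta * weight
        den += weight
    return num / den


# Two response patterns with the same summed score of 3.
easy_right = [1, 1, 1, 0, 0]
hard_right = [0, 0, 1, 1, 1]
for pattern in (easy_right, hard_right):
    print(pattern, summed_score(pattern), round(eap_theta(pattern), 3))
```

Under the 2PL, the EAP estimate depends on the responses only through the discrimination-weighted sum of the item scores. The two patterns above share a summed score of 3, but they receive different ability estimates because their discrimination-weighted sums differ. With equal discriminations, the pattern-based estimate would depend only on the summed score.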
Raw scores are transformed to scale scores to make scores more meaningful to test users. For example, raw scores might be transformed so that the resulting scale scores have predefined distributional properties for a particular group of examinees, referred to as a norm group. Normative information might be incorporated, for instance, by constructing scale scores that are approximately normally distributed with a mean of 50 and a standard deviation of 10 for a national population of examinees. Procedures can also be used to incorporate content information and score precision information into score scales.
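One common way to build the normalized scale described above is to take each raw score's percentile rank in the norm group, convert it to a standard normal deviate, and then rescale it to the target mean and standard deviation. The Python sketch below follows that recipe under stated assumptions. It uses mid-percentile ranks, uses made-up norm-group data, and clamps extreme ranks to an arbitrary 0.001/0.999 range. The function names are hypothetical, and the sketch is not presented as the chapter's procedure.

```python
from statistics import NormalDist


def percentile_ranks(norm_scores, max_score):
    """Mid-percentile rank (0-100) of each integer raw score 0..max_score:
    percent of the norm group scoring below x plus half the percent at x."""
    n = len(norm_scores)
    ranks = []
    for x in range(max_score + 1):
        below = sum(1 for s in norm_scores if s < x)
        at = sum(1 for s in norm_scores if s == x)
        ranks.append(100.0 * (below + 0.5 * at) / n)
    return ranks


def normalized_conversion(norm_scores, max_score, mean=50.0, sd=10.0):
    """Raw-to-scale conversion: mean + sd * z, where z is the standard normal
    deviate at the raw score's percentile rank in the norm group."""
    std_normal = NormalDist()  # mu=0, sigma=1
    conversion = {}
    for x, pr in enumerate(percentile_ranks(norm_scores, max_score)):
        # Ranks of exactly 0 or 100 (scores outside the observed range) would
        # give an infinite z; the 0.001/0.999 clamp is an arbitrary choice.
        p = min(max(pr / 100.0, 0.001), 0.999)
        conversion[x] = mean + sd * std_normal.inv_cdf(p)
    return conversion


# Hypothetical norm-group raw scores on a 10-item test.
norm_group = [3, 4, 4, 5, 5, 5, 6, 6, 7]
conversion = normalized_conversion(norm_group, max_score=10)
for raw, scale in conversion.items():
    print(raw, round(scale))
```

Mid-percentile ranks place each raw score at the middle of its probability mass in the norm group. Because raw scores are discrete, the resulting scale scores are only approximately normal with a mean of 50 and a standard deviation of 10 in the norm group. In operational use they would typically be rounded to integers for reporting.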
References
ACT. (2001). EXPLORE technical manual. Iowa City, IA: Author.
Allen, N. L., Carlson, J. E., & Zelenak, C. A. (1999). The NAEP 1996 technical report. Washington, DC: National Center for Education Statistics.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: ETS. (Reprinted from Educational measurement, 2nd ed., pp. 508–600, by R. L. Thorndike, Ed., 1971, Washington, DC: American Council on Education)
Ban, J.-C., & Lee, W.-C. (2007). Defining a score scale in relation to measurement error for mixed format tests (CASMA Research Report Number 24). Iowa City, IA: Center for Advanced Studies in Measurement and Assessment.
Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34(3), 197–211.
Drasgow, F., Luecht, R. M., & Bennett, R. E. (2006). Technology and testing. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 471–515). Westport, CT: American Council on Education and Praeger.
Ebel, R. L. (1962). Content standard test scores. Educational and Psychological Measurement, 22(1), 15–25.
Flanagan, J. C. (1951). Units, scores, and norms. In E. F. Lindquist (Ed.), Educational measurement (pp. 695–763). Washington, DC: American Council on Education.
Freeman, M. F., & Tukey, J. W. (1950). Transformations related to the angular and square root. Annals of Mathematical Statistics, 21(4), 607–611.
Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433–470). Westport, CT: American Council on Education and Praeger.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education and Praeger.
Iowa Tests of Educational Development. (1958). Manual for the school administrator (Rev. ed.). Iowa City, IA: State University of Iowa.
Kolen, M. J. (1988). Defining score scales in relation to measurement error. Journal of Educational Measurement, 25(2), 97–110.
Kolen, M. J. (2006). Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 155–186). Westport, CT: American Council on Education and Praeger.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer-Verlag.
Kolen, M. J., Hanson, B. A., & Brennan, R. L. (1992). Conditional standard errors of measurement for scale scores. Journal of Educational Measurement, 29(4), 285–307.
Kolen, M. J., Zeng, L., & Hanson, B. A. (1996). Conditional standard errors of measurement for scale scores using IRT. Journal of Educational Measurement, 33(2), 129–140.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings.” Applied Psychological Measurement, 8(4), 453–461.
McCall, W. A. (1939). Measurement: A revision of how to measure in education. New York, NY: Macmillan.
Muraki, E. (1993). Information functions of the generalized partial credit model. Applied Psychological Measurement, 17(4), 351–363.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). New York, NY: Macmillan.
Pommerich, M., Nicewander, W. A., & Hanson, B. A. (1999). Estimating average domain scores. Journal of Educational Measurement, 36(3), 199–216.
Rodriguez, M. C. (2003). Construct equivalence of multiple-choice and constructed-response items: A random effects synthesis of correlations. Journal of Educational Measurement, 40(2), 163–184.
Rosa, K., Swygert, K. A., Nelson, L., & Thissen, D. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items—Scale scores for patterns of summed scores. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 253–292). Mahwah, NJ: Erlbaum.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2).
Thissen, D., Nelson, L., & Swygert, K. A. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items—Approximation methods for scale scores. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 293–341). Mahwah, NJ: Erlbaum.
Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73–140). Mahwah, NJ: Erlbaum.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. L. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19(1), 39–49.
Thissen, D., & Wainer, H. (Eds.). (2001). Test scoring. Mahwah, NJ: Erlbaum.
Tong, Y., & Kolen, M. J. (2005). Assessing equating results on different equating criteria. Applied Psychological Measurement, 29(6), 418–432.
Tong, Y., & Kolen, M. J. (2007). Comparisons of methodologies and results in vertical scaling for educational achievement tests. Applied Measurement in Education, 20(2), 227–253.
van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York, NY: Springer-Verlag.
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103–118.
Wang, T., Kolen, M. J., & Harris, D. J. (2000). Psychometric properties of scale scores and performance levels for performance assessments using polytomous IRT. Journal of Educational Measurement, 37(2), 141–162.
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). Westport, CT: American Council on Education and Praeger.
Zwick, R., Senturk, D., Wang, J., & Loomis, S. C. (2001). An investigation of alternative methods for item mapping in the National Assessment of Educational Progress. Educational Measurement: Issues and Practice, 20(2), 15–25.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Kolen, M.J., Tong, Y., Brennan, R.L. (2009). Scoring and Scaling Educational Tests. In: von Davier, A. (eds) Statistical Models for Test Equating, Scaling, and Linking. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-0-387-98138-3_3
DOI: https://doi.org/10.1007/978-0-387-98138-3_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-98137-6
Online ISBN: 978-0-387-98138-3
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)