Scoring and Scaling Educational Tests

Kolen, Michael J.; Tong, Ye; Brennan, Robert L.

doi:10.1007/978-0-387-98138-3_3

Part of the book series: Statistics for Social and Behavioral Sciences (SSBS)

Abstract

The numbers that are associated with examinee performance on educational or psychological tests are defined through the process of scaling. This process produces a score scale, and the scores that are reported to examinees are referred to as scale scores. Kolen (2006) used the term primary score scale for the scale that underlies the psychometric properties of a test; the primary score scale is the focus of this chapter.

A key component in the process of developing a score scale is the raw score for an examinee on a test, which is a function of the item scores for that examinee. Raw scores can be as simple as a sum of the item scores or be so complicated that they depend on the entire pattern of item responses.
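The contrast between the two kinds of raw score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-parameter logistic (2PL) model, the item parameters `a` and `b`, the grid search, and the function names are hypothetical choices made for the example, not the chapter's procedure. The point is that two examinees with the same summed score can receive different pattern-based scores.

```python
import math

def summed_score(responses):
    """Raw score as the plain sum of the item scores."""
    return sum(responses)

def pattern_score(responses, discriminations, difficulties):
    """Grid-search maximum-likelihood ability estimate under a 2PL model.

    The estimate depends on which items were answered correctly,
    not only on how many (illustrative assumption, see lead-in).
    """
    grid = [g / 100 for g in range(-400, 401)]  # theta from -4.00 to 4.00

    def log_likelihood(theta):
        total = 0.0
        for u, a_i, b_i in zip(responses, discriminations, difficulties):
            p = 1.0 / (1.0 + math.exp(-a_i * (theta - b_i)))
            total += math.log(p) if u == 1 else math.log(1.0 - p)
        return total

    return max(grid, key=log_likelihood)

# Hypothetical item parameters for a three-item test.
a = [0.5, 1.0, 2.0]   # discriminations
b = [-1.0, 0.0, 1.0]  # difficulties

examinee_x = [1, 1, 0]  # misses the most discriminating item
examinee_y = [0, 1, 1]  # misses the least discriminating item

print(summed_score(examinee_x), summed_score(examinee_y))
print(pattern_score(examinee_x, a, b), pattern_score(examinee_y, a, b))
```

Both examinees have a summed score of 2, yet the pattern-based estimate is higher for the examinee who answered the more discriminating items correctly.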

Raw scores are transformed to scale scores to make scores easier for test users to interpret. For example, raw scores might be transformed to scale scores so that they have predefined distributional properties for a particular group of examinees, referred to as a norm group. Normative information might be incorporated by constructing scale scores to be approximately normally distributed with a mean of 50 and a standard deviation of 10 for a national population of examinees. In addition, procedures can be used for incorporating content and score precision information into score scales.
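One way to build the normalized scale described above is sketched here in Python. It is a minimal illustration, not the chapter's procedure: the function name, the mid-percentile-rank convention, the rounding to integers, and the tiny norm group are all assumptions made for the example. Each raw score is converted to its percentile rank in the norm group, then to the corresponding standard normal deviate, and finally rescaled to a mean of 50 and a standard deviation of 10.

```python
from statistics import NormalDist

def normalized_scale_scores(norm_group_raw, mean=50.0, sd=10.0):
    """Build a raw-to-scale conversion table from a norm group.

    Returns a dict mapping each observed raw score to a rounded
    scale score that is approximately normal in the norm group.
    """
    n = len(norm_group_raw)
    standard_normal = NormalDist()
    table = {}
    for x in sorted(set(norm_group_raw)):
        below = sum(1 for r in norm_group_raw if r < x)
        at = sum(1 for r in norm_group_raw if r == x)
        # Mid-percentile rank as a proportion; always strictly in (0, 1).
        p = (below + 0.5 * at) / n
        z = standard_normal.inv_cdf(p)
        table[x] = round(mean + sd * z)  # integer rounding is an assumption
    return table

# Hypothetical norm group of nine raw scores.
norm_group = [0, 1, 1, 2, 2, 2, 3, 3, 4]
conversion = normalized_scale_scores(norm_group)
for raw, scale in conversion.items():
    print(raw, scale)
```

In operational work the norm group would be a large, representative sample, and the percentile ranks are often smoothed before the normal deviates are computed.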

References

ACT. (2001). EXPLORE technical manual. Iowa City, IA: Author.
Allen, N. L., Carlson, J. E., & Zelenak, C. A. (1999). The NAEP 1996 technical report. Washington, DC: National Center for Education Statistics.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: ETS. (Reprinted from Educational measurement, 2nd ed., pp. 508–600, by R. L. Thorndike, Ed., 1971, Washington, DC: American Council on Education)
Ban, J.-C., & Lee, W.-C. (2007). Defining a score scale in relation to measurement error for mixed format tests (CASMA Research Report Number 24). Iowa City, IA: Center for Advanced Studies in Measurement and Assessment.
Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34(3), 197–211.
Drasgow, F., Luecht, R. M., & Bennett, R. E. (2006). Technology and testing. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 471–515). Westport, CT: American Council on Education and Praeger.
Ebel, R. L. (1962). Content standard test scores. Educational and Psychological Measurement, 22(1), 15–25.
Flanagan, J. C. (1951). Units, scores, and norms. In E. F. Lindquist (Ed.), Educational measurement (pp. 695–763). Washington, DC: American Council on Education.
Freeman, M. F., & Tukey, J. W. (1950). Transformations related to the angular and square root. Annals of Mathematical Statistics, 21(4), 607–611.
Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433–470). Westport, CT: American Council on Education and Praeger.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education and Praeger.
Iowa Tests of Educational Development. (1958). Manual for the school administrator (Rev. ed.). Iowa City: State University of Iowa.
Kolen, M. J. (1988). Defining score scales in relation to measurement error. Journal of Educational Measurement, 25(2), 97–110.
Kolen, M. J. (2006). Scaling and norming. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 155–186). Westport, CT: American Council on Education and Praeger.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer-Verlag.
Kolen, M. J., Hanson, B. A., & Brennan, R. L. (1992). Conditional standard errors of measurement for scale scores. Journal of Educational Measurement, 29(4), 285–307.
Kolen, M. J., Zeng, L., & Hanson, B. A. (1996). Conditional standard errors of measurement for scale scores using IRT. Journal of Educational Measurement, 33(2), 129–140.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings.” Applied Psychological Measurement, 8(4), 453–461.
McCall, W. A. (1939). Measurement: A revision of how to measure in education. New York, NY: Macmillan.
Muraki, E. (1993). Information functions of the generalized partial credit model. Applied Psychological Measurement, 17(4), 351–363.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). New York, NY: Macmillan.
Pommerich, M., Nicewander, W. A., & Hanson, B. A. (1999). Estimating average domain scores. Journal of Educational Measurement, 36(3), 199–216.
Rodriguez, M. C. (2003). Construct equivalence of multiple-choice and constructed-response items: A random effects synthesis of correlations. Journal of Educational Measurement, 40(2), 163–184.
Rosa, K., Swygert, K. A., Nelson, L., & Thissen, D. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items—Scale scores for patterns of summed scores. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 253–292). Mahwah, NJ: Erlbaum.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 1).
Thissen, D., Nelson, L., & Swygert, K. A. (2001). Item response theory applied to combinations of multiple-choice and constructed-response items—Approximation methods for scale scores. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 293–341). Mahwah, NJ: Erlbaum.
Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73–140). Mahwah, NJ: Erlbaum.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. L. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19(1), 39–49.
Thissen, D., & Wainer, H. (Eds.). (2001). Test scoring. Mahwah, NJ: Erlbaum.
Tong, Y., & Kolen, M. J. (2005). Assessing equating results on different equating criteria. Applied Psychological Measurement, 29(6), 418–432.
Tong, Y., & Kolen, M. J. (2007). Comparisons of methodologies and results in vertical scaling for educational achievement tests. Applied Measurement in Education, 20(2), 227–253.
van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York, NY: Springer-Verlag.
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103–118.
Wang, T., Kolen, M. J., & Harris, D. J. (2000). Psychometric properties of scale scores and performance levels for performance assessments using polytomous IRT. Journal of Educational Measurement, 37(2), 141–162.
Yen, W., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). Westport, CT: American Council on Education and Praeger.
Zwick, R., Senturk, D., Wang, J., & Loomis, S. C. (2001). An investigation of alternative methods for item mapping in the National Assessment of Educational Progress. Educational Measurement: Issues and Practice, 20(2), 15–25.

Author information

Authors and Affiliations

University of Iowa, 224 B1 Lindquist Center, Iowa City, IA, 52242, USA
Michael J. Kolen
Pearson, 2510 North Dodge Street, Iowa City, IA, 52245, USA
Ye Tong
University of Iowa, 210D Lindquist Center, Iowa City, IA, 52242, USA
Robert L. Brennan

Corresponding author

Correspondence to Michael J. Kolen.

Editor information

Editors and Affiliations

Educational Testing Service, Rosedale Road MS 06 P, Princeton, 08541, New Jersey, USA
Alina A. von Davier

About this chapter

Cite this chapter

Kolen, M.J., Tong, Y., Brennan, R.L. (2009). Scoring and Scaling Educational Tests. In: von Davier, A. (eds) Statistical Models for Test Equating, Scaling, and Linking. Statistics for Social and Behavioral Sciences. Springer, New York, NY. https://doi.org/10.1007/978-0-387-98138-3_3

DOI: https://doi.org/10.1007/978-0-387-98138-3_3
Published: 15 September 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-98137-6
Online ISBN: 978-0-387-98138-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
