Abstract
When developing a test, numerous procedures are useful for assessing the quality and measurement characteristics of test items. Not all procedures are appropriate for all types of tests, and different procedures may lead to different conclusions about the quality of a particular item. Classical test theory statistics such as item difficulty, item discrimination, and distractor analysis are useful, as are qualitative analyses and the techniques of item response theory. The challenge for all test developers is to evaluate the results of these procedures against the intended use of the test and make item-selection decisions that maximize the test's overall effectiveness in measuring what it is intended to measure.
The better the items, the better the test.
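To make the two classical item statistics named in the abstract concrete, here is a minimal illustrative sketch (hypothetical data, plain Python): item difficulty as the proportion of correct responses (the p-value), and discrimination via the upper-lower (U-L) index, contrasting the top and bottom 27% of examinees by total score as in the Kelley (1939) tradition. The function names and the response matrix are invented for illustration, not taken from the chapter.

```python
# Illustrative sketch of two classical test theory item statistics,
# computed on a hypothetical 0/1-scored response matrix
# (rows = examinees, columns = items).

def item_difficulty(responses, item):
    """Item difficulty (p-value): proportion of examinees answering correctly."""
    scores = [row[item] for row in responses]
    return sum(scores) / len(scores)

def ul_discrimination(responses, item, fraction=0.27):
    """Upper-lower (U-L) discrimination index: p_upper - p_lower,
    using the top and bottom `fraction` of examinees by total score."""
    ranked = sorted(responses, key=sum, reverse=True)   # best total scores first
    n = max(1, round(fraction * len(ranked)))           # size of each extreme group
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower

# Hypothetical responses: 6 examinees, 3 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

print(item_difficulty(responses, 0))    # fraction of the 6 examinees correct on item 1
print(ul_discrimination(responses, 0))  # positive values mean high scorers do better
```

A moderate p-value combined with a clearly positive U-L index is the classical signal that an item is both appropriately difficult and discriminating; negative discrimination flags an item for review.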
Recommended Reading
Embretson, S., & Reise, S. (2000). Item response theory for psychologists. London, England: Taylor & Francis.
Johnson, A. P. (1951). Notes on a suggested index of item validity: The U-L index. Journal of Educational Psychology, 42, 499–504. This is a seminal article in the history of item analysis.
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30, 17–24. A real classic!
7.1 Electronic Supplementary Material
Supplementary File 7.1 (PPTX 199 kb)
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this chapter
Reynolds, C.R., Altmann, R.A., Allen, D.N. (2021). Item Analysis: Methods for Fitting the Right Items to the Right Test. In: Mastering Modern Psychological Testing. Springer, Cham. https://doi.org/10.1007/978-3-030-59455-8_7
DOI: https://doi.org/10.1007/978-3-030-59455-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59454-1
Online ISBN: 978-3-030-59455-8
eBook Packages: Behavioral Science and Psychology (R0)