Abstract
Performance assessments, such as writing prompts in language arts, hands-on experiments in science, or portfolios in mathematics, are being heralded as “authentic” assessments (e.g., Wiggins, 1989). They play a central role in the rhetoric, if not the reality, of proposed state and national testing programs (e.g., Bush, 1991; see Shavelson, Baxter, & Pine, 1992). Unfortunately, in the headlong pursuit of testing reform, technical considerations may be pushed aside (e.g., Linn, Baker, & Dunbar, 1991; Shavelson & Baxter, 1992). These considerations include the reliability and validity of performance assessments and the interchangeability of alternative methods of measuring performance (e.g., hands-on investigations, computer simulations, short-answer questions). Empirical research has, to date, focused primarily on interrater and intertask reliability (e.g., Dunbar, Koretz, & Hoover, 1991; Shavelson, Baxter, Pine, & Yuré, 1991; Shavelson, Baxter, Pine, Yuré, Goldman, & Smith, 1991). The findings are consistent. Interrater reliability is not a problem: raters can be trained to score performance reliably in real time or from surrogates such as notebooks (e.g., Baxter, Shavelson, Goldman, & Pine, 1992; Shavelson et al., in press). Task-sampling variability, however, is large, substantially reducing the reliability of the measurements (e.g., Dunbar et al., 1991; Linn et al., 1991; Shavelson, Baxter, & Gao, in press). If performance assessments are to prove useful in practice, task-sampling variability must be addressed.
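The generalizability-theory framework cited in the references (e.g., Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991) makes the task-sampling argument concrete. The following is a minimal sketch, not taken from the chapter: it simulates person-by-task scores, estimates variance components from a two-way ANOVA, and computes the relative generalizability coefficient as a function of the number of tasks sampled. All parameter values and names are illustrative assumptions.

```python
# Sketch (not from the chapter): why large task-sampling variability
# depresses reliability, in generalizability-theory terms.
# Person x task (p x t) design; all numeric values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_p, n_t = 200, 8                         # persons, tasks observed

# Simulate scores: person effect + task effect + person-x-task residual.
# A large interaction sd mimics the task-sampling variability reported above.
sd_p, sd_t, sd_pt = 1.0, 0.5, 1.5
X = (rng.normal(0, sd_p, (n_p, 1))        # universe scores (persons)
     + rng.normal(0, sd_t, (1, n_t))      # task difficulty
     + rng.normal(0, sd_pt, (n_p, n_t)))  # interaction / error

# Two-way ANOVA mean squares (one observation per cell).
grand = X.mean()
ms_p = n_t * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_res = ((X - X.mean(axis=1, keepdims=True)
             - X.mean(axis=0, keepdims=True) + grand) ** 2).sum() \
         / ((n_p - 1) * (n_t - 1))

# Variance-component estimates and the relative G coefficient for n' tasks:
#   G(n') = var_p / (var_p + var_pt,e / n')
var_p = (ms_p - ms_res) / n_t             # person (universe-score) variance
var_pt = ms_res                           # person-x-task interaction + error
g = lambda n_tasks: var_p / (var_p + var_pt / n_tasks)

print(f"one task: G = {g(1):.2f}; eight tasks: G = {g(8):.2f}")
```

Under these assumed variance components, a single task yields a low coefficient, and only a substantially larger task sample pushes reliability into a usable range, which is the pattern the abstract attributes to task-sampling variability.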
References
Baxter, G. P., Shavelson, R. J., Goldman, S. R., & Pine, J. (1992). Evaluation of procedure-based scoring for hands-on science assessment. Journal of Educational Measurement, 29 (1), 1–17.
Baxter, G. P., Shavelson, R. J., Herman, S. J., Brown, K. A., & Valadez, J. (in press). Mathematics performance assessment: Technical quality and diverse student impact. Journal for Research in Mathematics Education.
Brennan, R. L. (1992). Elements of generalizability theory (rev. ed.). Iowa City, IA: American College Testing.
Bush, G. W. (1991). America 2000: An educational strategy. Washington, DC: U. S. Department of Education.
Cardinet, J., Tourneur, Y., & Allal, L. (1981). Extension of generalizability theory and its applications in educational measurement. Journal of Educational Measurement, 18 (4), 183–204.
California State Department of Education. (1985). Mathematics framework for California public schools: Kindergarten through grade twelve. Sacramento, CA: The Curriculum Framework and Textbook Department Unit, California State Department of Education.
California State Department of Education. (1990). Science framework for California public schools: Kindergarten through grade twelve. Sacramento, CA: The Curriculum Framework and Textbook Department Unit, California State Department of Education.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: John Wiley.
Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4 (4), 289–303.
Guion, R. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1–10.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20 (8), 15–21.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18 (2), 5–11.
Shavelson, R. J., & Baxter, G. P. (1992). Performance assessments in science: A symmetry of teaching and testing. Educational Leadership, 49 (8), 20–25.
Shavelson, R. J., Baxter, G. P., & Gao, X. (in press). Sampling variability of performance assessments. Journal of Educational Measurement.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21 (4), 22–27.
Shavelson, R. J., Baxter, G. P., Pine, J., & Yuré, J. (1991, April). Alternative technologies for assessing science understanding. Paper presented at the annual meeting of the American Educational Research Association, Chicago.
Shavelson, R. J., Baxter, G. P., Pine, J., Yuré, J., Goldman, S. R., & Smith, B. (1991). Alternative technologies for large-scale science assessment: Instruments of educational reform. School Effectiveness and School Improvement, 2 (2), 97–114.
Shavelson, R. J., & Webb, N. M. (1981). Generalizability theory: 1973–1980. British Journal of Mathematical and Statistical Psychology, 34, 133–166.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage Publications.
Wigdor, A. K., & Green, B. F. (1991). Performance assessment for the workplace. Washington, DC: National Academy Press.
Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70 (9), 703–713.
© 1996 Springer Science+Business Media New York
Shavelson, R.J., Gao, X., Baxter, G.P. (1996). On the Content Validity of Performance Assessments: Centrality of Domain Specification. In: Birenbaum, M., Dochy, F.J.R.C. (eds) Alternatives in Assessment of Achievements, Learning Processes and Prior Knowledge. Evaluation in Education and Human Services, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0657-3_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4287-1
Online ISBN: 978-94-011-0657-3