Abstract
Performance assessments, such as writing prompts in language arts, hands-on experiments in science, or portfolios in mathematics, are being heralded as “authentic” assessments (e.g., Wiggins, 1989). They play a central role in the rhetoric, if not the reality, of proposed state and national testing programs (e.g., Bush, 1991; see Shavelson, Baxter, & Pine, 1992). Unfortunately, in the headlong pursuit of testing reform, technical considerations may be pushed aside (e.g., Linn, Baker, & Dunbar, 1991; Shavelson & Baxter, 1992). These considerations include the reliability and validity of performance assessments and the interchangeability of alternative methods of measuring performance (e.g., hands-on investigations, computer simulations, short-answer questions). Empirical research has, to date, focused primarily on interrater and intertask reliability (e.g., Dunbar, Koretz, & Hoover, 1991; Shavelson, Baxter, Pine, & Yuré, 1991; Shavelson, Baxter, Pine, Yuré, Goldman, & Smith, 1991). The findings are consistent. Interrater reliability is not a problem: raters can be trained to score performance reliably in real time or from surrogates such as notebooks (e.g., Baxter, Shavelson, Goldman, & Pine, 1992; Shavelson et al., in press). Task-sampling variability, however, is large, substantially reducing the reliability of the measurements (e.g., Dunbar et al., 1991; Linn et al., 1991; Shavelson, Baxter, & Gao, in press). If performance assessments are to prove useful in practice, task-sampling variability must be addressed.
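The generalizability-theory framework cited in the references (e.g., Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991) makes the task-sampling argument concrete. The following is a minimal sketch, not taken from the chapter: it simulates person-by-task scores, estimates variance components from a two-way ANOVA, and computes the relative generalizability coefficient as a function of the number of tasks sampled. All parameter values and names are illustrative assumptions.

```python
# Sketch (not from the chapter): why large task-sampling variability
# depresses reliability, in generalizability-theory terms.
# Person x task (p x t) design; all numeric values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_p, n_t = 200, 8                         # persons, tasks observed

# Simulate scores: person effect + task effect + person-x-task residual.
# A large interaction sd mimics the task-sampling variability reported above.
sd_p, sd_t, sd_pt = 1.0, 0.5, 1.5
X = (rng.normal(0, sd_p, (n_p, 1))        # universe scores (persons)
     + rng.normal(0, sd_t, (1, n_t))      # task difficulty
     + rng.normal(0, sd_pt, (n_p, n_t)))  # interaction / error

# Two-way ANOVA mean squares (one observation per cell).
grand = X.mean()
ms_p = n_t * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_res = ((X - X.mean(axis=1, keepdims=True)
             - X.mean(axis=0, keepdims=True) + grand) ** 2).sum() \
         / ((n_p - 1) * (n_t - 1))

# Variance-component estimates and the relative G coefficient for n' tasks:
#   G(n') = var_p / (var_p + var_pt,e / n')
var_p = (ms_p - ms_res) / n_t             # person (universe-score) variance
var_pt = ms_res                           # person-x-task interaction + error
g = lambda n_tasks: var_p / (var_p + var_pt / n_tasks)

print(f"one task: G = {g(1):.2f}; eight tasks: G = {g(8):.2f}")
```

Under these assumed variance components, a single task yields a low coefficient, and only a substantially larger task sample pushes reliability into a usable range, which is the pattern the abstract attributes to task-sampling variability.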
References
Baxter, G. P., Shavelson, R. J., Goldman, S. R., & Pine, J. (1992). Evaluation of procedure-based scoring for hands-on science assessment. Journal of Educational Measurement, 29 (1), 1–17.
Baxter, G. P., Shavelson, R. J., Herman, S. J., Brown, K. A., & Valadez, J. (in press). Mathematics performance assessment: Technical quality and diverse student impact. Journal for Research in Mathematics Education.
Brennan, R. L. (1992). Elements of generalizability theory (rev. ed.). Iowa City, IA: American College Testing.
Bush, G. W. (1991). America 2000: An educational strategy. Washington, DC: U. S. Department of Education.
Cardinet, J., Tourneur, Y., & Allal, L. (1981). Extension of generalizability theory and its applications in educational measurement. Journal of Educational Measurement, 18 (4), 183–204.
California State Department of Education. (1985). Mathematics framework for California public schools: Kindergarten through grade twelve. Sacramento, CA: The Curriculum Framework and Textbook Department Unit, California State Department of Education.
California State Department of Education. (1990). Science framework for California public schools: Kindergarten through grade twelve. Sacramento, CA: The Curriculum Framework and Textbook Department Unit, California State Department of Education.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: John Wiley.
Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4 (4), 289–303.
Guion, R. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1–10.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20 (8), 15–21.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18 (2), 5–11.
Shavelson, R. J., & Baxter, G. P. (1992). Performance assessments in science: A symmetry of teaching and testing. Educational Leadership, 49 (8), 20–25.
Shavelson, R. J., Baxter, G. P., & Gao, X. (in press). Sampling variability of performance assessments. Journal of Educational Measurement.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21 (4), 22–27.
Shavelson, R. J., Baxter, G. P., Pine, J., & Yuré, J. (1991, April). Alternative technologies for assessing science understanding. Paper presented at the annual meeting of the American Educational Research Association, Chicago.
Shavelson, R. J., Baxter, G. P., Pine, J., Yuré, J., Goldman, S. R., & Smith, B. (1991). Alternative technologies for large-scale science assessment: Instruments of educational reform. School Effectiveness and School Improvement, 2 (2), 97–114.
Shavelson, R. J., & Webb, N. M. (1981). Generalizability theory: 1973–1980. British Journal of Mathematical and Statistical Psychology, 34, 133–166.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage Publications.
Wigdor, A. K., & Green, B. F. (1991). Performance assessment for the workplace. Washington, DC: National Academy Press.
Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70 (9), 703–713.
© 1996 Springer Science+Business Media New York
Shavelson, R.J., Gao, X., Baxter, G.P. (1996). On the Content Validity of Performance Assessments: Centrality of Domain Specification. In: Birenbaum, M., Dochy, F.J.R.C. (eds) Alternatives in Assessment of Achievements, Learning Processes and Prior Knowledge. Evaluation in Education and Human Services, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0657-3_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-4287-1
Online ISBN: 978-94-011-0657-3