Part of the book series: Evaluation in Education and Human Services (EEHS, volume 42)

Abstract

Performance assessments--such as writing prompts in language arts, hands-on experiments in science, or portfolios in mathematics--are being heralded as “authentic” assessments (e.g., Wiggins, 1989). They play a central role in the rhetoric, if not the reality, of proposed state and national testing programs (e.g., Bush, 1991; see Shavelson, Baxter, & Pine, 1992). Unfortunately, in the headlong pursuit of testing reform, technical considerations may be pushed aside (e.g., Linn, Baker, & Dunbar, 1991; Shavelson & Baxter, 1992). These considerations include the reliability and validity of performance assessments and the interchangeability of alternative methods of measuring performance (e.g., hands-on investigations, computer simulations, short-answer questions). Empirical research has, to date, focused primarily on interrater and intertask reliability (e.g., Dunbar, Koretz, & Hoover, 1991; Shavelson, Baxter, Pine, & Yuré, 1991; Shavelson, Baxter, Pine, Yuré, Goldman, & Smith, 1991). The findings are consistent. Interrater reliability is not a problem: raters can be trained to score performance reliably, whether in real time or from surrogates such as notebooks (e.g., Baxter, Shavelson, Goldman, & Pine, 1992; Shavelson et al., in press). Task-sampling variability, however, is large, substantially reducing the reliability of the measurements (e.g., Dunbar et al., 1991; Linn et al., 1991; Shavelson, Baxter, & Gao, in press). If performance assessments are to prove useful in practice, task-sampling variability must be addressed.
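To make the task-sampling problem concrete, it helps to cast it in the generalizability (G) theory framework the chapter draws on (Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991). The sketch below gives the standard relative G coefficient for a fully crossed persons × tasks × raters design; it illustrates the framework rather than reproducing a formula from the chapter itself.

```latex
% Relative G coefficient for a crossed p x t x r design:
% each person (p) attempts n_t tasks, each scored by n_r raters.
% sigma^2_p is universe-score (person) variance; the error terms
% are the person-by-facet interactions, with sigma^2_{ptr,e}
% confounding the three-way interaction and residual error.
\[
  \hat{\rho}^{2}
  = \frac{\hat{\sigma}^{2}_{p}}
         {\hat{\sigma}^{2}_{p}
          + \frac{\hat{\sigma}^{2}_{pt}}{n_t}
          + \frac{\hat{\sigma}^{2}_{pr}}{n_r}
          + \frac{\hat{\sigma}^{2}_{ptr,e}}{n_t n_r}}
\]
```

The findings summarized above map directly onto this expression: because rater effects are small, the person × rater term contributes little, whereas a large person × task component can be offset only by increasing n_t, that is, by administering more tasks per examinee.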

References

  • Baxter, G. P., Shavelson, R. J., Goldman, S. R., & Pine, J. (1992). Evaluation of procedure-based scoring for hands-on science assessment. Journal of Educational Measurement, 29(1), 1–17.

  • Baxter, G. P., Shavelson, R. J., Herman, S. J., Brown, K. A., & Valadez, J. (in press). Mathematics performance assessment: Technical quality and diverse student impact. Journal for Research in Mathematics Education.

  • Brennan, R. L. (1992). Elements of generalizability theory (rev. ed.). Iowa City, IA: American College Testing.

  • Bush, G. W. (1991). America 2000: An educational strategy. Washington, DC: U. S. Department of Education.

  • Cardinet, J., Tourneur, Y., & Allal, L. (1981). Extension of generalizability theory and its applications in educational measurement. Journal of Educational Measurement, 18(4), 183–204.

  • California State Department of Education. (1985). Mathematics framework for California public schools: Kindergarten through grade twelve. Sacramento, CA: Curriculum Framework and Textbook Development Unit, California State Department of Education.

  • California State Department of Education. (1990). Science framework for California public schools: Kindergarten through grade twelve. Sacramento, CA: Curriculum Framework and Textbook Development Unit, California State Department of Education.

  • Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.

  • Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: John Wiley.

  • Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4(4), 289–303.

  • Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1(1), 1–10.

  • Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.

  • Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5–11.

  • Shavelson, R. J., & Baxter, G. P. (1992). Performance assessments in science: A symmetry of teaching and testing. Educational Leadership, 49(8), 20–25.

  • Shavelson, R. J., Baxter, G. P., & Gao, X. (in press). Sampling variability of performance assessments. Journal of Educational Measurement.

  • Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 21(4), 22–27.

  • Shavelson, R. J., Baxter, G. P., Pine, J., & Yuré, J. (1991, April). Alternative technologies for assessing science understanding. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

  • Shavelson, R. J., Baxter, G. P., Pine, J., Yuré, J., Goldman, S. R., & Smith, B. (1991). Alternative technologies for large-scale science assessment: Instruments of educational reform. School Effectiveness and School Improvement, 2(2), 97–114.

  • Shavelson, R. J., & Webb, N. M. (1981). Generalizability theory: 1973–1980. British Journal of Mathematical and Statistical Psychology, 34, 133–166.

  • Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage Publications.

  • Wigdor, A. K., & Green, B. F. (1991). Performance assessment for the workplace. Washington, DC: National Academy Press.

  • Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70(9), 703–713.

Copyright information

© 1996 Springer Science+Business Media New York

About this chapter

Cite this chapter

Shavelson, R.J., Gao, X., Baxter, G.P. (1996). On the Content Validity of Performance Assessments: Centrality of Domain Specification. In: Birenbaum, M., Dochy, F.J.R.C. (eds) Alternatives in Assessment of Achievements, Learning Processes and Prior Knowledge. Evaluation in Education and Human Services, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-0657-3_5

  • DOI: https://doi.org/10.1007/978-94-011-0657-3_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-4287-1

  • Online ISBN: 978-94-011-0657-3
