Using Evidence Centered Design to Think About Assessments
Evidence-centered assessment design (ECD) rests on a simple principle: assessment tasks should be designed to provide evidence for the claims that the assessment designers wish to make about examinees. This chapter examines the Bayesian model of evidence that underlies much of the ECD philosophy. It then explores how the ECD principle can help assessment designers think about three important issues in the future of assessment: (1) How can we organize evidence about student performance gathered from diverse sources across multiple time points? (2) How should we balance information gathered about multiple aspects of proficiency? (3) How should we collect evidence from complex tasks? The chapter illustrates these ideas with examples of advanced assessments that have used ECD.
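The Bayesian model of evidence mentioned above can be made concrete with a small sketch. This is not code from the chapter; it illustrates I. J. Good's "weight of evidence" idea as commonly formulated, with purely illustrative probabilities: an observation from a task shifts the odds of a claim about the examinee by the likelihood ratio, and the weight of evidence is the log of that ratio.

```python
import math

def weight_of_evidence(p_obs_given_claim, p_obs_given_not_claim):
    """Good's weight of evidence W(H:E) = log10 P(E|H)/P(E|~H), in bans."""
    return math.log10(p_obs_given_claim / p_obs_given_not_claim)

def update_claim(prior, p_obs_given_claim, p_obs_given_not_claim):
    """Posterior probability of the claim after observing the evidence,
    obtained by multiplying the prior odds by the likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * (p_obs_given_claim / p_obs_given_not_claim)
    return posterior_odds / (1 + posterior_odds)

# Hypothetical claim H: "examinee is proficient." Evidence E: a correct
# response that proficient examinees give 80% of the time and
# non-proficient examinees 30% of the time (assumed numbers).
prior = 0.5
posterior = update_claim(prior, 0.8, 0.3)   # -> 8/11, about 0.73
woe = weight_of_evidence(0.8, 0.3)          # positive: E supports H
```

Under this view, a well-designed task is one whose observable outcomes carry large weight of evidence for the targeted claims, which is exactly the design criterion the ECD principle asks for.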
Keywords: Evidence-centered assessment design · Decision analysis · Constructed response · Diagnostic assessment
Evidence-centered assessment design was originally a three-way collaboration between myself, Bob Mislevy, and Linda Steinberg. Although this chapter represents my perspective on ECD, my thinking has become thoroughly intermixed with ideas that originated with Bob or Linda. My perspective has likewise been broadened by discussions with too many colleagues to mention. Malcolm Bauer, Dan Eigner, Yoon-Jeon Kim, and Thomas Quinlan made numerous suggestions that improved the clarity of this chapter.
Any opinions expressed in this chapter are those of the author and not necessarily of Educational Testing Service.