Perspectives on Methodological Issues

  • Mark Wilson
  • Isaac Bejar
  • Kathleen Scalise
  • Jonathan Templin
  • Dylan Wiliam
  • David Torres Irribarra


In this chapter the authors survey the methodological perspectives they see as important for assessing twenty-first century skills. Some of these issues are specific to twenty-first century skills, but most apply more generally to the assessment of other psychological and educational variables. The narrative initially follows the logic of assessment development: defining the constructs to be assessed, designing tasks that can generate informative student responses, coding and valuing those responses, delivering the tasks and gathering the responses, and modeling the responses in accordance with the constructs. The chapter continues with a survey of the strands of validity evidence that need to be established, and a discussion of issues that are prominent in this context, such as the need to resolve generality versus contextual specificity, the relationship of classroom assessments to large-scale assessments, and the possible roles of technological advances in assessing these skills. A brief segment discusses issues that arise with specific types of variables involved in the assessment of twenty-first century skills. The chapter concludes with a list of challenges regarded as prominent at the time of writing, and an appendix describes specific approaches to assessment design that are useful in the development of new assessments.
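The modeling step mentioned above is commonly carried out with item response models; as one minimal, hedged illustration (the chapter discusses a range of models, not only this one), the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between a person's ability and an item's difficulty:

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability that a person with ability theta answers an item of
    difficulty b correctly, under the dichotomous Rasch model:
    P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability of success is 0.5;
# it rises toward 1 as ability exceeds difficulty, and falls toward 0 otherwise.
```

The function names and signature here are illustrative only; they are not drawn from the chapter itself.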





We thank the members of the Working Group who have contributed ideas and made suggestions in support of the writing of this paper, in particular, Chris Dede and his group at Harvard, John Hattie, Detlev Leutner, André Rupp, and Hans Wagemaker.

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Mark Wilson (1)
  • Isaac Bejar (2)
  • Kathleen Scalise (3)
  • Jonathan Templin (4)
  • Dylan Wiliam (5)
  • David Torres Irribarra (1)

  1. University of California, Berkeley, USA
  2. Educational Testing Service, New York, USA
  3. University of Oregon, Eugene, USA
  4. University of Georgia, Athens, USA
  5. Institute of Education, University of London, London, UK
