Research in Science Education, Volume 45, Issue 4, pp 527–553

Using a Constructed-Response Instrument to Explore the Effects of Item Position and Item Features on the Assessment of Students’ Written Scientific Explanations

  • Meghan Rector Federer
  • Ross H. Nehm
  • John E. Opfer
  • Dennis Pearl


A large body of work has been devoted to reducing assessment biases that distort inferences about students’ science understanding, particularly in multiple-choice instruments (MCIs). Constructed-response instruments (CRIs), however, have invited much less scrutiny, perhaps because of their reputation for avoiding many of the documented biases of MCIs. In this study, we explored whether known biases of MCIs—specifically item sequencing and surface feature effects—were also apparent in a CRI designed to assess students’ understanding of evolutionary change using written explanation (Assessment of COntextual Reasoning about Natural Selection [ACORNS]). We used three versions of the ACORNS CRI to investigate different aspects of assessment structure and their corresponding effects on inferences about student understanding. Our results identified several sources of (and solutions to) assessment bias in this practice-focused CRI. First, sequences of items with similar surface features produced greater item-order effects than sequences of items with dissimilar surface features. Second, a counterbalanced design (i.e., a Latin Square) mitigated this bias at the population level of analysis. Third, ACORNS response scores were highly correlated with student verbosity, even though verbosity is a trivial aspect of explanation quality. Our results suggest that as assessments in science education shift toward the measurement of scientific practices (e.g., explanation), it is critical that biases inherent in these types of assessments be investigated empirically.
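The counterbalanced (Latin Square) design mentioned above can be sketched concretely. The snippet below is an illustrative reconstruction, not code from the study: the function name `latin_square_orders` and the four placeholder items are assumptions, and the study's actual item assignment may have differed. A cyclic Latin Square gives each test form a rotated item order, so that across forms every item appears in every serial position exactly once, letting position effects cancel out at the population level.

```python
# Illustrative sketch (hypothetical, not from the paper): build a cyclic
# Latin Square of item orders for counterbalancing serial position.

def latin_square_orders(items):
    """Return one item ordering (test form) per item, by cyclic rotation."""
    n = len(items)
    return [[items[(form + pos) % n] for pos in range(n)] for form in range(n)]

forms = latin_square_orders(["A", "B", "C", "D"])
for order in forms:
    print(order)

# Balancing property: each item occupies each serial position exactly once
# across the four forms.
for pos in range(4):
    assert len({order[pos] for order in forms}) == 4
```

With four items this yields four forms (ABCD, BCDA, CDAB, DABC); averaging scores over forms removes any systematic advantage an item would gain from always appearing first or last.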


Keywords: Constructed-response instrument · Item order effects · Item surface features · Scientific explanation · Evolution



This research was supported by the National Science Foundation REESE program (R.H. Nehm) and the Marilyn Ruth Hathaway Education Scholarship (M.R. Federer). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF or The Ohio State University.


  1. American Association for the Advancement of Science [AAAS]. (1994). Benchmarks for science literacy. New York: Oxford University Press.
  2. American Association for the Advancement of Science [AAAS]. (2011). Vision and change in undergraduate biology education. Washington, DC: AAAS.
  3. Anderson, D. L., Fisher, K. M., & Norman, G. J. (2002). Development and evaluation of the conceptual inventory of natural selection. Journal of Research in Science Teaching, 39, 952–978.
  4. Bennett, R. E., & Ward, W. C. (1993). Construction versus choice in cognitive measurement: issues in constructed response, performance testing, and portfolio assessment. Hillsdale, NJ: Lawrence Erlbaum Associates.
  5. Berland, L. K., & McNeill, K. L. (2012). For whom is argument and explanation a necessary distinction? A response to Osborne and Patterson. Science Education, 96(5), 808–813.
  6. Birney, D. P., Halford, G. S., & Andrews, G. (2006). Measuring the influence of complexity on relational reasoning: the development of the Latin Square Task. Educational and Psychological Measurement, 66(1), 146–171.
  7. Bishop, B., & Anderson, C. (1990). Student conceptions of natural selection and its role in evolution. Journal of Research in Science Teaching, 27, 415–427.
  8. Bridgeman, B. (1992). A comparison of quantitative questions in open-ended and multiple-choice formats. Journal of Educational Measurement, 29(3), 253–271.
  9. Caleon, I. S., & Subramaniam, R. (2010). Do students know what they know and what they don’t know? Using a four-tier diagnostic test to assess the nature of students’ alternative conceptions. Research in Science Education, 40, 313–337.
  10. Catley, K. M., Phillips, B. C., & Novick, L. R. (2013). Snakes, eels, and dogs! Oh my! Evaluating high-school students’ tree-thinking skills: an entry point to understanding evolution. Research in Science Education, 43(6), 2327–2348.
  11. Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.
  12. Clough, E. E., & Wood-Robinson, C. (1985). How secondary students interpret instances of biological adaptation. Journal of Biological Education, 19, 125–130.
  13. Clough, E. E., & Driver, R. (1986). A study of consistency in the use of students’ conceptual frameworks across different task contexts. Science Education, 70(4), 473–496.
  14. Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum.
  15. Duschl, R. A., Schweingruber, H. A., & Shouse, A. W. (2007). Taking science to school: learning and teaching science in grades K-8. Washington, DC: National Academies.
  16. Friedman, M. (1974). Explanation and scientific understanding. Journal of Philosophy, 71(1), 5–19.
  17. Garvin-Doxas, K., & Klymkowsky, M. W. (2008). Understanding randomness and its impact on student learning: lessons learned from building the Biology Concept Inventory (BCI). CBE Life Sciences Education, 7(2), 227–233.
  18. Gotwals, A. W., & Songer, N. B. (2010). Reasoning up and down a food chain: using an assessment framework to investigate students’ middle knowledge. Science Education, 94, 259–281.
  19. Gray, K. E. (2004). The effect of question order on student responses to multiple choice physics questions. Master’s thesis, Kansas State University.
  20. Griffiths, T. L., Steyvers, M., & Firl, A. (2007). Google and the mind: predicting fluency with PageRank. Psychological Science, 18, 1069–1076.
  21. Gulacar, O., & Fynewever, H. (2010). A research methodology for studying what makes some problems difficult to solve. International Journal of Science Education, 32(16), 2167–2184.
  22. Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education and Praeger.
  23. Hempel, C., & Oppenheim, P. (1948). Studies in the logic of explanation. Philosophy of Science, 15, 135–175.
  24. Jensen, P., Watanabe, H. K., & Richters, J. E. (1999). Who’s up first? Testing for order effects in structured interviews using a counterbalanced experimental design. Journal of Abnormal Child Psychology, 27(6), 439–445.
  25. Kampourakis, K., & Zogza, V. (2008). Students’ intuitive explanations of the causes of homologies and adaptations. Science & Education, 17, 27–47.
  26. Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528–558.
  27. Kelemen, D. (2012). Teleological minds: how natural intuitions about agency and purpose influence learning about evolution. In K. S. Rosengren, S. K. Brem, E. M. Evans, & G. M. Sinatra (Eds.), Evolution challenges: integrating research and practice in teaching and learning about evolution (pp. 66–92). Oxford: Oxford University Press.
  28. Kingston, N. M., & Dorans, N. J. (1984). Item location effects and their implications for IRT equating and adaptive testing. Applied Psychological Measurement, 8, 147–154.
  29. Kitcher, P. (1981). Explanatory unification. Philosophy of Science, 48(4), 507–531.
  30. Leary, L. F., & Dorans, N. J. (1985). Implications for altering the context in which test items appear: a historical perspective on an immediate concern. Review of Educational Research, 55(3), 387–413.
  31. Lee, H.-S., Liu, L., & Linn, M. C. (2011). Validating measurement of knowledge integration in science using multiple-choice and explanation items. Applied Measurement in Education, 24(2), 115–136.
  32. Lewis, D. (1986). Causal explanation. In D. Lewis (Ed.), Philosophical papers (Vol. 2, pp. 214–240). Oxford: Oxford University Press.
  33. Liu, O. L., Lee, H.-S., & Linn, M. C. (2011). An investigation of explanation multiple-choice items in science assessment. Educational Assessment, 16, 164–184.
  34. MacNicol, K. (1956). Effects of varying order of item difficulty in an unspeeded verbal test. Unpublished manuscript, Educational Testing Service, Princeton, NJ.
  35. Mandler, G., & Rabinowitz, J. C. (1981). Appearance and reality: does a recognition test really improve subsequent recall and recognition? Journal of Experimental Psychology: Human Learning and Memory, 7(2), 79–90.
  36. Martinez, M. E. (1999). Cognition and the question of test item format. Educational Psychologist, 34(4), 207–218.
  37. Martiniello, M. (2008). Language and the performance of English Language Learners in math word problems. Harvard Educational Review, 78, 333–368.
  38. McClary, L., & Talanquer, V. (2011). College chemistry students’ mental models of acids and acid strength. Journal of Research in Science Teaching, 48(4), 396–413.
  39. McNeill, K. L., Lizotte, D. J., Krajcik, J., & Marx, R. W. (2006). Supporting students’ construction of scientific explanations by fading scaffolds in instructional materials. Journal of the Learning Sciences, 15(2), 153–191.
  40. Messick, S. (1995). Validity of psychological assessment. American Psychologist, 50, 741–749.
  41. Mollenkopf, W. G. (1950). An experimental study of the effects on item analysis data of changing item placement and test-time limit. Psychometrika, 15, 291–315.
  42. Monk, J. J., & Stallings, W. M. (1970). Effect of item order on test scores. Journal of Educational Research, 63, 463–465.
  43. National Research Council. (1996). National science education standards. Washington, DC: National Academies.
  44. National Research Council. (2001). Knowing what students know: the science and design of educational assessment. Washington, DC: National Academies.
  45. National Research Council. (2007). Taking science to school: learning and teaching science in grades K-8. Washington, DC: National Academies.
  46. National Research Council. (2012). A framework for K-12 science education: practices, crosscutting concepts, and core ideas. Washington, DC: National Academies.
  47. Nehm, R. H. (2010). Understanding undergraduates’ problem-solving processes. Journal of Microbiology & Biology Education, 11(2), 119–121.
  48. Nehm, R. H., & Reilly, L. (2007). Biology majors’ knowledge and misconceptions of natural selection. Bioscience, 57(3), 263–272.
  49. Nehm, R. H., & Schonfeld, I. (2008). Measuring knowledge of natural selection: a comparison of the CINS, an open-response instrument, and oral interview. Journal of Research in Science Teaching, 45(10), 1131–1160.
  50. Nehm, R. H., & Ha, M. (2011). Item feature effects in evolution assessment. Journal of Research in Science Teaching, 48(3), 237–256.
  51. Nehm, R. H., Ha, M., Rector, M., Opfer, J. F., Perrin, L., Ridgway, J., & Mollohan, K. (2010). Scoring guide for the open response instrument (ORI) and evolutionary gain and loss test (ACORNS). Technical Report of National Science Foundation REESE Project, 0909999.
  52. Nehm, R. H., Ha, M., & Mayfield, E. (2011). Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. Journal of Science Education and Technology, 21(1), 183–196.
  53. Nehm, R. H., Beggrow, E., Opfer, J., & Ha, M. (2012). Reasoning about natural selection: diagnosing contextual competency using the ACORNS instrument. The American Biology Teacher, 74(2), 92–98.
  54. Opfer, J., Nehm, R. H., & Ha, M. (2012). Cognitive foundations for science assessment design: knowing what students know about evolution. Journal of Research in Science Teaching, 49(6), 744–777.
  55. Osborne, J. F., & Patterson, A. (2011). Scientific argument and explanation: a necessary distinction? Science Education, 95, 627–638.
  56. Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: bringing order to the web (Tech. Rep.). Stanford, CA: Stanford Digital Library Technologies Project.
  57. Papadouris, N., Constantinou, C. P., & Kyratsi, T. (2008). Students’ use of the energy model to account for changes in physical systems. Journal of Research in Science Teaching, 45, 444–469.
  58. Peker, D., & Wallace, C. S. (2011). Characterizing high school students’ written explanations in biology laboratories. Research in Science Education, 41, 169–191.
  59. Pellegrino, J. W. (2013). Proficiency in science: assessment challenges and opportunities. Science, 340, 320–323.
  60. Perret, P., Bailleux, C., & Dauvier, B. (2011). The influence of relational complexity and strategy selection on children’s reasoning in the Latin Square Task. Cognitive Development, 26, 127–141.
  61. Pollatsek, A., & Well, A. D. (1995). On the use of counterbalanced designs in cognitive research: a suggestion for a better and more powerful analysis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(3), 785–794.
  62. Popham, W. J. (2010). Classroom assessment: what teachers need to know. Boston, MA: Pearson Allyn & Bacon.
  63. Rector, M., Nehm, R. H., & Pearl, D. (2012). Learning the language of evolution: lexical ambiguity and word meaning in student explanations. Research in Science Education, 43(3), 1107–1133.
  64. Rodrigues, S., Taylor, N., Cameron, M., Syme-Smith, L., & Fortuna, C. (2010). Questioning chemistry: the role of level, familiarity, language, and taxonomy. Science Education International, 21(1), 31–46.
  65. Rodriguez, M. C. (2003). Construct equivalence of multiple choice and constructed-response items: a random effects synthesis of correlations. Journal of Educational Measurement, 40, 163–184.
  66. Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(5), 1155–1159.
  67. Russ, R. S., Scherr, R. E., Hammer, D., & Mikeska, J. (2008). Recognizing mechanistic reasoning in student scientific inquiry: a framework for discourse analysis developed from philosophy of science. Science Education, 92(3), 499–525.
  68. Salmon, W. C. (1984). Scientific explanation and the causal structure of the world (pp. 79–118). Princeton, NJ: Princeton University Press.
  69. Sax, G., & Cromack, T. R. (1966). The effects of various forms of item arrangements on test performance. Journal of Educational Measurement, 3, 309–311.
  70. Settlage, J., & Jensen, M. (1996). Investigating the inconsistencies in college student responses to natural selection test questions. Electronic Journal of Science Education, 1, 1.
  71. Scriven, M. (1959). Explanation and prediction in evolutionary theory. Science, 130, 477–482.
  72. Singh, C. (2008). Assessing student expertise in introductory physics with isomorphic problems: II. Effects of some potential factors on problem solving and transfer. Physical Review Special Topics – Physics Education Research, 4(1), 010105-1–010105-10.
  73. Songer, N. B., Kelcey, B., & Gotwals, A. W. (2009). How and when does complex reasoning occur? Empirically driven development of a learning progression focused on complex reasoning about biodiversity. Journal of Research in Science Teaching, 46(6), 610–631.
  74. Strevens, M. (2004). Scientific explanation. In Macmillan encyclopedia of philosophy (2nd ed.).
  75. Ward, W. C., Dupree, D., & Carlson, S. B. (1987). A comparison of free-response and multiple-choice questions in the assessment of reading comprehension (RR 87–20). Princeton, NJ: Educational Testing Service.
  76. White, B. Y., & Frederiksen, J. R. (1998). Inquiry, modeling, and metacognition: making science accessible to all students. Cognition and Instruction, 16, 3–118.
  77. White, B. T., & Yamamoto, S. (2011). Freshman undergraduate biology students’ difficulties with the concept of common ancestry. Evolution: Education & Outreach, 4(4), 680–687.

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Meghan Rector Federer (1)
  • Ross H. Nehm (2)
  • John E. Opfer (3)
  • Dennis Pearl (4)

  1. Department of Teaching and Learning, Ohio State University, Columbus, USA
  2. Center for Science and Mathematics Education, Stony Brook University, Stony Brook, USA
  3. Department of Psychology, Ohio State University, Columbus, USA
  4. Department of Statistics, Ohio State University, Columbus, USA