Using a Constructed-Response Instrument to Explore the Effects of Item Position and Item Features on the Assessment of Students’ Written Scientific Explanations


Abstract

A large body of work has been devoted to reducing assessment biases that distort inferences about students’ science understanding, particularly in multiple-choice instruments (MCIs). Constructed-response instruments (CRIs), however, have received much less scrutiny, perhaps because of their reputation for avoiding many of the documented biases of MCIs. In this study we explored whether known biases of MCIs, specifically item-sequencing and surface-feature effects, were also apparent in a CRI designed to assess students’ understanding of evolutionary change through written explanations (the Assessment of COntextual Reasoning about Natural Selection [ACORNS]). We used three versions of the ACORNS CRI to investigate different aspects of assessment structure and their corresponding effects on inferences about student understanding. Our results identified several sources of (and solutions to) assessment bias in this practice-focused CRI. First, sequences of items with similar surface features produced greater position effects than sequences of items with dissimilar surface features. Second, a counterbalanced design (i.e., a Latin Square) mitigated this bias at the population level of analysis. Third, ACORNS response scores were highly correlated with student verbosity, even though verbosity is, in itself, a trivial aspect of explanation quality. Our results suggest that as assessments in science education shift toward the measurement of scientific practices (e.g., explanation), it is critical that biases inherent in these types of assessments be investigated empirically.
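
For readers unfamiliar with Latin Square counterbalancing, the short Python sketch below illustrates the underlying idea; it is our own illustration with hypothetical item names, not material from the study. Rotating the item list across test forms places every item at every serial position exactly once, which is why position effects average out at the population level when forms are distributed evenly across students.

```python
# Illustrative sketch only (not the authors' code): how a Latin Square design
# counterbalances item position. Item names are hypothetical stand-ins for
# isomorphic constructed-response items.

def latin_square_forms(items):
    """Build one test form per item; each form is a cyclic rotation of the
    item list, so every item appears in every serial position exactly once
    across the set of forms."""
    n = len(items)
    return [[items[(start + pos) % n] for pos in range(n)] for start in range(n)]

items = ["item_A", "item_B", "item_C", "item_D"]
forms = latin_square_forms(items)

for i, form in enumerate(forms, start=1):
    print(f"Form {i}: {' -> '.join(form)}")

# Sanity check: every serial position is occupied by every item exactly once,
# so averaging over forms cancels item-position effects.
for pos in range(len(items)):
    assert sorted(form[pos] for form in forms) == sorted(items)
```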

Notes

  1. While the administration of three independent assessment tasks to three different student cohorts is a limitation of this study (see Study Limitations for a detailed discussion), we argue that the isomorphic nature of the assessment items allows for comparison across participant samples.

Acknowledgments

This research was supported by the National Science Foundation REESE program (R.H. Nehm) and the Marilyn Ruth Hathaway Education Scholarship (M.R. Federer). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF or The Ohio State University.

Author information

Corresponding author

Correspondence to Meghan Rector Federer.

About this article

Cite this article

Federer, M.R., Nehm, R.H., Opfer, J.E. et al. Using a Constructed-Response Instrument to Explore the Effects of Item Position and Item Features on the Assessment of Students’ Written Scientific Explanations. Res Sci Educ 45, 527–553 (2015). https://doi.org/10.1007/s11165-014-9435-9
