
Applying Rasch Measurement to Assess Knowledge-in-Use in Science Education

Advances in Applications of Rasch Measurement in Science Education

Part of the book series: Contemporary Trends and Issues in Science Education ((CTISE,volume 57))

Abstract

This study applied many-facet Rasch measurement (MFRM) to assess students’ knowledge-in-use in middle school physical science. A total of 240 students completed three knowledge-in-use classroom assessment tasks on an online platform. We developed transformable scoring rubrics to score students’ responses: a task-generic polytomous rubric applicable to all three tasks, a task-specific polytomous rubric for each task, and a task-specific dichotomous rubric for each task. Three qualified raters scored the students’ responses to the three tasks. The MFRM analyses provided estimates of student ability, item difficulty, rater severity, and their interaction effects, which informed improvements to the assessment tasks and rubrics.
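For readers less familiar with the approach, a minimal sketch of the general many-facet Rasch model that underlies the analyses reported here (a standard formulation; the notation below is ours, not reproduced from the chapter):

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

where \(P_{nijk}\) is the probability that student \(n\) receives rating category \(k\) rather than \(k-1\) on item \(i\) from rater \(j\), \(\theta_n\) is student ability, \(\delta_i\) item difficulty, \(\alpha_j\) rater severity, and \(\tau_k\) the threshold between categories \(k-1\) and \(k\).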



Acknowledgements

This study is supported by the National Science Foundation (Grant Numbers 2101104, 2100964, 2201068; DRL-1903103, DRL-1316874, DRL-1316903, DRL-1316908). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Correspondence to Peng He.

Appendix: ConQuest Many-Facet Rasch Modeling Codes

The ConQuest command files for the three final MFRM analyses are listed below, each followed by a brief annotation.

Datafile Fulldata01.Dat;

Format rater 5 responses 7-27 /
 rater 5 responses 7-27 /
 rater 5 responses 7-27 ! Criteria(21);

Label << NGSA240.Nam;

Set update = yes, warning = no;

Model rater + criteria;

Export parameters >> NGSA240.Prm;

Export reg >> NGSA240.Reg;

Export cov >> NGSA240.Cov;

Estimate ! Nodes = 10, stderr = full;

Show parameters ! Estimates = latent, tables = 1:2:4 >> NGSA240.Shw;

The final model in the section applying dichotomous MFRM to analyze scores from the task-specific dichotomous rubrics. See the main findings in Table 13.4 and Fig. 13.4.
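Read against the Model statement above (Model rater + criteria;), this specification corresponds to a dichotomous MFRM; as a hedged reading in our own notation, the log-odds of a score of 1 decompose additively as

\[
\ln\!\left(\frac{P_{nij}}{1-P_{nij}}\right) = \theta_n - \delta_i - \alpha_j
\]

with \(\theta_n\) the ability of student \(n\), \(\delta_i\) the difficulty of criterion \(i\) (21 criteria in this run), and \(\alpha_j\) the severity of rater \(j\); no step parameters are needed because each criterion is scored 0/1.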

Datafile Fulldata Recode012.Dat;

Format rater 5 responses 7-16 /
 rater 5 responses 7-16 /
 rater 5 responses 7-16 ! Element(10);

Label << NGSA240r.Nam;

Set update = yes, warning = no;

Model rater + element + rater*element + rater*element*step;

Export parameters >> NGSA240r.Prm;

Export reg >> NGSA240r.Reg;

Export cov >> NGSA240r.Cov;

Estimate ! Nodes = 10, stderr = full;

Show parameters ! Estimates = latent, tables = 1:2:4 >> NGSA240r.Shw;

The final model (model 4) in the section applying partial credit MFRMs to analyze scores from the task-specific polytomous rubrics. See the main findings in Table 13.5 and Fig. 13.6.
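A hedged reading of the Model statement above (Model rater + element + rater*element + rater*element*step;), in our own notation: a partial credit MFRM with rater-by-element interactions and rater-by-element-specific steps,

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \phi_{ij} - \tau_{ijk}
\]

where \(\phi_{ij}\) captures the interaction between rater \(j\) and element \(i\), and \(\tau_{ijk}\) is the step parameter for category \(k\) of element \(i\) as applied by rater \(j\).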

Datafile Fulldata Recode012Com.Dat;

Format rater 5 task 7 responses 9-12 /
 rater 5 task 7 responses 9-12 /
 rater 5 task 7 responses 9-12 ! Com(4);

Label << NGSA240r2.Nam;

Set update = yes, warning = no;

Model rater + task + com + rater*task + rater*com + rater*com*step;

Export parameters >> NGSA240r2.Prm;

Export reg >> NGSA240r2.Reg;

Export cov >> NGSA240r2.Cov;

Estimate ! Nodes = 10, stderr = full;

Show parameters ! Estimates = latent, tables = 1:2:4 >> NGSA240r2.Shw;

The final model (model 2) in the section applying partial credit MFRMs to analyze scores from the task-generic polytomous rubric. See the main findings in Tables 13.8 and 13.9 and Fig. 13.7.
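Likewise, a hedged reading of the Model statement above (Model rater + task + com + rater*task + rater*com + rater*com*step;), in our own notation and assuming the com facet indexes the four generic rubric components:

\[
\ln\!\left(\frac{P_{ntcjk}}{P_{ntcj(k-1)}}\right) = \theta_n - \delta_t - \lambda_c - \alpha_j - \phi_{jt} - \psi_{jc} - \tau_{jck}
\]

where \(\delta_t\) is the difficulty of task \(t\), \(\lambda_c\) the difficulty of component \(c\), \(\alpha_j\) rater severity, \(\phi_{jt}\) and \(\psi_{jc}\) the rater-by-task and rater-by-component interactions, and \(\tau_{jck}\) the step parameter for category \(k\) of component \(c\) as applied by rater \(j\).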


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

He, P., Zhai, X., Shin, N., Krajcik, J. (2023). Applying Rasch Measurement to Assess Knowledge-in-Use in Science Education. In: Liu, X., Boone, W.J. (eds) Advances in Applications of Rasch Measurement in Science Education. Contemporary Trends and Issues in Science Education, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-031-28776-3_13


  • DOI: https://doi.org/10.1007/978-3-031-28776-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28775-6

  • Online ISBN: 978-3-031-28776-3

  • eBook Packages: Education, Education (R0)
