AACR. (2020). Retrieved September 4, 2020, from https://apps.beyondmultiplechoice.org
Balfour, S. P. (2013). Assessing writing in MOOCs: Automated Essay Scoring and Calibrated Peer Review™. Research & Practice in Assessment, 8, 40–48.
Cheuk, T., Osborne, J., Cunningham, K., Haudek, K., Santiago, M., Urban-Lurain, M., Merrill, J., Wilson, C., Stuhlsatz, M., Donovan, B., Bracey, Z., & Gardner, A. (2019). Towards an equitable design framework of developing argumentation in science tasks and rubrics for machine learning. Paper presented at the annual meeting of the National Association for Research in Science Teaching (NARST), Baltimore, MD.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley. ISBN 978-0-471-26370-8.
Geiger, R. S., Yu, K., Yang, Y., Dai, M., Qiu, J., Tang, R., & Huang, J. (2020, January). Garbage in, garbage out? Do machine learning application papers in social computing report where human-labeled training data comes from? In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 325–336).
Ha, M., & Nehm, R. H. (2016). The impact of misspelled words on automated computer scoring: A case study of scientific explanations. Journal of Science Education and Technology, 25(3), 358–374.
Harris, C. J., Krajcik, J. S., Pellegrino, J. W., & DeBarger, A. H. (2019). Designing knowledge-in-use assessments to promote deeper learning. Educational Measurement: Issues and Practice, 38(2), 53–67. https://doi.org/10.1111/emip.12253.
Haudek, K., Santiago, M., Wilson, C., Stuhlsatz, M., Donovan, B., Bracey, Z., Gardner, A., Osborne, J., & Cheuk, T. (2019). Using automated analysis to assess middle school students’ competence with scientific argumentation. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), Toronto, ON.
Large, J., Lines, J., & Bagnall, A. (2019). A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates. Data Mining and Knowledge Discovery, 33(6), 1674–1709.
Lee, H. S., McNamara, D., Bracey, Z. B., Liu, O. L., Gerard, L., Sherin, B., Wilson, C., Pallant, A., Linn, M., Haudek, K., & Osborne, J. (2019a). Computerized text analysis: Assessment and research potentials for promoting learning.
Lee, H. S., Pallant, A., Pryputniewicz, S., Lord, T., Mulholland, M., & Liu, O. L. (2019b). Automated text scoring and real-time adjustable feedback: Supporting revision of scientific arguments involving uncertainty. Science Education, 103(3), 590–622.
Liu, O. L., Brew, C., Blackmore, J., & Gerard, L. (2014). Automated scoring of constructed response science items: Prospects and obstacles. Educational Measurement: Issues and Practice, 33(2), 19–28. https://doi.org/10.1111/emip.12028.
Lottridge, S., Wood, S., & Shaw, D. (2018). The effectiveness of machine score-ability ratings in predicting automated scoring performance. Applied Measurement in Education, 31(3), 215–232.
Mao, L., Liu, O. L., Roohr, K., Belur, V., Mulholland, M., Lee, H.-S., & Pallant, A. (2018). Validation of automated scoring for a formative assessment that employs scientific argumentation. Educational Assessment, 23(2), 121–138.
Mayfield, E., & Rosé, C. (2010, June). An interactive tool for supporting error analysis for text mining. In Proceedings of the NAACL HLT 2010 Demonstration Session (pp. 25–28).
Mayfield, E., & Rosé, C. P. (2013). Open source machine learning for text. In Handbook of automated essay evaluation: Current applications and new directions.
National Academies of Sciences, Engineering, and Medicine. (2019). Science and engineering for grades 6–12: Investigation and design at the center. National Academies Press.
National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.
National Research Council. (2014). Developing assessments for the next generation science standards. National Academies Press.
Nehm, R. H., & Haertig, H. (2012). Human vs. computer diagnosis of students’ natural selection knowledge: Testing the efficacy of text analytic software. Journal of Science Education and Technology, 21(1), 56–73.
NGSS Lead States. (2013). Next generation science standards: For states, by states. Washington, DC: The National Academies Press.
Pellegrino, J. W. (2013). Proficiency in science: Assessment challenges and opportunities. Science, 340(6130), 320–323.
Zhai, X., Haudek, K., Shi, L., Nehm, R., & Urban-Lurain, M. (2020a). From substitution to redefinition: A framework of machine learning-based science assessment. Journal of Research in Science Teaching, 57(9), 1430–1459. https://doi.org/10.1002/tea.21658.
Zhai, X., Haudek, K., Stuhlsatz, M., & Wilson, C. (2020b). Evaluation of construct-irrelevant variance yielded by machine and human scoring of a science teacher PCK constructed response assessment. Studies in Educational Evaluation, 67, 1–12. https://doi.org/10.1016/j.stueduc.2020.100916.
Zhai, X., Yin, Y., Pellegrino, J., Haudek, K., & Shi, L. (2020c). Applying machine learning in science assessment: A systematic review. Studies in Science Education, 56(1), 111–151.
Zhu, M., Lee, H.-S., Wang, T., Liu, O. L., Belur, V., & Pallant, A. (2017). Investigating the impact of automated feedback on students’ scientific argumentation. International Journal of Science Education, 39(12), 1648–1668.