Testing the Impact of Novel Assessment Sources and Machine Learning Methods on Predictive Outcome Modeling in Undergraduate Biology

Journal of Science Education and Technology

Abstract

High levels of attrition characterize undergraduate science courses in the USA. Predictive analytics research seeks to build models that identify at-risk students and suggest interventions that enhance student success. This study examines whether incorporating a novel assessment type (concept inventories [CI]) and using machine learning (ML) methods (1) improves prediction quality, (2) allows successful prediction at an earlier time point, and (3) suggests more actionable course-level interventions. A corpus of university and course-level assessment and non-assessment variables (53 variables in total) from 3225 students (over six semesters) was gathered. Five ML methods (two individual methods and three ensemble methods) were applied at three time points (pre-course, week 3, week 6) to quantify predictive efficacy. Inclusion of course-specific CI data along with university-specific corpora significantly improved prediction performance. Ensemble ML methods, in particular the generalized linear model with elastic net (GLMNET), yielded significantly higher area under the curve (AUC) values compared with non-ensemble techniques. Logistic regression consistently achieved the poorest prediction performance. Surprisingly, increasing corpus size (i.e., the amount of historical data) did not meaningfully impact prediction success. We discuss the roles that novel assessment types and ML techniques may play in advancing predictive learning analytics and addressing attrition in undergraduate science education.
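
As a rough illustration of the AUC metric used above to compare models, the following minimal sketch computes a rank-based AUC in plain Python. It is not the authors' pipeline: the function name `auc`, the toy labels, and the two sets of risk scores are hypothetical and chosen only to show how two models might be compared at a single time point.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = at-risk student, 0 = not at risk.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical risk scores from two models at the same time point.
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1]
model_b = [0.6, 0.3, 0.2, 0.7, 0.5, 0.4, 0.1, 0.1]

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of at-risk from not-at-risk students, so a higher value for one model on the same students indicates better discrimination. The pairwise loop is quadratic in the number of students, which is acceptable for a sketch but would normally be replaced by a sorted-rank computation on larger data.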

Acknowledgements

We sincerely thank Drs. Yaqi Xue and Nora Galambos for assembling the databases that we analyzed in this study. We thank the guest editor and anonymous reviewers for providing thoughtful and helpful comments to improve this manuscript.

Funding

The Howard Hughes Medical Institute Science Education Program provided funding. The views in this contribution reflect those of the authors and not necessarily those of HHMI.

Author information

Corresponding author

Correspondence to Roberto Bertolini.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 264 KB)

About this article

Cite this article

Bertolini, R., Finch, S.J. & Nehm, R.H. Testing the Impact of Novel Assessment Sources and Machine Learning Methods on Predictive Outcome Modeling in Undergraduate Biology. J Sci Educ Technol 30, 193–209 (2021). https://doi.org/10.1007/s10956-020-09888-8

  • DOI: https://doi.org/10.1007/s10956-020-09888-8
