Abstract
High student failure rates are a common problem in undergraduate courses and are even more evident in the exact sciences. Identifying the causes of this problem is an important research topic, though not an easy task. One alternative is to use Educational Data Mining (EDM) techniques, which convert data from educational databases into useful information for understanding and improving teaching and learning processes. The objective of this paper is to propose mathematical models based on EDM techniques that estimate the probability that a student in a mathematics degree course at IFSP (Federal Institute of São Paulo) fails exact sciences disciplines, and then to indicate which factors contribute significantly to students' failure rates in these disciplines. We present three logistic regression models fitted to socioeconomic data and student performance data covering 4 years. To interpret and evaluate the models, we used odds ratios, ten-fold cross-validation, and the metrics accuracy, sensitivity, specificity, and area under the ROC curve (AUC). Under cross-validation, the models achieved accuracy over 70%, sensitivity over 70%, specificity over 60%, and AUC over 0.75. Analyzing the predictor variables of these models, we identified that factors such as advanced age, failure rates throughout the course, and attendance in the initial semesters can increase the probability of failure in exact sciences disciplines in the analyzed course.
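The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the labels and predicted probabilities below are invented toy values, the function names are hypothetical, and the 0.5 classification threshold is an assumption. The label 1 stands for "failed" and is treated as the positive class.

```python
import math

def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at an assumed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # passes recognised
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor, given its
    logistic regression coefficient."""
    return math.exp(coefficient)

def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved (train, test) pairs."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(test)), test) for test in folds]

# Toy data (invented for illustration, not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.4, 0.7, 0.1, 0.35, 0.55]

metrics = classification_metrics(y_true, scores)
print(metrics, auc(y_true, scores))
print(odds_ratio(math.log(2)))            # a coefficient of ln 2 doubles the odds
print(len(k_fold_indices(len(y_true))))   # ten folds
```

In a ten-fold cross-validation, a model would be fitted on each training split and these metrics computed on the corresponding held-out split, then averaged across the ten folds. An odds ratio above 1 indicates that the predictor raises the odds of failure, and one below 1 that it lowers them.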
Data availability
Not applicable.
Code availability
Not applicable.
Funding
This study was funded by IFSP’s institutional program, PIBIFSP, process number 23305.016744.2019-35.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Stella F. Costa and Michael M. Diniz. The first draft of the manuscript was written by Stella F. Costa and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
According to the Ethics Committee in Research with Human Beings of the Federal Institute of São Paulo (CEP-IFSP), the projects that must be submitted to the Committee are defined in Article 1 of Resolution 510/2016. Under that document (Article 1, paragraphs V and VII), it was not necessary to register and evaluate the activity proposed in this work, since it is database research whose information is aggregated, without the possibility of individual identification.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflict of interest/Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Stella F. Costa and Michael M. Diniz contributed equally to this work.
About this article
Cite this article
Costa, S.F., Diniz, M.M. Application of logistic regression to predict the failure of students in subjects of a mathematics undergraduate course. Educ Inf Technol 27, 12381–12397 (2022). https://doi.org/10.1007/s10639-022-11117-1
DOI: https://doi.org/10.1007/s10639-022-11117-1