Application of logistic regression to predict the failure of students in subjects of a mathematics undergraduate course

Education and Information Technologies

Abstract

High student failure rates are a frequent problem in undergraduate courses, and they are even more evident in the exact sciences. Identifying the reasons for this problem is an important research topic, though not an easy task. One alternative is to use Educational Data Mining (EDM) techniques, which convert data from educational databases into useful information in order to understand and improve teaching and learning processes. The objective of this paper is therefore to propose mathematical models based on EDM techniques to estimate the probability that a student in a mathematics degree course at IFSP (Federal Institute of São Paulo) fails exact sciences disciplines, and then to indicate which aspects contribute significantly to the students’ failure rates in these disciplines. We present three logistic regression models that were fitted to socioeconomic data and student performance data covering 4 years. To interpret and evaluate these models, we used odds ratios, ten-fold cross-validation, and the metrics accuracy, sensitivity, specificity and area under the ROC curve (AUC). Under cross-validation, the models achieved accuracy over 70%, sensitivity over 70%, specificity over 60% and AUC over 0.75. Analyzing the predictive variables of these models, we identified that factors such as advanced age, failure rates throughout the course and attendance in the initial semesters can increase the probability of failure in exact science disciplines in the analyzed course.
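The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The labels and probabilities are made up, the function names (`odds_ratio`, `classification_metrics`, `auc`, `k_fold_indices`) are hypothetical, and the fold split is a plain contiguous one, an assumption about how ten-fold cross-validation might be set up rather than a description of the study.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient: exp(beta)."""
    return math.exp(coefficient)


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity and specificity, with label 1 meaning 'failed'.

    Assumes both classes are present in y_true (otherwise a division by zero occurs).
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of actual failures detected
        "specificity": tn / (tn + fp),  # share of actual passes detected
    }


def auc(y_true, y_prob):
    """AUC as the chance a random positive scores above a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous test folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Made-up example: 1 = failed, with hypothetical predicted failure probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_prob)
metrics["auc"] = auc(y_true, y_prob)
print(metrics)
print(odds_ratio(0.0))  # a coefficient of 0 leaves the odds unchanged
```

In a cross-validated setting, each fold returned by `k_fold_indices` would serve once as the test set while a model is fitted on the remaining folds, and the metrics would then be averaged over the ten folds. An odds ratio above 1 indicates that an increase in the predictor raises the odds of failure, and one below 1 indicates the opposite.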

Data availability

Not applicable.

Code availability

Not applicable.

Notes

  1. Available at: https://ifsp.edu.br/acoes-e-programas/106-reitoria/conselhos-e-nucleos/858-comite-de-etica-em-pesquisa-com-seres-humanos-cep

  2. Available at: https://www.ifsp.edu.br/images/reitoria/Comites/RESOLUO-510-de07abril_2016---CONEP.pdf

Funding

This study was funded by IFSP’s institutional program, PIBIFSP, process number 23305.016744.2019-35.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Stella F. Costa and Michael M. Diniz. The first draft of the manuscript was written by Stella F. Costa and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Stella F. Costa.

Ethics declarations

Ethics approval

According to the Ethics Committee in Research with Human Beings of the Federal Institute of São Paulo (CEP-IFSP) (see Note 1), the projects that must be submitted to the Committee are defined in Article 1 of Resolution 510/2016 (see Note 2). Under paragraphs V and VII of Article 1 of that resolution, it was not necessary to register the activity proposed in this work or submit it for evaluation, since it is database research whose information is aggregated, without the possibility of individual identification.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest/Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Stella F. Costa and Michael M. Diniz contributed equally to this work.

About this article

Cite this article

Costa, S.F., Diniz, M.M. Application of logistic regression to predict the failure of students in subjects of a mathematics undergraduate course. Educ Inf Technol 27, 12381–12397 (2022). https://doi.org/10.1007/s10639-022-11117-1

  • DOI: https://doi.org/10.1007/s10639-022-11117-1
