A stacking ensemble machine learning method for early identification of students at risk of dropout

Education and Information Technologies

Abstract

Early student dropout is one of the major problems universities currently face. Several machine learning techniques have been used to detect students at risk of dropping out. Using sociodemographic data and grades from the previous educational level, these predictive models are accurate enough to support retention programs, and their accuracy increases further when grades from the first semesters are added. Nevertheless, their classification errors have a cost: at-risk students who go undetected are left out of retention programs, whereas students who are not actually at risk consume additional resources. To provide more accurate models, we propose a stacking ensemble technique that yields an improved combined dropout model while using relatively few variables. The model's results fall within the expected ranges for an early dropout model, but with considerably fewer features and less historical information, and we show that deploying the models would be cost-efficient for the institution if applied to an intervention program.
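As a rough illustration of what a stacking ensemble looks like in practice, the following minimal sketch combines several base classifiers through a meta-learner using scikit-learn. It is not the authors' pipeline: the synthetic data, the choice of base learners (logistic regression, k-nearest neighbors, and a random forest), the feature count, and all hyperparameters are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for student records (NOT the paper's dataset):
# 8 features, with roughly 15% of samples in the positive "dropout" class.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical base learners; the distance- and gradient-based ones are scaled.
base_learners = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on out-of-fold probabilities from the base
# learners (cv=5), so it never sees predictions made on a model's own
# training folds.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Estimated dropout risk for each held-out student, scored with ROC AUC.
risk = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, risk)
print(f"ROC AUC on held-out synthetic data: {auc:.3f}")
```

In a retention setting, the `risk` scores would typically be thresholded to decide which students enter an intervention program; where that threshold sits determines the trade-off between undetected at-risk students and resources spent on students who were not at risk.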

Data availability

The datasets analyzed during the current study are available in the Institute for the Future of Education’s Educational Innovation collection of the Tecnologico de Monterrey’s Data Hub repository, https://doi.org/10.57687/FK2/PWJRSJ.

Abbreviations

CRISP-DM: Cross Industry Standard Process for Data Mining
RFE: Recursive Feature Elimination
SMOTE: Synthetic Minority Oversampling Technique
ROC: Receiver Operating Characteristic curve
PRC: Precision Recall Curve
AUC: Area Under the Curve
LR: Logistic regression
KNN: k-Nearest Neighbors

Acknowledgements

The authors would like to acknowledge the Living Lab & Data Hub of the Institute for the Future of Education, Tecnológico de Monterrey, Mexico, for the data published through the Call “Bringing New Solutions to the Challenges of Predicting and Countering Student Dropout in Higher Education” used in the production of this work.

Funding

The authors thank Tecnológico de Monterrey for financial support through the “Challenge-Based Research Funding Program 2022”, Project ID # I004 - IFE001 - C2-T3 – T.

Author information

Contributions

JAT performed the literature search and data analysis and wrote the first draft of the document. HC critically reviewed the work, provided commentary, and supervised and guided the final development of the article. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Juan Andrés Talamás-Carvajal.

Ethics declarations

Institutional review board statement

Privacy issues related to the collection, curation, and publication of student data were validated with Tecnológico de Monterrey’s Data Owners and the Data Security and Information Management Departments.

Competing interests

Juan Talamás and Héctor Ceballos declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Talamás-Carvajal, J.A., Ceballos, H.G. A stacking ensemble machine learning method for early identification of students at risk of dropout. Educ Inf Technol 28, 12169–12189 (2023). https://doi.org/10.1007/s10639-023-11682-z

  • DOI: https://doi.org/10.1007/s10639-023-11682-z
