A Causal Inference Study on the Effects of First Year Workload on the Dropout Rate of Undergraduates

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13355)


In this work, we evaluate the risk of early dropout in undergraduate studies using causal inference methods, focusing on groups of students who have a relatively higher dropout risk. We use a large dataset of undergraduates admitted to multiple study programs at eight faculties/schools of our university. Using data available at enrollment time, we develop Machine Learning (ML) methods to predict university dropout and underperformance, which achieve AUCs of 0.70 and 0.74, respectively. Among the important drivers of dropout over which first-year students have some control, we find that first-year workload (i.e., the number of credits taken) is a key one, and we focus mainly on it. We estimate the effect of taking a relatively lighter workload in the first year on dropout risk using four causal inference methods: Propensity Score Matching (PSM), Inverse Propensity Score Weighting (IPW), Augmented Inverse Propensity Weighted (AIPW) estimation, and Doubly Robust Orthogonal Random Forest (DROrthoForest). Our results show that a reduction in workload reduces dropout risk.
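To make the IPW and AIPW estimators named in the abstract concrete, the following is a minimal sketch on synthetic data. It is not the authors' pipeline. The single confounder `x` (think of a standardized admission grade), the coefficients in `simulate`, and the logistic models for propensity and outcome are all assumptions made for illustration. The treatment `t = 1` stands for a lighter first-year workload and `y = 1` for dropout.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, n_iter=25):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += w               # entries of the 2x2 matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def simulate(n=20000, seed=0):
    """Synthetic cohort (hypothetical coefficients, not taken from the paper)."""
    rng = random.Random(seed)
    xs, ts, ys, effect_sum = [], [], [], 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Assumption: weaker students are more likely to take a lighter workload.
        t = 1 if rng.random() < sigmoid(-0.5 * x) else 0
        p0 = sigmoid(-1.0 - 0.8 * x)   # dropout probability under full workload
        p1 = sigmoid(-1.5 - 0.8 * x)   # dropout probability under lighter workload
        y = 1 if rng.random() < (p1 if t else p0) else 0
        xs.append(x)
        ts.append(t)
        ys.append(y)
        effect_sum += p1 - p0
    return xs, ts, ys, effect_sum / n  # last value: true ATE in this simulation


def estimate_ate(xs, ts, ys, clip=0.01):
    """Return naive, IPW, and AIPW estimates of the average treatment effect."""
    n = len(xs)
    a_e, b_e = fit_logistic(xs, ts)  # propensity model e(x) = P(t=1 | x)
    treated = [(x, y) for x, t, y in zip(xs, ts, ys) if t == 1]
    control = [(x, y) for x, t, y in zip(xs, ts, ys) if t == 0]
    a1, b1 = fit_logistic(*zip(*treated))  # outcome model among treated
    a0, b0 = fit_logistic(*zip(*control))  # outcome model among controls
    naive = (sum(y for _, y in treated) / len(treated)
             - sum(y for _, y in control) / len(control))
    ipw = aipw = 0.0
    for x, t, y in zip(xs, ts, ys):
        # Clip propensities away from 0 and 1 to avoid extreme weights.
        e = min(max(sigmoid(a_e + b_e * x), clip), 1.0 - clip)
        mu1 = sigmoid(a1 + b1 * x)
        mu0 = sigmoid(a0 + b0 * x)
        ipw += t * y / e - (1 - t) * y / (1 - e)
        aipw += (mu1 - mu0
                 + t * (y - mu1) / e
                 - (1 - t) * (y - mu0) / (1 - e))
    return {"naive": naive, "ipw": ipw / n, "aipw": aipw / n}


if __name__ == "__main__":
    xs, ts, ys, true_ate = simulate()
    print("true ATE:", round(true_ate, 4))
    for name, value in estimate_ate(xs, ts, ys).items():
        print(name, round(value, 4))
```

In this simulated setting, students with a higher baseline risk select into the lighter workload, so the naive difference in dropout rates is pulled toward zero, while the weighted estimators adjust for that selection. AIPW combines the propensity model with the outcome models and remains consistent if either one is correctly specified, which is the sense in which it is doubly robust.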


  • University dropout
  • Machine learning
  • Causal inference
  • Average treatment effect


  1. These students have the opportunity to take a resit exam, which may ultimately result in passing or failing the subject, but given that passing the regular exam at the end of the course is expected, we consider failing the regular exam as underperforming.

  2. ENG: Engineering, HUM: Humanities, TRA: Translation and Language Sciences, POL: Political and Social Sciences, HEA: Health and Life Sciences, ECO: Economics and Business, COM: Communication.



This work has been partially supported by: the HUMAINT programme (Human Behaviour and Machine Intelligence), Joint Research Centre, European Commission; “la Caixa” Foundation (ID 100010434), under the agreement LCF/PR/PR16/51110009; and the EU-funded “SoBigData++” project, under Grant Agreement 871042. In addition, D. Hernández-Leo acknowledges the support by ICREA under the ICREA Academia programme, and the National Research Agency of the Spanish Ministry (PID2020-112584RB-C33/MICIN/AEI/10.13039/501100011033).

Corresponding author

Correspondence to Marzieh Karimi-Haghighi.

Ethics declarations

Ethics and Data Protection

We remark that the Data Protection Authority of the studied university performed an ethics and legal review of our research.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Karimi-Haghighi, M., Castillo, C., Hernández-Leo, D. (2022). A Causal Inference Study on the Effects of First Year Workload on the Dropout Rate of Undergraduates. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11643-8

  • Online ISBN: 978-3-031-11644-5

  • eBook Packages: Computer Science, Computer Science (R0)