Computer Science Students Early Drop-Out Prediction Using Machine Learning: A Case Study

Learning with Technologies and Technologies in Learning

Abstract

High dropout rates in computer science programs are a major global issue. In this chapter, we describe the results of an educational machine learning case study aimed at the early prediction of computer science student dropout at the Virumaa College of Tallinn University of Technology (TalTech). For many years, TalTech Virumaa College has faced a high dropout rate among first-year students; among computer science students it is about 40%. To reduce the dropout rate, interviews and surveys with student candidates have been carried out as part of the College admission process since the academic year 2019/20, in order to assess students’ motivation and readiness to study and to ensure the suitability of the specialty. To determine the factors that influence attrition, we analyze the data collected during the admission process and apply text mining techniques to the essays that students write at the end of their first year of study in the course Introduction to the Specialty. Using these factors, we evaluate predictive models and make recommendations to improve the admission process and reduce the dropout rate among first-year students.
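As a rough illustration of the kind of text mining step the abstract mentions, the Python sketch below computes TF-IDF term weights for a handful of essays. It is not the authors' pipeline: the tokenizer, the particular TF-IDF variant, the function names, and the sample essays are all assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = term count / document length, idf = ln(N / df),
    where N is the number of documents and df the number containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical essays, invented for illustration only.
essays = [
    "I enjoy programming and want to continue my studies",
    "I found programming hard and I am not sure about my studies",
]
weights = tf_idf(essays)
```

Terms that occur in every essay (here, "programming") receive a weight of zero, while terms specific to one essay (here, "enjoy") receive a positive weight. Per-essay weights of this kind could then serve as input features for a dropout classifier, alongside the admission data.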

Author information

Corresponding author

Correspondence to Olga Dunajeva.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Maksimova, N., Pentel, A., Dunajeva, O. (2022). Computer Science Students Early Drop-Out Prediction Using Machine Learning: A Case Study. In: Auer, M.E., Pester, A., May, D. (eds) Learning with Technologies and Technologies in Learning. Lecture Notes in Networks and Systems, vol 456. Springer, Cham. https://doi.org/10.1007/978-3-031-04286-7_25
