Abstract
High dropout rates in computer science programs are a major global issue. In this chapter, we describe the results of an educational machine learning case study aimed at the early prediction of computer science student dropout at the Virumaa College of Tallinn University of Technology (TalTech). For many years, TalTech Virumaa College has faced a high dropout rate among first-year students; among computer science students it is about 40%. To reduce the dropout rate, interviews and surveys with student candidates have been carried out during the College admission process since the academic year 2019/20 to assess candidates’ motivation and readiness to study and to ensure that the specialty suits them. To determine the factors that influence attrition, we analyze the data collected during the admission process and apply text mining techniques to the essays that students write at the end of their first year in the course Introduction to the Specialty. Using these factors, we evaluate predictive models and make recommendations for improving the admission process and reducing the dropout rate among first-year students.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Maksimova, N., Pentel, A., Dunajeva, O. (2022). Computer Science Students Early Drop-Out Prediction Using Machine Learning: A Case Study. In: Auer, M.E., Pester, A., May, D. (eds) Learning with Technologies and Technologies in Learning. Lecture Notes in Networks and Systems, vol 456. Springer, Cham. https://doi.org/10.1007/978-3-031-04286-7_25
DOI: https://doi.org/10.1007/978-3-031-04286-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04285-0
Online ISBN: 978-3-031-04286-7
eBook Packages: Intelligent Technologies and Robotics (R0)