
Prediction of Academic Dropout in University Students Using Data Mining: Engineering Case

Advances in Cybernetics, Cognition, and Machine Learning for Communication Technologies

Abstract

Student dropout is considered an important indicator for measuring social mobility and for reflecting the social contribution that universities make. In economic terms, there is evidence that students attribute their decision to drop out of their academic programs to their economic situation. Dropout creates significant wage gaps between people who complete their tertiary studies and those who do not, and it leads to a shortage of the skilled human capital that contributes greater productivity to a country's economic development. Given the above, the objective of this study is to present a decision-tree-based classification (CBAD) with optimized parameters to predict student dropout at Colombian universities. The study analyses 10,486 student cases from three private universities with similar characteristics. Applied with optimized parameters, the technique achieved a precision of 88.14%.




Correspondence to Jesús Silva.


© 2020 Springer Nature Singapore Pte Ltd.

Cite this chapter

Silva, J. et al. (2020). Prediction of Academic Dropout in University Students Using Data Mining: Engineering Case. In: Gunjan, V., Senatore, S., Kumar, A., Gao, XZ., Merugu, S. (eds) Advances in Cybernetics, Cognition, and Machine Learning for Communication Technologies. Lecture Notes in Electrical Engineering, vol 643. Springer, Singapore. https://doi.org/10.1007/978-981-15-3125-5_49
