Improved student dropout prediction in Thai University using ensemble of mixed-type data clusterings

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Increasing student retention has long been a common goal of academic institutions, especially at the university level. The negative effects of student attrition are evident to students, parents, universities and society as a whole. First-year students are at the greatest risk of dropping out or not completing their degree on time. With this insight, a number of data mining methods have been developed for early detection of students at risk of dropout, so that assistive measures can be applied immediately. Compared with western countries, this subject has attracted only a few studies in Thai universities, and educational data mining there has been limited to conventional classification models. This paper presents the most recent investigation of student dropout at Mae Fah Luang University, Thailand, and the novel reuse of a link-based cluster ensemble as a data transformation framework for more accurate prediction. The empirical study on a mixed-type data collection covering students’ demographic details, academic performance and enrollment records suggests that the proposed approach is usually more effective than several benchmark transformation techniques across different classifiers.

Notes

  1. Overview and implementation of these dimensionality reduction methods in Matlab are available at http://www.cad.zju.edu.cn/home/dengcai/Data/DimensionReduction.html.

Author information

Corresponding author

Correspondence to Natthakan Iam-On.

About this article

Cite this article

Iam-On, N., Boongoen, T. Improved student dropout prediction in Thai University using ensemble of mixed-type data clusterings. Int. J. Mach. Learn. & Cyber. 8, 497–510 (2017). https://doi.org/10.1007/s13042-015-0341-x

  • DOI: https://doi.org/10.1007/s13042-015-0341-x
