Classification, Clustering and Association Rule Mining in Educational Datasets Using Data Mining Tools: A Case Study

  • Sadiq Hussain
  • Rasha Atallah
  • Amirrudin Kamsin
  • Jiten Hazarika
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 765)


Educational Data Mining is an emerging field within the data mining domain. In today's competitive environment, the quality of education needs to improve. Unfortunately, much of the data collected on students becomes a "data tomb" because the knowledge hidden in it is never analyzed. Educational data mining tries to uncover this hidden knowledge by discovering relationships between students' learning characteristics and their behavior. With such models of educational data, educators can plan future pedagogy that supports each student's learning style, and academic planners can apply the knowledge to improve the quality of education and reduce the failure rate. In this paper, we collected a real dataset of 666 instances with 11 attributes, drawn from a single year's Common Entrance Examination (CEE) conducted by Dibrugarh University for admission to the medical colleges of Assam, India. We mined association rules from the data, and we applied various clustering and classification methods to compare which are most suitable for the dataset. The data mining tools used were Orange, Weka and RStudio.
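The abstract does not state which association-rule algorithm, thresholds or attributes were used, so the following is only a minimal sketch of the general idea, assuming the classic Apriori scheme (level-wise search for frequent itemsets, then rules filtered by confidence). The records, the attribute names (`coaching`, `area`, `result`) and the thresholds are hypothetical toy values, not the CEE data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search. Returns {itemset: support} for every
    itemset whose support (fraction of transactions containing it) >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: every single item that meets the support threshold.
    level = {}
    for item in {i for t in sets for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets that yield exactly k items.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules X -> Y from each frequent itemset, keeping those with
    confidence = support(X u Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is in the dict.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Hypothetical toy records: each student is a set of "attribute=value" items.
records = [
    {"coaching=yes", "area=urban", "result=pass"},
    {"coaching=yes", "area=urban", "result=pass"},
    {"coaching=no", "area=rural", "result=fail"},
    {"coaching=yes", "area=rural", "result=pass"},
    {"coaching=no", "area=urban", "result=fail"},
]

freq = frequent_itemsets(records, min_support=0.4)
for lhs, rhs, supp, conf in association_rules(freq, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

With these five toy records, one of the rules kept is `coaching=yes -> result=pass` (support 0.60, confidence 1.00). Tools such as Weka, R or Orange provide their own Apriori implementations with additional measures (for example lift); this sketch only shows support and confidence.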


Classification · Clustering · Association rule mining · Educational data mining · Data mining tools


Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Sadiq Hussain, Dibrugarh University, Assam, India
  • Rasha Atallah, Faculty of Computer Science and IT, University of Malaya, Kuala Lumpur, Malaysia
  • Amirrudin Kamsin, Department of Computer System and Technology, University of Malaya, Kuala Lumpur, Malaysia
  • Jiten Hazarika, Department of Statistics, Dibrugarh University, Assam, India