Extracting Patterns from Educational Traces via Clustering and Associated Quality Metrics

  • Marian Cristian Mihăescu
  • Alexandru Virgil Tănasie
  • Mihai Dascalu
  • Stefan Trausan-Matu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9883)


Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specific characteristics of the available data, such as missing values, extreme values or outliers, make it challenging to extract user models that are significant from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool, based on k-means clustering and on four quality metrics: the sum of squared errors (SSE), the silhouette, the Dunn index and the Xie-Beni index. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
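To make the four quality metrics named above concrete, the following is a minimal Python sketch and not the authors' tool. It assumes hard (crisp) cluster assignments, Euclidean distance, at least two non-empty clusters, and a deterministic farthest-first seeding chosen here only for reproducibility. The Xie-Beni index was originally defined for fuzzy clustering, so the crisp variant below is an adaptation. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dist(a, b):
    return math.sqrt(sqdist(a, b))

def init_centroids(points, k):
    """Farthest-first seeding (an assumption): deterministic, starts at points[0]."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sqdist(p, c) for c in chosen)))
    return chosen

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=100):
    """Plain Lloyd iteration until the centroids stop moving."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(sqdist(p, centroids[lab]) for p, lab in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # cohesion
        b = min(sum(dist(p, q) for q in other) / len(other)  # separation
                for m, other in clusters.items() if m != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def dunn(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    pairs = list(combinations(zip(points, labels), 2))
    separation = min(dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    diameter = max((dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq),
                   default=0.0)
    return separation / diameter if diameter > 0 else math.inf

def xie_beni(points, labels, centroids):
    """Crisp Xie-Beni: SSE / (n * smallest squared distance between centroids)."""
    used = [centroids[lab] for lab in set(labels)]
    separation = min(sqdist(a, b) for a, b in combinations(used, 2))
    return sse(points, labels, centroids) / (len(points) * separation)

# Hypothetical toy data: two activity features per learner, not from the paper.
learners = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(learners, k=2)
print("SSE:", sse(learners, labels, centroids))
print("silhouette:", silhouette(learners, labels))
print("Dunn:", dunn(learners, labels))
print("Xie-Beni:", xie_beni(learners, labels, centroids))
```

Lower SSE and Xie-Beni values and higher silhouette and Dunn values indicate more compact, better separated clusters. A common use is to run the sketch for several values of k and compare the four scores.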


Keywords: Clustering quality metrics; Pattern extraction; k-means clustering; Learner performance



The work presented in this paper was partially funded by the FP7 2008-212578 LTfLL project and by the EC H2020 project RAGE (Realising an Applied Gaming Eco-System), Grant agreement No. 644187.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Marian Cristian Mihăescu (1)
  • Alexandru Virgil Tănasie (1)
  • Mihai Dascalu (2), corresponding author
  • Stefan Trausan-Matu (2)

  1. Department of Computer Science, University of Craiova, Craiova, Romania
  2. Computer Science Department, University Politehnica of Bucharest, Bucharest, Romania
