SIDEKICK: Linear Correlation Clustering with Supervised Background Knowledge

  • Maximilian Archimedes Xaver Hünemörder
  • Daniyal Kazempour
  • Peer Kröger
  • Thomas Seidl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11807)


While explainable AI (XAI) is gaining in popularity, other, more traditional machine learning algorithms can also benefit from increased explainability. A semi-supervised approach to correlation clustering opens up a promising design space that might provide such explainability to correlation clustering algorithms. In this work, semi-supervised linear correlation clustering is defined as the task of finding arbitrarily oriented subspace clusters using only a small sample of supervised background knowledge provided by domain experts. This work describes a first foray into this novel approach and provides an implementation of a basic algorithm to perform this task. We have found that even a small amount of supervised background knowledge can significantly improve the quality of correlation clustering in general. The results of this work have the potential to inspire several more semi-supervised approaches to correlation clustering in the future.
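To make the task definition concrete, the following is a minimal sketch of one way a handful of expert labels could seed arbitrarily oriented subspace clusters. It is not the SIDEKICK algorithm, whose details are not given in this abstract. It assumes that each labeled class lies near an affine subspace of known dimensionality, fits that subspace with PCA, and assigns every point to the subspace with the smallest orthogonal distance. The function names (`fit_subspace`, `residual_distance`, `assign_to_subspaces`) and the parameter `n_components` are hypothetical.

```python
import numpy as np


def fit_subspace(points, n_components):
    """Fit an affine subspace to points with PCA; return (mean, basis)."""
    mean = points.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:n_components]


def residual_distance(points, mean, basis):
    """Orthogonal distance of each point to the affine subspace."""
    centered = points - mean
    projected = centered @ basis.T @ basis
    return np.linalg.norm(centered - projected, axis=1)


def assign_to_subspaces(data, labeled_idx, labels, n_components=1):
    """Assign every point to the labeled class whose subspace is closest.

    Assumption for this sketch: all clusters share the same subspace
    dimensionality n_components, and each class has enough labeled
    points to determine its subspace.
    """
    classes = np.unique(labels)
    models = [
        fit_subspace(data[labeled_idx[labels == c]], n_components)
        for c in classes
    ]
    # One column of distances per class, one row per data point.
    dists = np.stack(
        [residual_distance(data, mean, basis) for mean, basis in models],
        axis=1,
    )
    return classes[dists.argmin(axis=1)]
```

In this sketch the labeled sample only fixes the orientation of each cluster; a fuller approach would presumably also have to handle noise points and clusters for which no labels are available.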


Clustering, Subspace, Correlation, Semi-supervised, Background knowledge



This work has been funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors of this work take full responsibility for its content.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Maximilian Archimedes Xaver Hünemörder (1)
  • Daniyal Kazempour (1)
  • Peer Kröger (1)
  • Thomas Seidl (1)
  1. Ludwig-Maximilians-Universität München, Munich, Germany
