
Expert-driven trace clustering with instance-level constraints


Within the field of process mining, several trace clustering approaches exist for partitioning traces, or process instances, into groups of similar traces. Typically, this partitioning is based on patterns in, or similarity between, the traces, or it is driven by the discovery of a process model for each cluster. The main drawback of these techniques, however, is that their solutions are usually hard for domain experts to evaluate or justify. In this paper, we present two constrained trace clustering techniques that are capable of leveraging expert knowledge in the form of instance-level constraints. In an extensive experimental evaluation using two real-life datasets, we show that our novel techniques are indeed capable of producing clustering solutions that are more justifiable without a substantial negative impact on their quality.
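The abstract does not spell out how the two techniques work. In the constrained clustering literature, instance-level constraints are usually pairwise statements: a must-link pair says two instances belong in the same cluster, a cannot-link pair says they must be separated. The sketch below illustrates that general idea on traces. It is a swapped-in, generic constrained agglomerative clustering over activity-count profiles, not the authors' techniques; the function names, the bag-of-activities representation, the Euclidean distance and the average-linkage choice are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def activity_profile(trace, activities):
    """Bag-of-activities vector: how often each activity occurs in the trace."""
    counts = Counter(trace)
    return [counts[a] for a in activities]


def constrained_agglomerative(traces, k, must_link=(), cannot_link=()):
    """Hypothetical helper: average-linkage agglomerative clustering of traces
    that starts from the must-link components and never merges two clusters
    containing a cannot-link pair."""
    activities = sorted({a for t in traces for a in t})
    vecs = [activity_profile(t, activities) for t in traces]

    # Union-find: every must-link pair ends up in the same initial cluster.
    parent = list(range(len(traces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in must_link:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(len(traces)):
        groups.setdefault(find(i), set()).add(i)
    clusters = list(groups.values())

    cannot = {frozenset(pair) for pair in cannot_link}

    def violates(a, b):
        # True if some trace in a is cannot-linked to some trace in b.
        return any(frozenset((i, j)) in cannot for i in a for j in b)

    # A must-link component that contains a cannot-link pair is infeasible.
    if any(violates(c, c) for c in clusters):
        raise ValueError("must-link and cannot-link constraints conflict")

    def avg_dist(a, b):
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        return sum(math.dist(vecs[i], vecs[j]) for i in a for j in b) / (len(a) * len(b))

    # Greedily merge the closest pair of clusters whose union respects
    # every cannot-link constraint, until k clusters remain.
    while len(clusters) > k:
        candidates = [
            (avg_dist(a, b), x, y)
            for (x, a), (y, b) in combinations(enumerate(clusters), 2)
            if not violates(a, b)
        ]
        if not candidates:
            break  # no feasible merge left; more than k clusters remain
        _, x, y = min(candidates)
        clusters[x] = clusters[x] | clusters[y]
        del clusters[y]  # safe: combinations yields x < y

    labels = [0] * len(traces)
    for label, cluster in enumerate(clusters):
        for i in cluster:
            labels[i] = label
    return labels


if __name__ == "__main__":
    traces = [
        ["register", "check", "approve"],
        ["register", "check", "approve"],
        ["register", "check", "reject"],
        ["register", "reject"],
    ]
    print(constrained_agglomerative(traces, 2))  # [0, 0, 1, 1]
    # Hypothetical expert knowledge: trace 2 belongs with trace 0, not with trace 3.
    print(constrained_agglomerative(traces, 2, must_link=[(0, 2)], cannot_link=[(2, 3)]))  # [0, 0, 0, 1]
```

Without constraints, trace 2 joins trace 3 because their activity profiles are closest; the two expert constraints override that purely similarity-driven choice. Treating constraints as hard rules, as done here, is the simplest option; penalising violations instead of forbidding them is a common alternative when expert input may be noisy.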


Figs. 1 to 9 (images not reproduced in this preview)


  1. ProM is the leading open-source process mining framework for academicians and practitioners, see:




This research has been financed in part by the EC H2020 MSCA RISE NeEDS Project (Grant agreement ID: 822214)

Author information

Corresponding author

Correspondence to Pieter De Koninck.

About this article

Cite this article

De Koninck, P., Nelissen, K., vanden Broucke, S. et al. Expert-driven trace clustering with instance-level constraints. Knowl Inf Syst 63, 1197–1220 (2021).


Keywords

  • Trace clustering
  • Process mining
  • Semi-supervised learning
  • Constrained clustering

Mathematics Subject Classification

  • 62H30
  • 91C20