Towards an Entropy-Based Analysis of Log Variability

  • Christoffer Olling Back
  • Søren Debois
  • Tijs Slaats
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 308)


Process mining algorithms can be partitioned by the type of model that they output: imperative miners output flow-diagrams showing all possible paths through a process, whereas declarative miners output constraints showing the rules governing a process. For processes with great variability, the latter approach tends to provide better results, because using an imperative miner would lead to so-called “spaghetti models” which attempt to show all possible paths and are impossible to read. However, studies have shown that one size does not fit all: many processes contain both structured and unstructured parts and therefore do not fit strictly in one category or the other. This has led to the recent introduction of hybrid miners, which aim to combine flow- and constraint-based models to provide the best possible representation of a log. In this paper we focus on a core question underlying the development of hybrid miners: given a log, can we determine a priori whether the log is best suited for imperative or declarative mining? We propose using the concept of entropy, commonly used in information theory. We consider different measures for entropy that could be applied and show through experimentation on both synthetic and real-life logs that these entropy measures do indeed give insights into the complexity of the log and can act as an indicator of which mining paradigm should be used.
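As a rough illustration of the idea, the sketch below computes the Shannon entropy of the empirical distribution of trace variants in a log. This is only one simple candidate measure and is not necessarily one of the measures evaluated in the paper; the function name `trace_entropy` and the two toy logs are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def trace_entropy(log):
    """Shannon entropy (in bits) of the distribution of trace variants.

    `log` is a list of traces; each trace is a sequence of activity labels.
    Assumption: every trace is treated as one symbol, so the measure only
    reflects how evenly the log is spread over distinct variants.
    """
    counts = Counter(tuple(trace) for trace in log)
    total = sum(counts.values())
    # Sum p * log2(1/p) over all observed variants.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy logs: one rigid process, one with more variability.
structured_log = [["a", "b", "c"]] * 8
flexible_log = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "c"],
    ["c", "b", "a"],
]

print(trace_entropy(structured_log))  # single variant, no uncertainty
print(trace_entropy(flexible_log))    # four equally likely variants
```

Under this measure a log with a single variant has zero entropy, while a log spread evenly over many variants has high entropy, which is the kind of signal that might suggest a declarative rather than an imperative miner.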


Process mining · Hybrid models · Process variability · Process flexibility · Information theory · Entropy · Knowledge work



We would like to thank both the anonymous reviewers and Jakob Grue Simonsen for valuable and constructive feedback.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
  2. Department of Computer Science, IT University of Copenhagen, Copenhagen, Denmark
