Encoding process discovery problems in SMT

Abstract

Information systems, which drive many processes in our lives (health care, the web, municipalities, commerce and business, among others), store information in the form of logs that are often left unused. Process mining, a discipline between data mining and software engineering, proposes tailored algorithms to exploit the information stored in a log in order to reason about the processes underlying an information system. A key challenge in process mining is discovery: given a log, derive a formal process model that can be used afterward for formal analysis. In this paper, we provide a general approach based on satisfiability modulo theories (SMT) as a solution to this challenging problem. By encoding the problem into the logical/arithmetic domains and using modern SMT engines, we show how two separate families of process models can be discovered. The theory of this paper is accompanied by a tool, and experimental results witness the significance of this novel view of the process discovery problem.

Notes

  1.

    Notice that, depending on the notion of valid sequence, the notion of undesired behavior may vary. For instance, for certain formalisms, only complete sequences (i.e., sequences from start to end) may be considered. For the sake of generality, we opt to abstract from these matters in Algorithm 2.

  2.

    In this case, if previous iterations of the algorithm were only due to forbidding \(\sigma \) on other parts of the model, these modifications could in principle be rolled back.

  3.

    Remarkably, the proposed SMT technique can be applied individually to every trace of the log, which allows the problem to be solved independently per trace when complexity issues arise.

  4.

    Although the minimum number of arcs required to guarantee that all activities are connected is \(|A|-1\), the minimum bound in the algorithm is set to \(|A|\). The reason is that there is a single model with \(|A|-1\) arcs, namely a sequence of activities. If this model is feasible, it should already have been found in \(C_{\text{IF}}(L)\); in that case \(\left| \mathrm{arcs}(C)\right| = |A|-1\), so the algorithm never enters the loop and returns \(C_{\text{IF}}(L)\). If, on the other hand, \(\left| \mathrm{arcs}(C)\right| > |A|-1\), then there is no feasible model with just \(|A|-1\) arcs, and the minimum search bound can be set to \(|A|\).

  5.

    One can notice this with the simple example of Fig. 2b: to replay the occurrence of activity a, the three output bindings must be considered as potential successor states. In general, to proceed with the replay, any of them can be combined with the occurrences of the subsequent activities, which in turn may introduce new output binding possibilities.

  6.

    For instance, \(z = x \vee y\) is equivalent to \(z \ge x\), \(z \ge y\) and \(z \le x+y\).

  7.

    This is indeed a region, since the gradient of every event is constant and equal to zero.

  8.

    Testing minimality of model elements is a feature not considered in Algorithm 3, and this is the reason why the generic algorithm (Algorithm 3) and the instantiation (Algorithm 5) have a different structure.

  9.

    STP translates the SMT formula to a SAT formula and then uses the MiniSat solver, but any other SAT (or incremental SAT) tool can be used as the backend.
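
The linear encoding of disjunction given in note 6 can be checked exhaustively. The following is a minimal sketch, assuming that \(x\), \(y\) and \(z\) range over \(\{0, 1\}\) (the note does not state the domain explicitly); the function names are illustrative and are not taken from the paper or its tool.

```python
from itertools import product


def satisfies_linear_or(x: int, y: int, z: int) -> bool:
    """Linear constraints from note 6: z >= x, z >= y and z <= x + y."""
    return z >= x and z >= y and z <= x + y


def encoding_matches_or() -> bool:
    """Compare the constraints with z == (x OR y) on every 0/1 assignment.

    Assumption: all three variables are restricted to {0, 1}.
    """
    return all(
        satisfies_linear_or(x, y, z) == (z == (x | y))
        for x, y, z in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print(encoding_matches_or())
```

The first two inequalities force \(z\) to 1 as soon as either operand is 1, and the third forces \(z\) to 0 when both operands are 0, so under the 0/1 assumption the three constraints together admit exactly the assignments of the disjunction.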

Acknowledgments

This work has been partially supported by funds from the Spanish Ministry for Economy and Competitiveness (MINECO) and the European Union (FEDER funds) under Grant COMMAS (Ref. TIN2013-46181-C2-1-R).

Author information

Corresponding author

Correspondence to Josep Carmona.

Additional information

Communicated by Dr. Daniel Varro.

About this article

Cite this article

Solé, M., Carmona, J. Encoding process discovery problems in SMT. Softw Syst Model 17, 1055–1078 (2018). https://doi.org/10.1007/s10270-016-0536-y

Keywords

  • Process discovery
  • SMT application
  • Causal nets
  • Petri nets