Finding Suitable Activity Clusters for Decomposed Process Discovery

  • B. F. A. Hompes
  • H. M. W. Verbeek
  • W. M. P. van der Aalst
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 237)


Event data can be found in any information system and provide the starting point for a range of process mining techniques. The widespread availability of large amounts of event data also creates new challenges. Existing process mining techniques are often unable to handle “big event data” adequately. Decomposed process mining aims to solve this problem by decomposing the process mining problem into many smaller problems, which can be solved in less time, using fewer resources, or even in parallel. Many decomposed process mining techniques have been proposed in the literature. Analysis shows that even though the decomposition step takes a relatively small amount of time, it is of key importance both for finding a high-quality process model and for the computation time required to discover the individual parts. Currently there is no way to assess the quality of a decomposition beforehand. We define three quality notions that can be used to assess a decomposition before it is used to discover a model or check conformance. We then propose a decomposition approach that uses these notions and is able to find a high-quality decomposition in little time.
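
To make the idea of splitting one discovery problem into smaller ones concrete, the following is a minimal sketch, assuming that a decomposition is given as named sets of activities (clusters, which may overlap) and that each smaller problem is obtained by projecting every trace of the event log onto one cluster. The names `project_log`, `log`, and `clusters` are hypothetical and do not come from the paper, and the three quality notions are not shown because the abstract does not define them.

```python
from typing import Dict, List, Set

Trace = List[str]  # a trace is an ordered list of activity names


def project_log(log: List[Trace],
                clusters: Dict[str, Set[str]]) -> Dict[str, List[Trace]]:
    """Project every trace onto each activity cluster.

    Returns one sublog per cluster. Each sublog keeps only the events
    whose activity belongs to that cluster, in their original order.
    (Illustrative helper, not the authors' implementation.)
    """
    sublogs: Dict[str, List[Trace]] = {}
    for name, activities in clusters.items():
        sublogs[name] = [
            [activity for activity in trace if activity in activities]
            for trace in log
        ]
    return sublogs


# Toy event log with two traces over the activities a, b, c, d.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]

# Hypothetical decomposition into two overlapping activity clusters.
clusters = {"c1": {"a", "b", "c"}, "c2": {"b", "c", "d"}}

sublogs = project_log(log, clusters)
for name, sublog in sublogs.items():
    print(name, sublog)
```

Each sublog could then be handed to a discovery algorithm independently, for instance on separate machines. How the clusters are chosen determines how large each sublog is and how much the resulting parts overlap, which is why the choice of decomposition matters.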


Decomposed process mining, Decomposed process discovery, Distributed computing, Event log



Copyright information

© IFIP International Federation for Information Processing 2015

Authors and Affiliations

  • B. F. A. Hompes (1)
  • H. M. W. Verbeek (1)
  • W. M. P. van der Aalst (1)
  1. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
