A Probabilistic Approach to Event-Case Correlation for Process Mining

  • Conference paper
Conceptual Modeling (ER 2019)

Abstract

Process mining aims to understand the actual behavior and performance of business processes from event logs recorded by IT systems. A key requirement is that every event in the log must be associated with a unique case identifier (e.g., the order ID in an order-to-cash process). In reality, however, this case ID may not always be present, especially when logs are acquired from different systems or when such systems have not been explicitly designed to offer process-tracking capabilities. Existing techniques for correlating events have worked with assumptions to make the problem tractable: some assume the generative processes to be acyclic while others require heuristic information or user input. In this paper, we lift these assumptions by presenting a novel technique called EC-SA based on probabilistic optimization. Given as input a sequence of timestamped events (the log without case IDs) and a process model describing the underlying business process, our approach returns an event log in which every event is mapped to a case identifier. The approach minimises the misalignment between the generated log and the input process model, and the variance between activity durations across cases. The experiments conducted on a variety of real-life datasets show the advantages of our approach over the state of the art.
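The optimization described in the abstract can be illustrated with a minimal sketch. It assumes that the "SA" in EC-SA refers to simulated annealing, which the abstract itself does not state. The process model is reduced to a hypothetical set of allowed start activities and directly-follows pairs (`START`, `FOLLOWS`), the number of cases is given up front, and the weight `0.01` on the duration variance is an arbitrary illustrative choice. None of these names or simplifications come from the paper.

```python
import math
import random
from collections import defaultdict
from statistics import pvariance

# Hypothetical toy model (not from the paper): allowed start activities
# and allowed directly-follows pairs of a strictly sequential process A -> B -> C.
START = {"A"}
FOLLOWS = {("A", "B"), ("B", "C")}


def energy(events, assignment):
    """Cost of an assignment: model violations plus weighted duration variance."""
    cases = defaultdict(list)
    for (activity, ts), case in zip(events, assignment):
        cases[case].append((activity, ts))  # events arrive in timestamp order
    misaligned = 0
    gaps = defaultdict(list)  # elapsed times per activity pair, across cases
    for trace in cases.values():
        if trace[0][0] not in START:
            misaligned += 1
        for (a, t1), (b, t2) in zip(trace, trace[1:]):
            if (a, b) in FOLLOWS:
                gaps[(a, b)].append(t2 - t1)
            else:
                misaligned += 1
    variance = sum(pvariance(g) for g in gaps.values() if len(g) > 1)
    return misaligned + 0.01 * variance  # 0.01 is an assumed weight


def correlate(events, n_cases, steps=5000, temp=2.0, cooling=0.999, seed=0):
    """Simulated annealing over case assignments; returns (assignment, energy)."""
    rng = random.Random(seed)
    current = [rng.randrange(n_cases) for _ in events]
    cur_e = energy(events, current)
    best, best_e = current[:], cur_e
    for _ in range(steps):
        i = rng.randrange(len(events))
        old = current[i]
        current[i] = rng.randrange(n_cases)  # move one event to a random case
        new_e = energy(events, current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_e <= cur_e or rng.random() < math.exp((cur_e - new_e) / temp):
            cur_e = new_e
            if cur_e < best_e:
                best, best_e = current[:], cur_e
        else:
            current[i] = old  # reject the move
        temp = max(temp * cooling, 1e-6)
    return best, best_e


# Two interleaved executions of A -> B -> C, given without case IDs.
events = [("A", 0), ("A", 1), ("B", 2), ("B", 3), ("C", 4), ("C", 5)]
assignment, cost = correlate(events, n_cases=2)
```

In this toy setting, the assignment `[0, 1, 0, 1, 0, 1]` has zero cost: both cases follow the model and every activity pair has the same elapsed time. The search is stochastic, so `correlate` is not guaranteed to return that exact assignment.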

Notes

  1. Available at https://github.com/DinaBayomie/EC-SA/releases/tag/v1.0.

  2. https://data.4tu.nl/repository/collection:event_logs_real.

  3. Seven of these logs, namely the BPIC14 log, the five BPIC15 logs and the BPIC17 log, were filtered in [3] using the technique in [8] to remove infrequent behavior. We kept this filtering to be able to use the models associated with these logs in the benchmark dataset.

References

  1. Adriansyah, A., van Dongen, B., van der Aalst, W.: Conformance checking using cost-based fitness analysis. In: Proceedings of EDOC. IEEE (2011)

  2. Askarzadeh, A., dos Santos Coelho, L., Klein, C., Mariani, V.C.: A population-based simulated annealing algorithm for global optimization. In: Proceedings of SMC. IEEE (2016)

  3. Augusto, A., et al.: Automated discovery of process models from event logs: review and benchmark. IEEE TKDE 31(4), 686–705 (2019)

  4. Augusto, A., Conforti, R., Dumas, M., La Rosa, M., Polyvyanyy, A.: Split miner: automated discovery of accurate and simple business process models from event logs. Knowl. Inf. Syst. 59(2), 251–284 (2019). https://doi.org/10.1007/s10115-018-1214-x

  5. Baier, T., Di Ciccio, C., Mendling, J., Weske, M.: Matching events and activities by integrating behavioral aspects and label analysis. SoSyM 17(2), 573–598 (2018)

  6. Bala, S., Mendling, J., Schimak, M., Queteschiner, P.: Case and activity identification for mining process models from middleware. In: Buchmann, R.A., Karagiannis, D., Kirikova, M. (eds.) PoEM 2018. LNBIP, vol. 335, pp. 86–102. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02302-7_6

  7. Bayomie, D., Awad, A., Ezat, E.: Correlating unlabeled events from cyclic business processes execution. In: Nurcan, S., Soffer, P., Bajec, M., Eder, J. (eds.) CAiSE 2016. LNCS, vol. 9694, pp. 274–289. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39696-5_17

  8. Conforti, R., La Rosa, M., ter Hofstede, A.: Filtering out infrequent behavior from business process event logs. IEEE TKDE 29(2), 300–314 (2017)

  9. Ferreira, D.R., Gillblad, D.: Discovering process models from unlabelled event logs. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 143–158. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03848-8_11

  10. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured process models from event logs containing infrequent behaviour. In: Lohmann, N., Song, M., Wohed, P. (eds.) BPM 2013. LNBIP, vol. 171, pp. 66–78. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06257-0_6

  11. Meroni, G., Di Ciccio, C., Mendling, J.: An artifact-driven approach to monitor business processes through real-world objects. In: Maximilien, M., Vallecillo, A., Wang, J., Oriol, M. (eds.) ICSOC 2017. LNCS, vol. 10601, pp. 297–313. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69035-3_21

  12. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

  13. Nezhad, H., Saint-Paul, R., Casati, F., Benatallah, B.: Event correlation for process discovery from web service interaction logs. VLDB J. 20(3), 417–444 (2011)

  14. Pourmirza, S., Dijkman, R., Grefen, P.: Correlation miner: mining business process models and event correlations without case identifiers. IJCIS 26(02), 1742002 (2017)

  15. Reguieg, H., Toumani, F., Motahari-Nezhad, H.R., Benatallah, B.: Using MapReduce to scale events correlation discovery for business processes mining. In: Barros, A., Gal, A., Kindler, E. (eds.) BPM 2012. LNCS, vol. 7481, pp. 279–284. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32885-5_22

  16. Soffer, P., Hinze, A., Koschmider, A., Ziekow, H., et al.: From event streams to process models and back: challenges and opportunities. Inf. Syst. 81, 181–200 (2019)

  17. van der Aalst, W.: Process Mining - Data Science in Action, 2nd edn. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49851-4

  18. Walicki, M., Ferreira, D.: Sequence partitioning for process mining with unlabeled event logs. DKE 70(10), 821–841 (2011)

Acknowledgements

This research is partly funded by the Australian Research Council (DP180102839) and by the EU H2020 programme under agreement 645751 (RISE_BPM).

Author information

Corresponding author

Correspondence to Dina Bayomie.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bayomie, D., Di Ciccio, C., La Rosa, M., Mendling, J. (2019). A Probabilistic Approach to Event-Case Correlation for Process Mining. In: Laender, A., Pernici, B., Lim, EP., de Oliveira, J. (eds) Conceptual Modeling. ER 2019. Lecture Notes in Computer Science, vol 11788. Springer, Cham. https://doi.org/10.1007/978-3-030-33223-5_12

  • DOI: https://doi.org/10.1007/978-3-030-33223-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33222-8

  • Online ISBN: 978-3-030-33223-5

  • eBook Packages: Computer Science, Computer Science (R0)
