Improving Process Discovery Results by Filtering Outliers Using Conditional Behavioural Probabilities

  • Mohammadreza Fani Sani
  • Sebastiaan J. van Zelst
  • Wil M. P. van der Aalst
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 308)


Process discovery, one of the key challenges in process mining, aims at discovering process models from process execution data stored in event logs. Most discovery algorithms assume that all data in an event log conform to correct execution of the process and hence incorporate all behaviour in their resulting process model. However, real event logs often contain noise and irrelevant infrequent behaviour. Incorporating such behaviour results in complex, incomprehensible process models that conceal the correct and/or relevant behaviour of the underlying process. In this paper, we propose a novel general-purpose filtering method that exploits observed conditional probabilities between sequences of activities. The method has been implemented in both the ProM toolkit and the RapidProM framework. We evaluate our approach using real and synthetic event data. The results show that the proposed method accurately removes irrelevant behaviour and, indeed, improves process discovery results.
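
The idea of filtering on conditional probabilities can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a deliberately simplified variant: a trace is a list of activity names, the condition is only the single preceding activity, and a trace is dropped as soon as one of its steps has an observed conditional probability below a threshold. The function name `filter_log`, the `threshold` parameter, and the artificial `<start>` and `<end>` markers are all hypothetical.

```python
from collections import Counter, defaultdict


def filter_log(log, threshold=0.1, start="<start>", end="<end>"):
    """Keep only traces whose every step is sufficiently probable.

    Simplifying assumption (not taken from the paper): the probability of
    an activity is conditioned on the single directly preceding activity.
    """
    # Count how often activity b directly follows activity a in the log.
    follows = defaultdict(Counter)
    for trace in log:
        padded = [start, *trace, end]
        for a, b in zip(padded, padded[1:]):
            follows[a][b] += 1

    def prob(a, b):
        # Observed conditional probability P(b | a).
        return follows[a][b] / sum(follows[a].values())

    kept = []
    for trace in log:
        padded = [start, *trace, end]
        if all(prob(a, b) >= threshold for a, b in zip(padded, padded[1:])):
            kept.append(trace)
    return kept


if __name__ == "__main__":
    # Nine regular traces and one trace with swapped activities.
    example_log = [["a", "b", "c"]] * 9 + [["a", "c", "b"]]
    filtered = filter_log(example_log, threshold=0.2)
    print(len(filtered))  # prints 9: P(c | a) = 0.1 is below the threshold
```

In this sketch, lowering `threshold` keeps more infrequent behaviour, while raising it filters more aggressively; a discovery algorithm would then be run on the filtered log.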


Keywords: Process mining, Process discovery, Noise filtering, Outlier detection



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Mohammadreza Fani Sani (1, corresponding author)
  • Sebastiaan J. van Zelst (1)
  • Wil M. P. van der Aalst (1)
  1. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
