Complex Symbolic Sequence Clustering and Multiple Classifiers for Predictive Process Monitoring

  • Ilya Verenich
  • Marlon Dumas
  • Marcello La Rosa
  • Fabrizio Maria Maggi
  • Chiara Di Francescomarino
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 256)


This paper addresses the following predictive business process monitoring problem: given the execution trace of an ongoing case and a set of traces of historical (completed) cases, predict the most likely outcome of the ongoing case. In this context, a trace is a sequence of events with corresponding payloads, where a payload consists of a set of attribute-value pairs. An outcome is a label associated with a completed case, for example a label indicating whether the case completed “on time” (with respect to a given desired duration) or “late”, or whether it led to a customer complaint. The paper tackles this problem via a two-phase approach. In the first phase, prefixes of historical cases are encoded using complex symbolic sequences and clustered. In the second phase, a classifier is built for each of the clusters. To predict the outcome of an ongoing case at runtime given its (incomplete) trace, we select the cluster(s) closest to the trace in question and apply the respective classifier(s), taking into account the Euclidean distance of the trace from the centers of the clusters. We consider two families of clustering algorithms (hierarchical clustering and k-medoids) and use random forests for classification. The approach was evaluated on four real-life datasets.
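The two-phase idea can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: prefixes are taken to be already encoded as fixed-length numeric vectors, a naive k-medoids loop stands in for the clustering step, and a per-cluster majority label stands in for the random forest classifiers. All function names (`k_medoids`, `fit`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, max_iter=100):
    """Naive k-medoids: alternate assignment and medoid update.

    Assumes the points are distinct, so every medoid keeps at least itself.
    Returns a dict mapping medoid index -> list of member indices.
    """
    medoids = list(range(k))  # assumption: first k points as initial medoids
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance to its members.
        new_medoids = [
            min(members,
                key=lambda c: sum(euclidean(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


def fit(points, labels, k):
    """Phase 1: cluster the encoded prefixes. Phase 2: one 'classifier' per cluster.

    The classifier here is just the cluster's majority label, a stand-in
    for the random forest described in the abstract.
    """
    model = []
    for medoid, members in k_medoids(points, k).items():
        majority = Counter(labels[i] for i in members).most_common(1)[0][0]
        model.append((points[medoid], majority))
    return model


def predict(model, trace, n_closest=1, eps=1e-9):
    """Let the n closest clusters vote, weighting each vote by inverse distance."""
    ranked = sorted(model, key=lambda entry: euclidean(trace, entry[0]))[:n_closest]
    votes = defaultdict(float)
    for center, label in ranked:
        votes[label] += 1.0 / (euclidean(trace, center) + eps)
    return max(votes, key=votes.get)


# Toy data: six encoded prefixes with known outcomes (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["on time", "on time", "late", "late", "late", "on time"]
model = fit(points, labels, k=2)
print(predict(model, (1, 1)))               # nearest cluster only
print(predict(model, (9, 9), n_closest=2))  # distance-weighted vote of two clusters
```

With `n_closest=1` only the nearest cluster's classifier is applied; larger values combine several classifiers, with closer clusters weighing more. The inverse-distance weighting is one plausible way to take the distance into account, chosen here for illustration only.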


Process mining, Predictive process monitoring, Complex symbolic sequence, Clustering, Ensemble methods



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ilya Verenich (1, 2)
  • Marlon Dumas (2)
  • Marcello La Rosa (1)
  • Fabrizio Maria Maggi (2)
  • Chiara Di Francescomarino (3)
  1. Information Systems School, Queensland University of Technology, Brisbane, Australia
  2. Institute of Computer Science, University of Tartu, Tartu, Estonia
  3. FBK-IRST, Trento, Italy
