A Novel Business Process Prediction Model Using a Deep Learning Method


The ability to proactively monitor business processes is a key competitive differentiator for firms. Execution logs generated by process-aware information systems support process-specific predictions that enable proactive situational awareness. The goal of the proposed approach is to predict the next process event from the completed activities of a running process instance, based on execution log data from previously completed process instances. By predicting process events, companies can initiate timely interventions to address undesired deviations from the desired workflow. The paper proposes a multi-stage deep learning approach that formulates next event prediction as a classification problem. Following a feature pre-processing stage with n-grams and feature hashing, a deep learning model consisting of an unsupervised pre-training component with stacked autoencoders and a supervised fine-tuning component is applied. Experiments on a variety of business process log datasets show that the multi-stage deep learning approach provides promising results. The study also compares the results to existing deep recurrent neural networks and conventional classification approaches. Furthermore, the paper addresses the identification of suitable hyperparameters for the proposed approach and the handling of the imbalanced nature of business process event datasets.
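The feature pre-processing stage named in the abstract (n-grams followed by feature hashing) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `>` separator, the MD5-based hash, the bucket count, and the example activity labels are all assumptions chosen for illustration. The sketch turns the completed activities of a running case into a fixed-length count vector that a classifier could consume.

```python
import hashlib
from typing import List, Sequence


def ngrams(events: Sequence[str], n: int) -> List[str]:
    """Return all contiguous n-grams of an event prefix, joined with '>'."""
    return [">".join(events[i:i + n]) for i in range(len(events) - n + 1)]


def hash_features(tokens: Sequence[str], n_buckets: int = 16) -> List[int]:
    """Map tokens to a fixed-length count vector (the hashing trick).

    Assumption: MD5 is used only as a stable, deterministic hash;
    any well-mixed hash function would serve the same purpose.
    """
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % n_buckets
        vec[idx] += 1  # colliding tokens share a bucket
    return vec


def encode_prefix(events: Sequence[str], max_n: int = 2,
                  n_buckets: int = 16) -> List[int]:
    """Encode an event prefix using all n-grams up to length max_n."""
    tokens: List[str] = []
    for n in range(1, max_n + 1):
        tokens.extend(ngrams(events, n))
    return hash_features(tokens, n_buckets)


# Hypothetical prefix of a running process instance.
prefix = ["register", "check", "approve"]
features = encode_prefix(prefix)
```

The vector length stays fixed at `n_buckets` no matter how many distinct activities or n-grams occur in the log, which is the main reason to hash rather than one-hot encode; the cost is that unrelated n-grams may collide in the same bucket.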





Author information



Corresponding author

Correspondence to Nijat Mehdiyev.

Additional information

Accepted after one revision by Jelena Zdravkovic.


Cite this article

Mehdiyev, N., Evermann, J. & Fettke, P. A Novel Business Process Prediction Model Using a Deep Learning Method. Bus Inf Syst Eng 62, 143–157 (2020). https://doi.org/10.1007/s12599-018-0551-3

Keywords


  • Process prediction
  • Deep learning
  • Feature hashing
  • N-grams
  • Stacked autoencoders