
Can recurrent neural networks learn process model structure?

Published in the Journal of Intelligent Information Systems.

Abstract

Various methods using machine and deep learning have been proposed to tackle different tasks in predictive process monitoring, forecasting, for an ongoing case, e.g., the most likely next event or suffix, its remaining time, or an outcome-related variable. Recurrent neural networks (RNNs), and more specifically long short-term memory nets (LSTMs), stand out in terms of popularity. In this work, we investigate the capability of such an LSTM to actually learn the underlying process model structure of an event log. We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization. We evaluate four hypotheses concerning the learning capabilities of LSTMs, the effect of overfitting countermeasures, the level of incompleteness in the training set, and the level of parallelism in the underlying process model. We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data and in a very lenient setup. Taking the correct anti-overfitting measures can alleviate the problem. However, these measures did not prove optimal when hyperparameters were selected purely on prediction accuracy. We also found that decreasing the amount of information seen by the LSTM during training causes a sharp drop in generalization and precision scores. In our experiments, we could not identify a relationship between the extent of parallelism in the model and the generalization capability, but the results do indicate that the process’ complexity might have an impact.


Data Availability

The datasets generated and/or analysed during the current study are available in the GitHub repository, https://github.com/jaripeeperkorn/LSTM_Process_Model_Structure.

Notes

  1. https://github.com/jaripeeperkorn/LSTM_Process_Model_Structure

  2. In the version of PM4Py used (Berti et al., 2019) (version 2.2.18, since updated; fixed as of 2.2.19), silent activities could still be generated in certain cases, even though the corresponding setting was set to 0. Because there is no intrinsic reason to include or exclude these, and because certain experiments had already been performed, we opted to keep using the models that do contain some silent activities.

  3. https://github.com/jaripeeperkorn/LSTM_Process_Model_Structure

  4. https://keras.io

References

  • Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. CoRR abs/1607.06450.

  • Berti, A., van Zelst, S. J., & van der Aalst, W. (2019). Process mining for Python (PM4Py): Bridging the gap between process- and data science. In Proceedings of the ICPM Demo Track 2019, co-located with the 1st International Conference on Process Mining (ICPM 2019), Aachen, Germany, June 24-26, 2019, (pp. 13–16).

  • Bukhsh, Z. A., Saeed, A., & Dijkman, R. M. (2021). ProcessTransformer: Predictive business process monitoring with transformer network. CoRR abs/2104.00721.

  • Camargo, M., Dumas, M., & González-Rojas, O. (2019). Learning accurate LSTM models of business processes. In T. Hildebrandt, B. F. van Dongen, M. Röglinger, & J. Mendling (Eds.) Business Process Management, (pp. 286–302). Springer.

  • Cho, K., van Merrienboer, B., Gülçehre, Ç., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR abs/1406.1078.

  • Cooijmans, T., Ballas, N., Laurent, C., & Courville, A. C. (2016). Recurrent batch normalization. CoRR abs/1603.09025.

  • Evermann, J., Rehse, J. -R., & Fettke, P. (2017). Predicting process behaviour using deep learning. Decision Support Systems, 100, 129–140. https://doi.org/10.1016/j.dss.2017.04.003.


  • Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451–2471. https://doi.org/10.1162/089976600300015015.


  • Guzzo, A., Joaristi, M., Rullo, A., & Serra, E. (2021). A multi-perspective approach for the analysis of complex business processes behavior. Expert Systems with Applications, 177, 114934. https://doi.org/10.1016/j.eswa.2021.114934.


  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.


  • Jouck, T., & Depaire, B. (2016). PTandLogGenerator: A generator for artificial event data. BPM (Demos), 1789, 23–27.


  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Y. Bengio & Y. LeCun (Eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. arXiv:1412.6980.

  • Klinkmüller, C., van Beest, N. R. T. P., & Weber, I. (2018). Towards reliable predictive process monitoring. In J. Mendling & H. Mouratidis (Eds.) Information Systems in the Big Data Era, (pp. 163–181). Springer.

  • Lawrence, S., Giles, C. L., & Fong, S. (2000). Natural language grammatical inference with recurrent neural networks. IEEE Transactions on Knowledge and Data Engineering, 12(1), 126–140. https://doi.org/10.1109/69.842255.


  • Leemans, S. J. J., Fahland, D., & van der Aalst, W. M. P. (2013). Discovering block-structured process models from event logs - a constructive approach. In J.-M. Colom & J. Desel (Eds.) Application and Theory of Petri Nets and Concurrency, (pp. 311–329). Springer.

  • Lin, L., Wen, L., & Wang, J. (2019). MM-Pred: A deep predictive model for multi-attribute event sequence. In Proceedings of the 2019 SIAM International Conference on Data Mining (SDM), (pp. 118–126). https://doi.org/10.1137/1.9781611975673.14.

  • Mehdiyev, N., Evermann, J., & Fettke, P. (2017). A multi-stage deep learning approach for business process event prediction. In 2017 IEEE 19th Conference on Business Informatics (CBI), (vol. 01, pp. 119–128). https://doi.org/10.1109/CBI.2017.46.

  • Moreira, C., Haven, E., Sozzo, S., & Wichert, A. (2018). Process mining with real world financial loan applications: Improving inference on incomplete event logs. PLOS ONE, 13, e0207806. https://doi.org/10.1371/journal.pone.0207806.


  • Pasquadibisceglie, V., Appice, A., Castellano, G., & Malerba, D. (2019). Using convolutional neural networks for predictive process analytics. In 2019 International Conference on Process Mining (ICPM), (pp. 129–136). https://doi.org/10.1109/ICPM.2019.00028.

  • Peeperkorn, J., vanden Broucke, S., & De Weerdt, J. (2022). Can deep neural networks learn process model structure? An assessment framework and analysis. In J. Munoz-Gama & X. Lu (Eds.) Process Mining Workshops, (pp. 127–139). Springer.

  • Petri, C. A. (1962). Kommunikation mit Automaten. PhD thesis, Universität Hamburg.

  • Rama-Maneiro, E., Vidal, J. C., & Lama, M. (2021). Embedding graph convolutional networks in recurrent neural networks for predictive monitoring. CoRR abs/2112.09641.

  • Schäfer, A. M., & Zimmermann, H. G. (2006). Recurrent neural networks are universal approximators. In S. D. Kollias, A. Stafylopatis, W. Duch, & E. Oja (Eds.) Artificial neural networks – ICANN 2006, (pp. 632–640). Springer.

  • Sennhauser, L., & Berwick, R. (2018). Evaluating the ability of LSTMs to learn context-free grammars. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks For NLP, (pp. 115–124). Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5414.


  • Siegelmann, H. T., & Sontag, E. D. (1995). On the computational power of neural nets. Journal of Computer and System Sciences, 50(1), 132–150.


  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958.


  • Stevens, A., De Smedt, J., & Peeperkorn, J. (2022). Quantifying explainability in outcome-oriented predictive process monitoring. In J. Munoz-Gama & X. Lu (Eds.) Process Mining Workshops, (pp. 194–206). Springer.

  • Tax, N., Teinemaa, I., & van Zelst, S. J. (2020). An interdisciplinary comparison of sequence modeling methods for next-element prediction. Software and Systems Modeling, 1–21.

  • Tax, N., Verenich, I., La Rosa, M., & Dumas, M. (2017). Predictive business process monitoring with LSTM neural networks. Lecture Notes in Computer Science, 477–492. https://doi.org/10.1007/978-3-319-59536-8_3.

  • Tax, N., van Zelst, S. J., & Teinemaa, I. (2018). An experimental evaluation of the generalizing capabilities of process discovery techniques and black-box sequence models. In J. Gulden, I. Reinhartz-Berger, R. Schmidt, S. Guerreiro, W. Guédria, & P. Bera (Eds.) Enterprise, Business-Process and Information Systems Modeling, (pp. 165–180). Springer.

  • Taymouri, F., La Rosa, M., Erfani, S., Bozorgi, Z. D., & Verenich, I. (2020). Predictive business process monitoring via generative adversarial nets: The case of next event prediction. In D. Fahland, C. Ghidini, J. Becker, & M. Dumas (Eds.) Business Process Management, (pp. 237–256). Springer.

  • Tu, Z., He, F., & Tao, D. (2020). Understanding generalization in recurrent neural networks. In International Conference on Learning Representations. https://openreview.net/forum?id=rkgg6xBYDH.

  • Weinzierl, S., Zilker, S., Brunk, J., Revoredo, K., Nguyen, A., Matzner, M., Becker, J., & Eskofier, B. (2020). An empirical comparison of deep-neural-network architectures for next activity prediction using context-enriched process event logs. CoRR abs/2005.01194.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2), 301–320.


  • van der Aalst, W. M. P., & Santos, L. (2022). May I take your order?. In A. Marrella & B. Weber (Eds.) Business Process Management Workshops, (pp. 99–110). Springer.


Author information


Corresponding author

Correspondence to Jari Peeperkorn.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Absolute metrics

Complementary to the metrics introduced in Section 4, we have designed three alternative metrics, which we call the “absolute” metrics and which can be seen below. These are not well calibrated, as they depend strongly on the predetermined size of the Tr+Te Log and the Simulated Log (in this work always taken to be 100 times the number of variants). This is because the values depend only on whether a certain variant is present in another log at all, without taking multiplicities into account. For this we use the Ex(v, L) function. FA-PMSL then corresponds to the fraction of variants present in the original Training Log that are actually replicated in the Simulated Log. PA-PMSL measures how many of the variants produced by the LSTM in the Simulated Log actually correspond to correct behavior, i.e. behavior present in the original Tr+Te Log. Finally, GA-PMSL counts how many of the variants in the Test Log are correctly reproduced by the LSTM. In the case of LOVOCV, GA-PMSL is equal to either 0 or 1, since there is only one variant in the Test Log.

$$ F_{{A-PMSL}} = \sum\limits_{v\in{Var(Tr)}} \frac{Ex\left( v, Sim\right)}{|{Var(Tr)}|} $$
(4)
$$ P_{{A-PMSL}} = \sum\limits_{v\in{Var(Sim)}} \frac{Ex\left( v, Tr+Te\right)}{|{Var(Sim)}|} $$
(5)
$$ G_{{A-PMSL}} = \sum\limits_{v\in{Var(Te)}} \frac{Ex\left( v, Sim\right)}{|{Var(Te)}|} $$
(6)
$$ \text{With: } {Ex}(v, L) = \begin{cases} 1 & \text{if } v \in L\\ 0 & \text{if } v \not\in L \end{cases} $$
(7)
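As a sketch of how these variant-level scores can be computed, assuming each log is represented as a list of traces where a trace is a sequence of activity labels (the function names below are our own, not from the accompanying repository):

```python
def variants(log):
    """Distinct control-flow variants of a log; a trace becomes a tuple of labels."""
    return set(map(tuple, log))

def f_a_pmsl(tr_log, sim_log):
    """Eq. (4): fraction of Training Log variants replicated in the Simulated Log."""
    tr, sim = variants(tr_log), variants(sim_log)
    return sum(v in sim for v in tr) / len(tr)

def p_a_pmsl(trte_log, sim_log):
    """Eq. (5): fraction of simulated variants present in the Tr+Te Log."""
    trte, sim = variants(trte_log), variants(sim_log)
    return sum(v in trte for v in sim) / len(sim)

def g_a_pmsl(te_log, sim_log):
    """Eq. (6): fraction of Test Log variants reproduced in the Simulated Log."""
    te, sim = variants(te_log), variants(sim_log)
    return sum(v in sim for v in te) / len(te)
```

Because only presence is checked (the Ex(v, L) indicator), a variant simulated once counts exactly as much as one simulated a hundred times, which is precisely why these scores are not calibrated to multiplicities.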

Next to the metrics in the paper itself, these metrics have been applied to the logs produced by the LSTMs discussed throughout (Tables 6 and 7 and Fig. 5). Using these results we can unveil whether non-optimal results on the original metrics (FPMSL, PPMSL and GPMSL) are due to the LSTM not learning certain behavior at all, or merely to it assigning lower probabilities to simulating these variants. The lower “absolute” precision scores (PA-PMSL), combined with the nonetheless high PPMSL scores, show that many of the LSTMs simulate many different (incorrect) variants that do not correspond to allowed process model behavior, but do so with low multiplicities.

Appendix B: Comparing with inductive miner

To compare the LSTMs’ ability to understand the six basic control-flow elements introduced in Fig. 3 with “classical” process discovery, we have performed some additional experiments. For this purpose we have used the Inductive Miner (Leemans et al., 2013) as implemented in pm4py (Berti et al., 2019), with the default settings (corresponding to a noise threshold of 0.0), to discover Process Trees. These Process Trees are then played out (simulated) with the basic play-out function, also implemented in pm4py (Berti et al., 2019). We have repeated both the LOVOCV and the fold experiment introduced in Section 5. In the fold experiment we take an increasing number of variants out of the Training Log and group them into the Test Log. We have used both the metrics (1)–(3) and the “absolute” metrics (4)–(6) introduced in Appendix A. The results of these experiments are visualized in Figs. 6 and 7. We immediately notice that Model 1, the model with one parallel split, is discovered (and played out) perfectly by the Inductive Miner, both for the LOVOCV and for every fold. This is in contrast to the LSTM models, which seem to struggle somewhat more with correctly interpreting this kind of behavior. Model 2, introducing multiple XOR splits, also does not cause any issues for the Inductive Miner, with the exception of the 2-fold experiment (where half of the control-flow variants are sampled out of the Training Log). The long-term dependency introduced in Model 3 is, unsurprisingly, not picked up at all by the Process Tree discovered with the Inductive Miner. This results in near-perfect absolute fitness (4) and generalization (6) scores (the Process Tree is able to play out the behavior), but frequency-dependent fitness (1) and generalization (3) scores of around 0.5. Both precision scores are also around 0.5. Remarkably, when performing the 2-fold experiment, fitness (and absolute precision) goes to 1.0 and generalization to 0.0.
What happens is that the model completely overfits on the possible control-flow variants (one big XOR split), preventing any level of generalization. A similar type of behavior can be noticed for Model 4, which consists of different inclusive OR splits. The longer parallel tracks of Model 5 are interpreted worse and worse as more and more variants are left out of the Training Log, leading to a decreasing trend in all metrics. This is comparable to the results of the LSTMs trained on the same model, and in contrast to the Inductive Miner’s better interpretation of the parallel split in Model 1 (with parallel tracks of only one activity). Finally, the loops in Model 6 are modeled correctly by the Inductive Miner; here the imperfect frequency-based fitness and precision scores are due to the infinite amount of possible behavior introduced by loops.
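The variant-based resampling behind the fold experiment can be sketched as follows. This is a simplified, self-contained illustration under our own naming (the actual splitting follows Section 5 and the accompanying repository): one fold's worth of distinct variants, with all their occurrences, is held out as the Test Log.

```python
import random

def fold_split(log, n_folds, held_out_fold=0, seed=42):
    """Variant-based holdout: put 1/n_folds of the distinct variants (and all
    traces exhibiting them) in the Test Log; the remainder forms the Training Log."""
    all_variants = sorted(set(map(tuple, log)))  # distinct control-flow variants
    rng = random.Random(seed)
    rng.shuffle(all_variants)
    # Partition the variants round-robin into n_folds groups
    folds = [all_variants[i::n_folds] for i in range(n_folds)]
    test_variants = set(folds[held_out_fold])
    train = [t for t in log if tuple(t) not in test_variants]
    test = [t for t in log if tuple(t) in test_variants]
    return train, test
```

With `n_folds=2` this reproduces the situation described above where half of the control-flow variants are sampled out of the Training Log; LOVOCV corresponds to the limiting case of one held-out variant per fold.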

Fig. 6

The framework output, using metrics (1)–(3), computed on the Simulated Logs produced by playing out Process Trees discovered with the Inductive Miner (Leemans et al., 2013), with different numbers of variants left out and put aside in the Test Log

Fig. 7

The framework output, using the alternative “absolute” metrics (4)–(6), computed on the Simulated Logs produced by playing out Process Trees discovered with the Inductive Miner (Leemans et al., 2013), with different numbers of variants left out and put aside in the Test Log

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Peeperkorn, J., vanden Broucke, S. & De Weerdt, J. Can recurrent neural networks learn process model structure?. J Intell Inf Syst 61, 27–51 (2023). https://doi.org/10.1007/s10844-022-00765-x

