
Can recurrent neural networks learn process model structure?

Published in the Journal of Intelligent Information Systems.

Abstract

Various methods using machine and deep learning have been proposed to tackle different tasks in predictive process monitoring, forecasting, for an ongoing case, e.g., the most likely next event or suffix, its remaining time, or an outcome-related variable. Recurrent neural networks (RNNs), and more specifically long short-term memory nets (LSTMs), stand out in terms of popularity. In this work, we investigate the capability of such an LSTM to actually learn the underlying process model structure of an event log. We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization. We evaluate four hypotheses concerning the learning capabilities of LSTMs, the effect of overfitting countermeasures, the level of incompleteness in the training set, and the level of parallelism in the underlying process model. We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data and in a very lenient setup. Taking the correct anti-overfitting measures can alleviate the problem. However, these measures did not prove optimal when hyperparameters were selected purely on prediction accuracy. We also found that decreasing the amount of information seen by the LSTM during training causes a sharp drop in generalization and precision scores. In our experiments, we could not identify a relationship between the extent of parallelism in the model and the generalization capability, but the results do indicate that the process’ complexity might have an impact.


Data Availability

The datasets generated and/or analysed during the current study are available in the GitHub repository, https://github.com/jaripeeperkorn/LSTM_Process_Model_Structure.

Notes

  1. https://github.com/jaripeeperkorn/LSTM_Process_Model_Structure

  2. In the version of PM4Py used (Berti et al., 2019) (version 2.2.18, since updated; fixed as of 2.2.19), silent activities could still be generated in certain cases, even though the corresponding setting was set to 0. Because there is no intrinsic reason to include or exclude these, and because certain experiments had already been performed, we opted to keep using the models that do contain some silent activities.

  3. https://github.com/jaripeeperkorn/LSTM_Process_Model_Structure

  4. https://keras.io

References

  • Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. CoRR abs/1607.06450.

  • Berti, A., van Zelst, S. J., & van der Aalst, W. (2019). Process mining for Python (PM4Py): Bridging the gap between process- and data science. In Proceedings of the ICPM Demo Track 2019, co-located with the 1st International Conference on Process Mining (ICPM 2019), Aachen, Germany, June 24-26, 2019, (pp. 13–16).

  • Bukhsh, Z. A., Saeed, A., & Dijkman, R. M. (2021). ProcessTransformer: Predictive business process monitoring with transformer network. CoRR abs/2104.00721.

  • Camargo, M., Dumas, M., & González-Rojas, O. (2019). Learning accurate LSTM models of business processes. In T. Hildebrandt, B. F. van Dongen, M. Röglinger, & J. Mendling (Eds.) Business Process Management, (pp. 286–302). Springer.

  • Cho, K., van Merrienboer, B., Gülçehre, Ç., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR abs/1406.1078.

  • Cooijmans, T., Ballas, N., Laurent, C., & Courville, A. C. (2016). Recurrent batch normalization. CoRR abs/1603.09025.

  • Evermann, J., Rehse, J. -R., & Fettke, P. (2017). Predicting process behaviour using deep learning. Decision Support Systems, 100, 129–140. https://doi.org/10.1016/j.dss.2017.04.003.


  • Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10), 2451–2471. https://doi.org/10.1162/089976600300015015.


  • Guzzo, A., Joaristi, M., Rullo, A., & Serra, E. (2021). A multi-perspective approach for the analysis of complex business processes behavior. Expert Systems with Applications, 177, 114934. https://doi.org/10.1016/j.eswa.2021.114934.


  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.


  • Jouck, T., & Depaire, B. (2016). PTandLogGenerator: A generator for artificial event data. BPM (Demos), 1789, 23–27.


  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Y. Bengio & Y. LeCun (Eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. arXiv:1412.6980.

  • Klinkmüller, C., van Beest, N. R. T. P., & Weber, I. (2018). Towards reliable predictive process monitoring. In J. Mendling & H. Mouratidis (Eds.) Information Systems in the Big Data Era, (pp. 163–181). Springer.

  • Lawrence, S., Giles, C. L., & Fong, S. (2000). Natural language grammatical inference with recurrent neural networks. IEEE Transactions on Knowledge and Data Engineering, 12(1), 126–140. https://doi.org/10.1109/69.842255.


  • Leemans, S. J. J., Fahland, D., & van der Aalst, W. M. P. (2013). Discovering block-structured process models from event logs - a constructive approach. In J.-M. Colom & J. Desel (Eds.) Application and Theory of Petri Nets and Concurrency, (pp. 311–329). Springer.

  • Lin, L., Wen, L., & Wang, J. (2019). MM-Pred: A deep predictive model for multi-attribute event sequence. In Proceedings of the 2019 SIAM International Conference on Data Mining (SDM), (pp. 118–126). https://doi.org/10.1137/1.9781611975673.14.

  • Mehdiyev, N., Evermann, J., & Fettke, P. (2017). A multi-stage deep learning approach for business process event prediction. In 2017 IEEE 19th Conference on Business Informatics (CBI), (vol. 01, pp. 119–128). https://doi.org/10.1109/CBI.2017.46.

  • Moreira, C., Haven, E., Sozzo, S., & Wichert, A. (2018). Process mining with real world financial loan applications: Improving inference on incomplete event logs. PLOS ONE, 13, e0207806. https://doi.org/10.1371/journal.pone.0207806.


  • Pasquadibisceglie, V., Appice, A., Castellano, G., & Malerba, D. (2019). Using convolutional neural networks for predictive process analytics. In 2019 International Conference on Process Mining (ICPM), (pp. 129–136). https://doi.org/10.1109/ICPM.2019.00028.

  • Peeperkorn, J., vanden Broucke, S., & De Weerdt, J. (2022). Can deep neural networks learn process model structure? An assessment framework and analysis. In J. Munoz-Gama & X. Lu (Eds.) Process Mining Workshops, (pp. 127–139). Springer.

  • Petri, C. A. (1962). Kommunikation mit Automaten. PhD thesis, Universität Hamburg.

  • Rama-Maneiro, E., Vidal, J. C., & Lama, M. (2021). Embedding graph convolutional networks in recurrent neural networks for predictive monitoring. CoRR abs/2112.09641.

  • Schäfer, A. M., & Zimmermann, H. G. (2006). Recurrent neural networks are universal approximators. In S. D. Kollias, A. Stafylopatis, W. Duch, & E. Oja (Eds.) Artificial neural networks – ICANN 2006, (pp. 632–640). Springer.

  • Sennhauser, L., & Berwick, R. (2018). Evaluating the ability of LSTMs to learn context-free grammars. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks For NLP, (pp. 115–124). Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5414.


  • Siegelmann, H. T., & Sontag, E. D. (1995). On the computational power of neural nets. Journal of Computer and System Sciences, 50(1), 132–150.


  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958.


  • Stevens, A., De Smedt, J., & Peeperkorn, J. (2022). Quantifying explainability in outcome-oriented predictive process monitoring. In J. Munoz-Gama & X. Lu (Eds.) Process Mining Workshops, (pp. 194–206). Springer.

  • Tax, N., Teinemaa, I., & van Zelst, S. J. (2020). An interdisciplinary comparison of sequence modeling methods for next-element prediction. Software and Systems Modeling, 1–21.

  • Tax, N., Verenich, I., La Rosa, M., & Dumas, M. (2017). Predictive business process monitoring with LSTM neural networks. Lecture Notes in Computer Science, 477–492. https://doi.org/10.1007/978-3-319-59536-8_3.

  • Tax, N., van Zelst, S. J., & Teinemaa, I. (2018). An experimental evaluation of the generalizing capabilities of process discovery techniques and black-box sequence models. In J. Gulden, I. Reinhartz-Berger, R. Schmidt, S. Guerreiro, W. Guédria, & P. Bera (Eds.) Enterprise, Business-Process and Information Systems Modeling, (pp. 165–180). Springer.

  • Taymouri, F., La Rosa, M., Erfani, S., Bozorgi, Z. D., & Verenich, I. (2020). Predictive business process monitoring via generative adversarial nets: The case of next event prediction. In D. Fahland, C. Ghidini, J. Becker, & M. Dumas (Eds.) Business Process Management, (pp. 237–256). Springer.

  • Tu, Z., He, F., & Tao, D. (2020). Understanding generalization in recurrent neural networks. In International Conference on Learning Representations. https://openreview.net/forum?id=rkgg6xBYDH.

  • Weinzierl, S., Zilker, S., Brunk, J., Revoredo, K., Nguyen, A., Matzner, M., Becker, J., & Eskofier, B. (2020). An empirical comparison of deep-neural-network architectures for next activity prediction using context-enriched process event logs. CoRR abs/2005.01194.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2), 301–320.


  • van der Aalst, W. M. P., & Santos, L. (2022). May I take your order?. In A. Marrella & B. Weber (Eds.) Business Process Management Workshops, (pp. 99–110). Springer.


Author information


Corresponding author

Correspondence to Jari Peeperkorn.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Absolute metrics

Complementary to the metrics introduced in Section 4, we have designed three alternative metrics, which we call the “absolute” metrics and which can be seen below. These are not well calibrated, as they depend strongly on the predetermined size of the Tr+Te Log and the Simulated Log (in this work always taken to be 100 times the number of variants). This is because the values depend only on whether a certain variant is present in another log at all, without taking multiplicities into account. For this we use the Ex(v, L) function. FA-PMSL then corresponds to the fraction of variants present in the original Training Log that are actually replicated in the Simulated Log. PA-PMSL measures how many of the variants produced by the LSTM in the Simulated Log actually correspond to correct behavior, i.e. behavior present in the original Tr+Te Log. Finally, GA-PMSL counts how many of the variants in the Test Log are correctly reproduced by the LSTM. In the case of LOVOCV, GA-PMSL is equal to either 0 or 1, since there is only one variant in the Test Log.

$$ F_{{A-PMSL}} = \sum\limits_{v\in{Var(Tr)}} \frac{Ex\left( v, Sim\right)}{|{Var(Tr)}|} $$
(4)
$$ P_{{A-PMSL}} = \sum\limits_{v\in{Var(Sim)}} \frac{Ex\left( v, Tr+Te\right)}{|{Var(Sim)}|} $$
(5)
$$ G_{{A-PMSL}} = \sum\limits_{v\in{Var(Te)}} \frac{Ex\left( v, Sim\right)}{|{Var(Te)}|} $$
(6)
$$ \text{With: } {Ex}(v, L) = \begin{cases} 1 & \text{if } v \in L\\ 0 & \text{if } v \not\in L \end{cases} $$
(7)
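As a sketch of how these variant-level scores can be computed, assuming each log is represented as a list of traces where a trace is a sequence of activity labels (the function names below are our own, not from the accompanying repository):

```python
def variants(log):
    """Distinct control-flow variants of a log; a trace becomes a tuple of labels."""
    return set(map(tuple, log))

def f_a_pmsl(tr_log, sim_log):
    """Eq. (4): fraction of Training Log variants replicated in the Simulated Log."""
    tr, sim = variants(tr_log), variants(sim_log)
    return sum(v in sim for v in tr) / len(tr)

def p_a_pmsl(trte_log, sim_log):
    """Eq. (5): fraction of simulated variants present in the Tr+Te Log."""
    trte, sim = variants(trte_log), variants(sim_log)
    return sum(v in trte for v in sim) / len(sim)

def g_a_pmsl(te_log, sim_log):
    """Eq. (6): fraction of Test Log variants reproduced in the Simulated Log."""
    te, sim = variants(te_log), variants(sim_log)
    return sum(v in sim for v in te) / len(te)
```

Because only presence is checked (the Ex(v, L) indicator), a variant simulated once counts exactly as much as one simulated a hundred times, which is precisely why these scores are not calibrated to multiplicities.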

Next to the metrics in the paper itself, these metrics have been applied to the logs produced by the LSTMs discussed throughout (Tables 6 and 7 and Fig. 5). Using these results we can unveil whether non-optimal results on the original metrics (FPMSL, PPMSL and GPMSL) are due to the LSTM not learning certain behavior at all, or merely to it assigning lower probabilities to simulating these variants. The lower “absolute” precision scores (PA-PMSL), combined with the nonetheless high PPMSL scores, show that many of the LSTMs simulate many different (incorrect) variants that do not correspond to allowed process model behavior, but do so with low multiplicities.

Appendix B: Comparing with inductive miner

To compare the LSTMs’ ability to understand the six basic control-flow elements introduced in Fig. 3 with “classical” process discovery, we have performed some additional experiments. For this purpose we have used the Inductive Miner (Leemans et al., 2013) as implemented in pm4py (Berti et al., 2019), with the default settings (corresponding to a noise threshold of 0.0), to discover Process Trees. These Process Trees are then played out (simulated) with the basic play-out function, also implemented in pm4py (Berti et al., 2019). We have repeated both the LOVOCV and the fold experiment introduced in Section 5. In the fold experiment we take an increasing number of variants out of the Training Log and group them into the Test Log. We have used both the metrics (1)–(3) and the “absolute” metrics (4)–(6) introduced in Appendix A. The results of these experiments are visualized in Figs. 6 and 7. We immediately notice that Model 1, the model with one parallel split, is discovered (and played out) perfectly by the Inductive Miner, both for the LOVOCV and for every fold. This is in contrast to the LSTM models, which seem to struggle somewhat more with correctly interpreting this kind of behavior. Model 2, introducing multiple XOR splits, also does not cause any issues for the Inductive Miner, with the exception of the 2-fold experiment (where half of the control-flow variants are sampled out of the Training Log). The long-term dependency introduced in Model 3 is, unsurprisingly, not picked up at all by the Process Tree discovered with the Inductive Miner. This results in near-perfect absolute fitness (4) and generalization (6) scores (the Process Tree is able to play out the behavior), but frequency-dependent fitness (1) and generalization (3) scores of around 0.5. Both precision scores are also around 0.5. Remarkably, when performing the 2-fold experiment, fitness (and absolute precision) goes to 1.0 and generalization to 0.0.
What happens is that the model completely overfits on the possible control-flow variants (one big XOR split), preventing any level of generalization. A similar type of behavior can be noticed for Model 4, which consists of different inclusive OR splits. The longer parallel tracks of Model 5 are interpreted worse and worse as more and more variants are left out of the Training Log, leading to a decreasing trend in all metrics. This is comparable to the results of the LSTMs trained on the same model, and in contrast to the Inductive Miner’s better interpretation of the parallel split in Model 1 (with parallel tracks of only one activity). Finally, the loops in Model 6 are modeled correctly by the Inductive Miner; here the imperfect frequency-based fitness and precision scores are due to the infinite amount of possible behavior introduced by loops.
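The variant-based resampling behind the fold experiment can be sketched as follows. This is a simplified, self-contained illustration under our own naming (the actual splitting follows Section 5 and the accompanying repository): one fold's worth of distinct variants, with all their occurrences, is held out as the Test Log.

```python
import random

def fold_split(log, n_folds, held_out_fold=0, seed=42):
    """Variant-based holdout: put 1/n_folds of the distinct variants (and all
    traces exhibiting them) in the Test Log; the remainder forms the Training Log."""
    all_variants = sorted(set(map(tuple, log)))  # distinct control-flow variants
    rng = random.Random(seed)
    rng.shuffle(all_variants)
    # Partition the variants round-robin into n_folds groups
    folds = [all_variants[i::n_folds] for i in range(n_folds)]
    test_variants = set(folds[held_out_fold])
    train = [t for t in log if tuple(t) not in test_variants]
    test = [t for t in log if tuple(t) in test_variants]
    return train, test
```

With `n_folds=2` this reproduces the situation described above where half of the control-flow variants are sampled out of the Training Log; LOVOCV corresponds to the limiting case of one held-out variant per fold.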

Fig. 6

The framework output, using metrics (1)–(3), computed on the Simulated Logs produced by playing out Process Trees discovered with the Inductive Miner (Leemans et al., 2013), with different numbers of variants left out and put aside in the Test Log

Fig. 7

The framework output, using the alternative “absolute” metrics (4)–(6), computed on the Simulated Logs produced by playing out Process Trees discovered with the Inductive Miner (Leemans et al., 2013), with different numbers of variants left out and put aside in the Test Log

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Peeperkorn, J., vanden Broucke, S. & De Weerdt, J. Can recurrent neural networks learn process model structure?. J Intell Inf Syst 61, 27–51 (2023). https://doi.org/10.1007/s10844-022-00765-x

