Abstract
Process extraction from text is an important task of process discovery, for which various approaches have been developed in recent years. However, unlike other information extraction tasks, it lacks gold-standard corpora of business process descriptions that are carefully annotated with all the entities and relationships of interest. This paper presents the PET dataset, a first corpus of business process descriptions annotated with activities, gateways, actors, and flow information. We describe the new resource and provide a variety of baselines to benchmark the difficulty and challenges of business process extraction from text. The PET dataset, annotation guidelines, and INCEpTION schema are freely available via huggingface.co/datasets/patriziobellan/PET.
Notes
1. Unlike customary BPM terminology, we split each activity into its “action” expression and the object the activity acts on. This eases annotation when, e.g., different actions relate to the same object (or vice versa), and it reflects the fact that NLP techniques may handle verb expressions and noun expressions differently.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Bellan, P., van der Aa, H., Dragoni, M., Ghidini, C., Ponzetto, S.P. (2023). PET: An Annotated Dataset for Process Extraction from Natural Language Text Tasks. In: Cabanillas, C., Garmann-Johnsen, N.F., Koschmider, A. (eds) Business Process Management Workshops. BPM 2022. Lecture Notes in Business Information Processing, vol 460. Springer, Cham. https://doi.org/10.1007/978-3-031-25383-6_23
DOI: https://doi.org/10.1007/978-3-031-25383-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25382-9
Online ISBN: 978-3-031-25383-6
eBook Packages: Computer Science, Computer Science (R0)