PET: An Annotated Dataset for Process Extraction from Natural Language Text Tasks

  • Conference paper
Business Process Management Workshops (BPM 2022)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 460)

Abstract

Process extraction from text is an important task of process discovery, for which various approaches have been developed in recent years. However, unlike other information extraction tasks, it lacks gold-standard corpora of business process descriptions that are carefully annotated with all the entities and relationships of interest. This paper presents the PET dataset, a first corpus of business process descriptions annotated with activities, gateways, actors, and flow information. We present our new resource, including a variety of baselines to benchmark the difficulty and challenges of business process extraction from text. The PET dataset, annotation guidelines, and INCEpTION schema are freely available via huggingface.co/datasets/patriziobellan/PET.

Notes

  1. Departing from customary BPM terminology, we split each activity into its “action” expression and the object the activity acts on. This eases annotation when, e.g., different actions relate to the same object (or vice versa), and it reflects the fact that NLP techniques may handle verb expressions and noun expressions differently.
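
The action/object split described in this note can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Span` class, the `ACTION` and `OBJECT` label names, the example sentence, and the token offsets are hypothetical and are not taken from the PET annotation schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled token span. Label names are illustrative, not PET's schema."""
    label: str  # "ACTION" or "OBJECT" (hypothetical labels)
    start: int  # index of the first token
    end: int    # index one past the last token


# Hypothetical process description, tokenized on whitespace.
tokens = "The clerk checks the invoice and then archives the invoice".split()

# Two action expressions and two mentions of the same object.
actions = [Span("ACTION", 2, 3), Span("ACTION", 7, 8)]
objects = [Span("OBJECT", 3, 5), Span("OBJECT", 8, 10)]

# Each action is linked to the object it acts on.
links = list(zip(actions, objects))


def text(span: Span) -> str:
    """Return the surface text covered by a span."""
    return " ".join(tokens[span.start:span.end])


# Group action expressions by the object they act on.
actions_by_object = defaultdict(list)
for action, obj in links:
    actions_by_object[text(obj)].append(text(action))
```

Because actions and objects are separate spans, two different actions ("checks", "archives") can point at the same object text without re-annotating a combined activity span for each pair.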

Corresponding author

Correspondence to Patrizio Bellan.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Bellan, P., van der Aa, H., Dragoni, M., Ghidini, C., Ponzetto, S.P. (2023). PET: An Annotated Dataset for Process Extraction from Natural Language Text Tasks. In: Cabanillas, C., Garmann-Johnsen, N.F., Koschmider, A. (eds) Business Process Management Workshops. BPM 2022. Lecture Notes in Business Information Processing, vol 460. Springer, Cham. https://doi.org/10.1007/978-3-031-25383-6_23

  • DOI: https://doi.org/10.1007/978-3-031-25383-6_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25382-9

  • Online ISBN: 978-3-031-25383-6

  • eBook Packages: Computer Science, Computer Science (R0)
