Extracting Workflows from Natural Language Documents: A First Step
Business process models are used to identify control-flow relationships of tasks extracted from information system event logs. These event logs may fail to capture critical tasks executed outside of regular logging environments, but such latent tasks may be inferred from unstructured natural language texts. This paper highlights two components of a workflow discovery pipeline that use NLP and sequence mining techniques to extract workflow candidates from such texts. We present our Event Labeling and Sequence Analysis (ELSA) prototype, which implements these components, and describe the associated methodology and the performance of our algorithm against ground truth data from the Apache Software Foundation Public Email Archive.
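As a rough illustration of the sequence mining idea mentioned above, the following Python sketch counts how often one event label precedes another across a set of labeled sequences and keeps the orderings that meet a support threshold. It is not the ELSA algorithm: the function name `mine_ordered_pairs`, the `min_support` parameter, and the event labels are all hypothetical, chosen only to show how ordered pairs of labeled events could serve as workflow candidates.

```python
from collections import Counter
from itertools import combinations


def mine_ordered_pairs(sequences, min_support=2):
    """Count label pairs (a, b) where a occurs before b in a sequence.

    Each pair is counted at most once per sequence, so the count is the
    number of sequences that support the ordering a -> b.
    """
    support = Counter()
    for seq in sequences:
        seen = set()
        # combinations() preserves input order, so a precedes b in seq.
        for a, b in combinations(seq, 2):
            if a != b:
                seen.add((a, b))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical event labels, e.g. one labeled sequence per email thread.
threads = [
    ["propose", "review", "vote", "release"],
    ["propose", "vote", "release"],
    ["review", "propose", "vote"],
]
candidates = mine_ordered_pairs(threads, min_support=3)
print(candidates)
```

With these toy sequences, only the ordering "propose" before "vote" appears in all three threads, so it is the only pair that survives a support threshold of 3. Lowering the threshold admits weaker orderings as additional candidates.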
Keywords: Workflow discovery, Natural language, Sequence mining
This material is based upon work supported under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the U.S. Air Force.