Extracting Workflows from Natural Language Documents: A First Step

  • Leslie Shing
  • Allan Wollaber
  • Satish Chikkagoudar
  • Joseph Yuen
  • Paul Alvino
  • Alexander Chambers
  • Tony Allard
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 342)


Business process models identify control-flow relationships among tasks extracted from information system event logs. These event logs may fail to capture critical tasks executed outside of regular logging environments, but such latent tasks may be inferred from unstructured natural language texts. This paper highlights two components of a workflow discovery pipeline that use natural language processing (NLP) and sequence mining techniques to extract workflow candidates from such texts. We present our Event Labeling and Sequence Analysis (ELSA) prototype, which implements these components, describe the associated methodologies, and report the performance of our algorithm against ground truth data from the Apache Software Foundation Public Email Archive.


Keywords: Workflow discovery · Natural language · Sequence mining



This material is based upon work supported under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the U.S. Air Force.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. MIT Lincoln Laboratory, Lexington, USA
  2. Naval Research Laboratory, Washington, D.C., USA
  3. Commonwealth Bank of Australia, Sydney, Australia
  4. Defence Science and Technology Group, Edinburgh, Australia
