Towards a Process Mining Approach to Grammar Induction for Digital Libraries

Syntax Checking and Style Analysis
  • Stefano Ferilli
  • Sergio Angelastro
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 988)

Abstract

Since most content in Digital Libraries and Archives is text, there is interest in applying Natural Language Processing (NLP) to extract valuable information from it and so support various kinds of user activities. Most NLP techniques rely on linguistic resources that are language-specific and that are costly and error-prone to produce manually, which motivates research into automatic ways of building them.

This paper extends the BLA-BLA tool for learning linguistic resources by adding a Grammar Induction feature based on the advanced process mining and management system WoMan. Experimental results are encouraging: they suggest interesting applications to Digital Libraries and motivate further research aimed at extracting an explicit grammar from the learned models.

Keywords

Natural Language Processing, Grammar Induction, Process Mining and Management

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Dipartimento di Informatica, Università di Bari, Bari, Italy
