A Modular Chain of NLP Tools for Basque

  • Arantxa Otegi
  • Nerea Ezeiza
  • Iakes Goenaga
  • Gorka Labaka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)

Abstract

This work describes the initial stage of designing and implementing a modular chain of Natural Language Processing tools for Basque. The chain's main characteristic is the deep morphosyntactic analysis carried out by its first tool, whose morphologically rich annotations are then used by the subsequent linguistic processing tools. The chain follows a modular design that makes its processors easy to use. Two tools have been adapted and integrated into the chain so far, and both are ready to use and freely available: the morphosyntactic analyzer and PoS tagger, and the dependency parser. We have evaluated these tools and obtained competitive results. Furthermore, we have tested the robustness of the tools by processing a large number of Basque documents in various research projects.
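
The modular idea in the abstract (a first stage that produces morphosyntactic annotations and later stages that consume them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Token`, `Document`, `toy_morph_tagger`, `toy_dependency_parser`, and `run_chain`, the three-word lexicon, and the "attach everything to the first verb" rule are all assumptions made only to show how annotations might flow through a chain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Token:
    form: str
    lemma: Optional[str] = None
    pos: Optional[str] = None
    head: Optional[int] = None  # index of the head token; -1 marks the root


@dataclass
class Document:
    text: str
    tokens: List[Token] = field(default_factory=list)


# Every processor has the same interface: it takes a Document and returns it
# with one more annotation layer added.
Processor = Callable[[Document], Document]

# Tiny illustrative lexicon (assumption): surface form -> (lemma, coarse PoS).
TOY_LEXICON: Dict[str, Tuple[str, str]] = {
    "etxea": ("etxe", "NOUN"),
    "handia": ("handi", "ADJ"),
    "da": ("izan", "VERB"),
}


def toy_morph_tagger(doc: Document) -> Document:
    """First stage (toy): whitespace tokenization plus lexicon lookup."""
    for form in doc.text.split():
        lemma, pos = TOY_LEXICON.get(form.lower(), (form.lower(), "X"))
        doc.tokens.append(Token(form=form, lemma=lemma, pos=pos))
    return doc


def toy_dependency_parser(doc: Document) -> Document:
    """Second stage (toy): reads the PoS tags written by the first stage.

    The first VERB becomes the root and every other token attaches to it.
    This is a placeholder rule, not a real parsing algorithm.
    """
    if not doc.tokens:
        return doc
    root = next((i for i, t in enumerate(doc.tokens) if t.pos == "VERB"), 0)
    for i, tok in enumerate(doc.tokens):
        tok.head = -1 if i == root else root
    return doc


def run_chain(text: str, processors: List[Processor]) -> Document:
    """Apply the processors in order, passing the same document along."""
    doc = Document(text=text)
    for processor in processors:
        doc = processor(doc)
    return doc


doc = run_chain("Etxea handia da", [toy_morph_tagger, toy_dependency_parser])
for tok in doc.tokens:
    print(tok.form, tok.lemma, tok.pos, tok.head)
```

Because each stage shares one interface and only reads annotations written by earlier stages, a stage could be swapped or a new one appended without changing the others, which is the property the abstract attributes to a modular approach.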

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Arantxa Otegi (1)
  • Nerea Ezeiza (1)
  • Iakes Goenaga (1)
  • Gorka Labaka (1)

  1. IXA Group, University of the Basque Country, UPV/EHU, San Sebastian, Spain
