Combining and Extending Data Infrastructures with Linguistic Annotation Services

  • Stelios Piperidis
  • Dimitrios Galanis
  • Juli Bakagianni
  • Sokratis Sofianopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9442)

Abstract

This paper reports on a first prototype implementation for combining and extending a data infrastructure with linguistic processing services. Language datasets and basic language processing services are brought together in a unified platform, thus boosting the organic growth of data and facilitating language technology research and development. The META-SHARE data infrastructure is enhanced with a language processing mechanism for annotating content with NLP services, each documented with appropriate metadata. Atomic services are combined into workflows modeled as a directed acyclic graph, where each node corresponds to an NLP processing service (e.g. sentence splitting, part-of-speech tagging). Services run either locally or remotely. Currently, the language processing layer implements services and workflows for processing monolingual and bilingual content/resources in raw text, XCES and TMX formats. From the legal point of view, a simple operational model is adopted, by which only openly licensed datasets can be processed by openly licensed services and workflows.
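The workflow model described above can be illustrated with a minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module. Everything in it is hypothetical and not taken from the META-SHARE code base: the `Service` class, the `run_workflow` function, the toy splitter, tokenizer and tagger functions, and the `OPEN_LICENCES` set are illustrative stand-ins. Each service lists the services it depends on. A topological sort then yields a valid execution order and rejects cyclic workflows. A licence check before execution mirrors the rule that only openly licensed datasets are processed by openly licensed services.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical set of licences treated as "open" in this sketch.
OPEN_LICENCES = {"CC0", "CC-BY", "CC-BY-SA"}


@dataclass
class Service:
    """One node of the workflow graph: an atomic NLP service (hypothetical)."""
    name: str
    licence: str
    run: Callable[[dict], dict]
    depends_on: List[str] = field(default_factory=list)


def split_sentences(doc: dict) -> dict:
    # Toy sentence splitter: breaks the raw text on full stops.
    doc["sentences"] = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return doc


def tokenize(doc: dict) -> dict:
    # Toy tokenizer: whitespace tokenization of each sentence.
    doc["tokens"] = [sentence.split() for sentence in doc["sentences"]]
    return doc


def tag_pos(doc: dict) -> dict:
    # Placeholder tagger: labels every token NOUN, standing in for a real POS service.
    doc["pos"] = [["NOUN"] * len(tokens) for tokens in doc["tokens"]]
    return doc


def run_workflow(services: Dict[str, Service], doc: dict, dataset_licence: str) -> dict:
    # Licence gate: the dataset and every service must be openly licensed.
    if dataset_licence not in OPEN_LICENCES:
        raise PermissionError(f"dataset licence {dataset_licence!r} is not open")
    for svc in services.values():
        if svc.licence not in OPEN_LICENCES:
            raise PermissionError(f"service {svc.name!r} is not openly licensed")

    # Map each node to its predecessors; static_order() raises CycleError
    # if the workflow graph is not acyclic.
    graph = {name: set(svc.depends_on) for name, svc in services.items()}
    result = dict(doc)  # shallow copy so the caller's input dict is not modified
    for name in TopologicalSorter(graph).static_order():
        result = services[name].run(result)
    return result


workflow = {
    "split": Service("split", "CC-BY", split_sentences),
    "tokenize": Service("tokenize", "CC-BY", tokenize, ["split"]),
    "pos": Service("pos", "CC0", tag_pos, ["tokenize"]),
}
annotated = run_workflow(workflow, {"text": "Hello world. Bye now."}, "CC-BY")
print(annotated["pos"])  # [['NOUN', 'NOUN'], ['NOUN', 'NOUN']]
```

In this sketch every `run` callable executes locally. A remote service could be modelled by a callable that sends the document to a web service and returns the annotated result, which the sketch does not attempt.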

Keywords

Data infrastructures, Distributed repositories, Metadata standards, Language resources licensing, Linguistic processing services, Workflows, Web services

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Stelios Piperidis (1)
  • Dimitrios Galanis (1)
  • Juli Bakagianni (1)
  • Sokratis Sofianopoulos (1)
  1. Athena RC/ILSP, Athens, Greece