Discovery of Language Resources

  • Philipp Cimiano
  • Christian Chiarcos
  • John P. McCrae
  • Jorge Gracia
Chapter

Abstract

Finding appropriate language resources for a particular research purpose or task is both crucially important and a significant challenge. Currently, there are a number of distributed data repositories that contain metadata about many language resources. However, the metadata formats and metadata content are not harmonized across the different repositories, making it extremely difficult to provide automatic support for searching for resources across repositories. In this chapter we describe an approach that supports the harmonization of metadata from a number of relevant repositories. As a proof of concept of this approach, we describe Linghub, a portal that has been developed to aggregate metadata from a number of repositories and so provide a single point of entry for searching language resources across repositories. We describe the methods that have been used in the normalization of the data and report on the accuracy of the methods. The Linghub portal is publicly available and can be used freely to search for language resources.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Semantic Computing Group, Bielefeld University, Bielefeld, Germany
  2. Angewandte Computerlinguistik, Goethe-University, Frankfurt am Main, Germany
  3. Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
  4. Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain