One Ontology to Bind Them All: The META-SHARE OWL Ontology for the Interoperability of Linguistic Datasets on the Web

  • John P. McCrae
  • Penny Labropoulou
  • Jorge Gracia
  • Marta Villegas
  • Víctor Rodríguez-Doncel
  • Philipp Cimiano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9341)

Abstract

META-SHARE is an infrastructure for sharing Language Resources (LRs) in which significant effort has gone into providing carefully curated metadata about LRs. However, given the flood of data now used in computational linguistics, a purely manual approach cannot suffice. We present the development of the META-SHARE ontology, which transforms the metadata schema used by META-SHARE into an ontology in the Web Ontology Language (OWL) that can better handle the diversity of metadata found in legacy and crowd-sourced resources. We show how this model can interface with other, more general-purpose vocabularies for online datasets and licensing, and we apply it to the CLARIN VLO, a large source of legacy metadata about LRs. Furthermore, we demonstrate the usefulness of this approach in two public metadata portals for information about language resources.

Keywords

Language resources and evaluation, Metadata, Ontologies, Harmonization

Notes

Acknowledgments

We are very grateful to the members of the W3C Linked Data for Language Technologies (LD4LT) for all the useful feedback received and for allowing this initiative to be developed as an activity of the group. This work is supported by the FP7 European project LIDER (610782), by the Spanish Ministry of Economy and Competitiveness (project TIN2013-46238-C4-2-R and a Juan de la Cierva grant), the Greek CLARIN Attiki project (MIS 441451) and the H2020 project CRACKER (645357).

Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • John P. McCrae (1)
  • Penny Labropoulou (3)
  • Jorge Gracia (2, corresponding author)
  • Marta Villegas (4)
  • Víctor Rodríguez-Doncel (2)
  • Philipp Cimiano (1)

  1. Cognitive Interaction Technology, Excellence Cluster, Bielefeld University, Bielefeld, Germany
  2. Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
  3. CILSP/“Athena” RC, Athens, Greece
  4. University Pompeu Fabra, Barcelona, Spain
