One Ontology to Bind Them All: The META-SHARE OWL Ontology for the Interoperability of Linguistic Datasets on the Web

  • John P. McCrae
  • Penny Labropoulou
  • Jorge Gracia
  • Marta Villegas
  • Víctor Rodríguez-Doncel
  • Philipp Cimiano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9341)

Abstract

META-SHARE is an infrastructure for sharing Language Resources (LRs) in which significant effort has been invested in providing carefully curated metadata about LRs. However, in the face of the flood of data that is used in computational linguistics, a manual approach cannot suffice. We present the development of the META-SHARE ontology, which transforms the metadata schema used by META-SHARE into an ontology in the Web Ontology Language (OWL) that can better handle the diversity of metadata found in legacy and crowd-sourced resources. We show how this model can interface with other, more general-purpose vocabularies for online datasets and licensing, and apply this model to the CLARIN VLO, a large source of legacy metadata about LRs. Furthermore, we demonstrate the usefulness of this approach in two public metadata portals for information about language resources.

Keywords

Language resources and evaluation, Metadata, Ontologies, Harmonization

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • John P. McCrae (1)
  • Penny Labropoulou (3)
  • Jorge Gracia (2)
  • Marta Villegas (4)
  • Víctor Rodríguez-Doncel (2)
  • Philipp Cimiano (1)

  1. Cognitive Interaction Technology, Excellence Cluster, Bielefeld University, Bielefeld, Germany
  2. Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
  3. CILSP/“Athena” RC, Athens, Greece
  4. University Pompeu Fabra, Barcelona, Spain
