Interchanging lexical resources on the Semantic Web

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Lexica and terminology databases play a vital role in many NLP applications, but most such resources are currently published in application-specific formats or behind custom access interfaces. As a result, much of this data sits in “data silos” and is difficult to access. The Semantic Web, and in particular the Linked Data initiative, provides effective solutions to this problem, as well as possibilities for data reuse through inter-lexicon linking and the incorporation of data categories via dereferenceable URIs. The Semantic Web focuses on the use of ontologies to describe semantics on the Web, but there is currently no standard for providing complex lexical information for such ontologies or for describing the relationship between the lexicon and the ontology. We present our model, lemon, which aims to address these gaps while building on existing work, in particular the Lexical Markup Framework, the ISOcat Data Category Registry, SKOS (Simple Knowledge Organization System) and the LexInfo and LIR ontology-lexicon models.

Notes

  1. Note that LMF is a meta-model and hence other serializations could be consistent with the model. However for the purposes of this paper, we refer to the RDF and XML serializations described at http://www.lexicalmarkupframework.org/.

  2. Note that “lemma” approximately corresponds to “canonical form” in lemon and we specify the xml:lang special property on each string in lemon.

  3. The ISOcat results are based on public data categories, of which there are 3,036 in total; of these, 9 lack a specified type and 17 are typed as both simple and complex. Results retrieved 17 January 2012.

  4. See http://dublincore.org/.

  5. We reference ISOcat data categories by number and attach a readable comment to each property. In the diagrams, we show only the readable description.

  6. We note that more precise modelling of the phrase structure of the term is possible using the lemon model. This is described further in the “lemon cookbook” available at http://lexinfo.net/lemon-cookbook.pdf.

  7. i.e., \(s\ \texttt{lemon:reference}\ x_1,\ s\ \texttt{lemon:reference}\ x_2\,\vdash\,x_1\ \texttt{owl:sameAs}\ x_2\), if both \(x_1\) and \(x_2\) are individuals.

  8. Here we use our lemon-aligned version of LexInfo, as ISOcat does not currently have many data categories for subcategorisation. Note that it is not strictly necessary to define these properties as subproperties of lemon, as they are already published as such.

  9. Available as part of the lemon Java API.

  10. Available at http://monnetproject.deri.ie/lemonsource.

  11. https://github.com/jmccrae/lemon.api/

  12. http://www.w3.org/community/ontolex/.

References

  • Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J., et al. (2000). Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1), 25.

  • Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. In: Proceedings of the 6th international Semantic Web conference (ISWC) (pp. 722–735).

  • Baker, C., Fillmore, C., & Lowe, J. (1998). The Berkeley FrameNet project. In: Proceedings of the 36th annual meeting of the association for computational linguistics (ACL) (pp. 86–90).

  • Beckett, D., & Berners-Lee, T. (2008). Turtle–Terse RDF triple language. http://www.w3.org/TeamSubmission/turtle/. Accessed October 19, 2010.

  • Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data—the story so far. International Journal on Semantic Web and Information Systems, 5, 1–22.

  • Buitelaar, P. (2010). Ontology-based Semantic Lexicons: Mapping between terms and object descriptions. In: C.R. Huang, N. Calzolari, A. Gangemi, A. Lenci, A. Oltramari & L. Prevot (Eds.), Ontology and the Lexicon (pp. 212–223). Cambridge: Cambridge University Press.

  • Buitelaar, P., Cimiano, P., Haase, P., & Sintek, M. (2009). Towards linguistically grounded ontologies. In: Proceedings of the European Semantic Web conference (ESWC) (pp. 111–125).

  • Chiarcos, C. (2010). Grounding an ontology of linguistic annotations in the data category registry. In: Proceedings of the international conference on language resource and evaluation (LREC) (pp. 37–40).

  • Cimiano, P., Buitelaar, P., McCrae, J., & Sintek, M. (2011). LexInfo: A declarative model for the lexicon-ontology interface. Journal of Web Semantics, 9(1), 29–51.

  • Farrar, S., & Langendoen, D. (2003). Markup and the GOLD ontology. In: Proceedings of workshop on digitizing and annotating text and field recordings (pp. 845–862).

  • Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT press.

  • Francopoulo, G., George, M., Calzolari, N., Monachini, M., Bel, N., Pet, M., et al. (2006). Lexical markup framework (LMF). In: Proceedings of the international conference on language resource and evaluation (LREC) (pp. 233–236).

  • Grishman, R., Macleod, C., & Meyers, A. (1994). COMLEX syntax: Building a computational lexicon. In: Proceedings of the 15th international conference on computational linguistics (COLING) (pp. 268–272).

  • Isaac, A., Phipps, J., & Rubin, D. (2009). SKOS use cases and requirements. http://www.w3.org/TR/2009/NOTE-skos-ucr-20090818/. Accessed October 19, 2010.

  • Kemps-Snijders, M., Windhouwer, M., Wittenburg, P., & Wright, S. (2008). ISOcat: Corralling data categories in the wild. In: Proceedings of the international conference on language resource and evaluation (LREC) (pp. 887–891).

  • Kifer, M. (2008). Rule interchange format: The framework. In: Proceedings of the 2nd international conference on web reasoning and rule systems (pp. 1–11).

  • Kilgarriff, A. (1997). I don’t believe in word senses. Computers and the Humanities, 31(2), 91–113.

  • Kim, J., Ohta, T., Tateisi, Y., & Tsujii, J. (2003). GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics, 19(1), 180–182.

  • McCrae, J., Spohr, D., & Cimiano, P. (2011). Linking lexical resources and ontologies on the semantic web with lemon. In: Proceedings of the 8th extended Semantic Web conference (ESWC-11) (pp. 245–259).

  • Miles, A., & Bechhofer, S. (2009). SKOS simple knowledge organization system reference. http://www.w3.org/TR/skos-reference/. Accessed October 19, 2010.

  • Montiel-Ponsoda, E., Aguado de Cea, G., Gómez Pérez, A., & Peters, W. (2010). Enriching ontologies with multilingual information. In B. Boguraev, J. Tait, & M. Palmer (Eds.), Natural language engineering (pp. 1–27). Cambridge: Cambridge University Press.

  • Reymonet, A., Thomas, J., & Aussenac-Gilles, N. (2007). Modelling ontological and terminological resources in OWL-DL. In: Proceedings of the 6th international Semantic Web conference (ISWC) (pp. 415–425).

  • Romary, L. (2010). Standardization of the formal representation of lexical information for NLP. In: Dictionaries: An international encyclopedia of lexicography. Mouton de Gruyter. http://arxiv.org/abs/0911.5116v1.

  • Shadbolt, N., Hall, W., & Berners-Lee, T. (2006). The semantic web revisited. IEEE Intelligent Systems, 21(3), 96–101.

  • Scheffczyk, J., Pease, A., & Ellsworth, M. (2006). Linking FrameNet to the suggested upper merged ontology. In: Formal ontology in information systems (FOIS-2006) (pp. 289–300).

  • Van Assem, M., Gangemi, A., & Schreiber, G. (2006). Conversion of WordNet to a standard RDF/OWL representation. In: Proceedings of the fifth international conference on language resources and evaluation (LREC) (pp. 237–242).

  • Vossen, P. (1998). EuroWordNet: A multilingual database with lexical semantic networks. Computational Linguistics, 25(4), 628–630.

  • Vossen, P., Bloksma, L., Peters, W., Kunze, C., Wagner, A., Pala, K., et al. (1999). Extending the Inter-Lingual-Index with new concepts. Deliverable 2D010, EuroWordNet, LE2-4003.

Acknowledgments

The lemon model and associated tools have been developed in the context of the Monnet project, which is funded by the European Union FP7 program under grant number 248458, the CITEC excellence initiative funded by the DFG (Deutsche Forschungsgemeinschaft), the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2), the Spanish Ministry of Science and Innovation within the Juan de la Cierva program and the Spanish Project BabelData (TIN2010-17550). We would like to thank the following people for their contributions and advice: Axel Polleres (DERI), Antoine Zimmermann (DERI), Dimitra Anastasiou (CNGL), Susan Marie Thomas (SAP), Christina Unger (CITEC), Sue Ellen Wright (Kent State University), Menzo Windhouwer (Universiteit van Amsterdam).

Author information

Correspondence to John McCrae.

About this article

Cite this article

McCrae, J., Aguado-de-Cea, G., Buitelaar, P. et al. Interchanging lexical resources on the Semantic Web. Lang Resources & Evaluation 46, 701–719 (2012). https://doi.org/10.1007/s10579-012-9182-3

  • DOI: https://doi.org/10.1007/s10579-012-9182-3
