Integrating WordNet and Wiktionary with lemon

Chapter in Linked Data in Linguistics

Abstract

Nowadays, there is a significant quantity of linguistic data available on the Web. However, linguistic resources are often published using proprietary formats and, as such, they can be difficult to interface with one another, so they end up confined in “data silos”. The creation of web standards for publishing data on the Web, together with projects to create Linked Data, has led to interest in creating resources that can be published according to Web principles. One of the most important aspects of “Lexical Linked Data” is the sharing of lexica and machine-readable dictionaries. It is for this reason that the lemon format has been proposed; we briefly describe it here. We then consider two resources that seem ideal candidates for the Linked Data cloud, namely WordNet 3.0 and Wiktionary, a large document-based dictionary. We discuss the challenges of converting both resources to lemon and, in particular for Wiktionary, the challenges of processing the mark-up and of handling inconsistencies and underspecification in the source material. Finally, we turn to the task of creating links between the two resources and present a novel algorithm for linking lexica as lexical Linked Data.
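To make the target of such a conversion concrete, the following is a minimal sketch, under stated assumptions, of what a small lemon lexicon could look like when serialised as Turtle. It is not the authors' conversion pipeline and says nothing about their linking algorithm. The helper `lexicon_to_turtle`, the `http://example.org/` base URI, the `lemma-pos` naming scheme, and the example synset URI are all hypothetical; only the lemon namespace and the class and property names (`Lexicon`, `LexicalEntry`, `LexicalSense`, `entry`, `language`, `canonicalForm`, `writtenRep`, `sense`, `reference`) come from the lemon model. Each sense simply points, via `lemon:reference`, to an external concept such as a WordNet synset.

```python
from urllib.parse import quote

LEMON = "http://lemon-model.net/lemon#"
BASE = "http://example.org/lexicon/"  # hypothetical base URI for the converted lexicon


def _literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def lexicon_to_turtle(name: str, language: str, entries: list[tuple[str, str, list[str]]]) -> str:
    """Serialise (lemma, part of speech, sense reference URIs) tuples as a lemon lexicon.

    Hypothetical helper: each reference URI becomes one lemon:LexicalSense of the
    entry; an entry without references is emitted with its canonical form only.
    The part of speech is only used to build the entry URI here.
    """
    if not entries:
        raise ValueError("a lexicon needs at least one entry")
    # Hypothetical naming scheme: <BASE><lemma>-<pos>, percent-encoded.
    entry_uris = [f"<{BASE}{quote(f'{lemma}-{pos}')}>" for lemma, pos, _ in entries]
    lines = [
        f"@prefix lemon: <{LEMON}> .",
        "",
        f"<{BASE}{quote(name)}> a lemon:Lexicon ;",
        f'    lemon:language "{_literal(language)}" ;',
        f"    lemon:entry {', '.join(entry_uris)} .",
    ]
    for (lemma, _pos, refs), uri in zip(entries, entry_uris):
        form = f'    lemon:canonicalForm [ lemon:writtenRep "{_literal(lemma)}"@{language} ]'
        lines += ["", f"{uri} a lemon:LexicalEntry ;"]
        if refs:
            # One blank-node sense per external reference (e.g. a synset URI).
            senses = ", ".join(f"[ a lemon:LexicalSense ; lemon:reference <{r}> ]" for r in refs)
            lines += [form + " ;", f"    lemon:sense {senses} ."]
        else:
            lines.append(form + " .")
    return "\n".join(lines) + "\n"


# Usage with hypothetical data: one noun linked to a made-up synset URI, one verb without senses.
ttl = lexicon_to_turtle(
    "wordnet-en",
    "en",
    [
        ("cat", "noun", ["http://example.org/wordnet/synset-cat-noun-1"]),
        ("run", "verb", []),
    ],
)
print(ttl)
```

Plain string building keeps the sketch dependency-free; a real conversion would more likely use an RDF library and might record the part of speech with a dedicated vocabulary such as LexInfo rather than only in the URI.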

Author information

Corresponding author

Correspondence to John McCrae.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

McCrae, J., Montiel-Ponsoda, E., Cimiano, P. (2012). Integrating WordNet and Wiktionary with lemon. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds) Linked Data in Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28249-2_3

  • DOI: https://doi.org/10.1007/978-3-642-28249-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28248-5

  • Online ISBN: 978-3-642-28249-2

  • eBook Packages: Computer Science, Computer Science (R0)