DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary

Language Resources and Evaluation

Abstract

This paper contributes to the discussion on compiling computational lexical resources from conventional dictionaries. It describes the theoretical and practical problems encountered when reusing a conventional dictionary to compile a lexical-semantic resource in the form of a wordnet. More specifically, it describes the methodological issues of compiling a wordnet for Danish, DanNet, from a monolingual basis rather than, as is often done, by applying the translational expansion method with Princeton WordNet as the English source. We therefore take as our basis a large, corpus-based printed dictionary of modern Danish. Using this approach, we discuss four issues: readjusting inconsistent and/or underspecified hyponymy hierarchies taken from the conventional dictionary; relating the dictionary's sense distinctions to the synonym sets of wordnets; generating semantic wordnet relations on the basis of sense definitions; and, finally, supplementing missing or implicit information.

Notes

  1. Actually, Ruus (1995: 130) argues that some of these hyponyms are characterised by the fact that a limited set of features can distinguish them from each other. She uses Grandy’s terminology and calls such hyponyms contrast sets.

  2. It should be made clear that multiple inheritance is also rather frequent with 1st Order Entities. For instance, in the previously mentioned example of grøntsag (vegetable) several vegetables are encoded partly with plante (plant) or plantedel (part of plant) as hypernym, partly with grøntsag as hypernym.

  3. ‘Domain’ is an ontological type added by DanNet; it is not part of the EuroWordNet (EWN) ontology. Such additions are made where large groups of synsets call for a more specific ontological type than the EuroWordNet ontology provides. Another example of a DanNet extension of the ontology is the ontological type BodyPart.

  4. A more detailed account of this is given in Asmussen (2007).
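
The multiple inheritance described in note 2 can be pictured as a graph in which one entry has more than one direct hypernym. The following is a minimal sketch, not DanNet's actual data model: the `HYPERNYMS` mapping, the two helper functions, and the entry gulerod (carrot) are hypothetical illustrations; only grøntsag, plante and plantedel are taken from the note.

```python
# Hypothetical toy data: each word maps to its direct hypernyms.
# Only grøntsag, plante and plantedel come from note 2; gulerod (carrot)
# and the exact links are assumptions made for illustration.
HYPERNYMS = {
    "gulerod": ["grøntsag", "plantedel"],  # two hypernyms: multiple inheritance
    "grøntsag": [],
    "plantedel": [],
    "plante": [],
}


def all_hypernyms(word, graph):
    """Return every hypernym reachable from `word`, following all parents."""
    seen = set()
    stack = list(graph.get(word, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen


def has_multiple_inheritance(word, graph):
    """True when `word` has more than one direct hypernym."""
    return len(graph.get(word, [])) > 1
```

In this toy graph, `has_multiple_inheritance("gulerod", HYPERNYMS)` is true because the entry is encoded under both grøntsag and plantedel, which is the situation the note describes for several vegetables.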

References

  • Agirre, E., Ansa, O., Arregi, X., Artola, X., Díaz de Ilarraza, A., Lersundi, M., et al. (2000). Extraction of semantic relations from a Basque monolingual dictionary using constraint grammar. In Proceedings from the ninth Euralex international congress (pp. 639–640). Universität Stuttgart.

  • Asmussen, J. (2007). Korpuslinguistische Verfahren zur Optimierung lexikalisch-semantischer Beschreibungen. In W. Kallmeyer & G. Zifonun (Eds.), Jahrbuch des Instituts für Deutsche Sprache 2006 (pp. 123–151). Berlin and New York: Walter de Gruyter.

  • Boguraev, B., & Briscoe, T. (Eds.). (1989). Computational lexicography for natural language processing. London and New York: Longman.

  • Church, K., & Hanks, P. (1989). Word association norms, mutual information and lexicography. In ACL proceedings, 27th annual meeting, Vancouver.

  • Cruse, D. A. (1991). Lexical semantics. Cambridge: Cambridge University Press.

  • Cruse, D. A. (2002). Hyponymy and its varieties. In R. Green, C. A. Bean, & S. H. Myaeng (Eds.), The semantics of relationships: An interdisciplinary perspective, information science and knowledge management (pp. 2–21). Springer.

  • DDO = Hjorth, E., Kristensen, K., et al. (Eds.). (2003–2005). Den Danske Ordbog 1–6 (‘The Danish dictionary 1–6’). Copenhagen: Gyldendal and Society for Danish Language and Literature.

  • Derwojedowa, M., Piasecki, M., Szpakowicz, S., Zawislawska, M., & Broda, B. (2008). Words, concepts and relations in the construction of the Polish WordNet. In Global WordNet Conference 2008 (pp. 162–177). Szeged, Hungary.

  • Dirven, R., & Verspoor, M. (Eds.). (1998). Cognitive exploration of language and linguistics. Amsterdam/Philadelphia: John Benjamins.

  • Dunning, T. (1994). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.

  • Fellbaum, C. (2002). Parallel hierarchies in the verb lexicon. In Proceedings of the OntoLex workshop, LREC (pp. 27–31). Las Palmas, Spain.

  • Fernández-Montraveta, A., Vázquez, G., & Fellbaum, C. (2008). The Spanish version of WordNet 3.0. Text resources and lexical knowledge. In Text, translation, computational processing (pp. 175–182). Berlin and New York: Mouton de Gruyter.

  • Fillmore, C. J., Johnson, C. R., & Petruck, M. R. L. (2003). Background to FrameNet. International Journal of Lexicography, 16(3), 235–250 (Oxford: Oxford University Press).

  • Fontenelle, T. (1997). Using a bilingual dictionary to create semantic networks. International Journal of Lexicography, 10(4), 275–303.

  • Guarino, N. (1998). Some ontological principles for designing upper level lexical resources. In Proceedings from the first international conference on language resources and evaluation (pp. 527–534). Granada.

  • Guarino, N., & Welty, C. (2002). Identity and subsumption. In R. Green, C. A. Bean & S. H. Myaeng (Eds.), The semantics of relationships: An interdisciplinary perspective, information science and knowledge management. Springer.

  • Huang, C., Hsiao, P., Su, I., & Ke, X. (2008). Paranymy: Enriching ontological knowledge in WordNets. In Proceedings of the fourth global WordNet conference (pp. 221–228). Szeged, Hungary.

  • Ide, N., & Véronis, J. (1995). Knowledge extraction from machine-readable dictionaries: An evaluation. In P. Steffens (Ed.), Machine translation and the lexicon, third international EAMT workshop, Heidelberg, April 26–28, 1993, proceedings. Lecture Notes in Computer Science 898, Springer.

  • Ide, N., & Wilks, Y. (2007). Making sense about senses. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation—Algorithms and applications. Springer.

  • Jackson, H. (2002). Lexicography: An introduction. London: Routledge.

  • Kilgarriff, A. (1997). I don’t believe in word senses. Computers and the Humanities, 31(2), 91–113.

  • Kokkinakis, D., Toporowska Gronostaj, M., & Warmenius, K. (2000). Annotating, disambiguating & automatically extending the coverage of the Swedish SIMPLE lexicon. In Proceedings from the second international conference on language resources and evaluation (pp. 1397–1405). Athens.

  • Lenci, A., Bel, N., Busa, F., Calzolari, N., Gola, E., Monachini, M., et al. (2000). SIMPLE—A general framework for the development of multilingual lexicons. International Journal of Lexicography, 13(4), 249–263.

  • Levin, B. (1993). English verb classes and alternations—A preliminary investigation. Chicago: The University of Chicago Press.

  • Lorentzen, H. (2004). The Danish dictionary at large: Presentation, problems and perspectives. In G. Williams & S. Vessier (Eds.), Proceedings of the eleventh EURALEX international congress (pp. 285–294). Lorient, France.

  • Lyons, J. (1977). Semantics. Cambridge: Press Syndicate of the University of Cambridge.

  • Márton, M., Hatvani, C., Kuti, J., Szarvas, G., Csirik, J., Prószéky, G., et al. (2008). Methods and results of the Hungarian WordNet project. In Proceedings of the fourth global WordNet conference (pp. 311–320). Szeged, Hungary.

  • Miller, G. A. (1998). Nouns in WordNet. In C. Fellbaum (Ed.), WordNet—An electronic lexical database (pp. 23–47). Cambridge, London: The MIT Press.

  • Norling-Christensen, O., & Asmussen, J. (1998). The corpus of the Danish dictionary. Lexikos. Afrilex Series, 8, 223–242.

  • Pedersen, B. S., & Nimb, S. (2000). Semantic encoding of Danish verbs in SIMPLE—Adapting a verb-framed model to a satellite-framed language. In Proceedings from the second international conference on language resources and evaluation (pp. 1405–1412), Language resources and evaluation—LREC 2000, Athens.

  • Pedersen, B. S., & Paggio, P. (2004). The Danish SIMPLE lexicon and its application in content-based querying. Nordic Journal of Linguistics, 27(1), 97–127.

  • Pedersen, B. S., & Sørensen, N. H. (2006). Towards sounder taxonomies in wordnets. In A. Oltramari, C.-R. Huang, A. Lenci, P. Buitelaar, & C. Fellbaum (Eds.), Ontolex 2006 at 5th international conference on language resources and evaluation (pp. 9–16). Genova, Italy.

  • Pedersen, B. S., Braasch, A., Henriksen, L., Olsen, S., & Povlsen, C. (2008). Merging a syntactic resource with a WordNet: A feasibility study of a merge between STO and DanNet. In Proceedings from the sixth international conference on language resources and evaluation, Marrakech, Morocco.

  • Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: The MIT Press.

  • Rigau, G., & Agirre, E. (2002). Semi-automatic methods for WordNet construction. In Tutorial at 2002 international WordNet conference, Mysore, India.

  • Rodríguez, H., Farwell, D., Farreres, J., Bertran, M., Alkhalifa, M., Martí, M. A., et al. (2008). Arabic WordNet: Current state and future extension. In Proceedings of the fourth global WordNet conference (pp. 387–405). Szeged, Hungary.

  • Ruus, H. (1995). Danske kerneord. Copenhagen: Museum Tusculanums Forlag.

  • Svensén, B. (1993). Practical lexicography. Principles and methods of dictionary-making. Oxford: Oxford University Press [translated from the Swedish Handbok i lexikografi (1987) by Sykes, J. & Schofield, K.].

  • Veale, T., & Hao, Y. (2008). Enriching WordNet with folk knowledge and stereotypes. In Proceedings of the fourth global WordNet conference, Szeged, Hungary.

  • Vossen, P. (Ed.). (1999). EuroWordNet, a multilingual database with lexical semantic networks. The Netherlands: Kluwer.

  • Vossen, P., Maks, I., Segers, R., & van der Vliet, H. (2008). Integrating lexical units, synsets and ontology in the Cornetto database. In Proceedings from the 6th international conference on language resources and evaluation, language resources and evaluation—LREC 2008, Marrakech, Morocco.

  • Zgusta, L. (1988). Pragmatics, lexicography and dictionaries of English. World Englishes, 7(3), 243–253.

Corresponding author

Correspondence to Bolette Sandford Pedersen.

About this article

Cite this article

Pedersen, B.S., Nimb, S., Asmussen, J. et al. DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary. Lang Resources & Evaluation 43, 269–299 (2009). https://doi.org/10.1007/s10579-009-9092-1

  • DOI: https://doi.org/10.1007/s10579-009-9092-1
