Representation Formats and Models for Lexicons

Linguistic Modeling of Information and Markup Languages

Part of the book series: Text, Speech and Language Technology (TLTB, volume 41)

Abstract

This chapter surveys the data formats available for lexical resources and describes their structure and possible uses. Motivated by the limitations encountered when merging lexical resources that are based on widespread formalisms and international standards, it develops a formal model for lexical resources that is related to the graph structures used in annotation. Applied to lexicons, this model is called the Lexicon Graph. Within this model, the notions of lexicon entry and lexical structure frequently described in the literature are formally defined and illustrated with examples. The chapter also addresses the problem of ambiguity in these formal terms. An implementation of the defined structures based on XML and related technologies such as XQuery is presented, and the relation of the model to international standards is discussed.
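
The abstract describes lexical information held in a graph, with lexicon entries defined over that graph and an XML-based implementation. The following Python sketch illustrates this general idea under our own assumptions; it is not the chapter's model or code. The class `LexiconGraph`, the item types, the relation names (`hasSense`, `hasPronunciation`) and the XML element names (`lexicongraph`, `lexitem`, `relation`) are all hypothetical, an "entry" is assumed to be the set of relations reachable from a chosen head node, and ElementTree's limited XPath support stands in for XQuery.

```python
import xml.etree.ElementTree as ET


class LexiconGraph:
    """Hypothetical sketch of a lexicon as a graph: nodes are typed
    lexical items, edges are typed relations between them."""

    def __init__(self):
        self.nodes = {}  # node id -> (item type, value)
        self.edges = {}  # source id -> list of (relation, target id)

    def add_item(self, node_id, item_type, value):
        self.nodes[node_id] = (item_type, value)

    def relate(self, source, relation, target):
        # Both endpoints must already exist as lexical items.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in {source} -{relation}-> {target}")
        self.edges.setdefault(source, []).append((relation, target))

    def entry(self, head_id):
        """Derive an entry view (assumed definition): every relation
        reachable from head_id, collected depth-first; each node is
        expanded once, so each reachable edge is reported once."""
        seen, stack, result = {head_id}, [head_id], []
        while stack:
            current = stack.pop()
            for relation, target in self.edges.get(current, []):
                result.append((relation, self.nodes[target]))
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return result

    def to_xml(self):
        """Serialize nodes and edges as flat XML (hypothetical element names)."""
        root = ET.Element("lexicongraph")
        for node_id, (item_type, value) in self.nodes.items():
            item = ET.SubElement(root, "lexitem", id=node_id, type=item_type)
            item.text = value
        for source, targets in self.edges.items():
            for relation, target in targets:
                ET.SubElement(root, "relation",
                              type=relation, source=source, target=target)
        return root


# Usage: one orthographic form linked to two senses models a homonym,
# so the ambiguity is explicit in the graph rather than hidden in one entry.
g = LexiconGraph()
g.add_item("o1", "orthography", "bank")
g.add_item("p1", "pronunciation", "bæŋk")
g.add_item("s1", "sense", "financial institution")
g.add_item("s2", "sense", "sloping land beside a river")
g.relate("o1", "hasPronunciation", "p1")
g.relate("o1", "hasSense", "s1")
g.relate("o1", "hasSense", "s2")

for relation, (item_type, value) in g.entry("o1"):
    print(relation, item_type, value)

# ElementTree's XPath subset stands in here for an XQuery over the XML.
root = g.to_xml()
senses = [r.get("target") for r in root.findall("relation[@type='hasSense']")]
print(senses)
```

Keeping items and relations as separate, flat XML elements (rather than nesting senses inside a headword element) is one way to let the same item participate in several entries; a different choice of head node would yield a different entry view over the same data.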

Author information

Correspondence to Thorsten Trippel.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Trippel, T. (2010). Representation Formats and Models for Lexicons. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_9

  • DOI: https://doi.org/10.1007/978-90-481-3331-4_9

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-3330-7

  • Online ISBN: 978-90-481-3331-4

  • eBook Packages: Computer Science, Computer Science (R0)
