Abstract
This chapter on formats and models for lexicons surveys the data formats available for lexical resources and elaborates on their structure and possible uses. Merging lexical resources that are based on widely used formalisms and international standards is subject to restrictions. Motivated by these restrictions, the chapter develops a formal model for lexical resources that is related to the graph structures used in annotation. For lexicons, this model is termed the Lexicon Graph. Within this model, the concepts of lexicon entries and lexical structures, which are frequently described in the literature, are formally defined and illustrated with examples. The chapter also addresses the problem of ambiguity in these formal terms. An implementation of the defined structures based on XML and XML technologies such as XQuery is given, and the relation to international standards is discussed.
Copyright information
© 2010 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Trippel, T. (2010). Representation Formats and Models for Lexicons. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_9
DOI: https://doi.org/10.1007/978-90-481-3331-4_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3330-7
Online ISBN: 978-90-481-3331-4
eBook Packages: Computer Science (R0)