Improving the consistency of usage labelling in dictionaries with TEI Lex-0


This paper analyzes the application of usage labels in three representative lexicographic works, namely the Portuguese, Spanish, and French Academy Dictionaries, as a starting point for creating a consistent classification of usage labels and their encoding in accordance with TEI Lex-0. The use of labels is not always consistent within individual dictionaries, and even less so across different lexicographic projects. This makes it difficult to classify and encode them accurately. The difficulty is compounded by the differences and partial incompatibilities found in the lexicographic literature on the treatment of diasystemic information. We address the existing literature and the initial classification of TEI Lex-0, and argue for the need to introduce some changes to TEI Lex-0, most notably in terms of diatextual labels. Finally, we argue that existing classifications based on examples, rather than on clear and explicit definitions of classification categories, will always lack precision and lead to mutually incompatible encodings of different dictionaries. We propose a set of definitions for usage label categories that can be adopted by TEI Lex-0 and used in other similar attempts to create interoperable lexical resources. Agreement on usage label categories is a necessary first step before harmonizing and standardizing the actual values of usage labels across various dictionaries and across different languages.

  1.

    The three selected dictionaries are representative of the academic tradition in European lexicography (see Considine 2014). Each dictionary under consideration is a large, scholarly, monolingual dictionary of a major Romance language undertaken by a national academy. Extending the study to other traditions and other languages would be worthwhile, but it is beyond the scope of a single journal article.

  4.

    TEI is a de facto standard for digital editions and text annotation projects, and it is frequently used in the Digital Humanities as the basis for a large number of current lexicographic projects.

  5.

    The electronic version of the DACL is not publicly available, but the first author of this paper is the coordinator of the new edition. The Natural Language Processing group of the Computer Science Department of the University of Minho has been developing the technological support for the new digital edition of the DACL, with the participation of Alberto Simões from IPCA (Instituto Politécnico do Cávado e do Ave), who is responsible for the technological support, José João Almeida, and, as a consultant, Álvaro Iriarte Sanromán, both from the University of Minho. The participation of NOVA CLUNL (Linguistic Research Center of NOVA University of Lisbon) concerns the conversion of the dictionary into the TEI Lex-0 format.

  7.

    We would like to thank ILex (Institute of Lexicography of the RAE) for allowing us to use some statistics obtained during a 3-week stay funded by a scholarship granted to the first author by ELEXIS. At the time of that stay (November 2018), the database of the DLE had 95,410 entries, with a total of 198,176 senses.

  8.

    círculo, in Dicionário Infopédia da Língua Portuguesa [online]. Porto: Porto Editora, 2003–2019. [Accessed 2019-08-15].

  12.

    For further details, see chapter 6, “Usage Information”.

  13.

    The term “typed element” refers to an element that can carry a type, for which a set of possible values is specified.
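
As a rough illustration of the notion of a typed element in note 13, the following sketch builds a usage-label element whose type must come from a closed set of values. The element name usg follows TEI conventions; the three type values, the helper name make_usg, and the sample label "Geom." are assumptions chosen for illustration and are not the classification proposed in the paper.

```python
import xml.etree.ElementTree as ET

# Assumed, illustrative set of label categories; NOT the paper's classification.
ALLOWED_USG_TYPES = {"geographic", "domain", "temporal"}


def make_usg(label_type: str, text: str) -> ET.Element:
    """Build a <usg> element, rejecting types outside the allowed set."""
    if label_type not in ALLOWED_USG_TYPES:
        raise ValueError(f"unknown usage label type: {label_type!r}")
    usg = ET.Element("usg", {"type": label_type})
    usg.text = text
    return usg


def serialize(element: ET.Element) -> str:
    """Return the element as an XML string without an XML declaration."""
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical domain label as it might appear in a dictionary entry.
    print(serialize(make_usg("domain", "Geom.")))
```

Restricting the type to an explicit set is what lets two dictionaries encoded this way be compared: a label either falls into a shared category or is rejected, instead of being silently encoded under an ad hoc name.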


Research financed by Portuguese National Funding through the FCT—Fundação para a Ciência e Tecnologia as part of the project Centro de Linguística da Universidade NOVA de Lisboa—UID/LIN/03213/2019, and by the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 731015 (ELEXIS).

  • Lexicography
  • Usage labels
  • Diasystemic information
  • TEI