An approach to measuring and annotating the confidence of Wiktionary translations


Wiktionary is an online collaborative project based on the same principle as Wikipedia, where users can create, edit and delete entries containing lexical information. While the open nature of Wiktionary is the reason for its fast growth, it has also brought a problem: how reliable is the lexical information contained in each article? If we plan to use Wiktionary translations as source content for a given use case, we need to be able to answer this question and extract measures of their confidence. In this paper we present our work on assessing the quality of Wiktionary translations by introducing confidence metrics. Additionally, we describe our effort to share Wiktionary translations and the associated confidence values as linked data.

    An example of this can be found for the German translation of the word “banco” in the Spanish language edition. The available translations contain the German translation “Bank”; however, it points to an empty page, showing that the term does not exist in the Spanish edition. The translation link is correctly created in the opposite direction, i.e., from German to Spanish.

    The latest developments on lemon by the Ontology Lexicon (Ontolex) community group can be found at

    For a detailed description of the different lemon components we refer the reader to the official cookbook at

    At the time of writing, this modification is being considered as a possible approach to model explicit translations in lemon. Further discussions are available at

    The category registry can be found at

    The dataset is available at

Appendix: Dataset example

In the following, we show an example of how our data model can be used to describe lexical translations. We have taken the English word “able” and built the associated ISG for Spanish as described in Sect. 3. The resulting graph is shown in Fig. 3. Table 3 contains the adjacency matrix with the existing translations and the confidence computed by combining the individual PageRank scores. Note that for this example we take the ISG as the only graph under consideration, and therefore it is equivalent to the USG. Listing 5 depicts the generated model in Turtle notation.

Table 3 Adjacency matrix corresponding to \(\mathit{ISG}_{en,es}(able)\) and associated confidence
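To make the idea of scoring translations from an adjacency structure more concrete, the following is a minimal sketch of PageRank by power iteration over a small directed translation graph. It is not the procedure of Sect. 3: the graph links are invented for illustration and are not the contents of Table 3, the damping factor of 0.85 is a conventional default, and the `edge_confidence` helper, which averages the PageRank of the two endpoints, is a placeholder assumption for the way individual scores are combined.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [successor, ...]}."""
    nodes = sorted(set(adjacency) | {t for ts in adjacency.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adjacency.get(v))
        new_rank = {}
        for v in nodes:
            incoming = sum(
                rank[u] / len(adjacency[u])
                for u in nodes
                if v in adjacency.get(u, ())
            )
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def edge_confidence(adjacency, rank):
    """Hypothetical combination: mean PageRank of source and target."""
    return {
        (u, v): (rank[u] + rank[v]) / 2
        for u, targets in adjacency.items()
        for v in targets
    }


# Invented translation links (illustrative only, not the data of Table 3).
graph = {
    "en:able": ["es:capaz", "es:competente"],
    "en:capable": ["es:capaz"],
    "es:capaz": ["en:able", "en:capable"],
    "es:competente": ["en:able"],
}

ranks = pagerank(graph)
confidence = edge_confidence(graph, ranks)
for (source, target), value in sorted(confidence.items()):
    print(f"{source} -> {target}: {value:.3f}")
```

In this toy graph, "es:capaz" is linked from two English entries and therefore receives a higher rank than "es:competente", so under the averaging assumption the translation "en:able" to "es:capaz" obtains the higher confidence of the two.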

