Language Resources and Evaluation, Volume 51, Issue 2, pp 319–349

An approach to measuring and annotating the confidence of Wiktionary translations

  • Antonio J. Roa-Valverde
  • Salvador Sanchez-Alonso
  • Miguel-Angel Sicilia
  • Dieter Fensel
Original Paper

Abstract

Wiktionary is an online collaborative project based on the same principle as Wikipedia, where users can create, edit and delete entries containing lexical information. While the open nature of Wiktionary is the reason for its fast growth, it has also raised a problem: how reliable is the lexical information contained in each article? If we plan to use Wiktionary translations as source content for a particular use case, we need to be able to answer this question and extract measures of their confidence. In this paper we present our work on assessing the quality of Wiktionary translations by introducing confidence metrics. Additionally, we describe our effort to share Wiktionary translations and the associated confidence values as linked data.

Keywords

Linguistics, Linked data, Ranking, Random walks

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  • Antonio J. Roa-Valverde (1)
  • Salvador Sanchez-Alonso (2)
  • Miguel-Angel Sicilia (2)
  • Dieter Fensel (1)

  1. Semantic Technology Institute, University of Innsbruck, Innsbruck, Austria
  2. University of Alcalá, Madrid, Spain
