What Types of Translations Hide in Wikipedia?

  • Jonas Sjöbergh
  • Olof Sjöbergh
  • Kenji Araki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4938)


We extend an automatically generated bilingual Japanese-Swedish dictionary with new translations, automatically discovered from the multi-lingual online encyclopedia Wikipedia. Over 50,000 translations, most of which are not present in the original dictionary, are generated, with very high translation quality. We analyze what types of translations can be generated by this simple method. The majority of the words are proper nouns, and other types of (usually) uninteresting translations are also generated. Not counting the less interesting words, about 15,000 new translations are still found. Checking against logs of search queries from the old dictionary shows that the new translations would significantly reduce the number of searches with no matching translation.
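The abstract does not say how candidate translations are extracted from Wikipedia, so the following is only a minimal sketch of the two steps it does describe: adding candidate pairs that are missing from an existing dictionary, and checking logged search queries against the result. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def extend_dictionary(original, candidates):
    """Return (merged, new_pairs) without modifying `original`.

    original:   dict mapping a source word to a set of translations
    candidates: iterable of (source, target) pairs, e.g. mined from Wikipedia
    """
    merged = {word: set(translations) for word, translations in original.items()}
    new_pairs = []
    for source, target in candidates:
        # Keep only pairs the dictionary does not already contain.
        if target not in merged.get(source, set()):
            merged.setdefault(source, set()).add(target)
            new_pairs.append((source, target))
    return merged, new_pairs


def unmatched_queries(dictionary, query_log):
    """Return the logged queries that have no entry in `dictionary`."""
    return [query for query in query_log if query not in dictionary]


# Toy data, invented purely for illustration.
original = {"猫": {"katt"}}
candidates = [("猫", "katt"), ("北海道", "Hokkaido"), ("辞書", "ordbok")]
query_log = ["猫", "辞書", "辞書", "犬"]

merged, new_pairs = extend_dictionary(original, candidates)
before = unmatched_queries(original, query_log)
after = unmatched_queries(merged, query_log)
print(len(new_pairs), len(before), len(after))
```

Comparing `before` and `after` gives the kind of figure the abstract refers to: how many logged searches that previously had no match would be answered by the extended dictionary.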


Proper Noun, Parallel Corpus, Translation Quality, Bilingual Dictionary, Title Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jonas Sjöbergh (1)
  • Olof Sjöbergh (2)
  • Kenji Araki (1)
  1. Graduate School of Information Science and Technology, Hokkaido University
  2. KTH CSC
