An Approach for Extracting Bilingual Terminology from Wikipedia

  • Maike Erdmann
  • Kotaro Nakayama
  • Takahiro Hara
  • Shojiro Nishio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4947)


With the demand for bilingual dictionaries covering domain-specific terminology, research on automatic dictionary extraction has become popular. However, the accuracy and coverage of dictionaries built from bilingual text corpora are often insufficient for domain-specific terms. We therefore present an approach for extracting bilingual dictionaries from the link structure of Wikipedia, a large-scale encyclopedia that contains a vast number of links between articles in different languages. Our methods not only analyze these interlanguage links but also extract additional translation candidates from redirect pages and link text. In an experiment, we demonstrated the advantages of our methods over a traditional approach that extracts bilingual terminology from parallel corpora.
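To make the three information sources named in the abstract concrete, the following is a minimal sketch of how interlanguage links, redirect pages, and link texts could be combined into a list of translation candidates. It is not the authors' implementation: the function name `translation_candidates`, the `min_count` threshold, the in-memory dictionaries, and the English/German entries are all hypothetical toy data chosen for illustration, and a real system would read these tables from a Wikipedia dump.

```python
from collections import Counter

# Toy stand-ins for tables parsed from a Wikipedia dump (illustrative values only).
INTERLANGUAGE = {"United States": "Vereinigte Staaten"}   # source title -> target title
EN_REDIRECTS = {"USA": "United States"}                   # source redirect -> source title
DE_REDIRECTS = {                                          # target redirect -> target title
    "USA": "Vereinigte Staaten",
    "Vereinigte Staaten von Amerika": "Vereinigte Staaten",
}
DE_LINK_TEXTS = {                                         # target title -> anchor text counts
    "Vereinigte Staaten": Counter(
        {"Vereinigte Staaten": 10, "USA": 5, "Amerika": 1}
    ),
}


def translation_candidates(term, interlanguage, src_redirects,
                           tgt_redirects, tgt_link_texts, min_count=2):
    """Collect translation candidates for `term` (hypothetical helper).

    `min_count` is an assumed noise filter on anchor texts, not a value
    taken from the paper.
    """
    # Resolve a source-language redirect (e.g. an abbreviation) to its article title.
    title = src_redirects.get(term, term)

    # Follow the interlanguage link; without one there is nothing to extract.
    target = interlanguage.get(title)
    if target is None:
        return []

    # 1. The linked article title itself.
    candidates = [target]
    # 2. Titles of redirect pages that point to the linked article.
    candidates += sorted(r for r, t in tgt_redirects.items() if t == target)
    # 3. Anchor texts used at least `min_count` times to link to the article.
    counts = tgt_link_texts.get(target, Counter())
    candidates += sorted(text for text, n in counts.items() if n >= min_count)

    # Remove duplicates while preserving the order of first appearance.
    return list(dict.fromkeys(candidates))


result = translation_candidates(
    "USA", INTERLANGUAGE, EN_REDIRECTS, DE_REDIRECTS, DE_LINK_TEXTS
)
print(result)
```

In this sketch the interlanguage link alone would yield a single translation, while the redirect and link text tables add further surface forms of the same concept; lowering `min_count` admits rarer anchor texts at the cost of more noise.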


Machine Translation, Page Title, Parallel Corpus, Correct Translation, Bilingual Dictionary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Maike Erdmann (1)
  • Kotaro Nakayama (1)
  • Takahiro Hara (1)
  • Shojiro Nishio (1)
  1. Dept. of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
