A Bilingual Dictionary Extracted from the Wikipedia Link Structure

  • Maike Erdmann
  • Kotaro Nakayama
  • Takahiro Hara
  • Shojiro Nishio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4947)

Abstract

Many bilingual dictionaries have been released on the WWW. However, these dictionaries cover new and domain-specific terminology insufficiently. In our demonstration, we present a dictionary constructed by analyzing the link structure of Wikipedia, a large-scale encyclopedia that contains a large number of links between articles in different languages. We analyzed not only these interlanguage links but also extracted additional translation candidates from redirect pages and link text information. In an experiment, we have already shown the advantages of our dictionary over both manually created dictionaries and bilingual terminology extracted from parallel corpora.
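The extraction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy tables `INTERLANGUAGE`, `REDIRECTS` and `LINK_TEXTS`, the helper names, and the frequency-count score are hypothetical assumptions. The sketch only shows how an interlanguage link pairs two articles, and how redirect titles and link (anchor) texts contribute alternative names on each side that become additional translation candidates.

```python
from collections import Counter, defaultdict

# Hypothetical toy data; a real system would parse Wikipedia dump files.
# Interlanguage links: English article title -> Japanese article title.
INTERLANGUAGE = {"Osaka University": "大阪大学"}

# Redirects per language: redirect title -> target article title.
REDIRECTS = {
    "en": {"Handai": "Osaka University"},
    "ja": {"阪大": "大阪大学"},
}

# Link texts per language: one (anchor text, target title) pair per link occurrence.
LINK_TEXTS = {
    "en": [("Osaka University", "Osaka University"), ("Osaka Univ.", "Osaka University")],
    "ja": [("大阪大学", "大阪大学"), ("大阪大学", "大阪大学"), ("阪大", "大阪大学")],
}


def surface_forms(lang, title):
    """Collect the names of one article (title, redirects, link texts) with simple counts."""
    forms = Counter({title: 1})
    for src, dst in REDIRECTS[lang].items():
        if dst == title:
            forms[src] += 1
    for anchor, dst in LINK_TEXTS[lang]:
        if dst == title:
            forms[anchor] += 1
    return forms


def build_dictionary():
    """Map each English name to Japanese translation candidates, scored by Japanese counts."""
    dictionary = defaultdict(Counter)
    for en_title, ja_title in INTERLANGUAGE.items():
        en_forms = surface_forms("en", en_title)
        ja_forms = surface_forms("ja", ja_title)
        # Assumption: every English name of the article may translate
        # to every Japanese name of its interlanguage-linked counterpart.
        for en_form in en_forms:
            for ja_form, count in ja_forms.items():
                dictionary[en_form][ja_form] += count
    return dictionary


if __name__ == "__main__":
    d = build_dictionary()
    print(d["Handai"].most_common())  # [('大阪大学', 3), ('阪大', 2)]
```

Pairing every name on one side with every name on the other is a deliberately crude choice made for brevity; it shows why redirects and link texts yield far more candidates than interlanguage links alone, but a practical system would need some way of ranking or filtering the resulting pairs.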

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Maike Erdmann (1)
  • Kotaro Nakayama (1)
  • Takahiro Hara (1)
  • Shojiro Nishio (1)

  1. Dept. of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan