Chapter

Web Technologies and Applications

Volume 7235 of the series Lecture Notes in Computer Science, pp. 496-503

Extracting Difference Information from Multilingual Wikipedia

  • Yuya Fujiwara, Konan University
  • Yu Suzuki, Nagoya University
  • Yukio Konishi, Konan University
  • Akiyo Nadamoto, Konan University

Abstract

Wikipedia articles on a particular topic are written in many languages. Under Wikipedia policy, two articles on the same topic written in different languages would be expected to have identical content. In practice, however, their contents differ, especially for topics related to culture. In this paper, we propose a system that extracts the differences between the information presented in the Japanese Wikipedia and that presented in the Wikipedias of other countries. A key technical problem is how to select the Wikipedia articles to be compared. Each language version of an article is organized according to its own structure. For example, cricket is an important part of English culture, yet the Japanese Wikipedia article on cricket is very brief: it consists of only a single page. In contrast, the English version is substantial and spans multiple pages. We must therefore determine which articles can reasonably be compared. To do so, we extract the comparison target articles based on a link graph and the article structure. We implemented the proposed method and confirmed the accuracy of the difference extraction methods.
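The abstract does not specify how the link graph is used to choose comparison targets. As a purely illustrative sketch, assume that a "comparison unit" is a root article plus the sub-articles it links to for more detailed coverage; such a unit could then be gathered with a depth-limited breadth-first search. The graph, the article titles, the `max_depth` parameter, and the function `comparison_unit` are all hypothetical and are not the authors' method.

```python
from collections import deque

# Hypothetical link graph: each article title maps to the titles of the
# sub-articles it delegates detail to. Titles are illustrative only.
SubLinks = dict[str, list[str]]


def comparison_unit(root: str, sublinks: SubLinks, max_depth: int = 1) -> list[str]:
    """Return root plus sub-articles reachable within max_depth hops (BFS order)."""
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        title, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for child in sublinks.get(title, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order


if __name__ == "__main__":
    # Assumed example: a multi-page English topic vs. a single-page Japanese one.
    en_links: SubLinks = {
        "Cricket": ["History of cricket", "Laws of cricket"],
        "History of cricket": ["History of cricket to 1725"],
    }
    ja_links: SubLinks = {"クリケット": []}
    print(comparison_unit("Cricket", en_links))   # root plus its direct sub-articles
    print(comparison_unit("クリケット", ja_links))  # a single page
```

Here the English unit for "Cricket" spans several pages while the Japanese unit is one page, mirroring the asymmetry described in the abstract; how deep to follow links is a design choice this sketch leaves to the caller.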