Wikipedia Mining for an Association Web Thesaurus Construction

Nakayama, Kotaro; Hara, Takahiro; Nishio, Shojiro

doi:10.1007/978-3-540-76993-4_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4831)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has several impressive characteristics: a huge number of articles, live updates, a dense link structure, brief link texts, and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and an extension method, “forward / backward link weighting (FB weighting)”, in order to construct a huge-scale association thesaurus. We demonstrated the effectiveness of our proposed methods in comparison with conventional methods such as co-occurrence analysis and TF-IDF.
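
The abstract names pfibf but does not give its formula. As a rough illustration of what the name suggests, the sketch below scores a pair of articles by a path-frequency term multiplied by an inverse backward-link-frequency term, by analogy with TF-IDF. Everything in it is an assumption made for illustration and is not taken from the paper: the 1/length path weighting, the `max_hops` limit, the logarithmic form of the inverse term, the function names, and the toy link graph.

```python
import math
from collections import defaultdict

def backward_link_counts(links):
    """Count, for each article, how many distinct articles link to it."""
    counts = defaultdict(int)
    for source, targets in links.items():
        for target in targets:
            counts[target] += 1
    return counts

def path_frequency(links, source, target, max_hops=2):
    """Sum 1/length over simple link paths from source to target.

    Assumed weighting: shorter paths count more; paths longer than
    max_hops links are ignored.
    """
    total = 0.0
    stack = [(source, 0, {source})]
    while stack:
        node, depth, seen = stack.pop()
        if depth == max_hops:
            continue
        for nxt in links.get(node, ()):
            if nxt == target:
                total += 1.0 / (depth + 1)
            elif nxt not in seen:
                stack.append((nxt, depth + 1, seen | {nxt}))
    return total

def pfibf_like(links, source, target, max_hops=2):
    """Hypothetical pfibf-style score: path frequency times log(N / backward links)."""
    articles = set(links) | {t for ts in links.values() for t in ts}
    bf = backward_link_counts(links)[target]
    if bf == 0:
        return 0.0  # nothing links to the target, so no path can reach it
    return path_frequency(links, source, target, max_hops) * math.log(len(articles) / bf)

# Toy link graph (hypothetical): article -> set of articles it links to.
links = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": set(),
    "D": {"C"},
}

score_ab = pfibf_like(links, "A", "B")
score_ac = pfibf_like(links, "A", "C")
print(score_ab, score_ac)
```

In this toy graph, C is reachable from A by more paths than B is, yet it receives the lower score because three of the four articles link to it. Under these assumptions, that is the intended effect of the inverse term: pages that almost everything links to are discounted, much as common words are discounted in TF-IDF.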

Author information

Authors and Affiliations

Dept. of Multimedia Eng., Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
Kotaro Nakayama, Takahiro Hara & Shojiro Nishio

Editor information

Boualem Benatallah, Fabio Casati, Dimitrios Georgakopoulos, Claudio Bartolini, Wasim Sadiq, Claude Godart

Cite this paper

Nakayama, K., Hara, T., Nishio, S. (2007). Wikipedia Mining for an Association Web Thesaurus Construction. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_27

DOI: https://doi.org/10.1007/978-3-540-76993-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76992-7
Online ISBN: 978-3-540-76993-4
eBook Packages: Computer Science, Computer Science (R0)