Wikipedia Mining for an Association Web Thesaurus Construction

  • Conference paper
Web Information Systems Engineering – WISE 2007 (WISE 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4831))

Abstract

Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has several impressive characteristics: a huge number of articles, live updates, a dense link structure, brief link texts, and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and an extension method, “forward / backward link weighting (FB weighting)”, in order to construct a large-scale association thesaurus. We show the effectiveness of the proposed methods by comparing them with conventional methods such as co-occurrence analysis and TF-IDF.
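For readers unfamiliar with the TF-IDF baseline named in the abstract, the following is a minimal sketch of how TF-IDF-style weighting can be applied to a link graph. It is not the paper's pfibf definition, which the abstract does not give. The function name `link_tfidf`, the input layout, and the toy graph are all hypothetical, and the sketch assumes that an inverse document frequency can be computed from the number of distinct articles linking to a target.

```python
import math
from collections import Counter


def link_tfidf(forward_links):
    """Weight each (source, target) link with a TF-IDF-style score.

    forward_links maps an article title to the list of titles it links to
    (repeats allowed). This layout is an assumption for illustration only.
    """
    n_articles = len(forward_links)

    # Count, for every target, how many distinct articles link to it.
    backward = Counter()
    for targets in forward_links.values():
        for target in set(targets):
            backward[target] += 1

    weights = {}
    for source, targets in forward_links.items():
        tf = Counter(targets)  # how often the source links to each target
        weights[source] = {
            target: count * math.log(n_articles / backward[target])
            for target, count in tf.items()
        }
    return weights


# Hypothetical toy link graph, not data from the paper.
toy_graph = {
    "Apple": ["Fruit", "Fruit", "Computer"],
    "Banana": ["Fruit"],
    "Macintosh": ["Computer", "Apple"],
}
weights = link_tfidf(toy_graph)
```

In this toy graph, "Macintosh" links once each to "Computer" and "Apple", but "Apple" receives the higher weight because fewer articles link to it. This is the general intuition behind down-weighting targets with many backward links.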

Editor information

Boualem Benatallah, Fabio Casati, Dimitrios Georgakopoulos, Claudio Bartolini, Wasim Sadiq, Claude Godart

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nakayama, K., Hara, T., Nishio, S. (2007). Wikipedia Mining for an Association Web Thesaurus Construction. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_27

  • DOI: https://doi.org/10.1007/978-3-540-76993-4_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76992-7

  • Online ISBN: 978-3-540-76993-4

  • eBook Packages: Computer Science, Computer Science (R0)
