Abstract
Wikipedia has become a huge phenomenon on the WWW. As a corpus for knowledge extraction, it has several impressive characteristics, such as a huge number of articles, live updates, a dense link structure, brief link texts, and URL identification for concepts. In this paper, we propose an efficient link mining method, pfibf (Path Frequency - Inversed Backward link Frequency), and an extension method, “forward / backward link weighting (FB weighting)”, in order to construct a large-scale association thesaurus. We show the effectiveness of the proposed methods by comparing them with conventional methods such as cooccurrence analysis and TF-IDF.
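The abstract names pfibf but does not give its formula, so the following is only an illustrative sketch of what a "path frequency times inverse backward link frequency" weight could look like, by analogy with TF-IDF. The function name `pfibf_scores`, the `max_hops` parameter, the 1/length path discount, and the logarithmic form of the backward-link term are all assumptions made for this example and are not taken from the text above.

```python
import math
from collections import defaultdict


def pfibf_scores(links, source, max_hops=2):
    """Score pages reachable from `source` with a pfibf-style weight.

    links: dict mapping each page to the set of pages it links to.

    Assumed scoring (hypothetical, not quoted from the paper):
      pf(source, t) = sum over paths p from source to t of 1 / len(p)
      ibf(t)        = log(N / number of pages that link to t)
      score(t)      = pf(source, t) * ibf(t)
    """
    # N is the number of distinct pages seen as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n_pages = len(pages)

    # Backward link frequency: how many pages link to each target.
    backward = defaultdict(int)
    for targets in links.values():
        for t in targets:
            backward[t] += 1

    # Path frequency: shorter paths contribute more (assumed 1/length).
    pf = defaultdict(float)

    def walk(node, depth, visited):
        if depth == max_hops:
            return
        for nxt in links.get(node, ()):
            if nxt in visited:  # skip cycles back onto the current path
                continue
            pf[nxt] += 1.0 / (depth + 1)
            walk(nxt, depth + 1, visited | {nxt})

    walk(source, 0, {source})
    return {t: pf[t] * math.log(n_pages / backward[t]) for t in pf}


# Toy link graph with made-up page names.
example_links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}, "D": {"C"}}
example_scores = pfibf_scores(example_links, "A")
```

In this toy graph, page C is reachable from A by more paths than B, but it is also linked from three of the four pages, so the backward-link term pulls its score below B's. That damping of very popular pages is the same intuition that IDF provides in TF-IDF.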
References
Giles, J.: Internet encyclopaedias go head to head. Nature 438, 900–901 (2005)
Nakayama, K., Hara, T., Nishio, S.: A thesaurus construction method from large scale web dictionaries. In: AINA 2007. Proc. of IEEE International Conference on Advanced Information Networking and Applications, pp. 932–939 (2007)
Ruiz-Casado, M., Alfonseca, E., Castells, P.: Automatic assignment of wikipedia encyclopedic entries to wordnet synsets. In: Szczepaniak, P.S., Kacprzyk, J., Niewiadomski, A. (eds.) AWIC 2005. LNCS (LNAI), vol. 3528, pp. 380–386. Springer, Heidelberg (2005)
Strube, M., Ponzetto, S.: WikiRelate! Computing semantic relatedness using Wikipedia. In: AAAI 2006. Proc. of National Conference on Artificial Intelligence, pp. 1419–1424. Boston, Mass (2006)
Milne, D., Medelyan, O., Witten, I.H.: Mining domain-specific thesauri from wikipedia: A case study. In: WI 2006. Proc. of ACM International Conference on Web Intelligence, pp. 442–448 (2006)
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: IJCAI 2007. Proc. of International Joint Conference on Artificial Intelligence, pp. 1606–1611 (2007)
Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill Book Company, New York (1984)
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical Report, Stanford Digital Library Technologies Project (1999)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5), 604–632 (1999)
Davison, B.D.: Topical locality in the web. In: Proc. of the ACM SIGIR, pp. 272–279 (2000)
Schütze, H., Pedersen, J.O.: A cooccurrence-based thesaurus and two applications to information retrieval. International Journal of Information Processing and Management 33(3), 307–318 (1997)
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., Ruppin, E.: Placing search in context: the concept revisited. ACM Trans. Inf. Syst. 20(1), 116–131 (2002)
Chen, H., Yim, T., Fye, D.: Automatic thesaurus generation for an electronic community system. Journal of the American Society for Information Science 46(3), 175–193 (1995)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nakayama, K., Hara, T., Nishio, S. (2007). Wikipedia Mining for an Association Web Thesaurus Construction. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_27
DOI: https://doi.org/10.1007/978-3-540-76993-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76992-7
Online ISBN: 978-3-540-76993-4
eBook Packages: Computer Science (R0)