The Importance of Link Evidence in Wikipedia

  • Jaap Kamps
  • Marijn Koolen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked, and its link structure differs from that of the Web at large: internal links in Wikipedia are typically based on words naturally occurring in a page, and link to another semantically related entry. Our main aim is to find out whether Wikipedia’s link structure can be exploited to improve ad hoc information retrieval. We first analyse the relation between Wikipedia links and the relevance of pages. We then experiment with the use of link evidence in the focused retrieval of Wikipedia content, based on the INEX 2006 test collection. Our main findings are as follows. First, our analysis reveals that the Wikipedia link structure is a (possibly weak) indicator of relevance. Second, our experiments on INEX ad hoc retrieval tasks reveal that, if the link evidence is made sensitive to the local context, retrieval effectiveness improves significantly. Hence, in contrast with earlier TREC experiments using crawled Web data, we have shown that Wikipedia’s link structure can help improve the effectiveness of ad hoc retrieval.
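The difference between global link evidence and link evidence that is sensitive to the local context can be illustrated with a small sketch. It is an illustration only and not necessarily the method used in the paper: the abstract does not specify how link evidence is computed or combined with content scores. The sketch assumes that link evidence is an incoming-link count (indegree), that the "local" variant counts only links between pages retrieved for the same topic, and that the combination is a simple log-damped multiplication. The function names `global_indegree`, `local_indegree`, and `rerank`, as well as the toy data, are hypothetical.

```python
import math
from collections import defaultdict


def global_indegree(links):
    """Count incoming links per page over the whole collection."""
    counts = defaultdict(int)
    for source, target in links:
        if source != target:  # ignore self-links
            counts[target] += 1
    return counts


def local_indegree(links, retrieved):
    """Count only incoming links whose source and target are both retrieved."""
    retrieved_set = set(retrieved)
    counts = defaultdict(int)
    for source, target in links:
        if source != target and source in retrieved_set and target in retrieved_set:
            counts[target] += 1
    return counts


def rerank(content_scores, links, use_local=True):
    """Re-rank pages by content score times a log-damped indegree factor.

    The weighting 1 + log(1 + indegree) is an assumption made for this sketch.
    """
    if use_local:
        indegree = local_indegree(links, content_scores.keys())
    else:
        indegree = global_indegree(links)
    combined = {
        page: score * (1.0 + math.log(1.0 + indegree.get(page, 0)))
        for page, score in content_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy data: three retrieved pages with content scores, plus a link graph in
# which page "C" is popular collection-wide but not among the retrieved pages.
content_scores = {"A": 0.5, "B": 0.4, "C": 0.3}
links = [("A", "B"), ("C", "B")] + [(f"X{i}", "C") for i in range(10)]

global_ranking = rerank(content_scores, links, use_local=False)
local_ranking = rerank(content_scores, links, use_local=True)
```

In this toy example the global variant promotes "C" to the top because of its many incoming links from unrelated pages, whereas the local variant ignores those links and promotes "B", which is linked from the other retrieved pages.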

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jaap Kamps (1, 2)
  • Marijn Koolen (1)
  1. Archives and Information Studies, University of Amsterdam, The Netherlands
  2. ISLA, University of Amsterdam, The Netherlands
