Focused Search in Books and Wikipedia: Categories, Links and Relevance Feedback

  • Marijn Koolen
  • Rianne Kaptein
  • Jaap Kamps
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6203)

Abstract

In this paper we describe our participation in INEX 2009 in the Ad Hoc Track, the Book Track, and the Entity Ranking Track. In the Ad Hoc Track we investigate focused link evidence, using only links from retrieved sections. The new collection is annotated not only with Wikipedia categories, but also with YAGO/WordNet categories. We explore how we can use both types of category information, in the Ad Hoc Track as well as in the Entity Ranking Track. Results in the Ad Hoc Track show that Wikipedia categories are more effective than WordNet categories, and that Wikipedia categories in combination with relevance feedback lead to the best results. Preliminary results of the Book Track show that full-text retrieval is effective for high early precision, and that relevance feedback further increases early precision. Our findings for the Entity Ranking Track are in direct opposition to our Ad Hoc findings: there, the WordNet categories are more effective than the Wikipedia categories. This marks an interesting difference between ad hoc search and entity ranking.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marijn Koolen (1)
  • Rianne Kaptein (1)
  • Jaap Kamps (1, 2)

  1. Archives and Information Studies, Faculty of Humanities, University of Amsterdam, Netherlands
  2. ISLA, Faculty of Science, University of Amsterdam, Netherlands
