Structured Document Retrieval, Multimedia Retrieval, and Entity Ranking Using PF/Tijah

  • Theodora Tsikrika
  • Pavel Serdyukov
  • Henning Rode
  • Thijs Westerveld
  • Robin Aly
  • Djoerd Hiemstra
  • Arjen P. de Vries
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4862)

Abstract

CWI and University of Twente used PF/Tijah, a flexible XML retrieval system, to evaluate structured document retrieval, multimedia retrieval, and entity ranking tasks in the context of INEX 2007. For the retrieval of textual and multimedia elements in the Wikipedia data, we investigated various length priors and found that biasing towards longer elements than the ones retrieved by our language modelling approach can be useful. For retrieving images in isolation, we found that their associated text is a very good source of evidence in the Wikipedia collection. For the entity ranking task, we used random walks to model multi-step relevance propagation from the articles describing entities to all related entities and further, and obtained promising results.

Keywords

Mean Average Precision; Retrieval Task; Multimedia Retrieval; Relevant Entity; Entity Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Theodora Tsikrika (1)
  • Pavel Serdyukov (2)
  • Henning Rode (2)
  • Thijs Westerveld (3)
  • Robin Aly (2)
  • Djoerd Hiemstra (2)
  • Arjen P. de Vries (1)

  1. CWI, Amsterdam, The Netherlands
  2. University of Twente, Enschede, The Netherlands
  3. Teezir Search Solutions, Ede, The Netherlands