Abstract
The INEX 2006 evaluation was based on the Wikipedia collection in XML format. It consisted of several tasks that required different approaches to element selection. In this paper we describe the approach we adopted to satisfy the requirements of all the tasks: Thorough, Focused, Relevant in Context, and Best in Context. We used the same underlying system for all tasks. The retrieval strategy is based on the construction of a collection sub-tree consisting of all nodes that contain one or more of the search terms. Nodes containing search terms are assigned a score using the GPX ranking scheme, which incorporates and extends TF-IDF or BM25 variants. Scores are then recursively propagated to ancestors in the document XML tree, and finally all scoring XML elements are ranked. We present results demonstrating that the approach is versatile and performs consistently well across the tasks. We also provide an empirical analysis of the GPX ranking scheme and compare its performance against a baseline TF-IDF scheme and a BM25 scoring scheme.
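The pipeline outlined in the abstract (score the nodes that contain search terms, propagate scores to ancestors, rank all scoring elements) can be illustrated with a minimal sketch. It is not the GPX scheme itself: the abstract gives no formulas, so the term weights, the `DECAY` damping factor, and the helper names `own_text`, `score_tree`, and `rank_elements` are all assumptions made for illustration, and a plain sum of term weights stands in for the GPX, TF-IDF, or BM25 scoring.

```python
import xml.etree.ElementTree as ET

# Hypothetical damping factor applied when a child's score is passed up
# to its parent; the value is illustrative, not taken from the paper.
DECAY = 0.5


def own_text(elem):
    """Tokens of the text directly inside elem, excluding nested elements."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return " ".join(parts).lower().split()


def score_tree(elem, weights, path, results):
    """Return elem's score and record every element with a non-zero score."""
    # Stand-in leaf score: sum of per-term weights (an IDF-like value).
    score = sum(weights.get(tok, 0.0) for tok in own_text(elem))
    # Recursive propagation: each child contributes a damped share.
    for i, child in enumerate(elem, start=1):
        # The index is the child's position, so this is a label, not strict XPath.
        child_path = f"{path}/{child.tag}[{i}]"
        score += DECAY * score_tree(child, weights, child_path, results)
    if score > 0:
        results.append((path, score))
    return score


def rank_elements(xml_text, weights):
    """Rank all scoring elements of one document, best first."""
    root = ET.fromstring(xml_text)
    results = []
    score_tree(root, weights, f"/{root.tag}[1]", results)
    return sorted(results, key=lambda item: item[1], reverse=True)


DOC = (
    "<article><sec><p>xml retrieval</p><p>ranking</p></sec>"
    "<sec><p>other text</p></sec></article>"
)
WEIGHTS = {"xml": 2.0, "retrieval": 1.0}  # made-up term weights
ranking = rank_elements(DOC, WEIGHTS)
for path, score in ranking:
    print(f"{score:.2f}  {path}")
```

In this toy document only the first paragraph contains query terms, so it ranks first, followed by its enclosing section and then the article; elements with no scoring descendants are never recorded, which mirrors the idea of a sub-tree restricted to nodes containing search terms.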
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Geva, S. (2007). GPX - Gardens Point XML IR at INEX 2006. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_14
DOI: https://doi.org/10.1007/978-3-540-73888-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73887-9
Online ISBN: 978-3-540-73888-6
eBook Packages: Computer Science, Computer Science (R0)