Hybrid XML Retrieval Revisited
The widespread adoption of XML necessitates structure-aware systems that can effectively retrieve information from XML document collections. This paper reports on the participation of the RMIT group in the INEX 2004 ad hoc track, where we investigate different aspects of the XML retrieval task. Our preliminary analysis of the content-only (CO) and vague content-and-structure (VCAS) relevance assessments identifies three XML retrieval scenarios: Original, General and Specific. Further analysis of the relevance assessments under the General retrieval scenario reveals two categories of CO and VCAS topics: Broad and Narrow. We design runs that follow a hybrid XML retrieval approach and implement two retrieval heuristics that allow different levels of overlap among the answer elements. For the Original retrieval scenario, we show that the CO runs that allow overlap outperform the non-overlapping CO runs, and that the best-performing VCAS run uses queries with structural constraints but no explicitly specified target element. In both the CO and VCAS cases, runs that implement the heuristic favouring less specific over more specific answer elements are the most effective. Importantly, our results show that for the General retrieval scenario, where users prefer less specific and non-overlapping answers to their queries, a plain full-text search engine is a very effective choice for XML retrieval.
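The abstract does not say how the two overlap heuristics are implemented. As a minimal sketch, assuming each answer element is identified by a document id plus an XPath-like path such as `/article[1]/bdy[1]/sec[1]` (a hypothetical format chosen for illustration), the Python below removes overlap from a score-ordered result list in one of two ways: either the higher-ranked element wins, or a less specific ancestor replaces its already-kept descendants. The names `overlaps`, `is_ancestor`, `remove_overlap` and `prefer_general` are hypothetical; this is not the RMIT group's implementation.

```python
from typing import List, Tuple

# Hypothetical answer element: (document id, XPath-like element path, score).
Element = Tuple[str, str, float]


def is_ancestor(a: str, b: str) -> bool:
    """True if path a strictly contains path b (the trailing '/' avoids sec[1] vs sec[10])."""
    return b.startswith(a + "/")


def overlaps(a: str, b: str) -> bool:
    """True if the two paths are equal or one is an ancestor of the other."""
    return a == b or is_ancestor(a, b) or is_ancestor(b, a)


def remove_overlap(ranked: List[Element], prefer_general: bool = True) -> List[Element]:
    """Filter a score-ordered list so that no two kept elements overlap.

    prefer_general=True: an ancestor replaces any kept descendants, taking the
    rank of the best-ranked one (favouring less specific answers).
    prefer_general=False: the higher-ranked element always wins.
    Note: with prefer_general=True the output need not stay strictly score-sorted.
    """
    kept: List[Element] = []
    for doc, path, score in ranked:
        # Indices (ascending, i.e. best rank first) of kept elements this one overlaps.
        clashes = [i for i, (d, p, _) in enumerate(kept) if d == doc and overlaps(p, path)]
        if not clashes:
            kept.append((doc, path, score))
        elif prefer_general and all(is_ancestor(path, kept[i][1]) for i in clashes):
            # Candidate is less specific than every clashing element: it takes the
            # slot of the best-ranked descendant and the other descendants are dropped.
            kept[clashes[0]] = (doc, path, score)
            for i in reversed(clashes[1:]):
                del kept[i]
        # Otherwise the candidate equals or lies inside a kept element: skip it.
    return kept


if __name__ == "__main__":
    ranked = [
        ("d1", "/article[1]/bdy[1]/sec[1]/p[2]", 0.9),
        ("d1", "/article[1]/bdy[1]/sec[1]", 0.8),
        ("d2", "/article[1]", 0.7),
        ("d1", "/article[1]/bdy[1]/sec[1]/p[3]", 0.6),
    ]
    print(remove_overlap(ranked, prefer_general=True))
    print(remove_overlap(ranked, prefer_general=False))
```

On the sample list, favouring less specific answers keeps the whole `sec[1]` of `d1` in place of its two paragraphs, while the rank-first variant keeps the two paragraphs and never returns the section. Because the kept list is itself overlap-free, a new candidate can clash either with a single kept ancestor or with one or more kept descendants, never both, which is why the `all(...)` test is sufficient.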