Component Ranking and Automatic Query Refinement for XML Retrieval

  • Yosi Mass
  • Matan Mandelbrod
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3493)

Abstract

Queries over XML documents challenge search engines to return the most relevant XML components that satisfy the query concepts. In previous work we described a component ranking algorithm that performed relatively well in INEX’03. In this paper we improve that algorithm by introducing a document pivot that compensates for missing term statistics in small components. With the new algorithm we achieved improvements of 30%-50% in Mean Average Precision over the previous algorithm. We then describe a general mechanism for applying known Query Refinement algorithms from traditional IR on top of this component ranking algorithm, and demonstrate one such algorithm that achieved top results in INEX’04.
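
The abstract does not spell out how the document pivot enters the score. As a rough illustration only, the sketch below assumes a linear interpolation between a component's own score and the score of the document that contains it, so that a small component with sparse term statistics can borrow evidence from its parent document. The names `pivot_score`, `rank_components` and `doc_pivot`, and the default value 0.5, are hypothetical and are not taken from the paper.

```python
def pivot_score(component_score: float, document_score: float,
                doc_pivot: float = 0.5) -> float:
    """Blend a component's own score with its containing document's score.

    Assumption: a simple linear interpolation. doc_pivot = 0 ignores the
    document entirely; doc_pivot = 1 ranks by document score alone.
    """
    if not 0.0 <= doc_pivot <= 1.0:
        raise ValueError("doc_pivot must be in [0, 1]")
    return doc_pivot * document_score + (1.0 - doc_pivot) * component_score


def rank_components(components, doc_scores, doc_pivot=0.5):
    """Rank (component_id, doc_id, component_score) triples by blended score.

    doc_scores maps each doc_id to the score of the whole document.
    Returns (component_id, blended_score) pairs, best first.
    """
    scored = [
        (cid, pivot_score(cscore, doc_scores[did], doc_pivot))
        for cid, did, cscore in components
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: a weak-looking component inside a strongly matching document,
# and a slightly stronger component inside a weakly matching document.
components = [("d1/sec[1]", "d1", 0.2), ("d2/p[3]", "d2", 0.3)]
doc_scores = {"d1": 0.9, "d2": 0.1}
ranking = rank_components(components, doc_scores, doc_pivot=0.5)
```

With the pivot set to 0 the second component ranks first on its own score; with the pivot at 0.5 the first component overtakes it because its document matches the query far better. The right pivot value would have to be tuned empirically.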

Keywords

Query Term, Mean Average Precision, Inverted Index, Query Node, XPath Expression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Yosi Mass (1)
  • Matan Mandelbrod (1)
  1. IBM Research Lab, Haifa, Israel
