Towards More Effective Techniques for Automatic Query Expansion

  • Claudio Carpineto
  • Giovanni Romano
Conference paper

DOI: 10.1007/3-540-48155-9_10

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1696)
Cite this paper as:
Carpineto C., Romano G. (1999) Towards More Effective Techniques for Automatic Query Expansion. In: Abiteboul S., Vercoustre AM. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1999. Lecture Notes in Computer Science, vol 1696. Springer, Berlin, Heidelberg

Abstract

Techniques for automatic query expansion from top retrieved documents have recently shown promise for improving retrieval effectiveness on large collections, but there is still a lack of systematic evaluation and comparative studies. In this paper we focus on term-scoring methods based on the differences between the distribution of terms in (pseudo-)relevant documents and the distribution of terms in all documents, seen as a complement or an alternative to more conventional techniques. We show that when such distributional methods are used to select expansion terms within Rocchio's classical reweighting scheme, the overall performance is not likely to improve. However, we also show that when the same distributional methods are used to both select and weight expansion terms, retrieval effectiveness may improve considerably. We then argue, based on their variation in performance on individual queries, that the sets of ranked terms suggested by individual distributional methods can be combined to further improve mean performance, by analogy with ensembling classifiers, and present experimental evidence supporting this view. Taken together, our experiments show that with automatic query expansion it is possible to achieve performance gains as high as 21.34% over the non-expanded query (for non-interpolated average precision). We also discuss the effect that the main parameters involved in automatic query expansion, such as query difficulty, number of selected documents, and number of selected terms, have on retrieval effectiveness.
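As a rough illustration of the idea of using one distributional score both to select and to weight expansion terms, the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not name the scoring functions, so the Kullback-Leibler contribution used here is an assumption, and the function names, the parameters `k` and `beta`, and the max-normalisation of weights are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def kl_term_scores(pseudo_relevant_docs, collection_docs):
    """Score terms by their contribution to the KL divergence between the
    term distribution in the pseudo-relevant documents and the term
    distribution in the whole collection (an assumed scoring function)."""
    rel_counts = Counter(t for doc in pseudo_relevant_docs for t in doc)
    col_counts = Counter(t for doc in collection_docs for t in doc)
    rel_total = sum(rel_counts.values())
    col_total = sum(col_counts.values())

    scores = {}
    for term, n in rel_counts.items():
        p_rel = n / rel_total
        p_col = col_counts[term] / col_total
        if p_col > 0:  # skip terms unseen in the collection sample
            scores[term] = p_rel * math.log(p_rel / p_col)
    return scores


def expand_query(query_terms, scores, k=10, beta=0.5):
    """Select the k best-scoring terms and weight them by the same score.

    Original query terms start at weight 1.0; each selected term adds
    beta * (score / best score). Both choices are illustrative only.
    """
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    weights = {t: 1.0 for t in query_terms}
    if not top:
        return weights
    max_score = top[0][1]
    for term, s in top:
        if s > 0:  # only terms over-represented in the top documents
            weights[term] = weights.get(term, 0.0) + beta * s / max_score
    return weights


# Toy data: documents are lists of tokens; the first and last documents
# play the role of the top retrieved (pseudo-relevant) set.
collection = [
    ["query", "expansion", "retrieval"],
    ["cooking", "pasta", "recipe"],
    ["football", "match", "score"],
    ["retrieval", "index", "search"],
]
top_docs = [collection[0], collection[3]]

term_scores = kl_term_scores(top_docs, collection)
expanded = expand_query(["search"], term_scores, k=1)
```

In this toy run the term that is most over-represented in the top documents relative to the collection is the one added to the query. Under the Rocchio-style alternative described in the abstract, the same scores would be used only for selection, with the weights coming from a separate reweighting formula.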

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Claudio Carpineto
    • 1
  • Giovanni Romano
    • 2
  1. Fondazione Ugo Bordoni, Rome, Italy
  2. Fondazione Ugo Bordoni, Rome, Italy
