Towards More Effective Techniques for Automatic Query Expansion

  • Claudio Carpineto
  • Giovanni Romano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1696)

Abstract

Techniques for automatic query expansion from top retrieved documents have recently shown promise for improving retrieval effectiveness on large collections, but there is still a lack of systematic evaluation and comparative studies. In this paper we focus on term-scoring methods based on the differences between the distribution of terms in (pseudo-)relevant documents and the distribution of terms in all documents, seen as a complement or an alternative to more conventional techniques. We show that when such distributional methods are used to select expansion terms within Rocchio’s classical reweighting scheme, the overall performance is not likely to improve. However, we also show that when the same distributional methods are used to both select and weight expansion terms, the retrieval effectiveness may improve considerably. We then argue, based on their variation in performance on individual queries, that the sets of ranked terms suggested by individual distributional methods can be combined to further improve mean performance, by analogy with ensembling classifiers, and present experimental evidence supporting this view. Taken together, our experiments show that with automatic query expansion it is possible to achieve performance gains as high as 21.34% over the non-expanded query (for non-interpolated average precision). We also discuss the effect that the main parameters involved in automatic query expansion, such as query difficulty, number of selected documents, and number of selected terms, have on retrieval effectiveness.
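
As a rough illustration of the distributional idea described in the abstract, the following Python sketch scores each term by how much more frequent it is in the pseudo-relevant documents than in the whole collection, and then uses the same score both to select and to weight expansion terms. The abstract does not name a specific scoring function, so the Kullback-Leibler-style score p_R(t) · log(p_R(t) / p_C(t)) is an assumption here, as are the function names, the `beta` parameter, and the toy data; none of it should be read as the authors' implementation.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Relative frequency of each term over a list of tokenized documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def distributional_scores(pseudo_relevant, collection):
    """Score terms by p_R(t) * log(p_R(t) / p_C(t)) (assumed scoring function)."""
    p_rel = term_distribution(pseudo_relevant)
    p_col = term_distribution(collection)
    # Pseudo-relevant documents are assumed to belong to the collection,
    # so every term scored here has a non-zero collection probability.
    return {t: p * math.log(p / p_col[t]) for t, p in p_rel.items()}


def expand_query(query_weights, scores, n_terms=2, beta=1.0):
    """Add the n_terms best-scoring terms, weighted by their normalized score."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]
    expanded = dict(query_weights)
    if not top or top[0][1] <= 0:
        return expanded  # no term is over-represented in the feedback set
    max_score = top[0][1]
    for term, score in top:
        if score > 0:
            expanded[term] = expanded.get(term, 0.0) + beta * score / max_score
    return expanded


# Toy collection of tokenized documents (hypothetical data).
collection = [
    ["jaguar", "car", "engine"],
    ["jaguar", "cat", "jungle"],
    ["stock", "market", "price"],
    ["car", "engine", "price"],
]
# Pretend the first and last documents were the top-ranked ones.
pseudo_relevant = [collection[0], collection[3]]

scores = distributional_scores(pseudo_relevant, collection)
expanded = expand_query({"jaguar": 1.0}, scores, n_terms=2)
print(expanded)
```

Selecting terms with such a score but weighting them with Rocchio's scheme would replace the `beta * score / max_score` term with a weight derived from the term's frequency in the feedback documents; the abstract reports that this mixed variant is the one that is not likely to help.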

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Claudio Carpineto, Fondazione Ugo Bordoni, Rome, Italy
  • Giovanni Romano, Fondazione Ugo Bordoni, Rome, Italy
