Improving Document Retrieval by Automatic Query Expansion Using Collaborative Learning of Term-Based Concepts

  • Stefan Klink
  • Armin Hust
  • Markus Junker
  • Andreas Dengel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

Query expansion methods have been studied for a long time, with debatable success in many instances. In this paper, a new approach is presented that uses term concepts learned from other queries. Two important issues in query expansion are addressed: the selection and the weighting of additional search terms. In contrast to other methods, the query under consideration is expanded by adding those terms which are most similar to the concept of individual query terms, rather than selecting terms that are similar to the complete query or that are directly similar to the query terms. Experiments show that this kind of query expansion yields notable improvements in retrieval effectiveness, measured by recall/precision, in comparison to the standard vector space model and to pseudo relevance feedback. This approach can be used to improve the retrieval of documents in digital libraries, in document management systems, on the WWW, etc.
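
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that the concept of a term is approximated by averaging the vectors of documents judged relevant to earlier queries containing that term, and that "most similar to the concept" can be simplified to "highest weight in the concept vector". The function names `learn_concepts` and `expand_query` and the parameters `k` and `alpha` are hypothetical.

```python
from collections import defaultdict


def learn_concepts(past_queries):
    """Build one concept vector per query term (assumed definition).

    past_queries: list of (query_terms, relevant_doc_vectors), where each
    document vector is a dict mapping term -> weight.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for terms, relevant_docs in past_queries:
        for t in set(terms):
            for doc in relevant_docs:
                for term, weight in doc.items():
                    sums[t][term] += weight
                counts[t] += 1
    # Average over all relevant documents seen for each query term.
    return {
        t: {term: w / counts[t] for term, w in vec.items()}
        for t, vec in sums.items()
    }


def expand_query(query, concepts, k=2, alpha=0.5):
    """Add, per query term, the k strongest terms of that term's concept.

    Original terms get weight 1.0; added terms get alpha * concept weight
    (an assumed weighting scheme, chosen only for illustration).
    """
    expanded = {t: 1.0 for t in query}
    for t in query:
        concept = concepts.get(t, {})
        candidates = sorted(
            ((w, term) for term, w in concept.items() if term not in query),
            key=lambda pair: (-pair[0], pair[1]),
        )
        for w, term in candidates[:k]:
            expanded[term] = expanded.get(term, 0.0) + alpha * w
    return expanded
```

Note that expansion terms are chosen separately for each query term from that term's own concept, not from similarity to the query as a whole, which is the distinction the abstract draws against other methods.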

Keywords

Information Retrieval; Digital Library; Average Precision; Relevance Feedback; Query Term
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Stefan Klink (1)
  • Armin Hust (1)
  • Markus Junker (1)
  • Andreas Dengel (1)
  1. German Research Center for Artificial Intelligence (DFKI, GmbH), Kaiserslautern, Germany
