Information Retrieval, Volume 5, Issue 1, pp 61–86

Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm

  • Zhixiang Chen
  • Binhai Zhu


Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive supervised algorithm that learns from examples. In spite of its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this paper we show that in the binary vector space model, if the initial query vector is 0, then for any of the four typical similarities (inner product, Dice coefficient, cosine coefficient, and Jaccard coefficient), Rocchio's similarity-based relevance feedback algorithm makes at least n mistakes when used to search for a collection of documents represented by a monotone disjunction of at most k relevant features (or terms) over the n-dimensional binary vector space {0, 1}^n. When an arbitrary initial query vector in {0, 1}^n is used, it makes at least (n + k − 3)/2 mistakes when searching for the same collection of documents. These linear lower bounds are independent of the choices of the threshold and coefficients that the algorithm may use in updating its query vector and making its classification.
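The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: the similarity formulas are the common vector-space versions of the four measures, and the mistake-driven update (add a misclassified relevant document scaled by `beta`, subtract a misclassified non-relevant one scaled by `gamma`) is an assumed simplification of Rocchio-style feedback. The names `rocchio_online`, `is_relevant`, `theta`, `beta`, and `gamma` are hypothetical. The demo only shows the mechanics of counting mistakes on one fixed document sequence; it does not reproduce the worst-case lower-bound argument.

```python
import math
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Vector = Sequence[float]

def inner(q: Vector, d: Vector) -> float:
    """Inner product similarity."""
    return sum(qi * di for qi, di in zip(q, d))

def dice(q: Vector, d: Vector) -> float:
    """Dice coefficient; defined as 0 when both vectors are zero (assumption)."""
    denom = inner(q, q) + inner(d, d)
    return 2 * inner(q, d) / denom if denom else 0.0

def cosine(q: Vector, d: Vector) -> float:
    """Cosine coefficient; defined as 0 when either vector is zero (assumption)."""
    denom = math.sqrt(inner(q, q) * inner(d, d))
    return inner(q, d) / denom if denom else 0.0

def jaccard(q: Vector, d: Vector) -> float:
    """Jaccard coefficient; defined as 0 when both vectors are zero (assumption)."""
    denom = inner(q, q) + inner(d, d) - inner(q, d)
    return inner(q, d) / denom if denom else 0.0

def is_relevant(d: Vector, relevant_features: Set[int]) -> bool:
    """Target concept: a monotone disjunction over the relevant features."""
    return any(d[i] == 1 for i in relevant_features)

def rocchio_online(
    docs: Iterable[Vector],
    relevant_features: Set[int],
    sim: Callable[[Vector, Vector], float],
    theta: float,
    q0: Vector,
    beta: float = 1.0,
    gamma: float = 1.0,
) -> Tuple[List[float], int]:
    """Classify each document as relevant iff sim(q, d) >= theta.

    On a mistake, move the query toward a missed relevant document or
    away from a wrongly accepted non-relevant one. Returns the final
    query vector and the number of mistakes.
    """
    q = [float(x) for x in q0]
    mistakes = 0
    for d in docs:
        predicted = sim(q, d) >= theta
        actual = is_relevant(d, relevant_features)
        if predicted != actual:
            mistakes += 1
            step = beta if actual else -gamma
            q = [qi + step * di for qi, di in zip(q, d)]
    return q, mistakes

# Demo: n = 4, relevant features {0, 1}, documents are the unit vectors.
docs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
final_q, mistakes = rocchio_online(docs, {0, 1}, inner, theta=0.5, q0=[0, 0, 0, 0])
print(final_q, mistakes)
```

Passing `dice`, `cosine`, or `jaccard` as `sim` swaps the similarity measure without changing the learning loop, which mirrors the abstract's claim that the bounds hold for all four similarities.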

relevance feedback, vector space, supervised learning, similarity, lower bound




Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Zhixiang Chen (1)
  • Binhai Zhu (2)

  1. Department of Computer Science, University of Texas-Pan American, Edinburg, USA
  2. Department of Computer Science, Montana State University, Bozeman, USA