A Hybrid Relevance-Feedback Approach to Text Retrieval
Relevance feedback (RF) is an effective query modification approach for improving the performance of information retrieval (IR): the system interactively asks a user whether each document in a set is relevant to a given query concept. Conventional RF algorithms either converge slowly or require additional user effort in reading irrelevant documents. This paper surveys several RF algorithms and introduces a novel hybrid RF approach using a support vector machine (HRFSVM), which actively selects both the uncertain documents and the most relevant ones on which to ask users for feedback. It can efficiently rank documents in a natural way for user browsing. We conduct experiments on the Reuters-21578 dataset and track precision as a function of feedback iterations. The experimental results show that HRFSVM significantly outperforms two other RF algorithms.
Keywords: Support Vector Machine; Relevant Document; Support Vector Machine Model; Relevance Feedback; Document Retrieval
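The hybrid selection idea in the abstract, asking for feedback on both the most relevant and the most uncertain documents, can be illustrated with a minimal sketch. This is not the paper's HRFSVM procedure, whose details the abstract does not give. It assumes each unlabeled document already has a signed SVM decision value, treats a value near zero as "uncertain" (a common convention in SVM active learning), and uses a hypothetical function name and hypothetical batch sizes.

```python
from typing import Dict, List


def select_feedback_batch(scores: Dict[str, float],
                          n_relevant: int,
                          n_uncertain: int) -> List[str]:
    """Pick unlabeled documents to show the user in the next feedback round.

    `scores` maps a document id to a signed SVM decision value
    (assumption: positive means predicted relevant, and the magnitude
    reflects distance from the decision boundary).
    """
    # Most relevant: largest decision values first.
    by_relevance = sorted(scores, key=lambda d: scores[d], reverse=True)
    top_relevant = by_relevance[:n_relevant]

    # Most uncertain: decision values closest to zero,
    # skipping documents already picked as most relevant.
    chosen = set(top_relevant)
    by_uncertainty = sorted((d for d in scores if d not in chosen),
                            key=lambda d: abs(scores[d]))
    most_uncertain = by_uncertainty[:n_uncertain]

    return top_relevant + most_uncertain


# Hypothetical decision values for five unlabeled documents.
scores = {"d1": 2.1, "d2": 0.05, "d3": -0.1, "d4": 1.4, "d5": -1.8}
batch = select_feedback_batch(scores, n_relevant=2, n_uncertain=2)
print(batch)
```

In this sketch the relevant picks give the user something useful to browse, while the uncertain picks are the ones whose labels would be expected to move the decision boundary most. How HRFSVM actually balances the two groups is not stated in the abstract.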