A Hybrid Relevance-Feedback Approach to Text Retrieval
Relevance feedback (RF) has proven to be an effective query-modification approach to improving the performance of information retrieval (IR). It works by interactively asking a user whether each document in a set is relevant to a given query concept. Conventional RF algorithms either converge slowly or require the user to spend extra effort reading irrelevant documents. This paper surveys several RF algorithms and introduces a novel hybrid RF approach using a support vector machine (HRFSVM), which actively selects both the uncertain documents and the most relevant ones for the user to judge. HRFSVM also ranks documents efficiently, in a natural order for user browsing. We conduct experiments on the Reuters-21578 dataset and track precision as a function of feedback iterations. Experimental results show that HRFSVM significantly outperforms two other RF algorithms.
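The abstract does not spell out HRFSVM's exact selection rule or batch sizes, so the following is only a minimal sketch of the hybrid idea under stated assumptions. It assumes a trained SVM has already produced a decision value f(x) for every document (positive means predicted relevant). It treats "most relevant" as the largest f(x) and "uncertain" as the smallest |f(x)|, i.e. closest to the separating hyperplane, which is the common SVM active-learning heuristic. The function names `hybrid_select`, `rank_for_browsing`, and `precision` and the parameters `n_relevant` and `n_uncertain` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def hybrid_select(
    scores: Sequence[float],
    labeled: set[int],
    n_relevant: int,
    n_uncertain: int,
) -> tuple[list[int], list[int]]:
    """Choose documents to show the user in one feedback round (hypothetical sketch).

    scores:  assumed SVM decision values f(x), one per document; positive = predicted relevant.
    labeled: indices the user has already judged; they are never shown again.
    """
    candidates = [i for i in range(len(scores)) if i not in labeled]

    # Most relevant: largest decision values (farthest on the relevant side).
    # Ties are broken by document index so the result is deterministic.
    by_relevance = sorted(candidates, key=lambda i: (-scores[i], i))
    relevant = by_relevance[:n_relevant]
    chosen = set(relevant)

    # Most uncertain: closest to the hyperplane (smallest |f(x)|),
    # skipping documents already picked as "most relevant".
    remaining = [i for i in candidates if i not in chosen]
    by_uncertainty = sorted(remaining, key=lambda i: (abs(scores[i]), i))
    uncertain = by_uncertainty[:n_uncertain]

    return relevant, uncertain


def rank_for_browsing(scores: Sequence[float]) -> list[int]:
    """Order all documents by decision value, most relevant first."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def precision(shown: Sequence[int], relevant_ids: set[int]) -> float:
    """Fraction of the shown documents that are truly relevant (0.0 if none shown)."""
    if not shown:
        return 0.0
    return sum(1 for i in shown if i in relevant_ids) / len(shown)


if __name__ == "__main__":
    # Made-up decision values for six documents; document 0 was judged earlier.
    scores = [2.1, -0.05, 0.9, -1.7, 0.1, 1.5]
    relevant, uncertain = hybrid_select(scores, labeled={0}, n_relevant=2, n_uncertain=2)
    print(relevant, uncertain)                       # [5, 2] [1, 4]
    print(precision(relevant + uncertain, {2, 5}))   # 0.5
```

In a full feedback loop, the user's judgments on both groups would be added to the training set and the SVM retrained before the next round. Recording `precision` at each round yields the kind of precision-versus-iteration curve the experiments track. Showing the high-scoring documents keeps the user reading mostly relevant material, while the near-hyperplane documents are the ones whose labels are expected to move the decision boundary the most.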