Text categorization with Support Vector Machines: Learning with many relevant features
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings: SVMs achieve substantial improvements over the currently best-performing methods and behave robustly across a variety of different learning tasks. Furthermore, they are fully automatic, eliminating the need for manual parameter tuning.
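To make the setting concrete, here is a minimal sketch of learning a linear text classifier from labeled examples: documents become sparse TF-IDF vectors scaled to unit length, and a linear SVM is trained on the L2-regularized hinge loss. Everything here is an illustrative assumption, not the paper's method. The solver is Pegasos-style stochastic subgradient descent, a different technique swapped in because it fits in a few lines. The whitespace tokenizer, the log(N/df) weighting, the omitted bias term, the toy corpus, and the values of `lam` and `iters` are all hypothetical choices.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (no stemming or stopword list).
    return text.lower().split()


def fit_idf(docs):
    """Inverse document frequency log(N / df) for every training word (assumed weighting)."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {word: math.log(n / count) for word, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector (dict word -> weight) scaled to unit Euclidean length.
    Words unseen in training, or with zero IDF, are dropped."""
    tf = Counter(w for w in tokenize(text) if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items() if idf[w] > 0}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(vectors, labels, lam=0.01, iters=2000, seed=0):
    """Pegasos-style stochastic subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss (no bias term). Labels are +1 / -1."""
    rng = random.Random(seed)
    w = {}
    for t in range(1, iters + 1):
        i = rng.randrange(len(vectors))
        x, y = vectors[i], labels[i]
        eta = 1.0 / (lam * t)  # step size 1 / (lam * t)
        violated = y * dot(w, x) < 1  # margin is checked with w before the update
        # Regularizer step: w <- (1 - eta * lam) * w, i.e. (1 - 1/t) * w
        scale = 1.0 - 1.0 / t
        w = {k: v * scale for k, v in w.items()}
        if violated:
            # Hinge-loss subgradient step toward the misclassified side
            for k, v in x.items():
                w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text, idf):
    return 1 if dot(w, tfidf_vector(text, idf)) > 0 else -1


# Hypothetical toy corpus: +1 = sports, -1 = finance
train_docs = [
    "the team won the game",
    "a great game for the home team",
    "the striker scored in the final game",
    "the stock price fell sharply",
    "investors sold the stock after earnings",
    "the bank raised interest rates on the stock",
]
train_labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(train_docs)
vectors = [tfidf_vector(d, idf) for d in train_docs]
w = train_linear_svm(vectors, train_labels)
print(predict(w, "home team game", idf))  # → 1
print(predict(w, "stock price", idf))  # → -1
```

Because "the" occurs in every training document, its IDF is zero and it drops out of every vector, so the classifier relies only on words that discriminate between the two classes. Note that no feature selection step is applied: every remaining word is kept as a feature.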