Abstract
This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.
Chapter PDF
Keywords
- Support Vector Machine
- Radial Basic Function
- Text Categorization
- Irrelevant Feature
- Linear Threshold Function
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, November 1995.
T. Joachims. A probabilistic analysis of the rocchio algorithm with tfidf for text categorization. In International Conference on Machine Learning (ICML), 1997.
T. Joachims. Text categorization with support vector machines: Learning with many relevant features. Technical Report 23, Universität Dortmund, LS VIII, 1997.
J. Kivinen, M. Warmuth, and P. Auer. The perceptron algorithm vs. winnow: Linear vs. logarithmic mistake bounds when few input variables are relevant. In Conference on Computational Learning Theory, 1995.
T. Mitchell. Machine Learning. McGraw-Hill, 1997.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
J. Rocchio. Relevance feedback in information retrieval. In G. Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313–323. Prentice-Hall Inc., 1971.
G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.
Vladimir N. Vapnik. The Nature of Statistical Learning. Springer, New York, 1995.
Y. Yang. An evaluation of statistical approaches to text categorization. Technical Report CMU-CS-97-127, Carnegie Mellon University, April 1997.
Y. Yang and J. Pedersen. A comparative study on feature selection in text categorization. In International Conference on Machine Learning (ICML), 1997.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Joachims, T. (1998). Text categorization with Support Vector Machines: Learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds) Machine Learning: ECML-98. ECML 1998. Lecture Notes in Computer Science, vol 1398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026683
Download citation
DOI: https://doi.org/10.1007/BFb0026683
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64417-0
Online ISBN: 978-3-540-69781-7
eBook Packages: Springer Book Archive