Comparison of SVM and Some Older Classification Algorithms in Text Classification Tasks

Colas, Fabrice; Brazdil, Pavel

doi:10.1007/978-0-387-34747-9_18


Part of the book series: IFIP International Federation for Information Processing (IFIPAICT, volume 217)

Included in the following conference series:

IFIP International Conference on Artificial Intelligence in Theory and Practice


Summary

Document classification has already been widely studied. Some studies compared feature selection or feature space transformation techniques, whereas others compared the performance of different algorithms. Recently, following the rising interest in the Support Vector Machine (SVM), various studies showed that SVM outperforms other classification algorithms. So should we simply not bother with other classification algorithms and always opt for SVM?

We decided to investigate this issue and compared SVM with kNN and naive Bayes on binary classification tasks. An important point is to compare optimized versions of these algorithms, which is what we have done. Our results show that all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. Naive Bayes also achieved good performance.
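The three classifiers compared in the summary can be illustrated with a minimal, self-contained Python sketch on a binary task. This is not the authors' experimental setup. The toy corpus, every helper name (`tfidf_vectors`, `vectorize`, `knn_predict`, `train_nb`, `train_linear_svm`, `svm_predict`), the use of tf-idf weighting with unit-length normalization as an example of "suitable preprocessing" for kNN, and the Pegasos-style stochastic subgradient solver for the linear SVM are all assumptions made for illustration. The parameters `k`, `alpha` and `lam` are fixed here; since the study stresses comparing optimized versions of the algorithms, in practice each would be tuned on held-out data.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Lowercased whitespace tokenization; real preprocessing would do more.
    return text.lower().split()

def vectorize(tokens, idf):
    # tf-idf weights for known terms, scaled to unit Euclidean length.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def tfidf_vectors(docs):
    # Fit idf on the training documents and return their normalized vectors.
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    return [vectorize(toks, idf) for toks in tokenized], idf

def dot(a, b):
    # Sparse dot product, iterating over the smaller dict.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(train_vecs, labels, query, k=3):
    # On unit-length vectors the dot product equals the cosine similarity.
    sims = sorted(((dot(query, v), y) for v, y in zip(train_vecs, labels)), reverse=True)
    return Counter(y for _, y in sims[:k]).most_common(1)[0][0]

def train_nb(docs, labels, alpha=1.0):
    # Multinomial naive Bayes with additive (Laplace) smoothing.
    counts = {c: Counter() for c in set(labels)}
    for d, y in zip(docs, labels):
        counts[y].update(tokenize(d))
    vocab = set().union(*counts.values())
    priors = Counter(labels)
    model = {}
    for c, cnt in counts.items():
        denom = sum(cnt.values()) + alpha * len(vocab)
        log_lik = {t: math.log((cnt[t] + alpha) / denom) for t in vocab}
        model[c] = (math.log(priors[c] / len(labels)), log_lik)
    return model

def nb_predict(model, doc):
    toks = tokenize(doc)
    def log_posterior(c):
        log_prior, log_lik = model[c]
        # Words never seen in training are ignored.
        return log_prior + sum(log_lik[t] for t in toks if t in log_lik)
    return max(model, key=log_posterior)

def train_linear_svm(vecs, labels, lam=0.01, epochs=50, seed=0):
    # Pegasos-style stochastic subgradient descent on the L2-regularized
    # hinge loss (no bias term); labels must be -1 or +1.
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(vecs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = vecs[i], labels[i]
            violated = y * dot(w, x) < 1.0
            for f in w:  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def svm_predict(w, vec):
    return 1 if dot(w, vec) >= 0.0 else -1

# Hypothetical toy corpus: +1 = "spam", -1 = "work".
train_docs = ["cheap pills buy now", "buy cheap watches now", "win money now cheap",
              "meeting agenda for monday", "project meeting notes monday",
              "agenda for the project review"]
train_labels = [1, 1, 1, -1, -1, -1]

vecs, idf = tfidf_vectors(train_docs)
nb = train_nb(train_docs, train_labels)
w = train_linear_svm(vecs, train_labels)
for q in ["buy cheap pills", "monday project agenda"]:
    qv = vectorize(tokenize(q), idf)
    print(q, knn_predict(vecs, train_labels, qv), nb_predict(nb, q), svm_predict(w, qv))
```

On this toy data all three classifiers label "buy cheap pills" as +1 and "monday project agenda" as -1. The sketch also hints at the scaling trade-off mentioned in the summary: `knn_predict` needs no training but scans every training vector for each query, while `train_linear_svm` performs `epochs` passes of per-document updates over the whole training set before it can predict.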

Author information

Authors and Affiliations

LIACS, Leiden University, The Netherlands
Fabrice Colas
LIACC-NIAAD, University of Porto, Portugal
Pavel Brazdil

Editor information

Editors and Affiliations

University of Portsmouth, UK
Max Bramer

About this paper

Cite this paper

Colas, F., Brazdil, P. (2006). Comparison of SVM and Some Older Classification Algorithms in Text Classification Tasks. In: Bramer, M. (eds) Artificial Intelligence in Theory and Practice. IFIP AI 2006. IFIP International Federation for Information Processing, vol 217. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-34747-9_18

DOI: https://doi.org/10.1007/978-0-387-34747-9_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34654-0
Online ISBN: 978-0-387-34747-9
eBook Packages: Computer Science, Computer Science (R0)
