Abstract
We are concerned with the problem of learning classification rules in text categorization, where many authors have presented Support Vector Machines (SVM) as the leading classification method. A number of studies, however, have repeatedly pointed out that in some situations SVM is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule. In this paper, we aim to develop a better understanding of SVM behaviour in typical text categorization problems, represented by sparse bag-of-words feature spaces. We study in detail the performance and the number of support vectors when varying the training set size, the number of features and, unlike existing studies, also the SVM free parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show that SVM solutions with small C are high performers. However, most training documents are then bounded support vectors sharing the same weight C. Thus, the SVM reduces to a nearest-mean classifier; this raises an interesting question about the merits of SVM in sparse bag-of-words feature spaces. Additionally, SVM suffers from performance deterioration for particular combinations of training set size and number of features.
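The reduction to a nearest-mean classifier follows from the dual form of the linear SVM decision function. A minimal sketch of the standard argument (our illustration, not reproduced from the paper):

\[
f(x) = \sum_{i=1}^{n} \alpha_i y_i \langle x_i, x \rangle + b, \qquad 0 \le \alpha_i \le C .
\]

If every training document is a bounded support vector, then \(\alpha_i = C\) for all \(i\) and

\[
f(x) = C \left( n_+ \langle \mu_+, x \rangle - n_- \langle \mu_-, x \rangle \right) + b, \qquad \mu_\pm = \frac{1}{n_\pm} \sum_{i : y_i = \pm 1} x_i ,
\]

so for balanced classes the sign of \(f(x)\) depends only on which class mean has the larger inner product with \(x\): a nearest-mean rule.

The effect can also be checked empirically with standard tools. The sketch below is an assumption-laden stand-in for the paper's experiments: it uses scikit-learn and the 20 Newsgroups corpus (not the authors' data), varies C on a sparse bag-of-words task, and counts how many support vectors sit at the bound \(\alpha_i = C\):

    # Illustrative sketch, not the authors' experimental code.
    # Assumes scikit-learn; the dataset, categories and C grid are placeholders.
    import numpy as np
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.svm import SVC

    cats = ["alt.atheism", "sci.space"]  # a binary task, as in the paper's setting
    train = fetch_20newsgroups(subset="train", categories=cats)
    test = fetch_20newsgroups(subset="test", categories=cats)

    vec = CountVectorizer()              # sparse bag-of-words representation
    Xtr = vec.fit_transform(train.data)
    Xte = vec.transform(test.data)

    for C in [1e-4, 1e-2, 1.0, 100.0]:
        clf = SVC(kernel="linear", C=C).fit(Xtr, train.target)
        # dual_coef_ stores y_i * alpha_i for the support vectors;
        # bounded support vectors are those with |alpha_i| = C.
        alpha = np.abs(clf.dual_coef_).ravel()
        n_bounded = int(np.isclose(alpha, C).sum())
        acc = accuracy_score(test.target, clf.predict(Xte))
        print(f"C={C:g}  acc={acc:.3f}  SVs={clf.n_support_.sum()}  bounded={n_bounded}")

For very small C, nearly all training documents end up at the bound, at which point the learned hyperplane carries little more information than the two class centroids.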
Cite this paper
Colas, F., Paclík, P., Kok, J.N., Brazdil, P. (2007). Does SVM Really Scale Up to Large Bag of Words Feature Spaces? In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol. 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_27