Abstract
We are concerned with the problem of learning classification rules in text categorization, where many authors have presented Support Vector Machines (SVM) as the leading classification method. A number of studies, however, have repeatedly pointed out that in some situations SVM is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule. In this paper, we aim to develop a better understanding of SVM behaviour in typical text categorization problems, represented by sparse bag-of-words feature spaces. We study in detail the performance and the number of support vectors when varying the training set size, the number of features and, unlike existing studies, also the SVM free parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show that SVM solutions with small C are high performers. However, most training documents are then bounded support vectors sharing the same weight C. Thus, the SVM reduces to a nearest-mean classifier; this raises an interesting question about the merits of SVM in sparse bag-of-words feature spaces. Additionally, SVM suffers from performance deterioration for particular combinations of training set size and number of features.
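The reduction to a nearest-mean classifier follows from the dual form of the linear SVM decision function. A minimal sketch of the standard argument (our illustration, not reproduced from the paper):

\[
f(x) = \sum_{i=1}^{n} \alpha_i y_i \langle x_i, x \rangle + b, \qquad 0 \le \alpha_i \le C .
\]

If every training document is a bounded support vector, then \(\alpha_i = C\) for all \(i\) and

\[
f(x) = C \left( n_+ \langle \mu_+, x \rangle - n_- \langle \mu_-, x \rangle \right) + b, \qquad \mu_\pm = \frac{1}{n_\pm} \sum_{i : y_i = \pm 1} x_i ,
\]

so for balanced classes the sign of \(f(x)\) depends only on which class mean has the larger inner product with \(x\): a nearest-mean rule.

The effect can also be checked empirically with standard tools. The sketch below is an assumption-laden stand-in for the paper's experiments: it uses scikit-learn and the 20 Newsgroups corpus (not the authors' data), varies C on a sparse bag-of-words task, and counts how many support vectors sit at the bound \(\alpha_i = C\):

    # Illustrative sketch, not the authors' experimental code.
    # Assumes scikit-learn; the dataset, categories and C grid are placeholders.
    import numpy as np
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.svm import SVC

    cats = ["alt.atheism", "sci.space"]  # a binary task, as in the paper's setting
    train = fetch_20newsgroups(subset="train", categories=cats)
    test = fetch_20newsgroups(subset="test", categories=cats)

    vec = CountVectorizer()              # sparse bag-of-words representation
    Xtr = vec.fit_transform(train.data)
    Xte = vec.transform(test.data)

    for C in [1e-4, 1e-2, 1.0, 100.0]:
        clf = SVC(kernel="linear", C=C).fit(Xtr, train.target)
        # dual_coef_ stores y_i * alpha_i for the support vectors;
        # bounded support vectors are those with |alpha_i| = C.
        alpha = np.abs(clf.dual_coef_).ravel()
        n_bounded = int(np.isclose(alpha, C).sum())
        acc = accuracy_score(test.target, clf.predict(Xte))
        print(f"C={C:g}  acc={acc:.3f}  SVs={clf.n_support_.sum()}  bounded={n_bounded}")

For very small C, nearly all training documents end up at the bound, at which point the learned hyperplane carries little more information than the two class centroids.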
Cite this paper
Colas, F., Paclík, P., Kok, J.N., Brazdil, P. (2007). Does SVM Really Scale Up to Large Bag of Words Feature Spaces? In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol. 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_27