
Does SVM Really Scale Up to Large Bag of Words Feature Spaces?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4723)

Abstract

We are concerned with the problem of learning classification rules in text categorization, where many authors have presented Support Vector Machines (SVM) as the leading classification method. A number of studies, however, have repeatedly pointed out that in some situations SVM is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule. In this paper, we aim to develop a better understanding of SVM behaviour in typical text categorization problems represented by sparse bag-of-words feature spaces. We study in detail the performance and the number of support vectors when varying the training set size, the number of features and, unlike existing studies, also the SVM free parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show that SVM solutions with small C are high performers. However, most training documents are then bounded support vectors sharing the same weight C. Thus, SVM reduces to a nearest-mean classifier; this raises an interesting question about the merits of SVM in sparse bag-of-words feature spaces. Additionally, SVM suffers from performance deterioration for particular combinations of training set size and number of features.
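
To make the reduction to a nearest-mean classifier concrete, recall the standard soft-margin SVM dual with a linear kernel (a sketch of the textbook formulation; the notation \(\alpha_i\), \(\mu_\pm\), \(n_\pm\) is ours and does not appear in the abstract):

\[
\max_{\alpha}\ \sum_i \alpha_i \;-\; \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_i \alpha_i y_i = 0,
\]

with weight vector \(w = \sum_i \alpha_i y_i x_i\). If (almost) every training document ends up as a bounded support vector, i.e. \(\alpha_i = C\) for all \(i\), then

\[
w \;=\; C\Big(\sum_{i:\,y_i=+1} x_i \;-\; \sum_{i:\,y_i=-1} x_i\Big) \;=\; C\,(n_+ \mu_+ - n_- \mu_-),
\]

where \(\mu_\pm\) are the class mean vectors and \(n_\pm\) the class sizes. For roughly balanced classes this is proportional to \(\mu_+ - \mu_-\), the separating direction of a nearest-mean classifier, which is the degeneracy referred to above. (The equality constraint \(\sum_i \alpha_i y_i = 0\) forces the total dual weight on the two classes to match, so all multipliers can sit exactly at the bound C only when the classes are balanced; in practice most, rather than all, documents are bounded.)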




Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Colas, F., Paclík, P., Kok, J.N., Brazdil, P. (2007). Does SVM Really Scale Up to Large Bag of Words Feature Spaces? In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_27


  • DOI: https://doi.org/10.1007/978-3-540-74825-0_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74824-3

  • Online ISBN: 978-3-540-74825-0

  • eBook Packages: Computer Science (R0)
