Advances in Natural Computation

Lecture Notes in Computer Science, Volume 3610, pp. 554-564

Training Data Selection for Support Vector Machines

  • Jigang Wang (Institute for Brain and Neural Systems, Physics Department, Brown University)
  • Predrag Neskovic (Institute for Brain and Neural Systems, Physics Department, Brown University)
  • Leon N. Cooper (Institute for Brain and Neural Systems, Physics Department, Brown University)

Abstract

In recent years, support vector machines (SVMs) have become a popular tool for pattern recognition and machine learning. Training an SVM involves solving a constrained quadratic programming problem, which, for large-scale problems, requires a large amount of memory and a great deal of training time. However, the SVM decision function is fully determined by a small subset of the training data, called support vectors. It is therefore desirable to remove from the training set the data that are irrelevant to the final decision function. In this paper we propose two new methods that select a subset of the data for SVM training. Using real-world datasets, we compare the effectiveness of the proposed data selection strategies in terms of their ability to reduce the training set size while maintaining the generalization performance of the resulting SVM classifiers. Our experimental results show that a significant amount of training data can be removed by the proposed methods without degrading the performance of the resulting SVM classifiers.
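The abstract does not describe the two proposed selection methods, so the following is only a hypothetical illustration of the general idea, not the authors' algorithm. Because support vectors lie near the class boundary, one simple heuristic ranks each training point by its Euclidean distance to the nearest point of the opposite class and keeps, per class, the closest fraction. The function names `distance_to_opposite_class` and `select_near_boundary` and the `fraction` parameter are assumptions introduced here. Distances are measured in input space, which may not reflect the boundary an SVM with a nonlinear kernel draws in feature space.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance_to_opposite_class(
    x: Point, label: int, X: Sequence[Point], y: Sequence[int]
) -> float:
    """Euclidean distance from x to the closest training point with a different label."""
    return min(math.dist(x, xi) for xi, yi in zip(X, y) if yi != label)


def select_near_boundary(
    X: Sequence[Point], y: Sequence[int], fraction: float
) -> List[int]:
    """Hypothetical heuristic (not the paper's method): for each class, keep the
    ceil(fraction * class_size) points closest to the opposite class, on the
    assumption that points far from the boundary are unlikely to be support vectors.
    Returns the sorted indices of the selected training points."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(set(y)) < 2:
        raise ValueError("need at least two classes")

    # Score every point by how close it lies to the other class.
    scores = [distance_to_opposite_class(x, yi, X, y) for x, yi in zip(X, y)]

    selected: List[int] = []
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        idx.sort(key=lambda i: scores[i])  # closest to the boundary first
        keep = max(1, math.ceil(fraction * len(idx)))
        selected.extend(idx[:keep])
    return sorted(selected)


if __name__ == "__main__":
    # Toy 1-D problem: class -1 on the left, class +1 on the right.
    X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
    y = [-1, -1, -1, 1, 1, 1]
    print(select_near_boundary(X, y, 0.5))  # [1, 2, 3, 4]
```

In the toy example, the two points adjacent to the gap between the classes are ranked first. With `fraction=0.5`, each class keeps two of its three points and the outermost points are discarded. The reduced index set would then be passed to an ordinary SVM trainer.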