Annals of Operations Research, Volume 216, Issue 1, pp 229–255

Relaxing support vectors for classification

  • Onur Şeref
  • Wanpracha A. Chaovalitwongse
  • J. Paul Brooks

Abstract

We introduce a novel modification to standard support vector machine (SVM) formulations based on a limited amount of penalty-free slack that reduces the influence of misclassified samples or outliers. We show that free slack relaxes support vectors and pushes them towards their respective classes; hence we call our method relaxed support vector machines (RSVM). We present theoretical properties of the RSVM formulation and develop its dual formulation for nonlinear classification via kernels, showing the connection between the dual RSVM and the dual of the standard SVM formulation. We provide error bounds for RSVM, prove that it is stable and universally consistent, and show that its error bounds are tighter than those for standard SVM. We also introduce a linear programming version of RSVM, which we call RSVMLP. We apply RSVM and RSVMLP to synthetic data and benchmark binary classification problems, compare our results with standard SVM classification results, and show that relaxing influential support vectors may lead to better classification results. For multiple instance classification (MIC) problems we develop a two-phase method called RSVM2, which uses RSVM formulations as classifiers, and we extend the two-phase method to the linear programming case to obtain RSVMLP2. We demonstrate the classification characteristics of RSVM2 and RSVMLP2, and compare our classification results with those obtained by other SVM-based MIC methods on public benchmark datasets, showing that both RSVM2 and RSVMLP2 are faster and produce more accurate classification results.
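The core idea of the abstract, granting a limited budget of penalty-free slack so that outliers stop distorting the margin, can be illustrated with a minimal numpy sketch. This is not the paper's algorithm: it assumes a linear kernel, trains by plain subgradient descent, and allocates the free-slack budget greedily to the worst violators each epoch, whereas the paper optimizes the allocation exactly inside a QP (RSVM) or an LP (RSVMLP). The function name `train_relaxed_svm` and all parameter values are illustrative.

```python
import numpy as np

def train_relaxed_svm(X, y, C=1.0, free_slack=0.0, lr=0.01, epochs=500):
    """Subgradient descent on a linear soft-margin SVM in which a fixed
    budget of penalty-free slack is handed out, greedily, to the samples
    with the largest margin violations each epoch. Rough sketch only:
    the paper solves the slack allocation exactly inside a QP/LP."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        slack = np.maximum(0.0, 1.0 - y * (X @ w + b))  # hinge slack per sample
        # Greedy free-slack allocation: forgive the worst violators first.
        lam, budget = np.zeros(n), free_slack
        for i in np.argsort(-slack):
            if budget <= 0.0 or slack[i] <= 0.0:
                break
            lam[i] = min(slack[i], budget)
            budget -= lam[i]
        active = (slack - lam) > 1e-12  # samples still incurring penalty
        # Forgiven samples drop out of the subgradient, so they no longer
        # pull the separating hyperplane towards themselves.
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two clean clusters plus one mislabeled outlier at (-4, -4).
X = np.array([[2., 2.], [3., 2.], [2., 3.], [3., 3.],
              [-2., -2.], [-3., -2.], [-2., -3.], [-3., -3.],
              [-4., -4.]])
y = np.array([1., 1., 1., 1., -1., -1., -1., -1., 1.])
w, b = train_relaxed_svm(X, y, C=1.0, free_slack=5.0)
```

With a sufficient budget, the outlier's slack is absorbed penalty-free and the hyperplane is determined by the clean clusters; with `free_slack=0.0` the sketch reduces to an ordinary hinge-loss SVM subgradient method.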

Keywords

Classification · Support vector machines · Error bounds · Multiple instance classification

Notes

Acknowledgements

This research is supported in part by NASA Award NNX09AR44A and NIH-NIAID Award UH3AI08326-01.


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Onur Şeref (1)
  • Wanpracha A. Chaovalitwongse (2)
  • J. Paul Brooks (3)
  1. Business Information Technology, Pamplin College of Business, Virginia Polytechnic Institute and State University, Blacksburg, USA
  2. Department of Industrial and Systems Engineering and Department of Radiology, University of Washington, Seattle, USA
  3. Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, USA
