Machine Learning

Volume 59, Issue 1–2, pp. 77–97

Multicategory Proximal Support Vector Machine Classifiers


Abstract

Given a dataset, each element of which is labeled by one of k labels, we construct, by a very fast algorithm, a k-category proximal support vector machine (PSVM) classifier. Proximal support vector machines and related approaches (Fung & Mangasarian, 2001; Suykens & Vandewalle, 1999) can be interpreted as ridge regression applied to classification problems (Evgeniou, Pontil, & Poggio, 2000). Extensive computational results have shown the effectiveness of PSVM for two-class classification problems, where the separating plane is constructed in time that can be as much as two orders of magnitude shorter than that of conventional support vector machines. When PSVM is applied to problems with more than two classes, the well-known one-from-the-rest approach is a natural choice for exploiting its fast performance. However, this approach has a drawback: the resulting two-class problems are often very unbalanced, which in some cases leads to poor performance. To deal with this problem we propose two modifications to PSVM: balancing the k classes and a novel Newton refinement. Computational results indicate that these two modifications preserve the speed of PSVM while often leading to significant test set improvement over a plain one-from-the-rest PSVM application. The modified approach is considerably faster than other one-from-the-rest methods that use conventional SVM formulations, while still giving comparable test set correctness.
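For concreteness, a minimal sketch of a balanced one-from-the-rest linear PSVM follows. It assumes the proximal formulation of Fung and Mangasarian (2001), in which each binary classifier is obtained by solving a single (n+1)-by-(n+1) linear system, together with a simple inverse-class-size weighting as the balancing scheme; the function and parameter names (psvm_fit, nu, balance) are illustrative, and the Newton refinement step is not shown.

```python
# A rough sketch (not the authors' code) of balanced one-from-the-rest
# proximal SVM classification. The balancing weights are one plausible
# scheme, not necessarily the one used in the paper.
import numpy as np

def psvm_fit(A, d, nu=1.0, weights=None):
    """Linear PSVM: minimize (nu/2)||W^(1/2)(D(Aw - e*gamma) - e)||^2
    + (1/2)||(w, gamma)||^2 by solving one (n+1)x(n+1) linear system."""
    m, n = A.shape
    H = np.hstack([A, -np.ones((m, 1))])   # H = [A  -e]
    if weights is None:
        weights = np.ones(m)
    HW = H * weights[:, None]               # rows of H scaled by W
    lhs = np.eye(n + 1) / nu + H.T @ HW     # I/nu + H'WH
    rhs = H.T @ (weights * d)               # H'WDe, with labels d in {+1, -1}
    u = np.linalg.solve(lhs, rhs)
    return u[:-1], u[-1]                    # separating plane (w, gamma)

def one_vs_rest_fit(A, y, nu=1.0, balance=True):
    """One balanced binary PSVM per class."""
    classes = np.unique(y)
    planes = []
    for c in classes:
        d = np.where(y == c, 1.0, -1.0)
        n_pos, n_neg = (d == 1).sum(), (d == -1).sum()
        # Weight each point inversely to the size of its binary class,
        # so the large "rest" class does not dominate the fit.
        w = np.where(d == 1, 1.0 / n_pos, 1.0 / n_neg) if balance else None
        planes.append(psvm_fit(A, d, nu, w))
    return classes, planes

def one_vs_rest_predict(A, classes, planes):
    """Assign each point to the class whose plane gives the largest score."""
    scores = np.column_stack([A @ w - g for w, g in planes])
    return classes[np.argmax(scores, axis=1)]
```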

Keywords

multicategory data classification, support vector machines, proximal classifiers

References

  1. Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., & Sorensen, D. (1999). LAPACK User’s Guide, third edition. Philadelphia, Pennsylvania: SIAM. http://www.netlib.org/lapack/.
  2. Bennett, K. P. & Mangasarian, O. L. (1993). Multicategory separation via linear programming. Optimization Methods and Software, 3, 27–39.
  3. Bottou, L., Cortes, C., Denker, J., Drucker, H., Guyon, I., Jackel, L., LeCun, Y., Muller, U., Sackinger, E., Simard, P., & Vapnik, V. (1994). Comparison of classifier methods: A case study in handwritten digit recognition. International Conference on Pattern Recognition (pp. 77–87). IEEE Computer Society Press.
  4. Bradley, P. S. & Mangasarian, O. L. (2000). Massive data discrimination via linear support vector machines. Optimization Methods and Software, 13, 1–10. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.
  5. Bredensteiner, E. J. & Bennett, K. P. (1999). Multicategory classification by support vector machines. Computational Optimization and Applications, 12, 53–79.
  6. Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:2, 121–167.
  7. Chen, C. & Mangasarian, O. L. (1996). Hybrid misclassification minimization. Advances in Computational Mathematics, 5:2, 127–136. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/95-05.ps.
  8. Cherkassky, V. & Mulier, F. (1998). Learning from Data: Concepts, Theory and Methods. New York: John Wiley & Sons.
  9. CPLEX Optimization Inc., Incline Village, Nevada. (1992). Using the CPLEX(TM) Linear Optimizer and CPLEX(TM) Mixed Integer Optimizer (Version 2.0).
  10. Dasgupta, S. (2000). Experiments with random projection. Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000) (pp. 143–151). San Francisco, CA: Morgan Kaufmann Publishers.
  11. Evgeniou, T., Pontil, M., & Poggio, T. (2000). Regularization networks and support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 171–203). Cambridge, MA: MIT Press.
  12. Facchinei, F. (1995). Minimization of SC^1 functions and the Maratos effect. Operations Research Letters, 17, 131–137.
  13. Fung, G. & Mangasarian, O. L. (2001). Proximal support vector machine classifiers. In F. Provost & R. Srikant (Eds.), Proceedings KDD-2001: Knowledge Discovery and Data Mining, San Francisco, CA (pp. 77–86). New York: Association for Computing Machinery. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/01-02.ps.
  14. Furey, T. S., Duffy, N., Cristianini, N., Bednarski, D., Schummer, M., & Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16:10, 906–914.
  15. Van Gestel, T., Suykens, J., Lanckriet, G., Lambrechts, A., De Moor, B., & Vandewalle, J. (2002). Multiclass LS-SVMs: Moderated outputs and coding-decoding schemes. Neural Processing Letters, 15:1, 45–48.
  16. Hiriart-Urruty, J.-B., Strodiot, J. J., & Nguyen, V. H. (1984). Generalized Hessian matrix and second-order optimality conditions for problems with C^{1,1} data. Applied Mathematics and Optimization, 11, 43–56.
  17. Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.
  18. Hsu, C.-W. & Lin, C.-J. (2001). A comparison of methods for multi-class support vector machines. http://www.csie.ntu.edu.tw/cjlin/papers.html.
  19. Kanzow, C., Qi, H., & Qi, L. (2001). On the minimum norm solution of linear programs. Preprint, University of Hamburg, Hamburg. Journal of Optimization Theory and Applications, to appear. http://www.math.uni-hamburg.de/home/kanzow/paper.html.
  20. Lee, Y.-J. & Mangasarian, O. L. (2001). RSVM: Reduced support vector machines. Proceedings of the First SIAM International Conference on Data Mining, Chicago, April 5–7, CD-ROM. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-07.ps.
  21. Lee, Y.-J. & Mangasarian, O. L. (2001). SSVM: A smooth support vector machine. Computational Optimization and Applications, 20, 5–22. Data Mining Institute, University of Wisconsin, Technical Report 99-03. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.ps.
  22. Mangasarian, O. L. (1994). Nonlinear Programming. Philadelphia, PA: SIAM.
  23. Mangasarian, O. L. (2000). Generalized support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 135–146). Cambridge, MA: MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps.
  24. MATLAB. (1994–2001). User’s Guide. The MathWorks, Inc., Natick, MA 01760. http://www.mathworks.com.
  25. Murphy, P. M. & Aha, D. W. (1992). UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  26. Polyak, B. T. (1987). Introduction to Optimization. New York: Optimization Software, Inc., Publications Division.
  27. Rockafellar, R. T. (1970). Convex Analysis. Princeton, New Jersey: Princeton University Press.
  28. Roth, V. & Steinhage, V. (1999). Nonlinear discriminant analysis using kernel functions. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in Neural Information Processing Systems (NIPS*99) (pp. 568–574).
  29. Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K.-R., Rätsch, G., & Smola, A. J. (1999). Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10, 1000–1017.
  30. Smola, A. J. & Schölkopf, B. (2000). Sparse greedy matrix approximation for machine learning. Proc. 17th International Conf. on Machine Learning (pp. 911–918). San Francisco, CA: Morgan Kaufmann.
  31. Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B., & Vandewalle, J. (2002). Least Squares Support Vector Machines. Singapore: World Scientific Publishing Co.
  32. Suykens, J. A. K., Lukas, L., Van Dooren, P., De Moor, B., & Vandewalle, J. (1999). Least squares support vector machine classifiers: A large scale algorithm. European Conference on Circuit Theory and Design, ECCTD'99 (pp. 839–842). Stresa, Italy.
  33. Suykens, J. A. K. & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9:3, 293–300.
  34. Suykens, J. A. K. & Vandewalle, J. (1999). Multiclass least squares support vector machines. Proceedings of IJCNN'99, Washington, DC. CD-ROM.
  35. Tikhonov, A. N. & Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. New York: John Wiley & Sons.
  36. Vapnik, V. N. (2000). The Nature of Statistical Learning Theory. New York: Springer.
  37. Weston, J. & Watkins, C. (1998). Multi-class support vector machines. Technical Report CSD-TR-98-04, Royal Holloway, University of London, Surrey, England.
  38. Williams, C. K. I. & Seeger, M. (2000). Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems (NIPS2000). http://www.kernel-machines.org.

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. Computer-Aided Diagnosis & Therapy Solutions, Siemens Medical Solutions, Inc., Malvern
  2. Computer Sciences Department, University of Wisconsin, Madison
  3. Department of Mathematics, University of California at San Diego, La Jolla