# Data Discrimination via Nonlinear Generalized Support Vector Machines

## Abstract

The main purpose of this paper is to show that new formulations of support vector machines can generate nonlinear separating surfaces which can discriminate between elements of a given set better than a linear surface. The principal approach used is that of generalized support vector machines (GSVMs) [21] which employ possibly indefinite kernels. The GSVM training procedure is carried out by either a simple successive overrelaxation (SOR) [22] iterative method or by linear programming. This novel combination of powerful support vector machines [28, 7] with the highly effective SOR computational algorithm [19, 20, 17], or with linear programming, allows us to use a *nonlinear* surface to discriminate between elements of a dataset that belong to one of two categories. Numerical results on a number of datasets show improved testing set correctness, by as much as a factor of two, when comparing the nonlinear GSVM surface to a linear separating surface.
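To make the SOR-based training described above concrete, here is a minimal sketch and not the paper's exact formulation. It assumes the common box-constrained dual in which the bias is absorbed into the kernel (the `+ 1.0` term), a positive definite Gaussian kernel rather than a general indefinite one, and a four-point XOR-style toy dataset. All function names, the kernel width `mu`, the bound `C`, and the relaxation factor `omega` are hypothetical choices made for illustration.

```python
import math

def gaussian_kernel(x, y, mu=1.0):
    """Gaussian kernel exp(-mu * ||x - y||^2); mu is an assumed width parameter."""
    return math.exp(-mu * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear_kernel(x, y):
    """Plain inner product, which yields a linear separating surface."""
    return sum(a * b for a, b in zip(x, y))

def train_sor(X, d, kernel, C=10.0, omega=1.0, sweeps=200):
    """Projected SOR on: min 0.5*u'Mu - e'u  subject to 0 <= u <= C,
    with M[i][j] = d[i]*d[j]*(K(x_i, x_j) + 1). The +1 absorbs the bias."""
    m = len(X)
    M = [[d[i] * d[j] * (kernel(X[i], X[j]) + 1.0) for j in range(m)]
         for i in range(m)]
    u = [0.0] * m
    for _ in range(sweeps):
        for i in range(m):
            # Gradient component i, using the most recently updated u values.
            grad_i = sum(M[i][j] * u[j] for j in range(m)) - 1.0
            # Relaxed step followed by projection onto the box [0, C].
            u[i] = min(C, max(0.0, u[i] - omega * grad_i / M[i][i]))
    return u

def classify(x, X, d, u, kernel):
    """Sign of the kernel expansion; returns +1 or -1."""
    s = sum(u[j] * d[j] * (kernel(x, X[j]) + 1.0) for j in range(len(X)))
    return 1 if s >= 0 else -1

# XOR-style toy data (an assumption, not one of the paper's datasets).
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
d = [1, 1, -1, -1]

u_rbf = train_sor(X, d, gaussian_kernel)
pred_rbf = [classify(x, X, d, u_rbf, gaussian_kernel) for x in X]

u_lin = train_sor(X, d, linear_kernel)
pred_lin = [classify(x, X, d, u_lin, linear_kernel) for x in X]
```

On this toy set the Gaussian-kernel predictions reproduce the labels, whereas the linear kernel must misclassify at least one point because no affine function separates the XOR pattern. With an indefinite kernel the matrix `M` need not be positive semidefinite, so this simple iteration would not carry the same convergence guarantee.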

## Keywords

Support Vector Machine, Linear Complementarity Problem, Linear Kernel, Linear Programming Formulation, Convex Quadratic Program

## References

- [1] K. P. Bennett, D. Hui, and L. Auslender. On support vector decision trees for database marketing. Department of Mathematical Sciences Math Report No. 98–100, Rensselaer Polytechnic Institute, Troy, NY 12180, March 1998. http://www.math.rpi.edu/~bennek/.
- [2] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, PA, July 1992. ACM Press.
- [3] P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In J. Shavlik, editor, Machine Learning Proceedings of the Fifteenth International Conference (ICML ’98), pages 82–90, San Francisco, California, 1998. Morgan Kaufmann. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98–03.ps.
- [4] E. J. Bredensteiner. Optimization Methods in Data Mining and Machine Learning. PhD thesis, Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY, 1997.
- [5] E. J. Bredensteiner and K. P. Bennett. Feature minimization within decision trees. Computational Optimization and Applications, 10:111–126, 1998.
- [6] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
- [7] V. Cherkassky and F. Mulier. Learning from Data: Concepts, Theory and Methods. John Wiley & Sons, New York, 1998.
- [8] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963.
- [9] R. De Leone and O. L. Mangasarian. Serial and parallel solution of large scale linear programs by augmented Lagrangian successive overrelaxation. In A. Kurzhanski, K. Neumann, and D. Pallaschke, editors, Optimization, Parallel Processing and Applications, pages 103–124, Berlin, 1988. Springer-Verlag. Lecture Notes in Economics and Mathematical Systems 304.
- [10] R. De Leone, O. L. Mangasarian, and T.-H. Shiau. Multi-sweep asynchronous parallel successive overrelaxation for the nonsymmetric linear complementarity problem. Annals of Operations Research, 22:43–54, 1990.
- [11] R. De Leone and M. A. Tork Roth. Massively parallel solution of quadratic programs via successive overrelaxation. Concurrency: Practice and Experience, 5:623–634, 1993.
- [12] M. C. Ferris and O. L. Mangasarian. Parallel constraint distribution. SIAM Journal on Optimization, 1(4):487–500, 1991.
- [13] T.-T. Friess. Support vector neural networks: The kernel adatron with bias and soft margin. Technical report, Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, England, 1998. Revised Version: http://www.brunner-edv.com/friess/.
- [14] T.-T. Friess, N. Cristianini, and C. Campbell. The kernel-adatron algorithm: A fast and simple learning procedure for support vector machines. In Jude Shavlik, editor, Machine Learning Proceedings of the Fifteenth International Conference (ICML’98), pages 188–196, San Francisco, 1998. Morgan Kaufmann. http://www.svm.first.gmd.de/papers/FriCriCam98.ps.gz.
- [15] Tin Kam Ho and Eugene M. Kleinberg. Building projectable classifiers of arbitrary complexity. In Proceedings of the 13th International Conference on Pattern Recognition, pages 880–885, Vienna, Austria, 1996. http://www.cm.bell-labs.com/who/tkh/pubs.html. Checker dataset at: ftp://ftp.cs.wisc.edu/math-prog/cpo-dataset/machine-learn/checker.
- [16] L. Kaufman. Solving the quadratic programming problem arising in support vector classification. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 147–167. MIT Press, 1999.
- [17] Z.-Q. Luo and P. Tseng. Error bounds and convergence analysis of feasible descent methods: A general approach. Annals of Operations Research, 46:157–178, 1993.
- [18] O. L. Mangasarian. Nonlinear Programming. McGraw-Hill, New York, 1969. Reprint: SIAM Classic in Applied Mathematics 10, 1994, Philadelphia.
- [19] O. L. Mangasarian. Solution of symmetric linear complementarity problems by iterative methods. Journal of Optimization Theory and Applications, 22(4):465–485, August 1977.
- [20] O. L. Mangasarian. On the convergence of iterates of an inexact matrix splitting algorithm for the symmetric monotone linear complementarity problem. SIAM Journal on Optimization, 1:114–122, 1991.
- [21] O. L. Mangasarian. Generalized support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 135–146, Cambridge, MA, 2000. MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98–14.ps.
- [22] O. L. Mangasarian and David R. Musicant. Successive overrelaxation for support vector machines. IEEE Transactions on Neural Networks, 10:1032–1037, 1999. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98–18.ps.
- [23] Matlab. User’s Guide. The MathWorks, Inc., Natick, MA 01760, 1992.
- [24] Matlab. Application Program Interface Guide. The MathWorks, Inc., Natick, MA 01760, 1997.
- [25] P. M. Murphy and D. W. Aha. UCI repository of machine learning databases, 1992. www.ics.uci.edu/~mlearn/MLRepository.html.
- [26] B. A. Murtagh and M. A. Saunders. MINOS 5.0 user’s guide. Technical Report SOL 83.20, Stanford University, December 1983. MINOS 5.4 Release Notes, December 1992.
- [27] B. T. Polyak. Introduction to Optimization. Optimization Software, Inc., Publications Division, New York, 1987.
- [28] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.