Machine Learning, Volume 46, Issue 1–3, pp 255–269

Large Scale Kernel Regression via Linear Programming

  • O.L. Mangasarian
  • David R. Musicant


The problem of tolerant data fitting by a nonlinear surface, induced by a kernel-based support vector machine, is formulated as a linear program with fewer variables than other linear programming formulations. A generalization of the linear programming chunking algorithm to arbitrary kernels is implemented for solving problems with very large datasets, wherein chunking is performed on both data points and problem variables. The proposed approach tolerates a small error, which is adjusted parametrically, while fitting the given data. This leads to improved fitting of noisy data (over ordinary least error solutions), as demonstrated computationally. Comparative numerical results indicate an average time reduction as high as 26.0% over other formulations, with a maximal time reduction of 79.7%. Additionally, linear programs with as many as 16,000 data points and more than a billion nonzero matrix elements are solved.
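The idea of tolerating a small error while fitting can be illustrated with a minimal sketch. It assumes the common epsilon-insensitive form of tolerant error, in which residuals no larger than a tolerance contribute nothing. The abstract does not state the paper's exact formulation, so the function name `tolerant_error` and the parameter `eps` are hypothetical and used only for illustration.

```python
def tolerant_error(residuals, eps):
    """Sum of residual magnitudes in excess of the tolerance eps.

    Assumption: epsilon-insensitive error. A residual r contributes
    max(|r| - eps, 0), so any fit within eps of the data costs nothing.
    """
    if eps < 0:
        raise ValueError("eps must be nonnegative")
    return sum(max(abs(r) - eps, 0.0) for r in residuals)


# Residuals (observed value minus fitted value) for three data points.
residuals = [0.05, -0.2, 0.3]

# With eps = 0 this reduces to the ordinary sum of absolute errors.
plain = tolerant_error(residuals, 0.0)

# With eps = 0.1 the first residual falls inside the tolerance and is ignored.
tolerant = tolerant_error(residuals, 0.1)
```

Because this error measure is piecewise linear in the residuals, minimizing it can be posed as a linear program, which is consistent with the linear programming setting described in the abstract.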

kernel regression, support vector machines, linear programming



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • O.L. Mangasarian (1)
  • David R. Musicant (2)
  1. Computer Sciences Department, University of Wisconsin, Madison, USA
  2. Department of Mathematics and Computer Science, Carleton College, Northfield, USA
