Machine Learning, Volume 46, Issue 1–3, pp 271–290

Efficient SVM Regression Training with SMO

  • Gary William Flake
  • Steve Lawrence


The sequential minimal optimization algorithm (SMO) has been shown to be an effective method for training support vector machines (SVMs) on classification tasks defined on sparse data sets. SMO differs from most SVM algorithms in that it does not require a quadratic programming solver. In this work, we generalize SMO so that it can handle regression problems. However, one problem with SMO is that its rate of convergence slows down dramatically when data is non-sparse and when there are many support vectors in the solution—as is often the case in regression—because kernel function evaluations tend to dominate the runtime in this case. Moreover, caching kernel function outputs can easily degrade SMO's performance even more because SMO tends to access kernel function outputs in an unstructured manner. We address these problems with several modifications that enable caching to be effectively used with SMO. For regression problems, our modifications improve convergence time by over an order of magnitude.
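The abstract's point about caching can be illustrated with a small, hypothetical experiment. The sketch below is not the authors' caching scheme. It assumes a plain least-recently-used (LRU) cache of full kernel rows, a Gaussian (RBF) kernel with gamma = 0.5, and synthetic 2-D data, all chosen only for illustration. It counts cache hits and kernel evaluations for a structured access pattern (a small working set revisited repeatedly) and an unstructured one (row indices drawn uniformly at random).

```python
from collections import OrderedDict
import math
import random


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel and gamma are illustrative choices."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class KernelRowCache:
    """Hypothetical LRU cache holding at most `capacity` full kernel rows."""

    def __init__(self, data, capacity, kernel=rbf_kernel):
        self.data = data
        self.capacity = capacity
        self.kernel = kernel
        self.rows = OrderedDict()  # row index -> [K(x_i, x_j) for every j]
        self.hits = 0
        self.misses = 0
        self.evaluations = 0       # total kernel function calls made

    def row(self, i):
        if i in self.rows:
            self.rows.move_to_end(i)  # mark row i as most recently used
            self.hits += 1
            return self.rows[i]
        # Miss: compute the whole row, costing len(data) kernel evaluations.
        self.misses += 1
        xi = self.data[i]
        r = [self.kernel(xi, xj) for xj in self.data]
        self.evaluations += len(self.data)
        self.rows[i] = r
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
        return r

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(accesses, data, capacity):
    """Replay a sequence of row requests against a fresh cache."""
    cache = KernelRowCache(data, capacity)
    for i in accesses:
        cache.row(i)
    return cache


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(100)]

# Structured: repeatedly revisit a working set of 10 rows that fits in the cache.
structured = run([i for _ in range(100) for i in range(10)], data, capacity=10)
# Unstructured: request rows uniformly at random from all 100 points.
unstructured = run([rng.randrange(100) for _ in range(1000)], data, capacity=10)
# Cyclic over 11 rows with room for only 10: LRU evicts each row just before reuse.
cyclic = run([i for _ in range(10) for i in range(11)], data, capacity=10)

print(structured.hit_rate(), structured.evaluations)
print(unstructured.hit_rate(), unstructured.evaluations)
print(cyclic.hit_rate())
```

The structured run should report a hit rate of 0.99 with only 1000 kernel evaluations (ten rows computed once each). Under uniformly random access the hit rate falls to roughly capacity/n (about 0.1 here), and the cyclic pattern gets no hits at all, so almost every request pays for a full row of n evaluations. This is only one simple way to see why an access pattern that does not revisit a small working set can make caching ineffective.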

Keywords: support vector machines; sequential minimal optimization; regression; caching; quadratic programming; optimization


Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Gary William Flake, NEC Research Institute, Princeton, USA
  • Steve Lawrence, NEC Research Institute, Princeton, USA
