The Relaxed Online Maximum Margin Algorithm

Abstract

We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe experiments in which ROMMA, and a variant that updates its hypothesis more aggressively, are used as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms are similar to those of the perceptron algorithm, but their generalization is much better. We show that a batch algorithm based on aggressive ROMMA converges to the fixed-threshold SVM hypothesis.
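The abstract does not spell out the relaxation, so the following is only a minimal sketch under stated assumptions. It assumes the hypothesis passes through the origin (no threshold term) and that the relaxed problem keeps just two constraints: the margin constraint y(w'·x) ≥ 1 for the current example, and a single half-space constraint w'·w ≥ ||w||² standing in for all earlier examples. Solving for the shortest w' = c·w + d·x with both constraints tight gives the coefficients used below. The names `romma_update` and `romma_train` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def romma_update(w: Sequence[float], x: Sequence[float], y: int) -> Vector:
    """Shortest w' with y*(w'.x) = 1 and w'.w = |w|^2 (assumed relaxation)."""
    xx = dot(x, x)
    ww = dot(w, w)
    if xx == 0.0:
        raise ValueError("zero example cannot be separated through the origin")
    if ww == 0.0:
        # No past constraint yet: only the margin constraint applies.
        return [y * xi / xx for xi in x]
    wx = dot(w, x)
    denom = xx * ww - wx * wx
    if denom <= 1e-12 * xx * ww:
        # x is (anti)parallel to w, so the two constraints conflict.
        raise ValueError("constraints are infeasible for this example")
    c = (xx * ww - y * wx) / denom
    d = ww * (y - wx) / denom
    return [c * wi + d * xi for wi, xi in zip(w, x)]


def romma_train(
    examples: Sequence[Tuple[Sequence[float], int]], epochs: int = 1
) -> Tuple[Vector, int]:
    """Mistake-driven loop: update only when the current example is misclassified."""
    w: Vector = [0.0] * len(examples[0][0])
    mistakes = 0
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0.0:
                w = romma_update(w, x, y)
                mistakes += 1
    return w, mistakes
```

Under the same assumptions, the more aggressive variant mentioned above would presumably trigger the update on small margins rather than only on mistakes; that case may need extra handling when the old half-space constraint is no longer tight, which this sketch does not attempt.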

Cite this article

Li, Y., Long, P.M. The Relaxed Online Maximum Margin Algorithm. Machine Learning 46, 361–387 (2002). https://doi.org/10.1023/A:1012435301888

Keywords

  • online learning
  • large margin classifiers
  • perceptrons
  • support vector machines