
A deterministic rescaled perceptron algorithm

Full Length Paper · Mathematical Programming, Series A

Abstract

The perceptron algorithm is a simple iterative procedure for finding a point in a convex cone \(F\subseteq \mathbb {R}^m\). At each iteration, the algorithm only involves a query to a separation oracle for \(F\) and a simple update on a trial solution. The perceptron algorithm is guaranteed to find a point in \(F\) after \(\mathcal O(1/\tau _F^2)\) iterations, where \(\tau _F\) is the width of the cone \(F\). We propose a version of the perceptron algorithm that includes a periodic rescaling of the ambient space. In contrast to the classical version, our rescaled version finds a point in \(F\) in \(\mathcal O(m^5 \log (1/\tau _F))\) perceptron updates. This result is inspired by and strengthens the previous work on randomized rescaling of the perceptron algorithm by Dunagan and Vempala (Math Program 114:101–114, 2006) and by Belloni et al. (Math Oper Res 34:621–641, 2009). In particular, our algorithm and its complexity analysis are simpler and shorter. Furthermore, our algorithm does not require randomization or deep separation oracles.
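
To illustrate the classical update described above (not the rescaled algorithm proposed in this paper), the following is a minimal sketch for the special case where \(F\) is given explicitly as \(\{x : Ax > 0\}\), so the separation oracle simply returns a violated row of \(A\). The matrix `A` below is an assumed toy instance.

```python
import numpy as np

def perceptron(A, max_iters=10000):
    """Classical perceptron for the cone {x : A @ x > 0}.

    Each row of A plays the role of a separation-oracle answer: whenever
    A[i] @ x <= 0, the trial solution is updated by x += A[i].  This is
    the classical O(1/tau^2) scheme, not the paper's rescaled version.
    """
    A = A / np.linalg.norm(A, axis=1, keepdims=True)  # normalize oracle answers
    x = np.zeros(A.shape[1])
    for _ in range(max_iters):
        violated = np.flatnonzero(A @ x <= 0)
        if violated.size == 0:
            return x                  # x is strictly feasible
        x = x + A[violated[0]]        # perceptron update
    return None                       # iteration budget exhausted

# A toy feasible instance: every row has positive inner product with (1, 1).
A = np.array([[1.0, 0.2], [0.3, 1.0], [0.8, 0.6]])
x = perceptron(A)
```

The \(\mathcal O(1/\tau_F^2)\) bound is costly precisely when the cone is narrow; the periodic rescaling of the ambient space proposed in the paper is what replaces this dependence by \(\log(1/\tau_F)\).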


References

  1. Agmon, S.: The relaxation method for linear inequalities. Can. J. Math. 6(3), 382–392 (1954)

  2. Amaldi, E., Belotti, P., Hauser, R.: A randomized algorithm for the maxFS problem. In: IPCO, pp. 249–264 (2005)

  3. Amaldi, E., Hauser, R.: Boundedness theorems for the relaxation method. Math. Oper. Res. 30(4), 939–955 (2005)

  4. Ball, K.: An Elementary Introduction to Modern Convex Geometry. Flavors of Geometry, vol. 31, pp. 1–58. Cambridge University Press, Cambridge (1997)

  5. Bauschke, H.H., Borwein, J.M.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4, 27–67 (1997)

  6. Bauschke, H.H., Borwein, J.M., Lewis, A.: The method of cyclic projections for closed convex sets in Hilbert space. Contemp. Math. 204, 1–38 (1997)

  7. Belloni, A., Freund, R., Vempala, S.: An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res. 34(3), 621–641 (2009)

  8. Betke, U.: Relaxation, new combinatorial and polynomial algorithms for the linear feasibility problem. Discrete Comput. Geom. 32, 317–338 (2004)

  9. Block, H.D.: The perceptron: a model for brain functioning. Rev. Mod. Phys. 34, 123–135 (1962)

  10. Blum, A., Frieze, A., Kannan, R., Vempala, S.: A polynomial-time algorithm for learning noisy linear threshold functions. Algorithmica 22(1–2), 35–52 (1998)

  11. Chubanov, S.: A strongly polynomial algorithm for linear systems having a binary solution. Math. Program. 134, 533–570 (2012)

  12. Dunagan, J., Vempala, S.: A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program. 114(1), 101–114 (2006)

  13. Fleming, W.: Functions of Several Variables. Springer, New York (1977)

  14. Freund, Y., Schapire, R.: Large margin classification using the perceptron algorithm. Mach. Learn. 37, 277–296 (1999)

  15. Gilpin, A., Peña, J., Sandholm, T.: First-order algorithm with \({\cal {O}}({\ln }(1/\epsilon ))\) convergence for \(\epsilon \)-equilibrium in two-person zero-sum games. Math. Program. 133, 279–298 (2012)

  16. Goffin, J.: The relaxation method for solving systems of linear inequalities. Math. Oper. Res. 5, 388–414 (1980)

  17. Goffin, J.: On the non-polynomiality of the relaxation method for systems of linear inequalities. Math. Program. 22, 93–103 (1982)

  18. Huber, G.: Gamma function derivation of \(n\)-sphere volumes. Am. Math. Mon. 89, 301–302 (1982)

  19. Motzkin, T.S., Schoenberg, I.J.: The relaxation method for linear inequalities. Can. J. Math. 6(3), 393–404 (1954)

  20. Novikoff, A.B.J.: On convergence proofs on perceptrons. In: Proceedings of the Symposium on the Mathematical Theory of Automata, vol. XII, pp. 615–622 (1962)

  21. O’Donoghue, B., Candès, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. (2013) doi:10.1007/s10208-013-9150-3

  22. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Cornell Aeronaut. Lab. Psychol. Rev. 65(6), 386–408 (1958)

  23. Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127, 3–30 (2011)

  24. Soheili, N., Peña, J.: A smooth perceptron algorithm. SIAM J. Optim. 22(2), 728–737 (2012)

Author information

Correspondence to Javier Peña.

Appendix: Proof of (7) and (8)

Let \(v \in \text {int}(\mathbb {B}^{m-1})\) be given. The volumes \({\left| \left| \left| (\Psi \circ \Phi )'(v) \right| \right| \right| }\) and \({\left| \left| \left| \Phi '(v) \right| \right| \right| }\) are \(\sqrt{\det \left( (\Psi \circ \Phi )'(v)^\mathrm{T}(\Psi \circ \Phi )'(v)\right) }\) and \(\sqrt{\det \left( \Phi '(v)^\mathrm{T}\Phi '(v)\right) }\) respectively. Thus (7) and (8) are equivalent to

$$\begin{aligned} \det \left( (\Psi \circ \Phi )'(v)^\mathrm{T}(\Psi \circ \Phi )'(v)\right) = \dfrac{\alpha ^2}{(1-\Vert v\Vert ^2)\left( \alpha ^2 + (1-\alpha ^2) \Vert v\Vert ^2\right) ^m}, \end{aligned}$$
(15)

and

$$\begin{aligned} \det \left( \Phi '(v)^\mathrm{T}\Phi '(v)\right) = \dfrac{1}{1-\Vert v\Vert ^2}. \end{aligned}$$
(16)

We first prove (15). To simplify notation, put \(t^2 := \alpha ^2+\left( 1-\alpha ^2\right) \Vert v\Vert ^2\). Computing the partial derivatives of \((\Psi \circ \Phi )(v) = \frac{\left( v, \alpha \sqrt{1-\Vert v\Vert ^2}\right) }{\sqrt{\alpha ^2 + (1-\alpha ^2)\Vert v\Vert ^2}}\) we get

$$\begin{aligned} (\Psi \circ \Phi )'(v)&= \begin{bmatrix} \dfrac{\partial (\Psi \circ \Phi )(v)}{\partial v_1}&\cdots&\dfrac{\partial (\Psi \circ \Phi )(v)}{\partial v_{m-1}} \end{bmatrix} \\&= \begin{bmatrix} \dfrac{1}{t}I_{m-1} - \dfrac{(1-\alpha ^2)}{t^3}vv^\mathrm{T}\\ -\dfrac{\alpha }{t^3\sqrt{1-\Vert v\Vert ^2}}v^\mathrm{T}\end{bmatrix}, \end{aligned}$$

where \(I_{m-1}\) is the \((m-1)\times (m-1)\) identity matrix. Hence,

$$\begin{aligned} {(\Psi \circ \Phi )'(v) }^\mathrm{T}(\Psi \circ \Phi )'(v) = \dfrac{1}{t^2}I_{m-1} + F vv^\mathrm{T}, \end{aligned}$$

where

$$\begin{aligned} F&= -\dfrac{2(1-\alpha ^2)}{t^4} + \dfrac{(1-\alpha ^2)^2\Vert v\Vert ^2}{t^6} + \dfrac{\alpha ^2}{t^6(1-\Vert v\Vert ^2)} \\&= \frac{-2(1-\alpha ^2)t^2(1-\Vert v\Vert ^2) + (1-\alpha ^2)^2\Vert v\Vert ^2(1-\Vert v\Vert ^2)+\alpha ^2}{t^6(1-\Vert v\Vert ^2)} \\&= \frac{(1-\alpha ^2)(1-\Vert v\Vert ^2)(-2t^2 + (1-\alpha ^2)\Vert v\Vert ^2)+\alpha ^2}{t^6(1-\Vert v\Vert ^2)} \\&= \frac{(1-t^2)(-t^2-\alpha ^2)+\alpha ^2}{t^6(1-\Vert v\Vert ^2)} \\&= \frac{\alpha ^2-1+t^2}{t^4(1-\Vert v\Vert ^2)}. \end{aligned}$$

The fourth step above follows from \(t^2 = 1-(1-\alpha ^2)(1-\Vert v\Vert ^2) = \alpha ^2 + (1-\alpha ^2)\Vert v\Vert ^2\).

Hence we have

$$\begin{aligned} \det \left( {(\Psi \circ \Phi )'(v) }^\mathrm{T}(\Psi \circ \Phi )'(v)\right)&= \det \left( \dfrac{1}{t^{2}}\left( I_{m-1} + \dfrac{\alpha ^2-1+t^2}{t^2(1-\Vert v\Vert ^2)} vv^\mathrm{T}\right) \right) \\&= \dfrac{1}{t^{2(m-1)}} \det \left( I_{m-1} + \dfrac{\alpha ^2-1+t^2}{t^2(1-\Vert v\Vert ^2)} vv^\mathrm{T}\right) \\&= \frac{1}{t^{2(m-1)}} \left( 1+\frac{(\alpha ^2 - 1 +t^2)\Vert v\Vert ^2}{t^2(1-\Vert v\Vert ^2)}\right) \\&= \frac{t^2(1-\Vert v\Vert ^2) + (\alpha ^2 - 1 +t^2)\Vert v\Vert ^2}{t^{2m}(1-\Vert v\Vert ^2)} \\&= \frac{\alpha ^2}{(\alpha ^2+ \left( 1-\alpha ^2\right) \Vert v\Vert ^2)^{m}(1-\Vert v\Vert ^2)}. \end{aligned}$$

The last step follows because

$$\begin{aligned} t^2(1-\Vert v\Vert ^2) + (\alpha ^2 - 1 +t^2)\Vert v\Vert ^2&= t^2 - (1- \alpha ^2)\Vert v\Vert ^2 \\&= \alpha ^2+(1-\alpha ^2)\Vert v\Vert ^2 - (1- \alpha ^2)\Vert v\Vert ^2 \\&= \alpha ^2. \end{aligned}$$
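
As a numerical spot check on (15) (an illustration at one assumed point \(v\) and one assumed value of \(\alpha\), not part of the proof), one can compare a finite-difference Jacobian of \(\Psi \circ \Phi\) against the closed form:

```python
import numpy as np

m, alpha = 3, 0.5
v = np.array([0.3, -0.2])      # an arbitrary point in int(B^{m-1})

def psi_phi(v):
    # (Psi o Phi)(v) = (v, alpha*sqrt(1 - |v|^2)) / sqrt(alpha^2 + (1 - alpha^2)|v|^2)
    s = v @ v
    return np.append(v, alpha * np.sqrt(1.0 - s)) / np.sqrt(alpha**2 + (1 - alpha**2) * s)

# Central-difference Jacobian, shape (m, m-1).
h = 1e-6
J = np.column_stack([(psi_phi(v + h * e) - psi_phi(v - h * e)) / (2 * h)
                     for e in np.eye(m - 1)])

s = v @ v
t2 = alpha**2 + (1 - alpha**2) * s      # t^2 as defined above
lhs = np.linalg.det(J.T @ J)
rhs = alpha**2 / ((1 - s) * t2**m)      # right-hand side of (15)
```

The two quantities agree up to the finite-difference error.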

The proof of (16) is similar but easier. Computing partial derivatives of \(\Phi (v) = (v,\sqrt{1-\Vert v\Vert ^2})\) we get

$$\begin{aligned} \Phi '(v) = \begin{bmatrix} I_{m-1}\\ \dfrac{1}{\sqrt{1-\Vert v\Vert ^2}}v^\mathrm{T}\end{bmatrix}. \end{aligned}$$

Hence

$$\begin{aligned} {\Phi '(v)}^\mathrm{T}\Phi '(v) = I_{m-1} + \dfrac{1}{1-\Vert v\Vert ^2} vv^\mathrm{T}. \end{aligned}$$

Therefore,

$$\begin{aligned} \det \left( {\Phi '(v)}^\mathrm{T}\Phi '(v)\right) = \det \left( I_{m-1} + \dfrac{1}{1-\Vert v\Vert ^2} vv^\mathrm{T}\right) = 1+\dfrac{\Vert v\Vert ^2}{1-\Vert v\Vert ^2} = \dfrac{1}{1-\Vert v\Vert ^2}. \end{aligned}$$
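
A similar spot check (again at an assumed point \(v\), for illustration only) confirms (16):

```python
import numpy as np

v = np.array([0.3, -0.2])      # a point in int(B^{m-1})

def phi(v):
    # Phi(v) = (v, sqrt(1 - |v|^2)): int(B^{m-1}) -> upper unit hemisphere
    return np.append(v, np.sqrt(1.0 - v @ v))

# Central-difference Jacobian.
h = 1e-6
J = np.column_stack([(phi(v + h * e) - phi(v - h * e)) / (2 * h)
                     for e in np.eye(len(v))])

lhs = np.linalg.det(J.T @ J)
rhs = 1.0 / (1.0 - v @ v)      # right-hand side of (16)
```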

Cite this article

Peña, J., Soheili, N. A deterministic rescaled perceptron algorithm. Math. Program. 155, 497–510 (2016). https://doi.org/10.1007/s10107-015-0860-y
