
Mathematical Programming, Volume 155, Issue 1–2, pp. 267–305

Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization

  • Saeed Ghadimi
  • Guanghui Lan
  • Hongchao Zhang
Full Length Paper, Series A

Abstract

This paper considers a class of constrained stochastic composite optimization problems whose objective function is given by the sum of a differentiable (possibly nonconvex) component and a non-differentiable (but convex) component. To solve these problems, we propose a randomized stochastic projected gradient (RSPG) algorithm, in which a properly sized mini-batch of samples is taken at each iteration, depending on the total budget of stochastic samples allowed. The RSPG algorithm also employs a general distance function, which allows it to exploit the geometry of the feasible region. The complexity of the algorithm is established in a unified setting, which shows that it is nearly optimal for convex stochastic programming. A post-optimization phase is also proposed to significantly reduce the variance of the solutions returned by the algorithm. In addition, based on the RSPG algorithm, we discuss a stochastic gradient-free algorithm that uses only stochastic zeroth-order information. Some preliminary numerical results are also provided.
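
To make the algorithmic ideas concrete, below is a minimal sketch in Python of a mini-batch stochastic projected gradient loop together with a Gaussian-smoothing zeroth-order gradient estimate. It is an illustration under simplifying assumptions, not the paper's exact method: the distance function is taken to be Euclidean (so the prox-mapping reduces to a plain projection), and `stochastic_grad`, `project`, the constant step size, and the batch size are hypothetical placeholders rather than the parameter choices analyzed in the paper.

```python
import numpy as np

def rspg(x0, stochastic_grad, project, n_iters=1000, batch_size=32,
         stepsize=0.1, rng=None):
    """Sketch of a randomized stochastic projected gradient (RSPG) loop.

    Simplifying assumptions (not from the paper): Euclidean distance
    function, so the prox-mapping is a plain projection; constant step
    size; constant mini-batch size.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    iterates = []
    for _ in range(n_iters):
        # Mini-batch gradient estimate: averaging batch_size stochastic
        # gradients reduces the variance of the search direction.
        g = np.mean([stochastic_grad(x) for _ in range(batch_size)], axis=0)
        # Projected gradient step; with a general distance function this
        # would be a prox-mapping adapted to the feasible region's geometry.
        x = project(x - stepsize * g)
        iterates.append(x.copy())
    # Randomized termination: output an iterate chosen at random, matching
    # how nonconvex guarantees for a randomly selected solution are stated.
    return iterates[rng.integers(len(iterates))]

def gaussian_smoothed_grad(f, x, mu=1e-4, rng=None):
    """Hypothetical zeroth-order gradient estimate via Gaussian smoothing:
    it uses only function values, as in the gradient-free variant."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u
```

A post-optimization phase in the spirit of the abstract would run this loop several times independently and return, among the candidate solutions, the one with the smallest estimated gradient mapping computed from a fresh batch of samples.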

Keywords

Constrained stochastic programming · Mini-batch of samples · Stochastic approximation · Nonconvex optimization · Stochastic programming · First-order method · Zeroth-order method

Mathematics Subject Classification

90C25 · 90C06 · 90C22 · 49M37


Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2014

Authors and Affiliations

  1. Department of Industrial and Systems Engineering, University of Florida, Gainesville, USA
  2. Department of Mathematics, Louisiana State University, Baton Rouge, USA