Journal of the Indian Institute of Science, Volume 99, Issue 2, pp 247–256

Gradient Methods for Non-convex Optimization

  • Prateek Jain
Review Article


Non-convex optimization forms the bedrock of most modern machine learning (ML) techniques, such as deep learning. Although non-convex optimization problems have been studied for the past several decades, ML-based problems have significantly different characteristics and requirements, owing to their large datasets and high-dimensional parameter spaces as well as the statistical nature of the problems. Over the last few years, there has been a flurry of activity in non-convex optimization for such ML problems. This article surveys a few of the foundational approaches in this domain.


Keywords: Non-convex optimization · Machine learning · First-order methods · SVRG




Copyright information

© Indian Institute of Science 2019

Authors and Affiliations

  1. Microsoft Research, Bengaluru, India
