Journal of Mathematical Imaging and Vision, Volume 56, Issue 2, pp 175–194

Techniques for Gradient-Based Bilevel Optimization with Non-smooth Lower Level Problems

  • Peter Ochs
  • René Ranftl
  • Thomas Brox
  • Thomas Pock


We propose techniques for approximating bilevel optimization problems whose non-smooth lower level problems may have non-unique solutions. To this end, we replace the minimizer of the lower level problem with an iterative algorithm that is guaranteed to converge to such a minimizer. Using suitable non-linear proximity functions, such as Bregman distances, the update mappings of this iterative algorithm can be made differentiable, even though the minimization problem itself is non-smooth.
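The paper builds this differentiability on Bregman proximity functions. As a rough, hypothetical illustration of the general unrolling idea only (not the authors' specific Bregman construction), the following sketch unrolls plain ISTA for a small lasso-type lower level problem and propagates a forward-mode derivative with respect to the regularization weight through every update; the soft-thresholding step is merely almost-everywhere differentiable, which is exactly the limitation that smooth Bregman-based updates are meant to avoid:

```python
import numpy as np

def shrink(z, tau):
    """Soft-thresholding: the proximal map of tau * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def unrolled_ista(b, lam, n_iter=200, t=0.5):
    """Unroll ISTA for the lower level problem
        min_x 0.5 * ||x - b||^2 + lam * ||x||_1,
    propagating the forward-mode derivative dx/dlam through each
    update (valid wherever the updates are differentiable)."""
    x = np.zeros_like(b)
    dx = np.zeros_like(b)             # running derivative dx/dlam
    for _ in range(n_iter):
        z = x - t * (x - b)           # gradient step on the smooth term
        dz = (1.0 - t) * dx           # its derivative w.r.t. lam
        active = np.abs(z) > t * lam  # coordinates where shrink is smooth
        x = shrink(z, t * lam)
        dx = np.where(active, dz - t * np.sign(z), 0.0)
    return x, dx

# Upper level loss L(lam) = 0.5 * ||x(lam) - x_gt||^2 and its gradient,
# obtained by the chain rule through the unrolled solver.
b, x_gt = np.array([1.0, -2.0, 0.05]), np.array([1.0, -2.0, 0.0])
lam = 0.3
x, dx = unrolled_ista(b, lam)
grad = np.dot(x - x_gt, dx)

# Finite-difference check of the hand-rolled derivative.
eps = 1e-6
x_p, _ = unrolled_ista(b, lam + eps)
x_m, _ = unrolled_ista(b, lam - eps)
fd = (0.5 * np.sum((x_p - x_gt) ** 2)
      - 0.5 * np.sum((x_m - x_gt) ** 2)) / (2 * eps)
```

All names here (`unrolled_ista`, the choice of loss, the test data) are illustrative; the point is only that replacing the lower level minimizer by a finite number of algorithm iterations turns the upper level gradient into an ordinary chain-rule computation.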


Keywords: Bilevel optimization · Non-smooth lower level problem · Bregman proximity function



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Peter Ochs (1, 3)
  • René Ranftl (2)
  • Thomas Brox (3)
  • Thomas Pock (4, 5)

  1. Mathematical Image Analysis Group, University of Saarland, Saarbrücken, Germany
  2. Visual Computing Lab, Intel Labs, Santa Clara, USA
  3. Computer Vision Group, University of Freiburg, Freiburg im Breisgau, Germany
  4. Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria
  5. Digital Safety & Security Department, AIT Austrian Institute of Technology GmbH, Vienna, Austria
