Techniques for Gradient-Based Bilevel Optimization with Non-smooth Lower Level Problems

Journal of Mathematical Imaging and Vision

Abstract

We propose techniques for approximating bilevel optimization problems whose non-smooth lower level problem may have a non-unique solution. To this end, we replace the minimizer of the lower level problem with the output of an iterative algorithm that is guaranteed to converge to such a minimizer. Using suitable non-linear proximal distance functions, the update mappings of this iterative algorithm can be made differentiable, even though the underlying minimization problem is non-smooth.
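
The approach amounts to algorithmic differentiation of an unrolled solver: rather than differentiating the possibly set-valued solution map of the lower level problem, one runs a fixed number of iterations with smooth update mappings and differentiates the upper level loss through the recursion. Below is a minimal sketch of this unrolling idea under illustrative assumptions; the toy lower level problem, the smoothed shrinkage surrogate, and all names are ours, not the paper's, where differentiability comes instead from suitable non-linear (Bregman) proximal distances.

```python
# Minimal sketch (assumptions, not the authors' scheme): differentiate an
# upper-level loss through an unrolled lower-level solver for the toy problem
#   min_x 0.5*||x - b||^2 + theta*||x||_1.
# The non-smooth soft-thresholding step is replaced by a smooth surrogate so
# that the whole recursion is differentiable.
import jax
import jax.numpy as jnp

def smooth_relu(z, eps):
    # Smooth approximation of max(z, 0).
    return 0.5 * (z + jnp.sqrt(z * z + eps))

def smooth_shrink(x, t, eps=1e-8):
    # Smooth surrogate of soft-thresholding sign(x) * max(|x| - t, 0).
    return smooth_relu(x - t, eps) - smooth_relu(-x - t, eps)

def lower_level(theta, b, num_iters=50, step=1.0):
    # Unrolled forward-backward iterations; autodiff traces every update.
    x = jnp.zeros_like(b)
    for _ in range(num_iters):
        x = smooth_shrink(x - step * (x - b), step * theta)
    return x

def upper_loss(theta, b, x_gt):
    # Upper-level objective: distance of the computed solution to ground truth.
    x = lower_level(theta, b)
    return 0.5 * jnp.sum((x - x_gt) ** 2)

b = jnp.array([1.5, -0.2, 0.7])
x_gt = jnp.array([1.0, 0.0, 0.5])
hypergradient = jax.grad(upper_loss)(0.3, b, x_gt)  # d(upper loss)/d(theta)
```

The same pattern applies to any convergent scheme with smooth updates; swapping the surrogate for a Bregman proximal step would change only `smooth_shrink`.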

Notes

  1. Note that the bilevel problem as in (1) is not well-defined in this case. We discuss some details in Sect. 3.

  2. The classification in [14] applies to the optimistic bilevel problem.

  3. The associated proximity operator either has a closed-form solution or can be evaluated efficiently by numerical means; a standard closed-form example is given after these notes.

  4. More generally, the concept of outer semi-continuity of the feasible set mapping is needed; otherwise, a gradient-based method could converge to an infeasible point.

  5. Note that we kept the order of the terms given by the chain rule: for multi-dimensional problems the products are matrix products, which are, in general, not commutative (a worked recursion follows after these notes).
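
As a concrete instance of the closed forms mentioned in Note 3 (a standard textbook example; the function names here are ours): the proximity operator of the scaled ℓ1-norm is component-wise soft-thresholding.

```python
import numpy as np

def prox_l1(v, lam):
    # Proximity operator of lam * ||.||_1, i.e. the unique minimizer of
    #   0.5 * ||x - v||^2 + lam * ||x||_1,
    # given in closed form by component-wise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
```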
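To make the order-sensitivity of Note 5 concrete (the notation here is ours, not the paper's): if the iterates follow x^{(n+1)} = \mathcal{A}(x^{(n)}, \theta), the chain rule yields the recursion

$$
\frac{\mathrm{d}x^{(n+1)}}{\mathrm{d}\theta}
= \frac{\partial \mathcal{A}}{\partial x}\bigl(x^{(n)},\theta\bigr)\,
\frac{\mathrm{d}x^{(n)}}{\mathrm{d}\theta}
+ \frac{\partial \mathcal{A}}{\partial \theta}\bigl(x^{(n)},\theta\bigr),
$$

in which \partial\mathcal{A}/\partial x is a Jacobian matrix, so swapping the factors of the product would, in general, change the result.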

References

  1. Al-Baali, M.: Descent property and global convergence of the Fletcher–Reeves method with inexact line search. IMA J. Numer. Anal. 5(1), 121–124 (1985)

  2. Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  3. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)

  4. Bennett, K., Kunapuli, G., Hu, J., Pang, J.S.: Bilevel optimization and machine learning. In: Computational Intelligence: Research Frontiers. Lecture Notes in Computer Science, vol. 5050, pp. 25–47. Springer, Berlin (2008)

  5. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

  6. Calatroni, L., Reyes, J., Schönlieb, C.B.: Dynamic sampling schemes for optimal noise learning under multiple nonsmooth constraints. ArXiv e-prints (2014). arXiv:1403.1278

  7. Calatroni, L., Reyes, J., Schönlieb, C.B., Valkonen, T.: Bilevel approaches for learning of variational imaging models. ArXiv e-prints (2015). arXiv:1505.02120

  8. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  9. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. (2015). doi:10.1007/s10107-015-0957-3

  10. Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs for image restoration. In: German Conference on Pattern Recognition (GCPR), Lecture Notes in Computer Science, vol. 8142, pp. 271–281. Springer, Berlin (2013)

  11. Chen, Y., Ranftl, R., Pock, T.: Insights into analysis operator learning: from patch-based sparse models to higher order MRFs. IEEE Trans. Image Process. 23(3), 1060–1072 (2014)

  12. Deledalle, C.A., Vaiter, S., Fadili, J., Peyré, G.: Stein Unbiased GrAdient estimator of the Risk (SUGAR) for multiple parameter selection. SIAM J. Imaging Sci. 7(4), 2448–2487 (2014)

  13. Dempe, S.: Annotated Bibliography on bilevel programming and mathematical programs with equilibrium constraints. Optimization 52(3), 333–359 (2003)

  14. Dempe, S., Kalashnikov, V., Pérez-Valdés, G., Kalashnykova, N.: Bilevel Programming Problems. Energy Systems. Springer, Berlin (2015)

  15. Dempe, S., Zemkoho, A.: The generalized Mangasarian–Fromowitz constraint qualification and optimality conditions for bilevel programs. J. Optim. Theory Appl. 148(1), 46–68 (2010)

  16. Domke, J.: Implicit differentiation by perturbation. In: Advances in Neural Information Processing Systems (NIPS), pp. 523–531 (2010)

  17. Domke, J.: Generic methods for optimization-based modeling. In: International Workshop on Artificial Intelligence and Statistics, pp. 318–326 (2012)

  18. Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton (1992)

  19. Fletcher, R., Reeves, C.: Function minimization by conjugate gradients. Comput. J. 7(2), 149–154 (1964)

  20. Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: International Conference on Computer Vision (ICCV) (2009)

  21. Griewank, A., Walther, A.: Evaluating Derivatives, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2008)

  22. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014)

  23. Kunisch, K., Pock, T.: A bilevel optimization approach for parameter learning in variational models. SIAM J. Imaging Sci. 6(2), 938–983 (2013)

  24. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  25. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989)

  26. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: International Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  27. Moore, G.: Bilevel programming algorithms for machine learning model selection. Ph.D. thesis, Rensselaer Polytechnic Institute, Troy (2010)

  28. Ochs, P.: Long term motion analysis for object level grouping and nonsmooth optimization methods. Ph.D. thesis, Albert-Ludwigs-Universität Freiburg, Freiburg im Breisgau (2015)

  29. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)

  30. Ochs, P., Ranftl, R., Brox, T., Pock, T.: Bilevel optimization with nonsmooth lower level problems. In: International Conference on Scale Space and Variational Methods in Computer Vision (SSVM) (2015)

  31. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  32. Peyré, G., Fadili, J.: Learning analysis sparsity priors. In: Proceedings of Sampta (2011)

  33. Ranftl, R., Pock, T.: A deep variational model for image segmentation. In: German Conference on Pattern Recognition (GCPR), pp. 107–118 (2014)

  34. Reyes, J., Schönlieb, C.B., Valkonen, T.: The structure of optimal parameters for image restoration problems. ArXiv e-prints (2015). arXiv:1505.01953

  35. Reyes, J.C.D.L., Schönlieb, C.B.: Image denoising: learning noise distribution via PDE-constrained optimisation. Inverse Probl. Imaging 7, 1183–1214 (2013)

  36. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  37. Tappen, M.: Utilizing variational optimization to learn MRFs. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2007)

  38. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6, 1453–1484 (2005)

  39. Vedaldi, A., Lenc, K.: MatConvNet—convolutional neural networks for MATLAB. In: Proceedings of the ACM International Conference on Multimedia (2015)

  40. Zavriev, S., Kostyuk, F.: Heavy-ball method in nonconvex optimization problems. Comput. Math. Model. 4(4), 336–341 (1993)

  41. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.: Conditional random fields as recurrent neural networks. In: International Conference on Computer Vision (ICCV) (2015)

Acknowledgments

Peter Ochs and Thomas Brox acknowledge support from the German Research Foundation (DFG Grant BR 3815/8-1). René Ranftl acknowledges support from Intel Labs. Thomas Pock acknowledges support from the Austrian Science Fund (FWF) under the ANR-FWF project “Efficient algorithms for nonsmooth optimization in imaging”, No. I1148, and the FWF-START project “Bilevel optimization for Computer Vision”, No. Y729.

Author information

Correspondence to Peter Ochs.

About this article

Cite this article

Ochs, P., Ranftl, R., Brox, T. et al. Techniques for Gradient-Based Bilevel Optimization with Non-smooth Lower Level Problems. J Math Imaging Vis 56, 175–194 (2016). https://doi.org/10.1007/s10851-016-0663-7
