Bilevel Optimization with Nonsmooth Lower Level Problems

  • Peter Ochs
  • René Ranftl
  • Thomas Brox
  • Thomas Pock
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9087)


We consider a bilevel optimization approach for parameter learning in nonsmooth variational models. Existing approaches solve this problem by applying implicit differentiation to a sufficiently smooth approximation of the nondifferentiable lower level problem. We propose an alternative method based on differentiating the iterations of a nonlinear primal–dual algorithm. Our method computes exact (sub)gradients and can also be applied in the nonsmooth setting. We show preliminary results for the case of multi-label image segmentation.
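
The general idea of differentiating solver iterations, rather than smoothing the lower level problem, can be illustrated on a toy example. The sketch below is not the algorithm of the paper: it assumes a simple lower level problem, min_x 0.5*||x - b||^2 + theta*||x||_1, and replaces the nonlinear primal–dual algorithm with plain proximal gradient (soft-thresholding) steps. All names (`unrolled_solve`, `upper_loss_and_grad`, `tau`, `num_iters`) are hypothetical. The derivative of each iterate with respect to theta is propagated alongside the iterate itself; at the kinks of the soft-thresholding map the code picks the value 0, which is one valid choice of (sub)derivative.

```python
import numpy as np


def soft_threshold(u, t):
    """Proximal map of t*||.||_1: shrink every entry of u towards zero by t."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)


def unrolled_solve(theta, b, tau=0.5, num_iters=60):
    """Run proximal gradient on 0.5*||x - b||^2 + theta*||x||_1.

    Returns the final iterate x and its derivative dx with respect to theta,
    obtained by differentiating every iteration (forward mode).
    """
    x = np.zeros_like(b)
    dx = np.zeros_like(b)
    for _ in range(num_iters):
        # Gradient step on the smooth part and its derivative w.r.t. theta.
        u = x - tau * (x - b)
        du = (1.0 - tau) * dx
        # Soft-thresholding is piecewise linear; differentiate it piece by piece.
        active = np.abs(u) > tau * theta
        x = soft_threshold(u, tau * theta)
        dx = np.where(active, du - tau * np.sign(u), 0.0)
    return x, dx


def upper_loss_and_grad(theta, b, x_target):
    """Upper level loss 0.5*||x(theta) - x_target||^2 and its derivative in theta."""
    x, dx = unrolled_solve(theta, b)
    residual = x - x_target
    return 0.5 * residual @ residual, residual @ dx


if __name__ == "__main__":
    b = np.array([2.0, -0.3, -1.5, 0.1])          # toy observation (assumed data)
    x_target = np.array([1.0, 0.0, -1.2, 0.0])    # toy ground truth (assumed data)
    loss, grad = upper_loss_and_grad(0.5, b, x_target)
    print("loss:", loss, "d loss / d theta:", grad)
```

The returned derivative can be handed to any gradient-based optimizer for the upper level parameter theta. No smoothing of the absolute value is needed here because each iteration is differentiable almost everywhere.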


Keywords (added by machine, not by the authors): Neural Information Processing System, Lower Level Problem, Bilevel Optimization, Dual Algorithm, Bregman Distance




Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Peter Ochs (1)
  • René Ranftl (2)
  • Thomas Brox (1)
  • Thomas Pock (2, 3)

  1. Computer Vision Group, University of Freiburg, Freiburg, Germany
  2. Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria
  3. Digital Safety and Security Department, AIT Austrian Institute of Technology GmbH, Vienna, Austria
