Abstract
We consider a bilevel optimization approach for parameter learning in nonsmooth variational models. Existing approaches solve this problem by applying implicit differentiation to a sufficiently smooth approximation of the nondifferentiable lower level problem. We propose an alternative method based on differentiating the iterations of a nonlinear primal–dual algorithm. Our method computes exact (sub)gradients and can also be applied in the nonsmooth setting. We show preliminary results for multi-label image segmentation.
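The idea of differentiating an algorithm's iterations, rather than a smoothed optimality condition, can be illustrated on a toy problem. The sketch below is not the method of the paper: it swaps the nonlinear primal–dual algorithm for plain proximal gradient descent, and it assumes a scalar lower level problem min_x 0.5 (x - b)^2 + theta |x| with a quadratic upper level loss. All function names, the step size `tau`, and the iteration counts are hypothetical choices made for illustration.

```python
def soft_threshold(v, t):
    """Soft-thresholding of v at level t, with its partial derivatives."""
    if abs(v) > t:
        s = 1.0 if v > 0 else -1.0
        return v - s * t, 1.0, -s  # value, d/dv, d/dt
    # Derivative is 0 on the flat part; at the kink |v| == t we pick 0 by convention.
    return 0.0, 0.0, 0.0


def unrolled_solve(theta, b, tau=0.5, num_iters=50):
    """Run proximal gradient on 0.5*(x - b)**2 + theta*|x| (assumed toy problem).

    Returns the final iterate x_K and dx_K/dtheta, obtained by propagating
    the derivative through every iteration (forward mode).
    """
    x, dx = 0.0, 0.0
    for _ in range(num_iters):
        v = x - tau * (x - b)          # gradient step on the smooth part
        dv = (1.0 - tau) * dx          # its derivative w.r.t. theta
        x, d_v, d_t = soft_threshold(v, tau * theta)
        dx = d_v * dv + d_t * tau      # chain rule through the prox step
    return x, dx


def learn_theta(b, x_target, theta=0.5, lr=0.5, num_outer=100):
    """Gradient descent on the upper level loss 0.5*(x_K(theta) - x_target)**2."""
    for _ in range(num_outer):
        x, dx = unrolled_solve(theta, b)
        theta -= lr * (x - x_target) * dx  # chain rule: dL/dtheta
    return theta


x, dx = unrolled_solve(theta=0.5, b=2.0)
theta_hat = learn_theta(b=2.0, x_target=1.0)
```

For this toy problem the lower level solution is known in closed form (soft-thresholding of `b` at level `theta`), so the unrolled derivative can be checked against the exact one, which is -1 whenever `b > theta > 0`.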
Keywords
- Neural Information Processing System
- Lower Level Problem
- Bilevel Optimization
- Dual Algorithm
- Bregman Distance
These keywords were added by machine and not by the authors.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ochs, P., Ranftl, R., Brox, T., Pock, T. (2015). Bilevel Optimization with Nonsmooth Lower Level Problems. In: Aujol, JF., Nikolova, M., Papadakis, N. (eds) Scale Space and Variational Methods in Computer Vision. SSVM 2015. Lecture Notes in Computer Science, vol 9087. Springer, Cham. https://doi.org/10.1007/978-3-319-18461-6_52
DOI: https://doi.org/10.1007/978-3-319-18461-6_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18460-9
Online ISBN: 978-3-319-18461-6
eBook Packages: Computer Science, Computer Science (R0)