
Enforcing Geometrical Priors in Deep Networks for Semantic Segmentation Applied to Radiotherapy Planning

Journal of Mathematical Imaging and Vision


Incorporating prior knowledge into a segmentation process, whether it is geometrical constraints such as volume penalisation, (partial) convexity enforcement, or topological prescriptions to preserve the contextual relations between objects, proves to improve accuracy in medical image segmentation, in particular when addressing the issue of weak boundary definition. Motivated by this observation, the proposed contribution aims to provide a unified variational framework including geometrical constraints in the training of convolutional neural networks in the form of a penalty in the loss function. These geometrical constraints take several forms and encompass level curve alignment through the integration of the weighted total variation, an area penalisation phrased as a hard constraint in the modelling, and an intensity homogeneity criterion based on a combination of the standard Dice loss with the piecewise constant Mumford–Shah model. The mathematical formulation yields a non-smooth non-convex optimisation problem, which rules out conventional smooth optimisation techniques and leads us to adopt a Lagrangian setting. The application falls within the scope of organ-at-risk segmentation in CT (computed tomography) images, in the context of radiotherapy planning. Experiments demonstrate that our method provides significant improvements (i) over existing non-constrained approaches both in terms of quantitative criteria, such as the measure of overlap, and qualitative assessment (spatial regularisation/coherency, fewer outliers), (ii) over in-layer constrained deep convolutional networks, and shows a certain degree of versatility.





  1. The problem being separable with respect to the variable k, we omit the dependence on k from now on.


  1. Alexandrov, O., Santosa, F.: A topology-preserving level set method for shape optimization. J. Comput. Phys. 204(1), 121–130 (2005)

  2. Azé, D.: Éléments d’analyse convexe et variationnelle. Mathématiques pour le 2ème cycle. Ellipses (1997)

  3. Baldi, A.: Weighted BV functions. Houston J. Math. 27(3), 683–705 (2001)

  4. Bohlender, S., Oksuz, I., Mukhopadhyay, A.: A Survey on Shape-Constraint Deep Learning for Medical Image Segmentation (2021)

  5. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011)

  6. Bresson, X., Esedoğlu, S., Vandergheynst, P., Thiran, J.P., Osher, S.: Fast global minimization of the active contour/snake model. J. Math. Imaging Vis. 28(2), 151–167 (2007)

  7. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22(1), 61–87 (1993)

  8. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1), 89–97 (2004)

  9. Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  10. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1), 253–287 (2016)

  11. Chambolle, A., Tan, P., Vaiter, S.: Accelerated alternating descent methods for Dykstra-like problems. J. Math. Imaging Vis. 3(59), 481–497 (2017)

  12. Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y.: Learning active contour models for medical image segmentation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11624–11632 (2019)

  13. Clough, J., Byrne, N., Oksuz, I., Zimmer, V.A., Schnabel, J.A., King, A.: A topological loss function for deep-learning based image segmentation using persistent homology. IEEE Trans. Pattern Anal. Mach. Intell. 6, 66 (2020)

  14. Combettes, P.L., Pesquet, J.C.: Proximal Splitting Methods in Signal Processing, pp. 185–212. Springer, New York (2011)

  15. Dolz, J., Ayed, I.B., Desrosiers, C.: Unbiased shape compactness for segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 755–763. Springer (2017)

  16. Ekeland, I., Témam, R.: Convex Analysis and Variational Problems. Society for Industrial and Applied Mathematics (1999)

  17. El Jurdi, R., Petitjean, C., Honeine, P., Cheplygina, V., Abdallah, F.: High-level prior-based loss functions for medical image segmentation: a survey. arXiv preprint arXiv:2011.08018 (2020)

  18. Fu, H., Xu, Y., Lin, S., Wong, D.W.K., Liu, J.: Deepvessel: retinal vessel segmentation via deep learning and conditional random field. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 132–139. Springer (2016)

  19. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  20. Ganaye, P.A., Sdika, M., Triggs, B., Benoit-Cattin, H.: Removing segmentation inconsistencies with semi-supervised non-adjacency constraint. Med. Image Anal. 58, 101551 (2019)

  21. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. JMLR Workshop and Conference Proceedings (2010)

  22. Jia, F., Liu, J., Tai, X.C.: A regularized convolutional neural network for semantic image segmentation. Anal. Appl. 19(01), 147–165 (2021)

  23. Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017)

  24. Kass, M., Witkin, A.P., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1(4), 321–331 (1988)

  25. Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ayed, I.B.: Constrained-CNN losses for weakly supervised segmentation. Med. Image Anal. 54, 88–99 (2019)

  26. Kim, B., Ye, J.C.: Mumford–Shah loss functional for image segmentation with deep learning. IEEE Trans. Image Process. 29, 1856–1866 (2020)

  27. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, ICML’01, pp. 282–289. Morgan Kaufmann, San Francisco, CA, USA (2001)

  28. Lambert, Z., Le Guyader, C., Petitjean, C.: A geometrically-constrained deep network for CT image segmentation. In: IEEE International Symposium on Biomedical Imaging (ISBI) (2021)

  29. Lambert, Z., Petitjean, C., Dubray, B., Ruan, S.: SegTHOR: segmentation of thoracic organs at risk in CT images. In: 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1–6 (2020)

  30. Le Guyader, C., Vese, L.A.: Self-repelling snakes for topology-preserving segmentation models. IEEE Trans. Image Process. 17(5), 767–779 (2008)

  31. Liu, J., Wang, X., Tai, X.C.: Deep Convolutional Neural Networks with Spatial Regularization, Volume and Star-Shape Priori for Image Segmentation (2020)

  32. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. IEEE (2016)

  33. Moisan, L.: How to discretize the total variation of an image? PAMM 7(1), 1041907–1041908 (2007)

  34. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences 255, 2897–2899 (1962)

  35. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42(5), 577–685 (1989)

  36. Nosrati, M.S., Hamarneh, G.: Incorporating Prior Knowledge in Medical Image Segmentation: a Survey. CoRR arXiv:1607.01092 (2016)

  37. Peng, J., Kervadec, H., Dolz, J., Ben Ayed, I., Pedersoli, M., Desrosiers, C.: Discretely-constrained deep network for weakly supervised segmentation. Neural Netw. 130, 297–308 (2020)

  38. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1–4), 259–268 (1992)

  39. Rupprecht, C., Huaroc, E., Baust, M., Navab, N.: Deep active contours. CoRR arXiv:1607.05074 (2016)

  40. Ségonne, F.: Active contours under topology control-genus preserving level sets. Int. J. Comput. Vis. 79(2), 107–117 (2008)

  41. Siu, C.Y., Chan, H.L., Lui, L.M.: Image segmentation with partial convexity shape prior using discrete conformality structures. SIAM J. Imaging Sci. 13(4), 2105–2139 (2020)

  42. Taylor, G., Burmeister, R., Xu, Z., Singh, B., Patel, A., Goldstein, T.: Training neural networks without gradients: a scalable ADMM approach. In: Balcan, M.F., Weinberger, K.Q. (Eds.) Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 2722–2731. PMLR (2016)

  43. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.: Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537 (2015)



This project was co-financed by the European Union with the European regional development fund (ERDF, 18P03390/18E01750/18P02733) and by the Haute-Normandie Régional Council via the M2SINUM project. The authors would like to thank the CRIANN (Centre Régional Informatique et d’Applications Numériques de Normandie, France) for providing computational resources.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Zoé Lambert.



Proof of Theorem 2

The sets \({\mathcal {C}}_1\) and \({\mathcal {C}}_2\) are closed and convex, while the objective function is continuous and coercive. Indeed, \(\forall l \in \left\{ 1,\cdots ,L\right\} \),

$$\begin{aligned} \left\| \begin{array}{ccc} \dfrac{\mu }{2}\,\Vert u^l-s^l(\theta )-w^l\Vert ^2&{}\ge \dfrac{\mu }{4}\,\Vert u^l\Vert ^2-\dfrac{\mu }{2}\,\Vert s^l(\theta )+w^l\Vert ^2,\\ \dfrac{1}{2}\,\Vert u^l-v^l\Vert ^2&{}\ge \dfrac{1}{2}\,\Vert u^l\Vert ^2+\dfrac{1}{2}\,\Vert v^l\Vert ^2-\Vert u^l\Vert \,\Vert v^l\Vert . \end{array}\right. \end{aligned}$$

Using Young’s inequality with \(\epsilon \) (valid for \(\epsilon >0\)) and stated by \(ab\le \dfrac{a^2}{2\epsilon }+\dfrac{\epsilon b^2}{2}\), we get:

$$\begin{aligned} {\mathcal {J}}(u,v) \ge \left( \dfrac{\mu +2}{4}-\dfrac{1}{2\epsilon }\right) \,\displaystyle {\sum _{l=1}^{L}}\,\Vert u^l\Vert ^2-\dfrac{\mu }{2}\,\displaystyle {\sum _{l=1}^{L}}\,\Vert s^l(\theta )+w^l\Vert ^2\\ +\left( \dfrac{1}{2}-\dfrac{\epsilon }{2}\right) \,\displaystyle {\sum _{l=1}^{L}}\,\Vert v^l\Vert ^2, \end{aligned}$$

or equivalently,

$$\begin{aligned} {\mathcal {J}}(u,v) \ge \left( \dfrac{\mu +2}{4}-\dfrac{1}{2\epsilon }\right) \,\Vert u\Vert ^2-\dfrac{\mu }{2}\,\Vert s(\theta )+w\Vert ^2\\ +\left( \dfrac{1}{2}-\dfrac{\epsilon }{2}\right) \,\Vert v\Vert ^2. \end{aligned}$$

(To lighten the notations, when there is no ambiguity about the dimension of the mathematical objects we handle, we omit the lower index making this dimension explicit in the definition of the Euclidean norm as well as on the related scalar product.)

Taking \(\epsilon \) such that \(\dfrac{2}{\mu +2}<\epsilon <1\) yields the desired result.
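As a quick numerical sanity check of this choice, the sketch below evaluates the two coefficients of the lower bound for a given \(\mu \) and \(\epsilon \). The function names and the sample value of \(\mu \) are illustrative assumptions and do not come from the original proof.

```python
def coercivity_coefficients(mu: float, eps: float) -> tuple[float, float]:
    """Coefficients of ||u||^2 and ||v||^2 in the lower bound on J(u, v)."""
    coef_u = (mu + 2.0) / 4.0 - 1.0 / (2.0 * eps)
    coef_v = 0.5 - eps / 2.0
    return coef_u, coef_v


def admissible_eps(mu: float) -> float:
    """Midpoint of the open interval (2 / (mu + 2), 1), non-empty for mu > 0."""
    lower = 2.0 / (mu + 2.0)
    return 0.5 * (lower + 1.0)


mu = 0.1  # hypothetical penalty weight, chosen only for illustration
eps = admissible_eps(mu)
coef_u, coef_v = coercivity_coefficients(mu, eps)
print(eps, coef_u, coef_v)
```

Both coefficients are positive for any \(\epsilon \) in the stated interval; at the endpoints one of them vanishes, which is why the inequalities on \(\epsilon \) are strict.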

To conclude, functional \({\mathcal {J}}\) is strictly convex, due to the strict convexity of functional \({\mathcal {H}}\) defined by \({\mathcal {H}}(u,v)=\dfrac{\mu }{2}\,\Vert u-s(\theta )-w\Vert ^2+\dfrac{1}{2}\,\Vert u-v\Vert ^2\), u denoting the concatenation of the \(u^l\)’s, similarly for \(s(\theta ), w\) and v. A straightforward computation gives that \(\forall (u_1,v_1)\in {\mathcal {C}}_1 \times {\mathcal {C}}_2\), \(\forall (u_2,v_2)\in {\mathcal {C}}_1 \times {\mathcal {C}}_2\),

$$\begin{aligned} \langle \nabla {\mathcal {H}}(u_1,v_1)-&\nabla {\mathcal {H}}(u_2,v_2),\begin{pmatrix}u_1-u_2\\ v_1-v_2 \end{pmatrix}\rangle \\&=\mu \,\Vert u_1-u_2\Vert ^2+\Vert (u_1-u_2)-(v_1-v_2)\Vert ^2, \end{aligned}$$

this quantity vanishing if and only if \(u_1=u_2\) and \(v_1=v_2\).
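The identity above can be checked numerically. The following sketch, which assumes small random vectors and an arbitrary value of \(\mu \) (both hypothetical), computes the gradient of \({\mathcal {H}}\) by hand and compares the two sides; the constraint sets play no role in the identity itself, so they are ignored here.

```python
import random


def grad_H(u, v, s, w, mu):
    """Gradient of H(u, v) = mu/2 ||u - s - w||^2 + 1/2 ||u - v||^2."""
    grad_u = [mu * (ui - si - wi) + (ui - vi) for ui, vi, si, wi in zip(u, v, s, w)]
    grad_v = [vi - ui for ui, vi in zip(u, v)]
    return grad_u, grad_v


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def monotonicity_sides(u1, v1, u2, v2, s, w, mu):
    """Return (left-hand side, right-hand side) of the identity."""
    gu1, gv1 = grad_H(u1, v1, s, w, mu)
    gu2, gv2 = grad_H(u2, v2, s, w, mu)
    du = [a - b for a, b in zip(u1, u2)]
    dv = [a - b for a, b in zip(v1, v2)]
    lhs = dot([a - b for a, b in zip(gu1, gu2)], du)
    lhs += dot([a - b for a, b in zip(gv1, gv2)], dv)
    diff = [a - b for a, b in zip(du, dv)]
    rhs = mu * dot(du, du) + dot(diff, diff)
    return lhs, rhs


rng = random.Random(0)
n, mu = 6, 0.7  # hypothetical vector size and penalty weight
u1, v1, u2, v2, s, w = ([rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(6))
lhs, rhs = monotonicity_sides(u1, v1, u2, v2, s, w, mu)
print(abs(lhs - rhs) < 1e-12, rhs > 0.0)
```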

Proof of Theorem 3

For every \(p\in {\mathcal {B}}, (u,v) \mapsto {\mathcal {L}}(u,v,p)\) is strictly convex, owing to the strict convexity of functional \({\mathcal {H}}\). For each \(p\in {\mathcal {B}}\), functional \((u,v) \mapsto {\mathcal {L}}(u,v,p)\) is continuous and coercive. To establish such a coercivity inequality, denoting by \(\kappa =\Vert {\text{ div }}\Vert =\displaystyle {\sup _{\Vert p\Vert _{Y^L} \le 1}}\,\Vert {\text{ div }}\,p\Vert _{X^L}\), we first observe that with the convention \((p^1)^1_{0,j}=(p^1)^1_{N,j}=(p^1)^2_{i,0}=(p^1)^2_{i,N}=0\) (similarly for \(p^l\) with \(l\in \left\{ 2,\cdots ,L\right\} \)) and by applying twice the inequality \((a+b)^2\le 2(a^2+b^2)\),

$$\begin{aligned} \Vert {\text{ div }}\,p\Vert _{X^L}^2\le 8\,\Vert p\Vert ^2. \end{aligned}$$
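The bound can be illustrated numerically for a single channel (\(L=1\)). The sketch below assumes the usual backward-difference stencil \(({\text{ div }}\,p)_{i,j}=p^1_{i,j}-p^1_{i-1,j}+p^2_{i,j}-p^2_{i,j-1}\) together with the boundary convention stated above; this stencil is an assumption on our part, since the discrete divergence is defined outside this appendix.

```python
import random


def divergence(p1, p2, N):
    """Discrete divergence on an N x N grid (1-based indices, assumed stencil).

    p1 and p2 are (N + 1) x (N + 1) nested lists; index 0 is padding.
    """
    div = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            div[i][j] = (p1[i][j] - p1[i - 1][j]) + (p2[i][j] - p2[i][j - 1])
    return div


def sq_norm(field, N):
    """Squared Euclidean norm over the grid indices 1..N."""
    return sum(field[i][j] ** 2 for i in range(1, N + 1) for j in range(1, N + 1))


def random_dual_field(N, rng):
    p1 = [[rng.uniform(-1.0, 1.0) for _ in range(N + 1)] for _ in range(N + 1)]
    p2 = [[rng.uniform(-1.0, 1.0) for _ in range(N + 1)] for _ in range(N + 1)]
    for k in range(N + 1):
        p1[0][k] = p1[N][k] = 0.0  # convention p^1_{0,j} = p^1_{N,j} = 0
        p2[k][0] = p2[k][N] = 0.0  # convention p^2_{i,0} = p^2_{i,N} = 0
    return p1, p2


rng = random.Random(0)
N = 8  # hypothetical grid size
p1, p2 = random_dual_field(N, rng)
ratio = sq_norm(divergence(p1, p2, N), N) / (sq_norm(p1, N) + sq_norm(p2, N))
print(ratio <= 8.0)
```

The ratio stays below 8 for any field satisfying the convention, because each component of \(p\) enters at most two divergence values.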

Thus \(\kappa \le 2\sqrt{2}\) and with suitable \(\epsilon \),

$$\begin{aligned} {\mathcal {L}}(u,v,p)\ge&\left( \dfrac{\mu +2}{4}-\dfrac{1}{2\epsilon }\right) \,\Vert u\Vert ^2-\dfrac{\mu }{2}\,\Vert s(\theta )+w\Vert ^2\\&+\left( \dfrac{1}{2}-\dfrac{\epsilon }{2}\right) \,\Vert v\Vert ^2-\Vert {\text{ div }}\Vert \Vert p\Vert \Vert u\Vert ,\\ \ge&\left( \dfrac{\mu +2}{4}-\dfrac{1}{2\epsilon }\right) \,\Vert u\Vert ^2-\dfrac{\mu }{2}\,\Vert s(\theta )+w\Vert ^2\\&+\left( \dfrac{1}{2}-\dfrac{\epsilon }{2}\right) \,\Vert v\Vert ^2-2\sqrt{2L}N\Vert u\Vert , \end{aligned}$$

since \(p\in {\mathcal {B}}\), entailing that \(\Vert p\Vert _{Y^L}^2\le LN^2\).

Applying again Young’s inequality with \(\epsilon '>0\),

$$\begin{aligned} {\mathcal {L}}(u,v,p)\ge&\left( \dfrac{\mu +2}{4}-\dfrac{1}{2\epsilon }-\dfrac{\epsilon '}{2}\right) \,\Vert u\Vert ^2+\left( \dfrac{1}{2}-\dfrac{\epsilon }{2}\right) \,\Vert v\Vert ^2\\&-\dfrac{\mu }{2}\,\Vert s(\theta )+w\Vert ^2-\dfrac{4LN^2}{\epsilon '}. \end{aligned}$$

This latter inequality shows that, by choosing \(\epsilon '\) suitably small (which is always possible), the coercivity property is ensured. Also, the quantity \({\mathcal {L}}(u,v,p)\) is bounded below (independently of \(p\)). Moreover, if one takes \({\tilde{u}}\) such that \({\tilde{u}}^1 \equiv 1\) and \(\forall l\in \left\{ 2,\cdots ,L\right\} , {\tilde{u}}^l\equiv 0\), and \({\tilde{v}}\) such that \(\forall l \in \left\{ 1,\cdots ,L\right\} \), \(\forall (i,j)\in {\mathcal {G}}\), \({\tilde{v}}_{i,j}^l=\dfrac{\alpha ^l}{N^2}\), then \({\mathcal {L}}({\tilde{u}},{\tilde{v}},p)\) is independent of \(p\), showing that the infimum is finite.
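For completeness, the Young step that absorbs the linear term can be written out as follows. This is a worked restatement under the notation of the proof, in which Young's inequality \(ab\le \dfrac{a^2}{2\epsilon '}+\dfrac{\epsilon ' b^2}{2}\) is applied with \(a=2\sqrt{2L}N\) and \(b=\Vert u\Vert \):

$$\begin{aligned} 2\sqrt{2L}N\,\Vert u\Vert \le \dfrac{8LN^2}{2\epsilon '}+\dfrac{\epsilon '}{2}\,\Vert u\Vert ^2=\dfrac{4LN^2}{\epsilon '}+\dfrac{\epsilon '}{2}\,\Vert u\Vert ^2. \end{aligned}$$

The coefficient of \(\Vert u\Vert ^2\) in the resulting bound remains positive as soon as \(0<\epsilon '<\dfrac{\mu +2}{2}-\dfrac{1}{\epsilon }\), an interval that is non-empty precisely when \(\epsilon >\dfrac{2}{\mu +2}\).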

Then for each \(p\in {\mathcal {B}}\), functional \({\mathcal {L}}(\cdot ,\cdot ,p)\) is continuous, coercive and strictly convex, so it admits a unique minimiser in \({\mathcal {C}}_1 \times {\mathcal {C}}_2\) (\({\mathcal {C}}_1\) and \({\mathcal {C}}_2\) being closed convex sets), denoted by \((e_1(p),e_2(p))\). We denote the corresponding minimum value by \(f(p)\), i.e.

$$\begin{aligned} f(p)=\displaystyle {\min _{(u,v)\in {\mathcal {C}}_1\times {\mathcal {C}}_2}}\,{\mathcal {L}}(u,v,p)={\mathcal {L}}(e_1(p),e_2(p),p). \end{aligned}$$

Function \(p \mapsto f(p)\) is concave as the pointwise infimum of concave functions (\(\forall (u,v) \in {\mathcal {C}}_1 \times {\mathcal {C}}_2, p \mapsto {\mathcal {L}}(u,v,p)\) is concave since in fact linear). This can be proved using the hypograph of \(f\). Also, \(f\) is upper semicontinuous as the pointwise infimum of continuous functions. Since the set \({\mathcal {B}}\) is compact, \(f\) is therefore bounded above and attains its upper bound at a point denoted by \({\bar{p}}\). Thus

$$\begin{aligned} f({\bar{p}})=\displaystyle {\max _{p\in {\mathcal {B}}}}\,f(p)=\displaystyle {\max _{p\in {\mathcal {B}}}}\,\displaystyle {\min _{(u,v)\in {\mathcal {C}}_1 \times {\mathcal {C}}_2}}\,{\mathcal {L}}(u,v,p). \end{aligned}$$

Additionally, as \(f({\bar{p}})=\displaystyle {\min _{(u,v)\in {\mathcal {C}}_1 \times {\mathcal {C}}_2}}\,{\mathcal {L}}(u,v,{\bar{p}})\), one has, \(\forall (u,v)\in {\mathcal {C}}_1 \times {\mathcal {C}}_2\),

$$\begin{aligned} f({\bar{p}}) \le {\mathcal {L}}(u,v,{\bar{p}}). \end{aligned}\tag{19}$$

By concavity of \({\mathcal {L}}\) with respect to the third argument (in fact, linearity), \(\forall (u,v)\in {\mathcal {C}}_1 \times {\mathcal {C}}_2, \forall p \in {\mathcal {B}}, \forall \lambda \in ]0,1[\),

$$\begin{aligned} {\mathcal {L}}(u,v,(1-\lambda )\,{\bar{p}}+\lambda p)=(1-\lambda )\,{\mathcal {L}}(u,v,{\bar{p}})+\lambda \,{\mathcal {L}}(u,v,{p}). \end{aligned}$$

Taking as particular value \((u,v)=(e_1((1-\lambda )\,{\bar{p}}+\lambda p),e_2((1-\lambda )\,{\bar{p}}+\lambda p))=(e_{\lambda }^1,e_{\lambda }^2)\), it yields, using again that \(f({\bar{p}})=\displaystyle {\max _{p\in {\mathcal {B}}}}\,f(p)\) and the concavity (even linearity) of \({\mathcal {L}}\) with respect to the third argument,

$$\begin{aligned} f({\bar{p}})&\ge f((1-\lambda )\,{\bar{p}}+\lambda p)={\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,(1-\lambda )\,{\bar{p}}+\lambda p),\\&\ge (1-\lambda )\,{\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,{\bar{p}})+\lambda \,{\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,p). \end{aligned}$$

As \({\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,{\bar{p}}) \ge f({\bar{p}})=\displaystyle {\min _{(u,v)\in {\mathcal {C}}_1 \times {\mathcal {C}}_2}}\,{\mathcal {L}}(u,v,{\bar{p}})\), the latter inequality implies that \(\forall p \in {\mathcal {B}}\),

$$\begin{aligned} f({\bar{p}}) \ge {\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,{p}). \end{aligned}\tag{20}$$

By virtue of the coercivity property established previously, one has, \(\forall p \in {\mathcal {B}}\) (parameters \(\epsilon \) and \(\epsilon '\) being suitably chosen),

$$\begin{aligned}&\scriptstyle {\left( \dfrac{\mu +2}{4}-\dfrac{1}{2\epsilon }-\dfrac{\epsilon '}{2}\right) \,\Vert e_{\lambda }^1\Vert ^2+\left( \dfrac{1}{2}-\dfrac{\epsilon }{2}\right) \,\Vert e_{\lambda }^2\Vert ^2-\dfrac{\mu }{2}\,\Vert s(\theta )+w\Vert ^2-\dfrac{4LN^2}{\epsilon '}}\\&\le {\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,(1-\lambda )\,{\bar{p}}+\lambda p)\le {\mathcal {L}}({\tilde{u}},{\tilde{v}},(1-\lambda )\,{\bar{p}}+\lambda p), \end{aligned}$$

with \({\tilde{u}}\in {\mathcal {C}}_1\) and \({\tilde{v}} \in {\mathcal {C}}_2\) as defined before. The right-hand side is thus independent of \(p\), \({\bar{p}}\) and \(\lambda \) and constitutes a uniform bound, showing that \(e_{\lambda }^1\) is uniformly bounded (this was already known owing to the definition of \({\mathcal {C}}_1\)) as well as \(e_{\lambda }^2\). One can thus extract subsequences \(e_{\lambda _n}^1\) and \(e_{\lambda _n}^2\) (with a common extraction mapping), with \(\lambda _n \underset{n \rightarrow +\infty }{\rightarrow } 0\), converging to some limits \({\bar{u}}\) and \({\bar{v}}\). We show next that \({\bar{u}}=e_1({\bar{p}})\) and \({\bar{v}}=e_2({\bar{p}})\).

As by definition, \({\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,(1-\lambda )\,{\bar{p}}+\lambda p)= \min _{(u,v)\in \displaystyle {{\mathcal {C}}_1\times {\mathcal {C}}_2}}\,{\mathcal {L}}(u,v,(1-\lambda )\,{\bar{p}}+\lambda p), \forall (u,v)\in {\mathcal {C}}_1\times {\mathcal {C}}_2\),

$$\begin{aligned}&{\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,(1-\lambda )\,{\bar{p}}+\lambda p)\\&\quad \le {\mathcal {L}}(u,v,(1-\lambda )\,{\bar{p}}+\lambda p), \end{aligned}$$

and by linearity of \({\mathcal {L}}\) with respect to the third argument, it follows that \(\forall (u,v)\in {\mathcal {C}}_1\times {\mathcal {C}}_2\),

$$\begin{aligned}&(1-\lambda )\,{\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,{\bar{p}})+\lambda \,{\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,p)\\&\quad \le {\mathcal {L}}(u,v,(1-\lambda )\,{\bar{p}}+\lambda p). \end{aligned}$$

The quantity \({\mathcal {L}}(e_{\lambda }^1,e_{\lambda }^2,p)\) is bounded below by f(p) so that passing to the limit in the previous inequality when \(\lambda _n\) tends to 0 yields, using the continuity of \({\mathcal {L}}\),

$$\begin{aligned} {\mathcal {L}}({\bar{u}},{\bar{v}},{\bar{p}})\le {\mathcal {L}}(u,v,{\bar{p}}), \end{aligned}$$

this being true for all \((u,v)\in {\mathcal {C}}_1\times {\mathcal {C}}_2\). By uniqueness of the minimiser of \(\displaystyle {\min _{(u,v)\in {\mathcal {C}}_1\times {\mathcal {C}}_2}}\,{\mathcal {L}}(u,v,{\bar{p}})\), we deduce that \(({\bar{u}},{\bar{v}})=(e_1({\bar{p}}),e_2({\bar{p}}))\).

Finally, passing to the limit in (20) yields \({\mathcal {L}}({\bar{u}},{\bar{v}},p)\le f({\bar{p}}), \forall p \in {\mathcal {B}}\), which, combined with (19) and [16, Chapter VI, Proposition 1.3], enables one to conclude that \(({\bar{u}},{\bar{v}},{\bar{p}})\) is a saddle point of \({\mathcal {L}}\).

Comment on the Fact that Fixed Points of Algorithm 3 are Saddle Points of the Associated Lagrangian

First of all, one remarks that

$$\begin{aligned} {\hat{x}}&={{\,\mathrm{\mathrm{arg \, min}}\,}}_{x}\,g(x)+\langle \nabla f({\bar{x}})+K^{*}{\tilde{y}},x\rangle +\dfrac{1}{2\tau }\,\Vert x-{\bar{x}}\Vert ^2,\\&={{\,\mathrm{\mathrm{arg \, min}}\,}}_{x}\,g(x)+\dfrac{1}{2\tau }\,\Vert x-\left( {\bar{x}}-\tau \,\left( \nabla f({\bar{x}})+K^{*}{\tilde{y}}\right) \right) \Vert ^2,\\&={\text{ prox }}_{\tau \,g}({\bar{x}}-\tau \,(\nabla f({\bar{x}})+K^{*}{\tilde{y}})). \end{aligned}$$

The first step of the algorithm can thus be rephrased as

$$\begin{aligned} x^{n+1}={\text{ prox }}_{\tau \,g}(x^n-\tau \,(\nabla f(x^n)+K^{*}y^n)). \end{aligned}\tag{21}$$


Similarly,

$$\begin{aligned} {\hat{y}}&={{\,\mathrm{\mathrm{arg \, min}}\,}}_{y}\,h^{*}(y)-\langle K{\tilde{x}},y\rangle +\dfrac{1}{2\sigma }\,\Vert y-{\bar{y}}\Vert ^2,\\&={{\,\mathrm{\mathrm{arg \, min}}\,}}_{y}\,h^{*}(y)+\dfrac{1}{2\sigma }\,\Vert y-\left( {\bar{y}}+\sigma \,K{\tilde{x}}\right) \Vert ^2,\\&={\text{ prox }}_{\sigma \,h^{*}}({\bar{y}}+\sigma \,K{\tilde{x}}). \end{aligned}$$

The second step of the algorithm thus reads as

$$\begin{aligned} y^{n+1}={\text{ prox }}_{\sigma \,h^{*}}(y^n+\sigma \,K(2x^{n+1}-x^n)). \end{aligned}\tag{22}$$
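The two proximal steps can be sketched on a toy scalar problem. The instance below is an assumption made purely for illustration and is unrelated to the segmentation model: \(f(x)=\frac{1}{2}(x-b)^2\) (so \(L_f=1\)), \(g\) the indicator of \([0,1]\), \(K\) the identity and \(h=\lambda |\cdot |\), so that \(h^{*}\) is the indicator of \([-\lambda ,\lambda ]\). Both proximal operators then reduce to projections onto intervals.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Projection onto [low, high], i.e. the prox of the interval indicator."""
    return max(low, min(high, value))


def primal_dual(b: float, lam: float, tau: float, sigma: float, n_iter: int = 200):
    """Toy instance: min over x in [0, 1] of 0.5 (x - b)^2 + lam |x|.

    Hypothetical setting: f(x) = 0.5 (x - b)^2, g = indicator of [0, 1],
    K = identity, h* = indicator of [-lam, lam].
    """
    x, y = 0.0, 0.0
    for _ in range(n_iter):
        grad_f = x - b
        # x^{n+1} = prox_{tau g}(x^n - tau (grad f(x^n) + K^* y^n))
        x_new = clamp(x - tau * (grad_f + y), 0.0, 1.0)
        # y^{n+1} = prox_{sigma h*}(y^n + sigma K (2 x^{n+1} - x^n))
        y = clamp(y + sigma * (2.0 * x_new - x), -lam, lam)
        x = x_new
    return x, y


# Step sizes chosen so that (1/tau - L_f) / sigma = 2 > ||K||^2 = 1.
x_star, y_star = primal_dual(b=0.8, lam=0.3, tau=0.5, sigma=0.5)
print(x_star, y_star)
```

With these values the iterates approach \(x=0.5\) and \(y=0.3\), which is the soft-thresholding of \(b\) by \(\lambda \) together with the associated dual variable.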

Now, considering a fixed point \((x^{*},y^{*})\) of the algorithm and owing to the fact that

$$\begin{aligned} r={\text{ prox }}_f(s)\,\,&\Longleftrightarrow \,\,s-r\in \partial f(r)\,\,\\&\Longleftrightarrow \,\,\forall t,\,f(t)\ge f(r)+\langle s-r,t-r\rangle , \end{aligned}$$

the relation (21) gives that \(\forall x \in {\mathcal {X}}\),

$$\begin{aligned} g(x) \ge g(x^{*})-\langle x-x^{*},\nabla f(x^{*})+K^{*}y^{*}\rangle , \end{aligned}$$

while the relation (22) leads to \(\forall y \in {\mathcal {Y}}\),

$$\begin{aligned} h^{*}(y)\ge h^{*}(y^{*})+\langle y-y^{*},Kx^{*}\rangle . \end{aligned}$$

Summing both inequalities yields

$$\begin{aligned} g(x^{*})-h^{*}(y)-\langle x-x^{*},\nabla f(x^{*})\rangle -\langle Kx,y^{*}\rangle \\\le g(x)-h^{*}(y^{*})-\langle Kx^{*},y\rangle . \end{aligned}$$

But by convexity of \(f\), \(f(x^{*})+\langle \nabla f(x^{*}),x-x^{*}\rangle \le f(x)\), which implies that

$$\begin{aligned}&g(x^{*})-h^{*}(y)+f(x^{*})+\langle Kx^{*},y\rangle \\&\le g(x)-h^{*}(y^{*})+\langle Kx,y^{*}\rangle +f(x). \end{aligned}$$

Then \(\forall x \in {\mathcal {X}}, \forall y \in {\mathcal {Y}}\), \({\mathcal {L}}(x^{*},y)\le {\mathcal {L}}(x,y^{*})\), showing that \((x^{*},y^{*})\) is a saddle point of the associated Lagrangian denoted by \({\mathcal {L}}\) here.
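The saddle-point inequality can be checked by sampling on a toy scalar instance. The data below are an assumption for illustration only: \(f(x)=\frac{1}{2}(x-0.8)^2\), \(K=1\), \(g\) and \(h^{*}\) the indicators of \([0,1]\) and \([-0.3,0.3]\), with candidate saddle point \((x^{*},y^{*})=(0.5,0.3)\).

```python
def lagrangian(x: float, y: float, b: float = 0.8) -> float:
    """L(x, y) = f(x) + g(x) + <Kx, y> - h*(y) on the feasible box.

    Assumed toy data: f(x) = 0.5 (x - b)^2 and K = 1; g and h* are
    indicators, hence zero at every point sampled below.
    """
    return 0.5 * (x - b) ** 2 + x * y


x_star, y_star = 0.5, 0.3  # candidate saddle point of the toy problem
xs = [k / 100.0 for k in range(101)]                # samples of [0, 1]
ys = [-0.3 + 0.6 * k / 100.0 for k in range(101)]   # samples of [-0.3, 0.3]

# Largest value of L(x*, y) - L(x, y*) over the sampled pairs.
worst_gap = max(lagrangian(x_star, y) - lagrangian(x, y_star) for x in xs for y in ys)
print(worst_gap <= 1e-12)
```

The gap is non-positive up to rounding for every sampled pair, in line with \({\mathcal {L}}(x^{*},y)\le {\mathcal {L}}(x,y^{*})\).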

Proof of Theorem 4

We recall that the considered saddle-point structure reads as

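$$\begin{aligned} \min _{x\in {\mathcal {X}}}\,\max _{y\in {\mathcal {Y}}}\,{\mathcal {L}}(x,y)=f(x)+g(x)+\langle Kx,y\rangle -h^{*}(y), \end{aligned}$$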

and that the general iteration of the algorithm is given by

$$\begin{aligned} ({\hat{x}},{\hat{y}})=PD_{\tau ,\sigma }({\bar{x}},{\bar{y}},{\tilde{x}},{\tilde{y}}), \end{aligned}$$


$$\begin{aligned} \left\{ \begin{array}{ccc} {\hat{x}}&{}=&{}\displaystyle {{{\,\mathrm{\mathrm{arg \, min}}\,}}_{x}} \,f({\bar{x}})+\langle \nabla f({\bar{x}}),x-{\bar{x}}\rangle +g(x)+\langle Kx,{\tilde{y}}\rangle \\ &{}&{}+\dfrac{1}{\tau }\,D_x(x,{\bar{x}}),\\ {\hat{y}}&{}=&{}\displaystyle {{{\,\mathrm{\mathrm{arg \, min}}\,}}_{y}} \,h^{*}(y)-\langle K{\tilde{x}},y\rangle +\dfrac{1}{\sigma }\,D_y(y,{\bar{y}}). \end{array}\right. \end{aligned}$$

The inputs are thus the points \(({\bar{x}},{\bar{y}})\) as well as the intermediate points \(({\tilde{x}},{\tilde{y}})\), while the outputs are the generated points \(({\hat{x}},{\hat{y}})\).

Here \(D_x\) and \(D_y\) are Bregman proximity/distance functions (see [10, p. 256] for further details) chosen to be \(D_x(x,{\bar{x}})=\frac{1}{2}\,\Vert x-{\bar{x}}\Vert ^2\) (respectively, \(D_y(y,{\bar{y}})=\frac{1}{2}\,\Vert y-{\bar{y}}\Vert ^2\)) in our setting, this choice being the most common one. Also, from an algorithmic viewpoint, an iteration is applied with \({\bar{x}}=x^n\), \({\bar{y}}=y^n, {\tilde{x}}=2x^{n+1}-x^n\) and \({\tilde{y}}=y^n\). Lemma 1 of [10, p. 257] states that, provided the previous general iteration holds, then for any \(x\in {\mathcal {X}}\) and for any \(y\in {\mathcal {Y}}\), one has

$$\begin{aligned}&{\mathcal {L}}({\hat{x}},y)-{\mathcal {L}}(x,{\hat{y}})\\&\quad \le \frac{1}{\tau }\,D_x(x,{\bar{x}})-\frac{1}{\tau }\,D_x(x,{\hat{x}})-\frac{1}{\tau }\,D_x({\hat{x}},{\bar{x}})+\frac{L_f}{2}\,\Vert {\hat{x}}-{\bar{x}}\Vert ^2\\&\qquad +\frac{1}{\sigma }\,D_y(y,{\bar{y}})-\frac{1}{\sigma }\,D_y(y,{\hat{y}}) -\frac{1}{\sigma }\,D_y({\hat{y}},{\bar{y}})\\&\qquad +\langle K(x-{\hat{x}}),{\tilde{y}}-{\hat{y}}\rangle -\langle K({\tilde{x}}-{\hat{x}}),y-{\hat{y}}\rangle . \end{aligned}$$

Applying [10, Lemma 1] with \({\hat{x}}=x^{n+1}, {\hat{y}}=y^{n+1}\) and \(D_x, D_y\) as defined above, gives that \(\forall x \in {\mathcal {X}}\) and \(\forall y \in {\mathcal {Y}}\), one has:

$$\begin{aligned}&{\mathcal {L}}(x^{n+1},y)-{\mathcal {L}}(x,y^{n+1}) \\&\quad \le \,\, \dfrac{1}{2\tau }\,\Vert x-x^n\Vert ^2-\dfrac{1}{2\tau }\,\Vert x-x^{n+1}\Vert ^2-\dfrac{1}{2\tau }\,\Vert x^{n+1}-x^n\Vert ^2\\&\qquad +\dfrac{L_f}{2}\,\Vert x^{n+1}-x^n\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y-y^n\Vert ^2-\dfrac{1}{2\sigma }\,\Vert y-y^{n+1}\Vert ^2\\&\qquad -\dfrac{1}{2\sigma }\,\Vert y^{n+1}-y^n\Vert ^2+\langle K(x-x^{n+1}),y^n-y^{n+1}\rangle \\&\qquad -\langle K(x^{n+1}-x^{n}),y-y^{n+1}\rangle . \end{aligned}$$

Remarking then that \(\left\{ \begin{array}{ccc}x-x^{n+1}&{}=&{}x-x^n+x^n-x^{n+1}\\ x^{n+1}-x^n&{}=&{}x^{n+1}-x+x-x^n \end{array}\right. \), and thus

$$\begin{aligned}&\langle K(x-x^{n+1}),y^n-y^{n+1}\rangle -\langle K(x^{n+1}-x^{n}),y-y^{n+1}\rangle \\&\quad =\langle K(x-x^{n}),y^n-y^{n+1}\rangle +\langle K(x^n-x^{n+1}),y^n-y^{n+1}\rangle \\&\qquad -\langle K(x^{n+1}-x),y-y^{n+1}\rangle -\langle K(x-x^n),y-y^{n+1}\rangle ,\\&\quad =-\langle K(x-x^n),y-y^n\rangle +\langle K(x^{n+1}-x^n),y^{n+1}-y^n\rangle \\&\qquad +\langle K(x-x^{n+1}),y-y^{n+1}\rangle , \end{aligned}$$

it yields

$$\begin{aligned}&{\mathcal {L}}(x^{n+1},y)-{\mathcal {L}}(x,y^{n+1}) \\&\quad \le \left[ \dfrac{1}{2\tau }\,\Vert x-x^n\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y-y^n\Vert ^2-\langle K(x-x^n),y-y^n\rangle \right] \\&\qquad -\left[ \dfrac{1}{2\tau }\,\Vert x-x^{n+1}\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y-y^{n+1}\Vert ^2-\langle K(x-x^{n+1}),y-y^{n+1}\rangle \right] \\&\qquad -\left[ \dfrac{1}{2\tau }\,\Vert x^{n+1}-x^{n}\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y^{n+1}-y^{n}\Vert ^2-\langle K(x^{n+1}-x^{n}),y^{n+1}-y^{n}\rangle -\dfrac{L_f}{2}\,\Vert x^{n+1}-x^n\Vert ^2 \right] . \end{aligned}\tag{23}$$

Owing to hypothesis (H), according to which \(\left( \frac{1}{\tau }-L_f\right) \frac{1}{\sigma } >\Vert K\Vert ^2\), the quantities in brackets are non-negative. To show this, let us focus on the last quantity in brackets, the same reasoning applying to the first two. Using the Cauchy–Schwarz inequality combined with Young’s inequality with parameter \(\varepsilon >0\), one has \(\langle K(x^{n+1}-x^n),y^{n+1}-y^n\rangle \le \Vert K\Vert \,\Vert x^{n+1}-x^n\Vert \,\Vert y^{n+1}-y^n\Vert \le \dfrac{\Vert K\Vert }{2\varepsilon }\Vert x^{n+1}-x^n\Vert ^2+\dfrac{\Vert K\Vert \varepsilon }{2}\Vert y^{n+1}-y^n\Vert ^2\), so that setting \(\varepsilon =\frac{1}{{\sqrt{\sigma \,\left( \frac{1}{\tau }-L_f\right) }}}\) leads to:

$$\begin{aligned}&\left( \dfrac{1}{2\tau }-\dfrac{L_f}{2}\right) \,\Vert x^{n+1}-x^n\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y^{n+1}-y^n\Vert ^2\\&\qquad -\langle K(x^{n+1}-x^n),y^{n+1}-y^n\rangle \\&\quad \ge \left( \dfrac{1}{2\tau }-\dfrac{L_f}{2}-{\sqrt{\sigma \,\left( \frac{1}{\tau }-L_f\right) }}\,\dfrac{\Vert K\Vert }{2}\right) \,\Vert x^{n+1}-x^n\Vert ^2 \\&\qquad +\left( \dfrac{1}{2\sigma }-\frac{\Vert K\Vert }{{2\sqrt{\sigma \,\left( \frac{1}{\tau }-L_f\right) }}}\right) \,\Vert y^{n+1}-y^n\Vert ^2. \end{aligned}$$

Hypothesis (H) enables one to conclude that the weights balancing \(\Vert x^{n+1}-x^n\Vert ^2\) and \(\Vert y^{n+1}-y^n\Vert ^2\) are positive, or equivalently that there exists \(\xi >0\) so that

$$\begin{aligned}&\left( \dfrac{1}{2\tau }-\dfrac{L_f}{2}\right) \,\Vert x^{n+1}-x^n\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y^{n+1}-y^n\Vert ^2-\langle K(x^{n+1}-x^n),y^{n+1}-y^n\rangle \\&\quad \ge \xi \,\left( \Vert x^{n+1}-x^n\Vert ^2+\Vert y^{n+1}-y^n\Vert ^2\right) . \end{aligned}\tag{24}$$
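The role of hypothesis (H) in this estimate can be illustrated with a short computation of the two weights obtained after Young's step. The function names and the numerical step sizes below are hypothetical; taking \(\xi \) as the smaller of the two weights is one admissible choice, not necessarily the constant used by the authors.

```python
import math


def young_weights(tau: float, sigma: float, L_f: float, norm_K: float):
    """Weights of ||x^{n+1} - x^n||^2 and ||y^{n+1} - y^n||^2 after Young's step."""
    root = math.sqrt(sigma * (1.0 / tau - L_f))  # requires 1/tau > L_f
    weight_x = 0.5 / tau - 0.5 * L_f - 0.5 * root * norm_K
    weight_y = 0.5 / sigma - norm_K / (2.0 * root)
    return weight_x, weight_y


def satisfies_H(tau: float, sigma: float, L_f: float, norm_K: float) -> bool:
    """Hypothesis (H): (1/tau - L_f) / sigma > ||K||^2."""
    return (1.0 / tau - L_f) / sigma > norm_K ** 2


# Hypothetical step sizes with L_f = 1 and ||K|| = 1.
wx, wy = young_weights(tau=0.5, sigma=0.5, L_f=1.0, norm_K=1.0)
xi = min(wx, wy)  # one admissible constant in the lower bound
print(satisfies_H(0.5, 0.5, 1.0, 1.0), xi > 0.0)
```

Increasing \(\sigma \) to 2 with the other values unchanged violates (H), and both weights then become negative.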

An immediate consequence is that equation (23) reduces to

$$\begin{aligned}&{\mathcal {L}}(x^{n+1},y)-{\mathcal {L}}(x,y^{n+1}) \\&\quad \le \left[ \dfrac{1}{2\tau }\,\Vert x-x^n\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y-y^n\Vert ^2-\langle K(x-x^n),y-y^n\rangle \right] \\&\qquad -\left[ \dfrac{1}{2\tau }\,\Vert x-x^{n+1}\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y-y^{n+1}\Vert ^2-\langle K(x-x^{n+1}),y-y^{n+1}\rangle \right] . \end{aligned}\tag{25}$$

By taking as particular \((x,y)\) a saddle point \((x^{*},y^{*})\) of the Lagrangian \({\mathcal {L}}\) (whose existence is ensured in our case by Theorem 3), entailing, by definition of a saddle point, that \({\mathcal {L}}(x^{n+1},y^{*})-{\mathcal {L}}(x^{*},y^{n+1})\ge 0\), inequality (25) gives:

$$\begin{aligned}&\scriptstyle {\dfrac{1}{2\tau }\,\Vert x^{*}-x^{n+1}\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y^{*}-y^{n+1}\Vert ^2-\langle K(x^{*}-x^{n+1}),y^{*}-y^{n+1}\rangle }\\&\quad \scriptstyle {-\dfrac{1}{2\tau }\,\Vert x^{*}-x^n\Vert ^2-\dfrac{1}{2\sigma }\,\Vert y^{*}-y^n\Vert ^2+\langle K(x^{*}-x^n),y^{*}-y^n\rangle \le 0}. \end{aligned}$$

By summing from \(n=0\) to \(N-1\), one gets:

$$\begin{aligned}&\scriptstyle {\dfrac{1}{2\tau }\,\Vert x^{*}-x^{N}\Vert ^2-\dfrac{1}{2\tau }\,\Vert x^{*}-x^{0}\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y^{*}-y^{N}\Vert ^2-\dfrac{1}{2\sigma }\,\Vert y^{*}-y^{0}\Vert ^2}\\&\quad \scriptstyle {+\langle K(x^{*}-x^0),y^{*}-y^0\rangle -\langle K(x^{*}-x^{N}),y^{*}-y^{N}\rangle \le 0}, \end{aligned}$$

and using an estimate as in (24), it follows, again by the Cauchy–Schwarz and Young inequalities, that

$$\begin{aligned}&\xi \,\left( \Vert x^{*}-x^{N}\Vert ^2+\Vert y^{*}-y^{N}\Vert ^2\right) \\&\quad \le \left( \dfrac{1}{2\tau }+\dfrac{\Vert K\Vert }{2}\right) \Vert x^{*}-x^{0}\Vert ^2+\left( \dfrac{1}{2\sigma }+\dfrac{\Vert K\Vert }{2}\right) \,\Vert y^{*}-y^{0}\Vert ^2, \end{aligned}$$

showing that the sequence \((x^n,y^n)\) is bounded. Since we work in finite dimension, one can thus extract a subsequence \((x^{\Psi (n)},y^{\Psi (n)})\) that converges (strongly) to some \((\hat{{\hat{x}}},\hat{{\hat{y}}})\). Let us now come back to inequality (23). Proceeding as before with \((x,y)=(x^{*},y^{*})\), summing the inequalities from \(n=0\) to \(N-1\), and using estimate (24) shows that

$$\begin{aligned}&\xi \,\left( \displaystyle {\sum _{n=0}^{N-1}}\,\Vert x^{n+1}-x^n\Vert ^2+\displaystyle {\sum _{n=0}^{N-1}}\,\Vert y^{n+1}-y^n\Vert ^2\right) \nonumber \\&\quad \le \dfrac{1}{2\tau }\,\Vert x^{*}-x^0\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y^{*}-y^0\Vert ^2-\langle K(x^{*}-x^0),y^{*}-y^0\rangle \nonumber \\&\qquad -\scriptstyle {\dfrac{1}{2\tau }\,\Vert x^{*}-x^N\Vert ^2-\dfrac{1}{2\sigma }\,\Vert y^{*}-y^N\Vert ^2+\langle K(x^{*}-x^N),y^{*}-y^N\rangle }, \end{aligned}$$

the last line being non-positive, again by hypothesis (H). Thus

$$\begin{aligned}&\xi \,\left( \displaystyle {\sum _{n=0}^{N-1}}\,\Vert x^{n+1}-x^n\Vert ^2+\displaystyle {\sum _{n=0}^{N-1}}\,\Vert y^{n+1}-y^n\Vert ^2\right) \nonumber \\&\quad \le \dfrac{1}{2\tau }\,\Vert x^{*}-x^0\Vert ^2+\dfrac{1}{2\sigma }\,\Vert y^{*}-y^0\Vert ^2-\langle K(x^{*}-x^0),y^{*}-y^0\rangle . \end{aligned}$$
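
The sign argument used in this last step, and the earlier bound on \(\Vert x^{*}-x^{N}\Vert ^2+\Vert y^{*}-y^{N}\Vert ^2\), both come from a two-sided estimate of the same quadratic expression. As a sketch, write \(Q(u,v):=\frac{1}{2\tau }\Vert u\Vert ^2+\frac{1}{2\sigma }\Vert v\Vert ^2-\langle Ku,v\rangle \) (shorthand introduced here, not notation from the article) and assume \(L_f\ge 0\). The argument leading to (24) only uses the Cauchy–Schwarz and Young inequalities, so it applies to any pair \((u,v)\), while Young's inequality with parameter 1 gives the upper bound:

$$\begin{aligned} Q(u,v)&\ge \left( \dfrac{1}{2\tau }-\dfrac{L_f}{2}\right) \,\Vert u\Vert ^2+\dfrac{1}{2\sigma }\,\Vert v\Vert ^2-\langle Ku,v\rangle \ge \xi \,\left( \Vert u\Vert ^2+\Vert v\Vert ^2\right) \ge 0,\\ Q(u,v)&\le \left( \dfrac{1}{2\tau }+\dfrac{\Vert K\Vert }{2}\right) \,\Vert u\Vert ^2+\left( \dfrac{1}{2\sigma }+\dfrac{\Vert K\Vert }{2}\right) \,\Vert v\Vert ^2. \end{aligned}$$

Taking \((u,v)=(x^{*}-x^N,y^{*}-y^N)\) in the first line shows that the terms indexed by \(N\) can be dropped; combining the first line at index \(N\) with the second at index \(0\) gives the boundedness estimate.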

The sequence \(({\mathcal {S}}(N))_{N\in {\mathbb {N}}^{*}}\) with general term \({\mathcal {S}}(N)=\) \(\displaystyle {\sum _{n=0}^{N-1}}\,\Vert x^{n+1}-x^n\Vert ^2\) is thus non-decreasing and bounded above, so it converges, and \(\displaystyle {\lim _{n \rightarrow +\infty }}\,(x^{n+1}-x^n)=0\) (similarly, \(\displaystyle {\lim _{n \rightarrow +\infty }}\,(y^{n+1}-y^n)=0\)). This implies that \((x^{\Psi (n)-1},y^{\Psi (n)-1})\) also converges to \((\hat{{\hat{x}}},\hat{{\hat{y}}})\) (take \(n:=\Psi (n)-1\) in the previous result), which is thus a fixed point of the iteration of Algorithm 3, hence a saddle point of the Lagrangian \({\mathcal {L}}\) (see Appendix C).
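
The two limits used in this paragraph can be spelled out as follows (a short worked step added for readability, using only the quantities defined above). Convergence of \({\mathcal {S}}(N)\) forces its consecutive differences to vanish, and the triangle inequality transfers the limit of the subsequence to the shifted subsequence:

$$\begin{aligned} \Vert x^{N+1}-x^N\Vert ^2&={\mathcal {S}}(N+1)-{\mathcal {S}}(N)\underset{N\rightarrow +\infty }{\longrightarrow }0,\\ \Vert x^{\Psi (n)-1}-\hat{{\hat{x}}}\Vert&\le \Vert x^{\Psi (n)-1}-x^{\Psi (n)}\Vert +\Vert x^{\Psi (n)}-\hat{{\hat{x}}}\Vert \underset{n\rightarrow +\infty }{\longrightarrow }0. \end{aligned}$$

The same two lines hold with \(y\) in place of \(x\).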

For the last time, we come back to inequality (25), now with \((x,y)=(\hat{{\hat{x}}},\hat{{\hat{y}}})\), and sum the inequalities over the indices from \(\Psi (n)\) to \(N-1\), with \(N>\Psi (n)\). This yields

$$\begin{aligned}&\xi \,\left( \Vert \hat{{\hat{x}}}-x^N\Vert ^2+\Vert \hat{{\hat{y}}}-y^N\Vert ^2\right) \\&\quad \le \left( \dfrac{1}{2\tau }+\dfrac{\Vert K\Vert }{2}\right) \,\Vert \hat{{\hat{x}}}-x^{\Psi (n)}\Vert ^2\\&\qquad +\left( \dfrac{1}{2\sigma }+\dfrac{\Vert K\Vert }{2}\right) \,\Vert \hat{{\hat{y}}}-y^{\Psi (n)}\Vert ^2, \end{aligned}$$

which, since the right-hand side tends to \(0\) as \(n \rightarrow +\infty \), proves that \(x^N \rightarrow \hat{{\hat{x}}}\) and \(y^N \rightarrow \hat{{\hat{y}}}\) as \(N\) tends to \(+\infty \).
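
The behaviour established by the proof (vanishing increments, bounded non-decreasing partial sums \({\mathcal {S}}(N)\), convergence of the whole sequence to a saddle point) can be observed on a toy example. The sketch below is not the article's Algorithm 3, which is not reproduced in this section: it is a generic forward-backward primal-dual iteration of Condat–Vũ type, applied to the hypothetical scalar problem \(\min _x \frac{1}{2}(x-b)^2+|x|\), rewritten as the saddle-point problem \(\min _x\max _{|y|\le 1}\frac{1}{2}(x-b)^2+xy\), so that \(K=1\) and \(L_f=1\). The function name, the value \(b=3\) and the step sizes are assumptions chosen for illustration; they satisfy \(\left( \frac{1}{\tau }-L_f\right) /\sigma =6>\Vert K\Vert ^2=1\). For \(b=3\) the saddle point is \((x^{*},y^{*})=(2,1)\).

```python
def clip(v, lo=-1.0, hi=1.0):
    """Projection onto [lo, hi]: the dual proximal step for the absolute value."""
    return max(lo, min(hi, v))


def toy_primal_dual(b=3.0, tau=0.25, sigma=0.5, n_iter=200):
    """Run a Condat-Vu type iteration on min_x 0.5*(x - b)**2 + |x| (toy example).

    Returns the final (x, y) and the partial sums S(N) of squared primal increments.
    """
    x, y = 0.0, 0.0
    partial_sums = []
    s = 0.0
    for _ in range(n_iter):
        # Explicit gradient step on the smooth term 0.5*(x - b)**2, coupled with y (K = 1).
        x_new = x - tau * ((x - b) + y)
        # Dual ascent with extrapolated primal point, then projection onto |y| <= 1.
        y_new = clip(y + sigma * (2.0 * x_new - x))
        s += (x_new - x) ** 2
        partial_sums.append(s)
        x, y = x_new, y_new
    return x, y, partial_sums


if __name__ == "__main__":
    x, y, sums = toy_primal_dual()
    print(x, y, sums[-1])
```

On this example the dual variable reaches the bound \(y=1\) after two iterations, after which the primal error contracts by a factor \(1-\tau =0.75\) per step; this geometric rate is a feature of the toy problem and is not claimed by the proof above, which only establishes convergence.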


Cite this article

Lambert, Z., Le Guyader, C. & Petitjean, C. Enforcing Geometrical Priors in Deep Networks for Semantic Segmentation Applied to Radiotherapy Planning. J Math Imaging Vis 64, 892–915 (2022).
