Image Labeling by Assignment

Abstract

We introduce a novel geometric approach to the image labeling problem. Abstracting from specific labeling applications, a general objective function is defined on a manifold of stochastic matrices, whose elements assign prior data that are given in any metric space, to observed image measurements. The corresponding Riemannian gradient flow entails a set of replicator equations, one for each data point, that are spatially coupled by geometric averaging on the manifold. Starting from uniform assignments at the barycenter as natural initialization, the flow terminates at some global maximum, each of which corresponds to an image labeling that uniquely assigns the prior data. Our geometric variational approach constitutes a smooth non-convex inner approximation of the general image labeling problem, implemented with sparse interior-point numerics in terms of parallel multiplicative updates that converge efficiently.

Notes

  1. For locations i close to the boundary of the image domain where patch supports \({\mathscr {N}}_{p}(i)\) shrink, the definition of the vector \(w^{p}\) has to be adapted accordingly.

References

  1. Amari, S.I., Nagaoka, H.: Methods of Information Geometry. American Mathematical Society, Oxford University Press, Oxford (2000)
  2. Aujol, J.F., Gilboa, G., Chan, T., Osher, S.: Structure-texture image decomposition-modeling, algorithms, and parameter selection. Int. J. Comput. Vis. 67(1), 111–136 (2006)
  3. Ball, K.: An elementary introduction to modern convex geometry. In: Levy, S. (ed.) Flavors of Geometry, MSRI Publ., vol. 31, pp. 1–58. Cambridge University Press (1997)
  4. Bayer, D., Lagarias, J.: The nonlinear geometry of linear programming. I. Affine and projective scaling trajectories. Trans. Am. Math. Soc. 314(2), 499–526 (1989)
  5. Bayer, D., Lagarias, J.: The nonlinear geometry of linear programming. II. Legendre transform coordinates and central trajectories. Trans. Am. Math. Soc. 314(2), 527–581 (1989)
  6. Bishop, C.: Pattern Recognition and Machine Learning. Springer, Berlin (2006)
  7. Bomze, I., Budinich, M., Pelillo, M., Rossi, C.: Annealed replication: a new heuristic for the maximum clique problem. Discr. Appl. Math. 121, 27–49 (2002)
  8. Bomze, I.M.: Regularity versus degeneracy in dynamics, games, and optimization: a unified approach to different aspects. SIAM Rev. 44(3), 394–414 (2002)
  9. Buades, A., Coll, B., Morel, J.: A review of image denoising algorithms, with a new one. SIAM Multiscale Model. Simul. 4(2), 490–530 (2005)
  10. Buades, A., Coll, B., Morel, J.M.: Neighborhood filters and PDEs. Numer. Math. 105, 1–34 (2006)
  11. Cabrales, A., Sobel, J.: On the limit points of discrete selection dynamics. J. Econ. Theory 57, 407–419 (1992)
  12. Čencov, N.: Statistical Decision Rules and Optimal Inference. American Mathematical Society, Providence (1982)
  13. Chambolle, A., Cremers, D., Pock, T.: A convex approach to minimal partitions. SIAM J. Imaging Sci. 5(4), 1113–1158 (2012)
  14. Chan, T., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
  15. Hérault, L., Horaud, R.: Figure-ground discrimination: a combinatorial optimization approach. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 899–914 (1993)
  16. Heskes, T.: Convexity arguments for efficient minimization of the Bethe and Kikuchi free energies. J. Artif. Intell. Res. 26, 153–190 (2006)
  17. Hofbauer, J., Sigmund, K.: Evolutionary game dynamics. Bull. Am. Math. Soc. 40(4), 479–519 (2003)
  18. Hofmann, T., Buhmann, J.: Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19(1), 1–14 (1997)
  19. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches, 3rd edn. Springer, Berlin (1996)
  20. Hummel, R., Zucker, S.: On the foundations of the relaxation labeling processes. IEEE Trans. Pattern Anal. Mach. Intell. 5(3), 267–287 (1983)
  21. Jost, J.: Riemannian Geometry and Geometric Analysis, 4th edn. Springer, Berlin (2005)
  22. Kappes, J., Andres, B., Hamprecht, F., Schnörr, C., Nowozin, S., Batra, D., Kim, S., Kausler, B., Kröger, T., Lellmann, J., Komodakis, N., Savchynskyy, B., Rother, C.: A comparative study of modern inference techniques for structured discrete energy minimization problems. Int. J. Comput. Vis. 115(2), 155–184 (2015)
  23. Kappes, J., Savchynskyy, B., Schnörr, C.: A bundle approach to efficient MAP-inference by Lagrangian relaxation. In: Proc. CVPR (2012)
  24. Kappes, J., Schnörr, C.: MAP-inference for highly-connected graphs with DC-programming. In: Pattern Recognition—30th DAGM Symposium, LNCS, vol. 5096, pp. 1–10. Springer (2008)
  25. Karcher, H.: Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 30, 509–541 (1977)
  26. Karcher, H.: Riemannian center of mass and so-called Karcher mean. arXiv:1407.2087 (2014)
  27. Kass, R.: The geometry of asymptotic inference. Stat. Sci. 4(3), 188–234 (1989)
  28. Kolmogorov, V.: Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell. 28(10), 1568–1583 (2006)
  29. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)
  30. Ledoux, M.: The Concentration of Measure Phenomenon. American Mathematical Society, Providence (2001)
  31. Lellmann, J., Lenzen, F., Schnörr, C.: Optimality bounds for a variational relaxation of the image partitioning problem. J. Math. Imaging Vis. 47(3), 239–257 (2013)
  32. Lellmann, J., Schnörr, C.: Continuous multiclass labeling approaches and algorithms. SIAM J. Imaging Sci. 4(4), 1049–1096 (2011)
  33. Losert, V., Akin, E.: Dynamics of games and genes: discrete versus continuous time. J. Math. Biol. 17(2), 241–251 (1983)
  34. Luce, R.: Individual Choice Behavior: A Theoretical Analysis. Wiley, New York (1959)
  35. Milanfar, P.: A tour of modern image filtering. IEEE Signal Process. Mag. 30(1), 106–128 (2013)
  36. Milanfar, P.: Symmetrizing smoothing filters. SIAM J. Imaging Sci. 6(1), 263–284 (2013)
  37. Montúfar, G., Rauh, J., Ay, N.: On the Fisher metric of conditional probability polytopes. Entropy 16(6), 3207–3233 (2014)
  38. Nesterov, Y., Todd, M.: On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Found. Comput. Math. 2, 333–361 (2002)
  39. Orland, H.: Mean-field theory for optimization problems. J. Phys. Lett. 46(17), 763–770 (1985)
  40. Pavan, M., Pelillo, M.: Dominant sets and pairwise clustering. IEEE Trans. Pattern Anal. Mach. Intell. 29(1), 167–172 (2007)
  41. Pelillo, M.: The dynamics of nonlinear relaxation labeling processes. J. Math. Imaging Vis. 7, 309–323 (1997)
  42. Pelillo, M.: Replicator equations, maximal cliques, and graph isomorphism. Neural Comput. 11(8), 1933–1955 (1999)
  43. Rosenfeld, A., Hummel, R., Zucker, S.: Scene labeling by relaxation operations. IEEE Trans. Syst. Man Cybern. 6, 420–433 (1976)
  44. Singer, A., Shkolnisky, Y., Nadler, B.: Diffusion interpretation of non-local neighborhood filters for signal denoising. SIAM J. Imaging Sci. 2(1), 118–139 (2009)
  45. Sutton, R., Barto, A.: Reinforcement Learning, 2nd edn. MIT Press, Cambridge (1999)
  46. Swoboda, P., Shekhovtsov, A., Kappes, J., Schnörr, C., Savchynskyy, B.: Partial optimality by pruning for MAP-inference with general graphical models. IEEE Trans. Pattern Anal. Mach. Intell. 38(7), 1370–1382 (2016)
  47. Wainwright, M., Jordan, M.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
  48. Weickert, J.: Anisotropic Diffusion in Image Processing. B.G. Teubner, Leipzig (1998)
  49. Werner, T.: A linear programming approach to max-sum problem: a review. IEEE Trans. Pattern Anal. Mach. Intell. 29(7), 1165–1179 (2007)
  50. Yedidia, J., Freeman, W., Weiss, Y.: Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. Inf. Theory 51(7), 2282–2312 (2005)

Acknowledgements

Support by the German Research Foundation (DFG), Grant GRK 1653, is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Christoph Schnörr.

Appendices

Appendix 1: Basic Notation

For \(n \in {\mathbb {N}}\), we set \([n] = \{1,2,\ldots ,n\}\). \({\mathbbm {1}}= (1,1,\ldots ,1)^{\top }\) denotes the vector with all components equal to 1, whose dimension can either be inferred from the context or is indicated by a subscript, e.g., \({\mathbbm {1}}_{n}\). Vectors \(v^{1}, v^{2},\ldots \) are indexed by lowercase letters and superscripts, whereas subscripts \(v_{i},\, i \in [n]\), index vector components. \(e^{1},\ldots ,e^{n}\) denotes the canonical orthonormal basis of \(\mathbb {R}^{n}\).

We assume the data to be indexed by a graph \({\mathscr {G}}=({\mathscr {V}},{\mathscr {E}})\) with nodes \(i \in {\mathscr {V}}=[m]\), associated locations \(x^{i} \in \mathbb {R}^{d}\), and edges \({\mathscr {E}}\). A regular grid graph with \(d=2\) is the canonical example, but \({\mathscr {G}}\) may also be irregular, e.g., after preprocessing into superpixels, or correspond to 3D images or videos (\(d=3\)). For simplicity, we call \(i\) a location although the location actually is \(x^{i}\).

If \(A \in \mathbb {R}^{m \times n}\), then the row and column vectors are denoted by \(A_{i} \in \mathbb {R}^{n},\, i \in [m]\) and \(A^{j} \in \mathbb {R}^{m},\, j \in [n]\), respectively, and the entries by \(A_{ij}\). This notation of row vectors \(A_{i}\) is the only exception from our rule of indexing vectors stated above.

The componentwise application of functions \(f :\mathbb {R}\rightarrow \mathbb {R}\) to a vector is simply denoted by f(v), e.g., 

$$\begin{aligned} \forall v \in \mathbb {R}^{n},\qquad \sqrt{v}&:= (\sqrt{v_{1}},\ldots ,\sqrt{v_{n}})^{\top }, \end{aligned}$$
(6.1a)
$$\begin{aligned} \exp (v)&:= \big (e^{v_{1}},\ldots ,e^{v_{n}}\big )^{\top } \quad \text {etc.} \end{aligned}$$
(6.1b)

Likewise, binary relations between vectors apply componentwise, e.g., \(u \ge v \;\Leftrightarrow \; u_{i} \ge v_{i},\; i \in [n]\), and binary componentwise operations are simply written in terms of the vectors. For example,

$$\begin{aligned} p q := (\ldots , p_{i} q_{i},\ldots )^{\top },\qquad \frac{p}{q} := \Big (\ldots ,\frac{p_{i}}{q_{i}},\ldots \Big )^{\top }, \end{aligned}$$
(6.2)

where the latter operation is only applied to strictly positive vectors \(q > 0\). The support \({{\mathrm{supp}}}(p) = \{i \in [n] :p_{i} \ne 0\} \subset [n]\) of a vector \(p \in \mathbb {R}^{n}\) is the index set of all nonvanishing components of p.

\(\langle x, y \rangle \) denotes the standard Euclidean inner product and \(\Vert x\Vert = \langle x, x \rangle ^{1/2}\) the corresponding norm. Other \(\ell _{p}\)-norms, \(1 \le p \ne 2 \le \infty \), are indicated by a corresponding subscript, \( \Vert x\Vert _{p} = \big (\sum _{i \in [d]} |x_{i}|^{p}\big )^{1/p}, \) except for the case \(\Vert x\Vert = \Vert x\Vert _{2}\). For matrices \(A, B \in \mathbb {R}^{m \times n}\), the canonical inner product is \( \langle A, B \rangle = \hbox {tr}(A^{\top } B) \) with the corresponding Frobenius norm \(\Vert A\Vert = \langle A, A \rangle ^{1/2}\). \({{\mathrm{Diag}}}(v) \in \mathbb {R}^{n \times n},\, v \in \mathbb {R}^{n}\), is the diagonal matrix with the vector v on its diagonal.

Other basic sets and their notation are

  • the positive orthant

    $$\begin{aligned} \mathbb {R}_{+}^{n} = \{ p \in \mathbb {R}^{n} :p \ge 0 \}, \end{aligned}$$
    (6.3)
  • the set of strictly positive vectors

    $$\begin{aligned} \mathbb {R}_{++}^{n} = \{p \in \mathbb {R}^{n} :p > 0\}, \end{aligned}$$
    (6.4)
  • the ball of radius r centered at p

    $$\begin{aligned} {\mathbb {B}}_{r}(p) = \{q \in \mathbb {R}^{n} :\Vert q-p\Vert \le r\}, \end{aligned}$$
    (6.5)
  • the unit sphere

    $$\begin{aligned} {\mathbb {S}}^{n-1} = \{p \in \mathbb {R}^{n} :\Vert p\Vert =1\}, \end{aligned}$$
    (6.6)
  • the probability simplex

    $$\begin{aligned} \varDelta _{n-1} = \{p \in \mathbb {R}_{+}^{n} :\langle {\mathbbm {1}}, p \rangle = 1 \} \end{aligned}$$
    (6.7)
  • and its relative interior

    $$\begin{aligned} {\mathscr {S}}&= \mathring{\varDelta }_{n-1} = \varDelta _{n-1} \cap \mathbb {R}_{++}^{n}, \end{aligned}$$
    (6.8a)
    $$\begin{aligned} {\mathscr {S}}_{n}&= {\mathscr {S}}\ \text {with a concrete value of } n\ (\text {e.g., } {\mathscr {S}}_{3}), \end{aligned}$$
    (6.8b)
  • closure (not regarded as manifold)

    $$\begin{aligned} \overline{{\mathscr {S}}} = \varDelta _{n-1}, \end{aligned}$$
    (6.9)
  • the sphere with radius 2

    $$\begin{aligned} {\mathscr {N}} = 2 {\mathbb {S}}^{n-1}, \end{aligned}$$
    (6.10)
  • the assignment manifold

    $$\begin{aligned} {\mathscr {W}} = {\mathscr {S}} \times \cdots \times {\mathscr {S}}, \quad (\text {m times}) \end{aligned}$$
    (6.11)
  • and its closure (not regarded as manifold)

    $$\begin{aligned} \overline{{\mathscr {W}}} = \overline{{\mathscr {S}}} \times \cdots \times \overline{{\mathscr {S}}}, \quad (\text {m times}). \end{aligned}$$
    (6.12)

For a discrete distribution \(p \in \varDelta _{n-1}\) and a finite set \(S=\{s^{1},\ldots ,s^{n}\}\) of vectors, we denote by

$$\begin{aligned} {\mathbb {E}}_{p}[S] := \sum _{i \in [n]} p_{i} s^{i} \end{aligned}$$
(6.13)

the mean of S with respect to p.
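
To fix ideas, the following minimal numerical sketch (Python/NumPy; the code and all names are ours and purely illustrative, not part of the paper) instantiates the barycenter of \(\varDelta _{n-1}\), membership tests for \(\varDelta _{n-1}\) and \({\mathscr {S}}\), and the mean (6.13):

    import numpy as np

    n = 3
    barycenter = np.full(n, 1.0 / n)       # barycenter of Delta_{n-1}, cf. (6.7)

    def in_simplex(p, tol=1e-12):
        # membership in Delta_{n-1}: nonnegative components summing to one
        return bool(np.all(p >= -tol)) and abs(p.sum() - 1.0) < tol

    def in_interior(p, tol=1e-12):
        # membership in S = relint(Delta_{n-1}), cf. (6.8a): strictly positive components
        return bool(np.all(p > tol)) and abs(p.sum() - 1.0) < tol

    def mean_E(p, S):
        # E_p[S] = sum_i p_i s^i, cf. (6.13); S is an (n, d) array whose rows are the s^i
        return S.T @ p

    p = np.array([0.5, 0.3, 0.2])
    S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(in_simplex(p), in_interior(p), in_interior(barycenter))   # True True True
    print(mean_E(p, S))                                             # [0.7 0.5]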

Let \({\mathscr {M}}\) be any differentiable manifold. Then \(T_{p}{\mathscr {M}}\) denotes the tangent space at the base point \(p \in {\mathscr {M}}\) and \(T{\mathscr {M}}\) the total space of the tangent bundle of \({\mathscr {M}}\). If \(F :{\mathscr {M}} \rightarrow {\mathscr {N}}\) is a smooth mapping between differentiable manifolds \({\mathscr {M}}\) and \({\mathscr {N}}\), then the differential of F at \(p \in {\mathscr {M}}\) is denoted by

$$\begin{aligned} DF(p) :T_{p}{\mathscr {M}} \rightarrow T_{F(p)}{\mathscr {N}},\qquad DF(p) :v \mapsto DF(p)[v]. \end{aligned}$$
(6.14)

If \(F :\mathbb {R}^{m} \rightarrow \mathbb {R}^{n}\), then \(DF(p) \in \mathbb {R}^{n \times m}\) is the Jacobian matrix at p, and the application DF(p)[v] to a vector \(v \in \mathbb {R}^{m}\) means matrix-vector multiplication. We then also write DF(p)v. If \(F = F(p,q)\), then \(D_{p}F(p,q)\) and \(D_{q}F(p,q)\) are the Jacobians of the functions \(F(\cdot ,q)\) and \(F(p,\cdot )\), respectively.

The gradient of a differentiable function \(f :\mathbb {R}^{n} \rightarrow \mathbb {R}\) is denoted by \(\nabla f(x) = \big (\partial _{1} f(x),\ldots ,\partial _{n} f(x)\big )^{\top }\), whereas the Riemannian gradient of a function \(f :{\mathscr {M}} \rightarrow \mathbb {R}\) defined on Riemannian manifold \({\mathscr {M}}\) is denoted by \(\nabla _{{\mathscr {M}}} f\). Eq. (2.5) recalls the formal definition.

The exponential mapping [21, Def. 1.4.3]

$$\begin{aligned} {{\mathrm{Exp}}}_{p} :T_{p}{\mathscr {M}}&\rightarrow {\mathscr {M}}, \quad v \mapsto {{\mathrm{Exp}}}_{p}(v) = \gamma _{v}(1), \end{aligned}$$
(6.15a)
$$\begin{aligned} \gamma _{v}(0)&=p,\; \dot{\gamma }_{v}(0)=\frac{\hbox {d}}{\hbox {d}t}\gamma _{v}(t)\big |_{t=0} = v, \end{aligned}$$
(6.15b)

maps the tangent vector v to the point \(\gamma _{v}(1) \in {\mathscr {M}}\), uniquely defined by the geodesic curve \(\gamma _{v}(t)\) emanating at p in direction v. \(\gamma _{v}(t)\) is the shortest path on \({\mathscr {M}}\) between the points \(p\) and \(q = \gamma _{v}(1)\) that it connects. This minimal length equals the Riemannian distance \(d_{{\mathscr {M}}}(p,q)\) induced by the Riemannian metric, denoted by

$$\begin{aligned} \langle u, v \rangle _{p}, \end{aligned}$$
(6.16)

i.e., the inner product on the tangent spaces \(T_{p}{\mathscr {M}},\,p \in {\mathscr {M}}\), that smoothly varies with p. Existence and uniqueness of geodesics will not be an issue for the manifolds \({\mathscr {M}}\) considered in this paper.

Remark 8

The exponential mapping \({{\mathrm{Exp}}}_{p}\) should not be confused with

  • the exponential function \(e^{v}\) used, e.g., in (6.1);

  • the mapping \(\exp _{p} :T_{p}{\mathscr {S}} \rightarrow {\mathscr {S}}\) defined by Eq. (3.8a).

The abbreviations “l.h.s.” and “r.h.s.” mean the left-hand side and the right-hand side of some equation, respectively. We abbreviate “with respect to” by “wrt.”

Appendix 2: Proofs and Further Details

1.1 Proofs of Section 2

Proof

(of Lemma 1) Let \(p \in {\mathscr {S}}\) and \(v \in T_{p}{\mathscr {S}}\). We have

$$\begin{aligned} D\psi (p) = {{\mathrm{Diag}}}(p)^{-1/2} \end{aligned}$$
(7.1)

and \(\big \langle \psi (p), D\psi (p)[v] \big \rangle = \langle 2 \sqrt{p}, \frac{v}{\sqrt{p}} \rangle = 2 \langle {\mathbbm {1}}, v \rangle = 0\), that is, \(D\psi (p)[v] \in T_{\psi (p)}{\mathscr {N}}\). Furthermore,

$$\begin{aligned} \big \langle D\psi (p)[u], D\psi (p)[v] \big \rangle = \big \langle u/\sqrt{p}, v/\sqrt{p} \rangle \overset{(2.1)}{=} \langle u, v \rangle _{p}, \end{aligned}$$
(7.2)

i.e., the Riemannian metric is preserved and hence also the length L(s) of curves \(s(t) \in {\mathscr {N}},\, t \in [a,b]\): Put \(\gamma (t) = \psi ^{-1}\big (s(t)\big ) = \frac{1}{4} s^{2}(t) \in {\mathscr {S}},\, t \in [a,b]\). Then \(\dot{\gamma }(t)=\frac{1}{2} s(t) \dot{s}(t) = \frac{1}{2} \psi \big (\gamma (t)\big ) \dot{s}(t) = \sqrt{\gamma (t)} \dot{s}(t)\) and

$$\begin{aligned} L(s)&= \int _{a}^{b} \Vert \dot{s}(t)\Vert \mathrm{d}t = \int _{a}^{b} \bigg \langle \frac{\dot{\gamma }(t)}{\sqrt{\gamma (t)}}, \frac{\dot{\gamma }(t)}{\sqrt{\gamma (t)}} \bigg \rangle ^{1/2} \mathrm{d}t \end{aligned}$$
(7.3a)
$$\begin{aligned}&\overset{(2.1)}{=} \int _{a}^{b} \Vert \dot{\gamma }(t)\Vert _{\gamma (t)} \mathrm{d}t = L(\gamma ). \end{aligned}$$
(7.3b)

\(\square \)
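
As a numerical illustration of Lemma 1 (a minimal NumPy sketch under our own naming, not part of the original), one can check that the sphere map \(\psi (p)=2\sqrt{p}\) sends \({\mathscr {S}}\) to the radius-2 sphere \({\mathscr {N}}\) and that its differential \(D\psi (p)[v]=v/\sqrt{p}\) preserves the inner product (2.1), cf. (7.2):

    import numpy as np

    def psi(p):
        # sphere map (2.3): psi(p) = 2 * sqrt(p)
        return 2.0 * np.sqrt(p)

    def metric(u, v, p):
        # Fisher-Rao inner product (2.1): <u, v>_p = <u / sqrt(p), v / sqrt(p)>
        return float(np.dot(u / np.sqrt(p), v / np.sqrt(p)))

    p = np.array([0.5, 0.3, 0.2])
    u = np.array([0.10, -0.05, -0.05])        # tangent vectors: components sum to zero
    v = np.array([0.02, 0.03, -0.05])
    Du, Dv = u / np.sqrt(p), v / np.sqrt(p)   # D psi(p)[u], D psi(p)[v]
    print(np.isclose(np.linalg.norm(psi(p)), 2.0))      # psi(p) lies on the radius-2 sphere
    print(np.isclose(np.dot(Du, Dv), metric(u, v, p)))  # the metric is preserved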

Proof

(of Prop. 1) Setting \(g:{\mathscr {N}} \rightarrow \mathbb {R}\), \(s \mapsto g(s) := f\big (\psi ^{-1}(s)\big )\) with \(s = \psi (p) = 2 \sqrt{p}\) from (2.3), we have

$$\begin{aligned} \nabla _{{\mathscr {N}}} g(s) = \bigg (I - \frac{s}{\Vert s\Vert } \frac{s^{\top }}{\Vert s\Vert }\bigg ) \nabla g(s), \end{aligned}$$
(7.4)

because the 2-sphere \({\mathscr {N}}=2{\mathbb {S}}^{n-1}\) is an embedded submanifold, and hence the Riemannian gradient equals the orthogonal projection of the Euclidean gradient onto the tangent space. Pulling back the vector field \(\nabla _{{\mathscr {N}}} g\) by \(\psi \) using

$$\begin{aligned} \nabla g(s) = D\big (\psi ^{-1}\big )(s)^{\top } \nabla f\big (\psi ^{-1}(s)\big ) = \frac{1}{2} s \Big (\nabla f\Big (\frac{1}{4} s^{2}\Big )\Big ) = \frac{1}{2} s \big (\nabla f(p)\big ), \end{aligned}$$
(7.5)

we get with (7.1), (7.4) and \(\Vert s\Vert =2\) and hence \(s/\Vert s\Vert = \frac{1}{2} \psi (p)=\sqrt{p}\)

$$\begin{aligned} \nabla _{{\mathscr {S}}} f(p)&= \big (D\psi (p)\big )^{-1}\big (\nabla _{{\mathscr {N}}} g(\psi (p))\big ) \end{aligned}$$
(7.6a)
$$\begin{aligned}&= {{\mathrm{Diag}}}(\sqrt{p}) \Big (\big (I - \sqrt{p} \sqrt{p}^{\top }\big ) \sqrt{p} \big (\nabla f(p)\big )\Big ) \end{aligned}$$
(7.6b)
$$\begin{aligned}&= p \big (\nabla f(p)\big ) - \langle p, \nabla f(p) \rangle p, \end{aligned}$$
(7.6c)

which equals (2.6). We finally check that \(\nabla _{{\mathscr {S}}} f(p)\) satisfies (2.5) (with \({\mathscr {S}}\) in place of \({\mathscr {M}}\)). Using (2.1), we have

$$\begin{aligned} \langle \nabla _{{\mathscr {S}}} f(p), v \rangle _{p}&= \Big \langle \sqrt{p} \big (\nabla f(p)\big ) - \langle p, \nabla f(p) \rangle \sqrt{p}, \frac{v}{\sqrt{p}} \Big \rangle \end{aligned}$$
(7.7a)
$$\begin{aligned}&= \langle \nabla f(p), v \rangle - \langle p, \nabla f(p) \rangle \langle {\mathbbm {1}}, v \rangle \end{aligned}$$
(7.7b)
$$\begin{aligned}&\overset{(2.2)}{=} \langle \nabla f(p), v \rangle ,\quad \forall v \in T_{p}{\mathscr {S}}. \end{aligned}$$
(7.7c)

\(\square \)
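
A quick numerical check of Prop. 1 (our own NumPy sketch with an arbitrary stand-in for the Euclidean gradient, purely illustrative): the right-hand side of (7.6c) is tangent to \({\mathscr {S}}\) and satisfies the defining relation (2.5)/(7.7) with respect to the metric (2.1).

    import numpy as np

    def riem_grad(p, grad_f):
        # Riemannian gradient on S, cf. (7.6c): p * grad_f - <p, grad_f> * p
        return p * grad_f - np.dot(p, grad_f) * p

    p = np.array([0.5, 0.3, 0.2])
    grad_f = np.array([1.0, -2.0, 0.5])            # stand-in for the Euclidean gradient of f at p
    g = riem_grad(p, grad_f)
    v = np.array([0.10, -0.07, -0.03])             # tangent vector: <1, v> = 0
    lhs = np.dot(g / np.sqrt(p), v / np.sqrt(p))   # <grad_S f(p), v>_p via (2.1)
    print(np.isclose(g.sum(), 0.0))                # tangency: components sum to zero
    print(np.isclose(lhs, np.dot(grad_f, v)))      # defining relation (7.7)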

Proof

(of Prop. 2) The geodesic on the 2-sphere emanating at \(s(0) \in {\mathscr {N}}\) in direction \(w=\dot{s}(0) \in T_{s(0)}{\mathscr {N}}\) is given by

$$\begin{aligned} s(t) = s(0) \cos \Big (\frac{\Vert w\Vert }{2} t\Big ) + 2 \frac{w}{\Vert w\Vert } \sin \Big (\frac{\Vert w\Vert }{2} t\Big ). \end{aligned}$$
(7.8)

Setting \(s(0)=\psi (p)\) and \(w = D\psi (p)[v]=v/\sqrt{p}\), the geodesic emanating at \(p=\gamma _{v}(0)\) in direction v is given by \(\psi ^{-1}\big (s(t)\big )\) due to Lemma 1, which results in (2.7a) after elementary computations. \(\square \)
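
The construction used in this proof can be mirrored numerically (a sketch under our own naming, assuming only the sphere map (2.3) and the great-circle formula (7.8)): lift \(p\) and \(v\) to \({\mathscr {N}}\), follow the geodesic there, and map back by \(\psi ^{-1}(s)=\frac{1}{4}s^{2}\); the resulting curve stays in the simplex and has the prescribed initial point and velocity.

    import numpy as np

    def geodesic(p, v, t):
        # geodesic on S through p with initial velocity v, via the sphere map (2.3) and (7.8)
        s0 = 2.0 * np.sqrt(p)              # psi(p)
        w = v / np.sqrt(p)                 # D psi(p)[v]
        nw = np.linalg.norm(w)
        s = s0 * np.cos(0.5 * nw * t) + 2.0 * (w / nw) * np.sin(0.5 * nw * t)
        return 0.25 * s**2                 # psi^{-1}(s)

    p = np.array([0.5, 0.3, 0.2])
    v = np.array([0.05, -0.02, -0.03])     # tangent vector: components sum to zero
    eps = 1e-6
    print(np.allclose(geodesic(p, v, 0.0), p))                           # gamma_v(0) = p
    print(np.allclose((geodesic(p, v, eps) - p) / eps, v, atol=1e-4))    # dot gamma_v(0) = v
    print(np.isclose(geodesic(p, v, 0.7).sum(), 1.0))                    # curve stays on the simplex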

1.2 Proofs of Section 3 and Further Details

Proof

(of Prop. 3) We have \(p = \exp _{p}(0)\) and

$$\begin{aligned} \frac{\hbox {d}}{\hbox {d}t} \exp _{p}(u t)&= \frac{\langle p, e^{u t} \rangle p e^{u t} u - p e^{u t} \langle p, e^{u t} u \rangle }{\langle p, e^{u t} \rangle ^{2}} \end{aligned}$$
(7.9a)
$$\begin{aligned}&= p(t) u - \langle p(t), u \rangle p(t), \end{aligned}$$
(7.9b)

which confirms (3.10), is equal to (3.9) at \(t=0\) and hence yields the first expression of (3.11). The second expression of (3.11) follows from a Taylor expansion of (2.7a)

$$\begin{aligned} \gamma _{v}(t) \approx p + v t + \frac{1}{4}\big (v_{p}^{2}-\Vert v_{p}\Vert ^{2} p\big ) t^{2},\qquad v_{p} = \frac{v}{\sqrt{p}}. \end{aligned}$$
(7.10)

\(\square \)
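
Numerically, the lifting map of Prop. 3 is a componentwise-weighted softmax; the sketch below (NumPy, names ours, purely illustrative) evaluates \(\exp _{p}(u)\) from (7.18) and verifies the derivative formula (7.9b) by a central finite difference.

    import numpy as np

    def lift(p, u):
        # exp_p(u) = p * e^u / <p, e^u>, cf. (7.18); products are componentwise as in (6.2)
        q = p * np.exp(u)
        return q / q.sum()

    p = np.array([0.5, 0.3, 0.2])
    u = np.array([0.4, -0.1, -0.3])
    t, eps = 0.3, 1e-6
    pt = lift(p, u * t)
    fd = (lift(p, u * (t + eps)) - lift(p, u * (t - eps))) / (2.0 * eps)  # d/dt exp_p(u t)
    rhs = pt * u - np.dot(pt, u) * pt                                     # replicator form (7.9b)
    print(np.allclose(fd, rhs, atol=1e-8))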

Proof

(of Lemma 4) By construction, \(S(W) \in {\mathscr {W}}\), that is, \(S_{i}(W) \in {\mathscr {S}},\; i \in [m]\). Consequently,

$$\begin{aligned} 0 \le J(W)=\sum _{i \in [m]} \langle S_{i}(W), W_{i} \rangle \le \sum _{i \in [m]} \Vert S_{i}(W)\Vert \Vert W_{i}\Vert < m. \end{aligned}$$
(7.11)

The upper bound corresponds to matrices \(\overline{W}^{*} \in \overline{{\mathscr {W}}}\) and \(S(\overline{W}^{*})\) where, for each \(i \in [m]\), both \(\overline{W}^{*}_{i}\) and \(S_{i}(\overline{W}^{*})\) equal the same unit vector \(e^{k_{i}}\) for some \(k_{i} \in [n]\). \(\square \)
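
The bound of Lemma 4 is easy to observe numerically. In the sketch below (our own illustration; W and S are random row-stochastic matrices with strictly positive rows, standing in for the assignment matrix and the similarity matrix), the value \(\sum _{i}\langle S_{i}, W_{i}\rangle \) stays strictly below \(m\), and \(m\) is attained only for matching unit-vector rows in the closure.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 6, 4
    W = rng.random((m, n)) + 1e-3; W /= W.sum(axis=1, keepdims=True)   # rows in the relative interior
    S = rng.random((m, n)) + 1e-3; S /= S.sum(axis=1, keepdims=True)
    J = float(np.sum(S * W))                                           # sum_i <S_i, W_i>
    print(0.0 <= J < m)                                                # strict bound (7.11)

    # the value m is attained in the closure, with matching unit-vector rows
    E = np.eye(n)[rng.integers(0, n, size=m)]
    print(np.isclose(float(np.sum(E * E)), m))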

Proof

(Explicit form of (3.27)) The matrices \(T^{ij}(W) = \frac{\partial }{\partial W_{ij}} S(W)\) are implicitly given through the optimality condition (2.9) that each vector \(S_{k}(W),\, k \in [m]\), defined by (3.13) has to satisfy

$$\begin{aligned} S_{k}(W)&= {\mathrm {mean}}_{{\mathscr {S}}}\{L_{r}(W_{r})\}_{r \in \tilde{{\mathscr {N}}}_{{\mathscr {E}}}(k)} \end{aligned}$$
(7.12a)
$$\begin{aligned} \qquad \Leftrightarrow \qquad 0&= \sum _{r \in \tilde{{\mathscr {N}}}_{{\mathscr {E}}}(k)} {{\mathrm{Exp}}}_{S_{k}(W)}^{-1}\big (L_{r}(W_{r})\big ). \end{aligned}$$
(7.12b)

Writing

$$\begin{aligned} \phi \big (S_{k}(W),L_{r}(W_{r})\big ) := {{\mathrm{Exp}}}_{S_{k}(W)}^{-1}\big (L_{r}(W_{r})\big ), \end{aligned}$$
(7.13)

while temporarily dropping the argument \(W\) below to simplify the notation, and using the indicator function \(\delta _{{\mathrm {P}}} = 1\) if the predicate \({\mathrm {P}}={\mathrm {true}}\) and \(\delta _{{\mathrm {P}}} = 0\) otherwise, we differentiate the optimality condition on the r.h.s. of (7.12),

$$\begin{aligned} 0&= \frac{\partial }{\partial W_{ij}} \sum _{r \in \tilde{{\mathscr {N}}}_{{\mathscr {E}}}(k)} \phi \big (S_{k}(W),L_{r}(W_{r})\big ) \end{aligned}$$
(7.14a)
$$\begin{aligned}&= \sum _{r \in \tilde{{\mathscr {N}}}_{{\mathscr {E}}}(k)} \Big ( D_{S_{k}}\phi (S_{k},L_{r}) \Big [\frac{\partial }{\partial W_{ij}} S_{k}(W)\Big ] \end{aligned}$$
(7.14b)
$$\begin{aligned}&\quad + \delta _{i=r} D_{L_{r}}\phi (S_{k},L_{r}) \Big [\frac{\partial }{\partial W_{rj}} L_{r}(W_{r})\Big ] \Big ) \end{aligned}$$
(7.14c)
$$\begin{aligned}&= \Big (\sum _{r \in \tilde{{\mathscr {N}}}_{{\mathscr {E}}}(k)} D_{S_{k}}\phi (S_{k},L_{r}) \Big ) \Big (\frac{\partial }{\partial W_{ij}} S_{k}(W)\Big ) \end{aligned}$$
(7.14d)
$$\begin{aligned}&\quad + \delta _{i \in \tilde{{\mathscr {N}}}_{{\mathscr {E}}}(k)} D_{L_{i}}\phi (S_{k},L_{i})\Big (\frac{\partial }{\partial W_{ij}} L_{i}(W_{i}) \Big ) \end{aligned}$$
(7.14e)
$$\begin{aligned}&=: H^{k}(W) \Big (\frac{\partial }{\partial W_{ij}} S_{k}(W)\Big ) + h^{k,ij}(W). \end{aligned}$$
(7.14f)

Since the vectors \(\phi (S_{k},L_{r})\) given by (7.13) are the negative Riemannian gradients of the (locally) strictly convex objectives (2.8) defining the means \(S_{k}\) [21, Thm. 4.6.1], the regularity of the matrices \(H^{k}(W)\) follows. Thus, using (7.14f) and defining the matrices

$$\begin{aligned}&T^{ij}(W) \in \mathbb {R}^{m \times n},\quad T^{ij}_{kl}(W) := \frac{\partial S_{kl}(W)}{\partial W_{ij}},\nonumber \\&\quad i,k \in [m],\; j,l \in [n], \end{aligned}$$
(7.15)

results in (3.27). The explicit form of this expression results from computing and inserting into (7.14f) the corresponding Jacobians \(D_{p}\phi (p,q)\) and \(D_{q}\phi (p,q)\) of

$$\begin{aligned} \phi (p,q)&={{\mathrm{Exp}}}_{p}^{-1}(q) \end{aligned}$$
(7.16a)
$$\begin{aligned}&= \frac{d_{{\mathscr {S}}}(p,q)}{\sqrt{1-\langle \sqrt{p},\sqrt{q}\rangle ^{2}}}\big (\sqrt{p q}-\langle \sqrt{p},\sqrt{q}\rangle p\big ), \end{aligned}$$
(7.16b)

and

$$\begin{aligned} \frac{\partial }{\partial W_{ij}} L_{i}(W_{i})&= \frac{e^{-U_{ij}}}{\langle W_{i}, e^{-U_{i}} \rangle } \big (e^{j} - L_{i}(W_{i})\big ). \end{aligned}$$
(7.16c)

The term (7.16b) results from mapping back the corresponding vector from the 2-sphere \({\mathscr {N}}\),

$$\begin{aligned} {{\mathrm{Exp}}}_{p}^{-1}(q) = -\big (D\psi (p)\big )^{-1}\Big (\frac{1}{2}\nabla _{{\mathscr {N}}}d_{{\mathscr {N}}}^{2}\big (\psi (p),\psi (q)\big )\Big ), \end{aligned}$$
(7.17)

where \(\psi \) is the sphere map (2.3) and \(d_{{\mathscr {N}}}\) is the geodesic distance on \({\mathscr {N}}\). The term (7.16c) results from directly evaluating (3.12). \(\square \)
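
To make (7.16b) concrete, the following NumPy sketch (our own naming; the geodesic distance \(d_{{\mathscr {S}}}(p,q)=2\arccos \langle \sqrt{p},\sqrt{q}\rangle \) used below is the radius-2 sphere distance pulled back through the isometry of Lemma 1) evaluates \(\phi (p,q)={{\mathrm{Exp}}}_{p}^{-1}(q)\), checks that the geodesic of Prop. 2 with this initial velocity reaches \(q\) at time 1, and compares with the linearization used in the proof of Lemma 5 below, cf. (7.21c):

    import numpy as np

    def exp_inv(p, q):
        # Exp_p^{-1}(q), cf. (7.16b), with d_S(p, q) = 2 * arccos(<sqrt p, sqrt q>)
        c = float(np.dot(np.sqrt(p), np.sqrt(q)))
        return 2.0 * np.arccos(c) / np.sqrt(1.0 - c**2) * (np.sqrt(p * q) - c * p)

    def exp_map(p, v):
        # Exp_p(v): geodesic of Prop. 2 at t = 1, computed via the sphere map as in (7.8)
        s0, w = 2.0 * np.sqrt(p), v / np.sqrt(p)
        nw = np.linalg.norm(w)
        s = s0 * np.cos(0.5 * nw) + 2.0 * (w / nw) * np.sin(0.5 * nw)
        return 0.25 * s**2

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.4, 0.35, 0.25])
    v = exp_inv(p, q)
    print(np.allclose(exp_map(p, v), q))                      # Exp_p(Exp_p^{-1}(q)) = q
    approx = (np.diag(p) - np.outer(p, p)) @ np.log(q / p)    # linearization (7.21c)
    print(np.allclose(v, approx, atol=5e-3))                  # close for q near p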

Proof

(of Lemma 5) We first compute \(\exp _{p}^{-1}\). Suppose

$$\begin{aligned} q = \exp _{p}(u) = \frac{p e^{u}}{\langle p, e^{u} \rangle }, \qquad p,q \in {\mathscr {S}},\quad u \in \mathbb {R}^{n}. \end{aligned}$$
(7.18)

Then, taking logarithms and using \(\langle {\mathbbm {1}}, u \rangle = 0\) for tangent vectors \(u \in T_{p}{\mathscr {S}}\),

$$\begin{aligned} \log (q)&= \log (p)+u-\log (\langle p, e^{u} \rangle ) {\mathbbm {1}}, \end{aligned}$$
(7.19a)
$$\begin{aligned} \log (\langle p, e^{u} \rangle )&= \frac{1}{n} \langle {\mathbbm {1}}, \log (p)-\log (q) \rangle , \end{aligned}$$
(7.19b)

and

$$\begin{aligned} u = \exp _{p}^{-1}(q) = \left( I-\frac{1}{n} {\mathbbm {1}}{\mathbbm {1}}^{\top }\right) \big (\log (q)-\log (p)\big ). \end{aligned}$$
(7.20)

Thus, in view of (3.9), we approximate

$$\begin{aligned}&{{\mathrm{Exp}}}_{p}^{-1}(q) \approx v \nonumber \\&\quad = \big ({{\mathrm{Diag}}}(p)-p p^{\top }\big ) u \end{aligned}$$
(7.21a)
$$\begin{aligned}&\quad = \left( {{\mathrm{Diag}}}(p)-\frac{1}{n} p {\mathbbm {1}}^{\top } - p p^{\top } + \frac{1}{n} p {\mathbbm {1}}^{\top }\right) \log \Big (\frac{q}{p}\Big ) \end{aligned}$$
(7.21b)
$$\begin{aligned}&\quad = \big ({{\mathrm{Diag}}}(p)-p p^{\top }\big ) \log \Big (\frac{q}{p}\Big ). \end{aligned}$$
(7.21c)

Applying this to the point set \({\mathscr {P}}\), i.e., setting

$$\begin{aligned} v^{i} = \big ({{\mathrm{Diag}}}(p)-p p^{\top }\big ) \log \frac{p^{i}}{p}, \qquad i \in [N], \end{aligned}$$
(7.22)

step (3) of (3.31) yields

$$\begin{aligned} v&:= \frac{1}{N} \sum _{i \in [N]} v^{i} = \frac{1}{N} \big ({{\mathrm{Diag}}}(p)-p p^{\top }\big )\nonumber \\&\quad \Big (\sum _{i \in [N]} \log (p^{i}) - N \log (p)\Big ) \end{aligned}$$
(7.23a)
$$\begin{aligned}&= \big ({{\mathrm{Diag}}}(p)-p p^{\top }\big ) \log \bigg ( \frac{1}{p} \Big (\prod _{i \in [N]} p^{i}\Big )^{\frac{1}{N}} \bigg ) \end{aligned}$$
(7.23b)
$$\begin{aligned}&= \big ({{\mathrm{Diag}}}(p)-p p^{\top }\big ) \log \Big (\frac{{\mathrm {mean}}_{g}({\mathscr {P}})}{p}\Big ) \end{aligned}$$
(7.23c)
$$\begin{aligned}&=: \big ({{\mathrm{Diag}}}(p)-p p^{\top }\big ) u. \end{aligned}$$
(7.23d)

Finally, approximating step (4) of (3.31), in view of Prop. 3, results in the update of p

$$\begin{aligned} \exp _{p}(u) = \frac{p e^{u}}{\langle p, e^{u} \rangle } = \frac{{\mathrm {mean}}_{g}({\mathscr {P}})}{\langle {\mathbbm {1}}, {\mathrm {mean}}_{g}({\mathscr {P}}) \rangle }. \end{aligned}$$
(7.24)

\(\square \)
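
The closed form (7.24) is straightforward to confirm numerically (a NumPy sketch under our own naming, not code from the paper): the approximate mean update of Lemma 5 equals the componentwise geometric mean of the points \(p^{i}\), normalized back to the simplex, and in particular it does not depend on the base point \(p\).

    import numpy as np

    def lift(p, u):
        # exp_p(u) = p * e^u / <p, e^u>, cf. (7.18)
        q = p * np.exp(u)
        return q / q.sum()

    def mean_update(p, P):
        # steps (3)-(4) of (3.31) with the approximations of Lemma 5;
        # P is an (N, n) array whose rows are the points p^i
        u = np.log(P / p).mean(axis=0)   # u = log(mean_g(P) / p), cf. (7.23d)
        return lift(p, u)                # exp_p(u), cf. (7.24)

    rng = np.random.default_rng(1)
    P = rng.random((5, 3)) + 1e-3; P /= P.sum(axis=1, keepdims=True)
    gmean = np.exp(np.log(P).mean(axis=0))            # componentwise geometric mean, mean_g(P)
    target = gmean / gmean.sum()                      # r.h.s. of (7.24)
    print(np.allclose(mean_update(np.array([0.5, 0.3, 0.2]), P), target))
    print(np.allclose(mean_update(np.full(3, 1.0 / 3.0), P), target))   # independent of p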

About this article

Cite this article

Åström, F., Petra, S., Schmitzer, B. et al. Image Labeling by Assignment. J Math Imaging Vis 58, 211–238 (2017). https://doi.org/10.1007/s10851-016-0702-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10851-016-0702-4

Keywords

Mathematics Subject Classification

Navigation