1 Introduction

In this paper we continue the study of the capacity estimate from [1], where we introduced a geometric characterization of the Eyring–Kramers formula. To introduce our setting, we begin by considering the Kolmogorov process

$$\begin{aligned} d X_t = -\nabla F(X_t) dt + \sqrt{2\varepsilon } dB_t \end{aligned}$$

where F is a non-convex potential and \(\varepsilon \) is a small positive number. A formula for the expected transition time from one local minimum point to another was proposed independently by Eyring [7] and Kramers [11] in the context of metastability of chemical processes, and can be stated as follows. Assume that x and y are quadratic local minima of F, separated by a unique saddle z at which the Hessian has a single negative eigenvalue \(\lambda _1(z)\). Then the expected transition time from x to y satisfies

$$\begin{aligned} {\mathbb {E}}^x[\tau ] \simeq \frac{2\pi }{|\lambda _1(z)|} \sqrt{\frac{|\det (\nabla ^2 F(z))|}{\det (\nabla ^2 F(x))}} e^{(F(z)-F(x))/\varepsilon }, \end{aligned}$$

where \(\simeq \) denotes that the comparison constant tends to 1 as \(\varepsilon \rightarrow 0\). The validity of the above formula has been studied extensively; references can be found, for instance, in [1, 4, 8, 12]. The first rigorous proof of the Eyring–Kramers formula above is due to Bovier et al. [5], using potential theory, and this approach has turned out to be fruitful.
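As a concrete illustration (ours, not from the paper), the formula can be evaluated for the one-dimensional double well \(F(x) = (x^2-1)^2\), where the Hessian determinants reduce to second derivatives and \(|\lambda_1(z)| = |F''(z)|\); all names below are our own choices.

```python
# Hypothetical 1-d illustration of the Eyring-Kramers formula for the
# double well F(x) = (x**2 - 1)**2: minima at x = +-1, saddle at z = 0.
import math

def F(x):
    return (x * x - 1.0) ** 2

def F2(x):  # second derivative of F
    return 12.0 * x * x - 4.0

x_min, z = 1.0, 0.0
eps = 0.05  # small noise strength, chosen for illustration

# In one dimension the determinants are the second derivatives, so the
# prefactor collapses to 2*pi / sqrt(F''(x) * |F''(z)|).
prefactor = (2.0 * math.pi / abs(F2(z))) * math.sqrt(abs(F2(z)) / F2(x_min))
expected_time = prefactor * math.exp((F(z) - F(x_min)) / eps)
```

Here the barrier height is \(F(z)-F(x) = 1\), so the transition time grows like \(e^{1/\varepsilon}\) while the prefactor \(\pi/(2\sqrt{2})\) stays bounded.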

Our main motivation to study this phenomenon comes from non-convex optimization, for instance the optimization of neural networks. In this setting, the minima and saddles are in general degenerate and/or non-smooth.

In [1] we use the potential theoretic formulation and extend the results of Bovier et al. [5] and Berglund and Gentz [2] to more general cases, which in particular include non-smooth critical points. As in [5], the main technical issue is to provide sharp capacity estimates, and the main result in [1] is a geometric characterization of Newtonian capacity w.r.t. the measure \(e^{-F(x)/\varepsilon } dx\), inspired by the corresponding characterization for conformal capacity originally proved by Gehring [9]. In [1] we observe that the capacity depends on the configuration of the saddle points which connect the two local minima, but we computed the capacity only in the simple cases where the saddles are either parallel or in series, see Fig. 1. However, for an arbitrary smooth potential the situation can be more complex, and the configuration of the saddle points can be a combination of both the parallel and the series case with essentially arbitrary complexity.

The novelty in [1] was the use of Geometric Function Theory to provide a lower bound for the capacity. In this paper we complete this method by providing an upper bound using Geometric Function Theory together with Thompson’s principle, see the proof of the upper bound in Sect. 3.4, Proof of Theorem 1. Our goal is to extend the capacity estimate from Avelin et al. [1] to the case of arbitrary configurations of critical points. We do this by discretizing the problem: the ‘valleys’/‘islands’ around the local minimum points are the vertices, and the regions around the saddle points, which we call ‘bridges’, are the edges. The local capacity of a bridge can be geometrically characterized using the results from Avelin et al. [1], and this defines the weights of the edges, thus turning the problem into a capacitary problem on a graph. Connecting problems of this type to graphs is similar in spirit to Michel [14]; however, we did not find this particular problem in the literature. We note that the result in [5] covers only the case of the parallel configuration, see Fig. 1. Moreover, the framework of geometric function theory (see [1]) makes this construction straightforward and natural.

The capacitary problem on the graph is equivalent to the notion of an electrical network, which was originally defined by Kirchhoff in the 1840s in his elegant solution to the problem of replacement resistance for a network of resistors [10]. For a modern presentation of electrical networks and their connection to Markov chains and Kirchhoff’s theorem, we refer to Bollobás [3], Levin and Peres [13] and Wagner [15].

1.1 Assumptions and definitions

In order to state our main results we first need to introduce our assumptions on the potential F. We remark that our assumptions cover the case where F is a Morse function as defined in [4, Assumption 10.3], i.e. a \(C^2\) function in which all critical points are non-degenerate (non-degenerate Hessian with at most one negative eigenvalue). We further remark that our assumptions cover the degenerate case studied in [2], but we also allow for non-smooth (Lipschitz) potentials.

Fig. 1

Left picture is the parallel case and the right is the series case, \(x_u,x_w\) are local minimum points and \(z_i\) are saddle points

Let us first introduce some general terminology. Recall that a Lipschitz function \(h:{\mathbb {R}}\rightarrow {\mathbb {R}}\) has a critical point at t if 0 is in the generalized gradient of h at t in the following sense,

$$\begin{aligned} \limsup _{s \rightarrow t\pm } \frac{h(s)-h(t)}{s-t} \ge 0 \quad \text {or} \quad \liminf _{s \rightarrow t\pm } \frac{h(s)-h(t)}{s-t} \le 0, \end{aligned}$$

where we mean that both one-sided limits (from the left and from the right) satisfy the conditions. We say that a point z of a Lipschitz function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is a critical point if for every \(e \in {\mathbb {R}}^n\), \(\Vert e\Vert =1\), the function \(h_e(t)=f(z+te)\) has a critical point at 0.

Given a continuous function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\), we say that a local minimum of f at z is proper if there exists a \(\hat{\delta }> 0\) such that for every \(0< \delta < \hat{\delta }\) there is a \(\rho > 0\) such that

$$\begin{aligned} f(x) \ge \left\{ \begin{array}{ll} f(z), &{} x \in B_{\rho }(z), \\ f(z) + \delta , &{} x \in \partial B_{\rho }(z), \end{array}\right. \end{aligned}$$

where \(B_\rho (z)\) denotes an open ball with radius \(\rho \) centered at z (a proper maximum is defined analogously). When the center is at the origin we use the short notation \(B_\rho \). We say that a critical point z of f is a saddle point if it is neither a proper local minimum nor a proper local maximum.
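As a quick sanity check (our example, not the paper's), the Lipschitz function \(f(x) = |x|\) has a proper minimum at 0: for every \(\delta < 1\) the radius \(\rho = \delta \) works.

```python
# Verify numerically that f(x) = |x| has a proper minimum at z = 0 in the
# sense above: f >= f(z) on B_rho and f >= f(z) + delta on the boundary,
# with the (illustrative) choice rho = delta.
f = abs

checked = 0
for delta in (0.5, 0.1, 0.01):
    rho = delta
    interior = [rho * (-1.0 + 2.0 * k / 200.0) for k in range(201)]
    assert all(f(x) >= f(0.0) for x in interior)          # f >= f(z) on the ball
    assert all(f(x) >= f(0.0) + delta for x in (-rho, rho))  # boundary condition
    checked += 1
```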

Let us then proceed to our assumptions on the potential.

Definition 1.1

Let \(F \in C^{0,1}({\mathbb {R}}^n)\) satisfy the following quadratic growth condition

$$\begin{aligned} F(x) \ge \frac{ |x|^2}{C_0} - C_0 \end{aligned}$$

for a constant \(C_0 \ge 1\). We assume that every local minimum point z of F is proper.

We say that F is admissible if for every saddle point \(z \in {\mathbb {R}}^n\) of F there are convex functions \(g_z: {\mathbb {R}}\rightarrow {\mathbb {R}}\) and \(G_z:{\mathbb {R}}^{n-1} \rightarrow {\mathbb {R}}\), which have a proper minimum at 0 and satisfy \(g_z(0) = G_z(0) = 0\), and an isometry \(T_z: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) such that, denoting \(x = (x_1, x') \in {\mathbb {R}}\times {\mathbb {R}}^{n-1}\), it holds

$$\begin{aligned} \big | (F\circ T_z) (x) -F(z) + g_z(x_1) - G_z(x')\big |\le \omega ( g_z(x_1)) + \omega ( G_z(x')), \end{aligned}$$
(1.1)

where \(\omega : [0,\infty ) \rightarrow [0,\infty )\) is a continuous and increasing function with \(\lim _{s \rightarrow 0} \frac{\omega (s)}{s} = 0\).

The assumption (1.1) allows the saddle point to be degenerate, but we do not allow branching saddles, in the sense that \(\{F < F(z)\} \cap B_{\rho }(z)\) can have at most two components for small \(\rho \). Note that the convex functions \(g_z, G_z\) and the isometry \(T_z\) depend on z, while the function \(\omega \) is the same for all saddle points. Throughout, we denote by \(\delta _0\) the largest number for which \(\omega (\delta ) \le \frac{\delta }{100}\) for all \(\delta \le 4 \delta _0\).
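For the model quadratic saddle \(F(x_1,x_2) = -x_1^2 + x_2^2\) in the plane, condition (1.1) holds exactly with \(T_z\) the identity, \(g_z(x_1) = x_1^2\) and \(G_z(x_2) = x_2^2\): the residual vanishes identically, so any modulus \(\omega \) works. A small numerical confirmation (our sketch):

```python
# Check (1.1) for the quadratic model saddle F(x1, x2) = -x1**2 + x2**2
# at z = 0 (so F(z) = 0), with g(t) = G(t) = t**2 and T_z the identity.
def F(x1, x2):
    return -x1 * x1 + x2 * x2

g = lambda t: t * t
G = lambda t: t * t

grid = [-1.0 + k / 10.0 for k in range(21)]
max_residual = max(
    abs(F(x1, x2) - 0.0 + g(x1) - G(x2)) for x1 in grid for x2 in grid
)
```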

Definition 1.2

Let \(F \in C^{0,1}({\mathbb {R}}^n)\) be admissible, then for every saddle point z and \(\delta > 0\), we define the bridge at z as

$$\begin{aligned} O_{z,\delta }:= T_z\left( \{x_1 \in {\mathbb {R}}: g_z(x_1)< \delta \} \times \{x' \in {\mathbb {R}}^{n-1}: G_z(x') < \delta \}\right) , \end{aligned}$$

where \(T_z\) is the isometry from Definition 1.1. See Fig. 2.

Fig. 2

The neighborhood \(O_{z, \delta }\) of the saddle point z (bridge) connects the sets \(U_{x_u}\) and \(U_{x_w}\), components of \(\{F < F(z)-\delta /3\}\)

Note that, since the saddle may be flat, we should talk about sets rather than points. However, we adopt the convention that we always choose a representative point from each saddle (set), and thus we may label the saddles by points \(z_1, z_2, \dots \). Moreover, we assume that there is a \(\delta _1 \le \delta _0\) such that for \(\delta < \delta _1\) the following holds: if \(z_1\) and \(z_2\) are two different saddle points, then their neighborhoods \(O_{z_1, 3\delta }\) and \(O_{z_2, 3\delta }\) from Definition 1.2 are disjoint. Furthermore, we assume that \(\varepsilon _0 > 0\) is small enough that for any local minimum point x, the ball \(B_{\varepsilon _0}(x)\) does not intersect any bridge or any \(\varepsilon _0\)-ball around any other local minimum.

We use the definitions of a geodesic length and a minimal cut originally defined in [1], inspired by [9].

Definition 1.3

Let \(A,B \subset \Omega \subset {\mathbb {R}}^n\) where \(\Omega \) is a domain and \(A \cap B = \emptyset \). We denote the curve family

$$\begin{aligned} {\mathcal {C}}(A,B; \Omega ) := \{\gamma : \gamma \in C^{1}([0,1];\Omega ), \gamma (0) \in A, \gamma (1) \in B\} \end{aligned}$$

and the family of separating sets as \({\mathcal {S}}(A,B;\Omega )\), where a smooth hypersurface \(S \subset {\mathbb {R}}^n\) (possibly with boundary) is in \({\mathcal {S}}(A,B;\Omega )\) if every \(\gamma \in {\mathcal {C}}(A,B; \Omega )\) intersects S. We define the geodesic distance between A and B in \(\Omega \) as

$$\begin{aligned} d_{\varepsilon }(A,B; \Omega ):= \inf \left( \int _{\gamma } |\gamma '| e^{\frac{F(\gamma )}{\varepsilon }} \, dt: \gamma \in {\mathcal {C}}(A,B; \Omega ) \right) \end{aligned}$$

and the minimal cut by

$$\begin{aligned} V_{\varepsilon }(A,B; \Omega ):= \inf \left( \int _{S} e^{-\frac{F(x)}{\varepsilon }} \, d {\mathcal {H}}^{n-1}(x):S \in {\mathcal {S}}(A,B;\Omega ) \right) . \end{aligned}$$

We define some topological quantities.

Definition 1.4

Let \(x_u,x_w\) be two local minima of an admissible F. The communication height between \(x_u,x_w\) is defined as

$$\begin{aligned} F(x_u; x_w) = \inf _{\gamma \in {\mathcal {C}}(B_\varepsilon (x_u),B_\varepsilon (x_w); {\mathbb {R}}^n)} \sup _{t \in [0,1]} \, F(\gamma (t)). \end{aligned}$$

Fixing \(\delta < \delta _1\), we denote the component of the sub-level set \(\{ F < F(x_u; x_w) + \delta /3\}\) which contains the points \(x_u\) and \(x_w\) by \(U_{\delta /3}\), and we denote

$$\begin{aligned} U_{-\delta /3}:= \{ F < F(x_u; x_w) - \delta /3\} \cap U_{\delta /3}. \end{aligned}$$
(1.2)

Furthermore, we remark that \(F(x_u;x_w)\) does not depend on \(\varepsilon \) if \(\varepsilon < \varepsilon _0\). We call the components of \(U_{-\delta /3}\) islands. For each island U we select a proper minimum point x satisfying \(F(x) = \min _U F\), and in the following we denote by \(U_x\) the island which contains x, see Fig. 2. We denote the set of all saddle points in \(U_{\delta /3} {\setminus } U_{-\delta /3}\) by Z.

Finally, we recall that the capacity of two disjoint sets A and B is defined as

$$\begin{aligned} {\text {cap}}(A ,B) = \inf \left( \varepsilon \int _{{\mathbb {R}}^n } |\nabla h|^2 e^{-\frac{F}{\varepsilon }}\, dx \, : \,\, h=1 \text { in } A, \, h \in W_0^{1,2}({\mathbb {R}}^n \setminus B)\right) . \end{aligned}$$

1.2 Construction of the electrical network

Definition 1.5

An electrical network is a pair \((G,\varvec{y})\), where \(G = (V,E)\) is a graph with vertex set V and edge set E, and where the vector \(\varvec{y} \in {\mathbb {R}}^{|E|}\) is called the vector of admittances.

We will now construct an electrical network based on the islands and bridges from Definitions 1.2 and 1.4. We associate the vertices with the islands, and for every vertex v we denote the corresponding island by \(U_v\); the set of all vertices is V. Furthermore, we associate the edges with the bridges from Definition 1.2: for every saddle point \(z \in Z\) from Definition 1.4 we associate the edge \(e_z\) with the bridge \(O_{e_z} = O_{z, \delta }\). The set of all edges is E, and vice versa we associate with \(e \in E\) the corresponding saddle point \(z_e \in Z\). We say that vertices \(v,v' \in V\) are incident with an edge e, and vice versa, if they are the ends of the edge, or in other words, if the associated islands \(U_v,U_{v'}\) intersect the bridge \(O_e\) (there are at most two since F is admissible). An edge which is incident with only one vertex is called a loop. We also define a cycle of the graph G to be any non-trivial closed path for which only the first and last vertices are equal.

We thus have a graph \(G = (V,E)\), and we orient it arbitrarily (i.e. we orient each edge of G arbitrarily by assigning an arrow on it pointing towards one of its two ends). In order to have an electrical network we need to define the admittance \({\varvec{y}}_e\) for every \(e \in E\). Let \(e \in E\) be an edge which is not a loop, and let \(v_{-}, v_+ \in V\) be its incident vertices. Define the connected set \(\Omega _{e} = O_{z_e,\delta } \cup U_{v_-} \cup U_{v_+}\) and the admittance

$$\begin{aligned} {\varvec{y}}_e:= \varepsilon \frac{V_{\varepsilon }(B_\varepsilon (x_{v_-}),B_\varepsilon (x_{v_+}); \Omega _{e}) }{d_{\varepsilon }(B_\varepsilon (x_{v_-}),B_\varepsilon (x_{v_+}); \Omega _{e}) } e^{\frac{F(z_e)}{\varepsilon }}. \end{aligned}$$
(1.3)

From the geometric characterization of capacity in [1], we see that the admittance of the edge e is the pre-factor of the capacity of \((B_{\varepsilon }(x_{v_-}),B_{\varepsilon }(x_{v_+}))\) in \(\Omega _{e}\). If e is a loop we set \({\varvec{y}}_e = 0\). We have thus constructed our electrical network \((G,\varvec{y})\) which consists of the graph G and the admittance vector \(\varvec{y} \in {\mathbb {R}}^{|E|}\).
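To make (1.3) concrete, consider a hypothetical quadratic saddle in \(n = 2\) with \(F(z_e) = 0\), \(g_{z_e}(y_1) = a y_1^2\) and \(G_{z_e}(y_2) = b y_2^2\). The geodesic distance and the minimal cut then reduce to one-dimensional Gaussian integrals, and (1.3) gives \({\varvec{y}}_e \simeq \varepsilon \sqrt{a/b}\). A numerical sketch with illustrative values of \(a\), \(b\) and \(\varepsilon \) (our choices):

```python
# Hypothetical quadratic saddle in n = 2 with F(z_e) = 0, g(y1) = a*y1**2,
# G(y2) = b*y2**2.  Then d_eps ~ integral of exp(-g/eps) over R and
# V_eps ~ integral of exp(-G/eps), so (1.3) gives y_e ~ eps*sqrt(a/b).
import numpy as np

eps, a, b = 0.1, 2.0, 3.0
y = np.linspace(-10.0, 10.0, 200001)
h = y[1] - y[0]
d_int = h * np.exp(-a * y**2 / eps).sum()   # ~ sqrt(pi*eps/a)
V_int = h * np.exp(-b * y**2 / eps).sum()   # ~ sqrt(pi*eps/b)

admittance = eps * V_int / d_int            # the e^{F(z_e)/eps} factors cancel
closed_form = eps * np.sqrt(a / b)
```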

1.3 The main result

We begin with some notation and recall some results from Avelin et al. [1].

For functions f and g, which depend continuously on \(\varepsilon >0\), we adopt the notation

$$\begin{aligned} f(\varepsilon ) \simeq g(\varepsilon ) \end{aligned}$$

when there exists a constant C depending only on the data of the problem such that

$$\begin{aligned} (1- C \hat{\eta }(\varepsilon )) f(\varepsilon ) \le g(\varepsilon ) \le (1+C \hat{\eta }(\varepsilon )) f(\varepsilon ), \end{aligned}$$

where \(\hat{\eta }:[0,\infty ) \rightarrow [0,\infty )\) is an increasing and continuous function with \(\lim _{s \rightarrow 0} \hat{\eta }(s) = 0\). We remark that in the following the function \(\hat{\eta }\) is the one from Proposition 1.6 and Lemma 3.4. For us the explicit form will not be important but can be found in [1]; we merely note that \(\hat{\eta }\) is sublinear and depends on the Lipschitz constant of F inside \(U_{\delta }\), the dimension and the function \(\omega \) from (1.1).

We need the above notation in order to relate the geodesic distance and the minimal cut from Definition 1.3 to the convex functions \(g_z,G_z\) from Definition 1.2, which is stated in the following proposition (for the proof see [1, Propositions 4.1–4.2]):

Proposition 1.6

Let \(v_-,v_+ \in V\) be the vertices incident to an edge \(e \in E\) and let \(0< \varepsilon < \varepsilon _0\). Denote by \(x_{v_-},x_{v_+}\) the corresponding proper local minimum points and let \(z_e\) be the corresponding saddle. Then

$$\begin{aligned} d_{\varepsilon }(B_\varepsilon (x_{v_-}),B_\varepsilon (x_{v_+}); \Omega _e) \simeq e^{\frac{F(z_e)}{\varepsilon }} \int _{{\mathbb {R}}} e^{-\frac{g_{z_e}(y_1)}{\varepsilon }} \, dy_1, \end{aligned}$$

and

$$\begin{aligned} V_{\varepsilon }(B_\varepsilon (x_{v_-}),B_\varepsilon (x_{v_+});\Omega _e) \simeq e^{-\frac{F(z_e)}{\varepsilon }} \int _{{\mathbb {R}}^{n-1}} e^{-\frac{G_{z_e}(y')}{\varepsilon }} \, dy', \end{aligned}$$

where \(g_{z_e},G_{z_e}\) are the functions in Definition 1.2.

We need the definition of a spanning tree for Kirchhoff’s formula.

Definition 1.7

Let \(G=(V,E)\) be a graph. We say that \(G'\) is a spanning subgraph of G if \(V(G') = V(G)\) and \(E(G') \subset E(G)\) (i.e. the same vertices but only a subset of the edges). A tree is a connected graph which does not contain cycles and a spanning tree of G is a spanning subgraph of G that is a tree. We denote the set of all spanning trees of G by \({\mathcal {T}}(G)\). Finally, for two vertices \(v,w \in V\) we let G/vw denote the graph obtained by merging the vertices v and w together into a single vertex.
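Spanning trees of small graphs can be enumerated by brute force, since a subset of \(|V|-1\) edges containing no cycle is exactly a spanning tree. A minimal sketch (our code, not the paper's), checked against Cayley's formula \(n^{n-2}\) for the complete graph on four vertices:

```python
# Brute-force enumeration of spanning trees: subsets of |V| - 1 edges that
# are acyclic (checked with union-find) are exactly the spanning trees.
from itertools import combinations

def spanning_trees(n, edges):
    trees = []
    for idx in combinations(range(len(edges)), n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for i in idx:
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru == rv:
                ok = False  # adding this edge would close a cycle
                break
            parent[ru] = rv
        if ok:
            trees.append(idx)
    return trees

# Complete graph K4: Cayley's formula predicts 4**(4-2) = 16 spanning trees.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
n_trees = len(spanning_trees(4, K4))
```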

We are now ready to state our main theorem.

Theorem 1

Let F be admissible as in Definition 1.1, let \(x_u\) and \(x_w\) be local minimum points of F and let \((G,\varvec{y})\) be the electrical network as in Sect. 1.2. Let u and w be the associated vertices in V. Then the capacity is given by

$$\begin{aligned} {\text {cap}}(B _\varepsilon (x_u),B_\varepsilon (x_w)) \simeq \frac{T(G;{\varvec{y}})}{T(G/uw;{\varvec{y}})}, \end{aligned}$$

where

$$\begin{aligned} T(G;{\varvec{y}}) = \sum _{G' \in {\mathcal {T}}(G)} \Big ( \prod _{e \in G'} {\varvec{y}}_e \Big ). \end{aligned}$$
(1.4)

Theorem 1, together with the formula (1.3), provides the characterization of the capacity in the general case where the critical points may have any configuration.
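For the two model configurations of Fig. 1 the theorem reproduces the familiar resistor rules. The following sketch (our code, with `T` a brute-force version of (1.4) and contraction done by relabeling) checks that two parallel edges give \({\varvec{y}}_1+{\varvec{y}}_2\) and two edges in series give \({\varvec{y}}_1{\varvec{y}}_2/({\varvec{y}}_1+{\varvec{y}}_2)\).

```python
# Illustrative check of Theorem 1 on the parallel and series configurations.
from itertools import combinations

def T(n, edges, y):
    """Brute-force spanning-tree sum (1.4); loops (u == v) never enter a tree."""
    total = 0.0
    for idx in combinations(range(len(edges)), n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok, prod = True, 1.0
        for i in idx:
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
            prod *= y[i]
        if ok:
            total += prod
    return total

def contract(n, edges, u, w):
    """G/uw: relabel w as u and compact the remaining vertex labels."""
    lab = {}
    for v in range(n):
        if v != w:
            lab[v] = len(lab)
    lab[w] = lab[u]
    return n - 1, [(lab[a], lab[b]) for a, b in edges]

y1, y2 = 2.0, 3.0
# Parallel: two edges between u = 0 and w = 1.
n, E = 2, [(0, 1), (0, 1)]
nc, Ec = contract(n, E, 0, 1)
cap_parallel = T(n, E, [y1, y2]) / T(nc, Ec, [y1, y2])   # y1 + y2
# Series: path 0 - 1 - 2 with u = 0, w = 2.
n, E = 3, [(0, 1), (1, 2)]
nc, Ec = contract(n, E, 0, 2)
cap_series = T(n, E, [y1, y2]) / T(nc, Ec, [y1, y2])     # y1*y2/(y1 + y2)
```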

2 Preliminaries on graph theory and electrical networks

In this section we recall some basic results in graph theory. For an introduction to the topic we refer to Bollobás [3], Levin and Peres [13] and Wagner [15].

The signed incidence matrix D of the oriented graph \(G=(V,E)\) is the \(|V| \times |E|\) matrix with entries

$$\begin{aligned} D_{ve} = \left\{ \begin{array}{ll} +1 &{} \text {if }e\text { points into }v\text { but not out} \\ -1 &{} \text {if }e\text { points out of }v\text { but not in} \\ 0 &{} \text {otherwise}. \end{array}\right. \end{aligned}$$

Let \(\varvec{y}\) be a vector of admittances defined in Sect. 1.2. Let Y be the \(|E|\times |E|\) diagonal matrix that has \(\varvec{y}\) as its entries, i.e., \(Y = \text {diag}({\varvec{y}}_e: e \in E)\). We also define the weighted Laplacian matrix as \(L = DYD^T\).

We begin by recalling the weighted Matrix-Tree theorem, see [15, Theorem 5], which relates the quantity (1.4) to the weighted Laplacian matrix.

Proposition 2.1

Let \(G = (V,E)\) be an oriented graph and let D, Y and L be as above. Then, for any \(v \in V\),

$$\begin{aligned} T(G;{\varvec{y}}) = \det L(v \mid v) \end{aligned}$$

where \(L(v \mid v)\) is L with the row and column corresponding to v removed and \(T(G;{\varvec{y}}) \) is defined in (1.4).
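A numerical sanity check of the matrix-tree identity on a weighted triangle (our example): with admittances 2, 3, 5 the spanning-tree sum is \(2\cdot 3 + 3\cdot 5 + 2\cdot 5 = 31\), and every principal minor \(\det L(v \mid v)\) should return the same value.

```python
# Check det L(v|v) = T(G; y) = 31 for the triangle with admittances 2, 3, 5.
import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]
y = [2.0, 3.0, 5.0]
n = 3

D = np.zeros((n, len(edges)))  # signed incidence matrix, arbitrary orientation
for j, (u, v) in enumerate(edges):
    D[u, j], D[v, j] = 1.0, -1.0
L = D @ np.diag(y) @ D.T       # weighted Laplacian L = D Y D^T

minors = [
    float(np.linalg.det(np.delete(np.delete(L, v, axis=0), v, axis=1)))
    for v in range(n)
]
```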

Let us recall Kirchhoff’s theorem, see [15, Theorem 8], which relates the right-hand side of the formula in Theorem 1 to the solution of a linear system.

Proposition 2.2

Let \(G = (V,E)\) be an oriented graph, \((G,\varvec{y})\) the electrical network, and let L be the corresponding weighted Laplacian matrix. Fix \(u,w \in V\) and let the vector \({\varvec{\varphi }} \in {\mathbb {R}}^{|V|}\), with the component \({\varvec{\varphi }}_u = 0\), be the solution to the system

$$\begin{aligned} L {\varvec{\varphi }} = {\varvec{\delta }}_w-{\varvec{\delta }}_u, \end{aligned}$$

where \(\varvec{\delta }_w\) is a vector with 1 in the position of w and 0 otherwise. Then the component \({\varvec{\varphi }}_w\) is given by

$$\begin{aligned} {\varvec{\varphi }}_w = \frac{T(G/uw;{\varvec{y}})}{T(G;{\varvec{y}})}. \end{aligned}$$

The classical interpretation of Kirchhoff’s theorem is that of a network of resistors (the admittance is the inverse of the resistance), where we have grounded one end of the network (\(\varvec{\varphi }_u = 0\)) and let 1 ampere of current flow through it (the right-hand side \({\varvec{\delta }}_w-{\varvec{\delta }}_u\)). Then the voltage at the exiting node, \(\varvec{\varphi }_w\), is given by the formula above. This allowed Kirchhoff [10] to solve the problem of replacement resistance, which in this case is just \(\varvec{\varphi }_w\).
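On the weighted triangle (our running example: u = 0, w = 2, admittances 2, 3, 5) the theorem can be checked directly: \(T(G;{\varvec{y}}) = 31\), while \(T(G/uw;{\varvec{y}}) = 2 + 3 = 5\) since the contracted edge becomes a loop and drops out, so the voltage should be \(5/31\).

```python
# Solve L*phi = delta_w - delta_u with phi_u = 0 on the weighted triangle
# and compare phi_w with T(G/uw; y)/T(G; y) = 5/31.
import numpy as np

y01, y12, y02 = 2.0, 3.0, 5.0
L = np.array([
    [y01 + y02, -y01, -y02],
    [-y01, y01 + y12, -y12],
    [-y02, -y12, y02 + y12],
])

# Grounding u = 0 removes its row and column; solve for the free voltages.
rhs = np.array([0.0, 1.0])          # (delta_w - delta_u) restricted to {1, 2}
phi_free = np.linalg.solve(L[1:, 1:], rhs)
phi_w = float(phi_free[-1])
```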

Given an electrical network \((G,\varvec{y})\) we may define a discrete Dirichlet capacity between two vertices \(v_1,v_m \in V\) as

$$\begin{aligned} \min _{{\varvec{\varphi }} \in {\mathbb {R}}^{m}; {\varvec{\varphi }}_1 = 1; {\varvec{\varphi }}_m = 0} \langle L {\varvec{\varphi }}, {\varvec{\varphi }} \rangle \end{aligned}$$

where L is the weighted Laplacian matrix and \(m = |V|\). The minimizer of the above problem is inversely related to Kirchhoff’s theorem, Proposition 2.2. For more information, see [3].

Lemma 2.3

Let \((G,\varvec{y})\) be the electrical network from Sect. 1.2 with \(V = (v_1,\ldots ,v_m)\), and let L be the Laplacian matrix. Then it holds

$$\begin{aligned} \min _{{\varvec{\varphi }} \in {\mathbb {R}}^{m}; {\varvec{\varphi }}_1 = 1; {\varvec{\varphi }}_m = 0} \langle L {\varvec{\varphi }}, {\varvec{\varphi }} \rangle = \frac{T(G;{\varvec{y}})}{T(G/v_1v_m;{\varvec{y}})}. \end{aligned}$$

The minimizer is given by the unique solution with the boundary conditions \(\varvec{\varphi }_m=0\), \(\varvec{\varphi }_1=1\) to the linear system

$$\begin{aligned} L {\varvec{\varphi }} = \lambda ( {\varvec{\delta }}_1 - {\varvec{\delta }}_m) \end{aligned}$$

where \({\varvec{\delta }}_1 = (1,0,\ldots ,0)\) and \({\varvec{\delta }}_m = (0,\ldots ,0,1)\) are vectors of length m and \(\lambda \) is the value of the minimum problem.

Proof

Recall that \(L = D Y D^T\), where D is the signed incidence matrix and Y is the admittance matrix. Let us first reduce the problem. Note that the constraint \({\varvec{\varphi }}_m = 0\) implies that we may remove the last row of D (call it \(D_-\)) and the last entry of \({\varvec{\varphi }}\) (call it \({\varvec{\varphi }}_-\)) and note that \(D_-^T {\varvec{\varphi }}_- = D^T {\varvec{\varphi }}\). Let \(L_- = D_- Y D_-^T = L(v_m \mid v_m)\) and note that similar reasoning gives that

$$\begin{aligned} \langle L_- {\varvec{\varphi }}_-, {\varvec{\varphi }}_- \rangle = \langle L {\varvec{\varphi }},{\varvec{\varphi }} \rangle . \end{aligned}$$

By the Lagrange multiplier method we get

$$\begin{aligned} \left\{ \begin{array}{ll} L_- {\varvec{\varphi }}_- &{}= \lambda {\varvec{\delta }_1} \\ ({\varvec{\varphi }_-})_1 &{}= 1, \end{array}\right. \end{aligned}$$

where \({\varvec{\delta }_1} = (1,0,\ldots )\). Note that by Proposition 2.1 we know that \(\det (L_-) = T(G;{\varvec{y}}) \ne 0\) which gives that the above system has a unique solution. From the above we get that the value of the minimum is given as

$$\begin{aligned} \langle L {\varvec{\varphi }}, {\varvec{\varphi }} \rangle = \langle L_- {\varvec{\varphi }}_-, {\varvec{\varphi }}_- \rangle = \lambda . \end{aligned}$$
(2.1)

Next, we note that \(\varvec{\varphi }/ \lambda \) solves the linear system in Proposition 2.2 with \(u = v_m\) and \(w = v_1\), and hence

$$\begin{aligned} \frac{T(G/v_1v_m;{\varvec{y}})}{T(G;{\varvec{y}})} = \frac{\varvec{\varphi }_1}{\lambda } = \frac{1}{\lambda }, \end{aligned}$$

which together with (2.1) finishes the proof. \(\square \)
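The lemma can again be checked on the weighted triangle (our example, \(v_1 = 0\), \(v_m = 2\)): the minimum of \(\langle L \varvec{\varphi }, \varvec{\varphi } \rangle \) over \(\varvec{\varphi } = (1, t, 0)\) should equal \(T(G;{\varvec{y}})/T(G/v_1v_m;{\varvec{y}}) = 31/5\).

```python
# Minimize <L phi, phi> with phi_1 = 1, phi_m = 0 over the free value t
# by a crude grid search -- a sketch, not an efficient solver.
import numpy as np

y01, y12, y02 = 2.0, 3.0, 5.0
L = np.array([
    [y01 + y02, -y01, -y02],
    [-y01, y01 + y12, -y12],
    [-y02, -y12, y02 + y12],
])

ts = np.linspace(0.0, 1.0, 20001)
best = min(
    float(np.array([1.0, t, 0.0]) @ L @ np.array([1.0, t, 0.0])) for t in ts
)
```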

We also need the following dual formulation of the minimization problem in Lemma 2.3.

Lemma 2.4

Let \(G = (V,E)\) be an oriented graph, where \(V = (v_1,\ldots ,v_m)\), let D be the signed incidence matrix, L the Laplacian matrix, and let Y be the admittance matrix. Then it holds

$$\begin{aligned} \min \left( \langle Y^{-1} {\varvec{j}} , {\varvec{j}} \rangle : \,\, {\varvec{j}} \in {\mathbb {R}}^{|E|}, \,\, D {\varvec{j}} = {\varvec{\delta }}_1 - {\varvec{\delta }}_m \right) = \frac{1}{\lambda }, \end{aligned}$$
(2.2)

where \(\lambda \) is the value of the minimization problem from Lemma 2.3, i.e.,

$$\begin{aligned} \lambda = \min _{{\varvec{\varphi }} \in {\mathbb {R}}^{m}; {\varvec{\varphi }}_1 = 1; {\varvec{\varphi }}_m = 0} \langle L {\varvec{\varphi }}, {\varvec{\varphi }} \rangle . \end{aligned}$$

We point out that one may interpret the minimization problem (2.2) as a discrete version of Thompson’s principle.

Proof

Let \({\varvec{j}}\in {\mathbb {R}}^{|E|}\) be the minimizer of (2.2). The first variation of the minimization problem implies that \(\langle Y^{-1} {\varvec{j}},\varvec{e} \rangle = 0\) for all \({ \varvec{e}} \in {\mathbb {R}}^{|E|}\) with \(D{ \varvec{e}} = 0\), i.e.,

$$\begin{aligned} Y^{-1} {\varvec{j}} \in \text {Ker}^{\perp }(D). \end{aligned}$$
(2.3)

Recall that the solution to the minimization problem in Lemma 2.3 satisfies

$$\begin{aligned} \varvec{\delta }_1 - \varvec{\delta }_m = \lambda ^{-1} L \varvec{\varphi }= \lambda ^{-1} DYD^T \varvec{\varphi }= D(\lambda ^{-1} Y D^T \varvec{\varphi }) \end{aligned}$$

as such \(\tilde{\varvec{j}} = \lambda ^{-1} YD^T \varvec{\varphi }\) satisfies, by the above, the constraint \(D \tilde{ \varvec{j}} = {\varvec{\delta }}_1- {\varvec{\delta }}_m\). Then \(\tilde{\varvec{j}} - \varvec{j} =: \alpha \in \text {Ker}(D)\), and it holds trivially that \(\langle D^T{\varvec{\varphi }}, \alpha \rangle = 0\), since \(D^T{\varvec{\varphi }} \in \text {Ker}^{\perp }(D)\). Moreover, by (2.3) it holds \(\langle Y^{-1} {\varvec{j}}, \alpha \rangle = 0\), thus

$$\begin{aligned} \langle Y^{-1}\alpha , \alpha \rangle = \langle \lambda ^{-1} D^T{\varvec{\varphi }}, \alpha \rangle - \langle Y^{-1}{\varvec{j}}, \alpha \rangle = 0. \end{aligned}$$

Since \(Y^{-1}\) is positive definite we obtain \(\alpha = 0\), that is

$$\begin{aligned} \lambda ^{-1} Y D^T{\varvec{\varphi }} = {\varvec{j}}. \end{aligned}$$

The result then follows from Lemma 2.3 as

$$\begin{aligned} \langle Y^{-1} {\varvec{j}}, {\varvec{j}} \rangle = \frac{\langle Y^{-1} Y D^T{\varvec{\varphi }}, Y D^T{\varvec{\varphi }} \rangle }{\lambda ^2} = \frac{\langle D Y D^T{\varvec{\varphi }},{\varvec{\varphi }} \rangle }{\lambda ^2} = \frac{1}{\lambda }. \end{aligned}$$

\(\square \)
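Numerically, the dual problem can be solved on the weighted triangle via the optimality condition from the proof: \(Y^{-1}\varvec{j} = D^T\varvec{\mu }\) together with \(D\varvec{j} = {\varvec{\delta }}_1 - {\varvec{\delta }}_m\) gives \(L\varvec{\mu } = {\varvec{\delta }}_1 - {\varvec{\delta }}_m\). The minimal dissipation should be \(1/\lambda = 5/31\) (our example):

```python
# Discrete Thompson principle on the weighted triangle: the minimal
# dissipation of a unit flow from v1 = 0 to vm = 2 equals 1/lambda = 5/31.
import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]
y = np.array([2.0, 3.0, 5.0])
n = 3
D = np.zeros((n, len(edges)))
for k, (u, v) in enumerate(edges):
    D[u, k], D[v, k] = 1.0, -1.0
L = D @ np.diag(y) @ D.T

b = np.array([1.0, 0.0, -1.0])        # delta_1 - delta_m
mu = np.linalg.pinv(L) @ b            # L is singular; the pseudoinverse suffices
flow = np.diag(y) @ D.T @ mu          # optimal current j = Y D^T mu
dissipation = float(flow @ (flow / y))
```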

2.1 Simplification of the electrical network

The formula in the statement of Theorem 1, given by Kirchhoff’s formula, is precise, but if the graph contains many cycles and loops it may be unnecessarily cumbersome to evaluate. In the next two lemmas we consider cases in which the formula in Theorem 1 can be simplified.

Consider a graph \(G = (V,E)\). A cut vertex is a vertex that, when removed from G, increases the number of components. A biconnected graph is a graph with no cut vertices. A biconnected component of a graph G is a maximal biconnected subgraph.

Lemma 2.5

Let \(G=(V,E)\) be a graph with a biconnected component \(G_1=(V_1,E_1)\) and let \(G_2=(V_2,E_2)\) be a subgraph of G such that \(G = G_1 \cup G_2\) and \(G_1, G_2\) intersect in a single cut vertex \(v \in V\). If \({\varvec{y}} \in {\mathbb {R}}^{|E|}\) is the admittance vector, \({\varvec{y}}_1 = {\varvec{y}}|_{E_1}\) and \({\varvec{y}}_2 = {\varvec{y}}|_{E_2}\), then it holds

$$\begin{aligned} T(G;{\varvec{y}}) = T(G_1;{\varvec{y}}_1)T(G_2;{\varvec{y}}_2). \end{aligned}$$

Proof

By the definition of biconnected components, and since \(G_1,G_2\) intersect only in v, we can by reordering the vertices write the Laplacian matrix \(L = D Y D^T\) such that the first rows/columns correspond to the vertices in \(G_1\). Then L with the column and row corresponding to v removed (\(L(v \mid v)\)) has a block diagonal structure with the blocks \(L_1 = L_{G_1}(v \mid v)\) and \(L_2 = L_{G_2}(v \mid v)\). Now, since \(\det (L) = \det (L_1)\det (L_2)\) the claim follows by applying Proposition 2.1 on all matrices. \(\square \)
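The multiplicativity in Lemma 2.5 is easy to verify by brute force on two triangles glued at a cut vertex (our example; the helper `T` enumerates spanning trees directly):

```python
# Two triangles sharing vertex 2: T(G; y) should factor as T(G1)*T(G2).
from itertools import combinations

def T(n, edges, y):
    total = 0.0
    for idx in combinations(range(len(edges)), n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok, prod = True, 1.0
        for i in idx:
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru == rv:
                ok = False
                break
            parent[ru] = rv
            prod *= y[i]
        if ok:
            total += prod
    return total

tri = [(0, 1), (1, 2), (0, 2)]
y1 = [2.0, 3.0, 5.0]                      # admittances on G1, T(G1) = 31
y2 = [1.0, 1.0, 1.0]                      # admittances on G2, T(G2) = 3
glued = tri + [(2, 3), (3, 4), (2, 4)]    # G1 and G2 share the cut vertex 2
T_G = T(5, glued, y1 + y2)
T_product = T(3, tri, y1) * T(3, tri, y2)
```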

We can use the above lemma to simplify the computation of Kirchhoff’s theorem in the presence of irrelevant biconnected components, see Fig. 3.

Proposition 2.6

Consider the graph \(G = (V,E)\) and let Y be the admittance matrix. Assume that \(G = G_1 \cup G_2\), where \(G_1\) is a biconnected component and \(G_1, G_2\) intersect in a cut vertex \(v \in V\). Then if \(u,w \in V_2\), it holds

$$\begin{aligned} \frac{T(G;{\varvec{y}})}{T(G/uw;{\varvec{y}})} = \frac{T(G_2;{\varvec{y}}_2)}{T(G_2/uw;{\varvec{y}}_2)}. \end{aligned}$$

The main consequence of Proposition 2.6 is that, using the terminology from [1], only the vertices in \(V_2\) are relevant. We also point out that this is related to the definition of a gate in [5]. In particular, referring to Kirchhoff’s theorem, a consequence of the above is that the voltage \(\varvec{\varphi }\) is constant on the biconnected component \(G_1\), which is thus redundant.

Fig. 3

Example of the graph decomposition in Proposition 2.6. Here the subgraph corresponding to the blue edges is the biconnected component \(G_1\) and the red edges correspond to the graph \(G_2\)

A consequence of Lemma 2.3 is that edges with small admittance do not contribute to the total capacity unless they significantly alter the topology of the graph:

Lemma 2.7

(Deletion of an edge) Let \((G,\varvec{y})\) be the electrical network as in Lemma 2.3. Let \(e \in E\) and define \(G' = (V, E {\setminus } \{e\})\). Then it holds

$$\begin{aligned} \frac{T(G';\varvec{y})}{T(G'/(v_1v_m);\varvec{y})} \le \frac{T(G;\varvec{y})}{T(G/(v_1v_m);\varvec{y})} \le \frac{T(G';\varvec{y})}{T(G'/(v_1v_m);\varvec{y})} + {\varvec{y}}_e. \end{aligned}$$

Proof

Let \(Y'\) be the diagonal matrix Y with the entry corresponding to \({\varvec{y}}_e\) replaced by 0. Then we immediately have

$$\begin{aligned} \min _{{\varvec{\varphi }} \in {\mathbb {R}}^{m}; {\varvec{\varphi }}_1 = 1; {\varvec{\varphi }}_m = 0} \langle D Y' D^T {\varvec{\varphi }}, {\varvec{\varphi }} \rangle \le \min _{{\varvec{\varphi }} \in {\mathbb {R}}^{m}; {\varvec{\varphi }}_1 = 1; {\varvec{\varphi }}_m = 0} \langle D Y D^T {\varvec{\varphi }}, {\varvec{\varphi }} \rangle \end{aligned}$$

which proves the first inequality. For the second inequality, let \(\varvec{\varphi }\) be the minimizer for the modified matrix \(Y'\) and let \(v_-,v_+ \in V\) be the vertices incident to e. Since \(0 \le \varvec{\varphi } \le 1\) by the maximum principle, we have \((\varvec{\varphi }_{v_-}-\varvec{\varphi }_{v_+})^2 \le 1\), and hence

$$\begin{aligned} \langle D Y D^T {\varvec{\varphi }}, {\varvec{\varphi }} \rangle \le \langle D Y' D^T {\varvec{\varphi }}, {\varvec{\varphi }} \rangle + {\varvec{y}}_e \end{aligned}$$

which proves the last inequality. \(\square \)
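On the weighted triangle (our example, \(v_1 = 0\), \(v_m = 2\)) the deletion bounds are explicit: removing the direct edge of admittance 5 drops the capacity from \(31/5\) to \(6/5\), and the upper bound is attained here since the deleted edge joins \(v_1\) and \(v_m\) directly. A numerical sketch:

```python
# Compare the discrete capacity of the weighted triangle before and after
# deleting the edge (0, 2) of admittance 5, checking Lemma 2.7's bounds.
import numpy as np

def capacity(n, edges, y, s, t):
    """Capacity between s and t: solve for the harmonic interior voltages."""
    D = np.zeros((n, len(edges)))
    for k, (u, v) in enumerate(edges):
        D[u, k], D[v, k] = 1.0, -1.0
    L = D @ np.diag(y) @ D.T
    free = [v for v in range(n) if v not in (s, t)]
    phi = np.zeros(n)
    phi[s] = 1.0
    if free:
        A = L[np.ix_(free, free)]
        rhs = -L[np.ix_(free, [s])].ravel()   # from the constraint phi_s = 1
        phi[free] = np.linalg.solve(A, rhs)
    return float(phi @ L @ phi)

edges = [(0, 1), (1, 2), (0, 2)]
y = np.array([2.0, 3.0, 5.0])
cap_G = capacity(3, edges, y, 0, 2)              # 31/5
cap_Gp = capacity(3, edges[:2], y[:2], 0, 2)     # edge (0, 2) deleted: 6/5
y_e = 5.0
```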

3 Proof of the main theorem

The proof of the main theorem consists of an upper and a lower bound of the capacity. The lower bound uses the electrical network defined in Sect. 1.2 and the variational definition of capacity, similar to the proof in [1]. For simple networks the lower bound follows from the variational characterization of capacity together with the fundamental theorem of calculus.

For the upper bound, we provide a novel proof using ideas from Geometric Function Theory together with Thompson’s principle which is in a sense dual to the lower bound. In this case, for simple networks the upper bound follows from Thompson’s principle together with the divergence theorem. For the general case we need an alternative construction of the electrical network.

3.1 Alternative construction of the electrical network

We will construct the alternative electrical network using the domain \(U_{\delta /3}\) (see Definition 1.4), instead of using the components of \(U_{-\delta /3}\) as in Sect. 1.2. To this aim, for a saddle point \(z \in Z\), we define the surface

$$\begin{aligned} S_z:= T_z\left( \{0 \} \times \{x' \in {\mathbb {R}}^{n-1}: G_z(x') < \delta \}\right) , \end{aligned}$$
(3.1)

where \(T_z\) is from Definition 1.1. The set \(U_{\delta /3}\) is connected, but the surfaces \(S_z\) in (3.1) divide it into different components, which we will associate with vertices, see Fig. 5. Define

$$\begin{aligned} \Omega _{\delta /3}: = U_{\delta /3}\setminus \bigcup _{z \in Z} S_z. \end{aligned}$$
(3.2)

We will now provide two technical lemmas. The first says that any path connecting two local minimum points in \(U_{-\delta /3}\) necessarily passes through a surface \(S_z\) for some z in the set of saddles Z, where we recall that Z denotes the saddle points inside \(U_{\delta /3} \setminus U_{-\delta /3}\). The second lemma states that \(U_{-\delta /3}\) and \( \Omega _{\delta /3}\) have the same number of components and \( \Omega _{\delta /3}\) defines exactly the same graph \(G=(V,E)\) as in Sect. 1.2.

Lemma 3.1

Let \(U_{v}\) and \(U_{v'}\) be two different components of \(U_{-\delta /3}\) and let \(\gamma \in {\mathcal {C}}(U_{v},U_{v'}; U_{\delta /3})\). Then there is a critical point \(z \in Z\) such that the intersection \(\gamma ([0,1]) \cap S_z\) is non-empty.

Proof

W.L.O.G. we assume \(F(x_u;x_w)= 0\). Fix \(\gamma _0 \in {\mathcal {C}}(U_{v},U_{v'}; U_{\delta /3})\) and denote \(\gamma \sim \gamma _0\) when \(\gamma \) is homotopy equivalent to \(\gamma _0\) in \(U_{\delta /3}\). Define

$$\begin{aligned} F_{\gamma _0}:= \inf _{\gamma \sim \gamma _0} \sup _{t \in [0,1]} \, F(\gamma (t)). \end{aligned}$$

Then there is a critical point z of F such that \(F(z) = F_{\gamma _0}\) and a continuous path \(\gamma _1 \sim \gamma _0\) such that \(\gamma _1(t) = z\) for some \(t \in (0,1)\). We may choose the coordinates in \({\mathbb {R}}^n\) such that \(z = 0\) and \(S_z = S_0 = \{0 \} \times \{x' \in {\mathbb {R}}^{n-1}: G(x') < \delta \}\).

Note that \(S_0\) is a convex hypersurface with boundary \(\partial S_0 = \{0 \} \times \{x' \in {\mathbb {R}}^{n-1}: G(x') = \delta \}\), and note that \(\partial S_0\) is homeomorphic to \({\mathbb {S}}^{n-2}\). Since F is admissible, it follows from (1.1) that \(F(x)\ge F(0) + 2\delta /3\) for \(x \in \partial S_0\), and therefore, since \(F(0) > -\delta /3\), we have \(\partial S_0 \subset {\mathbb {R}}^n {\setminus } U_{\delta /3}\). In particular, a path in \(U_{\delta /3}\) never intersects \(\partial S_0\), so whether a path intersects \(S_0\) is preserved under homotopies in \(U_{\delta /3}\). Since \(\gamma _1 \sim \gamma _0\) and \(\gamma _1\) passes through \(z \in S_0\), every \(\gamma \sim \gamma _0\) has to intersect \(S_0\), and the claim follows. \(\square \)

Lemma 3.2

The set \(\Omega _{\delta /3} \) defined in (3.2) has the same components as \( U_{-\delta /3}\) defined in (1.2). To be more precise, if \(\Omega '\) is a component of \(\Omega _{\delta /3} \) then there is exactly one component, say \(U'\), of \(U_{-\delta /3}\) such that \(U' \subset \Omega '\).

Proof

W.L.O.G. we assume \(F(x_u;x_w)= 0\). Let us fix a component \(\Omega '\) of \(\Omega _{\delta /3} \). Since F is admissible, for any \(z \in Z\) we see from the definition of \(S_z\) in (3.1) that \(F(x) \ge F(z)\) for all \(x \in S_z\), and hence \(S_z \cap U_{-\delta /3} = \emptyset \). Thus, there is a component \(U'\) of \(U_{-\delta /3}\) such that \(U' \subset \Omega '\). Moreover, \(U'\) is the only component of \(U_{-\delta /3}\) contained in \(\Omega '\): if there were another component \(U''\), then a curve \(\gamma \in {\mathcal {C}}(U',U''; \Omega ') \subset {\mathcal {C}}(U',U''; U_{\delta /3})\) would necessarily intersect some \(S_z\) by Lemma 3.1, which is impossible since \(\Omega '\) is disjoint from every \(S_z\). \(\square \)

We will localize the capacity of the sets \(A = B_\varepsilon (x_u)\) and \(B = B_\varepsilon (x_w)\) in \(U_{\delta /3}\) by defining

$$\begin{aligned} {\text {cap}}(A ,B; U_{\delta /3}):= \inf \left( \varepsilon \int _{U_{\delta /3}} |\nabla h|^2 e^{-\frac{F}{\varepsilon }}\, dx: h=1 \text { in } A, h \in W_0^{1,2}({\mathbb {R}}^n \setminus B) \right) . \end{aligned}$$
(3.3)

In the above minimization problem we do not have any boundary condition on \(\partial U_{\delta /3}\). Thus, it follows from a classical result of the calculus of variations (see [6, Sect. 2.4]) that the minimizer \({\hat{h}}_{A,B}\) of (3.3) satisfies the natural boundary condition \(\nabla {\hat{h}}_{A,B} \cdot n = 0\) on the smooth part of \(\partial U_{\delta /3}\).

It is easy to see that for (3.3) it holds

$$\begin{aligned} {\text {cap}}(A ,B) \ge {\text {cap}}(A ,B; U_{\delta /3}) \ge (1-C \hat{\eta }(\varepsilon )) {\text {cap}}(A ,B), \end{aligned}$$
(3.4)

where \(\hat{\eta }\) is as in Sect. 1.3. Indeed, the first inequality in (3.4) is trivial. For the second we take \({\hat{h}}_{A,B}\) to be the minimizer of (3.3) and recall the rough capacity bound from Avelin et al. [1, Lemma 3.2], i.e., there exist constants \(c_1,c_2,q_1,q_2\) such that

$$\begin{aligned} c_1 \varepsilon ^{q_1} e^{-F(x_u;x_w)/\varepsilon } \le {\text {cap}}(A ,B) \le c_2 \varepsilon ^{q_2} e^{-F(x_u;x_w)/\varepsilon }. \end{aligned}$$
(3.5)

We choose a cut-off function \(0 \le \zeta \le 1\) such that \(\zeta = 1\) in \(U_{\delta /6}\), \(\zeta = 0\) outside \(U_{\delta /3}\) and \(|\nabla \zeta | \le C\), where C depends on \(\delta \) and on the Lipschitz constant of the potential F. Then, using Young’s inequality, the maximum principle and (3.5) we get

$$\begin{aligned} \begin{aligned} {\text {cap}}(A ,B; U_{\delta /3})&\ge \varepsilon \int _{U_{\delta /3}} |\nabla {\hat{h}}_{A,B}|^2 \zeta ^2 e^{-\frac{F}{\varepsilon }}\, dx\\&\ge \frac{\varepsilon }{1+ \varepsilon } \int _{U_{\delta /3}}|\nabla ({\hat{h}}_{A,B} \zeta ) |^2 e^{-\frac{F}{\varepsilon }}\, dx - \frac{2}{\varepsilon } \int _{U_{\delta /3}} |\nabla \zeta |^2 {\hat{h}}_{A,B}^2 e^{-\frac{F}{\varepsilon }}\, dx\\&\ge \varepsilon (1-2\varepsilon ) \int _{{\mathbb {R}}^n} |\nabla ({\hat{h}}_{A,B} \zeta ) |^2 e^{-\frac{F}{\varepsilon }}\, dx - \frac{C}{\varepsilon } e^{-\frac{\delta }{6 \varepsilon }} e^{-F(x_u;x_w)/\varepsilon } \\&\ge (1 - C \hat{\eta }(\varepsilon )) {\text {cap}}(A ,B), \end{aligned} \end{aligned}$$

where the last inequality follows from the sub-linearity of \(\hat{\eta }\).
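The Young's inequality step above can be spelled out as follows (a standard elementary computation, included for the reader's convenience): for vectors \(a,b \in {\mathbb {R}}^n\) and \(\varepsilon > 0\) we have \(|a+b|^2 \le (1+\varepsilon )|a|^2 + (1+\tfrac{1}{\varepsilon })|b|^2\). Applying this with \(a = \zeta \nabla {\hat{h}}_{A,B}\) and \(b = {\hat{h}}_{A,B} \nabla \zeta \), and using \((1+\tfrac{1}{\varepsilon })/(1+\varepsilon ) = \tfrac{1}{\varepsilon }\), gives

$$\begin{aligned} \zeta ^2 |\nabla {\hat{h}}_{A,B}|^2 \ge \frac{1}{1+\varepsilon } |\nabla ({\hat{h}}_{A,B} \zeta )|^2 - \frac{1}{\varepsilon }\, {\hat{h}}_{A,B}^2 |\nabla \zeta |^2, \end{aligned}$$

which, after multiplying by \(\varepsilon e^{-F/\varepsilon }\) and integrating, yields the second inequality in the display above (with the generous constant \(2/\varepsilon \) in place of \(1/\varepsilon \)).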

3.2 Thompson’s principle

The construction of the network via (3.2) is suitable for the dual definition of the capacity via Thompson’s principle. This is done by defining a class of vector fields, denoted by \({\mathcal {M}}\), where \(X \in {\mathcal {M}}\) if \(X \in W^{1,\infty }(U_{\delta /3} {\setminus } ({\bar{A}} \cup {\bar{B}}); {\mathbb {R}}^n)\) and satisfies

$$\begin{aligned} \left\{ \begin{array}{ll} \text {div} X = 0 &{} \text {in }U_{\delta /3} \setminus ({\bar{A}} \cup {\bar{B}}), \\ X \cdot n = 0 &{} \text {on }\partial U_{\delta /3}, \\ \int _{\partial A} X \cdot n \, d {\mathcal {H}}^{n-1} = 1. \end{array}\right. \end{aligned}$$
(3.6)

We note that the set \({\mathcal {M}}\) is non-empty, since the vector field \(X = C e^{-F/\varepsilon } \nabla {\hat{h}}_{A,B}\), where \(C = ( {\text {cap}}(A ,B; U_{\delta /3}))^{-1}\), belongs to \({\mathcal {M}}\). Then we have the following (see e.g. [12])

$$\begin{aligned} \frac{1}{ {\text {cap}}(A ,B;U_{\delta /3})} = \inf \left( \varepsilon \int _{U_{\delta /3} \setminus ({\bar{A}} \cup {\bar{B}})} |X|^2 e^{\frac{F}{\varepsilon }}\, dx \,: \, X \in {\mathcal {M}} \right) . \end{aligned}$$
(3.7)
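To make the duality concrete, a discrete analogue of (3.7) can be checked numerically. The sketch below (with illustrative admittance values, not taken from the paper) verifies that, among unit flows through two parallel edges, the energy \(\sum _e {\varvec{j}}_e^2/{\varvec{y}}_e\) is minimized by the harmonic flow and that the minimum equals the effective resistance:

```python
import numpy as np

# Two saddles in parallel between u and w: edges with admittances y1, y2
# (illustrative values). Discrete Thompson principle: among unit flows j
# from u to w, the energy sum_e j_e^2 / y_e is minimized by the harmonic
# flow, and the minimum equals the effective resistance 1/(y1 + y2).
y = np.array([2.0, 3.0])              # edge admittances (assumed values)

def flow_energy(j, y):
    """Energy of a flow j: sum over edges of j_e^2 / y_e."""
    return np.sum(j**2 / y)

# Harmonic unit flow: current splits proportionally to admittance.
j_harmonic = y / y.sum()
R_eff = 1.0 / y.sum()                 # effective resistance of the parallel pair
assert np.isclose(flow_energy(j_harmonic, y), R_eff)

# Any other unit flow (here an even split) has strictly larger energy.
j_other = np.array([0.5, 0.5])
assert flow_energy(j_other, y) > R_eff
```

In the continuum problem the optimal field is the one exhibited above, \(X = C e^{-F/\varepsilon } \nabla {\hat{h}}_{A,B}\); any other admissible X only increases the energy in (3.7).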

Let \(G=(V,E)\) be the graph constructed as above using the domain \(\Omega _{\delta /3}\) defined in (3.2) and let \(X \in {\mathcal {M}}\). We construct a current \(j:E \rightarrow {\mathbb {R}}\) associated with X as follows. Let us fix a vertex \(v \in V {\setminus } \{ u,w\}\) and let \({\tilde{U}}_v\) be the associated component of the domain \(\Omega _{\delta /3} \). Denote the edges incident with v by \(e \in E_v \subset E\) and the associated surface defined in (3.1) by \(S_e = S_{z_e}\). The boundary \(\partial {\tilde{U}}_v\) is contained in \(\partial U_{\delta /3} \cup (\bigcup _{e \in E_v} S_e)\). Recall that \(v \ne u,w\), therefore \(\text {div} X = 0\) in \({\tilde{U}}_v\), and we have by the divergence theorem and by \(X \cdot n = 0 \) on \(\partial U_{\delta /3}\) that

$$\begin{aligned} 0 = \int _{{\tilde{U}}_v} \text {div} (X) \, dx = \int _{\partial {\tilde{U}}_v} X \cdot n \, d {\mathcal {H}}^{n-1} = \sum _{e \in E_v} \int _{S_e} X \cdot n \, d {\mathcal {H}}^{n-1}. \end{aligned}$$
(3.8)

We define the value of j at \(e \in E_v\) as

$$\begin{aligned} j(e):= \left\{ \begin{array}{ll} \varepsilon \int _{S_e} X \cdot n \, d {\mathcal {H}}^{n-1},&{} \,\, \text{ if } e \text{ points into } v, \\ - \varepsilon \int _{S_e} X \cdot n \, d {\mathcal {H}}^{n-1}, &{}\,\, \text{ if } e \text{ points out of } v. \end{array}\right. \end{aligned}$$
(3.9)

We define the current similarly also at edges incident with u and w. If we label the edges as \(e_1, \dots , e_l\) we have a vector \({\varvec{j}}\in {\mathbb {R}}^{|E|}\) with components \({\varvec{j}}_k = j(e_k)\). By construction and by (3.8), \({\varvec{j}}\) satisfies Kirchhoff’s current law: at every vertex other than u and w the current flowing in equals the current flowing out, while a unit current enters the network at u and exits at w. We may write this simply as (see [15])

$$\begin{aligned} D {\varvec{j}} = {\varvec{\delta }}_1 - {\varvec{\delta }}_m \end{aligned}$$

where we have labeled the vertices as \(v_1, \dots , v_m\) with \(v_1 = u\) and \(v_m = w\), and \({\varvec{\delta }}_1\) and \( {\varvec{\delta }}_m\) are as in Lemma 2.3.
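As a sanity check of these sign conventions, the following sketch builds a current from a harmonic potential on a toy series network u – v – w (the unit admittances and the orientation of D are illustrative assumptions, not data from the paper) and verifies Kirchhoff's law in the form \(D {\varvec{j}} = {\varvec{\delta }}_1 - {\varvec{\delta }}_m\):

```python
import numpy as np

# Path network u -- v -- w. D is the |V| x |E| signed incidence matrix
# (+1 at the head of an edge, -1 at its tail); Y the diagonal admittances.
D = np.array([[-1.0,  0.0],    # u = tail of e1
              [ 1.0, -1.0],    # v = head of e1, tail of e2
              [ 0.0,  1.0]])   # w = head of e2
y = np.array([1.0, 1.0])       # assumed unit admittances
Y = np.diag(y)
L = D @ Y @ D.T                # weighted graph Laplacian

# Harmonic potential with phi(u) = 1, phi(w) = 0 (harmonic at interior v).
phi = np.array([1.0, 0.5, 0.0])
assert np.isclose((L @ phi)[1], 0.0)

# Induced edge current; normalizing to a unit current gives Kirchhoff's law
# with a unit source at u and a unit sink at w.
j = Y @ (D.T @ phi)
c_eff = (L @ phi)[0]           # effective conductance (= 1/2 for this network)
j_unit = j / c_eff
assert np.allclose(D @ j_unit, np.array([1.0, 0.0, -1.0]))
```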

3.3 Technical lemmas

Before we prove the main theorem we recall the following lemma from Avelin et al. [1].

Lemma 3.3

Let F be admissible. Let \(x_u,x_w\) be as in Definition 1.4 and assume that the communication height from Definition 1.4 is zero, i.e., \(F(x_u;x_w) = 0\). If \({\hat{U}}_v\) is a component of \(U_{-\delta /2} = \{ F < -\delta /2\}\), then

$$\begin{aligned} \underset{{\hat{U}}_v}{{\text {osc}}\,} h_{B_\varepsilon (x_u),B_\varepsilon (x_w)} \le C e^{-\frac{3\delta }{16 \varepsilon }} , \end{aligned}$$

for small enough \(\varepsilon \le \varepsilon _0\).

Proof

The proof is almost the same as in [1, Lemma 3.5], but we repeat it for the reader’s convenience. Let us denote \(u:= h_{B_\varepsilon (x_u),B_\varepsilon (x_w)}\) for short.

Recall that \({\hat{U}}_v\) is a component of \(U_{-\delta /2} = \{ F < -\delta /2\}\). Since F is Lipschitz continuous, we find a Lipschitz domain \(D_v\) such that

$$\begin{aligned} {\hat{U}}_v \subset D_v \subset U_{-\frac{4\delta }{9}} = \{ F < -\tfrac{4\delta }{9}\} \end{aligned}$$

and the Poincaré inequality holds in \(D_v\) with a constant that depends on \(\Vert F\Vert _{C^{0,1}}\), i.e.,

$$\begin{aligned} \int _{D_v} |u - u_{D_v}|^2 \, dx \le C \int _{D_v} |\nabla u|^2 \, dx, \end{aligned}$$

where \(u_{D_v}\) denotes the average of u in \(D_v\). We use the rough capacity bound (3.5) and \(D_v \subset U_{-4\delta /9}\) to deduce

$$\begin{aligned} \int _{D_v} |\nabla u|^2 \, dx&\le e^{-\frac{4\delta }{9 \varepsilon }} \int _{D_v} |\nabla u|^2 e^{-\frac{F}{\varepsilon }} \, dx \\&\le \varepsilon ^{-1} e^{-\frac{4\delta }{9 \varepsilon }} {\text {cap}}(B _\varepsilon (x_u),B_\varepsilon (x_w)) \le C \varepsilon ^{q-1} e^{-\frac{4\delta }{9 \varepsilon }}. \end{aligned}$$

Fix a point \(x_0 \in {\hat{U}}_v\). Then by Harnack’s inequality [1, Lemma 2.7] it holds

$$\begin{aligned} \sup _{B_{\varepsilon }(x_0)} |u - u_{D_v}| \le C \varepsilon ^{-\frac{n}{2}} \left( \int _{D_v} |u - u_{D_v}|^2 \, dx \right) ^{\frac{1}{2}}. \end{aligned}$$

In conclusion, we have (since \(\varepsilon \le \varepsilon _0\))

$$\begin{aligned} \sup _{B_{\varepsilon }(x_0)} |u - u_{D_v}| \le \varepsilon ^{\frac{q-1-n}{2}} e^{-\frac{2\delta }{9 \varepsilon }} \le C e^{-\frac{3\delta }{16 \varepsilon }}. \end{aligned}$$

The claim follows from the fact that \(x_0\) is an arbitrary point in \({\hat{U}}_v\). \(\square \)

We also need the following lemma which relates the function \(\hat{\eta }\) to \(\omega \) in the assumption (1.1). This lemma can be found in [1, Lemma 3.9].

Lemma 3.4

Assume that \(G: {\mathbb {R}}^k \rightarrow {\mathbb {R}}\) is a convex function which has a proper minimum at the origin and let \(\omega \) be the increasing function from (1.1). Then for a fixed \(\delta \le \delta _0\) and for any \(\varepsilon \le \varepsilon _0\) it holds

$$\begin{aligned} (1 -\hat{\eta }(\varepsilon )) \int _{{\mathbb {R}}^k} e^{-\frac{G(x)}{\varepsilon }} \, dx\le \int _{\{ G < \delta \}} e^{-\frac{G(x)}{\varepsilon }} e^{\pm \frac{\omega (G(x))}{\varepsilon }} \, dx \le (1 +\hat{\eta }(\varepsilon )) \int _{{\mathbb {R}}^k} e^{-\frac{G(x)}{\varepsilon }} \, dx, \end{aligned}$$

for a continuous and increasing function \(\hat{\eta }\) with \(\hat{\eta }(0) = 0\), which depends on \(\omega \) and on the dimension.

3.4 Proof of the main theorem

We prove the main theorem by providing sharp lower bounds for the variational definition of the capacity and for (3.7), which is in some sense the dual of the argument in [12].

Proof of Theorem 1

Consider two local minima \(x_u,x_w\), let \(A = B_\varepsilon (x_u)\) and \(B = B_\varepsilon (x_w)\), and let \(h_{A,B}\) be the capacitary potential for the capacitor (A, B). By rescaling we may assume that the communication height from Definition 1.4 is zero, i.e., \(F(x_u;x_w)= 0\).

Lower bound: Let \((G,\varvec{y})\) be the electrical network from Sect. 1.2, and label the vertices as \(V = \{v_1,\ldots ,v_m\}\), where \(v_1 = u\), \(v_m = w\). We need to show that

$$\begin{aligned} {\text {cap}}(A ,B) \ge (1 - C \hat{\eta }(\varepsilon )) \frac{T(G;{\varvec{y}})}{T(G/uw;{\varvec{y}})}, \end{aligned}$$

where \(\hat{\eta }\) is as in Sect. 1.3.
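By the weighted matrix-tree theorem, the ratio \(T(G;{\varvec{y}})/T(G/uw;{\varvec{y}})\) is the effective conductance of the network between u and w, which is what makes the right-hand side computable. A numerical sketch on the series network u – v – w with unit admittances (an illustrative toy configuration, not one from the paper):

```python
import numpy as np

# Weighted Laplacian of the path u -- v -- w with unit admittances; u = 0, w = 2.
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

def tree_weight(M):
    # Weighted spanning-tree count: any cofactor of the Laplacian
    # (here: delete the first row and column).
    return np.linalg.det(M[1:, 1:])

# Laplacian of the contracted graph G/uw: identify vertices u and w by
# adding their rows and columns together.
Lc = np.array([[L[0, 0] + L[2, 2] + 2 * L[0, 2], L[0, 1] + L[2, 1]],
               [L[0, 1] + L[2, 1],               L[1, 1]]])

ratio = tree_weight(L) / tree_weight(Lc)   # T(G;y) / T(G/uw;y)

# Compare with the effective conductance via the Laplacian pseudoinverse.
e = np.array([1.0, 0.0, -1.0])             # delta_u - delta_w
R_eff = e @ np.linalg.pinv(L) @ e          # effective resistance = 2
assert np.isclose(ratio, 1.0 / R_eff)      # both equal 1/2 for the series pair
```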

Let \(\varphi :V \rightarrow {\mathbb {R}}\) be a function such that \(\varphi (v) = h_{A,B}(x_v)\) where \(v \in V\) and \(x_v\) is the associated minimum point. Let \({\hat{U}}_v\) be the component of \(\{ F < -\delta /2\}\) which contains \(x_v\). By Lemma 3.3 we have

$$\begin{aligned} \text {osc}_{{\hat{U}}_v}(h_{A,B}) \le C e^{-\frac{3\delta }{16 \varepsilon }} \quad \text {for all }v \in V. \end{aligned}$$

Therefore, \(\varphi \) satisfies

$$\begin{aligned} | h_{A,B} - \varphi (v) | \le C e^{-\frac{3\delta }{16 \varepsilon }} \quad \text {in } \,{\hat{U}}_{v} \quad \text {for }v \in V. \end{aligned}$$
(3.10)

Consider an edge \(e \in E\), which is not a loop, and let \(v_{-},v_{+}\) be the two incident vertices in V. Denote the associated minimum points as \(x_{-},x_{+}\), the associated islands as \(U_{-},U_{+}\) respectively and the saddle point as \(z_{e}\). We may assume that \(z_e = 0\) and that the bridge is given by

$$\begin{aligned} O_{e} = O_{z_e, \delta } = \{y_1: g(y_1)< \delta \} \times \{y': G(y') < \delta \}. \end{aligned}$$

Let us consider a domain (see Fig. 4)

$$\begin{aligned} {\hat{O}}_{e}:= \{y_1: g(y_1)< \delta \} \times \{y': G(y') < \delta /100 \} \end{aligned}$$

and denote for \(\tau \ge 0\) the surface

$$\begin{aligned} S_\tau := \{\tau \} \times \{y': G(y') < \delta /100\}. \end{aligned}$$
Fig. 4

The bridge \(O_e\) connects the sets \({\hat{U}}_{v_+}\) and \({\hat{U}}_{v_-}\). The smaller cylindrical bridge \({\hat{O}}_e\) has its lateral boundaries inside \({\hat{U}}_{v_+} \cup {\hat{U}}_{v_-}\)

We denote the lateral boundary of \( {\hat{O}}_{e} \) by \(\Gamma _e:= \{y_1: g(y_1) = \delta \} \times \{y': G(y') < \delta /100 \}\) and note that \(\Gamma _e = S_{\tau _1}\cup S_{\tau _2}\) for \(\tau _1<0 < \tau _2\) which satisfy \(g(\tau _1) = g(\tau _2)= \delta \). Recall that we assume \(F(z_e) = F(0)< \delta /3\) and therefore by the definition of \({\hat{O}}_{e}\) and assumption (1.1) it holds for all \(y \in \Gamma _e\) that

$$\begin{aligned} F(y) \le \overbrace{F(0)}^{<\delta /3} - \overbrace{g(y_1)}^{=\delta } + \overbrace{G(y')}^{<\delta /100} + \overbrace{\omega (g(y_1))}^{<\delta /100} + \overbrace{\omega (G(y'))}^{<\delta /100} < - \frac{\delta }{2}. \end{aligned}$$
(3.11)

In other words, the lateral boundary \(\Gamma _e\) is contained in the sublevel-set \(\{ F < -\delta /2\} \) and the inequality in (3.10) holds there.

Let us next prove that it holds

$$\begin{aligned} (1 - C \hat{\eta }(\varepsilon )) (\varphi (v_{-})-\varphi (v_{+}))^2 {\varvec{y}}_e \le \varepsilon \int _{O_{e}} |\nabla h_{A,B}|^2 e^{-\frac{F(y)}{\varepsilon }} \,dy + C e^{-\frac{\delta }{24 \varepsilon }}, \end{aligned}$$
(3.12)

where the admittance \( {\varvec{y}}_e\) is defined in (1.3). To this aim we fix \(y' \in \{y': G(y') < \delta /100 \}\), let \(\tau _1<0 < \tau _2\) be such that \(g(\tau _1) = g(\tau _2)= \delta \) and notice that \(( \tau _i, y') \in \Gamma _e\), for \(i =1,2\). Using the fundamental theorem of calculus and (3.10) we get

$$\begin{aligned} |\varphi (v_-) - \varphi (v_+)|-C e^{-\frac{3\delta }{16 \varepsilon }}&\le |h_{A,B}(\tau _2, y') - h_{A,B}(\tau _1, y')| \\&\le \int _{\{g< \delta \}} |\partial _{y_1} h_{A,B}(y_1,y')| dy_1\\&\le \int _{\{g < \delta \}} |\nabla h_{A,B}(y_1,y')| e^{-\frac{F(y)}{2\varepsilon }} e^{\frac{F(y)}{2\varepsilon }}dy_1. \end{aligned}$$

By Cauchy–Schwarz inequality we have

$$\begin{aligned} \begin{aligned} (\varphi (v_{-})-\varphi (v_{+}))^2- 2C e^{-\frac{3\delta }{8 \varepsilon }} \le \left( \int _{\{g<\delta \} } |\nabla h_{A,B}(y)|^2 e^{-\frac{F(y)}{\varepsilon }} \,dy_1 \right) \left( \int _{\{g<\delta \} } e^{\frac{F(y)}{\varepsilon }} \,dy_1\right) \end{aligned} \end{aligned}$$

for \( (y_1,y') \in \{ g<\delta \} \times \{ G <\delta /100\}\). The assumption (1.1) implies

$$\begin{aligned} F(y) \le F(0) - g(y_1) + \omega (g(y_1)) + G(y') + \omega (G(y')). \end{aligned}$$

Dividing the above estimate by \(e^{\frac{G(y')}{\varepsilon }}e^{\frac{\omega (G(y'))}{\varepsilon }} \) and integrating over \(y'\) yields

$$\begin{aligned} \begin{aligned}&\left( \int _{\{G< \delta /100\}} e^{-\frac{G(y')}{\varepsilon }}e^{-\frac{\omega (G(y'))}{\varepsilon }} dy' \right) \big ((\varphi (v_{-})-\varphi (v_{+}))^2- 2C e^{-\frac{3\delta }{8 \varepsilon }} \big ) \\&\quad \le \left( \int _{ {\hat{O}}_{e} } |\nabla h_{A,B}(y)|^2 e^{-\frac{F(y)}{\varepsilon }} \,dy \right) \left( \int _{\{g<\delta \} } e^{-\frac{g(y_1)}{\varepsilon }} e^{\frac{\omega (g(y_1))}{\varepsilon }} \,dy_1\right) e^{\frac{F(0)}{\varepsilon }}. \end{aligned} \end{aligned}$$

Using Lemma 3.4 and Proposition 1.6 it holds

$$\begin{aligned} \begin{aligned} e^{\frac{F(0)}{\varepsilon }} \, \int _{\{g<\delta \} } e^{\frac{-g(y_1)}{\varepsilon }} e^{\frac{\omega (g(y_1))}{\varepsilon }} \,dy_1&\le (1 + \hat{\eta }(\varepsilon )) e^{\frac{F(0)}{\varepsilon }} \int _{{\mathbb {R}}} e^{\frac{-g(y_1)}{\varepsilon }} \,dy_1 \\&\le (1 + C \hat{\eta }(\varepsilon )) \, d_\varepsilon (B_\varepsilon (x_{-}),B_\varepsilon (x_{+});\Omega _e), \end{aligned} \end{aligned}$$

and, trivially

$$\begin{aligned} \int _{\{g< \delta \}} e^{\frac{-g(y_1)}{\varepsilon }} e^{\frac{\omega (g(y_1))}{\varepsilon }} dy_1 \ge \int _{\{g< \varepsilon \}} e^{\frac{-\varepsilon }{2 \varepsilon }} dy_1 \ge c |\{g < \varepsilon \}|. \end{aligned}$$

Since g is Lipschitz and \(g(0)= 0\) we have \((-c\varepsilon ,c\varepsilon ) \subset \{g < \varepsilon \}\) for some c, and therefore \(|\{g < \varepsilon \}| \ge c \varepsilon \). Again, by Lemma 3.4 and Proposition 1.6 we get

$$\begin{aligned} \begin{aligned} e^{-\frac{F(0)}{\varepsilon }} \, \int _{\{G < \delta /100\}} e^{-\frac{G(y')}{\varepsilon }}e^{-\frac{\omega (G(y'))}{\varepsilon }} dy'&\ge (1 - \hat{\eta }(\varepsilon ))e^{-\frac{F(0)}{\varepsilon }} \int _{{\mathbb {R}}^{n-1} } e^{\frac{-G(y')}{\varepsilon }} \,dy' \\&\ge (1- C\hat{\eta }(\varepsilon )) \, V_\varepsilon (B_\varepsilon (x_{-}),B_\varepsilon (x_{+});\Omega _e), \end{aligned} \end{aligned}$$

and trivially we also get

$$\begin{aligned} \int _{\{G< \delta /100\}} e^{\frac{-G(y')}{\varepsilon }} e^{\frac{- \omega (G(y'))}{\varepsilon }} dy' \le |\{G < \delta \}|. \end{aligned}$$

Recalling that \(F(0) \le \delta /3\), this together with the above estimates and the definition of the admittance \({\varvec{y}}_e\) in (1.3) implies the inequality (3.12).

Since (3.12) holds for all \(e \in E\) we can sum the inequalities over e and rephrase the sum using the signed incidence matrix D and the admittance matrix Y. To this aim, let \(\varvec{\varphi }\) be the vector \((\varphi (v_1),\ldots ,\varphi (v_m))\), where \(v_1 = u\) and \(v_m = w\), and for an edge \(e \in E\), let \(v_{e^-},v_{e^+}\) be the incident vertices. Then since D is the \(|V |\times |E|\) signed incidence matrix, we have for the edges \((e_1,\ldots ,e_k)\)

$$\begin{aligned} D^T {\varvec{\varphi }} = (\varphi (v_{e^+_1})-\varphi (v_{e^-_1}),\ldots ,\varphi (v_{e^+_k})-\varphi (v_{e^-_k})). \end{aligned}$$

Furthermore, by the definition of the admittance matrix Y we have that

$$\begin{aligned} Y D^T {\varvec{\varphi }} = ((\varphi (v_{e^+_1})-\varphi (v_{e^-_1}))y_{e_1},\ldots ,(\varphi (v_{e^+_k})-\varphi (v_{e^-_k}))y_{e_k}). \end{aligned}$$

Recalling that (3.12) holds for every edge \(e \in E\), and since the sets \(O_{e_i}\) are disjoint, we get

$$\begin{aligned} (1 - C \hat{\eta }(\varepsilon )) \langle DYD^T {\varvec{\varphi }}, {\varvec{\varphi }} \rangle \le&\varepsilon \int _{{\mathbb {R}}^n} |\nabla h_{A,B}|^2 e^{-\frac{F(y)}{\varepsilon }} \,dy + C e^{-\frac{\delta }{24 \varepsilon }} \\ \le&{\text {cap}}(A ,B) + C e^{-\frac{\delta }{24 \varepsilon }}. \end{aligned}$$

Now note that the rough capacity bound (3.5) implies that \( {\text {cap}}(A ,B) \ge c_1 \varepsilon ^{q_1}\), so the error term \(C e^{-\frac{\delta }{24 \varepsilon }}\) above can be absorbed into \(C \hat{\eta }(\varepsilon ) {\text {cap}}(A ,B)\). By construction, it holds \({\varvec{\varphi }}_1 = \varphi (u) = 1\) and \({\varvec{\varphi }}_m = \varphi (w) = 0\), and therefore Lemma 2.3 completes the proof of the lower bound.
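The step via Lemma 2.3 is an instance of the discrete Dirichlet principle: among vertex potentials with boundary values \({\varvec{\varphi }}_1 = 1\) and \({\varvec{\varphi }}_m = 0\), the quadratic form \(\langle DYD^T {\varvec{\varphi }}, {\varvec{\varphi }} \rangle \) is minimized by the harmonic potential, the minimum being the effective conductance. A sketch on a toy series network (illustrative unit admittances, not data from the paper):

```python
import numpy as np

# Series network u -- v -- w with unit admittances; D is the signed
# incidence matrix, L = D Y D^T the weighted Laplacian.
D = np.array([[-1.0,  0.0],
              [ 1.0, -1.0],
              [ 0.0,  1.0]])
Y = np.eye(2)
L = D @ Y @ D.T

def energy(t):
    """Dirichlet energy <L phi, phi> for phi = (1, t, 0): boundary values
    fixed at u and w, interior value t free."""
    phi = np.array([1.0, t, 0.0])
    return phi @ L @ phi

# The harmonic choice t = 1/2 attains the effective conductance 1/2 ...
assert np.isclose(energy(0.5), 0.5)
# ... and every other interior value gives a strictly larger energy.
assert all(energy(t) > 0.5 for t in [0.0, 0.3, 0.9])
```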

Upper bound: We prove the upper bound by a similar argument, providing a lower bound for the dual characterization (3.7). Indeed, by the second inequality in (3.4) this provides an upper bound for the global capacity. Let us fix a vector field \(X \in {\mathcal {M}}\), where \({\mathcal {M}}\) is defined via the conditions (3.6), and construct the associated current \({\varvec{j}} \in {\mathbb {R}}^{|E|}\) as in Sect. 3.2. The construction implies that \({\varvec{j}} \) satisfies Kirchhoff’s current law \(D {\varvec{j}} = {\varvec{\delta }}_1 - {\varvec{\delta }}_m\), and therefore it holds by Lemmas 2.3 and 2.4 that

$$\begin{aligned} \langle Y^{-1} {\varvec{j}} , {\varvec{j}} \rangle \ge \frac{T(G/uw;{\varvec{y}})}{T(G;{\varvec{y}})}. \end{aligned}$$

In order to conclude the proof, it is enough to show that at every edge \(e \in E\) it holds

$$\begin{aligned} \varepsilon \int _{O_e \cap U_{\delta /3}} |X|^2 e^{\frac{F}{\varepsilon }}\, dx \ge (1 - C \hat{\eta }(\varepsilon )) \frac{{\varvec{j}}_e^2}{{\varvec{y}}_e}, \end{aligned}$$
(3.13)

where \(O_{e} = O_{z_e, \delta }\) denotes the associated bridge. To this aim we may choose the coordinates in \({\mathbb {R}}^n\) such that

$$\begin{aligned} O_e = \{x_1 : g(x_1)< \delta \} \times \{x' : G(x') < \delta \}. \end{aligned}$$

For every \(|\tau |< \delta /100\) redefine

$$\begin{aligned} S_\tau := \{\tau \} \times \{x': G(x') \le \delta \}, \end{aligned}$$
(3.14)

and note that by the definition of \({\varvec{j}}\) in (3.9) it holds

$$\begin{aligned} \varepsilon \, \left| \int _{S_{0}} X \cdot {\hat{e}}_1 \, d {\mathcal {H}}^{n-1} \right| = |{\varvec{j}}_e|, \end{aligned}$$
(3.15)

where \({\hat{e}}_1\) is the first coordinate vector of \({\mathbb {R}}^n\). Let us fix \(0< \tau < \delta /100\) and consider the domain (see Fig. 5)

$$\begin{aligned} {\hat{O}}_\tau = \{x_1: 0< x_1< \tau \} \times \{x' : G(x') < \delta \} \end{aligned}$$

and denote the ‘cylindrical’ boundary by

$$\begin{aligned} \Sigma _\tau = \{x_1 : 0 \le x_1 \le \tau \} \times \{x' : G(x') = \delta \}. \end{aligned}$$
Fig. 5

The set \(S_{z_e}\) and the domain \({\hat{O}}_\tau \) separate the domain \(U_{\delta /3}\) into different components

Arguing as in (3.11) we deduce that \(F > \delta /3\) on \(\Sigma _\tau \) and therefore \(\Sigma _\tau \subset ({\overline{U}}_{\delta /3})^c\). Note that the ‘lateral’ boundary of \( {\hat{O}}_\tau \) is the union of \(S_0\) and \(S_\tau \) defined in (3.14). By (3.6) X is divergence free, and thus we obtain by the divergence theorem that

$$\begin{aligned} 0 =&\int _{{\hat{O}}_\tau \cap U_{\delta /3}} \text {div} (X) \, dx \\ =&\int _{\partial U_{\delta /3} \cap {\hat{O}}_\tau } X \cdot n \, d {\mathcal {H}}^{n-1} + \int _{S_0} X \cdot n \, d {\mathcal {H}}^{n-1} + \int _{S_\tau } X \cdot n \, d {\mathcal {H}}^{n-1} . \end{aligned}$$

Again by (3.6) we have \(X \cdot n = 0\) on \(\partial U_{\delta /3}\), and since the normal on the lateral boundary pieces \(S_0\) and \(S_\tau \) points in the direction of \(\pm {\hat{e}}_1\), we have by (3.15)

$$\begin{aligned} \varepsilon \, \left| \int _{S_{\tau }} X \cdot {\hat{e}}_1 \, d {\mathcal {H}}^{n-1} \right| = \varepsilon \, \left| \int _{S_{0}} X \cdot {\hat{e}}_1 \, d {\mathcal {H}}^{n-1} \right| = |{\varvec{j}}_e|. \end{aligned}$$

We may apply the same argument to \(\tau <0\) to deduce the above equality for all \(|\tau |< \delta /100\).

We proceed by the Cauchy–Schwarz inequality

$$\begin{aligned} |{\varvec{j}}_e| = \varepsilon \, \left| \int _{S_{\tau }} X \cdot {\hat{e}}_1 \, d {\mathcal {H}}^{n-1} \right| \le \varepsilon \, \left( \int _{S_{\tau }} |X|^2 e^{\frac{F}{\varepsilon }} \, d {\mathcal {H}}^{n-1} \right) ^{\frac{1}{2}} \left( \int _{S_{\tau }} e^{-\frac{F}{\varepsilon }}\, d {\mathcal {H}}^{n-1}\right) ^{\frac{1}{2}}. \end{aligned}$$

By assumption (1.1) we have

$$\begin{aligned} F(x) \ge F(0) - g(x_1) - \omega (g(x_1)) + G(x') - \omega (G(x')) \end{aligned}$$

thus by Lemma 3.4 and Proposition 1.6 it holds

$$\begin{aligned} \begin{aligned} \int _{S_{\tau }} e^{-\frac{F}{\varepsilon }}\, d {\mathcal {H}}^{n-1}&\le e^{\frac{g(\tau )}{\varepsilon } } e^{\frac{\omega (g(\tau ))}{\varepsilon } } e^{-\frac{F(0)}{\varepsilon }} \int _{\{ G <\delta \}} e^{-\frac{G(x')}{\varepsilon } } e^{\frac{\omega (G(x'))}{\varepsilon } } \, dx' \\&\le (1+ \hat{\eta }(\varepsilon )) e^{\frac{g(\tau )}{\varepsilon } } e^{\frac{\omega (g(\tau ))}{\varepsilon } } e^{-\frac{F(0)}{\varepsilon }} \int _{{\mathbb {R}}^{n-1}} e^{-\frac{G(x')}{\varepsilon } } \, dx' \\&\le (1+ C \hat{\eta }(\varepsilon ))e^{\frac{g(\tau )}{\varepsilon } } e^{\frac{\omega (g(\tau ))}{\varepsilon } }V_\varepsilon (B_\varepsilon (x_{-}),B_\varepsilon (x_{+});\Omega _e). \end{aligned} \end{aligned}$$

Hence, by the three previous inequalities we have

$$\begin{aligned} \frac{{\varvec{j}}_e^2}{V_\varepsilon (B_\varepsilon (x_{-}),B_\varepsilon (x_{+});\Omega _e)} e^{-\frac{g(\tau )}{\varepsilon } } e^{-\frac{\omega (g(\tau ))}{\varepsilon } }\le (1+ C\hat{\eta }(\varepsilon )) \varepsilon ^2 \int _{S_{\tau }} |X|^2 e^{\frac{F}{\varepsilon }} \, d {\mathcal {H}}^{n-1}. \end{aligned}$$

Integrating over \(\tau \in (-\delta /100,\delta /100)\) and using Lemma 3.4 and Proposition 1.6 we get

$$\begin{aligned} {\varvec{j}}_e^2 e^{-\frac{F(0)}{\varepsilon }} \frac{d_\varepsilon (B_\varepsilon (x_{-}),B_\varepsilon (x_{+});\Omega _e)}{V_\varepsilon (B_\varepsilon (x_{-}),B_\varepsilon (x_{+});\Omega _e)} \le (1+ C \hat{\eta }(\varepsilon )) \varepsilon ^2 \int _{O_e} |X|^2 e^{\frac{F}{\varepsilon }}\, dx. \end{aligned}$$

Inequality (3.13) then follows from the definition of \({\varvec{y}}_e\) in (1.3). \(\square \)