We begin this section by proving that all critical points lie on the axes of the \(n\)-simplex. Thereafter, we analyze each axis, characterizing for which scales we see 1, 2, or 3 critical points. To decide which of the one-dimensional maxima are modes, we analyze the \(n\)-sections orthogonal to the axes. As it turns out, all modes lie on axes that pass through vertices of the \(n\)-simplex. Most interesting is the critical point at the barycenter, which changes from unique mode during an initial interval of scales, to \((n+2)\)-nd mode during a non-empty intermediate interval, to a saddle of index one during a final interval. We call the length of the intermediate interval the resilience of the extra mode and show that it grows like the square root of the dimension. Finally, we construct sums of isotropic Gaussian kernels with a superlinear number of modes.
Lines of Critical Points
In this subsection, we note that all critical points of a scaled \(n\)-design lie on the axes of the scaled \(n\)-simplex. We begin by introducing coordinates that are more natural for the \(n\)-design, and we show how they relate to the barycentric coordinates.
Distance Coordinates
Write \(v_i = s e_i\), for \(0 \le i \le n\), and let \(x\) be a point of the corresponding scaled \(n\)-simplex \(s \varDelta ^n\). Setting \(r_i = {\Vert {x}-{v_i}\Vert }\), we note that \(x\) is uniquely defined by the vector of \(n+1\) distances since \(x\) lies on the hyperplane spanned by \(\{ v_i \}\). We express this by writing \(x = (r_0, r_1, \ldots , r_n)_D\), and by calling the \(r_i\) the distance coordinates of \(x\). Recall that \((x_0, x_1, \ldots , x_n)_B\) is the representation of the same point in barycentric coordinates. We are interested in computing the barycentric from the distance coordinates via the coordinate transformation below.
Coordinate Transformation
For \(0 \le i \le n\), the \(i\)th barycentric coordinate is given by:
$$\begin{aligned} x_i&= \frac{1}{n+1} + \frac{1}{2(n+1)s^2} \Big ( \sum _{j=0}^n r_j^2 - (n+1) r_i^2 \Big ). \end{aligned}$$
(8)
Proof
Let \((r_0, r_1, \ldots , r_n)_D\) be the distance coordinates of a point \(x\) in the scaled \(n\)-simplex \(s\varDelta ^n\). Let \(i \ne j\) and consider the edge connecting \(v_i\) with \(v_j\), recalling that \(v_i\) and \(v_j\) are two vertices of \(s\varDelta ^n\). The length of the edge is \(s \sqrt{2}\). Let \(x_{ij}\) be the distance between \(v_j\) and the orthogonal projection of \(x\) onto the edge, normalized by dividing with \(s \sqrt{2}\); see Fig. 5. We first show that
$$\begin{aligned} x_{ij} = \frac{1}{2} + \frac{1}{4 s^2} (r_j^2 - r_i^2). \end{aligned}$$
(9)
Indeed, if \(x = (1-t) v_j + t v_i\), then we have \(r_j = s \sqrt{2} t\) and \(r_i = s \sqrt{2} (1-t)\). Furthermore, \(x_{ij} = t\), which agrees with the equation we get by plugging the values of \(r_i\) and \(r_j\) into (9). Realizing that \(r_j^2 - r_i^2\) is constant along hyperplanes orthogonal to the edge, we get (9) for all points of the \(n\)-simplex.
For the next step, let \(b_i\) be the barycenter of the \((n-1)\)-face complementary to \(v_i\) and \(y\) be the orthogonal projection of \(x\) onto the edge between \(v_i\) and \(v_j\). Set \(\alpha _n\) to the angle between the edges that connect \(v_i\) to \(v_j\) and to \(b_i\). Because \(s \varDelta ^n\) is regular, this angle does not depend on the choice of \(i\) and \(j\). Suppose \(x\) lies on the latter edge, which connects \(v_i\) and \(b_i\). Then \(x = (1-x_i) b_i + x_i v_i\) and we have two expressions for \(\cos \alpha _n\). Setting these two expressions equal, we arrive at
$$\begin{aligned} \frac{D_{0,n-1}}{\sqrt{2}} =\frac{\sqrt{2}(1-x_{ij})}{D_{0,n-1}(1-x_i)}, \end{aligned}$$
for every \(0 \le j \le n\) and \(j \ne i\). Adding the \(n\) equations gives
$$\begin{aligned} nD_{0,n-1}^2(1-x_i) = 2n - 2 \sum _{j\ne i}x_{ij}. \end{aligned}$$
(10)
Similar to before, we notice that the two sides of the equation are constant along hyperplanes orthogonal to the axis defined by \(v_i\) and \(b_i\). Hence, (10) holds for all points \(x\) of the scaled \(n\)-simplex. It remains to plug (4) and (9) into (10), which gives
$$\begin{aligned} (n+1)(1-x_i) = n + \frac{1}{2s^2}\big ( nr_i^2 - \sum _{j \ne i} r_j^2 \big ). \end{aligned}$$
The equation simplifies to the claimed equation. \(\square \)
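As a numerical sanity check of (8), the following minimal sketch converts a point from barycentric to distance coordinates and back. The function names and the test point are our own illustrative choices; the code only assumes \(v_i = s e_i\) as above.

```python
import math

def distance_to_barycentric(r, s):
    """Eq. (8): barycentric coordinates from the n+1 distance coordinates r."""
    n = len(r) - 1
    total = sum(ri * ri for ri in r)
    return [1.0 / (n + 1) + (total - (n + 1) * ri * ri) / (2.0 * (n + 1) * s * s)
            for ri in r]

def barycentric_to_distance(xb, s):
    """Distances from x = sum_i xb[i] * s * e_i to the vertices v_i = s * e_i."""
    x = [s * c for c in xb]  # Cartesian coordinates in R^{n+1}
    r = []
    for i in range(len(x)):
        sq = sum((xj - (s if j == i else 0.0)) ** 2 for j, xj in enumerate(x))
        r.append(math.sqrt(sq))
    return r

s = 1.7                       # arbitrary scale factor (illustrative)
xb = [0.5, 0.3, 0.15, 0.05]   # a point of the 3-simplex in barycentric coordinates
recovered = distance_to_barycentric(barycentric_to_distance(xb, s), s)
```

The round trip should reproduce `xb` up to floating-point error, for any scale factor \(s > 0\).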
Non-Zero Gradients
Recall that \(G_s :{{\mathbb{R }}}^{n+1} \rightarrow {{\mathbb{R }}}\) is the scaled \(n\)-design formed by taking the sum of the \(n+1\) unit Gaussian kernels whose centers are the vertices of \(s \varDelta ^n\). We use the coordinate transformation to show that \(G_s\) has no critical points away from the axes of the scaled \(n\)-simplex:
Axes Lemma
Every critical point of \(G_s\) lies on an axis of the scaled \(n\)-simplex \(s\varDelta ^n\).
Proof
Recall that a point \(x\) belongs to an axis of \(s \varDelta ^n\) iff it has at most two distinct barycentric coordinates. We will show that if \(x\) has three distinct barycentric coordinates, then the gradient of \(G_s\) at \(x\) is non-zero. Writing \(f_i = g_{s e_i}\), we obtain
$$\begin{aligned} \nabla G_s (x)= -2\pi G_s (x) \cdot x + 2 \pi \sum _{i=0}^n f_i (x) \cdot v_i. \end{aligned}$$
Setting the gradient to zero, we solve for \(x\)
$$\begin{aligned} x =\sum _{i=0}^n \frac{f_i (x)}{G_s (x)} \cdot v_i. \end{aligned}$$
(11)
We will show that Eq. (11) can hold only if \(x\) has at most two distinct barycentric coordinates. To this end, we write \(x\) in distance coordinates: \(x = (r_0, r_1, \ldots , r_n)_D\). Similar to the barycentric coordinates, \(x\) lies on an axis of \(s \varDelta ^n\) iff there are at most two distinct distance coordinates. Transforming \(x\) into barycentric coordinates, we have \(x = (x_0, x_1, \ldots , x_n)_B\), in which
$$\begin{aligned} x_i = \frac{1}{n+1}+\frac{1}{2(n+1)s^2} \big (\sum _{j=0}^n r_j^2-(n+1)r_i^2 \big ), \end{aligned}$$
for \(0 \le i \le n\). Assume now that \(x\) does not lie on any of the axes. It follows that there are three distinct distance coordinates: \(r_k < r_\ell < r_m\). Subtracting the \(m\)th barycentric coordinate from the other two, we obtain
$$\begin{aligned} x_k - x_m = \frac{1}{2s^2}(r_m^2 - r_k^2), \end{aligned}$$
(12)
$$\begin{aligned} x_\ell - x_m = \frac{1}{2s^2} (r_m^2 - r_\ell ^2). \end{aligned}$$
(13)
Assuming a zero gradient, the barycentric coordinates of \(x\) have the form given in (11). Hence, \(x_k - x_m\) and \(x_\ell - x_m\) are equal to
$$\begin{aligned} \frac{f_k (x) - f_m (x)}{G_s (x)}&= \frac{\mathrm{e}^{-\pi r_k^2} - \mathrm{e}^{-\pi r_m^2}}{G_s (x)}, \end{aligned}$$
(14)
$$\begin{aligned} \frac{f_\ell (x) - f_m (x)}{G_s (x)}&= \frac{\mathrm{e}^{-\pi r_\ell ^2} - \mathrm{e}^{-\pi r_m^2}}{G_s (x)}, \end{aligned}$$
(15)
respectively. Since the right-hand sides of (12) and (14) are equal, as well as the right-hand sides of (13) and (15), we have
$$\begin{aligned} \frac{r_m^2 - r_k^2}{\mathrm{e}^{-\pi r_m^2} - \mathrm{e}^{-\pi r_k^2}} = \frac{r_m^2 - r_\ell ^2}{\mathrm{e}^{-\pi r_m^2} - \mathrm{e}^{-\pi r_\ell ^2}}. \end{aligned}$$
But this is impossible: the two sides are the reciprocals of the slopes of two secants of the function \(f(t) = \mathrm{e}^{-\pi t}\) that share the endpoint \(t = r_m^2\). Since \(r_m^2 - r_k^2 > r_m^2 - r_\ell ^2\), by assumption, and \(f\) is strictly convex, these two slopes are different. \(\square \)
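To illustrate the fixed-point condition (11), the sketch below evaluates the residual \(x - \sum _i \frac{f_i(x)}{G_s(x)} v_i\) at the barycenter and at a point with three distinct barycentric coordinates. The helper names, the scale factor, and the test points are hypothetical choices made for this illustration only.

```python
import math

def kernel_values(x, s):
    """Values f_i(x) = exp(-pi * ||x - s e_i||^2) for i = 0..n."""
    out = []
    for i in range(len(x)):
        sq = sum((xj - (s if j == i else 0.0)) ** 2 for j, xj in enumerate(x))
        out.append(math.exp(-math.pi * sq))
    return out

def fixed_point_residual(x, s):
    """Largest coordinate of x - sum_i (f_i(x) / G_s(x)) v_i, cf. Eq. (11)."""
    f = kernel_values(x, s)
    g = sum(f)
    # Since v_i = s e_i, the i-th coordinate of the weighted sum is s f_i / G_s.
    return max(abs(xi - s * fi / g) for xi, fi in zip(x, f))

s = 0.6
barycenter = [s / 3.0] * 3
off_axis = [0.5 * s, 0.3 * s, 0.2 * s]   # three distinct barycentric coordinates

res_barycenter = fixed_point_residual(barycenter, s)
res_off_axis = fixed_point_residual(off_axis, s)
```

The residual vanishes at the barycenter, as required by symmetry, and is bounded away from zero at the off-axis point, in agreement with the Axes Lemma.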
One-Dimensional Sections
The restriction of \(G_s\) to an axis of \(s \varDelta ^n\) is a sum of two weighted Gaussian kernels. This sum has two maxima for a range of scale factors, which we now analyze.
Transitions
Recall that the \(n\)-design consists of \(n+1\) unit Gaussian kernels placed at the vertices of the standard \(n\)-simplex. Consider the 1-section defined by the line that connects the barycenter of a \(k\)-face with the barycenter of the complementary \(\ell \)-face, with \(k + \ell = n -1\), and vary the construction by scaling the design with \(s \ge 0\). We call a value a transition if the number of critical points of the 1-section changes as \(s\) passes the value. It is easy to compute the transition for \(k = \ell = \frac{n-1}{2}\) because the corresponding 1-section is balanced for all scale factors \(s\). The distance between the two centers is \(s D_{\ell ,\ell } = 2s/\sqrt{n+1}\) and we find the transition by setting the distance equal to \(2 {\sigma }_0\), which gives \(s\) equal to
$$\begin{aligned} U_{n} = \sqrt{ \frac{n+1}{2 \pi }}. \end{aligned}$$
(16)
Consider next the case \(k < \ell \). Equation (7) gives the weights of the two kernels in the decomposition of the 1-section as \((k+1) g(s R_k)\) and \((\ell +1) g(s R_\ell )\). Using (5) and taking the ratio, we obtain the weight ratio as a function of the scale factor:
$$\begin{aligned} {\omega }_{k,\ell } \big (s\big ) = \frac{\ell +1}{k+1} \cdot \mathrm{e}^{- \pi s^2 \big (\frac{1}{k+1} - \frac{1}{\ell +1}\big )}. \end{aligned}$$
(17)
We compare this with the two transition functions, which we get by setting \(z = \frac{s}{2} D_{k,\ell }\) and plugging the two solutions of \(x^2 = z^2 - {\sigma }_0^2\) into the formula for \(r(x) + 1\), which we get from (3). This gives
$$\begin{aligned} {\tau }_{k,\ell } (s)&= \frac{z-\sqrt{z^2-{\sigma }_0^2}}{z+\sqrt{z^2-{\sigma }_0^2}} \cdot \mathrm{e}^{4 \pi z \sqrt{z^2-{\sigma }_0^2} }, \end{aligned}$$
(18)
$$\begin{aligned} {\upsilon }_{k,\ell } (s)&= \frac{z+\sqrt{z^2-{\sigma }_0^2}}{z-\sqrt{z^2-{\sigma }_0^2}} \cdot \mathrm{e}^{- 4 \pi z \sqrt{z^2-{\sigma }_0^2} }. \end{aligned}$$
(19)
Note that \({\upsilon }_{k,\ell } (s) = 1 / {\tau }_{k,\ell } (s)\). We find the first transition, \(T_{k,\ell }\), by solving \({\omega }_{k,\ell } (s) = {\tau }_{k,\ell } (s)\), and the second transition, \(U_{n}\), by solving \({\omega }_{k,\ell } (s) = {\upsilon }_{k,\ell } (s)\). Appendix A will prove that both transitions are well defined, also showing that the second transition depends on \(n\) but not on \(k\) and \(\ell \) and is given by (16) in all cases. While we have no analytic expression for \(T_{k,\ell }\), we will derive one for an upper bound in Sect. 3.4.
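The two transitions can be explored numerically. The following sketch implements (17), (18), and (19), checks that \(s = U_n\) solves \({\omega }_{k,\ell } = {\upsilon }_{k,\ell }\), and approximates \(T_{k,\ell }\) by bisection. It assumes \({\sigma }_0 = 1/\sqrt{2\pi }\) and \(D_{k,\ell }^2 = \frac{1}{k+1} + \frac{1}{\ell +1}\), which agree with the special cases used in this section but are defined outside it; the bisection further assumes a single sign change of \({\omega }_{k,\ell } - {\tau }_{k,\ell }\) on the search interval.

```python
import math

SIGMA0 = 1.0 / math.sqrt(2.0 * math.pi)  # assumed width of the unit kernel exp(-pi x^2)

def D(k, l):
    # Assumed distance between the barycenters of complementary k- and l-faces.
    return math.sqrt(1.0 / (k + 1) + 1.0 / (l + 1))

def omega(k, l, s):
    """Weight ratio, Eq. (17)."""
    return (l + 1) / (k + 1) * math.exp(-math.pi * s * s * (1.0 / (k + 1) - 1.0 / (l + 1)))

def tau(k, l, s):
    """First transition function, Eq. (18); defined for z >= sigma_0."""
    z = 0.5 * s * D(k, l)
    w = math.sqrt(max(z * z - SIGMA0 ** 2, 0.0))
    return (z - w) / (z + w) * math.exp(4.0 * math.pi * z * w)

def upsilon(k, l, s):
    """Second transition function, Eq. (19), the reciprocal of tau."""
    return 1.0 / tau(k, l, s)

def U(n):
    """Second transition, Eq. (16)."""
    return math.sqrt((n + 1) / (2.0 * math.pi))

def first_transition(k, l, iters=200):
    """Bisection for omega = tau on [2 sigma_0 / D, U_n], for k < l."""
    lo, hi = 2.0 * SIGMA0 / D(k, l), U(k + l + 1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if omega(k, l, mid) > tau(k, l, mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `first_transition(0, 1)` approximates \(T_{0,1}\) for the triangle, which lies strictly below `U(2)`.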
Section Evolution
We follow the 1-section defined by an axis of the \(n\)-simplex as the scale factor, \(s\), goes from 0 to infinity. By construction, we have qualitative changes at the transitions, which we now summarize.
1-Section Lemma
Let \(0 \le k \le \ell \) with \(k+\ell = n-1\), and let \(A\) be the axis passing through the barycenters of a \(k\)-face and its complementary \(\ell \)-face of \(s \varDelta ^n\). Then \(G_s|_A\) has one maximum whenever \(s<T_{k,\ell }\), and two maxima whenever \(T_{k,\ell } < s\) and \(s \ne U_{n}\).
Indeed, the double intersection is responsible for the special evolution of the 1-section. In particular, we go from one maximum for \(s < T_{k,\ell }\) to two maxima for \(T_{k,\ell } < s < U_n\), of which one is the barycenter of the \(n\)-simplex. After the second transition at the double intersection, we still have two maxima, but now the separating minimum is the barycenter of the \(n\)-simplex. Figure 6 shows all transition scale factors in a single picture for small values of \(k\) and \(\ell \). First, we look at \(T_{k,\ell }\) for a fixed value of \(n\). We observe that \(T_{k,\ell }\) increases with growing \(k\). This implies that for constant \(n\), the axes defined for small values of \(k\) spawn a second maximum earlier than do axes defined by large values of \(k\). For \(k = \ell \), the two transitions coincide and the corresponding 1-section does not witness the extra maximum at all. Second, we fix \(k\) and observe that \(T_{k,\ell }\) increases with growing \(\ell \). This implies that the \(k\)-faces of a low-dimensional simplex spawn second maxima earlier than do the \(k\)-faces in high-dimensional simplices.
Next, we look at the second transition, \(U_{n}\). As we have observed, it depends only on \(n\). This implies that all 1-sections lose the maximum at the barycenter at the same scale factor. For \(k = \ell \), the two transitions coincide, so the interval collapses.
\(n\)-Dimensional Sections
In this subsection, we show that most maxima of the 1-sections are not modes. We begin with the analysis of the barycenter of \(s \varDelta ^n\), which belongs to every axis of the scaled \(n\)-simplex.
Barycenter of \(n\)-simplex
The \(n\)-design has the symmetry group of the \(n\)-simplex, which implies that the barycenter, \(b_G \in s \varDelta ^n\), is a critical point of \(G_s\). Indeed, if \(b_G\) is not a critical point, then it has a non-zero gradient, which contradicts the symmetry. More specifically, \(b_G\) is either a maximum or a minimum of the \(n\)-section defined by the \(n\)-simplex, and it is a maximum of the orthogonal 1-section defined by the diagonal line of \({{\mathbb{R }}}^{n+1}\).
Barycenter Lemma
Let \(n \ge 1\). Then the barycenter of \(s \varDelta ^n\) is a mode of \(G_s\) for \(s < U_n\) and it is a saddle of index 1 for \(s > U_n\).
Proof
We compute the Hessian of \(G_s\) at \(b_G\) by taking partial derivatives with respect to the Cartesian basis of \({{\mathbb{R }}}^{n+1}\). Because of the symmetry, we have
$$\begin{aligned} d&= \frac{\partial ^2 G_s}{\partial x_0^2}(b_G) = \frac{\partial ^2 G_s}{\partial x_i^2}(b_G), \\ c&= \frac{\partial ^2 G_s}{\partial x_0 \partial x_1}(b_G) =\frac{\partial ^2 G_s}{\partial x_i \partial x_j}(b_G), \end{aligned}$$
for all \(0 \le i \le n\) and all \(j \ne i\). The characteristic polynomial of the Hessian is therefore
$$\begin{aligned} \det \left[ \begin{array}{cccc} d-\xi &{} c &{} \ldots &{} c \\ c &{} d-\xi &{} \ldots &{} c \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ c &{} c &{} \ldots &{} d-\xi \end{array} \right] . \end{aligned}$$
(20)
Its roots are the eigenvalues of the Hessian, which are \(\xi = d+nc\), with multiplicity one, and \(\xi = d-c\), with multiplicity \(n\). For simple geometric reasons, \(d+nc\) is negative and corresponds to the eigenvector in the diagonal direction of \({{\mathbb{R }}}^{n+1}\). Indeed, since the centers of all Gaussian kernels lie in a common hyperplane, every point of that hyperplane is a maximum in the direction orthogonal to it. To compute \(d-c\), it suffices to consider just one 1-section through the barycenter, and we choose the line, \(B\), that passes through the vertex \(v_0 = s e_0\) and the barycenter of the complementary \((n-1)\)-face, which we denote as \(b_0\). The distances from the barycenter of the \(n\)-simplex are \({\Vert {b_G}-{v_0}\Vert } = s R_n\) and \({\Vert {b_G}-{b_0}\Vert } = \frac{s}{n} R_n\). Furthermore, the common distance of the vertices \(v_i = s e_i\) from \(B\) is \({\Vert {v_i}-{b_0}\Vert } = s R_{n-1}\), for \(1 \le i \le n\). Plugging these distances into the second derivative of the one-dimensional section given by (2), we compute the second derivatives at \(b_G\) of the 1-sections defined by the \(n+1\) unit Gaussian kernels as
$$\begin{aligned}&[4 \pi ^2 s^2 R_n^2 - 2 \pi ] \cdot \mathrm{e}^{- \pi s^2 R_n^2}, \end{aligned}$$
(21)
$$\begin{aligned}&[4 \pi ^2 \frac{s^2}{n^2} R_n^2 - 2 \pi ]\cdot \mathrm{e}^{- \pi s^2 (R_{n-1}^2 + R_n^2/n^2)}, \end{aligned}$$
(22)
where the first line applies for \(i = 0\) and the second line for \(1 \le i \le n\). Note that \(R_{n-1}^2 + R_n^2/n^2 = R_n^2\) and \(R_n^2 (1 + \frac{1}{n}) = 1\). Adding (21) and \(n\) times (22), we obtain the second derivative of the sum of \(n+1\) one-dimensional Gaussian kernels as
$$\begin{aligned} d-c=[4 \pi ^2 s^2-2\pi (n+1)]\cdot \mathrm{e}^{-\pi s^2 R_n^2}, \end{aligned}$$
which has the same sign as \(s^2 - \frac{n+1}{2 \pi }\). Thus, \(b_G\) is a maximum of \(G_s\) for \(s < U_n\) and a saddle of index one for \(s > U_n\) as claimed. \(\square \)
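The closed form for \(d-c\) can be compared with a finite-difference approximation. The sketch below evaluates \(G_s\) directly, takes a central second difference at the barycenter along the line \(B\), and compares it with the formula above, using \(R_n^2 = n/(n+1)\). The function names and the step size `h` are illustrative assumptions.

```python
import math

def G(x, s):
    """Sum of n+1 unit kernels centered at s*e_i, evaluated at x in R^{n+1}."""
    total = 0.0
    for i in range(len(x)):
        sq = sum((xj - (s if j == i else 0.0)) ** 2 for j, xj in enumerate(x))
        total += math.exp(-math.pi * sq)
    return total

def d_minus_c(n, s):
    """Closed form from the proof, with R_n^2 = n / (n + 1)."""
    return ((4.0 * math.pi ** 2 * s * s - 2.0 * math.pi * (n + 1))
            * math.exp(-math.pi * s * s * n / (n + 1)))

def second_difference(n, s, h=1e-4):
    """Central second difference of G_s at the barycenter along v_0 - b_0."""
    b = [s / (n + 1)] * (n + 1)
    u = [1.0] + [-1.0 / n] * n            # direction of v_0 - b_0, up to scale
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    plus = [bi + h * ui for bi, ui in zip(b, u)]
    minus = [bi - h * ui for bi, ui in zip(b, u)]
    return (G(plus, s) - 2.0 * G(b, s) + G(minus, s)) / (h * h)

def U(n):
    """Second transition, Eq. (16)."""
    return math.sqrt((n + 1) / (2.0 * math.pi))
```

Evaluating `d_minus_c(n, s)` slightly below and slightly above `U(n)` exhibits the sign change that turns the barycenter from a mode into a saddle of index one.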
We note here that the barycenter is an index-1 saddle for \(s > U_n\), as opposed to a minimum, because we place the \(n\)-simplex in \({{\mathbb{R }}}^{n+1}\). At the transition, when \(s = U_n\), the barycenter of the \(n\)-simplex is a degenerate critical point.
Orthogonal Sections
We generalize the analysis of the barycenter. Let \(1 \le k \le \ell \) with \(k + \ell = n-1\), and consider a \(k\)-face of the \(n\)-simplex as well as the complementary \(\ell \)-face. Writing \(G_s\) as the sum of the \(f_i = g_{s e_i}\), for \(0 \le i \le n\), we assume that the centers of \(f_0\) to \(f_k\) span the \(k\)-face, and that the centers of \(f_{k+1}\) to \(f_n\) span the \(\ell \)-face. Hence, \(G_s = K_s + L_s\), where \(K_s = \sum _{i=0}^k f_i\) and \(L_s = \sum _{i=k+1}^n f_i\). Writing \(b_K\) and \(b_L\) for the barycenters of the two faces, we let \(A\) be the axis defined by \(A(t) = (1-t) b_K + t b_L\). We are interested in the Hessian of \(G_s\) at \(x = A(t)\). For symmetry reasons, it has at most four distinct eigenvalues, each a second derivative along pairwise orthogonal lines. One line is the axis, another is orthogonal to the \(n\)-simplex, a third line is parallel to the \(k\)-face, and a fourth line is parallel to the \(\ell \)-face. The latter two eigenvalues have multiplicity \(k\) and \(\ell \). We write \(\kappa \) for the length parameter along the third line and \(\lambda \) for the length parameter along the fourth line.
\(n\)-Section Lemma
Let \(1 \le k \le \ell \) with \(k+\ell = n-1\). The second derivatives of \(G_s\) at \(x = A(t)\) along lines parallel to the complementary \(k\)- and \(\ell \)-faces of \(s\varDelta ^n\) are
$$\begin{aligned} \frac{\partial ^2 G_s}{\partial \kappa ^2} (x)&= -2 \pi G_s (x) + 4 \pi ^2 s^2 f_0 (x), \end{aligned}$$
(23)
$$\begin{aligned} \frac{\partial ^2 G_s}{\partial \lambda ^2} (x)&= -2 \pi G_s (x) + 4 \pi ^2 s^2 f_n (x). \end{aligned}$$
(24)
Proof
Recall \(G_s = \sum _{i=0}^n f_i\) and \(f_i (x) = \mathrm{e}^{- \pi {\Vert {x}-{se_i}\Vert }^2}\). The derivative with respect to the \(i\)th coordinate direction is
$$\begin{aligned} \frac{\partial G_s}{\partial x_i} (x) = - 2 \pi x_i G_s (x) + 2 \pi s f_i (x). \end{aligned}$$
Differentiating again, with respect to the same and to a different coordinate direction, we have
$$\begin{aligned} \frac{\partial ^2 G_s}{\partial x_i^2} (x)&= [-2 \pi + 4 \pi ^2 x_i^2] G_s(x) \nonumber \\&- 4\pi ^2 (2s x_i - s^2) f_i(x),\end{aligned}$$
(25)
$$\begin{aligned} \frac{\partial ^2 G_s}{\partial x_i \partial x_j} (x)&= 4 \pi ^2 [ x_i x_j G_s(x)\nonumber \\&- s x_i f_j(x) - s x_j f_i(x)]. \end{aligned}$$
(26)
The point at which we take the second derivative has only two distinct coordinates, \(\frac{(1-t)s}{k+1}\), repeated \(k+1\) times, and \(\frac{ts}{\ell +1}\), repeated \(\ell + 1\) times. We can therefore substitute \(x_0\) and \(x_1\) for any two among the first \(k+1 \ge 2\) coordinate directions, and we can substitute \(x_n\) and \(x_{n-1}\) for any two among the last \(\ell +1 \ge 2\) coordinate directions. The Hessian at the point \(x\) is
$$\begin{aligned} H(x)&= \left[ \begin{array}{cccccc} d &{} \ldots &{} c &{} \gamma &{} \ldots &{} \gamma \\ \vdots &{} \ddots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ c &{} \ldots &{} d &{} \gamma &{} \ldots &{} \gamma \\ \gamma &{} \ldots &{} \gamma &{} D &{} \ldots &{} C \\ \vdots &{} \ddots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ \gamma &{} \ldots &{} \gamma &{} C &{} \ldots &{} D \end{array} \right] , \end{aligned}$$
where
$$\begin{aligned} d&= \frac{\partial ^2 G_s}{\partial x_0^2} (x),\quad \quad c =\frac{\partial ^2 G_s}{\partial x_0 \partial x_1} (x),\end{aligned}$$
(27)
$$\begin{aligned} D&= \frac{\partial ^2 G_s}{\partial x_n^2} (x), \quad \quad C = \frac{\partial ^2 G_s}{\partial x_n \partial x_{n-1}} (x),\end{aligned}$$
(28)
$$\begin{aligned} \gamma&= \frac{\partial ^2 G_s}{\partial x_0 \partial x_{n}} (x). \end{aligned}$$
(29)
We get the eigenvalues as the roots of the characteristic polynomial, which we find by subtracting the variable \(\xi \) from each diagonal element and taking the determinant, as in (20). In particular, \(d-c\) is the \(k\)-fold eigenvalue that corresponds to the \(k\)-face, and \(D-C\) is the \(\ell \)-fold eigenvalue that corresponds to the \(\ell \)-face. Plugging (25) and (26) into (27) and (28), we arrive at
$$\begin{aligned} d - c&= -2 \pi G_s (x) + 4 \pi ^2 s^2 f_0 (x), \\ D - C&= -2 \pi G_s (x) + 4 \pi ^2 s^2 f_n (x). \end{aligned}$$
These are the two claimed second derivatives of (23) and (24). \(\square \)
Sign Change
A point \(x = A(t)\) is a mode of \(G_s : {{\mathbb{R }}}^{n+1} \rightarrow {{\mathbb{R }}}\) iff it is a maximum of the 1-section defined by \(A\) as well as of the \(n\)-section defined by \(H_t\). Focusing on the latter, we compute the values of the parameter \(t\) at which the second derivatives with respect to \(\kappa \) and with respect to \(\lambda \) vanish. Beginning with \(\kappa \), we set (23) to zero and find
$$\begin{aligned}{}[4 \pi ^2 s^2 - 2 \pi (k+1)] f_0(x) = 2 \pi (\ell +1) f_n(x). \end{aligned}$$
(30)
We note that the natural logarithm of \(f_n(x) / f_0(x)\) is \(- \pi \) times the following difference of squared distances:
$$\begin{aligned} {\Vert {x}-{se_n}\Vert }^2 - {\Vert {x}-{se_0}\Vert }^2 = 2 s^2 \cdot \frac{(\ell +1) - t (n+1)}{(k+1)(\ell +1)}. \end{aligned}$$
(31)
Plugging (31) into (30) gives us
$$\begin{aligned} \frac{2 \pi s^2 - (k+1)}{\ell + 1}&= \mathrm{e}^{-2 \pi s^2 \frac{(\ell +1) - t(n+1)}{(k+1)(\ell +1)} }. \end{aligned}$$
Solving this equation, we get \(t\) as a function of the scale parameter, which we call \(t_K\). Doing the symmetric computation for \(\lambda \), we find a second function, \(t_L\). The two functions are given by
$$\begin{aligned} t_K (s)&= \frac{k\ell +n}{2\pi s^2 (n+1)} \cdot \ln \frac{2 \pi s^2-k-1}{\ell +1} + \frac{\ell +1}{n+1},\end{aligned}$$
(32)
$$\begin{aligned} t_L (s)&= \frac{k\ell +n}{2\pi s^2 (n+1)} \cdot \ln \frac{k+1}{2 \pi s^2-\ell -1} + \frac{\ell +1}{n+1}. \end{aligned}$$
(33)
For example, for \(s = U_n\), we get \(t_K = t_L = \frac{\ell +1}{n+1}\), which is the parameter of the barycenter of the \(n\)-simplex along \(A\). This is consistent with the Barycenter Lemma, where \(s = U_n\) is identified as the scale factor at which the barycenter changes from a mode to a saddle of index 1. Note also that \(t_K\) is undefined for \(s = U_k\), and \(t_L\) is undefined for \(s = U_\ell \).
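The functions (32) and (33) are easy to evaluate. The following sketch implements both and checks that the second derivative (23) vanishes at \(A(t_K(s))\); the helper names and the sample values of \(k\), \(\ell \), and \(s\) are our own illustrative choices, and \(s\) must be chosen so that the arguments of the logarithms are positive.

```python
import math

def t_K(k, l, s):
    """Eq. (32); requires 2*pi*s^2 > k + 1."""
    n = k + l + 1
    a = 2.0 * math.pi * s * s
    return (k * l + n) / (a * (n + 1)) * math.log((a - k - 1) / (l + 1)) + (l + 1) / (n + 1)

def t_L(k, l, s):
    """Eq. (33); requires 2*pi*s^2 > l + 1."""
    n = k + l + 1
    a = 2.0 * math.pi * s * s
    return (k * l + n) / (a * (n + 1)) * math.log((k + 1) / (a - l - 1)) + (l + 1) / (n + 1)

def kappa_second_derivative(k, l, s, t):
    """Eq. (23) evaluated at A(t) = (1 - t) b_K + t b_L."""
    x = [(1.0 - t) * s / (k + 1)] * (k + 1) + [t * s / (l + 1)] * (l + 1)

    def f(i):
        sq = sum((xj - (s if j == i else 0.0)) ** 2 for j, xj in enumerate(x))
        return math.exp(-math.pi * sq)

    g = sum(f(i) for i in range(len(x)))
    return -2.0 * math.pi * g + 4.0 * math.pi ** 2 * s * s * f(0)

k, l = 1, 2                                  # illustrative face dimensions, n = 4
s_mid = math.sqrt(4.0 / (2.0 * math.pi))     # a scale factor with U_k < s < U_n
t_mid = t_K(k, l, s_mid)
```

At `s_mid`, the value `kappa_second_derivative(k, l, s_mid, t_mid)` is zero up to rounding, which marks the parameter at which the sign of (23) changes along the axis.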
Chandelier
To get a feeling for the situation, we draw the trajectories of the critical points of \(G_s\), and in particular those of the modes. We call this set in \({{\mathbb{R }}}^{n+1} \times {{\mathbb{R }}}\) the chandelier of the 1-parameter family of functions. Letting \(s\) increase from bottom to top, Fig. 7 sketches the chandelier for \(n = 1, 2\). The most prominent feature is the base point, which we use to decompose the chandelier into curves. Two of these curves are vertical, both swept out by the barycenter of the \(n\)-simplex, which changes from index \(n+1\) to index 1 when it passes through the base point. For each curve, we consider the height function defined by mapping \((x, s) \in {{\mathbb{R }}}^{n+1} \times {{\mathbb{R }}}\) to \(s\), and we further subdivide so that the height function is injective. In other words, we cut each curve at the local minima and maxima of the height function. The benefit of this subdivision is that now each curve is swept out by a critical point of \(G_s\) with constant index. While the total number of curves in the chandelier grows exponentially with the dimension, the number of curves that correspond to modes grows only by one for each dimension. To count the curves, we compute the number of complementary face pairs of the \(n\)-simplex:
$$\begin{aligned} p_n = \frac{1}{2} \sum _{k=0}^{n-1} \binom{n+1}{k+1} = 2^n-1. \end{aligned}$$
For each pair, two branches emanate from the base point. Adding the vertical line, we count \(2p_n+2 = 2^{n+1}\) branches. For each complementary face pair with \(0 \le k < \ell \), the height function of one of the two corresponding branches has a local minimum and is therefore subdivided into two curves. The number of local minima is
$$\begin{aligned} l_n = \left\{ \begin{array}{ll} p_n &{} \quad \text{if } n \text{ is even}, \\ p_n-\frac{1}{2} \binom{n+1}{(n+1)/2} &{} \quad \text{if } n \text{ is odd}. \end{array} \right. \end{aligned}$$
The total number of curves is therefore \(2p_n + 2 + l_n\). Of these, only \(n+2\) correspond to modes.
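The counts above are straightforward to tabulate. The sketch below computes \(p_n\), \(l_n\), and the total number of curves, \(2p_n + 2 + l_n\); the function names are hypothetical.

```python
from math import comb

def p(n):
    """Number of complementary face pairs of the n-simplex."""
    return sum(comb(n + 1, k + 1) for k in range(n)) // 2

def l_count(n):
    """Number of local minima of the height function, i.e. pairs with k < l."""
    if n % 2 == 0:
        return p(n)
    return p(n) - comb(n + 1, (n + 1) // 2) // 2

def total_curves(n):
    """Total number of curves in the chandelier."""
    return 2 * p(n) + 2 + l_count(n)

table = [(n, p(n), l_count(n), total_curves(n), n + 2) for n in range(1, 7)]
```

Each row of `table` lists \(n\), \(p_n\), \(l_n\), the total number of curves, and the number \(n+2\) of curves that correspond to modes, which makes the contrast between exponential and linear growth explicit.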
Indices
The index swept out by a curve in the chandelier is easy to determine numerically, but at this time, we lack analytic proofs. We first state the result and second explain the numerical evidence that supports it.
- \(0 \le k < \ell \): There are \(\binom{n+1}{k+1}\) complementary face pairs of \(k\)- and \(\ell \)-faces. Besides the barycenter, the corresponding axes witness critical points of index \(\ell +2\) and \(\ell +1\) for \(s \in (T_{k,\ell },U_n)\) and critical points of index \(\ell +2\) and \(k+2\) for \(s>U_n\).
- \(k = \ell = \frac{n-1}{2}\): There are \(\frac{1}{2} \binom{n+1}{k+1}\) complementary pairs of \(k\)-faces. Besides the barycenter, the corresponding axes witness two critical points of index \(k+2\) for \(s>T_{k,k}=U_n\).
To explain the numerical evidence, we consider \(t_K(s)\) and \(t_L(s)\), which are given by (32) and (33). We make \(t_K\) injective by restricting it to the range \([0,\frac{\ell +1}{n+1}]\) and we make \(t_L\) injective by restricting it to the range \([\frac{\ell +1}{n+1},1]\); see Fig. 8, which plots the inverses of the restricted functions. Drawing the horizontal line for a value of \(s\), we note that the portion below the graphs of \(t_K\) and \(t_L\) consists of the points \(x\) at which the \(n\)-section orthogonal to the axis has a maximum at \(x\). We see these graphs for even and odd values of \(n\) in Fig. 8. For each scale factor \(s\), there are either one or two modes witnessed by \(A_{k,\ell }\), drawn in cyan in Fig. 8. We notice empirically that the mode at the barycenter, given by \(t = \frac{\ell +1}{n+1}\), is the only mode under the piecewise defined curve for \(0 < k < \ell \). This means that the only mode is at the barycenter and the other critical points are saddles of index \(\ell +1\) and \(\ell +2\).
Resilient Modes
We have seen that the sum of Gaussian kernels can have extra modes. In this subsection, we study their significance, showing that they last for an interval of scale factors whose length increases with the dimension.
Balancing Scales
To get started, we need more information on the transition at which the extra maxima appear. We get an upper bound on \(T_{k,\ell }\) by studying the scale factor at which the weights of the two one-dimensional kernels in the decomposition of \(G_s\) restricted to a relevant axis are balanced. For \(k = \ell \), the two one-dimensional kernels in the decomposition are always balanced. For \(k < \ell \), the balancing scale factor is
$$\begin{aligned} B_{k,\ell } = \sqrt{ \frac{\ln (\ell +1) - \ln (k+1)}{\pi \big (\frac{1}{k+1}-\frac{1}{\ell +1} \big )}}. \end{aligned}$$
(34)
Indeed, recomputing the weights gives \((k+1) g(B_{k,\ell } R_k) = (\ell +1) g(B_{k,\ell } R_\ell )\). Similar to \(T_{k,\ell }\) and \(U_n\), the balancing scale factor increases with respect to \(k\) and \(\ell \). Numerically, we observed that \(B_{k,\ell }\) is not very different from, but consistently larger than, \(T_{k,\ell }\). We prove that this relationship is not accidental.
Transition Lemma
We have \(T_{k,\ell } < B_{k,\ell } < U_n\) for all integers \(0 \le k < \ell \) with \(k + \ell = n-1\).
Proof
We prove the claim indirectly, by showing that \(s = B_{k,\ell }\) gives two maxima in the 1-section along any axis connecting the barycenter of a \(k\)-face with the barycenter of the complementary \(\ell \)-face. For balanced weights, we have two maxima iff the centers of the two one-dimensional kernels are further apart than twice the width; see Sect. 2. To prove the latter property, we compute
$$\begin{aligned} \frac{B_{k,\ell } D_{k,\ell }}{2 {\sigma }_0}= \sqrt{ \frac{n+1}{2(\ell -k)} \cdot \ln \frac{\ell +1}{k+1} }, \end{aligned}$$
(35)
using Eqs. (4), (1), and (34). Recall the logarithmic inequality:
$$\begin{aligned} \frac{x}{1 + \frac{x}{2}}<\ln (1+x), \end{aligned}$$
for \(x > 0\). Setting \(x = \frac{\ell - k}{k+1}\), we see that the right-hand side of (35) exceeds 1 for all choices of \(0 \le k < \ell \). This implies that we have two maxima along the axis, which implies that the balancing scale factor lies between the first and second transitions, as claimed. \(\square \)
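The inequalities used in the proof can be confirmed numerically for small dimensions. The sketch below evaluates (34) and the right-hand side of (35), and compares \(B_{k,\ell }\) with \(U_n\). As before, \({\sigma }_0 = 1/\sqrt{2\pi }\) and \(D_{k,\ell }^2 = \frac{1}{k+1} + \frac{1}{\ell +1}\) are assumptions taken from outside this section, and the function names are illustrative.

```python
import math

SIGMA0 = 1.0 / math.sqrt(2.0 * math.pi)  # assumed width of the unit kernel

def D(k, l):
    # Assumed distance between the barycenters of complementary k- and l-faces.
    return math.sqrt(1.0 / (k + 1) + 1.0 / (l + 1))

def B(k, l):
    """Balancing scale factor, Eq. (34), for k < l."""
    num = math.log(l + 1) - math.log(k + 1)
    den = math.pi * (1.0 / (k + 1) - 1.0 / (l + 1))
    return math.sqrt(num / den)

def U(n):
    """Second transition, Eq. (16)."""
    return math.sqrt((n + 1) / (2.0 * math.pi))

def separation_ratio(k, l):
    """Right-hand side of Eq. (35)."""
    n = k + l + 1
    return math.sqrt((n + 1) / (2.0 * (l - k)) * math.log((l + 1) / (k + 1)))

# All pairs 0 <= k < l with k + l = n - 1, for a few dimensions n.
pairs = [(k, n - 1 - k) for n in range(2, 41) for k in range(n) if k < n - 1 - k]
```

For every pair in `pairs`, the ratio exceeds 1 and \(B_{k,\ell }\) stays below \(U_n\), although the margins become small when \(\ell - k\) is small compared to \(n\).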
Resilience
We define the resilience of a mode as the length of the interval of scale values at which it exists. This definition is not satisfactory for a general 1-parameter family of smooth functions; however, it will suffice in our context, in which we know enough about the modes to follow them through the family parameterized by the scale \(s\). Specifically, we have a single mode for \(0 \le s \le T_{0,n-1}\), and we have \(n+1\) modes for \(U_n \le s\). The picture is more interesting in the interval \(T_{0,n-1} < s < U_n\), in which we have \(n+2\) modes. One of these modes is the barycenter of the \(n\)-simplex, and we study the resilience of this extra mode. The upper endpoint of the interval is defined in (16), and an upper bound for the lower endpoint is given in the Transition Lemma, with the definition of the bound in (34):
$$\begin{aligned} T_{0,n-1} < \sqrt{\frac{\ln n}{\pi (1 - 1/n)}}. \end{aligned}$$
As \(n\) goes to infinity, \(U_n\) grows roughly like the square root of \(n\), and \(T_{0,n-1}\) grows roughly like the square root of the logarithm of \(n\). The gap between the two widens, so that the resilience of the mode at the barycenter of the \(n\)-simplex grows roughly like \(\sqrt{n}\); see Fig. 6.
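To see the growth rate concretely, we can tabulate the lower bound \(U_n - B_{0,n-1}\) on the resilience, which follows from the Transition Lemma. The sketch below is illustrative only; the function names are our own, and the bound is not the resilience itself, because we have no analytic expression for \(T_{0,n-1}\).

```python
import math

def U(n):
    """Second transition, Eq. (16)."""
    return math.sqrt((n + 1) / (2.0 * math.pi))

def B0(n):
    """Upper bound on T_{0,n-1} from Eq. (34) with k = 0 and l = n - 1; needs n >= 2."""
    return math.sqrt(math.log(n) / (math.pi * (1.0 - 1.0 / n)))

def resilience_lower_bound(n):
    """Lower bound U_n - B_{0,n-1} on the resilience U_n - T_{0,n-1}."""
    return U(n) - B0(n)

# Lower bound divided by sqrt(n), for a few dimensions.
normalized = {n: resilience_lower_bound(n) / math.sqrt(n) for n in (2, 10, 100, 10 ** 4, 10 ** 6)}
```

The normalized values increase slowly toward \(1/\sqrt{2\pi }\), which reflects that \(U_n\) dominates the difference for large \(n\).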
Summary
We are now ready to summarize the findings in regard to the critical points and the modes of the 1-parameter family of functions \(G_s: {{\mathbb{R }}}^{n+1} \rightarrow {{\mathbb{R }}}\). For values \(s < T_{0,n-1}\), we have a single critical point with index \(n+1\). Thereafter, we pick up \(2 \binom{n+1}{k+1}\) critical points at every \(T_{k,\ell }\), for \(0 \le k < \ell \), until we accumulate \(2 l_n + 1\) critical points right before reaching \(U_n\). The barycenter has index \(n+1\), and the other critical points come in pairs, with indices \(\ell +2\) and \(\ell +1\), for \(\frac{n-1}{2} < \ell \le n-1\). For \(U_n < s\), we have \(2 p_n + 1\) critical points. The barycenter has index 1, and the other critical points come in pairs with indices \(\ell +2\) and \(k+2\), for \(\frac{n-1}{2} \le \ell \le n-1\). As a sanity check, we consider the Euler-Poincaré formula, which states that the alternating sum of critical points is equal to the Euler characteristic of \({{\mathbb{R }}}^{n+1}\):
$$\begin{aligned} \sum _{i=0}^{n+1} (-1)^i c_i&= (-1)^{n+1}, \end{aligned}$$
(36)
where \(c_i\) counts the critical points with index \(i\). We also write \(c = \sum _{i=0}^{n+1} c_i\). Trivially, (36) holds in the first case. Thereafter, we pick up the critical points in pairs whose contributions to the alternating sum cancel, so (36) is maintained. Finally, for \(U_n < s\), we have a bijection between the critical points and the faces of the \(n\)-simplex such that the index is \(n+1\) minus the dimension of the face. Since the \(n\)-simplex is a closed ball, its Euler characteristic is 1, which again implies (36). We thus have a complete description of the critical points of the \(n\)-design as the scale factor increases from zero to infinity.
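The bookkeeping behind this sanity check can be replayed in a few lines. The sketch below forms the alternating sum in (36) for the configuration just below \(U_n\) and for \(s > U_n\), using only the index assignments stated in this summary; the function names are hypothetical.

```python
from math import comb

def alternating_sum_large_scale(n):
    """s > U_n: one critical point per face; a face of dimension d has index n+1-d."""
    return sum((-1) ** (n + 1 - d) * comb(n + 1, d + 1) for d in range(n + 1))

def alternating_sum_intermediate(n):
    """Just below U_n: barycenter of index n+1 plus pairs of indices l+2 and l+1."""
    total = (-1) ** (n + 1)
    for l in range(n):
        k = n - 1 - l
        if k < l:
            pairs = comb(n + 1, k + 1)
            total += pairs * ((-1) ** (l + 2) + (-1) ** (l + 1))
    return total

def count_large_scale(n):
    """Number of critical points for s > U_n, one per non-empty face."""
    return sum(comb(n + 1, d + 1) for d in range(n + 1))
```

Both alternating sums equal \((-1)^{n+1}\), and `count_large_scale(n)` reproduces \(2p_n + 1 = 2^{n+1} - 1\).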
Main Theorem
Let \(n \ge 1\) and consider the sum of \(n+1\) unit Gaussian kernels placed at the vertices of the scaled standard \(n\)-simplex, \(s\varDelta ^n\).
- (1): For \(s < T_{0,n-1}\), we have one critical point, which is also a mode.
- (2): For \(T_{0,n-1} < s < U_n\), we have gradually more critical points after passing each \(T_{k,\ell }\), until we accumulate \(2 l_n +1\) critical points right before \(U_n\). Of these critical points, \(n+2\) are modes, and they exist during the entire interval.
- (3): For \(U_n < s\), we have \(2 p_n +1\) critical points, of which \(n+1\) are modes.
The resilience of the extra mode in Case (2) is \(U_n - T_{0,n-1}\), which grows like \(\sqrt{n}\).
Many Modes
In this subsection, we construct a finite configuration of isotropic Gaussian kernels with a superlinear number of modes. While there is a family of such constructions, it will suffice to explain one.
Products of Simplices
The basic building block of our construction is the standard 2-simplex. Let the dimension be \(3n\) and write the \(3n\)-dimensional Euclidean space as the Cartesian product of \(n\) 3-dimensional planes: \({{\mathbb{R }}}^{3n} = H_1 \times H_2 \times \ldots \times H_n\), in which \(H_i\) is spanned by the three coordinate vectors \(e_{3i-2},\,e_{3i-1},\,e_{3i}\), for \(1 \le i \le n\). Let \(\varDelta _i^2\) be the standard 2-simplex in \(H_i\), with vertices \(v_{i0} = e_{3i-2},\,v_{i1} = e_{3i-1},\,v_{i2} = e_{3i}\). Correspondingly, we write \(g_{ij}: H_i \rightarrow {{\mathbb{R }}}\) for the 3-dimensional unit Gaussian kernel with center \(v_{ij}\), for \(0 \le j \le 2\), and \(G_i: H_i \rightarrow {{\mathbb{R }}}\) defined by
$$\begin{aligned} G_i (x)= g_{i0} (x) + g_{i1} (x) + g_{i2} (x) \end{aligned}$$
for their sum. Next, we construct a \(3n\)-dimensional sum of Gaussian kernels by taking products. To begin, we let \(P \subseteq {{\mathbb{R }}}^{3n}\) be the largest subset of points whose orthogonal projection to \(H_i\) is \(\{v_{i0}, v_{i1}, v_{i2}\}\), for \(1 \le i \le n\). This is the set of \(3^n\) points formed by taking the Cartesian product of the \(n\) triplets of points. For each point \(p \in P\), let \(f_p : {{\mathbb{R }}}^{3n} \rightarrow {{\mathbb{R }}}\) be the unit Gaussian kernel with center \(p\). Adding these kernels, we get \(F: {{\mathbb{R }}}^{3n} \rightarrow {{\mathbb{R }}}\), defined by
$$\begin{aligned} F(x) = \sum _{p \in P} f_p (x). \end{aligned}$$
To understand \(F\), we recall that \(f_p\) can be written as the product of \(3n\) 1-dimensional unit Gaussian kernels; see (6). Collecting the terms in sets of three, we can write
$$\begin{aligned} f_p (x)=\prod _{i=1}^n g_{ij} (x), \end{aligned}$$
where \(j\) is chosen such that \(v_{ij}\) is the orthogonal projection of \(p\) onto \(H_i\). Substituting the sum of the three kernels for the singletons, we obtain
$$\begin{aligned} F(x)=\prod _{i=1}^n G_i (x). \end{aligned}$$
In words, the sum of the \(3^n\) \(3n\)-dimensional unit Gaussian kernels is the product of \(n\) sums of three 3-dimensional unit Gaussian kernels.
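The product identity is easy to confirm numerically. The following sketch evaluates \(F\) both as a sum over the \(3^n\) centers in \(P\) and as a product of the \(n\) three-kernel sums, for a small \(n\) and an arbitrary test point; all names and the test point are illustrative assumptions.

```python
import itertools
import math

VERTICES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # standard 2-simplex

def unit_kernel(x, c):
    """Unit Gaussian kernel exp(-pi * ||x - c||^2)."""
    return math.exp(-math.pi * sum((a - b) ** 2 for a, b in zip(x, c)))

def F_sum(x, n):
    """Sum of the 3^n kernels centered at the points of P."""
    total = 0.0
    for choice in itertools.product(VERTICES, repeat=n):
        center = [c for v in choice for c in v]  # concatenate the n chosen vertices
        total += unit_kernel(x, center)
    return total

def F_product(x, n):
    """Product over i of G_i applied to the projection of x onto H_i."""
    result = 1.0
    for i in range(n):
        block = x[3 * i: 3 * i + 3]
        result *= sum(unit_kernel(block, v) for v in VERTICES)
    return result

x_test = [0.2, -0.1, 0.7, 0.4, 0.4, 0.1, -0.3, 0.9, 0.05]  # a point of R^9, n = 3
```

Here `F_sum(x_test, 3)` adds 27 kernels, while `F_product(x_test, 3)` multiplies three sums of three kernels each; the two values agree up to rounding.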
Counting Modes
We arrive at the final construction by reintroducing the scale factor, writing \(F_s : {{\mathbb{R }}}^{3n} \rightarrow {{\mathbb{R }}}\) for the product of the \(G_{is} : H_i \rightarrow {{\mathbb{R }}}\), where \(G_{is}\) is of course the sum of the three unit Gaussian kernels with centers \(s e_{3i-2},\,s e_{3i-1},\,s e_{3i}\). We have seen in Sect. 3 that \(s\) can be chosen such that \(G_{is}\) has 4 modes. Since \(F_s\) is the product of the \(G_{is}\), its set of modes is the largest subset of \({{\mathbb{R }}}^{3n}\) whose orthogonal projection to \(H_i\) is the set of four modes of \(G_{is}\), for \(1 \le i \le n\). Its size is \(4^n = 3^{(1 + \log _3 \frac{4}{3}) n}>3^{1.261 n}\). This shows that the number of modes is roughly the number of kernels to the power 1.261.
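The arithmetic behind the exponent can be reproduced as follows; the variable and function names are our own, and the sketch only restates the counts given above.

```python
import math

def modes_and_kernels(n):
    """Number of modes (4^n) and number of kernels (3^n) of the product construction."""
    return 4 ** n, 3 ** n

# Exponent e with 4^n = (3^n)^e, that is, e = log_3 4 = 1 + log_3 (4/3).
exponent = math.log(4.0) / math.log(3.0)

modes, kernels = modes_and_kernels(10)
```

For \(n = 10\), for example, the construction uses \(3^{10}\) kernels in \({{\mathbb{R }}}^{30}\) and has \(4^{10}\) modes.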
There is an entire family of similar constructions. The one presented here maximizes neither the number nor the resilience of the extra modes. Indeed, we can increase the exponent by improving the ratio of modes over kernels in each \(H_i\), and we can improve the resilience by using higher-dimensional simplices.