Add Isotropic Gaussian Kernels at Own Risk: More and More Resilient Modes in Higher Dimensions
Abstract
The fact that a sum of isotropic Gaussian kernels can have more modes than kernels is surprising. Extra (ghost) modes do not exist in \({{\mathbb{R }}}^1\) and are generally not well studied in higher dimensions. We study a configuration of \(n+1\) Gaussian kernels for which there are exactly \(n+2\) modes. We show that all modes lie on a finite set of lines, which we call axes, and study the restriction of the Gaussian mixture to these axes in order to discover that there are an exponential number of critical points in this configuration. Although the existence of ghost modes remained unknown due to the difficulty of finding examples in \({{\mathbb{R }}}^2\), we show that the resilience of ghost modes grows like the square root of the dimension. In addition, we exhibit finite configurations of isotropic Gaussian kernels with superlinearly many modes.
Keywords
Unit Gaussian kernels, Diffusion, Convolution, Standard simplex, Critical points, Modes, Resilience
Mathematics Subject Classification (2000)
26B40 62F35 62E99
1 Introduction
The diffusion of chemical substances, such as hormones, and of physical quantities, such as temperature, is a general phenomenon. Assuming a uniform medium, the process is described by the solution to the heat equation. In Euclidean space, solving this equation is synonymous with convolving with a Gaussian kernel. This is also a popular computational method, in particular in computer vision, where the 1-parameter family of convolutions of a given image is known as its scale space; see [13, 17]. A one-dimensional Gaussian kernel is known as a normal density function in probability [8].
We are interested in the quantitative analysis of diffusion and Gaussian convolution. In particular, we study the evolution of the critical points of a function that is convolved with a progressively wider Gaussian kernel. If the function is one-dimensional, from \({{\mathbb{R }}}\) to \({\mathbb{R }}\), then Gaussian convolution does not create new critical points [1, 9, 15, 18]. As a consequence, the diffusion of \(m\) point masses (a sum of \(m\) Dirac delta functions) cannot have more than \(m\) modes (local maxima); see [2, 4, 16]. For two- or higher-dimensional functions, this is no longer true; see [12] for a two-dimensional function for which diffusion temporarily increases the number of modes and [7, 14] for a mathematical analysis of the unfolding events that cause this effect. It has been observed that these events are rare in practice [10, 11] and it has been confirmed that the ability to create critical points with non-negligible persistence deteriorates rapidly [6]. It is also known that \(n+1\) point masses can be arranged in \({{\mathbb{R }}}^n\) so that diffusion creates \(n+2\) modes during a non-empty time interval; see [5].
The contribution of this paper is a strengthening of the cautionary voice on using Gaussian convolution in dimensions beyond one. In particular, we give a detailed analysis of the sum of \(n+1\) identical isotropic Gaussian kernels placed at the vertices of a regular \(n\)-simplex in \({{\mathbb{R }}}^n\). We prove that all critical points lie on the symmetry axes of the \(n\)-simplex, and we characterize their indices, confirming the \((n+2)\)-nd mode at the barycenter as the sole extra mode. While the extra mode seems fragile, we show that the interval of widths during which it exists grows like the square root of the dimension. It thus seems likely that the phenomenon of extra modes is more prevalent in higher dimensions. Providing additional evidence, we construct finite configurations of isotropic Gaussian kernels with superlinearly many modes.
1.1 Outline
Section 2 provides background on Gaussian kernels and the geometry of regular simplices. Section 3 analyzes the sum of kernels placed at the vertices of a regular simplex, characterizes its critical points, estimates the resilience of the extra mode, and exhibits configurations with superlinearly many modes. Section 4 concludes this paper.
2 Background
Our results depend on one-dimensional Gaussian kernels and \(n\)-dimensional regular simplices. We study these two topics in two subsections.
2.1 Curve Analysis
In this subsection, we introduce Gaussian kernels and discuss some of their fundamental properties.
2.2 Gaussian Kernels and Derivatives
2.3 Balanced Sums
Consider the sum of two unit Gaussian kernels. For symmetry, we choose their centers at distance \(z \ge 0\) to the left and right of the origin. As proven in [3], \(G = g_{-z} + g_z\) has either 1 or 3 critical points. More specifically, \(G\) has 1 maximum iff \(z \le {\sigma }_0\) and \(G\) has 2 maxima and 1 minimum iff \(z > {\sigma }_0\). We present our own proof of this result, as we need the concepts it uses.
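The dichotomy between 1 and 3 critical points can be checked numerically. The following is a minimal sketch, not the proof referred to above. It assumes that the unit Gaussian kernel is \(g_c(x) = e^{-\pi (x-c)^2}\), so that \({\sigma }_0 = 1/\sqrt{2\pi }\); this normalization is not stated in this section and is an assumption of the sketch. The function names, the grid size, and the sampling window are illustrative choices.

```python
import math

# Assumption: unit kernel g_c(x) = exp(-pi (x - c)^2), hence sigma_0 = 1/sqrt(2 pi).
SIGMA0 = 1.0 / math.sqrt(2.0 * math.pi)

def dG(x, z):
    """Derivative of G = g_{-z} + g_z at x for the assumed kernel."""
    left = -2.0 * math.pi * (x + z) * math.exp(-math.pi * (x + z) ** 2)
    right = -2.0 * math.pi * (x - z) * math.exp(-math.pi * (x - z) ** 2)
    return left + right

def count_critical_points(z, half_width=4.0, samples=40000):
    """Count sign changes of G' on a grid of cell midpoints over [-half_width, half_width]."""
    step = 2.0 * half_width / samples
    # Cell midpoints never hit x = 0, where G' vanishes exactly by symmetry.
    xs = [-half_width + (i + 0.5) * step for i in range(samples)]
    signs = [dG(x, z) > 0.0 for x in xs]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

below = count_critical_points(0.8 * SIGMA0)  # centers closer than the threshold
above = count_critical_points(1.5 * SIGMA0)  # centers further than the threshold
print(below, above)
```

Counting sign changes of the derivative is a heuristic: it can miss critical points that are closer together than the grid spacing, so it should not be used near the degenerate case \(z = {\sigma }_0\).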
2.4 Unbalanced Sums
Next, we study sums \(G_w = g_{-z} + w g_z\), where \(w \ge 0\) is the weight of the second term. The number of critical points of \(G_w\) is at most 3, but in contrast to the balanced case, it can also be 2, as we will see. More specifically, [2] gives necessary and sufficient conditions for all three cases (1, 2, or 3 critical points), but they are not as easy to state as in the balanced case. As before, we present our own proof since we need the concepts it uses.
A new phenomenon is the possibility of 2 critical points. To see when this case arises, we set \(w = r(x) + 1\) and note that for this choice of weight \(r_w(x) = 0\). In words, \(p\) and \(wq\) intersect above \(x\) and, equivalently, \(x\) is a critical point of \(G_w\). If \(x\) has the additional property of being critical for \(r\), then the intersection between \(p\) and \(wq\) is degenerate. As computed above, the critical points of \(r_{w}\) are given by \(x^2 = z^2 - {\sigma }_0^2\). Let \(x_1 = - \sqrt{z^2 - {\sigma }_0^2}\) and \(x_2 = \sqrt{z^2 - {\sigma }_0^2}\) be the two solutions, and note that \(x_1\) gives a weight \(w_1 = r(x_1) + 1\) that is larger than 1, while \(x_2\) gives a weight \(w_2 = r(x_2) + 1\) between 0 and 1. We call \(w_1\) and \(w_2\) the transition weights for \(z\), remembering that they exist iff \(z \ge {\sigma }_0\).
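The transition weights can be evaluated numerically. The sketch below again assumes the kernel \(g_c(x) = e^{-\pi (x-c)^2}\) with \({\sigma }_0 = 1/\sqrt{2\pi }\), which is not stated in this section. Instead of the functions \(p\), \(q\), and \(r\), whose definitions are not reproduced here, it uses the weight obtained by solving \(G_w'(x) = 0\) for \(w\) directly, namely \(w = \frac{z+x}{z-x} e^{-4 \pi x z}\) for \(-z < x < z\). That this expression plays the role of \(r(x)+1\) is an assumption; its critical points do satisfy \(x^2 = z^2 - {\sigma }_0^2\). All function names are hypothetical.

```python
import math

# Assumption: unit kernel g_c(x) = exp(-pi (x - c)^2), hence sigma_0 = 1/sqrt(2 pi).
SIGMA0 = 1.0 / math.sqrt(2.0 * math.pi)

def critical_weight(x, z):
    """Weight w that makes x in (-z, z) a critical point of g_{-z} + w * g_z."""
    return (z + x) / (z - x) * math.exp(-4.0 * math.pi * x * z)

def transition_weights(z):
    """Return (w1, w2), the weights at x1 = -sqrt(z^2 - sigma_0^2) and x2 = -x1."""
    if z < SIGMA0:
        raise ValueError("transition weights exist only for z >= sigma_0")
    a = math.sqrt(z * z - SIGMA0 * SIGMA0)
    return critical_weight(-a, z), critical_weight(a, z)

def count_critical_points(z, w, half_width=4.0, samples=40000):
    """Count sign changes of G_w' on a grid of cell midpoints (a heuristic)."""
    step = 2.0 * half_width / samples
    signs = []
    for i in range(samples):
        x = -half_width + (i + 0.5) * step
        d = (-(x + z) * math.exp(-math.pi * (x + z) ** 2)
             - w * (x - z) * math.exp(-math.pi * (x - z) ** 2))
        signs.append(d > 0.0)
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

z = 2.0 * SIGMA0
w1, w2 = transition_weights(z)
print(w1, w2, w1 * w2)
```

Under these assumptions the two weights are reciprocal, reflecting the symmetry that swaps the two kernels. Weights strictly between \(w_2\) and \(w_1\) give 3 critical points, and weights outside this range give 1.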
2.5 Simplex Design
In this subsection, we design a sum of Gaussian kernels in \({{\mathbb{R }}}^{n+1}\) that has the symmetry group of the regular \(n\)-simplex. We begin with a geometric study of the simplex, whose shape properties will play a central role in our design.
2.6 Standard Simplex
2.7 Sections
2.8 Standard Design
We are interested in changing the widths of the \((n+1)\)-dimensional kernels uniformly. Equivalently, we scale the \(n\)-simplex by moving the centers of the unit Gaussian kernels closer to or further from each other, without changing their widths and heights. To do this, we introduce the scaled \(n\)-design, \(G_s = g_{s e_0} + g_{s e_1} + \cdots + g_{s e_n}\). Here, we call \(s\) the scale factor, and we write \(s \varDelta ^n\) for the scaled \(n\)-simplex whose vertices are the \(s e_i\). We are interested in the evolution of the critical points in the 1-parameter family of scaled \(n\)-designs \(G_s: {{\mathbb{R }}}^{n+1} \rightarrow {{\mathbb{R }}}\), as \(s\) goes from zero to infinity.
3 Analysis
We begin this section by proving that all critical points lie on the axes of the \(n\)-simplex. Thereafter, we analyze each axis, characterizing for which scales we see 1, 2, or 3 critical points. To decide which of the one-dimensional maxima are modes, we analyze the \(n\)-sections orthogonal to the axes. As it turns out, all modes lie on axes that pass through vertices of the \(n\)-simplex. Most interesting is the critical point at the barycenter, which changes from unique mode during an initial interval of scales, to \((n+2)\)-nd mode during a non-empty intermediate interval, to a saddle of index one during a final interval. We call the length of the intermediate interval the resilience of the extra mode and show that it grows like the square root of the dimension. Finally, we construct sums of isotropic Gaussian kernels with a superlinear number of modes.
3.1 Lines of Critical Points
In this subsection, we note that all critical points of a scaled \(n\)-design lie on the axes of the scaled \(n\)-simplex. We begin by introducing coordinates that are more natural for the \(n\)-design, and we show how they relate to the barycentric coordinates.
3.2 Distance Coordinates
Write \(v_i = s e_i\), for \(0 \le i \le n\), and let \(x\) be a point of the corresponding scaled \(n\)-simplex \(s \varDelta ^n\). Setting \(r_i = {\Vert x - v_i \Vert }\), we note that \(x\) is uniquely defined by the vector of \(n+1\) distances since \(x\) lies on the hyperplane spanned by \(\{ v_i \}\). We express this by writing \(x = (r_0, r_1, \ldots , r_n)_D\), and by calling the \(r_i\) the distance coordinates of \(x\). Recall that \((x_0, x_1, \ldots , x_n)_B\) is the representation of the same point in barycentric coordinates. We are interested in computing the barycentric from the distance coordinates via the coordinate transformation below.
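One way to carry out such a conversion can be sketched as follows. The formula used here is derived from \(r_i^2 = s^2 (\sum _j x_j^2 - 2 x_i + 1)\) together with \(\sum _j x_j = 1\), which gives \(x_i = \frac{1}{n+1} + \frac{1}{2 s^2} \big ( \frac{1}{n+1} \sum _j r_j^2 - r_i^2 \big )\). It is not necessarily written in the same form as the transformation of the paper, and the function name is hypothetical.

```python
import math
import random

def barycentric_from_distances(r, s):
    """Barycentric coordinates of the point in the hyperplane of s * Delta^n
    whose distances to the vertices s * e_i are r[0], ..., r[n]."""
    m = len(r)  # m = n + 1 vertices
    mean_sq = sum(ri * ri for ri in r) / m
    return [1.0 / m + (mean_sq - ri * ri) / (2.0 * s * s) for ri in r]

# Round trip: barycentric -> Cartesian -> distances -> barycentric.
random.seed(0)
n, s = 4, 1.7
raw = [random.random() for _ in range(n + 1)]
bary = [v / sum(raw) for v in raw]   # non-negative, sums to 1
point = [s * b for b in bary]        # Cartesian coordinates in R^{n+1}
dist = [
    math.sqrt(sum((point[j] - (s if j == i else 0.0)) ** 2 for j in range(n + 1)))
    for i in range(n + 1)
]
recovered = barycentric_from_distances(dist, s)
print(max(abs(a - b) for a, b in zip(bary, recovered)))
```

The round trip reproduces the barycentric coordinates up to floating-point error. The same formula applies to any point of the hyperplane spanned by the vertices, not only to points inside the simplex.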
Proof
3.3 Non-Zero Gradients
Recall that \(G_s :{{\mathbb{R }}}^{n+1} \rightarrow {{\mathbb{R }}}\) is the scaled \(n\)-design formed by taking the sum of the \(n+1\) unit Gaussian kernels whose centers are the vertices of \(s \varDelta ^n\). We use the coordinate transformation to show that \(G_s\) has no critical points away from the axes of the scaled \(n\)-simplex:
Axes Lemma Every critical point of \(G_s\) lies on an axis of the scaled \(n\)-simplex \(s\varDelta ^n\).
Proof
3.4 One-Dimensional Sections
The restriction of \(G_s\) to an axis of \(s \varDelta ^n\) is a sum of two weighted Gaussian kernels. This sum has two maxima for a range of scale factors, which we now analyze.
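The reduction to two weighted kernels can be made concrete. The sketch below assumes the kernel \(g_c(x) = e^{-\pi \Vert x - c \Vert ^2}\), which is not stated in this section, and considers the axis through the barycenters \(b_K\) and \(b_L\) of a \(k\)-face and its complementary \(\ell \)-face. Every vertex of a face is at the same distance from each point of the axis, so under this assumption the restriction becomes a one-dimensional sum with weights \((k+1) e^{-\pi s^2 k/(k+1)}\) and \((\ell +1) e^{-\pi s^2 \ell /(\ell +1)}\) and centers at distance \(s \sqrt{1/(k+1) + 1/(\ell +1)}\). These closed forms are derived for this sketch and are not quoted from the paper; the function names are hypothetical.

```python
import math

def design(x, s):
    """G_s(x): sum of the assumed kernels exp(-pi |x - s e_i|^2), i = 0..n."""
    total = 0.0
    for i in range(len(x)):
        d2 = sum((xj - (s if j == i else 0.0)) ** 2 for j, xj in enumerate(x))
        total += math.exp(-math.pi * d2)
    return total

def axis_point(t, k, n, s):
    """A(t) = (1 - t) b_K + t b_L, with the k-face on vertices 0..k."""
    ell = n - 1 - k
    return [(1.0 - t) * s / (k + 1) if i <= k else t * s / (ell + 1)
            for i in range(n + 1)]

def two_kernel_sum(t, k, n, s):
    """The same value written as a weighted sum of two 1D kernels along the axis."""
    ell = n - 1 - k
    d = s * math.sqrt(1.0 / (k + 1) + 1.0 / (ell + 1))  # distance |b_K - b_L|
    w_k = (k + 1) * math.exp(-math.pi * s * s * k / (k + 1))
    w_l = (ell + 1) * math.exp(-math.pi * s * s * ell / (ell + 1))
    return (w_k * math.exp(-math.pi * (t * d) ** 2)
            + w_l * math.exp(-math.pi * ((1.0 - t) * d) ** 2))

n, k, s = 5, 1, 0.9
gaps = [abs(design(axis_point(t, k, n, s), s) - two_kernel_sum(t, k, n, s))
        for t in (-0.5, 0.0, 0.3, 1.0, 1.4)]
print(max(gaps))
```

The agreement holds for every parameter value \(t\), including points of the axis outside the simplex.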
3.5 Transitions
3.6 Section Evolution
We follow the 1-section defined by an axis of the \(n\)-simplex as the scale factor, \(s\), goes from 0 to infinity. By construction, we have qualitative changes at the transitions, which we now summarize.
1-Section Lemma Let \(0 \le k \le \ell \) with \(k+\ell = n-1\), and let \(A\) be the axis passing through the barycenters of a \(k\)-face and its complementary \(\ell \)-face of \(s \varDelta ^n\). Then \(G_s|_A\) has one maximum whenever \(s<T_{k,\ell }\), and two maxima whenever \(T_{k,\ell } < s\) and \(s \ne U_{n}\).
Next, we look at the second transition, \(U_{n}\). As we have observed, it depends only on \(n\). This implies that all 1-sections lose the maximum at the barycenter at the same scale factor. For \(k = \ell \), the two transitions coincide, so the interval collapses.
3.7 \(n\)-Dimensional Sections
In this subsection, we show that most maxima of the 1-sections are not modes. We begin with the analysis of the barycenter of \(s \varDelta ^n\), which belongs to every axis of the scaled \(n\)-simplex.
3.8 Barycenter of \(n\)-simplex
The \(n\)-design has the symmetry group of the \(n\)-simplex, which implies that the barycenter, \(b_G \in s \varDelta ^n\), is a critical point of \(G_s\). Indeed, if \(b_G\) is not a critical point, then it has a non-zero gradient, which contradicts the symmetry. More specifically, \(b_G\) is either a maximum or a minimum of the \(n\)-section defined by the \(n\)-simplex, and it is a maximum of the orthogonal 1-section defined by the diagonal line of \({{\mathbb{R }}}^{n+1}\).
Barycenter Lemma Let \(n \ge 1\). Then the barycenter of \(s \varDelta ^n\) is a mode of \(G_s\) for \(s < U_n\) and it is a saddle of index 1 for \(s > U_n\).
Proof
We note here that the barycenter is an index-1 saddle for \(s > U_n\), as opposed to a minimum, because we place the \(n\)-simplex in \({{\mathbb{R }}}^{n+1}\). At the transition, when \(s = U_n\), the barycenter of the \(n\)-simplex is a degenerate critical point.
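The change from mode to index-1 saddle can be observed through second derivatives at the barycenter. The sketch below assumes the kernel \(g_c(x) = e^{-\pi \Vert x - c \Vert ^2}\), which is not stated in this section. It evaluates the quadratic form of the Hessian along the diagonal direction and along one direction inside the hyperplane of the simplex; by symmetry, every direction inside the hyperplane gives the same value at the barycenter. Under this assumption the in-plane second derivative changes sign at \(s = \sqrt{(n+1)/(2\pi )}\). Identifying this value with \(U_n\), whose definition is not reproduced here, is also an assumption. The function names are hypothetical.

```python
import math

def second_derivative_at_barycenter(n, s, direction):
    """d^T H d for G_s at the barycenter of s * Delta^n, with H the Hessian
    of the assumed kernel sum and d a unit vector in R^{n+1}."""
    b = [s / (n + 1)] * (n + 1)
    total = 0.0
    for i in range(n + 1):
        diff = [b[j] - (s if j == i else 0.0) for j in range(n + 1)]
        g = math.exp(-math.pi * sum(c * c for c in diff))
        proj = sum(c * d for c, d in zip(diff, direction))
        # Hessian of exp(-pi |x - c|^2) is g * (4 pi^2 (x - c)(x - c)^T - 2 pi I).
        total += g * (4.0 * math.pi ** 2 * proj ** 2 - 2.0 * math.pi)
    return total

def barycenter_curvatures(n, s):
    """Second derivatives along the diagonal and along (e_0 - e_1)/sqrt(2); needs n >= 1."""
    diagonal = [1.0 / math.sqrt(n + 1)] * (n + 1)
    in_plane = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)] + [0.0] * (n - 1)
    return (second_derivative_at_barycenter(n, s, diagonal),
            second_derivative_at_barycenter(n, s, in_plane))

n = 2
threshold = math.sqrt((n + 1) / (2.0 * math.pi))  # derived under the assumed kernel
before = barycenter_curvatures(n, 0.9 * threshold)
after = barycenter_curvatures(n, 1.1 * threshold)
print(before, after)
```

Below the threshold both second derivatives are negative, which is consistent with a mode. Above it only the diagonal direction remains negative, which is consistent with a single negative eigenvalue and hence a saddle of index 1.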
3.9 Orthogonal Sections
We generalize the analysis of the barycenter. Let \(1 \le k \le \ell \) with \(k + \ell = n-1\), and consider a \(k\)-face of the \(n\)-simplex as well as the complementary \(\ell \)-face. Writing \(G_s\) as the sum of the \(f_i = g_{s e_i}\), for \(0 \le i \le n\), we assume that the centers of \(f_0\) to \(f_k\) span the \(k\)-face, and that the centers of \(f_{k+1}\) to \(f_n\) span the \(\ell \)-face. Hence, \(G_s = K_s + L_s\), where \(K_s = \sum _{i=0}^k f_i\) and \(L_s = \sum _{i=k+1}^n f_i\). Writing \(b_K\) and \(b_L\) for the barycenters of the two faces, we let \(A\) be the axis defined by \(A(t) = (1-t) b_K + t b_L\). We are interested in the Hessian of \(G_s\) at \(x = A(t)\). For symmetry reasons, it has at most four distinct eigenvalues, each a second derivative along pairwise orthogonal lines. One line is the axis, another is orthogonal to the \(n\)-simplex, a third line is parallel to the \(k\)-face, and a fourth line is parallel to the \(\ell \)-face. The latter two eigenvalues have multiplicity \(k\) and \(\ell \). We write \(\kappa \) for the length parameter along the third line and \(\lambda \) for the length parameter along the fourth line.
Proof
3.10 Sign Change
3.11 Chandelier
3.12 Indices
 \(0 \le k < \ell \): There are \(\binom{n+1}{k+1}\) complementary face pairs of \(k\)- and \(\ell \)-faces. Besides the barycenter, the corresponding axes witness critical points of index \(\ell +2\) and \(\ell +1\) for \(s \in (T_{k,\ell },U_n)\) and two critical points of index \(k+2\) for \(s>U_n\).
 \(k = \ell = \frac{n-1}{2}\): There are \(\frac{1}{2} \binom{n+1}{k+1}\) complementary pairs of \(k\)-faces. Besides the barycenter, the corresponding axes witness two critical points of index \(k+2\) for \(s>T_{k,k}=U_n\).
3.13 Resilient Modes
We have seen that the sum of Gaussian kernels can have extra modes. In this subsection, we study their significance, showing that they last for an interval of scale factors whose length increases with the dimension.
3.14 Balancing Scales
Transition Lemma We have \(T_{k,\ell } < B_{k,\ell } < U_n\) for all integers \(0 \le k < \ell \) with \(k + \ell = n-1\).
Proof
3.15 Resilience
3.16 Summary
 (1) For \(s < T_{0,n-1}\), we have one critical point which is also a mode.
 (2) For \(T_{0,n-1} < s < U_n\), we have gradually more critical points after passing each \(T_{k,\ell }\), until we accumulate \(2 l_n +1\) critical points right before \(U_n\). Of these critical points, \(n+2\) are modes, and they exist during the entire interval.
 (3) For \(U_n < s\), we have \(2 p_n +1\) critical points, of which \(n+1\) are modes.
3.17 Many Modes
In this subsection, we construct a finite configuration of isotropic Gaussian kernels with a superlinear number of modes. While there is a family of such constructions, it will suffice to explain one.
3.18 Products of Simplices
3.19 Counting Modes
We arrive at the final construction by reintroducing the scale factor, writing \(F_s : {{\mathbb{R }}}^{3n} \rightarrow {{\mathbb{R }}}\) for the product of the \(G_{is} : H_i \rightarrow {{\mathbb{R }}}\), where \(G_{is}\) is the sum of the three unit Gaussian kernels with centers \(s e_{3i-2},\,s e_{3i-1},\,s e_{3i}\). We have seen in Sect. 3 that \(s\) can be chosen such that \(G_{is}\) has 4 modes. Since \(F_s\) is the product of the \(G_{is}\), its set of modes is the largest subset of \({{\mathbb{R }}}^{3n}\) whose orthogonal projection to \(H_i\) is the set of four modes of \(G_{is}\), for \(1 \le i \le n\). Its size is \(4^n = 3^{(1 + \log _3 \frac{4}{3}) n}>3^{1.261 n}\). This shows that the number of modes is roughly the number of kernels to the power 1.261.
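The arithmetic behind the exponent can be sketched in a few lines. Expanding the product of \(n\) sums of three kernels gives a sum of \(3^n\) kernels in \({{\mathbb{R }}}^{3n}\), since a product of isotropic kernels in mutually orthogonal subspaces is again an isotropic kernel. The count of \(4^n\) modes is taken from the text and not recomputed here; the function name is hypothetical.

```python
import math

def product_counts(n):
    """Kernel count, mode count, and the exponent e with modes = kernels ** e."""
    kernels = 3 ** n   # terms in the expanded product of n three-kernel sums
    modes = 4 ** n     # four modes per factor, as stated in the text
    exponent = math.log(modes) / math.log(kernels)
    return kernels, modes, exponent

for n in (1, 2, 5, 10):
    kernels, modes, exponent = product_counts(n)
    print(n, kernels, modes, round(exponent, 4))
```

The exponent equals \(\log _3 4 = 1 + \log _3 \frac{4}{3}\) for every \(n\), which is slightly larger than 1.261.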
There is an entire family of similar constructions. The one presented here neither maximizes the number nor the resilience of the extra modes. Indeed, we can increase the exponent by improving the ratio of modes over kernels in each \(H_i\), and we can improve the resilience by using higher-dimensional simplices.
4 Discussion
The main contribution of this paper is a cautionary message about the sum of Gaussian kernels. Giving a detailed analysis of the construction studied in [5], we show that there is indeed only one extra mode, but that its resilience increases like the square root of the dimension. We also exhibit configurations of finitely many identical isotropic Gaussian kernels whose sums have superlinearly many modes. We thus give precisely quantified contradictions to our intuition that diffusion erodes and eliminates local density maxima.
The results in this paper raise a number of questions. How stable are the extra maxima? Our analysis in Sect. 3.4 answers the question when the perturbation is the diffusion of density. How robust are they under moving individual kernels or changing their weights? Related to this question, we ask about the probability of extra modes for randomly placed Gaussian kernels in Euclidean or other spaces. Carreira-Perpiñán and Williams report that their computerized searches in \({{\mathbb{R }}}^2\) did not turn up any extra modes [5], but what if we did similar experiments in three and higher dimensions? Finally, it would be interesting to determine the persistence of the extra modes; see [6] for a recent related study. In other words, how large is the difference in function value between an extra mode and the highest saddle? Understanding the persistence, as well as the basin of attraction for each mode, would complement the analysis provided in this paper.
Acknowledgments
This research is partially supported by the National Science Foundation (NSF) under Grant DBI-0820624, by the European Science Foundation under the Research Networking Programme, and the Russian Government Project 11.G34.31.0053.
References
 1. Babaud, J., Witkin, A.P., Baudin, W., Duda, R.O.: Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans. Pattern Anal. Mach. Intell. 8, 26–33 (1986)
 2. Behboodian, J.: On the modes of a mixture of two normal distributions. Technometrics 12, 131–139 (1970)
 3. Burke, P.J.: Solution of problem 4616 [1954, 718], proposed by A. C. Cohen Jr. Am. Math. Monthly 63, 129 (1956)
 4. Carreira-Perpiñán, M., Williams, C.: On the number of modes of a Gaussian mixture. In: Scale-Space Methods in Computer Vision. Lecture Notes in Computer Science, vol. 2695, pp. 625–640 (2003)
 5. Carreira-Perpiñán, M., Williams, C.: An isotropic Gaussian mixture can have more modes than components. Report EDI-INF-RR-0185, School of Informatics, University of Edinburgh, Scotland (2003)
 6. Chen, C., Edelsbrunner, H.: Diffusion runs low on persistence fast. In: Proceedings of 13th International Conference on Computer Vision, pp. 423–430 (2011)
 7. Damon, J.: Local Morse theory for solutions to the heat equation and Gaussian blurring. J. Diff. Equ. 115, 368–401 (1995)
 8. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. I. Wiley, New York (1950)
 9. Koenderink, J.J.: The structure of images. Biol. Cybern. 50, 363–370 (1984)
 10. Kuijper, A., Florack, L.M.J.: The application of catastrophe theory to image analysis. Rept. UU-CS-2001-23, Department of Computer Science, Utrecht University, Utrecht (2001)
 11. Kuijper, A., Florack, L.M.J.: The relevance of non-generic events in scale space models. Int. J. Comput. Vis. 57, 67–84 (2004)
 12. Lifshitz, L.M., Pizer, S.M.: A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Trans. Pattern Anal. Mach. Intell. 12, 529–540 (1990)
 13. Lindeberg, T.: Scale-Space Theory in Computer Vision. Kluwer, Dordrecht (1994)
 14. Rieger, J.: Generic evolutions of edges on families of diffused grey-value surfaces. J. Math. Imaging Vis. 5, 207–217 (1995)
 15. Roberts, S.J.: Parametric and non-parametric unsupervised cluster analysis. Pattern Recognit. 30, 261–272 (1997)
 16. Silverman, B.W.: Using kernel density estimates to investigate multimodality. J. Royal Stat. Soc. B 43, 97–99 (1981)
 17. Witkin, A.P.: Scale-space filtering. In: Proceedings of 8th International Joint Conference on Artificial Intelligence, pp. 1019–1022 (1983)
 18. Yuille, A.L., Poggio, T.A.: Scaling theorems for zero crossings. IEEE Trans. Pattern Anal. Mach. Intell. 8, 15–25 (1986)