On the condition number of Vandermonde matrices with pairs of nearly-colliding nodes

We prove upper and lower bounds for the spectral condition number of rectangular Vandermonde matrices with nodes on the complex unit circle. The nodes are "off the grid", pairs of nodes nearly collide, and the studied condition number grows linearly with the inverse separation distance. We provide reasonably sharp constants that are independent of the number of nodes as long as non-colliding nodes are well-separated.


Stefan Kunis (skunis@uos.de), Dominik Nagel (dnagel@uos.de)
Institute of Mathematics and Research Center of Cellular Nanoanalytics, Osnabrück University, Osnabrück, Germany

Introduction
Vandermonde matrices with complex nodes appear in polynomial interpolation problems and many other fields of mathematics (see, e.g., the introduction of [2] and its references). In this paper, we are interested in rectangular Vandermonde matrices with nodes on the complex unit circle and with a large polynomial degree. These matrices generalize the classical discrete Fourier matrices to non-equispaced nodes and the involved polynomial degree is also called bandwidth. The condition number of those matrices has recently become important in the context of stability analysis of super-resolution algorithms like Prony's method [6,15], the matrix pencil method [12,18], the ESPRIT algorithm [20,21], and the MUSIC algorithm [17,22]. If the nodes of such a Vandermonde matrix are all well-separated, with minimal separation distance greater than the inverse bandwidth, bounds on the condition number are established for example in [2,5,14,18].
If nodes are nearly colliding, i.e., their distance is smaller than the inverse bandwidth, the behavior of the condition number is not yet fully understood. The seminal paper [9] coined the term (inverse) super-resolution factor for the product of the bandwidth and the separation distance of the nodes. For M nodes on a grid, the results in [7,9] imply that the condition number grows like the super-resolution factor raised to the power of M − 1 if all nodes nearly collide. More recently, the practically relevant situation of groups of nearly colliding nodes was studied in [1,4,16,19]. In different setups and oversimplifying a bit, all of these refinements are able to replace the exponent M − 1 by the smaller number m − 1, where m denotes the number of nodes that are in the largest group of nearly colliding nodes. The authors of [1,19] focus on quite specific quantities in an optimization approach and in the so-called Prony mapping, respectively. In contrast, the condition number or the relevant smallest singular value of Vandermonde matrices with "off the grid" nodes on the unit circle is studied in [4,16]. While [4] provided the exponent m − 1 for the first time, the proof technique leads to quite pessimistic constants and, more restrictively, requires all nodes (including the well-separated ones) to lie within a tiny arc of the unit circle. More recently, the second version of [16] provided a quite general framework and reasonably sharp constants, but involves a technical condition which prevents the separation distance from going to zero for a fixed number of nodes and a fixed bandwidth.
Here, we present upper and lower bounds for the condition number of Vandermonde matrices with pairs of nearly colliding nodes, i.e., the special case m = 2. We achieve the expected linear order and all constants are reasonably sharp and absolute. In contrast to the more general quoted results [4,16], the nodes can be placed on the full unit circle and the separation distance is allowed to approach zero. Our mild technical conditions, which seem to be artifacts of our proof technique, are: (i) A logarithmic growth in the separation distance of the well-separated nodes (which can be dropped at a price of a larger constant for the condition number estimate), (ii) A uniformity condition that colliding nodes behave similarly (they have the same separation distance up to a predefined constant), and (iii) An a priori upper bound on the separation distance of the colliding nodes.
The outline of this paper is as follows: Section 2 fixes the notation, recalls results for the case of well-separated nodes, and provides lower bounds for the condition number. In Section 3, we establish upper bounds for nodes that are well-separated from each other except for one pair of nodes that is nearly colliding. Section 4 goes one step further and studies the more general case where an arbitrary number of pairs of nodes nearly collide. Theoretical and numerical comparisons with [3,4,8,16] can be found at the end of Section 4 and in Section 5.

Preliminaries
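Let $\mathbb{T} := \{z \in \mathbb{C} : |z| = 1\}$ be the complex torus and nodes $\{z_1, \dots, z_M\} \subset \mathbb{T}$ be parametrized by $z_j = e^{-2\pi i t_j}$, $j = 1, \dots, M$, such that $t_1 < \dots < t_M \in [0, 1)$. We fix a degree $n \in \mathbb{N}$ so that $N := 2n+1 > M$ and set up the rectangular Vandermonde matrix:

$$A := \left(z_j^k\right)_{j = 1, \dots, M;\ |k| \le n} \in \mathbb{C}^{M \times N}. \tag{2.1}$$

The Dirichlet kernel $D_n : \mathbb{R} \to \mathbb{R}$ is given by:

$$D_n(t) := \sum_{k=-n}^{n} e^{2\pi i k t} = \begin{cases} \dfrac{\sin(N\pi t)}{\sin(\pi t)}, & t \notin \mathbb{Z}, \\ N, & t \in \mathbb{Z}, \end{cases} \tag{2.2}$$

so that $K := AA^* = \left(D_n(t_i - t_j)\right)_{i,j=1}^{M}$. The matrix $K$ is symmetric positive definite and the spectral condition number is finite since all nodes are distinct (here and throughout the paper $\|K\| := \sup\{\|Kx\| : \|x\| = 1\}$ with $\|x\|^2 := \sum_k |x_k|^2$). On the other hand, if two nodes are equal, then two rows of $A$ are the same and by continuity the condition number diverges if two nodes collide. The (wrap around) distance of two nodes is given by:

$$|t_j - t_\ell|_{\mathbb{T}} := \min_{r \in \mathbb{Z}} |t_j - t_\ell + r|$$

and we introduce the normalized separation distance of the node set as:

$$\tau := N \min_{j \ne \ell} |t_j - t_\ell|_{\mathbb{T}}.$$

We call the case $\tau = 1$ critical separation, i.e., $\min_{j \ne \ell} |t_j - t_\ell|_{\mathbb{T}} = 1/N$, and the cases $\tau \le 1$ and $\tau > 1$ nearly colliding and well-separated, respectively. Figure 1 illustrates the situation for 4 nodes on the unit circle. The parameter $\rho_{\min}$ describes a minimum separation distance of involved non-colliding nodes assumed in the theorems.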
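The objects of this section can be set up numerically in a few lines. The following is a minimal sketch, assuming the row/column convention $A = (z_j^k)$ with rows indexed by the nodes and columns by $|k| \le n$, so that $AA^*$ has the entries $D_n(t_i - t_j)$; the helper names `vandermonde`, `dirichlet`, and `normalized_separation` and the node values are illustrative and not taken from the paper.

```python
import numpy as np

def vandermonde(t, n):
    """Rows indexed by nodes t_j, columns by k = -n..n (assumed convention)."""
    t = np.asarray(t, dtype=float)
    k = np.arange(-n, n + 1)
    return np.exp(-2j * np.pi * np.outer(t, k))  # z_j^k with z_j = exp(-2*pi*i*t_j)

def dirichlet(t, n):
    """D_n(t) = sin(N*pi*t) / sin(pi*t) with N = 2n+1, and D_n(t) = N at integers."""
    t = np.asarray(t, dtype=float)
    N = 2 * n + 1
    s = np.sin(np.pi * t)
    at_integer = np.abs(s) < 1e-14
    safe = np.where(at_integer, 1.0, s)          # avoid division by zero
    return np.where(at_integer, float(N), np.sin(N * np.pi * t) / safe)

def normalized_separation(t, n):
    """tau = N * (smallest wrap-around distance between two distinct nodes)."""
    t = np.sort(np.asarray(t, dtype=float))
    gaps = np.diff(np.append(t, t[0] + 1.0))     # includes the wrap-around gap
    return (2 * n + 1) * gaps.min()

n = 20                                           # N = 41
t = np.array([0.0, 0.01, 0.3, 0.55, 0.8])        # illustrative node set
A = vandermonde(t, n)
K = A @ A.conj().T                               # Gram matrix of the rows of A
K_kernel = dirichlet(t[:, None] - t[None, :], n) # same matrix via the kernel
tau = normalized_separation(t, n)                # below 1: one nearly colliding pair
cond_A = np.linalg.cond(A)                       # spectral condition number
```

Comparing `K` with `K_kernel` is a quick consistency check of the chosen convention, and `cond_A**2` agrees with the condition number of `K` up to rounding.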
A reasonable result for well-separated nodes is as follows.
Theorem 2.1 [2,18] Let $A$ be a Vandermonde matrix as in (2.1) with $\tau > 1$, then In particular, we have and this implies $\|K\| \le N + N/\tau$ and We note in passing that the above lower bound on the smallest singular value is an improvement of [18] by [2] and that [18] and [8] allow to replace $1/\tau$ in the upper and the lower bounds by $1/\tau - 1/N$, respectively. Moreover, we have the following lower bound on the condition number. This already shows that the upper bound for well-separated nodes is quite sharp and provides the benchmark for nearly colliding nodes.
[Figure legend: dotted, Theorem can be applied; filled, well-separated; lined, 3 nearly colliding nodes; empty areas, at most 2 nearly colliding nodes, but not covered by results.]
Proof Without loss of generality, let t 2 − t 1 = τ/N and consider the upper left 2 × 2 block in: We apply Lemma A.5, and get: and Lemma A.1 yields the assertion.
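The idea of looking at the upper left $2 \times 2$ block can be checked numerically. The sketch below assumes $K = AA^*$ with entries $D_n(t_i - t_j)$, so that this block equals $\begin{pmatrix} N & d \\ d & N \end{pmatrix}$ with $d = D_n(\tau/N)$ and has eigenvalues $N \pm d$. By Cauchy interlacing for principal submatrices of a Hermitian matrix, $\lambda_{\min}(K) \le N - d$ and $\lambda_{\max}(K) \ge N + d$. Whether Lemma A.5 is stated in exactly this form is an assumption here, and the node configuration is an illustrative choice.

```python
import numpy as np

n = 30
N = 2 * n + 1
tau, rho = 0.05, 6.0
# one nearly colliding pair, followed by nodes with spacing rho/N (illustrative)
t = np.concatenate(([0.0, tau / N], tau / N + rho / N * np.arange(1, 6)))
k = np.arange(-n, n + 1)
A = np.exp(-2j * np.pi * np.outer(t, k))
K = (A @ A.conj().T).real                   # entries D_n(t_i - t_j) are real

d = K[0, 1]                                 # D_n(tau/N)
block_eigs = np.linalg.eigvalsh(K[:2, :2])  # eigenvalues N - d and N + d
eigs = np.linalg.eigvalsh(K)                # ascending eigenvalues of K

lower_bound = np.sqrt((N + d) / (N - d))    # interlacing bound on cond(A)
cond_A = np.sqrt(eigs[-1] / eigs[0])
```

Since $N - d$ is of order $N\tau^2$ for small $\tau$, `lower_bound` grows like $1/\tau$, which matches the linear growth discussed in the text.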

Nodes with one nearly colliding pair
Definition 3.1 Let $M \ge 2$ and $0 = t_1 < \dots < t_M \in [0, 1)$ such that: then $\{t_1, \dots, t_M\}$ is called a set of nodes with one nearly colliding pair; see Fig. 2 for an illustration. Due to periodicity, the choice $t_1 = 0$ and $|t_1 - t_2|_{\mathbb{T}} = \tau/N$ is without loss of generality. Now, we establish an upper bound on the condition number of the Hermitian matrix $K$ by bounding $\|K\|$ directly and applying Lemma A.4 to $K^{-1}$ before bounding $\|K^{-1}\|$. For that, we introduce some abbreviating notation.
we have the partitioning: where $A_2$ is a Vandermonde matrix with nodes that are at least $\rho/N$ separated.

Lemma 3.3
Under the conditions of Definition 3.1 and for $\rho \ge 6$, we have:
Proof The key idea is to see the set of nodes as a union of two well-separated subsets and use the existing bounds for these. In contrast to the next section, here, one of the sets only consists of a single node. We start by noting that Theorem 2.1 and (3.1) Together with the decomposition (3.2), the triangle inequality, Lemma A.6, and Theorem 2.1, we obtain:

Lemma 3.4
Under the conditions of Definition 3.1 and with $b$ as in (3.1), we have: where $e_1 \in \mathbb{R}^{M-1}$ denotes the first unit vector and:
Proof The vector $b$ can be approximated by the first column of $K_2$ in the sense that $b = K_2 e_1 + r$, where $b = (D_n(t_2), \dots, D_n(t_M))^\top$ because $t_1 = 0$. We have $|r_1| = N - D_n(\tau/N)$ and for $j = 2, \dots, M-1$ the mean value theorem yields: Note that, in the worst case, half of the nodes can be as close as possible (under the assumed separation condition) to $t_2$ not only on its right but also on its left. Hence, for $j = 2, \dots, M/2$, $\xi_j \ge (j-1)\rho/N$ and Lemma A.1 lead to: Thus, for all nodes, we get:

Lemma 3.5
Under the conditions of Definition 3.1 and for $\rho \ge 5$, we have:
Proof We consider $K$ decomposed as in (3.2) and apply Lemma A.4 with respect to $K_2$ to obtain: and thus, First of all, we establish an upper bound for the norm of the triangular matrix. Equation (3.1) and Theorem 2.1 imply: Together with Lemma A.6, we obtain: The next step is to bound Applying the second part of Lemma 3.4, Lemma A.1, and Theorem 2.1 yields: For $\rho \ge 5$, the innermost bracketed term takes values in $(1, 1.4)$ such that the square bracketed term is positive. Forming the reciprocal gives the result, since Theorem 2.1 also implies:
Theorem 3.6 (Upper bound) Under the conditions of Definition 3.1 with $\rho \ge \rho_{\min} = 6$, we have:
Proof The bound follows from Lemmata 3.3 and 3.5 with $C(\rho) \le C(6) \le 6.5$.
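The Schur complement argument can be illustrated numerically. The sketch below assumes the partitioning $K = \begin{pmatrix} N & b^* \\ b & K_2 \end{pmatrix}$ with $K = AA^*$ and uses the standard block inversion identity, by which the $(1,1)$ entry of $K^{-1}$ equals $1/(N - b^* K_2^{-1} b)$; the exact statement of Lemma A.4 may differ. Writing $b = K_2 e_1 + r$ with $r_1 = D_n(\tau/N) - N$ gives $N - b^* K_2^{-1} b = 2\,(N - D_n(\tau/N)) - r^* K_2^{-1} r$, which is the quantity that has to be bounded from below. The node configuration is illustrative.

```python
import numpy as np

n = 40
N = 2 * n + 1
tau, rho = 0.1, 6.0
# t_1 = 0, t_2 = tau/N, remaining nodes with spacing rho/N (illustrative)
t = np.concatenate(([0.0, tau / N], tau / N + rho / N * np.arange(1, 8)))
k = np.arange(-n, n + 1)
A = np.exp(-2j * np.pi * np.outer(t, k))
K = (A @ A.conj().T).real                  # entries D_n(t_i - t_j)

b, K2 = K[1:, 0], K[1:, 1:]                # blocks of the assumed partitioning
schur = N - b @ np.linalg.solve(K2, b)     # Schur complement with respect to K2

e1 = np.zeros(len(b))
e1[0] = 1.0
r = b - K2 @ e1                            # remainder; r[0] = D_n(tau/N) - N
schur_via_r = 2 * (N - K[0, 1]) - r @ np.linalg.solve(K2, r)

K_inv = np.linalg.inv(K)                   # for comparison only
```

Here `K2` belongs to well-separated nodes and is well conditioned, so the small quantity `schur` carries the whole $\tau$-dependence of $K^{-1}$.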
Lower and upper bounds in Theorems 2.2 and 3.6 yield: The condition on $\rho$ implies that for specific configurations of $M$ nodes, our result becomes effective as early as $N \approx 6M$. This is in contrast to the results [4,16], where $N$ has to be much larger. 2. An additional minor improvement on $C(\rho)$ and on the range of admissible values for $\rho$ can be achieved when applying Lemma A.1 to two factors simultaneously.
Remark 3.8 (Generalizations and limitations) In principle, the suggested Schur complement technique can be generalized to more than two nodes colliding and also to the multivariate case:
(i) Let $M \ge 3$ and $0 = t_1 < \dots < t_M \in [0, 1)$ be such that $\{t_1, t_2, t_3\}$ nearly collide and decompose: While it is clear that the Schur complement $K_1 - B^* K_2^{-1} B$ is strictly positive definite, establishing a lower bound on its smallest singular value similar to the proofs of Lemmata 3.4 and 3.5 seems considerably harder. Already, the linear approximation in Lemma 3.4 then needs to be replaced by a higher order approximation for the matrix $B$.
(ii) Consider the bivariate case and the Vandermonde matrix: The distance of the nodes $t_j = (u_j, v_j) \in [0, 1)^2$ is measured by $\|t_j - t_\ell\|_{\mathbb{T}} := \min_{r \in \mathbb{Z}^2} \|t_j - t_\ell + r\|_\infty$ and we consider the situation as in Definitions 3.1 and 3.2 with $K = AA^*$. Lemma 3.4 can be proven using the bivariate mean value theorem to get $|r_j| \le N\tau\pi/\|\xi_j\|_{\mathbb{T}}$, $j = 2, 3, \dots, M$, and the packing argument [14, Lem. 4.5] to get: We need additional assumptions for Lemma 3.5 to work since results for general well-separated nodes, cf. [15], seem to be too weak. If the nodes $t_2, \dots, t_M$ are a subset of equispaced nodes in $\mathbb{T}^2$, then [14, Cor. 4.11]

Pairs of nearly colliding nodes
We now study the situation in which the Vandermonde matrix comes from pairs of nearly colliding nodes.
Numerical Algorithms
Definition 4.1 Let $n \in \mathbb{N}$, $N = 2n + 1$, $c \ge 1$ and let $t_1 < \dots < t_{M/2} \in [0, 1)$ and $t_{M/2+1} < \dots < t_M \in [0, 1)$ for $M \ge 4$ even such that: then $\{t_1, \dots, t_M\}$ is called a set of nodes with pairs of nearly colliding nodes (see Fig. 4 for an illustration). The constant $c$ measures the uniformity of the colliding nodes. For subsequent use, we additionally introduce the following wrap around distance of indices $|j - \ell| := \min_{r \in \mathbb{Z}} |j - \ell + r\,M/2|$ with respect to $M/2$.

Definition 4.2
We define: Note that under the assumptions in Definition 4.1 the Vandermonde matrices $A_1$ and $A_2$ each correspond to nodes that are at least $\rho/N$-separated.
The proof technique we use is analogous to the one we used in the case of two nearly colliding nodes. The difference is that we have a matrix $K_1$ instead of a scalar and the block $B$ is a matrix instead of a vector. Subsequently, Lemma 4.3 establishes an upper bound on $\|K\|$ and Lemmata 4.4, 4.5, and 4.6 establish an upper bound on $\|K^{-1}\|$.

Proof Similar to Lemma 3.3, we start by noting that $\|B\|^2 \le \|K_1\| \|K_2\|$. Together with the decomposition (4.1), the triangle inequality, Lemma A.6, and Theorem 2.1, this leads to:
Proof The Dirichlet kernel $D_n$ is monotone decreasing on $[0, 1/N]$. Hence, for the diagonal entries, we obtain: The off-diagonal entries are bounded by the mean value theorem and Lemma A.1 as: Additionally, we set $(\tilde{R}_1)_{jj} := N - D_n(c\tau/N)$. We bound the spectral norm of $R_1$ by the one of the real symmetric matrix $\tilde{R}_1$ using Lemma A.2 and proceed by: from which the assertion follows.
Proof First, note that: Monotonicity of the Dirichlet kernel $D_n$ on $t \in [0, 1/N]$ gives: For each fixed off-diagonal entry $j \ne \ell$, the matrix $2NI$ has no contribution. We write the node $t_{j+M/2}$ as a perturbation of $t_j$ by $h_j := t_{j+M/2} - t_j$ and expand the Dirichlet kernel by its Taylor polynomial of degree 2 in the point , the constant term, as well as the linear term, cancels out and we get: Lemma A.1 and $\xi_1, \dots, \xi_4 \ge |j - \ell|\,\rho/N$ imply: The matrix $2NI + R_1^* + R_2$ is real symmetric so that: and therefore the result holds.
is positive, we have:

Figure 5 visualizes the values of the constant $C(\tau, \rho, c, M)$ with respect to $\rho$ and $\tau$. Please note that (i) increasing the constant $c$ by a factor $\sqrt{2}$ has to be compensated approximately by halving $\tau$ and doubling $\rho$ and (ii) increasing the number of nodes $M$ from 4 to 64 has to be compensated approximately by tripling $\rho$.
Proof We proceed analogously to Lemma 3.5 and apply Lemma A.4 to the matrix $K$ decomposed as in (4.1) and obtain: (4.2) Definition 4.2 and Theorem 2.1 yield: together with Lemma A.6, we obtain: Now, we estimate $\|(K_2 - B K_1^{-1} B^*)^{-1}\|$, which is done by the following steps:
(i) First, note that $I - A_1^\dagger A_1$ is an orthogonal projector and thus Theorem 2.1 implies: We apply Lemma A.3 with $\eta = 2N$, use the identities $R_1 = B - K_1$ and $R_2 = B - K_2$, apply the triangle inequality, and the sub-multiplicativity of the matrix norm to get:
(ii) Lemma 4.5 leads to:
(iii) We apply Theorem 2.1 and Lemma 4.4 to get:
(iv) We use the estimates for the Dirichlet kernel $N - D_n(\tau/N) \ge N\tau^2$ in (ii) and $N - D_n(c\tau/N) \le N \frac{\pi^2}{6} c^2 \tau^2$ in (iii) (see Lemma A.1), and insert this in (4.3) to get finally: This upper bound also bounds the maximum in (4.2) since for all $\tau \le 1/2$ and $\rho \ge 2$ together with Theorem 2.1
Proof In Lemma 4.6, the constant $C(\tau, \rho, c, M)$ is monotone increasing in $\tau$ and monotone decreasing in $\rho$. Hence, after plugging in the bounds for $\tau$ and $\rho$ in our assumptions, it is easy to see that the constant $C\left(\frac{1}{4c^2}, 10c^2\left(\log\frac{M}{4} + 1\right), c, M\right)$ is monotone decreasing in $c$ and $M$, respectively. Therefore, we get $C(\tau, \rho, c, M) \le C(1/4, 10, 1, 4) \le 11.3$, so that $\|K^{-1}\| \le 11.3\,N^{-1}\tau^{-2}$. Together with the bound $\|K\| \le 22N/10 = 2.2N$ from Lemma 4.3, we obtain the result.
If each pair of nearly colliding nodes has the same separation distance, i.e., c = 1, we can improve the upper bound in the sense that restrictions on τ except for τ ≤ 1 can be dropped. In order to obtain the same constant, we have to increase the restrictions on ρ slightly.
Proof The proof is analogous to that of Lemma 4.6; the only difference is in step (iv). Setting $c = 1$ in (ii) and (iii), expanding the squared bracket in (iii) and inserting this into (4.3) leads to: In three summands, we can factor out $N - D_n(\tau/N)$ and use the estimate $N - D_n(\tau/N) \ge N\tau^2$, leading to a larger bound after inverting the expression in the end. Afterwards, in the third summand $N - D_n(\tau/N)$ is left, for which we use the rough bound $N - D_n(\tau/N) \le N$. In the fourth summand, we use $\tau \le 1$ for the single $\tau$. The same argument as in (3.4) shows that this also bounds the maximum in (4.2) and we get the result.
The lower bound is tight and the numerical value 5 in the upper bound follows from our proof technique and can be improved (see Fig. 6). The uniformity condition $\tau \le 1/(4c^2)$ is artificial and, except for the special cases in Theorems 3.6 and 4.9, prevents letting $\tau \to 1$. Moreover, the technical condition $\rho \ge \rho_{\min} = 25\left(\log\frac{M}{4} + 1\right)$ in Theorem 4.7 is due to the slow decay of the Dirichlet kernel and can be weakened by a preconditioning technique, which however leads to a somewhat larger constant in the final result (we thank one of the peer reviewers for this clever hint). The diagonal matrix $D = \operatorname{diag}(1 - |k|/(n+1))_{|k| \le n} \in \mathbb{C}^{N \times N}$ is positive definite with $\|D\| \le 1$ and thus the Rayleigh-Ritz characterization of the smallest eigenvalue for Hermitian matrices leads to: The entries of the matrix $ADA^*$ consist of Fejér kernel evaluations and analogously to Lemmata 4.4, 4.5, and 4.6 this yields (independently of $M$): 16c 2 π 4 + 2c 2 π 6 45ρ 4 − 3ρ 2 3ρ 2 − π 2 2c 2 π 2 9 τ + 2cπ 3 3ρ 2 + 9.68c ρ 3 2 .
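The preconditioning idea can be made concrete as follows. This is a minimal sketch, assuming the convention $A = (z_j^k)$ with columns indexed by $|k| \le n$ and using the classical closed form of the Fejér kernel, $\sum_{|k| \le n} \left(1 - \frac{|k|}{n+1}\right) e^{2\pi i k t} = \frac{1}{n+1} \left(\frac{\sin((n+1)\pi t)}{\sin(\pi t)}\right)^2$; the function name `fejer` and the node values are illustrative. Because the weights lie in $(0, 1]$, the matrix $ADA^*$ is dominated by $AA^*$ in the Loewner order, so its smallest eigenvalue is a lower bound for the one of $K$.

```python
import numpy as np

n = 25
k = np.arange(-n, n + 1)
t = np.array([0.0, 0.004, 0.2, 0.45, 0.7])   # illustrative nodes, one close pair
A = np.exp(-2j * np.pi * np.outer(t, k))
D = np.diag(1.0 - np.abs(k) / (n + 1))       # triangular weights in (0, 1]

def fejer(t, n):
    """Closed form of sum_{|k|<=n} (1 - |k|/(n+1)) exp(2*pi*i*k*t)."""
    t = np.asarray(t, dtype=float)
    s = np.sin(np.pi * t)
    at_zero = np.abs(s) < 1e-14
    ratio = np.sin((n + 1) * np.pi * t) / np.where(at_zero, 1.0, s)
    return np.where(at_zero, n + 1.0, ratio**2 / (n + 1))

P = A @ D @ A.conj().T                       # preconditioned Gram matrix
P_kernel = fejer(t[:, None] - t[None, :], n) # same matrix via the Fejer kernel
lam_min_K = np.linalg.eigvalsh(A @ A.conj().T)[0]
lam_min_P = np.linalg.eigvalsh(P)[0]         # never exceeds lam_min_K
```

The faster decay of the Fejér kernel away from zero is what allows weaker separation conditions on the non-colliding nodes, at the price of the smaller eigenvalue `lam_min_P`.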
The absolute constant 5 in the upper bound of the condition number (or $\tau\sqrt{N}\,\|A^\dagger\| \le \sqrt{11.3} \approx 3.4$) follows from our proof technique and we give a numerical comparison to the approaches [3,4,8,16] in Fig. 7. A short theoretical comparison including different assumptions on $N$, $M$, $\tau$, and $\rho$ is given below.
(4.4) is in fact an a priori lower bound on $\tau$ which prevents $\tau \to 0$ already for moderate fixed $M \ge 3$. Recently, we refined this approach in [13], dropped the mentioned dependencies on $M$ and could weaken the condition (4.4) considerably.
Remark 4.13 (Comparison to [8]) This approach deals with pairs of nearly colliding nodes but differs completely from ours and the ones in [3,4,16], and rather generalizes the construction of certain extremal functions in [18] to pairs of nearly colliding nodes and subsets of them. The proven constant in the upper bound given in [8, Cor. 4.2] is $\tau\sqrt{N}\,\|A^\dagger\| \le 9\sqrt{6}/\pi \approx 7.0$ and thus is slightly larger than ours ($\sqrt{11.3} \approx 3.4$). Using the stronger assumption on $\tau$ from our setting in the proof of [8, Thm. 3.6] and improving estimates in [8, Eq. (8)] provides the best result ($\approx 1.7$) for pairs of nearly colliding nodes. The conditions $\tau \le 1$ and $3 \le \rho$ are quasi-optimal. Provided all technical results prove right, this approach is superior.
Tracing back all constants in lemmata and proofs for the case of pairwise nearly colliding nodes, we obtain the uniform off-diagonal estimate: which yields a constant "multiplicative perturbation" in [3, Lem. 5.1] and thus a condition number estimate like Theorem 4.7 or [8] only if $\tau \le C_1/M$ and $C_2 M \le \rho$, for some constants $C_1$, $C_2$. However, note that for two nearly colliding pairs $u_1 < u_2$, $v_1 < v_2$, a direct computation (avoiding a so-called limit basis used in [3]) yields the off-diagonal estimate: Together with ρ ≥ 27 23 · 232(log M 4 + 1) and Lemma A.6, this gives: The Courant-Fischer min-max theorem [11, Thm. 4.2.6] and Weyl's perturbation theorem [11, Thm. 4.3.1] finally yield: Altogether, the improved variant of this technique can be used for nearly colliding pairs, but leads to a stronger assumption on $\rho$ for all moderate uniformity constants $c$.

Numerical examples
All computations were carried out using MATLAB R2019b. As a test for the bounds in the case of one pair of nearly colliding nodes, we use the following configuration. Let the number of nodes $M = 20$ and $M = 200$ be fixed, respectively. Moreover, we choose $N = 1 + 12(M - 1)$, which ensures that all nodes fit on the unit interval. We choose $\tau \in [10^{-11}, 1]$ logarithmically uniformly at random and $\rho_3, \dots, \rho_M \in [6, 12]$ uniformly at random. Then, we set the nodes $t_1 < \dots < t_M \in [0, 1)$ such that $t_1 = 0$, $t_2 = \tau/N$ and for $j = 3, \dots, M$, $t_j - t_{j-1} = \rho_j/N$. Afterwards, the condition number of the corresponding Vandermonde matrix is computed. This procedure is repeated 100 times and the results are presented in Fig. 6 (left).
For pairs of nearly colliding nodes, we use the following configuration. Let the number of nodes $M = 20$ and $M = 200$ be fixed, respectively. Moreover, we choose the parameter $c = 2$ and $\tau_{\max}$ and $\rho_{\min}$ as in Theorem 4.7. To ensure that all nodes fit on the unit interval, we choose $N$ as the smallest odd integer greater than $(c\tau_{\max} + 2\rho_{\min})M/2$. Then, we choose $\tau \in [10^{-11}, 1]$ logarithmically uniformly at random and set the nodes $t_1 < \dots < t_M \in [0, 1)$ such that $t_1 = 0$, $t_2 = \tau/N$ and for and $\rho_j \in [\rho_{\min}, 2\rho_{\min}]$ are picked uniformly at random, respectively. Afterwards, the condition number of the corresponding Vandermonde matrix is computed. This procedure is repeated 100 times and the results are presented in Fig. 6 (right). Note that Theorem 4.7 makes the restriction $\tau \le \tau_{\max} = \frac{1}{4}$, which seems to be an artifact of our proof technique.
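The first experiment (one nearly colliding pair) can be re-created along the following lines. This is a sketch in Python with NumPy rather than the MATLAB code used for the paper; the function name `one_pair_experiment`, the seed, and the reduced number of trials are illustrative assumptions, and the matrix convention $A = (z_j^k)$ with $|k| \le n$ is assumed as well. The condition number is taken from the singular values of $A$ directly instead of the eigenvalues of $K = AA^*$, because forming $K$ would square the condition number and lose accuracy for very small $\tau$.

```python
import numpy as np

def one_pair_experiment(M, trials=20, seed=0):
    """Return rows (tau, cond(A)) for random node sets with one colliding pair."""
    rng = np.random.default_rng(seed)
    N = 1 + 12 * (M - 1)                          # odd, as chosen in the text
    n = (N - 1) // 2
    k = np.arange(-n, n + 1)
    results = []
    for _ in range(trials):
        tau = 10.0 ** rng.uniform(-11.0, 0.0)     # log-uniform in [1e-11, 1]
        rho = rng.uniform(6.0, 12.0, size=M - 2)  # rho_3, ..., rho_M
        # t_1 = 0, t_2 = tau/N, t_j - t_{j-1} = rho_j/N for j >= 3
        t = np.concatenate(([0.0], np.cumsum(np.concatenate(([tau], rho))) / N))
        A = np.exp(-2j * np.pi * np.outer(t, k))
        s = np.linalg.svd(A, compute_uv=False)    # singular values, descending
        results.append((tau, s[0] / s[-1]))
    return np.array(results)

res = one_pair_experiment(M=20)
scaled = res[:, 0] * res[:, 1]                    # tau * cond(A)
```

Plotting `res[:, 1]` against `res[:, 0]` on doubly logarithmic axes should show the linear growth in $1/\tau$; equivalently, the values in `scaled` stay within a bounded range.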
In order to compare Theorem 4.7 with the results from [4, Cor. 3.6], we need to satisfy the assumptions of both results. We take $M = 3$ nodes with two nodes nearly colliding, i.e., $t_1 = 0$, $t_2 = \tau/N$ and $t_3 = t_2 + \rho/N$. The assumptions in [4, Cor. 3.6] make it necessary that the nodes lie on an interval of length $\frac{1}{2M^2} = \frac{1}{18}$. We choose the parameter $c = 1$, $\rho_{\min} = 12$, and $N = 1001$. Then, we pick $\tau \in [10^{-11}, 1]$ logarithmically uniformly at random and $\rho \in \left[\rho_{\min}, \frac{N}{2M^2} - \tau\right]$ uniformly at random. Afterwards, the inverse of the smallest singular value (norm of Moore-Penrose pseudo inverse) of the corresponding Vandermonde matrix is computed. This procedure is repeated 100 times and the results normalized by $\tau\sqrt{N}$ are presented in Fig. 7 (left). From [4, Cor. 3.6], we get: whereas Theorem 4.7 provides again $\|A^\dagger\| \le 3.4 \cdot \frac{1}{\tau\sqrt{N}}$ for $\tau \le \frac{1}{4}$. We note that our bound remains valid for $c > 1$ but the restriction on $\tau$ becomes more severe.

Summary
We proved upper and lower bounds for the spectral condition number of rectangular Vandermonde matrices with nodes on the complex unit circle. If pairs of nodes nearly collide, the studied condition number grows linearly with the inverse separation distance. In contrast to the more general results [4,16], we provide reasonably sharp and absolute constants but have to admit that our technique most likely will not generalize to more than two nodes nearly colliding. Note that our easily obtained lower bound seems to capture the situation more accurately than the upper bound. We posed mild technical conditions in our proofs, whose necessity is not confirmed by the numerical experiments. While [4] provided the right growth order for the first time, some of the imposed conditions are very restrictive and the involved constants are quite pessimistic. The second version of [16] provided a quite general framework and presented decent results with only a mild artificial growth of the condition number with respect to the number of nodes. Moreover, a technical condition there prevents the separation distance from going to zero for a fixed number of nodes and a fixed bandwidth. We believe that both problems can be fixed at least partially and thus [16] seems to be a good framework for understanding node configurations with nearly colliding nodes. Recently, the manuscript [8] came to our attention. It considers pairs of nearly colliding nodes, weakens the assumptions considerably, and gives, after modifications, stronger bounds on the smallest singular value. The approach taken there differs completely from ours and the ones in [4,16], and rather generalizes the construction of [18] to pairs of nearly colliding nodes.
Proof Due to symmetry, it suffices to prove all bounds for $t > 0$ and we use the explicit expression of the Dirichlet kernel in (2.2). The lower bound on $D_n(t)$ can be derived from the inequalities $x - x^3/6 \le \sin(x) \le x$, which hold for all $x \in [0, \pi]$. The left inequality with $x = N\pi t$ and the right inequality with $x = \pi t$ lead to:

$$\sin(N\pi t) \ge \left(N - \frac{\pi^2}{6} N^3 t^2\right) \pi t \ge \left(N - \frac{\pi^2}{6} N^3 t^2\right) \sin(\pi t).$$
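The two Dirichlet kernel estimates that are used most often in the main text can be sanity-checked numerically. With $t = \tau/N$, the lower bound on $D_n(t)$ derived above reads $N - D_n(\tau/N) \le N \frac{\pi^2}{6} \tau^2$, and step (iv) in the proof of Lemma 4.6 additionally quotes $N - D_n(\tau/N) \ge N\tau^2$. The following sketch checks both on a grid of $\tau \in (0, 1]$ for a few odd $N$; the helper name `dirichlet_at`, the grid, and the tolerance are illustrative choices.

```python
import numpy as np

def dirichlet_at(tau, N):
    """D_n(tau/N) = sin(pi*tau) / sin(pi*tau/N) for 0 < tau <= 1 and odd N."""
    return np.sin(np.pi * tau) / np.sin(np.pi * tau / N)

tau = np.linspace(0.01, 1.0, 100)
gaps = {N: N - dirichlet_at(tau, N) for N in (3, 11, 101, 1001)}

# N - D_n(tau/N) <= N * pi^2 * tau^2 / 6 (from the lower bound on D_n)
upper_ok = all(np.all(g <= N * np.pi**2 * tau**2 / 6 + 1e-9)
               for N, g in gaps.items())
# N - D_n(tau/N) >= N * tau^2 (estimate quoted in the proof of Lemma 4.6)
lower_ok = all(np.all(g >= N * tau**2 - 1e-9)
               for N, g in gaps.items())
```

Both flags only confirm the estimates on the sampled grid; they do not replace the proof.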
The upper bound on $D_n(t)$ can be derived from the inequality $\cos(\alpha x) \le \cos(x)$ that holds for all $x \in [0, \pi/2]$ and $\alpha > 1$ such that $\alpha x \in [0, \pi/2]$. Integrating this inequality, choosing $\alpha = N/2$ and $x = \pi t$, and applying the double angle formula yields: sin(Nπ t) Reordering the inequality and applying that $\cos(x) \le 1 - 4x^2/\pi^2$ for all $x \in [0, \pi/2]$ yields: sin(Nπ t) sin(π t) ≤ N cos Finally, the remaining bounds on the absolute values can be proven by calculating the first and second derivatives and using $\sin(x) \ge 2x/\pi$ and $\cot(x) \le 1/x$ that hold for all $x \in (0, \pi/2]$.
Proof We directly show the result by: Note that similar estimates can be found for the Frobenius norm in [11, p. 520].

Lemma A.3 (Norm of matrix inverse)
Let $M \in \mathbb{C}^{n \times n}$ be Hermitian and positive definite and $I \in \mathbb{C}^{n \times n}$ the identity matrix. Let $\eta \in \mathbb{R}$ be a parameter satisfying $\eta > \|M\|$, then: Proof Since $M$ is positive definite, let its real, positive eigenvalues be given by $\lambda_1(M) \ge \dots \ge \lambda_n(M) > 0$. By assumption $\eta > \|M\| = \lambda_{\max}(M)$ and