Abstract
We exhibit a randomized algorithm which, given a square matrix \(A\in \mathbb {C}^{n\times n}\) with \(\Vert A\Vert \le 1\) and \(\delta >0\), computes with high probability an invertible V and diagonal D such that \( \Vert A-VDV^{-1}\Vert \le \delta \) using \(O(T_\mathsf {MM}(n)\log ^2(n/\delta ))\) arithmetic operations, in finite arithmetic with \(O(\log ^4(n/\delta )\log n)\) bits of precision. The computed similarity V additionally satisfies \(\Vert V\Vert \Vert V^{-1}\Vert \le O(n^{2.5}/\delta )\). Here \(T_\mathsf {MM}(n)\) is the number of arithmetic operations required to multiply two \(n\times n\) complex matrices numerically stably, known to satisfy \(T_\mathsf {MM}(n)=O(n^{\omega +\eta })\) for every \(\eta >0\) where \(\omega \) is the exponent of matrix multiplication (Demmel et al. in Numer Math 108(1):59–91, 2007). The algorithm is a variant of the spectral bisection algorithm in numerical linear algebra (Beavers Jr. and Denman in Numer Math 21(1-2):143–169, 1974) with a crucial Gaussian perturbation preprocessing step. Our result significantly improves the previously best-known provable running times of \(O(n^{10}/\delta ^2)\) arithmetic operations for diagonalization of general matrices (Armentano et al. in J Eur Math Soc 20(6):1375–1437, 2018) and (with regard to the dependence on n) \(O(n^3)\) arithmetic operations for Hermitian matrices (Dekker and Traub in Linear Algebra Appl 4:137–154, 1971). It is the first algorithm to achieve nearly matrix multiplication time for diagonalization in any model of computation (real arithmetic, rational arithmetic, or finite arithmetic), thereby matching the complexity of other dense linear algebra operations such as inversion and QR factorization up to polylogarithmic factors. The proof rests on two new ingredients. (1) We show that adding a small complex Gaussian perturbation to any matrix splits its pseudospectrum into n small well-separated components. 
In particular, this implies that the eigenvalues of the perturbed matrix have a large minimum gap, a property of independent interest in random matrix theory. (2) We give a rigorous analysis of Roberts’ Newton iteration method (Roberts in Int J Control 32(4):677–687, 1980) for computing the sign function of a matrix in finite arithmetic, itself an open problem in numerical analysis since at least 1986.
1 Introduction
We study the algorithmic problem of approximately finding all of the eigenvalues and eigenvectors of a given arbitrary \(n\times n\) complex matrix. While this problem is quite well-understood in the special case of Hermitian matrices (see, e.g., [52]), the general non-Hermitian case has remained mysterious from a theoretical standpoint even after several decades of research. In particular, the currently best-known provable algorithms for this problem run in time \(O(n^{10}/\delta ^2)\) [2] or \(O(n^c\log (1/\delta ))\) [17] with \(c\ge 12\) where \(\delta >0\) is the desired accuracy, depending on the model of computation and notion of approximation considered. To be sure, the non-Hermitian case is well-motivated: coupled systems of differential equations, linear dynamical systems in control theory, transfer operators in mathematical physics, and the nonbacktracking matrix in spectral graph theory are but a few situations where finding the eigenvalues and eigenvectors of a non-Hermitian matrix is important.
The key difficulties in dealing with non-normal matrices are the interrelated phenomena of non-orthogonal eigenvectors and spectral instability, the latter referring to extreme sensitivity of the eigenvalues and invariant subspaces to perturbations of the matrix. Non-orthogonality slows down convergence of standard algorithms such as the power method, and spectral instability can force the use of very high precision arithmetic, also leading to slower algorithms. Both phenomena together make it difficult to reduce the eigenproblem to a subproblem by “removing” an eigenvector or invariant subspace, since this can only be done approximately and one must control the spectral stability of the subproblem in order to be able to rigorously reason about it.
In this paper, we overcome these difficulties by identifying and leveraging a phenomenon we refer to as pseudospectral shattering: adding a small complex Gaussian perturbation to any matrix typically yields a matrix with well-conditioned eigenvectors and a large minimum gap between the eigenvalues, implying spectral stability. Previously, even the existence of such a regularizing perturbation with favorable parameters was not known [20]. This result builds on the recent solution of Davies’ conjecture [9] and is of independent interest in random matrix theory, where minimum eigenvalue gap bounds in the non-Hermitian case were previously only known for i.i.d. models [33, 55].
We complement the above by proving that a variant of the well-known spectral bisection algorithm in numerical linear algebra [11] is both fast and numerically stable when run on a pseudospectrally shattered matrix—we call an iterative algorithm numerically stable if it can be implemented using finite precision arithmetic with polylogarithmically many bits, corresponding to a dynamical system whose trajectory to the approximate solution is robust to adversarial noise (see, e.g. [57]). The key step in the bisection algorithm is computing the sign function of a matrix, a problem of independent interest in many areas such as control theory and approximation theory [44]. Our main algorithmic contribution is a rigorous analysis of the well-known Newton iteration method [53] for computing the sign function in finite arithmetic, showing that it converges quickly and numerically stably on matrices for which the sign function is well-conditioned, in particular on pseudospectrally shattered ones.
The end result is an algorithm which reduces the general diagonalization problem to a polylogarithmic (in the desired accuracy and dimension n) number of invocations of standard numerical linear algebra routines (multiplication, inversion, and QR factorization), each of which is reducible to matrix multiplication [22], yielding a nearly matrix multiplication runtime for the whole algorithm. This improves on the previously best-known running time of \(O(n^3+n^2\log (1/\delta ))\) arithmetic operations even in the Hermitian case ([21], see also [41, 52]), and yields the same improvement for the related problem of computing the singular value decomposition of a matrix.
We now proceed to give precise mathematical formulations of the eigenproblem and computational model, followed by statements of our results and a detailed discussion of related work.
1.1 Problem Statement
An eigenpair of a matrix \(A\in \mathbb {C}^{n\times n}\) is a tuple \((\lambda , v)\in \mathbb {C}\times \mathbb {C}^n\) such that
$$ Av = \lambda v $$
and v is normalized to be a unit vector. The eigenproblem is the problem of finding a maximal set of linearly independent eigenpairs \((\lambda _i,v_i)\) of a given matrix A; note that an eigenvalue may appear more than once if it has geometric multiplicity greater than one. In the case when A is diagonalizable, the solution consists of exactly n eigenpairs, and if A has distinct eigenvalues then the solution is unique, up to the phases of the \(v_i\).
1.1.1 Accuracy and Conditioning
Due to the Abel–Ruffini theorem, it is impossible to have a finite-time algorithm which solves the eigenproblem exactly using arithmetic operations and radicals. Thus, all we can hope for is approximate eigenvalues and eigenvectors, up to a desired accuracy \(\delta >0\). There are two standard notions of approximation. We assume \(\Vert A\Vert \le 1\) for normalization, where throughout this work, \(\Vert \cdot \Vert \) denotes the spectral norm (the \(\ell ^2 \rightarrow \ell ^2\) operator norm).
Forward Approximation. Compute pairs \((\lambda _i',v_i')\) such that
$$ |\lambda _i-\lambda _i'|\le \delta \quad \text {and}\quad \Vert v_i-v_i'\Vert \le \delta $$
for the true eigenpairs \((\lambda _i,v_i)\), i.e., find a solution close to the exact solution. This makes sense in contexts where the exact solution is meaningful, e.g., the matrix is of theoretical/mathematical origin, and unstable (in the entries) quantities such as eigenvalue multiplicity can have a significant meaning.
Backward Approximation. Compute \((\lambda _i',v_i')\) which are the exact eigenpairs of a matrix \(A'\) satisfying
$$ \Vert A'-A\Vert \le \delta , $$
i.e., find the exact solution to a nearby problem. This is the appropriate and standard notion in scientific computing, where the matrix is of physical or empirical origin and is not assumed to be known exactly (and even if it were, roundoff error would destroy this exactness). Note that since diagonalizable matrices are dense in \(\mathbb {C}^{n\times n}\), one can hope to always find a complete set of eigenpairs for some nearby \(A'=VDV^{-1}\), yielding an approximate diagonalization of A:
$$ \Vert A-VDV^{-1}\Vert \le \delta . \qquad (1) $$
Note that the eigenproblem in either of the above formulations is not easily reducible to the problem of computing eigenvalues, since they can only be computed approximately and it is not clear how to obtain approximate eigenvectors from approximate eigenvalues. We now introduce a condition number for the eigenproblem, which measures the sensitivity of the eigenpairs of a matrix to perturbations and allows us to relate its forward and backward approximate solutions.
Condition Numbers. For diagonalizable A, the eigenvector condition number of A, denoted \(\kappa _V(A)\), is defined as:
$$ \kappa _V(A):=\inf _V \Vert V\Vert \Vert V^{-1}\Vert , $$
where the infimum is over all invertible V such that \(A=VDV^{-1}\) for some diagonal D, and its minimum eigenvalue gap is defined as:
$$ \mathrm {gap}(A):=\min _{i\ne j}|\lambda _i-\lambda _j|, $$
where \(\lambda _i\) are the eigenvalues of A (with multiplicity).
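For concreteness, both quantities can be estimated numerically. The following NumPy sketch (ours, for illustration only; the function names are not from this paper) computes the minimum eigenvalue gap exactly and an upper bound on \(\kappa _V\) via the eigenvector matrix returned by a standard eigensolver—an upper bound, since \(\kappa _V\) is an infimum over all diagonalizing similarities.

```python
import numpy as np

def eigenvector_condition(A):
    # Upper bound on kappa_V(A): condition number of the column-normalized
    # eigenvector matrix returned by numpy's eigensolver.
    _, V = np.linalg.eig(A)
    return np.linalg.cond(V, 2)

def min_eigenvalue_gap(A):
    # gap(A) = min over i != j of |lambda_i - lambda_j|
    lam = np.linalg.eigvals(A)
    diffs = np.abs(lam[:, None] - lam[None, :])
    np.fill_diagonal(diffs, np.inf)
    return diffs.min()

# A normal (here, diagonal) matrix has orthonormal eigenvectors, so kappa_V = 1.
A = np.diag([0.0, 0.5, 1.0])
print(eigenvector_condition(A))  # 1.0
print(min_eigenvalue_gap(A))     # 0.5
```

For non-normal matrices the same two calls reveal how far the eigenvector basis is from orthogonal.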
We define the condition number of the eigenproblem to be:
$$ \kappa _{\mathrm {eig}}(A):=\frac{\kappa _V(A)}{\mathrm {gap}(A)}. $$
It follows from the proposition below (whose proof appears in Sect. 2.2) that a \(\delta \)-backward approximate solution of the eigenproblem is a \(6n\kappa _{\mathrm {eig}}(A)\delta \)-forward approximate solution.
Proposition 1.1
If \(\Vert A\Vert ,\Vert A'\Vert \le 1\), \(\Vert A-A'\Vert \le \delta \), and \(\{(v_i,\lambda _i)\}_{i\le n}\), \(\{(v_i',\lambda _i')\}_{i\le n}\) are eigenpairs of \(A,A'\) with distinct eigenvalues, and \(\delta < \frac{\mathrm {gap}(A)}{8 \kappa _V(A)}\), then
$$ \Vert v_i'-v_i\Vert \le 6n\kappa _{\mathrm {eig}}(A)\delta \quad \text {and}\quad |\lambda _i'-\lambda _i|\le 6n\kappa _{\mathrm {eig}}(A)\delta \qquad (4) $$
after possibly multiplying the \(v_i\) by phases.
Note that \(\kappa _{\mathrm {eig}}=\infty \) if and only if A has a double eigenvalue; in this case, a relation like (4) is not possible since different infinitesimal changes to A can produce macroscopically different eigenpairs.
In this paper we will present a backward approximation algorithm for the eigenproblem with running time scaling polynomially in \(\log (1/\delta )\), which by (4) yields a forward approximation algorithm with running time scaling polynomially in \(\log (\kappa _{\mathrm {eig}}/\delta )\).
Remark 1.2
(Multiple Eigenvalues) A backward approximation algorithm for the eigenproblem can be used to accurately find bases for the eigenspaces of matrices with multiple eigenvalues, but quantifying the forward error requires introducing condition numbers for invariant subspaces rather than eigenpairs. A standard treatment of this can be found in any numerical linear algebra textbook, e.g. [26], and we do not discuss it further in this paper for simplicity of exposition.
1.1.2 Models of Computation
These questions may be studied in various computational models: exact real arithmetic (i.e., infinite precision), variable precision rational arithmetic (rationals are stored exactly as numerators and denominators), and finite precision arithmetic (real numbers are rounded to a fixed number of bits which may depend on the input size and accuracy). Only the last two models yield actual Boolean complexity bounds, but introduce a second source of error stemming from the fact that computers cannot exactly represent real numbers.
We study the third model in this paper, axiomatized as follows.
Finite Precision Arithmetic. We use the standard floating point axioms from [39]. Numbers are stored and manipulated approximately up to some machine precision \({\textbf {u }}:={\textbf {u }}(\delta ,n)>0\), which for us will depend on the instance size n and desired accuracy \(\delta \). This means every number \(x\in \mathbb {C}\) is stored as \(\mathsf {fl}(x)=(1+\Delta )x\) for some adversarially chosen \(\Delta \in \mathbb {C}\) satisfying \(|\Delta |\le {\textbf {u }}\), and each arithmetic operation \(\circ \in \{+,-,\times ,\div \}\) is guaranteed to yield an output satisfying
$$ \mathsf {fl}(x\circ y)=(x\circ y)(1+\Delta )\quad \text {for some } |\Delta |\le {\textbf {u }}. $$
It is also standard and convenient to assume that we can evaluate \(\sqrt{x}\) for any \(x\in \mathbb {R}\), where again \(\mathsf {fl}(\sqrt{x}) = \sqrt{x} (1 + \Delta )\) for \(|\Delta | \le {\textbf {u }}\).
Thus, the outcomes of all operations are adversarially noisy due to roundoff. The bit lengths of numbers stored in this form remain fixed at \(\lg (1/{\textbf {u }})\), where \(\lg \) denotes the logarithm base 2. The bit complexity of an algorithm is therefore the number of arithmetic operations times \(O^*(\log (1/{\textbf {u }}))\), the running time of standard floating point arithmetic, where the \(*\) suppresses \(\log \log (1/{\textbf {u }})\) factors. We will state all running times in terms of arithmetic operations accompanied by the required number of bits of precision, which thereby immediately imply bit complexity bounds.
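IEEE double precision is a familiar instance of these axioms, with unit roundoff \({\textbf {u }}=2^{-53}\). The snippet below (an illustration of the model, not part of the paper) checks the multiplicative rounding guarantee on a standard example.

```python
import numpy as np

# IEEE binary64 satisfies the axioms above with unit roundoff u = 2^-53:
# each operation returns (x o y)(1 + Delta) with |Delta| <= u.
u = np.finfo(np.float64).eps / 2
print(u == 2.0 ** -53)               # True

s = 0.1 + 0.2                        # operands and sum are each rounded
print(s == 0.3)                      # False: exact identities fail...
print(abs(s - 0.3) <= 4 * u * 0.3)   # True: ...but only at the scale of u
```

The point of the model is precisely this: every intermediate quantity is adversarially wrong by a relative error of order \({\textbf {u }}\), and the analysis must tolerate it.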
Remark 1.3
(Overflow, Underflow, and Additive Error) Using p bits for the exponent in the floating-point representation allows one to represent numbers with magnitude in the range \([2^{-2^p},2^{2^p}]\). It can be easily checked that all of the nonzero numbers, norms, and condition numbers appearing during the execution of our algorithms lie in the range \([2^{-\lg ^c(n/\delta )},2^{\lg ^c(n/\delta )}]\) for some small c, so overflow and underflow do not occur. In fact, we could have analyzed our algorithm in a computational model where every number is simply rounded to the nearest rational with denominator \(2^{\lg ^c(n/\delta )}\)—corresponding to additive arithmetic errors. We have chosen to use the multiplicative error floating point model since it is the standard in numerical analysis, but our algorithms do not exploit any subtleties arising from the difference between the two models.
The advantages of the floating point model are that it is realistic and potentially yields very fast algorithms by using a small number of bits of precision (polylogarithmic in n and \(1/\delta \)), in contrast to rational arithmetic, where even a simple operation such as inverting an \(n\times n\) integer matrix requires n extra bits of precision (see, e.g., Chapter 1 of [35]). An iterative algorithm that can be implemented in finite precision (typically, polylogarithmic in the input size and desired accuracy) is called numerically stable.
The disadvantage of the model is that it is only possible to compute forward approximations of quantities which are well-conditioned in the input—in particular, discontinuous quantities such as eigenvalue multiplicity cannot be computed in the floating point model, since it is not even assumed that the input is stored exactly.
1.2 Results and Techniques
In addition to \(\kappa _{\mathrm {eig}}\), we will need some more refined quantities to measure the stability of the eigenvalues and eigenvectors of a matrix to perturbations, and to state our results. The most important of these is the \(\epsilon \)-pseudospectrum, defined for any \(\epsilon >0\) and \(M\in \mathbb {C}^{n\times n}\) as:
$$ \Lambda _\epsilon (M):=\left\{ \lambda \in \mathbb {C}: \lambda \in \Lambda (M+E) \text { for some } \Vert E\Vert <\epsilon \right\} \qquad (5) $$
$$ \phantom {\Lambda _\epsilon (M):}=\left\{ \lambda \in \mathbb {C}: \Vert (\lambda -M)^{-1}\Vert >1/\epsilon \right\} , \qquad (6) $$
where \(\Lambda (\cdot )\) denotes the spectrum of a matrix. The equivalence of (5) and (6) is simple and can be found in the excellent book [62].
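The resolvent characterization (6) gives a direct numerical membership test: \(z\in \Lambda _\epsilon (M)\) exactly when the smallest singular value of \(z-M\) is below \(\epsilon \). The sketch below (ours, for illustration) applies it to a normal matrix, for which the \(\epsilon \)-pseudospectrum is exactly the union of open \(\epsilon \)-disks around the eigenvalues.

```python
import numpy as np

def in_pseudospectrum(z, M, eps):
    # z lies in the eps-pseudospectrum iff ||(z - M)^{-1}|| > 1/eps,
    # i.e. iff the smallest singular value of zI - M is below eps.
    smin = np.linalg.svd(z * np.eye(M.shape[0]) - M, compute_uv=False)[-1]
    return smin < eps

# For a normal matrix, sigma_min(zI - M) equals the distance from z to the spectrum.
M = np.diag([0.0, 1.0])
print(in_pseudospectrum(0.05, M, 0.1))  # True: z is 0.05 away from eigenvalue 0
print(in_pseudospectrum(0.5, M, 0.1))   # False: z is 0.5 away from both eigenvalues
```

For non-normal matrices the pseudospectrum can be vastly larger than such disks, which is exactly the instability the present paper must control.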
Eigenvalue Gaps, \(\kappa _V\), and Pseudospectral Shattering. The key probabilistic result of the paper is that a random complex Gaussian perturbation of any matrix yields a nearby matrix with large minimum eigenvalue gap and small \(\kappa _V\).
Theorem 1.4
(Smoothed Analysis of \(\mathrm {gap}\) and \(\kappa _V\)) Suppose \(A\in \mathbb {C}^{n\times n}\) with \(\Vert A\Vert \le 1\), and \(\gamma \in (0,1/2)\). Let \(G_n\) be an \(n\times n\) matrix with i.i.d. complex Gaussian \(N(0,1_\mathbb {C}/n)\) entries, and let \(X:=A+\gamma G_n\). Then
$$ \kappa _V(X)\le \frac{8n^2}{\gamma },\qquad \mathrm {gap}(X)\ge \frac{\gamma ^4}{n^5},\qquad \text {and}\qquad \Vert G_n\Vert \le 4, $$
with probability at least \(1-12/n\).
The proof of Theorem 1.4 appears in Sect. 3.1. The key idea is to first control \(\kappa _V(X)\) using [9] and then observe that for a matrix with small \(\kappa _V\), two eigenvalues of X near a complex number z imply a small second-least singular value of \(z-X\), which we are able to control.
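The regularization is easy to observe numerically. The following experiment (ours; the dimension and \(\gamma \) are arbitrary choices for illustration) perturbs a worst-case input, a nilpotent Jordan block, for which \(\mathrm {gap}=0\) and \(\kappa _V=\infty \):

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 50, 0.1

# Worst case: a nilpotent Jordan block (all eigenvalues 0, kappa_V infinite).
A = np.diag(np.ones(n - 1), 1)

# Complex Ginibre perturbation: i.i.d. complex N(0, 1/n) entries.
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
X = A + gamma * G

lam, V = np.linalg.eig(X)
diffs = np.abs(lam[:, None] - lam[None, :])
np.fill_diagonal(diffs, np.inf)

print("gap(X)     =", diffs.min())        # strictly positive
print("kappa_V(X) ~", np.linalg.cond(V))  # finite, modest in n and 1/gamma
```

Before the perturbation every eigenvalue coincides and no diagonalization exists; after it, the spectrum spreads out and a well-conditioned eigenvector basis appears, as Theorem 1.4 predicts.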
In Sect. 3.2 we develop the notion of pseudospectral shattering, which is implied by Theorem 1.4 and says roughly that the pseudospectrum consists of n components that lie in separate squares of an appropriately coarse grid in the complex plane. This is useful in the analysis of the spectral bisection algorithm in Sect. 5.
Matrix Sign Function. The sign function of a number \(z\in \mathbb {C}\) with \({\text {Re}}(z)\ne 0\) is defined as \(+1\) if \({\text {Re}}(z)>0\) and \(-1\) if \({\text {Re}}(z)<0\). The matrix sign function of a matrix A with Jordan normal form
$$ A=V\begin{bmatrix} N & \\ & P \end{bmatrix}V^{-1}, $$
where N (resp. P) has eigenvalues with strictly negative (resp. positive) real part, is defined as
$$ \mathrm {sgn}(A)=V\begin{bmatrix} -I_N & \\ & I_P \end{bmatrix}V^{-1}, $$
where \(I_P\) denotes the identity of the same size as P. The sign function is undefined for matrices with eigenvalues on the imaginary axis. Quantifying this discontinuity, Bai and Demmel [4] defined the following condition number for the sign function:
$$ \kappa _{\mathrm {sign}}(M):=\inf \left\{ 1/\epsilon ^2 : \Lambda _\epsilon (M) \text { does not intersect the imaginary axis}\right\} , $$
and gave perturbation bounds for \(\mathrm {sgn}(M)\) depending on \(\kappa _{\mathrm {sign}}\).
Roberts [53] showed that the simple iteration
$$ A_{k+1}=\frac{1}{2}\left( A_k+A_k^{-1}\right) ,\qquad A_0=A, \qquad (8) $$
converges globally and quadratically to \(\mathrm {sgn}(A)\) in exact arithmetic, but his proof relies on the fact that all iterates of the algorithm are simultaneously diagonalizable, a property which is destroyed in finite arithmetic since inversions can only be done approximately. In Sect. 4 we show that this iteration is indeed convergent when implemented in finite arithmetic for matrices with small \(\kappa _{\mathrm {sign}}\), given a numerically stable matrix inversion algorithm. This leads to the following result:
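In exact arithmetic the iteration is straightforward to run. The sketch below (an illustration of the iteration (8), not the finite-precision algorithm \(\mathsf {SGN}\) analyzed in Sect. 4) applies it to a small matrix with one eigenvalue in each half plane.

```python
import numpy as np

def newton_sign(A, iters=30):
    # Roberts' iteration A_{k+1} = (A_k + A_k^{-1}) / 2, A_0 = A,
    # converging quadratically to sgn(A) whenever no eigenvalue of A
    # lies on the imaginary axis.
    X = A.astype(complex)
    for _ in range(iters):
        X = (X + np.linalg.inv(X)) / 2
    return X

# Eigenvalues -2 and 3 lie in opposite half planes.
A = np.array([[-2.0, 1.0], [0.0, 3.0]])
S = newton_sign(A)
print(np.round(S.real, 6))            # approximately [[-1, 0.4], [0, 1]]
print(np.allclose(S @ S, np.eye(2)))  # True: sgn(A) is an involution
```

Scalar intuition carries over: applied to a number, (8) is Newton's method for \(x^2=1\), so each eigenvalue flows to \(\pm 1\) according to its half plane while the eigenvectors stay fixed.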
Theorem 1.5
(Sign Function Algorithm) There is a deterministic algorithm \(\mathsf {SGN}\) which on input an \(n \times n\) matrix A with \(\Vert A\Vert \le 1\), a number K with \(K \ge \kappa _{\mathrm {sign}}(A)\), and a desired accuracy \(\beta \in (0, 1/12)\), outputs an approximation \(\mathsf {SGN}(A)\) with
$$ \Vert \mathsf {SGN}(A)-\mathrm {sgn}(A)\Vert \le \beta , $$
in
$$ O\left( \left( \log K+\log \log (1/\beta )\right) T_\mathsf {INV}(n)\right) $$
arithmetic operations on a floating point machine with
$$ \lg (1/{\textbf {u }})=O\left( \log n\,\log ^3 K\left( \log K+\log (1/\beta )\right) \right) $$
bits of precision, where \(T_\mathsf {INV}(n)\) denotes the number of arithmetic operations used by a numerically stable matrix inversion algorithm (satisfying Definition 2.7).
The main new idea in the proof of Theorem 1.5 is to control the evolution of the pseudospectra \(\Lambda _{\epsilon _k}(A_k)\) of the iterates with appropriately decreasing (in k) parameters \(\epsilon _k\), using a sequence of carefully chosen shrinking contour integrals in the complex plane. The pseudospectrum provides a richer induction hypothesis than scalar quantities such as condition numbers, and allows one to control all quantities of interest using the holomorphic functional calculus. This technique is introduced in Sects. 4.1 and 4.2, and carried out in finite arithmetic in Sect. 4.3, yielding Theorem 1.5.
Diagonalization by Spectral Bisection. Given an algorithm for computing the sign function, there is a natural and well-known approach to the eigenproblem pioneered in [11]. The idea is that the matrices \((I\pm \mathrm {sgn}(A))/2\) are spectral projectors onto the invariant subspaces corresponding to the eigenvalues of A in the left and right open half planes, so if some shifted matrix \(z + A\) or \(z + iA\) has roughly half its eigenvalues in each half plane, the problem can be reduced to smaller subproblems appropriate for recursion.
The two difficulties in carrying out the above approach are: (a) efficiently computing the sign function (b) finding a balanced splitting along an axis that is well-separated from the spectrum. These are nontrivial even in exact arithmetic, since the iteration (8) converges slowly if (b) is not satisfied, even without roundoff error. We use Theorem 1.4 to ensure that a good splitting always exists after a small Gaussian perturbation of order \(\delta \), and Theorem 1.5 to compute splittings efficiently in finite precision. Combining this with well-understood techniques such as rank-revealing QR factorization, we obtain the following theorem, whose proof appears in Sect. 5.1.
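The splitting step can be sketched in exact-arithmetic NumPy as follows (ours, for illustration; the actual algorithm uses \(\mathsf {SGN}\) and a rank-revealing QR factorization, for which the Newton loop and the SVD below are stand-ins):

```python
import numpy as np

def right_half_plane_block(A, iters=40):
    # sgn(A) via the Newton iteration (assumes no imaginary-axis eigenvalues).
    S = A.astype(complex)
    for _ in range(iters):
        S = (S + np.linalg.inv(S)) / 2
    n = A.shape[0]
    # (I + sgn(A))/2 projects onto the invariant subspace of the
    # eigenvalues in the right open half plane.
    P_plus = (np.eye(n) + S) / 2
    r = int(round(np.trace(P_plus).real))  # its rank = # of such eigenvalues
    # Orthonormal basis for range(P_plus): top-r left singular vectors.
    U = np.linalg.svd(P_plus)[0]
    Q = U[:, :r]
    # Compressing A to this invariant subspace keeps exactly those eigenvalues.
    return Q.conj().T @ A @ Q

A = np.diag([-3.0, -1.0, 2.0, 5.0])
sub = right_half_plane_block(A)
print(sub.shape)                             # (2, 2)
print(np.sort(np.linalg.eigvals(sub).real))  # approximately [2. 5.]
```

Applying the same compression with \((I-\mathrm {sgn}(A))/2\) yields the complementary block, and recursing on the two blocks (after re-shifting) drives the bisection.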
Theorem 1.6
(Backward Approximation Algorithm) There is a randomized algorithm \(\mathsf {EIG}\) which on input any matrix \(A\in \mathbb {C}^{n\times n}\) with \(\Vert A\Vert \le 1\) and a desired accuracy parameter \(\delta >0\) outputs a diagonal D and invertible V such that
$$ \Vert A-VDV^{-1}\Vert \le \delta \quad \text {and}\quad \Vert V\Vert \Vert V^{-1}\Vert \le O(n^{2.5}/\delta ) $$
in
$$ O\left( T_\mathsf {MM}(n)\log ^2(n/\delta )\right) $$
arithmetic operations on a floating point machine with
$$ O\left( \log ^4(n/\delta )\log n\right) $$
bits of precision, with probability at least \(1-14/n\). Here \(T_\mathsf {MM}(n)\) refers to the running time of a numerically stable matrix multiplication algorithm (detailed in Sect. 2.5).
Since there is a correspondence in terms of the condition number between backward and forward approximations, and as is customary in numerical analysis, our discussion revolves around backward approximation guarantees. For the convenience of the reader, we write down below the explicit guarantees that one gets by using (4) and invoking \(\mathsf {EIG}\) with accuracy \(\frac{\delta }{6n \kappa _{\mathrm {eig}}}\).
Corollary 1.7
(Forward Approximation Algorithm) There is a randomized algorithm which on input any matrix \(A\in \mathbb {C}^{n\times n}\) with \(\Vert A\Vert \le 1\), a desired accuracy parameter \(\delta >0\), and an estimate \(K\ge \kappa _{\mathrm {eig}}(A)\) outputs a \(\delta \)-forward approximate solution to the eigenproblem for A in
$$ O\left( T_\mathsf {MM}(n)\log ^2(nK/\delta )\right) $$
arithmetic operations on a floating point machine with
$$ O\left( \log ^4(nK/\delta )\log n\right) $$
bits of precision, with probability at least \(1-1/n-12/n^2\). Here \(T_\mathsf {MM}(n)\) refers to the running time of a numerically stable matrix multiplication algorithm (detailed in Sect. 2.5).
Remark 1.8
(Accuracy vs. Precision) The gold standard of “backward stability” in numerical analysis postulates that
$$ \lg (1/{\textbf {u }})=\lg (1/\delta )+O(\log n), $$
i.e., the number of bits of precision is linear in the number of bits of accuracy. The relaxed notion of “logarithmic stability” introduced in [23] requires
$$ \lg (1/{\textbf {u }})=\lg (1/\delta )+O(\log ^c(n)\log (\kappa )) $$
for some constant c, where \(\kappa \) is an appropriate condition number. In comparison, Theorem 1.6 obtains the weaker relationship
$$ \lg (1/{\textbf {u }})=O\left( \log ^4(n/\delta )\log n\right) , $$
which is still polylogarithmic in n in the regime \(\delta =1/\mathrm {poly}(n)\).
1.3 Related Work
Minimum Eigenvalue Gap. The minimum eigenvalue gap of random matrices has been studied in the case of Hermitian and unitary matrices, beginning with the work of Vinson [64], who proved an \(\Omega (n^{-4/3})\) lower bound on this gap in the case of the Gaussian Unitary Ensemble (GUE) and the Circular Unitary Ensemble (CUE). Bourgade and Ben Arous [3] derived exact limiting formulas for the distributions of all the gaps for the same ensembles. Nguyen, Tao, and Vu [50] obtained non-asymptotic inverse polynomial bounds for a large class of non-integrable Hermitian models with i.i.d. entries (including Bernoulli matrices).
In a different direction, Aizenman et al. proved an inverse-polynomial bound [1] in the case of an arbitrary Hermitian matrix plus a GUE matrix or a Gaussian orthogonal ensemble (GOE) matrix, which may be viewed as a smoothed analysis of the minimum gap. Theorem 3.6 may be viewed as a non-Hermitian analogue of the last result.
In the non-Hermitian case, Ge [33] obtained an inverse polynomial bound for i.i.d. matrices with real entries satisfying some mild moment conditions, and [55] proved an inverse polynomial lower bound for the complex Ginibre ensemble. Theorem 3.6 may be seen as a generalization of these results to non-centered complex Gaussian matrices.
Smoothed Analysis and Free Probability. The study of numerical algorithms on Gaussian random matrices (i.e., the case \(A=0\) of smoothed analysis) dates back to [25, 29, 56, 65]. The powerful idea of improving the conditioning of a numerical computation by adding a small amount of Gaussian noise was introduced by Spielman and Teng in [59], in the context of the simplex algorithm. Sankar, Spielman, and Teng [54] showed that adding real Gaussian noise to any matrix yields a matrix with polynomially bounded condition number; [9] can be seen as an extension of this result to the condition number of the eigenvector matrix, where the proof crucially requires that the Gaussian perturbation is complex rather than real. The main difference between our results and most of the results on smoothed analysis (including [2]) is that our running time depends logarithmically rather than polynomially on the size of the perturbation.
The broad idea of regularizing the spectral instability of a nonnormal matrix by adding a random matrix can be traced back to the work of Śniady [58] and Haagerup and Larsen [37] in the context of Free Probability theory.
Matrix Sign Function. The matrix sign function was introduced by Zolotarev in 1877. It became a popular topic in numerical analysis following the work of Beavers and Denman [10, 11, 27] and Roberts [53], who used it first to solve the algebraic Riccati and Lyapunov equations and then as an approach to the eigenproblem; see [44] for a broad survey of its early history. The numerical stability of Roberts’ Newton iteration was investigated by Byers [14], who identified some cases where it is and isn’t stable. Malyshev [46], Byers et al. [15], Bai et al. [5], and Bai and Demmel [4] studied the condition number of the matrix sign function, and showed that if the Newton iteration converges then it can be used to obtain a high-quality invariant subspace, but did not prove convergence in finite arithmetic and left this as an open question. The key issue in analyzing the convergence of the iteration is to bound the condition numbers of the intermediate matrices that appear, as N. Higham remarks in his 2008 textbook:
Of course, to obtain a complete picture, we also need to understand the effect of rounding errors on the iteration prior to convergence. This effect is surprisingly difficult to analyze. \(\ldots \) Since errors will in general occur on each iteration, the overall error will be a complicated function of \(\kappa _{sign}(X_k)\) and \(E_k\) for all k. \(\ldots \) We are not aware of any published rounding error analysis for the computation of sign(A) via the Newton iteration.—[40, Section 5.7]
This is precisely the problem solved by Theorem 1.5, which is as far as we know the first provable algorithm for computing the sign function of an arbitrary matrix which does not require computing the Jordan form.
In the special case of Hermitian matrices, Higham [38] established efficient reductions between the sign function and the polar decomposition. Byers and Xu [16] proved backward stability of a certain scaled version of the Newton iteration for Hermitian matrices, in the context of computing the polar decomposition. Higham and Nakatsukasa [49] (see also the improvement [48]) proved backward stability of a different iterative scheme for computing the polar decomposition, and used it to give backward stable spectral bisection algorithms for the Hermitian eigenproblem with \(O(n^3)\)-type complexity.
Non-Hermitian Eigenproblem. Floating Point Arithmetic. The eigenproblem has been thoroughly studied in the numerical analysis community, in the floating point model of computation. While there are provably fast and accurate algorithms in the Hermitian case (see the next subsection) and a large body of work for various structured matrices (see, e.g., [13]), the general case is not nearly as well-understood. As recently as 1997, J. Demmel remarked in his well-known textbook [26]: “\(\ldots \) the problem of devising an algorithm [for the non-Hermitian eigenproblem] that is numerically stable and globally (and quickly!) convergent remains open.”
Demmel’s question remained entirely open until 2015, when it was answered in the following sense by Armentano, Beltrán, Bürgisser, Cucker, and Shub in the remarkable paper [2]. They exhibited an algorithm (see their Theorem 2.28) which given any \(A\in \mathbb {C}^{n\times n}\) with \(\Vert A\Vert \le 1\) and \(\sigma >0\) produces in \(O(n^{9}/\sigma ^2)\) expected arithmetic operations the diagonalization of the nearby random perturbation \(A+\sigma G\) where G is a matrix with standard complex Gaussian entries. By setting \(\sigma \) sufficiently small, this may be viewed as a backward approximation algorithm for diagonalization, in that it solves a nearby problem essentially exactly—in particular, by setting \(\sigma =\delta /\sqrt{n}\) and noting that \(\Vert G\Vert =O(\sqrt{n})\) with very high probability, their result implies a running time of \(O(n^{10}/\delta ^2)\) in our setting. Their algorithm is based on homotopy continuation methods, which they argue informally are numerically stable and can be implemented in finite precision arithmetic. Our algorithm is similar on a high level in that it adds a Gaussian perturbation to the input and then obtains a high accuracy forward approximate solution to the perturbed problem. The difference is that their overall running time depends polynomially rather than logarithmically on the accuracy \(\delta \) desired with respect to the original unperturbed problem (Table 1).
Other Models of Computation. If we relax the requirements further and ask for any provable algorithm in any model of Boolean computation, there is only one more positive result with a polynomial bound on the number of bit operations: Jin-Yi Cai showed in 1994 [17] that given a rational \(n\times n\) matrix A with entries of bit length a, one can find a \(\delta \)-forward approximation to its Jordan Normal Form \(A=VJV^{-1}\) in time \(\mathrm {poly}(n,a,\log (1/\delta ))\), where the degree of the polynomial is at least 12. This algorithm works in the rational arithmetic model of computation, so it does not quite answer Demmel’s question since it is not a numerically stable algorithm. However, it enjoys the significant advantage of being able to compute forward approximations to discontinuous quantities such as the Jordan structure (Table 2).
As far as we are aware, there are no other published provably polynomial-time algorithms for the general eigenproblem. The two standard references for diagonalization appearing most often in theoretical computer science papers do not meet this criterion. In particular, the widely cited work by Pan and Chen [51] proves that one can compute the eigenvalues of A in \(O(n^\omega + n\log \log (1/\delta ))\) (suppressing logarithmic factors) arithmetic operations by finding the roots of its characteristic polynomial, which becomes a bound of \(O(n^{\omega +1}a+n^2\log (1/\delta )\log \log (1/\delta ))\) bit operations if the characteristic polynomial is computed exactly in rational arithmetic and the matrix has entries of bit length a. However that paper does not give any bound for the amount of time taken to find approximate eigenvectors from approximate eigenvalues, and states this as an open problem.
Finally, the important work of Demmel et al. [22] (see also the followup [6]), which we rely on heavily, does not claim to provably solve the eigenproblem either—it bounds the running time of one iteration of a specific algorithm, and shows that such an iteration can be implemented numerically stably, without proving any bound on the number of iterations required in general.
Hermitian Eigenproblem. For comparison, the eigenproblem for Hermitian matrices is much better understood. We cannot give a complete bibliography of this huge area, but mention two relevant landmark results: the work of Wilkinson [66], who exhibited a globally convergent diagonalization algorithm, and the work of Dekker and Traub [21], who quantified the rate of convergence of Wilkinson’s algorithm, from which it follows that the Hermitian eigenproblem can be solved with backward error \(\delta \) in \(O(n^3+n^2\log (1/\delta ))\) arithmetic operations in exact arithmetic. We refer the reader to [52, §8.10] for the simplest and most insightful proof of this result, due to Hoffman and Parlett [41].
There has also recently been renewed interest in this problem in the theoretical computer science community, with the goal of bringing the runtime close to \(O(n^\omega )\): Louis and Vempala [45] show how to find a \(\delta \)-approximation of just the largest eigenvalue in \(O(n^\omega \log ^4(n)\log ^2(1/\delta ))\) bit operations, and Ben-Or and Eldar [12] give an \(O(n^{\omega +1}\mathrm {polylog}(n))\)-bit-operation algorithm for finding a \(1/\mathrm {poly}(n)\)-approximate diagonalization of an \(n\times n\) Hermitian matrix normalized to have \(\Vert A\Vert \le 1\).
Remark 1.9
(Davies’ Conjecture) The beautiful paper [20] introduced the idea of approximating a matrix function f(A) for nonnormal A by \(f(A+E)\) for some well-chosen E regularizing the eigenvectors of A. This directly inspired our approach to solving the eigenproblem via regularization.
The existence of an approximate diagonalization (1) for every A with a well-conditioned similarity V (i.e., \(\kappa (V)\) depending polynomially on \(\delta \) and n) was precisely the content of Davies’ conjecture [20], which was recently resolved by some of the authors and Mukherjee in [9]. The existence of such a V is a prerequisite for proving that one can always efficiently find an approximate diagonalization in finite arithmetic, since if \(\Vert V\Vert \Vert V^{-1}\Vert \) is very large it may require many bits of precision to represent. Thus, Theorem 1.6 can be viewed as an efficient algorithmic answer to Davies’ question.
Remark 1.10
(Subsequent work in Random Matrix Theory) Since the first version of the present paper was made public there have been some advances in random matrix theory [8, 43] that prove analogues of Theorem 1.4 in the case where \(G_n\) is replaced by a perturbation with random real independent entries. These results formally articulate that, in the context of this paper, there is nothing special about complex Ginibre matrices, and that the same regularization effect can be achieved using a broader class of perturbations. Bounding the eigenvector condition number and the eigenvalue gap when the random perturbation has real entries poses interesting technical challenges that were tackled in different ways in the aforementioned papers. We also refer the reader to [18] where optimal results were obtained in the case where \(A=0\) and \(G_n\) has real Gaussian entries.
Remark 1.11
(Alternate Proofs using [2]) In October 2021 (about two years after the first appearance of this paper), we noticed that a version of Theorem 1.4 (with a worse \(\kappa _V\) bound but a better eigenvalue gap bound) as well as the main theorem of [9] (with a slightly worse dependence on n) can be easily derived from some auxiliary results shown in [2] (specifically Proposition 2.7 and Theorem 2.14 of that paper), which we were not previously aware of. We present these short alternate proofs in “Appendix D”. We remark that our original proofs are essentially different from those appearing in [2]—in particular, they rely on studying the area of pseudospectra, whereas the proof of Theorem 2.14 of [2] relies on geometric concepts and the coarea formula for Gaussian integrals of certain determinantal quantities on Riemannian manifolds. The proofs based on pseudospectra are arguably more flexible; as mentioned in Remark 1.10, they have been recently generalized to ensembles besides the complex Ginibre ensemble, which seems difficult to do for the more algebraic proofs of [2].
Reader Guide. This paper contains a lot of parameters and constants. On first reading, it may be good to largely ignore the constants not appearing in exponents and to keep in mind the typical setting \(\delta =1/\mathrm {poly}(n)\) for the accuracy, in which case the important auxiliary parameters \(\omega , 1-\alpha , \epsilon , \beta , \eta \) are all \(1/\mathrm {poly}(n)\), and the machine precision is \(\log (1/{\textbf {u }})=\mathrm {polylog}(n)\).
2 Preliminaries
Let \(M \in \mathbb {C}^{n\times n}\) be a complex matrix, not necessarily normal. We will write matrices and vectors with uppercase and lowercase letters, respectively. Let us denote by \(\Lambda (M)\) the spectrum of M and by \(\lambda _i(M)\) its individual eigenvalues. In the same way we denote the singular values of M by \(\sigma _i(M)\) and we adopt the convention \(\sigma _1(M) \ge \sigma _2(M) \ge \cdots \ge \sigma _n(M)\). When M is clear from the context we will simplify notation and just write \(\Lambda , \lambda _i\) or \(\sigma _i\), respectively.
Recall that the operator norm of M is
$$\begin{aligned} \Vert M\Vert :=\sup _{\Vert x\Vert = 1}\Vert Mx\Vert . \end{aligned}$$
As usual, we will say that M is diagonalizable if it can be written as \(M = VDV^{-1}\) for some diagonal matrix D whose diagonal entries are the eigenvalues of M. In this case, we have the spectral expansion
$$\begin{aligned} M = \sum _{i=1}^n \lambda _i v_i w_i^*, \end{aligned}$$(10)
where the right and left eigenvectors \(v_i\) and \(w_i^*\) are the columns and rows of V and \(V^{-1}\), respectively, normalized so that \(w^*_i v_i = 1\).
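For illustration only (this is not part of any algorithm in the paper), the spectral expansion can be verified numerically; the test matrix and use of NumPy below are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

D, V = np.linalg.eig(M)   # columns of V are right eigenvectors v_i
W = np.linalg.inv(V)      # rows of W are left eigenvectors w_i^*;
                          # W V = I enforces the normalization w_i^* v_i = 1

# Reconstruct M from the expansion M = sum_i lambda_i v_i w_i^*.
M_rec = sum(D[i] * np.outer(V[:, i], W[i]) for i in range(n))
assert np.allclose(M, M_rec)
```

Pairing the columns of V with the rows of \(V^{-1}\) yields the normalization \(w_i^*v_i = 1\) automatically.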
2.1 Spectral Projectors and Holomorphic Functional Calculus
Let \(M \in \mathbb {C}^{n\times n}\), with eigenvalues \(\lambda _1,...,\lambda _n\). We say that a matrix P is a spectral projector for M if \(MP = PM\) and \(P^2 = P\). For instance, each of the terms \(v_i w_i^*\) appearing in the spectral expansion (10) is a spectral projector, as \(Mv_iw_i^*= \lambda _i v_i w_i^*= v_i w_i^*M\) and \(w_i^*v_i = 1\). If \(\Gamma _i\) is a simple closed positively oriented rectifiable curve in the complex plane separating \(\lambda _i\) from the rest of the spectrum, then it is well known that
$$\begin{aligned} v_i w_i^* = \frac{1}{2\pi i}\oint _{\Gamma _i} (z - M)^{-1}\,\mathrm {d}z, \end{aligned}$$
as can be seen by taking the Jordan normal form of the resolvent \((z - M)^{-1}\) and applying Cauchy’s integral formula.
Since every spectral projector P commutes with M, its range agrees exactly with an invariant subspace of M. We will often find it useful to choose some region of the complex plane bounded by a simple closed positively oriented rectifiable curve \(\Gamma \), and compute the spectral projector onto the invariant subspace spanned by those eigenvectors whose eigenvalues lie inside \(\Gamma \). Such a projector can be computed by a contour integral analogous to the above.
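Such contour integrals are also easy to approximate numerically, since the trapezoidal rule converges rapidly for analytic integrands on closed curves. The sketch below (an illustration with an arbitrary test matrix and our own discretization parameters, not a subroutine of the paper) recovers the projector onto a single eigenvector pair:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Choose an eigenvalue and a circle Gamma_i separating it from the others.
lam = np.linalg.eigvals(M)
lam_i = lam[0]
radius = 0.5 * min(abs(lam_i - mu) for mu in lam if mu != lam_i)

# Trapezoidal discretization of (1 / 2 pi i) \oint (z - M)^{-1} dz.
k = 200
P = np.zeros((n, n), dtype=complex)
for theta in 2 * np.pi * np.arange(k) / k:
    z = lam_i + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / k)
    P += np.linalg.solve(z * np.eye(n) - M, np.eye(n)) * dz
P /= 2j * np.pi

# P is a spectral projector for M with rank one (trace 1), since lam_i is simple.
assert np.allclose(P @ P, P, atol=1e-6)
assert np.allclose(M @ P, P @ M, atol=1e-6)
assert abs(np.trace(P) - 1) < 1e-6
```

The same discretization applies verbatim to a contour \(\Gamma \) enclosing several eigenvalues, in which case the trace of the computed projector counts the enclosed eigenvalues.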
Recall that if f is any function, and M is diagonalizable, then we can meaningfully define \(f(M) := V f(D) V^{-1}\), where f(D) is simply the result of applying f to each element of the diagonal matrix D. The holomorphic functional calculus gives an equivalent definition that extends to the case when M is non-diagonalizable. As we will see, it has the added benefit that bounds on the norm of the resolvent of M can be converted into bounds on the norm of f(M).
Proposition 2.1
(Holomorphic Functional Calculus) Let M be any matrix, \(B \supset \Lambda (M)\) be an open neighborhood of its spectrum (not necessarily connected), and \(\Gamma _1,...,\Gamma _k\) be simple closed positively oriented rectifiable curves in B whose interiors together contain all of \(\Lambda (M)\). Then if f is holomorphic on B, the definition
$$\begin{aligned} f(M) :=\frac{1}{2\pi i}\sum _{j=1}^k \oint _{\Gamma _j} f(z)(z - M)^{-1}\,\mathrm {d}z \end{aligned}$$
is an algebra homomorphism in the sense that \((fg)(M) = f(M)g(M)\) for any f and g holomorphic on B.
Finally, we will frequently use the resolvent identity
$$\begin{aligned} (z - M)^{-1} - (z - M')^{-1} = (z - M)^{-1}(M - M')(z - M')^{-1} \end{aligned}$$
to analyze perturbations of contour integrals.
2.2 Pseudospectrum and Spectral Stability
The \(\epsilon \)-pseudospectrum of a matrix is defined in (5). Directly from this definition, we can relate the pseudospectra of a matrix and a perturbation of it.
Proposition 2.2
([62], Theorem 52.4) For any \(n \times n\) matrices M and E and any \(\epsilon > 0\), \(\Lambda _{\epsilon - \Vert E\Vert }(M) \subseteq \Lambda _\epsilon (M+E)\).
It is also immediate that \(\Lambda (M) \subset \Lambda _\epsilon (M)\), and in fact a stronger relationship holds as well:
Proposition 2.3
([62], Theorem 4.3) For any \(n \times n\) matrix M, any bounded connected component of \(\Lambda _\epsilon (M)\) must contain an eigenvalue of M.
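Both statements are convenient to explore numerically via the equivalent characterization \(z \in \Lambda _\epsilon (M)\) if and only if \(\sigma _n(z - M) < \epsilon \); the helper function and test values in the sketch below are our own:

```python
import numpy as np

def in_pseudospectrum(z, M, eps):
    """z lies in Lambda_eps(M) iff sigma_min(z*I - M) < eps (an equivalent
    form of the resolvent-norm definition; helper name ours)."""
    n = M.shape[0]
    return np.linalg.svd(z * np.eye(n) - M, compute_uv=False)[-1] < eps

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Lambda(M) is contained in Lambda_eps(M) for every eps > 0.
assert all(in_pseudospectrum(l, M, 1e-8) for l in np.linalg.eigvals(M))

# Proposition 2.2 spot-check: a point of Lambda_{eps - ||E||}(M) stays inside
# Lambda_eps(M + E) whenever ||E|| < eps.
eps = 0.5
E = 0.01 * rng.standard_normal((n, n))
z = np.linalg.eigvals(M)[0] + 0.1   # within 0.1 of the spectrum
if in_pseudospectrum(z, M, eps - np.linalg.norm(E, 2)):
    assert in_pseudospectrum(z, M + E, eps)
```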
Several other notions of stability will be useful to us as well. If M has distinct eigenvalues \(\lambda _1,\ldots ,\lambda _n\), and spectral expansion as in (10), we define the eigenvalue condition number of \(\lambda _i\) to be
$$\begin{aligned} \kappa (\lambda _i) :=\Vert P_i\Vert , \qquad P_i :=v_i w_i^*, \end{aligned}$$
noting that \(\Vert P_i\Vert = \Vert v_i\Vert \Vert w_i\Vert \) since \(P_i\) has rank one. By considering the scaling of V in (2) in which its columns \(v_i\) have unit length, so that \(\kappa (\lambda _i) = \Vert w_i \Vert \), we obtain the useful relationship
$$\begin{aligned} \kappa _V(A) \le \Vert V\Vert _F\Vert V^{-1}\Vert _F = \sqrt{n\sum _{i=1}^n \kappa (\lambda _i)^2}. \end{aligned}$$(11)
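Numerically, these condition numbers are cheap to compute once a diagonalization is available; a minimal sketch (the test matrix is ours), using the fact that LAPACK returns unit-norm right eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

D, V = np.linalg.eig(M)   # LAPACK normalizes the columns v_i to unit length
W = np.linalg.inv(V)      # rows are w_i^*, normalized so that w_i^* v_i = 1

# kappa(lambda_i) = ||v_i|| * ||w_i|| = ||w_i|| under this scaling.
kappas = np.linalg.norm(W, axis=1)
kappa_V = np.linalg.cond(V)   # ||V|| ||V^{-1}|| in the spectral norm

assert np.all(kappas >= 1 - 1e-10)        # each kappa(lambda_i) >= 1
assert np.all(kappas <= kappa_V + 1e-10)  # none exceeds ||V|| ||V^{-1}||
```

The two assertions reflect general facts: \(1 = |w_i^*v_i| \le \Vert v_i\Vert \Vert w_i\Vert \) by Cauchy–Schwarz, and \(\Vert P_i\Vert \le \Vert V\Vert \Vert V^{-1}\Vert \) since \(P_i = V e_ie_i^* V^{-1}\).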
Note also that the eigenvector condition number and pseudospectrum are related as follows:
Lemma 2.4
([62]) Let D(z, r) denote the open disk of radius r centered at \(z \in \mathbb {C}\). For every \(M \in \mathbb {C}^{n\times n}\),
$$\begin{aligned} \bigcup _{i} D(\lambda _i, \epsilon ) \subseteq \Lambda _\epsilon (M) \subseteq \bigcup _i D(\lambda _i, \kappa _V(M)\epsilon ). \end{aligned}$$(12)
In this paper we will repeatedly use that assumptions about the pseudospectrum of a matrix can be turned into stability statements about functions applied to the matrix via the holomorphic functional calculus. Here we describe an instance of particular importance.
Let \(\lambda _i\) be a simple eigenvalue of M and let \(\Gamma _i\) be a contour in the complex plane, as in Sect. 2.1, separating \(\lambda _i\) from the rest of the spectrum of M, and assume \(\Lambda _\epsilon (M)\cap \Gamma _i=\emptyset \). Then, for any \(\Vert M-M'\Vert< \eta <\epsilon \), a combination of Proposition 2.2 and Proposition 2.3 implies that there is a unique eigenvalue \(\lambda _i'\) of \(M'\) in the region enclosed by \(\Gamma _i\), and furthermore \(\Lambda _{\epsilon -\eta }(M')\cap \Gamma _i = \emptyset \). If \(v_i'\) and \(w_i'\) are the right and left eigenvectors of \(M'\) corresponding to \(\lambda _i'\) we have
We have introduced enough tools to prove Proposition 1.1.
Proof of Proposition 1.1
For \(t\in [0, 1]\) define \(A(t) = (1-t)A+ tA' \). Since \(\delta <\frac{\mathrm {gap}(A)}{8\kappa _V(A)}\), the Bauer–Fike theorem implies that A(t) has distinct eigenvalues for all t, and in fact \(\mathrm {gap}(A(t))\ge \frac{3\mathrm {gap}(A)}{4}\). Standard results in perturbation theory (for instance [34, Theorem 1] or any of the references therein) imply that for every \(i=1, \dots , n\) there is a differentiable trajectory \(\lambda _i(t)\) of eigenvalues of A(t) with \(\lambda _i(0) =\lambda _i\) and \(\lambda _i(1)=\lambda _i'\). Let \(P_i(t)\) be the associated spectral projector of \(\lambda _i(t)\), which is uniquely defined via a contour integral, and write \(P_i = P_i(0)\).
Let \(\Gamma _i\) be the positively oriented contour forming the boundary of the closed disk centered at \(\lambda _i\) with radius \(\mathrm {gap}(A)/2\), and define \(\epsilon =\frac{\mathrm {gap}(A)}{2\kappa _V(A)}\). Lemma 2.4 implies \(\Lambda _{\epsilon }(A)\) is contained in the union of these disks over all \(i \in [n]\), and for fixed \(t\in [0, 1]\), since \(\Vert A-A(t)\Vert < t \delta \le \epsilon /4\), Proposition 2.2 gives the same containment for \(\Lambda _{3\epsilon /4}(A(t))\). Since these disks intersect only in their boundaries (if they do at all), \(\Vert (z - A)^{-1}\Vert \le 1/\epsilon \) and \(\Vert (z - A(t))^{-1}\Vert \le \frac{4}{3\epsilon }\) for \(z \in \Gamma _i\). By the derivation of (13) above,
and hence \(\kappa (\lambda _i(t))\le \kappa (\lambda _i)+\kappa _V(A)/3 \le 4\kappa _V(A)/3\). Combining this with (11) we obtain
From Theorem 2 of [34] and the subsequent discussion on p. 468, there exist analytic functions \(v_i(t)\) satisfying \(v_i(0) = v_i\) and \(A(t)v_i(t) = \lambda _i(t)v_i(t)\) for all \(i \in [n]\) and \(t \in [0,1]\), which furthermore admit the bound
However, these \(v_i(t)\) need not in general be unit vectors (see [34, Section 3.4] and references for discussion of various normalizations). Therefore set \({\hat{v}}_i(t) = \Vert v_i(t)\Vert ^{-1} v_i(t)\), and note that by an application of the chain rule,
It then follows that the vectors \(v_i' = \hat{v_i}(1)\) for \(i \in [n]\) satisfy the conclusion of the theorem, by bounding \(\kappa _V(A(t))\le 4n\kappa _V(A)/3\) and \(\mathrm {gap}(A(t))\ge \frac{3\mathrm {gap}(A)}{4}\), and integrating the resulting upper bound \(\Vert \dot{\hat{v_i}}(t)\Vert \le \frac{16n\delta \kappa _V(A)}{9\mathrm {gap}(A)}\) from \(t = 0\) to \(t= 1\). \(\square \)
2.3 Finite-Precision Arithmetic
We briefly elaborate on the axioms for floating-point arithmetic given in Sect. 1.1. Similar guarantees to the ones appearing in that section for scalar-scalar operations also hold for operations such as matrix–matrix addition and matrix-scalar multiplication. In particular, if A is an \(n\times n\) complex matrix,
It will be convenient for us to write such errors in additive, as opposed to multiplicative form. We can convert the above to additive error as follows. Recall that for any \(n\times n\) matrix, the spectral norm (the \(\ell ^2 \rightarrow \ell ^2\) operator norm) is at most \(\sqrt{n}\) times the \(\ell ^1 \rightarrow \ell ^2\) operator norm, i.e. the maximal \(\ell ^2\) norm of a column. Thus, we have
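The norm comparison invoked here is easy to sanity-check numerically (an illustrative sketch with an arbitrary matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))

spectral = np.linalg.norm(A, 2)            # ell^2 -> ell^2 operator norm
max_col = np.linalg.norm(A, axis=0).max()  # largest column ell^2 norm

# ||A|| <= ||A||_F <= sqrt(n) * max column norm.
assert spectral <= np.sqrt(A.shape[0]) * max_col + 1e-12
```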
For more complicated operations such as matrix–matrix multiplication and matrix inversion, we use existing error guarantees from the literature. This is the subject of Sect. 2.5.
We will also need to compute the trace of a matrix \(A \in \mathbb {C}^{n\times n}\), and normalize a vector \(x \in \mathbb {C}^n\). Error analysis of these is standard (see for instance the discussion in [39, Chapters 3–4]), and the results in this paper are highly insensitive to the details. For simplicity, calling \({\hat{x}} := x/\Vert x\Vert \), we will assume that
Each of these can be achieved by assuming that \({\textbf {u }}n \le \epsilon \) for some suitably chosen \(\epsilon \), independent of n, a requirement which will shortly be superseded by several tighter assumptions on the machine precision.
Throughout the paper, we will take the pedagogical perspective that our algorithms are games played between the practitioner and an adversary who may additively corrupt each operation. In particular, we will include explicit error terms (always denoted by \(E_{(\cdot )}\)) in each appropriate step of every algorithm. In many cases we will first analyze a routine in exact arithmetic—in which case the error terms will all be set to zero—and subsequently determine the machine precision \({\textbf {u }}\) necessary so that the errors are small enough to guarantee convergence.
2.4 Sampling Gaussians in Finite Precision
For various parts of the algorithm, we will need to sample from normal distributions. For our model of arithmetic, we assume that the complex normal distribution can be sampled up to machine precision in O(1) arithmetic operations. To be precise, we assume the existence of the following sampler:
Definition 2.5
(Complex Gaussian Sampling)
A \(c_{\mathsf {N}}\)-stable Gaussian sampler \(\mathsf {N}(\sigma )\) takes as input \(\sigma \in \mathbb {R}_{\ge 0}\) and outputs a sample of a random variable \({\widetilde{G}} = \mathsf {N}(\sigma )\) with the property that there exists \(G \sim N_{\mathbb {C}}(0, \sigma ^2)\) satisfying
$$\begin{aligned} |{\widetilde{G}} - G| \le c_{\mathsf {N}}\sigma \cdot {\textbf {u }}\end{aligned}$$
with probability one, in at most \(T_\mathsf {N}\) arithmetic operations for some universal constant \(T_\mathsf {N}>0\).
Note that, since the Gaussian distribution has unbounded support, one should only expect the sampler \(\mathsf {N}(\sigma )\) to have a relative error guarantee of the sort \(|{\widetilde{G}} - G| \le c_{\mathsf {N}}\sigma |G| \cdot {\textbf {u }}\). However, as will become clear below, we only care about realizations of Gaussians satisfying \(|G|<R\), for a certain prespecified \(R>0\), and the rare event \(|G|>R\) will be accounted for in the failure probability of the algorithm. So, for the sake of exposition we decided to omit the |G| in the bound on \(|{\widetilde{G}}-G|\).
We will only sample \(O(n^2)\) Gaussians during the algorithm, so this sampling will not contribute significantly to the runtime. Here as everywhere in the paper, we will omit issues of underflow or overflow. Throughout this paper, to simplify some of our bounds, we will also assume that \(c_{\mathsf {N}}\ge 1\).
2.5 Black-box Error Assumptions for Multiplication, Inversion, and QR
Our algorithm uses matrix–matrix multiplication, matrix inversion, and QR factorization as primitives. For our analysis, we must therefore assume some bounds on the error and runtime costs incurred by these subroutines. In this section, we first formally state the kind of error and runtime bounds we require, and then discuss some implementations known in the literature that satisfy each of our requirements with modest constants.
Our definitions are inspired by the definition of logarithmic stability introduced in [22]. Roughly speaking, they say that implementing the algorithm with floating point precision \({\textbf {u }}\) yields an accuracy which is at most polynomially or quasipolynomially in n worse than \({\textbf {u }}\) (possibly also depending on the condition number in the case of inversion). Their definition has the property that while a logarithmically stable algorithm is not strictly speaking backward stable, it can attain the same forward error bound as a backward stable algorithm at the cost of increasing the bit length by a polylogarithmic factor. See Section 3 of their paper for a precise definition and a more detailed discussion of how their definition relates to standard numerical stability notions.
Definition 2.6
A \(\mu _{\mathsf {MM}}(n)\)-stable multiplication algorithm \(\mathsf {MM}(\cdot , \cdot )\) takes as input \(A,B\in \mathbb {C}^{n\times n}\) and a precision \({\textbf {u }}>0\) and outputs \(C=\mathsf {MM}(A, B)\) satisfying
$$\begin{aligned} \Vert C - AB\Vert \le \mu _{\mathsf {MM}}(n)\cdot {\textbf {u }}\Vert A\Vert \Vert B\Vert \end{aligned}$$
on a floating point machine with precision \({\textbf {u }}\), in \(T_\mathsf {MM}(n)\) arithmetic operations.
Definition 2.7
A \((\mu _{\mathsf {INV}}(n), c_\mathsf {INV})\)-stable inversion algorithm \(\mathsf {INV}(\cdot )\) takes as input \(A\in \mathbb {C}^{n\times n}\) and a precision \({\textbf {u }}\) and outputs \(C=\mathsf {INV}(A)\) satisfying
$$\begin{aligned} \Vert C - A^{-1}\Vert \le \mu _{\mathsf {INV}}(n)\cdot {\textbf {u }}\cdot \kappa (A)^{c_\mathsf {INV}\log n}\Vert A^{-1}\Vert \end{aligned}$$
on a floating point machine with precision \({\textbf {u }}\), in \(T_\mathsf {INV}(n)\) arithmetic operations.
Definition 2.8
A \(\mu _\mathsf {QR}(n)\)-stable QR factorization algorithm \(\mathsf {QR}(\cdot )\) takes as input \(A\in \mathbb {C}^{n\times n}\) and a precision \({\textbf {u }}\), and outputs \([Q,R]=\mathsf {QR}(A)\) such that
-
1.
R is exactly upper triangular.
-
2.
There is a unitary \(Q'\) and a matrix \(A'\) such that
$$\begin{aligned} Q' A'= R, \end{aligned}$$(17)and
$$\begin{aligned} \Vert Q' - Q\Vert \le \mu _\mathsf {QR}(n){\textbf {u }}, \quad \text {and} \quad \Vert A'-A\Vert \le \mu _\mathsf {QR}(n){\textbf {u }}\Vert A\Vert , \end{aligned}$$
on a floating point machine with precision \({\textbf {u }}\). Its running time is \(T_\mathsf {QR}(n)\) arithmetic operations.
Remark 2.9
Throughout this paper, to simplify some of our bounds, we will assume that
The above definitions can be instantiated with traditional \(O(n^3)\)-complexity algorithms for which \(\mu _{\mathsf {MM}}, \mu _\mathsf {QR}, \mu _{\mathsf {INV}}\) are all O(n) and \(c_\mathsf {INV}=1\) [39]. This yields easily implementable practical algorithms with running times depending cubically on n.
In order to achieve \(O(n^\omega )\)-type efficiency, we instantiate them with fast-matrix-multiplication-based algorithms and with \(\mu (n)\) taken to be a low-degree polynomial [22]. Specifically, the following parameters are known to be achievable.
Theorem 2.10
(Fast and Stable Instantiations of \(\mathsf {MM},\mathsf {INV}, \mathsf {QR}\))
-
1.
If \(\omega \) is the exponent of matrix multiplication, then for every \(\eta >0\) there is a \(\mu _{\mathsf {MM}}(n)\)-stable multiplication algorithm with \(\mu _{\mathsf {MM}}(n)=n^{c_\eta }\) and \(T_\mathsf {MM}(n)=O(n^{\omega +\eta })\), where \(c_\eta \) does not depend on n.
-
2.
Given an algorithm for matrix multiplication satisfying (1), there is a (\(\mu _{\mathsf {INV}}(n),c_\mathsf {INV})\) -stable inversion algorithm with
$$\begin{aligned} \mu _{\mathsf {INV}}(n)\le O(\mu _{\mathsf {MM}}(n)n^{\lg (10)}),\quad \quad c_\mathsf {INV}\le 8, \end{aligned}$$and \(T_\mathsf {INV}(n)\le T_\mathsf {MM}(3n)=O(T_\mathsf {MM}(n))\).
-
3.
Given an algorithm for matrix multiplication satisfying (1), there is a \(\mu _\mathsf {QR}(n)\)-stable QR factorization algorithm with
$$\begin{aligned} \mu _\mathsf {QR}(n)=O(n^{c_\mathsf {QR}} \mu _{\mathsf {MM}}(n)), \end{aligned}$$where \(c_\mathsf {QR}\) is an absolute constant, and \(T_\mathsf {QR}(n)=O(T_\mathsf {MM}(n))\).
In particular, all of the running times above are bounded by \(T_\mathsf {MM}(n)\) for an \(n\times n\) matrix.
Proof
(1) is Theorem 3.3 of [23]. (2) is Theorem 3.3 (see also equation (9) above its statement) of [22]. The final claim follows by noting that \(T_\mathsf {MM}(3n)=O(T_\mathsf {MM}(n))\) by dividing a \(3n\times 3n\) matrix into nine \(n\times n\) blocks and proceeding blockwise, at the cost of a factor of 9 in \(\mu _{\mathsf {INV}}(n)\). (3) appears in Section 4.1 of [22]. \(\square \)
We remark that for specific existing fast matrix multiplication algorithms such as Strassen’s algorithm, specific small values of \(\mu _\mathsf {MM}(n)\) are known (see [23] and its references for details), so these may also be used as a black box, though we will not do this in this paper.
3 Pseudospectral Shattering
This section is devoted to our central probabilistic result, Theorem 1.4, and the accompanying notion of pseudospectral shattering which will be used extensively in our analysis of the spectral bisection algorithm in Sect. 5.
3.1 Smoothed Analysis of Gap and Eigenvector Condition Number
As is customary in the literature, we will refer to an \(n\times n\) random matrix \(G_n\) whose entries are independent complex Gaussians drawn from \(\mathcal {N}(0,1_\mathbb {C}/n)\) as a normalized complex Ginibre random matrix. To be absolutely clear, and because other choices of scaling are quite common, we mean that \(\mathbb {E}G_{i,j} = 0\) and \(\mathbb {E}|G_{i,j}|^2 = 1/n\).
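Concretely, a normalized complex Ginibre matrix with this convention can be sampled as follows (an illustrative sketch; the function name is ours). Real and imaginary parts each have variance 1/2n, so that \(\mathbb {E}|G_{i,j}|^2 = 1/n\):

```python
import numpy as np

def ginibre(n, rng):
    """Normalized complex Ginibre matrix (name ours): i.i.d. entries with
    E[G_ij] = 0 and E[|G_ij|^2] = 1/n, i.e. real and imaginary parts are
    independent N(0, 1/(2n)) Gaussians."""
    return (rng.standard_normal((n, n))
            + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)

rng = np.random.default_rng(4)
n = 50
G = ginibre(n, rng)

assert abs(np.mean(np.abs(G) ** 2) - 1.0 / n) < 0.2 / n  # E|G_ij|^2 = 1/n
assert np.linalg.norm(G, 2) < 3  # ||G|| concentrates near 2 at this scaling
```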
In the course of proving Theorem 1.4, we will need to bound the probability that the second-smallest singular value of an arbitrary matrix with small Ginibre perturbation is atypically small. We begin with a well-known lower tail bound on the singular values of a Ginibre matrix alone.
Theorem 3.1
([61, Theorem 1.2]) For \(G_n\) an \(n\times n\) normalized complex Ginibre matrix and for any \(\alpha \ge 0\) it holds that
As in several of the authors’ earlier works [9], we can transfer this result to the case of a Ginibre perturbation via a remarkable coupling result of P. Śniady.
Theorem 3.2
(Śniady [58]) Let \(A_1\) and \(A_2\) be \(n \times n\) complex matrices such that \(\sigma _i(A_1) \le \sigma _i(A_2)\) for all \(1 \le i \le n\). Assume further that \(\sigma _i(A_1) \ne \sigma _j(A_1)\) and \(\sigma _i(A_2) \ne \sigma _j(A_2)\) for all \(i \ne j\). Then for every \(t \ge 0\), there exists a joint distribution on pairs of \(n \times n\) complex matrices \((G_1, G_2)\) such that
-
1.
the marginals \(G_1\) and \(G_2\) are distributed as normalized complex Ginibre matrices, and
-
2.
almost surely \(\sigma _i(A_1 + \sqrt{t} G_1) \le \sigma _i(A_2 + \sqrt{t} G_2)\) for every i.
Corollary 3.3
For any fixed matrix M and parameters \(\gamma , t>0\)
Proof
We would like to apply Theorem 3.2 to \(A_1=0\) and \(A_2=M\), but the theorem has the technical condition that \(A_1\) and \(A_2\) have distinct singular values. Taking vanishingly small perturbations of 0 and M satisfying this condition and taking the size of the perturbation to zero, we obtain
Invoking Theorem 3.1 with \(j=n-1\) and \(\alpha \) replaced by \(tn/2\gamma \) yields the claim.
\(\square \)
We will need as well the main theorem of [9], which shows that the addition of a small complex Ginibre to an arbitrary matrix tames its eigenvalue condition numbers.
Theorem 3.4
([9, Theorem 1.5]) Suppose \(A\in \mathbb {C}^{n\times n}\) with \(\Vert A\Vert \le 1\) and \(\delta \in (0,1)\). Let \(G_n\) be a complex Ginibre matrix, and let \(\lambda _1,\ldots ,\lambda _n\in \mathbb {C}\) be the (random) eigenvalues of \(A+\delta G_n\). Then for every measurable open set \(B\subset \mathbb {C},\)
Our final lemma before embarking on the proof in earnest shows that bounds on the j-th smallest singular value and eigenvector condition number are sufficient to rule out the presence of j eigenvalues in a small region. For our particular application, we will take \(j=2\).
Lemma 3.5
Let \(D(z_0,r) :=\{z \in \mathbb {C} :|z-z_0|<r\}\). If \(M\in \mathbb {C}^{n\times n}\) is a diagonalizable matrix with at least j eigenvalues in \(D(z_0,r)\) then
Proof
Write \(M=VDV^{-1}\). By Courant–Fischer:
Since \(z_0-D\) is diagonal its singular values are just \(|z_0-\lambda _i|\), so the j-th smallest is at most r, finishing the proof. \(\square \)
We now present the main tail bound that we use to control the minimum gap and eigenvector condition number.
Theorem 3.6
(Multiparameter Tail Bound) Let \(A\in \mathbb {C}^{n\times n}\). Assume \(\Vert A\Vert \le 1\) and \(\gamma <1/2\), and let \(X:=A+\gamma G_n\) where \(G_n\) is a complex Ginibre matrix. For every \(t,r>0\):
Proof
Write \(\Lambda (X):=\{\lambda _1,\ldots ,\lambda _n\}\) for the (random) eigenvalues of \(X:=A+\gamma G_n\), in increasing order of magnitude (there are no ties almost surely). Let \(\mathcal {N}\subset \mathbb {C}\) be a minimal r/2-net of \(B:=D(0,3)\), recalling the standard fact that one exists of size no more than \((3\cdot 4/r)^2=144/r^2\). The most useful feature of such a net is that, by the triangle inequality, for any \(a,b \in D(0,3)\) with distance at most r, there is a point \(y\in \mathcal {N}\) with \(|y-(a+b)/2|<r/2\) satisfying \(a,b\in D(y,r)\). In particular, if \(\mathrm {gap}(X) < r\), then there are two eigenvalues in the disk of radius r centered at some point \(y \in \mathcal {N}\).
Therefore, consider the events
Lemma 3.5 applied to each \(y\in \mathcal {N}\) with \(j=2\) reveals that
whence
By a union bound, we have
From the tail bound on the operator norm of a Ginibre matrix in [9, Lemma 2.2],
Observe that by (11),
since the inequality in the left-hand event must reverse when we sum over all \(\lambda _i \in \Lambda (X)\); thus,
Theorem 3.4 and Markov’s inequality yield
Thus, we have
Corollary 3.3 applied to \(M=-y+A\) gives the bound
for each \(y\in \mathcal {N}\), and plugging these estimates back into (19) we have
as desired. \(\square \)
A specific setting of parameters in Theorem 3.6 immediately yields Theorem 1.4.
Proof of Theorem 1.4
Applying Theorem 3.6 with parameters \( t:=\frac{n^2}{\gamma }\) and \(r := \frac{\gamma ^4}{n^5}\), we have
as desired, where in the last step we use the assumption \(\gamma < 1/2\). \(\square \)
Since it is of independent interest in random matrix theory, we record the best bound on the gap alone that is possible to extract from the theorem above.
Corollary 3.7
(Minimum Gap Bound)
For X as in Theorem 3.6,
In particular, the probability is o(1) if \(r=o((\gamma /n)^{8/3})\).
Proof
Setting
in Theorem 3.6 balances the first two terms and yields the advertised bound. \(\square \)
3.2 Shattering
Propositions 2.2 and 2.3 in the preliminaries together tell us that if the \(\epsilon \)-pseudospectrum of an \(n\times n\) matrix A has n connected components, then each eigenvalue of any size-\(\epsilon \) perturbation \({\widetilde{A}}\) will lie in its own connected component of \(\Lambda _\epsilon (A)\). The following key definitions make this phenomenon quantitative in a sense which is useful for our analysis of spectral bisection.
Definition 3.8
(Grid) A grid in the complex plane consists of the boundaries of a lattice of squares with lower edges parallel to the real axis. We will write
to denote an \(s_1\times s_2\) grid of \(\omega \times \omega \)-sized squares and lower left corner at \(z_0 \in \mathbb {C}\). Write \({{\,\mathrm{diam}\,}}(\mathsf {g}) := \omega \sqrt{s_1^2 + s_2^2}\) for the diameter of the grid.
Definition 3.9
(Shattering) A pseudospectrum \(\Lambda _\epsilon (A)\) is shattered with respect to a grid \(\mathsf {g}\) if:
-
1.
Every square of \(\mathsf {g}\) has at most one eigenvalue of A.
-
2.
\(\Lambda _\epsilon (A)\cap \mathsf {g}=\emptyset \).
Observation 3.10
As \(\Lambda _\epsilon (A)\) contains a ball of radius \(\epsilon \) about each eigenvalue of A, shattering of the \(\epsilon \)-pseudospectrum with respect to a grid with side length \(\omega \) implies \(\epsilon \le \omega /2\).
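For intuition, shattering can be checked numerically up to discretizing the grid lines; the sketch below is our own heuristic, not one of the paper's subroutines. For a normal matrix, \(\sigma _n(z - A)\) is exactly the distance from z to the spectrum, which makes the example easy to verify by hand:

```python
import numpy as np

def is_shattered(A, eps, corner, w, s, samples=20):
    """Heuristic check (names and discretization ours) that Lambda_eps(A) is
    shattered w.r.t. an s x s grid of w x w squares with lower-left `corner`.
    Grid lines are sampled at finitely many points, so a True answer is only
    suggestive, while a False answer is a genuine certificate of failure."""
    n = A.shape[0]
    lam = np.linalg.eigvals(A)

    # Condition 1: at most one eigenvalue per square of the grid.
    cells = {(int(np.floor((l.real - corner.real) / w)),
              int(np.floor((l.imag - corner.imag) / w))) for l in lam}
    if len(cells) < len(lam):
        return False

    # Condition 2: sigma_min(z - A) > eps at sampled points of the grid lines.
    smin = lambda z: np.linalg.svd(z * np.eye(n) - A, compute_uv=False)[-1]
    ts = np.linspace(0, s * w, s * samples)
    horizontal = [corner + t + 1j * k * w for k in range(s + 1) for t in ts]
    vertical = [corner + 1j * t + k * w for k in range(s + 1) for t in ts]
    return all(smin(z) > eps for z in horizontal + vertical)

# Normal 4x4 example: eigenvalues at the centers of a 2x2 grid of unit squares,
# so the distance from the spectrum to the grid lines is exactly 1/2.
X = np.diag([0.5 + 0.5j, 1.5 + 0.5j, 0.5 + 1.5j, 1.5 + 1.5j])
assert is_shattered(X, 0.1, 0 + 0j, 1.0, 2)      # eps < 1/2: shattered
assert not is_shattered(X, 0.6, 0 + 0j, 1.0, 2)  # eps > 1/2: grid is hit
```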
As a warm-up for more sophisticated arguments later on, we give here an easy consequence of the shattering property.
Lemma 3.11
If \(\lambda _1, \dots , \lambda _n\) are the eigenvalues of A, and \(\Lambda _\epsilon (A)\) is shattered with respect to a grid \(\mathsf {g}\) with side length \(\omega \), then every eigenvalue condition number satisfies \(\kappa (\lambda _i) \le \frac{2\omega }{\pi \epsilon }\).
Proof
Let \(v,w^*\) be a right/left eigenvector pair for some eigenvalue \(\lambda _i\) of A, normalized so that \(w^*v = 1\). Letting \(\Gamma \) be the positively oriented boundary of the square of \(\mathsf {g}\) containing \(\lambda _i\), we can extract the projector \(vw^*\) by integrating, and pass norms inside the contour integral to obtain
In the final step, we have used the fact that, given the definition of pseudospectrum (6) above, \(\Lambda _\epsilon (A) \cap \mathsf {g}= \emptyset \) means \(\Vert (z - A)^{-1}\Vert \le 1/\epsilon \) on \(\mathsf {g}\). \(\square \)
The theorem below quantifies the extent to which perturbing by a Ginibre matrix results in a shattered pseudospectrum. See Fig. 1 for an illustration in the case where the initial matrix is poorly conditioned. In general, not all eigenvalues need move so far upon such a perturbation, in particular if the respective \(\kappa _i\) are small.
Theorem 3.12
(Exact Arithmetic Shattering) Let \(A\in {\mathbb {C}}^{n\times n}\) and \(X:=A+\gamma G_n\) for \(G_n\) a complex Ginibre matrix. Assume \(\Vert A\Vert \le 1\) and \(0< \gamma < 1/2\). Let \(\mathsf {g}:= \mathsf {grid}(z, \omega ,\lceil 8/\omega \rceil , \lceil 8/\omega \rceil )\) with \(\omega := \frac{\gamma ^4}{4n^5}\), and z chosen uniformly at random from the square of side \(\omega \) cornered at \(-4-4i\). Then, \(\kappa _V(X)\le n^2/\gamma \), \(\Vert A-X\Vert \le 4\gamma \), and \(\Lambda _\epsilon (X)\) is shattered with respect to \(\mathsf {g}\) for
with probability at least \(1-13/n\).
Proof
Condition on the event in Theorem 1.4, so that
Consider the random grid \(\mathsf {g}\). Since D(0, 3) is contained in the square of side length 8 centered at the origin, every eigenvalue of X is contained in one square of \(\mathsf {g}\) with probability 1. Moreover, since \(\mathrm {gap}(X)>4\omega \), no square can contain two eigenvalues. Let
Let \(\lambda _i := \lambda _i(X)\). We now have for each \(\lambda _i\) and every \(s < \frac{\omega }{2}\) :
since the distribution of \(\lambda _i\) inside its square is uniform with respect to Lebesgue measure. Setting \(s=\omega /4n^2\), this probability is at least \(1-1/n^2\), so by a union bound
i.e., every eigenvalue is well-separated from \(\mathsf {g}\) with probability \(1-1/n\).
We now recall from (12) that
Thus, on the events (21) and (23), we see that \(\Lambda _\epsilon (X)\) is shattered with respect to \(\mathsf {g}\) as long as
which is implied by
Thus, the advertised claim holds with probability at least
as desired. \(\square \)
Finally, we show that the shattering property is retained when the Gaussian perturbation is added in finite precision rather than exactly. This also serves as a pedagogical warm-up for our presentation of more complicated algorithms later in the paper: we use E to represent an adversarial roundoff error (as in step 2), and for simplicity neglect roundoff error completely in computations whose size does not grow with n (such as steps 3 and 4, which set scalar parameters).
Theorem 3.13
(Finite Arithmetic Shattering) Assume there is a \(c_{\mathsf {N}}\)-stable Gaussian sampling algorithm \(\mathsf {N}\) satisfying the requirements of Definition 2.5. Then \(\mathsf {SHATTER}\) has the advertised guarantees as long as the machine precision satisfies
and runs in
arithmetic operations.
Proof
The two sources of error in \(\mathsf {SHATTER}\) are:
1. An additive error of operator norm at most \(n\cdot c_{\mathsf {N}}\cdot (1/\sqrt{n})\cdot {\textbf {u }}\le c_{\mathsf {N}}\sqrt{n}\cdot {\textbf {u }}\) from \(\mathsf {N}\), by Definition 2.5.
2. An additive error of norm at most \(\sqrt{n}\cdot \Vert X\Vert \cdot {\textbf {u }}\le 3\sqrt{n}{\textbf {u }}\), with probability at least \(1-1/n\), from the roundoff E in step 2.
Thus, as long as the precision satisfies (24), we have
where \(\mathrm {shatter}(\cdot )\) refers to the (exact arithmetic) outcome of Theorem 3.12. The correctness of \(\mathsf {SHATTER}\) now follows from Proposition 2.2. Its running time is bounded by
arithmetic operations, as advertised. \(\square \)
4 Matrix Sign Function
The algorithmic centerpiece of this work is the analysis, in finite arithmetic, of a well-known iterative method for approximating the matrix sign function. Recall from Sect. 1 that if A is a matrix whose spectrum avoids the imaginary axis, then
$$\begin{aligned} \mathrm {sgn}(A) = P_+ - P_-, \end{aligned}$$
where \(P_+\) and \(P_-\) are the spectral projectors corresponding to eigenvalues in the open right and left half-planes, respectively. The iterative algorithm we consider approximates the matrix sign function by repeated application to A of the function
$$\begin{aligned} g(z) := \frac{1}{2}\left( z + z^{-1}\right) . \end{aligned}$$
This is simply Newton's method for finding a root of \(z^2 - 1\). One can verify that the function g fixes the left and right half-planes, and thus we should expect it to push the eigenvalues in the former towards \(-1\), and those in the latter towards \(+1\).
We denote the specific finite-arithmetic implementation used in our algorithm by \(\mathsf {SGN}\); the pseudocode is provided below.
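The pseudocode for \(\mathsf {SGN}\) is not reproduced here. As a hedged stand-in, the following Python sketch implements the underlying iteration \(A_{k+1} = \frac{1}{2}(A_k + A_k^{-1})\) in ordinary floating point, with a hand-rolled Gauss–Jordan inverse in place of the stable inversion routine \(\mathsf {INV}\) assumed by the paper; the fixed iteration count N is a free parameter here, not the value prescribed by the algorithm.

```python
# Exact-arithmetic sketch of the Newton iteration behind SGN (not the paper's
# finite-arithmetic algorithm); Gauss-Jordan with partial pivoting stands in
# for the stable inversion routine INV.
def mat_inv(A):
    n = len(A)
    # augmented matrix [A | I]
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def newton_step(A):
    B = mat_inv(A)
    return [[0.5 * (a + b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sgn_approx(A, N):
    for _ in range(N):
        A = newton_step(A)
    return A
```

For example, for \(A = \left( {\begin{matrix} 2 &{} 1 \\ 0 &{} -3 \end{matrix}}\right) \) (eigenvalues 2 and \(-3\)), the iterates converge rapidly to \(\mathrm {sgn}(A) = \left( {\begin{matrix} 1 &{} 2/5 \\ 0 &{} -1 \end{matrix}}\right) \).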
In Sect. 4.1 we briefly discuss the specific preliminaries that will be used throughout this section. In Sect. 4.2 we give a pseudospectral proof of the rapid global convergence of this iteration when implemented in exact arithmetic. In Sect. 4.3 we show that the proof of Sect. 4.2 is robust enough to handle the finite arithmetic case; the formal statement of this main result is the content of Theorem 4.9.
4.1 Circles of Apollonius
It has been known since antiquity that a circle in the plane may be described as the set of points with a fixed ratio of distances to two focal points. By fixing the focal points and varying the ratio in question, we get a family of circles named for the Greek geometer Apollonius of Perga. We will exploit several interesting properties enjoyed by these Circles of Apollonius in the analysis below.
More precisely, we analyze the Newton iteration map \(g\) in terms of the family of Apollonian circles whose foci are the points \(\pm 1 \in \mathbb {C}\). For the remainder of this section we will write \(m(z) = \tfrac{1 - z}{1 + z}\) for the Möbius transformation taking the right half-plane to the unit disk, and for each \(\alpha \in (0,1)\) we denote by
$$\begin{aligned} \mathsf {C}^{+}_{\alpha } := \left\{ z : |m(z)| \le \alpha \right\} \quad \text {and} \quad \mathsf {C}^{-}_{\alpha } := \left\{ z : |m(z)| \ge 1/\alpha \right\} \end{aligned}$$
the closed region in the right (respectively left) half-plane bounded by such a circle. Write \(\partial \mathsf {C}^{+}_{\alpha }\) and \(\partial \mathsf {C}^{-}_{\alpha }\) for their boundaries, and \(\mathsf {C}_{\alpha } = \mathsf {C}^{+}_{\alpha } \cup \mathsf {C}^{-}_{\alpha }\) for their union. See Fig. 2 for an illustration.
The region \(\mathsf {C}^{+}_{\alpha }\) is a disk centered at \(\tfrac{1 + \alpha ^2}{1 - \alpha ^2} \in \mathbb {R}\), with radius \(\tfrac{2\alpha }{1-\alpha ^2}\), and whose intersection with the real line is the interval \([m(\alpha ),m(\alpha )^{-1}]\); \(\mathsf {C}^{-}_{\alpha }\) can be obtained by reflecting \(\mathsf {C}^{+}_{\alpha }\) with respect to the imaginary axis. For \(\alpha> \beta > 0\), we will write
$$\begin{aligned} \mathsf {A}^{+}_{\alpha ,\beta } := \mathsf {C}^{+}_{\alpha } \setminus \mathsf {C}^{+}_{\beta } \end{aligned}$$
for the Apollonian annulus lying inside \(\mathsf {C}^{+}_{\alpha }\) and outside \(\mathsf {C}^{+}_{\beta }\); note that the circles are not concentric, so this is not strictly speaking an annulus, and note also that in our notation this set does not include \(\partial \mathsf {C}^{+}_{\beta }\). In the same way we define \(\mathsf {A}^{-}_{\alpha ,\beta }\) for the left half-plane and write \(\mathsf {A}_{\alpha ,\beta } = \mathsf {A}^{+}_{\alpha ,\beta } \cup \mathsf {A}^{-}_{\alpha ,\beta }\).
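The disk description of \(\mathsf {C}^{+}_{\alpha }\) can be verified numerically; the following sketch (with arbitrary sample points, not part of the text) checks that \(|m(z)|\le \alpha \) holds exactly when z lies in the disk with the stated center and radius:

```python
# Verify at sample points: |m(z)| <= alpha iff |z - c| <= r, where
# c = (1 + a^2)/(1 - a^2) and r = 2a/(1 - a^2) describe the disk C^+_alpha.
def m(z):
    return (1 - z) / (1 + z)

def agree(alpha, z):
    c = (1 + alpha ** 2) / (1 - alpha ** 2)
    r = 2 * alpha / (1 - alpha ** 2)
    return (abs(m(z)) <= alpha) == (abs(z - c) <= r)

points = [1.0, 2.0, 5.0, 0.1 + 0.2j, 1.5 - 0.8j, 0.4 + 0.0j]
assert all(agree(0.5, z) for z in points)
assert all(agree(0.9, z) for z in points)
```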
Observation 4.1
([53]) The Newton map \(g\) is a two-to-one map from \(\mathsf {C}^{+}_{\alpha }\) to \(\mathsf {C}^{+}_{\alpha ^2}\), and a two-to-one map from \(\mathsf {C}^{-}_{\alpha }\) to \(\mathsf {C}^{-}_{\alpha ^2}\).
Proof
This follows from the fact that for each z in the right half-plane \(m(g(z)) = -m(z)^2\), and hence \(|m(g(z))| = |m(z)|^2\),
and similarly for the left half-plane. \(\square \)
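Observation 4.1 rests on the contraction \(|m(g(z))| = |m(z)|^{2}\), which follows from the algebraic identity \(m(g(z)) = -m(z)^{2}\); a quick numerical check of both (a sketch, with arbitrary sample points):

```python
# Check that m(g(z)) = -m(z)^2, hence |m(g(z))| = |m(z)|^2, at sample points.
def m(z):
    return (1 - z) / (1 + z)

def g(z):
    return 0.5 * (z + 1 / z)

samples = [2.0, 0.3 + 1.1j, -1.5 + 0.2j, 4.0 - 3.0j, 0.01 + 0.0j]
for z in samples:
    assert abs(m(g(z)) + m(z) ** 2) < 1e-9
    assert abs(abs(m(g(z))) - abs(m(z)) ** 2) < 1e-9
```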
It follows from Observation 4.1 that under repeated application of the Newton map g, any point in the right or left half-plane converges to \(+1\) or \(-1\), respectively.
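On scalars, i.e., on the eigenvalue picture underlying the matrix statement, this convergence is easy to observe numerically (a sketch with arbitrary sample points; the iteration count 60 is generous rather than optimized):

```python
# Iterate g(z) = (z + 1/z)/2 on sample points: points with positive real
# part converge to +1, points with negative real part to -1.
def g(z):
    return 0.5 * (z + 1 / z)

def newton_limit(z, n_iter=60):
    for _ in range(n_iter):
        z = g(z)
    return z

for z0 in [3.0, 0.2 + 5j, 0.01 - 2j]:
    assert abs(newton_limit(z0) - 1) < 1e-9
for z0 in [-3.0, -0.2 + 5j, -1e-3 + 1j]:
    assert abs(newton_limit(z0) + 1) < 1e-9
```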
4.2 Exact Arithmetic
In this section, we set \(A_0 := A\) and \(A_{k+1} := g(A_k)\) for all \(k \ge 0\). In the case of exact arithmetic, Observation 4.1 implies global convergence of the Newton iteration when A is diagonalizable. For the convenience of the reader, we provide this argument (due to [53]) below.
Proposition 4.2
Let A be a diagonalizable \(n \times n\) matrix and assume that \(\Lambda (A) \subset \mathsf {C}_{\alpha }\) for some \(\alpha \in (0,1)\). Then for every \(N \in {\mathbb {N}}\) we have the guarantee
Moreover, when A does not have eigenvalues on the imaginary axis the minimum \(\alpha \) for which \(\Lambda (A) \subset \mathsf {C}_{\alpha }\) is given by
Proof
Consider the spectral decomposition \(A = \sum _{i=1}^n \lambda _i v_i w_i^*,\) and denote by \(\lambda _i^{(N)}\) the eigenvalues of \(A_N\).
By Observation 4.1, we have that \(\Lambda (A_N) \subset \mathsf {C}_{\alpha ^{2^N}}\) and \(\mathrm {sgn}(\lambda _i) = \mathrm {sgn}(\lambda _i^{(N)})\). Moreover, \(A_N\) and \(\mathrm {sgn}(A)\) have the same eigenvectors. Hence
Now we will use that for any matrix X we have \(\Vert X \Vert \le \kappa _V(X) \mathsf {spr}(X)\), where \(\mathsf {spr}(X)\) denotes the spectral radius of X. Observe that the spectral radii of the two matrices appearing on the right-hand side of (26) are bounded by \(\max _{i} |\lambda _i^{(N)}- \mathrm {sgn}(\lambda _i)|\), which in turn is bounded by the diameter of the circle \(\mathsf {C}^{+}_{\alpha ^{2^N}}\), namely \(4\alpha ^{2^N}/(1-\alpha ^{2^{N+1}})\). On the other hand, the eigenvector condition number of these matrices is bounded by \(\kappa _V(A)\). This concludes the first part of the statement.
In order to compute \(\alpha \) note that if \(z = x+ i y\) with \( x > 0\), then
and analogously when \(x < 0\) and we evaluate \(|m(z)|^{-2}\). \(\square \)
The above analysis becomes useless when trying to prove the same statement in the framework of finite arithmetic. This is due to the fact that at each step of the iteration the roundoff error can make the eigenvector condition numbers of the \(A_k\) grow. In fact, since \(\kappa _V(A_k)\) is sensitive to infinitesimal perturbations whenever \(A_k\) has a multiple eigenvalue, it seems difficult to control it against adversarial perturbations as the iteration converges to \(\mathrm {sgn}(A_k)\) (which has very high multiplicity eigenvalues). A different approach, also due to [53], yields a proof of convergence in exact arithmetic even when A is not diagonalizable. However, that proof relies heavily on the fact that \(m(A_N)\) is an exact power of \(m(A_0)\), or more precisely, it requires the sequence \(A_k\) to have the same generalized eigenvectors, which is again not the case in the finite arithmetic setting.
Therefore, a robust version, tolerant to perturbations, of the above proof is needed. To this end, instead of simultaneously keeping track of the eigenvector condition number and the spectrum of the matrices \(A_k\), we will just show that for certain \(\epsilon _k > 0\), the \(\epsilon _k\)-pseudospectra of these matrices are contained in a certain shrinking region dependent on k. This invariant is inherently robust to perturbations smaller than \(\epsilon _k\), unaffected by clustering of eigenvalues due to convergence, and allows us to bound the accuracy and other quantities of interest via the functional calculus. For example, the following lemma shows how to obtain a bound on \(\Vert A_N- \mathrm {sgn}(A)\Vert \) solely using information from the pseudospectrum of \(A_N\).
Lemma 4.3
(Pseudospectral Error Bound) Let A be any \(n \times n\) matrix and let \(A_N\) be the Nth iterate of the Newton iteration under exact arithmetic. Assume that \(\epsilon _N > 0\) and \(\alpha _N \in (0, 1)\) satisfy \(\Lambda _{\epsilon _N}(A_N) \subset \mathsf {C}_{\alpha _N}\). Then, we have the guarantee
Proof
Note that \(\mathrm {sgn}(A) = \mathrm {sgn}(A_N)\). Using the functional calculus, we get
\(\square \)
In view of Lemma 4.3, we would now like to find sequences \(\alpha _k\) and \(\epsilon _k\) such that
$$\begin{aligned} \Lambda _{\epsilon _k}(A_k) \subset \mathsf {C}_{\alpha _k} \end{aligned}$$
and \(\alpha _k^2/\epsilon _k\) converges rapidly to zero. The dependence of this quantity on the square of \(\alpha _k\) turns out to be crucial. As we will see below, we can find such a sequence with \(\epsilon _k\) shrinking roughly at the same rate as \(\alpha _k\). This yields quadratic convergence, which will be necessary for our bound on the required machine precision in the finite arithmetic analysis of Sect. 4.3.
The lemma below is instrumental in determining the sequences \(\alpha _k, \epsilon _k\).
Lemma 4.4
(Key Lemma) If \(\Lambda _\epsilon (A) \subset \mathsf {C}_{\alpha }\), then for every \(\alpha '>\alpha ^2\), we have \(\Lambda _{\epsilon '}(g(A))\subset \mathsf {C}_{\alpha '}\) where
$$\begin{aligned} \epsilon ' = \epsilon \, \frac{(\alpha ' - \alpha ^2)(1-\alpha ^2)}{8\alpha }. \end{aligned}$$
Proof
From the definition of pseudospectrum, our hypothesis implies \(\Vert (z - A)^{-1}\Vert < 1/\epsilon \) for every z outside of \(\mathsf {C}_{\alpha }\). The proof will hinge on the observation that, for each \(\alpha ' \in (\alpha ^2,\alpha )\), this resolvent bound allows us to bound the resolvent of \(g(A)\) everywhere in the Apollonian annulus \(\mathsf {A}_{\alpha ,\alpha '}\).
Let \(w \in \mathsf {A}_{\alpha ,\alpha '}\); see Fig. 3 for an illustration. We must show that \(w \not \in \Lambda _{\epsilon '}(g(A))\). Since \(w \not \in \mathsf {C}_{\alpha ^2}\), Observation 4.1 ensures no \(z \in \mathsf {C}_{\alpha }\) satisfies \(g(z) = w\); in other words, the function \((w - g(z))^{-1}\) is holomorphic in z on \(\mathsf {C}_{\alpha }\). As \(\Lambda (A) \subset \Lambda _\epsilon (A) \subset \mathsf {C}_{\alpha }\), Observation 4.1 also guarantees that \(\Lambda (g(A)) \subset \mathsf {C}_{\alpha ^2}\). Thus for w in the union of the two Apollonian annuli in question, we can calculate the resolvent of \(g(A)\) at w using the holomorphic functional calculus:
where by this we mean to sum the integrals over \(\partial \mathsf {C}^{+}_{\alpha }\) and \(\partial \mathsf {C}^{-}_{\alpha }\), both positively oriented. Taking norms, passing inside the integral, and applying Observation 4.1 one final time, we get:
In the last step we also use the forthcoming Lemma 4.5. Thus, with \(\epsilon '\) defined as in the theorem statement, \(\mathsf {A}_{\alpha ,\alpha '}\) contains none of the \(\epsilon '\)-pseudospectrum of \(g(A)\). Since \(\Lambda (g(A)) \subset \mathsf {C}_{\alpha ^2}\), Theorem 2.3 tells us that there can be no \(\epsilon '\)-pseudospectrum in the remainder of \(\mathbb {C}\setminus \mathsf {C}_{\alpha '}\), as such a connected component would need to contain an eigenvalue of \(g(A)\). \(\square \)
Lemma 4.5
Let \(1> \alpha , \beta > 0\) be given. Then for any \(x \in \partial \mathsf {C}_{\alpha }\) and \(y \in \partial \mathsf {C}_{\beta }\), we have \(|x-y| \ge (\alpha -\beta )/2\).
Proof
Without loss of generality \(x \in \partial \mathsf {C}^{+}_{\alpha }\) and \(y \in \partial \mathsf {C}^{+}_{\beta }\). Then, we have
\(\square \)
Lemma 4.4 will also be useful in bounding the condition numbers of the \(A_k\), which is necessary for the finite arithmetic analysis.
Corollary 4.6
(Condition Number Bound) Using the notation of Lemma 4.4, if \(\Lambda _\epsilon (A) \subset \mathsf {C}_{\alpha }\), then
Proof
The bound \(\Vert A^{-1} \Vert \le 1/\epsilon \) follows from the fact that \(0 \notin \mathsf {C}_{\alpha } \supset \Lambda _{\epsilon }(A)\). In order to bound \(\Vert A \Vert \) we use the contour integral bound
\(\square \)
Another direct application of Lemma 4.4 yields the following.
Lemma 4.7
Let \(\epsilon > 0\). If \(\Lambda _\epsilon (A) \subset \mathsf {C}_{\alpha }\), and \( 1/\alpha> D > 1\) then for every N we have the guarantee
for \(\alpha _N =(D\alpha )^{2^N}/D\) and \(\epsilon _N = \frac{\alpha _N \epsilon }{\alpha } \left( \frac{(D-1)(1-\alpha ^2)}{8D}\right) ^N \).
Proof
Define recursively \(\alpha _0 = \alpha \), \(\epsilon _0 = \epsilon \), \(\alpha _{k+1} = D \alpha _k^2\) and \(\epsilon _{k+1}= \frac{1}{8} \epsilon _k \alpha _k (D-1)(1-\alpha _0^2).\) It is easy to see by induction that this definition is consistent with the definition of \(\alpha _N\) and \(\epsilon _N\) given in the statement.
We will now show by induction that \(\Lambda _{\epsilon _k}(A_k) \subset \mathsf {C}_{\alpha _k}\). Assume the statement is true for k, so from Lemma 4.4 we have that the statement is also true for \(A_{k+1}\) if we pick the pseudospectral parameter to be
On the other hand,
which concludes the proof of the statement. \(\square \)
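The closed forms for \(\alpha _N\) and \(\epsilon _N\) in Lemma 4.7 can be checked against the recursion numerically; the following sketch uses arbitrary sample parameters satisfying \(1< D < 1/\alpha \):

```python
import math

# Iterate a_{k+1} = D*a_k^2, e_{k+1} = e_k*a_k*(D-1)*(1-a0^2)/8 and compare
# with the closed forms a_N = (D*a0)^(2^N)/D and
# e_N = (a_N*e0/a0) * ((D-1)*(1-a0^2)/(8*D))^N from Lemma 4.7.
def recurse(a0, e0, D, N):
    a, e = a0, e0
    for _ in range(N):
        a, e = D * a * a, e * a * (D - 1) * (1 - a0 ** 2) / 8
    return a, e

def closed_form(a0, e0, D, N):
    aN = (D * a0) ** (2 ** N) / D
    eN = (aN * e0 / a0) * ((D - 1) * (1 - a0 ** 2) / (8 * D)) ** N
    return aN, eN

a0, e0, D, N = 0.9, 0.1, 1.05, 5
aN, eN = recurse(a0, e0, D, N)
cN, fN = closed_form(a0, e0, D, N)
assert math.isclose(aN, cN, rel_tol=1e-9) and math.isclose(eN, fN, rel_tol=1e-9)
```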
We are now ready to prove the main result of this section, a pseudospectral version of Proposition 4.2.
Proposition 4.8
Let \(A\in {\mathbb {C}}^{n\times n}\) be a diagonalizable matrix and assume that \(\Lambda _\epsilon (A) \subset \mathsf {C}_{\alpha }\) for some \(\alpha \in (0,1)\). Then, for any \(1< D < \frac{1}{\alpha }\) and every N, we have the guarantee
Proof
Using the choice of \(\alpha _k\) and \(\epsilon _k\) given in the proof of Lemma 4.7 and the bound (27), we get that
where the last inequality was taken solely to make the expression more intuitive, since not much is lost by doing so. \(\square \)
4.3 Finite Arithmetic
Finally, we turn to the analysis of \(\mathsf {SGN}\) in finite arithmetic. By making the machine precision small enough, we can bound the effect of roundoff to ensure that the parameters \(\alpha _k\), \(\epsilon _k\) are not too far from what they would have been in the exact arithmetic analysis above. We will stop the iteration before any of the quantities involved become prohibitively small, so we will only need \(\mathrm {polylog}(1-\alpha _0, \epsilon _0, \beta )\) bits of precision, where \(\beta \) is the accuracy parameter.
In exact arithmetic, recall that the Newton iteration is given by \(A_{k+1} = g(A_{k}) = \frac{1}{2} (A_k + A_k^{-1}).\) Here we will consider the finite arithmetic version \(\mathsf {G}\) of the Newton map \(g\), defined as \(\mathsf {G}(A) := g(A)+E_A\) where \(E_A\) is an adversarial perturbation coming from the roundoff error. Hence, the sequence of interest is given by \({\widetilde{A}}_0 := A\) and \({\widetilde{A}}_{k+1} := \mathsf {G}({\widetilde{A}}_k)\) .
In this subsection we will prove the following theorem concerning the runtime and precision of \(\mathsf {SGN}\). Our assumptions on the size of the parameters \(\alpha _0, \beta , \mu _{\mathsf {INV}}(n)\) and \(c_\mathsf {INV}\) are in place only to simplify the analysis of constants; these assumptions are not required for the execution of the algorithm.
Theorem 4.9
(Main guarantees for \(\mathsf {SGN}\)) Assume \(\mathsf {INV}\) is a \((\mu _{\mathsf {INV}}(n), c_\mathsf {INV})\)-stable matrix inversion algorithm satisfying Definition 2.7. Let \(\epsilon _0\in (0,1), \beta \in (0,1/12)\), assume \(\mu _{\mathsf {INV}}(n) \ge 1\) and \(c_\mathsf {INV}\log n \ge 1\), and assume \(A = {\widetilde{A}}_0\) is a floating-point matrix with \(\epsilon _0\)-pseudospectrum contained in \(\mathsf {C}_{\alpha _0}\) where \(0< 1 - \alpha _0 < 1/100\). Run \(\mathsf {SGN}\) with
iterations (as specified in the statement of the algorithm). Then \(\widetilde{A_N}=\mathsf {SGN}(A)\) satisfies the advertised accuracy guarantee
when run with machine precision satisfying
corresponding to at most
required bits of precision. The number of arithmetic operations is at most
Later on, we will need to call \(\mathsf {SGN}\) on a matrix with shattered pseudospectrum; the lemma below calculates acceptable parameter settings for shattering so that the pseudospectrum is contained in the required pair of Apollonian circles, satisfying the hypothesis of Theorem 4.9.
Lemma 4.10
If A has \(\epsilon \)-pseudospectrum shattered with respect to a grid \(\mathsf {g}= \mathsf {grid}(z_0,\omega ,s_1,s_2)\) that includes the imaginary axis as a grid line, then one has \(\Lambda _{\epsilon _0}(A) \subseteq \mathsf {C}_{\alpha _0}\) where \(\epsilon _0 = \epsilon /2\) and
In particular, if \(\epsilon \) is at least \(1/\mathrm {poly}(n)\) and \(\omega s_1\) and \(\omega s_2\) are at most \(\mathrm {poly}(n)\), then \(\epsilon _0\) and \(1-\alpha _0\) are also at least \(1/\mathrm {poly}(n)\).
Proof
First, because it is shattered, the \(\epsilon /2\)-pseudospectrum of A is at least distance \(\epsilon /2\) from \(\mathsf {g}\). Recycling the calculation from Proposition 4.2, it suffices to take
From what we just observed about the pseudospectrum, we can take \(|{\text {Re}}z| \ge \epsilon /2\). To bound the denominator, we can use the crude bound that any two points inside the grid are at distance no more than \({{\,\mathrm{diam}\,}}(\mathsf {g})\). Finally, we use \(\sqrt{1 - x} \le 1 - x/2\) for any \(x\in (0,1)\). \(\square \)
The proof of Theorem 4.9 will proceed as in the exact arithmetic case, with the modification that \(\epsilon _k\) must be decreased by an additional factor after each iteration to account for roundoff. At each step, we set the machine precision \({\textbf {u }}\) small enough so that the \(\epsilon _k\) remain close to what they would be in exact arithmetic. For the analysis we will introduce an explicit auxiliary sequence \(e_k\) that lower bounds the \(\epsilon _k\), provided that \({\textbf {u }}\) is small enough.
Lemma 4.11
(One-step additive error) Assume the matrix inverse is computed by an algorithm \(\mathsf {INV}\) satisfying the guarantee in Definition 2.7. Then \(\mathsf {G}(A) = g(A) + E\) for some error matrix E with norm
The proof of this lemma is deferred to “Appendix A”.
With the error bound for each step in hand, we now move to the analysis of the whole iteration. It will be convenient to define \(s := 1 - \alpha _0\), which should be thought of as a small parameter. As in the exact arithmetic case, for \(k \ge 1,\) we will recursively define decreasing sequences \(\alpha _k\) and \(\epsilon _k\) maintaining the property
$$\begin{aligned} \Lambda _{\epsilon _k}({\widetilde{A}}_k) \subset \mathsf {C}_{\alpha _k} \end{aligned}$$
(29)
by induction as follows:
1. The base case \(k=0\) holds because, by assumption, \(\Lambda _{\epsilon _0}({\widetilde{A}}_0) \subset \mathsf {C}_{\alpha _0}\).
2. Here we recursively define \(\alpha _{k+1}\). Set
$$\begin{aligned} \alpha _{k+1} := (1 + s/4) \alpha _k^2. \end{aligned}$$
In the notation of Sect. 4.2, this corresponds to setting \(D = 1+s/4\). This definition ensures that \(\alpha _k^2 \le \alpha _{k+1} \le \alpha _k\) for all k, and also gives us the bound \((1+s/4)\alpha _0 \le 1-s/2\). We also have the closed form
$$\begin{aligned} \alpha _k = (1+s/4)^{2^k - 1} \alpha _0^{2^k}, \end{aligned}$$
which implies the useful bound
$$\begin{aligned} \alpha _k \le (1-s/2)^{2^k}. \end{aligned}$$
(30)
3. Here we recursively define \(\epsilon _{k+1}\). Combining Lemma 4.4, the recursive definition of \(\alpha _{k+1}\), and the fact that \(1 - \alpha _k^2 \ge 1 - \alpha _0^2 \ge 1 - \alpha _0 = s\), we find that \(\Lambda _{\epsilon '}\left( g({\widetilde{A}}_k)\right) \subset \mathsf {C}_{\alpha _{k+1}}\), where
$$\begin{aligned} \epsilon ' = \epsilon _k \frac{\left( \alpha _{k+1} - \alpha _k^2\right) (1-\alpha _k^2)}{8\alpha _k} = \epsilon _k \frac{s\alpha _k(1-\alpha _k^2)}{32} \ge \epsilon _k\frac{ \alpha _k s^2}{32}. \end{aligned}$$
Thus in particular
$$\begin{aligned} \Lambda _{\epsilon _k\alpha _k s^2/32} \left( g({\widetilde{A}}_k)\right) \subset \mathsf {C}_{\alpha _{k+1}}. \end{aligned}$$
Since \({\widetilde{A}}_{k+1} = \mathsf {G}({\widetilde{A}}_k) = g({\widetilde{A}}_k) + E_k\) for some error matrix \(E_k\) arising from roundoff, Proposition 2.2 ensures that if we set
$$\begin{aligned} \epsilon _{k+1} := \epsilon _k\frac{ s^2 \alpha _k }{32} - \Vert E_k \Vert \end{aligned}$$
(31)
we will have \(\Lambda _{\epsilon _{k+1}}({\widetilde{A}}_{k+1}) \subset \mathsf {C}_{\alpha _{k+1}}\), as desired.
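The elementary facts about the sequence \(\alpha _k\) used above can be verified numerically for sample values of s (a sketch, not part of the argument):

```python
# Check, for sample s, the facts used above: (1+s/4)*(1-s) <= 1-s/2, the
# closed form a_k = (1+s/4)^(2^k-1) * a0^(2^k) for a_{k+1} = (1+s/4)*a_k^2,
# and the bound a_k <= (1-s/2)^(2^k), with a0 = 1-s.
def check_alpha_facts(s, kmax=6):
    a0 = 1 - s
    assert (1 + s / 4) * a0 <= 1 - s / 2
    a = a0
    for k in range(kmax + 1):
        closed = (1 + s / 4) ** (2 ** k - 1) * a0 ** (2 ** k)
        assert abs(a - closed) <= 1e-12 * max(a, closed)
        assert a <= (1 - s / 2) ** (2 ** k) + 1e-15
        a = (1 + s / 4) * a * a  # recursion a_{k+1} = (1 + s/4) a_k^2
    return True

assert check_alpha_facts(0.005) and check_alpha_facts(0.0001)
```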
We now need to show that the \(\epsilon _{k}\) do not decrease too fast as k increases. In view of (31), it will be helpful to set the machine precision small enough to guarantee that \(\Vert E_k \Vert \) is a small fraction of \(\epsilon _k\frac{ \alpha _k s^2}{32}\).
First, we need to control the quantities \(\Vert {\widetilde{A}}_k\Vert \), \(\Vert {\widetilde{A}}_k^{-1}\Vert \), and \(\kappa ({\widetilde{A}}_k) =\Vert {\widetilde{A}}_k\Vert \Vert {\widetilde{A}}_k^{-1}\Vert \) appearing in our upper bound (28) on \(\Vert E_k \Vert \) from Lemma 4.11, as functions of \(\epsilon _k\). By Corollary 4.6, we have
Thus, we may write the coefficient of \({\textbf {u }}\) in the bound (28) as
so that Lemma 4.11 reads
Plugging this into the definition (31) of \(\epsilon _{k+1}\), we have
Now suppose we take \({\textbf {u }}\) small enough so that
For such \({\textbf {u }}\), we then have
which implies
this bound is loose but sufficient for our purposes. Inductively, we now have the following bound on \(\epsilon _k\) in terms of \(\alpha _k\):
Lemma 4.12
(Preliminary lower bound on \(\epsilon _k\)) Let \(k \ge 0\), and for all \(0 \le i \le k-1\), assume \({\textbf {u }}\) satisfies the requirement (34):
Then, we have
In fact, it suffices to assume the hypothesis only for \(i=k-1\).
Proof
The last statement follows from the fact that \(\epsilon _i\) is decreasing in i and \(K_{\epsilon _i}\) is increasing in i.
Since (34) implies (35), we may apply (35) repeatedly to obtain
\(\square \)
We now show that the conclusion of Lemma 4.12 still holds if we replace \(\epsilon _i\) everywhere in the hypothesis by \(e_i\), which is an explicit function of \(\epsilon _0\) and \(\alpha _0\) defined in Lemma 4.12. Note that we do not know \(\epsilon _i \ge e_i\) a priori, so to avoid circularity we must use a short inductive argument.
Corollary 4.13
(Lower bound on \(\epsilon _k\) with explicit hypothesis) Let \(k \ge 0\), and for all \(0 \le i \le k-1\), assume \({\textbf {u }}\) satisfies
where \(e_i\) is defined in Lemma 4.12. Then, we have
In fact, it suffices to assume the hypothesis only for \(i=k-1\).
Proof
The last statement follows from the fact that \(e_i\) is decreasing in i and \(K_{e_i}\) is increasing in i.
Assuming the full hypothesis of this lemma, we prove \(\epsilon _i \ge e_i\) for \(0 \le i \le k\) by induction on i. For the base case, we have \(\epsilon _0 \ge e_0 = \epsilon _0 \alpha _0\).
For the inductive step, assume \(\epsilon _i \ge e_i\). Then as long as \(i \le k-1\), the hypothesis of this lemma implies
so we may apply Lemma 4.12 to obtain \(\epsilon _{i+1} \ge e_{i+1}\), as desired. \(\square \)
Lemma 4.14
(Main accuracy bound) Suppose \({\textbf {u }}\) satisfies the requirement (34) for all \(0 \le k \le N\). Then
Proof
Since \(\mathrm {sgn}= \mathrm {sgn}\circ g\), for every k we have
From the holomorphic functional calculus we can rewrite \(\Vert \mathrm {sgn}(\widetilde{A_{k+1}}) - \mathrm {sgn}(\widetilde{A_{k+1}} - E_k) \Vert \) as the norm of a certain contour integral, which in turn can be bounded as follows:
where we use the definition (6) of pseudospectrum and Proposition 2.2, together with the property (29). Ultimately, this chain of inequalities implies
Summing over all k and using the triangle inequality, we obtain
where in the last step we use \(\alpha _k \le 1\) and \(1 - \alpha _{k+1}^2 \ge s\), as well as (36).
By Lemma 4.3 (to be precise, by repeating the proof of that lemma with \(\widetilde{A_N}\) substituted for \(A_N\)), we have
where we use \(s < 1/2\) in the last step.
Combining the above with the triangle inequality, we obtain the desired bound.
\(\square \)
We would like to apply Lemma 4.14 to ensure \(\Vert \widetilde{A_N} - \mathrm {sgn}(A) \Vert \) is at most \(\beta \), the desired accuracy parameter. The upper bound (38) in Lemma 4.14 is the sum of two terms; we will make each term less than \(\beta /2\). The bound for the second term will yield a sufficient condition on the number of iterations N. Given that, the bound on the first term will then give a sufficient condition on the machine precision \({\textbf {u }}\). This will be the content of Lemmas 4.16 and 4.17.
We start with the second term. The following preliminary lemma will be useful:
Lemma 4.15
Let \(1/800> t > 0\) and \(1/2> c > 0\) be given. Then for
we have
The proof is deferred to “Appendix A”.
Lemma 4.16
(Bound on second term of (38)) Suppose we have
Then
Proof
It is sufficient that
The result now follows from applying Lemma 4.15 with \(c = \beta s^2 \epsilon _0/16\) and \(t=s/8\). \(\square \)
Now we move to the first term in the bound of Lemma 4.14.
Lemma 4.17
(Bound on first term of (38))
Suppose
and suppose the machine precision \({\textbf {u }}\) satisfies
Then we have
Proof
It suffices to show that for all \(0 \le k \le N-1\),
In view of (32), which says \(\Vert {E_k}\Vert \le K_{\epsilon _k} {\textbf {u }}\), it is sufficient to have for all \(0 \le k \le N-1\)
For this, we claim it is sufficient to have for all \(0 \le k \le N-1\)
Indeed, on the one hand, since \(\beta < 1/6\) and by the loose bound \(e_{k+1}< s \alpha _{k+1} < s \alpha _k\), we have that (40) implies \({\textbf {u }}\le \frac{1}{3K_{e_k}} \frac{ s^2 e_k}{32}\), which means that the assumption in Corollary 4.13 is satisfied. On the other hand, Corollary 4.13 yields \(e_k\le \epsilon _k\) for all \(0\le k \le N\), which in turn, combined with (40), gives (39) and concludes the proof.
We now show that (40) holds for all \(0\le k\le N-1\). Because \(1/K_{e_k}\) and \(e_k\) are decreasing in k, it is sufficient to have the single condition
We continue the chain of sufficient conditions on \({\textbf {u }}\), where each line implies the line above:
where we use the bound \(\frac{1}{e_N} \le \frac{4}{s^2 e_N^2}\) without much loss, and we also use our assumption \(\mu _{\mathsf {INV}}(n) \ge 1\) and \(c_\mathsf {INV}\log n \ge 1\) for simplicity.
Substituting the value of \(e_N\) as defined in Lemma 4.12, we get the sufficient condition
Replacing \(\alpha _N\) by the smaller quantity \(\alpha _0^{2^N} = (1-s)^{2^N}\) and cleaning up the constants yields the sufficient condition
Now we finally will use our hypothesis on the size of N to simplify this expression. Applying Lemma 4.16, we have
Thus, our sufficient condition becomes
To make the expression simpler, since \(c_\mathsf {INV}\log n + 3 \ge 4\) we may pull out a factor of \(4^4 > 192\) and remove the occurrences of \(\beta \) to yield the sufficient condition
\(\square \)
Matching the statement of Theorem 4.9, we give a slightly cleaner sufficient condition on N that implies the hypothesis on N appearing in the above lemmas. The proof is deferred to “Appendix A”.
Lemma 4.18
(Final sufficient condition on N) If
then
Taking the logarithm of the machine precision yields the number of bits required:
Lemma 4.19
(Bit length computation) Suppose
and
Then
Proof
In the course of the proof, for convenience we also record a nonasymptotic bound (for \(s<1/100\), \(\beta < 1/12\), \(\epsilon _0 < 1\) and \(c_\mathsf {INV}\log n > 1\) as in the hypothesis of Theorem 4.9), at the cost of making the computation somewhat messier.
Immediately we have
Note that \(\log (1/(1-s)) < 2s\) for \(s < 1/2\). Also, \(2^{N+1} \le (1/s) \lg (1/s)^3 (\lg (1/\beta ) + \lg (1/\epsilon _0))2^{9.59}.\) Putting this together, we have
We now crudely bound \(\lg N\). Note that for \(s < 1/100\) we have \(\lg (1/s) + 3 \lg \lg (1/s) + 7.59 \le 1/s\). Thus,
Combining the above, we may fold the \(\lg N\) and \(\lg n\) terms into the final term to obtain
where we use that \(c_\mathsf {INV}\log n > 1\) and therefore \(c_\mathsf {INV}\log n + 3 < 4 c_\mathsf {INV}\log n.\)
Using that \(\mu _{\mathsf {INV}}(n) = \mathrm {poly}(n)\) and discarding subdominant terms, we obtain the desired asymptotic bound. \(\square \)
This completes the proof of Theorem 4.9. Finally, we may prove the theorem advertised in Sect. 1.
Proof of Theorem 1.5
Set \(\epsilon := \min \{ \frac{1}{K}, 1\}\). Then \(\Lambda _\epsilon (A)\) does not intersect the imaginary axis, and furthermore \(\Lambda _\epsilon (A) \subseteq D(0, 2)\) because \(\Vert A \Vert \le 1\). Thus, we may apply Lemma 4.10 with \({{\,\mathrm{diam}\,}}(\mathsf {g}) = 4\sqrt{2}\) to obtain parameters \(\alpha _0, \epsilon _0\) with the property that \(\log (1/(1-\alpha _0))\) and \(\log (1/\epsilon _0)\) are both \(O(\log K)\). Theorem 4.9 now yields the desired conclusion. \(\square \)
5 Spectral Bisection Algorithm
In this section we will prove Theorem 1.6. As discussed in Sect. 1, our algorithm is not new, and in its idealized form it reduces to the following two tasks:
- Split: Given an \(n\times n\) matrix A, find a partition of the spectrum into pieces of roughly equal size, and output spectral projectors \(P_{\pm }\) onto each of these pieces.
- Deflate: Given an \(n\times n\) rank-k projector P, output an \(n\times k\) matrix Q with orthogonal columns that span the range of P.
These routines in hand, on input A one can compute \(P_{\pm }\) and the corresponding \(Q_{\pm }\), and then find the eigenvectors and eigenvalues of \(A_{\pm } := Q_{\pm }^*A Q_{\pm }\). The observation below verifies that this recursion is sound.
Observation 5.1
The spectrum of A is exactly \(\Lambda (A_+) \sqcup \Lambda (A_-)\), and every eigenvector of A is of the form \(Q_{\pm }v\) for some eigenvector v of one of \(A_{\pm }\).
The difficulty, of course, is that neither of these routines can be executed exactly: we will never have access to true projectors \(P_{\pm }\), nor to the actual orthogonal matrices \(Q_{\pm }\) whose columns span their range, and must instead make do with approximations. Because our algorithm is recursive and our matrices nonnormal, we must take care that the errors in the sub-instances \(A_{\pm }\) do not corrupt the eigenvectors and eigenvalues we are hoping to find. Additionally, the Newton iteration we will use to split the spectrum behaves poorly when an eigenvalue is close to the imaginary axis, and it is not clear how to find a splitting which is balanced.
Our tactic in resolving these issues will be to pass to our algorithms a matrix and a grid with respect to which its \(\epsilon \)-pseudospectrum is shattered. To find an approximate eigenvalue, then, one can settle for locating the grid square it lies in; containment in a grid square is robust to perturbations of size smaller than \(\epsilon \). The shattering property is robust to small perturbations, inherited by the subproblems we pass to, and—because the spectrum is quantifiably far from the grid lines—allows us to run the Newton iteration in the first place.
Let us now sketch the implementations and carefully state the guarantees for \(\mathsf {SPLIT}\) and \(\mathsf {DEFLATE}\); the analysis of these is deferred to Appendices B and C. Our splitting algorithm is presented with a matrix A whose \(\epsilon \)-pseudospectrum is shattered with respect to a grid \(\mathsf {g}\). For any vertical grid line with real part h, \(\mathrm {Tr}\, \mathrm {sgn}(A-h)\) gives the difference between the number of eigenvalues lying to its right and the number lying to its left. As
we can determine these eigenvalue counts exactly by running \(\mathsf {SGN}\) to accuracy O(1/n) and rounding \(\mathrm {Tr}\, \mathsf {SGN}(A-h)\) to the nearest integer. We will show in “Appendix B” that, by mounting a binary search over horizontal and vertical lines of \(\mathsf {g}\), we will always arrive at a partition of the eigenvalues into two parts with size at least \(\min \{n/5,1\}\). Having found it, we run \(\mathsf {SGN}\) one final time at the desired precision to find the approximate spectral projectors.
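The counting step can be sketched as follows; here, as a hedged stand-in, the trace of the exact scalar sign function over known eigenvalues plays the role of \(\mathrm {Tr}\, \mathsf {SGN}(A-h)\), which the algorithm only knows up to accuracy O(1/n):

```python
# Given Tr sgn(A - h) = (# eigenvalues right of h) - (# left of h) and the
# total count n, the two counts are recovered exactly after rounding.
def counts_from_trace(trace, n):
    t = round(trace)  # SGN is run to accuracy O(1/n), so rounding is exact
    assert (n + t) % 2 == 0
    return (n + t) // 2, (n - t) // 2  # (# right of h, # left of h)

# Toy stand-in: trace of the exact sign function over known eigenvalues,
# plus a small error mimicking the accuracy of SGN.
eigs = [-2.0, -0.5, 0.3, 1.1, 2.7]
h = 0.0
trace = sum(1 if e.real > h else -1 for e in eigs) + 0.04  # |error| < 1/2
right, left = counts_from_trace(trace, len(eigs))
assert (right, left) == (3, 2)
```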
Theorem 5.2
(Guarantees for \(\mathsf {SPLIT}\)) Assume \(\mathsf {INV}\) is a \((\mu _\mathsf {INV},c_\mathsf {INV})\)-stable matrix inversion algorithm satisfying Definition 2.7. Let \(\epsilon \le 0.5\) and \(\beta \le 0.05/n\), suppose \(\Vert A\Vert \le 4\) and that \(\mathsf {g}\) has side lengths of at most 8, and define
Then \(\mathsf {SPLIT}\) has the advertised guarantees when run on a floating point machine with precision
using at most
arithmetic operations. The number of bits required is
Deflation of the approximate projectors we obtain from \(\mathsf {SPLIT}\) amounts to a standard rank-revealing QR factorization. This can be achieved deterministically in \(O(n^3)\) time with the classic algorithm of Gu and Eisenstat [36], or probabilistically in matrix-multiplication time with a variant of the method of [22]; we will use the latter.
Theorem 5.3
(Guarantees for \(\mathsf {DEFLATE}\)) Assume \(\mathsf {MM}\) and \(\mathsf {QR}\) are matrix multiplication and QR factorization algorithms satisfying Definitions 2.6 and 2.8. Then \(\mathsf {DEFLATE}\) has the advertised guarantees when run on a machine with precision:
The number of arithmetic operations is at most:
Remark 5.4
The proof of the above theorem, which is deferred to “Appendix C”, closely follows and builds on the analysis of the randomized rank revealing factorization algorithm (\(\mathsf {RURV}\)) introduced in [22] and further studied in [7]. The parameters in the theorem are optimized for the particular application of finding a basis for a deflating subspace given an approximate spectral projector.
The main difference with the analysis in [22] and [7] is that here, to make it applicable to complex matrices, we make use of Haar unitary random matrices instead of Haar orthogonal random matrices. In our analysis of the unitary case, we discovered a strikingly simple formula (Corollary C.6) for the density of the smallest singular value of an \(r\times r\) sub-matrix of an \(n\times n\) Haar unitary; this formula is leveraged to obtain guarantees that hold for every n and r, and not only when \(n-r \ge 30\), as was the case in [7]. Finally, we explicitly account for the effects of finite arithmetic on the Gaussian randomness used in the algorithm, since true Haar unitary matrices can never be produced in floating point.
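In exact arithmetic the primitive is easy to describe. The following numpy sketch (our illustration of the randomized rank-revealing idea of [22], not the finite-arithmetic \(\mathsf {DEFLATE}\) itself) samples a Haar unitary as the Q factor of a complex Ginibre matrix with the standard phase correction, and reads off an orthonormal basis for the range of a rank-k projector from a QR factorization:

```python
import numpy as np

def haar_unitary(n, rng):
    # Q factor of a complex Ginibre matrix; rescaling each column by the
    # phase of the corresponding diagonal entry of R yields Haar measure.
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def deflate_sketch(P, k, rng):
    # RURV step: with probability 1 the first k columns of the Q factor of
    # P @ U form an orthonormal basis of the rank-k range of P.
    U = haar_unitary(P.shape[0], rng)
    Q, _ = np.linalg.qr(P @ U)
    return Q[:, :k]
```

The Haar unitary ensures, with high probability, that the leading k columns are well-conditioned representatives of the range; this is the property quantified by the smallest-singular-value formula of Corollary C.6.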
We are ready now to state completely an algorithm \(\mathsf {EIG}\) which accepts a shattered matrix and grid and outputs approximate eigenvectors and eigenvalues with a forward-error guarantee. Aside from the a priori unmotivated parameter settings in lines 2 and 3—which we promise to justify in the analysis to come—\(\mathsf {EIG}\) implements an approximate version of the split and deflate framework that began this section.
Theorem 5.5
(\(\mathsf {EIG}\): Finite Arithmetic Guarantee) Assume \(\mathsf {MM}, \mathsf {QR}\), and \(\mathsf {INV}\) are numerically stable algorithms for matrix multiplication, QR factorization, and inversion satisfying Definitions 2.6, 2.8, and 2.7. Let \(\delta < 1\), \(A \in \mathbb {C}^{n\times n}\) have \(\Vert A\Vert \le 3.5\) and, for some \(\epsilon < 1/2\), have \(\epsilon \)-pseudospectrum shattered with respect to a grid \(\mathsf {g}= \mathsf {grid}(z_0,\omega ,s_1,s_2)\) with side lengths at most 8 and \(\omega \le 1\). Define
Then \(\mathsf {EIG}\) has the advertised guarantees when run on a floating point machine with precision satisfying:
The number of arithmetic operations is at most
Remark 5.6
We have not fully optimized the large constant \(2^{9.59}\) appearing in the bit length above.
Theorem 5.5 easily implies Theorem 1.6 when combined with \(\mathsf {SHATTER}\).
Theorem 5.7
(Restatement of Theorem 1.6) There is a randomized algorithm \(\mathsf {EIG}\) which on input any matrix \(A\in \mathbb {C}^{n\times n}\) with \(\Vert A\Vert \le 1\) and a desired accuracy parameter \(\delta \in (0,1)\) outputs a diagonal D and invertible V such that
in
arithmetic operations on a floating point machine with
bits of precision, with probability at least \(1-14/n\). Here \(T_\mathsf {MM}(n)\) refers to the running time of a numerically stable matrix multiplication algorithm (detailed in Sect. 2.5).
Proof
Given A and \(\delta \), consider the following two step algorithm:
-
1.
\((X, \mathsf {g}, \epsilon )\leftarrow \mathsf {SHATTER}(A,\delta /8)\).
-
2.
\((V,D)\leftarrow \mathsf {EIG}(X,\delta ',\mathsf {g},\epsilon ,1/n,n)\), where
$$\begin{aligned} \delta ' := \frac{\delta ^3}{n^{4.5} \cdot 6 \cdot 128 \cdot 2}. \end{aligned}$$(42)
With probability at least \(1 - 13/n\), \(\mathsf {SHATTER}(A,\delta /8)\) succeeds, in which case the output \((X,\mathsf {g},\epsilon )\) easily satisfies the assumptions in Theorem 5.5: \(\delta ' \le \delta < 1\), \(\epsilon = \tfrac{(\delta /8)^5}{32 n^9} \le 1/2\), \(\mathsf {g}\) is defined by \(\mathsf {SHATTER}\) to have side length 8, \(\Vert X\Vert \le \Vert A\Vert + \Vert X - A\Vert \le 1 + 4(\delta /8) \le 3.5\), and X has \(\epsilon \)-pseudospectrum shattered with respect to \(\mathsf {g}\). On this event, \(X = WCW^{-1}\), and (using the proof of Theorem 3.6) if we normalize W to have unit-length columns, then \(\kappa (W) = \Vert W\Vert \Vert W^{-1}\Vert \le 8n^2/\delta \).
We will show that the choice of \(\delta '\) in (42) guarantees
Since \(\Vert X\Vert \le \Vert A\Vert + \Vert A - X\Vert \le 1 + 4\gamma \le 3\) from Theorem 3.13, the hypotheses of Theorem 5.5 are satisfied. Thus \(\mathsf {EIG}\) succeeds with probability at least \(1-1/n\), and by a union bound, both \(\mathsf {EIG}\) and \(\mathsf {SHATTER}\) succeed with probability at least \(1 - 14/n\). On this event, we have \(V=W+E\) for some \(\Vert E\Vert \le \delta '\sqrt{n}\), so
as well as
since our choice of \(\delta '\) satisfies the much cruder bound of
This implies that
establishing the last item of the theorem. We can control the perturbation of the inverse as:
The grid output by \(\mathsf {SHATTER}(A,\delta /8)\) has \(\omega = \tfrac{\delta ^4}{4\cdot 8^4\cdot n^5} \le \tfrac{\delta }{\sqrt{2}}\) provided \(\delta < 1\). Thus the guarantees on \(\mathsf {EIG}\) in Theorem 5.5 tell us each eigenvalue of \(X = WCW^{-1}\) shares a grid square with exactly one diagonal entry of D, which means that \(\Vert C - D\Vert \le \sqrt{2}\omega \le \delta \). So, we have:
which is at most \(\delta /2\), for \(\delta '\) chosen as above. We conclude that
with probability \(1-14/n\) as desired.
To compute the running time and precision, we observe that \(\mathsf {SHATTER}\) outputs a grid with parameters
Plugging this into the guarantees of \(\mathsf {EIG}\), we see that it takes
arithmetic operations, on a floating point machine with precision
bits, as advertised. \(\square \)
5.1 Proof of Theorem 5.5
A key stepping-stone in our proof will be the following elementary result controlling the spectrum, pseudospectrum, and eigenvectors after perturbing a shattered matrix.
Lemma 5.8
(Eigenvector Perturbation for a Shattered Matrix) Let \(\Lambda _{\epsilon }(A)\) be shattered with respect to a grid whose squares have side length \(\omega \), and assume that \(\Vert {{\widetilde{A}}} - A\Vert \le \eta < \epsilon \). Then, (i) each eigenvalue of \({{\widetilde{A}}}\) lies in the same grid square as exactly one eigenvalue of A, (ii) \(\Lambda _{\epsilon - \eta }(\widetilde{A})\) is shattered with respect to the same grid, and (iii) for any right unit eigenvector \({{\widetilde{v}}}\) of \({\widetilde{A}}\), there exists a right unit eigenvector of A corresponding to the same grid square, and for which
Proof
For (i), consider \(A_t = A + t({{\widetilde{A}}} - A)\) for \(t \in [0,1]\). By continuity, the entire trajectory of each eigenvalue is contained in a unique connected component of \(\Lambda _\eta (A) \subset \Lambda _\epsilon (A)\). For (ii), \(\Lambda _{\epsilon - \eta }({{\widetilde{A}}}) \subset \Lambda _{\epsilon }(A)\), which is shattered by hypothesis. Finally, for (iii), let \(w^*\) and \({\widetilde{w}}^*\) be the left eigenvectors corresponding to v and \({\widetilde{v}}\), respectively, normalized so that \(w^*v = {\widetilde{w}}^*{\widetilde{v}} = 1\). Let \(\Gamma \) be the boundary of the grid square containing the eigenvalues associated to v and \({\widetilde{v}}\). Then, using a contour integral along \(\Gamma \) as in (13) above, one gets
Thus, using that \(\Vert v\Vert =1\) and \(w^*v = 1\),
Now, since \(({\widetilde{v}}^*v) {\widetilde{v}}\) is the orthogonal projection of v onto the span of \({\widetilde{v}}\), we have that
Multiplying v by a phase we can assume without loss of generality that \({\widetilde{v}}^* v\ge 0\) which implies that
The above discussion can now be summarized in the following chain of inequalities
Finally, note that \(\Vert v-{\widetilde{v}}\Vert = \sqrt{2-2{\widetilde{v}}^*v} \le \frac{\sqrt{8}\omega }{\pi } \frac{\eta }{\epsilon (\epsilon - \eta )}\) as we wanted to show. \(\square \)
The algorithm \(\mathsf {EIG}\) works by recursively reducing to subinstances of smaller size, but requires a pseudospectral guarantee to ensure speed and stability. We thus need to verify that the pseudospectrum does not deteriorate too substantially when we pass to a sub-problem.
Lemma 5.9
(Shattering is preserved after compression) Suppose P is a spectral projector of \(A\in \mathbb {C}^{n\times n}\) of rank k. Let \(Q\in \mathbb {C}^{n\times k}\) be such that \(Q^*Q=I_k\) and that its columns span the same space as the columns of P. Then for every \(\epsilon >0\),
Alternatively, the same pseudospectral inclusion holds if again \(Q^*Q=I_k\) and, instead, the columns of Q span the same space as the rows of P.
Proof
We will first analyze the case when the columns of Q span the same space as the columns of P. To begin, note that if \(z\in \Lambda _\epsilon (Q^*AQ)\) then there exists \(v\in \mathbb {C}^k\) satisfying \(\Vert (z-Q^*AQ)v\Vert \le \epsilon \Vert v\Vert \). Since \(I_k=Q^*I_nQ\) we have
Because \(Q^*\) acts as an isometry on \(\mathrm {range}(Q)\) (the span of the columns of Q), and this space, being the range of the spectral projector P, is invariant under A (and hence under \((z-A)\)), we have that \((z-A)Qv\in \mathrm {range}(Q)\), and therefore \(\Vert Q^*(z-A)Qv\Vert = \Vert (z-A)Qv\Vert \), from which we obtain
showing that \(z\in \Lambda _\epsilon (A)\).
For the case in which the columns of Q span the same space as the rows of P, the above proof can be easily modified by now taking v with the property that \(\Vert v^* Q^* (z-A)Q\Vert \le \epsilon \Vert v\Vert \). \(\square \)
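The inclusion is easy to test numerically. The sketch below (an illustration assuming Q is built from exact eigenvectors, so that its columns span an invariant subspace) verifies the equivalent pointwise inequality \(\sigma _{\min }(z-A) \le \sigma _{\min }(z-Q^*AQ)\), which amounts to \(\Lambda _\epsilon (Q^*AQ)\subset \Lambda _\epsilon (A)\) for every \(\epsilon \):

```python
import numpy as np

def compress(A, idx):
    # Orthonormal basis Q for the invariant subspace spanned by the
    # eigenvectors of A selected by idx, and the compression Q* A Q.
    _, V = np.linalg.eig(A)
    Q, _ = np.linalg.qr(V[:, idx])
    return Q, Q.conj().T @ A @ Q

def smin(M):
    # smallest singular value; sigma_min(z - M) <= eps  iff  z lies in
    # the eps-pseudospectrum of M
    return np.linalg.svd(M, compute_uv=False)[-1]
```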
Observation 5.10
Since \(\delta ,\omega (\mathsf {g}),\epsilon \le 1\), our assumption on \(\eta \) in Line 2 of the pseudocode of \(\mathsf {EIG}\) implies the following bounds on \(\eta \) which we will use below:
Initial lemmas in hand, let us begin to analyze the algorithm. At several points we will make an assumption on the machine precision in the margin. These will be collected at the end of the proof, where we will verify that they follow from the precision hypothesis of Theorem 5.5.
Correctness.
Lemma 5.11
(Accuracy of \(\widetilde{\lambda _i}\)) When \(\mathsf {DEFLATE}\) succeeds, each eigenvalue of A shares a square of \(\mathsf {g}\) with a unique eigenvalue of either \(\widetilde{A_{+}}\) or \(\widetilde{A_{-}}\), and furthermore \(\Lambda _{4\epsilon /5} (\widetilde{A_{\pm }}) \subset \Lambda _\epsilon (A)\).
Proof
Let \(P_{\pm }\) be the true projectors onto the two bisection regions found by \(\mathsf {SPLIT}(A,\beta )\), \(Q_{\pm }\) the matrices whose orthonormal columns span their ranges, and \(A_{\pm } := Q_{\pm }^*A Q_{\pm }\). From Theorem 5.3, on the event that \(\mathsf {DEFLATE}\) succeeds, the approximation \(\widetilde{Q_{\pm }}\) that it outputs satisfies \(\Vert \widetilde{Q_{\pm }} - Q_{\pm }\Vert \le \eta \), so in particular \(\Vert \widetilde{Q_{\pm }}\Vert \le 2\) as \(\eta \le 1\). The error \(E_{6,\pm }\) from performing the matrix multiplications necessary to compute \(\widetilde{A_{\pm }}\) admits the bound
Iterating the triangle inequality, we obtain
We can now apply Lemma 5.8. \(\square \)
Everything is now in place to show that, if every call to \(\mathsf {DEFLATE}\) succeeds, \(\mathsf {EIG}\) has the advertised accuracy guarantees. After we show this, we will lower bound this success probability and compute the running time.
When \(A \in \mathbb {C}^{1\times 1}\), the algorithm works as promised. Assume inductively that \(\mathsf {EIG}\) has the desired guarantees on instances of size strictly smaller than n. In particular, maintaining the notation from the above lemmas, we may assume that
satisfy (i) each eigenvalue of \(\widetilde{D_{\pm }}\) shares a square of \(\mathsf {g}_{\pm }\) with exactly one eigenvalue of \(\widetilde{A_{\pm }}\), and (ii) each column of \(\widetilde{V_{\pm }}\) is \(4\delta /5\)-close to a true eigenvector of \(\widetilde{A_{\pm }}\). From Lemma 5.8, each eigenvalue of \(\widetilde{A_{\pm }}\) shares a grid square with exactly one eigenvalue of A, and thus the output
satisfies the eigenvalue guarantee.
To verify that the computed eigenvectors are close to the true ones, let \(\widetilde{{\widetilde{v}}_{\pm }}\) be some approximate right unit eigenvector of one of \(\widetilde{A_{\pm }}\) output by \(\mathsf {EIG}\) (with norm \(1 \pm n{\textbf {u }}\)), \({\widetilde{v}}_{\pm }\) the exact unit eigenvector of \(\widetilde{A_\pm }\) that it approximates, and \(v_{\pm }\) the corresponding exact unit eigenvector of \(A_{\pm }\). Recursively, \(\mathsf {EIG}(A,\delta ,\mathsf {g},\epsilon ,\theta ,n)\) will output an approximate unit eigenvector
whose proximity to the actual eigenvector \(v := Q v_{\pm }\) we need now to quantify. The error terms here are e, a column of the error matrix \(E_{8}\) whose norm we can crudely bound by
and \(e'\), a column of \(E_9\) incurred by performing the normalization in floating point; in our initial discussion of floating point arithmetic we assumed in (16) that \(\Vert e'\Vert \le n{\textbf {u }}\).
First, since \({\widetilde{v}} - e'\) and \(\widetilde{Q_{\pm }}\widetilde{{\widetilde{v}}_{\pm }} + e\) are parallel, the distance between them is just the difference in their norms:
Inductively \(\Vert \widetilde{{\widetilde{v}}_{\pm }} - {\widetilde{v}}_{\pm } \Vert \le 4\delta /5\), and since \(\Vert A_{\pm } - \widetilde{A_{\pm }}\Vert \le \epsilon /5\) and \(A_{\pm }\) has shattered \(\epsilon \)-pseudospectrum from Lemma 5.9, Lemma 5.8 ensures
Thus, putting together the above, iterating the triangle inequality, and using \(\Vert Q_{\pm }\Vert = 1\),
This concludes the proof of correctness of \(\mathsf {EIG}\).
Running Time and Failure Probability. Let’s begin with a simple lemma bounding the depth of \(\mathsf {EIG}\)’s recursion tree.
Lemma 5.12
(Recursion Depth) The recursion tree of \(\mathsf {EIG}\) has depth at most \(\log _{5/4} n\), and every branch ends with an instance of size \(1\times 1\).
Proof
By Theorem 5.2, \(\mathsf {SPLIT}\) can always find a bisection of the spectrum into two regions containing \(n_\pm \) eigenvalues, respectively, with \(n_+ + n_- = n\) and \(n_{\pm } \le 4n/5\), and when \(n\le 5\) can always peel off at least one eigenvalue. Thus the depth d(n) satisfies
As \(d(n) \le n - 1 \le \log _{5/4}n\) for \(2 \le n \le 5\) and \(d(1) = 0\), the result is immediate from induction. \(\square \)
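As a sanity check on the lemma, one can evaluate the worst-case depth of this recurrence directly (a numerical illustration under the guarantees above: the larger sub-instance has at most \(\lfloor 4n/5\rfloor \) eigenvalues when \(n > 5\), and one eigenvalue is peeled off when \(n \le 5\)):

```python
import math

def depth(n):
    # Worst-case recursion depth of EIG under the SPLIT guarantees.
    if n == 1:
        return 0
    if n <= 5:
        return 1 + depth(n - 1)   # peel off a single eigenvalue
    return 1 + depth(4 * n // 5)  # larger part has <= floor(4n/5) eigenvalues
```

One can confirm numerically that \(\mathrm{depth}(n) \le \log _{5/4} n\) for all \(n \ge 2\).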
We pause briefly to verify that the assumptions \(\delta < 1\), \(\epsilon < 1/2\), \(\mathsf {g}\) has side lengths at most 8, and \(\Vert A\Vert \le 3.5\) in Theorem 5.5 ensure that every call to \(\mathsf {SPLIT}\) throughout the algorithm satisfies the hypotheses of Theorem 5.2, namely that \(\epsilon \le 0.5, \beta \le 0.05/n, \Vert A\Vert \le 4,\) and \(\mathsf {g}\) has side lengths of at most 8. Since \(\delta ,\epsilon ,\) and \(\beta \) are non-increasing as we travel down the recursion tree of \(\mathsf {EIG}\)—with \(\beta \) monotonically decreasing in \(\delta \) and \(\epsilon \)—we need only verify that the hypotheses of Theorem 5.2 hold on the initial call to \(\mathsf {EIG}\). The condition on \(\epsilon \) is immediately satisfied; for the one on \(\beta \), we have
which is clearly at most 0.05/n.
On each new call to \(\mathsf {EIG}\) the grid only decreases in size, so the initial assumption is sufficient. Finally, we need that every matrix passed to \(\mathsf {SPLIT}\) throughout the course of the algorithm has norm at most 4. Lemma 5.11 shows that if \(\Vert A\Vert \le 4\) and its \(\epsilon \)-pseudospectrum is shattered, then \(\Vert \widetilde{A_{\pm }} - A_{\pm }\Vert \le \epsilon /5\), and since \(\Vert A_{\pm }\Vert \le \Vert A\Vert \), this means \(\Vert \widetilde{A_{\pm }}\Vert \le \Vert A\Vert + \epsilon /5\). Thus each time we pass to a subproblem, the norm of the matrix we pass to \(\mathsf {EIG}\) (and thus to \(\mathsf {SPLIT}\)) increases by at most an additive \(\epsilon /5\), where \(\epsilon \) is the input to the outermost call to \(\mathsf {EIG}\). Since \(\epsilon \) decreases by a factor of 4/5 on each recursion step, this means that by the end of the algorithm the norm of the matrix passed to \(\mathsf {EIG}\) will increase by at most an additive \((\epsilon + (4/5)\epsilon + (4/5)^2 \epsilon + \cdots )/5 = \epsilon \le 1/2\). Thus we will be safe if our initial matrix has norm at most 3.5, as assumed.
Lemma 5.13
(Lower Bounds on the Parameters) Assume \(\mathsf {EIG}\) is run on an \(n\times n\) matrix, with some parameters \(\delta \) and \(\epsilon \). Throughout the algorithm, on every recursive call to \(\mathsf {EIG}\), the corresponding parameters \(\delta '\) and \(\epsilon '\) satisfy
On each such call to \(\mathsf {EIG}\), the parameters \(\eta '\) and \(\beta '\) passed to \(\mathsf {SPLIT}\) and \(\mathsf {DEFLATE}\) satisfy
Proof
Along each branch of the recursion tree, we replace \(\epsilon \leftarrow 4\epsilon /5\) and \(\delta \leftarrow 4\delta /5\) at most \(\log _{5/4}n\) times, so each can only decrease by a factor of n from their initial settings. The parameters \(\eta '\) and \(\beta '\) are computed directly from \(\epsilon '\) and \(\delta '\). \(\square \)
Lemma 5.14
(Failure Probability) \(\mathsf {EIG}\) fails with probability no more than \(\theta \).
Proof
Since each recursion splits into at most two subproblems, and the recursion tree has depth \(\log _{5/4}n\), there are at most
calls to \(\mathsf {DEFLATE}\). We have set every \(\eta \) and \(\beta \) so that the failure probability of each is \(\theta /(2n^4)\), so a crude union bound finishes the proof. \(\square \)
The arithmetic operations required for \(\mathsf {EIG}\) satisfy the recursive relationship
All of \(T_{\mathsf {SPLIT}}\), \(T_{\mathsf {DEFLATE}}\), and \(T_{\mathsf {MM}}\) are of the form \(\mathrm {polylog}(n)\mathrm {poly}(n)\), with all coefficients nonnegative and exponents in the \(\mathrm {poly}(n)\) no smaller than 2. So, for any \(n_+ + n_- = n\) and \(n_{\pm } \le 4 n/5\), holding all other parameters fixed, \(T_{\mathsf {SPLIT}}(n_+,...) + T_{\mathsf {SPLIT}}(n_-,...) \le \left( (4/5)^2 + (1/5)^2\right) T_{\mathsf {SPLIT}}(n,...) = (17/25)T_{\mathsf {SPLIT}}(n,...)\) and the same holds for \(T_{\mathsf {DEFLATE}}\) and \(T_{\mathsf {MM}}\). Applying this recursively, with all parameters other than n set to their lower bounds from Lemma 5.13, we then have
where
In the above inequalities, we’ve substituted in the expressions for \(T_{\mathsf {SPLIT}}\) and \(T_{\mathsf {DEFLATE}}\) from Theorems 5.2 and 5.3, respectively; \(N_{\mathsf {EIG}}\) is defined by recomputing \(N_{\mathsf {SPLIT}}\) with the parameter lower bounds, and the \(\epsilon ^9\) is not an error. The final inequality uses our assumption \(T_\mathsf {N}= O(1)\). Thus using the fast and stable instantiations of \(\mathsf {MM}\), \(\mathsf {INV}\), and \(\mathsf {QR}\) from Theorem 2.10, we have
exact constants can be extracted by analyzing \(N_{\mathsf {EIG}}\) and opening Theorem 2.10.
Required Bits of Precision. We will need the following bound on the norms of all spectral projectors.
Lemma 5.15
(Sizes of Spectral Projectors) Throughout the algorithm, every approximate spectral projector \({\widetilde{P}}\) given to \(\mathsf {DEFLATE}\) satisfies \(\Vert {\widetilde{P}}\Vert \le 10n/\epsilon \).
Proof
Every such \({\widetilde{P}}\) is \(\beta \)-close to a true spectral projector P of a matrix whose \(\epsilon /n\)-pseudospectrum is shattered with respect to the initial \(8\times 8\) unit grid \(\mathsf {g}\). Since we can generate P by a contour integral around the boundary of a rectangular subgrid, we have
with the last inequality following from \(\epsilon < 1\). \(\square \)
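The contour-integral representation used here (and in Lemma 5.8) is easy to realize numerically. The following sketch (a midpoint-rule illustration, not part of the algorithm) computes \(P = \tfrac{1}{2\pi i}\oint (z-A)^{-1}\,dz\) around a square and recovers the rank-one spectral projector of a non-normal matrix, whose norm exceeds 1:

```python
import numpy as np

def contour_projector(A, center, side, m=200):
    # P = (1/(2*pi*i)) * integral of (z - A)^{-1} over the boundary of the
    # square with the given center and side length, traversed counter-
    # clockwise, using the midpoint rule with m nodes per edge.
    n = A.shape[0]
    h = side / 2.0
    corners = [center + h * w for w in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)]
    P = np.zeros((n, n), dtype=complex)
    for a, b in zip(corners, corners[1:] + corners[:1]):
        for t in (np.arange(m) + 0.5) / m:
            z = a + t * (b - a)
            P += ((b - a) / m) * np.linalg.inv(z * np.eye(n) - A)
    return P / (2j * np.pi)
```

For the Jordan-type matrix with diagonal \((0,2,3)\) below, the projector onto the eigenvalue 0 has norm larger than 1, which is why the shattering parameter enters the bound on \(\Vert P\Vert \) above.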
Collecting the machine precision requirements \({\textbf {u }}\le {\textbf {u }}_{\mathsf {SPLIT}},{\textbf {u }}_{\mathsf {DEFLATE}}\) from Theorems 5.2 and 5.3, as well as those we used in the course of our proof so far, and substituting in the parameter lower bounds from Lemma 5.13, we need \({\textbf {u }}\) to satisfy
From Lemma 5.15, \(\Vert {\widetilde{P}}\Vert \le 10n/\epsilon \), so the conditions in the second two lines are all satisfied if we make the crass upper bound
i.e. if \(\lg 1/{\textbf {u }}\ge O\left( \lg \frac{n}{\theta \delta \epsilon }\right) \). Unpacking the first requirement, using the definition \( N_{\mathsf {EIG}} := \lg \tfrac{256 n}{\epsilon } + 3\lg \lg \tfrac{256 n}{\epsilon } + \lg \lg \tfrac{(5n)^{26}}{\theta ^2\delta ^4\epsilon ^9} + 7.59\) from Theorem 5.5, and recalling that \(\epsilon \le 1/2\), \(n \ge 1\), and \((1 - x)^{1/x} \ge 1/4\) for \(x \in (0,1/512)\), we have
so setting \({\textbf {u }}\) smaller than the final expression is sufficient to guarantee \(\mathsf {EIG}\) and all subroutines can execute as advertised. This gives
This dominates the precision requirement from (45), and completes the proof of Theorem 5.5.
Remark 5.16
A constant may be extracted directly from the expression above—leaving \(\epsilon ,\delta ,\theta \) fixed, a crude bound on it is \(2^{9.59} \cdot 26 \cdot 8 \cdot c_{\mathsf {INV}} \approx 160303 c_{\mathsf {INV}}\). This can certainly be optimized; the improvement with the highest impact would be a tighter analysis of \(\mathsf {SPLIT}\), with the aim of eliminating the additive 7.59 term in \(N_{\mathsf {SPLIT}}\).
6 Conclusion and Open Questions
In this paper, we reduced the approximate diagonalization problem to a polylogarithmic number of matrix multiplications, inversions, and QR factorizations on a floating point machine with precision depending only polylogarithmically on n and \(1/\delta \). The key phenomena enabling this were: (a) every matrix is \(\delta \)-close to a matrix with well-behaved pseudospectrum, and such a matrix can be found by a complex Gaussian perturbation and (b) the spectral bisection algorithm can be shown to converge rapidly to a forward approximate solution on such a well-behaved matrix, using a polylogarithmic in n and \(1/\delta \) amount of precision and number of iterations. The combination of these facts yields a \(\delta \)-backward approximate solution for the original problem.
Using fast matrix multiplication, we obtain algorithms with nearly optimal asymptotic computational complexity (as a function of n, compared to matrix multiplication), for general complex matrices with no assumptions. Using naïve matrix multiplication, we get easily implementable algorithms with \(O(n^3)\) type complexity and much better constants which are likely faster in practice. The constants in our bit complexity and precision estimates (see Theorem 5.5 and equations (41) and (42)), while not huge, are likely suboptimal. The reasonable practical performance of spectral bisection based algorithms is witnessed by the many empirical papers (see e.g. [5]) which have studied it. The more recent of these works further show that such algorithms are communication-avoiding and have good parallelizability properties.
Remark 6.1
(Hermitian Matrices) A curious feature of our algorithm is that even when the input matrix is Hermitian or real symmetric, it begins by adding a complex non-Hermitian perturbation to regularize the spectrum. If one is only interested in this special case, one can replace this first step by a Hermitian GUE or symmetric GOE perturbation and appeal to the result of [1] instead of Theorem 1.4, which also yields a polynomial lower bound on the minimum gap of the perturbed matrix. It is also possible to obtain a much stronger analysis of the Newton iteration in the Hermitian case, since the iterates are all Hermitian and \(\kappa _V=1\) for such matrices. By combining these observations, one can obtain a running time for Hermitian matrices which is significantly better (in logarithmic factors) than our main theorem. We do not pursue this further since our main goal was to address the more difficult non-Hermitian case.
We conclude by listing several directions for future research.
-
1.
Devise a deterministic algorithm with similar guarantees. The main bottleneck to doing this is deterministically finding a regularizing perturbation, which seems quite mysterious. Another bottleneck is computing a rank-revealing QR factorization in near matrix multiplication time deterministically (all of the currently known deterministic algorithms require \(\Omega (n^3)\) time).
-
2.
Determine the correct exponent for smoothed analysis of the eigenvalue gap of \(A+\gamma G\) where G is a complex Ginibre matrix. We currently obtain roughly \((\gamma /n)^{8/3}\) in Theorem 3.6. Is it possible to match the \(n^{-4/3}\) type dependence [64] which is known for a pure Ginibre matrix?
-
3.
Reduce the dependence of the running time and precision to a smaller power of \(\log (1/\delta )\). The bottleneck in the current algorithm is the number of bits of precision required for stable convergence of the Newton iteration for computing the sign function. Other, “inverse-free” iterative schemes have been proposed for this, which conceivably require lower precision.
-
4.
Study the convergence of “scaled Newton iteration” and other rational approximation methods (see [40, 48]) for computing the sign function on non-Hermitian matrices. Perhaps these have even faster convergence and better stability properties?
More broadly, we hope that the techniques introduced in this paper—pseudospectral shattering and pseudospectral analysis of matrix iterations using contour integrals—are useful in attacking other problems in numerical linear algebra.
Notes
A detailed discussion of these and other related results appears in Sect. 1.3.
In fact, it can be shown that \(\kappa _{\mathrm {eig}}(A)\) is related by a \(\mathrm {poly}(n)\) factor to the smallest constant for which (4) holds for all sufficiently small \(\delta >0\).
Doing the inversions exactly in rational arithmetic could require numbers of bit length \(n^k\) for k iterations, which will typically not even be polynomial.
At the time of writing, the work [55] is still an unpublished arXiv preprint.
This is called an a fortiori bound in numerical analysis.
[15] states: “A priori backward and forward error bounds for evaluation of the matrix sign function remain elusive.”
The output of their algorithm is n vectors on each of which Newton’s method converges quadratically to an eigenvector, which they refer to as “approximation à la Smale”.
“The remaining nontrivial problems are, of course, the estimation of the above output precision p [sufficient for finding an approximate eigenvector from an approximate eigenvalue], \(\ldots \) . We leave these open problems as a challenge for the reader.”—[51, Section 12].
We are not aware of a published analysis of this algorithm in finite arithmetic, but believe that it can be carried out with \(O(\log (n/\delta ))\) bits of precision. The only issue that needs to be handled is forward instability of the QR step when the Wilkinson shift is very close to an eigenvalue of the matrix, which can be resolved e.g. by a small random perturbation of the Wilkinson shift.
In this manuscript we will use \(z-M\) as a shorthand notation for \(zI-M\) where I denotes the identity matrix.
\(G_n\) is almost surely invertible and under this event U and R are uniquely determined by these conditions.
Any algorithm that yields the QR decomposition can be modified in a stable way to satisfy this last condition at the cost of \(O^*(n\log (1/{\textbf {u }}))\) operations.
For example, diagonalizable matrices satisfy this criterion.
References
M. Aizenman, R. Peled, J. Schenker, M. Shamis, and S. Sodin. Matrix regularizing effects of Gaussian perturbations. Communications in Contemporary Mathematics, 19(03):1750028, 2017.
D. Armentano, C. Beltrán, P. Bürgisser, F. Cucker, and M. Shub. A stable, polynomial-time algorithm for the eigenpair problem. Journal of the European Mathematical Society, 20(6):1375–1437, 2018.
G. B. Arous and P. Bourgade. Extreme gaps between eigenvalues of random matrices. The Annals of Probability, 41(4):2648–2681, 2013.
Z. Bai and J. Demmel. Using the matrix sign function to compute invariant subspaces. SIAM Journal on Matrix Analysis and Applications, 19(1):205–225, 1998.
Z. Bai, J. Demmel, and M. Gu. An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems. Numerische Mathematik, 76(3):279–308, 1997.
G. Ballard, J. Demmel, and I. Dumitriu. Minimizing communication for eigenproblems and the singular value decomposition. arXiv preprint arXiv:1011.3077, 2010.
G. Ballard, J. Demmel, I. Dumitriu, and A. Rusciano. A generalized randomized rank-revealing factorization. arXiv preprint arXiv:1909.06524, 2019.
J. Banks, J. Garza-Vargas, A. Kulkarni, and N. Srivastava. Overlaps, eigenvalue gaps, and pseudospectrum under real Ginibre and absolutely continuous perturbations. arXiv preprint arXiv:2005.08930, 2020.
J. Banks, A. Kulkarni, S. Mukherjee, and N. Srivastava. Gaussian regularization of the pseudospectrum and Davies’ conjecture. arXiv preprint arXiv:1906.11819, to appear in Communications on Pure and Applied Mathematics, 2019.
A. N. Beavers and E. D. Denman. A computational method for eigenvalues and eigenvectors of a matrix with real eigenvalues. Numerische Mathematik, 21(5):389–396, 1973.
A. N. Beavers Jr. and E. D. Denman. A new similarity transformation method for eigenvalues and eigenvectors. Mathematical Biosciences, 21(1-2):143–169, 1974.
M. Ben-Or and L. Eldar. A quasi-random approach to matrix spectral analysis. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
D. Bindel, S. Chandresekaran, J. Demmel, D. Garmire, and M. Gu. A fast and stable nonsymmetric eigensolver for certain structured matrices. Technical report, Technical report, University of California, Berkeley, CA, 2005.
R. Byers. Numerical stability and instability in matrix sign function based algorithms. In Computational and Combinatorial Methods in Systems Theory. Citeseer, 1986.
R. Byers, C. He, and V. Mehrmann. The matrix sign function method and the computation of invariant subspaces. SIAM Journal on Matrix Analysis and Applications, 18(3):615–632, 1997.
R. Byers and H. Xu. A new scaling for Newton’s iteration for the polar decomposition and its backward stability. SIAM Journal on Matrix Analysis and Applications, 30(2):822–843, 2008.
J.-y. Cai. Computing Jordan normal forms exactly for commuting matrices in polynomial time. International Journal of Foundations of Computer Science, 5(03n04):293–302, 1994.
G. Cipolloni, L. Erdős, and D. Schröder. On the condition number of the shifted real Ginibre ensemble. arXiv preprint arXiv:2105.13719, 2021.
R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth. On the Lambert \(W\) function. Advances in Computational Mathematics, 5(1):329–359, 1996.
E. B. Davies. Approximate diagonalization. SIAM Journal on Matrix Analysis and Applications, 29(4):1051–1064, 2007.
T. Dekker and J. Traub. The shifted QR algorithm for Hermitian matrices. Linear Algebra and its Applications, 4:137–154, 1971.
J. Demmel, I. Dumitriu, and O. Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59–91, 2007.
J. Demmel, I. Dumitriu, O. Holtz, and R. Kleinberg. Fast matrix multiplication is stable. Numerische Mathematik, 106(2):199–224, 2007.
J. W. Demmel. On condition numbers and the distance to the nearest ill-posed problem. Numerische Mathematik, 51(3):251–289, 1987.
J. W. Demmel. The probability that a numerical analysis problem is difficult. Mathematics of Computation, 50(182):449–480, 1988.
J. W. Demmel. Applied numerical linear algebra, volume 56. SIAM, 1997.
E. D. Denman and A. N. Beavers Jr. The matrix sign function and computations in systems. Applied Mathematics and Computation, 2(1):63–94, 1976.
I. Dumitriu. Smallest eigenvalue distributions for two classes of \(\beta \)-Jacobi ensembles. Journal of Mathematical Physics, 53(10):103301, 2012.
A. Edelman. Eigenvalues and condition numbers of random matrices. SIAM Journal on Matrix Analysis and Applications, 9(4):543–560, 1988.
A. Edelman and N. R. Rao. Random matrix theory. Acta Numerica, 14:233–297, 2005.
A. Edelman and B. D. Sutton. The beta-Jacobi matrix model, the CS decomposition, and generalized singular value problems. Foundations of Computational Mathematics, 8(2):259–285, 2008.
P. J. Forrester. Log-gases and random matrices (LMS-34). Princeton University Press, 2010.
S. Ge. The Eigenvalue Spacing of IID Random Matrices and Related Least Singular Value Results. PhD thesis, UCLA, 2017.
A. Greenbaum, R.-c. Li, and M. L. Overton. First-order perturbation theory for eigenvalues and eigenvectors. SIAM Review, 62(2):463–482, 2020.
M. Grötschel, L. Lovász, and A. Schrijver. Geometric algorithms and combinatorial optimization, volume 2. Springer Science & Business Media, 2012.
M. Gu and S. C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing, 17(4):848–869, 1996.
U. Haagerup and F. Larsen. Brown’s spectral distribution measure for \(R\)-diagonal elements in finite von Neumann algebras. Journal of Functional Analysis, 176(2):331–367, 2000.
N. J. Higham. The matrix sign decomposition and its relation to the polar decomposition. Linear Algebra and its Applications, 212:3–20, 1994.
N. J. Higham. Accuracy and stability of numerical algorithms, volume 80. SIAM, 2002.
N. J. Higham. Functions of matrices: theory and computation, volume 104. SIAM, 2008.
W. Hoffmann and B. N. Parlett. A new proof of global convergence for the tridiagonal QL algorithm. SIAM Journal on Numerical Analysis, 15(5):929–937, 1978.
R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, 2012.
V. Jain, A. Sah, and M. Sawhney. On the real Davies’ conjecture. arXiv preprint arXiv:2005.08908, 2020.
C. S. Kenney and A. J. Laub. The matrix sign function. IEEE Transactions on Automatic Control, 40(8):1330–1348, 1995.
A. Louis and S. S. Vempala. Accelerated newton iteration for roots of black box polynomials. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 732–740. IEEE, 2016.
A. N. Malyshev. Parallel algorithm for solving some spectral problems of linear algebra. Linear Algebra and Its Applications, 188:489–520, 1993.
F. Mezzadri. How to generate random matrices from the classical compact groups. arXiv preprint arXiv:math-ph/0609050, 2006.
Y. Nakatsukasa and R. W. Freund. Computing fundamental matrix decompositions accurately via the matrix sign function in two iterations: The power of Zolotarev’s functions. SIAM Review, 58(3):461–493, 2016.
Y. Nakatsukasa and N. J. Higham. Backward stability of iterations for computing the polar decomposition. SIAM Journal on Matrix Analysis and Applications, 33(2):460–479, 2012.
H. Nguyen, T. Tao, and V. Vu. Random matrices: tail bounds for gaps between eigenvalues. Probability Theory and Related Fields, 167(3-4):777–816, 2017.
V. Y. Pan and Z. Q. Chen. The complexity of the matrix eigenproblem. In Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 507–516. ACM, 1999.
B. N. Parlett. The symmetric eigenvalue problem, volume 20. SIAM, 1998.
J. D. Roberts. Linear model reduction and solution of the algebraic Riccati equation by use of the sign function. International Journal of Control, 32(4):677–687, 1980.
A. Sankar, D. A. Spielman, and S.-H. Teng. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM Journal on Matrix Analysis and Applications, 28(2):446–476, 2006.
D. Shi and Y. Jiang. Smallest gaps between eigenvalues of random matrices with complex Ginibre, Wishart and universal unitary ensembles. arXiv preprint arXiv:1207.4240, 2012.
S. Smale. On the efficiency of algorithms of analysis. Bulletin (New Series) of The American Mathematical Society, 13(2):87–121, 1985.
S. Smale. Complexity theory and numerical analysis. Acta Numerica, 6:523–551, 1997.
P. Śniady. Random regularization of Brown spectral measure. Journal of Functional Analysis, 193(2):291–313, 2002.
D. A. Spielman and S.-H. Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.
J.-G. Sun. Perturbation bounds for the Cholesky and QR factorizations. BIT Numerical Mathematics, 31(2):341–352, 1991.
S. J. Szarek. Condition numbers of random matrices. Journal of Complexity, 7(2):131–149, 1991.
L. N. Trefethen and M. Embree. Spectra and pseudospectra: the behavior of nonnormal matrices and operators. Princeton University Press, 2005.
C. Van Loan. On estimating the condition of eigenvalues and eigenvectors. Linear Algebra and Its Applications, 88:715–732, 1987.
J. P. Vinson. Closest spacing of eigenvalues. arXiv preprint arXiv:1111.2743, 2011.
J. Von Neumann and H. H. Goldstine. Numerical inverting of matrices of high order. Bulletin of the American Mathematical Society, 53(11):1021–1099, 1947.
J. H. Wilkinson. Global convergence of tridiagonal QR algorithm with origin shifts. Linear Algebra and its Applications, 1(3):409–420, 1968.
T. G. Wright and L. N. Trefethen. EigTool. Software available at http://www.comlab.ox.ac.uk/pseudospectra/eigtool, 2002.
Acknowledgements
We thank Peter Bürgisser for introducing us to this problem, and Ming Gu, Olga Holtz, Vishesh Jain, Ravi Kannan, Pravesh Kothari, Lin Lin, Satyaki Mukherjee, Yuji Nakatsukasa, and Nick Trefethen for helpful conversations. We thank the referees for a careful reading of the paper and many helpful comments which improved it. We thank the Institute for Pure and Applied Mathematics, where part of this work was carried out.
Additional information
Communicated by Peter Bürgisser.
Jess Banks supported by the NSF Graduate Research Fellowship Program under Grant DGE-1752814. Nikhil Srivastava supported by NSF Grant CCF-1553751.
Appendices
A Deferred Proofs from Sect. 4
Lemma A.1
(Restatement of Lemma 4.11) Assume the matrix inverse is computed by an algorithm \(\mathsf {INV}\) satisfying the guarantee in Definition 2.7. Then \(\mathsf {G}(A) = g(A) + E\) for some error matrix E with norm
Proof
The computation of \(\mathsf {G}(A)\) consists of three steps:
1.
Form \(A^{-1}\) according to Definition 2.7. This incurs an additive error of \(E_{\mathsf {INV}} = \mu _{\mathsf {INV}}(n)\cdot {\textbf {u }}\cdot \kappa (A)^{c_\mathsf {INV}\log n}\Vert A^{-1}\Vert \). The result is \(\mathsf {INV}(A) = A^{-1} + E_\mathsf {INV}.\)
2.
Add A to \(\mathsf {INV}(A)\). This incurs an entry-wise relative error of size \({\textbf {u }}\): The result is
$$\begin{aligned}(A + A^{-1} + E_{\mathsf {INV}}) \circ (J + E_{add})\end{aligned}$$where J denotes the all-ones matrix, \(\Vert E_{add} \Vert _{max} \le {\textbf {u }}\), and where \(\circ \) denotes the entrywise (Hadamard) product of matrices.
3.
Divide the resulting matrix by 2, which is an exact operation in our floating-point model as we can simply decrement the exponent. The final result is
$$\begin{aligned} \mathsf {G}(A) = \frac{1}{2}(A + A^{-1} + E_{\mathsf {INV}}) \circ (J + E_{add}). \end{aligned}$$
Finally, recall that for any \(n \times n\) matrices M and E, we have the relation (14)
Putting it all together, we have
where we use \({\textbf {u }}< 1\) in the last line. \(\square \)
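The map analyzed here is one step of Roberts' Newton iteration \(X_{k+1} = \frac{1}{2}(X_k + X_k^{-1})\), which in exact arithmetic converges to \(\mathrm {sgn}(A)\) whenever A has no purely imaginary eigenvalues. The following sketch (in ordinary double precision, not the \(\mathsf {INV}\)-based finite-arithmetic version analyzed above, and with illustrative test data of our choosing) shows the iteration in action:

```python
import numpy as np

def g(X):
    # One step of Roberts' Newton iteration: g(X) = (X + X^{-1}) / 2.
    return 0.5 * (X + np.linalg.inv(X))

def sign_newton(A, iters=50):
    # Iterating g converges to sgn(A) provided A has no purely
    # imaginary eigenvalues; convergence is quadratic near the limit.
    X = A.astype(complex)
    for _ in range(iters):
        X = g(X)
    return X

# Diagonalizable test matrix with eigenvalues on both sides of the
# imaginary axis.
rng = np.random.default_rng(1)
V = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
D = np.diag([2.0, 0.5, -1.0, -3.0])
A = V @ D @ np.linalg.inv(V)

S = sign_newton(A)
# sgn(A) replaces each eigenvalue by the sign of its real part.
S_exact = V @ np.diag([1.0, 1.0, -1.0, -1.0]) @ np.linalg.inv(V)
err = np.linalg.norm(S - S_exact)
```

The computed S is also (numerically) an involution, \(S^2 = I\), as the sign function must be.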
In what remains of this section we will repeatedly use the following simple calculus fact.
Lemma A.2
Let \(x, y >0\). Then
Proof
This follows directly from the concavity of the logarithm. \(\square \)
Lemma A.3
(Restatement of Lemma 4.15)
Let \(1/800> t > 0\) and \(1/2> c > 0\) be given. Then for
we have
Proof of Lemma 4.15
An exact solution for j can be written in terms of the Lambert W-function; see [19] for further discussion and a useful series expansion. For our purposes, it is simpler to derive the necessary quantitative bound from scratch.
Immediately from the assumption \(t < 1/800\), we have \(j > \log (1/t) \ge 9\).
First let us solve the case \(c = 1/2\). We will prove the contrapositive, so assume
Then taking \(\log \) on both sides, we have
Taking \(\lg \) of both sides and applying the second inequality in Lemma A.2 with \(x=2j \log (1/t)\) and \(y=1\), using \(\lg x = 1 + \lg j + \lg \log (1/t)\), we obtain
Since \(t < 1/800\) we have \(\frac{1}{\log 2} \frac{1}{2 j \log (1/t)} < 0.01\), so
But since \(j \ge 9\), we have \(j - \lg j \ge 0.64 j\), so
which implies
Note \(K \le 1.39 \lg (1/t)\), because \(K - \lg (1/t) = \lg \lg (1/t) + 0.49 \le 0.39 \lg (1/t)\) for \(t \le 1/800\). Thus
so for the case \(c=1/2\) we conclude the proof of the contrapositive of the lemma:
For the general case, once \((1-t)^{2^j}/t^{2j} \le 1/2\), consider the effect of incrementing j on the left-hand side. This has the effect of squaring and then multiplying by \(t^{2j-2}\), which makes it even smaller. At most \(\lg \lg (1/c)\) increments are required to bring the left-hand side down to c, since \((1/2)^{2^{\lg \lg (1/c)}} = c\). This gives the value of j stated in the lemma, as desired. \(\square \)
Lemma A.4
(Restatement of Lemma 4.18) If
then
Proof of Lemma 4.18
We aim to provide a slightly cleaner sufficient condition on N than the current condition
Repeatedly using Lemma A.2, as well as the cruder fact \(\lg \lg (ab) \le \lg \lg a + \lg \lg b\) provided \(a, b \ge 4\), we have
where in the last line we use the assumption \(s < 1/100\). Similarly,
Thus, a sufficient condition is
\(\square \)
B Analysis of \(\mathsf {SPLIT}\)
Although it has many potential uses in its own right, the purpose of the approximate matrix sign function in our algorithm is to split the spectrum of a matrix into two roughly equal pieces, so that approximately diagonalizing A may be recursively reduced to two sub-problems of smaller size.
First, we need a lemma ensuring that a shattered pseudospectrum can be bisected by a grid line with at least n/5 eigenvalues on each side.
Lemma B.1
Let A have \(\epsilon \)-pseudospectrum shattered with respect to some grid \(\mathsf {g}\). Then there exists a horizontal or vertical grid line of \(\mathsf {g}\) partitioning \(\mathsf {g}\) into two grids \(\mathsf {g}_\pm \), each containing at least \(\max \{n/5,1\}\) eigenvalues.
Proof
We will view \(\mathsf {g}\) as an \(s_1 \times s_2\) array of squares. Write \(r_1,r_2,\ldots ,r_{s_1}\) for the number of eigenvalues in each row of the grid. If there exists \(1 \le i < s_1\) such that \(r_1 + \cdots + r_i \ge n/5\) and \(r_{i+1} + \cdots + r_{s_1} \ge n/5\), then we can bisect at the grid line dividing the ith from the \((i+1)\)st row. Otherwise there exists some i for which \(r_{i} \ge 3n/5\), and in that case we can always find a vertical grid line so that at least n/5 of the eigenvalues in the ith row lie on each of the left and right sides. Finally, if \(n\le 5\), we may trivially pick a grid line to bisect along so that both sides contain at least one eigenvalue. \(\square \)
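The counting argument in this proof is constructive. Below is a minimal sketch of the horizontal-line search, with a helper name of our own choosing (the vertical split of a heavy row works identically, with column counts in place of row counts):

```python
def bisect_rows(counts, n):
    """Given per-row eigenvalue counts summing to n (n > 5), return either
    ('row', i): the grid line below row i leaves >= n/5 eigenvalues on
    each side, or ('heavy', i): row i alone holds >= 3n/5 eigenvalues
    and must instead be split by a vertical grid line."""
    prefix = 0
    for i in range(len(counts) - 1):
        prefix += counts[i]
        if prefix >= n / 5 and n - prefix >= n / 5:
            return ('row', i)
    # No horizontal bisection exists, so some single row is heavy.
    i = max(range(len(counts)), key=lambda k: counts[k])
    return ('heavy', i)
```

For instance, with row counts [1, 1, 1, 1, 6] and n = 10 the line below the second row works, while [0, 9, 1] forces a vertical split of the middle row.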
Proof of Theorem 5.2
The main observation is that, given any matrix A, we can determine how many eigenvalues are on either side of any horizontal or vertical line by approximating the sign function of a shift of the matrix. To be precise, in exact arithmetic \(\mathrm {Tr}\,\mathrm {sgn}(A - h) = n_+ - n_-\), where \(n_\pm \) are the eigenvalue counts for A on either side of the line \({\text {Re}}z = h\). We will now show that under the shattered pseudospectrum assumption, one can exactly compute \(n_+-n_-\) using the advertised precision.
Running \(\mathsf {SGN}\) to a final accuracy of \(\beta \),
It remains to control \(\Vert \mathrm {sgn}(M)\Vert \) and quantify the distance between \(\mathrm {sgn}(M) = \mathrm {sgn}(A-h+E_2)\) and \(\mathrm {sgn}(A-h)\). We first do the latter. Since we need only to modify the diagonal entries of A when creating M, the incurred diagonal error matrix \(E_2\) has norm at most \({\textbf {u }}\max _i |A_{i,i} - h|\). Using \(|A_{i,i}| \le \Vert A\Vert \le 4\) and \(|h| \le 4\), the fact that \({\textbf {u }}\le \epsilon /100 n \le \epsilon /16\) ensures that the \(\epsilon /2\)-pseudospectrum of M will still be shattered with respect to \(\mathsf {g}\). We can then form \(\mathrm {sgn}(A-h)\) and \(\mathrm {sgn}(M)\) by integrating around the boundary of the portions of \(\mathsf {g}\) on either side of the line \({\text {Re}}z = h\), then using the resolvent identity as in Sect. 4, and the fact that \(\Lambda _\epsilon (A)\) and \(\Lambda _{\epsilon /2}(M)\) are shattered we get
where in the last inequality we have used that \(\mathsf {g}\) has side lengths of at most 8 and \(\Vert E_2\Vert \le 8 {\textbf {u }}\).
Now, using the contour integral again and the shattered pseudospectrum assumption
Combining the above bounds we get a total additive error of \(n(\beta + \beta {\textbf {u }}+ 8{\textbf {u }}/\epsilon )+\frac{128 {\textbf {u }}}{\epsilon ^2}\) in computing the trace of the sign function. If \(\beta \le 0.1/n\) and \({\textbf {u }}\le \min \{\epsilon /100n, \epsilon ^2/512\}\), this error will be strictly less than 0.5 and we can round \(\mathrm {Tr}\, \mathsf {SGN}(A - h)\) to the nearest real integer. Horizontal bisections work similarly, with \(iA - h\) in place of \(A - h\).
Now that we have shown that it is possible to compute \(n_{+}-n_-\) exactly, recall that from the above discussion, the \(\epsilon /2\)-pseudospectrum of M will still be shattered with respect to the translation of the original grid \(\mathsf {g}\). Using Lemma 4.10 and the fact that \({{\,\mathrm{diam}\,}}(\mathsf {g})^2 = 128\), we can safely call \(\mathsf {SGN}\) with parameters \(\epsilon _0 = \epsilon /4\) and
Plugging these into Theorem 4.9 (\(\epsilon < 1/2\), so \(1-\alpha _0 \le 1/100\), and \(\beta \le 0.05/n \le 1/12\), so the hypotheses are satisfied), we find that for final accuracy \(\beta \) a sufficient number of iterations is
In the course of these binary searches, we make at most \(\lg s_1 s_2\) calls to \(\mathsf {SGN}\) at accuracy \(\beta \). These require at most
arithmetic operations. In addition, creating M and computing the trace of the approximate sign function cost us \(O(n \lg s_1s_2)\) scalar addition operations. We are assuming that \(\mathsf {g}\) has side lengths at most 8, so \(\lg s_1 s_2 \le 12\lg 1/\omega (\mathsf {g})\). Combining all of this with the runtime analysis and machine precision of \(\mathsf {SGN}\) appearing in Theorem 4.9, we obtain
\(\square \)
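In exact arithmetic the counting identity at the heart of this proof is easy to demonstrate. The sketch below evaluates the sign function by diagonalization rather than by \(\mathsf {SGN}\), sidestepping all the finite-arithmetic issues treated above, purely to illustrate how \(\mathrm {Tr}\,\mathrm {sgn}(A - h)\) counts eigenvalues across a vertical line (the test matrix and its eigenvalues are illustrative choices of ours):

```python
import numpy as np

def signed_count(A, h):
    # Tr sgn(A - hI) = n_+ - n_-: the difference between the number of
    # eigenvalues with real part > h and the number with real part < h.
    # Here sgn is evaluated via an eigendecomposition.
    n = A.shape[0]
    w, V = np.linalg.eig(A - h * np.eye(n))
    S = V @ np.diag(np.sign(w.real)) @ np.linalg.inv(V)
    return int(round(np.trace(S).real))

rng = np.random.default_rng(0)
n = 6
evals = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 3.0])
V = rng.standard_normal((n, n))
A = V @ np.diag(evals) @ np.linalg.inv(V)

d0 = signed_count(A, 0.0)    # 3 eigenvalues on each side, so d0 = 0
d1 = signed_count(A, 0.75)   # 2 to the right, 4 to the left, so d1 = -2
```

Since \(n_+ + n_- = n\), the count \(n_+ = (n + \mathrm {Tr}\,\mathrm {sgn}(A-h))/2\) is recovered exactly once the trace is rounded.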
C Analysis of \(\mathsf {DEFLATE}\)
The algorithm \(\mathsf {DEFLATE}\), defined in Sect. 5, can be viewed as a small variation of the randomized rank-revealing algorithm introduced in [22] and revisited subsequently in [7]. Following these works, we will call this algorithm \(\mathsf {RURV}\).
Roughly speaking, in finite arithmetic, \(\mathsf {RURV}\) takes a matrix A with \(\sigma _r(A)/\sigma _{r+1}(A) \gg 1\), for some \(1\le r \le n-1\), and finds nearly unitary matrices U, V and an upper triangular matrix R such that \( URV\approx A\). Crucially, R has the block decomposition
where \(R_{11} \in {\mathbb {C}}^{r\times r}\) has smallest singular value close to \(\sigma _r(A)\), and \(R_{22}\) has largest singular value roughly \(\sigma _{r+1}(A)\). We will use and analyze the following implementation of \(\mathsf {RURV}\).
As discussed in Sect. 5, we hope to use \(\mathsf {DEFLATE}\) to approximate the range of a projector P with rank \(r<n\), given an approximation \({\widetilde{P}}\) close to P in operator norm. We will show that from the output of \(\mathsf {RURV}({\widetilde{P}})\) we can obtain a good approximation to such a subspace. More specifically, under certain conditions, if \((U, R) = \mathsf {RURV}({{\widetilde{P}}})\), then the first r columns of U carry all the information we need. For a formal statement see Proposition C.12 and Proposition C.18 below.
Since it may be of broader use, we will work in somewhat greater generality, and define the subroutine \(\mathsf {DEFLATE}\) which receives a matrix A and an integer r and returns a matrix \(S \in {\mathbb {C}}^{n\times r}\) with nearly orthonormal columns. Intuitively, if A is diagonalizable, then under the guarantee that r is the smallest integer k such that \(\sigma _k(A)/\sigma _{k+1}(A) \gg 1\), the columns of the output S span a space close to the span of the top r eigenvectors of A. Our implementation of \(\mathsf {DEFLATE}\) is as follows.
Throughout this section we use \(\mathrm {rurv}(\cdot )\) and \(\mathrm {deflate}(\cdot , \cdot ) \) to denote the exact arithmetic versions of \(\mathsf {RURV}\) and \(\mathsf {DEFLATE}\), respectively. In Sect. C.1 we present a random matrix result that will be needed in the analysis of \(\mathsf {DEFLATE}\). In Sect. C.3 we state the properties of \(\mathsf {RURV}\) that will be needed. Finally in Sects. C.4 and C.5 we prove the main guarantees of \(\mathrm {deflate}\) and \(\mathsf {DEFLATE}\), respectively, that are used throughout this paper.
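In exact arithmetic the constructions are short: \(\mathrm {rurv}\) multiplies A on the right by the adjoint of a Haar unitary and takes a QR decomposition, and \(\mathrm {deflate}\) keeps the first r columns of the resulting U. A minimal numpy sketch follows (helper names are ours; the finite-arithmetic \(\mathsf {RURV}\) and \(\mathsf {DEFLATE}\) must additionally account for the roundoff analyzed in this appendix):

```python
import numpy as np

def haar_unitary(n, rng):
    # QR of a complex Ginibre matrix, with column phases fixed so the
    # R factor has a nonnegative diagonal (cf. Lemma C.7).
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def rurv(A, rng):
    # Compute A V^* = U R with V Haar unitary, so that A = U R V.
    V = haar_unitary(A.shape[0], rng)
    U, R = np.linalg.qr(A @ V.conj().T)
    return U, R, V

def deflate(A, r, rng):
    # First r columns of U: nearly orthonormal, and spanning
    # (approximately) range(A) when sigma_r(A) >> sigma_{r+1}(A).
    U, _, _ = rurv(A, rng)
    return U[:, :r]

# Rank-2 example: the columns of S should span range(A).
rng = np.random.default_rng(2)
n, r = 6, 2
X = rng.standard_normal((n, r))
Y = rng.standard_normal((n, r))
A = (X @ Y.T).astype(complex)
S = deflate(A, r, rng)
resid = np.linalg.norm(S @ S.conj().T @ A - A)
```

Here `resid` is small because, almost surely, the first r columns of \(AV^*\) already span the range of A, exactly as in the warm-up argument of Sect. C.4.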
1.1 C.1 Smallest Singular Value of the Corner of a Haar Unitary
We recall the defining property of the Haar measure on the unitary group:
Definition C.1
A random \(n\times n\) unitary matrix V is Haar-distributed if, for every fixed unitary matrix W, both VW and WV have the same distribution as V.
For short, we will often refer to such a matrix as a Haar unitary.
Let \(n > r\) be positive integers. In what follows we will consider an \(n\times n\) Haar unitary matrix V and denote by X its upper-left \(r \times r\) corner. The purpose of the present subsection is to derive a tail bound for the random variable \(\sigma _{r}(X)\). We begin by showing a fact that allows us to reduce our analysis to the case when \(r\le n/2\).
Observation C.2
Let \(n> r>0\) and \(V\in {\mathbb {C}}^{n\times n}\) be a unitary matrix and denote by \(V_{11}\) and \(V_{22}\) its upper-left \(r\times r\) corner and its lower-right \((n-r)\times (n-r)\) corner, respectively. If \(r\ge n/2\), then \(2r-n\) of the singular values of \(V_{11}\) are equal to 1, while the remaining \(n-r\) are equal to those of \(V_{22}\).
Proof
Decompose V as follows
Since V is unitary \(VV^*=I_n\), and looking at the upper-left corner of this equation we get \(V_{11} V_{11}^*+V_{12}V_{12}^*=I_r\). Then, since \(V_{11}V_{11}^*=I_r-V_{12}V_{12}^*\), we have \(\Lambda (V_{11}V_{11}^*) = 1 -\Lambda (V_{12}V_{12}^*)\).
Now, looking at the lower-right corner of the equation \(V^*V=I_n\) we get \(V_{12}^*V_{12}+V_{22}^*V_{22}=I_{n-r}\) and hence \(\Lambda (V_{22}^*V_{22})=1-\Lambda (V_{12}^*V_{12})\).
Now recall that for any two matrices X and Y, the symmetric difference of the sets \(\Lambda (XY)\) and \(\Lambda (YX)\) is \(\{0\}\), with multiplicity equal to the difference between the dimensions. Hence \(\Lambda (V_{12} V_{12}^*) = \Lambda (V_{12}^* V_{12})\cup \{0\}\) where the multiplicity of 0 is \(r-(n-r)=2r-n\). Combining this with \(\Lambda (V_{11}V_{11}^*) = 1 -\Lambda (V_{12}V_{12}^*)\) and \(\Lambda (V_{22}^*V_{22})=1-\Lambda (V_{12}^*V_{12})\) we get the desired result. \(\square \)
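This observation is easy to confirm numerically. In the sketch below we take n = 5 and r = 3, so that \(2r-n = 1\) singular value of the corner should equal 1 and the remaining \(n-r = 2\) should match those of \(V_{22}\) (the observation holds for any unitary matrix, so we need not bother with the Haar phase correction here):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 5, 3
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
V, _ = np.linalg.qr(G)   # a unitary matrix
# Singular values (sorted ascending) of the two diagonal corners.
s11 = np.sort(np.linalg.svd(V[:r, :r], compute_uv=False))
s22 = np.sort(np.linalg.svd(V[r:, r:], compute_uv=False))
# The largest 2r - n singular values of V_11 equal 1;
# the remaining n - r equal those of V_22.
```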
Proposition C.3
(\(\sigma _{\min }\) of a submatrix of a Haar unitary) Let \(n> r>0\) and let V be an \(n\times n\) Haar unitary. Let X be the upper left \(r\times r\) corner of V. Then, for all \(\theta \in (0, 1]\)
In particular, for every \(\theta \in (0, 1]\) we have
This exact formula for the CDF of the smallest singular value of X is remarkably simple, and we have not seen it anywhere in the literature. It is an immediate consequence of substantially more general results of Dumitriu [28], from which one can extract and simplify the density of \(\sigma _r(X)\). We will begin by introducing the relevant pieces of [28], deferring the final proof until the end of this subsection.
Some of the formulas presented here are written in terms of the generalized hypergeometric function which we denote by \({}_2 F_1^\beta (a, b; c; (x_1, \dots , x_m)).\) For our application it is sufficient to know that
whenever \(c >0\) and \({}_2F_1\) is well defined. The above equation can be derived directly from the definition of \({}_2F_1^\beta \) (see Definition 13.1.1 in [32] or Definition 2.2 in [28]).
The generic results in [28] concern the \(\beta \)-Jacobi random matrices, which we have no cause here to define in full. Of particular use to us will be [28, Theorem 3.1], which expresses the density of the smallest singular value of such a matrix in terms of the generalized hypergeometric function:
Theorem C.4
([28]) The density of the probability distribution of the smallest eigenvalue \(\lambda \) of the \(\beta \)-Jacobi ensemble with parameters a, b and size m, which we denote by \(f_{\lambda _{\min }}(\lambda )\), is given by
for some normalizing constant \(C_{\beta , a, b, m}\).
For a particular choice of parameters, the above theorem can be applied to describe the distribution of \(\sigma _{r}^2(X)\). The connection between singular values of corners of Haar unitary matrices and \(\beta \)-Jacobi ensembles is the content of [31, Theorem 1.5], which we rephrase below to match our context.
Theorem C.5
([31]) Let V be an \(n\times n\) Haar unitary matrix and let \(r\le \frac{n}{2}\). Let X be the \(r\times r\) upper-left corner of V. Then, the eigenvalues of \(XX^*\) distribute as the eigenvalues of a \(\beta \)-Jacobi matrix of size r with parameters \(\beta =2, a=0\) and \(b=n-2r\).
In view of the above result, Theorem C.4 gives a formula for the density of \(\sigma _{r}^2(X)\).
Corollary C.6
(Density of \(\sigma _{r}^2(X)\)) Let V be an \(n\times n\) Haar unitary and X be its upper-left \(r\times r\) corner with \(r<n\), then \(\sigma _{r}^2(X)\) has the following density
Proof
If \(r > n/2\), since we care only about the smallest singular value of X, we can use Observation C.2 to analyze the \((n-r)\times (n-r)\) lower-right corner of V instead. Hence, we can assume without loss of generality that \(r\le n/2\). Now, substitute \(\beta =2, a=0, b=n-2r, m=r\) in Theorem C.4 and observe that in this case
where the last equality follows from (50). Using the relation, described in Theorem C.5, between the distribution of \(\sigma _{r}^2(X)\) and the distribution of the minimum eigenvalue of the corresponding \(\beta \)-Jacobi ensemble, we have \(f_{\sigma ^2_{r}}(x) = f_{\lambda _{\min }}(x)\). Integrating the right-hand side of (53) over [0, 1], we find \(C= r(n-r)\). \(\square \)
Proof of Proposition C.3
From (52) we have that
from which (48) follows. To prove (49), note that \(g(t):=(1-t)^{r(n-r)}\) is convex on [0, 1], and hence \(g(t) \ge g(0)+tg'(0)\) for every \(t\in [0, 1]\). \(\square \)
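Unpacking Corollary C.6, the density \(r(n-r)(1-x)^{r(n-r)-1}\) of \(\sigma _r^2(X)\) integrates to the CDF \(\mathbb {P}[\sigma _r(X)\le \theta ] = 1-(1-\theta ^2)^{r(n-r)}\), which is simple enough to check by a seeded Monte Carlo experiment (plain QR of a Ginibre matrix suffices here, since column phases do not change the singular values of the corner):

```python
import numpy as np

def corner_sigma_min(n, r, rng):
    # Smallest singular value of the upper-left r x r corner of an
    # n x n unitary matrix that is Haar up to column phases.
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    V, _ = np.linalg.qr(G)
    return np.linalg.svd(V[:r, :r], compute_uv=False)[-1]

rng = np.random.default_rng(0)
n, r, theta, trials = 4, 2, 0.5, 3000
emp = np.mean([corner_sigma_min(n, r, rng) <= theta for _ in range(trials)])
exact = 1 - (1 - theta**2) ** (r * (n - r))   # about 0.6836 for these values
```

With 3000 trials the empirical frequency agrees with the exact CDF to within a few standard deviations of the binomial sampling error.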
1.2 C.2 Sampling Haar Unitaries in Finite Precision
It is a well-known fact that Haar unitary matrices can be numerically generated from complex Ginibre matrices. We refer the reader to [30, Section 4.6] and [47] for a detailed discussion. In this subsection we carefully analyze this process in finite arithmetic.
The following fact (see [47, Section 5]) is the starting point of our discussion.
Lemma C.7
(Haar from Ginibre) Let \(G_n\) be a complex \(n\times n\) Ginibre matrix and \(U, R\in {\mathbb {C}}^{n\times n}\) be defined implicitly, as a function of \(G_n\), by the equation \(G_n = UR\) and the constraints that U is unitary and R is upper-triangular with nonnegative diagonal entries. Then, U is Haar-distributed in the unitary group.
The above lemma suggests that \(\mathsf {QR}(\cdot )\) can be used to generate random matrices that are approximately Haar unitaries. While doing this, one should keep in mind that when working with finite arithmetic, the matrix \(\widetilde{G_n}\) passed to \(\mathsf {QR}\) is not exactly Ginibre-distributed, and the algorithm \(\mathsf {QR}\) itself incurs roundoff errors.
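Concretely, the standard fix for QR's phase ambiguity is: given \((Q, R) = \mathrm {QR}(G)\), set \(\Lambda = \mathrm {diag}(R_{ii}/|R_{ii}|)\) and return \((Q\Lambda , \Lambda ^* R)\), so that the new R factor has a nonnegative diagonal and, by Lemma C.7, the new unitary factor is exactly Haar when G is exactly Ginibre. A sketch of this idealization (the proposition below accounts for the fact that neither exactness assumption holds in finite precision):

```python
import numpy as np

def haar_unitary(n, rng):
    # Orthonormalize a complex Ginibre matrix.
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(G)
    # numpy's QR fixes no phase convention, so rescale: after this the
    # diagonal of R is nonnegative and the factorization is the unique
    # one of Lemma C.7, making the unitary factor Haar-distributed.
    phases = np.diagonal(R) / np.abs(np.diagonal(R))
    return Q * phases, R * np.conj(phases)[:, None]

rng = np.random.default_rng(7)
U, R = haar_unitary(4, rng)
```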
Following the discussion in Sect. 2.4 we can assume that we have access to a random matrix \({\widetilde{G}}_n\), with
where \(G_n\) is a complex \(n\times n\) Ginibre matrix and \(E\in {\mathbb {C}}^{n\times n}\) is an adversarial perturbation whose entries are bounded by \(\frac{1}{\sqrt{n}} c_{\mathsf {N}}{{\textbf {u}}}\). Hence, we have \(\Vert E\Vert \le \Vert E \Vert _F \le \sqrt{n} c_{\mathsf {N}}{\textbf {u }}\).
In what follows we use \(\mathrm {QR}(\cdot )\) to denote the exact arithmetic version of \(\mathsf {QR}(\cdot )\). Furthermore, we assume that for any \(A\in {\mathbb {C}}^{n\times n}\), \(\mathrm {QR}(A)\) returns a pair (U, R) with the property that R has nonnegative entries on the diagonal. Since we want to compare \(\mathrm {QR}(G_n)\) with \(\mathsf {QR}({\widetilde{G}}_n)\) it is necessary to have a bound on the condition number of the QR decomposition. For this, we cite the following consequence of a result of Sun [60, Theorem 1.6]:
Lemma C.8
(Condition number for the QR decomposition [60]) Let \(A, E\in {\mathbb {C}}^{n\times n}\) with A invertible. Furthermore assume that \(\Vert E\Vert \Vert A^{-1}\Vert \le \frac{1}{2}\). If \((U, R) = \mathrm {QR}(A)\) and \(({\widetilde{U}}, {\widetilde{R}}) = \mathrm {QR}(A+E) \), then
We are now ready to prove the main result of this subsection. As in the other sections devoted to finite arithmetic analysis, we will assume that \({\textbf {u }}\) is small compared to \(\mu _{\mathsf {QR}}(n)\); precisely, let us assume that
Proposition C.9
(Guarantees for finite-arithmetic Haar unitary matrices) Suppose that \(\mathsf {QR}\) satisfies the assumptions in Definition 2.8 and that it is designed to output upper triangular matrices with nonnegative entries on the diagonal. If \(({\widetilde{V}}, R )=\mathsf {QR}(\widetilde{G_n})\), then there is a Haar unitary matrix U and a random matrix E such that \({\widetilde{V}} = U+E\). Moreover, for every \(\alpha \in (0,1)\) and \( t > 2\sqrt{2}+1\) we have
Proof
From our Gaussian sampling assumption, \({\widetilde{G}}_n = G_n +E\) where \(\Vert E\Vert \le \sqrt{n} c_{\mathsf {N}}{\textbf {u }}\). Also, by the assumptions on \(\mathsf {QR}\) from Definition 2.8, there are matrices \(\widetilde{\widetilde{G_n}}\) and \({\widetilde{V}}\) such that \(({\widetilde{V}}, R) = \mathrm {QR}(\widetilde{\widetilde{G_n}})\), and
The latter inequality implies, using (54), that
Let \((U, R') := \mathrm {QR}(G_n)\). From Lemma C.7 we know that U is Haar-distributed on the unitary group, so using (55) and Lemma C.8, and the fact that \(\Vert M \Vert \le \Vert M \Vert _F \le \sqrt{n}\Vert M \Vert \) for any \(n \times n\) matrix M, we know that
Now, from \(\Vert G_n^{-1}\Vert = 1/\sigma _{n}(G_n)\) and from Theorem 3.1 we have that
On the other hand, from Lemma 2.2 of [9] we have \(P\left[ \left\| G_n\right\| > 2\sqrt{2}+ t \right] \le e^{-n t^2} \). Hence, under the events \(\Vert G_n^{-1}\Vert \le \frac{n}{\alpha }\) and \(\Vert G_n\Vert \le 2\sqrt{2}+t\), inequality (56) yields
Finally, if \(t>2\sqrt{2}+1\) we can exchange the term \(2\sqrt{2}+t+1\) for 2t in the bound. Then, using a union bound we obtain the advertised guarantee. \(\square \)
1.3 C.3 Preliminaries of \(\mathsf {RURV}\)
Let \(A\in {\mathbb {C}}^{n\times n}\) and \((U, R) =\mathrm {rurv}(A)\). As will become clear later, in order to analyze \(\mathsf {DEFLATE}(A, r)\) it is of fundamental importance to bound the quantity \(\Vert R_{22}\Vert \), where \(R_{22}\) is the lower-right \((n-r)\times (n-r)\) block of R. To this end, it will suffice to use Corollary C.11 below, which is the complex analog of the upper bound given in equation (4) of [7, Theorem 5.1]. Indeed, Corollary C.11 is a direct consequence of Lemma 4.1 in the aforementioned paper and of Proposition C.3 proved above. We elaborate below.
Lemma C.10
([7]) Let \(n>r>0\), \(A\in {\mathbb {C}}^{n\times n}\) and \(A = P \Sigma Q^*\) be its singular value decomposition. Let \((U, R) = \mathrm {rurv}(A)\), \(R_{22}\) be the lower right \((n-r)\times (n-r)\) corner of R, and V be such that \(A = URV\). Then, if \(X = Q^* V^*\),
where \(X_{11}\) is the upper left \(r\times r\) block of X.
This lemma reduces the problem to obtaining a lower bound on \(\sigma _{r}(X_{11})\). Since V is a Haar unitary matrix by construction, so is \(V^*\), and since \(X= Q^*V^*\) with \(Q^*\) a fixed unitary matrix, X is itself distributed as a Haar unitary. Combining Lemma C.10 and Proposition C.3 gives the following result.
Corollary C.11
Let \(n> r>0\), \(A\in {\mathbb {C}}^{n\times n}\), \((U, R)= \mathrm {rurv}(A)\) and \(R_{22}\) be the lower right \((n-r)\times (n-r)\) corner of R. Then for any \(\theta > 0\)
1.4 C.4 Exact Arithmetic Analysis of \(\mathsf {DEFLATE}\)
It is a standard consequence of the properties of the QR decomposition that if A is a matrix of rank r, then almost surely \(\mathrm {deflate}(A, r)\) is an \(n\times r\) matrix with orthonormal columns that span the range of A. As a warm-up, let us recall this argument.
Let \((U, R) = \mathrm {rurv}(A)\) and let V be the unitary matrix used by the algorithm to produce this output. Since we are working in exact arithmetic, V is a Haar unitary matrix. As \(V^*\) is invertible we have \(\mathrm {rank}(AV^*) = r\), and with probability 1 the first r columns of \(AV^*\) are linearly independent. Since UR is the QR decomposition of \(AV^*\), almost surely \(R_{22} =0\) and \(R_{11} \in {\mathbb {C}}^{r\times r}\) is invertible, where \(R_{11}\) and \(R_{22}\) are as in (47). Writing
for the block decomposition of U with \(U_{11} \in {\mathbb {C}}^{r\times r}\), note that
On the other hand, almost surely the first r columns of \(AV^*\) span the range of A. Using the right side of Eq. (57) we see that this subspace also coincides with the span of the first r columns of U, since \(R_{11}\) is invertible.
We will now prove a robust version of the above observation for a large class of matrices, namely those A for which \(\mathrm {rank}(A) = \mathrm {rank}(A^2)\). We make this precise below and defer the proof to the end of the subsection.
Proposition C.12
(Main guarantee for \(\mathrm {deflate}\)) Let \(\beta > 0\) and \(A, {\widetilde{A}}\in {\mathbb {C}}^{n\times n}\) be such that \(\Vert A-{\widetilde{A}}\Vert \le \beta \) and \(\mathrm {rank}(A)= \mathrm {rank}(A^2) =r\). Denote \(S := \mathrm {deflate}({\widetilde{A}}, r)\) and \(T := \mathrm {deflate}(A, r)\). Then, for any \(\theta \in (0, 1) \), with probability \(1-\theta ^2\) there exists a unitary \(W\in {\mathbb {C}}^{r\times r}\) such that
Remark C.13
(The projector case) In the case in which the matrix A of Proposition C.12 is a (not necessarily orthogonal) projector, \(T^*AT = I_r\), and the \(\sigma _r\) term in the denominator of (58) becomes a 1.
We begin by recalling a result about the stability of singular values which will be important throughout this section. This fact is a consequence of Weyl’s inequalities; see for example [42, Theorem 3.3.16].
Lemma C.14
(Stability of singular values) Let \(X, E \in {\mathbb {C}}^{n\times n}\). Then, for any \(k=1, \dots , n\) we have
We now show that the orthogonal projection \(P:=\mathrm {deflate}({\widetilde{A}}, r)\mathrm {deflate}({\widetilde{A}}, r)^*\) is close to a projection onto the range of A, in the sense that \(P A \approx A\).
Lemma C.15
Let \(\beta > 0\) and \(A, {\widetilde{A}}\in {\mathbb {C}}^{n\times n}\) be such that \(\mathrm {rank}(A)=r\) and \(\Vert A-{\widetilde{A}}\Vert \le \beta \). Let \((U, R) := \mathrm {rurv}({\widetilde{A}})\) and \(S := \mathrm {deflate}({\widetilde{A}}, r)\). Then, almost surely
$$\begin{aligned} \Vert (SS^*-I_n)A\Vert \le \Vert R_{22}\Vert + \beta , \end{aligned}$$(59)
where \(R_{22}\) is the lower right \((n-r)\times (n-r)\) block of R.
Proof
We will begin by showing that \(\Vert (SS^* -I_n) {\widetilde{A}}\Vert \) is small. Let V be the unitary matrix that was used to generate (U, R). As \(\mathrm {deflate}(\cdot , \cdot )\) outputs the first r columns of U, we have the block decomposition \(U = \begin{pmatrix} S&U' \end{pmatrix}\), where \(S \in {\mathbb {C}}^{n\times r}\) and \(U' \in {\mathbb {C}}^{n\times (n-r)}\).
On the other hand, we have \({\widetilde{A}} = U RV\), so
Since \(\Vert U'\Vert = \Vert V\Vert =1\), from the above equation we get \(\Vert (SS^*-I_n){\widetilde{A}}\Vert \le \Vert R_{22}\Vert \). Now we can conclude that
$$\begin{aligned} \Vert (SS^*-I_n)A\Vert \le \Vert (SS^*-I_n){\widetilde{A}}\Vert + \Vert SS^*-I_n\Vert \Vert A-{\widetilde{A}}\Vert \le \Vert R_{22}\Vert + \beta . \end{aligned}$$
\(\square \)
The inequality (59) can be applied to quantify the distance between the ranges of \(\mathrm {deflate}({\widetilde{A}}, r)\) and \(\mathrm {deflate}(A, r)\) in terms of \(\Vert R_{22}\Vert \), as the following result shows.
Lemma C.16
(Bound in terms of \(\Vert R_{22}\Vert \)) Let \(\beta > 0\) and \(A, {\widetilde{A}}\in {\mathbb {C}}^{n\times n}\) be such that \(\mathrm {rank}(A)= \mathrm {rank}(A^2) =r\) and \(\Vert A-{\widetilde{A}}\Vert \le \beta \). Denote by \((U, R) :=\mathrm {rurv}({\widetilde{A}})\), \(S := \mathrm {deflate}({\widetilde{A}}, r)\) and \(T := \mathrm {deflate}(A, r)\). Then, almost surely there exists a unitary \(W\in {\mathbb {C}}^{r\times r}\) such that
where \(R_{22}\) is the lower right \((n-r)\times (n-r)\) block of R.
Proof
From Lemma C.15 we know that almost surely \(\Vert (SS^*-I_n)A\Vert \le \Vert R_{22}\Vert + \beta \). We will use this to show that \(\Vert T^*SS^*T- I_r\Vert \) is small, which can be interpreted as \(S^*T\) being close to unitary. First note that
Now, since \(\mathrm {rank}(A) = \mathrm {rank}(A^2)\), if \(w \in \mathrm {range}(A)\) then \(w = Av\) for some \(v\in \mathrm {range}(A)\). So by the Courant–Fischer formula
We can then revisit (61) and get
On the other hand, \(\Vert T^*(SS^*-I_n) A T \Vert \le \Vert (SS^*-I_n) A\Vert \le \Vert R_{22}\Vert +\beta \), so combining this fact with (61) and (62) we obtain
Now define \(X:= S^*T\), \(\beta ' :=\frac{\Vert R_{22}\Vert +\beta }{\sigma _r(T^*AT)} \) and let \(X= W | X|\) be the polar decomposition of X. Observe that
Thus \(\Vert S^*T- W\Vert = \Vert X-W\Vert = \Vert W(|X|-I_r) \Vert \le \beta '.\) Finally note that
which concludes the proof. \(\square \)
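The polar-factor step of the proof rests on an elementary fact worth isolating: if \(X = W|X|\) is the polar decomposition, then \(\Vert X-W\Vert = \max _i |\sigma _i(X)-1|\), which is at most \(\Vert X^*X-I\Vert \). A small numpy check (the nearly unitary test matrix is our own construction):

```python
import numpy as np

rng = np.random.default_rng(2)
r = 4
Q, _ = np.linalg.qr(rng.standard_normal((r, r)) + 1j * rng.standard_normal((r, r)))
X = Q + 0.01 * rng.standard_normal((r, r))    # a nearly unitary matrix

U_, s, Vh = np.linalg.svd(X)
W = U_ @ Vh                                    # unitary polar factor: X = W |X|
dist = np.linalg.norm(X - W, 2)                # equals max_i |sigma_i(X) - 1|
slack = np.linalg.norm(X.conj().T @ X - np.eye(r), 2)   # ||X*X - I||
```

Here `dist <= slack` holds because \(|\sigma -1| \le |\sigma ^2-1|\) for every \(\sigma \ge 0\).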
Note that so far our results have been deterministic. The possibility of failure of the guarantee given in Proposition C.12 comes from the non-deterministic bound on \(\Vert R_{22}\Vert \).
Proof of Proposition C.12
From Lemma C.14 we have \(\sigma _{r+1}({\widetilde{A}})\le \beta \). Now combine Lemma C.16 with Corollary C.11. \(\square \)
C.5 Finite Arithmetic Analysis of \(\mathsf {DEFLATE}\)
In what follows we will have an approximation \({\widetilde{A}}\) of a matrix A of rank r with the guarantee that \(\Vert A-{\widetilde{A}}\Vert \le \beta \).
For the sake of readability we will not present optimal bounds for the error induced by roundoff, and we will assume that
We begin by analyzing the subroutine \(\mathsf {RURV}\) in finite arithmetic. This was done in [22, Lemma 5.4]. Here we make the constants arising from this analysis explicit and take into consideration that Haar unitary matrices cannot be exactly generated in finite arithmetic.
Lemma C.17
(\(\mathsf {RURV}\) analysis) Assume that \(\mathsf {QR}\) and \(\mathsf {MM}\) satisfy the guarantees in Definitions 2.6 and 2.8. Also suppose that the assumptions in (63) hold. Then, if \((U, R) := \mathsf {RURV}(A)\) and V is the matrix used to produce such output, there are unitary matrices \({\widetilde{U}}, {\widetilde{V}}\) and a matrix \({\widetilde{A}}\) such that \({\widetilde{A}} = {\widetilde{U}} R {\widetilde{V}} \) and the following guarantees hold:
1. \(\Vert U-{\widetilde{U}}\Vert \le \mu _\mathsf {QR}(n) {{\textbf {u}}}\).
2. \({\widetilde{V}}\) is Haar-distributed in the unitary group.
3. For every \(1> \alpha >0\) and \(t >2 \sqrt{2}+1\), the event
$$\begin{aligned} \Vert {\widetilde{V}}-V\Vert < \frac{8 t n^{\frac{3}{2}}}{\alpha } c_{\mathsf {N}}\mu _\mathsf {QR}(n) {\textbf {u}} + \frac{10 n^2}{\alpha } {\textbf {u}} \quad \text {and} \quad \Vert A-{\widetilde{A}}\Vert < \Vert A\Vert \left( \frac{9 t n^{\frac{3}{2}}}{\alpha } c_{\mathsf {N}}\mu _\mathsf {QR}(n) {\textbf {u}} + 2 \mu _\mathsf {MM}(n){\textbf {u}} + \frac{10n^2}{\alpha }c_{\mathsf {N}}{\textbf {u}}\right) \end{aligned}$$(64)
occurs with probability at least \(1- 2e \alpha ^2- 2e^{-t^2 n}\).
Proof
By definition \(V=\mathsf {QR}({\widetilde{G}}_n)\) with \({\widetilde{G}}_n = G_n+E\), where \(G_n\) is an \(n\times n\) Ginibre matrix and \(\Vert E\Vert \le \sqrt{n} {\textbf {u }}\). A direct application of the guarantees on each step yields the following:
1. From Proposition C.9, we know that there is a Haar unitary \({\widetilde{V}}\) and a random matrix \(E_0\), such that \(V = {\widetilde{V}}+E_0\) and
$$\begin{aligned} {\mathbb {P}}\left[ \Vert E_0\Vert < \frac{8 t n^{\frac{3}{2}}}{\alpha } c_{\mathsf {N}}\mu _\mathsf {QR}(n) {\textbf {u}} + \frac{10 n^2}{\alpha } c_{\mathsf {N}}{\textbf {u}}\right] \ge 1-2e\alpha ^2 -2e^{-t^2 n}. \end{aligned}$$(65)
2. If \(B:= \mathsf {MM}(A, V^*) = AV^* +E_1\), then from the guarantees for \(\mathsf {MM}\) we have \(\Vert E_1\Vert \le \Vert A\Vert \Vert V\Vert \mu _\mathsf {MM}(n){{\textbf {u}}}\). Now from the guarantees for \(\mathsf {QR}\) we know that V is \(\mu _\mathsf {QR}(n) {\textbf {u }}\) away from a unitary, and hence
$$\begin{aligned} \Vert V\Vert \mu _\mathsf {MM}(n) {\textbf {u }}\le (1+\mu _\mathsf {QR}(n){\textbf {u }}) \mu _\mathsf {MM}(n) {\textbf {u }}\le \frac{5}{4}\mu _\mathsf {MM}(n){\textbf {u }}\end{aligned}$$where the last inequality follows from the assumptions in (63). This translates into
$$\begin{aligned} \Vert B\Vert \le \Vert A\Vert \Vert V\Vert + \Vert E_1\Vert \le (1+ \mu _\mathsf {QR}(n){\textbf {u }}) \Vert A\Vert + \Vert E_1\Vert \le \frac{5}{4}\Vert A\Vert + \Vert E_1\Vert . \end{aligned}$$Putting the above together and using (63) again, we get
$$\begin{aligned} \Vert E_1\Vert \le \frac{5}{4} \Vert A\Vert \mu _\mathsf {MM}(n) {{\textbf {u}}} \quad \text {and} \quad \Vert B\Vert \le \frac{5}{4}\Vert A\Vert (1+ \mu _\mathsf {MM}(n){{\textbf {u}}}) < 2\Vert A\Vert . \end{aligned}$$(66)
3. Let \((U, R) = \mathsf {QR}(B)\). Then there is a unitary \({\widetilde{U}}\) and a matrix \({\widetilde{B}}\) such that \(U= {\widetilde{U}} + E_2\), \(B = {\widetilde{B}}+E_3\), and \({\widetilde{B}} = {\widetilde{U}} R\), with error bounds \(\Vert E_2\Vert \le \mu _\mathsf {QR}(n) {{\textbf {u}}}\) and \(\Vert E_3\Vert \le \Vert B\Vert \mu _\mathsf {QR}(n) {{\textbf {u}}}\). Using (66) we obtain
$$\begin{aligned} \Vert E_3\Vert \le \Vert B\Vert \mu _\mathsf {QR}(n) {{\textbf {u}}} < 2\Vert A\Vert \mu _\mathsf {QR}(n) {{\textbf {u}}}. \end{aligned}$$(67)
4. Finally, define \({\widetilde{A}} := {\widetilde{B}} {\widetilde{V}}\). Note that \({\widetilde{A}} = {\widetilde{U}} R {\widetilde{V}}\) and
$$\begin{aligned}&{\widetilde{A}} = {\widetilde{B}} {\widetilde{V}} = (B-E_3) {\widetilde{V}} = (AV^* +E_1-E_3) {\widetilde{V}} = (A({\widetilde{V}}+E_0)^*+E_1-E_3 ){\widetilde{V}} \\&\quad = A + (AE_0^*+E_1-E_3){\widetilde{V}}, \end{aligned}$$which translates into
$$\begin{aligned} \Vert A-{\widetilde{A}}\Vert \le \Vert A\Vert \Vert E_0\Vert + \Vert E_1\Vert + \Vert E_3\Vert . \end{aligned}$$Hence, on the event described in the left side of (65), we have
$$\begin{aligned} \Vert A- {\widetilde{A}}\Vert \le \Vert A\Vert \left( \frac{8t n^{\frac{3}{2}}}{\alpha } c_{\mathsf {N}}\mu _\mathsf {QR}(n) {\textbf {u }}+ \frac{10 n^2}{\alpha } c_{\mathsf {N}}{\textbf {u }}+\frac{5}{4}\mu _\mathsf {MM}(n){\textbf {u }}+2 \mu _\mathsf {QR}(n){\textbf {u }}\right) , \end{aligned}$$and using some crude bounds, the above inequality yields the advertised bound.
\(\square \)
We can now prove a finite arithmetic version of Proposition C.12.
Proposition C.18
(Main guarantee for \(\mathsf {DEFLATE}\))
Let \(n> r \) be positive integers, and let \(\beta ,\theta > 0\) and \(A, {\widetilde{A}}\in {\mathbb {C}}^{n\times n}\) be such that \(\Vert A-{\widetilde{A}}\Vert \le \beta \) and \(\mathrm {rank}(A)= \mathrm {rank}(A^2) =r\). Let \(S := \mathsf {DEFLATE}({\widetilde{A}}, r)\) and \(T := \mathrm {deflate}(A, r)\). If \(\mathsf {QR}\) and \(\mathsf {MM}\) satisfy the guarantees in Definitions 2.6 and 2.8, and (63) holds, then, for every \(t > 2\sqrt{2}+1\) there exists a unitary \(W\in {\mathbb {C}}^{r\times r}\) such that
with probability at least \(1-7\theta ^2- 2e^{-t^2 n}\).
Proof
Let \((U, R) = \mathsf {RURV}({\widetilde{A}})\). From Lemma C.17 we know that there exist \({\widetilde{U}}, \widetilde{{\widetilde{A}}} \in {\mathbb {C}}^{n\times n}\), such that \(\Vert U-{\widetilde{U}} \Vert \) and \(\Vert {\widetilde{A}}- \widetilde{{\widetilde{A}}}\Vert \) are small, and \(({\widetilde{U}}, R) = \mathrm {rurv}(\widetilde{{\widetilde{A}}})\) for the respective realization of an exact Haar unitary matrix. Then, from \(\Vert {\widetilde{A}}\Vert \le \Vert A\Vert + \beta \) and (64), for every \(1> \alpha >0\) and \(t > 2 \sqrt{2}+1\) we have
with probability \(1-2e\alpha ^2-2e^{-t^2 n}\).
Now, from (63) we have \({\textbf {u }}\le \beta \le \frac{1}{4}\) and \(c_{\mathsf {N}}\Vert A\Vert \mu {\textbf {u }}\le \beta \) for \(\mu = \mu _\mathsf {QR}(n), \mu _\mathsf {MM}(n)\), so we can bound the respective terms in (69) by \(\beta \):
where the last crude bound uses \(1 \le n^{\frac{3}{2}}\le n^2\), \(1+\beta \le \frac{5}{4}\), and \(t > 2\).
Observe that \({\widetilde{S}} = \mathrm {deflate}(\widetilde{{\widetilde{A}}}, r)\) is the matrix formed by the first r columns of \({\widetilde{U}}\), and that by Proposition C.12 we know that for every \(\theta > 0\), with probability \(1-\theta ^2\) there exists a unitary W such that
On the other hand, S is the matrix formed by the first r columns of U. Hence
Putting the above together we get that under this event
Now, taking \(\alpha = \theta \), we note that both events in (69) and (71) happen with probability at least \(1-(2e+1)\theta ^2-2e^{-t^2 n}\). The result follows from replacing the constant \(2e+1\) with 7, using \(t> 2\sqrt{2}+1\) and replacing \(8(12t+16)\) with 144t, and combining the inequalities (69), (70) and (72). \(\square \)
We end by proving Theorem 5.3, the guarantees on \(\mathsf {DEFLATE}\) that we will use when analyzing the main algorithm.
Proof of Theorem 5.3
As Remark C.13 points out, in the context of this theorem we are passing to \(\mathsf {DEFLATE}\) an approximate projector \({\widetilde{P}}\), and the above result simplifies. Using this fact, as well as the upper bound \(r(n-r) \le n^2/4\), we get that
with probability at least \(1 - 7\theta ^2 - 2 e^{-t^2 n}\) for every \(t > 2\sqrt{2}\). If our desired quality of approximation is \(\Vert S - TW^*\Vert = \eta \), then some basic algebra gives the success probability as at least
Since \(\beta \le 1/4\), we can safely set \(t = \sqrt{2/\beta }\), giving
To simplify even further, we’d like to use the upper bound \(2e^{-2n/\beta } \le \frac{n^3\sqrt{\beta }}{(\eta - \mu _{\mathsf {QR}}(n){\textbf {u }})^2}\). These two terms have opposite curvature in \(\beta \) on the interval (0, 1), and are equal at zero, so it suffices to check that the inequality holds when \(\beta = 1\). The terms only become closer by setting \(n=1\) everywhere except in the argument of \(\mu _{\mathsf {QR}}(\cdot )\), so we need only check that
Under our assumptions \(\eta ,\mu _{\mathsf {QR}}(n){\textbf {u }}\le 1\), the right-hand side is greater than one and the left-hand side is less than one. Thus we can make the replacement, use \({\textbf {u }}\le \tfrac{\eta }{2\mu _{\mathsf {QR}}(n)}\), and round for readability to a success probability of no worse than
the constant here is certainly not optimal.
Finally, for the running time, we need to sample \(n^2\) complex Gaussians, perform two QR decompositions, and one matrix multiplication; this gives the total bit operations as
\(\square \)
Remark C.19
Note that the exact same proof of Theorem 5.3 goes through in the more general case where the matrix in question is not necessarily a projection, but any matrix close to a rank-deficient matrix A. In this case an extra \(\sigma _r(T^*AT)\) term appears in the probability of success (see the guarantee given in the box for the Algorithm \(\mathsf {DEFLATE}\) that appears in this appendix).
D Alternate Proofs of Shattering and Davies’ Conjecture
In this section we’ll give an alternate and essentially different route to the smoothed analysis of eigenvalue gap and condition numbers in Theorem 1.4, as well as the proof of Davies’ Conjecture in [9], via results from [2]. We’ll begin by recalling some notation from [2], and direct the reader there for a more thorough treatment.
For any n let \(\mathbb {P}(\mathbb {C}^n)\) denote the projective space associated to \(\mathbb {C}^n\), and given \(A \in \mathbb {C}^{n \times n}\), \(\lambda \in \mathbb {C}\) and \(v \in \mathbb {P}(\mathbb {C}^n)\), define \(A_{\lambda , v}: v^{\perp } \rightarrow v^{\perp }\) by
where \(v^{\perp } = \{ x \in \mathbb {C}^n \mid \langle x, v \rangle = 0 \}\) and \(P_{v^\perp } : \mathbb {C}^n \rightarrow v^\perp \) denotes the orthogonal projection. With this in hand, [2] defines the condition number of a triple \((A, \lambda , v)\in \mathbb {C}^{n\times n}\times \mathbb {C}\times \mathbb {P}(\mathbb {C}^n)\) as
They similarly define the mean square condition number of a matrix as
where \((\lambda _j, v_j)\) are the eigenpairs of A. In particular, note that \(\mu _{F, \mathsf {av}}(A)<\infty \) only when A has simple eigenvalues, and therefore \(\mu _{F, \mathsf {av}}(A)<\infty \) implies that A is diagonalizable.
D.1 Davies’ Conjecture
To compare the notions of eigenvalue condition number and the condition number of a triple we recall the following theorem from [2]:
Theorem D.1
(Part of Proposition 2.7 of [2]) Let \({\mathcal {V}}\) denote the solution variety for the eigenpair problem, defined as
and let \(\Gamma : [0, 1] \rightarrow {\mathcal {V}}\), \(\Gamma (t) = (A_t, \lambda _t, v_t)\) be a smooth curve such that \(A_t\) lies in the unit sphere of \({\mathbb {C}}^{n \times n}\) for all t. Then for all \(t \in [0, 1]\),
Now recall that \(\kappa (\lambda )\) has the following variational description (see, e.g., Theorem 1 in [34]) for any simple eigenpair \((\lambda , v)\) of A, in terms of the derivatives of smooth curves going through the point \((A, \lambda , v)\). Namely
Hence, Theorem D.1 implies
It is then clear that \(\mu _{F, \mathsf {av}}(A)\) can also be used to upper bound \(\kappa _V(A)\). In view of this, we remind the reader of the following result from [2].
Theorem D.2
(Theorem 2.14 of [2]) Let \(G_n \in {\mathbb {C}}^{n \times n}\) denote a complex Ginibre matrix with \({\mathcal {N}}(0, 1_\mathbb {C}/n)\) entries. For any \(A \in \mathbb {C}^{n \times n}\) and \(\gamma > 0\), we have
We are now ready to prove the following result, which directly implies Davies’ conjecture (for comparison, Theorem 1.1 of [9] is the same result but with exponent 3/2 of n instead of 5/2).
Proposition D.3
Suppose \(A \in \mathbb {C}^{n \times n}\) and \(\gamma \in (0,1)\). Then there is a matrix \(E \in \mathbb {C}^{n \times n}\) such that \(\Vert E \Vert \le \gamma \Vert A \Vert \) and
where C is an absolute constant.
Proof
Let \(\lambda _i, v_i\) denote the (random) eigenvalues and eigenvectors of \(A + \gamma G_n\). Let \(B_r\) denote the event \(\Vert A + \gamma G_n \Vert _F < r\). Because \( \Vert G_n \Vert _F < 2\sqrt{n}\) holds with probability at least some absolute positive constant, for \( r = \Vert A \Vert + 2 \sqrt{n}\) the event \(B_r\) holds with at least that probability as well. Now note that
where in the last line we use Theorem D.2 and
Recalling the general inequality (see Lemma 3.1 [9]) \(\kappa _V \le \sqrt{n \sum _{i=1}^n \kappa (\lambda _i)^2}\) we get
So, when \(\Vert A \Vert = 1\) and \(\gamma < 1\), if we set \( r = \Vert A \Vert + 2 \sqrt{n}\) as discussed above, the event \(B_r\) occurs with positive probability, and by (74) we know that \(n \mathbb {E}\left[ \sum \kappa (\lambda _i)^2 \mid B_r \right] \le \frac{C n^5}{\gamma ^2}\) for some constant C. It follows that there is some realization of \(G_n\) for which \(\kappa _V(A+\gamma G_n)^2 \le \frac{C n^5}{\gamma ^2}\), as we wanted to show. \(\square \)
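The phenomenon behind Proposition D.3 is easy to observe numerically: a tiny complex Ginibre perturbation of a maximally non-normal matrix (a nilpotent Jordan block) already has a well-conditioned eigenvector matrix. In this illustrative sketch, `kappa_V` is computed from the eigenvector matrix that numpy's `eig` happens to return, which upper bounds the optimal \(\kappa _V\) only up to column normalization; the sizes are our own choices.

```python
import numpy as np

def kappa_V(M):
    """Condition number of the eigenvector matrix returned by numpy's eig
    (an upper bound, up to column normalization, for the optimal kappa_V)."""
    _, V = np.linalg.eig(M)
    return np.linalg.cond(V)

rng = np.random.default_rng(3)
n, gamma = 12, 1e-3
J = np.diag(np.ones(n - 1), k=1)              # nilpotent Jordan block: not diagonalizable
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
kv = kappa_V(J + gamma * G)                    # finite and of moderate size
```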
D.2 Smoothed Analysis of \(\mathrm {gap}\)
Let \(M\in \mathbb {C}^{n\times n}\) be any matrix, and let \(\lambda _1, \dots , \lambda _n\) be its eigenvalues. In what follows we will denote
We begin by comparing these quantities to the condition number of the corresponding triple.
Lemma D.4
Let M be a matrix with distinct eigenvalues and spectral decomposition \(M=\sum _{i=1}^n \lambda _i v_i w_i^*\). Then, for every \(i=1, \dots , n\) it holds that
Proof
First we show that \(\Lambda (M_{\lambda _i, v_i}) = \Lambda (M-\lambda _i)\setminus \{0\}\). To see this, take any \(j\ne i\) and note that
and hence \(\lambda _j-\lambda _i\) is an eigenvalue of \(M_{\lambda _i, v_i}\).
Now, using that the norm of a matrix is bigger than its spectral radius we get
The claim then follows from the definition of \(\mu (M, \lambda _i, v_i)\). \(\square \)
Using Theorem D.2 we get the following.
Proposition D.5
Let \(A\in \mathbb {C}^{n\times n}\) be an arbitrary matrix and let \(G_n\) be a normalized complex Ginibre matrix. Then for any \(t, \gamma >0\)
Thus, \(\mathrm {gap}(A+\gamma G_n)=O(\gamma /n^{3/2})\) with probability bounded away from zero.
Proof
Using Lemma D.4 we get
Combining this with Theorem D.2 we obtain
The proof is then concluded using Markov’s inequality. \(\square \)
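The gap-creation effect of Proposition D.5 can be seen in a short experiment; the sketch below (our own choice of sizes and of the degenerate starting matrix) perturbs the zero matrix, whose n eigenvalues all collide, and observes a strictly positive minimum eigenvalue gap.

```python
import numpy as np
from itertools import combinations

def eig_gap(M):
    """Minimum pairwise distance between the eigenvalues of M."""
    lam = np.linalg.eigvals(M)
    return min(abs(a - b) for a, b in combinations(lam, 2))

rng = np.random.default_rng(4)
n, gamma = 10, 1e-2
A = np.zeros((n, n))                           # all n eigenvalues collide at 0
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
g = eig_gap(A + gamma * G)                      # strictly positive almost surely
```

Since the perturbed eigenvalues all lie in a disk of radius \(O(\gamma )\), the observed gap is both positive and of order \(\gamma \) at most.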
Remarkably, the \(\gamma \) dependence in the bound of Proposition D.5 is optimal, partially answering Question 2 of Sect. 6, which we posed in a previous version of this paper. Also note that the bound is stronger than that of Corollary 3.7, obtained with our techniques. That said, and as discussed in Remark 1.11, the results in [2] that were needed in the proof of Proposition D.5 heavily exploit the fact that the random perturbation has a complex Gaussian distribution, and it is not clear how to extend these results to other distributions.
With this in mind, we publicize the following conjecture of independent interest in random matrix theory, which was communicated to us by Vishesh Jain:
Conjecture D.6
Let \(K>0\) and let \(M_n\) be an \(n \times n\) random matrix with independent complex entries (not necessarily centered), whose distributions are absolutely continuous with respect to the Lebesgue measure on \(\mathbb {C}\), and have density upper bounded by K. Then
where \(\mathrm {poly}(n, t)\) is a universal polynomial (i.e. its coefficients are independent of the distributions of the entries of \(M_n\)) in n and t, which is zero when \(t=0\).
Banks, J., Garza-Vargas, J., Kulkarni, A. et al. Pseudospectral Shattering, the Sign Function, and Diagonalization in Nearly Matrix Multiplication Time. Found Comput Math 23, 1959–2047 (2023). https://doi.org/10.1007/s10208-022-09577-5