Nearest $\Omega$-stable matrix via Riemannian optimization

We study the problem of finding the nearest $\Omega$-stable matrix to a certain matrix $A$, i.e., the nearest matrix with all its eigenvalues in a prescribed closed set $\Omega$. Distances are measured in the Frobenius norm. An important special case is finding the nearest Hurwitz or Schur stable matrix, which has applications in systems theory. We describe a reformulation of the task as an optimization problem on the Riemannian manifold of orthogonal (or unitary) matrices. The problem can then be solved using standard methods from the theory of Riemannian optimization. The resulting algorithm is remarkably fast on small-scale and medium-scale matrices, and returns directly a Schur factorization of the minimizer, sidestepping the numerical difficulties associated with eigenvalues with high multiplicity.


Introduction
Let Ω be a non-empty closed subset of C, and define S Ω,n,F := {X ∈ F n×n : Λ(X) ⊆ Ω} ⊆ F n×n , the set of n×n matrices (with entries in either F = C or F = R) whose eigenvalues all belong to Ω. Given A ∈ F n×n , we consider the problem of finding a matrix B ∈ S Ω,n,F nearest to A, as well as the distance from A to B. This is known as the nearest Ω-stable matrix problem. More formally, the problem is to find together with the value of the minimum. The norm considered here is the Frobenius norm M F := ( n i,j=1 |M ij | 2 ) 1/2 = tr(M * M). Important examples of nearest Ω-stable matrix problems include: • the nearest Hurwitz stable matrix problem 1 (Ω = Ω H := {z ∈ C : ℜz ≤ 0}), which arises in control theory [11,Section 7.6]. Here, ℜz denotes the real part of z ∈ C. Hurwitz stability is related to asymptotical stability of the dynamical systemẋ = Ax; in some cases, numerical or modelling errors produce an unstable system in lieu of a stable one, and it is desirable to find a way to 'correct' it to a stable one without modifying its entries too much.
• the nearest Schur stable matrix problem (Ω = Ω S := {z ∈ C : |z| ≤ 1}), which is the direct analogue of the Hurwitz stable matrix problem that arises when, instead of a continuous-time, one has a discrete-time dynamical system x k+1 = Ax k [11].
• the problem of finding nearest matrix with all real eigenvalues (Ω = R), which was posed (possibly more as a mathematical curiosity than for an engineering-driven application) as an open question in some internet mathematical communities [2,12].
Moreover, in the literature interest has been given also to more exotic choices. For instance, in [9] this problem is considered for the case were Ω is a region generated by the intersection of lines and circles. Another common related (but different!) problem is that of finding the nearest Ω-unstable matrix, i.e., given a matrix A with all its eigenvalues in the interior of a closed set Ω, finding the nearest matrix with at least one eigenvalue outside the interior of Ω, together with its distance from A. For Ω = Ω H and Ω = Ω S , these minimal distances are known as pseudospectral abscissa and pseudospectral radius (or stability radius), respectively, and they have been studied extensively; see for instance [4,6,7,20,21,23,25,28,29,32,33]. Other variants and extensions have been studied recently as well [13,14,15,22,34].
Nearest Ω-stable matrix problems are notoriously hard. Unlike various other matrix nearness problems [24], and to our knowledge, there is no closed-form solution based on an eigenvalue or Schur decomposition. At least for the choices of Ω illustrated above, the feasible region S Ω,n,F is nonconvex. As a result, currently existing algorithms cannot guarantee that a global minimizer is found, and are often caught in local minima. In the most popular case of the nearest Hurwitz stable matrix problem, many approaches have been proposed, including the following.
• Orbandexivry, Nesterov and Van Dooren [35] use a method based on successive projections on convex approximations of the feasible region S Ω,n,F .
• Curtis, Mitchell and Overton [10], improving on a previous method by Lewis and Overton [30], use a BFGS method for non-smooth problems, applying it directly to the largest real part among all eigenvalues.
In this paper, we consider the problem for a completely general Ω and we describe a novel approach: we parametrize X with its complex Schur factorization X = UT U * , and observe that, if U is fixed, then there is an easy solution to the simplified problem in the variable T only. As a remarkable consequence, we demonstrate that finding an Ω-stable matrix nearest to A is equivalent to minimizing a certain function (depending both on Ω and on A) over the matrix Riemannian manifold of unitary matrices U(n). A version of the method employing only real arithmetic (and minimizing over the manifold of orthogonal matrices O(n)) can be developed with some additional considerations.
This reformulation of the problem is still difficult. Indeed, local minima of the original problem are also mapped to local minima of the equivalent Riemannian optimization problem. As a result, we cannot guarantee a global optimum either. However, there are several advantages coming from the new point of view: 1. The approach is completely general, and unlike some previously known algorithms it works for any closed set Ω; 2. It can exploit the robust existing machinery of general algorithms for optimization over Riemannian manifolds [1,5]; 3. In the most popular practical case of Hurwitz stability, numerical experiments show that our algorithm performs significantly better than state-of-the-art existing algorithms, both in terms of speed and in terms of accuracy, measured as the distance of the (possibly not global) minimum found from the matrix A; 4. Our algorithm first finds an optimal unitary/orthogonal matrix Q, then performs (implicitly) a unitary/orthogonal similarity A → Q * AQ, and finally projects in a simple way onto S Ω,n,F : the simplicity of this procedure ensures backward stability of the computation of B.
5. The algorithm produces as an output directly a Schur decomposition B = QT Q * (or a close analogue) of the minimizer. This decomposition is useful in applications, and is a 'certificate' of stability. Chances are that the problem of computing the Schur decomposition a posteriori given B is a highly ill-conditioned one, since in many cases the minimizer B has multiple eigenvalues with high multiplicity. Hence, it is convenient to have this decomposition directly available.
The paper is structured as follows. After some preliminaries concerning the distance from closed subsets of R N (Section 2), in Section 3 we describe the theory underlying our method for complex matrices, while in Section 4 we set up the slightly more involved theoretical background for real matrices. Section 6 describes the details of how to compute the gradient of the objective function in our reformulation of the problem, which is key for devising a practical algorithm. We then specialize to the cases of Hurwitz stability Ω = Ω H and of matrices with all real eigenvalues Ω = R: in Section 7, we describe the results of some numerical experiments, comparing (with rather promising outcome) with existing methods in the case of Hurwitz stability. We finally draw some conclusions in Section 8.
The code that we used for our numerical experiments is made publicly available from the repository https://github.com/fph/nearest-omega-stable.

Squared distance from closed sets
Let Ω ⊆ C be a non-empty closed set, and define Note that p Ω is not always uniquely defined: for instance, take Ω = {0, 2} ⊂ C and let z be any point with ℜz = 1. In many practical cases the minimum is always achieved in a unique point (for instance, when Ω is convex), but in general there may be a non-empty set of points z for which the minimizer is non-unique, known as medial axis of Ω. The same definitions can be formulated for R N , and indeed the definitions on C are just a special case of the following ones, thanks to the isomorphism C ≃ R 2 . Let now Ω ⊆ R N be a non-empty closed set, and define The following result proves differentiability of the squared distance function d 2 Ω : it is stated in [8, Proposition 4.1] for a compact Ω ⊆ R N , but it is not difficult to see that it holds also for unbounded closed sets, since it is a local property.
Theorem 2.1. The squared distance function d 2 Ω (z) is continuous on R N (or C), and almost everywhere differentiable. Indeed, it is differentiable on the complement of the medial axis of Ω. Its gradient is ∇ z d 2 Ω (z) = 2(z − p Ω (z)). In this paper, we will mostly be concerned in cases in which the set Ω is convex, so the medial axis is empty, but most of the framework still works even in more general cases. When the medial axis is not empty, we henceforth tacitly assume that, for all z belonging to the medial axis, p Ω (z) has been fixed by picking one of the possible minimizers (if necessary, via the axiom of choice).

Reformulating the problem: the complex case
We start with a lemma that shows how to solve (2) with the additional restriction that X is upper triangular.
is given by i > j (lower triangular part).
Proof. We have Clearly, 1 is constant, the minimum of 2 under the constraint that T ii ∈ Ω is achieved when T ii = p Ω (Â ii ), and the minimum of 3 is achieved when T ij =Â ij .
In particular, it holds that min T ∈S Ω,n,C T upper triangular with L(Â) =Â − T (Â). One can see that the matrix L(Â) is lower triangular, and has entries Note that the matrices T (Â) and L(Â) are uniquely defined if and only if p Ω (Â ii ) is unique for each i. However, the quantity L(Â) 2 F , which is the optimum of (4), is always uniquely determined, since |Â ii − p Ω (Â ii )| 2 = d 2 Ω (Â ii ) is uniquely determined. Thanks to Lemma 3.1, we can convert the nearest Ω-stable matrix problem (2) into an optimization problem over the manifold of unitary matrices Indeed, Theorem 3.2 below shows the equivalence of (2) and (7).
1. The optimization problems (2) and (7) have the same minimum value.
3. If B is a local (resp. global) minimizer for (2), and B = UT U * is a Schur decomposition, then U is a local (resp. global) minimizer for (7) and T = T (U * AU).
Proof. Taking a Schur form of the optimization matrix variable X = UT U * , we have where the last step holds because of Lemma 3.1. All the statements follow from this equivalence.

Reformulating the problem: the real case
The version of our result for F = R is somewhat more involved. We can identify two separate cases, one being a faithful analogue of the complex case and the other involving some further analysis.

Ω ⊆ R
The simpler version of the real case is when Ω is a subset of the reals. This allows one to solve, for instance, the problem of the nearest matrix with real eigenvalues, Ω = R. In this case, matrices X ∈ S Ω,n,R have a real Schur form in which the factor T is upper triangular (as opposed to quasi-triangular with possible 2 × 2 blocks). In this case, mutatis mutandis, all the arguments in Section 3 still hold. Specifically, it suffices to replace C with is not) and there is nothing more to say.

Ω ⊆ R
In the more general case, matrices X ∈ S Ω,n,R may have complex eigenvalues, so their real Schur form will have a quasi-triangular T . Since the zero pattern of T is not uniquely determined, we need to modify slightly the approach of Section 3 to be able to define a suitable objective function f (Q). We solve this issue by imposing a specific zero pattern for T . Definition 4.1. A real matrix T ∈ R n×n is said to be in modified real Schur form if it is block diagonal with all 2 × 2 blocks on the diagonal, except (if and only if n is odd) for a unique 1 × 1 block at the right bottom.
A modified real Schur decomposition of a matrix M ∈ R n×n is M = QT Q ⊤ where Q ∈ R n×n is orthogonal and T is in modified real Schur form.
Since one can reorder blocks in the real Schur decomposition [27,Theorem 2.3.4], in particular one can obtain a real Schur decomposition in which all non-trivial 2 × 2 diagonal blocks are on top. Thus, the existence of modified real Schur decompositions is an immediate corollary of the existence of real Schur decompositions. Further modified Schur decompositions, having more non-trivial 2 × 2 blocks than the special ones described above, can be obtained by applying Givens rotations to appropriate 2 × 2 triangular blocks of classical real Schur forms.
is in modified real Schur form. Note that in this example the top 2 × 2 block on the diagonal is associated with two real eigenvalues, a case which would not be allowed in the classical real Schur form.
Furthermore, we can generalize the projection p Ω to 2×2 matrices, as follows: if A ∈ R 2×2 then p Ω (A) is a minimizer of the distance from A among all 2×2 real matrices with eigenvalues in Ω. This is just a special case of the definitions (3), with the (closed) set S Ω,n,R in place of Ω, since R 2×2 ≃ R 4 .
We can now state Lemma 4.1 and Theorem 4.2, the real analogues of Lemma 3.1 and Theorem 3.2. We omit their proofs, which are very similar to the complex case.
Analogously, we can define the block lower triangular matrix L(Â) =Â − T (Â), with blocks so that the optimum of (8) is L(Â) 2 F . Similarly to the complex case, we can recast any (real) nearest Ω-stable matrix problem as the optimization problem over the manifold of orthogonal matrices where the function L is defined by either (6) (if Ω ⊆ R) or (10) (otherwise).
1. The optimization problems (2) and (11) have the same minimum value.
3. If B is a local (resp. global) minimizer for (2), and B = QT Q ⊤ is a (modified) Schur decomposition, then Q is a local (resp. global) minimizer for (11) and T = T (Q ⊤ AQ).
Moreover, it is clear that once again we can restrict the optimization problem to Q ∈ SO(n). This concludes the theoretical overview needed to reformulate (complex or real) nearest Ω-stable matrix problems as minimization over (unitary or orthogonal) matrix manifolds.
To solve (7) or (11) in practice for a given closed set Ω, we need an implementation of the projection p Ω . In particular, for real problems with Ω ⊆ R, we also need an implementation of p Ω for 2 × 2 matrices, which is not obvious. In the next section, we discuss how to develop such an implementation in the important case of Ω = Ω H (nearest Hurwitz stable matrix).

Computing real 2×2 nearest Hurwitz stable matrices
Let us start by describing the set of 2 × 2 real Hurwitz stable matrices S Ω H ,2,R .
Hurwitz stable if and only if both eigenvalues have nonpositive real part. There are two cases. If the two eigenvalues are a complex conjugate pair, say, α ± iβ, then Hurwitz stability is equivalent to tr(X) = 2α ≤ 0 whereas det(X) = α 2 + β 2 ≥ 0 is vacuously true.
If the two eigenvalues are real, λ 1 and λ 2 , then Hurwitz stability is equivalent to Observe that the frontiers of S t and S d are, respectively, the sets of traceless and singular matrices: In order to visualize the geometry of S Ω H ,2,R , it is first convenient to fix a basis that preserves the Frobenius distance and where the matrix A has equal diagonal elements. Indeed, as we argue below, in this basis there is one Hurwitz stable matrix B, nearest to A, that also has equal diagonal elements, and therefore we can reduce the problem from one in a four-parameter space to one in a three-parameter space.
Proof. We claim that there is at least one real number x such that (a−y) it is readily verified that G ∈ SO(2). Moreover, In particular, specializing Lemma 5.2 to y = (a + d)/2, we see that any 2 × 2 matrix is rotationally similar to one with equal diagonal entries.
Since the set of matrices such that A 11 = A 22 is closed under p Ω H , and one can ensure that A has this property with an orthogonal change of basis, then it makes sense to visualize S Ω H ,2,R ∩ {A ∈ R 2×2 : A 11 = A 22 } as a volume parametrized by the three coordinates A 11 , A 12 , A 21 . We show a few images of this set in Figure 1. One can distinguish, in the 3D image on the left of Figure 1, two straight faces (corresponding to the linear condition tr(A) = 0) and two curved ones (corresponding to the nonlinear condition det(A) = 0). The 'cross' formed by the two edges A 11 = A 22 = A 21 = 0 and A 11 = A 22 = A 12 = 0 corresponds to matrices with two zero eigenvalues.
The main result of this section is the following.
Lemma 5.4. Let A ∈ R 2×2 ; let A = U[ σ 1 σ 2 ]V ⊤ be a singular value decomposition, and G ∈ SO(2) be a matrix such thatÂ = G ⊤ AG satisfiesÂ 11 =Â 22 (G exists by Lemma 5.2). Then, the set with contains a Hurwitz stable matrix nearest to A.
Proof. We assume that A is not Hurwitz stable, otherwise the result is trivial, and let B be a Hurwitz stable matrix nearest to A. One of the following three cases must hold (corresponding to which of the constraints of the two constraints det(X) ≥ 0 and tr(X) ≤ 0 are active): Indeed, any Hurwitz stable matrix B nearest to A must belong to the boundary of the set. This implies that either det(B) = 0, or tr(B) = 0, or both. We treat the three cases separately.
The only such minimizer is A − 1 2 tr(A)I: this is easy to see by parametrizing X = [ x y z −x ] ∈ ∂S t and studying the resulting function, which is a strictly convex trivariate quadratic.
2. B ∈ ∂S t , B ∈ ∂S d . As in the previous case, note that B must be a local minimizer of A − X 2 F in ∂S d . We divide further into three subcases.
(a) σ 1 > σ 2 > 0. Let us parametrize X ∈ ∂S d as The following computations and arguments are quite tedious, but straightforward and easily checked, e.g., with a computer algebra system. The gradient of In particular, note that We can seek the stationary points by solving for d = 0: we get that either σ = σ 1 and α = β = 0, or σ = σ 2 and α = β = π/2, or σ = 0 and tan α tan β = −σ 1 /σ 2 . Clearly, we can exclude the latter family of solutions, as σ = 0 corresponds (regardless of the values of α, β) to X = 0. It is manifest that 0 cannot be a local minimizer because, for all positive and sufficiently small ǫ, It remains to discuss the other two stationary points. We can study the inertia of the Hessian H by observing that, if it can be quickly checked that M ⊤ HM (and hence H) is positive definite when σ = σ 1 and α = β = 0, and it is indefinite when σ = σ 2 and α = β = π/2. We conclude that the only local minimizer is when σ = σ 1 and α = β = 0, and corresponds to X = B 0 .
(b) σ 1 > σ 2 = 0: we take the same parametrization as the previous case. Solving for d = 0, we get that the stationary points are when either σ = σ 1 and α = β = 0 or σ = 0 and cos(α) cos(β) = 0. Analogously to the previous subscase, once again it can be excluded that σ = 0 corresponds to a local minimizer. On the other hand, the point σ = σ 1 and α = β = 0 is still a local minimizer and still corresponds to X = B 0 .
(c) σ 1 = σ 2 > 0: solving d = 0, and excluding solutions with σ = 0 by the same argument as before, we get that the stationary points are when σ = σ 1 and α = β. All these points have a positive semidefinite Hessian. Hence the local minimizers are It is easy to verify that A−B α F = σ 2 for all α, so all B α are equidistant from A. Note, however, that only one of these matrices belongs to our set (12) of potential minimizers: the matrix B 0 achieved for α = 0. It might happen that, for some α ∈ (0, π), B α is Hurwitz stable but B 0 is not: if that is the case, then in order to prove the statement we must show that there is a different global minimizer in the set (12). In other words, we must prove that (under the above described circumstances) some other matrix belonging to the set (12) is Hurwitz stable and has Frobenius distance from A less than or equal to σ 2 . We may assume without loss of generality that σ 1 = σ 2 = 1, hence A is orthogonal: the general case can be reduced to this one by scaling. We need, once more, to analyze two possible situations separately.
i. det(A) = 1. In this case, we shall prove that B 0 is Hurwitz stable. This is tantamount to showing tr(B 0 ) ≤ 0.
In the SVD, we have Σ = I 2 , and (up to a sign change that we can make without any loss of generality) U and V ⊤ are 2 × 2 rotation matrices; in particular, they commute, and we may assume We have This computation shows that tr(B α ) is independent of α. We conclude that B α is Hurwitz stable for some α = 0 if and only if B 0 is Hurwitz stable. ii. det(A) = −1. In this case, tr(B α ) does depend on α, and it may be the case that B α is Hurwitz stable while B 0 is not. However, we shall show that B + , B − are also global minimizers. These matrices have two zero eigenvalues, so they are always Hurwitz stable. The matrixÂ = G ⊤ AG is orthogonal with det(Â) = −1, so it has the form A = cos(γ) sin(γ) sin(γ) − cos(γ) , γ ∈ (−π, π]. Since its two diagonal entries must be equal, it must hold that cos(γ) = 0, henceÂ = ±[ 0 1 1 0 ]. Then, direct verification shows that the two matrices Lemma 5.4 yields an explicit algorithm to find a projection p Ω (A) for A ∈ R 2×2 , and Ω = Ω H : we compute all the five matrices in the set (12), and among those of them that are Hurwitz stable we choose one which is nearest to A. (Note that at least B + and B − are always Hurwitz stable, so this algorithm always returns a matrix.) • A is not Hurwitz stable since it has an eigenvalue 1 + √ 2 > 0. Note that, by continuity, for any matrix in a sufficiently small neighborhood of A the same inequalities will hold, and hence we will fall under the same case. In particular, the set of matrices A for which p Ω H (A) has a double zero eigenvalue is not negligible.

The gradient of the objective function
To apply most optimization algorithms, one needs the gradient of the objective function; we compute it in this section. We start from the real case, which is simpler.

Computing the gradient: real case
We assume in this section that F = R; we wish to compute the gradient of the function f (Q) = L(Q ⊤ AQ) 2 F , where the function L(·) is given by either (6) or (10). In the set-up of optimization on matrix manifolds, the most appropriate version of the gradient to consider is the so-called Riemannian gradient [1, Section 3.6]. In the case of an embedded manifold (such as our case, since O(n) is embedded in R n×n ≃ R n 2 ) the Riemannian gradient grad f is simply the projection on the tangent space T Q O(n) of the Euclidean gradient of f : i.e., its gradient ∇ Q f when f is considered as a function in the ambient space R n×n → R.
Thus we start by computing the Euclidean gradient ∇ Q f . The function f = g • h is the composition of g(X) = L(X) 2 F and h(Q) = Q ⊤ AQ. The gradient of g is this follows from the fact that the gradient of L 2 ij is 2L ij , and by Theorem 2.1 the gradient ). Here and in the following, for ease of notation, we set L := L(Q ⊤ AQ), together with T := T (Q ⊤ AQ) andÂ = Q ⊤ AQ = L + T .
When one uses the block version (10) of the function L(·), the same result holds blockwise: the gradient of L II F is 2(X II − p Ω (X II )) even when X II is a 2 × 2 block. This follows again from Theorem 2.1, applied to the closed set S Ω,2, hence, using vectorization and Kronecker products [26,Section 11.4] its Jacobian is where the permutation matrix Π = Π ⊤ is often called the vec-permutation matrix [3] or the perfect shuffle matrix, and it is defined as the n 2 × n 2 matrix such that Π vec(X) = vec(X ⊤ ) for all X.
By the chain rule, We now need to project ∇ f (Q) = 2(AQL ⊤ + A ⊤ QL) on the tangent space T Q O(n) to get the Riemannian gradient. One has (see [1,Example 3.5.3] or more generally [3, Lemma In particular, we can write the projection of a matrix M onto T Q O(n) by using the skewsymmetric part operator skew(G) = 1 2 (G − G ⊤ ) as The matrix T L ⊤ − L ⊤ T is a strictly upper triangular matrix with entries kiÂkj , i < j.

Computing the gradient: complex case
The computation of the gradient in the complex case is similar, but technically more involved.
To work only with real differentiation, following (with a slightly different notation) [3, Section 2.1], we define a complex version of the vectorization operator to map C n×n into R 2n 2 Here ℜX and ℑX denote the real and imaginary parts (defined componentwise) of the matrix X.
Similarly, we define a complex version of the Kronecker product ⊗ c , to obtain a complex version of the identity vec(AXB) = B ⊤ ⊗ A vec(X).
Finally, let Π c ∈ R 2n 2 ×2n 2 = diag(Π, −Π) be the permutation matrix (with Π c = Π ⊤ c ) such that Π c cvec(X) = cvec(X * ). We now have all the complex vectorization machinery available to perform a direct analogue of the computation in the previous section. Again, for ease of notation we defineÂ = U ⊤ AU = L + T , with L = L(Â) and T = T (Â).
The tangent space to the manifold of unitary matrices is, by a special case of [3, Lemma 3.2], T U U(n) = {US : S = −S * }, and the associated projection is .
The diagonal entries of T L * − L * T are equal to T ii L ii − L ii T ii = 0, hence that matrix is again strictly upper triangular. Its nonzero entries are given by

Numerical experiments
We implemented the proposed algorithm for F = R, and Ω = Ω H the set of Hurwitz stable matrices. We used the solver trustregions from the Matlab package Manopt [5] (version 5.0) for optimization on matrix manifolds, specifying the manifold O(n), the cost function f (Q) = L(Q ⊤ AQ) 2 F (with L(·) as in (10)) and the Riemannian gradient grad f (Q) = 2Q skew(T L ⊤ − L ⊤ T ). The solver is a quasi-Newton trust-region method; a suitable approximation of the Hessian is produced automatically by Manopt using finite differences.
All the experiments were performed on an Intel i7-4790K CPU 4.00 GHz with 32 Gib of RAM, running Matlab R2018b (Lapack 3.7.0 and MKL 2018.0.1) on a Linux system. Our implementation of the algorithm for the two cases Ω = Ω H and Ω = R is available for download at https://github.com/fph/nearest-omega-stable. Remark 7.1. It is somewhat surprising that our algorithm does not make use of an eigensolver (Matlab's eig, schur, eigs or similar), even though the original problem (2) is intrinsically about eigenvalues. Indeed, the minimization procedure itself is as an eigensolver in a certain sense: it computes a Schur form by trying to minimize the strictly subdiagonal part of Q ⊤ AQ, not much unlike Jacobi's method for eigenvalues ( [17], see also [31] for more references). Indeed, when run with Ω = C, the method essentially becomes a Jacobi-like method to compute the Schur form.

Experiments by Gillis and Sharma
We used the same set of benchmark used in the paper by Gillis and Sharma [16], i.e., matrices with dimension n ∈ {10, 20, 50, 100} of four different types as follows.
On each matrix, we used as competitors the algorithms BCD, Grad, FGM from [16], the algorithm SuccConv from [35], and a BFGS algorithm based on GRANSO [10] (which is an improvement of HANSO [30]). The new algorithm presented here is dubbed Orth. The implementations of all competitors are due to their respective authors and are taken from Nicolas Gillis' website https://sites.google.com/site/nicolasgillis/code. We highlight the fact that we are comparing methods that use different objective functions and formulations of the problem; we are not simply changing the optimization algorithm. The convergence plots in Figures 2-5 show that the new algorithm converges in vastly quicker time scales on matrices of small size. In addition, in all but one case the local minimizers produced by the new algorithm are of better quality, i.e., they have a smaller objective function A − B F than the ones of the competitors. This can be seen in the plots by comparing the lower horizontal level reached by each line.
We report in Figure 6 the results of another experiment taken from [16], which aims to produce a performance profile. The result confirms that in the vast majority of the cases the local minima found by the new algorithm have a lower objective function.

Multiple eigenvalues
Inspecting the minimizers produced by the various methods, we observed that in many cases the new algorithm produces matrices with multiple eigenvalues (especially multiple zero eigenvalues), while other methods settle for worse minimizers with eigenvalues of lower multiplicities (particularly BCD and Grad). An illustrative example is in Figure 7. We observe that, as illustrated by Example 5.1, the instances where the global minimum has multiple eigenvalues are not expected to be rare for this problem. This observation provides a qualitative insight to explain why our approach appears to be so highly competitive.

Comparison with the method by Guglielmi and Lubich
Due to the lack of available code, we could not compare directly our method with the algorithm in [19], which is arguably one of the best competitors. We report some remarks about it based on published figures.
• On the matrix gallery('grcar', 5), the new method computes in 0.1 seconds a different minimizer than the one reported in [19]; both have very close objective values: A−B F = 2.3097 reported in [19] vs. 2.309628 found by the new method. We suspect that this minimal difference could simply be due to different stopping criteria.
• On gallery('grcar', 10), the new method computes in 0.3 seconds a minimizer with A − B F = 3.2834: this time, undoubtedly a better value than those reported in [19].
• On gallery('grcar', 30), the new method computes in 3.5 seconds a minimizer with A − B F = 5.66, which again improves on the outcomes reported in [19]. The computed minimizer has all its eigenvalues on the imaginary axis: a complex conjugate pair with multiplicity 14 each, and a complex conjugate pair with multiplicity 1 each. In contrast, in [18], Guglielmi reports a minimizer with A − B F = 6.57 and seemingly all distinct eigenvalues all on the imaginary axis, found in 143 seconds (on a different machine, so times cannot be compared directly: still, the very different orders of magnitude suggest that the new algorithm is competitive also in terms of speed).
• Guglielmi and Lubich [19] report experiments with matrices of size 800, 1000, and 1090. Currently, our method cannot handle those sizes, since the optimization method used stagnates: after an initial improvement it fails to reduce the objective function further, oscillating between various values with nonzero gradient and showing convergence issues in the tCG algorithm used as an inner solver used in the trustregions algorithm. It is an interesting task for future research to study how to improve our algorithm so that it can deal with larger matrices; see Section 8.
• Another advantage of the algorithm in [19] is the ability to handle many different additional constraints in the form of matrix structure (e.g. sparsity, Toeplitz structure).
The new algorithm cannot handle those, and it seems challenging to include them since they do not interact well with the conjugation Q ⊤ AQ that is a crucial step in our procedure.

The complex case and a conjecture
We implemented the (simpler) complex version of the minimization procedure as well, obtaining comparable results. An interesting observation is that in all the (limited) examples that we tried, for a real A ∈ R n×n we observed that that is, the minimizer over the whole C n×n happens to be a real matrix. It does not seem obvious to prove that this must be the case, since the set S Ω H ,n,C is non-convex, and in particular there are examples of matrices such that X ∈ C n×n is Hurwitz stable, but ℜX is not. We formulate it as a conjecture.

Nearest matrix with real eigenvalues
For a final set of experiments, we have implemented the version of the algorithm that computes the nearest matrix with real eigenvalues. As discussed in Section 4, in this case we do not need to use the quasi-Schur form, and we can simply take T (Â) = triu(Â), the upper triangular part ofÂ = Q ⊤ AQ. Then L(Â) =Â − T (Â) is defined accordingly, and we can use the objective function f (Q) = L(Q ⊤ AQ) 2 F and the gradient computed in Section 6. We tested the algorithm on the examples discussed in [2]. On the matrix the candidates suggested in [2] are: • the matrix obtained from the eigendecomposition A = V DV −1 as B 1 = V (ℜD)V −1 , which has distance A − B 1 F = 1.5811; • the matrix which has distance A − B 2 F = 1.4213 (this matrix actually is not even a local minimizer); • the matrix obtained from the real Schur factorization A = QT Q ⊤ as B 3 = Q triu(T )Q ⊤ , which has distance A − B 3 F = 0.5.
Our method computes in about 0.1 seconds a minimizer These results confirm that none of the strategies suggested in [2] are optimal, and suggest that minimizers with high-multiplicity eigenvalues are to be expected for this problem as well.

Conclusions
In this paper, we introduced a method to solve the nearest Ω-stable matrix problem, based on a completely novel approach rather than improving on those known in the literature. The algorithm has remarkably good numerical results on matrices of sufficiently small size (up to n ≈ 100), both in terms of computational time compared to its competitors, and of quality of the local minima found.
We attribute a good part of the success of this method to the fact that it computes eigenvalues only indirectly; thus, it is able to avoid the inaccuracies associated with multiple eigenvalues. Indeed, it is well-known that computing eigenvalues with high multiplicity is an ill-conditioned problem: if a matrix has an eigenvalue λ of multiplicity k, a perturbation of size ε will, generically, produce a matrix with k simple eigenvalues at distance O(ε 1/k ) from λ (see, for instance, [36,Chapter 2]). The method can, in principle, be generalized to similar problems by defining appropriate Schur-like decompositions; for instance, to the problem of finding matrices with at least a given number k of eigenvalues inside a certain set Ω.
In our future research, we plan to study extensions such as the one mentioned above, and to investigate further the behaviour of the method on larger matrices. Hopefully, convergence on larger matrices can be obtained by adjusting parameters in the optimization method, computing explicitly the Hessian of the objective function f (Q) = L(Q ⊤ AQ) 2 F , or including some techniques borrowed from traditional eigensolvers.