1 Introduction

A linear dynamical system \(\dot{x} = Ax\), with \(A\in \mathbb {R}^{n\times n}\), is exponentially stable if and only if all eigenvalues of \(A\) have a strictly negative real part or, equivalently, if the spectral abscissa of \(A\), i.e. the maximum of the real parts of the eigenvalues, is negative. Unfortunately, the spectral abscissa is not a robust measure of stability. This paper focuses on the effects of perturbations on the stability of a given dynamical system. To this end, the \(\varepsilon \)-pseudospectrum \(\varLambda _{\varepsilon }\left( A\right) \) is studied, i.e. the set of the eigenvalues of all real or complex matrices in a given neighborhood of \(A\). In particular,

$$\begin{aligned} \varLambda _{\varepsilon }^{\mathbb {K},\Vert \cdot \Vert }\left( A\right) \!:=\! \left\{ \lambda \; : \; \lambda \!\in \! \varLambda \left( B\right) , \Vert A-B\Vert \le \varepsilon , B \!\in \! \mathbb {K}^{n\times n} \right\} \!, \end{aligned}$$
(1)

with \(\varLambda (B)\) the spectrum of \(B\), \(\mathbb {K} = \mathbb {R}\text { or }\mathbb {C}\) and \(\Vert \cdot \Vert \) a chosen norm [1]. In this paper, we address complex perturbations measured with the Euclidean norm. Therefore, in order not to overload the notation, we will denote \(\varLambda _{\varepsilon }^{\mathbb {C},\Vert \cdot \Vert _2}(A)\) simply by \(\varLambda _{\varepsilon }(A)\). The largest possible real part of points in the pseudospectrum is called the \(\varepsilon \)-pseudospectral abscissa \(\alpha _\varepsilon \left( A\right) \), i.e.,

$$\begin{aligned} \alpha _\varepsilon \left( A\right) := \max \left\{ \mathrm{Re }\left( \lambda \right) \; : \; \lambda \in \varLambda _\varepsilon \left( A\right) \right\} . \end{aligned}$$
(2)
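As a simple illustration of definitions (1)–(2), the following sketch samples norm-bounded complex perturbations and records the largest real part of the perturbed spectra, which yields a lower bound on \(\alpha _\varepsilon (A)\). The example matrix, the value of \(\varepsilon \) and the sample size are our own illustrative choices and do not appear in the remainder of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[-1.0, 10.0],
              [ 0.0, -2.0]])     # an arbitrary stable example matrix (ours)
eps = 0.5                        # illustrative perturbation bound

# Sample complex perturbations with ||Delta||_2 = eps and record the largest
# real part of the perturbed spectra: a Monte-Carlo lower bound on alpha_eps(A).
alpha_lb = -np.inf
for _ in range(2000):
    D = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    D *= eps / np.linalg.norm(D, 2)              # scale onto the boundary
    alpha_lb = max(alpha_lb, np.linalg.eigvals(A + D).real.max())

print(alpha_lb)   # lower bound on alpha_eps(A); it improves with the sample size
```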

Perturbations with bound \(\varepsilon \) may render the system unstable whenever \(\alpha _\varepsilon \left( A\right) \ge 0\). To guarantee stability, the norm \(\varepsilon \) of the perturbations has to be limited. The distance to instability \(\mu \left( A\right) \) is the sharp upper bound on the norm of the perturbations below which all perturbed systems remain stable. In particular, it holds that \(\alpha _\varepsilon \left( A\right) < 0\) if and only if \(\varepsilon < \mu \left( A\right) \), and \(\varepsilon = \mu \left( A\right) \) is the critical value for which \(\alpha _\varepsilon \left( A\right) = 0\). Most earlier methods for computing this distance require in each iteration the computation of all eigenvalues on the imaginary axis of a \(2n \times 2n\) Hamiltonian matrix [2–4]. Even though convergence to the correct solution is guaranteed, these methods are not computationally tractable when \(n\) is large. Recently, a fast local method has been described in [5], based on computing a Jordan block of the corresponding Hamiltonian matrix by Newton’s method. In this paper, we propose a fast local method grounded in the computation of the pseudospectral abscissa, more precisely in the iteration proposed in [6, 7], as well as a global method based on the optimization algorithm of [8]. In addition, we show that these algorithms in fact apply to a broad class of nonlinear eigenvalue problems, described by

$$\begin{aligned} \left( \sum _{i=0}^m A_i p_i\left( \lambda \right) \right) v = 0, \end{aligned}$$
(3)

where \(A_0,\dots ,A_m\) are real \(n\)-by-\(n\) matrices and \(p_i:\ \mathbb {C}\rightarrow \mathbb {C}\) are entire functions satisfying \(p_i\left( \bar{\lambda }\right) =\overline{p_i(\lambda )},\ i=0,\ldots ,m\). The latter assumption guarantees that eigenvalues are either real or appear in complex conjugate pairs. The class (3) includes polynomial eigenvalue problems and nonlinear eigenvalue problems arising in the analysis of linear delay differential equations as special cases. Although many of these problems can be reformulated as linear eigenvalue problems (for example, by a so-called linearization of a polynomial eigenvalue problem or by a transformation to an equivalent operator eigenvalue problem for the delay eigenvalue problem [9]), we will not consider the unstructured pseudospectra of a particular type of linearization (note that the pseudospectra strongly depend on the basis). Instead, as in [10, 11], we will explicitly take the structure of the original nonlinear eigenvalue problem into account in the definition of pseudospectra, which we relate to the effect of perturbations on the individual coefficient matrices.

2 Preliminaries

2.1 Pseudospectra and distance to instability for nonlinear eigenvalue problems

We consider the nonlinear eigenvalue problem (3). In what follows we call

$$\begin{aligned} F(\lambda ):= \sum _{i=0}^m A_i p_i(\lambda ) \end{aligned}$$
(4)

the characteristic matrix. We denote the spectrum by \(\varLambda \) and the spectral abscissa by \(\alpha \), i.e.,

$$\begin{aligned} \varLambda (F)&:= \left\{ \lambda \in \mathbb {C}:\ \det \left( \sum _{i=0}^m A_i p_i(\lambda )\right) =0\right\} , \end{aligned}$$
(5)
$$\begin{aligned} \alpha (F)&:= \sup \left\{ \mathrm{Re }\left( \lambda \right) :\ \lambda \in \varLambda (F)\right\} . \end{aligned}$$
(6)

We are interested in the effect of bounded perturbations of the matrices \(A_i\) on the spectrum, which leads to the perturbed eigenvalue problem

$$\begin{aligned} \left( \sum _{i=0}^m (A_i+\delta A_i) p_i(\lambda )\right) v=0,\quad \lambda \in \mathbb {C},\ v\in \mathbb {C}^n\setminus \{0\}. \end{aligned}$$
(7)

The first step in the robustness analysis is to define the class of perturbations under consideration, as well as a measure of the combined perturbation

$$\begin{aligned} \varDelta :=(\delta A_0, \ldots , \delta A_m). \end{aligned}$$

In analogy to the classical definition of \(\varepsilon \)-pseudospectrum of a matrix [1], we allow the perturbations \(\delta A_i,\ i=0,\ldots , m,\) to be complex matrices, i.e.,

$$\begin{aligned} \varDelta \in \mathbb {C}^{n\times n\times (m+1)}. \end{aligned}$$
(8)

Introducing weights \(w_i\in {\mathbb {R}}^+_0\cup \{\infty \}, i=0,\ldots , m\), we define the following global measure of the perturbations:

$$\begin{aligned} \Vert \varDelta \Vert _{\mathrm {glob}}:=\left\| \left[ \begin{array}{c} w_0\Vert \delta A_0\Vert _{2}\\ \vdots \\ w_m\Vert \delta A_m\Vert _{2}\end{array}\right] \right\| _{\infty }. \end{aligned}$$
(9)

In this way, the condition

$$\begin{aligned} \Vert \varDelta \Vert _{\mathrm {glob}}\le \varepsilon \end{aligned}$$

corresponds to the natural assumption of taking perturbations satisfying

$$\begin{aligned} \Vert \delta A_i\Vert _{2}\le \varepsilon /w_i,\ i=0,\ldots , m. \end{aligned}$$

This uncertainty bound is also used in [11] and fits within the general class considered in [10]. Setting a weight equal to infinity implies that the corresponding matrix is not perturbed.

With the above class of allowable perturbations and with the measure (9) we define the \(\varepsilon \)-pseudospectrum of (3) as

$$\begin{aligned} \varLambda _\varepsilon = \bigcup _{\Vert \varDelta \Vert _{{\mathrm {glob}}}\le \varepsilon } \left\{ \lambda \in \mathbb {C} : \det \left( \sum _{i=0}^m \left( A_i +\delta A_i \right) p_i\left( \lambda \right) \right) = 0\right\} . \end{aligned}$$
(10)

The set \(\varLambda _\varepsilon \) is closed and its boundary is generically smooth. Moreover if \(\varepsilon _2 > \varepsilon _1\), then we have \(\varLambda _{\varepsilon _1} \subseteq \varLambda _{\varepsilon _2}\).

The corresponding pseudospectral abscissa is defined as

$$\begin{aligned} \alpha _{\varepsilon }(F):=\sup \left\{ \mathrm{Re }\left( \lambda \right) :\ \lambda \in \varLambda _{\varepsilon }\right\} . \end{aligned}$$
(11)

In [10] the following explicit expression for the pseudospectra is obtained.

Proposition 1

For the perturbation class (8) and measure (9) the pseudospectrum \(\varLambda _{\varepsilon }\) of (4) satisfies

$$\begin{aligned} \varLambda _{\varepsilon }(F)=\left\{ \lambda \in \mathbb {C}:\ \sigma _n\left( \sum _{i=0}^m A_i p_i(\lambda )\right) \le \varepsilon \Vert w(\lambda ) \Vert _{1} \right\} , \end{aligned}$$
(12)

where \(\sigma _n(\cdot )\) denotes the smallest singular value and

$$\begin{aligned} w(\lambda )=\left[ \frac{p_0(\lambda )}{w_0} \ \cdots \ \frac{p_m(\lambda )}{w_m} \right] ^T. \end{aligned}$$
(13)

We say that \(F\) is exponentially stable if all zeros are confined to the open left half plane and bounded away from the imaginary axis, i.e., \(\alpha (F)<0\). To assess the robustness of stability w.r.t. perturbations on the coefficient matrices \(A_i\) we introduce the concept of distance to instability.

Definition 1

The distance to instability of (4) (also called stability radius) w.r.t. the perturbation class (8) and measure (9) is defined as

$$\begin{aligned} \mu (F):= \inf \left\{ \varepsilon \ge 0:\ \alpha _{\varepsilon }(F)\ge 0\right\} . \end{aligned}$$

A comment is necessary here. The perturbation measure considered in the definition of \(\varLambda _{\varepsilon }(F)\) and \(\mu (F)\) makes it possible to robustify the notion of stability of the system while exploiting the structure of the characteristic matrix \(F\). It should be noted that, in this way, the pseudospectra and the distance to instability depend on the choice of basis functions \(p_i,\ i=0,\ldots ,m\). However, in most applications, a description in a specific basis gives rise to a physical interpretation of the coefficient matrices. For example, finite-element codes in acoustics and vibrations often lead to quadratic eigenvalue problems expressed in a monomial basis,

$$\begin{aligned} F(\lambda )=A_0+A_1\lambda +A_2\lambda ^2, \end{aligned}$$
(14)

where the three coefficient matrices contain information about the stiffness, damping and mass of the system. This information would be merged in a description in another polynomial basis. Similarly, in control problems that give rise to a delay eigenvalue problem, such as

$$\begin{aligned} F(\lambda )=-\lambda I+A_1+A_2 e^{-\lambda \tau }, \end{aligned}$$
(15)

the coefficient matrix corresponding to the exponential term often has an interpretation in terms of inputs, outputs and a feedback control, which is subject to a dead-time in the control loop.

To conclude the section we state and discuss the main assumption made throughout the paper.

Assumption 1

There exists a number \(\varepsilon _{\max }>\mu (F)\) such that, for arbitrary but fixed \(r\in \mathbb {R}\) and \(\varepsilon \in [0,\ \varepsilon _{\max })\), the set \(\varLambda _{\varepsilon }\cap \{\lambda \in \mathbb {C}:\ \mathrm{Re }\left( \lambda \right) \ge r\}\) is bounded.

For the delay eigenvalue problem (15), Assumption 1 is satisfied for any value of \(\varepsilon _{\max }\) larger than \(\mu (F)\), as long as only \(A_1\) and \(A_2\) are perturbed. For the quadratic eigenvalue problem (14) with nonsingular \(A_2\), the assumption is satisfied by taking \(\varepsilon _{\max }=w_2 \sigma _n(A_2)\), with \(w_2\) the weight on the perturbation of \(A_2\). This bound stems from the fact that for \(\varepsilon >w_2\sigma _n(A_2)\) the \(\varepsilon \)-pseudospectrum is unbounded in all directions in the complex plane, containing perturbations of an eigenvalue at infinity [7, Section 2.2].

Assumption 1 implies that, by varying \(\varepsilon \) in a continuous way, a transition from the situation where \(\alpha _{\varepsilon }<0\) to a situation where \(\alpha _{\varepsilon }\ge 0\) is characterized by eigenvalues moving from the open left half plane to the imaginary axis (i.e., right half plane eigenvalues coming from the point at infinity cannot occur). Combined with the characterization (12) this leads to the following expression for the stability radius [10, Corollary 3].

Proposition 2

If the zeros of \(F\) are confined to the open left half plane and Assumption 1 holds then we have

$$\begin{aligned} \mu (F)=\left\{ \sup _{\omega \in \mathbb {R}} \left\| \left( \sum _{i=0}^m A_i p_i(j\omega )\right) ^{-1}\right\| _2 \Vert w(j\omega )\Vert _1 \right\} ^{-1}, \end{aligned}$$
(16)

with \(w\) given by (13).
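For the standard eigenvalue problem, i.e. \(F(\lambda )=-\lambda I+A\) with only \(A\) perturbed (unit weight on \(A\), weight \(\infty \) on the identity term), expression (16) reduces to the familiar formula \(\mu (A)=\inf _{\omega \in \mathbb {R}}\sigma _n(A-j\omega I)\). The sketch below evaluates this special case crudely on a user-supplied frequency grid; the grid is an assumption of ours and must bracket the global minimizer (Sect. 5 discusses how the minimization is actually carried out).

```python
import numpy as np

def mu_matrix_grid(A, omegas):
    """Crude evaluation of (16) for F(lambda) = -lambda*I + A with only A
    perturbed (unit weight), for which (16) reduces to
    mu(A) = inf_omega sigma_min(A - j*omega*I).  Grid search only."""
    n = A.shape[0]
    return min(np.linalg.svd(A - 1j * w * np.eye(n), compute_uv=False)[-1]
               for w in omegas)
```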

3 Previous work

The approach proposed in [7] to compute the \(\varepsilon \)-pseudospectral abscissa for a nonlinear eigenvalue problem is inspired by [6], where an iterative algorithm is proposed to find a locally rightmost point of the pseudospectrum of a matrix. This algorithm is mainly based on the property that a boundary point of the pseudospectrum is an eigenvalue of \(A + \varepsilon uv^*\), with \(u\) and \(v\) the corresponding left and right eigenvectors, respectively. This gives rise to a fixed-point iteration method in which the left and right eigenvectors determine the next perturbation. In [12] a continuous, dynamical version of the algorithm is presented. More precisely, a differential equation on the manifold of normalized rank-one matrices is constructed, having as stationary point an optimal perturbation for which the corresponding rightmost eigenvalue is an extremal point of the pseudospectrum. Extensions to real perturbations are described in [13].
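For the matrix case, the basic fixed-point iteration of [6] can be sketched in a few lines. The sketch below is our own (Python/NumPy, dense linear algebra, a simple stopping test); the monotonicity safeguards and multiple starting points discussed in the next subsection are omitted.

```python
import numpy as np
from scipy.linalg import eig

def pseudospectral_abscissa(A, eps, maxit=50, tol=1e-12):
    """Basic fixed-point iteration of [6] (sketch): the rank-one perturbation
    eps*u*v^* is rebuilt in every step from the left/right eigenvectors of the
    currently perturbed matrix, and the rightmost eigenvalue is tracked."""
    def rightmost(M):
        w, vl, vr = eig(M, left=True, right=True)
        k = np.argmax(w.real)
        u = vl[:, k] / np.linalg.norm(vl[:, k])   # left eigenvector
        v = vr[:, k] / np.linalg.norm(vr[:, k])   # right eigenvector
        s = u.conj() @ v
        v *= s.conj() / abs(s)                    # make u^* v real and positive
        return w[k], u, v

    lam, u, v = rightmost(A)
    for _ in range(maxit):
        lam_new, u, v = rightmost(A + eps * np.outer(u, v.conj()))
        if abs(lam_new - lam) <= tol * max(1.0, abs(lam_new)):
            lam = lam_new
            break
        lam = lam_new
    return lam.real, lam, u, v                    # approximation of alpha_eps(A)
```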

3.1 Computing pseudospectral abscissa for large-scale linear and nonlinear eigenvalue problems

In what follows, we review the method of [7], which generalizes the method of [6] to nonlinear eigenvalue problems.

Assumption 1 guarantees the presence of a globally rightmost point of the pseudospectrum \(\varLambda _{\varepsilon }\), which we denote by \(\lambda _\varepsilon \), i.e.,

$$\begin{aligned} \mathrm{Re }\left( \lambda _\varepsilon \right) =\alpha _{\varepsilon }. \end{aligned}$$

Let \((u_\varepsilon ,v_\varepsilon )\) be a pair of normalized left and right singular vectors corresponding to

$$\begin{aligned} \sigma _n\left( \sum _{i=0}^m A_i p_i(\lambda _{\varepsilon })\right) \end{aligned}$$

and let \(c\in \mathbb {C}^n\) be such that \(u_{\varepsilon }\) satisfies the normalization constraint

$$\begin{aligned} c^* u_\varepsilon \in \mathbb {R}^+_0. \end{aligned}$$

We make some technical assumptions, on which the algorithm relies.

Assumption 2

The following conditions are assumed:

  1.

    the smallest singular value of \(\sum _{i=0}^m A_i p_i(\lambda _{\varepsilon })\) is simple;

  2.

    the rightmost eigenvalue of

    $$\begin{aligned} \left( \sum _{i=0}^m \left( A_i-\frac{\overline{p_i(\lambda _{\varepsilon })}}{w_i |p_i(\lambda _{\varepsilon })|} u_{\varepsilon } v_{\varepsilon }^* \varepsilon \right) p_i(\lambda )\right) v=0 \end{aligned}$$
    (17)

    is simple;

  3.

    we have \(p_i(\lambda _{\varepsilon })\ne 0\) whenever \(w_i\) is finite, for \(0\le i\le m\).

The cases where Assumption 2 is not satisfied correspond to degenerate cases; see [7] for a detailed discussion. In Propositions 3.1 and 3.3 of this reference it has been shown that, under Assumption 2, the triple \((\lambda ,u,v)=(\lambda _\varepsilon ,u_\varepsilon ,v_\varepsilon )\) is an isolated solution of the following system of equations and inequalities:

$$\begin{aligned}&\left( \sum _{i=0}^m \left( A_i-\frac{\overline{p_i(\lambda )}}{w_i |p_i(\lambda )|} u v^* \varepsilon \right) p_i(\lambda )\right) v=0, \end{aligned}$$
(18)
$$\begin{aligned}&u^*\left( \sum _{i=0}^m \left( A_i-\frac{\overline{p_i(\lambda )}}{w_i |p_i(\lambda )|} u v^* \varepsilon \right) p_i(\lambda )\right) =0,\end{aligned}$$
(19)
$$\begin{aligned}&u^*u=v^* v=1,\quad c^*u>0, \end{aligned}$$
(20)
$$\begin{aligned}&u^*\left( \sum _{i=0}^m A_i p_i^{\prime }(\lambda )\right) v-\varepsilon \sum _{i=0}^m \frac{ \overline{p_i(\lambda )}p_i^{\prime }(\lambda )}{w_i |p_i(\lambda )|}>0. \end{aligned}$$
(21)

Furthermore, \(\lambda _{\varepsilon }\) is the rightmost eigenvalue of the perturbed eigenvalue problem (17). The meaning of (18)–(21) can be summarized as follows. Conditions (18)–(19) characterize points lying on curves described by

$$\begin{aligned} \sigma _k\left( \sum _{i=0}^m A_i p_i(\lambda )\right) = \varepsilon \Vert w(\lambda )\Vert _1,\qquad k\in \{1,\ldots ,n\}, \end{aligned}$$

which include the boundaries of the pseudospectrum (case where \(k=n\)). Condition (21) is an optimality condition expressing that the outward-pointing normal vector to the set

$$\begin{aligned} \left\{ \lambda \in \mathbb {C}:\ \sigma _k\left( \sum _{i=0}^m A_i p_i(\lambda )\right) - \varepsilon \Vert w(\lambda )\Vert _1\le 0\right\} \end{aligned}$$

lies in the direction of the positive real axis. Conditions (20) are normalization constraints.

The algorithms in [7] are based on turning (18)–(21) into a fixed-point iteration. The basic steps are described by Algorithm 1.

[Algorithm 1: basic fixed-point iteration for computing the pseudospectral abscissa, displayed as a figure in the original; it contains the conditions (23) and (24) referred to below.]

Note that there may not exist \(u_k,v_k\) satisfying (23) and (24), or they may not be unique. However, there exist remedies for this, which are discussed in detail in [7]. Summarizing, if there is no solution, the conditions are relaxed. If there are multiple solutions, the solution is selected that gives rise to the largest value of \(\mathrm{Re }\left( \lambda _{k+1}\right) \).

Refinements of the basic algorithm to enforce global convergence to the globally rightmost point of the pseudospectrum can be found in [7]; they consist of an adaptation that makes the sequence \(\{\mathrm{Re }\left( \lambda _k\right) \}_{k\ge 1}\) monotonic (thereby preventing, for example, convergence to a non-equilibrium solution), and of the possibility of starting with several rightmost eigenvalues (to avoid convergence to a locally but not globally rightmost point). All variants share the property that their application only requires methods to compute the rightmost eigenvalues and the corresponding left and right eigenvectors, for which fast iterative solvers can be used if the system matrices are large and sparse. This feature also makes them applicable to large-scale problems. The local convergence is linear, with a convergence factor tending to zero as \(\varepsilon \rightarrow 0\). For an extended discussion of this point see [7].

An application of the method to compute the distance to instability has been considered in [14] and is described in Sect. 4.1.

4 Distance to instability: local method based on root finding of spectral abscissa function

4.1 Newton’s method

We start with a technical lemma about the pseudospectral abscissa function

$$\begin{aligned} \mathbb {R}_+\ni \varepsilon \mapsto \alpha _{\varepsilon }(F), \end{aligned}$$
(25)

which is a consequence of Assumption 1 and the characterization (12).

Lemma 1

The function (25) is continuous, increasing, and, if not all weights are infinity, then we have \(\lim _{\varepsilon \rightarrow \infty } \alpha _{\varepsilon }=\infty .\)

Under mild conditions we have in fact that \(\alpha _{\varepsilon }\) is strictly increasing (this will be clarified shortly in Theorem 3). Based on Definition 1 and Lemma 1 we can refine (for exponentially stable \(F\))

$$\begin{aligned} \mu (F)=\inf \{\varepsilon >0:\ \alpha _{\varepsilon }(F)=0\}, \end{aligned}$$

and in case of strict monotonicity of \(\alpha _{\varepsilon }\) we can drop the infimum, i.e., we can compute \(\mu (F)\) as the value \(\varepsilon \) such that \(\alpha _{\varepsilon }(F)=0\). The following main result shows that, generically, the derivative of the pseudospectral abscissa with respect to \(\varepsilon \) exists and can be obtained as a by-product of its computation using Algorithm 1. There may be cases in which the derivative does not exist (for example, a defective rightmost point of multiplicity two), but such cases are non-generic.

Theorem 3

Let \(\hat{\varepsilon }>0\) and assume that \(\lambda _{\hat{\varepsilon }}\) is a globally rightmost point of \(\varLambda _{\hat{\varepsilon }}\). Let \((u_{\hat{\varepsilon }},v_{\hat{\varepsilon }})\) be a pair of (left, right) singular vectors corresponding to \(\sigma _n\left( \sum \nolimits _{i=0}^m A_i p_i(\lambda _{\hat{\varepsilon }})\right) \). Let Assumption 2 be satisfied, with the triple \((\lambda _{\hat{\varepsilon }},u_{\hat{\varepsilon }},v_{\hat{\varepsilon }})\) solving (18)–(21) for \(\varepsilon =\hat{\varepsilon }\). There exists a constant \(\delta >0\) and a continuous function

$$\begin{aligned} (\hat{\varepsilon }-\delta ,\ \hat{\varepsilon }+\delta )\ni \varepsilon \mapsto (\lambda (\varepsilon ), u(\varepsilon ),v(\varepsilon )) \end{aligned}$$

such that \((\lambda (\varepsilon ),u(\varepsilon ),v(\varepsilon ))\) satisfies (18)–(21) for all \(\varepsilon \in (\hat{\varepsilon }-\delta ,\ \hat{\varepsilon }+\delta )\), such that \((\lambda (\hat{\varepsilon }),u(\hat{\varepsilon }),v(\hat{\varepsilon }))=(\lambda _{\hat{\varepsilon }}, u_{\hat{\varepsilon }},v_{\hat{\varepsilon }})\), as well as

$$\begin{aligned} \mathrm {Re}\left( \lambda ^{\prime }(\varepsilon )\right) = \frac{ \sum \nolimits _{i=0}^m \frac{\left| p_i(\lambda _{\varepsilon })\right| }{w_i} }{ u_{\varepsilon }^* \left( \sum \nolimits _{i=0}^m A_i\, p_i^{\prime }(\lambda _{\varepsilon }) \right) v_{\varepsilon } - \varepsilon \sum \nolimits _{i=0}^m \frac{\overline{p_i(\lambda _{\varepsilon })}\,p_i^{\prime }(\lambda _{\varepsilon }) }{w_i |p_i(\lambda _{\varepsilon })|} }, \end{aligned}$$
(26)

for all \(\varepsilon \in (\hat{\varepsilon }-\delta ,\ \hat{\varepsilon }+\delta )\). Moreover, if the globally rightmost point is unique, then we have

$$\begin{aligned} \left. \frac{d\alpha _{\varepsilon }}{d\varepsilon }\right| _{\varepsilon =\hat{\varepsilon }}=\mathrm{Re }\left( \lambda ^{\prime }(\hat{\varepsilon })\right) . \end{aligned}$$
(27)

The proof can be found in [14]. Note that the differentiability of \(\lambda \) in a neighbourhood of \(\hat{\varepsilon }\) follows from the implicit function theorem applied to (18)–(21).
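Formula (26) can be evaluated directly from the quantities returned by the computation of the pseudospectral abscissa. The following transcription is our own (Python/NumPy; the function name, the argument conventions, and the use of np.inf for unperturbed matrices are illustrative assumptions):

```python
import numpy as np

def dalpha_deps(lam, u, v, coeffs, p_funcs, dp_funcs, weights, eps):
    """Transcription of formula (26): derivative of the pseudospectral abscissa
    with respect to eps, evaluated at the rightmost point lam with the singular
    vectors (u, v) of F(lam).  Terms with weight np.inf (unperturbed matrices)
    drop out; by condition (21) the denominator is real and positive, so its
    real part is taken to guard against round-off."""
    num = sum(abs(pi(lam)) / wi
              for pi, wi in zip(p_funcs, weights) if np.isfinite(wi))
    Fp = sum(Ai * dpi(lam) for Ai, dpi in zip(coeffs, dp_funcs))
    den = (u.conj() @ Fp @ v
           - eps * sum(np.conj(pi(lam)) * dpi(lam) / (wi * abs(pi(lam)))
                       for pi, dpi, wi in zip(p_funcs, dp_funcs, weights)
                       if np.isfinite(wi)))
    return num / den.real
```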

It follows from Theorem 3 that the pseudospectral abscissa function is differentiable whenever the globally rightmost point of the pseudospectrum is unique and the technical Assumption 2 is satisfied. This is the case for almost all \(\varepsilon \); exceptions are special choices for which the pseudospectrum exhibits, for instance, more than one globally rightmost point in the upper half complex plane. The fact that the derivative must be strictly positive can be readily seen from expression (26) and optimality condition (21). Finally, formulae (26)–(27) allow us to apply Newton’s method to solve

$$\begin{aligned} \alpha _{\varepsilon }(F)=0 \end{aligned}$$

for \(\varepsilon \), in order to compute the distance to instability, that is

$$\begin{aligned} \varepsilon _{k+1}=\varepsilon _k- \frac{\alpha _{\varepsilon _k}(F)}{\left. \frac{d\alpha _{\varepsilon }}{d\varepsilon }(F)\right| _ {\varepsilon =\varepsilon _k}}, \qquad k=0,1,\ldots \end{aligned}$$
(28)

In some cases it may occur that the distance to instability is not attained at a unique point and the \(\varepsilon \)-pseudospectrum is tangential to the imaginary axis at multiple points. In this situation the pseudospectral abscissa function is in general not differentiable. Newton’s method may then run into trouble, in the sense that the rightmost point of the pseudospectrum, computed in the inner iteration, may start alternating between approximations of the different globally rightmost points. Such a situation, which is non-generic unless the problem data already result from an optimization of the distance over parameters, can easily be detected from the output of the algorithm.
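The outer iteration (28) itself is straightforward once the pseudospectral abscissa and its derivative are available. A minimal sketch, with the inner computation abstracted away behind a callable whose name and interface are our own:

```python
def newton_distance(psa_and_slope, eps0=0.0, maxit=30, tol=1e-10):
    """Outer Newton iteration (28) on alpha_eps(F) = 0 (sketch).  The callable
    psa_and_slope(eps) is assumed to run Algorithm 1 for the given eps and to
    return the pseudospectral abscissa together with its derivative (26)-(27);
    both the name and the interface are ours."""
    eps = eps0
    for _ in range(maxit):
        alpha, slope = psa_and_slope(eps)
        eps_new = eps - alpha / slope
        if abs(eps_new - eps) <= tol * max(1.0, abs(eps_new)):
            return eps_new
        eps = eps_new
    return eps
```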

4.2 The novel method: mixing the inner and outer iteration loop

Newton’s iteration (28) is characterized by two nested loops. In the outer iteration loop, the value of \(\varepsilon \) is updated; as is intrinsic to Newton’s method, the convergence is quadratic in the generic case. In an inner iteration loop, the pseudospectral abscissa is computed for a fixed value of \(\varepsilon \) using the linearly converging Algorithm 1. In order to decrease the computational cost, one may argue that an accurate computation of the pseudospectral abscissa is not needed in the first outer iterations, but only when \(\varepsilon \) is close to \(\mu (F)\). One can go a step further and replace the computation of the pseudospectral abscissa by a single iteration step of Algorithm 1, and update \(\varepsilon \) based on the current value of \(\lambda _k\), as if it had already converged to the globally rightmost point. This idea lies at the basis of Algorithm 2. We call the underlying method the method of adaptive perturbations because in every iteration step both the value of \(\varepsilon \) and the perturbations on the coefficient matrices are updated. Note that a fixed point \((\lambda ,\varepsilon ,u,v)\) of the iteration satisfies the conditions (18)–(21), and in addition we have \(\mathrm{Re }\left( \lambda \right) =0\).
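For the standard eigenvalue problem, the adaptive-perturbation idea can be sketched as follows. The sketch is our own and makes the simplifying assumption that the slope of the pseudospectral abscissa is approximated by \(1/(u^*v)\) for the current eigentriple of the perturbed matrix, which is what formula (26) reduces to in this case (up to the sign conventions for the vectors); the exact update rules (30)–(31) and the safeguards of Algorithm 2 are not reproduced.

```python
import numpy as np
from scipy.linalg import eig

def adaptive_mu_matrix(A, maxit=100, tol=1e-12):
    """Sketch of the adaptive-perturbation idea for the matrix case
    F(lambda) = -lambda*I + A with only A perturbed (Euclidean norm, unit
    weight).  One fixed-point update of the rank-one perturbation is
    interleaved with a Newton-type update of eps."""
    def rightmost(M):
        w, vl, vr = eig(M, left=True, right=True)
        k = np.argmax(w.real)
        u = vl[:, k] / np.linalg.norm(vl[:, k])   # left eigenvector
        v = vr[:, k] / np.linalg.norm(vr[:, k])   # right eigenvector
        s = u.conj() @ v
        v *= s.conj() / abs(s)                    # make u^* v real and positive
        return w[k], u, v

    eps = 0.0
    lam, u, v = rightmost(A)
    for _ in range(maxit):
        eps -= lam.real * (u.conj() @ v).real     # Newton-type update of eps
        lam, u, v = rightmost(A + eps * np.outer(u, v.conj()))
        if abs(lam.real) < tol:                   # rightmost point on the imaginary axis
            break
    return eps, lam
```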

It should be noted that, in case conditions (30) and (31) do not allow a unique solution, the same remedies as explained for Algorithm 1 apply. This situation has, however, not occurred in our experiments.

In all our experiments, reported in Sect. 6, Algorithm 2 turns out to be significantly faster than the iteration (28), while loss of convergence due to the nesting of iterations has not been observed. We note that, also with the update of \(\varepsilon \), the method only relies on the computation of the rightmost eigenvalue of perturbed problems, characterized by rank-one updates on the original system matrices. The latter property can be exploited by iterative solvers, both in matrix–vector products and in solving linear systems, where the Sherman–Morrison–Woodbury formula plays a key role. The overall cost of Algorithm 2 is comparable to the cost of Algorithm 1.

Algorithm 2 provides a fast method for computing the distance to instability, yet it is a local method, whose convergence to the correct solution relies on the choice of starting values. This is inherited from the fact that Algorithm 1 is based on a characterization of locally rightmost points of the \(\varepsilon \)-pseudospectrum. A stationary point of the iteration in Algorithm 2 might correspond to the situation where a locally rightmost point of the pseudospectrum is on the imaginary axis, while the distance to instability corresponds to the value of \(\varepsilon \) for which the globally rightmost point of the pseudospectrum is on the imaginary axis. One way to prevent such a situation and to improve the reliability is to run the algorithm multiple times, initiated with several dominant eigenvalues and corresponding eigenvectors. Here the term dominant corresponds to having a large real part and/or a large eigenvalue condition number (see the discussion in [7]), which are combined in the criterion

$$\begin{aligned} \left| \frac{u^*\left( \sum _{i=0}^m A_i p_i'(\lambda )\right) v}{\mathrm{Re }\left( \lambda \right) }\right| , \end{aligned}$$

where \((\lambda ,u,v)\) is the eigentriple under consideration. Another solution is to generate starting values with the global method discussed in the next section.

[Algorithm 2: the method of adaptive perturbations, displayed as a figure in the original; it contains the conditions (30) and (31) referred to above.]

5 Distance to instability via global optimization

From Proposition 2 the following expression for the distance to instability is obtained:

$$\begin{aligned} \mu \left( F\right) =\inf _{\omega \in \mathbb {R}}f(\omega ), \end{aligned}$$
(32)

where

$$\begin{aligned} f(\omega ):= \sigma _{n}\left( \sum _{i=0}^m A_i p_i\left( j\omega \right) \right) \left[ \sum _{i=0}^m \frac{\left| p_i\left( j\omega \right) \right| }{w_i}\right] ^{-1}, \end{aligned}$$

i.e., \(f\) is the smallest of the (frequency-dependent) scaled singular value curves of the parameterized matrix

$$\begin{aligned} \sum _{i=0}^m A_i p_i\left( j\omega \right) . \end{aligned}$$
(33)

Note that the objective function \(f\) is smooth almost everywhere when polynomial or exponential functions \(\left\{ p_i\right\} _{i=0}^m\) are used. This enables the use of the recently proposed optimization method of [8], which is suitable for finding the global minimum of the smallest eigenvalue function of a parameterized Hermitian matrix. In order to find a global minimum of \(f\) in an interval \([\omega ^l,\ \omega ^r]\), in the \(\ell \)-th iteration this method computes the global minimum on this interval of a function of the form

$$\begin{aligned} \max _{1\le k\le \ell -1} f_k(\omega ) \end{aligned}$$
(34)

where the function \(f_k,\ k\in \{1,\ldots ,\ell -1\}\) is a (lower) support function of \(f\) at \(\omega =\omega _k\), i.e., a function satisfying \(f_k(\omega _k)=f(\omega _k)\) and \(f_k(\omega )\le f(\omega ),\ \forall \omega \in [\omega ^l,\ \omega ^r]\). Denoting the minimizer of function (34) by \(\omega _{\ell }\), the whole process repeats itself with \(f_{\ell }\) added to the set of support functions.

In [8] a detailed proof of convergence is presented, which relies on the property that the functions \(f_k\) provide global lower bounds on \(f\). Because the function \(f\) may have discontinuities in the derivative due to the intersection of singular value curves of (33), nonsmooth (e.g. piecewise quadratic) support functions are needed to guarantee convergence, at the price of a more complex subproblem of minimizing (34). In [8] a more practical algorithm, which is available for download, is also presented; it relies on simple support functions of the form

$$\begin{aligned} f_k(\omega )=f(\omega _k)+f'(\omega _k)(\omega -\omega _k)-\gamma (\omega -\omega _k)^2, \end{aligned}$$
(35)

where \(-\gamma \) is a lower bound on the second derivative of the singular value curves under investigation. In case the smallest singular value curve of (33) does not intersect another singular value curve in the interval \([\omega ^l,\ \omega ^r]\), convergence of the algorithm to the global minimum is guaranteed, as all functions (35) are lower support functions for \(f\). In the other case, convergence is not guaranteed, yet the algorithm performs very well in practice and proves to be very robust (see [8]), which may be explained by the property that a minimum eigenvalue function of a Hermitian matrix (in our case a minimum singular value function) is smooth in the neighborhood of a minimum. The core of our code for minimizing (32) is this optimization algorithm of [8]. Other important components are:

  1.

    A routine which, for a given value of \(\omega \), returns the objective function and its derivative whenever it is differentiable (a sketch is given after this list). The computation of \(f\) mainly amounts to computing the smallest singular value of the matrix \(F(j\omega )\). Hence, a specific solver for nonlinear eigenvalue problems is not required, in contrast to the methods proposed in the previous section. In the generic case where \(p_i(j\omega )\ne 0,\ i=0,\ldots ,m\), and, in addition, the smallest singular value of \(F(j\omega )\) is simple, the derivative \(f'(\omega )\) exists and we can express

    $$\begin{aligned} f'(\omega )&= -\,\frac{ \mathrm{Im }\left( u^* \left( \sum _{i=0}^m A_i p_i'(j\omega )\right) v\right) }{\sum _{i=0}^m \frac{|p_i(j\omega )|}{w_i}}\\&+\,\frac{\sigma _n\left( \sum _{i=0}^m A_i p_i\left( j\omega \right) \right) \sum _{i=0}^m \frac{\mathrm{Im }\left( p_i(-j\omega )p_i'(j\omega )\right) }{w_i|p_i(j\omega )|} }{\left( \sum _{i=0}^m \frac{|p_i(j\omega )|}{w_i}\right) ^2 }, \end{aligned}$$

    where \(u\) and \(v\) are normalized left and right singular vectors corresponding to the smallest singular value of \(F(j\omega )\).

  2.

    Prior knowledge of a compact interval which contains the global minimizer over \(\mathbb {R}\). For the delay eigenvalue problem such an interval can be computed as in [15], but the obtained bounds may be conservative. Almost always the heuristic choice \([0,\ \omega _{m}]\) is sufficient, where \(\omega _m=1.2\ \mathrm{Im }\left( \lambda _m\right) \), with \(\lambda _m\) the dominant eigenvalue with largest imaginary part. A practical choice is to consider three dominant eigenvalues.

  3.

    A lower bound \(-\gamma \) on the second derivative of the scaled singular value curves, holding over the whole interval under consideration. It is impractical to compute the second derivative and therefore, in our implementation, a piecewise cubic approximation of \(f\) over the interval \([0,\omega _m]\) is used to estimate a bound.
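The following sketch, which is our own, combines two of the ingredients above: a routine for \(f(\omega )\) and \(f'(\omega )\) (item 1), and a simplified version of the support-function iteration of [8] built on the quadratic models (35). Since all models share the curvature \(-\gamma \), any two of them intersect in at most one point, so the minimizer of their maximum (34) over an interval can be found by enumerating the interval endpoints and the pairwise intersections; the practical code of [8] is more refined than this sketch.

```python
import numpy as np

def eval_f(coeffs, p_funcs, dp_funcs, weights, omega):
    """Objective f(omega) of (32) and its derivative where it exists (item 1
    above).  'coeffs' are the matrices A_i, 'p_funcs'/'dp_funcs' the functions
    p_i and p_i', 'weights' the w_i (np.inf marks an unperturbed matrix)."""
    s = 1j * omega
    F = sum(Ai * pi(s) for Ai, pi in zip(coeffs, p_funcs))
    U, sig, Vh = np.linalg.svd(F)
    u, v, smin = U[:, -1], Vh[-1, :].conj(), sig[-1]      # smallest singular triple
    denom = sum(abs(pi(s)) / wi
                for pi, wi in zip(p_funcs, weights) if np.isfinite(wi))
    Fp = sum(Ai * dpi(s) for Ai, dpi in zip(coeffs, dp_funcs))
    term1 = -np.imag(u.conj() @ Fp @ v) / denom
    term2 = smin * sum(np.imag(pi(-s) * dpi(s)) / (wi * abs(pi(s)))
                       for pi, dpi, wi in zip(p_funcs, dp_funcs, weights)
                       if np.isfinite(wi)) / denom ** 2
    return smin / denom, term1 + term2

def minimize_support(f_and_df, a, b, gamma, maxit=60, tol=1e-8):
    """Simplified version of the support-function method of [8] with the
    quadratic lower models (35).  All models share the curvature -gamma, so
    the difference of two models is affine and the piecewise maximum (34)
    attains its minimum over [a, b] at an endpoint or at an intersection of
    two models; these candidates are simply enumerated."""
    pts = []                                   # (omega_k, f(omega_k), f'(omega_k))
    omega, best_f, best_omega = a, np.inf, a
    for _ in range(maxit):
        fval, dval = f_and_df(omega)
        pts.append((omega, fval, dval))
        if fval < best_f:
            best_f, best_omega = fval, omega

        def qmax(w):                           # the function (34)
            return max(fk + dk * (w - ok) - gamma * (w - ok) ** 2
                       for ok, fk, dk in pts)

        cand = [a, b]
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                (oi, fi, di), (oj, fj, dj) = pts[i], pts[j]
                ci, cj = di + 2 * gamma * oi, dj + 2 * gamma * oj
                c0i = fi - di * oi - gamma * oi ** 2
                c0j = fj - dj * oj - gamma * oj ** 2
                if abs(ci - cj) > 1e-14:
                    w = (c0j - c0i) / (ci - cj)        # model intersection
                    if a <= w <= b:
                        cand.append(w)
        omega = min(cand, key=qmax)            # minimizer of (34)
        if best_f - qmax(omega) < tol:         # gap between best value and lower bound
            break
    return best_f, best_omega
```

A call then takes the form minimize_support(lambda w: eval_f(coeffs, p_funcs, dp_funcs, weights, w), 0.0, omega_m, gamma), with the interval and \(\gamma \) chosen as in items 2 and 3.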

The choice of \(\gamma \) is the critical factor in the application of the method of [8], as analytical expressions for lower bounds on \(f''(\omega )\) are, to the best of our knowledge, not available, not even for the standard eigenvalue problem. On the one hand, if the estimated value of \(\gamma \) is too small, there is a risk that the method does not converge to the global minimum. On the other hand, choosing a safe but conservative bound leads to slow convergence. As a consequence, a large number of function evaluations is needed if a high accuracy is requested. A robust and efficient algorithm can be obtained by combining the global method with the local one presented in the previous section: first the global method is run with a “safe” value of \(\gamma \), but only until a rough approximation of the global optimum is obtained. Subsequently, the approximation is used to initialize Algorithm 2, which, in turn, refines the result up to the desired precision. In the next section we will call this two-step procedure the hybrid algorithm.

6 Numerical experiments

We illustrate the methods for computing the distance to instability of Sects. 4 and 5. We compare the Matlab implementations of the Newton iteration (28), the method with adaptive perturbations (Algorithm 2), and the global optimization algorithm for linear eigenvalue problems, quadratic eigenvalue problems, and for a small- and large-scale delay eigenvalue problem.

Table 1 Linear eigenvalue problems [21, 22]: distance to instability \(\varepsilon _d\) and corresponding pseudospectral abscissa \(\alpha _{\varepsilon _d}\) computed with Newton’s method, the adaptive perturbation method, and the optimization method
Table 2 Quadratic eigenvalue problems [23]: distance to instability \(\varepsilon _d\) and corresponding pseudospectral abscissa \(\alpha _{\varepsilon _d}\) computed with Newton’s method, the adaptive perturbation method, the optimization method, and a hybrid version of the previous two methods

For computing the distance to instability of a nonlinear eigenvalue problem with one of the root finding methods (Newton’s iteration (28) and Algorithm 2), we need a solver for computing the rightmost eigenvalue \(\lambda _{\mathrm {RM}}\). In the recent literature [16–19], there exist several general nonlinear eigenvalue solvers, which are not only applicable to small problems but also to large sparse problems. In these numerical experiments the quadratic eigenvalue problem is first linearized and then \(\lambda _{\mathrm {RM}}\) can be easily computed using eig or eigs. For computing \(\lambda _{\mathrm {RM}}\) of the delay problems we used the delay eigenvalue solver proposed in [20]. On the other hand, the method based on global optimization (Sect. 5) for computing the distance to instability of a nonlinear eigenvalue problem only requires the computation of the smallest singular value of a constant matrix. For this we used svd or svds.

6.1 Linear eigenvalue problems

For the experiments on linear eigenvalue problems we take two examples (‘airy’ and ‘transient’) of the EIGTOOL collection [21] and one example (‘lshape’) of the Harwell–Boeing collection [22]. This last example is shifted with shift \(\sigma = 7\) in order to stabilize the matrix. The results of the computations of the distance to instability for these linear eigenvalue problems are displayed in Table 1.

The ‘airy’ example shows the performance gain of the adaptive perturbation and optimization methods compared to Newton’s iteration (28). However, for the ‘transient’ example the gain of the optimization method is lost because of an overly conservative estimate of the lower bound on the second derivative. The execution times of the Newton iteration and of the method with adaptive perturbations for the ‘lshape’ example are similar because the start value \(\varepsilon _0\) for both methods is zero, resulting after one iteration in \(\varepsilon _1 = -(u_0^*v_0)\mathrm{Re }\left( \lambda _0\right) \), which happens to be the distance to instability. Therefore the number of iterations of both methods is small.

Table 3 Delay eigenvalue problems: distance to instability \(\varepsilon _d\) and corresponding pseudospectral abscissa \(\alpha _{\varepsilon _d}\) computed with Newton’s method, the adaptive perturbation method, the optimization method, and a hybrid version of the previous two methods

6.2 Polynomial eigenvalue problems

For the experiments on polynomial eigenvalue problems we take three (shifted) examples (‘hospital’, ‘pdde_stability’ and ‘sign2’) of the NLEVP collection [23]. We applied perturbations to all system matrices and took unity weights, i.e., \(w_i = 1\). The results of the computations are displayed in Table 2.

The ‘hospital’ problem shows that the root finding methods (Newton’s iteration (28) and Algorithm 2) converge to a locally rightmost point, whereas the method based on optimization (Sect. 5) converges to the globally rightmost point. However, to achieve full accuracy of the global minimum, the optimization method requires many iterations, since for this problem we needed to take ten dominant eigenvalues into account. Therefore, a suitable combination of the proposed Algorithm 2 and the global optimization method can overcome this limitation.

In particular, first execute the optimization method with a high tolerance. If the lower bound is good, this can be done very fast. Assume the minimizer \(\tilde{\omega }\) is found. Then the point \(j\tilde{\omega }\) lies on the boundary of the \(\varepsilon \)-pseudospectrum, where \(\varepsilon \) equals the objective function value. The corresponding perturbation can be constructed from the singular vectors by using [6, Lemma 1.1] for linear problems and [7, Proposition 3.1] for nonlinear problems. Subsequently, the adaptive perturbation method can be started from \(\lambda _0 = j\tilde{\omega }\), with \(\varepsilon \) equal to the objective function value and with the aforementioned perturbation. This almost always results in a global solution with high accuracy, see, e.g., Table 2, where we set the required tolerance to 1e\(-\)12.
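For the matrix case, the handoff from the global stage to the local stage can be sketched as follows. The sketch is our own; the frequency grid, the function name, and the returned data are illustrative assumptions, and the local refinement stage itself is not reproduced.

```python
import numpy as np

def hybrid_start(A, omega_grid):
    """Stage 1 of the hybrid procedure for the matrix case (sketch): a crude
    global search over f(omega) = sigma_min(A - j*omega*I), followed by the
    construction of the boundary perturbation from the singular vectors
    (cf. [6, Lemma 1.1]).  The returned data would be used to start the
    adaptive perturbation method."""
    n = A.shape[0]
    vals = np.array([np.linalg.svd(A - 1j * w * np.eye(n), compute_uv=False)[-1]
                     for w in omega_grid])
    k = int(vals.argmin())
    omega0, eps0 = float(omega_grid[k]), float(vals[k])
    U, s, Vh = np.linalg.svd(A - 1j * omega0 * np.eye(n))
    u, v = U[:, -1], Vh[-1, :].conj()
    E0 = -eps0 * np.outer(u, v.conj())     # A + E0 has eigenvalue (about) j*omega0
    return 1j * omega0, eps0, E0           # lambda_0, eps_0, starting perturbation
```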

6.3 Delay eigenvalue problems

The first delay example is the following small-scale delay eigenvalue problem

$$\begin{aligned} \left( \lambda I - A_0 - A_1 e^{-\lambda \tau } \right) x = 0, \end{aligned}$$
(36)

where \(\tau = 1\) and

$$\begin{aligned} A_0 = \left[ \begin{array}{cc} -5 &{} 1 \\ 2 &{} -6 \end{array} \right] , \quad A_1 = \left[ \begin{array}{cc} -2 &{} 1 \\ 3 &{} -1 \end{array} \right] . \end{aligned}$$

We assume that \(A_0\) and \(A_1\) are perturbed and choose unity weights, i.e., \(w_0 = w_1 = 1\). The corresponding results are displayed in Table 3. We remark that Newton’s method requires far fewer iterations than the adaptive perturbation method for the same accuracy. However, the computational cost of a Newton iteration is much higher. Note also that the computation time of the global optimization method may vary and depends on the estimate of \(\gamma \). Therefore, the hybrid method is a good compromise between global convergence, accuracy and computation time.
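For reference, the following self-contained usage sketch evaluates the objective (32) for this small delay example on a coarse frequency grid. The interval and grid are heuristic choices of ours, the result is only an upper bound on \(\mu (F)\), and no claim is made that it reproduces the entries of Table 3.

```python
import numpy as np

# Small delay example (36) with tau = 1 and w0 = w1 = 1; the lambda*I term is
# not perturbed (weight infinity), so it does not enter ||w(j*omega)||_1.
A0 = np.array([[-5.0, 1.0], [2.0, -6.0]])
A1 = np.array([[-2.0, 1.0], [3.0, -1.0]])
I2 = np.eye(2)

def f(omega):
    s = 1j * omega
    F = s * I2 - A0 - A1 * np.exp(-s)                 # F(j*omega)
    smin = np.linalg.svd(F, compute_uv=False)[-1]
    return smin / (1.0 + abs(np.exp(-s)))             # scaling by ||w(j*omega)||_1

omegas = np.linspace(0.0, 20.0, 2001)   # heuristic interval, cf. item 2 in Sect. 5
mu_upper = min(f(w) for w in omegas)
print(mu_upper)    # coarse upper bound on mu(F); to be refined with Algorithm 2
```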

The second delay example is a large-scale problem coming from a PDE with delay [20],

$$\begin{aligned} \frac{\partial v(x,t)}{\partial t} = \frac{\partial ^2 v(x,t)}{\partial x^2} + a_0(x)v(x,t) + a_1(x)v(\pi -x,t-1), \end{aligned}$$

where \(a_0(x) = -2 \sin (x)\), \(a_1(x) = 2 \sin (x)\), and \(v_x(0,t) = v_x(\pi ,t) = 0\). The second derivative in space is approximated with central differences. This gives rise to a standard delay eigenvalue problem of the form (36), with one delay and sparse matrices \(A_0\) and \(A_1\). The number of spatial discretization points is taken such that \(n = 5,000\). We take the weights \(w_0 = w_1 = 1/2\). The corresponding results for the distance to instability are also displayed in Table 3. Again, we obtained results similar to those for the small-scale delay problem. Note also that the difference in computation cost between Newton’s method and the adaptive perturbation method increases with the problem size \(n\).

7 Concluding remarks

Two algorithms have been adapted, combined and implemented for computing the distance to instability, for both linear and nonlinear eigenvalue problems. The two algorithms are well suited for large-scale sparse problems. Although for the method with adaptive perturbations only convergence to a locally rightmost point of the pseudospectrum on the imaginary axis can be guaranteed, a high accuracy can be achieved. The second method directly solves an optimization problem inferred from the characterization (12). Its main advantage is that the global optimum can be found, yet the convergence rate can be slow if the second derivative of the objective function cannot be accurately estimated. Therefore, a hybrid algorithm is recommended, which almost always converges globally with high accuracy and a reasonable computation time.

Recently, several algorithms have been proposed for computing extremal points of structured and/or real pseudospectra of matrices, for the Euclidean and the Frobenius norm (see, e.g., [13] for real pseudospectra). This class of methods is based either on a discrete iteration or on a differential equation on a manifold of low-rank matrices, and originates from the property that, in the cases under consideration, the boundary of the pseudospectra can be reached by applying low-rank perturbations to the matrix. It is expected that the idea of updating \(\varepsilon \) behind Algorithm 2 applies to all these methods, which, in this way, could be extended to the computation of the corresponding distances to instability. Extending the presented approach based on global optimization to structured or real perturbations seems more difficult. For the standard eigenvalue problem, an extension to real perturbations measured with the Euclidean norm is still possible. The key is to replace the expression for the distance to instability (12) by a characterization in terms of real structured singular values [24, 25]. However, computing real structured singular values is significantly more demanding computationally than computing (standard) singular values.