The greedy strategy in optimizing the Perron eigenvalue

We address the problems of minimizing and of maximizing the spectral radius over a convex family of non-negative matrices. Those problems being hard in general can be efficiently solved for some special families. We consider the so-called product families, where each matrix is composed of rows chosen independently from given sets. A recently introduced greedy method works surprisingly fast. However, it is applicable mostly for strictly positive matrices. For sparse matrices, it often diverges and gives a wrong answer. We present the"selective greedy method"that works equally well for all non-negative product families, including sparse ones. For this method, we prove a quadratic rate of convergence and demonstrate its exceptional efficiency in numerical examples. In dimensions up to 2000, the matrices with minimal/maximal spectral radii in product families are found within a few iterations. Applications to dynamical systems and to the graph theory are considered.


Introduction
The problem of minimizing or maximizing the spectral radius of a non-negative matrix is notoriously hard. There are no efficient algorithms even for solving this problem over a compact convex (say, polyhedral) set of positive matrices. This is because the objective function is neither convex nor concave in matrix coefficients and, in general, non-Lipschitz. There may be many points of local extrema which are, moreover, hardly identified. Nevertheless, for some special sets of matrices efficient methods do exist. In this paper we consider the Both minimization and maximization problems can be efficiently solved [24]. The spectral simplex method for finding global minimum and maximum demonstrates a surprising efficiency [28]. For example, if all the sets F i are two-element (so the set F has a structure of a Boolean cube), then in dimension d = 100, the matrices with minimal and maximal spectral radii are found (precisely!) within 50 − 60 iterations for 10-15 sec in a standard laptop. Note that in this case F contains |F |= 2 100 > 10 30 matrices. If for the same dimension d = 100, each uncertainty set consists of 100 rows (so, |F |= 10 200 ), then the spectral simplex method performs about 200 − 300 iteration and solves the problem (1) in less than a minute. We write the algorithm in the next section, but its idea can be described within a few lines. Let us consider the maximization problem (1). We start with an arbitrary matrix A 1 ∈ F with rows a i ∈ F i , i = 1, . . . , d, and compute its leading eigenvector v 1 . Then we solve the problem (v 1 , a) → max, a ∈ F 1 , in other words, we find an element from F 1 that makes the biggest projection onto the eigenvector v 1 . If the row a 1 is already optimal, then we leave it and do the same with the second row and with the set F 2 , etc. If all rows of A are optimal, then A has the biggest spectral radius. Otherwise, we replace the row a i by an optimal element from F i and thus obtain the next matrix A 2 . Then we do the next iteration, etc. If all the sets F i consist of strictly positive rows, then the spectral radius increases each iteration. So, we have a relaxation scheme which always converges to a global optimum. However, if the rows may have zeros, then the relaxation is non-strict (it may happen that ρ(A k+1 ) = ρ(A k )), and the algorithm may cycle or converge to a non-optimal matrix. In [28] this trouble was resolved and the method was modified to be applicable for sparse matrices. We will formulate the idea in Theorem A in the next section.
Recently in [1,25] the spectral simplex method was further improved: in each iteration we maximize not one row but all rows simultaneously. According to numerical experiments, this slight modification leads to an exceptionally fast convergence: in all practical examples the algorithm terminates within 3 − 4 iterations! In the example above, with d = 100, |F i |= 100, i = 1, . . . , d, the absolute minimum/maximum is found within a few seconds and the number of iterations rarely exceeds 5. Even for dimension d = 2000, the number of iterations never exceeds 10. The authors of [1] and of [25] came to problem (1) from different angles. The paper [1] studies entropy games with a fixed number of states. The new method was called policy iteration algorithm. It is based not only on the spectral simplex method but on earlier ideas form the game theory of Hoffman and Karp [16] and of Rothblum [29]. The paper [25] solves the problem of finding the closest stable matrix in the L 1 norm. The authors derive the same method and call it the greedy method. We will use the latter name. This modification of the spectral simplex method, although has a much faster convergence, inherits the main disadvantage: it works only for strictly positive matrices. For sparse matrices, it often stucks, cycles, or converges to a wrong solution. None of ideas from the work [28] which successively modified the spectral simplex method helps for the greedy method. This issue is discussed in detail is Section 3. In [1] the positivity assumption is relaxed to irreducibility of all matrices A k in each step, but this condition is also very restrictive and is never satisfied in practice for sparse matrices. In [25] the greedy method was modified for sparse matrices, but only for minimization problem and by a significant complication of the procedure. Therefore, in this paper we attack the three main issues: 1. Is it possible to drop the strict positivity assumption for the greedy method? Can this method be applied for sparse matrices?
2. To estimate the rate of convergence theoretically. To reveal the secret of an extremely fast convergence of the greedy method.
3. To apply the greedy method to known problems. The first issue is solved in Section 4. We derive the selective greedy method working for all kind of non-negative matrices. To this end we introduce and apply the notion of selected leading eigenvector. The new greedy method is as simple in realisation and as fast as the previous one.
In Section 5 we prove two main theorems about the rate of convergence. We show that both the simplex and the greedy methods have a global linear rate of convergence. Then we show that the greedy method (but not the spectral simplex!) has a local quadratic convergence, which explains its efficiency. We also estimate the constants and parameters of the convergence.
Numerical results for matrix families in various dimensions are reported in Section 6. Section 7 shows application of the greedy method to problems of dynamical systems and of the graph theory. Finally, in Section 8, we discuss the details of practical implementation.
2. The minimizing/maximizing spectral radius of product families 2.1. The description of the methods . . , d. Those notation are not completely analogous: the minimality in each row is defined with respect to an arbitrary leading eigenvector v ≥ 0, while the maximality needs a strictly positive v.
We first briefly describe the ideas of the methods and then write formal procedures. Optimisation of spectral radius of a product family is done by a relaxation scheme. We consider first the maximization problem.
Starting with some matrix A 1 ∈ F we build a sequences of matrices A 1 , A 2 , . . . such that ρ(A 1 ) ≤ ρ(A 2 ) ≤ · · · Every time the next matrix A k+1 is constructed from A k by the same rule. The rule depends on the method: Spectral simplex method. A k+1 is obtained from A k by changing one of its lines a where v k is a leading eigenvector of A k . We fix i and take the element a (k) i ∈ F i which maximizes the scalar product with the leading eigenvector v k over all a i ∈ F i . The algorithm terminates when no further step is possible, i.e., when the matrix A k is maximal in each row with respect to v k . Thus, each iteration makes a one-line correction of the matrix A k to increase the projection of this line to the leading eigenvector v k . Of course, such a line a (k+1) i may not be unique. In [28] the smallest index rule was used: each time we take the smallest i for which the line a (k) i is not maximal with respect to v k . This strategy provides a very fast convergence to the optimal matrix. In [1] the convergence was still speeded up by using the pivoting rule: i is the index for which the ratio is maximal. Thus, we always choose the steepest increase of the scalar product. One iteration of this method requires the exhaustion of all indices i = 1, . . . , d, which may be a disadvantage in case of complicated sets F i .
The greedy method. A k+1 is obtained by replacing all rows of A k with the maximal rows in their sets F i with respect to the leading eigenvector v k .
So, in contrast to spectral simplex method, we change not only one line of A k , but all non-optimal lines, where we can increase the scalar product (a is already optimal, we do not change it and set a Realization. The maximal scalar product (a i , v k ) over all a ∈ F i is found by solving the corresponding convex problem over co(F i ). If F i is finite, this is done merely by exhaustion of all elements of F i . If F i is a polyhedron, then we find a (k+1) i among its vertices solving an LP problem by the (usual) simplex method.
Convergence. Denote ρ(A k ) = ρ k . It was shown in [28] that both methods are actually relaxations: ρ k+1 ≥ ρ k for all k. Moreover, if A k is irreducible, then this inequality is strict. Hence, if in all iterations the matrices are irreducible (this is the case, for instance, when all the uncertainty sets F i are strictly positive), then both methods produce strictly increasing sequence of spectral radii {ρ k } k∈N . Hence, the algorithm never cycles. If all the sets F i are finite or polyhedral, then the set of extreme points of the family F is finite, therefore the algorithm eventually arrives at the maximal spectral radius ρ max within finite time. This occurs at the matrix A k maximal in each row.
If all F i are general compact sets, then it may happen that a matrix maximal in each row, although exists, will not be reached within finite time. In this case the algorithm can be interrupted at an arbitrary step, after which we apply the a posteriory estimates for ρ max defined below. We Then for for an arbitrary matrix A and its leading eigenvector v k , we define the following values.
Thus, s i (A) is the maximal ratio between the value (a, v k ) over all a ∈ F i and the ith component of v k . Similarly, for the minimum: Then the following obvious estimates for ρ max are true: Proposition 1 [28] For both the spectral simplex method and the greedy method (for maximization and for minimization respectively), we have in kth iteration: In Section 5 we show that at least for strictly positive matrices, estimates (4) converge to ρ max and to ρ min respectively with linear rate. Moreover, for the greedy method the convergence is locally quadratic, which explains its efficiency in all practical examples. However, for sparse matrices the situation is more difficult. The equalities ρ k+1 ≥ ρ k are not necessarily strict. The algorithm may stay in the same value of ρ k for many iterations and even may cycle. Before addressing this issue, we write a formal procedure for both algorithms.

The algorithms
The spectral simplex method.
Initialization. Taking arbitrary a i ∈ F i , i = 1, . . . , d, we form a matrix A 1 ∈ A. Denote its rows by a d and its leading eigenvector v 1 . Main loop. The kth iteration. We have a matrix A k ∈ A composed with rows a (k) i ∈ F i , i = 1, . . . , d. Compute its leading eigenvector v k (if it is not unique, take any of them) and for i = 1, . . . d, find a solutionā i ∈ F i of the following problem: If F i is finite, then this problem is solved by exhaustion; if F i is polyhedral, then it is solved as an LP problem, andā i is found among its vertices.
i , v k ) and i ≤ d − 1, then we set i = i + 1 and solve problem (5) for the next row. If i = d, then the algorithm terminates. In case v k > 0, the matrix A k is maximal in each row, and ρ(A k ) = ρ max , i.e., A k is a solution.
j for all j = i and form the corresponding matrix A k+1 . Thus, A k+1 is obtained from A k by replacing its ith row byā i . Then we start (k + 1)st iteration.
If we obey the pivoting rule, we solve d problems (5) for all i = 1, . . . , d and take the row for which the value s i (A k ) defined in (2) is maximal. Then A k+1 is obtained from A k by replacing its ith row byā i Termination. If A k is maximal in each row and v k > 0, then ρ max = ρ(A k ) is a solution. Otherwise, we get an estimate for ρ max by inequality (4). End.
Remark 1 If each row of A k makes the maximal scalar product with the eigenvector v k , but v k is not strictly positive (i.e., A k is "almost" maximal in each row), then we cannot conclude that ρ max = ρ(A k ). However, in this case the family F is reducible: the space spanned by the vectors with the same support as v k is invariant with respect to all matrices from F . Hence, the problem of maximizing the spectral radius is reduced to two problems of smaller dimensions. Therefore, before running the algorithm we can factorize F to several irreducible families. The procedure is written in detail in [28]. However, this way is not very efficient, since the case when the final leading eigenvector is not strictly positive occurs rarely. Another way is to factorize the family after the termination into two families as written above and then run the algorithm separately to both of them them.
The simplex method for minimization problem is the same, replacing maximum by minimum in the problem (5) and omitting the positivity assumption v k > 0 in the termination of the algorithm.

The greedy method
Initialization. Taking arbitrary a i ∈ F i , i = 1, . . . , d, we form a matrix A 1 ∈ A with rows a 1 , . . . , a d . Take its leading eigenvector v 1 . Denote a Main loop. The kth iteration. We have a matrix A k ∈ A composed with rows a (k) i ∈ F i , i = 1, . . . , d. Compute its leading eigenvector v k (if it is not unique, take any of them) and for i = 1, . . . d, find a solutionā i ∈ F i of the problem (5). For each i = 1, . . . , d, we do: We form the corresponding matrix A k+1 . If the first case took place for all i, i.e., A k+1 = A k , and if v k > 0, then A k is optimal in each row. Thus, the answer is ρ max = ρ(A k ). If v k has some zero components, then the family F is reducible. We stop the algorithm and factorize F (see Remark 1). Otherwise, go to (k + 1)st iteration.
Termination. If A k is maximal in each row, then ρ max = ρ(A k ) is a solution. Otherwise, we get an estimate for ρ max by inequality (4). End.
The greedy method for minimization problem is the same, replacing maximum by minimum in the problem (5) and omitting the positivity assumption v k > 0 in the termination of the algorithm.
As we mentioned, in case of strictly positive uncertainty sets F i both those algorithms work efficiently and, if all F i are finite or polyhedral, they find the optimal matrix within finite time. However, if rows from F i have some zero components, the situation may be different. We begin with the corresponding example, then we modify those methods to converge for arbitrary non-negative uncertainty sets. Then in Section 5 we estimate the rate of convergence.

The cycling and the divergence phenomena
Both the spectral simplex method and the greedy method work very efficiently for strictly positive product families. However, if some matrices have zero components, the methods may not work at all. For the spectral simplex method, this phenomenon was observed in [28].
Here is an example for the greedy method.
We consider the product family F of 3 × 3 matrices defined by the following three uncertainty sets: So, F 1 , F 2 , and F 3 consists of 4, 2 and 2 rows respectively. Hence, we have in total 4×2×2 = 16 matrices. Taking the first row in each set we compose the first matrix A 1 (which is positive, hence the family is irreducible) and the algorithm runs as follows: The matrix A 1 has a unique eigenvector v 1 = (1, 1, 2) T . Taking the maximal row in each set F i with respect to this eigenvector, we compose the next matrix A 2 . It has a a multiple leading eigenvalue λ = 10. Choosing one of its leading eigenvectors v 2 = (2, 2, 1) T , we make the next iteration and obtain the matrix A 3 . It also has a multiple leading eigenvalue λ = 10. Choosing one of its leading eigenvectors v 2 = (2, 1, 2) T , we make the next iteration and come back to the matrix A 2 . The algorithm cycles. We see that the greedy method stucks on the cycle A 2 ⇄ A 3 of length two and on the value of the spectral radius ρ = 10. However, the maximal spectral radius ρ max of this family is not 10, but 12. The value ρ max is realized for the matrix In this example the algorithm never terminates and, when stopped, gives a wrong solution: ρ(A 2 ) = ρ(A 3 ) = 10 instead of ρ max = 12. The a posteriori estimate (4) gives 10 ≤ ρ max ≤ 12.5. The error is 25 %, and this is the best the greedy method can do for this family.

Is the greedy method applicable for sparse matrices?
The example in the previous section rises a natural question: is the idea of the greedy algorithm applicable for strictly positive matrices only? Can it be modified to work with sparse matrices? To attack this problem we first reveal two main difficulties caused by sparsity. Then we will see how to treated then. We also mention that the same troubles occur for the spectral simplex method, and they were overcome in [28]. However, those ideas of modification are totally inapplicable for the greedy method.

Two problems: cycling and multiple choice of v k
In Section 3 we saw that if some vectors from the uncertainty sets have zero components, then the greedy method may cycle and, which is still worse, may miss the optimal matrix and give a quite rough final estimate of the optimal value. Indeed, numerical experiments show that cycling often occurs for sparse matrices, especially for minimization problem (for maximizing it is rare but also possible). Moreover, the sparser the matrix the more often the cycling occurs.
The reason of cycling is hidden in multiple leading eigenvectors. Recall that the geometrical multiplicity of an eigenvalue is the dimension of the subspace spanned by all eigenvectors corresponding to that eigenvalue. It never exceeds the algebraic multiplicity (the multiplicity of the corresponding root of the characteristic polynomial). We call the subspace spanned by leading eigenvectors Perron subspace. If at some iteration we get a matrix A k with a multiple leading eigenvector, then we have an uncertainty with choosing v k from the Perron subspace. An "unsuccessful" choice may cause cycling. In the example in Section 3, both matrices A 2 and A 3 have Perron subspaces of dimension 2, and this is the reason of the trouble. On the other hand, if in all iterations the leading eigenvectors v k are simple, i.e., their geometric multiplicity are equal to one, then cycling never occurs as the following proposition guarantees.
Proposition 2 If in all iterations of the greedy method, the leading eigenvectors v k are simple, then the method does not cycle and (in case of finite or polyhedral sets F i ) terminates within finite time. The same is true for the spectral simplex method.
This fact follows from Theorem 2 proved below. Of course, if all matrices A k are totally positive, or at least irreducible, then by the well-known results of the Perron-Frobenius theory [13, chapter 13, §2], all v k are simple, and cycling never occurs. But how to guarantee the simplicity of leading eigenvectors in case of sparse matrices? For the spectral simplex method, this problem was solved by the following curious fact: Theorem A. [28] Consider the spectral simplex method in the maximization problem. If the initial matrix A 1 ∈ F has a simple leading eigenvector, then so do all successive matrices A 2 , A 3 , . . . and the method does not cycle.
So, in the spectral simplex method, the issue is solved merely by choosing a proper initial matrix A 1 . In this case there will be no uncertainty in choosing the leading eigenvectors v k in all iterations. Moreover, the leading eigenvectors will continuously depend on the matrices, hence the algorithm will be stable under small perturbations. Note that this holds only for maximization problem! For minimizing the spectral radius, the cycling can be avoided as well but in a totally different way [28].
However, Theorem A does not hold for the greedy method. This is seen already in the example from Section 3: the totally positive (and hence having a simple leading eigenvector) matrix A 1 produces the matrix A 2 with multiple leading eigenvectors. So, a proper choice of the initial matrix A 1 is not a guaranty against cycling of the greedy method! In practice, because of rounding errors, not only multiple but "almost multiple" leading eigenvectors (when the second largest eigenvalue is very close to the first one) may cause cycling or at least essential slowing down of the algorithm.
Nevertheless, for the greedy method, a way to avoid cycling does exist even for sparse matrices. We suggest a strategy showing its efficiency both theoretically and numerically. Even for very sparse matrices, the algorithm works as fast as for positive ones.

Anti-cycling by selection of the leading eigenvectors
A naive idea to avoid cycling would be to slightly change all the uncertainty sets F i making them strictly positive. For example, to replace them by perturbed sets F i,ε = F i +εe, where e = (1, . . . , 1) ∈ R d and ε > 0 is a small constant. So, we add a positive number ε to each entry of vectors from the uncertainty sets. All matrices A ∈ F become totally positive: A → A ε = A + εE, where E = ee T is the matrix of ones. By Proposition 2, the greedy method for the perturbed family F ε does not cycle. However, this perturbation can significantly change the answer of the problem. For example, if a matrix A has d − 1 ones over the main diagonal and all other elements are zeros (a ij = 1 if i − j = 1 and a ij = 0 otherwise), then ρ(A) = 0, while ρ(A ε ) > ε 1/d . Even if ε is very small, say, is 10 −20 , then for d = 50 we have ρ(A ε ) > 0.4, while ρ(A) = 0. Hence, we cannot merely replace the family F by F ε without risk of a big error. Our crucial idea is to deal with the original family F but using the eigenvectors v k from the perturbed family F ε .

Definition 2
The selected leading eigenvector of a non-negative matrix A is the limit lim where v ε is the normalized leading eigenvector of the perturbed matrix A ε = A + εE.
The next result gives a way to find the selected eigenvector explicitly. Before formulating it we make some observations.
If an eigenvector v is simple, then it coincides with the selected eigenvector. If v is multiple, then the selected eigenvector is a vector from the Perron subspace. Thus, Definition 2 provides a way to select one vector from the Perron subspace of every non-negative matrix. We are going to see that this way is successful for the greedy method.
Consider the power method for computing the leading eigenvectors: x k+1 = Ax k , k ≥ 0. In most of practical cases it converges to a leading eigenvector v linearly with the rate O(| λ 2 λ 1 | k ), where λ 1 , λ 2 are the first and the second largest by modulus eigenvalues of A or O( 1 k ), if the leading eigenvalue has non-trivial (i.e., of size bigger than one) Jordan blocks.
Rarely, if there are several largest by modulus eigenvalues, then the "averaged" sequence 1 r r−1 j=0 x k+j converges to v, where r is the imprimitivity index (see [13, chapter 13]). In all cases we will say that the power method converges and, for the sake of simplicity, assume that there is only one largest by modulus eigenvalue, maybe multiple, i.e., that r = 1.

Theorem 1
The selected leading eigenvector of a non-negative matrix A is proportional to the limit of the power method applied to the matrix A with the initial vector x 0 = e.

and obtain
Assume now that the power method for the matrix A and for x 0 = e converges to some vectorṽ. In case r = 1 (r is the imprimitivity index), this means that A k e →ṽ as k → ∞. Then direction of the vector ∞ k=0 (1 + δ) −k A k e converges as δ → 0 to the direction of the vectorṽ. Since δ → 0 as ε → 0, the theorem follows. If r > 1, then 1 r r−1 j=0 A k+j e →ṽ. Arguing as above we conclude again that the direction of the vector ∞ k=0 (1 + δ) −k A k e converges as δ → 0 to the direction of the vectorṽ. This completes the proof. ✷

Definition 3
The greedy method with the selected leading eigenvectors v k in all iterations is called selective greedy method.
Now we arrive at the main result of this section. We show that using selected eigenvectors v k avoids cycling.

Theorem 2
The selective greedy method does not cycle.
Proof. Consider the case of finding maximum, the case of minimum is proved in the same way. By the kth iteration of the algorithm we have a matrix A k and its selected leading eigenvector v k . Assume the algorithm cycles: A 1 = A n+1 , and let n ≥ 2 be the minimal length of the cycle (for n = 1, the equality A 1 = A 2 means that A 1 is the optimal matrix, so this is not cycling). Consider the chain of perturbed matrices A 1,ε → · · · → A n+1,ε , where A k,ε = A k + ε E and ε > 0 is a small number. For the matrix A k , in each row a (k) i , we have one of two cases: , then this row is not changed in this iteration: a , v k ). Hence the same inequality is true for the perturbed matrices A k,ε , A k+1,ε and for the eigenvector v k,ε of A k,ε , whenever ε is small enough.
Thus, in both cases, each row of A k+1,ε makes a bigger or equal scalar product with v k,ε than the corresponding row of A k,ε . Moreover, at least in one row this scalar product is strictly bigger, otherwise A k = A k+1 , which contradicts to the minimality of the cycle. This, in view of strict positivity of A k,ε , implies that ρ k+1,ε > ρ k,ε [28, lemma 2]. We see that in the chain A 1,ε → · · · → A n+1,ε the spectral radius strictly increases. Therefore A 1,ε = A n+1,ε , and hence A 1 = A n+1 . The contradiction competes the proof. ✷ Example 1 We apply the selective greedy method to the family from Section 3, on which the (usual) greedy method cycles and arrives to a wrong solution. Now we have: Thus, the selective greedy method arrives at the optimal matrix A 4 at the third iteration. This matrix is maximal in each row with respect to its leading eigenvector v 4 , as the last iteration demonstrates. The selective greedy method repeats the first iteration of the (usual) greedy method, but in the second and in the third iteration it chooses different leading eigenvectors ("selected eigenvectors") by which it does not cycle and goes directly to the optimal solution.

Realization of the selective greedy method
Thus, the greedy method does not cycle, provided it uses selected leading eigenvectors v k in all iterations. If the leading eigenvector is unique, then it coincides with the selected leading eigenvector, and there is no problem. The difficulty occurs if the leading eigenvector is multiple. The following crucial fact is a straightforward consequence of Theorem 2.

Corollary 1
The power method x k+1 = Ax k , k ≥ 0, applied to the initial vector x 0 = e converges to the selected leading eigenvector.
Therefore, to realize the selective greedy method we compute all the leading eigenvectors v k by the power method starting always with the same vector x 0 = e (the vector of ones). After a sufficient number of iterations we come close to the limit, which is, by Corollary 1, the selected eigenvector (up to normalization). So, actually we perform the greedy method with approximations for the selected eigenvectors, which are, in view of Theorem 2, the leading eigenvectors of perturbed matrices A ε for some small ε.
Thus, to avoid cycling we can compute all eigenvectors v k by the power method starting with the same initial vector x 0 = e. Because of computer approximations, actually we deal with totally positive matrices of the family F ε for some small ε.

The procedure of the selective greedy method
We precisely follow the algorithm of the greedy method presented in subsection 2.2, and in each iteration we compute the corresponding leading eigenvectors v k by the power method starting with the vector of ones e. The precision parameter ε of the power method is chosen in advance and is the same in all iterations. By Theorem 2, if ε > 0 is small enough, then the greedy method does not cycle and finds the solution within finite time. In practice, if the algorithm cycles for a given ε, then we reduce it taking, say, ε/10 and restart the algorithm.
To avoid any trouble with the power method for sparse matrices, we can use the following well-known fact: if A ≥ 0 is a matrix, then the matrix A + I has a unique (maybe multiple) eigenvalue equal by modulus to ρ(A)+1. This implies that the power method applied for the matrix A + I always converges to its leading eigenvector, i.e., (A + I) k e → v as k → ∞. Of course, A + I has the same leading eigenvector as A. Hence, we can replace each uncertainty set F i by the set e i + F i , where e i is the ith canonical basis vector. After this each matrix A k is replaced by A k + I, and we will not have trouble with the convergence of the power method.

The rate of convergence
In this section we address two issues: revealing the secret of the extremely fast convergence of the greedy method and getting an explanation why does the spectral simplex method converges slower (although very fast as well). We restrict ourselves to the maximization problems, where we find the maximal spectral radius ρ max . For minimization problems, the results and the proofs are the same.
First of all, let us remark that due to possible cycling, no efficient estimates for the rate of convergence exist. Indeed, in Section 3 we see that the greedy method may not converge to the solution at all. That is why we estimate the rate of convergence only under some favorable assumptions (positivity of the limit matrix, positivity of its leading eigenvector, etc.) Actually, they are not very restrictive since the selective greedy method has the same convergence rate as the greedy method for a strictly positive family F ε .
We are going to show that under those assumptions, both methods have global linear convergence, but the greedy method, in contrast to the spectral simplex method, has moreover a local quadratic convergence. In both cases we estimate the parameters of linear and quadratic convergence.

Global linear convergence
The following result is formulated for both the greedy and the spectral simplex methods. The same holds for the method with the pivoting rule. We denote by A k the matrix obtained by the kth iteration, by ρ k its spectral radius and by u k , v k its left and right leading eigenvectors (if there are multiple ones, we take any of them). We also denote ρ max = max A∈F ρ(A). As usual, {e j } d j=1 is the canonical basis in R d and e is the vector of ones. Theorem 3 In both the greedy method and the spectral simplex method, in each iteration we have provided v k > 0.
Proof. Consider the value s = s(A) defined in (2). Let the maximum in formula (2) be attained in the mth row. Then for every matrix A ∈ F , we have Av k ≤ s v k , and therefore ρ(A) ≤ s. Thus, ρ max ≤ s. On the other hand, in both greedy and simplex methods, the mth component of the vector A k+1 v k is equal to the mth component of the vector s v k . In all other components, we have A k+1 v k ≥ A k v k = ρ k v k . Therefore, Multiplying this inequality by the vector u k+1 from the left, we obtain Dividing by (u k+1 , v k ) and taking into account that s ≥ ρ max , we arrive at (6). ✷ Rewriting (6) in the form we see that the kth iteration reduces the distance to the maximal spectral radius in at least times. Thus, we have established Corollary 2 Each iteration of the greedy method and of the spectral simplex method reduces the distance to the optimal spectral radius in at least 1 − times.
If A k > 0, then the contraction coefficient from Corollary 2 can be roughly estimated from above. Let m and M be the smallest and the biggest entries respectively of all vectors from the uncertainty sets. For each A ∈ F , we denote by v its leading eigenvector and by ρ its spectral radius. Let also S be the sum of entries of v. Then for each i, Hence, for every matrix from F , the ratio between the smallest and of the biggest entries of its leading eigenvector is at least m/M. The same is true for the left eigenvector. Therefore, We have arrived at the following theorem on the global linear convergence Theorem 4 If all uncertainty sets are strictly positive, then the rate of convergence of both the greedy method and of the spectral simplex method is at least linear with the factor where m and M are the smallest and the biggest entries respectively of vectors from the uncertainty sets.
Thus, in each iteration of the greedy method and of the spectral simplex method, we have

Local quadratic convergence
Now we are going to see that the greedy method have actually a quadratic rate of convergence in a small neighbourhood of the optimal matrix. This explains its fast convergence. We establish this result under the following two assumptions: 1) the optimal matrixĀ, for which ρ(Ā) = ρ max , is strictly positive; 2) in a neighbourhood ofĀ, the uncertainty sets F i are bounded by C 2 smooth strictly convex surfaces. The latter is, of course, not satisfied for polyhedral sets. Nevertheless, the quadratic convergence for sets with a smooth boundary explains the fast convergence for finite or for polyhedral sets as well.
The local quadratic convergence means that there are constants B > 0 and ε ∈ (0, 1 B ) for which the following holds: if A k −Ā ≤ ε, then A k+1 −Ā ≤ B A k −Ā 2 . In this case, the algorithm converges whenever A 1 −Ā ≤ ε , in which case for every iteration k, we have (see, for instance, [4]). As usual, we use Euclidean norm x = (x, x). We also need the L ∞ -norm x ∞ = max i=1,...,d |x i |. We denote by I the identity d × d matrix; A B means that the matrix B − A is positive semidefinite.
Theorem 5 Suppose the optimal matrixĀ is strictly positive and all uncertainty sets F i are strongly convex and C 2 smooth in a neighbourhood ofĀ; then the greedy algorithm has a local quadratic convergence toĀ.
In the proof we use the following auxiliary fact , m and M are minimal and maximal entry of A respectively.
where s i are some non-negative numbers. Choose one for which the value |s i − 1| is maximal. Let it be s 1 and let s 1 − 1 > 0 (the opposite case is considered in the same way). Since v ′ = v and s 1 > 1, it follows that there exists q for which s q < 1.
The last sum is bigger than or equal to one term a 1q v m ( Proof of Theorem 5. Without loss of generality it can be assumed that ρ(Ā) = 1. It is known that any strongly convex smooth surface Γ can be defined by a smooth convex function f so that x ∈ Γ ⇔ f (x) = 0 and f ′ (x) = 1 for all x ∈ Γ. For example, one can set f (x) equal to the distance from x to Γ for x outside Γ or in some neighbourhood inside it. For each i = 1, . . . , d, we set Γ i = ∂F i and denote by f i such a function that defines the surface Γ i . Fix an index i = 1, . . . , d and denote by a (k) , a (k+1) , andā the ith rows of the matrices A k , A k+1 , andĀ respectively (so, we omit the subscript i to simplify the notation). Sinceā = argmax a∈Γ i (a,v), it follows that f ′ i (ā) =v. Denote by L and ℓ the lower and upper bounds of the quadratic form f ′′ (a) in some neighbourhood ofā, i.e., ℓ I f ′′ (a) L I at all points a such that a −ā < ε. Writing the Tailor expansion of the function f i at the pointā, we have for small h ∈ R d : where the remainder r(h) ≤ L. Substituting h = a (k) −ā and taking into account that f i (a (k) ) = f i (ā) = 0, because both a (k) andā belong to Γ, we obtain ( This holds for each row of A k , therefore ( This holds for each row of the matrices, hence the theorem follows. ✷ In the proof of Theorem 5 we estimated the parameter B of the quadratic convergence in terms of the parameters ℓ and L of the functions f ′′ i . This estimate, however, is difficult to evaluate, because the functions f i are a priori unknown. To estimate B effectively, we need another idea based on geometrical properties of the uncertainty sets F i . Below we obtain such an estimate in terms of radii of curvature of the boundary of F i . Assume that at each point of intersection of the surface Γ i = ∂F i and of the corresponding ε-neighbourhood of the pointā i , the maximal radius of curvature does not exceed R and the minimal radius of curvature is at least r > 0. Denote also by C the maximal value of C(A) over the corresponding neighbourhood of the matrixĀ, where C(A) is defined in Lemma 1.
Proof. As in the proof of Theorem 5, we assume that ρ(Ā) = 1 and denote by a (k) , a (k+1) , anā the ith rows of the matrices A k , A k+1 , andĀ respectively. Sinceā = argmax a∈Γ i (a,v), it follows thatv is the outer unit normal vector to Γ i at the pointā. Denote h = a (k) −ā. Draw a Euclidean sphere S i of radius r tangent to Γ i from inside at the pointā. Take a point b (k) on S i such that b (k) −ā = h . Since the maximal radius of curvature of Γ i is at least R, this part of the sphere is inside F i , hence the vector a (k) −ā forms a smaller angle with the normal vectorv than the vector b (k) −ā. The vectors a (k) −ā and b (k) −ā are of the same length, therefore On the other hand, (a (k) −ā,v) ≤ 0. Therefore, This inequality holds for all rows of the matrix A k , therefore and hence, by Lemma 1, Further, a (k+1) = argmax a∈Γ i (a, v k ), hence v k is a normal vector to Γ i at the point a (k+1) .

Numerical results
We now demonstrate and analyse numerical results of the selective greedy method. Several types of problems are considered: maximizing and minimizing the spectral radius, finite and polyhedral uncertainty sets, positive and sparse matrices. All computations are done on a standard laptop.

Product families with finite uncertainty sets
We first test the selective greedy method on positive product families, and then on nonnegative product families with density parameters γ i ∈ (0, 1) (the percentage of nonzero entries of the vectors belonging to the uncertainty set F i .) The results of tests on positive product families are given in Tables 2 and 3, for maximization and minimization respectively. The numbers shown represent the average number of iterations performed by the algorithm to find the optimal matrix.
In Tables 4 and 5 we report the behaviour of selective greedy algorithm as dimension d and the size of the uncertainty sets N vary. For each uncertainty set F i the density parameter γ i is randomly chosen in the interval (0.09, 0.15) and the elements of the set F i are randomly generated in accordance to the parameter γ i . The tables present the average number of iterations and the computation time in which the algorithm terminates and finds       Table 4 contains the data for the maximisation, while Table 5 presents the results for the minimization problem. Table 6 shows how the number of iterations and computing time vary as the density parameter is changed. The dimension is kept fixed at d = 600 and the cardinality of each product set at |F i |= 200, while we vary the interval I from which the density parameter N \ d   25  100  500  2000  50 0.06s 0.16s 0.75s 3.31s 100 0.13s 0.26s 1.18s 5.77s 250 0.25s 0.94s 3.01s 203.28s

Some conclusions
The main conclusion from the numerical results is rather surprising. The number of iterations seems to depend neither on the dimension, nor on the cardinality of the uncertainty sets nor on the sparsity of matrices. The usual number of iterations is 3 -4. The robustness to sparsity can be explained by the fact that the selective greedy method actually works with positive perturbations of matrices. The independence of cardinality of the uncertainty sets (in the tables we see that the number of iterations may even decrease in cardinality) is, most likely, explained by the quadratic convergence. The distance to the optimal matrix quickly decays to the tolerance parameter, which is usually set up between 10 −8 and 10 −6 , and the algorithm terminates.

Polyhedral product families
Here we test the selective greedy method on product families with uncertainty sets given by systems of 2d + N linear constraints of the form: where vectors b j ∈ R d + are randomly generated and normalized, and x = (x 1 , . . . , x d ).   In polyhedral uncertainty sets the algorithm anyway searches the optimal rows among the vertices. However, the number of vertices can be huge. Nevertheless, the selective greedy method still needs less than 5 iterations to find an optimal matrix.

Applications
We consider two possible applications of the greedy method: optimising spectral radius of graphs and finding the closest stable matrix.

Optimizing the spectral radius of a graph
The spectral radius of a graph is the spectral radius of its adjacency matrix A. The problem of maximizing/mimimizing the spectral radius under some restrictions on the graph is well known in the literature (see [7,9,10,12,21,26] and the references therein). Some of them deal with product families of adjacency matrices and hence can be solved by the greedy method. For example, to find the maximal spectral radius of a directed graph with d vertices and with prescribed numbers of incoming edges n 1 , . . . , n d . For this problem, all the uncertainty sets F i are polyhedral and are given by the systems of inequalities: The minimal and maximal spectral radii are both attained at extreme points of the uncertainty sets, i.e., precisely when the the ith row has n i ones and all other entries are zeros.
If we need to minimize the spectral radius, we define F i in the same way, with the only one difference: d k=1 x k ≥ n i . Performing the greedy method we solve in each iteration an LP (linear programming) problem (a i , v k ) → max, a i ∈ F i , i = 1, . . . , d, with the sets F i defined in (11). In fact, this LP problem is solved explicitly: the (0,1)-vector a i has ones precisely at positions of the n i maximal entries of the vector v k , and all other components of a i are zeros.
Thus, in each iteration, instead of solving d LP problems, which can get computationally demanding even for not so big d, we just need to deal with finding the corresponding number of highest entries of the eigenvector v k . In Table 8 we present the results of numerical simulations for applying the selective greedy method to this problem, where the number of iteration and the time required for the algorithm to finish are shown.

Finding the closest stable non-negative matrix
A matrix is called stable (in the sense of Schur stability) if its spectral radius is smaller than one. The matrix stabilization problem, i.e., the problem of finding the closest stable matrix to a given matrix is well known and has been studied in the literature. Finding the closest non-negative matrix is important in applications such as dynamical systems, mathematical economy, population dynamics, etc. (see [22,2,25,14]). Usually the closest matrix is defined in the Euclidean or in the Frobenius norms. In both cases the problem is hard. It was first noted in [25] that in L ∞ -norm there are efficient methods to find the global minimum. Let us recall that A ∞ = max i=1,...,d d j=1 |a ij |. For a given matrix A ≥ 0 such that ρ(A) > 1, we consider the problem The key observation is that for every positive r, the ball of radius r in L ∞ -norm centered in A, i.e., the set of non-negative matrices is a product family with the polyhedral uncertainty sets F i = {x ∈ R d + , d j=1 |x j − a ij |≤ r}. Hence, for each r, the problem ρ(X) → min, X ∈ B(r, A), can be solved by the selective greedy method. Then the minimal r such that ρ(X) ≤ 1 is the solution of problem (12). This r can be found merely by bisection.
We found closest stable matrices to randomly generated matrices A in various dimensions up to d = 500. The results are given in the Table 9. We remark that for this set of experiments we set the precision of the power method to 10 −8 , since for higher dimensions the procedure gets really sensitive due to the fact that the entries of the leading eigenvector get pretty close to one another. This affects the procedure which is dependant on the ordering of the entries of the leading eigenvector. Therefore, the higher precision of the power method guarantees us the computed ordering, if not the same, then very similar to the "real" one, thus giving us overall more precise result.   Below is one practical example with an unstable matrix A of size d = 10 with ρ(A) = 9.139125: The algorithm finds the closest (in L ∞ -norm) stable matrix: Even tough the selective greedy method, in theory, can safely and effectively be used on non-negative product families, the cycling might occur due to the computational errors, especially in the case of very sparse matrices. The following example sheds some light on this issue.
We run the selective greedy algorithm to the product family F = F 1 × · · · × F 7 of (0, 1)matrices of dimension 7, for solving the minimization problem. Each uncertainty set F i consists of (0, 1)-vectors containing from 1 to 4 ones. The selective greedy algorithm cycles between the following two matrices: Both matrices are minimal in the second, fifth, and sixth rows (the rows that get changed) with the respect to the vector v. In addition, we have (a i , v) = (a ′ i , v) for i ∈ {2, 5, 6}. According to the algorithm, those rows should remain unchanged, and algorithm would actually have to terminate with the matrix A. The explanation for this contradictory behaviour is that the machine, due to the calculation imprecisions, keeps miscalculating the above dot products, first taking the "problematic" rows a ′ i of the second matrix as the optimal, and then rolling back to the rows a i , seeing them as optimal.
One of the most straightforward strategy to resolve this issue is to simply round all calculation to some given number of decimals. This approach is particularly efficient if we are not concerned with high precision. However, if we want our calculations to have a higher order of precision, we need to resort to other strategies.
Another idea to deal with this issue is to modify the part of the algorithm that dictates the conditions under which the row will change. Here we will use notation for the maximization, although for everything is completely analogous for the minimization case.
Let a (k) i be the i-th row of a matrix A k obtained in the kth iteration, v k its leading eigenvector, and a  is not truly optimal, but computed dot products are really close to each other, which will lead to rolling back to the vector a (k) i in the next iteration, and thus cycling. To counter this, we impose the rule that the row a , v k )| δ, where δ is a small tolerance parameter. In other words, in the (k + 1)st iteration, the row a (k) i won't be replaced by the new row a (k+1) i if their computed scalar products with the vector v k are really close to each other.

Dealing with reducible families
As stated in subsection 2.2, positivity of the leading eigenvector v k obtained in the final iteration of the selective greedy algorithm for maximization problem is a guarantee that the obtained solution is correct. However, if v k contains some zero entries, our computed solution might not be the optimal one. This is not an issue when working with the irreducible product families, since in that case the leading eigenvector of computed matrix is always positive [28,Lemma 4].
One way to resolve this issue is to compute the Frobenius factorization of the family F , after which the maximization problem will be reduced to several similar problems in smaller dimensions (see [28,Section 3.3]). The algorithm of Frobenius factorization is fast, it can be found in [32]. In practice it is often possible to manage without the Frobenius factorization, since having only one irreducible matrix is enough to render the whole family irreducible. A simple strategy when working with reducible families is to simply include a cyclic permutation matrix P , which is irreducible, multiplied by a small parameter α.