Two global quasi-Newton algorithms for solving matrix polynomial equations

In this article, we globalize the quasi-Newton algorithm proposed in Macías et al. (Appl Math Comput 441:127678, 2023) by introducing an exact line search, we propose a polynomial approximation to the merit function, and we deduce a sufficient condition for its minimization interval. We use the exact merit function and its approximation to propose two global quasi-Newton algorithms for solving matrix polynomial equations. For each algorithm, we prove that the exact line search does not affect the convergence of the quasi-Newton method. In addition, we present comparative numerical tests of the algorithmic proposals, in which we also compare with the global Newton method.


Introduction
Consider a matrix polynomial equation of degree m,

P(X) = A_m X^m + A_{m−1} X^{m−1} + · · · + A_1 X + A_0 = 0,   (1)

where A_0, . . . , A_m and X are n × n matrices. This type of equation arises in many applications related to problems of dynamic analysis of structural mechanics and acoustic systems (Bermúdez et al. 2000; Smith et al. 1995), simulation of electrical circuits (Laub 1980), fluid mechanics (Reddy et al. 1993), signal processing (Davila 1988), and vibration problems and modeling of microelectromechanical systems (Lancaster 1966). Newton's method for matrix polynomials of general degree was investigated in Kratz and Stickel (1987). A locally and even quadratically convergent quasi-Newton method for solving (1) was proposed in Macías et al. (2023a); it reduces the computational cost of Newton's method, traditionally used to solve matrix polynomial equations.
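To fix ideas, the left-coefficient convention in (1) can be evaluated with a matrix Horner scheme. The short Python sketch below is ours; the name P_eval and the list layout of the coefficients are illustrative conventions, not notation from the paper.

```python
import numpy as np

def P_eval(A, X):
    """Evaluate P(X) = A[m] X^m + ... + A[1] X + A[0] by a matrix Horner scheme.

    A is a list [A_0, A_1, ..., A_m] of n x n coefficient matrices; X is n x n.
    Coefficients multiply from the left, powers of X from the right.
    """
    R = A[-1].copy()
    for Aj in reversed(A[:-1]):
        R = R @ X + Aj
    return R
```

For example, with A = [I, −2I, I], that is P(X) = X² − 2X + I, the identity matrix is a solvent.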
Because of their local nature, these methods are advantageous whenever the starting matrix is close to the solution, but they lose their efficiency otherwise. An alternative is to introduce a globalization strategy, such as a line search, into the algorithm (Dennis and Schnabel 1996; Nocedal and Wright 2006). This type of strategy has been used to solve nonlinear matrix equations (Benner and Byers 1998). In Higham and Kim (2000), the authors improved Newton's method with line search for solving the quadratic matrix equation. In Seo and Kim (2008), the line search technique was extended to general matrix polynomial equations. In both cases (quadratic and general), the merit function to be minimized is a polynomial of degree 2m; but, while in the quadratic case it is given explicitly, in the general case it is not. This makes its calculation and minimization difficult, which motivated the authors in Seo and Kim (2014) to propose another way to determine the step size and a "relaxed" Newton method for a particular matrix polynomial equation that arises frequently in stochastic models.
Recently, in Macías et al. (2023b), the authors deduced, for Newton's method, the explicit form of the polynomial that defines the merit function minimized in the line search procedure in the general case, and obtained a sufficient condition guaranteeing that the minimization interval is [0, 2].
Taking into account the advantages of a global algorithm and the good local performance of the quasi-Newton method proposed in Macías et al. (2023a), in this article we globalize it with an exact line search. Since in the quasi-Newton case it is not possible to obtain an explicit polynomial expression of the merit function, we propose a polynomial approximation, deduce its explicit form, and give a sufficient condition for its minimization on the interval [0, 2]. With the exact merit function and its approximation, we propose two global algorithms to solve matrix polynomial equations and, for each algorithm, we prove that the line search does not interfere with the local convergence of the method. In addition, we analyze the numerical performance of the two algorithmic proposals and compare them with the global Newton method.
This paper is organized as follows. In Sect. 2, we recall definitions, notation, and properties that will be useful throughout the document. In Sect. 3, we first introduce an exact line search into the quasi-Newton algorithm (4); then we propose a polynomial approximation of the merit function, deduce its explicit form, and give a sufficient condition for its minimization on the interval [0, 2]. In Sect. 4, we present two global algorithmic proposals to solve the polynomial equation (1). In Sect. 5, for each algorithm, we prove that the exact line search strategy does not affect the convergence of the local quasi-Newton method. In Sect. 6, we present numerical tests showing the performance of the proposed algorithms. Finally, in Sect. 7, we present some final remarks.

Preliminaries
In this section, we present notations, some basic concepts, properties and results that we will use in the development of this document.
We denote by C^{n×n} the set of n × n matrices with complex entries, by X* a solution (solvent) of the matrix polynomial equation (1), by ‖·‖ an induced matrix norm, and by ‖·‖_F the Frobenius norm (Golub and Van Loan 1996). We will use the equality ‖A‖_F² = trace(A^H A) for A ∈ C^{n×n}. In the numerical tests, we use the notations 1_{n×n} and 1_m for an n × n matrix and an m-column vector with all entries equal to 1, respectively.
An important concept in the theoretical development of Newton's method for matrix equations is the Fréchet derivative, which we recall below.
Definition 1 (Higham 2008) Let F : C^{n×n} → C^{n×n} be a matrix function. The Fréchet derivative of F at a point X ∈ C^{n×n} is a linear mapping L_X : C^{n×n} → C^{n×n} such that F(X + S) − F(X) − L_X(S) = o(‖S‖) for all S ∈ C^{n×n}.

Newton's method for solving the matrix polynomial equation (1) was introduced in Kratz and Stickel (1987), and its basic iteration from a starting matrix X_0 is defined by

X_{k+1} = X_k + S_k, where L_{X_k}(S_k) = −P(X_k),   (2)

where the Fréchet derivative of P at X in the direction S is given by

L_X(S) = Σ_{j=1}^{m} A_j Σ_{i=0}^{j−1} X^i S X^{j−1−i}.   (3)

On the other hand, in Macías et al. (2023a), the authors introduce a quasi-Newton method for solving (1), whose basic iteration, starting from an initial matrix X_0, is defined by

X_{k+1} = X_k + S_k, where B_m(X_k)S_k = −P(X_k),   (4)

with

B_m(X) = Σ_{j=1}^{m} j A_j X^{j−1},   (5)

which can be seen as an extension to the matrix case of the classical derivative of a scalar polynomial function. In that case, B_m is a matrix polynomial function.
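Assuming B_m has the scalar-derivative form B_m(X) = Σ_{j=1}^m j A_j X^{j−1} described above, one quasi-Newton step (4) reduces to a single n × n linear solve, in contrast with the n² × n² Sylvester-type system behind (2). The Python sketch below illustrates this; the function names are our own.

```python
import numpy as np

def P_eval(A, X):
    # P(X) = A[m] X^m + ... + A[0] by a matrix Horner scheme
    R = A[-1].copy()
    for Aj in reversed(A[:-1]):
        R = R @ X + Aj
    return R

def Bm(A, X):
    # B_m(X) = sum_{j=1}^{m} j * A[j] @ X^{j-1}: the matrix analogue of P'(x)
    n = X.shape[0]
    B = np.zeros((n, n))
    Xp = np.eye(n)                 # holds X^{j-1}
    for j in range(1, len(A)):
        B = B + j * A[j] @ Xp
        Xp = Xp @ X
    return B

def qn_step(A, X):
    # one quasi-Newton step: solve B_m(X) S = -P(X), then update X
    S = np.linalg.solve(Bm(A, X), -P_eval(A, X))
    return X + S
```

In the 1 × 1 case, B_m coincides with the scalar derivative, so qn_step reproduces the classical Newton iteration for a scalar polynomial.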
Under the following assumptions, the authors in Macías et al. (2023a) show that their quasi-Newton algorithm is well defined and converges even quadratically.
A1. There exists X* ∈ C^{n×n} such that P(X*) = 0.
A2. The matrix B_m(X*) is non-singular and β is the norm of its inverse, that is, β = ‖B_m(X*)^{−1}‖.
In addition to Assumptions A1, A2, and A3, we will assume in this document that:
A4. The operator L is locally Lipschitz continuous in N(X*, r) ⊂ C^{n×n}; that is, there exists a positive constant γ such that ‖L_X − L_{X*}‖ ≤ γ‖X − X*‖ for all X ∈ N(X*, r), where N(X*, r) denotes a neighborhood with center X* and radius r.
We recall the definition of the matrices B_j from Macías et al. (2023b), which we will use later.
Definition 2 (Macías et al. 2023b) Let m ∈ N. The matrices B and B_r, r = 1, . . . , q, with q = m(m − 1)/2, are defined by where k and i are the unique integers such that r and the function X S is defined by the sum of products of all the permutations of X and S (Seo and Kim 2008; Macías et al. 2023b).
Finally, we will use the Landau notation: α_k = O(β_k) if there exists a constant c > 0 such that α_k ≤ cβ_k for all k ∈ N.

Introducing an exact line search
The quasi-Newton algorithm (4) presents good local performance (Macías et al. 2023b) and is an alternative to the high computational cost of Newton's method, which uses the Schur decomposition (Kratz and Stickel 1987). However, due to its local nature, its good convergence properties depend on the initial approximation. In order to improve the convergence of this algorithm from arbitrary starting points, we introduce an exact line search strategy analogous to the one used to globalize Newton's method in Seo and Kim (2008) and Macías et al. (2023b): the next iterate has the form X_{k+1} = X_k + t S_k, where S_k is the quasi-Newton step and t is calculated by an exact line search minimizing the merit function

‖P(X_k + t S_k)‖_F².   (9)

In the particular case of Newton's method, (9) is a polynomial of degree 2m (Seo and Kim 2008), whose explicit form was recently presented in Macías et al. (2023b). As mentioned in that paper, having the explicit polynomial to be minimized makes the line search process easier and helps to conjecture about the value of b for which the minimizer (step size) belongs to the interval [0, b].
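Since (9) is a polynomial of degree 2m in t, it is determined exactly by 2m + 1 samples, and its minimizer over [0, b] can be located among the real roots of its derivative. The Python sketch below uses this sampling device for illustration; it is our stand-in, not the explicit construction of Macías et al. (2023b), and the function names are ours.

```python
import numpy as np

def P_eval(A, X):
    # P(X) = A[m] X^m + ... + A[0] by a matrix Horner scheme
    R = A[-1].copy()
    for Aj in reversed(A[:-1]):
        R = R @ X + Aj
    return R

def merit_poly(A, X, S):
    """Recover p(t) = ||P(X + t S)||_F^2 from 2m + 1 samples (p has degree 2m)."""
    m = len(A) - 1
    ts = np.linspace(0.0, 2.0, 2 * m + 1)
    ps = [np.linalg.norm(P_eval(A, X + t * S), 'fro') ** 2 for t in ts]
    return np.polynomial.Polynomial.fit(ts, ps, 2 * m).convert()

def exact_step(A, X, S, b=2.0):
    """Minimize p over [0, b]: compare the endpoints and real critical points."""
    p = merit_poly(A, X, S)
    cand = [0.0, b]
    cand += [np.real(r) for r in p.deriv().roots()
             if abs(np.imag(r)) < 1e-9 and 0.0 < np.real(r) < b]
    return min(cand, key=lambda t: p(t))
```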
In the case of the quasi-Newton method, the difficulty of the numerical calculation of the merit function (9) is the same as for Newton's method: the evaluation of the matrix polynomial function at each iteration and its minimization as the degree of the polynomial increases.
Taking the above into account, and in order to propose an alternative for the numerical calculation of p(t), we analyze the expression P(X + tS) below.
From the definition of the matrix polynomial function P, we have that In Seo and Kim (2008), it was shown that where L_X(S) is given by (3) and the function X S is defined by the sum of products of all the permutations of X and S.
Since the quasi-Newton method uses an approximation to L_X(S) that reduces the computational cost of Newton's method, it is reasonable to use this approximation in (11) in the same way. Therefore, we will not have an exact expression for the merit function (9), as we do for Newton's method.
The right-hand side of (11) can be approximated by the expression: which, using the quasi-Newton equality B_m(X)S = −P(X), is equal to the expression: where Then we have the following approximation: and we define the following approximation to the merit function (9), whose explicit form is obtained by means of algebraic manipulations analogous to those performed in Macías et al. (2023b) for Newton's method. This explicit form is given in Theorem 1.
As in Macías et al. (2023b), for a fixed integer s with 2 ≤ s ≤ m, we define a recurrence by and we will use to simplify the notation.
Theorem 1 The polynomial q(t) is explicitly defined by (15), where m is the degree of the matrix polynomial (1), . . . , m, and, using (14), we define For the coefficients c_{m+1} and c_{m+2}, of t^5 and t^6, we have 4. For the coefficients c_k of t^{k+(4−m)}, m + 3 ≤ k ≤ 3m − 5, that is, the coefficients of the powers 7, 8, . . . , 2m − 1, we consider whether the power k + (4 − m) is odd or even. In the first case, it can be expressed in the form 2r + 1 for some r ≥ 3, and we have that: Analogously, if the power k + (4 − m) is even, it is of the form 2r with r ≥ 4, and we have the same conditions as in the previous case: (a) if m = r + 1, we have,

Proof It is analogous to the proof of Theorem 1 in Macías et al. (2023b).
Even though the form of the polynomial q(t) given by (15) is the same as that of the polynomial in Macías et al. (2023b) for Newton's method, their coefficients are not necessarily the same, since they ultimately depend on the matrix S.
An immediate consequence of Theorem 1 is the explicit calculation of the derivative of q(t), which is guaranteed by the following corollary.
Corollary 1 Let q(t) be the polynomial given by (15). Then the derivative polynomial q′(t) is given by

Proof It is analogous to the proof of Corollary 1 in Macías et al. (2023b).
A sufficient condition to minimize the polynomial q(t) in the interval [0, 2] is given by Corollary 2.
Corollary 2 q′(2) ≥ 0 if and only if

Proof It is analogous to the proof of Corollary 2 in Macías et al. (2023b).

Algorithmic proposals
Next, we present two global quasi-Newton algorithms to solve the matrix polynomial equation (1). The first one, which we will call the Implicit quasi-Newton Algorithm, uses the exact merit function m(t) given by (9) and a solver to find its minimizer. The second, which we will call the Explicit quasi-Newton Algorithm, uses the polynomial approximation q(t) given by (13) as merit function and the sufficient condition (16) for its minimization.

Algorithm 1 Implicit quasi-Newton
5: Find t = arg min m(t). 6: Update X_{k+1} = X_k + t S_k and k = k + 1. 7: end while 8: return X*

Remark 1 In the numerical tests of Algorithm 1, we use the Matlab function sqp(·, ·) to find the minimizer of m(t) (t = sqp(t_0, m(t))) in Step 5. This function uses sequential quadratic programming (Nocedal and Wright 2006) to solve the minimization problem and calculates the derivatives of the objective function by finite differences (Canfield 2017).
Remark 2 The expression Res(X_k) is a relative measure of ‖P(X_k)‖_F and is defined as in Macías et al. (2023a, b) and Seo and Kim (2008).

Remark 3 The exact line search in Algorithm 2 uses the explicit polynomial given in Theorem 1. In addition, it includes the sufficient condition (16) to minimize that polynomial.
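Putting the pieces together, a minimal dense sketch of the implicit variant (Algorithm 1) in Python might look as follows. Here ‖P(X_k)‖_F replaces the relative measure Res(X_k) as the stopping criterion, the step size is found by recovering the degree-2m merit polynomial from samples rather than by the sqp solver of Remark 1, and all names are our own.

```python
import numpy as np

def P_eval(A, X):
    # P(X) = A[m] X^m + ... + A[0] by a matrix Horner scheme
    R = A[-1].copy()
    for Aj in reversed(A[:-1]):
        R = R @ X + Aj
    return R

def Bm(A, X):
    # B_m(X) = sum_{j=1}^{m} j * A[j] @ X^{j-1}
    n = X.shape[0]
    B = np.zeros((n, n))
    Xp = np.eye(n)
    for j in range(1, len(A)):
        B = B + j * A[j] @ Xp
        Xp = Xp @ X
    return B

def global_qn(A, X0, tol=1e-12, maxit=100):
    """Global quasi-Newton iteration with an exact line search on [0, 2].

    The step is S = -B_m(X)^{-1} P(X); the step size minimizes
    p(t) = ||P(X + t S)||_F^2, recovered by sampling since p has degree 2m.
    """
    m = len(A) - 1
    X = X0.copy()
    for _ in range(maxit):
        PX = P_eval(A, X)
        if np.linalg.norm(PX, 'fro') <= tol:
            break
        S = np.linalg.solve(Bm(A, X), -PX)
        ts = np.linspace(0.0, 2.0, 2 * m + 1)
        ps = [np.linalg.norm(P_eval(A, X + t * S), 'fro') ** 2 for t in ts]
        p = np.polynomial.Polynomial.fit(ts, ps, 2 * m).convert()
        # always keep the unit step t = 1 and the endpoint t = 2 as candidates
        cand = [1.0, 2.0]
        cand += [np.real(r) for r in p.deriv().roots()
                 if abs(np.imag(r)) < 1e-9 and 0.0 < np.real(r) <= 2.0]
        t = min(cand, key=lambda s: p(s))
        X = X + t * S
    return X
```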

Convergence analysis
In this section, for each algorithmic proposal, we prove that the exact line search strategy does not affect the rate of convergence of the quasi-Newton method (4).
4: Construction of the polynomial q(t). 5: Check the condition (16). 6: if q′(2) > 0 then 7: Find t ∈ [0, 2] by minimizing q(t). 8: else 9: Find t by minimizing q(t) outside [0, 2]. 10: end if 11: Update X_{k+1} = X_k + t S_k and k = k + 1. 12: end while 13: return X*

We start with a technical lemma that gives an upper bound for the double summation in (12), which will be useful in the proofs of the main theorems of this section.
Lemma 1 Let P : C^{n×n} → C^{n×n} be a polynomial matrix function of degree m ≥ 2, and let X, S ∈ C^{n×n} and Then

Proof The proof is analogous to the one given in Macías et al. (2023a) for the quadratic case. We use induction on m.
Making a change of indices, the double sum in the previous equality is equivalent to the following expression: therefore and, by the triangle inequality, we have On the other hand, taking into account the triangle inequality and some algebraic manipulations on the first term of the right-hand side of inequality (22), we obtain Finally, using the previous inequality and the inductive hypothesis (20) in (22), after some algebraic manipulations we have Therefore, (19) holds for m = l. This completes the proof.
The following is a technical lemma that allows us to bound the norm of the difference of the matrices L_X(S) and B_m(X)S. This lemma will be very useful in the proofs of Theorems 2 and 3.
Lemma 2 Let P : C^{n×n} → C^{n×n} be a polynomial matrix function of degree m ≥ 2 that satisfies Assumptions A1, A3 and A4, and let X, S ∈ C^{n×n} and E_X(S) = L_X(S) − B_m(X)S. Then there exist positive constants γ, α and ϕ such that

Proof Using the triangle inequality and Assumptions A3 and A4, we have that there exist positive constants γ and α such that On the other hand, since B_m is a matrix polynomial function, it is differentiable and Lipschitz continuous, so there exists a positive constant ϕ > 0 such that Then, using (26) in (25), we get

The following technical lemma and its proof will be useful in the proofs of Theorems 2 and 3.
Lemma 3 Let P : C^{n×n} → C^{n×n} be a polynomial matrix function that satisfies Assumption A1. For the quasi-Newton method (4), assume that X_k is within a region where convergence of order 1 + θ, θ ∈ [0, 1], to X* occurs, and let E_k = X_k − X*. Then

Proof Since X_k is in the region of convergence of order 1 + θ, θ ∈ [0, 1], to X*, we recall that, by Theorems 4 and 5 in Macías et al. (2023a), for μ > 0 and any θ ∈ (0, 1), Furthermore, S_k = X_{k+1} − X_k = E_{k+1} − E_k. From this equality, (28) and (29), we have For the matrix polynomial function P of degree m, From Lemma 1 with t = 1, we get By Lemma 2, there exist positive constants γ, α and ϕ such that On the other hand, from (31), we have In addition, B_m(X*)E_k = O(‖E_k‖). Therefore, using (32) and (33), we have Analogously, P(X_{k+1}) = O(‖E_{k+1}‖) and, using (28), we have

The following is one of the central results of this section; it guarantees that the exact line search strategy used in Algorithm 1 does not interfere with the convergence of the quasi-Newton method (4).
Theorem 2 Let P : C^{n×n} → C^{n×n} be a polynomial matrix function that satisfies Assumptions A1 to A4. For the quasi-Newton method (4), assume that X_k is within a region where convergence of order 1 + θ, θ ∈ [0, 1], to X* occurs and that the sequence of step sizes t*_k generated by Algorithm 1 is bounded. Then the exact line search strategy used in Algorithm 1 does not interfere with the convergence of the quasi-Newton method (4).
Proof Let X̄_{k+1} = X_k + S_k and X_{k+1} = X_k + t*_k S_k be the quasi-Newton and quasi-Newton-with-exact-line-search iterates, respectively, the latter generated by Algorithm 1.
From (9) and (11), we get Now, from (4), we conclude that Moreover, we know that the step size t*_k is the global minimizer of the polynomial m(t); consequently, Using (28), (32), (34) and (36) in (35), we have Then, From (37), the Landau notation, the reverse triangle inequality, and Lemma 1, we have that there exist positive constants α_1 and α_2 such that Using (33), and taking into account that by hypothesis the sequence t*_k is bounded, there are positive constants c and where α is a positive constant. From (27), for some positive constant ρ. Since B_m(X*) is nonsingular (Assumption A2), we have P(X_k) ≠ 0. From (40), we obtain Using the quasi-Newton and quasi-Newton-with-exact-line-search iterates, we have By the triangle inequality, (28), (30), and (41), we get Therefore, X_k converges to X* with order 1 + θ.
The following is the second central result of this section, which is analogous to the previous one and guarantees that the exact line search strategy used in Algorithm 2 does not interfere with the convergence of the quasi-Newton method (4).
Theorem 3 Let P : C^{n×n} → C^{n×n} be a polynomial matrix function that satisfies Assumptions A1–A4. For the quasi-Newton method (4), assume that X_k is within a region where convergence of order 1 + θ, θ ∈ [0, 1], to X* occurs and that the sequence of step sizes t*_k generated by Algorithm 2 is bounded. Then the exact line search strategy given by Algorithm 2 does not interfere with the convergence of the quasi-Newton method (4).
Proof Let X̄_{k+1} = X_k + S_k and X_{k+1} = X_k + t*_k S_k be the quasi-Newton and quasi-Newton-with-exact-line-search iterates, respectively, the latter generated by Algorithm 2.
From (13), we get On the other hand, we recall that the step size t*_k is the global minimizer of the polynomial q(t); consequently, Thus, performing some algebraic manipulations, from (34) we have that there exists α_1 > 0 such that ‖P(X_{k+1})‖ ≤ α_1 ‖E_k‖^{1+θ}; on the other hand, using (33) and taking into account that by hypothesis the sequence t*_k is bounded, there exist positive constants c and α_2 such that Then, from (44), the Landau notation, the reverse triangle inequality, and Lemma 1, we have that there exist positive constants α_3 and α_4 such that From (33), and since the sequence where α is a positive constant. From (27), for some positive constant ρ. Since B_m(X*) is nonsingular (Assumption A2), we have P(X_k) ≠ 0. Taking reciprocals in (46), we obtain Using the quasi-Newton and quasi-Newton-with-exact-line-search iterates, we have By the triangle inequality, (28), (30), and (47), we get Therefore, X_k converges to X* with order 1 + θ.

Numerical experiments
In this section, we analyze the numerical performance of the implicit quasi-Newton and explicit quasi-Newton algorithms proposed in Sect. 4, which we denote the IQN and EQN algorithms, respectively. Our purpose is to show numerically the benefits of introducing a line search into the quasi-Newton algorithm (4) and to analyze the global performance of the two new algorithmic proposals.
We initially compare the local quasi-Newton algorithm (4) proposed in Macías et al. (2023a) (which we call the QN algorithm and include below) with the new global quasi-Newton IQN and EQN algorithms. We also include results for the local Newton method (the N algorithm below). Next, we compare the IQN and EQN algorithms with the explicit Newton algorithm presented in Macías et al. (2023b) (which we denote the EN algorithm and include below) in terms of number of iterations and CPU time. We also analyze whether the sufficient condition (16) is satisfied or not, which we report in the result tables with the numbers 1 or 0, respectively.
Update X_{k+1} = X_k + S_k and k = k + 1. 4: end while 5: return X*

Remark 4 In Step 2, to find S_k, we proceed analogously to Kratz and Stickel (1987) and Seo and Kim (2008), using the Schur decomposition and the notions of the Kronecker product and of the vec function as given in Lancaster (1966).
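For reference, the construction in Remark 4 can be emulated densely: applying the identity vec(AXB) = (Bᵀ ⊗ A)vec(X) to each term of (3) turns L_X(S) = −P(X) into a single n² × n² linear system. The Python sketch below does exactly that; it ignores the Schur-decomposition economies that the algorithms actually use, and the function name is our own.

```python
import numpy as np

def newton_step_kron(A, X):
    """Solve L_X(S) = -P(X) for the Newton correction S via vec/Kronecker.

    L_X(S) = sum_{j=1}^m A_j sum_{i=0}^{j-1} X^i S X^{j-1-i}; with column
    stacking, vec(A_j X^i S X^{j-1-i}) = ((X^{j-1-i}).T kron A_j X^i) vec(S).
    Dense O(n^6) illustration only.
    """
    n, m = X.shape[0], len(A) - 1
    # residual P(X) by a matrix Horner scheme
    R = A[-1].copy()
    for Aj in reversed(A[:-1]):
        R = R @ X + Aj
    # powers X^0, ..., X^{m-1}
    Xp = [np.eye(n)]
    for _ in range(m - 1):
        Xp.append(Xp[-1] @ X)
    M = np.zeros((n * n, n * n))
    for j in range(1, m + 1):
        for i in range(j):
            M = M + np.kron(Xp[j - 1 - i].T, A[j] @ Xp[i])
    s = np.linalg.solve(M, -R.reshape(-1, order='F'))   # vec = column stacking
    return s.reshape(n, n, order='F')
```

The construction can be checked by verifying that the computed S satisfies the defining equation L_X(S) = −P(X) term by term.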

Remark 5
The exact line search in Algorithm 5 uses the explicit polynomial p(t) given in Macías et al. (2023b).
For the numerical experiments, we consider six problems that include applications in different areas in which it is necessary to solve matrix polynomial equations of degree two, four, five and six, with matrices of different sizes. The codes of the algorithms and test problems were written in Matlab and run on a computer with an Intel(R) Core(TM) i5-3450 processor at 2.8 GHz.
Next, we describe each problem and present the results in two tables with the following information: for Problem 1, the first table contains the value of the parameter δ, and the remaining columns contain the number of iterations of the quasi-Newton algorithm without and with exact line search. The second table presents the results of the globalized algorithms in the following columns: the initial matrix (X_0), the number of iterations (N) and the CPU time (time) for each algorithm. For the other problems, the column (δ) is replaced by (X_0), indicating the starting matrix.
4: Construction of the polynomial p(t). 5: The condition 6:

Problem 1 (Seo and Kim 2014; Seo et al. 2018) Markov chain problem. We consider the equation which has zero diagonal entries and nonnegative constant off-diagonal entries, and satisfies that (3W The problem size is n = 16 and the starting matrix is the zero matrix, as in Seo and Kim (2014). For the analysis, we consider different values of the parameter δ.
In Table 1, we present the number of iterations of the local Newton algorithm and of the quasi-Newton algorithm without and with line search. For all values of δ, we observe a decrease in the number of iterations when the line search is applied; in particular, for δ = 10^{−1} a smaller decrease is obtained, while in the other cases the decrease is constant.
From Table 2, we can see the same number of iterations for the three algorithms. On the other hand, the EQN and IQN algorithms use less CPU time than the EN algorithm: they use approximately 9% and 56% of the CPU time of the EN algorithm, respectively. The EQN algorithm shows the best behavior among the algorithms.
Problem 2 (Mehrmann and Watkins 2002; Schiehlen 1990). Consider a model of a robot with an electric motor at the joints, given by the system of differential equations: where Eq. (48) gives the robot model and Eq. (49) the motor mechanism. The matrices K and V are symmetric and positive definite, J is diagonal and positive definite, and D is diagonal and positive semidefinite. Through the linearization h(q, q̇) = G q̇ + C q and the simplification M(q) = A_4, the robot Eqs. (48) and (49) lead to an equation for the robot dynamics of the form A_4 q̈ + G q̇ + (C + K)q − K p = 0, where A_4 is symmetric and positive definite, G is antisymmetric and C is symmetric. Solving this equation for p and substituting into Eq. (49), we obtain the polynomial system of equations where, From the matrix differential Eq. (50), we have the polynomial equation

In Table 3, we can see that the QN algorithm produces iterations different from those of the Newton algorithm. On the other hand, the number of iterations of the quasi-Newton algorithm with exact line search decreases compared to those of the quasi-Newton algorithm without exact line search; the decrease is between 68% and 83% using the EQN and IQN algorithms, respectively.
In Table 4, we observe a good performance of the IQN algorithm in terms of the number of iterations. On the other hand, in terms of CPU time, we observe that the EQN algorithm uses approximately 90% of the CPU time of the EN algorithm and 29% of the CPU time of the IQN algorithm.

Problem 3 (Kratz and Stickel 1987; Seo and Kim 2008) Vibration problem. Consider a quartic polynomial equation; such equations occur in connection with vibrating systems (Lancaster 1966).
In Table 5, we can see that the number of iterations of the quasi-Newton algorithm with exact line search decreases compared to those of the quasi-Newton algorithm without exact line search; the decrease is between 60% and 85%.
In Table 6, we can see that the three algorithms converge in a similar number of iterations. On the other hand, we observe that the EQN algorithm uses approximately 90% of the CPU time used by the EN algorithm and 22% of the time used by the IQN algorithm. Additionally, in this problem it is worth highlighting that in the EQN and EN algorithms the condition is violated the same number of times.
Problem 4 is a quartic matrix polynomial equation which arises from a finite element solution of the equation for the modes of a planar waveguide using piecewise linear basis functions ϕ_i, i = 1 : 10. The coefficient matrices are such that A_1 and A_3 are diagonal, while A_0, A_2 and A_4 are tridiagonal. The parameter δ describes the difference in refractive index between the cover and the substrate of the waveguide; q is a function used in the derivation of the variational formulation and is constant in each layer.
In Table 8, we observe a decrease of approximately 65% on average in the number of iterations when we apply the exact line search.
From Table 9, we can observe behavior similar to that of the other problems; that is, the EN algorithm performs better with respect to the number of iterations. On the other hand, in CPU time, the EQN algorithm uses approximately 23% of the time used by the EN algorithm and 15% of the time used by the IQN algorithm.

Problem 5 (Vitoria 2001) Consider a matrix polynomial equation of degree m = 5, X^5 + A_4 X^4 + A_3 X^3 + A_2 X^2 + A_1 X + A_0 = 0.

In Table 10, we observe a decrease in the number of iterations of approximately 82% on average when we apply the exact line search in the quasi-Newton algorithm.
From Table 11, we observe similar behavior of the three algorithms with respect to the number of iterations. On the other hand, in CPU time, the EQN algorithm uses approximately 75% of the time used by the EN algorithm and 18% of the time used by the IQN algorithm.
Final remarks

For each of the algorithmic proposals, we prove that the exact line search does not affect the local convergence of the quasi-Newton method.
We analyze and compare the numerical performance of the two proposed algorithms: first, with a local quasi-Newton algorithm, to see the effect of introducing a line search, and second, with a global explicit Newton algorithm, to analyze their global performance. The numerical tests show the advantage, in terms of the number of iterations, of including an exact line search strategy, since the iterations are significantly reduced, improving local convergence.
We observe that the two new algorithmic proposals have similar performance in number of iterations but, in CPU time, the explicit quasi-Newton algorithm performs better than the other, since it uses less time; in particular, for Problems 2, 3, 4 and 5 it uses less than half the CPU time of the implicit quasi-Newton algorithm.
On the other hand, the explicit Newton algorithm uses few iterations but more CPU time on all problems; in particular, it uses more than double the CPU time of the explicit quasi-Newton algorithm in Problems 1, 4 and 6. Moreover, in the last problem, we observed that the CPU time of the explicit Newton algorithm increases significantly as the size of the problem increases.
The difference in CPU times of the explicit Newton, implicit quasi-Newton and explicit quasi-Newton algorithms observed in the numerical experiments may be due to the computation of S_k and to the form of the merit function in the three algorithms.
Funding Open Access funding provided by Colombia Consortium.

Table 1
Number of iterations for convergence for Problem 1, with X 0 = 0

Table 3
Number of iterations for convergence for Problem 2

Table 10
Number of iterations for convergence for Problem 5