Abstract
We consider the Broyden-like method for a nonlinear mapping \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) that has some affine component functions, using an initial matrix B_{0} that agrees with the Jacobian of F in the rows that correspond to affine components of F. We show that in this setting, the iterates belong to an affine subspace and can be viewed as the outcome of the Broyden-like method applied to a lower-dimensional mapping \(G:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\), where d is the dimension of the affine subspace. We use this subspace property to make some small contributions to the decades-old question of whether the Broyden-like matrices converge: First, we observe that the only available result concerning this question cannot be applied if the iterates belong to a subspace because the required uniform linear independence does not hold. By generalizing the notion of uniform linear independence to subspaces, we can extend the available result to this setting. Second, we infer from the extended result that if at most one component of F is nonlinear while the others are affine and the associated n − 1 rows of the Jacobian of F agree with those of B_{0}, then the Broyden-like matrices converge if the iterates converge; this holds whether the Jacobian at the root is invertible or not. In particular, this is the first time that convergence of the Broyden-like matrices is proven for n > 1, albeit for a special case only. Third, under the additional assumption that the Broyden-like method turns into Broyden’s method after a finite number of iterations, we prove that the convergence order of iterates and matrix updates is bounded from below by \(\frac {\sqrt {5}+1}{2}\) if the Jacobian at the root is invertible. If the nonlinear component of F is actually affine, we show finite convergence. We provide high-precision numerical experiments to confirm the results.
Introduction
This work is devoted to convergence properties of the Broyden-like method for systems of equations in which some of the equations are linear. Among other things, it provides the first answer to the decades-old question of whether the Broyden-like matrices converge under the standard assumptions for q-superlinear convergence of the iterates, albeit for a special case only.
Given a smooth nonlinear mapping \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\), Broyden’s method [3] aims at finding \(\bar {u}\in \mathbb {R}^{n}\) with:
It is a well-established member of the class of quasi-Newton methods and shares its local q-superlinear convergence, cf. [9, 14, 15, 21, 23]. The Broyden-like method generalizes Broyden’s method by allowing an additional parameter σ_{k} in the matrix update. It reads as follows.
For (σ_{k}) ≡ 1, we recover Broyden’s method. An appropriate choice of σ_{k} ensures that B_{k+1} is invertible if B_{k} is invertible. In fact, by the Sherman-Morrison formula, all choices but one maintain invertibility. The Broyden-like method is well known, cf. [22], [28, Section 6] and [16, Algorithm 1].
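For concreteness, the iteration can be sketched numerically as follows (our own illustration, not the paper's reference implementation; the test function, starting data, and stopping rule are ours). Since \(B_{k}s^{k}=-F(u^{k})\), the usual secant correction reduces to a rank-one term involving \(F(u^{k+1})\) only:

```python
import numpy as np

def broyden_like(F, u0, B0, sigma=lambda k: 1.0, tol=1e-12, max_iter=100):
    """Broyden-like method; sigma(k) == 1 for all k recovers Broyden's method."""
    u = np.asarray(u0, dtype=float)
    B = np.asarray(B0, dtype=float)
    for k in range(max_iter):
        Fu = F(u)
        if np.linalg.norm(Fu) <= tol:
            break
        s = np.linalg.solve(B, -Fu)          # step s^k with B_k s^k = -F(u^k)
        u = u + s                            # u^{k+1} = u^k + s^k
        # rank-one Broyden-like update; since B_k s^k = -F(u^k), the usual
        # (y - B_k s) s^T / ||s||^2 correction reduces to F(u^{k+1}) s^T / ||s||^2
        B = B + sigma(k) * np.outer(F(u), s) / (s @ s)
    return u, B

# Usage: a small nonlinear system with root (1, 1), B_0 close to the Jacobian.
F = lambda u: np.array([u[0] ** 2 - 1.0, u[0] + u[1] - 2.0])
u_star, B_star = broyden_like(F, [1.3, 0.8], np.array([[2.6, 0.0], [1.0, 1.0]]))
```
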
In this work, we consider Algorithm 1 for mixed linear–nonlinear systems of equations. That is, there exists J ⊂{1,…,n} such that \(F_{j}(u)={a_{j}^{T}} u + b_{j}\), where \(a_{j}\in \mathbb {R}^{n}\) and \(b_{j}\in \mathbb {R}\) for all j ∈ J. In addition, we suppose that the initial matrix B_{0} agrees with the Jacobian of F in the rows that correspond to (some of) the affine components of F, i.e., \({B_{0}^{j}} = {a_{j}^{T}}\) for all j ∈ J. For j∉J the functions F_{j} can be nonlinear and \({B_{0}^{j}}\) is not restricted. This framework includes many practically relevant systems of equations. Also, it fits two standard suggestions for the choice of B_{0}, which are to use \(B_{0}=F^{\prime }(u^{0})\) or a finite difference approximation of \(F^{\prime }(u^{0})\). In the following, we speak of exact initialization if \({B_{0}^{j}} = {a_{j}^{T}}\) for all j ∈ J.
This article is divided into four parts. In the first part, we show that exact initialization ensures that the steps (s^{k})_{k≥ 1} stay in a subspace \({\mathcal {S}}\) and that they can be generated by applying Algorithm 1 to a lower-dimensional mapping \(G:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\), where d is the dimension of \({\mathcal {S}}\). This extends results from [18].
The second part is concerned with the consequences of the first part for the convergence of the Broyden-like matrices (B_{k}). We point out that it is still largely open if (B_{k}) converges and that several renowned researchers have mentioned this issue in their works, cf. the survey articles [8, Example 5.3], [21, p. 117], [14, p. 306] and [2, p. 940]. The convergence of (B_{k}) is for example of interest because it is closely related to the rate of convergence of (u^{k}), see, e.g., Lemmas 2 and 3. For invertible \(F^{\prime }(\bar {u})\), there is only one result available: It is established in [22, Theorem 5.7] and in [17] that if the sequence of steps (s^{k}) is uniformly linearly independent, then (B_{k}) converges and \(\lim _{k\to \infty } B_{k}=F^{\prime }(\bar {u})\). We include the precise result as Theorem 4. Unfortunately, conditions that imply uniform linear independence of (s^{k}) are unknown and we are not aware of a single example, be it theoretical or numerical, in which (s^{k}) is actually uniformly linearly independent. In the setting of this work, anyway, (s^{k})_{k≥ 1} is confined to the subspace \({\mathcal {S}}\) and thus violates uniform linear independence. After extending the notion of uniform linear independence to subspaces, we generalize the above convergence result for (B_{k}) to the setting of this work, cf. Theorem 5. In doing so, we also obtain a formula for the limit of (B_{k}).
In the third part, we observe that if F has only one nonlinear component function and B_{0} is initialized exactly, then the generalized convergence result from the second part implies that (B_{k}) converges whenever the iterates (u^{k}) converge, and this holds for regular and for singular \(F^{\prime }(\bar {u})\), cf. Corollary 2. Since the assumption of only one nonlinear component function is very restrictive, we stress that this is the first time that convergence of (B_{k}) is shown for n > 1 and invertible \(F^{\prime }(\bar {u})\). We will also see that even though each B_{k} agrees with \(F^{\prime }(\bar {u})\) in n − 1 of n rows, the limit of (B_{k}) is generally not \(F^{\prime }(\bar {u})\).
We continue the third part by paying special attention to the case that σ_{k} = 1 for all k ≥ k_{0} and some k_{0} ≥ 0, i.e., Algorithm 1 turns into Broyden’s method. The result of the first part implies that in this case, Broyden’s method essentially reduces to the one-dimensional secant method. This yields a comprehensive characterization of the convergence of (u^{k}) including a lower bound for its q-order, which in turn allows us to establish significantly stronger convergence properties of (B_{k}) than for the Broyden-like method, cf. Theorem 6. For affine F, we prove finite convergence if σ_{k} = 1 is selected at least once, cf. Theorem 7. The third part concludes with a brief application of the developed convergence theory to two examples from the literature.
In the last part, we verify the results from the third part in high-precision numerical experiments. Among other things, we find that if \(F^{\prime }(\bar {u})\) is invertible, then choosing \((\sigma _{k})_{k\geq k_{0}}\equiv 1\) for some k_{0} ≥ 0 leads to much faster convergence than, e.g., (σ_{k}) ≡ 0.99, while this is not the case if \(F^{\prime }(\bar {u})\) is not invertible.
The convergence theory of Broyden’s method and of specific versions of the Broyden-like method is developed in, e.g., [4, 12, 16, 22]. There is only one further result available on the convergence of the Broyden(-like) matrices besides the one mentioned above: In [19], it was recently shown for Broyden’s method that if \(F^{\prime }(\bar {u})\) is singular with some additional structure, then \(({\lVert B_{k+1}-B_{k}\rVert })\) converges q-linearly to zero under appropriate assumptions, so (B_{k}) converges.
For other quasi-Newton updates, convergence results are available. We are aware of results for the SR1 update [5, 11, 30], for the Powell-symmetric-Broyden update [26], for the DFP and the BFGS update [13], and for the convex Broyden class excluding the DFP update [29].
This paper is organized as follows. In Section 2, we collect preparatory results and we present the generalization of uniform linear independence that is useful for subspaces. In Section 3, we prove the subspace property of (s^{k})_{k≥ 1} and show that (s^{k})_{k≥ 1} can be obtained by applying Algorithm 1 to a suitable mapping \(G:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\). Section 4 contains the convergence results for the Broyden-like matrices and the application to examples from the literature. Section 5 presents numerical experiments and Section 6 summarizes.
Notation
We use \(\mathbb {N}=\{1,2,3,\ldots \}\). For \(n\in \mathbb {N}\) we set [n] := {1,2,…,n}, [n]_{0} := [n] ∪{0} and [0] := ∅. The Euclidean norm of \(v\in \mathbb {R}^{n}\) is \({\lVert v\rVert }\), while \({\lVert A\rVert }\) is the spectral norm if \(A\in \mathbb {R}^{m\times n}\). For \(A\in \mathbb {R}^{m\times n}\), A^{j} indicates the j-th row of A, regarded as a row vector, whereas \(A^{i,j}\in \mathbb {R}\) is the usual notation for entries. The span of \(C\subset \mathbb {R}^{n}\) is indicated by 〈C〉. We will tacitly use that Algorithm 1 cannot generate a step s^{k} satisfying s^{k} = 0. For k ≥ 0, we define:
where the first definition assumes that Algorithm 1 has generated (B_{k}) and (u^{k}) with \(\lim _{k\to \infty }u^{k}=\bar {u}\) for some \(\bar {u}\) at which F is differentiable, while the second definition already makes sense if Algorithm 1 has generated s^{k}. We employ the q-order of convergence and the r-order of convergence in this work. They are studied in, e.g., [25, Section 9].
Preliminaries
Convergence of the Broyden-like method
The main convergence result for Algorithm 1 reads as follows.
Theorem 1
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be differentiable in a neighborhood of \(\bar {u}\) with \(F(\bar {u})=0\) and let \({\lVert F^{\prime }(u)-F^{\prime }(\bar {u})\rVert }\leq L{\lVert u-\bar {u}\rVert }^{\alpha }\) for all u from this neighborhood and constants L,α > 0. Let \(F^{\prime }(\bar {u})\) be invertible. If Algorithm 1 generates a sequence (u^{k}) that satisfies \({\sum }_{k}{\lVert u^{k}-\bar {u}\rVert }^{\alpha }<\infty \), then there holds:
implying that (u^{k}) converges q-superlinearly to \(\bar {u}\).
Moreover, there are δ,ε > 0 such that for every (u^{0},B_{0}) with \({\lVert u^{0}-\bar {u}\rVert }\leq \delta \) and \({\lVert B_{0}-F^{\prime }(\bar {u})\rVert }\leq \varepsilon \), Algorithm 1 either terminates with output \(u^{\ast }=\bar {u}\) or it generates (u^{k}) such that all B_{k} are invertible and \({\sum }_{k}{\lVert u^{k}-\bar {u}\rVert }^{\alpha }<\infty \).
Proof
This follows from [20, Theorem 1]. □
If we restrict attention to Broyden’s method instead of the Broyden-like method, then a stronger result is available, namely Gay’s theorem on 2n-step q-quadratic convergence [12, Theorem 3.1]. For mixed linear–nonlinear systems with exact initialization, this result has recently been improved.
Theorem 2
Let \(n\in \mathbb {N}\), d ∈ [n]_{0} and J := [n] ∖ [d]. Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) satisfy \(F_{j}(u)={a_{j}^{T}} u + b_{j}\) for all j ∈ J, where \(a_{j}\in \mathbb {R}^{n}\) and \(b_{j}\in \mathbb {R}\) for all j ∈ J. Let F be differentiable in a neighborhood of \(\bar {u}\) with \(F(\bar {u})=0\) and let \({\lVert F^{\prime }(u)-F^{\prime }(\bar {u})\rVert }\leq L{\lVert u-\bar {u}\rVert }\) for all u from this neighborhood and a constant L > 0. Let \(F^{\prime }(\bar {u})\) be invertible. Then there are δ,ε > 0 and C > 0 such that for every (u^{0},B_{0}) with \({\lVert u^{0}-\bar {u}\rVert }\leq \delta \), \({\lVert B_{0}-F^{\prime }(\bar {u})\rVert }\leq \varepsilon \), and \({B_{0}^{j}} = {a_{j}^{T}}\) for all j ∈ J, Algorithm 1 with (σ_{k}) ≡ 1 either terminates with output \(u^{\ast }=\bar {u}\) or it generates (u^{k}) that satisfies (1) and:
In particular, (u^{k}) converges q-superlinearly and with r-order at least 2^{1/(2d)} to \(\bar {u}\), and all B_{k} are invertible.
Proof
See [18]. □
Convergence of the Broyden-like updates
If (u^{k}) and the Broyden-like updates converge, then \(F(\lim _{k\to \infty } u^{k})=0\).
Lemma 1
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be continuous at \(\bar {u}\). Let (u^{k}) and (B_{k}) be generated by Algorithm 1. Suppose that \(u^{k}\to \bar {u}\) and \(\sup _{k\geq 0}{\lVert B_{k+1}-B_{k}\rVert }<\infty \). Then \(F(\bar {u})=0\).
Proof
From \(\sup _{k\geq 0}{\lVert B_{k+1}-B_{k}\rVert }<\infty \), we infer \(\sup _{k\geq 0}\frac {{\lVert F(u^{k+1})\rVert }}{{\lVert s^{k}\rVert }}<\infty \). The convergence of (u^{k}) yields \(\lim _{k\to \infty }{\lVert s^{k}\rVert }=0\), so \(\lim _{k\to \infty }{\lVert F(u^{k})\rVert }=0\), whence \(F(\bar {u})=0\). □
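For reference, this inference rests on the rank-one form of the Broyden-like update (a sketch; we write the update in its standard form, which is consistent with how \(\lVert B_{k+1}-B_{k}\rVert\) is computed in the proof of Lemma 3):

$$ B_{k+1}-B_{k} = \sigma_{k}\,\frac{F(u^{k+1})\,(s^{k})^{T}}{\lVert s^{k}\rVert^{2}}, \qquad\text{hence}\qquad \lVert B_{k+1}-B_{k}\rVert = \sigma_{k}\,\frac{\lVert F(u^{k+1})\rVert}{\lVert s^{k}\rVert}, $$

since a rank-one matrix \(vw^{T}\) has spectral norm \(\lVert v\rVert \lVert w\rVert\); here \(B_{k}s^{k}=-F(u^{k})\) was used to simplify the secant correction. Boundedness of the quotient then follows provided the \(\sigma_{k}\) are bounded away from zero.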
If (u^{k}) and the Broyden-like matrices converge, then the convergence of (u^{k}) is q-superlinear.
Lemma 2
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be differentiable at \(\bar {u}\) with \(F^{\prime }(\bar {u})\) invertible. Let (u^{k}) and (B_{k}) be generated by Algorithm 1. Suppose that \(u^{k}\to \bar {u}\) and \({\lVert B_{k+1}-B_{k}\rVert }\to 0\) for \(k\to \infty \). Then (u^{k}) converges q-superlinearly to \(\bar {u}\).
Proof
Due to the invertibility of \(F^{\prime }(\bar {u})\) and \(u^{k}\to \bar {u}\), there is C > 0 such that:
for all k sufficiently large. Here, we also used that \(F(\bar {u})=0\) by Lemma 1. Subtracting \(\frac {C}{\sigma _{\min }}{\lVert B_{k+1}-B_{k}\rVert }{\lVert u^{k+1}-\bar {u}\rVert }\) and taking the limit yields the claim. □
Next we show that convergence of (u^{k}) with q-order at least γ > 1 implies convergence of \(({\lVert B_{k+1}-B_{k}\rVert })\) with r-order at least γ, cf. also [25, 9.1.8&9.2.7].
Lemma 3
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) and let (u^{k}) and (B_{k}) be generated by Algorithm 1. Suppose that (u^{k}) converges to some \(\bar {u}\) and that F satisfies \({\lVert F(u)-F(\bar {u})\rVert }\leq L{\lVert u-\bar {u}\rVert }\) for all u in a neighborhood of \(\bar {u}\) and some constant L > 0. Let γ > 1.

1.
If \(F(\bar {u})=0\) and there is C > 0 such that for all k sufficiently large:
$$ \lVert u^{k+1}-\bar{u}\rVert \leq C \lVert u^{k}-\bar{u}\rVert^{\gamma} $$(2)
is satisfied, then there exists \(\hat C>0\) such that:
$$ \lVert B_{k+1}-B_{k}\rVert \leq \hat C \lVert u^{k}-\bar{u}\rVert^{\gamma-1} $$(3)
for all sufficiently large k.

2.
If \(C,\hat C>0\) exist such that (2) and (3) are satisfied for all sufficiently large k, then we have \(F(\bar {u})=0\) and \(\lim _{k\to \infty }\lVert {B_{k+1}-B_{k}}\rVert ^{\frac {1}{p^{k}}}=0\) for all p ∈ [1,γ). In particular, \({\sum }_{k}\lVert {B_{k+1}-B_{k}}\rVert <\infty \) and (B_{k}) converges.
Proof

Proof of 1: Since (2) implies q-superlinear convergence of (u^{k}), we obtain from a well-known result of Dennis and Moré that \({\lVert u^{k}-\bar {u}\rVert }/{\lVert s^{k}\rVert }\to 1\) for \(k\to \infty \), cf. [7, Lemma 2.1]. The Lipschitz-type property of F at \(\bar {u}\), \(F(\bar {u})=0\) and (2) hence yield:
$$ \lVert B_{k+1}-B_{k}\rVert = \sigma_{k}\frac{\lVert F(u^{k+1})-F(\bar{u})\rVert}{\lVert s^{k}\rVert} \leq \hat C \lVert u^{k}-\bar{u}\rVert^{\gamma-1} $$for all sufficiently large k and a constant \(\hat C>0\), which proves (3).

Proof of 2: Lemma 1 yields \(F(\bar {u})=0\) due to (3). To prove the remaining claims it suffices to establish that
$$ \lim_{k\to\infty}\left( {\lVert B_{k+1}-B_{k}\rVert}^{\frac{1}{\gamma-1}}\right)^{\frac{1}{p^{k}}}=0 \qquad \forall p\in[1,\gamma). $$(4)
As (u^{k}) has q-order at least γ by (2), its r-order is also at least γ, cf. [25, 9.3.2], thus \(\lim _{k\to \infty }{\lVert u^{k}-\bar {u}\rVert }^{\frac {1}{p^{k}}}=0\) for all p ∈ [1,γ), so (4) follows from (3).
□
Remark 1
For Broyden’s method, it is unknown whether (2) holds for any γ > 1 if n > 1, cf. also [18]. For n = 1, it is known that (2) holds with γ equal to the golden mean [31]. In Theorem 6, we show that this result extends to arbitrary n provided F has n − 1 affine component functions and B_{0} is initialized exactly.
Uniform linear independence of dimension d
The following definition is the appropriate generalization of uniform linear independence for the purposes of this paper.
Definition 1
Let \(n\in \mathbb {N}\) and \(d\in \mathbb {N}\). The sequence of vectors \((s^{k})\subset \mathbb {R}^{n}\setminus \{0\}\) is called uniformly linearly independent of dimension d iff there exist constants \(m\in \mathbb {N}\) and ρ > 0 such that for every sufficiently large k the set:
contains d vectors \(s^{k_{1}}, \ldots , s^{k_{d}}\) such that all singular values of the matrix:
are larger than ρ.
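Definition 1 can be probed numerically. The sketch below assumes the standard windowed reading of the definition (cf. [5]): among \(s^{k+1},\ldots,s^{k+m}\), one looks for d steps whose matrix of normalized columns has smallest singular value above ρ. All function and variable names here are ours:

```python
import numpy as np
from itertools import combinations

def uli_of_dimension(steps, k, m, d, rho):
    """Windowed check of uniform linear independence of dimension d at index k:
    search s^{k+1}, ..., s^{k+m} for d normalized steps whose matrix has all
    singular values > rho (assumed reading of Definition 1)."""
    window = [s / np.linalg.norm(s) for s in steps[k + 1 : k + m + 1]]
    for cols in combinations(window, d):
        if np.linalg.svd(np.column_stack(cols), compute_uv=False).min() > rho:
            return True
    return False

# Steps alternating between two coordinate directions in R^3: uniformly
# linearly independent of dimension 2, but not of dimension 3.
e1, e2 = np.eye(3)[0], np.eye(3)[1]
steps = [e1, e2] * 10
```
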
Remark 2
The usual notion of uniform linear independence, cf. [5, (AS.4)], is recovered for d = n. If d is not specified, then it is understood that d = n.
Behavior of the Broydenlike method on mixed systems
To conveniently state results for mixed linear–nonlinear systems of equations, we will use the following assumption.
Assumption 1
Let \(n\in \mathbb {N}\), d ∈ [n]_{0} and J := [n] ∖ [d]. Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) satisfy \(F_{j}(u)={a_{j}^{T}} u + b_{j}\) for all j ∈ J, where \(a_{j}\in \mathbb {R}^{n}\) and \(b_{j}\in \mathbb {R}\) for all j ∈ J. Let \(B_{0}\in \mathbb {R}^{n\times n}\) satisfy \({B_{0}^{j}}={a_{j}^{T}}\) for all j ∈ J and suppose that B_{0} is invertible.
Remark 3
Due to \({B_{0}^{j}}={a_{j}^{T}}\) for all j ∈ J and the invertibility of B_{0}, Assumption 1 implies \(\dim ({\langle \{a_{j}\}_{j\in J}\rangle })=nd\), hence \(\dim ({\langle \{a_{j}\}_{j\in J}\rangle }^{\perp })=d\).
The first result establishes basic properties of Algorithm 1 under Assumption 1. It generalizes [18, Lemma 2.1].
Lemma 4
Let Assumption 1 hold and let (u^{k}), (s^{k}) and (B_{k}) be generated by Algorithm 1. Then we have for each j ∈ J and all k ≥ 1 the identities \({B_{k}^{j}} = {a_{j}^{T}}\), F_{j}(u^{k}) = 0, \({a_{j}^{T}} s^{k}=0\) and B_{k}a_{j} = B_{1}a_{j}.
Proof
The proof of [18, Lemma 2.1] applies without changes. □
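The identities of Lemma 4 can be checked numerically on a small hypothetical mixed system (n = 2, d = 1; the concrete F, starting point, and tolerances below are our own choices). With exact initialization, the affine row of B_{k}, the vanishing affine residual, and the orthogonality \(a_{j}^{T}s^{k}=0\) persist along the whole iteration:

```python
import numpy as np

a2 = np.array([1.0, 1.0])                     # affine row: F_2(u) = a_2^T u - 2
F = lambda u: np.array([u[0] ** 2 - 1.0, a2 @ u - 2.0])

u = np.array([1.4, 0.9])
B = np.array([[2.5, 0.3],                     # first row unrestricted
              [1.0, 1.0]])                    # exact initialization: B_0^2 = a_2^T

for k in range(12):
    if np.linalg.norm(F(u)) < 1e-10:
        break
    s = np.linalg.solve(B, -F(u))             # step s^k
    u = u + s
    B = B + np.outer(F(u), s) / (s @ s)       # Broyden update (sigma_k = 1)
    # Lemma 4 (up to rounding): affine row preserved, affine residual zero,
    # and steps orthogonal to a_2 from the second step onwards.
    assert np.allclose(B[1], a2, atol=1e-6)
    assert abs(F(u)[1]) < 1e-8
    if k >= 1:
        assert abs(a2 @ s) < 1e-8
```
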
Under the assumptions of Lemma 4, the sequence (s^{k}) necessarily violates uniform linear independence unless J = ∅.
Corollary 1
Any selection \(\{s^{k_{1}}, \ldots , s^{k_{d+1}}\}\) of d + 1 vectors from the sequence (s^{k})_{k≥ 1} of Lemma 4 is linearly dependent.
Proof
Lemma 4 yields \({a_{j}^{T}} s^{k}=0\) for all j ∈ J and all k ≥ 1, thus \(s^{k}\in {\langle \{a_{j}\}_{j\in J}\rangle }^{\perp }\) for all k ≥ 1. The claim follows from \(\dim ({\langle \{a_{j}\}_{j\in J}\rangle }^{\perp })=d\). □
To conveniently state the next result, we introduce some notation.
Definition 2
Let Assumption 1 hold. We set \({\mathcal {A}}:={\langle \{a_{j}\}_{j\in J}\rangle }\) and \({\mathcal {S}}:={\mathcal {A}}^{\perp }\). Furthermore, we let \(\{{\mathfrak {s}}^{i}\}_{i\in [d]}\) be an orthonormal basis of \({\mathcal {S}}\) and we denote \(S:=\begin {pmatrix} {\mathfrak {s}}^{1} & {\ldots } & {\mathfrak {s}}^{d} \end {pmatrix}\in \mathbb {R}^{n\times d}\). For any matrix \(B\in \mathbb {R}^{n\times n}\), we denote:
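In computations, an orthonormal basis S of \({\mathcal {S}}={\mathcal {A}}^{\perp }\) can be obtained from the right singular vectors of the matrix whose rows are the \(a_{j}^{T}\) (a sketch with hypothetical data for n = 3 and n − d = 1):

```python
import numpy as np

# Rows a_j^T of the affine part (hypothetical example): A = span{(1, 1, 0)}.
A_rows = np.array([[1.0, 1.0, 0.0]])

# Right singular vectors beyond rank(A_rows) span A^perp = S.
_, sv, Vt = np.linalg.svd(A_rows)
rank = int(np.sum(sv > 1e-12))
S = Vt[rank:].T     # columns s^1, ..., s^d: orthonormal basis of S, here d = 2
```
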
We show that under Assumption 1, the iterates (u^{k})_{k≥ 1} obtained by applying Algorithm 1 to F can also be generated by applying it to a mapping \(G:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\). The following result extends [18, Theorem 2.3].
Theorem 3
Let Assumption 1 hold and let (u^{k}), (B_{k}) and (σ_{k}) be generated by Algorithm 1, where each B_{k} is assumed to be invertible. Define:
as well as:
Then the application of Algorithm 1 to G with initial guess (w^{0},C_{0}) and updating sequence (τ_{k}) generates sequences (w^{k}) and (C_{k}) with the following properties:

1.
Each C_{k} is invertible and for all k ≥ 1, there hold:
$$ u^{k} = u^{1} + S w^{k-1}, \qquad \widetilde F(u^{k}) = G(w^{k-1}) \qquad\text{and}\qquad C_{k-1} = \widetilde B_{k} S. $$(5) 
2.
The iterates (u^{k}) converge to \(\bar {u}\in \mathbb {R}^{n}\) if and only if there is \(\bar w\in \mathbb {R}^{d}\) such that (w^{k}) converges to \(\bar w\). If (u^{k}) and (w^{k}) converge to \(\bar {u}\) and \(\bar w\), respectively, then we have for all k ≥ 1:
$$ \bar u = u^{1} + S\bar w \qquad\text{ and }\qquad \lVert u^{k}-\bar{u}\rVert = \lVert w^{k-1}-\bar w\rVert. $$(6) 
3.
The matrices (B_{k}) converge to \(B\in \mathbb {R}^{n\times n}\) if and only if there is \(C\in \mathbb {R}^{d\times d}\) such that (C_{k}) converges to C. If (B_{k}) and (C_{k}) converge to B and C, respectively, then we have for all k ≥ 1:
$$ C = \widetilde B S \qquad\text{ and }\qquad \lVert C_{k}-C\rVert = \lVert B_{k}-B\rVert. $$
Proof

Proof of 1: The proof of [18, Theorem 2.3], which is for (σ_{k}) ≡ 1, can be used almost verbatim.

Proof of 2: We will use several times that \({\lVert S v\rVert }={\lVert v\rVert }\) for all \(v\in \mathbb {R}^{d}\) because the columns of S are orthonormal. Let (u^{k}) converge to \(\bar {u}\). From (5), it follows that u^{n} − u^{m} = S(w^{n− 1} − w^{m− 1}) for all n,m ≥ 1, which implies that (w^{k}) is a Cauchy sequence, hence convergent. Denoting the limit by \(\bar w\), we deduce from (5) that \(\bar {u} = u^{1} + S\bar w\), which in turn yields \({\lVert u^{k}-\bar {u}\rVert } = {\lVert S(w^{k-1}-\bar w)\rVert }\), hence \({\lVert u^{k}-\bar {u}\rVert } = {\lVert w^{k-1}-\bar w\rVert }\). If (w^{k}) converges to \(\bar w\), then we can argue similarly.

Proof of 3: Let (B_{k}) converge to B. From (5), it follows that \({\lVert C_{n-1}-C_{m-1}\rVert }\leq {\lVert \widetilde B_{n}-\widetilde B_{m}\rVert }={\lVert B_{n}-B_{m}\rVert }\) for all n,m ≥ 1, where we used that \({\lVert S\rVert }=1\) and that \({B_{n}^{j}} - {B_{m}^{j}} = 0\) for all j ∈ J due to Lemma 4. This implies that (C_{k}) is a Cauchy sequence, hence convergent. Denoting the limit by C, we deduce from (5) that \(C = \widetilde B S\). Let now (C_{k}) converge to C. We denote by \(A\in \mathbb {R}^{n\times (n-d)}\) the matrix:
$$ A := \begin{pmatrix} \mathfrak{a}^{1} & {\ldots} & \mathfrak{a}^{n-d} \end{pmatrix}, $$where \(\{\mathfrak {a}^{i}\}_{i\in [n-d]}\) is an orthonormal basis of \({\mathcal {A}}\). Furthermore, let \(\hat S\in \mathbb {R}^{n\times n}\) be given by \(\hat S:=\begin {pmatrix}S & A \end {pmatrix}\). Since \({B_{k}^{j}} S = {a_{j}^{T}} S = 0\) and B_{k}A = B_{1}A for all j ∈ J and all k ≥ 1 by Lemma 4, we infer that:
$$ B_{k} \hat S = \left( \begin{array}{c|c} \begin{matrix} \widetilde B_{k} S \\ 0 \end{matrix} & B_{k} A \end{array} \right) = \left( \begin{array}{c|c} \begin{matrix} C_{k-1} \\ 0 \end{matrix} & B_{1} A \end{array} \right), $$(7)where we also used the identity \(\widetilde B_{k} S = C_{k-1}\) from (5). Since \(\hat S \hat S^{T} = I\), it follows that:
$$ B_{k} = \left( \begin{array}{c|c} \begin{matrix} C_{k-1} \\ 0 \end{matrix} & B_{1} A \end{array} \right) \hat S^{T} $$for all k ≥ 1. Since (C_{k}) converges, we see that (B_{k}) converges, too. Denoting the limit of (B_{k}) by B, we conclude from (5) that \(C = \widetilde B S\) and from (7) that \({\lVert C_{k-1}-C\rVert }={\lVert (B_{k}-B)\hat S\rVert }={\lVert B_{k}-B\rVert }\), where we used that \(\hat S\) is orthogonal.
□
Remark 4
Theorem 3 does not require invertibility of \(F^{\prime }(\bar {u})\), which allows us to derive results for singular \(F^{\prime }(\bar {u})\), too, cf. Theorems 6 and 7.
Convergence of the Broyden-like matrices
The general result
From [22, Theorem 5.7], we recall the following sufficient condition for convergence of (B_{k}) to \(F^{\prime }(\bar {u})\).
Theorem 4
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be strictly differentiable at \(\bar {u}\). Let (u^{k}), (s^{k}) and (B_{k}) be generated by Algorithm 1. Let (u^{k}) converge to \(\bar {u}\) and let (s^{k}) be uniformly linearly independent. Then \(B:=\lim _{k\to \infty } B_{k}\) exists and satisfies \(B=F^{\prime }(\bar {u})\). Moreover, we have \(F(\bar {u})=0\). If, in addition, \(F^{\prime }(\bar {u})\) is invertible, then (u^{k}) converges q-superlinearly.
Proof
There are three differences to [22, Theorem 5.7]. The first is that we replaced continuous differentiability of F by strict differentiability. It is easy to verify that the proof of [22, Theorem 5.7] still holds under this weaker assumption. The second and third differences are the statements \(F(\bar {u})=0\) and the q-superlinear convergence of (u^{k}), which we added. They follow from Lemma 1 and Lemma 2, respectively. □
Corollary 1 shows that for mixed linear–nonlinear systems with exact initialization, the uniform linear independence required in Theorem 4 does not hold. The following result extends Theorem 4 to mixed systems. We recall that the matrix S is introduced in Definition 2.
Theorem 5
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\). Let Assumption 1 hold and let (u^{k}), (s^{k}) and (B_{k}) be generated by Algorithm 1, where each B_{k} is assumed to be invertible. Let (u^{k}) converge to \(\bar {u}\) and suppose that \(w\mapsto \widetilde F(\bar {u}+Sw)\) is strictly differentiable at w = 0. Let (s^{k}) be uniformly linearly independent of dimension d. Then \(B:=\lim _{k\to \infty } B_{k}\) exists and satisfies \(\widetilde B S = \widetilde F^{\prime }(\bar {u}) S\), Ba_{j} = B_{1}a_{j} and \(B^{j} = {a_{j}^{T}}=F_{j}^{\prime }(\bar {u})\) for all j ∈ J. Moreover, we have \(F(\bar {u})=0\). If \(\widetilde F^{\prime }(\bar {u}) S\) is invertible, then (u^{k}) converges q-superlinearly. If F is strictly differentiable at \(\bar {u}\), then \(E:=\lim _{k\to \infty } E_{k}\) exists and satisfies E = E_{1}(I − SS^{T}).
Proof
For d = n, we have J = ∅, \(\widetilde E=E\) and \(S\in \mathbb {R}^{n\times n}\) is orthogonal, so the result is equivalent to Theorem 4 and there is nothing to prove. For d < n, we begin by noting that Lemma 4 yields \({B_{k}^{j}} = {a_{j}^{T}}\) and B_{k}a_{j} = B_{1}a_{j} for all j ∈ J and all k ≥ 1, which carries over to \(\lim _{k\to \infty } B_{k}\) if it exists. Next we show the existence of \(\lim _{k\to \infty } B_{k}\). By applying Theorem 3, we obtain sequences (C_{k}) and (w^{k}) and a point \(\bar w\) as stated in that theorem. Part 3 of that theorem shows that for convergence of (B_{k}), it suffices to demonstrate the convergence of (C_{k}). Denoting \({s_{w}^{k}}:=w^{k+1}-w^{k}\) we now prove that \(({s_{w}^{k}})\subset \mathbb {R}^{d}\setminus \{0\}\) is uniformly linearly independent (of dimension d). Indeed, using (5), we have:
This implies that the matrix \(\hat S^{k}\) appearing in the definition of uniform linear independence of dimension d of (s^{k}) and the matrix appearing in the definition of uniform linear independence of \(({s_{w}^{k}})\) have identical singular values, so the uniform linear independence of dimension d of (s^{k}) implies the uniform linear independence of \(({s_{w}^{k}})\). The uniform linear independence of \(({s_{w}^{k}})\) and the results of Theorem 3 allow us to apply Theorem 4 to G, (w^{k}), \(({s_{w}^{k}})\), and (C_{k}). This yields convergence of (C_{k}) to \(G^{\prime }(\bar w)=\widetilde F^{\prime }(\bar {u}) S\), which by means of part 3 of Theorem 3 implies \(\widetilde BS = \widetilde F^{\prime }(\bar {u}) S\). Since (B_{k}) converges, Lemma 1 supplies \(F(\bar {u})=0\) and Theorem 4 implies q-superlinear convergence of (w^{k}), from which the q-superlinear convergence of (u^{k}) follows by use of (6). If F is strictly differentiable at \(\bar {u}\), then the claims for B imply that E exists and satisfies \(\widetilde E S = 0\) as well as E^{j} = 0 and Ea_{j} = E_{1}a_{j} for all j ∈ J. It is easy to see that these conditions are equivalent to E = E_{1}(I − SS^{T}). □
Remark 5

1.
If F is strictly differentiable at \(\bar {u}\), then \(\widetilde F(\bar {u}+Sw)\) is strictly differentiable at w = 0. If \(F^{\prime }(\bar {u})\) is invertible, then \(\widetilde F^{\prime }(\bar {u})S\) is invertible.

2.
To illustrate the conditions obtained for B, let us consider the case that \({\mathcal {S}}=\{(s_{1},s_{2},\ldots ,s_{n})^{T}\in \mathbb {R}^{n}: s_{j}=0 \forall j>d\}\). In this case, we can use for S the first d columns of the n × n identity matrix. Thus, \(\widetilde {B} S\) consists of the entries B^{i,j}, i,j ∈ [d], and \(\widetilde B S = \widetilde F^{\prime }(\bar {u})S\) states that the first d × d block of B agrees with the respective block of \(F^{\prime }(\bar {u})\). From Ba_{j} = B_{1}a_{j} for all j ∈ J, we obtain in addition that the entries B^{i,j}, i ∈ [d], j ∈ [n] ∖ [d], are the same as in B_{1}. If F is strictly differentiable at \(\bar {u}\), then this implies that B^{i,j}, i ∈ [d], j ∈ [n] ∖ [d], cannot equal the respective entries of \(F^{\prime }(\bar {u})\) if the rank of \((E_{0}^{i,j})_{i\in [d],j\in [n]\setminus [d]}\) is larger than one.
The special case d = 1
Sufficient conditions for uniform linear independence of \((s^{k})\subset \mathbb {R}^{n}\) are unknown for Broyden’s method if n > 1 (hence also for the more general Algorithm 1). However, any sequence \((s^{k})\subset \mathbb {R}^{n}\setminus \{0\}\) is uniformly linearly independent of dimension 1, hence Theorem 5 implies the following result.
Corollary 2
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\). Let Assumption 1 hold for d = 1 and let (u^{k}), (s^{k}) and (B_{k}) be generated by Algorithm 1, where each B_{k} is assumed to be invertible. Let (u^{k}) converge to \(\bar {u}\) and suppose that \(t\mapsto F_{1}(\bar {u}+t\bar s)\) is strictly differentiable at t = 0, where \(\bar s:=S\). Then \(B:=\lim _{k\to \infty } B_{k}\) exists and satisfies \(B^{1} \bar s = F_{1}^{\prime }(\bar {u})(\bar s)\), \(B^{1} a_{j} = {B_{1}^{1}} a_{j}\) and \(B^{j} = {a_{j}^{T}}=F_{j}^{\prime }(\bar {u})\) for all j > 1. Moreover, we have \(F(\bar {u})=0\). If \(F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\), then (u^{k}) converges q-superlinearly. If F_{1} is strictly differentiable at \(\bar {u}\), then \(E:=\lim _{k\to \infty } E_{k}\) exists and satisfies \(E^{1} = {E_{1}^{1}} (I-\bar s \bar s^{T})\) and E^{j} = 0 for all j > 1; in particular, (B_{k}) converges to \(F^{\prime }(\bar {u})\) iff \({E_{1}^{1}} a_{j} = 0\) for all j > 1.
Remark 6
Under the assumptions of Corollary 2, each B_{k} agrees with \(F^{\prime }(\bar {u})\) in all rows except the first and \(B:=\lim _{k\to \infty }B_{k}\) exists, yet B will usually be different from \(F^{\prime }(\bar {u})\) (provided \(F^{\prime }(\bar {u})\) exists). If, say, \(\bar s\) is the first canonical unit vector, then \(E^{1} = \begin {pmatrix}0 & E_{1}^{1,2} & {\ldots } & E_{1}^{1,n}\end {pmatrix}\); hence, E = 0 holds iff \(B_{1}^{1,j}=\left [F_{1}^{\prime }(\bar {u})\right ]_{j}\) for all j > 1, where \([F_{1}^{\prime }(\bar {u})]_{j}\) indicates the j-th component of the vector \(F_{1}^{\prime }(\bar {u})\). This also shows that if \({\lVert E_{0}\rVert }\) is large, then \({\lVert E\rVert }\) will usually be large, too. The numerical results in Section 5 and our numerical experience from other work confirm that (B_{k}) will frequently not converge to \(F^{\prime }(\bar {u})\) and indicate that this also holds in more nonlinear settings.
We now focus on Broyden’s method, where (σ_{k}) ≡ 1. In fact, it is enough if σ_{k} = 1 for all k sufficiently large. For this case, we can strengthen the findings of Corollary 2 in several ways, for instance by providing orders of convergence for (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\). These results are derived by exploiting the fact that if σ_{k} = 1 for some \(k\in \mathbb {N}\), then s^{k+1} and thus u^{k+2} can also be generated by the one-dimensional secant method, cf. the proof of part 1 of Theorem 6. Correspondingly, let us first argue for the one-dimensional case.
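The reduction to the one-dimensional secant method can be illustrated directly (a sketch; the test function and starting values are our own). According to the cited n = 1 result [31], the empirical q-order of the secant errors should approach the golden mean \(\varphi =(1+\sqrt 5)/2\approx 1.618\):

```python
import numpy as np

def secant(g, w0, w1, tol=1e-14, max_iter=50):
    """Classical one-dimensional secant method; this is what Broyden's method
    reduces to on the subspace when d = 1 and sigma_k = 1 eventually."""
    ws = [float(w0), float(w1)]
    while abs(g(ws[-1])) > tol and len(ws) < max_iter:
        w_prev, w = ws[-2], ws[-1]
        ws.append(w - g(w) * (w - w_prev) / (g(w) - g(w_prev)))
    return ws

g = lambda w: w ** 3 - 2.0                    # simple root at 2**(1/3)
ws = secant(g, 1.0, 1.5)
errs = [abs(w - 2.0 ** (1.0 / 3.0)) for w in ws]
# Empirical q-order estimate log e_{k+1} / log e_k, taken once errors are small.
orders = [np.log(errs[k + 1]) / np.log(errs[k])
          for k in range(len(errs) - 1) if 0 < errs[k + 1] and errs[k] < 1e-2]
```
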
Lemma 5
Let \(G:\mathbb {R}\rightarrow \mathbb {R}\). Let (w^{k}), \(({s_{w}^{k}})\) and (C_{k}) be generated by Algorithm 1 applied to G, using an update sequence (τ_{k}) that satisfies:
Let (w^{k}) converge to \(\bar w\) with \(G(\bar w)=0\). For k ≥ 0, respectively, k ≥ 1 define:
Then the following statements hold:

1.
Let G be differentiable at \(\bar w\) with \(G^{\prime }(\bar w)\neq 0\). Let \(\varphi :=\frac {1+\sqrt 5}{2}\) and suppose that:
$$ \lim_{k\to\infty}\frac{{\lvert w^{k+1}-\bar w\rvert}}{{\lvert w^{k}-\bar w\rvert}^{\varphi}} $$(8)exists. Then we have:
$$ \lim_{k\to\infty}\frac{{Q_{k}^{G}}}{q_{k-2}^{G}} = 1. $$If, in addition, \(\lim _{k\to \infty }\tau _{k}=1\) is satisfied, then there holds:
$$ \lim_{k\to\infty}\frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}^{\varphi}}={\lvert G^{\prime}(\bar w)\rvert}^{1-\varphi}. $$ 
2.
Let \(m_{0}\in \mathbb {N}\), κ ∈ (0,1) and \(\hat \kappa >0\). Let G be m_{0} + 1 times differentiable at \(\bar w\). Let \(G^{(m)}(\bar w)=0\) for all m ∈ [m_{0}] and \(G^{(m_{0}+1)}(\bar w)\neq 0\). Suppose that:
$$ \lim_{k\to\infty} {q_{k}^{G}} = \kappa \qquad\text{ and }\qquad \lim_{k\to\infty}\frac{{\lvert {s_{w}^{k}}\rvert}}{{\lvert w^{k}\bar w\rvert}} = \hat\kappa $$are satisfied. Then we have:
$$ \lim_{k\to\infty}{Q_{k}^{G}} = \kappa^{m_{0}}. $$
Proof

Proof of 1: Using \(G(\bar w)=0,\) we find:
$$ \begin{array}{llll} \frac{\tau_{k-1}}{\tau_{k}}\cdot\frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}} & = \frac{{\lvert G(w^{k+1})\rvert}{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}{\lvert G(w^{k})\rvert}}\\ & = \frac{{\lvert G^{\prime}(\bar{w})(w^{k+1}-\bar{w})+o({\lvert w^{k+1}-\bar{w}\rvert})\rvert}{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}{\lvert G^{\prime}(\bar{w})(w^{k}-\bar{w})+o({\lvert w^{k}-\bar{w}\rvert})\rvert}} \end{array} $$for all k ≥ 1. As (8) implies that (w^{k}) converges q-superlinearly, a well-known lemma of Dennis and Moré, cf. [7, Lemma 2.1], yields \(\lim _{k\to \infty }\frac {{\lvert {s_{w}^{k}}\rvert }}{{\lvert w^{k}-\bar {w}\rvert }}=1\). Therefore, we have:
$$ \begin{array}{llll} \lim_{k\to\infty}\frac{{Q_{k}^{G}}}{q_{k-2}^{G}} & = \lim_{k\to\infty} \frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}}\frac{{\lvert w^{k-2}-\bar{w}\rvert}}{{\lvert w^{k-1}-\bar{w}\rvert}}\\ & = \lim_{k\to\infty} \frac{{\lvert G^{\prime}(\bar{w})\rvert}{\lvert w^{k+1}-\bar{w}\rvert}{\lvert w^{k-1}-\bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}{\lvert G^{\prime}(\bar{w})\rvert}{\lvert w^{k}-\bar{w}\rvert}} \frac{{\lvert w^{k-2}-\bar{w}\rvert}}{{\lvert w^{k-1}-\bar{w}\rvert}} \\ & = \lim_{k\to\infty} \frac{{\lvert w^{k+1}-\bar{w}\rvert}{\lvert w^{k-2}-\bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}^{2}}, \end{array} $$provided the latter limit exists. By applying (8) multiple times, we obtain:
$$ \lim_{k\to\infty} \frac{{\lvert w^{k+1} - \bar{w}\rvert}{\lvert w^{k-2} - \bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}^{2}} = \lim_{k\to\infty} \mu^{\varphi-1-\frac{1}{\varphi}}{\lvert w^{k-1} - \bar{w}\rvert}^{\varphi^{2}-2\varphi+\frac{1}{\varphi}} = 1, $$where \(\mu \in [0,\infty )\) denotes the limit from (8) and where we used the identities \(\varphi ^{2}-2\varphi +\frac {1}{\varphi } = -\varphi +1+\frac {1}{\varphi } = 0\) and \(\varphi -1-\frac {1}{\varphi }=0\) that follow from φ^{2} − φ − 1 = 0. Similar considerations show that:
$$ \lim_{k\to\infty}\frac{{\lvert C_{k+1}-C_{k}\rvert}}{{\lvert C_{k}-C_{k-1}\rvert}^{\varphi}}= \bar \mu \lim_{k\to\infty} \frac{{\lvert w^{k+1}-\bar{w}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}}\cdot \frac{{\lvert w^{k-1}-\bar{w}\rvert}^{\varphi}}{{\lvert w^{k}-\bar{w}\rvert}^{\varphi}} = \bar \mu $$for \(\bar \mu :={\lvert G^{\prime }(\bar {w})\rvert }^{1-\varphi }\), where we used (8) to obtain the final equality.

Proof of 2: Let us prove the claim for m_{0} = 1; it is readily generalized to arbitrary m_{0} ≥ 1. Taylor expansion around \(\bar {w}\) together with \(G(\bar {w})=0\) implies
$$ \begin{array}{llll} & \lim_{k\to\infty} \frac{{\lvert G(w^{k+1})\rvert}}{{\lvert G(w^{k})\rvert}}\\ & \enspace = \lim_{k\to\infty}\frac{{\lvert G^{\prime}(\bar{w})(w^{k+1}-\bar{w})+\frac{1}{2} G^{\prime\prime}(\bar{w})(w^{k+1}-\bar{w})^{2}+o({\lvert w^{k+1}-\bar{w}\rvert}^{2})\rvert}}{{\lvert G^{\prime}(\bar{w})(w^{k}-\bar{w})+\frac{1}{2} G^{\prime\prime}(\bar{w})(w^{k}-\bar{w})^{2}+o({\lvert w^{k}-\bar{w}\rvert}^{2})\rvert}}\\ & \enspace = \lim_{k\to\infty}\frac{{\lvert G^{\prime\prime}(\bar{w})\rvert}}{{\lvert G^{\prime\prime}(\bar{w})\rvert}}\cdot\frac{{\lvert w^{k+1}-\bar{w}\rvert}^{2}}{{\lvert w^{k}-\bar{w}\rvert}^{2}} = \kappa^{2} = \kappa^{m_{0}+1}. \end{array} $$By assumption, we have \(\hat \kappa =\lim _{k\to \infty }\frac {{\lvert {s_{w}^{k}}\rvert }}{{\lvert w^{k}-\bar {w}\rvert }}>0\), hence:
$$ \lim_{k\to\infty}\frac{{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}} = \lim_{k\to\infty}\frac{\hat\kappa{\lvert w^{k-1}-\bar{w}\rvert}}{\hat\kappa{\lvert w^{k}-\bar{w}\rvert}} = \frac{1}{\kappa}. $$By definition, there holds for all k ≥ 1:
$$ \frac{\tau_{k-1}}{\tau_{k}}\cdot {Q_{k}^{G}} = \frac{{\lvert G(w^{k+1})\rvert}}{{\lvert G(w^{k})\rvert}} \cdot \frac{{\lvert s_{w}^{k-1}\rvert}}{{\lvert {s_{w}^{k}}\rvert}}. $$Taking the limit for \(k\to \infty \) yields the claim.
□
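The golden-ratio order appearing in part 1 is easy to observe numerically. The following sketch, our own illustration in the spirit of the paper's high-precision experiments, runs the one-dimensional secant method on a hypothetical test function G with G(0) = 0 and G′(0) ≠ 0, and estimates the q-order from consecutive logarithmic errors:

```python
from decimal import Decimal, getcontext
import math

getcontext().prec = 3000  # high-precision arithmetic, mimicking the paper's vpa setup

def G(w):
    # hypothetical test function: G(0) = 0, G'(0) = 1 != 0, G''(0) = 2 != 0
    return w + w * w

# one-dimensional secant method (Broyden's method for n = 1 with sigma_k = 1)
w0, w1 = Decimal("0.7"), Decimal("0.6")
errs = []
for _ in range(20):
    g0, g1 = G(w0), G(w1)
    w0, w1 = w1, w1 - g1 * (w1 - w0) / (g1 - g0)
    errs.append(abs(w1))  # the root is 0, so |w^k| is the error

# the quotient ln|e_{k+1}| / ln|e_k| tends to the q-order of (e_k)
orders = [float(errs[k + 1].ln() / errs[k].ln()) for k in range(12, 17)]
phi = (1 + math.sqrt(5)) / 2  # golden ratio, approx. 1.618
print(orders)
```

The estimated orders cluster around φ, in line with the classical secant-method theory invoked in the proof.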
We now provide a detailed description of the convergence behavior of Algorithm 1 with σ_{k} = 1 for all large k and d = 1, where F has n − 1 affine component functions F_{2},…,F_{n}. We first present a result for nonlinear F_{1} and then deal with affine F_{1}.
Theorem 6
Let Assumption 1 hold for d = 1 and let (u^{k}), (s^{k}) and (B_{k}) be generated by Algorithm 1, with each B_{k} invertible. Suppose that σ_{k} = 1 for all k large enough and that (u^{k}) converges to some \(\bar {u}\). Set \(\bar s:=S\) and define:
for all k ≥ 0, respectively, k ≥ 1. Then the following statements hold:

1.
Let \(t\mapsto F_{1}(\bar {u}+t\bar s)\) be twice differentiable near t = 0 with \(t\mapsto F_{1}^{\prime \prime }(\bar {u} + t \bar s)(\bar s,\bar s)\) continuous at t = 0 and \(F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\). Then we have:
$$ \limsup_{k\to\infty}\frac{{\lVert u^{k+1}-\bar{u}\rVert}}{{\lVert u^{k}-\bar{u}\rVert}^{\varphi}} \leq \left\lvert\frac{F_{1}^{\prime\prime}(\bar{u})(\bar s,\bar s)}{2 F_{1}^{\prime}(\bar{u})(\bar s)}\right\rvert^{\frac{1}{\varphi}}, $$(9)where \(\varphi :=\frac {1+\sqrt 5}{2}\). For all p ∈ [1,φ), there holds:
$$ \lim_{k\to\infty}{\lVert B_{k+1}-B_{k}\rVert}^{\frac{1}{p^{k}}}=0. $$(10)If, in addition, \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)\neq 0\), then (9) holds with equality and \(\limsup \) replaced by \(\lim \), and we have:
$$ \lim_{k\to\infty}\frac{{\lVert B_{k+1}-B_{k}\rVert}}{{\lVert B_{k}-B_{k-1}\rVert}^{\varphi}}=\left\lvert F_{1}^{\prime}(\bar{u})(\bar s)\right\rvert^{1-\varphi}\qquad\text{and}\qquad \lim_{k\to\infty}\frac{Q_{k}}{q_{k-2}} = 1. $$(11) 
2.
Let \(m_{0}\in \mathbb {N}\) and denote by κ ∈ (0,1) the unique root of the polynomial \(x^{m_{0}+1}+x^{m_{0}}-1\) in (0,1). Let \(t\mapsto F_{1}(\bar {u}+t\bar s)\) be m_{0} + 1 times differentiable near t = 0 with its (m_{0} + 1)th derivative continuous at t = 0. If \(F_{1}^{(m)}(\bar {u})(\bar s,\ldots ,\bar s)=0\) for all m ∈ [m_{0}] and \(F_{1}^{(m_{0}+1)}(\bar {u})(\bar s,\ldots ,\bar s)\neq 0\), then:
$$ \lim_{k\to\infty} q_{k} = \kappa \qquad\text{ and }\qquad \lim_{k\to\infty} Q_{k} = \kappa^{m_{0}}. $$
Proof

Proof of 1: From Theorem 3, we obtain \(G:\mathbb {R}\rightarrow \mathbb {R}\), (w^{k}), (C_{k}), and \(\bar w\) as stated in that theorem. We let \({s_{w}^{k}}:=w^{k+1}-w^{k}\) for all k ≥ 0. Due to \(C_{k} {s_{w}^{k}} ({s_{w}^{k}})^{T} / {\lvert {s_{w}^{k}}\rvert }^{2} = C_{k}\), we have \(C_{k+1}=(G(w^{k+1})-G(w^{k}))/{s_{w}^{k}}\) if σ_{k} = 1, and thus Algorithm 1 for G agrees with the one-dimensional secant method for all sufficiently large k. As \((G(w^{k+1})-G(w^{k}))/{s_{w}^{k}} \to G^{\prime }(\bar w)\) for \(k\to \infty \), we obtain the convergence of (C_{k}), thus \(G(\bar w)=0\) by Lemma 1. Furthermore, there holds \(G^{\prime }(\bar w) = \widetilde F^{\prime }(\bar {u}) S = F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\). Since (w^{k}) converges to \(\bar w\) with \(G(\bar w)=0\) and \(G^{\prime }(\bar w)\neq 0\), classical results for the secant method, cf. [31, (6)], yield that if \(G^{\prime \prime }(\bar w)\neq 0\), then:
$$ \lim_{k\to\infty}\frac{{\lvert w^{k}-\bar w\rvert}}{{\lvert w^{k-1}-\bar w\rvert}^{\varphi}} =\left\lvert\frac{G^{\prime\prime}(\bar w)}{2 G^{\prime}(\bar w)}\right\rvert^{\frac{1}{\varphi}}, $$which by use of (5) is readily transformed into (9) with equality and \(\limsup \) replaced by \(\lim \). The general estimate (9) follows similarly. The r-order (10) follows from Lemma 3 using that \(F(\bar {u})=0\) due to Corollary 2. Since \({Q_{k}^{G}}=Q_{k+1}\) and \(q_{k-2}^{G} = q_{k-1}\) by (5) and (6), Lemma 5 1 yields (11).

Proof of 2: We argue only for m_{0} = 1. It follows from Corollary 2 that \(F(\bar {u})=0\). It is a standard result for the onedimensional secant method, cf. [10, Section 2.2.2], that \(\lim _{k\to \infty }{q_{k}^{G}} = \kappa \), hence \(\lim _{k\to \infty }q_{k} = \kappa \), too. The claim on (Q_{k}) follows via \(({Q_{k}^{G}})\) from Lemma 5 2 if we can show that there is \(\hat \kappa >0\) such that:
$$ \lim_{k\to\infty}\frac{{\lvert {s_{w}^{k}}\rvert}}{{\lvert w^{k}-\bar{w}\rvert}} = \hat\kappa. $$Using \(G^{\prime }(\bar w)=0\), \(G^{\prime \prime }(\bar w)\neq 0\), and \(\lim _{k\to \infty } {q_{k}^{G}}=\kappa \), elementary considerations show that there is an index k_{0} such that \((w^{k}-\bar w)_{k\geq k_{0}}\) converges to zero without changing signs. For sufficiently large k, we thus have:
$$ \left\lvert {s_{w}^{k}}\right\rvert = \left\lvert (w^{k+1} - \bar w) - (w^{k} - \bar w)\right\rvert = (1-{q_{k}^{G}})\left\lvert w^{k}-\bar w\right\rvert, $$hence, the desired limit exists with \(\hat \kappa =1-\kappa >0\).
□
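To make the reduction to the one-dimensional secant method tangible, here is a small floating-point sketch; this is our own illustration with a hypothetical mapping, not one of the paper's experiments. F has one nonlinear and one affine component, row 2 of B_0 agrees with the affine row of the Jacobian, and Broyden's method ((σ_k) ≡ 1) is run; the iterates then exhibit the subspace property and converge rapidly:

```python
def F(u):
    # hypothetical test mapping: F1 is nonlinear, F2 is affine with a2 = (1, -1)
    x, y = u
    return [x + y + x * y, x - y]

def solve2(B, r):
    # solve the 2x2 system B s = r by Cramer's rule
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(r[0] * B[1][1] - r[1] * B[0][1]) / det,
            (B[0][0] * r[1] - B[1][0] * r[0]) / det]

u = [0.1, -0.05]                  # F2(u^0) != 0 is allowed
B = [[1.1, 0.9], [1.0, -1.0]]     # row 2 of B0 agrees with F2' = (1, -1)
in_subspace = []
for k in range(6):
    s = solve2(B, [-f for f in F(u)])            # s^k = -B_k^{-1} F(u^k)
    unew = [u[0] + s[0], u[1] + s[1]]
    y = [fn - fo for fn, fo in zip(F(unew), F(u))]
    n2 = s[0] ** 2 + s[1] ** 2
    Bs = [B[i][0] * s[0] + B[i][1] * s[1] for i in range(2)]
    # Broyden update (sigma_k = 1)
    B = [[B[i][j] + (y[i] - Bs[i]) * s[j] / n2 for j in range(2)]
         for i in range(2)]
    u = unew
    if k >= 1:
        # from k = 1 on: F2 vanishes at the iterates and s^k is parallel to (1, 1)
        in_subspace.append(abs(F(u)[1]) < 1e-12 and abs(s[0] - s[1]) < 1e-10)

print(u, all(in_subspace))
```

After the first step the affine equation is satisfied exactly (up to rounding) and all further steps stay in the one-dimensional subspace, exactly as Lemma 4 and Theorem 3 predict.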
Remark 7

1.
If \(F^{\prime }(\bar {u})\) is invertible, then \(F_{1}^{\prime }(\bar {u})(\bar s)\neq 0\). Indeed, since \(\bar s\in {\mathcal {S}}\) and since \(F_{j}^{\prime }(\bar {u})={a_{j}^{T}}\in {\mathcal {A}} = {\mathcal {S}}^{\perp }\) for all j > 1, we have \(F_{j}^{\prime }(\bar {u})(\bar s)=0\) for all j > 1; hence, \(F_{1}^{\prime }(\bar {u})(\bar s)=0\) would imply \(F^{\prime }(\bar {u})(\bar s)=0\).

2.
(9) and (10) show that (u^{k}), respectively, \(({\lVert B_{k+1}-B_{k}\rVert })\) have q-order, respectively, r-order no less than φ. If \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)\neq 0\), then the additional part of 1 implies that both (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\) have q-order and r-order φ, cf. [25, 9.3.3]. For (u^{k}), the q-order φ improves the best available result, which is the 2-step q-quadratic convergence ensured by Theorem 2 for d = 1. Moreover, the example in Section 4.3.2 shows that if \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)=0\), then it is possible to have a higher q-order than φ.

3.
For m_{0} = 1, Theorem 6 2 is related to the results in [6, 19].

4.
Corollary 2 is valid under the assumptions of Theorem 6, so in 1 and 2, we also have \(F(\bar {u})=0\) and B satisfies the conditions from that corollary.
In the affine setting, Algorithm 1 terminates after finitely many steps, provided that the Jacobian is regular, a root exists, and σ_{k} = 1 for at least one k. More precisely, we have the following result.
Theorem 7
Let \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be affine. Let Assumption 1 hold for d = 1 and let (u^{k}), (s^{k}) and (B_{k}) be generated by Algorithm 1, with each B_{k} invertible. Let F(u^{0})≠ 0. Then the following statements hold:

1.
Let \(F^{\prime }\) be invertible. Then F has a unique root \(\bar {u}\). If there is an index k ≥ 1 with σ_{k} = 1, then \(u^{k+1}=\bar {u}\) or \(u^{k+2}=\bar {u}\), hence the algorithm terminates in iteration k + 1 or k + 2 with output \(u^{\ast }=\bar {u}\). If the algorithm does not terminate with output \(u^{\ast } = \bar {u}\), then (u^{k}) converges to \(\bar {u}\) and satisfies (1).

2.
Let \(F^{\prime }\) be singular. If F has a root, then F(u^{1}) = 0. If F does not have a root, then the algorithm generates a diverging sequence (u^{k}) such that F(u^{k}) = (ω,0,…,0)^{T} for all k ≥ 1 and some ω≠ 0.
Proof

Proof of 1: From [22, Theorem 3.2], we know that for affine F with invertible \(F^{\prime }\), Algorithm 1 converges q-superlinearly for any u^{0} if all B_{k} are invertible and the algorithm does not terminate with output \(u^{\ast }=\bar {u}\). (Since d = 1, it is also not difficult to establish this directly.) Theorem 1 now yields (1). Corollary 2 yields the convergence of (B_{k}). It remains to prove that if σ_{k} = 1 and F(u^{k+1})≠ 0, then F(u^{k+2}) = 0. Since F_{j}(u^{k}) = 0 for all j > 1 and all k ≥ 1 by Lemma 4, we have to show that F_{1}(u^{k+2}) = 0. Similarly to the proof of Theorem 6, we use Theorem 3 to obtain \(\{w^{j}\}_{j=0}^{k+1}\) and \(\{C_{j}\}_{j=0}^{k+1}\) by applying Algorithm 1 to the affine function \(G:\mathbb {R}\rightarrow \mathbb {R}\), \(G(w):=F_{1}(u^{1}+w\bar s)\), where \(\bar s:=S\). In view of (5), we have to show that G(w^{k+1}) = 0. From τ_{k−1} = σ_{k} = 1, it follows that \(C_{k} = (G(w^{k})-G(w^{k-1}))/(w^{k}-w^{k-1}) = G^{\prime }\). Using C_{k}(w^{k+1} − w^{k}) = −G(w^{k}), we find \(G(w^{k+1}) = G(w^{k}) + G^{\prime }\cdot (w^{k+1} - w^{k}) = G(w^{k})-G(w^{k})=0\), hence F(u^{k+2}) = 0.

Proof of 2: Defining \(A:=F^{\prime }\), we note that A has rank n − 1 since \(A \bar s=0\) and since n − 1 rows of A agree with the invertible B_{0}. Thus, A^{1} can be expressed as a linear combination of \(\{A^{j}\}_{j=2}^{n}\). Since F has a root and since F_{j}(u^{1}) = 0 for all j > 1 by Lemma 4, it readily follows that F_{1}(u^{1}) = 0, whence F(u^{1}) = 0. Now suppose that F does not have a root. By applying Theorem 3 again, we obtain that \(G^{\prime }=A^{1}\bar s = 0\); hence, G is constant, say G ≡ ω for some \(\omega \in \mathbb {R}\). Since F has no root, we must have ω≠ 0. Since G is constant, there holds F_{1}(u^{k}) = G(w^{k-1}) = ω for all k ≥ 1. The sequence (u^{k}) cannot converge because Corollary 2 would entail that the limit point is a root of F.
□
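The finite termination asserted in Theorem 7 1 can be observed in exact rational arithmetic. The following sketch uses our own hypothetical data: Broyden's method ((σ_k) ≡ 1, so in particular σ_1 = 1) is applied to an affine F with invertible F′, with B_0 agreeing with F′ in rows 2 and 3, and the root is hit exactly at u^2 or u^3:

```python
from fractions import Fraction as Fr

def solve3(M, rhs):
    # exact Gauss-Jordan elimination over the rationals for a 3x3 system
    n = 3
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

# affine F(u) = A u - b with invertible A; the unique root is (1, 1, 1)
A = [[Fr(2), Fr(1), Fr(0)], [Fr(1), Fr(3), Fr(1)], [Fr(0), Fr(1), Fr(2)]]
b = [Fr(3), Fr(5), Fr(3)]

def F(u):
    return [sum(A[i][j] * u[j] for j in range(3)) - b[i] for i in range(3)]

# B0 agrees with A in rows 2 and 3 (Assumption 1 with d = 1); row 1 is perturbed
B = [[Fr(5), Fr(-1), Fr(2)], A[1][:], A[2][:]]
u = [Fr(0), Fr(0), Fr(0)]

for k in range(6):
    Fu = F(u)
    if all(f == 0 for f in Fu):
        break                                   # exact root reached
    s = solve3(B, [-f for f in Fu])             # s^k = -B_k^{-1} F(u^k)
    unew = [ui + si for ui, si in zip(u, s)]
    y = [fn - fo for fn, fo in zip(F(unew), Fu)]
    n2 = sum(si * si for si in s)
    Bs = [sum(B[i][j] * s[j] for j in range(3)) for i in range(3)]
    # Broyden update (sigma_k = 1)
    B = [[B[i][j] + (y[i] - Bs[i]) * s[j] / n2 for j in range(3)]
         for i in range(3)]
    u = unew

print(u, k)
```

Because the arithmetic is exact, the iterate u^3 satisfies F(u^3) = 0 exactly, not merely up to rounding, matching the statement of Theorem 7 1 with k = 1.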
Remark 8

1.
The starting point u^{0} is arbitrary in Theorem 7.

2.
The finite convergence in Theorem 7 1 is related to the 2n-step convergence of Broyden’s method for regular linear systems [12, 24]. Indeed, in the proof of Theorem 7 1, we can replace the computation showing G(w^{k+1}) = 0 by an application of the 2n-step convergence to G, using that due to τ_{k−1} = 1, \(s_{w}^{k-1}\) and \({s_{w}^{k}}\) are the Broyden steps for the initial data (w^{k−1},C_{k−1}).

3.
If in Theorem 7 1, Algorithm 1 does not terminate with \(u^{\ast }=\bar {u}\), then \(\lim _{k\to \infty } E_{k}\) exists and satisfies the conditions from Corollary 2.
Application to two examples from the literature
We illustrate some of our findings on two examples from the literature. The second example also hints at two extensions.
An example by Dennis and Schnabel
In [9, Example 8.1.3] and [9, Lemma 8.2.7], it is shown that for:
with root \(\bar {u}=(0,3)^{T}\), the initial data:
yields sequences (u^{k}) and (B_{k}) with \(u^{k}\to \bar {u}\) for \(k\to \infty \) and:
The affine component F_{1} has coefficient vector a_{1} = (1,1)^{T}, so \({\mathcal {S}}={\langle \{a_{1}\}\rangle }^{\perp }=\{t\bar s:t\in \mathbb {R}\}\) with \(\bar s:=\frac {1}{\sqrt 2} (-1,1)^{T}\). Theorem 3 yields that \((s^{k})_{k\geq 1}\subset {\mathcal {S}}\) and (F_{1}(u^{k}))_{k≥ 1} ≡ 0. Of course, this can also be verified directly, cf. also [9, Example 8.1.3 and Lemma 8.2.7]. In agreement with Theorem 5 and Corollary 2, there holds \(\widetilde B S = B^{2} \bar s = 3\sqrt {2} = \widetilde F^{\prime }(\bar {u})S \), \(B^{1} = {B_{0}^{1}}\) and B(1,1)^{T} = B_{1}(1,1)^{T}. (From B_{1}, \(F^{\prime }(\bar {u})\) and \(\bar s,\) we can actually determine the limit B.) Because of \(F_{2}^{\prime }(\bar {u})\bar s\neq 0\neq F_{2}^{\prime \prime }(\bar {u})(\bar s,\bar s)\), Theorem 6 1 yields q-order φ for (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\) as well as the validity of (11).
An example by Dennis and Moré
In [8, Example 5.3], Dennis and Moré consider Broyden’s method for:
with root \(\bar {u}=(0,0)^{T}\) and note that for any \(\delta ,\epsilon \in \mathbb {R}\) the initial data:
yields a sequence (B_{k}) with \(B_{k}^{1,1}=1+\delta \) for all k ≥ 0. Hence, the incorrect entry 1 + δ is never corrected (assuming δ≠ 0), preventing convergence of (B_{k}) to \(F^{\prime }(\bar {u})\). According to [8], “The above example points out that one of the disadvantages of Broyden’s method is that it is not self-correcting. In particular, B_{k} depends upon each B_{j} with j < k and thus it may retain information which is irrelevant or even harmful.” It is well known that the BFGS method is self-correcting, cf., e.g., [1, 27].
We show that the iterates (u^{k}) converge rapidly despite the incorrect entry 1 + δ in all B_{k}. The affine component F_{1} has coefficient vector a_{1} = (1,0)^{T}, thus \({\mathcal {S}}={\langle \{a_{1}\}\rangle }^{\perp }=\{(0,t)^{T}: t\in \mathbb {R}\}\). We set \(\bar s:=(0,1)^{T}\) and observe \((s^{k})_{k\geq 0}\subset {\mathcal {S}}\) as well as (F_{1}(u^{k}))_{k≥ 0} ≡ 0. It is not difficult to see that Theorem 3 and, in turn, Theorem 6 1 apply, even though Assumption 1 is not satisfied in this example. Theorem 6 1 implies that if (u^{k}) converges to \(\bar {u}\), then it has a q-order no smaller than φ and \(({\lVert B_{k+1}-B_{k}\rVert })\) goes to zero with r-order no smaller than φ. The fast convergence is enabled by the fact that Broyden’s method effectively reduces to the one-dimensional secant method. It should also be noted that (B_{k}) converges to \(F^{\prime }(\bar {u})\) in \({\mathcal {S}}\), i.e., \((B_{k}-F^{\prime }(\bar {u}))S\to 0\), cf. Corollary 2. Furthermore, since B_{0}S = 1 correctly approximates the affine part of F_{2} and since F_{2} does not contain a quadratic part, it can be shown that \(({\lVert B_{k+1}-B_{k}\rVert })\) has q-order 2, which implies that (u^{k}) has q-order 2, too. The numerical experiments confirm the q-order 2, cf. Section 5.2.2.
Numerical experiments
We use numerical examples to verify Corollary 2 and Theorems 6 and 7. We first present the design of the experiments and then provide the examples and results.
Design of the experiments
Implementation and accuracy
We use the variable precision arithmetic (vpa) of Matlab 2020b. Unless stated otherwise, we work with a precision of 10000 digits and replace the termination criterion F(u^{k}) = 0 in Algorithm 1 by \({\lVert F(u^{k})\rVert }\leq 10^{-5000}\). By \(\bar k,\) we denote the final value of k.
Known solution and random initialization
All examples have root \(\bar {u}=0\) and the experiments are set up in such a way that convergence to \(\bar {u}\) takes place in all runs except possibly a handful that are discarded. Except in the second example, the initial guess (u^{0},B_{0}) is randomly generated using Matlab’s function rand to satisfy u^{0} ∈ [−α,α]^{n} and \(B_{0}=F^{\prime }(u^{0})+\hat \alpha {\lVert F^{\prime }(u^{0})\rVert }R\). Here, \(R\in \mathbb {R}^{n\times n}\) is a matrix with R^{j} = 0 for all j > 1 and the entries in R^{1} randomly drawn from [− 1,1]. The values of α ∈ [10^{− 3},1000] and \(\hat \alpha \in [0,1000]\) will be specified within each example.
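The initialization just described can be sketched as follows; this is our own illustration, and the choice of the Frobenius norm for \({\lVert F^{\prime }(u^{0})\rVert }\) is an assumption, since the excerpt does not fix the norm:

```python
import random

def initial_matrix(jac_u0, alpha_hat, rng=random.Random(0)):
    """B0 = F'(u0) + alpha_hat * ||F'(u0)|| * R, where R vanishes in all rows
    except the first, whose entries are drawn uniformly from [-1, 1].
    Rows j > 1 of B0 thus agree with the Jacobian, as Assumption 1 requires.
    The Frobenius norm below is an assumption on our part."""
    frob = sum(x * x for row in jac_u0 for x in row) ** 0.5
    B0 = [row[:] for row in jac_u0]
    B0[0] = [x + alpha_hat * frob * rng.uniform(-1.0, 1.0) for x in jac_u0[0]]
    return B0

J = [[2.0, 1.0], [1.0, -1.0]]   # hypothetical Jacobian F'(u0)
B0 = initial_matrix(J, 0.1)
print(B0[1])  # the second row is untouched
```

Only the first row of B_0 is perturbed, so the affine rows of the Jacobian are reproduced exactly, which is what activates the subspace property in the experiments.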
Quantities of interest
To display the course of Algorithm 1, we use the norm of F_{k} := F(u^{k}), the error \({\lVert E_{k}\rVert }\), the quotients q_{k} and Q_{k} introduced in Theorem 6, and furthermore:
as well as:
and:
We note that \({{\mathcal {Q}}_{k}^{u}}\) and \({{\mathcal {Q}}_{k}^{B}}\) approximate the q-order of convergence while \(\mathcal {R}_{k}^{B}\) approximates the r-order. Whenever any of these quantities is undefined, we set it to − 1, e.g., β_{0} := − 1. We will use these quantities to confirm that (B_{k}) converges, cf. Corollary 2, and to assess the convergence order of (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\), cf. Theorem 6. We are also interested in whether \({\lVert E_{k}\rVert }\to 0\), i.e., whether (B_{k}) converges to the true Jacobian \(F^{\prime }(\bar {u})\), cf. for instance Remark 6.
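The displayed definitions of these quantities are not reproduced in this excerpt. As an assumption about their role, a standard way to estimate a q-order from an error sequence is the quotient of successive logarithmic error ratios, sketched here:

```python
import math

def q_order_estimates(errs):
    """For a positive sequence errs tending to 0, return the quotients
    log(e_{k+1}/e_k) / log(e_k/e_{k-1}); if errs has exact q-order p,
    these quotients tend to p."""
    return [math.log(errs[k + 1] / errs[k]) / math.log(errs[k] / errs[k - 1])
            for k in range(1, len(errs) - 1)]

# a sequence with q-order 2 (e_{k+1} = e_k^2): the estimates are close to 2
errs = [0.5]
for _ in range(5):
    errs.append(errs[-1] ** 2)
print(q_order_estimates(errs))
```

Whether the paper's \({{\mathcal {Q}}_{k}^{u}}\) and \({{\mathcal {Q}}_{k}^{B}}\) use exactly this quotient is not verifiable from the excerpt; the sketch only conveys the general mechanism.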
Single runs and cumulative runs
We use single runs and cumulative runs. For single runs, we display the quantities of interest during the course of the algorithm. A cumulative run consists of 1000 single runs with initial data varying according to Section 5.1.2, unless stated otherwise. Let us briefly describe the aggregated quantities that we use to assess cumulative runs. For instance, to gauge the q-order of \(({\lVert B_{k+1}-B_{k}\rVert })\), we compute for each single run of a cumulative run the number:
where j ∈ [1000] indicates the respective single run and we consistently use \(k_{0}(j):=\min \limits \{100,\lfloor 0.75\bar k(j)\rfloor \}\). As outcome of the cumulative run, we display:
If the stronger conditions in Theorem 6 1 hold, then \({\mathcal {Q}}_{B}^{-}\) and \({\mathcal {Q}}_{B}^{+}\) should both be close to the golden mean φ. If the convergence is of lower order in any of the 1000 single runs, then we expect \({\mathcal {Q}}_{B}^{-}\) to be smaller than φ.
In the same way as just presented for \({\mathcal {Q}}_{B}^{-}\) and \({\mathcal {Q}}_{B}^{+}\), we derive \({\lVert E\rVert }^{-}\), \({\lVert E\rVert }^{+}\), q^{−}, q^{+}, \({\mathcal {Q}}_{u}^{-}\), \({\mathcal {Q}}_{u}^{+}\), β^{−}, β^{+}, Q^{−}, Q^{+}, \(\mathcal {R}_{u}^{-}\), and \(\mathcal {R}_{u}^{+}\) from the respective quantities used in single runs. In addition, we use:
To keep the tables for cumulative runs of a reasonable size, we will omit some of these quantities, but what is omitted varies from example to example.
Numerical examples
Example 1
To verify the results of Theorem 6 1, we consider \(F:\mathbb {R}^{10}\rightarrow \mathbb {R}^{10}\) given by:
where \(A\in \mathbb {R}^{9\times 10}\) is a random matrix with entries in [− 1,1] that is changed after each of the 1000 single runs of the cumulative run. The randomly generated A is only accepted if the resulting \(F^{\prime }(\bar {u})\) is invertible. We use α = 0.001 in this example. A single and a cumulative run with (σ_{k}) ≡ 1 and \(\hat \alpha =0\) are displayed in Tables 1 and 2. The results agree with Theorem 6 1. For instance, it is apparent that (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\) converge with q-order φ ≈ 1.618 and that \(\lim _{k\to \infty }\frac {Q_{k}}{q_{k-2}}=1\) (since A is random, we expect \(F_{1}^{\prime \prime }(\bar {u})(\bar s,\bar s)\neq 0\)). Table 2 also shows results for a cumulative run with (σ_{k}) ≡ 1 and \(\hat \alpha =0.1\). In accordance with Theorem 6 1, deviating from the choice \(B_{0}=F^{\prime }(u^{0})\) does not affect the q-order of convergence. Next we keep \(\hat \alpha =0.1\) and let σ_{k} = 0.5 for k ≤ 3 and (σ_{k})_{k≥ 4} ≡ 1. Theorem 6 1 predicts that this choice of (σ_{k}) maintains q-order φ for (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\), and Table 2 confirms this.
In contrast, if we choose \(\hat \alpha =0\) and (σ_{k}) ≡ 0.99, then the order of convergence drops significantly, and the same holds for (σ_{k}) ≡ 1 − (k + 2)^{− 4}, cf. Table 2. In fact, except for some special cases it can be shown that (u^{k}) can only converge with q-order greater than one if σ_{k} → 1 fast enough. In particular, for (σ_{k}) ≡ 0.99 and (σ_{k}) ≡ 1 − (k + 2)^{− 4}, both (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\) have q-order 1. To confirm this for (σ_{k}) ≡ 1 − (k + 2)^{− 4}, we repeat the cumulative run with a higher precision of 100000 digits, using \({\lVert F(u^{k})\rVert }\leq 10^{-50000}\) as termination criterion and only 100 single runs instead of 1000. We view the results in Table 2 as being in line with q-order 1. In any case, it is apparent that for (σ_{k}) ≡ 0.99 and (σ_{k}) ≡ 1 − (k + 2)^{− 4} the q-order of convergence is no longer φ and that \(({\lVert B_{k+1}-B_{k}\rVert })\) converges to zero at least q-linearly for all choices of (σ_{k}); hence, (B_{k}) converges, which validates Corollary 2. The values of \({\lVert E\rVert }^{-}\) show that (B_{k}) never converges to \(F^{\prime }(\bar {u})\).
Example 2
We provide results for the example by Dennis and Moré discussed in Section 4.3.2, which concerns Broyden’s method, so (σ_{k}) ≡ 1. A single run is displayed in Table 3 and four cumulative runs in Table 4. For the single run and the first cumulative run, we use (u^{0},B_{0}) that satisfy (12) with randomly generated δ,𝜖 ∈ [− 0.5,0.5]. The results confirm that, as argued in Section 4.3.2, both (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\) have q-order 2. Because of \(F_{2}^{\prime \prime }(\bar {u})=0\), this does not contradict Theorem 6 1.
In the second cumulative run, we let u^{0} = (𝜖_{1},𝜖_{2})^{T} for random numbers 𝜖_{1},𝜖_{2} ∈ [− 0.5,0.5], while keeping B_{0} as in (12) with δ ∈ [− 0.5,0.5]. Due to 𝜖_{1}≠ 0, we cannot expect (s^{k}) to belong to a one-dimensional subspace; hence, Theorem 6 does not apply anymore. Correspondingly, the second row in Table 4 shows that (u^{k}) does not attain the q-order φ but suggests that the q-order may still have a lower bound larger than 1. This view is further encouraged by the fact that the r-order of \(({\lVert B_{k+1}-B_{k}\rVert })\) seems to admit such a lower bound, too, which is a necessary condition for (u^{k}) to have a q-order, cf. Lemma 3. To investigate the potential q-order of (u^{k}) further, we repeat the cumulative run at a higher precision using \({\lVert F(u^{k})\rVert }\leq 10^{-100000}\) as termination criterion and 400 single runs. The results are contained in Table 4 and support the existence of a q-order larger than one for (u^{k}).
In the third cumulative run, whose results are depicted in the last row of Table 4, we keep the choice u^{0} = (𝜖_{1},𝜖_{2})^{T} from the second cumulative run, but use \(B_{0}=F^{\prime }(u^{0})\) as initial matrix, so that \({B_{0}^{1}} = F_{1}^{\prime }(u^{0})\) and hence Assumption 1 holds. In turn, Theorem 6 1 applies, which ensures a q-order, respectively, r-order no smaller than φ for (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\), respectively. It can be argued in the same way as in Section 4.3.2 that both sequences actually converge with q-order 2. Table 4 confirms this q-order.
The values of \({\lVert E\rVert }^{-}\) in Table 4 show that (B_{k}) never converges to \(F^{\prime }(\bar {u})\). Yet, since \(({\lVert B_{k+1}-B_{k}\rVert })\) declines quickly, the convergence of (u^{k}) is still rapid.
Example 3 a
We turn to Theorem 6 2, where \(F^{\prime }(\bar {u})\) is singular. Let:
Because of \({\mathcal {A}}^{\perp } = {\langle \{(0,1,1)^{T}\}\rangle }\), we have \(\bar s=\frac {1}{\sqrt {2}}(0,1,1)^{T}\), hence \(F_{1}^{\prime }(0)=0\) and \(F_{1}^{\prime \prime }(0)(\bar s,\bar s)=2\neq 0\), which implies \(\lim _{k\to \infty }q_{k}=\lim _{k\to \infty } Q_{k}=\frac {\sqrt {5}-1}{2}\approx 0.618\) for the choice (σ_{k}) ≡ 1 that we consider first. We use \(\alpha =\hat \alpha =0.01\) in this example. The results of a cumulative run with (σ_{k}) ≡ 1 are displayed in Table 5 and are in perfect agreement with Theorem 6 2. Table 5 also provides results for (σ_{k}) ≡ 0.99, which are similar to those for (σ_{k}) ≡ 1. Moreover, it features ι^{−} and ι^{+}, which denote the minimal, respectively, maximal number of iterations of all single runs within a cumulative run. As in the previous examples, we consistently find \(B_{k}\not \to F^{\prime }(\bar {u})\).
Example 3 b
We change F_{1} in example 3 a, using \(F_{1}(u)={u_{2}^{3}}-2 {u_{3}^{3}}\) instead. This results in \(F_{1}^{\prime }(0)=0\), \(F_{1}^{\prime \prime }(0)(\bar s,\bar s)=0\) and \(F_{1}^{\prime \prime \prime }(0)(\bar s,\bar s,\bar s)\neq 0\), so Theorem 6 2 implies \(\lim _{k\to \infty } q_{k} \approx 0.755\) and \(\lim _{k\to \infty } Q_{k} \approx 0.570\). Table 5 confirms this for (σ_{k}) ≡ 1 and shows that the choice (σ_{k}) ≡ 0.99 induces only marginal changes. Overall, example 3 exhibits a remarkably uniform convergence behavior of iterates and matrix updates, as evidenced, for instance, by the fact that q^{−} = q^{+} and Q^{−} = Q^{+}. Table 6 exemplifies this for example 3 b in a single run with (σ_{k}) ≡ 1. Since this uniformity is characteristic for singular \(F^{\prime }(\bar {u})\) of rank n − 1, cf. also [19], we used \({\lVert F(u^{k})\rVert }\leq 10^{-500}\) as termination criterion in example 3 and the cumulative runs consisted of 100 single runs.
Example 4
To verify Theorem 7 1, we consider F(u) = Au, where \(A\in \mathbb {R}^{10\times 10}\) is an invertible random matrix with entries in [− 1000,1000] that is changed after each single run of the cumulative run. We choose \(\alpha =\hat \alpha =1000\). In the first cumulative run, we use σ_{4} = 1 and σ_{k} = 0.1 otherwise. Theorem 7 1 guarantees F(u^{6}) = 0 if F(u^{k})≠ 0 for 0 ≤ k ≤ 5. Table 7 shows that ι^{−} = ι^{+} = 6, so all runs use exactly 6 steps. On a side note, we remark that Q^{−} = Q^{+} = 9 can easily be proven. The second experiment displayed in Table 7 uses (σ_{k}) ≡ 1 − (k + 2)^{− 4}. The outcome is in line with Theorem 7 1, which asserts global q-superlinear, but not finite, convergence for this choice of (σ_{k}), as well as convergence of (B_{k}). As in example 1, it can be shown that the q-order of (u^{k}) and \(({\lVert B_{k+1}-B_{k}\rVert })\) is 1. To verify this, we repeat the cumulative run with (σ_{k}) ≡ 1 − (k + 2)^{− 4}, using a precision of 100000 digits and \({\lVert F(u^{k})\rVert }\leq 10^{-50000}\) as termination criterion, but only 100 single runs. The result in Table 7 is in line with q-order 1. Despite the fact that all B_{k} agree with A in n − 1 of n rows, the difference between B_{k} and A in the last 25% of iterations is large in norm; this, however, does not prevent finite convergence if σ_{k} = 1 for at least one k ≥ 1, cf. Theorem 7 and Remark 6.
Summary
We have shown that, up to a translation, the iterates of the Broyden-like method for mixed linear–nonlinear systems of equations can be obtained by applying the Broyden-like method to a lower-dimensional mapping, provided that the rows of the initial matrix agree with the rows of the Jacobian for (some of) the linear equations. We have used this subspace property to extend a sufficient condition for convergence of the Broyden-like matrices. For the special case that at most one equation is nonlinear, we have concluded that the Broyden-like matrices converge whenever the iterates converge. For Broyden’s method, we could, in addition, quantify how fast the iterates and updates converge, respectively, prove finite convergence if the system is linear. We verified the results in high-precision numerical experiments.
References
 1.
Al-Baali, M.: Extra updates for the BFGS method. Optim. Methods Softw. 13(3), 159–179 (2000). https://doi.org/10.1080/10556780008805781
 2.
Al-Baali, M., Spedicato, E., Maggioni, F.: Broyden’s quasi-Newton methods for a nonlinear system of equations and unconstrained optimization: a review and open problems. Optim. Methods Softw. 29(5), 937–954 (2014). https://doi.org/10.1080/10556788.2013.856909
 3.
Broyden, C.: A class of methods for solving nonlinear simultaneous equations. Math. Comput. 19, 577–593 (1965). https://doi.org/10.2307/2003941
 4.
Broyden, C., Dennis, J., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. J. Inst. Math. Appl. 12, 223–245 (1973). https://doi.org/10.1093/imamat/12.3.223
 5.
Conn, A.R., Gould, N.I.M., Toint, P.L.: Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math. Program. 50(2 (A)), 177–195 (1991). https://doi.org/10.1007/BF01594934
 6.
Decker, D.W., Kelley, C.T.: Broyden’s method for a class of problems having singular Jacobian at the root. SIAM J. Numer. Anal. 22, 566–574 (1985). https://doi.org/10.1137/0722034
 7.
Dennis, J., Moré, J.J.: A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comput. 28, 549–560 (1974). https://doi.org/10.2307/2005926
 8.
Dennis, J., Moré, J.J.: Quasi-Newton methods, motivation and theory. SIAM Rev. 19, 46–89 (1977). https://doi.org/10.1137/1019005
 9.
Dennis, J., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM, Philadelphia (1996). https://doi.org/10.1137/1.9781611971200
 10.
Díez, P.: A note on the convergence of the secant method for simple and multiple roots. Appl. Math. Lett. 16(8), 1211–1215 (2003). https://doi.org/10.1016/S08939659(03)901194
 11.
Fayez Khalfan, H., Byrd, R.H., Schnabel, R.B.: A theoretical and experimental study of the symmetric rankone update. SIAM J. Optim. 3(1), 1–24 (1993). https://doi.org/10.1137/0803001
 12.
Gay, D.M.: Some convergence properties of Broyden’s method. SIAM J. Numer. Anal. 16, 623–630 (1979). https://doi.org/10.1137/0716047
 13.
Ge, R., Powell, M.J.D.: The convergence of variable metric matrices in unconstrained optimization. Math. Program. 27, 123–143 (1983). https://doi.org/10.1007/BF02591941
 14.
Griewank, A.: Broyden updating, the good and the bad! Doc. Math. (Bielefeld) pp. 301–315 (2012) https://www.emis.de/journals/DMJDMV/volismp/45_griewankandreasbroyden.pdf
 15.
Kelley, C.: Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia (1995). https://doi.org/10.1137/1.9781611970944
 16.
Li, D., Fukushima, M.: A derivativefree line search and global convergence of Broydenlike method for nonlinear equations. Optim. Methods Softw. 13(3), 181–201 (2000). https://doi.org/10.1080/10556780008805782
 17.
Li, D., Zeng, J., Zhou, S.: Convergence of Broydenlike matrix. Appl. Math. Lett. 11(5), 35–37 (1998). https://doi.org/10.1016/S08939659(98)000767
 18.
Mannel, F.: On the 2n–step qquadratic convergence and the qorder of Broyden’s method. Submitted (2020). https://imsc.unigraz.at/mannel/Broy2n.pdf
 19.
Mannel, F.: On the convergence of Broyden’s method and of the Broyden matrices for a class of singular problems. Submitted (2020). https://imsc.unigraz.at/mannel/CGB_sing.pdf
 20.
Mannel, F.: On the convergence rate of Broyden–like methods. In preparation (2021)
 21.
Martínez, J.M.: Practical quasiNewton methods for solving nonlinear systems. J. Comput. Appl. Math. 124(12), 97–121 (2000). https://doi.org/10.1016/S03770427(00)004349
 22.
More, J., Trangenstein, J.: On the global convergence of Broyden’s method. Math. Comput. 30, 523–540 (1976). https://doi.org/10.2307/2005323
 23.
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006). https://doi.org/10.1007/9780387400655
 24.
O’Leary, D.P.: Why Broyden’s nonsymmetric method terminates on linear equations. SIAM J. Optim. 5(2), 231–235 (1995). https://doi.org/10.1137/0805012
 25.
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, vol. 30. SIAM, Philadelphia (2000). https://doi.org/10.1137/1.9780898719468
 26.
Powell, M.J.D.: A new algorithm for unconstrained optimization. In: Rosen, J., Mangasarian, O., Ritter, K. (eds.) Nonlinear Programming, pp 31–65. Academic Press (1970). https://doi.org/10.1016/B9780125970501.500063
 27.
Powell, M.J.D.: How bad are the BFGS and DFP methods when the objective function is quadratic? Math. Program. 34, 34–47 (1986). https://doi.org/10.1007/BF01582161
 28.
Sachs, E.: Convergence rates of quasinewton algorithms for some nonsmooth optimization problems. SIAM J. Control Optim. 23, 401–418 (1985). https://doi.org/10.1137/0323026
 29.
Stoer, J.: The convergence of matrices generated by rank2 methods from the restricted βclass of Broyden. Numer. Math. 44, 37–52 (1984). https://doi.org/10.1007/BF01389753
 30.
Sun, L.: The convergence of quasiNewton matrices generated by the selfscaling symmetric rank one update. Indian J. Pure Appl. Math. 29(1), 51–58 (1998)
 31.
Vianello, M., Zanovello, R.: On the superlinear convergence of the secant method. Am. Math. Mon. 99(8), 758–761 (1992). https://doi.org/10.2307/2324244
Funding
Open Access funding provided by University of Graz.
Cite this article
Mannel, F. Convergence properties of the Broyden-like method for mixed linear–nonlinear systems of equations. Numer Algor 88, 853–881 (2021). https://doi.org/10.1007/s11075-020-01060-y
Keywords
 Broyden-like method
 Broyden’s method
 Convergence of Broyden-like matrices
 Quasi-Newton methods
 Uniform linear independence
Mathematics subject classification (2010)
 49M15
 65H10
 65K05
 90C30
 90C53