A dynamical view of nonlinear conjugate gradient methods with applications to FFT-based computational micromechanics

For fast Fourier transform (FFT)-based computational micromechanics, solvers need to be fast, memory-efficient, and independent of tedious parameter calibration. In this work, we investigate the benefits of nonlinear conjugate gradient (CG) methods in the context of FFT-based computational micromechanics. Traditionally, nonlinear CG methods require dedicated line-search procedures to be efficient, rendering them not competitive in the FFT-based context. We contribute to nonlinear CG methods devoid of line searches by exploiting similarities between nonlinear CG methods and accelerated gradient methods. More precisely, by letting the step-size go to zero, we exhibit the Fletcher–Reeves nonlinear CG as a dynamical system with state-dependent nonlinear damping. We show how to implement nonlinear CG methods for FFT-based computational micromechanics, and demonstrate by numerical experiments that the Fletcher–Reeves nonlinear CG represents a competitive, memory-efficient and parameter-choice free solution method for linear and nonlinear homogenization problems, which, in addition, decreases the residual monotonically.


Introduction
FFT-based solvers, as pioneered by Moulinec-Suquet [1,2], enjoy great popularity for solving homogenization problems on complex microstructures. Their success rests essentially on three pillars. First, they profit from modern imaging techniques like micro-computed tomography which naturally produce volumetric data on a regular grid. Secondly, the current implementations of the fast Fourier transform [3] are extremely powerful, providing programmers of FFT-based methods a cutting edge. Last but not least, FFT-based methods naturally support matrix-free solvers also for strongly nonlinear mechanical problems, which can be decisive if memory occupancy is an issue. A recent overview of applications in micromechanics is given in Matouš et al. [4], including the use in FE-FFT schemes [5] and as precomputations for model-order reduction methods [6][7][8] and deep learning techniques [9,10]. Progress in FFT-based methods focussed on three principal directions. Firstly, the areas of applicability were extended, for instance to finite strain problems [11], crystal viscoplasticity [12] and damage [13]. Recently, also applications to interface decohesion [14], upscaling of micromechanics-based fatigue [15] and the homogenization of brittle fracture [16] were established.
Secondly, the disadvantages of the original discretization method of Moulinec-Suquet were the focus of investigations. Fourier-Galerkin methods [17] and finite element [18][19][20], finite difference [21,22] as well as finite volume discretizations [23] were investigated, some of them playing a crucial role when treating media with imperfections such as cracks and pores. Also, recently, a discretization in terms of B-splines was introduced [24].
Last but not least, the basic (solution) scheme was superseded by more powerful numerical solution methods. Eyre-Milton [25] introduced a so-called polarization scheme, which was subsequently generalized by Michel-Moulinec-Suquet [26]. Monchiet-Bonnet [27] introduced a one-parameter family of polarization schemes and exhibited both the Eyre-Milton scheme and the augmented Lagrangian method of Michel-Moulinec-Suquet as particular instances, see also Moulinec-Silva [28]. Furthermore, they introduced a hybrid method which is essentially the average of the Eyre-Milton and the augmented Lagrangian operators. Schneider-Wicht-Böhlke [29] interpreted polarization schemes from the viewpoint of global optimization and identified them as Douglas-Rachford schemes [30], providing convergence guarantees and optimal parameter selection for strongly convex problems.
Zeman et al. [31] and Brisard-Dormieux [18] introduced Krylov subspace solvers for linear homogenization problems. Zeman et al. [31] proposed using the (linear) conjugate gradient method with great success. As the conjugate gradient method was applied to a non-symmetric operator, its successful application came as a surprise at that time (and triggered further research, see below). Brisard-Dormieux used a reformulation of the corrector equation in terms of the Hashin-Shtrikman variational principle, guaranteeing a symmetric, but possibly indefinite formulation. They proposed using the conjugate gradient method or MINRES and SYMMLQ, respectively, see Paige-Saunders [32].
As an alternative to the basic scheme and polarization-type methods, Newton solvers were proposed by Lahellec et al. [11]. Initially, the basic scheme was used as the linear solver. Only much later, Gélébart-Mondon-Cancel [33] and Kabel-Böhlke-Schneider [34] proposed Newton-Krylov solvers, ie, to use Krylov subspace solvers for computing the Newton increment. For a thorough study on line-search methods suitable in the FFT context and strategies for choosing the forcing term, we refer to Wicht-Schneider-Böhlke [35].
Kabel-Böhlke-Schneider [34] observed that the basic scheme may be interpreted as a gradient descent scheme, and connected the reference material to (the inverse of) a step-size. By this observation, numerical solution methods from large-scale optimization could enter FFT-based computational homogenization. The benefits of using fast gradient methods in the FFT-based context were first exploited in Schneider [36], see also Ernesti-Schneider-Böhlke [37] for an application of fast gradient methods to phase-field fracture. Using Quasi-Newton methods was pioneered by Shanthraj et al. [38], and further extended in Wicht-Schneider-Böhlke [35].
All of these solvers have their distinctive advantages. The basic scheme is most memory-efficient and extremely robust, but may become slow for highly contrasted problems. The polarization schemes fail to converge quickly for problems involving pores (see Schneider [39] for a counterexample) and rely upon a complexity-reduction trick to be efficient, see Schneider-Wicht-Böhlke [29]. Newton methods are fast and robust, but suffer from heavy memory demands. They are particularly useful if evaluating the nonlinear material law is very expensive compared to applying the tangent. The fast gradient methods combine fast convergence with a low memory footprint. However, they suffer from a delicate parameter selection. More precisely, they require the strong convexity constant to be known to be efficient. However, determining the strong convexity constant for problems involving pores is extremely difficult, in general. Restarting techniques do not solve these problems. Indeed, for problems where the strong convexity constant is known, restarting schemes require about twice the iteration count compared to the method with optimal parameter choice.
Thus, research was devoted to finding solution techniques that share the positive characteristics of the mentioned solution schemes without combining their disadvantages. Schneider [39] proposed the Barzilai-Borwein method as a general-purpose method for computational micromechanics. It may be interpreted as the basic scheme with adaptive time-stepping (or, equivalently, adaptive reference material). Thus, it has low memory-footprint, but is competitive in terms of convergence speed when compared to the solvers mentioned above. However, the Barzilai-Borwein method is an intrinsically non-monotone method, ie, the residual is not monotonically decreasing from one iteration to the next. This behavior is unfamiliar, and may limit the applicability for industrial applications. It may happen that after the lunch break the residual is two orders of magnitude higher than before the lunch break.
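The adaptive step-size logic of the Barzilai-Borwein scheme can be sketched in a few lines. The following is a minimal illustration of the (first) Barzilai-Borwein step-size on a made-up quadratic test problem, not the paper's implementation; all names are hypothetical.

```python
import numpy as np

def barzilai_borwein(grad, x0, max_iter=500, tol=1e-10):
    """Gradient descent with the (first) Barzilai-Borwein step-size.

    The step alpha_k = <s, s> / <s, y> with s = x_k - x_{k-1} and
    y = g_k - g_{k-1} acts like an adaptively chosen inverse
    reference material; the resulting residual is non-monotone.
    """
    x = x0.copy()
    g = grad(x)
    alpha = 1e-4  # conservative first step; any small value works here
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - alpha * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)  # BB1 step; s.y > 0 for convex quadratics
        x, g = x_new, g_new
    return x

# illustrative test problem: a small symmetric positive definite quadratic
rng = np.random.default_rng(0)
M = rng.standard_normal((10, 10))
A = M @ M.T + np.eye(10)
b = rng.standard_normal(10)
x = barzilai_borwein(lambda x: A @ x - b, np.zeros(10))
```

The step-size ⟨s, s⟩/⟨s, y⟩ is a Rayleigh-quotient estimate of an inverse eigenvalue of the Hessian, which is what makes the scheme parameter-free.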
Thus, it is desirable to seek alternatives to the Barzilai-Borwein scheme with guaranteed monotonicity properties.

Contributions
Nonlinear conjugate gradient methods were introduced by Fletcher-Reeves [40] as a generalization of the linear conjugate gradient scheme [41]. Nonlinear conjugate gradient methods were among the first general-purpose solution schemes for unconstrained optimization problems, and have been studied thoroughly, see Nocedal-Wright [42] for a comprehensive textbook treatment. They are based upon an iteratively updated search direction, involving a coefficient (the "β") to be chosen wisely and a step-size which is typically determined by line search. For the β-coefficient, a number of different variants were proposed, the most popular due to Fletcher-Reeves [40], Polak-Ribière-Polyak [43,44], Hestenes-Stiefel [41] and Dai-Yuan [45].
On a formal level, nonlinear conjugate gradient methods and the heavy-ball scheme [46], a very powerful fast gradient scheme, are identical, see Polyak's book [47]. However, the parameters are chosen in a different way. Indeed, for the heavy-ball method, the step-size and the β-coefficient are typically chosen as constants depending on the properties of the function (the Lipschitz constant of the gradient of the objective, and the strong monotonicity constant, if available [44,48,49]). In contrast, nonlinear conjugate gradient methods choose the β-coefficient depending on the current and the last iterate, in general, and determine the step-size by a line-search procedure. We refer to Sect. 2 for more details.
A naive application of nonlinear conjugate gradient methods in the FFT-based context does not lead to competitive schemes, essentially due to the intrinsic line search. (Another difficulty arising in computational micromechanics is the fact that the condensed incremental energy is typically not computed; only gradient information, ie, the stress, is evaluated.) In the nonlinear conjugate gradient community, this problem is well-known. Indeed, Quasi-Newton methods of the Broyden class, see chapter 8 in Nocedal-Wright [42], may also be interpreted as generalizations of linear CG. However, they typically outperform nonlinear conjugate gradient methods because they asymptotically accept the proposed step-size. In contrast, the most efficient conjugate gradient methods are reported to require about two line-search steps per iteration. Sun-Zhang [50] pioneered using conjugate gradient methods without line search and proved convergence under mild conditions. However, these step-sizes are too conservative for the resulting methods to be efficient. Indeed, under suitable conditions, they ensure a monotonic decrease of the objective value, which is known to be inefficient for fast gradient methods.
In this work, we connect nonlinear conjugate gradient methods to fast gradient methods by letting the step-size go to zero and studying the resulting dynamical system. Indeed, the Fletcher-Reeves CG permits such a limiting procedure, giving rise to Newtonian dynamics with nonlinear damping. More precisely, the damping is determined by the speed of decrease of the residual (the norm of the gradient). In theory, this damping could become negative, inducing an instability of the dynamical system. In the numerical experiments, however, this situation arose only rarely, whence we refrained from including dedicated safeguarding strategies in the manuscript.
We propose to combine the Fletcher-Reeves nonlinear CG with the step-size of the basic scheme, and detail its implementation in the FFT-based context, see Sect. 3. The method may be implemented on three strain-like fields, which is only one more than the Barzilai-Borwein method. However, the resulting solver leads to a monotonic, and thus reliable, decrease of the residual. We demonstrate the usefulness of the proposed method as a general-purpose solver for FFT-based computational micromechanics in Sect. 4, comparing the method to a multitude of state-of-the-art numerical solvers for problems of industrial scale.

From linear to nonlinear conjugate gradient methods
Let V be a Hilbert space with inner product ⟨·, ·⟩, and let A ∈ L(V) be a bounded linear operator, which we assume self-adjoint and uniformly positive, ie, ⟨Au, v⟩ = ⟨Av, u⟩ holds for all u, v ∈ V and there is μ > 0, s.t. ⟨Au, u⟩ ≥ μ ‖u‖² holds for all u ∈ V, where ‖u‖ = √⟨u, u⟩ denotes the induced norm. For prescribed b ∈ V, we are concerned with minimizing the quadratic objective

f(x) = 1/2 ⟨Ax, x⟩ − ⟨b, x⟩,   (2.1)

whose unique minimizer x* solves the linear equation A x* = b. For a collection (d_0, d_1, d_2, ...) of A-conjugate vectors, ie, ⟨A d_i, d_j⟩ = 0 for i ≠ j, and an initial point x_0 ∈ V, the conjugate direction method (Sect. 5.1 in Nocedal-Wright [42]) iteratively updates

x_{k+1} = x_k + α_k d_k,   (2.2)

where the step size α_k is chosen by exact line search,

α_k = argmin_{α > 0} f(x_k + α d_k),   (2.3)

which can be computed explicitly for the quadratic objective (2.1) as

α_k = −⟨g_k, d_k⟩ / ⟨A d_k, d_k⟩ with g_k = A x_k − b.   (2.4)

The latter formula may be simplified, taking into account the update (2.2) and the conjugacy of the directions, to arrive at

α_k = ⟨g_k, g_k⟩ / ⟨A d_k, d_k⟩.   (2.5)

We formulate this version of the linear conjugate gradient algorithm for later reference,

x_{k+1} = x_k + α_k d_k,   d_k = −g_k + β_k d_{k−1},   β_k = ⟨g_k, g_k⟩ / ⟨g_{k−1}, g_{k−1}⟩,   (2.6)

formally defining d_{−1} = 0. Fletcher-Reeves [40] extended this algorithm to general unconstrained nonlinear optimization problems,

f(x) → min,   x ∈ V,   (2.7)

in the form

x_{k+1} = x_k + α_k d_k,   d_k = −∇f(x_k) + β_k^{FR} d_{k−1},   (2.8)

with

β_k^{FR} = ‖∇f(x_k)‖² / ‖∇f(x_{k−1})‖²,   (2.9)

where α_k is determined by a line search procedure (2.3), and where the superscript FR stands for Fletcher-Reeves. Subsequently, to account for inaccurate line-search procedures, a variety of other formulae for β_k were introduced; the most popular are the following.
1. The formula introduced, independently, by Polak-Ribière [43] and Polyak [44],

β_k^{PRP} = ⟨g_k, g_k − g_{k−1}⟩ / ‖g_{k−1}‖².   (2.10)

Due to a counter-example exhibiting divergence even for exact line search [51], Gilbert-Nocedal [52] introduced the modification

β_k^{PRP+} = max(β_k^{PRP}, 0).   (2.11)

2. The formula

β_k^{HS} = ⟨g_k, g_k − g_{k−1}⟩ / ⟨d_{k−1}, g_k − g_{k−1}⟩,   (2.12)

which is the form of β_k in the original work of Hestenes-Stiefel [41]. This variant ensures that the vectors d_k and d_{k−1} are Ā_k-conjugate for the averaged Hessian

Ā_k = ∫₀¹ ∇²f(x_{k−1} + t (x_k − x_{k−1})) dt

between successive iterates.

3. Dai-Yuan [45] introduced the formula

β_k^{DY} = ‖g_k‖² / ⟨d_{k−1}, g_k − g_{k−1}⟩,   (2.13)

which ensures that d_k is always a descent direction, ie, ⟨d_k, g_k⟩ < 0 holds, provided f is convex and α_k is arbitrary [53], or, for general f, if α_k satisfies the Wolfe conditions [45].
The different formulae for β_k coincide for the quadratic objective (2.1) and exact line search (2.4). For the line search, typically either the strong Wolfe conditions

f(x_k + α d_k) ≤ f(x_k) + c₁ α ⟨g_k, d_k⟩   and   |⟨∇f(x_k + α d_k), d_k⟩| ≤ c₂ |⟨g_k, d_k⟩|   (2.14)

or the Wolfe conditions

f(x_k + α d_k) ≤ f(x_k) + c₁ α ⟨g_k, d_k⟩   and   ⟨∇f(x_k + α d_k), d_k⟩ ≥ c₂ ⟨g_k, d_k⟩   (2.15)

are imposed, where 0 < c₁ < c₂ < 1. The convergence results obtainable under these conditions are typically only qualitative, ie, no explicit convergence rate may be deduced, in general. The biggest limitation of the nonlinear conjugate gradient methods is the line search involved in the update (2.8). This becomes apparent even for the quadratic case (2.6). In the presented form, two applications of A (ie, gradient evaluations of f) are required for the linear case. Of course, introducing z_k = A d_k and noticing g_{k+1} = g_k + α_k z_k, the standard implementation of linear CG (Sect. 5.1 in Nocedal-Wright [42]) may be obtained, which only requires one application of A per iteration. However, applying the nonlinear CG method to a quadratic function with exact line search again requires two applications of A per iteration. Thus, if we assume that the effort of applying A dominates the computational expense per iteration, nonlinear CG is about a factor of two slower than linear CG applied to quadratic problems (2.1).
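The coincidence of the β-formulae for quadratic objectives under exact line search is easy to verify numerically. The following sketch (the test matrix and all names are illustrative) runs CG with exact line search on a quadratic and evaluates all four formulae in each iteration:

```python
import numpy as np

rng = np.random.default_rng(42)
M = rng.standard_normal((8, 8))
A = M @ M.T + np.eye(8)              # symmetric positive definite
b = rng.standard_normal(8)

x, max_dev = np.zeros(8), 0.0
g = A @ x - b                        # gradient of f(x) = <Ax,x>/2 - <b,x>
d = -g
for k in range(5):
    alpha = -(g @ d) / (d @ (A @ d))       # exact line search, cf. (2.4)
    x = x + alpha * d
    g_new = A @ x - b
    y = g_new - g
    beta_fr  = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves (2.9)
    beta_prp = (g_new @ y) / (g @ g)       # Polak-Ribiere-Polyak (2.10)
    beta_hs  = (g_new @ y) / (d @ y)       # Hestenes-Stiefel (2.12)
    beta_dy  = (g_new @ g_new) / (d @ y)   # Dai-Yuan (2.13)
    # track the largest deviation between the four formulae
    max_dev = max(max_dev, np.max(np.abs(
        np.array([beta_prp, beta_hs, beta_dy]) - beta_fr)))
    d = -g_new + beta_fr * d
    g = g_new
```

The deviations stay at roundoff level, reflecting the orthogonality of successive gradients enforced by exact line search; with an inexact line search, the four formulae separate.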
For general nonlinear optimization problems, state-of-the-art implementations of nonlinear conjugate gradient methods require two gradient evaluations per line search (see the introduction of Dai-Kou [55]). In contrast, (limited-memory) Quasi-Newton methods [56][57][58][59][60] asymptotically accept the suggested step size. Thus, they often outperform conjugate gradient methods in practice.
For these reasons, research effort has been invested in nonlinear conjugate gradient methods that avoid line search altogether. Sun-Zhang [50] have shown that coming close to the upper bound in the Armijo condition [the first inequality in the Wolfe conditions (2.15)] leads to convergence for the different variants of β_k. Also, combinations of nonlinear CG and the Barzilai-Borwein method have been explored [55,61].
In general, the Armijo condition ensures a monotonic decrease of the function value (provided d k is a descent direction). We will see that this is not optimal.

The heavy-ball method versus nonlinear conjugate gradients
For minimizing general (differentiable) objective functions (2.7), the heavy-ball method [46] uses the iterative scheme

x_{k+1} = x_k − α_k ∇f(x_k) + γ_k (x_k − x_{k−1}),   (2.16)

involving step-sizes α_k and momentum coefficients γ_k. The heavy-ball scheme (2.16) may be motivated, see Helmke-Moore [62], as a time-discretized version of the linearly damped Newtonian dynamics¹

ẍ + b ẋ + ∇f(x) = 0   (2.17)

for a curve x : [0, ∞) → V and damping parameter b ≥ 0. A backward Euler discretization of the dynamics (2.17) with time-step size h > 0 reads

(x_{k+1} − 2 x_k + x_{k−1}) / h² + b (x_k − x_{k−1}) / h + ∇f(x_k) = 0,   (2.18)

which may be rearranged into the scheme

x_{k+1} = x_k − h² ∇f(x_k) + (1 − bh)(x_k − x_{k−1}),   (2.19)

which has the form (2.16) with α_k ≡ h² and γ_k ≡ 1 − bh.

¹ Formally, the "mass" in front of the acceleration is set to unity.
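As an illustration of the scheme, a minimal heavy-ball iteration on a quadratic test problem reads as follows; the parameter choices follow the classical recipe for strongly convex quadratics, and the test setup is made up for illustration, not taken from the paper.

```python
import numpy as np

def heavy_ball(grad, x0, alpha, gamma, max_iter=1000, tol=1e-10):
    """Heavy-ball scheme: x_{k+1} = x_k - alpha*g_k + gamma*(x_k - x_{k-1})."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x, x_prev = x - alpha * g + gamma * (x - x_prev), x
    return x

# quadratic with known smallest (mu) and largest (L) Hessian eigenvalue
A = np.diag(np.linspace(1.0, 25.0, 10))
b = np.ones(10)
mu, L = 1.0, 25.0
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2   # classical step-size
gamma = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
x = heavy_ball(lambda x: A @ x - b, np.zeros(10), alpha, gamma)
```

Note that both parameters require μ and L; this is exactly the parameter-selection burden discussed in the text.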
Initially, Polyak investigated the local convergence of the scheme (2.16) (for strongly convex objective functions) by Lyapunov's first method. However, the determined optimal parameters failed to be stable for general strongly convex objectives with Lipschitz gradient, see Lessard-Recht-Packard [49] and Ghadimi-Feyzmahdavian-Johansson [48] for counterexamples.
A global convergence analysis was carried out based on Lyapunov's second method, ie, in terms of suitable Lyapunov functions, see the pioneering work of Zavriev-Kostyuk [63]. More precisely, in the continuous setting, the total energy of the dynamical system (2.17) may be used as a Lyapunov function, ie,

E(t) = 1/2 ‖ẋ(t)‖² + f(x(t)) − f(x*),   (2.20)

where x* denotes the minimizer of f. It has long been known that nonlinear conjugate gradient methods (2.8) and heavy-ball schemes (2.16) coincide as numerical schemes. Indeed, the conjugate gradient update (2.8) implies

x_{k+1} = x_k − α_k ∇f(x_k) + α_k β_k d_{k−1}   and   d_{k−1} = (x_k − x_{k−1}) / α_{k−1};

combining these two equations yields

x_{k+1} = x_k − α_k ∇f(x_k) + (α_k / α_{k−1}) β_k (x_k − x_{k−1}),

ie, the heavy-ball scheme (2.16) with γ_k = (α_k / α_{k−1}) β_k. Although they formally coincide, the way the numerical parameters are chosen is different. Also, both the respective interpretations and the convergence theories differ. For the heavy-ball method, the parameters α_k and γ_k are chosen independently of k, but depending on the strong convexity constant of the objective function and the Lipschitz constant of its gradient. More precisely, the Lipschitz constant L and the strong monotonicity constant μ of the gradient of f are required. Then, the convergence of the heavy-ball method is deduced by Lyapunov's first or second method, deducing local [46,49] or global [48,64] convergence, respectively. The convergence results are typically very strong, involving q-linear convergence towards the minimum. Getting hold of the strong monotonicity constant μ is difficult in general, and for the homogenization problems of Sect. 3 in particular. This constitutes the biggest limitation of the heavy-ball method.
In contrast, for nonlinear conjugate gradient methods, the parameters α_k and β_k are chosen adaptively. The different formulae for β_k measure the local geometry of the function, and the step-size α_k is chosen by a line-search procedure. Notice that the bounds entering the Wolfe conditions (2.15) and (2.14) also depend on local information about the function f. Convergence of the conjugate gradient method can typically be established for a large class of objective functions (see, for instance, Dai [66]), not necessarily restricted to convex objective functions. However, the convergence results are typically rather weak, for instance ∇f(x_{k_n}) → 0 along a subsequence. As already mentioned in the previous section, the biggest limitation of nonlinear conjugate gradient methods is their reliance upon line-search procedures.
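The formal coincidence of the two schemes can be checked directly: run Fletcher-Reeves CG with exact line search on a quadratic, record α_k and β_k, and replay the heavy-ball recurrence with γ_k = (α_k/α_{k−1}) β_k. A self-contained sketch on a made-up test problem:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + np.eye(6)
b = rng.standard_normal(6)
grad = lambda x: A @ x - b

# Fletcher-Reeves CG with exact line search, recording alpha_k and beta_k
x = np.zeros(6)
g = grad(x)
d = -g
xs_cg, alphas, betas = [x.copy()], [], []
for k in range(4):
    alpha = -(g @ d) / (d @ (A @ d))
    x = x + alpha * d
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)
    d = -g_new + beta * d
    g = g_new
    xs_cg.append(x.copy()); alphas.append(alpha); betas.append(beta)

# heavy-ball replay: x_{k+1} = x_k - alpha_k g_k + gamma_k (x_k - x_{k-1})
x_prev = np.zeros(6)
x = np.zeros(6)
xs_hb = [x.copy()]
for k in range(4):
    gamma = 0.0 if k == 0 else betas[k - 1] * alphas[k] / alphas[k - 1]
    x, x_prev = x - alphas[k] * grad(x) + gamma * (x - x_prev), x
    xs_hb.append(x.copy())

max_err = max(np.linalg.norm(a - c) for a, c in zip(xs_cg, xs_hb))
```

The two iterate sequences agree to machine precision, since the recurrences are algebraically identical; the schemes differ only in how the parameters are chosen.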

A dynamical perspective on conjugate gradient methods
We have seen that heavy-ball methods work well for fixed step-size, but the momentum parameter needs tweaking. Fixing the step-size α_k ≡ α in the nonlinear conjugate gradient update (2.8) leads to γ_k = β_k in the heavy-ball scheme (2.16) and, thus, to the method

x_{k+1} = x_k − α ∇f(x_k) + β_k (x_k − x_{k−1}).   (2.21)

Reversing the arguments leading to Eq. (2.18) yields, for α = h²,

(x_{k+1} − 2 x_k + x_{k−1}) / h² + ((1 − β_k) / h) (x_k − x_{k−1}) / h + ∇f(x_k) = 0.   (2.22)

Thus, provided the limit of (1 − β_k)/h as h → 0 exists, we get a corresponding dynamical system of the form (2.17) with a state-dependent damping. For the Fletcher-Reeves coefficient (2.9), we find

(1 − β_k^{FR}) / h = (‖∇f(x_{k−1})‖² − ‖∇f(x_k)‖²) / (h ‖∇f(x_{k−1})‖²) → −(d/dt) log ‖∇f(x(t))‖².

Thus, we arrive at the dynamical system

ẍ + b(t) ẋ + ∇f(x) = 0   with   b(t) = −(d/dt) log ‖∇f(x(t))‖².   (2.23)

Several remarks are in order.
1. The Fletcher-Reeves dynamical system is similar to the dynamical system associated to Nesterov's fast gradient method [67,68], as developed by Su-Boyd-Candes [69],

ẍ + (3/t) ẋ + ∇f(x) = 0,   (2.24)

where 3/t serves as a time-dependent damping. For convex f, a 1/t²-convergence rate of the objective along a trajectory of the ODE (2.24) may be established using Lyapunov-type arguments. To highlight the similarity with the Fletcher-Reeves ODE (2.23), we rewrite the latter in the form

ẍ − (d/dt log ‖∇f(x)‖²) ẋ + ∇f(x) = 0,

whereas we may cast (2.24) in the form

ẍ + (d/dt log t³) ẋ + ∇f(x) = 0.

2. The damping b is non-negative as long as d/dt ‖∇f(x)‖² ≤ 0 holds. Put differently, as long as the residual ‖∇f(x)‖² is non-increasing, the Fletcher-Reeves dynamical system (2.23) is stable. In our numerical experiments, b remained positive (except for very few iterates). However, we did not find a mathematical proof for this fact. Various safeguarding strategies (preventing b from becoming negative) never led to an improvement of the convergence behavior, whence we refrained from discussing them in the manuscript.

3. In view of Siegel's convergence result based on the Lyapunov function (2.20), fast convergence is to be expected if b ≈ 2√μ. b may be considered as a feedback controller for the dynamical system (2.21). Its analysis goes beyond this manuscript.

4. For some other nonlinear conjugate gradient methods, corresponding dynamical systems may be derived as well. For instance, the Dai-Yuan method [45] also leads to a limiting ODE. However, the latter ODE is degenerate, and we did not investigate it further.

5. The linear conjugate gradient method was interpreted as a dynamical system with feedback control in Bhaya-Kaszkurewicz [71]. Their treatment is different from ours, as they consider discrete dynamical systems and discrete controls, and it does not directly generalize to nonlinear CG.
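The limiting argument may be probed numerically: running the fixed-step Fletcher-Reeves update on a quadratic and recording the discrete damping (1 − β_k)/h illustrates the dynamical-system view. The test problem below is illustrative and not taken from the paper.

```python
import numpy as np

A = np.diag(np.linspace(1.0, 4.0, 8))   # modest contrast, safe fixed step
b = np.ones(8)
grad = lambda x: A @ x - b
alpha = 2.0 / (1.0 + 4.0)               # basic-scheme step 2/(mu + L)
h = np.sqrt(alpha)                      # step-size alpha = h^2

x = np.zeros(8)
g = grad(x)
d = -g
dampings = []
res = [np.linalg.norm(g)]
for k in range(400):
    x = x + alpha * d
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves, no line search
    dampings.append((1.0 - beta) / h)   # discrete damping b_k
    d = -g_new + beta * d
    g = g_new
    res.append(np.linalg.norm(g))
    if res[-1] < 1e-10:
        break
```

Whenever the residual decreases, β_k < 1 and the recorded damping is positive, in line with the stability discussion above.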

Application to micromechanical problems and implementation in the FFT framework
In a small-strain setting, suppose a heterogeneous, possibly nonlinear elastic energy w : Y × Sym(d) → R is given, where Y ⊆ R^d denotes a rectangular cell in d dimensions and Sym(d) stands for the linear space of symmetric d × d matrices. Typically, w is differentiable in the second variable, and has jumps in the first, the spatial, variable. The latter framework also encompasses inelastic problems, as w may stand for the condensed incremental potential corresponding to a fixed time step, see Ortiz-Stainier [72] and Miehe [73]. For prescribed macroscopic strain E ∈ Sym(d), we seek a periodic displacement field u : Y → R^d solving the balance of linear momentum on the microscale,

div σ(E + ∇ˢu) = 0,   (3.1)

where ∇ˢ denotes the symmetrized gradient, div stands for the divergence, and σ is the stress operator derived from the potential w,

σ(x, ε) = ∂w/∂ε (x, ε).

After solving Eq. (3.1) for u, the effective stress is computed by volume averaging,

σ̄ = ⟨σ(E + ∇ˢu)⟩ ≡ (1/|Y|) ∫_Y σ(x, E + ∇ˢu(x)) dx.

To apply nonlinear conjugate gradient methods, we recast the balance of linear momentum (3.1) as an optimization problem (2.7). We set V = H¹_#(Y; R^d), the first-order L²-Sobolev space of periodic vector fields with vanishing mean, and consider the condensed energy

f(u) = ⟨w(·, E + ∇ˢu)⟩,   (3.2)

which is well-defined under technical conditions, see Schneider [74]. We notice that the first-order optimality condition of the functional (3.2) is precisely the balance of linear momentum (3.1).

For the Hilbert space V = H¹_#(Y; R^d), we work with the inner product

⟨u, v⟩ = ⟨∇ˢu : ∇ˢv⟩,   (3.3)

where : denotes the double contraction of tensor indices. Recall that, for a continuously differentiable function f on a Hilbert space, the differential Df(u), for u ∈ V, is an element of the (continuous) dual Hilbert space, ie, Df(u) ∈ V′.
The gradient of f at u is defined implicitly via

⟨∇f(u), v⟩ = Df(u)[v]   for all v ∈ V.

Thus, ∇f(u) ∈ V, but it depends on the chosen inner product on V. (In contrast, for two equivalent inner products on a fixed Hilbert space, the corresponding differentials are identical.) More explicitly, the gradient may be written as

∇f(u) = R(Df(u)),

where R : V′ → V denotes the Riesz isomorphism, the inverse of the mapping V → V′, u ↦ ⟨u, ·⟩.
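In finite dimensions, the dependence of the gradient on the inner product is easy to make explicit: for ⟨u, v⟩_M = uᵀ M v on Rⁿ with M symmetric positive definite, the Riesz map amounts to multiplication by M⁻¹, so the gradient changes with M while the differential does not. A small sketch, illustrative and not tied to the paper's function spaces:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
# an SPD matrix M defines the inner product <u, v>_M = u.T @ M @ v
Q = rng.standard_normal((n, n))
M = Q @ Q.T + n * np.eye(n)

# f(u) = 0.5 * ||u||^2 (Euclidean); its differential is Df(u)[v] = u . v
u = rng.standard_normal(n)
Df = u.copy()                    # differential, independent of M

grad_M = np.linalg.solve(M, Df)  # Riesz map: gradient w.r.t. <.,.>_M

# defining property: <grad_M, v>_M equals Df[v] for every test vector v
v = rng.standard_normal(n)
lhs = grad_M @ M @ v
rhs = Df @ v
```

In the homogenization setting, M plays the role of −div ∇ˢ, and solving with M corresponds to applying the Green's operator.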
In the homogenization context (3.2), we have

Df(u)[v] = ⟨σ(E + ∇ˢu) : ∇ˢv⟩   for all v ∈ V,

which we may write more succinctly in the form

Df(u) = −div σ(E + ∇ˢu)

in terms of the "natural" pairing

⟨ℓ, v⟩ = (1/|Y|) ∫_Y ℓ · v dx.

The Riesz map for the inner product (3.3), in turn, becomes R = −G in terms of the Green's function G = (div ∇ˢ)⁻¹, which is well-defined on H¹_#(Y; R^d) (due to the mean-value constraint).
With these identifications, for given initial vector u_0 and d_{−1} = 0, the nonlinear conjugate gradient update (2.8) becomes

d_k = −G div σ(E + ∇ˢu_k) + β_k d_{k−1},   u_{k+1} = u_k + α_k d_k.

For FFT-based methods, it is convenient to introduce the total strain ε_k = E + ∇ˢu_k and D_k = ∇ˢd_k, and to rewrite the latter update in the form

D_k = −Γ : σ(ε_k) + β_k D_{k−1},   ε_{k+1} = ε_k + α_k D_k,   (3.4)

using the operator Γ = ∇ˢ G div, an orthogonal projector on L²(Y; Sym(d)) (see Milton [75]). Upon choosing β_k ≡ 0 in Eq. (3.4), we recover the basic scheme of Moulinec-Suquet with variable step-size, confirming the observations in Schneider [74]. Specializing to the Fletcher-Reeves CG (2.9), we may write

β_k = ‖Γ : σ(ε_k)‖² / ‖Γ : σ(ε_{k−1})‖².   (3.5)

Discretizing the balance of linear momentum (3.1), see Moulinec-Suquet [1,2], Willot [22] or Schneider-Ospald-Kabel [21], leads to discretized systems of equations (with corresponding discretized Γ-operators) that may be solved using nonlinear CG (3.5). We summarize the implementation in Algorithm 1. Several remarks are in order.

Algorithm 1 Fletcher-Reeves nonlinear CG (maxit, tol)
1: Determine initial guess ε and initial reference material C⁰
2: residual ← 1
3: k ← 0
4: γ ← 1
5: D ← 0
6: while k < maxit and residual > tol do
7:    G ← Γ⁰ : σ(ε)
8:    residual ← convergence criterion (4.1), evaluated from G and σ(ε)
9:    β ← ‖G‖² / γ
10:   γ ← ‖G‖²
11:   D ← −G + β D
12:   ε ← ε + α D
13:   k ← k + 1
14: end while
15: return ε, residual

1. To complete the description of Algorithm 1, a step-size strategy needs to be prescribed. We use the step-size of the basic scheme (see Schneider [74] for the nonlinear convex case), ie, if c₋ and c₊ denote the smallest and largest eigenvalues of the material tangents ∂σ/∂ε, evaluated over all microscopic points and admissible strains, we set α = 2/(c₋ + c₊). 2. In the presented form, the algorithm can be implemented on three strain-like fields (ε, G, D). In contrast to the Barzilai-Borwein scheme [39] and the fast gradient schemes [74], an additional field is necessary for evaluating β_k. 3. The other nonlinear conjugate gradient methods (2.10), (2.12) and (2.13) may be implemented in a similar way. However, they require storing (at least) one additional field, as the difference G_k − G_{k−1} enters the formulae for β_k. 4. If the memory footprint is decisive, reduced-memory implementations may be used, see for instance Grimm-Strehle-Kabel [76]. 5. Incorporating mixed boundary conditions is straightforward, see Kabel-Fliegener-Schneider [77].
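As a toy analogue of Algorithm 1, the following sketch solves a one-dimensional two-phase laminate with the Fletcher-Reeves update and the basic-scheme step-size; in 1D, the Γ-operator reduces to removing the mean in Fourier space, and the effective modulus is the harmonic mean of the stiffness, which serves as a check. All names and the setup are illustrative, not the paper's code.

```python
import numpy as np

n = 64
C = np.where(np.arange(n) < n // 2, 1.0, 10.0)   # two-phase stiffness
E = 1.0                                          # macroscopic strain
alpha = 2.0 / (C.min() + C.max())                # basic-scheme step-size

def gamma_op(tau):
    """1D analogue of Gamma = grad_s G div: project onto zero-mean fields."""
    t = np.fft.rfft(tau)
    t[0] = 0.0                                   # remove the mean frequency
    return np.fft.irfft(t, n)

eps = np.full(n, E)                              # initial strain guess
D = np.zeros(n)                                  # CG search direction
gamma_sq = 1.0                                   # ||G_{k-1}||^2, init 1
for k in range(2000):
    G = gamma_op(C * eps)                        # projected stress = gradient
    res = np.linalg.norm(G)
    if res < 1e-12:
        break
    beta = res**2 / gamma_sq                     # Fletcher-Reeves coefficient
    gamma_sq = res**2
    D = -G + beta * D                            # update search direction
    eps = eps + alpha * D                        # strain update

C_eff = np.mean(C * eps) / E                     # effective modulus
```

For this 50/50 laminate of stiffnesses 1 and 10, the converged effective modulus should approach the harmonic mean 20/11 ≈ 1.82, and the mean strain stays at E throughout since the search direction has zero mean.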

Setup and materials
The nonlinear conjugate gradient method, cf Alg. 1, was integrated into an in-house FFT-based computational micromechanics code (written in Python with Cython extensions). For all but the MMC example, we use the staggered-grid discretization [21]. For the MMC example, consistent with Schneider-Wicht-Böhlke [29], we rely upon the original spectral discretization of Moulinec-Suquet [1,2,78].
As the convergence criterion, we use the normalized residual

‖Γ : σ(ε_k)‖ / ‖⟨σ(ε_k)⟩‖ ≤ tol.   (4.1)

Affine extrapolation [2] is used to generate a good initial guess for the subsequent load step. The linear elastic computations were carried out on a laptop with 16 GB RAM and a dual-core Intel i7 CPU. For the inelastic computations, a desktop computer with a 6-core Intel i7 CPU and 32 GB RAM was utilized.
All used material parameters are collected in Table 1. The meaning of the inelastic parameters is specified in the proper subsection.

A sandcore microstructure
As our first example, we consider a microstructure of inorganically bound sand for casting applications, which was already investigated in Schneider [39]. It consists of 64 sand grains with 58.58% volume fraction, held together by 1.28% binder, cf Fig. 1a. The geometry was generated by the algorithm described in the work of Schneider et al. [84] and resolved by 256³ voxels. The material parameters used for the grains and the binder are given in Table 1.
We apply uniaxial extension in x-direction (for 5% axial strain), see Fig. 1b, where the norm of the strain field is shown. For more details about digital sand-core physics, we refer to the work of Ettemeyer et al. [85]. We compare several solution methods for this porous example. First, we compare the different conjugate gradient methods, cf Fig. 2a. The linear conjugate gradient method [31,41] serves as our reference. We investigate the performance of the nonlinear conjugate gradient methods of Fletcher-Reeves (2.9), Dai-Yuan (2.13), Hestenes-Stiefel (2.12) and Polak-Ribière-Polyak (2.10) for constant step-size (chosen according to the basic scheme). If exact line search were used, all these conjugate gradient methods would lead to identical iterates.
From Fig. 2a we see that the Fletcher-Reeves and Dai-Yuan methods require about 50% more iterations than linear CG. The Dai-Yuan method is only slightly slower than Fletcher-Reeves, but exhibits a distinct zig-zagging from one iteration to the next. In contrast, the decrease of the residual is monotonic for the Fletcher-Reeves method.
The two other investigated nonlinear conjugate gradient methods, Hestenes-Stiefel and Polak-Ribière-Polyak, are much slower for this scenario. Actually, their convergence behavior is very close to the basic scheme in Fig. 2b. This is no coincidence, as we observe β_k^{HS} → 0 and β_k^{PRP} → 0 as k → ∞.
In addition to the conjugate gradient method, other popular FFT-based solvers were applied to the problem at hand, see Fig. 2b. The Broyden-Fletcher-Goldfarb-Shanno algorithm [56][57][58][59] may be considered as another generalization of the linear CG method, as it reduces to the linear CG method when applied to quadratic problems with exact line search. Here, we investigate the limited-memory version of BFGS, as introduced by Nocedal [60], see Wicht-Schneider-Böhlke [35] for details in the FFT-based context. For a depth of 4, L-BFGS exhibits a convergence behavior that almost coincides with Fletcher-Reeves CG, however at a much higher computational cost per iteration.
The Barzilai-Borwein method [86] is also investigated, as it serves as a reference solver in FFT-based computational micromechanics [39]. The Barzilai-Borwein method exhibits a strongly non-monotonic convergence behavior, as can also be seen in Fig. 2b. However, the method can be shown to exhibit local r-linear convergence [87] when applied to uniformly convex optimization problems. As we are only interested in the smallest k, s.t. the convergence criterion (4.1) is fulfilled, the non-monotonicity may be non-critical. Concerning the number of iterations to terminate, the Barzilai-Borwein scheme is almost identical to L-BFGS(4) and Fletcher-Reeves CG. The Anderson-accelerated basic scheme was pioneered by Shanthraj et al. [38] and Chen et al. [88]. It may be interpreted as a Quasi-Newton method of multi-secant type [89]. Applied to linear problems, it is closely related to GMRES, see Toth-Kelley [90]. For low accuracy (above 10⁻³), Anderson acceleration with depth 4 outperforms all other methods. This is to be expected, as the (generalized) minimal residual method represents the Krylov subspace solver with minimal residual [91]. (Recall that linear CG is the Krylov subspace solver with minimal "energy".) However, for higher accuracy, the method slows down considerably. This complies with theoretical and numerical results [90], which show that an Anderson-accelerated gradient method does not improve the convergence rate, only the constant, see Li-Li [92]. For comparison, we have also included Nesterov's method with speed restart, as described in Schneider [74]. It represents the fastest accelerated gradient method for this kind of problem, compare Schneider [39], Sect. 3.2. Nesterov's method is optimal in the sense that its convergence rate is identical to the worst-case convergence rate for optimizing uniformly convex functions with Lipschitz gradient [70].
However, the constant in front of the convergence estimate is about 4 times as high as the worst-case bound (which might also be too pessimistic). Furthermore, speed restarting is always slower than Nesterov's method applied with an explicitly known convexity constant. For the problem at hand, Nesterov's method requires about 5 times as many iterations as linear CG. Last but not least, we report on the basic scheme, which fails to converge to the prescribed tolerance within 1000 iterations.

A planar short-fiber reinforced composite
We investigate a short-fiber reinforced microstructure and the associated effective mechanical response. We consider short glass fibers with a length of 275 µm and a diameter of 10 µm, reinforcing a polymer at a volume fraction of 18%. The second-order fiber-orientation tensor reads A = diag(0.45, 0.45, 0.1). The periodic microstructure, cf. Fig. 3a, has a size of 1024 µm × 1024 µm × 128 µm and was generated by the sequential addition and migration algorithm [93]. The resulting microstructure contains 1119 cylindrical fibers and was discretized by 512 × 512 × 64 voxels.
For the elastic parameters of E-glass and the polymer, cf. Table 1, we solve a linear homogenization problem for prescribed uniaxial extension of 5% strain up to a specified tolerance of 10⁻¹⁰. Compared to the example of the previous section, the example at hand is simpler, as the constituents are finitely contrasted. (The difficulties emerge for the nonlinear behavior, though.) We compare the performance of the various nonlinear CG methods to linear CG in Fig. 4a. Both the Hestenes-Stiefel and the Polak-Ribière-Polyak method were not competitive, whence they are not included in the figure. The linear conjugate gradient method converges linearly, and quickly so. The Dai-Yuan and the Fletcher-Reeves CG are almost indistinguishable, also converging linearly, but at a slower rate than linear CG. Both require about 50% more iterations than linear CG. In contrast to the sand-core microstructure, the Dai-Yuan CG does not exhibit a zig-zagging convergence behavior. This observation may indicate that the Dai-Yuan method is more sensitive to the step size than the Fletcher-Reeves CG. Indeed, due to the finite contrast, the step size of the basic scheme (which we also use for the nonlinear CGs) is smaller than for the infinite-contrast case.
The numerical performance of a variety of other solvers is shown in Fig. 4b. We see that the Fletcher-Reeves method lags behind three other methods in terms of convergence rate: the L-BFGS(4) method, the heavy-ball scheme and the Barzilai-Borwein solver all lead to approximately identical convergence rates. The heavy-ball scheme, however, starts to become unstable for a residual of 10⁻⁹. This is certainly a problem of the implementation, see Ernesti-Schneider-Böhlke [37], and can be cured easily, but at the expense of storing an additional field. The fluctuations inherent to the Barzilai-Borwein method are smaller than for the infinite-contrast case of the previous section. Similar to the sand-core microstructure, the Anderson-accelerated basic scheme is superior to all other schemes up to a tolerance of 10⁻³, but becomes dramatically slower for higher accuracy. The basic scheme is not competitive.
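As a sanity check of the relation between Fletcher-Reeves CG and linear CG invoked in the comparison above, the following sketch runs Fletcher-Reeves CG on a small quadratic with the exact line search (analytic in the quadratic case), where it coincides with linear CG and terminates in at most n iterations. Keep in mind that the paper's method instead uses the fixed basic-scheme step size without line search; the test problem below is purely illustrative:

```python
import numpy as np

def fletcher_reeves_quadratic(A, b, x, iters):
    """Fletcher-Reeves CG on the quadratic f(x) = 0.5 x^T A x - b^T x.
    With the exact line search, the Fletcher-Reeves coefficient
    beta = |g_new|^2 / |g|^2 reproduces linear CG."""
    g = A @ x - b
    d = -g
    for _ in range(iters):
        alpha = (g @ g) / (d @ (A @ d))   # exact line search for quadratics
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves update
        d = -g_new + beta * d
        g = g_new
    return x

# illustrative symmetric positive definite system of size n = 3
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = fletcher_reeves_quadratic(A, b, np.zeros(3), iters=3)
```

After n = 3 iterations, the residual vanishes up to round-off, reflecting the finite-termination property of linear CG.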
After this warm-up, we turn our attention to the inelastic behavior. More precisely, following Wu et al. [82], we furnish the matrix with a J₂-elastoplastic material model with linear-exponential hardening. We apply uniaxial extension in x-direction up to 5% strain, distributed into 50 equal-sized time steps. The stress-strain curves for matrix, fiber and the composite are shown in Fig. 5a. The local strain fields corresponding to 1%, 2%, 3%, 4% and 5% applied strain are shown in Fig. 3. Up to 1% applied strain, cf. Fig. 3b, the local strain field exhibits peaks only at fiber tips and at spots where fibers are closely packed. For higher levels of strain, large portions of the matrix plastify, as is reflected in the high strain levels. For the inelastic problem at hand, we compare the basic scheme, Fletcher-Reeves CG, the Newton-CG method, L-BFGS(4) and the Barzilai-Borwein scheme, both in terms of iteration count and runtime, cf. Fig. 6. Please keep in mind that the runtimes were recorded on the computer with six cores, see Sect. 4.1. For the forcing term in Newton-CG, we choose forcing-term choice 2 with γ₀ = 0.75, see Wicht-Schneider-Böhlke [35].
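For illustration, a common form of a linear-exponential hardening law combines a linear term and a saturating exponential term in the accumulated plastic strain. The functional form, parameter names and values below are assumptions for this sketch, not the calibration of Wu et al. [82]; the actual material parameters used in this work are listed in Table 1:

```python
import numpy as np

def yield_stress(p, sigma_0, k_lin, k_exp, m):
    """Illustrative linear-exponential isotropic hardening law:
    initial yield stress sigma_0, linear modulus k_lin, and a
    saturating exponential contribution k_exp with rate m, evaluated
    at the accumulated plastic strain p."""
    return sigma_0 + k_lin * p + k_exp * (1.0 - np.exp(-m * p))

# illustrative parameters (not the values of Table 1)
p = np.linspace(0.0, 0.05, 6)
sigma = yield_stress(p, sigma_0=30.0, k_lin=200.0, k_exp=40.0, m=300.0)
```

The law is strictly increasing in p, so the associated condensed energy remains convex; its curvature, however, degrades with increasing p, which is one source of the solver slowdown observed at higher load levels.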
Taking a look at the iteration count, cf. Fig. 6a, we see that the number of iterations required by the basic scheme increases monotonically up to about 2% applied strain. For higher levels of loading, the iteration count decreases again. For the other solvers, the iteration counts are much smaller than for the basic scheme. Their respective iteration counts are on a similar level, with Newton-CG lagging behind slightly for strain levels exceeding 3.5%.
A look at the runtime per step, cf. Fig. 6b, enables us to further examine the performance of the solvers. As for the iteration counts, the basic scheme leads to much higher runtimes. The other solvers, however, operate on different levels. The limited-memory BFGS method clearly lags behind in terms of performance. Fletcher-Reeves CG and Newton-CG exhibit more or less the same runtimes, with Newton-CG having advantages for lower strain levels and Fletcher-Reeves CG being faster later on. The Barzilai-Borwein scheme appears to be faster, probably because of the lower computational cost per step. Indeed, the Barzilai-Borwein scheme is very close to the basic scheme in terms of computational cost per step, whereas Newton-CG and Fletcher-Reeves CG are similar in terms of these costs, as long as evaluating the nonlinear material law is comparably expensive to applying the material tangent. Note that we have also run the heavy-ball method for this problem. However, the scheme did not converge within 1000 iterations for the fourth load step, whence we consider the method clearly inferior.
To quantify our observations, both the average iteration count and the average runtime per step are collected in Table 2. The basic scheme takes more than ten times the iteration count required by the second-worst solver considered. Thus, for the problem at hand, the basic scheme is clearly not competitive. Notice, however, that the basic scheme features the lowest runtime per iteration.
In terms of runtime per iteration, the Barzilai-Borwein method and Newton-CG are comparable. The Barzilai-Borwein method requires fewer iterations than Newton-CG. This is caused by the delicate choice of the forcing term (the tolerance to which the linear system needs to be solved) in the Newton method, see Wicht-Schneider-Böhlke [35] for a discussion. The Fletcher-Reeves CG has a similar iteration count as the Barzilai-Borwein method, but features a higher runtime per iteration. All in all, it is 20% slower than the Barzilai-Borwein method. Still, it is slightly faster than Newton-CG, and is characterized by a much smaller memory footprint than Newton-CG. (For the Newton-CG method, linear and nonlinear iterations are summed in Table 2 and marked by an asterisk; the performance overhead is defined as the runtime divided by the smallest runtime.) Last but not least, we mention L-BFGS, which has a comparable iteration count, but is not competitive in terms of runtime. Indeed, its runtime per iteration is more than twice as high as for the basic scheme.

A metal-matrix composite
As our last inelastic example, we consider a metal-matrix composite (MMC). The microstructure containing spherical fillers is shown in Fig. 7a. The ceramic fillers are modeled elastically, whereas the aluminium matrix is governed by J₂ elastoplasticity with power-law hardening. The material parameters are listed in Table 1, following Segurado et al. [83]. The composite was subjected to uniaxial extension up to 3% strain, decomposed into 225 equidistant time steps. The stress-strain curves are shown in Fig. 5b. The accumulated plastic strain fields for increasing levels of applied strain are shown in Fig. 7b-d. Due to the strong nonlinearity induced by power-law hardening, this setup represents a challenging benchmark for FFT-based solution methods, and has been considered before in Schneider-Wicht-Böhlke [29] and Schneider [39]. To ensure compatibility with the results in the mentioned papers, the spectral discretization of Moulinec-Suquet [1,2] was used. The iteration count per load step for the basic scheme, the Barzilai-Borwein method and Fletcher-Reeves CG is shown on the left of Fig. 8. We see that the basic scheme performs quite well for this example, and the Fletcher-Reeves CG is on the same level as Barzilai-Borwein. We refrained from including too many solvers in the latter diagram, as many solvers operate on a similar level for this example (or fail admirably, rendering the diagram less expressive). Thus, we included a selection of solvers, among them the fast gradient methods: Nesterov's method with optimal step size [49], the heavy-ball method with optimal step size [49], and Fletcher-Reeves CG (2.9).
We notice that all schemes are stable w.r.t. mesh refinement except for Nesterov's method and the heavy-ball scheme. This is a result of the lack of uniform convexity of the condensed energy functional governing von Mises elastoplasticity with power-law hardening. Recall that the performances of Nesterov's method and the heavy-ball scheme crucially rely upon knowing the strong convexity constant in question. We follow the approach of determining the strong convexity constant in terms of the eigenvalues of the tangent stiffness, for every iteration. As the mesh is refined, the strains at the fiber tips increase, leading to a lowering of the convexity constant. A similar, but less severe trend is observed for the Eyre-Milton scheme, where the average iteration count increases for 128³ voxels. For the polarization schemes, the strong convexity constant is required as well. However, the two damped polarization solvers, the augmented Lagrangian and the Monchiet-Bonnet scheme, suffer less from mesh refinement.

Concerning the overall performance, the basic scheme is worst, requiring about 150% more iterations than the Barzilai-Borwein scheme. In turn, the polarization schemes outperform the Barzilai-Borwein scheme, or operate on roughly the same level. Recall from Schneider [39] that polarization schemes fail to perform well for infinitely contrasted media, but exhibit very good performance for finitely contrasted media, as long as the complexity-reduction trick [29] is available.
From the category of fast gradient schemes, both Nesterov's scheme and the heavy-ball solver fail to be competitive. The heavy-ball method becomes unstable, probably rooted in the lack of global convergence of the scheme. Nesterov's method is globally convergent (and faster than the basic scheme). However, it requires about twice as many iterations as the other competitive solvers. Fletcher-Reeves CG consistently outperforms the Barzilai-Borwein scheme in terms of required iterations, but lags behind the two better polarization schemes.
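For reference, the classical optimal heavy-ball parameters for a μ-strongly convex objective with L-Lipschitz gradient, whose knowledge the above discussion refers to, read α = 4/(√L + √μ)² and β = ((√L − √μ)/(√L + √μ))². The sketch below applies them to an illustrative quadratic; all names and values are assumptions, not data from the paper:

```python
import numpy as np

def heavy_ball(grad, x, mu, L, steps=300):
    """Heavy-ball iteration x_{k+1} = x_k - alpha*grad(x_k) + beta*(x_k - x_{k-1})
    with Polyak's classical optimal parameters for a mu-strongly convex
    objective with L-Lipschitz gradient."""
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    x_prev = x.copy()
    for _ in range(steps):
        x_new = x - alpha * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x

# illustrative quadratic with mu = 1 and L = 100 (condition number 100)
A = np.diag([1.0, 25.0, 100.0])
b = np.ones(3)
x = heavy_ball(lambda x: A @ x - b, np.zeros(3), mu=1.0, L=100.0)
x_star = np.linalg.solve(A, b)
```

If μ is overestimated, i.e., the true convexity constant drops below the assumed value, as happens here upon mesh refinement, the momentum β becomes too large and the iteration may lose stability, which matches the observed failure of the heavy-ball scheme.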

Conclusion
This work was devoted to studying nonlinear conjugate gradient methods in the context of FFT-based micromechanics. We showed that the Fletcher-Reeves nonlinear conjugate gradient method is capable of filling the role of a general-purpose solver for FFT-based computational micromechanics. Its monotonicity properties, which we demonstrated by computational examples, set it apart from the Barzilai-Borwein method. In other regards, the performance is similar. Of course, the price to pay for monotonicity is a slight increase in both memory footprint and computational effort. In this regard, we may insert the Fletcher-Reeves CG into Table 3, which compares the corresponding merits and downsides of the competitive solvers for FFT-based computational micromechanics.
Our focus has been on providing a useful and usable method, and we did not dwell upon theoretical investigations. Exploring the Fletcher-Reeves dynamical system (2.23) appears alluring, in particular in terms of stability and basins of attraction. Also, the corresponding time-discrete system should be investigated.
Acknowledgements Open Access funding provided by Projekt DEAL. This work was supported by the German Research Foundation (DFG) within the International Research Training Group "Integrated engineering of continuous-discontinuous long fiber reinforced polymer structures" (GRK 2078). The author acknowledges stimulating discussions with H. Andrä, D. Wicht and S. Gajek.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.