Derivative-free separable quadratic modeling and cubic regularization for unconstrained optimization

We present a derivative-free separable quadratic modeling and cubic regularization technique for solving smooth unconstrained minimization problems. The derivative-free approach is mainly concerned with building a quadratic model, generated either by polynomial interpolation or by a minimum Frobenius norm approach when the number of available points does not allow building a complete quadratic model. This model plays a key role in generating an approximate gradient vector and Hessian matrix of the objective function at every iteration. We add a specialized cubic regularization strategy, which makes use of separability, to minimize the quadratic model at each iteration. We discuss convergence results, including worst-case complexity, of the proposed schemes to first-order stationary points. Some preliminary numerical results are presented to illustrate the robustness of the specialized separable cubic algorithm.


Introduction
We consider unconstrained minimization problems of the form

min_{x ∈ R^n} f(x),   (1)

where the objective function f : R^n → R is continuously differentiable in R^n. However, we assume that the derivatives of f are not available and that they cannot be easily approximated by finite-difference methods. This situation frequently arises when f must be evaluated through black-box simulation packages, and each function evaluation may be costly and/or contaminated with noise [12].
Recently [4,21,22], in a derivative-based context, several separable models combined either with a variable-norm trust-region strategy or with a cubic regularization scheme were proposed for solving (1), and their standard asymptotic convergence results were established. The main idea of these separable model approaches is to minimize a quadratic (or cubic) model at each iteration, in which the quadratic part is the second-order Taylor approximation of the objective function. With a suitable change of variables, based on the Schur factorization, the solution of these subproblems is trivialized, and an adequate choice of the norm at each iteration permits the employment of a trust-region reduction procedure that ensures global convergence to second-order stationary points [4,21]. In that case, the separable model method with a trust-region strategy has the same asymptotic convergence properties as the trust-region Newton method. Later, in [22], starting from the same modeling introduced in [21], the trust-region scheme was replaced with a separable cubic regularization strategy. By adding convenient regularization terms, the standard asymptotic convergence results were retained and, moreover, the complexity of the cubic strategy for finding approximate first-order stationary points became O(ε^{−3/2}). For the separable cubic regularization approach used in [22], complexity results with respect to second-order stationarity were also established. We note that regularization procedures serve the same purpose as, and are strongly related to, trust-region schemes, with the advantage of possessing improved worst-case complexity (WCC) bounds; see, e.g., [1,2,5,6,8,16,18,19,20,24,27].
However, as previously mentioned, the separable approaches developed in [4,21,22] are based on the availability of the exact gradient vector and the exact Hessian matrix at every iteration. When exact derivatives are not available, quadratic models based only on objective function values, computed at sample points, can be obtained that retain good approximation quality for the gradient and the Hessian of the objective function. These derivative-free models can be constructed by means of polynomial interpolation or regression, or by any other approximation technique. Such models are called, depending on their accuracy, fully-linear or fully-quadratic; see [9,10,12] for details.
Fully-linear and fully-quadratic models are the basis of derivative-free optimization trust-region methods [11,12,26] and have also been successfully used in the definition of a search step for unconstrained directional direct search algorithms [13]. In the latter, minimum Frobenius norm approaches are adopted when the number of points available does not allow the computation of a determined interpolation model. This state of affairs motivated us to develop a derivative-free separable version of the regularized method introduced in [22]. This means that we will start with a derivative-free quadratic model, which can be obtained by different schemes, to compute an approximate gradient vector and Hessian matrix at each iteration, and then add separable cubic regularization terms, associated with an adaptive regularization parameter, to guarantee convergence to stationary points.
The paper is organized as follows. In Section 2 we present the main ideas behind the derivative-based separable modeling approaches. Section 3 revises several derivative-free schemes for building quadratic models. In Section 4 we describe the proposed derivative-free separable cubic regularization strategy and discuss the associated convergence properties. Section 5 reports numerical results that give further insight into the proposed approach. Finally, in Section 6 we present some concluding remarks.
Throughout, unless otherwise specified, we will use the Euclidean norm ‖x‖ = (x^T x)^{1/2} on R^n, where x^T x = Σ_{i=1}^n x_i². For a given ∆ > 0 we will denote the closed ball B(x; ∆) = {y ∈ R^n : ‖y − x‖ ≤ ∆}.

Separable Cubic Modeling
In the standard derivative-based quadratic modeling approach for solving (1), a quadratic model of f around x_k is constructed by defining the model of the objective function as

m_k(s) = f_k + g_k^T s + (1/2) s^T H_k s,   (2)

where f_k = f(x_k), g_k = ∇f(x_k) is the gradient vector at x_k, and H_k is either the Hessian of f at x_k, ∇²f(x_k), or a symmetric approximation of it. The step s_k is the minimizer of m_k(s).
In [21], instead of using the standard quadratic model associated with Newton's method, the equivalent separable quadratic model

m_k(y) = f_k + (Q_k^T g_k)^T y + (1/2) y^T D_k y   (3)

was considered to approximate the objective function f around the iterate x_k. In (3), the change of variables y = Q_k^T s is used, where the spectral (or Schur) factorization of H_k,

H_k = Q_k D_k Q_k^T,   (4)

is computed at every iteration. In (4), Q_k is an orthogonal n × n matrix whose columns are the eigenvectors of H_k, and D_k is a real diagonal n × n matrix whose diagonal entries are the eigenvalues of H_k. Let us note that, since H_k is symmetric, (4) is well-defined for all k. We also note that (3) may be non-convex, i.e., some of the diagonal entries of D_k could be negative. For the separable regularization counterpart in [22], the separable model (3) is kept and a cubic regularization term is added:

m_k^{SR}(y) = f_k + (Q_k^T g_k)^T y + (1/2) y^T D_k y + (σ_k/6) Σ_{i=1}^n |y_i|³,   (5)

where σ_k ≥ 0 is dynamically obtained. Notice that a 1/6 factor is included in the last term of (5) to simplify derivative expressions. As a consequence, at every iteration k the subproblem

min_{y ∈ R^n} m_k^{SR}(y)  subject to  ‖y‖_∞ ≤ ∆,

is solved to compute the vector y_k, and then the step is recovered as s_k = Q_k y_k. The gradient of the model m_k^{SR}(y), given by (5), can be written as

∇m_k^{SR}(y) = Q_k^T g_k + D_k y + (σ_k/2) u_k,

where the i-th entry of the n-dimensional vector u_k is equal to |y_i| y_i. Similarly, the Hessian of (5) is given by

∇²m_k^{SR}(y) = D_k + σ_k diag(|y_1|, …, |y_n|).

Notice that, since D_k is diagonal, the model (5) is separable. Hence, to solve ∇m_k^{SR}(y) = 0 and find the critical points, we only need to independently minimize n one-dimensional special functions on the closed interval [−∆, ∆]. These special one-variable functions are of the form

h(z) = a z + (b/2) z² + (σ_k/6) |z|³,

where a and b are the corresponding entries of Q_k^T g_k and of the diagonal of D_k, respectively. The details on how to find the global minimizer of h(z) on the closed and bounded interval [−∆, ∆], for ∆ > 0, are fully described in [22, Sec. 3].
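To make the separability concrete, the following minimal sketch (in Python, using only NumPy; the function names are ours, not from [22]) computes the step associated with (5): for each one-variable function h, the global minimizer over [−∆, ∆] is found among the interval endpoints and the real roots of the piecewise-quadratic stationarity condition h'(z) = a + b z + (σ/2) z|z| = 0.

```python
import numpy as np

def min_h_on_interval(a, b, sigma, lo, hi):
    """Globally minimize h(z) = a*z + (b/2)*z**2 + (sigma/6)*|z|**3 on [lo, hi].

    Candidates: the endpoints and the real roots of the piecewise-quadratic
    stationarity condition h'(z) = a + b*z + (sigma/2)*z*|z| = 0.
    """
    h = lambda z: a*z + 0.5*b*z**2 + (sigma/6.0)*abs(z)**3
    cands = [lo, hi]
    # On z >= 0, h'(z) = (sigma/2)*z**2 + b*z + a; on z <= 0 the z**2
    # coefficient flips sign.  Keep only roots with the matching sign.
    for coef, sgn in ((sigma/2.0, 1.0), (-sigma/2.0, -1.0)):
        if abs(coef) > 1e-16:
            disc = b*b - 4.0*coef*a
            if disc >= 0.0:
                for r in ((-b + np.sqrt(disc))/(2.0*coef),
                          (-b - np.sqrt(disc))/(2.0*coef)):
                    if sgn*r >= 0.0 and lo <= r <= hi:
                        cands.append(r)
        elif abs(b) > 1e-16:             # sigma = 0: linear stationarity
            r = -a/b
            if sgn*r >= 0.0 and lo <= r <= hi:
                cands.append(r)
    return min(cands, key=h)

def separable_cubic_step(g, H, sigma, Delta):
    """Minimize (5) over ||y||_inf <= Delta and recover the step s = Q y."""
    d_eig, Q = np.linalg.eigh(H)         # Schur factorization H = Q D Q^T
    gt = Q.T @ g                         # rotated gradient Q^T g
    y = np.array([min_h_on_interval(gt[i], d_eig[i], sigma, -Delta, Delta)
                  for i in range(len(gt))])
    return Q @ y
```

Since Q_k is orthogonal, the recovered step satisfies ‖s_k‖ = ‖y_k‖, which is used repeatedly in the analysis of Section 4.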
In the next section, we will describe several derivative-free alternatives to compute a model of type (2), to be incorporated in the separable regularized model (5).

Fully-linear and Fully-quadratic Derivative-free Models
Interpolation or regression based models are commonly used in derivative-free optimization as surrogates of the objective function. In particular, quadratic interpolation models are used as replacements for Taylor models in derivative-free trust-region approaches [11,26].
The terminology fully-linear and fully-quadratic, used to describe a derivative-free model that retains Taylor-like error bounds, was first proposed in [12]. Definitions 3.1 and 3.2 provide a slightly modified version of it, suited for the present work. In what follows, ∆_max > 0 denotes a fixed upper bound on the radius ∆.

Assumption 3.1 The function f is continuously differentiable and its gradient ∇f is Lipschitz continuous on R^n, with constant L_g > 0.

Definition 3.1 Let f satisfy Assumption 3.1 and consider a class of model functions M.

1. There exist positive constants κ_ef and κ_eg such that for any x ∈ R^n and ∆ ∈ (0, ∆_max] there exists a model function m(s) in M, with Lipschitz continuous gradient, such that the error between the gradient of the model and the gradient of the function satisfies

‖∇f(x + s) − ∇m(s)‖ ≤ κ_eg ∆, ∀s ∈ B(0; ∆),   (6)

and the error between the model and the function satisfies

|f(x + s) − m(s)| ≤ κ_ef ∆², ∀s ∈ B(0; ∆).

Such a model m is called fully-linear on B(x; ∆).
2. For this class M there exists an algorithm, which we will call a 'model-improvement' algorithm, that in a finite, uniformly bounded (with respect to x and ∆) number of steps can either establish that a given model m ∈ M is fully-linear on B(x; ∆) (we will say that a certificate has been provided), or find a model m ∈ M that is fully-linear on B(x; ∆).
For fully-quadratic models, stronger assumptions on the smoothness of the objective function are required.

Assumption 3.2 The function f is twice continuously differentiable and its Hessian ∇²f is Lipschitz continuous on R^n, with constant L_H > 0.

Definition 3.2 Let f satisfy Assumption 3.2 and consider a class of model functions M.

1. There exist positive constants κ_ef, κ_eg, and κ_eh such that for any x ∈ R^n and ∆ ∈ (0, ∆_max] there exists a model function m(s) in M, with Lipschitz continuous Hessian, such that the error between the Hessian of the model and the Hessian of the function satisfies

‖∇²f(x + s) − ∇²m(s)‖ ≤ κ_eh ∆, ∀s ∈ B(0; ∆),   (7)

the error between the gradient of the model and the gradient of the function satisfies

‖∇f(x + s) − ∇m(s)‖ ≤ κ_eg ∆², ∀s ∈ B(0; ∆),   (8)

and the error between the model and the function satisfies

|f(x + s) − m(s)| ≤ κ_ef ∆³, ∀s ∈ B(0; ∆).

Such a model m is called fully-quadratic on B(x; ∆).
2. For this class M there exists an algorithm, which we will call a 'model-improvement' algorithm, that in a finite, uniformly bounded (with respect to x and ∆) number of steps can either establish that a given model m ∈ M is fully-quadratic on B(x; ∆) (we will say that a certificate has been provided), or find a model m ∈ M that is fully-quadratic on B(x; ∆).
Algorithms for model certification or for improving the quality of a given model can be found in [12]. This quality is directly related to the geometry of the sample set used in its computation [9,10]. However, some practical approaches have reported good numerical results with implementations that do not enforce a strict geometry control [3,14].

Derivative-free Separable Cubic Regularization Approach
In a derivative-free optimization setting, instead of (2), we will consider the quadratic model

m̃_k(s) = f_k + g̃_k^T s + (1/2) s^T H̃_k s,

where g̃_k and H̃_k are good-quality approximations of g_k and H_k, respectively, built using interpolation or a minimum Frobenius norm approach (see Chapters 3 and 5 in [12]). Hence, analogously to the discussion in Section 2, by using the change of variables y = Q̃_k^T s, where H̃_k = Q̃_k D̃_k Q̃_k^T, with Q̃_k an orthogonal n × n matrix whose columns are the eigenvectors of H̃_k, and D̃_k a real diagonal n × n matrix whose diagonal entries are the eigenvalues of H̃_k, the equivalent separable quadratic model

m̃_k(y) = f_k + (Q̃_k^T g̃_k)^T y + (1/2) y^T D̃_k y   (9)

is used for the approximation of the objective function f around the iterate x_k. We then regularize (9) by adding a cubic or a quadratic term, depending on having been able to compute a fully-quadratic or a fully-linear model, respectively:

m̃_k^{SR}(y) = m̃_k(y) + (σ_k/p!) Σ_{i=1}^n |y_i|^p,

where p ∈ {2, 3} and σ_k ≥ 0 is dynamically obtained. As a consequence, at every iteration k the subproblem

min_{y ∈ R^n} m̃_k^{SR}(y)  subject to  ξ/σ_k ≤ ‖y‖_∞ ≤ ∆,   (10)

is solved to compute the vector y_k, and then the step is recovered as s_k = Q̃_k y_k. Again, the constraint ‖y‖_∞ ≤ ∆ ensures the existence of a solution for problem (10). The additional constraint ‖y‖_∞ ≥ ξ/σ_k, for a fixed constant ξ > 0, relates the stepsize to the regularization parameter and is required to establish WCC results. A similar strategy has been used in [8] when building models using a probabilistic approach. As we will see in Section 4, this additional lower bound does not prevent the iterative process from driving the first-order stationarity measure below any given small positive threshold.
In this case, by solving n independent one-dimensional minimization problems on the closed intervals [−∆, −ξ/σ_k] and [ξ/σ_k, ∆], we are being more demanding than the original constraint. These one-variable functions are of the form

h(z) = a z + (b/2) z² + (σ_k/p!) |z|^p,

where a and b are the corresponding entries of Q̃_k^T g̃_k and of the diagonal of D̃_k, respectively. The details on how to find the global minimizer of h(z) on the closed and bounded intervals [−∆, −ξ/σ] and [ξ/σ, ∆], for ∆ > 0 and ξ/σ > 0, are similar to the ones described in [22, Sec. 3]. A practical approach for the resolution of (10) will be suggested and tested in Section 5; a sketch of the componentwise strategy is given below.
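Anticipating the strategies of Section 5, the following is a minimal sketch of this componentwise strategy (Python; helper names are ours, and the σ_k/p! regularization weight is the reconstructed form used above). For p = 2 the stationarity condition is linear; for p = 3 it is piecewise quadratic, as in the previous sketch.

```python
import numpy as np

def min_h_1d(a, b, sigma, intervals, p):
    """Globally minimize h(z) = a*z + (b/2)*z**2 + (sigma/p!)*|z|**p over a
    union of closed intervals, for p in {2, 3} (assumes sigma > 0 if p = 3)."""
    fact = {2: 2.0, 3: 6.0}[p]
    h = lambda z: a*z + 0.5*b*z**2 + (sigma/fact)*abs(z)**p
    cands = []
    for lo, hi in intervals:
        cands += [lo, hi]
        if p == 2:                        # h'(z) = a + (b + sigma)*z
            if abs(b + sigma) > 1e-16:
                r = -a/(b + sigma)
                if lo <= r <= hi:
                    cands.append(r)
        else:                             # p = 3: piecewise quadratic h'
            for coef, sgn in ((sigma/2.0, 1.0), (-sigma/2.0, -1.0)):
                disc = b*b - 4.0*coef*a
                if abs(coef) > 1e-16 and disc >= 0.0:
                    for r in ((-b + np.sqrt(disc))/(2.0*coef),
                              (-b - np.sqrt(disc))/(2.0*coef)):
                        if sgn*r >= 0.0 and lo <= r <= hi:
                            cands.append(r)
    return min(cands, key=h)

def separable_step(gt, d_eig, sigma, lo, hi, p):
    """First strategy of Section 5: impose lo <= |y_i| <= hi on every
    component (stronger than ||y||_inf >= lo); assumes lo <= hi."""
    return np.array([min_h_1d(gt[i], d_eig[i], sigma,
                              [(-hi, -lo), (lo, hi)], p)
                     for i in range(len(gt))])
```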
The following algorithm is an adaptation of Algorithm 2.1 in [22], for the derivative-free case.
Step 2: Build a quadratic polynomial model m̃_k(s) = f_k + g̃_k^T s + (1/2) s^T H̃_k s by selecting points in B(x_k; ξ/σ_k) (fully-linear minimum Frobenius norm models or fully-quadratic polynomial models can be considered, depending on the number of points available for reuse or on the effort allowed in terms of number of function evaluations). Set p = 2 (respectively, p = 3) if the computed model is fully-linear (respectively, fully-quadratic).
Step 3: Compute a solution s_trial of

min_{s ∈ R^n} m̃_k^{SR}(s)  subject to  ξ/σ_k ≤ ‖Q̃_k^T s‖_∞ ≤ ∆,   (11)

where H̃_k = Q̃_k D̃_k Q̃_k^T is a Schur factorization of H̃_k.
Step 4: Test the sufficient decrease condition

f(x_k + s_trial) ≤ f(x_k) − α Σ_{i=1}^n |(y_trial)_i|^p,   (12)

where y_trial = Q̃_k^T s_trial. If (12) is fulfilled, define s_k = s_trial, x_{k+1} = x_k + s_k, update k ← k + 1, and go to Step 1. Otherwise set σ_new = η σ_k, update σ_k ← σ_new, and go to Step 2.

Remark 4.1 The upper bound constraint in (11) does not affect the separability nature of Step 3, since it can be equivalently replaced by |(Q̃_k^T s)_i| ≤ ∆ for all i. However, the lower bound in (11) does affect the separability of Step 3. Two strategies have been developed to impose the lower bound constraint in (11) while maintaining the separability approach. These strategies will be described in Section 5.
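Putting the pieces together, a compact sketch of the main loop follows (Python; build_model is a hypothetical helper returning (f_k, g̃_k, H̃_k, p), separable_step is the sketch given after the description of (10), the decrease test uses the reconstructed form of (12), and evaluations spent inside build_model are not counted here). The default parameter values anticipate those used in Section 5.

```python
import numpy as np

def dfo_separable_cubic(f, x0, build_model, Delta=10.0, sigma_small=0.1,
                        eta=8.0, alpha=1e-4, xi=1e-5, eps=1e-5,
                        max_evals=1500):
    xk = np.asarray(x0, dtype=float)
    evals = 0
    while evals < max_evals:
        sigma = sigma_small                    # (re)start the regularization
        while evals < max_evals:
            # Step 2: model built from points in B(x_k; 1/sigma) in practice
            fk, g, H, p = build_model(xk, 1.0 / sigma)
            if np.linalg.norm(g) < eps:        # approximate stationarity
                return xk
            d_eig, Q = np.linalg.eigh(H)       # Schur factorization of H~_k
            # Step 3: n independent one-dimensional problems
            y = separable_step(Q.T @ g, d_eig, sigma, xi / sigma, Delta, p)
            s = Q @ y
            f_trial = f(xk + s)
            evals += 1
            # Step 4: sufficient decrease test, reconstructed form of (12)
            if f_trial <= fk - alpha * np.sum(np.abs(y) ** p):
                xk = xk + s                    # accept the trial step
                break
            sigma *= eta                       # otherwise tighten and retry
    return xk
```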
In the following subsections, the convergence and worst-case behavior of Algorithm 1 will be analyzed independently for the fully-linear and fully-quadratic cases.

Fully-linear Approach
This subsection will be devoted to the analysis of the WCC of Algorithm 1 when fully-linear models are used. For that, we need the following technical lemma.
As it is common in nonlinear optimization, we assume that the norm of the Hessian of each model is bounded.
Assumption 4.1 Assume that the norm of the Hessian of the model is bounded, i.e.,

‖H̃_k‖ ≤ κ_H, for all k,   (13)

for some κ_H > 0.
We also assume that the trial point provides decrease to the current model.

Assumption 4.2 Assume that

m̃_k^{SR}(y_trial) ≤ m̃_k^{SR}(0) = f_k, for all k.   (14)
In the following lemma, we will derive an upper bound on the number of function evaluations required to satisfy the sufficient decrease condition (12), which in turn guarantees that every iteration of Algorithm 1 is well-defined. Moreover, we also obtain an upper bound for the regularization parameter.
Lemma 4.2 Let Assumptions 3.1, 4.1, and 4.2 hold and assume that at Step 2 of Algorithm 1 a fully-linear model is always used. In order to satisfy condition (12), with p = 2, Algorithm 1 needs at most O(log_η(σ_max/σ_small)) function evaluations (one per trial step, since σ_k is increased by the factor η after each failure), not accounting for the ones required for model computation. In addition, the maximum value of σ_k for which (12) is satisfied is given by (17).

Proof First, we will show that if σ_k satisfies the threshold condition (18), then the sufficient decrease condition (12) of Algorithm 1 is satisfied for p = 2.
In view of (15), and by using (6), (13), (14), and ‖s_trial‖ ≥ ξ/σ_k (due to Step 3 of Algorithm 1), we obtain a chain of inequalities whose last inequality holds due to (18), establishing (12). Now, from the way σ_k is updated at Step 4 of Algorithm 1, it can be easily seen that for the fulfillment of (12) with p = 2 we need at most the number of function evaluations stated above and, additionally, the upper bound on σ_k at (17) is derived from (18). □

The following assumption, which holds for global solutions of subproblem (11) (with p = 2), is central in establishing our WCC results. For similar assumptions required to obtain worst-case complexity bounds, see [2,20].
Assumption 4.3 Assume that there exists a constant β_1 > 0 such that, for all k, the trial point computed at Step 3 of Algorithm 1 satisfies

‖∇m̃_k^{SR}(y_trial)‖ ≤ β_1 ‖y_trial‖.   (19)

Under this assumption, we are able to prove that, when the trial point is not on the boundary of the feasible region of (11) (with p = 2), the norm of the gradient of the objective function at the new point is of the same order as the norm of the trial point.

Lemma 4.3 Let Assumptions 3.1, 4.1, 4.2, and 4.3 hold. Then either

‖y_trial‖_∞ = ∆  or  ‖y_trial‖_∞ = ξ/σ_k,   (20)

or

‖∇f(x_{k+1})‖ ≤ κ_1 ‖s_trial‖,

where κ_1 = L_g + κ_eg + κ_H + σ_max + β_1, and σ_max was defined in Lemma 4.2.
Proof Assume that none of the equalities at (20) hold. Now, by using Assumption 3.1, (14), and (6), we can bound ‖∇f(x_{k+1})‖ in terms of ‖s_trial‖ and the norm of the model gradient at the trial point. Therefore, in view of (19), the stated bound follows, which completes the proof. □

Now, we have all the ingredients to derive an upper bound on the number of iterations required by Algorithm 1 to find a point at which the norm of the gradient is below some given positive threshold ε.
Theorem 4.1 Let Assumptions 3.1, 4.1, 4.2, and 4.3 hold, and assume that at Step 2 of Algorithm 1 a fully-linear model is always used. Then, given ε > 0, the number of iterations required by Algorithm 1 to produce an iterate x_{k+1} such that ‖∇f(x_{k+1})‖ ≤ ε is at most of the order of ε^{−2}, where σ_max and κ_1 were defined in Lemmas 4.2 and 4.3, respectively.
Proof In view of Lemma 4.3, we have ‖∇f(x_{k+1})‖ ≤ κ_1 ‖y_k‖. Hence, since ‖∇f(x_{k+1})‖ > ε and ∆ > ξ/σ_k, we obtain ‖y_k‖ > ε/κ_1. On the other hand, due to the sufficient decrease condition (12), we obtain f(x_k) − f(x_{k+1}) ≥ α ‖y_k‖² > α (ε/κ_1)². By summing up these inequalities, for 0, 1, …, k, we obtain the stated iteration bound, which concludes the proof. □

The complexity bound derived here matches the one derived in [15] for derivative-free trust-region optimization methods.
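For intuition, the counting argument can be summarized as follows, assuming the reconstructed form of (12), that the non-boundary branch of Lemma 4.3 applies at the iterations counted, and that f is bounded below by f_low (an assumption implicit in the analysis):

```latex
f(x_0) - f_{\rm low} \;\ge\; \sum_{j=0}^{k}\bigl(f(x_j)-f(x_{j+1})\bigr)
\;\ge\; (k+1)\,\alpha\left(\frac{\epsilon}{\kappa_1}\right)^{2}
\quad\Longrightarrow\quad
k+1 \;\le\; \frac{\kappa_1^{2}\bigl(f(x_0)-f_{\rm low}\bigr)}{\alpha\,\epsilon^{2}}
\;=\; O(\epsilon^{-2}).
```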

Fully-quadratic Approach
In this subsection, we will analyze the WCC of Algorithm 1 when we build fully-quadratic models.
The following lemma is essential for establishing such bounds. Similarly to the fully-linear case, we assume that the value of the model at the trial point is less than or equal to the current function value.

Assumption 4.4 Assume that

m̃_k^{SR}(y_trial) ≤ m̃_k^{SR}(0) = f_k, for all k.
With this assumption, we are able to obtain upper bounds on the number of function evaluations required to satisfy the sufficient decrease condition (12), and also on the regularization parameter.

Lemma 4.5 Let Assumptions 3.2, 4.1, and 4.4 hold and assume that at Step 2 of Algorithm 1 a fully-quadratic model is always used. In order to satisfy condition (12), with p = 3, Algorithm 1 needs at most O(log_η(σ_max/σ_small)) function evaluations (one per trial step), not considering the ones required for model computation. In addition, the maximum value of σ_k for which (12) is satisfied is given by (25).

Proof First, we will show that if σ_k satisfies the threshold condition (26), then the sufficient decrease condition (12) of Algorithm 1 is satisfied, with p = 3.

Now, by applying the fully-quadratic error bounds and, in view of the inequality ‖·‖_3 ≥ n^{−1/6} ‖·‖_2 (see Theorem 16 on page 26 in [17]), we arrive at a chain of inequalities whose last inequality holds due to (26), establishing (12). Now, from the way σ_k is updated at Step 4 of Algorithm 1, it can easily be seen that for the fulfillment of (12) we need at most the number of function evaluations stated above and, additionally, the upper bound on σ_k at (25) is derived from (26). □

The following assumption is quite similar to condition (14) given in [22], and it holds for global solutions of subproblem (11) (with p = 3). For similar assumptions, required to obtain worst-case complexity bounds associated with cubic regularization, see [1,2,6,8,20,27].
Assumption 4.5 Assume that there exists a constant β_2 > 0 such that, for all k, the trial point computed at Step 3 of Algorithm 1 satisfies

‖∇m̃_k^{SR}(y_trial)‖ ≤ β_2 ‖y_trial‖².   (27)

Again, we are able to prove that, when the trial point is not on the boundary of the feasible region of (11) (with p = 3), the norm of the gradient of the function computed at the new point is of the order of the squared norm of the trial point.

Lemma 4.6 Let Assumptions 3.2, 4.1, 4.4, and 4.5 hold. Then either

‖y_trial‖_∞ = ∆  or  ‖y_trial‖_∞ = ξ/σ_k,   (28)

or

‖∇f(x_{k+1})‖ ≤ κ_2 ‖s_trial‖²,

where κ_2 > 0 is a constant specified in the proof.
Proof Assume that none of the equalities at (28) hold. We have

∇m̃_k^{SR}(s_trial) = g̃_k + H̃_k s_trial + r(s_trial),

where r(s_trial) = (σ_k/2) Q̃_k u_k and the i-th entry of u_k is |(Q̃_k^T s_trial)_i| (Q̃_k^T s_trial)_i. Now, by using (23), (7), and (8), we can bound ‖∇f(x_{k+1})‖ in terms of ‖s_trial‖² and the norm of the model gradient at the trial point. Therefore, in view of (27), the stated bound follows, which completes the proof. □

Now, we have all the supporting results to establish the WCC bound of Algorithm 1 for the fully-quadratic case.
Theorem 4.2 Let Assumptions 3.2, 4.1, 4.4, and 4.5 hold, and assume that at Step 2 of Algorithm 1 a fully-quadratic model is always used. Then, given ε > 0, the number of iterations required by Algorithm 1 to produce an iterate x_{k+1} such that ‖∇f(x_{k+1})‖ ≤ ε is at most of the order of ε^{−3/2}, as given by the bound (29), where σ_max and κ_2 were defined in Lemmas 4.5 and 4.6, respectively.
Proof In view of Lemma 4.6, we have ‖∇f(x_{k+1})‖ ≤ κ_2 ‖y_k‖². Hence, since ‖∇f(x_{k+1})‖ > ε and ∆ > ξ/σ_k, we obtain ‖y_k‖ > (ε/κ_2)^{1/2}. On the other hand, due to the sufficient decrease condition (12) and the inequality ‖·‖_3 ≥ n^{−1/6} ‖·‖_2, we obtain f(x_k) − f(x_{k+1}) ≥ α n^{−1/2} ‖y_k‖³ > α n^{−1/2} (ε/κ_2)^{3/2}. By summing up these inequalities, for 0, 1, …, k, we obtain (29), which concludes the proof. □

Similarly to what we saw before for the fully-linear case, since κ_eg = O(n) and κ_eh = O(n) (see Chapter 3 in [12]), we have κ_2 = O(n^{3/2}). By choosing ξ in a way such that ξ/σ_max = O((ε/κ_2)^{1/2}), the dependency on n of the upper bound given at (29) becomes of the order O(n^{11/4}). Additionally, for building a fully-quadratic model we need O(n²) function evaluations. Combining these facts with Theorem 4.2, we can establish a WCC bound for driving the first-order stationarity measure below some given positive threshold. In terms of ε, the derived complexity bound matches the one established in [7] for a derivative-free method with adaptive cubic regularization. The dependency of the bound derived here on n is worse than the one derived in [7]. However, we have explicitly taken into account the dependency of the constants κ_eg and κ_eh on n.
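The dimensional bookkeeping behind the O(n^{19/4}) factor can be recorded compactly, using κ_2 = O(n^{3/2}) and the n^{1/2} factor produced by the norm inequality ‖·‖_3 ≥ n^{−1/6} ‖·‖_2:

```latex
\underbrace{\kappa_2^{3/2}}_{O(n^{9/4})}\cdot
\underbrace{n^{1/2}}_{\text{norm equivalence}}
= O\!\left(n^{11/4}\right)
\quad\Longrightarrow\quad
\underbrace{O\!\left(n^{11/4}\epsilon^{-3/2}\right)}_{\text{iterations}}\times
\underbrace{O\!\left(n^{2}\right)}_{\text{evaluations per iteration}}
= O\!\left(n^{19/4}\epsilon^{-3/2}\right).
```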

Illustrative Numerical Experiments
In this section we illustrate the different options to build the quadratic models at Step 2 of Algorithm 1 and two different strategies to address the subproblems (10).
Model computation is a key issue for the success of Algorithm 1. However, in derivative-free optimization, saving function evaluations by reusing previously evaluated points is a main concern. At each evaluation of a new point, the corresponding function value is stored in a list of maximum size (n + 1)(n + 2), for possible future use in model computation. If new points need to be generated with the sole purpose of model computation, the center, the 'extreme' points, and the 'mid-points' of the set defined by x_k + (1/σ_k)[I −I] are considered. Inspired by the works [3,14], no explicit control of geometry is kept (in fact, we also tried the approach suggested in [26], but the results did not improve). If a new point is evaluated and the maximum number of points allowed in the list has been reached, then the point farthest away from the current iterate is replaced by the new one. Points are always selected in B(x_k; 1/σ_k) for model computation. Since in our numerical implementation ξ = 10^{−5}, the option for a radius larger than ξ/σ_k allows a better reuse of previously computed function values, avoiding an excessive number of function evaluations whose sole purpose is model computation. Additionally, defining the radius as 1/σ_k ensures that if the regularization parameter increases, the size of the neighborhood in which the points are selected decreases, a mechanism that resembles the behavior of the trust-region radius in derivative-based optimization.
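The point management just described can be sketched as follows (Python; the class and function names are ours, and taking the 'mid-points' as x_k ± e_i/(2σ_k) is one plausible reading of the text):

```python
import numpy as np

class PointList:
    """Cache of evaluated points for model reuse: at most (n+1)(n+2) pairs
    (x, f(x)); when full, the point farthest from the current iterate is
    replaced by the newly evaluated one."""
    def __init__(self, n):
        self.max_size = (n + 1) * (n + 2)
        self.pts, self.vals = [], []

    def add(self, x, fx, center):
        if len(self.pts) == self.max_size:
            far = int(np.argmax([np.linalg.norm(p - center)
                                 for p in self.pts]))
            self.pts.pop(far); self.vals.pop(far)
        self.pts.append(np.asarray(x, dtype=float)); self.vals.append(fx)

    def in_ball(self, center, radius):
        """Points available for model computation in B(center; radius)."""
        idx = [i for i, p in enumerate(self.pts)
               if np.linalg.norm(p - center) <= radius]
        return [self.pts[i] for i in idx], [self.vals[i] for i in idx]

def candidate_points(xk, sigma):
    """Center, 'extreme' points x_k +/- e_i/sigma, and 'mid-points'
    x_k +/- e_i/(2*sigma) of the set x_k + (1/sigma)[I  -I]."""
    n = len(xk)
    pts = [np.asarray(xk, dtype=float)]
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        for t in (1.0, 0.5, -0.5, -1.0):
            pts.append(xk + (t / sigma) * e)
    return pts
```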
Fully-linear and fully-quadratic models can be considered at all iterations, as well as hybrid versions, where, depending on the number of points available for reuse inside B(x_k; 1/σ_k), the option for a fully-linear or a fully-quadratic model is taken (thus, some iterations will use a fully-linear model and others a fully-quadratic model). Fully-quadratic models always require (n + 1)(n + 2)/2 points for computation. Fully-linear models are built using all the points available in B(x_k; 1/σ_k), provided that this number is at least n + 2 and does not exceed (n + 1)(n + 2)/2 − 1. In this case, a minimum Frobenius norm approach is taken to solve the linear system that provides the model coefficients (see [12, Section 5.3]); a sketch of this computation follows.
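In the spirit of [12, Section 5.3], the following is a minimal sketch of the minimum Frobenius norm computation (Python; it assumes a well-poised sample set of at least n + 2 displacements, and derives the KKT system directly from minimizing ‖H‖_F² subject to the interpolation conditions):

```python
import numpy as np

def mfn_quadratic_model(Y, f):
    """Minimum Frobenius norm model m(s) = c + g^T s + 0.5 s^T H s with
    m(y_i) = f_i on the displacements Y (rows y_i from the current iterate).

    Minimizing 0.5*||H||_F^2 subject to interpolation gives the Lagrange
    conditions  H = 0.5 * sum_i lam_i y_i y_i^T,  sum_i lam_i = 0,
    Y^T lam = 0, which yield the saddle-point system solved below.
    """
    Y = np.asarray(Y, dtype=float)
    f = np.asarray(f, dtype=float)
    p, n = Y.shape
    G = Y @ Y.T                        # Gram matrix, G_ij = y_i^T y_j
    A = 0.25 * G**2                    # A_ij = 0.25*(y_i^T y_j)^2
    e = np.ones((p, 1))
    K = np.block([[A,   e,                    Y],
                  [e.T, np.zeros((1, 1 + n))   ],
                  [Y.T, np.zeros((n, 1 + n))   ]])
    rhs = np.concatenate([f, np.zeros(1 + n)])
    sol = np.linalg.solve(K, rhs)      # assumes a well-poised sample set
    lam, c, g = sol[:p], sol[p], sol[p + 1:]
    H = 0.5 * (Y.T * lam) @ Y          # H = 0.5 * sum_i lam_i y_i y_i^T
    return c, g, H
```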
Regarding the solution of subproblem (10), the imposed lower bound causes difficulties to the separability approach. Two strategies were considered to address it. In the first one, every one-dimensional problem considers the corresponding lower and upper bounds. This approach is not equivalent to the original formulation: it imposes a stronger condition, since any vector y computed with this approach will satisfy ‖y‖_∞ ≥ ξ/σ_k, but there could be a vector y satisfying ‖y‖_∞ ≥ ξ/σ_k that does not satisfy |y_i| ≥ ξ/σ_k for all i ∈ {1, …, n}. The second approach disregards the lower bound condition, only considering ‖y‖_∞ ≤ ∆ when solving subproblem (10). After computing y, the lower bound condition is tested and, if it is not satisfied, the component of largest magnitude, max_{i=1,…,n} |y_i|, is set equal to ξ/σ_k to force the obtained vector y to also satisfy the lower bound constraint at (10); see the sketch after this paragraph. Algorithm 1 was implemented in Matlab 2021a. The experiments were executed on a laptop computer with an Intel Core i7 1.99 GHz CPU and 16 GB of RAM, running Windows 10 64-bits. As test sets, we considered the smooth and nonsmooth collections proposed in [23]. Each test set has 53 problems, with a number of variables comprised between 2 and 12. Computational code for the problems and the proposed initial points can be found at https://www.mcs.anl.gov/∼more/df.
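The second, 'projection' strategy referred to above can be sketched in a few lines (Python; hypothetical helper name):

```python
import numpy as np

def enforce_lower_bound(y, lo):
    """Second strategy of Section 5: solve (10) ignoring the lower bound;
    if the result violates ||y||_inf >= lo, push the component of largest
    magnitude to the bound, preserving its sign."""
    if np.max(np.abs(y)) < lo:
        y = y.copy()
        i = int(np.argmax(np.abs(y)))
        y[i] = lo if y[i] >= 0.0 else -lo
    return y
```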
Parameters in Algorithm 1 were set to the following values: ∆ = 10 (for every iteration k), σ_small = 0.1, η = 8, and α = 10^{−4}. At each iteration, the process is initialized with the minimization of the quadratic model (9), with no regularization term, computed by selecting points in B(x_k; 1). In this case, no lower bound is considered when solving subproblem (10). If the sufficient decrease condition (12) is not satisfied by the computed solution, then the regularization process is initiated, considering σ_k = σ_small. This approach allows taking advantage of the local properties of the "pure" quadratic models. As stopping criteria we consider ‖g̃_k‖ < ε, where ε = 10^{−5}, or a maximum of 1500 function evaluations.
Regarding model computation, four variants were tested, depending on the use of fully-linear or fully-quadratic models and also on the value of p in the sufficient decrease condition used to accept new points at Step 4 of Algorithm 1. The Fully-quadratic variant always computes a fully-quadratic model, built using (n + 1)(n + 2)/2 points, with a cubic sufficient decrease condition (p = 3). The Fully-linear variant always computes a quadratic model, using n + 2 points, under a minimum Frobenius norm approach; in this case, the sufficient decrease condition considers p = 2. Hybrid versions compute fully-quadratic models, using (n + 1)(n + 2)/2 points, or fully-linear minimum Frobenius norm models, with at least n + 2 points and a maximum of (n + 1)(n + 2)/2 − 1 points (depending on the number of points available in B(x_k; 1/σ_k)). In this case, variant Hybrid p3 always uses a cubic sufficient decrease condition to accept new points, whereas variant Hybrid p23 selects a quadratic or cubic sufficient decrease condition, depending on the type of model computed at the current iteration (p = 2 for fully-linear and p = 3 for fully-quadratic).
Results are reported using data profiles [23]. In a simplified way, a data profile provides the percentage of problems solved by a given algorithmic variant within a given computational budget (expressed in multiples of n_p + 1 function evaluations, where n_p denotes the dimension of problem p). Let S and P represent the set of solvers, associated with the different algorithmic variants considered, and the set of problems to be tested, respectively. If h_{p,s} represents the number of function evaluations required by algorithm s ∈ S to solve problem p ∈ P (up to a certain accuracy), the data profile cumulative function is given by

d_s(κ) = (1/|P|) |{p ∈ P : h_{p,s} ≤ κ (n_p + 1)}|.

For this purpose, a problem is considered to be solved to an accuracy level τ if the decrease obtained from the initial objective function value, f(x_0) − f(x), is at least 1 − τ times the best decrease obtained among all the solvers considered, f(x_0) − f_L, meaning:

f(x_0) − f(x) ≥ (1 − τ) (f(x_0) − f_L).

In the numerical experiments reported, the accuracy level was set equal to τ = 10^{−5}. Figure 1 reports the results obtained when considering different strategies for building the quadratic models. In this case, the stricter approach is used for solving subproblem (10), always imposing the lower bound on each entry of the vector y in each one-dimensional minimization.
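Before discussing the figures, here is a small sketch of how the data profile d_s(κ) above can be computed (Python; h[p][s] holds the evaluations solver s needed on problem p, with np.inf when the accuracy test at level τ was never met):

```python
import numpy as np

def data_profile(h, n_p, kappa_grid):
    """Data profiles of More and Wild [23].  h: |P| x |S| array of function
    evaluation counts (np.inf if unsolved); n_p: problem dimensions.
    Returns, for each solver s, the fraction of problems solved within a
    budget of kappa groups of (n_p + 1) evaluations."""
    h = np.asarray(h, dtype=float)
    n_p = np.asarray(n_p, dtype=float)
    budgets = h / (n_p[:, None] + 1.0)      # normalized budgets per problem
    return {s: [float(np.mean(budgets[:, s] <= kappa))
                for kappa in kappa_grid]
            for s in range(h.shape[1])}
```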
It is clear that the hybrid versions present the best performance, and always using a fully-quadratic model shows the worst behavior. However, depending on the level of smoothness of the objective functions, the best hybrid version differs. Using a cubic sufficient decrease condition to accept new points corresponds to the variant with the best performance on the nonsmooth test set, whereas the version that adapts the type of sufficient decrease required, depending on the type of model built, seems to be more adequate for smooth problems.
For the best variant in each of the two test sets, we considered the second approach to the solution of problem (10), where the lower bound constraint is initially ignored and the computed solution y is modified a posteriori if it does not satisfy the lower bound constraint. We denote these variants by adding the word projection. Results can be found in Figure 2. It is worthwhile to mention that, for both test sets, it was never required to modify the final solution computed by ignoring the lower bound constraint, which means that, in all our experiments, by taking into account only the upper bound constraint, subproblems (10) were in fact fully solved. Thus, on the nonsmooth test set, the differences between the two curves result from the stricter approach, which forces every component of the solution to satisfy the lower bound, being too conservative.

Conclusions and Final Remarks
We present and analyze a derivative-free separable regularization approach for solving smooth unconstrained minimization problems. At each iteration we build a quadratic model of the objective function using only function evaluations. Several variants have been considered for this task, from a less expensive minimum Frobenius norm approach, to a more expensive fully-quadratic model, or a hybrid version that combines the previous approaches depending on the number of available useful points from previous iterations.
For each one of the variants, we add to the model either a separable quadratic or a separable cubic regularization term to guarantee convergence to stationary points. Moreover, for each option we present a WCC analysis, establishing that, for driving the norm of the gradient below ε > 0, the fully-quadratic and the minimum Frobenius norm regularized approaches need at most O(n^{19/4} ε^{−3/2}) and O(n² ε^{−2}) function evaluations, respectively.
The application of a convenient change of variables, based on the Schur factorization of the approximate Hessian matrices, trivializes the computation of the minimizer of the regularized models required at each iteration. In fact, the solution of the subproblem required at each iteration is reduced to the minimization of n independent one-dimensional simple functions (a polynomial of degree 2 plus a term of the form |z|³) on a closed and bounded set on the real line. It is worth noticing that, for the typical low dimensions used in DFO, the computational cost of Schur factorizations is insignificant compared with the cost associated with the function evaluations required to build the quadratic model. Nevertheless, in addition to its use in [4], this separability approach can be extended to be efficiently applied in other large-scale scenarios, for example in inexact or probabilistic adaptive cubic regularization; see, e.g., [1,8].
We also present a variety of numerical experiments to add understanding and illustrate the behavior of all the different options considered for model computation. In general, we noticed that all the options show a robust performance. However, the worst behavior, concerning the required number of function evaluations, is consistently observed when using the fully-quadratic approach, and the best performance is observed when using the hybrid versions combined with a separable regularization term and with the less restrictive projection scheme for solving the subproblems.
Concerning the worst-case complexity (WCC) results obtained for the considered approaches, a few comments are in order. Even though these results are of a theoretical nature and in general pessimistic in relation to the practical behavior of the methods, it is interesting to analyze which of the two considered approaches produces a better WCC result. For that, it is convenient to use their leading terms, i.e., n^{19/4} ε^{−3/2} for the one using the fully-quadratic model and n² ε^{−2} for the one using the Frobenius norm model. After some simple algebraic manipulations, we obtain that for the fully-quadratic approach to be better (that is, to require fewer function evaluations in the worst case), it must hold that n < ε^{−2/11}, or equivalently that ε < 1/n^{11/2}. Therefore, if n is relatively small and ε is not very large (for example n ≤ 8 and ε = 10^{−5}), then the combined scheme that is based on the fully-quadratic model has a better WCC result than the scheme based on the minimum Frobenius norm approach. Notice that, indeed, in our numerical experiments the average dimension is 7 and for our stopping criterion we fix ε = 10^{−5}, and hence, from the theoretical WCC point of view, the best option is the one based on the fully-quadratic model. However, in our computational experiments the worst practical performance is clearly associated with the combination that uses the fully-quadratic model. We also note that if we choose a more tolerant stopping criterion (say ε = 10^{−2}), then for most of the same considered small-dimensional problems we have ε > 1/n^{11/2}, and so the scheme that uses the minimum Frobenius norm model exhibits simultaneously the best theoretical WCC result and the best practical performance.
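For the record, the algebra behind this comparison is simply:

```latex
n^{19/4}\,\epsilon^{-3/2} \;<\; n^{2}\,\epsilon^{-2}
\;\Longleftrightarrow\;
n^{11/4} \;<\; \epsilon^{-1/2}
\;\Longleftrightarrow\;
n \;<\; \epsilon^{-2/11}
\;\Longleftrightarrow\;
\epsilon \;<\; n^{-11/2}.
```

For ε = 10^{−5}, the threshold is ε^{−2/11} = 10^{10/11} ≈ 8.1, consistent with the n ≤ 8 example given above.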