Approximate solution of system of equations arising in interior-point methods for bound-constrained optimization

The focus in this paper is interior-point methods for bound-constrained nonlinear optimization, where the systems of nonlinear equations that arise are solved with Newton's method. There is a trade-off between solving Newton systems directly, which gives high-quality solutions, and solving many approximate Newton systems, which are computationally less expensive but give lower-quality solutions. We propose partial and full approximate solutions to the Newton systems. The specific approximate solution depends on estimates of the active and inactive constraints at the solution. These sets are estimated at each iteration by basic heuristics. The partial approximate solutions are computationally inexpensive, whereas a system of linear equations needs to be solved for the full approximate solution. The size of this system is determined by the estimate of the inactive constraints at the solution. In addition, we motivate and suggest two Newton-like approaches which are based on an intermediate step that consists of the partial approximate solutions. The theoretical setting is introduced and asymptotic error bounds are given. We also give numerical results to investigate the performance of the approximate solutions within and beyond the theoretical framework.


Introduction
This work is intended for bound-constrained nonlinear optimization problems of the form

minimize f(x) subject to l ≤ x ≤ u, (NLP)

where f : R^n → R is twice continuously differentiable, ∇²f(x) is locally Lipschitz continuous and l, u ∈ {R ∪ {−∞, ∞}}^n are such that l < u. However, to make the work and its ideas more comprehensible, we initially describe the theoretical framework and the corresponding results for problems of the form

minimize f(x) subject to x ≥ 0. (P)
For completeness, analogous results for problems of the form (NLP), together with complementary remarks, are given in Appendix A.1. Bound-constrained optimization problems appear in many different applications and are frequently subproblems in augmented Lagrangian methods. For a general overview of solution methods, see [15] and, e.g., the introduction in [18] for a thorough review of previous work. Common solution techniques are: active-set methods, which aim to determine the active constraints and solve a reduced problem in the inactive variables, e.g., [8,18]; and methods involving projections onto the feasible set, such as projected-gradient methods, e.g., [1,27], projected-Newton or trust-region methods, e.g., [2,6,7,22], and projected quasi-Newton methods, e.g., [4,21,34]. We are not aware of any primal-dual interior-point methods specialized for bound-constrained optimization, except for more general methods, e.g., [9,12,28-30]. Other techniques that are related to trust-region and interior methods are affine-scaling interior-point methods, which are based upon a reformulation of the first-order necessary optimality conditions combined with a Newton-like method, e.g., [5,19,20].
In contrast, we consider the classical primal-dual interior-point framework. This means solving, or approximately solving, a sequence of systems of nonlinear equations, for which we consider Newton's method as the model method. As interior methods converge, the Newton systems typically become increasingly ill-conditioned due to large diagonal elements in the Schur complement. This is not harmful for direct solvers, but it may deteriorate the performance of iterative solvers. We propose a strategy for generating approximate solutions to Newton systems, which in general involves solving smaller systems of linear equations. In the ideal case, these systems do not become increasingly ill-conditioned as the barrier parameter approaches zero. The specific approximate solutions, and the size of the system that needs to be solved at each iteration, are determined by estimates of the active and inactive constraints at the solution. However, in general these sets are unknown and have to be estimated as the iterations proceed. In this work we use basic heuristics to determine the considered sets, but other approaches may also be used, e.g., approaches similar to those in [8,18]. In addition, we motivate and suggest two Newton-like approaches which utilize an intermediate step in combination with the solution of a Newton-like system. The intermediate step partially consists of the proposed partial approximate solutions.
The work is meant to contribute to the theoretical and numerical understanding of approximate solutions to systems of linear equations arising in interior-point methods. The approach is mainly intended for, but not limited to, bound-constrained problems; e.g., the work may also be interpreted in the framework of linear complementarity problems, see e.g., [32]. We envisage the use of the approximate solution procedure as an accelerator for a direct solver, in particular when solving a sequence of Newton systems for a given value of the barrier parameter µ, e.g., when the direct solver and the approximate solution procedure can be run in parallel. To give an indication of the potential of the approximate solutions, we show numerical simulations on randomly generated problems as well as problems from the CUTEst test collection [16].
The manuscript is organized as follows. Section 2 contains a brief background on primal-dual interior-point methods and an introduction to the theoretical framework. In Section 3 we propose partial and full approximate solutions to Newton systems arising in interior-point methods, and motivate two Newton-like approaches. Section 4 contains numerical results on convex bound-constrained quadratic optimization problems, both randomly generated and from the CUTEst test collection. Finally, in Section 5 we give some concluding remarks.

Background
We are interested in the asymptotic behavior of primal-dual interior-point methods in the vicinity of a local minimizer x* and its corresponding multipliers λ*. In particular, we assume that the iterates of the method converge to a vector (x*^T, λ*^T)^T, denoted by (x*, λ*), that satisfies

∇f(x*) − λ* = 0, (stationarity) (1a)
x* ≥ 0, (feasibility) (1b)
λ* ≥ 0, (non-negativity of multipliers) (1c)
x* · λ* = 0, (complementarity) (1d)
Z(x*)^T ∇²f(x*) Z(x*) positive definite, (second-order condition) (1e)
x* + λ* > 0, (strict complementarity) (1f)

where "·" denotes the component-wise product and Z(x*) is a matrix whose columns span the nullspace of the Jacobian corresponding to the constraints with a strictly positive multiplier. Equations (1a)-(1d) constitute first-order necessary optimality conditions for a local minimizer of (P). These conditions together with (1e) form second-order sufficient conditions [17]. For the theoretical framework we also assume that (x*, λ*) satisfies (1f). We are particularly interested in the function F_µ : R^{2n} → R^{2n} defined by

F_µ(x, λ) = ( ∇f(x) − λ, XΛe − µe ),

where µ ∈ R is the barrier parameter, X ∈ R^{n×n}, Λ ∈ R^{n×n}, X = diag(x), Λ = diag(λ) and e is a vector of ones of appropriate size. A vector (x, λ) with x ≥ 0, λ ≥ 0 and F_µ(x, λ) = 0 for µ = 0 satisfies the first-order optimality conditions (1a)-(1d). Primal-dual interior-point methods aim to solve, or approximately solve, F_µ(x, λ) = 0 for a decreasing sequence of µ > 0, while maintaining x > 0 and λ > 0. This is typically done with Newton-like methods, which means solving a sequence of systems of linear equations of the form

F'(x, λ) (∆x^N, ∆λ^N) = −F_µ(x, λ), (2)

where F' : R^{2n} → R^{2n} is the Jacobian of F_µ. The Jacobian is given by

F'(x, λ) = [ H  −I ; Λ  X ],

where H = ∇²f(x) and the subscript µ is omitted since F' is independent of the barrier parameter. For each µ, iterations are performed until a specified measure of improvement is achieved, whereupon µ is decreased and the process is repeated. A natural measure in our setting is ‖F_µ(x, λ)‖_2, where ‖F_µ(x, λ)‖_2 = 0 gives the exact solution.
To improve efficiency many algorithms seek approximate solutions; a basic condition for the reduction of µ is ‖F_µ(x, λ)‖_2 < µ [24, Ch. 17, p. 572].
Herein, we consider a possibly weaker version, namely ‖F_µ(x, λ)‖_2 < C_1 µ for some constant C_1 > 0. Moreover, it will throughout be assumed that all considered vectors (x, λ) satisfy x > 0 and λ > 0. The subscript in the norms will hereafter be omitted, since all norms in this work are 2-norms.
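For the form (P), the objects above are concrete enough to sketch in code. The following Python fragment is an illustration only (the paper's experiments are in Julia, and the dense solve stands in for whatever linear solver is used); it assembles F_µ and solves one Newton system (2):

```python
import numpy as np

def F_mu(x, lam, grad_f, mu):
    """Residual of the perturbed optimality conditions for min f(x) s.t. x >= 0."""
    return np.concatenate([grad_f(x) - lam,   # stationarity block
                           x * lam - mu])     # perturbed complementarity block

def newton_step(x, lam, grad_f, hess_f, mu):
    """Solve F'(x, lam) (dx, dlam) = -F_mu(x, lam) with a dense direct solve."""
    n = x.size
    J = np.block([[hess_f(x), -np.eye(n)],
                  [np.diag(lam), np.diag(x)]])
    d = np.linalg.solve(J, -F_mu(x, lam, grad_f, mu))
    return d[:n], d[n:]
```

The two blocks of the computed step satisfy H ∆x − ∆λ = −(∇f(x) − λ) and Λ ∆x + X ∆λ = −(XΛe − µe) by construction, which is a convenient sanity check on any implementation.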
The following two results provide the theoretical framework and definitions of various quantities. In particular, they establish the existence of a neighborhood in which the Jacobian is nonsingular, and of a Lipschitz continuous barrier trajectory parameterized by the barrier parameter µ. The results are well known and can be found in, e.g., the work of Ortega and Rheinboldt [26] and of Byrd, Liu and Nocedal [3], whose setting is similar to the one in this work.
The next lemma relates the measure F µ (x, λ) to the distance between the barrier trajectory and vectors (x, λ) that are sufficiently close. An analogous result is given by Byrd, Liu and Nocedal [3].
Recall that the reduction of µ can be determined with the condition ‖F_µ(x, λ)‖ < C_1 µ, for some constant C_1 > 0. It can be shown that vectors (x, λ) which satisfy this condition, and are sufficiently close to the barrier trajectory, have their individual components bounded within certain intervals for sufficiently small µ. The individual components can be partitioned into two sets of indices which depend on how close the iterate is to its feasibility bound, see Definition 2.6. The orders of magnitude of the individual components, which are given in Lemma 2.7 below, will be of importance in the derivation of various approximate solutions to (2).

Proof. Under Assumption 1 the stated magnitudes hold on the trajectory with constants c_i = Θ(1), i = 1, ..., n. The function (x^µ, λ^µ) is Lipschitz continuous, and hence the corresponding bounds hold for each µ ≤ µ̄. There exists µ̂_1, with 0 < µ̂_1 ≤ µ̄, such that the bounds for the components i ∈ A hold for 0 < µ ≤ µ̂_1. Similarly, there exists µ̂_2, with 0 < µ̂_2 ≤ µ̄, such that the bounds for the components i ∈ I hold, and the result follows for 0 < µ ≤ µ̂ with µ̂ = min{µ̂_1, µ̂_2}.

The result of Lemma 2.7 shows two regions which depend on µ. The first region, 0 < µ ≤ µ̄, defines where the barrier trajectory (x^µ, λ^µ) exists, and the second region, 0 < µ ≤ µ̂ ≤ µ̄, defines where the asymptotic behavior occurs.
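For reference, the asymptotic magnitudes discussed around Lemma 2.7 take the following standard form on the barrier trajectory under strict complementarity (a reconstruction consistent with the surrounding text, not a verbatim statement of the lemma):

```latex
x_i^{\mu} = \Theta(\mu), \quad \lambda_i^{\mu} = \Theta(1), \qquad i \in \mathcal{A},
\qquad\qquad
x_i^{\mu} = \Theta(1), \quad \lambda_i^{\mu} = \Theta(\mu), \qquad i \in \mathcal{I}.
```

Analogous bounds hold for vectors (x, λ) satisfying ‖F_µ(x, λ)‖ < C_1 µ that are sufficiently close to the trajectory.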

Approximate solutions
This section initially contains an introduction to the groundwork of the ideas which precede the results. It is followed by a subsection that contains approximate solutions for specific components of the solution of (2), together with related results. The last subsection contains procedures for approximating the full solution of (2), as well as related results. Under Assumption 1 it holds that lim_{µ→0} x_i^µ = 0, i ∈ A, and lim_{µ→0} λ_i^µ = 0, i ∈ I; in consequence, the Schur complement of X in (2) becomes increasingly ill-conditioned as µ → 0. These properties have been utilized by several authors before, e.g., in the development of preconditioners [10,13]. The idea in this work is to exploit them, together with the additional property that (P) only has bound constraints, to obtain partial or full approximate solutions of (2); in particular, by utilizing structure and the asymptotic behavior of coefficients in the arising systems of linear equations. With the partition (∆x^N_A, ∆x^N_I, ∆λ^N_A, ∆λ^N_I), the Newton system (2) may be written as

[ H_AA  H_AI   −I    0  ] [∆x^N_A]      [ (∇f(x) − λ)_A ]
[ H_IA  H_II    0   −I  ] [∆x^N_I]  = − [ (∇f(x) − λ)_I ]     (5)
[ Λ_AA    0   X_AA   0  ] [∆λ^N_A]      [ (XΛe − µe)_A ]
[   0   Λ_II    0   X_II] [∆λ^N_I]      [ (XΛe − µe)_I ]

where the first and second set in the matrix subscripts give the indices of rows and columns respectively. The Schur complement of X_AA and X_II in (5) is

[ H_AA + X_AA^{-1}Λ_AA   H_AI                  ] [∆x^N_A]      [ ∇f(x)_A − µ X_AA^{-1} e ]
[ H_IA                   H_II + X_II^{-1}Λ_II  ] [∆x^N_I]  = − [ ∇f(x)_I − µ X_II^{-1} e ]     (6)

By continuity of (x^µ, λ^µ) it follows that x_i → 0, i ∈ A, and λ_i → 0, i ∈ I, as µ → 0. In consequence, X_II and Λ_AA dominate the coefficients of the third and fourth blocks of (5) for sufficiently small µ under strict complementarity. Similarly, X_AA^{-1}Λ_AA dominates the coefficients of the first block of (6). Consequently, approximate solutions of ∆x^N_A and ∆λ^N_I can be obtained from the third and fourth blocks of (5), and of ∆x^N_A from the first block of (6). These approximations can then be inserted into (5), or (6), to obtain a reduced system of size |I|×|I| that involves H_II. The solution of this system gives an approximation of ∆x^N_I. These observations, together with Lemma 2.7 and Lemma 3.1 below, provide the foundation for the results. The essence of Lemma 3.1 is that the norm of the solution of (2) is bounded by a constant times µ.
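The claimed ill-conditioning is easy to observe numerically. In the Python sketch below (hypothetical 2×2 data: index 1 active with x_1 = µ, λ_1 = 1; index 2 inactive with x_2 = 1, λ_2 = µ), the condition number of the Schur complement H + X^{-1}Λ grows roughly like 1/µ:

```python
import numpy as np

def schur_complement(H, x, lam):
    """S = H + X^{-1} Lambda: the Schur complement of X in the Jacobian of F_mu."""
    return H + np.diag(lam / x)

# Hypothetical model: index 1 active (x_1 -> 0, lam_1 -> 1),
# index 2 inactive (x_2 -> 1, lam_2 -> 0), moving along x_1 = lam_2 = mu.
H = np.array([[2.0, 0.5],
              [0.5, 2.0]])
conds = [np.linalg.cond(schur_complement(H, np.array([mu, 1.0]), np.array([1.0, mu])))
         for mu in (1e-1, 1e-3, 1e-5)]
print(conds)  # condition number grows roughly like 1/mu
```

The inactive 1×1 block H_II + X_II^{-1}Λ_II, by contrast, tends to 2 + µ and stays well-conditioned, which is the behavior the reduced systems below are designed to exploit.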

Partial approximate solutions
In this section we initially propose an approximate solution of ∆x^N_A which originates from the Schur complement form (6). This approximate solution will be labeled with superscript "S" due to its origin. As µ → 0, the diagonal elements of the (1,1)-block become large and dominate the coefficients of the matrix under strict complementarity. In Proposition 3.2 we show that an approximate solution of ∆x^N_A can be obtained by neglecting all off-diagonal coefficients in the first block of (6). Thereafter, we propose another approximate solution of ∆x^N_A, as well as one of ∆λ^N_I, which originate from the complementarity blocks of (5). These approximate solutions will be labeled with superscript "C" due to their origin. The solutions are obtained by neglecting the coefficients in the complementarity blocks which approach zero as µ → 0, i.e., those in X_AA and Λ_II. The resulting partial approximate solutions are given below in Proposition 3.3. The essence of both results is that, under certain conditions, the asymptotic component error bounds are of order µ². Finally we motivate and propose two Newton-like approaches which we later investigate numerically.
The approximate solution ∆x^S in (7) of Proposition 3.2 and its corresponding component error (8) may be undefined for certain components. However, the essence is that the expressions are well-defined sufficiently close to the barrier trajectory for sufficiently small µ, as shown by (9). In particular, the component errors of (10) are bounded by a constant times µ² only for components i ∈ A, although the expressions (7) and associated errors (8) hold for all components i = 1, ..., n. An approximate solution that is guaranteed to have all its components well-defined can be obtained from the complementarity blocks of (5). This approximate solution, and in addition an approximate solution of ∆λ^N_I, are given in the proposition below.
The expressions for ∆x^C_i and ∆λ^C_i, (13a) and (13b) respectively, and their associated component errors, (14a) and (14b) respectively, hold for all components. The essence of the results in Proposition 3.3 is that the component errors are bounded by a constant times µ² only for certain components; specifically, for ∆x^C_i, i ∈ A, and ∆λ^C_i, i ∈ I. Both ∆x^S_i given by (7) and ∆x^C_i given by (13a) provide approximate solutions of ∆x^N_i, i ∈ A, with similar asymptotic error bounds. Note that the order of the approximation error, ∆x_A − ∆x^N_A, is maintained even if some components i ∈ A are updated with (7) and others with (13a). Which expression to use can hence be chosen individually for each index i ∈ A. The factors in front of ∆x^N_i and ∆λ^N_i, i = 1, ..., n, in the component errors of (8) and (14) respectively may be used as an indicator of which of the approximations to use, and also of whether either expression is likely to provide an accurate approximation. Note also that the approximate solution ∆x^C given by (13a) does not take into account any information from the first block equation of (2), whereas ∆x^S given by (7) includes information from both blocks.
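The complementarity-based approximations can be sketched as follows for the form (P). The sketch assumes (13a) and (13b) take the form obtained by dropping the vanishing term in each complementarity row λ_i ∆x_i + x_i ∆λ_i = µ − x_i λ_i; the exact expressions of the paper are not reproduced in this extraction:

```python
import numpy as np

def partial_complementarity_steps(x, lam, mu, A, I):
    """Approximate dx_A and dlam_I from the complementarity rows
    lam_i*dx_i + x_i*dlam_i = mu - x_i*lam_i of the Newton system:
    drop x_i*dlam_i for i in A (x_i -> 0) and lam_i*dx_i for i in I (lam_i -> 0)."""
    dx_A = mu / lam[A] - x[A]      # complementarity-based step for active x-components
    dlam_I = mu / x[I] - lam[I]    # complementarity-based step for inactive multipliers
    return dx_A, dlam_I
```

Both formulas cost one division per component, which is what makes the partial approximate solutions essentially free compared with a linear solve.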
Provided that the norm of the combined steps ∆x^N_A and ∆λ^N_I is not smaller than the approximation error, stepping in these components with (7) or (13) gives a vector which is not further from the Newton iterate. This is formalized in Proposition 3.4 below.
with ∆x^C_i, ∆λ^C_i and ∆x^S_i given by (13) and (7) respectively.

Proof. With (∆x, ∆λ) defined as in (17) of the proposition, the distance to the Newton iterate can be bounded componentwise. By Proposition 3.2 and Proposition 3.3 there exist µ̂_5 and µ̂_6 respectively, with 0 < µ̂_i ≤ µ̄, i = 5, 6, such that the component error bounds hold for ∆x_i equal to ∆x^S_i or ∆x^C_i. The right-hand side of (20) is then non-positive for sufficiently small µ.

The partial approximate solution (17) of Proposition 3.4 is computationally inexpensive compared to solving (2). In consequence, (18) motivates the study of Newton-like approaches which make use of (17). We will construct two such approaches where the idea is to utilize the intermediate iterate with (∆x^E, ∆λ^E) as in (17). It is thus only the active components of x and the inactive components of λ that are updated in the step to (x^E, λ^E). For simplicity we describe the ideas for unit step length; in practice the iterates would be required to be strictly feasible.
The first approach is based on the fact that solving a Newton system from the iterate (x^E, λ^E) provides potential improvement, provided that (x^E, λ^E) is strictly feasible and lies in B((x*, λ*), δ). A full iteration in the approach consists of the approximate intermediate step (21), the solution of the Newton system (2) at (x^E, λ^E), and the step to (x^E + ∆x, λ^E + ∆λ). The idea of the second approach is to update the coefficients in the complementarity blocks of the matrix in (2). Under strict complementarity the approach may hence be interpreted as an approximate higher-order method. A full iteration consists of the step (21) and the solution of the Newton-like system whose complementarity blocks are evaluated at (x^E, λ^E), followed by the corresponding step. The approach may hence also be interpreted as a modified Newton method where the Jacobian of each Newton system is altered.
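The first approach can be sketched in a few lines for the form (P). The sketch below assumes the intermediate step uses the complementarity-based approximations (so x^E_A = µ/λ_A and λ^E_I = µ/x_I) and unit step lengths; a practical implementation would enforce strict feasibility:

```python
import numpy as np

def intermediate_then_newton(x, lam, grad_f, hess_f, mu, A, I):
    """One iteration of the first Newton-like approach for min f(x) s.t. x >= 0:
    an inexpensive partial step in the active x- and inactive lam-components,
    followed by a full Newton system solved at the intermediate iterate."""
    xE, lamE = x.copy(), lam.copy()
    xE[A] = mu / lam[A]     # x_A + (mu/lam_A - x_A), the complementarity-based step
    lamE[I] = mu / x[I]     # lam_I + (mu/x_I - lam_I)
    n = x.size
    J = np.block([[hess_f(xE), -np.eye(n)],
                  [np.diag(lamE), np.diag(xE)]])
    F = np.concatenate([grad_f(xE) - lamE, xE * lamE - mu])
    d = np.linalg.solve(J, -F)
    return xE + d[:n], lamE + d[n:]
```

Near the trajectory the intermediate step zeroes the complementarity residual in the updated components, so the subsequent Newton solve starts from a smaller residual than the original iterate.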
Numerical results for the approximate intermediate-step approach and the approximate higher-order approach are shown in Section 4. The results are for bound-constrained quadratic optimization problems where strict complementarity typically does not hold. The complexity of each iteration in both approaches is the same as with Newton's method. The hope is thus to reduce the total number of iterations necessary for convergence. See the work by Gondzio and Sobral [14] for quasi-Newton approaches for quadratic problems where each iteration is inexpensive in comparison to the approaches above.

Full approximate solution
In this section we propose approximate solutions of (2) that, in the considered framework, have an asymptotic error bound of order µ². The full approximate solutions are obtained by utilizing either of the partial approximate solutions of ∆x^N_A in Proposition 3.2 or Proposition 3.3, while exploiting structure in the systems that arise. Specifically, suppose that an approximate ∆x_A is given, e.g., ∆x^S_A given by (7) or ∆x^C_A given by (13a). Insertion of the approximate ∆x_A into (5) yields the system (24), where the solution is given the superscript "ls" since it will lead to a least-squares system. The second and fourth blocks of (24) provide unique solutions of ∆x^ls_I and ∆λ^ls_I, which satisfy (25). The solution of (25) can be obtained by first solving the reduced system (26), with the Schur complement of X_II, and then recovering ∆λ^ls_I from (27). Note that (26) can also be obtained by insertion of the given ∆x_A into the second block of (6). The matrix of (26) is by Assumption 1 a symmetric positive definite (|I|×|I|)-matrix. Moreover, the matrix does not become increasingly ill-conditioned due to large elements in X^{-1}Λ under strict complementarity as µ → 0, in contrast to the matrix of (6). The remaining part of the solution of (24), that is ∆λ^ls_A, is then given by the over-determined system (28). If the approximate ∆x_A is exact, i.e., if ∆x_A = ∆x^N_A, then ∆x^ls_I = ∆x^N_I by (26). In consequence, the over-determined system (28) has a unique solution that satisfies all equations, i.e., ∆λ^ls_A is the corresponding part of the solution of (2). The solutions corresponding to the first and second block equations of (28) will be assigned superscripts "b" and "−" respectively; these are given by (29a) and (29b). Alternatively, ∆λ^ls_A can be obtained as the least-squares solution of (28), which is (30). In Theorem 3.5 it is shown that, under certain conditions, both ∆λ^b_A given by (29a) and ∆λ^ls_A given by (30) can be used to approximate ∆λ^N_A without affecting the order of the asymptotic error.
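For the form (P), the pipeline around (26) can be sketched as follows, writing r1 = ∇f(x) − λ and r2 = XΛe − µe. The reduced-system matrix and recovery formulas follow the block elimination described in the text; since equations (24)-(29a) are not reproduced in this extraction, the sketch should be read as an assumption-laden illustration rather than the paper's exact formulas (the "b"-type recovery from the first block row is used for ∆λ_A):

```python
import numpy as np

def full_approximate_step(H, x, lam, r1, r2, A, I, dx_A):
    """Given an approximate dx_A, solve the reduced inactive system
    (H_II + X_II^{-1} Lam_II) dx_I = -r1_I - X_II^{-1} r2_I - H_IA dx_A,
    then recover dlam_I from the fourth block row and dlam_A from the first."""
    Xi_inv = 1.0 / x[I]
    S_II = H[np.ix_(I, I)] + np.diag(Xi_inv * lam[I])      # SPD reduced matrix
    rhs = -r1[I] - Xi_inv * r2[I] - H[np.ix_(I, A)] @ dx_A
    dx_I = np.linalg.solve(S_II, rhs)
    dlam_I = Xi_inv * (-r2[I] - lam[I] * dx_I)             # fourth block row
    dlam_A = r1[A] + H[np.ix_(A, A)] @ dx_A + H[np.ix_(A, I)] @ dx_I  # first block row
    return dx_I, dlam_I, dlam_A
```

A useful property, consistent with the discussion of (28): if the supplied dx_A equals the exact Newton component, the recovered components coincide with the exact Newton solution.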
Note, however, that this is not true for ∆λ^−_A given by (29b), due to the last term, which contains X_AA^{-1} in combination with the approximation error.
where ∆x^S_i is given by (7), ∆x^C_i by (13a), ∆x^ls_i by (26), ∆λ^ls_i for i ∈ A by (30), ∆λ^b_i by (29a), ∆λ^ls_i for i ∈ I by (27) and ∆λ^C_i by (13b). Assume that 0 < µ ≤ µ̄ and that (x, λ) is sufficiently close to (x^µ, λ^µ) ∈ B((x*, λ*), δ) such that F_µ(x, λ) = O(µ). Then there exists µ̂, with 0 < µ̂ ≤ µ̄, such that the stated error bounds hold for 0 < µ ≤ µ̂.

Proof. Similarly as in the proof of Proposition 3.4, by Proposition 3.2 and Proposition 3.3 there exist µ̂_5 and µ̂_6 respectively, with 0 < µ̂_i ≤ µ̄, i = 5, 6, such that the corresponding component bounds hold for i ∈ A, 0 < µ ≤ µ̂_5, and for i ∈ I, 0 < µ ≤ µ̂_6. Consider the backward error with ∆x^ls_I as given in (26). Due to the assumption on f, the elements of H_IA are bounded. Moreover, the smallest singular value of H_II + X_II^{-1}Λ_II is bounded away from zero, since the matrix is positive definite by Assumption 1. Hence the bound on ∆x^ls_I follows. The largest singular value of (I_AA + X²_AA)^{-1} is bounded by 1, and hence the bound on ∆λ^ls_A follows. The elements of H_AA and H_AI are bounded, and by Lemma 2.7 it holds that X_AA Λ_AA = O(µ), 0 < µ ≤ max{µ̂_5, µ̂_6}; thus the bound on ∆λ^ls_A given by (30) follows. Similarly, (29a) gives the backward error bound for ∆λ^b_A. Thus the result holds for µ̂ = min{µ̂_5, µ̂_6}.
Information is discarded in the calculation of the components ∆x^S_i, ∆x^C_i, i ∈ A, and ∆λ^C_i, i ∈ I, with (7) and (13) respectively. The equations for the approximate solution in Theorem 3.5 show that it is essential to obtain a good approximate solution of ∆x^N_A. It is the error in the approximate solution of ∆x^N_A that propagates through the suggested solutions labeled with "ls" and "b". In contrast to all other components of the proposed full approximate solution, ∆λ^ls_i, i ∈ I, actually has asymptotic component error bounds of order µ³, as can be seen in the proof of Theorem 3.5.
In general the active and inactive sets at the optimal solution are unknown and have to be estimated as the iterations proceed. The quality of the approximate solution of ∆x^N_A will hence also depend on these estimates. There is a trade-off when estimating the set of active constraints. A restrictive strategy may lead to a more accurate approximate ∆x_A. However, it increases the cardinality of the inactive set and, in consequence, the size of the system (26) that needs to be solved at each iteration. In theory, the cardinality of the inactive set is determined by the number of inactive constraints at the solution of the specific problem, whereas in practice it is determined by the estimate. The size of the system that needs to be solved at each iteration may thus range from 0 to n. A restrictive strategy may also increase the size of some coefficients on the diagonal of the matrix of (26), or (A.13) in the general case, which may increase the condition number. A generous strategy, on the other hand, decreases the size of the system that has to be solved, but may increase the error in the approximate ∆x_A, which then propagates to other components of the approximate solution. In the ideal case with the true inactive set, (25) and (26) are composed of the inactive parts of (2), or equivalently (5), and (6) respectively. Consequently, the inactive part of the Schur complement in (26) does not become increasingly ill-conditioned as µ approaches zero, in contrast to the complete Schur complement in (6). However, in practice the behavior will depend on the estimate of the inactive set.
Note also that the system that needs to be solved for the full approximate solution has the same structure as the original one. In consequence, our analysis may be interpreted in the framework of previous work on stability and the effects of finite-precision arithmetic in interior-point methods, e.g., [11,31-33]. For the case of quadratic problems, see also [23].
To increase the comprehensibility of the work we have described the theoretical foundation for problems of the form (P). Analogous results for problems of the more general form (NLP), together with complementary remarks, are given in Appendix A.1.

Numerical results
As an initial numerical study we consider convex quadratic optimization problems with lower and upper bounds; in particular, randomly generated problems and a selection from the corresponding class in the CUTEst test collection [16]. The minimizers of the randomly generated problems satisfy strict complementarity, whereas the minimizers of the CUTEst problems typically do not. The simulations were done in Julia and all systems of linear equations were solved with its built-in solver. Moreover, the benchmark problems were initially processed using the packages CUTEst.jl and NLPModels.jl by Orban and Siqueira [25].
The purpose of the first part of this section is to compare the proposed approximate solutions in Theorem A.6. The intent is also to give a rough indication of how the approximation errors develop for practical values of µ. A setting is considered where a vector (x, λ) satisfying ‖F_µ(x, λ)‖ < µ is found by an interior-point method. Thereafter, µ is decreased by a factor σ = 0.1 to µ+ = σµ and the approximate solution of (2) is calculated. This procedure was then repeated for different values of µ. Mean errors, with one-standard-deviation error bars, for the proposed approximate solutions are shown in Figure 1. As mentioned, the results are for the approximate solutions given in Theorem A.6 of Appendix A.1, since the problems in general include lower and upper bounds. In order to avoid double subscripts in the approximations, we have throughout this section omitted the second subscript. Furthermore, ∆x^S_A was used in the equations which require an initial approximation of ∆x^N_A. Figure 1 also shows the mean improvement in terms of the measure ‖F_{µ+}‖ for two new iterates, (x^S_+, λ^S_+) and (x^C_+, λ^C_+), defined by the corresponding approximate steps with step lengths α_P and α_D as in Algorithm 1. Specifically, the search direction is composed of (A.5) or (A.6) combined with (A.13), (A.17) and (A.14). The figure also contains the mean improvement of the Newton iterate (x^N_+, λ^N_+), which is defined analogously. The results are for 10^2 randomly generated problems, with 10^3 variables, whose minimizers satisfy (A.1). For each problem, both the specific bounds as well as the specific active and inactive constraints were chosen at random. Moreover, the elements of the Hessian were uniformly distributed around zero, with a sparsity level corresponding to approximately 40 percent non-zero elements. The condition numbers were of order 10^7-10^10 and the largest singular values of order 10^3.
The least accurate approximate solutions in Figure 1 are those corresponding to active λ and inactive x. This is anticipated, as their error bounds rely more heavily on the size of the elements of H. Moreover, it can be seen that ∆λ^ls_I is favorable over ∆λ^C_I for the problems considered. This is also anticipated, as ∆λ^ls_I has asymptotic error bounds of order µ^3, in contrast to the bounds corresponding to ∆λ^C_I, which are of order µ^2, as mentioned in Section 3.2. In general, Figure 1 gives an indication of which equation is favorable for each partial approximate solution, if one is to be chosen. However, as mentioned, more sophisticated choices can be made by carefully considering the known quantities in the individual error terms for specific components. The right side of Figure 1 shows that the iterates (x^S_+, λ^S_+) and (x^C_+, λ^C_+) perform similarly to (x^N_+, λ^N_+) in terms of the measure ‖F_{µ+}‖ for a wide range of µ. The error bars show that the results are not sensitive to changes in the specific bounds, in which constraints are active/inactive, or in the initial solutions. Numerical simulations have shown, as the theory also predicts, that the results can be improved (or worsened) by increasing (or decreasing) the size of the coefficients of the matrix H as well as its sparsity level.
Next we show results for a selection of problems in the CUTEst test collection in the analogous setting. In the problems with variable options, the number of primal variables, n_x, was typically chosen as approximately 5000, resulting in a total number of primal-dual variables of order 10^4. The number of primal variables of each specific problem is shown in Table 1. Each problem was initially solved by an interior-point method with stopping criterion ‖F_0(x, λ)‖ < 10^{-14}, i.e., the first-order optimality conditions given by (A.2) for µ = 0. This was done to determine the selection of problems as well as estimates of the active and inactive sets. Problems with an unconstrained optimal solution, or an optimal solution with only degenerate active constraints, were not considered. In the first case the proposed approximate solutions are equivalent to the true solution; in the second case it is not clear how to deduce active/inactive sets. A constraint was considered active if the corresponding variable was closer than 10^{-10} to its bound. An active constraint was deemed degenerate if the corresponding multiplier value was below 10^{-6}. An exception was made for problem ODNAMUR, due to its larger size, for which the tolerances above were increased by factors of 10 and 10^2 respectively. The partial approximate solution errors in Figure 2 are significantly larger than those of Figure 1. This is expected, since the optimal solutions of the CUTEst test problems typically do not satisfy strict complementarity. Moreover, with the above strategy for determining the active and inactive sets, the smallest active multipliers may be of order 10^{-5}. Small active multipliers may cause inaccurate components in the approximate solution of ∆x^N_A. Nevertheless, the approximate solutions perform asymptotically similarly to the Newton solution in terms of the measure ‖F_{µ+}‖, as shown in Figure 2.
The figure also shows that the approximation error and the progress measure are not particularly sensitive to different initial solutions for smaller µ, whereas some effects can be seen for larger µ. The results may be improved, or worsened, depending on how the estimation of the active constraints at the solution is made. We chose to give the results for the strategy described above, which gives a potentially significant reduction in the computational cost per iteration.
In practice the active constraints at the optimal solution are unknown and have to be estimated as the iterations proceed. The purpose of the following simulations is to give an initial indication of the performance of the proposed approximate solutions within a primal-dual interior-point framework. In particular, we focus on the behavior on problems that do not satisfy the assumptions under which the theoretical results are valid, but also on the robustness with respect to how the set of active constraints is estimated. Algorithms 1 and 2 were considered with the aim of not drowning out, or conflating, approximation effects with effects from more advanced features of more sophisticated methods. Algorithm 1 should here be seen as the reference method, as it only contains Newton steps.

Algorithm 1 Reference interior-point method for convex (NLP).
Estimate active constraints to obtain active/inactive sets
(∆x_k, ∆λ_k) ← (A.5) or (A.6) combined with (A.13), (A.14) and (A.17)

At iteration k of Algorithm 1 and Algorithm 2, α^P_{max,k} and α^D_{max,k} are the maximum feasible step lengths for x_k along ∆x_k and for λ_k along ∆λ_k respectively. Table 1 contains a comparison of Algorithm 1 and two versions of Algorithm 2, which differ in how ∆x_A is computed. The versions are denoted by aN_S and aN_C, as they use the approximations ∆x^S_A and ∆x^C_A respectively. In Algorithm 2, a constraint was considered active if the distance to its bound was smaller than both the value of its multiplier and a threshold τ_A. The procedure is thus a basic heuristic aimed at determining the non-degenerate active constraints. In essence, the heuristic gives an estimate of the set A_x; compare with Definition A.1 in the theoretical setting. The thresholds of the two versions aN_S and aN_C were chosen as τ_A = µ^{2/3} and the more restrictive τ_A = µ^{3/4} respectively. This was done to show the effects of two different thresholds τ_A, but also because numerical experiments have shown that steps with the Schur-based approximation are more robust at larger µ, see Figure 2. Table 1 gives a comparison of the number of iterations for different values of µ, as well as the average cardinality of I_x, the set of indices corresponding to the estimated inactive components of x, i.e., the size of the systems that have to be solved in every iteration. The symbol "-" denotes the situation when the method failed to converge within 50 iterations for the corresponding µ. If the method failed at a specific µ, then Newton steps were performed instead until ‖F_µ(x, λ)‖ < µ. The order of the problems is the same as in Figure 2.
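The maximum feasible step lengths α^P_{max,k} and α^D_{max,k} are standard ratio tests; implementations usually damp them with a fraction-to-boundary factor τ to keep the iterates strictly feasible. A sketch (the factor 0.995 below is a common choice, not a value taken from the paper):

```python
import numpy as np

def max_feasible_step(z, dz, tau=0.995):
    """Largest step alpha in (0, 1] keeping z + alpha*dz strictly positive,
    damped by a fraction-to-boundary factor tau. Applied to x along dx for
    the primal step length and to lam along dlam for the dual step length."""
    neg = dz < 0
    if not np.any(neg):
        return 1.0  # no component decreases, so the full step stays feasible
    return min(1.0, tau * np.min(-z[neg] / dz[neg]))
```

Only components that decrease can hit their bound, so the ratio test is taken over the indices with dz_i < 0.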
[Table 1: iteration counts of N, aN_S and aN_C over the considered values of µ, together with the average cardinality of I_x; the table body is not recoverable here.]*

The results in Table 1 display characteristics similar to the results in Figure 2. The version associated with the Schur-based approximate solution, aN_S of Algorithm 2, makes sufficient progress at µ ∈ [10^2, 10^-2), often at a relatively low computational cost. Version aN_S converges at µ ∈ [10^-2, 10^-6], however often while solving relatively large systems, due to the difficulty of estimating A_x. At µ ∈ (10^-6, 0) the asymptotic behavior becomes more pronounced. Consequently, aN_S performs similarly to Algorithm 1 in terms of iteration count while solving systems of reduced size. Version aN_S converges at all considered µ in all problems of Table 1, except on HARKERP2 for larger µ. The version associated with the complementarity-based approximate solution, aN_C of Algorithm 2, tends to perform poorly for µ ∈ [10^2, 10^-2) and parts of [10^-2, 10^-6]. Although aN_C converges for large µ, this is often at the expense of either solving relatively large systems or performing many iterations. In general, aN_C performs similarly to Algorithm 1 for µ in the approximate region [10^-5, 0) while solving systems of reduced size. The versions aN_S and aN_C have similar asymptotic performance; in general, however, aN_S performs better for larger values of µ, as also indicated by the previous results in Figure 2.

* The tables would be identical for other versions of the same problem and are therefore omitted.
Finally, we show results for the two Newton-like approaches mentioned in Section 3.1 in a simple primal-dual interior-point setting. The approximate intermediate step method and the approximate higher-order method are described in Algorithm 3 and Algorithm 4, respectively. In contrast to Section 3.1, here the intermediate iterate is required to be strictly feasible. The total number of iterations required at different intervals of µ with the two Newton-like approaches is shown in Figure 3. The figure shows results for three different choices of (∆x_E, ∆λ_E). Moreover, the selection of which components to update was done as the iterations proceeded, similarly to above. Note, however, that it is not necessary to label each constraint and each component of λ as active or inactive in this case; some may be labeled as neither. The set of indices corresponding to active constraints, A_x, was estimated as above, and the sets of indices corresponding to inactive λ, I_l and I_u (see Definition A.1), were estimated analogously. That is, a multiplier was considered inactive if its value was smaller than both the distance of the corresponding x to its feasibility bound and a threshold τ_I. Table 2 shows how the nonzero components of (∆x_E, ∆λ_E) were chosen in the different versions of the approaches, as well as the different thresholds τ_A and τ_I. Table 2: Thresholds and nonzero components of the steps to (x_E, λ_E) for the three versions compared in Figure 3.
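The resulting three-way labeling can be sketched as follows, again for the simplified problem with x ≥ 0 so that a single multiplier vector suffices. The function name, the default thresholds, and the string labels are ours; the paper leaves τ_A and τ_I version-dependent (Table 2).

```python
import numpy as np

def label_components(x, lam, mu, tau_A=None, tau_I=None):
    """Three-way labeling used for the Newton-like approaches (sketch).

    For min f(x) subject to x >= 0: a constraint is estimated active
    when x_i < min(lam_i, tau_A); a multiplier is estimated inactive
    when lam_i < min(x_i, tau_I).  Components passing neither test are
    labeled 'neither' and receive no approximate update.
    """
    tau_A = mu ** (2.0 / 3.0) if tau_A is None else tau_A
    tau_I = mu ** (2.0 / 3.0) if tau_I is None else tau_I
    labels = np.full(x.shape, "neither", dtype=object)
    labels[(x < lam) & (x < tau_A)] = "active"
    labels[(lam < x) & (lam < tau_I)] = "inactive"
    return labels
```

The two tests are mutually exclusive (x_i < λ_i and λ_i < x_i cannot both hold), so each component receives exactly one label; components near the threshold with neither x_i nor λ_i small end up as "neither".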

[Table 2 body: thresholds τ_A, τ_I and nonzero components in (∆x_E, ∆λ_E) per version; the table body is not recoverable here.]
Algorithm 3 Simple interior-point method with an approximate intermediate step for convex (NLP).
Estimate active constraints to obtain active/inactive-sets

Algorithm 4 Simple interior-point method with approximate higher-order solve for convex (NLP).
Estimate active constraints to obtain active/inactive sets.

In Algorithm 3 and Algorithm 4, at iteration k, α^P_max,k, α^D_max,k, α^{E,P}_max,k and α^{E,D}_max,k are defined for the prescribed steps analogously to Algorithm 1. The total iteration count for µ ∈ [10^1, 10^-10] in Figure 3 shows that the approximate higher-order approach requires the same number of iterations as, or fewer than, the approach with the approximate intermediate step. The iteration count for the approaches with the Schur-based approximation is similar to that of Algorithm 1 for this range of µ. Here too, numerical experiments indicate three regions. For µ in the approximate region [10^1, 10^-2), the versions with the Schur-based approximation yield a potentially reduced number of iterations. Their performance varies in the region of intermediate-sized µ; however, it cannot be ruled out that this is an effect of the relatively simple set-estimation heuristics. On all problems in Figure 3, with the exception of ODNAMUR, all versions of both approaches give an iteration count less than or equal to that of Algorithm 1 for µ ∈ [10^-5, 10^-10], hence providing potential savings in computational cost. The results may be improved with more flexible set-estimation heuristics, e.g., more restrictive thresholds for intermediate-sized µ. However, we chose not to include another layer of detail, and instead give results for a relatively simple setting to obtain an initial evaluation of the potential performance.
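The maximum feasible step lengths used in all of the algorithms can be sketched as a single ratio test; the function below is a minimal sketch under the simplification of a lower bound at zero (so feasibility means z + α∆z ≥ 0), applied alike to the primal pair (x_k, ∆x_k) and the dual pair (λ_k, ∆λ_k).

```python
import numpy as np

def max_feasible_step(z, dz):
    """Largest alpha in (0, 1] such that z + alpha * dz >= 0 (sketch).

    Only components moving toward the bound (dz_i < 0) restrict the
    step.  A practical interior-point method would additionally scale
    the result by a fraction-to-boundary factor to keep the iterate
    strictly feasible.
    """
    neg = dz < 0
    if not np.any(neg):
        return 1.0
    return float(min(1.0, np.min(-z[neg] / dz[neg])))
```

For example, z = (1, 2) with ∆z = (-2, 1) gives α_max = 0.5: only the first component restricts the step, hitting its bound at α = 1/2.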

Conclusions
In this work we have given approximate solutions to systems of linear equations that arise in interior-point methods for bound-constrained optimization; in particular, partial approximate solutions, whose asymptotic component error bounds are of order µ², and full approximate solutions with asymptotic error bounds of order µ². Numerical simulations on randomly generated bound-constrained convex quadratic optimization problems, whose minimizers satisfy strict complementarity, have shown that the approximate solutions perform similarly to Newton solutions for sufficiently small µ. Simulations on convex bound-constrained quadratic problems from the CUTEst test collection, whose minimizers typically do not satisfy strict complementarity, have shown that the predicted asymptotic behavior still occurs, although at significantly smaller values of µ.
We have performed numerical simulations in a simple yet more realistic setting; specifically, in a primal-dual interior-point framework where the active and inactive sets were estimated with basic heuristics as the iterations proceeded. These simulations were done on a selection of CUTEst benchmark problems. The results showed that the behavior roughly varied over three regions determined by the size of µ. The Schur-based approximate solutions showed potential in the region of larger µ; in the region of intermediate-sized µ the performance varied, partly due to difficulties in determining the active and inactive sets. For sufficiently small µ, the approximate solutions showed performance similar to our reference method while solving systems of reduced size.
Finally, we showed numerical results for two Newton-like approaches, which include an approximate intermediate step consisting of partial approximate solutions, on the considered CUTEst benchmark problems. The simulations showed characteristics similar to the previous results, and also a potential for reducing the overall iteration count of interior-point methods.
The results of this work are meant to contribute to the theoretical and numerical understanding of approximate solutions to systems of linear equations that arise in interior-point methods. We hope that the work can lead to further research on approximate solutions and approximate higher-order methods for optimization problems with linear inequality constraints.
where ∇_x L(x, λ)_{A_x} = ∇f(x)_{A_x} − λ^l_{A_x} + λ^u_{A_x}. If the approximation ∆x_{A_x} is exact, then so is ∆x^{ls}_{I_x} by (A.13). In consequence, the over-determined system (A.15) has a unique solution that satisfies all equations, i.e., ∆λ^{ls}_{A_x}, or equivalently ∆λ^{l,ls}_{A_l} and ∆λ^{u,ls}_{A_u} since A_x = A_l ∪ A_u, are the corresponding parts of the solution of (2). The solutions corresponding to the first and second block equations of (A.15) will be labeled with superscripts "b" and "−", respectively. These are given by −I_{A_x A_l} I_{A_x A_u} ∆λ^{l,b}
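As a small illustration of the claim that an exact ∆x_{A_x} makes the least-squares multiplier step exact, the sketch below builds a generic consistent over-determined system, standing in for (A.15), whose blocks are not reproduced here, and checks that its least-squares solution satisfies every equation exactly. The matrix and right-hand side are illustrative assumptions, not the actual system.

```python
import numpy as np

# Illustrative only: a generic over-determined system (3 equations,
# 2 unknowns) made consistent by construction.  A consistent system
# has a zero-residual least-squares solution, so that solution
# satisfies all block equations simultaneously.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y_true = np.array([2.0, -1.0])
b = A @ y_true                  # consistent right-hand side
y_ls, residual, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Since the right-hand side lies in the range of A, `y_ls` recovers `y_true` and `A @ y_ls` matches `b` to machine precision, mirroring the uniqueness statement above.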