Abstract
The conic bundle implementation of the spectral bundle method for large scale semidefinite programming solves in each iteration a semidefinite quadratic subproblem by an interior point approach. For larger cutting model sizes the limiting operation is collecting and factorizing a Schur complement of the primal-dual KKT system. We explore possibilities to improve on this by an iterative approach that exploits structural low rank properties. Two preconditioning approaches are proposed and analyzed. Both might be of interest for rank structured positive definite systems in general. The first employs projections onto random subspaces, the second projects onto a subspace that is chosen deterministically based on structural interior point properties. For both approaches theoretic bounds are derived for the associated condition number. In the instances tested the deterministic preconditioner provides surprisingly efficient control on the actual condition number. The results suggest that for large scale instances the iterative solver is usually the better choice if precision requirements are moderate or if the size of the Schur complemented system clearly exceeds the active dimension within the subspace giving rise to the cutting model of the bundle method.
1 Introduction
In semidefinite programming the ever increasing number of applications [3, 7, 27, 40] leads to a corresponding increase in demand for reliable and efficient solvers for linear programs over symmetric cones. In general, interior point methods are the method of choice. Yet, if the order of some semidefinite matrix variables gets large and the affine matrix functions involved do not allow to use decomposition or factorization approaches such as proposed in [5, 8, 31], general interior point methods are no longer applicable. The limiting factors are typically memory requirements and computation times connected with forming and factorizing a Schur complemented system matrix of the interior point KKT system. Large scale second order cone variables do not cause such problems; this is indeed specific to semidefinite settings. In such cases, the spectral bundle method of [26] offers a viable alternative.
The spectral bundle method reformulates the semidefiniteness condition via a penalty term on the extremal eigenvalues of a corresponding affine matrix function and assumes these eigenvalues to be efficiently computable by iterative methods. In each step it selects a subspace close to the current active eigenspace. Then the next candidate point is determined as the proximal point with respect to the extremal eigenvalues of the affine matrix function projected onto this subspace. The proximal point is the optimal solution to a quadratic semidefinite subproblem whose matrix variable is of the order of the dimension of the approximating subspace. If the subspace is kept small, this allows to find approximately optimal solutions in reasonable time. In order to reach solutions of higher precision it seems unavoidable to go beyond the full active eigenspace [10, 23]. In the current implementation within the callable library ConicBundle [19], which also supports second order cone and nonnegative variables, the quadratic subproblem is solved by an interior point approach. Again for each of its KKT systems the limiting work consists in collecting and factorizing a Schur complement matrix whose order is typically the square of the dimension of the active eigenspace. The main question addressed here is whether it is possible to stretch these limits by developing a suitably preconditioned iterative solver that allows to circumvent the collection and factorization of this Schur complement. The focus is thus not on the spectral bundle method itself but on solving KKT systems of related quadratic semidefinite and more generally quadratic conic programs by iterative methods. While the motivating and most general semidefinite case dominates in this work, natural extensions to second order and nonnegative cones will also be mentioned, because future applications may well expect and require support for arbitrary combinations of conic variables. 
Even though the methodology will be developed and discussed for low rank properties that arise in the ConicBundle setting, some of the considerations and ideas should be transferable to general conic quadratic optimization problems whose quadratic term consists of a positive diagonal plus a low rank Gram matrix or maybe even to general positive definite systems of this form.
Here is an outline of the paper and its main contributions. Section 2 provides the necessary background on the bundle philosophy underlying ConicBundle and derives the KKT system of the bundle subproblem. The core of the work is presented in Sect. 3 on low rank preconditioning for a Gram matrix plus positive diagonal. For slightly greater generality, denote the cone of positive (semi)definite matrices of order m by \({\mathbb {S}}^m_{++}\ ({\mathbb {S}}^m_+)\) and let the system matrix be given in the form
where it is tacitly assumed that \(D^{-1}\) times vector and V times vector are efficiently computable. Typically \(n\le m\) but whenever n is sizable one would like to approximate V by a matrix \({{\hat{V}}}\in {\mathbb {R}}^{m\times k}\) with significantly smaller \(k<n\) to obtain a preconditioner \({{\hat{H}}}=D+ {{\hat{V}}}{{\hat{V}}}^\top \) whose inverse, by a low rank update, reads \({{\hat{H}}}^{-1}=D^{-1}-D^{-1}{{\hat{V}}}(I_k+{{\hat{V}}}^\top D^{-1}{{\hat{V}}})^{-1}{{\hat{V}}}^\top D^{-1}\). Comparing this to the inverse of H, the goal is to capture the large eigenvalues of \(V^\top D^{-1}V\), more precisely the directions belonging to large singular values of \(D^{-\frac{1}{2}}V\). By the singular value decomposition (SVD) this can be achieved by the projection onto a subspace, say \(D^{-\frac{1}{2}}VP\) for a suitably chosen orthogonal \(P\in {\mathbb {R}}^{n\times k}\). Because the full SVD is computationally too expensive, two other approaches will be developed and analyzed here. In the first, Sect. 3.1, the orthogonal P is generated by a Gaussian matrix \(\Omega \in {\mathbb {R}}^{n\times k}\). In the second, Sect. 3.2, some knowledge about the interior point method leading to V will be exploited in order to form P deterministically.
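As a concrete illustration of the low rank update, \({{\hat{H}}}^{-1}\) can be applied to a vector without ever forming \({{\hat{H}}}\) or any \(m\times m\) matrix; the following is a minimal NumPy sketch (function name and setup ours, D taken diagonal as in the application):

```python
import numpy as np

def apply_hhat_inv(D_diag, Vhat, r):
    """Apply Hhat^{-1} = D^{-1} - D^{-1} Vhat (I_k + Vhat^T D^{-1} Vhat)^{-1} Vhat^T D^{-1}
    to a vector r; only a k x k system is factorized."""
    Dinv_r = r / D_diag                # D^{-1} r
    W = Vhat / D_diag[:, None]        # D^{-1} Vhat
    k = Vhat.shape[1]
    S = np.eye(k) + Vhat.T @ W        # small k x k capacitance matrix
    return Dinv_r - W @ np.linalg.solve(S, Vhat.T @ Dinv_r)
```

The cost per application is one \(k\times k\) solve plus two multiplications with \({{\hat{V}}}\), which is the reason the preconditioner remains cheap as long as k is small.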
The projection onto a random subspace may be motivated geometrically by interpreting the Gram matrix \(VV^\top \) as the inner products of the row vectors of V. The result of Johnson–Lindenstrauss, cf. [1, 9], allows to approximate this with low distortion by a projection onto a low dimensional subspace. In matrix approximations this idea seems to have first appeared in [35]. In connection with preconditioning a recent probabilistic approach is described in [28] in the context of controlling the error of an LU preconditioner. [17] gives an excellent introduction to probabilistic algorithms for constructing approximate matrix decompositions and provides useful bounds. Building directly on their techniques we provide deterministic and probabilistic bounds on the condition number of the random subspace preconditioned system in Theorems 6 and 7. In comparison to the moment analysis of the Ritz values of the preconditioned matrix presented in Theorem 4, the bounds seem to fall below expectation and are maybe still improvable. Random projections do not require any problem specific structural insights, but it remains open how to choose the subspace dimension in order to obtain an efficient preconditioner.
In contrast, identifying the correct subspace seems to work well for the deterministic preconditioning routine. It exploits structural properties of the KKT system’s origin in interior point methods. Within interior point methods iterative approaches have been investigated in quite a number of works, in conjunction with semidefinite optimization see e.g. [32, 39]. These methods were mostly designed for exploiting sparsity rather than low rank structure. During the last months of this work an approach closely related to ours appeared in [16]. It significantly extends ideas of [41] for a deterministic preconditioning variant. It assumes the rank of the optimal solution to be known in advance and provides a detailed analysis for this case. Their ideas and arguments heavily influenced the condition number analysis of our approach presented in Theorems 2 and 9. In contrast to [16], our algorithmic approach does not require any a priori knowledge on the rank of the optimal solution. Rather, Theorem 9 and Lemma 12 motivate an estimate on the singular value induced by certain directions associated with active interior point variables that seems to offer a good indicator for the relevance of the corresponding subspace.
In Sect. 4 the performance of the preconditioning approaches is illustrated relative to the direct solver on sequences of KKT systems that arise in solving three large scale instances within ConicBundle. The deterministic approach turns out to be surprisingly effective in identifying a suitable subspace. It provides good control on the condition number and reduces the number of matrix vector multiplications significantly. The selected instances are also intended to demonstrate the differences in the potential of the methods depending on the problem characteristics. Roughly, the direct solver is mainly attractive if the model is tiny, if significant parts of the Schur complement can be precomputed for all KKT systems of the same subproblem or if precision requirements get exceedingly high with the entire bundle model being strongly active. In general, however, the iterative approach with deterministic preconditioner can be expected to lead to significant savings in computation time in large scale applications. In order to demonstrate that this KKT-system-based analysis suitably reflects the performance of the solvers within the bundle method, the section closes by reporting preliminary experiments on randomly generated MaxCut instances where ConicBundle is run for each solver separately with exactly the same parameter settings that were developed and tuned for the direct solver. In Sect. 5 the paper ends with some concluding remarks.
Notation. For matrices or vectors \(A,B\in {\mathbb {R}}^{m\times n}\) the (trace) inner product is denoted by \(\left\langle {A},{B}\right\rangle =\mathop {\textrm{tr}}B^\top A=\sum _{ij}A_{ij}B_{ij}\). \(A\circ B=(A_{ij}B_{ij})\) denotes the elementwise or Hadamard product. \(A_{i,\bullet }\) refers to the row vector of the ith row of A and \(A_{\bullet ,j}\) to the column vector of the jth column of A. For some ordered index set \(J\subseteq \{1,\dots ,n\}\) the submatrix \(A_{\bullet ,J}\) consists of the respective columns. Consider symmetric matrices \(A,B\in {\mathbb {S}}^n\) of order n. For representing these as vectors, the operator \(\mathop {\textrm{svec}}A=(A_{11},\sqrt{2}A_{21},\dots ,\sqrt{2}A_{n1},A_{22},\sqrt{2}A_{32},\dots ,A_{nn})^\top \) stacks the columns of the lower triangle with off-diagonal elements multiplied by \(\sqrt{2}\) so that \(\left\langle {A},{B}\right\rangle =\mathop {\textrm{svec}}(A)^\top \mathop {\textrm{svec}}(B)\). For matrices \(F,G\in {\mathbb {R}}^{k\times n}\) the symmetric Kronecker product \(\otimes _{s}\) is defined by \((F\otimes _{s}G)\mathop {\textrm{svec}}(A)=\frac{1}{2}\mathop {\textrm{svec}}(FAG^\top +GAF^\top )\). The Loewner partial order \(A\succeq B\) (\(A\succ B\)) refers to \(A-B\in {\mathbb {S}}^n_{+}\) (\(A-B\in {\mathbb {S}}^n_{++}\)) being positive semidefinite (positive definite). The eigenvalues of A are denoted by \(\lambda _{\max }(A)=\lambda _1(A)\ge \dots \ge \lambda _n(A)=\lambda _{\min }(A)\). The norm \(\Vert \cdot \Vert \) refers to the Euclidean norm for vectors and to the spectral norm for matrices. \(I_n\) (I) denotes the identity matrix of order n (or of appropriate size), the canonical unit vectors \(e_i\) refer to the ith column of I. Unless stated explicitly otherwise, \(\mathbb {1}\) denotes the vector of all ones of appropriate size.
\({\mathbb {E}}\) refers to the expected value of a random variable, \(\mathop {\textrm{Var}}\) to its variance and \({\mathcal {N}}(\mu ,\sigma ^2)\) to the normal or Gaussian distribution with mean \(\mu \) and standard deviation \(\sigma \).
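The defining property of \(\mathop {\textrm{svec}}\), namely that it turns the trace inner product into the ordinary vector inner product, is easily checked numerically; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def svec(A):
    """Stack the columns of the lower triangle of symmetric A,
    off-diagonal entries scaled by sqrt(2), so that
    <A, B> = svec(A)^T svec(B)."""
    n = A.shape[0]
    out = []
    for j in range(n):
        out.append(A[j, j])                      # diagonal entry A_jj
        out.extend(np.sqrt(2.0) * A[j + 1:, j])  # entries below the diagonal
    return np.array(out)
```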
2 The KKT system of the ConicBundle subproblem
The general setting of bundle methods deals with minimizing a (typically closed) convex function \(f:{\mathbb {R}}^m\rightarrow \overline{{\mathbb {R}}}:={\mathbb {R}}\cup \{\infty \}\) over a closed convex ground set \(C\subseteq \mathop {\textrm{dom}}f\) of simple structure like \({\mathbb {R}}^m\), a box or a polyhedron,
Typically, f is given by a first order oracle, i.e., a routine that returns for a given \({{\bar{y}}}\in C\) the function value \(f({{\bar{y}}})\) and an arbitrary subgradient \(g\in \partial f({{\bar{y}}})\) from the subdifferential of f in \({{\bar{y}}}\). Value \(f({{\bar{y}}})\) and subgradient g give rise to a supporting hyperplane to the epigraph of f in \(({{\bar{y}}},f({{\bar{y}}}))\). The algorithm collects these affine minorants in the bundle to form a cutting model of f. It will be convenient to arrange the value at zero and the gradient in a pair \(\omega =(\gamma =f({{\bar{y}}})-\left\langle {g},{{{\bar{y}}}}\right\rangle ,g)\) and to denote, for \(y\in {\mathbb {R}}^m\), the minorant’s value in y by \(\omega (y):=\gamma +\left\langle {g},{y}\right\rangle \).
Let \(\mathcal {W}_f=\{\omega =(\gamma ,g)\in {\mathbb {R}}^{1+m}:\gamma +\left\langle {g},{y}\right\rangle \le f(y), y\in {\mathbb {R}}^m\}\) denote the set of all affine minorants of f. For closed f we have \(f(y)=\sup _{\omega \in \mathcal {W}_f}\omega (y)\). Any compact subset \(W\subseteq \mathcal {W}_f\) gives rise to a minorizing cutting model of f,
At the beginning of iteration \(k=0,1,\dots \) the bundle method’s state is described by a current stability center \({{\hat{y}}}_k\in {\mathbb {R}}^m\), a compact cutting model \(W_k\subseteq \mathcal {W}_f\), and a proximity term, here the square of a norm \(\Vert \cdot \Vert _{{\mathfrak {H}}_k}^2:=\left\langle {\cdot },{{\mathfrak {H}}_k\cdot }\right\rangle \) with positive definite \({\mathfrak {H}}_k\) (this Fraktur H will form the core of the final system matrix H). The method determines the next candidate \(y_{k+1}\in {\mathbb {R}}^m\) as minimizer of the augmented model or bundle subproblem
Solving this bundle subproblem may be viewed as determining a saddle point \((y_{k+1},{{\bar{\omega }}}_{k+1}=({{\bar{\gamma }}}_{k+1},{{\bar{g}}}_{k+1}))\in C\times \mathop {\textrm{conv}}W_k\), which exists for any closed convex C by [37], Theorems 37.3 and 37.6, due to the strong convexity in y and the compactness of \(W_k\),
Strong convexity in y ensures uniqueness of \(y_{k+1}\). First order optimality with respect to y implies
where \(N_C(y)\) denotes the normal cone to C at \(y\in C\). In the unconstrained case of \(C={\mathbb {R}}^m\) the aggregate \({{\bar{\omega }}}_{k+1}\) is also unique. Whether unique or not, the aggregate will refer to the solution \({{\bar{\omega }}}_{k+1}\in \mathop {\textrm{conv}}W_k\) produced by the algorithmic approach for solving (1). The progress predicted by the model is \(f({{\hat{y}}}_k)-{{\bar{\omega }}}_{k+1}(y_{k+1})=f({{\hat{y}}}_k)-W_{k}(y_{k+1})\). Actual progress will be compared to a threshold value which arises from damping the progress predicted by the model by some \(\kappa \in (0,1)\)
Next, f is evaluated at \(y_{k+1}\) by calling the oracle which returns \(f(y_{k+1})\) and a new minorant \(\omega _{k+1}\) with \(\omega _{k+1}(y_{k+1})=f(y_{k+1})\). If progress in objective value is sufficiently large in comparison to the progress predicted by the model, i.e.,
the method executes a descent step which moves the center of stability to the new point, \({{\hat{y}}}_{k+1}=y_{k+1}\). Otherwise, in a null step, the center remains unchanged, \({{\hat{y}}}_{k+1}={{\hat{y}}}_{k}\), but the new minorant \(\omega _{k+1}\) is used to improve the model. In fact, the requirement \(\{{{\bar{\omega }}}_{k+1},\omega _{k+1}\}\subseteq W_{k+1}\) ensures convergence of the function values \(f({{\hat{y}}}_k)\) to \(f^*=\inf _{y\in C} f(y)\) under mild technical conditions on \({\mathfrak {H}}_{k+1}\). For these it suffices, e.g. to fix \(0<{\underline{\lambda }}\le {{\bar{\lambda }}}\) and to choose \({\underline{\lambda }}I\preceq {\mathfrak {H}}_{k+1} \preceq {{\bar{\lambda }}} I\) following a descent step and \({\mathfrak {H}}_k\preceq {\mathfrak {H}}_{k+1} \preceq {{\bar{\lambda }}} I\) following a null step, see [6].
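The descent test just described can be sketched as follows (a hedged illustration, all names ours; the oracle call and the model solver are omitted):

```python
def descent_step(f_center, model_value, f_candidate, kappa=0.1):
    """Decide between descent and null step: accept the candidate as the
    new stability center iff the actual decrease f(y_hat) - f(y_next)
    reaches a kappa-fraction of the decrease f(y_hat) - W(y_next)
    predicted by the cutting model, with kappa in (0, 1)."""
    predicted = f_center - model_value
    return f_center - f_candidate >= kappa * predicted
```

If the test fails, the new minorant returned by the oracle is still added to the bundle, so a null step improves the model even though the center stays put.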
The decisive elements for an efficient implementation are the following:

the choice of the cutting model \(W_k\),

the choice of the proximal term, in our case of \({\mathfrak {H}}_k\),

the solution method for the bundle subproblem (1) with corresponding structural requirements on supported ground sets C.
and their interplay. While most bundle implementations employ polyhedral cutting models combined with a suitable active set QP approach, the ConicBundle callable library [19] is primarily designed for (nonnegative combinations of) conic cutting models built from symmetric cones. In particular the cone of positive semidefinite matrices and the second order cone lead to nonpolyhedral models that change significantly in each step. In solving (1) for these models, interior point methods are currently the best option available, but with these methods the cost of assembling the coefficients and solving the subproblem dominates the work per iteration for most applications. This paper explores possibilities to replace the classical Schur complement approach for computing the Newton step by an iterative approach in order to improve applicability to large scale problems.
As the main focus of this work is on solving problem (1) for a single iteration, we will refrain from giving the iteration index k in the following. In order to describe the main structure of the primal dual KKT system for (1), let us briefly sketch the conic cutting models employed. These build on combinations of the cone of nonnegative vectors, the second order cone and the cone of positive semidefinite matrices, each with a specific trace vector for measuring the “size” of its elements,
Cartesian products of these are described by a triple \(t=(n,q,s)\) with \(n\in {\mathbb {N}}_0\), \(q\in {\mathbb {N}}^{n_q}\), \(s\in {\mathbb {N}}^{n_s}\) for some \(n_q\in {\mathbb {N}}_0\), \(n_s\in {\mathbb {N}}_0\) specifying the cone
The cone \(\mathcal {S}^t_+\) will be regarded as a full dimensional cone in \({\mathbb {R}}^t:={\mathbb {R}}^{n(t)}\) with \(n(t)=n+\mathbb {1}^\top q+\sum _{i=1}^{n_s}{s_i+1\atopwithdelims ()2}\). Whenever convenient, an \(x\in \mathcal {S}^t_+\) or \({\mathbb {R}}^t\) will be split into
in the natural way. Indices or elements may be omitted if the corresponding counters n, \(n_q\), \(n_s\) happen to be 0 or 1. The trace \(\mathop {\textrm{tr}}(\cdot )\) of an element \(x\in S_t\) is defined to be
In ConicBundle each cutting model may be considered to be specified by a tuple \(M=(t,{\tau },K,{\mathcal {B}},{{\underline{\omega }}})\) via
where \(t=(n,q,s)\) specifies the cone as above, \( \tau >0\) gives the trace value or trace upper bound, \( K\in \{\{0\},{\mathbb {R}}_+\}\) specifies constant or bounded trace, \(\mathcal {B}:\mathbb {R}^m \rightarrow {\mathbb {R}}^{n(t)},\quad y \mapsto \mathcal {B}(y)=B_0+By\) represents the bundle as affine function, \( {\underline{\omega }}=({\underline{\gamma }},{\underline{g}})\) provides a constant offset subgradient.
For example, the standard polyhedral model for h subgradients \(\omega _i=(\gamma _i,g_i)\), \(i=1,\dots ,h\) is obtained for \(M=(t=(h,0,0),\tau =1,K=\{0\}, (B_0=\left[ {\begin{matrix} \gamma _1\\ \vdots \\ \gamma _h \end{matrix}}\right] ,B= \left[ {\begin{matrix} g_1^\top \\ \vdots \\ g_h^\top \end{matrix}}\right] ),{\underline{\omega }}=0)\). Indeed, maximizing \(\left\langle {{\mathcal {B}}(y)},{x}\right\rangle \) over \(x\in {\mathbb {R}}^h_+\) with \(\mathbb {1}^\top x=1\) finds the best convex combination of the subgradients at y. In polyhedral models \({\mathcal {B}}\) may well be sparse (consider e.g. Lagrangean relaxations of multicommodity flow problems like in the train timetabling application described in [13, 14]), but in combination with positive semidefinite models large parts of \({\mathcal {B}}\) will be dense (see [26]; for concreteness, consider \(f(y)=\lambda _{\max }(\sum _{i=1}^mA_iy_i)\) with \(A_i\in {\mathbb {S}}^n\) and let the spectral bundle be described by \(h'\) orthonormal columns \(P\in {\mathbb {R}}^{n\times h'}\) that approximately span the eigenspace to large eigenvalues, then \(h={h'+1\atopwithdelims ()2}\) and \(B_{\bullet ,i}=\mathop {\textrm{svec}}(P^\top A_iP)\) for \(i=1,\dots ,m\)). Therefore we will not assume any specific structure in the bundle \({\mathcal {B}}\).
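For the spectral example just given, the columns of B can be assembled as follows (a NumPy sketch under the stated setup; function names are ours, and svec is repeated here to keep the snippet self-contained):

```python
import numpy as np

def svec(A):
    """Lower-triangle stacking with off-diagonals scaled by sqrt(2)."""
    n = A.shape[0]
    out = []
    for j in range(n):
        out.append(A[j, j])
        out.extend(np.sqrt(2.0) * A[j + 1:, j])
    return np.array(out)

def spectral_bundle_B(A_list, P):
    """Column i of B is svec(P^T A_i P): the i-th coefficient matrix
    projected onto the subspace spanned by the orthonormal columns of P."""
    return np.column_stack([svec(P.T @ Ai @ P) for Ai in A_list])
```

With \(h'\) columns in P, the matrix B has \({h'+1\atopwithdelims ()2}\) rows and m columns, and these columns are typically dense even when the \(A_i\) are sparse.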
Depending on the variable metric heuristic in use, see [23, 24] for typical choices in ConicBundle, the proximal term may either be a positive multiple of \(I_m\) or it may be of the form \({\mathfrak {H}}=D_{\mathfrak {H}}+V_{\mathfrak {H}}V_{\mathfrak {H}}^\top \in {\mathbb {S}}^m_{++}\), where \(D_{\mathfrak {H}}\) is a diagonal matrix with strictly positive diagonal entries and \(V_{\mathfrak {H}}\in {\mathbb {R}}^{m\times h_{\mathfrak {H}}}\) specifies a rank \(h_{\mathfrak {H}}\) contribution. If \(V_{\mathfrak {H}}\) is present, it typically consists of dense orthogonal columns.
The final ingredient is the basic optimization set C, which may have a polyhedral description of the form
where \(A\in {\mathbb {R}}^{h_A\times m}\), \({\underline{a}}\in ({\mathbb {R}}\cup \{-\infty \})^{h_A},{\overline{a}}\in ({\mathbb {R}}\cup \{\infty \})^{h_A}\), \({\underline{y}}\in ({\mathbb {R}}\cup \{-\infty \})^m,{\overline{y}}\in ({\mathbb {R}}\cup \{\infty \})^m\) are given data. If A is employed in applications, we expect the number of rows \(h_A\) of A to be small in comparison to m. The set C is tested for feasibility in advance, but no preprocessing is applied.
In order to reduce the indexing load in this presentation, we consider problem (1) for a single model \((t,{\tau },K,\mathcal {B},{{\underline{\omega }}}=({{\underline{\gamma }}},\underline{g}))\). Putting \(b={\mathfrak {H}}{{\hat{y}}}+{\underline{g}}\) and \(\delta =\frac{1}{2}\left\langle {{\mathfrak {H}}{{\hat{y}}}},{{{\hat{y}}}}\right\rangle +{{\underline{\gamma }}}\) the bundle problem may be written in the form
The existence of saddle points is guaranteed by compactness of the xset and by strong convexity for y due to \({\mathfrak {H}}\succ 0\), see e.g. [29]. For the purpose of presentation, the primal dual KKT system for solving the saddle point problem (4) is built for \({\underline{a}}<{\overline{a}}\), \({\underline{y}}<{\overline{y}}\) and \(K={\mathbb {R}}_+\). The extension to the equality cases follows quite naturally and will be commented on at appropriate places. Throughout we will assume that \(A,A^\top \) and \(B,B^\top \) are given by matrix–vector multiplication oracles. A is assumed to have few rows, B may actually have a large number of columns, \({\mathfrak {H}}\) is a positive diagonal plus low rank, but no further structural information is assumed to be available.
The original spectral bundle approach of [26] was designed for the unconstrained case \(C={\mathbb {R}}^m\) which allows direct elimination of y by convex optimality. Setting up the maximization problem for x then requires forming the typically dense Schur complement \(B{\mathfrak {H}}^{1}B^\top \). For increasing bundle sizes this is in fact the limiting operation within each bundle iteration. The aim of developing an iterative approach for (4) is therefore not only to allow for general C but also to circumvent the explicit computation of this Schur complement.
For setting up a primal-dual interior point approach for solving (4), the dual variables to constraints on the minimizing side will be denoted by \(s\in {\mathbb {R}}^{h_A}\), \(s_{{\underline{a}}},s_{{\overline{a}}}\in {\mathbb {R}}^{h_A}_+\), \(s_{{\underline{y}}},s_{{\overline{y}}}\in {\mathbb {R}}^m_+\), the dual variables to the constraints on the maximizing side will be \(z\in \mathcal {S}^t_+\) and \(\zeta \in K^*={\mathbb {R}}_+\).
With barrier parameter \(\mu > 0\) for the conic constraints the usual primal-dual KKT system may be arranged in the form
In this, “\(\circ \)” denotes the componentwise Hadamard product and “\(\circ _t\)” a canonical generalization to the cone \(\mathcal {S}^t_+\), employing the arrow operator for second order cone parts and (typically symmetrized) matrix products for semidefinite parts.
In solving this by Newton’s method, the linearization of the first perturbed complementarity line yields
For writing the difference \(\Delta s_{{{\overline{a}}}}-\Delta s_{{\underline{a}}}\) compactly it is advantageous to introduce
so that
Likewise, for the second perturbed complementarity line, introduce
to obtain
For dealing with the conic complementarity “\(\circ _t\)” we employ the symmetrization operators \({\mathcal {E}}_t\) and \({\mathcal {F}}_t\) of [38] in diagonal block form corresponding to \({\mathcal {S}}^t_+\), which give rise to a symmetric positive definite \({\mathfrak {X}}_t={\mathcal {E}}_t^{-1}{\mathcal {F}}_t\succ 0\) with diagonal block structure according to \({\mathcal {S}}^t_{++}\). With this the last perturbed complementarity line results in
Employing the linearization of the defining equation for s,
the variable \(\Delta w\) may now be eliminated via
Put \(D_y:=\mathop {\textrm{Diag}}(d_y)>0\) and \(D_w:=\mathop {\textrm{Diag}}(d_w)>0\), then the Newton step is obtained by solving the system
with right hand side
The right hand side may be modified as usual to obtain predictor and corrector right hand sides, but this will not be elaborated on here. Note, for \(K=\{0\}\) the same system works with \(\sigma =0\) and without the centering terms associated with \(\zeta \) in \(r_\zeta \). Likewise, whenever line i of A corresponds to an equation, i.e., \({\underline{a}}_i={\overline{a}}_i\), the respective entry of \(d_w^{-1}\) has to be replaced by zero.
Equation (6) is a symmetric indefinite system that could be solved directly by appropriate iterative methods. So far, however, we were not able to conceive suitable general preconditioning approaches for exploiting the given structural properties in the full system. Surprisingly, a viable path seems to be offered by the traditional Schur complement approach after all. The resulting system allows to perform matrix vector multiplications at minimal additional cost and is frequently positive definite.
To see this, first take the Schur complement with respect to the \({\mathfrak {X}}_t^{-1}\) block,
Assuming \(d_w>0\) (no equality constraint rows in A), eliminate the second and third block with further Schur complements and split \(B^\top {\mathfrak {X}}_t B=B^\top {\mathfrak {X}}_t^{\frac{1}{2}}{\mathfrak {X}}_t^{\frac{1}{2}}B\),
Also in the equality case of \(\sigma =0\) the matrix \(I-\tfrac{{\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t}({\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t}\succeq 0\) is positive semidefinite, so the resulting system is positive definite. Equality constraints in A induce zero diagonal elements in \(D_w\) (or in \(D_w^{-1}\)). In this case the corresponding rows will not be eliminated and give rise to an indefinite system of the form \(\left[ {\begin{matrix} H &{} {{\tilde{A}}}^\top \\ {{\tilde{A}}} &{} 0 \end{matrix}}\right] \) with a large positive definite block H and hopefully few further rows in \({{\tilde{A}}}\). For such systems it is well studied how to employ a preconditioner for H to solve the full indefinite system with e.g. MINRES, see [12].
The cost of multiplying the full KKT matrix of (6) by a vector is roughly the same as that of multiplying H by a vector. Indeed, the same multiplications arise for \({\mathfrak {H}}+D_y,A,A^\top ,B,B^\top \). So it remains to compare the cost of a multiplication by \(\left[ {\begin{matrix} {\mathfrak {X}}_t^{-1} &{}\mathbb {1}_t\\ \mathbb {1}_t^\top &{}\zeta ^{-1}\sigma \end{matrix}}\right] \) to a multiplication by \({\mathfrak {X}}_t^{\frac{1}{2}}(I-\tfrac{{\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t}({\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t}){\mathfrak {X}}_t^{\frac{1}{2}}={\mathfrak {X}}_t-\tfrac{{\mathfrak {X}}_t\mathbb {1}_{t}({\mathfrak {X}}_t\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t}\). Recall that \({\mathfrak {X}}_t\) is a block diagonal matrix with a separate block for each cone \({\mathbb {R}}_+\), \({{{\mathcal {Q}}^{q_i}}}\), \({\mathbb {S}}^{s_i}_+\) specified by t and the cost of multiplying by \({\mathfrak {X}}_t\) or \({\mathfrak {X}}_t^{-1}\) is identical. Thus the only difference is the multiplication by \(\mathbb {1}_t\) in the first case and by the precomputed vector \({\mathfrak {X}}_t\mathbb {1}_{t}\) in the second. The vector \({\mathfrak {X}}_t\mathbb {1}_{t}\) may be formed at almost negligible cost along with setting up \({\mathfrak {X}}_t\). So there is no noteworthy difference in the cost of matrix vector multiplications between the two systems and no structural advantages are lost when working with H instead of the full system. We will therefore concentrate on developing a preconditioner for H.
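The precomputation argument can be sketched as follows (NumPy; names ours, and \({\mathfrak {X}}_t\) is a dense placeholder here, whereas in the application it is block diagonal and available through a multiplication oracle):

```python
import numpy as np

def make_center_matvec(X, zeta_inv_sigma):
    """Precompute u = X @ 1 once; each subsequent product with the
    rank-one-corrected block X - u u^T / (zeta^{-1} sigma + 1^T X 1)
    then costs one multiplication by X plus two inner products."""
    u = X @ np.ones(X.shape[0])
    denom = zeta_inv_sigma + u.sum()   # u.sum() equals 1^T X 1
    def matvec(v):
        return X @ v - u * (u @ v) / denom
    return matvec
```

Since u is formed only once while setting up \({\mathfrak {X}}_t\), the per-iteration cost is dominated by the single block-diagonal multiplication, as claimed above.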
For this note that H of (7) arises from adding a Gram matrix to a positive diagonal,
where (recall \({\mathfrak {H}}=D_{\mathfrak {H}}+V_{\mathfrak {H}}V_{\mathfrak {H}}^\top \))
Note that the multiplication of V with a vector requires only a little bit more than half the number of operations of multiplying H (or the full KKT matrix) with a vector. This suggests to explore possibilities of finding low rank approximations of V for preconditioning.
3 Low rank preconditioning a Gram matrix plus positive diagonal
Consider a matrix
with a positive definite matrix \(D\in {\mathbb {S}}^m_{++}\) and \(V\in {\mathbb {R}}^{m\times n}\). In our application D is diagonal, but the results apply for general \(D\succ 0\). This is applicable in practice as long as \(D^{-1}\) can be applied efficiently to vectors. Matrix V is assumed to be given by a matrix–vector multiplication oracle, i.e., V and \(V^\top \) may be multiplied by vectors but the matrix does not have to be available explicitly.
For motivating the following preconditioning approaches, first consider (without actually computing it) the singular value decomposition of
with orthogonal \(Q_H\in {\mathbb {R}}^{m\times m}\), diagonal \(\Sigma =\mathop {\textrm{Diag}}(\sigma _1,\dots ,\sigma _n)\) ordered nonincreasingly by \(\sigma _1\ge \cdots \ge \sigma _n\ge 0\) and orthogonal \(P_H\in {\mathbb {R}}^{n\times n}\) (for convenience, it is assumed that \(n\le m\)). Then
When \(\Sigma \) is replaced by the k largest singular values, this gives rise to a good “low rank” preconditioner, see Theorem 1 below. Computing the full matrix \(D^{-\frac{1}{2}}V\) and its singular value decomposition will in general be too costly or even impossible. Instead the general idea is to work with \(D^{-\frac{1}{2}}V\Omega \) for some random or deterministic choice of \(\Omega \in {\mathbb {R}}^{n\times k}\).
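As a sketch of the random variant (NumPy; names ours), an orthonormal basis of the sampled subspace can be obtained from k matrix–vector products with V followed by a QR factorization:

```python
import numpy as np

def sample_subspace(d_inv_sqrt, V, k, rng):
    """Project D^{-1/2} V onto a random k-dimensional subspace:
    draw a Gaussian Omega in R^{n x k}, form Y = D^{-1/2} V Omega
    (k matrix-vector products with V), and orthonormalize by QR.
    d_inv_sqrt holds the diagonal of D^{-1/2}."""
    n = V.shape[1]
    Omega = rng.standard_normal((n, k))
    Y = d_inv_sqrt[:, None] * (V @ Omega)
    Q, _ = np.linalg.qr(Y)
    return Q
```

In the oracle setting the product \(V\Omega \) would be evaluated column by column; V itself is never formed.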
Multiplying by a random \(\Omega \) may be thought of as giving rise to a subspace approximation in the style of Johnson–Lindenstrauss, cf. [1, 9], and this formed the starting point of this investigation. The actual randomized approach and analysis, however, mainly builds on [17] and the bounding techniques presented there. For the deterministic preconditioning variant the recent work [16] provided strong guidance for analyzing the condition number.
Here, \(\Omega \) will mostly consist of orthonormal columns. Yet it is instructive to consider more general cases, as well. An arbitrary \(\Omega \in {\mathbb {R}}^{n\times k}\) gives rise to the preconditioner
Putting \({{\hat{G}}}(\Omega ):=D^{\frac{1}{2}}Q_H \left[ {\begin{matrix} I_{n}+\Sigma P_H^\top \Omega \Omega ^\top P_H\Sigma &{} 0 \\ 0 &{} I_{m-n} \end{matrix}}\right] ^{\frac{1}{2}}\) we have \({{\hat{H}}}(\Omega )={{\hat{G}}}(\Omega ){{\hat{G}}}(\Omega )^\top \). The preconditioner is the better, the closer \({{\hat{G}}}(\Omega )^{-1}H{{\hat{G}}}(\Omega )^{-\top }\) is to the identity. In the analysis of convergence rates, see e.g. [12], this enters via the condition number
In this, the equations follow from \(BB^\top \) and \(B^\top B\) having the same eigenvalues for \(B\in {\mathbb {R}}^{n\times n}\).
Theorem 1
Let \(H=D+VV^\top \in {\mathbb {S}}^m_{++}\) with positive definite \(D\in {\mathbb {S}}^m_{++}\) and \(V\in {\mathbb {R}}^{m\times n}\) with \(n<m\) and singular value decomposition \(D^{-\frac{1}{2}}V=Q_H\Sigma P_H^\top \), \(Q_H^\top Q_H=I_n\), \(P_H^\top P_H=I_n\), \(\Sigma =\mathop {\textrm{Diag}}(\sigma _1\ge \dots \ge \sigma _n)\in {\mathbb {S}}^n_+\). For \(\Omega \in {\mathbb {R}}^{n\times k}\) the preconditioner \({{\hat{H}}}(\Omega )\) of (9) results in condition number
In particular, for \(0\le k<n\) and \(\Omega =(P_H)_{\bullet ,[1,\dots ,k]}\) the condition number’s value is \(1+\sigma _{k+1}^2\).
Proof
For \({{{{\hat{G}}}({\Omega })}}\) as above direct computation yields
The eigenvalues of \((I_{n}+\Sigma \Omega \Omega ^\top \Sigma )^{-\frac{1}{2}}(I_{n}+\Sigma ^2)(I_{n}+\Sigma \Omega \Omega ^\top \Sigma )^{-\frac{1}{2}}\) coincide with those of \((I_{n}+\Sigma ^2)^{\frac{1}{2}}(I_{n}+\Sigma \Omega \Omega ^\top \Sigma )^{-1}(I_{n}+\Sigma ^2)^{\frac{1}{2}}\), because for \(B=(I_{n}+\Sigma \Omega \Omega ^\top \Sigma )^{-\frac{1}{2}}(I_{n}+\Sigma ^2)^{\frac{1}{2}}\) the two matrices are \(BB^\top \) and \(B^\top B\). This gives rise to the first line. The second follows because for positive definite A there holds \(\lambda _{\max }(A)=1/\lambda _{\min }(A^{-1})\) and \(\lambda _{\min }(A)=1/\lambda _{\max }(A^{-1})\). \(\square \)
Consider now a subspace spanned by k orthonormal columns collected in some matrix \(P_\ell \in {\mathbb {R}}^{n\times k}\) which hopefully generates most of the large directions of \(D^{-\frac{1}{2}}V\). In this orthonormal case a simpler bound on the condition number may be obtained by following the argument of Th. 5.1 in [16].
Theorem 2
Let \(H=D+VV^\top \) with positive definite \(D\in {\mathbb {S}}^m_{++}\) and general \(V\in {\mathbb {R}}^{m\times n}\), let \(P=[{\bar{P}},{{\underline{P}}}]\in {\mathbb {R}}^{n\times n}\), \(PP^\top =I_n\). Preconditioner \({{\hat{H}}}({\bar{P}})=D+V{\bar{P}}{\bar{P}}^\top V^\top \) has condition number \(\kappa _{{\bar{P}}}\le 1+\lambda _{\max }({{\hat{H}}}({\bar{P}})^{-\frac{1}{2}}V{{\underline{P}}}\,{{\underline{P}}}^\top V^\top {{\hat{H}}}({\bar{P}})^{-\frac{1}{2}})\). Equality holds if and only if \(\mathop {\textrm{rank}}(V{{\underline{P}}})<m\).
Proof
Because \(H=D+[V{{{\bar{P}}}},V{{\underline{P}}}][V{\bar{P}},V{{\underline{P}}}]^\top ={{{{{\hat{H}}}({\bar{P}})}}}+V{{\underline{P}}}\,{{\underline{P}}}^\top V^\top \) we have
The second summand is positive semidefinite with minimum eigenvalue 0 if and only if \(\mathop {\textrm{rank}}(V{{\underline{P}}})<m\). Thus,
By \(\kappa ({{\hat{H}}}({\bar{P}})^{-\frac{1}{2}}H{{\hat{H}}}({\bar{P}})^{-\frac{1}{2}})=\frac{\lambda _{\max }({{\hat{H}}}({\bar{P}})^{-\frac{1}{2}}H{{\hat{H}}}({\bar{P}})^{-\frac{1}{2}})}{\lambda _{\min }({{\hat{H}}}({\bar{P}})^{-\frac{1}{2}}H{{\hat{H}}}({\bar{P}})^{-\frac{1}{2}})}\) the result is proved. \(\square \)
Building on these two theorems we first analyze randomized approaches that do not make any assumptions on structural properties of \(D^{-\frac{1}{2}}V\) but only require a multiplication oracle. Afterwards we present a deterministic approach that exploits some knowledge of the bundle subproblem and the interior point algorithm. The corresponding routines supply a \({{\hat{V}}}=V\Omega \). The actual preconditioning routine, Algorithm 3 below, does not use \({{\hat{H}}}(\Omega )\) directly, but a truncated preconditioner \({{\hat{H}}}(\Omega {{\hat{P}}})\) that drops all singular values of \(D^{-\frac{1}{2}}V\Omega \) that are less than one. The inverse is then formed via a Woodbury formula, see [30, Sect. 0.7.4]. Note that, depending on the expected number of calls to the routine and the structure preserved in \({{\hat{V}}}\), it may or may not pay off to also precompute \({{\hat{V}}}{{\hat{P}}}\). For diagonal D and dense \({{\hat{V}}}\) the cost of applying this preconditioner is \(O(m+mk+k{{\hat{k}}})\).
Algorithm 3
(Preconditioning by truncated \({{{{{\hat{H}}}({\Omega })}}}=D+V\Omega (V\Omega )^\top \))
Input: \(v\in {\mathbb {R}}^m\), \(D\in {\mathbb {S}}^m_{++}\), precomputed \({\hat{V}}=V\Omega \in {\mathbb {R}}^{m\times k}\) and, for \({\hat{V}}^\top D^{-1}{\hat{V}}=P\mathop {\textrm{Diag}}({\hat{\lambda }}_1\ge \dots \ge {{\hat{\lambda }}}_k)P^\top \), \({\hat{k}}=\max \{0,i:{{\hat{\lambda }}}_i\ge 1\}\), \({{\hat{\Lambda }}}=\mathop {\textrm{Diag}}({{\hat{\lambda }}}_1,\dots ,{{\hat{\lambda }}}_{{\hat{k}}})\), \({\hat{P}}=P_{\bullet ,[1,\dots ,{\hat{k}}]}\).
Output: \({{\hat{H}}}(\Omega {\hat{P}})^{-1}v\).

1. \(v\leftarrow D^{-1}v\).
2. If \({\hat{k}}>0\) set \(v \leftarrow v - D^{-1}{\hat{V}}{\hat{P}}(I+{{\hat{\Lambda }}})^{-1}{\hat{P}}^\top {\hat{V}}^\top v\).
3. Return v.
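The truncation and the Woodbury step of Algorithm 3 can be sketched in a few lines of NumPy. This is a minimal illustration under the stated input conventions, not the ConicBundle implementation; all function names are ours.

```python
import numpy as np

def build_truncated_factors(D_diag, V_hat):
    """Eigen-decompose V_hat^T D^{-1} V_hat (D diagonal, given by its
    diagonal) and keep the eigenpairs with eigenvalue >= 1, i.e. the
    truncation step preceding Algorithm 3."""
    M = V_hat.T @ (V_hat / D_diag[:, None])   # V_hat^T D^{-1} V_hat
    lam, P = np.linalg.eigh(M)                # ascending eigenvalues
    keep = lam >= 1.0
    return P[:, keep], lam[keep]

def apply_preconditioner(v, D_diag, V_hat, P_hat, lam_hat):
    """Return H_hat(Omega P_hat)^{-1} v via the Woodbury formula."""
    v = v / D_diag                            # step 1: v <- D^{-1} v
    if lam_hat.size > 0:                      # step 2: low rank correction
        w = P_hat.T @ (V_hat.T @ v)
        v = v - (V_hat @ (P_hat @ (w / (1.0 + lam_hat)))) / D_diag
    return v
```

Since \({\hat{P}}^\top {\hat{V}}^\top D^{-1}{\hat{V}}{\hat{P}}={{\hat{\Lambda }}}\), the correction term is exactly the Woodbury update for \((D+{\hat{V}}{\hat{P}}({\hat{V}}{\hat{P}})^\top )^{-1}\), whatever part of the spectrum was truncated.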
3.1 Preconditioning by random subspaces
For the random subspace approach fix some \(k\in {\mathbb {N}}\) with \(2\le k<n\). At first consider \(\Omega \) to be an \(n\times k\) random matrix whose elements are independently identically distributed according to the normal distribution \({\mathcal {N}}(0,\frac{1}{k})\). For this \(\Omega \) consider the low rank approximation \(D^{-\frac{1}{2}} V\Omega = Q_{H} \left[ \begin{array}{c} \Sigma \\ 0 \end{array} \right] P_H^\top \Omega \). Because the normal distribution is invariant under orthogonal transformations, we may assume \(P_H=I\) and analyze the setting \(Q_{H} \left[ \begin{array}{c} \Sigma \\ 0 \end{array} \right] \Omega \) giving rise to the low rank approximation by the random matrix
In view of Theorem 1 such a preconditioner is good if \((I+\Sigma ^2)^{\frac{1}{2}}(I+\Sigma \Omega \Omega ^\top \Sigma )^{-1}(I+\Sigma ^2)^{\frac{1}{2}}\) is close to the identity. Based on the Johnson–Lindenstrauss interpretation, it seems likely that large portions of the spectrum will be close to one. This can be justified to some extent by studying the moments of the Ritz values.
Theorem 4
Let \(\Omega \in {\mathbb {R}}^{n\times k}\) have its elements i.i.d. according to the normal distribution \({\mathcal {N}}(0,\frac{1}{k})\). Then for any \(x\in {\mathbb {R}}^{n}\) the quadratic form \(q(x)=x^\top (I+\Sigma ^2)^{-\frac{1}{2}}(I+\Sigma \Omega \Omega ^\top \Sigma )(I+\Sigma ^2)^{-\frac{1}{2}}x\) has expected value \({\mathbb {E}}(q(x))=\Vert x\Vert ^2\) and variance \(\mathop {\textrm{Var}}(q(x))=\frac{2}{k}\big (\sum _{i=1}^{n}\frac{\sigma _i^2}{1+\sigma _i^2}x_i^2\big )^2\).
Proof
Let \(\Omega =(\omega _{ij})\) with i.i.d. elements \(\omega _{ij}\) from \(\mathcal {N}(0,\frac{1}{k})\). Recall that \({\mathbb {E}}(\omega _{ij})=0\), \({\mathbb {E}}(\omega _{ij}^2)=\frac{1}{k}\), \({\mathbb {E}}(\omega _{ij}^3)=0\), \({\mathbb {E}}(\omega _{ij}^4)=3/k^2\) and that for independent random variables X, Y there holds \({\mathbb {E}}(XY)={\mathbb {E}}(X){\mathbb {E}}(Y)\).
The expected value of the quadratic form evaluates to
For determining the variance, the second moment may be computed as follows.
In the cases of \(h\ne h'\) only terms with \(i=i'\) and \(j=j'\) are not zero. These evaluate to \({\mathbb {E}}\omega _{hi}^2 {\mathbb {E}}\omega _{h'j}^2=\frac{1}{k^2}\) giving
For each \(h=h'\) there remain \((i=i'=j=j')\) with value \({\mathbb {E}}\omega _{hi}^4=\frac{3}{k^2}\),
and the three pairings \((i=i',j=j')\), \((i=j,i'=j')\) and \((i=j',i'=j)\) each with value \(\frac{1}{k^2}\),
Summing up these three expressions yields
The result now follows from the usual \(\mathop {\textrm{Var}}X={\mathbb {E}}(X^2)-({\mathbb {E}}X)^2\) for any random variable X. \(\square \)
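As a sanity check, both moments can be estimated by simulation. The sketch below assumes the quadratic form \(q(x)=x^\top (I+\Sigma ^2)^{-\frac{1}{2}}(I+\Sigma \Omega \Omega ^\top \Sigma )(I+\Sigma ^2)^{-\frac{1}{2}}x\) of Theorem 4 and uses that \(q(x)=\Vert y\Vert ^2+\Vert \Omega ^\top \Sigma y\Vert ^2\) for \(y=(I+\Sigma ^2)^{-\frac{1}{2}}x\); all names and the instance are ours.

```python
import numpy as np

# Monte Carlo estimate of the moments in Theorem 4 (illustration only)
rng = np.random.default_rng(7)
sigma = np.array([3.0, 2.0, 1.0, 0.5])
n, k, N = sigma.size, 5, 20000
x = np.full(n, 0.5)

y = x / np.sqrt(1.0 + sigma**2)          # y = (I+Sigma^2)^{-1/2} x
z = sigma * y
# draw N matrices Omega with i.i.d. N(0, 1/k) entries at once
Om = rng.normal(0.0, 1.0 / np.sqrt(k), size=(N, n, k))
# q(x) = ||y||^2 + ||Omega^T Sigma y||^2 per sample
q = (y @ y) + (np.einsum('sik,i->sk', Om, z) ** 2).sum(axis=1)

mean_pred = x @ x                                            # ||x||^2
var_pred = (2.0 / k) * (sigma**2 / (1.0 + sigma**2) @ x**2) ** 2
print(q.mean(), mean_pred, q.var(), var_pred)
```

For this instance the predictions are \({\mathbb {E}}(q(x))=1\) and \(\mathop {\textrm{Var}}(q(x))=0.144\), and the sample moments match up to the expected Monte Carlo error.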
This suggests that even for relatively small k the behavior of the preconditioned system may be expected to be reasonably close to the identity for a large portion of the directions. The result, however, does not seem to open a path towards good bounds on the condition number.
A first possibility is offered by Theorem 2. Recall that for an arbitrary matrix \(A\in {\mathbb {R}}^{m\times n}\) the projector \({{\textbf {P}}}_A=A(A^\top A)^\dagger A^\top \) projects any vector of \({\mathbb {R}}^m\) onto the range space of A, and \({{\textbf {P}}}_A\) depends only on this range space. Computationally it may be determined by a QR decomposition \(A=Q_AR_A\) with orthogonal \(Q_A\in {\mathbb {R}}^{m\times n'}\) for \(n'=\mathop {\textrm{rank}}(A)\) via \({{\textbf {P}}}_A=Q_AQ_A^\top \). The formula allows one to verify \({{\textbf {P}}}_A={{\textbf {P}}}_A^\top \), \({{\textbf {P}}}_A{{\textbf {P}}}_A={{\textbf {P}}}_A\) and \({{\textbf {P}}}_AA=A\) by direct computation. Furthermore, for any \(B\in {\mathbb {R}}^{n\times h}\) there holds \({{\textbf {P}}}_{AB}\preceq {{\textbf {P}}}_A\preceq I_m\) because of the containment relations between the ranges.
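The stated properties of the range projector are easy to verify numerically; a minimal sketch (our names, assuming full column rank for simplicity):

```python
import numpy as np

def range_projector(A):
    """Orthogonal projector P_A onto range(A), via a thin QR factorization.
    Assumes A has full column rank (no column pivoting is performed)."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T
```

On random instances one can confirm symmetry, idempotency, \({{\textbf {P}}}_AA=A\), and the ordering \({{\textbf {P}}}_{AB}\preceq {{\textbf {P}}}_A\) by checking that \({{\textbf {P}}}_A-{{\textbf {P}}}_{AB}\) has no negative eigenvalues.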
In the following \(\Omega \Omega ^\top \) in \({{{{{\hat{H}}}({\Omega })}}}\) will be replaced by the projector \({{\textbf {P}}}_\Omega \). The random low rank approximation to be considered reads
The following deterministic result holds for any matrix \(\Omega \in {\mathbb {R}}^{n\times k}\).
Corollary 5
Let \(H=D+VV^\top \in {\mathbb {S}}^m_{++}\) with positive definite \(D\in {\mathbb {S}}^m_{++}\) and \(V\in {\mathbb {R}}^{m\times n}\). Given \(\Omega \in {\mathbb {R}}^{n\times k}\), let \({{\textbf {P}}}_\Omega =\Omega (\Omega ^\top \Omega )^\dagger \Omega ^\top \). For the preconditioner \({{\hat{H}}}({{\textbf {P}}}_\Omega )\) the condition number satisfies \(\kappa _{{{\textbf {P}}}_\Omega }\le 1+\Vert D^{-\frac{1}{2}}V(I-{{\textbf {P}}}_{\Omega })\Vert ^2\), where \(\Vert \cdot \Vert \) denotes the spectral norm.
Proof
Let \({{\textbf {P}}}_\Omega ={\bar{P}}{\bar{P}}^\top \) with \({\bar{P}}\in {\mathbb {R}}^{n\times k'}\) for some \(k'\le k\) and \({\bar{P}}^\top {\bar{P}}=I_{k'}\). Add orthonormal columns \({{\underline{P}}}\) so that \(P=[{\bar{P}},{{\underline{P}}}]\) satisfies \(PP^\top =I_n\). Note, \(I-{{\textbf {P}}}_\Omega ={{\underline{P}}}\,{{\underline{P}}}^\top \) is the projector onto the orthogonal complement. Use this choice in Theorem 2, then \({{\hat{H}}}({{\textbf {P}}}_\Omega )={{\hat{H}}}({\bar{P}})\succeq D\). Observe that by \(D\in {\mathbb {S}}^m_{++}\) for any \(\lambda \ge 0\), \(A\in {\mathbb {S}}^m\) the relation \(D^{-\frac{1}{2}}AD^{-\frac{1}{2}}\preceq \lambda I_m\) is equivalent to \(A\preceq \lambda D\) and implies \(A\preceq \lambda (D+V{\bar{P}}{\bar{P}}^\top V^\top )=\lambda {{\hat{H}}}({\bar{P}})\), which is equivalent to \({{\hat{H}}}({\bar{P}})^{-\frac{1}{2}}A{{\hat{H}}}({\bar{P}})^{-\frac{1}{2}}\preceq \lambda I_m\). Therefore
\(\square \)
While this bound is rather straightforward to derive, it does not seem strong enough to observe a reduced influence of the largest singular values of \(D^{-\frac{1}{2}}V\). Indeed, in its derivation only the D part of \({{\hat{H}}}({{\textbf {P}}}_\Omega )\) was considered and the influence of \(V\Omega \) is lost.
In order to obtain stronger bounds, the rather involved techniques laid out in [17] seem to be required. The next steps and results follow their arguments closely. This time the \({{\hat{H}}}({{\textbf {P}}}_\Omega )\) part is kept inverted in the analysis of the condition number.
Because \(I+\Sigma {{{\textbf {P}}}_{\Omega }}\Sigma \preceq I+\Sigma ^2\) there holds
By Theorem 1 the condition number is bounded by
and attains this bound whenever \(n<m\). In terms of \(\Omega \), the best possible outcome is an event resulting in \({{\textbf {P}}}_{\Omega }=\left[ {\begin{matrix}I_{k}&{}0 \\ 0 &{}0\end{matrix}}\right] \) (see, e.g., [30, 7.4.52]). It corresponds to the truncated SVD and gives \(\kappa _{\left[ {\begin{matrix} I_{k}&{} {0} \\ 0 &{} {0} \end{matrix}}\right] } = 1+\sigma _{k+1}^2\). Aiming for something more realistic, one hopes for a good coverage of the first k singular values when oversampling with \({{\mathfrak {p}}}\) additional columns. The first step in the analysis is to obtain a deterministic bound for a fixed \(\Omega \in {\mathbb {R}}^{n\times (k+{{\mathfrak {p}}})}\) as outlined in [17, Sect. 9.2].
Theorem 6
Given \(\sigma _1\ge \dots \ge \sigma _{n} \ge 0\) and a matrix \(\Omega \in {\mathbb {R}}^{n\times (k+{{\mathfrak {p}}})}\) with \(k\le n\) so that the first k rows of \(\Omega \) are linearly independent, split \(\Sigma =\left[ \begin{array}{ll}\Sigma _1&{} 0\\ 0&{}\Sigma _2\end{array} \right] \) into blocks \(\Sigma _1=\mathop {\textrm{Diag}}(\sigma _1,\dots ,\sigma _k)\) and \(\Sigma _2=\mathop {\textrm{Diag}}(\sigma _{k+1},\dots ,\sigma _{n})\) and \(\Omega =\left[ \begin{array}{c}\Omega _1\\ \Omega _2\end{array}\right] \) into the first k rows \(\Omega _1\in {\mathbb {R}}^{k\times (k+{{\mathfrak {p}}})}\) and the last \(n-k\) rows \(\Omega _2\in {\mathbb {R}}^{(n-k)\times (k+{{\mathfrak {p}}})}\). Then
Proof
By assumption \(\Omega _1\) has full row rank and the range space of the matrix
is contained in the range space of \(\Omega \). Hence \({{{\textbf {P}}}_Z}\preceq {{{\textbf {P}}}_{\Omega }}\) and
The projector \({{{\textbf {P}}}_Z}\) computes to
Use this in the Woodbury formula for inverses of rank adjustments [30, Sect. 0.7.4] for \((I+\Sigma {{\textbf {P}}}_Z\Sigma )^{-1}\) to obtain
The last line follows, because \(\Vert (I+\Sigma _2^2)^{\frac{1}{2}}F\Vert ^2=\lambda _{\max }(F^\top (I+\Sigma _2^2)F)=:{{\bar{\lambda }}}\) and therefore \(F^\top (I+\Sigma _2^2)F\preceq {{\bar{\lambda }}} I\) giving
so the relation is implied by semidefinite scaling. Put \(\Lambda =(I+\Sigma _1^2+{{\bar{\lambda }}} I)\) and note that the second diagonal block of (11) asserts \(\Sigma _2F\Lambda ^{-1}F^\top \Sigma _2\preceq I\), then
Employing [17, Prop. 8.3] now results in
The last term evaluates to \(\lambda _{\max }(I+\Sigma _2^2)=1+\sigma _{k+1}^2\). For the second last term substituting in the definitions of \(\Lambda \) and \({{\bar{\lambda }}}\) yields
\(\square \)
The current bound falls somewhat short of expectations because of the identity in \(\Vert (I+\Sigma _2^2)^{\frac{1}{2}}\Omega _2\Omega _1^\dagger \Vert ^2\). By Theorem 1 and \(I_n\preceq I_n+\Sigma {{\textbf {P}}}_{\Omega }\Sigma \preceq I_n+\Sigma ^2\), the use of projectors will never result in condition numbers larger than \(1+\sigma _1^2\), so the influence of the dimension seems to be too dominant in this bound. Maybe a better bound is achievable by a more sophisticated argument.
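Both the best case value \(1+\sigma _{k+1}^2\) of Theorem 1 for the truncated SVD subspace and the worst case bound \(1+\sigma _1^2\) for arbitrary projectors can be checked on a small dense instance. The sketch below (our names, \(D=I\) so that \(D^{-\frac{1}{2}}V=V\)) computes the condition number of the projector preconditioned system directly.

```python
import numpy as np

def kappa(D_diag, V, Omega):
    """Condition number of H_hat(P_Omega)^{-1/2} H H_hat(P_Omega)^{-1/2}
    for H = D + V V^T, with P_Omega the projector onto range(Omega)."""
    Q, _ = np.linalg.qr(Omega)
    H = np.diag(D_diag) + V @ V.T
    Hhat = np.diag(D_diag) + (V @ Q) @ (V @ Q).T
    L = np.linalg.cholesky(Hhat)
    M = np.linalg.solve(L, np.linalg.solve(L, H).T).T   # L^{-1} H L^{-T}
    ev = np.linalg.eigvalsh((M + M.T) / 2.0)            # symmetrize numerically
    return ev[-1] / ev[0]
```

Choosing \(\Omega \) as the leading k right singular vectors reproduces \(1+\sigma _{k+1}^2\) exactly, while random choices stay below \(1+\sigma _1^2\).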
The deterministic bound also allows one to make use of the probabilistic bounds on \(\Vert (I+\Sigma _2^2)^{\frac{1}{2}}\Omega _2 \Omega _1^\dagger \Vert \) for standard Gaussian \(n\times (k+{\mathfrak {p}})\) matrices \(\Omega \) (i.e., matrix elements are independently \({\mathcal {N}}(0,1)\) distributed) given in [17]. These shed some light on the advantage of employing oversampling by \({{\mathfrak {p}}}\) additional random vectors in \(\Omega \). In our application, oversampling corresponds to computing the singular values of \(D^{-\frac{1}{2}}V{{\textbf {P}}}_\Omega \) for \(k+{{\mathfrak {p}}}\) columns in order to get better control on the k largest singular values of \(D^{-\frac{1}{2}}V\) by the preconditioner \({{\hat{H}}}({{\textbf {P}}}_\Omega )\).
Theorem 7
In the setting of Theorem 6 let \(\Omega \) be drawn as a standard Gaussian \(n\times (k+{\mathfrak {p}})\) matrix. Then
Furthermore, if \({{\mathfrak {p}}}\ge 4\) then for all \(u,t\ge 1\) the probability for
is at most \(2t^{-{{\mathfrak {p}}}}+e^{-u^2/2}\).
The same bounds hold for the condition number \(\kappa _{{{{\textbf {P}}}_{\Omega }}}\).
Proof
A central and complex step in [17, proof of Th. 10.2] is to establish the relation
which directly yields the bound on the expected value via Theorem 6.
Likewise, in [17, proof of Th. 10.8] the authors derive for \({{\mathfrak {p}}}\ge 4\) and \(u,t\ge 1\)
\(\square \)
Again, the presence of the identity in the deterministic bound of Theorem 6 has a major impact also in these probabilistic bounds. Indeed, one would hope that a better deterministic bound helps to prove stronger decay.
Without some a priori knowledge of the singular values of \(D^{-\frac{1}{2}}V\in {\mathbb {R}}^{m\times n}\) it is difficult to determine a suitable number of columns for \(\Omega \), i.e., a suitable dimension of the random subspace. For huge m the Johnson–Lindenstrauss result as presented in [9] suggests that for k at most \(4(\varepsilon ^2/2-\varepsilon ^3/3)^{-1}\ln m\) a suitably chosen random \(\Omega \in {\mathbb {R}}^{n\times k}\) results in a distortion of \(1\pm \varepsilon \) with sufficiently high probability. Roughly, this means that each matrix element of \(D^{-\frac{1}{2}}VV^\top D^{-\frac{1}{2}}\) and \(D^{-\frac{1}{2}}V\Omega \Omega ^\top V^\top D^{-\frac{1}{2}}\) differs by at most this factor. When considering the sizes of m aimed for here (the dimension of the design space will be a few thousand to a few hundred thousand), this number is still too large for efficient computations even for a moderate \(\varepsilon =0.1\). Indeed, the burden of forming the preconditioner and of applying it would exceed the gain by far. The authors of [17] propose an algorithmic variant for identifying a significant drop in singular values, but this requires successive matrix–vector multiplications and these are quite costly in practice. In preliminary experiments with a number of tentative randomized variants, those relating the number of columns to the number of matrix–vector multiplications of the previous solve seemed reasonable. It will turn out, however, that even the cost of this is too high and the gain too small in comparison to the deterministic approach of the next section. The latter appears to capture the important directions quite well and offers better possibilities to exploit structural properties of the data. Due to the rather clear superiority of the deterministic approach, the numerical experiments of Sect. 4 will only present results for the one particular randomized variant that performed best among the tentative versions.
It attempts to identify the most relevant subspace by storing and extending the important directions generated in the previous round. For completeness and reproducibility, its details are given in Algorithm 8, but in view of the rather discouraging results we refrain from further discussion.
Algorithm 8
(Randomized subspace selection forming \({{\hat{V}}}=V {{\hat{P}}}\))
Input: \(V\in {\mathbb {R}}^{m\times n}\), \(D\in {\mathbb {S}}^m_{++}\), previous relevant subspace \(P_{old}\in {\mathbb {R}}^{n\times {\underline{k}}}\) (initially \({\underline{k}}=0\)), previous number of multiplications \(n_{mult}\), previous \({{\hat{k}}}\) of Algorithm 3.
Output: \({{\hat{V}}}\) (and stores \(P_{old}\))

1. If \({\underline{k}}=0\) then
(a) set \(k=\min \{n,\,3+2{{\hat{k}}},\,\lceil \sqrt{n_{mult}\frac{n_{mult}+n}{4}}-\frac{n_{mult}}{2}\rceil \}\),
(b) generate a standard Gaussian \(\Omega \in {\mathbb {R}}^{n\times k}\) and set \({{\hat{P}}}\leftarrow \Omega \);
else
(a) set \(k_+=\max \{3,\lfloor \frac{\sqrt{n_{mult}}}{2}\rfloor -{\underline{k}}\}\),
(b) generate a standard Gaussian \(\Omega \in {\mathbb {R}}^{n\times k_+}\) and set \({{\hat{P}}}\leftarrow [P_{old},\Omega ]\).
2. Orthonormalize \({{\hat{P}}}\), reset k to its number of columns, set \({{\hat{V}}}=V{{\hat{P}}}\).
3. Compute the eigenvalue decomposition \({{\hat{V}}}^\top D^{-1}{{\hat{V}}}=P\mathop {\textrm{Diag}}({\hat{\lambda }}_1\ge \dots \ge {{\hat{\lambda }}}_k)P^\top \).
4. Compute the threshold \({{\bar{\lambda }}}=\max \{10,e^{\frac{1}{10}\ln {{\hat{\lambda }}}_1+\frac{9}{10}\ln {{\hat{\lambda }}}_k}\}\) (enforce \({{\hat{\lambda }}}_k>0\)).
5. Set \({\underline{k}}\leftarrow \min \big \{k,\max \{3,i>3:{{\hat{\lambda }}}_i>{{\bar{\lambda }}}\}\big \}\) and set \(P_{old}\leftarrow {{\hat{P}}}P_{\bullet ,[1,\dots ,{\underline{k}}]}\).
6. Return \({{\hat{V}}}\).
3.2 A deterministic subspace selection approach
In the conic bundle method, \(H=D+VV^\top \) of Theorem 2 is of the form described in (8). An inspection of the column blocks of this V suggests to concretize the bound of Theorem 2 for interior point related applications in Theorem 9 below. In this, \(B^\top X^{\frac{1}{2}}\) may be thought of as an adapted factorization variant of the block \(B^\top {\mathfrak {X}}_t^{\frac{1}{2}}(I-\tfrac{{\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t}({\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t})^{\frac{1}{2}}\) in (8) with \(X={\mathfrak {X}}_t^{\frac{1}{2}}(I-\tfrac{{\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t}({\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t}){\mathfrak {X}}_t^{\frac{1}{2}}\). Alternatively, for the full V, consider X as consisting of the three diagonal blocks I, \(D_w\) and \({\mathfrak {X}}_t^{\frac{1}{2}}(I-\tfrac{{\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t}({\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t}){\mathfrak {X}}_t^{\frac{1}{2}}\) with suitably adapted B, and note that in the resulting bound each diagonal block of X is added separately.
Theorem 9
Given \(D\in {\mathbb {S}}^{m}_{++}\) and \(B\in {\mathbb {R}}^{n\times m}\), let \(X\in {\mathbb {S}}^n_+\) have eigenvalue decomposition \(X=[{\bar{P}},{{\underline{P}}}] \left[ {\begin{matrix} {{{\bar{\Lambda }}}} &{} 0\\ 0 &{} {{\underline{\Lambda }}} \end{matrix}}\right] [{{{\bar{P}}}},{{\underline{P}}}]^\top \) with \([{{{\bar{P}}}},{{\underline{P}}}]^\top [{{{\bar{P}}}},{{\underline{P}}}]=I_n\) and diagonal \({{{\bar{\Lambda }}}}\in {\mathbb {S}}^k_+\), \({{\underline{\Lambda }}}\in {\mathbb {S}}^{nk}_+\). Put \(V=B^\top X^{\frac{1}{2}}\). For \(H=D+VV^\top \) and preconditioner \({{{{{\hat{H}}}({{{\bar{P}}}})}}}=D+V{{{\bar{P}}}} {{{\bar{P}}}}^\top V^\top \) the condition number is bounded by
where \({{\bar{\rho }}}=\max _{i=1,\dots ,n-k}({{\underline{\Lambda }}})_{ii}\) and \({{\bar{\beta }}}=\max _{i=1,\dots ,n}\Vert B_{i,\bullet }\Vert _{D^{-1}}\).
Proof
We show \(\lambda _{\max }({{\hat{H}}}({\bar{P}})^{-\frac{1}{2}}V{{\underline{P}}}\,{{\underline{P}}}^\top V^\top {{\hat{H}}}({\bar{P}})^{-\frac{1}{2}})\le \sum _{i=1}^{n-k}({{\underline{\Lambda }}})_{ii}\Vert B^\top ({{\underline{P}}})_{\bullet ,i}\Vert ^2_{D^{-1}}\), then the statement follows by Theorem 2. Note that \(V{{\underline{P}}}\,{{\underline{P}}}^\top V^\top =B^\top {{\underline{P}}}\,{{\underline{\Lambda }}}\,{{\underline{P}}}^\top B\). Furthermore, if \(\lambda \ge 0\) satisfies \(B^\top B\preceq \lambda D\), it also satisfies \(B^\top B\preceq \lambda (D+V{\bar{P}}{\bar{P}}^\top V^\top )\), therefore
\(\square \)
Note, the proof weakens \({{\hat{H}}}({\bar{P}})\) to D, so the bound cannot be expected to be strong. Yet, it provides a good rule of thumb on which columns of \(D^{-\frac{1}{2}}B^\top X^{\frac{1}{2}}\) should be included, namely those with a large value \(\Lambda _{ii}\Vert B^\top ({{\underline{P}}})_{\bullet ,i}\Vert ^2_{D^{-1}}\).
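The resulting bound can be tested directly: keep an arbitrary subset of eigenpairs of X for the preconditioner and compare the achieved condition number with one plus the sum of the scores \(\lambda _i\Vert B^\top p_i\Vert ^2_{D^{-1}}\) over the dropped eigenpairs. A small sketch in the spirit of Theorem 9 (our names, dense instance):

```python
import numpy as np

def theorem9_bound_holds(D_diag, B, X, keep):
    """For H = D + V V^T with V = B^T X^{1/2}, precondition with the
    eigenvector columns of X listed in `keep` and check that kappa is at
    most 1 + sum of lam_i * ||B^T p_i||^2_{D^{-1}} over dropped pairs."""
    lam, P = np.linalg.eigh(X)
    drop = np.setdiff1d(np.arange(lam.size), keep)
    Xs = (P * np.sqrt(np.clip(lam, 0.0, None))) @ P.T   # X^{1/2}
    V = B.T @ Xs
    H = np.diag(D_diag) + V @ V.T
    Pb = P[:, keep]
    Hhat = np.diag(D_diag) + (V @ Pb) @ (V @ Pb).T
    L = np.linalg.cholesky(Hhat)
    M = np.linalg.solve(L, np.linalg.solve(L, H).T).T   # L^{-1} H L^{-T}
    ev = np.linalg.eigvalsh((M + M.T) / 2.0)
    cols = B.T @ P[:, drop]                              # columns B^T p_i
    bound = 1.0 + (lam[drop] * (cols * (cols / D_diag[:, None])).sum(0)).sum()
    return ev[-1] / ev[0] <= bound + 1e-9
```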
In interior point methods the spectral decomposition and the size of the eigenvalues of X of Theorem 9 strongly depend on the current iteration, in particular on the value of the barrier parameter \(\mu \). Therefore it is worthwhile to set up a new preconditioner for each new KKT system. In order to do so in a computationally efficient way, the following dynamic selection heuristic for \({\bar{P}}\) with respect to V of (8) tries to either pick columns of V directly by including unit vectors in \({\bar{P}}\), or to at least take linear combinations of few columns of V, in order to reduce the cost of matrix–vector multiplications and to preserve potential structural properties. So instead of forming \({\bar{P}}\), the heuristic builds \({{\hat{V}}}=V{\bar{P}}\) directly by appending (linear combinations of selected) columns of V to \({{\hat{V}}}\). Also, it will often only employ approximations of the eigenvalues \(\lambda _i\) together with approximations \(p_i\) of the eigenvectors of the X described in Theorem 9. Generally, it will include those in \({{\hat{V}}}\) for which an estimate of \(\lambda _i\Vert B^\top p_i\Vert _{D^{-1}}^2\) exceeds a given bound \({{\underline{\rho }}}\). In order to reduce the number of matrix–vector multiplications, \(\Vert B^\top p_i\Vert ^2_{D^{-1}}\) will only be computed for those \(p_i\) with \(\big (\sum _{j=1}^n(p_i)_j^2\Vert (B^\top )_{\bullet ,j}\Vert _{D^{-1}}\big )^2\ge {{\underline{\rho }}}\), where the column norms of \(B^\top \) are precomputed for each KKT system. The implementation uses \({\underline{\rho }}=10\).
Next the selections are explained by going through V of (8) step by step for each of its three column groups \(V_{{{\mathfrak {H}}}}\), \(A^\top D_w^{\frac{1}{2}}\) and \(B^\top {\mathfrak {X}}_t^{\frac{1}{2}}(I-\tfrac{{\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t}({\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t})^{\frac{1}{2}}\). Concerning the third group, it will become clear in the discussion of the semidefinite part that in practice it is advantageous to replace the square root \({\mathfrak {X}}_t^{\frac{1}{2}}\) in the factorization of \({\mathfrak {X}}_t\) by a more general, possibly nonsymmetric factorization \({\mathfrak {X}}_t={\mathfrak {F}}_t{\mathfrak {F}}_t^\top \). The matrix \({\mathfrak {F}}_t\) will have the same block structure and leads to a similar rank one correction by the transformed trace vector \({\mathfrak {F}}_t^\top \mathbb {1}_t\),
Algorithm 10
(Deterministic column selection heuristic forming \({{\hat{V}}}\))
Input: \(D_{{\mathfrak {H}}}\), \(V_{{\mathfrak {H}}}\), \(D_y\), A, \(D_w\), B, \({\mathfrak {X}}_t\), \(\zeta \), \(\sigma \) specifying D and \(V\in {\mathbb {R}}^{m\times n}\) of (8)
Output: \({{\hat{V}}}\in {\mathbb {R}}^{m\times n'}\) for some \(n'\le n\) with \({{\hat{V}}}= V{{{\bar{P}}}}\), \({{{\bar{P}}}}^\top {{\bar{P}}}=I_{n'}\).

1.
Initialize \({{\hat{V}}}\leftarrow 0\in {\mathbb {R}}^{m \times 0}\), \({\underline{\rho }}:={10}\).

2.
Find \({\mathcal {J}}_{V_{{{\mathfrak {H}}}}}=\{j:\Vert (V_{{{\mathfrak {H}}}})_{\bullet ,j}\Vert _{D^{-1}}^2\ge {\underline{\rho }}\}\) and set \({{\hat{V}}}\leftarrow [{{\hat{V}}},(V_{{{\mathfrak {H}}}})_{\bullet ,{\mathcal {J}}_{V_{{{\mathfrak {H}}}}}}]\).

3.
Find \({\mathcal {J}}_A=\{j:(D_w)_{jj}\Vert (A^\top )_{\bullet ,j}\Vert _{D^{-1}}^2\ge {\underline{\rho }}\}\) and set \({{\hat{V}}}\leftarrow [{{\hat{V}}},(A^\top D_w)_{\bullet ,{\mathcal {J}}_A}]\).

4.
Compute \({\mathfrak {F}}_t^{\top }\mathbb {1}_t\), \(\eta =\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {F}}_t{\mathfrak {F}}_t^\top \mathbb {1}_t\), \(B^\top {\mathfrak {F}}_t{\mathfrak {F}}_t^{\top }\mathbb {1}_t\) and for each conic diagonal block of \({\mathfrak {X}}_t\) call append_“cone”_columns\(({{\hat{V}}})\) with corresponding parameters.

5.
Return \({{\hat{V}}}\).
The first group of columns \(V_{{\mathfrak {H}}}\in {\mathbb {R}}^{m\times h_{{\mathfrak {H}}}}\) matches, in the notation of Theorem 9, (a subblock of) \(B^\top =V_{{\mathfrak {H}}}\) and (a diagonal block) \(X=I_{h_{{\mathfrak {H}}}}\). The heuristic appends those columns j to \({{\hat{V}}}\) that satisfy \(\Vert (V_{{\mathfrak {H}}})_{\bullet ,j}\Vert ^2_{D^{-1}}\ge {{\underline{\rho }}}\).
For the second group of columns \(A^\top D_w^{\frac{1}{2}}\), Theorem 9 applies with \(B^\top =A^\top \) and \(X=D_w=\mathop {\textrm{Diag}}(d_1,\dots ,d_{h_A})\). Thus, column j is appended to \({{\hat{V}}}\) if \(d_j\Vert (A^\top )_{\bullet ,j}\Vert _{D^{-1}}^2\ge {{\underline{\rho }}}\).
With the comment above regarding \({\mathfrak {F}}_t\), the third column group is formed by a term \(B^\top {\mathfrak {F}}_t(I-\tfrac{{\mathfrak {F}}_t^{\top }\mathbb {1}_{t}({\mathfrak {F}}_t^{\top }\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t})^{\frac{1}{2}}\) for each cutting model (we assume just one here). With respect to Theorem 9, B is just right and X is the positive (semi)definite matrix \({\mathfrak {X}}_t-\tfrac{{\mathfrak {X}}_t\mathbb {1}_{t}({\mathfrak {X}}_t\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t}={\mathfrak {F}}_t(I-\tfrac{{\mathfrak {F}}_t^{\top }\mathbb {1}_{t}({\mathfrak {F}}_t^{\top }\mathbb {1}_{t})^\top }{\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {F}}_t{\mathfrak {F}}_t^\top \mathbb {1}_t}){\mathfrak {F}}_t^\top \). Recall that \({\mathfrak {X}}_t\) is a block diagonal matrix with the structure of the diagonal blocks governed by the linearization of the perturbed complementarity conditions of the various cones. The overarching rank one modification by \({\mathfrak {F}}_t^{\top }\mathbb {1}_{t}\) couples the blocks within the same cutting model and reappears in some form in each block together with \(\eta =\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t\). Observe that with \(\Vert {\mathfrak {F}}_t^{\top }\mathbb {1}_{t}\Vert ^2=\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t\)
For each column p of \({\bar{P}}\), computing \(B^\top {\mathfrak {F}}_t(I-\tfrac{{\mathfrak {F}}_t^{\top }\mathbb {1}_{t}({\mathfrak {F}}_t^{\top }\mathbb {1}_{t})^\top }{\eta })^{\frac{1}{2}}p\) splits into
Thus, by keeping the support of p restricted to single blocks, the proper column computations can be kept restricted to the respective block. This also holds for the coefficient \(\left\langle {{\mathfrak {F}}_t^{\top }\mathbb {1}_{t}},{p}\right\rangle \). The overarching vector \(B^\top {\mathfrak {X}}_t\mathbb {1}_{t}\) needs to be evaluated only once and can be added to the columns afterwards. The latter step only requires the respective coefficients but not the vectors of \({\bar{P}}\). This allows one to speed up the process of forming \({{\hat{V}}}\) considerably. Therefore, when forming the conceptual \({\bar{P}}\) in the heuristic, the influence of \({\mathfrak {F}}_t^{\top }\mathbb {1}_{t}\) on eigenvalues and eigenvectors of the blocks will mostly be considered as restricted to each single block. Next the actual selection procedure is described for \({\mathfrak {X}}_t\) blocks corresponding to the cones \({\mathbb {R}}^h_+\) (Algorithm 11) and \({\mathbb {S}}^h_+\) (Algorithm 13) with Nesterov–Todd scaling [33, 38].
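The factorized form of the rank one corrected X underlying the third column group is a simple algebraic identity and can be checked numerically. A minimal sketch (instance and names are ours; any factor F with \(X=FF^\top \) works, e.g. a Cholesky factor):

```python
import numpy as np

# Check: X - (X 1)(X 1)^T / eta  ==  F (I - (F^T 1)(F^T 1)^T / eta) F^T
# for any factorization X = F F^T and any scalar eta != 0.
rng = np.random.default_rng(2)
t = 5
F = rng.standard_normal((t, t))          # plays the role of F_t
X = F @ F.T                              # plays the role of X_t
one = np.ones(t)                         # plays the role of the trace vector 1_t
eta = 1.0 + one @ X @ one                # some positive eta (zeta^{-1}sigma + 1^T X 1 style)

lhs = X - np.outer(X @ one, X @ one) / eta
w = F.T @ one                            # transformed trace vector F^T 1
rhs = F @ (np.eye(t) - np.outer(w, w) / eta) @ F.T
print(np.abs(lhs - rhs).max())
```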
Algorithm 11
(append_\({\mathbb {R}}^h_+\)_columns\(({{\hat{V}}})\))
Input: column indices \(J\in {\mathbb {N}}^h\) and \(x\circ z^{-1}\) of this block in \({\mathfrak {X}}_t\), \(B^\top _{\bullet ,J}\), \(\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t\), \(B^\top {\mathfrak {X}}_t\mathbb {1}_t\), \(\eta =\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t\), D, threshold \({\underline{\rho }}\).
Output: updated \({{\hat{V}}}\).

1.
For each \(i=1,\dots ,h\) with \((\frac{x_i}{z_i}-\frac{1}{\eta }\frac{x_i^2}{z_i^2})\Vert (B^\top )_{\bullet ,J(i)}\Vert _{D^{-1}}^2\ge {\underline{\rho }}\) set
$$\begin{aligned} \alpha&\leftarrow \sqrt{\tfrac{x_i}{z_i}}\,\tfrac{1}{\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t} \left( 1-\tfrac{\sqrt{\zeta ^{-1}\sigma }}{\sqrt{\eta }}\right) ,\\ {{\hat{b}}}_i&=\sqrt{\tfrac{x_i}{z_i}}(B^\top )_{\bullet ,J(i)}-\alpha \,B^\top {\mathfrak {X}}_t\mathbb {1}_t, \end{aligned}$$
and if \(\Vert {{\hat{b}}}_i\Vert _{D^{-1}}^2>{{\underline{\rho }}}\) set \({{\hat{V}}}\leftarrow [{{\hat{V}}},{{\hat{b}}}_i]\).
For Algorithm 11 consider, within the cone specified by t, a block with indices \(J\in {\mathbb {N}}^h\) representing a nonnegative cone \({\mathbb {R}}^h_+\) with primal–dual pair (x, z). The corresponding “diagonal block” in \({\mathfrak {X}}_t\) is of the form \(\mathop {\textrm{Diag}}(x\circ z^{-1})\) and for \({\mathfrak {F}}_t\) it is \(\mathop {\textrm{Diag}}(x\circ z^{-1})^{\frac{1}{2}}\). The relevant part of the trace vector \({\mathfrak {X}}_t\mathbb {1}_t\) reads \(\mathop {\textrm{Diag}}(x\circ z^{-1})\mathbb {1}=x\circ z^{-1}\). Considering the influence of the trace vector as restricted to this block alone gives \(\mathop {\textrm{Diag}}(x\circ z^{-1})-\frac{1}{\eta }(x\circ z^{-1})(x\circ z^{-1})^\top \) with the correct overarching \(\eta =\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t\). The eigenvectors to large eigenvalues of this matrix have their most important coordinates associated with the largest diagonal entries. The heuristic appends the columns \(B^\top {\mathfrak {X}}_t^{\frac{1}{2}}(I-\tfrac{{\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t}({\mathfrak {X}}_t^{\frac{1}{2}}\mathbb {1}_{t})^\top }{\eta })^{\frac{1}{2}}e_{J(i)}\) to \({{\hat{V}}}\) for those \(e_{J(i)}\) with \((\frac{x_i}{z_i}-\frac{1}{\eta }\frac{x_i^2}{z_i^2})\Vert (B^\top )_{\bullet ,J(i)}\Vert _{D^{-1}}^2\ge {\underline{\rho }}\).
Note that in interior point methods \(x_iz_i\approx \mu \) for barrier parameter \(\mu \searrow 0\) and \(x_i\rightarrow x_i^{opt}\), \(z_i\rightarrow z_i^{opt}\). Due to \(\eta \ge \frac{x_i}{z_i}\) with \(\eta \) mostly much larger, the estimated value roughly behaves like \(\frac{x_i^2}{\mu }\Vert (B^\top )_{\bullet ,J(i)}\Vert _{D^{-1}}^2\) and, indeed, experience suggests that columns are almost exclusively included for active \(x_i^{opt}>0\) and only once \(\mu \) gets small enough. When computing high precision solutions with small \(\mu \), the rank of the preconditioner can thus be expected to match the number of active subgradients in the cutting model. Theorem 9 suggests that in iterative methods these columns have to be included in some form in order to obtain reliable convergence behavior.
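As an illustration, the selection test of step 1 above can be sketched in a few lines; the data below are random stand-ins (not from any ConicBundle run), chosen so that the complementarity \(x_iz_i\approx \mu \) just discussed holds, and the function name is hypothetical:

```python
import numpy as np

def select_linear_columns(x, z, eta, col_norms_sq, rho):
    """Return indices i whose estimated contribution
    (x_i/z_i - x_i^2/(eta z_i^2)) * ||(B^T)_{.,J(i)}||_{D^{-1}}^2
    reaches the threshold rho (the test in step 1 of Algorithm 11)."""
    xz = x / z
    estimate = (xz - xz**2 / eta) * col_norms_sq
    return np.flatnonzero(estimate >= rho)

# Toy data: two "active" coordinates with large x_i and x_i z_i ~ mu.
mu = 1e-6
x = np.array([1.0, 0.5, mu, mu])        # first two stay active at the optimum
z = np.array([mu, 2 * mu, 1.0, 1.0])    # complementarity x_i z_i ~ mu
eta = 10.0 + (x / z).sum()              # eta >= x_i/z_i by construction
norms = np.ones(4)
idx = select_linear_columns(x, z, eta, norms, rho=1.0)
print(idx)                              # only the two active coordinates pass
```

With the active coordinates having \(x_i\) bounded away from zero, the estimate grows like \(x_i^2/\mu \), so exactly these coordinates pass the threshold once \(\mu \) is small.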
Algorithm 13 below deals with a positive semidefinite cone \({\mathbb {S}}^h_+\) with Nesterov–Todd scaling. For the current purposes it suffices to know that the diagonal block of \({\mathfrak {X}}_t\) indexed by appropriate \(J\in {\mathbb {N}}^{h+1\atopwithdelims ()2}\) is of the form \(W\otimes _{s}W\) for a positive definite \(W\in {\mathbb {S}}^h_{++}\); see [38] for its efficient computation and for an “Appendix” of convenient rules for computing with symmetric Kronecker products. The next result derives the eigenvectors and eigenvalues when considering the rank one correction restricted to this block.
Lemma 12
Let \(W=P_W\Lambda _W P_W^\top \) with \(\Lambda _W=\mathop {\textrm{Diag}}(\lambda ^W_1\ge \dots \ge \lambda ^W_h>0)\) and \(P_W^\top P_W=I_h\), \(P_W=[w_1,\dots ,w_h]\). Furthermore let \(U=\Lambda _W^2-\frac{1}{\eta }(\Lambda _W^2\mathbb {1})(\Lambda _W^2\mathbb {1})^\top \) have eigenvalue decomposition \(U=P_U\Lambda _UP_U^\top \) with \(P_U^\top P_U=I_h\). The eigenvalues of \(W\otimes _{s}W-\frac{1}{\eta }\big ((W\otimes _{s}W)\mathop {\textrm{svec}}(I_h)\big )\big ((W\otimes _{s}W)\mathop {\textrm{svec}}(I_h)\big )^\top \) are \(\lambda ^U_i=(\Lambda _U)_{ii}\) with eigenvectors \(\sum _{j=1}^h (P_U)_{ji}\mathop {\textrm{svec}}(w_jw_j^\top )\) for \(i=1,\dots ,h\) and \(\lambda _i^W\lambda ^W_j\) with eigenvectors \(\frac{1}{\sqrt{2}}\mathop {\textrm{svec}}(w_iw_j^\top +w_jw_i^\top )\) for \(1\le i< j\le h\).
Proof
By [2] the eigenvalues of \((W\otimes _{s}W)\) are \(\lambda ^W_i\lambda ^W_j\) for \(1\le i\le j\le h\) with orthonormal eigenvectors \(w_{ii}=\mathop {\textrm{svec}}(w_iw_i^\top )\) for \(i=1,\dots ,h\) and \(w_{ij}=\tfrac{1}{\sqrt{2}}\mathop {\textrm{svec}}(w_iw_j^\top +w_jw_i^\top )\) for \(1\le i<j\le h\).
To see this e.g. for \(i<j\) observe \(w_{ij}^\top w_{ij}=\frac{1}{2}[2\left\langle {w_iw_j^\top },{w_iw_j^\top }\right\rangle +2\left\langle {w_iw_j^\top },{w_jw_i^\top }\right\rangle ]\) and \((W\otimes _{s}W)w_{ij}=\frac{1}{\sqrt{2}}\mathop {\textrm{svec}}(W w_iw_j^\top W+ W w_jw_i^\top W)= \lambda ^W_i\lambda ^W_j w_{ij}\).
From \((W\otimes _{s}W)\mathop {\textrm{svec}}(I_h)=\mathop {\textrm{svec}}(W^2)=\sum _{i=1}^h(\lambda ^W_i)^2w_{ii}\) one obtains
The eigenvector ordering \(P_{\otimes _{s}}=[w_{11},w_{22},\dots ,w_{hh},w_{12},w_{13},\dots ,w_{h-1,h}]\) gives
The result now follows by direct computation. \(\square \)
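Lemma 12 is easy to check numerically; the sketch below builds the symmetric Kronecker product explicitly from its defining property \((W\otimes _{s}W)\mathop {\textrm{svec}}(M)=\mathop {\textrm{svec}}(WMW)\) (all data are random stand-ins, for verification only):

```python
import numpy as np

def svec(M):
    """Stack the lower triangle of symmetric M, off-diagonal entries scaled
    by sqrt(2), so that <svec(A), svec(B)> = <A, B>."""
    i, j = np.tril_indices(M.shape[0])
    v = M[i, j].astype(float)
    v[i != j] *= np.sqrt(2.0)
    return v

def sym_kron(W):
    """Matrix of W (x)_s W, characterized by (W (x)_s W) svec(M) = svec(W M W)."""
    h = W.shape[0]
    i, j = np.tril_indices(h)
    cols = []
    for a, b in zip(i, j):
        E = np.zeros((h, h))
        E[a, b] = E[b, a] = 1.0 if a == b else 1.0 / np.sqrt(2.0)
        cols.append(svec(W @ E @ W))   # image of the (a,b) svec basis vector
    return np.column_stack(cols)

rng = np.random.default_rng(0)
h, eta = 4, 7.0
G = rng.standard_normal((h, h))
W = G @ G.T + h * np.eye(h)            # positive definite scaling matrix
lamW = np.linalg.eigvalsh(W)           # lambda^W (any ordering works below)

K = sym_kron(W)
s = K @ svec(np.eye(h))                # (W (x)_s W) svec(I)
M = K - np.outer(s, s) / eta           # the rank-one corrected block

# Spectrum predicted by Lemma 12: eig(U) plus the mixed products.
U = np.diag(lamW**2) - np.outer(lamW**2, lamW**2) / eta
pred = list(np.linalg.eigvalsh(U))
pred += [lamW[a] * lamW[b] for a in range(h) for b in range(a + 1, h)]
actual, predicted = np.sort(np.linalg.eigvalsh(M)), np.sort(pred)
```

The check works because the correction vector lies in the span of the \(w_{ii}\), so the mixed eigenvalues \(\lambda _i^W\lambda _j^W\) are untouched and the ii-block reduces exactly to U.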
For semidefinite blocks, numerical experience indicates that it is indeed worth determining the eigenvalue decomposition of U as in Lemma 12. Finding the eigenvalues and eigenvectors requires roughly the same amount of work as forming W and is of no concern. With \(J\in {\mathbb {N}}^{h+1\atopwithdelims ()2}\) denoting the column indices of this block within \(B^\top \), the columns corresponding to these eigenvectors are computed by \((\Lambda _U^{\frac{1}{2}})_{ii}\cdot (B^\top )_{\bullet ,J}\sum _{j=1}^h (P_U)_{ji}w_{jj}\) or \(\sqrt{\lambda ^W_i\lambda ^W_j}(B^\top )_{\bullet ,J}w_{ij}\). This involves linear combinations of \({h+1\atopwithdelims ()2}\) columns and is computationally expensive if the order h of W gets large. Indeed, when testing all columns by their correct norms \(\Vert (B^\top )_{\bullet ,J}w_{ij}\Vert _{D^{-1}}^2\), too much time is spent in forming the preconditioner. Therefore the heuristic Alg. 13 first selects candidate eigenvectors to use for \({{{\bar{P}}}}\) via the rough estimate \(\sum _{{{{{\hat{\imath }}}}}=1}^{h+1\atopwithdelims ()2}(w_{ij})_{{{{\hat{\imath }}}}}^2\Vert (B^\top )_{\bullet ,J({{{{\hat{\imath }}}}})}\Vert _{D^{-1}}^2=\Vert w_{ij}\Vert _{\mathop {\textrm{Diag}}(BD^{-1}B^\top )_J}^2\). For the selected eigenvectors it then computes the precise values after the following transformation, which is only seemingly involved.
In order to also account for the possibly overarching contribution of \({\mathfrak {F}}_t\mathbb {1}_{t}\) it is advantageous to find a representation equivalent to \(B^\top X^{\frac{1}{2}} {{{\bar{P}}}}\) with orthonormal columns in \({{{\bar{P}}}}\) as in Theorem 9 for a suitable factorization of X other than its square root. For this, let \(V_W=P_W\Lambda _W^{\frac{1}{2}}\); then \(W\otimes _{s}W=(V_W\otimes _{s}V_W)(V_W^\top \otimes _{s}V_W^\top )\). Because \((V_W^\top \otimes _{s}V_W^\top )\mathop {\textrm{svec}}I=\mathop {\textrm{svec}}(V_W^\top V_W)=\mathop {\textrm{svec}}\Lambda _W\) and \((V_W\otimes _{s}V_W)=(P_W\otimes _{s}P_W)(\Lambda _W^{\frac{1}{2}}\otimes _{s}\Lambda _W^{\frac{1}{2}})\), the notation of Lemma 12 and its proof allows us to rephrase the semidefinite block of \({\mathfrak {X}}_t-\tfrac{{\mathfrak {X}}_t\mathbb {1}_{t}({\mathfrak {X}}_t\mathbb {1}_{t})^\top }{\eta }\) as
where
This suggests putting \(V=B^\top P_{\otimes _{s}} F\) and deriving the columns corresponding to \({{{\bar{P}}}}\) via the singular value decomposition \(F=Q_F\Sigma _FP_F^\top \). Lemma 12 provides the squared singular values \(\Sigma _F^2\), and its eigenvectors give the left-singular vectors in \(Q_F\). For \(e_{ij}:=\frac{1}{\sqrt{2}}\mathop {\textrm{svec}}(e_ie_j^\top +e_je_i^\top )\) there holds \(\mathop {\textrm{svec}}(\Lambda _W)^\top e_{ij}=0\), so the right-singular vectors corresponding to \(\sqrt{\lambda ^W_i\lambda ^W_j}\) read \((P_F)_{\bullet ,ij}=e_{ij}\). The remaining right-singular vectors of \(P_F\) may be computed via \(P_F=F^\top Q_F\Sigma _F^{-1}\). In this it is sufficient and convenient to consider only the U block, i.e., the support restricted to the ii-coordinates. Denote the columns of \(P_U=[u_1,\dots ,u_h]\) in Lemma 12 by \(u_j\) for \(j=1,\dots ,h\); then, for \(\Lambda _W=\mathop {\textrm{Diag}}(\lambda ^W)\) and \(\Lambda _U=\mathop {\textrm{Diag}}(\lambda ^U_1,\dots ,\lambda ^U_h)\), the corresponding right-singular vectors \(u^F_j\in {\mathbb {R}}^h\) read
By expanding the U block to the correct positions, the right-singular vector to singular value \(\sqrt{\lambda ^U_j}\) is \((P_F)_{\bullet ,{jj}}=\mathop {\textrm{svec}}(\mathop {\textrm{Diag}}(u_j^F))\) for \(j=1,\dots ,h\).
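The recovery \(P_F=F^\top Q_F\Sigma _F^{-1}\) is the standard identity for full-column-rank matrices; a quick check with a random stand-in for F (the actual F is determined by the preceding transformation):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((6, 4))                    # any full-column-rank F
Q, sig, _ = np.linalg.svd(F, full_matrices=False)  # F = Q diag(sig) P^T
P = F.T @ Q / sig                                  # P_F = F^T Q_F Sigma_F^{-1}
```

Since \(F=Q_F\Sigma _FP_F^\top \) with orthonormal \(Q_F\), multiplying \(F^\top \) by \(Q_F\Sigma _F^{-1}\) isolates \(P_F\) exactly.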
With these preparations the selected semidefinite columns are appended to \({{\hat{V}}}\) as follows. First note that the semidefinite block with coordinates J of the factor \({\mathfrak {F}}_t\) is \((V_W \otimes _{s}V_W)\), which is nonsymmetric in general. The transformed trace vector \({\mathfrak {F}}_t^\top \mathbb {1}_t\) reads \(({\mathfrak {F}}_t^\top \mathbb {1}_t)_J=(V_W^\top \otimes _{s}V_W^\top )\mathop {\textrm{svec}}I=\mathop {\textrm{svec}}(\Lambda _W)\). If column \(p_{ij}^F\) of \(P_F\) with \(1\le i\le j\le h\) is selected for \({{{\bar{P}}}}\) by the heuristic, the column to be appended to \({{\hat{V}}}\) reads
If the selected indices satisfy \(i<j\), the vector \(p_{ij}^F\) is just \(e_{ij}=\frac{1}{\sqrt{2}}\mathop {\textrm{svec}}(e_ie_j^\top +e_je_i^\top )\). By \((V_W \otimes _{s}V_W)e_{ij}=\frac{\sqrt{\lambda _i^W\lambda _j^W}}{\sqrt{2}}\mathop {\textrm{svec}}(w_iw_j^\top +w_jw_i^\top )=\sqrt{\lambda _i^W\lambda _j^W}w_{ij}\) and \(\left\langle {\mathop {\textrm{svec}}{\Lambda _W}},{e_{ij}}\right\rangle =0\) the column computation simplifies to
Typically, several mixed eigenvectors \(w_{ij}\) share the same index i corresponding to a large value \(\lambda _i^W\), so it quickly pays off to precompute \(w_i^\top \mathop {\textrm{svec}}^{-1}([B^\top ]_{k,J})\) and to use these \(h\) vectors for each \(w_j\). For ease of presentation this implementational detail is not described in Alg. 13. Also, this is not helpful for the nonmixed vectors \(p_{jj}^F=\mathop {\textrm{svec}}(\mathop {\textrm{Diag}}(u_j^F))\), because
consists of a linear combination over all \(w_{ii}\). Fortunately, throughout our experiments only few of the nonmixed vectors are among those selected for preconditioning. A possible explanation might be that, with respect to the selected bundle subspace, the large nonmixed terms reflect the rank of the currently strongly active eigenspace while large mixed terms reflect its ongoing interaction with the eigenspace of moderately active or inactive eigenvalues. The transformed trace vector coefficient for \(p_{jj}^F\) evaluates to \(\left\langle {\Lambda _W},{\mathop {\textrm{Diag}}(u_j^F)}\right\rangle =\left\langle {\lambda ^W},{u_j^F}\right\rangle \). With this, the algorithm for appending semidefinite columns reads as follows.
Algorithm 13
(append_\({\mathbb {S}}^h_+\)_columns\(({{\hat{V}}})\))
Input: column indices \(J\in {\mathbb {N}}^{h+1\atopwithdelims ()2}\) and Nesterov–Todd scaling matrix \(W\succ 0\) of this block in \({\mathfrak {X}}_t\), \(B^\top _{\bullet ,J}\), \(\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t\), \(B^\top {\mathfrak {X}}_t\mathbb {1}_t\), \(\eta =\zeta ^{-1}\sigma +\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t\), D, threshold \({\underline{\rho }}\)
Output: updated \({{\hat{V}}}\).

1.
Compute norms \(\Vert (B^\top )_{\bullet ,J(i)}\Vert _{D^{-1}}\), set \({{\hat{\rho }}}={\underline{\rho }}/\max _{i=1,\dots ,{h+1\atopwithdelims ()2}}\Vert (B^\top )_{\bullet ,J(i)}\Vert _{D^{-1}}^2\),
compute eigenvalue decomposition \(W=P_W\Lambda _W P_W^\top \), \(\Lambda _W=\mathop {\textrm{Diag}}(\lambda ^W)\) with \(\lambda ^W_{1}\ge \dots \ge \lambda ^W_{h}\), let \(w_{ij}\) be defined by (12).

2.
If \((\lambda ^W_{1})^2<{{\hat{\rho }}}\) do nothing and return \({{\hat{V}}}\).

3.
Compute \(U=\Lambda _W^2-\frac{1}{\eta }(\lambda ^W\circ \lambda ^W)(\lambda ^W\circ \lambda ^W)^\top \), eigenvalue decomposition \(U=P_U\Lambda _U P_U^\top \),
with \(P_U=[u_1,\dots ,u_h]\).

4.
For each \({{{{\hat{\imath }}}}}=1,\dots ,h\) with \((\Lambda _U)_{{{{{\hat{\imath }}}}}{{{{\hat{\imath }}}}}}\ge {{\hat{\rho }}}\) do:
Compute \({{\hat{w}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\imath }}}}}}=\sum _{i=1}^h (u_{{{{\hat{\imath }}}}})_iw_{ii}\) (\(\in {\mathbb {R}}^{h+1\atopwithdelims ()2}\)).
If \((\Lambda _U)_{{{{{\hat{\imath }}}}}{{{{\hat{\imath }}}}}}\sum _{j=1}^{h+1\atopwithdelims ()2}( {{\hat{w}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\imath }}}}}})_j^2\Vert (B^\top )_{\bullet ,J(j)}\Vert _{D^{-1}}^2\ge {\underline{\rho }}\) then:

(a)
Compute \(u_{{{{{\hat{\imath }}}}}}^F\) according to (13) and set
$$\begin{aligned}{} & {} \alpha \leftarrow \left\langle {\lambda ^W},{u_{{{{\hat{\imath }}}}}^F}\right\rangle \tfrac{1}{\mathbb {1}_t^\top {\mathfrak {X}}_t\mathbb {1}_t} \left( 1-\tfrac{\sqrt{\zeta ^{-1}\sigma }}{\sqrt{\eta }}\right) ,\\{} & {} {{\hat{b}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\imath }}}}}}=(B^\top )_{\bullet ,J}\sum _{i=1}^h (u_{{{{\hat{\imath }}}}}^F)_i\lambda ^W_iw_{ii}-\alpha B^\top {\mathfrak {X}}_t\mathbb {1}_t. \end{aligned}$$ 
(b)
If \(\Vert {{\hat{b}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\imath }}}}}}\Vert _{D^{-1}}^2\ge {{\underline{\rho }}}\) set \({{\hat{V}}}\leftarrow [{{\hat{V}}},{{\hat{b}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\imath }}}}}}]\).


5.
For each \(1\le {{{{\hat{\imath }}}}}<{{{{\hat{\jmath }}}}}\le h\) with \(\lambda ^W_{{{{{\hat{\imath }}}}}}\lambda ^W_{{{{{\hat{\jmath }}}}}}>{{\hat{\rho }}}\) do:
If \(\sqrt{\lambda ^W_{{{{{\hat{\imath }}}}}}\lambda ^W_{{{{{\hat{\jmath }}}}}}}\sum _{j=1}^{h+1\atopwithdelims ()2}( {{\hat{w}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}})_j^2\Vert (B^\top )_{\bullet ,J(j)}\Vert _{D^{-1}}^2\ge {\underline{\rho }}\) set
$$\begin{aligned} {{\hat{b}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}=\sqrt{\lambda ^W_{{{{{\hat{\imath }}}}}}\lambda ^W_{{{{{\hat{\jmath }}}}}}}(B^\top )_{\bullet ,J}w_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}} \end{aligned}$$and if \(\Vert {{\hat{b}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}\Vert _{D^{-1}}^2\ge {\underline{\rho }}\) set \({{\hat{V}}}\leftarrow [{{\hat{V}}},{{\hat{b}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}]\).

6.
Return \({{\hat{V}}}\).
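A sketch of the mixed-column computation of step 5 together with the precomputation trick mentioned before the algorithm; \(B^\top _{\bullet ,J}\), the eigenvectors \(w_i\) and the eigenvalues \(\lambda ^W\) are random stand-ins, and the identity \(((B^\top )_{\bullet ,J}w_{ij})_k=\sqrt{2}\,w_i^\top M_k w_j\) with \(M_k=\mathop {\textrm{svec}}^{-1}([B^\top ]_{k,J})\) follows from the inner product preservation of svec:

```python
import numpy as np

def svec(M):
    """svec with off-diagonal entries scaled by sqrt(2)."""
    i, j = np.tril_indices(M.shape[0])
    v = M[i, j].astype(float)
    v[i != j] *= np.sqrt(2.0)
    return v

def unsvec(v, h):
    """Inverse of svec, returning the symmetric matrix M with svec(M) = v."""
    i, j = np.tril_indices(h)
    vals = v.astype(float)
    vals[i != j] /= np.sqrt(2.0)
    M = np.zeros((h, h))
    M[i, j] = vals
    M[j, i] = vals
    return M

rng = np.random.default_rng(2)
h, m = 5, 7
BJ = rng.standard_normal((m, h * (h + 1) // 2))    # stands in for (B^T)_{.,J}
PW, _ = np.linalg.qr(rng.standard_normal((h, h)))  # orthonormal w_1,...,w_h
lam = np.sort(rng.uniform(0.5, 2.0, h))[::-1]      # stands in for lambda^W

# Precompute M_k w_i once per row k; each matrix M_k is then reused for every j.
MW = np.stack([unsvec(BJ[k], h) @ PW for k in range(m)])   # shape (m, h, h)

def mixed_column(i, j):
    """hat{b}_{ij} = sqrt(lam_i lam_j) (B^T)_{.,J} w_{ij}, using
    ((B^T)_{.,J} w_{ij})_k = sqrt(2) * w_i^T M_k w_j."""
    return np.sqrt(2.0 * lam[i] * lam[j]) * (MW[:, :, j] @ PW[:, i])
```

Each additional pair (i, j) then costs only an \(h\)-dimensional inner product per row of B instead of a linear combination of \({h+1\atopwithdelims ()2}\) columns.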
As for the linear case, it can be argued that for small barrier parameter \(\mu \) the number of selected columns corresponds at least to the order of the active submatrix in the cutting model. Thus, if \({{\hat{h}}}\le h\) eigenvalues of \(X\in {\mathbb {S}}^h_+\) converge to positive values in the optimum, the heuristic will end up selecting at least \({{{\hat{h}}}+1\atopwithdelims ()2}\) columns once \(\mu \) gets small.
For second order cones \({{{\mathcal {Q}}^{h}}}\) the structural properties of the arrow operator and the Nesterov–Todd direction allow considerations to be restricted to just two directions per cone for preconditioning, but as the computational experiments do not involve second order cones, this will not be discussed here.
4 Numerical experiments
The purpose of the numerical experiments is to explore and compare the behavior and performance of the pure and preconditioned iterative variants to the original direct solver on KKT instances that arise in the course of solving large scale instances by the conic bundle method.
It has to be emphasized that the experiments are by no means designed and intended to investigate the efficiency of the conic bundle method with an internal iterative solver. Indeed, many aspects of the ConicBundle code [19], such as the cutting model selection routines, the path-following predictor–corrector approach and the internal termination criteria, have been tuned to work reasonably well with the direct solver. As the theory suggests and the results support, the performance of iterative methods depends more on the size of the active set than on the size of the model. Thus somewhat larger models might be better in connection with iterative solvers. Also, the predictor–corrector approach is particularly efficient if setting up the KKT system is expensive; for iterative methods with deterministic preconditioning this hinges on the cost of forming the preconditioner, which gets expensive once the barrier parameter gets small. Furthermore, iterative methods might actually profit from staying in a rather narrow neighborhood of the central path. Therefore many implementational decisions need to be reevaluated for iterative solvers. This is out of scope for this paper. Hence, the experiments only aim to highlight the relative performance of the solvers on sequences of KKT systems as they currently arise in ConicBundle. For the sole purpose of demonstrating the relevance of this KKT system based analysis, Sect. 4.4 will present a comparison of the performance of ConicBundle when employing the KKT solver variants without any further adaptation of parameters.
The KKT system oriented experiments will report on the performance for three different instances: the first, denoted by MC, is a classical semidefinite relaxation of MaxCut on a graph with 20,000 nodes as described in [15, 26]; the second, BIS, is a semidefinite minimum-bisection relaxation improved by dynamic separation of odd cycle cutting planes on the support of the Boeing instance KKT_traj33, giving a graph on 20,006 nodes, explained in [18]; and the third, MMBIS, refers to a min–max-bisection problem shifting the edge weights so as to minimize a restricted maximum cut on a graph of 12,600 nodes. All three have a single semidefinite cutting model which consists of a semidefinite cone with up to one nonnegative variable, so the model cone \({\mathcal {S}}^t_+\) of (2) typically has \(t=(1,[],[h])\) for some \(h\in {\mathbb {N}}\). In the MaxCut instance the design variables are unconstrained, in the bisection instance the design variables corresponding to the cutting planes are sign constrained (\(D_y\) is needed), and in the min–max-bisection problem some design variables have bounds and there are linear equality and inequality constraints (\(D_y\), \(D_w\) and A appear). Throughout, the proximal term is a multiple of the identity with a dynamic weight, i.e., \({\mathfrak {H}}_k=u_kI\) with \(u_k>0\) controlled as in [21].
In each case ConicBundle is run with default settings for the internal constrained QP solver with direct KKT solver for the bundle subproblems. Whenever a new KKT system arises, it is solved consecutively but independently on the same machine by

(DS) the original direct solver,

(IT) MINRES without preconditioning (the implementation follows [12]),

(RP) MINRES with randomized preconditioning (Algorithm 3 with Algorithm 8),

(DP) MINRES with deterministic preconditioning (Algorithm 3 with Algorithm 10).
Only the results of the direct solver are then used to continue the algorithm. Note that for nonsmooth optimization problems tiny deviations in the solution of the subproblem may lead to huge differences in the subsequent path of the algorithm. Therefore running the bundle method with different solvers would quickly lead to incomparable KKT systems. That the chosen approach does not impair the validity of the conclusions regarding the performance of the solvers within the bundle method will be demonstrated in Sect. 4.4.
The details of the direct solver DS are of little relevance at this point. Suffice it to say that its main work consists in Schur complementing the \({\mathfrak {H}}\) and \(\zeta ^{-1}\sigma \) blocks of the KKT system (6) into the joint \(\mathop {\textrm{Diag}}(D_w^{-1},{\mathfrak {X}}_t^{-1})\) block and factorizing this. In the MaxCut setting (no \(D_y\)), the \({\mathfrak {H}}\) block is constant throughout each bundle subproblem. In this case the Schur complement is precomputed once for each bundle subproblem—thus for several KKT systems—and this makes the approach extremely efficient as long as the order h of the semidefinite model is small. Precomputation is no longer possible if \(D_y\) is needed, which is the case in the two other instances. Finally, if A is also present, the system to be factorized in every iteration gets significantly larger. These differences motivated the choice of the instances and explain part of the strong differences in the performance of the solvers.
For MaxCut and bisection the iterative solver could exploit the positive definiteness of the system by employing conjugate gradients instead of MINRES. The min–max-bisection problem comprises equality constraints in A, so the system is no longer positive definite and conjugate gradients are not applicable. Employing MINRES for all three facilitates the comparison, in particular as MINRES also seemed to perform numerically better on the other instances. MINRES computes the residual norm with respect to the inverse of the preconditioner and the implementation uses this norm for termination. To safeguard against effects due to the changes in this norm, the relative precision requirement \(\min \{10^{-6},10^{-2}\mu \}\) of ConicBundle is multiplied, in the notation of Alg. 3, by the factor \(\big (\root m \of {\prod _{i=1}^{{{\hat{k}}}}(1+{{\hat{\lambda }}}_i)^{-1}}\cdot \min _i (D^{-1})_i\big )^{\frac{1}{2}}\).
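The following self-contained sketch (not the ConicBundle implementation) shows how a rank-structured preconditioner of the assumed form \(D+{{\hat{V}}}{{\hat{V}}}^\top \) can be applied inside SciPy's MINRES via the Sherman–Morrison–Woodbury identity at \(O(n{{\hat{k}}})\) cost per iteration; all data are random stand-ins:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(3)
n, k = 400, 12
V = rng.standard_normal((n, k))        # plays the role of hat{V}
d = rng.uniform(1.0, 2.0, n)           # diagonal D
A = np.diag(d) + V @ V.T               # rank-structured SPD system matrix
b = rng.standard_normal(n)

# Woodbury: (D + V V^T)^{-1} = D^{-1} - D^{-1} V (I + V^T D^{-1} V)^{-1} V^T D^{-1},
# so applying the preconditioner needs only O(nk) work per iteration.
DinvV = V / d[:, None]
C = np.linalg.cholesky(np.eye(k) + V.T @ DinvV)

def apply_inverse(r):
    t = np.linalg.solve(C.T, np.linalg.solve(C, DinvV.T @ r))
    return r / d - DinvV @ t

M = LinearOperator((n, n), matvec=apply_inverse)   # M approximates A^{-1}
x, info = minres(A, b, M=M)
```

Because this sketch inverts the system matrix exactly, MINRES converges almost immediately; in Algorithm 3 the low-rank part only captures the dominant eigenspace, so the effect there is a reduced condition number rather than one-step convergence.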
The results on the three instances will be presented in eight plots per instance. The first four compare all four solvers, the last four plots are devoted to information that is only relevant for iterative solvers, so DS will not appear in these.

1.
Plot “time per subproblem (seconds)” gives for each of the four methods a box plot of the seconds (in logarithmic scale) required to solve the subproblems. For each subproblem this is the sum of the time required for initializing/forming and solving all KKT systems of this subproblem. This is needed because, in the case of the MaxCut instance MC, the direct solver DS forms the Schur complement of the \({\mathfrak {H}}\) block only once per subproblem, and this is also accounted for here.

2.
Plot “subproblem time (seconds) per iteration” displays the same cumulative time per subproblem in seconds (in logarithmic scale) for each successive iteration so that the development in solution time is aligned to the progress of the bundle method.

3.
Plot “time per subproblem vs. bundle size” serves to highlight the dependence of the solution time on the size of the cutting model (number of rows of B). For this the subproblems are grouped in the bundle size ranges (0, 50], (50, 500], (500, 1500], \((1500,\infty )\). Instead of infinity the actual observed maximum is listed in the bottom line of the plot.

4.
Plot “time per subproblem vs. last \(\mu \)” illustrates the dependence of the solution time on the last barrier parameter \(\mu \) for which the subproblem has to be solved. Roughly, this corresponds to the precision required for the subproblem. Because of the comparatively small number of subproblems and the strongly differing ranges of last \(\mu \) values, results are presented for a subdivision of the subproblems into four groups of equal cardinality (up to integer division), sorted according to the \(\mu \) value of their respective last KKT system. The minimum \(\mu \) of each group is given in the bottom line of the plot.

5.
Plot “time per KKT system (seconds)” compares exclusively the iterative methods on the KKT systems belonging to the four different ranges of the barrier parameter \(\mu \) as collected over all subproblems. The first three box plots give the box plot statistics on the seconds (in logarithmic scale) spent in solving KKT systems for barrier parameter values \(\mu \ge 100\), the next three for \(100>\mu \ge 1\), etc. Note that DS would require the same time for all KKT systems of the same subproblem, because its solution time does not depend on \(\mu \) or the associated required relative precision described above.

6.
Plot “matrix vector multiplications per KKT system” shows box plots on the number of matrix–vector multiplications (in logarithmic scale) needed by MINRES, again subdivided into the same ranges of barrier parameter values.

7.
Plot “KKT system condition number estimate” presents the box plot statistics of an estimate of the condition number (in logarithmic scale) for the same ranges of the barrier parameter. The estimate is obtained by a limited number of Lanczos iterations on the respective (non)preconditioned system of the \({\mathfrak {H}}\) block; a possibly remaining equality part of A is ignored in this. Computation times for the condition number are not included in the time measurement listed above.

8.
Plot “preconditioning columns per KKT system” gives the box plot statistics of the number of columns \({{\hat{k}}}\) in Algorithm 3 for RP and DP for the usual ranges of the barrier parameter.
In all box plots, the width of the boxes indicates the relative size of the number of instances in the group; the horizontal lines of the boxes give the values of the upper quartile, the median and the lower quartile. The upper whisker shows the largest value below upper quartile \(+\,1.5\cdot \)IQR, where IQR \(=\) (upper quartile − lower quartile) is the interquartile range. The lower whisker displays the smallest value above lower quartile \(-\,1.5\cdot \)IQR. The stars show the maximum and minimum values.
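For reference, the whisker rule just described amounts to the following computation (a sketch; the plotting itself is omitted):

```python
import numpy as np

def box_stats(values):
    """Quartiles and whiskers as used in the plots: the upper whisker is the
    largest value below Q3 + 1.5*IQR, the lower whisker the smallest value
    above Q1 - 1.5*IQR; minima and maxima are drawn separately as stars."""
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    return q1, med, q3, v[v >= q1 - 1.5 * iqr].min(), v[v <= q3 + 1.5 * iqr].max()

q1, med, q3, lo, hi = box_stats([1, 2, 3, 4, 100])
# the outlier 100 exceeds Q3 + 1.5*IQR = 7, so the upper whisker stays at 4
```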
Computation times refer to a virtualized compute server of 40 Intel Xeon (Cascade Lake) cores with 600 GB RAM under Ubuntu 18.04. This virtual machine is hosted on hardware consisting of two Intel(R) Xeon(R) Gold 6240R processors with 2.40 GHz, 24 cores each, and 768 GB RAM. The code, however, is purely sequential and does not exploit any parallel computation possibilities.
4.1 Maxcut (instance MC, Fig. 1)
The graph was randomly generated ([36], call rudy rnd_graph 20000 1 1 for 20,000 nodes, edge density one percent, seed value 1). The semidefinite relaxation gives rise to an unconstrained problem with 20,000 variables. Each variable influences one of the diagonal elements of the Laplace matrix of the graph with cost one, and the task is to minimize the maximum eigenvalue of the Laplacian times the number of nodes; see [26] for the general problem description.
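For orientation, an eigenvalue oracle of the generic spectral bundle form can be sketched as follows; the sign and scaling conventions, as well as the tiny random Laplacian, are illustrative assumptions and need not match the exact formulation of [26]:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def maxcut_oracle(L, y):
    """Value and subgradient of f(y) = n*lambda_max(L + Diag(y)) - 1^T y.
    These conventions are assumptions for illustration, not necessarily
    those used inside ConicBundle."""
    n = L.shape[0]
    lmax, V = eigsh(L + sp.diags(y), k=1, which='LA')
    v = V[:, 0]                       # unit eigenvector to lambda_max
    return n * lmax[0] - y.sum(), n * v * v - np.ones(n)

# Small random graph Laplacian standing in for the 20,000-node instance.
rng = np.random.default_rng(4)
n = 60
A = sp.random(n, n, density=0.05, random_state=4)
A = A + A.T
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A
f0, g0 = maxcut_oracle(L, np.zeros(n))
```

The top eigenvector v furnishes the subgradient \(n(v\circ v)-\mathbb {1}\) of this convex function, which is the kind of information the bundle method collects into its cutting model.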
For graphs of this type but smaller size like 5000 or 10,000 nodes the direct solver DS still seemed to perform better, so rather large sizes are needed to see some advantage of iterative methods. Other than that the relative behavior of the solvers was similar also for the smaller sizes. The jaggies within subproblem time in the second plot are due to the reduction of the model to its active part after each descent step while the model typically increases in size during null steps. During the very first iterations the bundle is tiny and DS is the best choice. Once the bundle size increases sufficiently, the iterative methods dominate. Over time, as precision requirements get higher and the choice of the bundle subspace converges, the advantage of iterative methods decreases. In the final phase of high precision the direct solver may well be more attractive again.
The plots also show that for this instance (and presumably for most instances of this random type) the performance of IT (MINRES without preconditioning) is almost as good as DP (deterministic preconditioning) while RP (randomized preconditioning) is not competitive. Note that the condition number does not grow excessively for IT in this instance. Deterministic preconditioning succeeds in keeping the condition number almost exactly at the intended value 10. For smaller values of \(\mu \), so for higher precision requirements, DP requires distinctly fewer matrix–vector multiplications, but it then also selects a large number of columns. In comparison to no preconditioning DP helps to improve stability but does not lead to significantly better computation times except maybe for the very last phase of the algorithm with high precision requirements.
4.2 Minimum bisection (instance BIS, Fig. 2)
The semidefinite relaxation of minimum bisection is similar in nature to maxcut, but in addition to the single diagonal elements there is a variable with coefficient matrix of all ones. Furthermore, variables with sparse coefficient matrices corresponding to odd cycles in the underlying graph are added dynamically in rounds, see [18] for the general framework and also for the origin of the instance KKT_traj33 with 20,006 nodes and roughly 260,000 edges.
Again, after the very first iterations the iterative methods turn out to perform distinctly better in the initial phase of the algorithm. Iterative methods get less attractive as precision requirements increase. The model size is often rather small (a bit larger than the active set of about 150 columns) which is favorable for DS. Indeed, additional output information of the log file indicates that the performance of DS drops off whenever the cutting model is significantly larger than that.
While for this instance RP is better than IT, the advantage of DP over the other iterative variants is quite apparent, and its superiority also increases with precision requirements and smaller \(\mu \). In fact, for DP the condition number and the number of matrix–vector multiplications decrease again for smaller \(\mu \). A possible cause might be that the active set becomes easier to identify correctly. Due to the reduction in matrix–vector multiplications, computation time does not increase for DP in spite of a growing number of columns in the preconditioner.
4.3 A min–maxbisection problem (instance MMBIS, Fig. 3)
This problem arose in the context of an unpublished attempt to optimize vaccination rates for five population groups \(N_1{{\dot{\cup }}}\dots {{\dot{\cup }}} N_5=N\) in a virtual town of \(n=|N|\) inhabitants. Briefly, within the town k anonymous people are assumed to be infectious. There is vaccine for at most \(\underline{n}\) people. The aim is to reduce the spreading rate of the disease by vaccinating each person with the respective group’s probability. The task of determining these vaccination rates motivated the following ad hoc model, which would be hard to justify rigorously. In a graph \(G=(N,E)\) each edge \(ij=\{i,j\}\in E\) with \(i\in N_{{{{\hat{\imath }}}}}\), \(j\in N_{{{{\hat{\jmath }}}}}\) has a weight \({{\hat{w}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}\) representing the infectiousness of the typical contact for these two persons of the respective groups. It will be convenient to define the weighted Laplacians \(L_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}={{\hat{w}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}\sum _{ij\in E, i\in N_{{{{\hat{\imath }}}}},j\in N_{{{{\hat{\jmath }}}}}}(e_i-e_j)(e_i-e_j)^\top \). In this simplified approach, vaccination rates \(v_{{{{\hat{\imath }}}}},v_{{{{\hat{\jmath }}}}}\) of the node groups reduce a nominal infectiousness \({{\hat{w}}}_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}\) between these groups by the factor \(y_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}\ge \max \{0,1-v_{{{{\hat{\imath }}}}}-v_{{{{\hat{\jmath }}}}}\}\). The spreading probability to be minimized is considered proportional to the restricted maxcut value
For determining the vaccination rates the combinatorial problem is replaced by the usual (dual) semidefinite relaxation
In this case the resulting KKT system also has an equality and several inequality constraints in the block A. Preconditioning results are presented for the KKT systems of an instance with n = 12,600 inhabitants splitting into groups of sizes 5770, 6000, 600, 30, 200, with \(k=126\) infectious persons and \(\underline{n}=1260\) available vaccinations.
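The weighted group Laplacians \(L_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}\) defined above can be formed directly from their definition; a dense sketch for a toy graph (real instances of this size would use sparse matrices):

```python
import numpy as np

def group_laplacian(n, edges, groups, gi, gj, w_hat):
    """L_{gi gj} = w_hat * sum of (e_i - e_j)(e_i - e_j)^T over edges {i,j}
    with one endpoint in group gi and the other in group gj."""
    L = np.zeros((n, n))
    for i, j in edges:
        if {groups[i], groups[j]} == {gi, gj}:
            d = np.zeros(n)
            d[i], d[j] = 1.0, -1.0
            L += w_hat * np.outer(d, d)
    return L

# Four nodes in two groups; only edge (1, 2) crosses groups 0 and 1.
L01 = group_laplacian(4, [(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1], 0, 1, 2.0)
```

As for any Laplacian, the row sums vanish, so each \(L_{{{{{\hat{\imath }}}}}{{{{\hat{\jmath }}}}}}\) is positive semidefinite.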
In the actual computations the bundle size grows surprisingly fast. This not only entails enormous memory requirements but also excessive computation times for DS; indeed, computations of DS may exceed those of DP by a factor of 70. In consequence, comparative results can only be reported for a very limited number of subproblem evaluations. In particular, the precision requirements remain rather moderate throughout these iterations. Still, the same initial behavior can be observed as for the previous two instances. For very small bundle sizes DS is best. Once the bundle size grows, the iterative methods take over. Among the iterative solvers RP is better than IT, but DP is the method of choice. It succeeds in tightly controlling the condition number by selecting rather few columns. With this DP requires the fewest matrix–vector multiplications, which seems to pay off quickly on this instance.
4.4 Performance within the bundle method for maxcut
The purpose of this section is to provide evidence for the reliability of the KKT oriented evaluations when the iterative solvers are employed within the bundle method directly. As explained in the introductory remarks to Sect. 4, a full assessment of the use of iterative solvers within conic bundle methods is out of scope and beyond the possibilities of this work. Therefore the results will only compare, without any further adaptations, the direct replacement of DS with the solvers IT, RP and DP within the current ConicBundle implementation that was developed and tuned for DS. Note, however, that the evaluation of bundle methods requires a statistical approach.
In oracle based nonsmooth optimization it is typical that even slight numerical changes in the computation of candidates bring along significant differences in the actual trajectories. Indeed, candidates are generically close to ridges, and which subgradient is returned depends on which side of the ridge the candidate ends up on. In particular, the use of different KKT solvers quickly leads to considerable differences in the models and subproblems and therefore also in the sequence of KKT problems. This erratic behavior is intrinsic at any level of precision, therefore it may be expected that the average number of bundle iterations (descent and null steps) does not depend too much on the actual KKT solver in use. Yet, due to this incomparability of trajectories, any attempt to assess the scope of the iterative solvers in comparison to the direct solver needs to be based on a reasonable collection of comparable instances. Their choice should help to illustrate the effects of the parameters that can be expected to be influential in the current context:

the cost of matrix–vector multiplications,

the size of the model,

precision requirements,

the use or nonuse of a predictor corrector approach,

the number of KKT instances and solves per subproblem.
In order to cover these aspects with manageable effort, results will be presented for eight methods and four groups of 25 randomly generated MaxCut instances. The methods without predictor corrector approach are denoted by DS, IT, RP, DP and those with predictor corrector approach by DSp, ITp, RPp, DPp. The names refer to using the respective direct or iterative solver for the KKT systems of the internal interior point method of ConicBundle for solving the subproblems. The four instance classes arise by generating five instances per number of nodes \(n\in \{10000, 20000\}\) and per density out of two edge density groups, one with smaller densities \(d\in \{0.1,0.2,0.3,0.4,0.5\}\) and one with higher densities \(d\in \{1,2,3,4,5\}\) ([36], call rudy rnd_graph n d s for seed \(s\in \{1,2,3,4,5\}\)). The instances were solved with ConicBundle [19] on computers with quad-core Intel Core i7-4770 processors (4 \(\times \) 3400 MHz, 8 MB cache), 32 GB RAM and operating system Ubuntu 18.04. The code was run in sequential mode with each instance solved en suite for all methods on the same machine, and all time measurements refer to user time. See the supplement [20] for the log files and the table listing the values of the plots in Figs. 4, 5, 6, 7 and 8. Due to limited resources, some volatility may have been caused by running two instances on each machine at the same time as well as by occasional further jobs. As instances and methods were randomly affected by this, the influence on the conclusions should be marginal in view of the number of examples.
The MaxCut instances serve this purpose well for the following reasons. First, as explained before, the direct solver DS is particularly efficient for MaxCut instances, because the Schur complement needs to be computed only once at the beginning of each bundle step for all interior point iterations / KKT systems associated with this subproblem. Thus, if iterative solvers are competitive for MaxCut, this should also hold in more general cases. Likewise, the iterative solver IT without preconditioning performed better on the KKT instances for MaxCut than on those of the two other examples, therefore the limits of preconditioning are best discussed for MaxCut. Second, for MaxCut even large scale instances can be solved to reasonably high precision in manageable time, which allows comparing the performance at several precision levels. To make this comparison reasonably efficient and fair in view of the weaknesses of the lack-of-progress stopping criterion of bundle methods, the comparisons use for each level of relative precision \(10^{-3}\), \(10^{-4}\), \(10^{-5}\) and \(10^{-6}\) the first descent step that produces a value below an instance dependent common relative reference value. For each instance this reference value is obtained by taking the minimum objective value obtained over all methods by running ConicBundle with termination precision \(10^{-6}\). Third, random MaxCut instances having the same number of nodes and similar edge density can be expected to have similar parameters and properties in terms of model size, cost of matrix–vector multiplications, and precision requirements. Note that a higher edge density increases the cost of matrix–vector multiplications but also causes a larger offset in the objective value, which entails somewhat reduced precision requirements for the KKT systems to reach the same relative precision.
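The per-instance comparison criterion just described (a common reference value and, per precision level, the first descent step reaching it) can be sketched as follows. The function names and the exact form of the relative threshold are illustrative assumptions, not the actual ConicBundle evaluation code.

```python
def reference_value(trajs_by_method):
    """Common reference for one instance: the minimum objective value
    obtained over all methods (each run to termination precision 1e-6)."""
    return min(min(traj) for traj in trajs_by_method.values())

def first_step_reaching(traj, ref, eps):
    """Index of the first descent step whose objective value lies within
    relative precision eps of the common reference value ref.
    The threshold eps * |ref| is an assumed reading of the criterion."""
    threshold = ref + eps * abs(ref)
    for step, value in enumerate(traj):
        if value <= threshold:
            return step
    return None  # this precision level was not reached
```

For each of the four precision levels the recorded effort is then that of the first qualifying descent step.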
As the experiments will show, the model size (it is selected by ConicBundle on the basis of the active rank, starts at roughly twice this rank after descent steps for reasons investigated in [10], and increases further over null steps) seems to depend little on the edge densities but grows markedly with the number of nodes.
The first aspect to address is the dependence of the bundle method on the solvers. For this, Figs. 4 and 5 display for each group and solver the average development of the model sizes, the number of KKT systems solved per subproblem together with the last \(\mu \)-value occurring there (it reflects the precision requirement of the final KKT system within the subproblem), and also the precision requirements on the subproblems themselves. This development is recorded in averages over groups of 10 steps and all 25 instances for each of the four instance groups. This rather detailed view will also help to explain differences in the computation times of the solvers at various precision levels.
The plots of Figs. 4 and 5 exhibit a natural separation between the methods with and without predictor corrector approach. In comparison, the differences between the solvers are almost negligible except for a few final iterations, which also suffer from stronger volatility due to a reduced number of samples. It is worth noting that the same holds for the overall number of KKT systems and bundle steps throughout all relative precision levels. A helpful visualization for such comparisons is given by performance profiles [11] for the total number of steps and KKT systems at each precision level. These turn out to have a shape similar to that displayed in Fig. 6, which presents the two profiles for bundle steps and KKT systems for the sparser case on 20,000 nodes and relative precision \(10^{-6}\). While the smaller number of KKT systems (i.e. interior point iterations) of the predictor corrector variants is an expected outcome, it is rather surprising that the variants without predictor corrector seem to need a few bundle steps fewer on average to reach the required precision. A comparison with the model size plots of Figs. 4 and 5 suggests that there is a distinct difference in the nature of these interior point solutions that also affects the bundle selection mechanism, but so far this lacks a mathematically sound explanation. Still, a first conclusion might read that the behavior of the bundle method itself is, on average, independent of the choice among the four solvers.
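Performance profiles in the sense of [11] plot, for each solver, the fraction of instances on which that solver's cost lies within a factor \(\tau \) of the best solver on the same instance. The following is a minimal sketch of this computation (a hypothetical helper, not part of ConicBundle):

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profiles.

    costs: array of shape (n_instances, n_solvers); entry [p, s] is the
    cost (e.g. bundle steps, KKT systems or time) of solver s on
    instance p, with np.inf marking a failure.
    Returns an array of shape (len(taus), n_solvers) whose entry [i, s]
    is the fraction of instances with cost ratio <= taus[i]."""
    best = costs.min(axis=1, keepdims=True)   # best cost per instance
    ratios = costs / best                     # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])
```

Plotting the rows of the result against \(\tau \) yields profiles of the shape shown in Fig. 6.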
Figures 7 and 8 display computation time performance profiles of the eight methods on the four classes of 25 instances for relative precision levels \(10^{-3}\), \(10^{-4}\), \(10^{-5}\) and \(10^{-6}\). These largely match the results on individual KKT systems. First consider the direct solvers DS and DSp. Again, a good explanation is lacking for the fact that DS dominates DSp in many cases with a higher number of edges and higher precision. Whether DS and DSp are attractive compared to iterative methods depends on the ratio of the time invested in forming the Schur complement to the number of KKT steps required for solving the subproblem, i.e., they are preferable if the model size is small or the number of interior point iterations becomes large enough due to increasing precision requirements. In cases of strong initial growth of the model size (see the model size plots of Figs. 4, 5) iterative solvers are quickly better. The seemingly good performance of the direct solvers on the denser instances for precision level \(10^{-3}\) is mostly due to the large constant offset that causes the methods to reach this precision often within ten steps (compare this to the asymptotic analysis in [22]); at this point model sizes are still small. Iterative solvers dominate precision levels \(10^{-4}\) and \(10^{-5}\), where accuracy requirements are still reasonably low and few interior point iterations are needed. The influence of the cost of a matrix–vector multiplication is visible in the difference of the initial head start over the direct solvers between sparser and denser instances for precision \(10^{-4}\). For instances on 20,000 nodes the average model size is one and a half times that of the 10,000 node instances (see Figs. 4, 5), and this explains part of the stronger performance of the iterative solvers on larger instances. For increasing precision requirements and numbers of interior point iterations the profiles also suggest that DS and DSp catch up faster on instances with fewer edges.
This effect might again be caused by the constant offset, which is larger for random MaxCut instances with a larger number of edges. Indeed, an inspection of the last \(\mu \) plots and the KKT systems per subproblem in Figs. 4 and 5 suggests that for MaxCut instances with a larger number of edges the subproblem solutions require less absolute accuracy, which favors iterative solvers and somewhat compensates for the higher cost of the matrix–vector multiplications. Note that the relative precision requirements for the solution of the subproblems (see Figs. 4, 5) are almost identical.
For the iterative methods the predictor corrector variant seems faster at lower precision levels, but again the methods without predictor corrector catch up and may even dominate at higher precision levels. For IT (no preconditioning) this should largely be due to the fact that predictor corrector requires two solves per KKT system. Thus, taking twice the number of KKT systems per subproblem for predictor corrector variants in Figs. 4 and 5 as the number of required solves provides a satisfactory explanation for IT. For RP and DP the situation is less clear-cut, because the preconditioner is formed only once per KKT system, but the line of argument is similar. For the moderate accuracy levels \(10^{-3}\) and \(10^{-4}\) MaxCut instances could do without preconditioning, but the preconditioned variants do a good job. For \(10^{-5}\) their advantage begins to show, and for \(10^{-6}\) the DP variants are almost consistently better than the other iterative methods.
Based on this analysis, a hybrid approach seems advisable that switches dynamically between the solvers depending on precision, model size and number of interior point iterations. In implementing these ideas a number of further design aspects would have to be reconsidered, as outlined before. The true advantage of iterative solvers, however, is that dynamic model adaptations become feasible during the solution of the subproblem, because there is no need to recompute the Schur complement each time. This allows for entirely new strategies, such as combining the ideas of [4, 34] and [25] in order to cut down on the number of null steps at an early stage. This remains to be addressed in future work.
5 Conclusions
In the search for efficient low rank preconditioning techniques for the iterative solution of the internal KKT system of the quadratic bundle subproblem, two subspace selection heuristics (a randomized and a deterministic variant) were proposed. For the randomized approach the results are ambivalent in theory and in practice; obtaining a good subspace this way seems to be difficult and the cost of exploratory matrix–vector multiplications quickly dominates. In contrast, the deterministic subspace selection approach allows the condition number (and with it the number of matrix–vector multiplications) to be controlled at a desired level without the need to tune any parameters, in theory as well as on the test instances. On these instances, for low precision requirements (large barrier parameter) the selected subspace is negligibly small. For high precision requirements (small barrier parameter) the subspace grows to the active model subspace. If the bundle size is close to this active dimension, the work in forming the preconditioner may be comparable to forming the Schur complement for the direct solver. Still, for large scale instances the deterministically preconditioned iterative approach seems to be preferable.
Conceivably it is possible to profit in ConicBundle from the advantages of both the deterministic iterative and the direct solver by switching dynamically between the two. The current experiments relied on a predictor corrector approach that was tuned for the direct solver. In view of the properties of the iterative approach it may well be worthwhile to devise a different path following strategy for the iterative approach, in particular for the initial phase of the interior point method, when the barrier parameter is still comparatively large and the work invested in forming the preconditioner is still negligible. Similar ideas should be applicable to interior point solvers for convex quadratic problems with low rank structure.
Notes
Together with B. Filipecki (TU Chemnitz), S. Heyder (TU Ilmenau), Th. Hotz (TU Ilmenau) within BMBF project grant 05M18OCA.
References
Achlioptas, D.: Database-friendly random projections. In: Proceedings of 20th ACM Symposium on Principles of Database Systems, Santa Barbara, CA, pp. 274–281 (2001)
Alizadeh, F., Haeberly, J.P.A., Overton, M.L.: Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results. SIAM J. Optim. 8(3), 746–768 (1998)
Anjos, M.F., Lasserre, J.B. (eds.): Handbook of Semidefinite, Conic and Polynomial Optimization. International Series in Operations Research & Management Science, vol. 166. Springer, Berlin (2012)
Babonneau, F., Beltran, C., Haurie, A., Tadonki, C., Vial, J.P.: ProximalACCPM: a versatile oracle based optimisation method. In: Kontoghiorghes, E.J., Gatu, C. (eds.) Optimisation, Econometric and Financial Analysis, pp. 67–89. Springer, Berlin (2007)
Benson, S., Ye, Y., Zhang, X.: Solving large-scale sparse semidefinite programs for combinatorial optimization. SIAM J. Optim. 10(2), 443–461 (2000)
Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004). Reprinted 2007 with corrections
Burer, S., Monteiro, R.D.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 94, 329–357 (2003)
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2002). https://doi.org/10.1002/rsa.10073
Ding, L., Grimmer, B.: Revisit of spectral bundle methods: Primal-dual (sub)linear convergence rates (2020). arXiv:2008.07067
Dolan, E., Moré, J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Elman, H.C., Silvester, D.J., Wathen, A.J.: Finite Elements and Fast Iterative Solvers: With Applications in Incompressible Fluid Dynamics. Oxford University Press, Oxford (2005). Reprinted 2006
Fischer, F., Helmberg, C.: Dynamic graph generation for the shortest path problem in time expanded networks. Math. Program. 143(1–2), 257–297 (2014)
Fischer, F., Helmberg, C.: A parallel bundle framework for asynchronous subspace optimisation of nonsmooth convex functions. SIAM J. Optim. 24(2), 795–822 (2014)
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)
Habibi, S., Kavand, A., Kocvara, M., Stingl, M.: Barrier and penalty methods for low-rank semidefinite programming with application to truss topology design (2021). arXiv:2105.08529
Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
Helmberg, C.: A cutting plane algorithm for large scale semidefinite relaxations. In: Grötschel, M. (ed.) The Sharpest Cut. MPS-SIAM Series on Optimization, pp. 233–256. SIAM/MPS, Philadelphia (2004)
Helmberg, C.: ConicBundle v1.a.2. Fakultät für Mathematik, Technische Universität Chemnitz (2021). http://www.tu-chemnitz.de/~helmberg/ConicBundle
Helmberg, C.: Supplement scientific data to publication “A preconditioned iterative interior point approach to the conic bundle subproblem”. TU Chemnitz (2023). https://tucid.tuchemnitz.de/data/7bd7800c66774f7b954c105673e8e383 (persistent id)
Helmberg, C., Kiwiel, K.C.: A spectral bundle method with bounds. Math. Program. 93(2), 173–194 (2002)
Helmberg, C., Mohar, B., Poljak, S., Rendl, F.: A spectral approach to bandwidth and separator problems in graphs. Linear Multilinear Algebra 39, 73–90 (1995)
Helmberg, C., Overton, M.L., Rendl, F.: The spectral bundle method with secondorder information. Optim. Methods Softw. 29(4), 855–876 (2014)
Helmberg, C., Pichler, A.: Dynamic scaling and submodel selection in bundle methods for convex optimization. Preprint 2017, Fakultät für Mathematik, Technische Universität Chemnitz, Chemnitz (2017)
Helmberg, C., Rendl, F.: Solving quadratic (0,1)problems by semidefinite programs and cutting planes. Math. Program. 82(3), 291–315 (1998)
Helmberg, C., Rendl, F.: A spectral bundle method for semidefinite programming. SIAM J. Optim. 10(3), 673–696 (2000)
Henrion, D., Korda, M., Lasserre, J.B.: The Moment-SOS Hierarchy. Series on Optimization and Its Applications, vol. 4. World Scientific, Singapore (2020)
Higham, N.J., Mary, T.: A new preconditioner that exploits low-rank approximations to factorization error. SIAM J. Sci. Comput. 41(1), A59–A82 (2019)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Grundlehren der Mathematischen Wissenschaften, vol. 305. Springer, Berlin (1993)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985)
Kim, S., Kojima, M., Mevissen, M., Yamashita, M.: Exploiting sparsity in linear and nonlinear matrix inequalities via positive semidefinite matrix completion. Math. Program. 129, 33–68 (2011)
Kocvara, M., Stingl, M.: On the solution of large-scale SDP problems by the modified barrier method using iterative solvers. Math. Program. 109(2–3), 413–444 (2007)
Nesterov, Y., Todd, M.J.: Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim. 8, 324–364 (1998)
Oskoorouchi, M.R., Goffin, J.L.: The analytic center cutting plane method with semidefinite cuts. SIAM J. Optim. 13(4), 1029–1053 (2003)
Papadimitriou, C.H., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61, 217–235 (2000). https://doi.org/10.1006/jcss.2000.1711
Rinaldi, G.: A rudimental graph generator by JRT (1993). https://www.tu-chemnitz.de/~helmberg/rudy.tar.gz
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Todd, M.J., Toh, K.C., Tütüncü, R.H.: On the Nesterov–Todd direction in semidefinite programming. SIAM J. Optim. 8(3), 769–796 (1998)
Toh, K.C.: An inexact primal-dual path following algorithm for convex quadratic SDP. Math. Program. 112(1), 221–254 (2008)
Wolkowicz, H., Saigal, R., Vandenberghe, L. (eds.): Handbook of Semidefinite Programming. International Series in Operations Research and Management Science, vol. 27. Kluwer Academic Publishers, Boston (2000)
Zhang, R.Y., Lavaei, J.: Modified interior-point method for large-and-sparse low-rank semidefinite programs. In: 2017 IEEE 56th Conference on Decision and Control (CDC), pp. 5640–5647. Melbourne, Australia, December 12–15 (2017)
Acknowledgements
I have profited a lot from discussions with many colleagues, in part years back. In particular I have to thank K.C. Toh as well as my colleagues O. Ernst, R. Herzog, A. Pichler and M. Stoll in Chemnitz. Much of the preparatory restructuring of ConicBundle was done during my sabbatical at the University of Klagenfurt, thank you to F. Rendl and A. Wiegele for making this possible. Thank you also to several anonymous referees whose constructive criticism helped to improve the presentation. The support of research Grants 05M18OCA of the German Federal Ministry of Education and Research and the DFG CRC 1410 is gratefully acknowledged.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Tables
For each box plot of Figs. 1, 2 and 3 for the three instances MC, BIS and MMBIS the following tables list the number of instances and the values of the parameters minimum, lower quartile (\(Q_1\)), median, upper quartile (\(Q_3\)) and maximum. For each of the three instances an additional table gives the statistics on the Euclidean norm of the resulting residual of (6) achieved by the respective solver for the KKT systems grouped by the usual value ranges of the barrier parameter. For the data to Figs. 4, 5, 6, 7 and 8 see the supplement material in [20].
1.1 Appendix A.1: Maxcut (instance MC, Fig. 1)
Time per subproblem in seconds (338 instances):
Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

DS  0.003832  34.3212  44.6572  55.9219  95.1265 
IT  0.020491  12.6342  26.1729  79.1747  209.58 
RP  0.030548  22.0347  34.3771  92.207  235.653 
DP  0.020898  13.0997  30.6566  58.1994  110.33 
Time per subproblem in seconds vs. ranges of bundle sizes:
Bundle size  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

[6, 46]  5  DS  0.003832  0.006275  0.0088455  0.021121  0.045992 
IT  0.020491  0.021558  0.031059  0.059577  0.095407  
RP  0.030548  0.0314005  0.043048  0.093556  0.146144  
DP  0.020898  0.0230755  0.0309575  0.0592245  0.094453  
[56, 497]  5  DS  0.073058  0.329229  0.657365  1.0668  4.51608 
IT  0.173868  0.374044  0.597913  0.800166  1.83209  
RP  0.273307  0.637968  1.02498  1.4059  3.19528  
DP  0.177225  0.382114  0.59731  0.80289  1.6773  
[596, 1486]  133  DS  6.62451  30.4929  33.6252  36.9311  43.6407 
IT  1.86562  12.7032  25.3882  65.6882  136.935  
RP  3.69818  20.2136  33.3548  77.6126  155.415  
DP  1.96853  13.9917  30.0676  49.426  90.8324  
[1541, 2279]  195  DS  41.2984  46.7501  53.9483  61.8286  95.1265 
IT  9.84382  13.6284  32.0618  90.4042  209.58  
RP  18.53  23.4963  42.6607  106.814  235.653  
DP  9.9354  13.9977  36.494  76.5581  110.33 
Time per subproblem in seconds vs. last barrier parameter \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([2.3e-01,3.6e+01]\)  85  DS  0.003832  23.4027  49.1143  61.5018  95.1265 
IT  0.020491  7.2764  10.8231  11.9651  16.4467  
RP  0.030548  13.4739  19.5347  22.0031  27.3839  
DP  0.020898  7.2377  10.8421  11.9242  18.6599  
\([1.6e-02,2.3e-01]\)  85  DS  24.2787  33.6785  41.3086  48.8582  83.315 
IT  10.8004  13.7717  15.5724  17.2952  33.9445  
RP  17.3893  22.0349  24.1102  27.3977  44.4086  
DP  12.0901  15.1287  17.2843  18.8158  37.7594  
\([7.7e-03,1.6e-02]\)  85  DS  26.8515  35.9349  44.6111  54.1812  73.3521 
IT  23.8603  28.455  33.4883  86.0521  103.067  
RP  30.9349  37.0011  43.6655  100.814  120.387  
DP  28.6909  32.9195  37.3972  74.0808  87.0083  
\([3.1e-04,7.7e-03]\)  83  DS  31.4011  37.4751  45.8135  57.0963  75.7966 
IT  64.8852  75.3411  87.3138  154.968  209.58  
RP  77.5913  87.3499  99.7462  175.749  235.653  
DP  49.3655  53.8412  62.3306  96.8023  110.33 
Time per KKT system in seconds grouped by value ranges of the barrier parameter \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([4.5e+02,9.9e+02]\)  676  IT  0.001092  0.463337  0.578703  0.917693  1.34934 
RP  0.002696  1.54367  1.80541  2.1684  3.22664  
DP  0.00252  0.515255  0.630482  0.916957  1.37408  
\([1.4e+00,5.1e+01]\)  1490  IT  0.005506  1.2121  1.83286  2.31178  6.49419 
RP  0.007486  2.36829  3.09237  3.69645  7.72418  
DP  0.005681  1.22031  1.82556  2.29723  7.32683  
\([1.6e-02,3.1e-01]\)  510  IT  3.20468  5.69856  8.79662  21.8884  56.4563 
RP  4.19829  6.85746  10.4541  23.0927  60.4059  
DP  4.28737  7.03695  10.7758  18.0144  23.1913  
\([3.1e-04,9.3e-03]\)  156  IT  17.2603  22.94  37.6166  52.734  74.6325 
RP  19.0854  25.2808  40.9692  55.18  82.3144  
DP  16.0158  18.3264  19.1491  19.8728  22.2577 
Number of matrix vector multiplications per KKT system grouped by value ranges of \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([4.5e+02,9.9e+02]\)  676  IT  4  6  6  10  12 
RP  4  6  6  10  12  
DP  4  6  6  10  12  
\([1.4e+00,5.1e+01]\)  1490  IT  9  17  22  23  64 
RP  9  17  22  23  64  
DP  9  17  22  23  55  
\([1.6e-02,3.1e-01]\)  510  IT  54  77  84  252  612 
RP  54  77  84  253  627  
DP  45  49  52  54  63  
\([3.1e-04,9.3e-03]\)  156  IT  240  248  550  574  706 
RP  249  253  574  592  739  
DP  31  37  38  42  54 
Condition number estimate of the KKT systems grouped by value ranges of \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([4.5e+02,9.9e+02]\)  676  IT  1.003  1.009  1.009  1.01  1.065 
RP  1.003  1.009  1.009  1.01  1.065  
DP  1.003  1.009  1.009  1.01  1.065  
\([1.4e+00,5.1e+01]\)  1490  IT  1.013  1.473  1.566  2.393  17.06 
RP  1.013  1.473  1.566  2.393  16.98  
DP  1.013  1.473  1.566  2.393  11.38  
\([1.6e-02,3.1e-01]\)  510  IT  12.03  26.72  29.72  253.7  2349 
RP  11.87  26.44  29.33  250.7  2289  
DP  10.62  11.11  11.46  11.75  11.96  
\([3.1e-04,9.3e-03]\)  156  IT  523.5  536.4  2057  2404  4711 
RP  507.1  527.3  2017  2371  4599  
DP  10.17  10.43  10.56  10.66  10.93 
Number of preconditioning columns per KKT system grouped by value ranges of \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([4.5e+02,9.9e+02]\)  676  RP  0  0  0  0  0 
DP  0  0  0  0  0  
\([1.4e+00,5.1e+01]\)  1490  RP  0  0  0  0  6 
DP  0  0  0  0  12  
\([1.6e-02,3.1e-01]\)  510  RP  6  6  6  6  8 
DP  5  171  212  1012  1167  
\([3.1e-04,9.3e-03]\)  156  RP  7  8  10  10  13 
DP  1010  1113  1156  1175  1177 
Euclidean norm of the residual of (6) per KKT system grouped by value ranges of \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([4.5e+02,9.9e+02]\)  676  IT  6.4e-10  1.4e-09  2.6e-09  5.2e-09  1e-06 
RP  6.4e-10  1.4e-09  2.6e-09  5.2e-09  1e-06  
DP  6.4e-10  1.4e-09  2.6e-09  5.2e-09  1e-06  
\([1.4e+00,5.1e+01]\)  1490  IT  3.6e-10  6.7e-09  5.2e-08  3.9e-07  1e-06 
RP  3.6e-10  6.4e-09  5.2e-08  3.9e-07  1.1e-06  
DP  3.6e-10  6.7e-09  5.2e-08  3.9e-07  1e-06  
\([1.6e-02,3.1e-01]\)  510  IT  8.6e-09  7e-07  8.2e-07  9.2e-07  1e-06 
RP  9.7e-09  7.3e-07  9.2e-07  1.1e-06  2.4e-06  
DP  5.2e-09  5.4e-07  6.8e-07  7.9e-07  1.2e-06  
\([3.1e-04,9.3e-03]\)  156  IT  9.7e-09  9.6e-07  9.8e-07  9.9e-07  1e-06 
RP  1.9e-08  1.4e-06  2.2e-06  4.2e-06  6.2e-06  
DP  8.6e-09  5.8e-07  7.1e-07  8.1e-07  1e-06
1.2 Appendix A.2: Minimum bisection (instance BIS, Fig. 2)
Time per subproblem in seconds (195 instances):
Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

DS  0.00923  14.1723  91.7467  359.175  15841.2 
IT  0.044218  42.3879  340.967  3329.25  24016.1 
RP  0.055209  36.3358  238.703  1079.8  4866.25 
DP  0.049162  15.9154  89.9136  277.508  890.086 
Time per subproblem in seconds vs. ranges of bundle sizes:
Bundle size  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

[6, 28]  2  DS  0.00923  –  0.00923  –  0.091545 
IT  0.044218  –  0.044218  –  0.153935  
RP  0.055209  –  0.055209  –  0.222727  
DP  0.049162  –  0.049162  –  0.177788  
[67, 497]  128  DS  0.594552  8.69331  22.7682  132.809  428.513 
IT  0.960953  43.8529  466.297  2076.04  11867  
RP  1.23022  33.9205  219.14  835.573  2856.49  
DP  0.941817  11.8932  53.3373  183.946  560.093  
[529, 1379]  40  DS  34.3749  81.2278  517.036  720.019  1505.5 
IT  9.82356  16.6676  4819.87  15092.7  24016.1  
RP  12.5897  20.357  1586.65  3277.46  4866.25  
DP  10.0625  16.2665  340.397  612.94  890.086  
[1597, 5887]  25  DS  302.944  783.079  1403.96  3335.16  15841.2 
IT  29.3698  85.8307  211.467  294.115  932.896  
RP  38.2829  108.971  248.069  341.147  955.341  
DP  31.1555  87.6845  204.633  295.527  789.044 
Time per subproblem in seconds vs. last barrier parameter \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([1.9e+00,1.0e+03]\)  49  DS  0.00923  8.97908  69.8681  955.047  15841.2 
IT  0.044218  4.08472  14.083  154.701  932.896  
RP  0.055209  4.93747  18.2564  178.33  955.341  
DP  0.049162  4.21385  14.8836  143.015  789.044  
\([2.8e-03,1.4e+00]\)  49  DS  0.594552  9.95126  52.3298  128.666  6981.47 
IT  0.960953  46.9621  218.946  937.902  1927.16  
RP  1.23022  38.2691  183.586  508.265  933.538  
DP  0.941817  15.3137  66.5916  139.005  734.869  
\([9.4e-04,2.8e-03]\)  49  DS  2.12836  18.2489  73.8103  267.252  705.689 
IT  10.3255  198.11  1744.97  4383.54  7789.69  
RP  9.92439  111.622  670.068  1522.76  2315.44  
DP  4.13901  32.1878  138.761  282.768  459.79  
\([1.1e-04,9.4e-04]\)  48  DS  3.77535  20.2483  189.134  599.782  1505.5 
IT  21.8778  575.065  6101.11  13836.1  24016.1  
RP  19.1305  219.14  1524.83  2942.66  4866.25  
DP  5.98724  52.4419  299.974  580.588  890.086 
Time per KKT system in seconds grouped by value ranges of the barrier parameter \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([1.1e+02,2.6e+04]\)  824  IT  0.001823  0.328123  0.786603  1.66599  8.7638 
RP  0.007159  0.410637  0.980222  2.023  11.5388  
DP  0.006695  0.275799  0.669313  1.45762  7.73375  
\([1.0e+00,9.9e+01]\)  2262  IT  0.110818  2.7715  7.29372  14.4584  58.5776 
RP  0.145002  2.60334  6.57376  13.2324  54.0312  
DP  0.10742  1.46962  3.4827  7.40111  37.485  
\([1.0e-02,1.0e+00]\)  1991  IT  0.817783  21.4256  46.2355  85.6325  327.564 
RP  0.744242  11.4276  23.3757  37.8191  100.053  
DP  0.365208  2.70049  5.41858  8.25427  18.679  
\([1.1e-04,1.0e-02]\)  1103  IT  3.30796  173.717  295.614  470.923  1316.81 
RP  2.48871  44.4168  62.784  86.7278  168.322  
DP  0.40273  5.37352  6.99157  10.5791  14.8519 
Number of matrix vector multiplications per KKT system grouped by value ranges of \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([1.1e+02,2.6e+04]\)  824  IT  10  23  30  45  99 
RP  8  19  26  38  77  
DP  7  19  26  38  69  
\([1.0e+00,9.9e+01]\)  2262  IT  23  107  204  386.5  820 
RP  23  97  166  277  555  
DP  23  47  62  80  133  
\([1.0e-02,1.0e+00]\)  1991  IT  141  1024.5  1632  2719.5  5960 
RP  130  548.5  722.5  910.5  1774  
DP  19  31  33  36  70  
\([1.1e-04,1.0e-02]\)  1103  IT  750  5607.5  7552.5  10475  19571 
RP  399  1249  1419.5  1657  2620  
DP  17  31  33  35  41 
Condition number estimate of the KKT systems grouped by value ranges of \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([1.1e+02,2.6e+04]\)  824  IT  2.15  3.773  5.36  12.1  997.8 
RP  1.008  2.465  3.84  7.696  67.38  
DP  1.122  2.469  3.834  7.442  25.34  
\([1.0e+00,9.9e+01]\)  2262  IT  3.681  93.22  277.5  1041  1.234e+04 
RP  2.035  59.19  160  477  2544  
DP  2.035  7.361  15.99  27.31  407.7  
\([1.0e-02,1.0e+00]\)  1991  IT  289.5  1.409e+04  4.986e+04  1.869e+05  2.075e+06 
RP  246.8  3107  6609  1.405e+04  9.741e+04  
DP  1.717  2.29  2.632  3.128  27.66  
\([1.1e-04,1.0e-02]\)  1103  IT  4.501e+04  1.324e+06  2.869e+06  6.53e+06  4.576e+07 
RP  7554  3.375e+04  5.161e+04  8.013e+04  4.099e+05  
DP  1.703  2.048  2.185  2.522  12.55 
Number of preconditioning columns per KKT system grouped by value ranges of \(\mu \):
\(\mu \)Range  #  Solver  Min  \(Q_1\)  Median  \(Q_3\)  Max 

\([1.1e+02,2.6e+04]\)  824  RP  0  0  0  0  6 
DP  0  0  0  0  9  
\([1.0e+00,9.9e+01]\)  2262  RP  0  2  6  8  24 
DP  0  9  43  82  152  
\([1.0e-02,1.0e+00]\)  1991  RP  4  20  25  29  38 
DP  36  113  141  154  169  
\([1.1e-04,1.0e-02]\)  1103  RP  13  31  33  36  44 
DP  68  147.5  159  168  177 
Euclidean norm of the residual of (6) per KKT system grouped by value ranges of \(\mu \):

| \(\mu \)-Range | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \([1.1e+02,2.6e+04]\) | 824 | IT | 9.8e-13 | 9.1e-09 | 3.2e-07 | 5.6e-07 | 1e-06 |
| | | RP | 1.4e-12 | 6.9e-09 | 1.9e-07 | 3.5e-07 | 1.7e-06 |
| | | DP | 1.3e-12 | 7e-09 | 1.9e-07 | 3.5e-07 | 1.1e-06 |
| \([1.0e+00,9.9e+01]\) | 2262 | IT | 2.9e-10 | 4.7e-09 | 6.7e-09 | 5e-07 | 1e-06 |
| | | RP | 3.1e-10 | 4e-09 | 5.7e-09 | 3.3e-07 | 3.2e-06 |
| | | DP | 3.8e-10 | 5e-09 | 7.4e-09 | 3.5e-07 | 2.8e-06 |
| \([1.0e-02,1.0e+00]\) | 1991 | IT | 1.2e-09 | 5e-09 | 9e-09 | 1.9e-08 | 1e-06 |
| | | RP | 1.6e-09 | 6.4e-09 | 1.2e-08 | 2.4e-08 | 2.2e-05 |
| | | DP | 6.6e-10 | 5.2e-09 | 9.4e-09 | 2e-08 | 7.2e-06 |
| \([1.1e-04,1.0e-02]\) | 1103 | IT | 4.3e-09 | 4.9e-08 | 1.1e-07 | 3.3e-07 | 2.9e-06 |
| | | RP | 3.5e-09 | 3.3e-08 | 5.7e-08 | 9.5e-08 | 7.3e-05 |
| | | DP | 2.2e-09 | 3.4e-08 | 6e-08 | 1e-07 | 5.3e-06 |
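All tables in these appendices report five-number summaries (minimum, first quartile, median, third quartile, maximum) over the per-system or per-subproblem measurements. As a minimal sketch of how such a summary can be computed, assuming NumPy's default linear quantile interpolation and using hypothetical residual values (the helper name `five_number_summary` is not from the paper):

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) as reported in the tables."""
    v = np.asarray(values, dtype=float)
    return (v.min(),
            np.quantile(v, 0.25),   # first quartile
            np.median(v),
            np.quantile(v, 0.75),   # third quartile
            v.max())

# hypothetical residual norms for one mu range
residuals = [9.8e-13, 9.1e-09, 3.2e-07, 5.6e-07, 1e-06]
print(five_number_summary(residuals))
```

Note that different quantile conventions (e.g. nearest-rank versus linear interpolation) give slightly different \(Q_1\)/\(Q_3\) values on small samples, which is why the two-instance rows below report "–" for the quartiles.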
Appendix A.3: Min–max bisection (instance MMBIS, Fig. 3)
Time per subproblem in seconds (35 instances):

| Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|
| DS | 0.001669 | 0.958311 | 4942 | 35185.7 | 140651 |
| IT | 0.06925 | 1.7246 | 319.124 | 2069.75 | 13368.7 |
| RP | 0.059093 | 1.02453 | 215.038 | 1412.83 | 7763.78 |
| DP | 0.046371 | 0.864647 | 154.628 | 590.86 | 2083.68 |
Time per subproblem in seconds vs. ranges of bundle sizes:

| Bundle size | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| [0, 37] | 7 | DS | 0.001669 | 0.008018 | 0.0415445 | 0.091635 | 0.240095 |
| | | IT | 0.06925 | 0.098375 | 0.229849 | 0.332737 | 0.752143 |
| | | RP | 0.059093 | 0.069004 | 0.144198 | 0.224022 | 0.449136 |
| | | DP | 0.046371 | 0.056267 | 0.121948 | 0.185404 | 0.391171 |
| [56, 352] | 4 | DS | 0.46651 | 0.46651 | 1.45011 | 3.83018 | 17.636 |
| | | IT | 1.08647 | 1.08647 | 2.36273 | 4.81555 | 16.5215 |
| | | RP | 0.668257 | 0.668257 | 1.3808 | 3.04164 | 9.26137 |
| | | DP | 0.567064 | 0.567064 | 1.16223 | 2.53117 | 6.96995 |
| [1327, 1379] | 2 | DS | 300.616 | – | 300.616 | – | 969.539 |
| | | IT | 56.453 | – | 56.453 | – | 416.7 |
| | | RP | 35.8027 | – | 35.8027 | – | 257.055 |
| | | DP | 28.0559 | – | 28.0559 | – | 167.41 |
| [2702, 16291] | 22 | DS | 1807.41 | 6318.51 | 22730.5 | 53047.4 | 140651 |
| | | IT | 223.837 | 319.124 | 641.539 | 4509.64 | 13368.7 |
| | | RP | 163.342 | 215.038 | 450.345 | 2558.71 | 7763.78 |
| | | DP | 119.399 | 157.057 | 315.234 | 814.558 | 2083.68 |
Time per subproblem in seconds vs. last barrier parameter \(\mu \):

| \(\mu \)-Range | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \([3.6e-02,1.4e+00]\) | 9 | DS | 0.014367 | 0.0464135 | 0.186595 | 2.14835 | 1807.41 |
| | | IT | 0.1275 | 0.285552 | 0.553999 | 2.95101 | 416.7 |
| | | RP | 0.078915 | 0.183008 | 0.342684 | 1.85495 | 257.055 |
| | | DP | 0.066163 | 0.154237 | 0.292596 | 1.54912 | 167.41 |
| \([7.0e-03,3.5e-02]\) | 9 | DS | 1.45011 | 159.126 | 3825.74 | 4942 | 16156.8 |
| | | IT | 2.36273 | 36.4873 | 230.856 | 257.261 | 539.789 |
| | | RP | 1.3808 | 22.5321 | 163.699 | 181.259 | 397.781 |
| | | DP | 1.16223 | 17.5129 | 121.09 | 135.255 | 246.92 |
| \([1.0e-03,5.9e-03]\) | 9 | DS | 0.040436 | 9770.54 | 18267.6 | 30691.4 | 47352.7 |
| | | IT | 0.198213 | 388.296 | 598.446 | 2093.91 | 5072.23 |
| | | RP | 0.134192 | 265.913 | 414.119 | 1302.96 | 2838.2 |
| | | DP | 0.11221 | 189.641 | 276.743 | 511.322 | 890.336 |
| \([2.4e-04,9.6e-04]\) | 8 | DS | 0.001669 | 49409.5 | 64909.4 | 83523.5 | 140651 |
| | | IT | 0.06925 | 1582.35 | 5497.04 | 8664.78 | 13368.7 |
| | | RP | 0.059093 | 1175.55 | 3060.07 | 4787.77 | 7763.78 |
| | | DP | 0.046371 | 614.286 | 954.903 | 1273.72 | 2083.68 |
Time per KKT system in seconds grouped by value ranges of the barrier parameter \(\mu \):

| \(\mu \)-Range | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \([1.0e+02,1.0e+03]\) | 295 | IT | 0.002025 | 0.259585 | 11.9083 | 22.8061 | 54.4588 |
| | | RP | 0.001737 | 0.159694 | 7.95722 | 14.0881 | 32.6083 |
| | | DP | 0.001638 | 0.119556 | 5.4995 | 10.3867 | 23.2037 |
| \([1.0e+00,9.9e+01]\) | 146 | IT | 0.00647 | 1.8041 | 17.2027 | 28.9793 | 72.3436 |
| | | RP | 0.005452 | 1.23622 | 11.287 | 20.3489 | 36.1462 |
| | | DP | 0.003676 | 0.98141 | 9.03463 | 15.5483 | 28.2709 |
| \([1.2e-02,1.0e+00]\) | 159 | IT | 0.003261 | 8.2878 | 16.9553 | 27.6698 | 101.194 |
| | | RP | 0.003848 | 5.98822 | 13.8806 | 24.4045 | 79.1858 |
| | | DP | 0.002763 | 4.14727 | 8.67571 | 18.2895 | 52.1301 |
| \([2.4e-04,8.6e-03]\) | 182 | IT | 0.002879 | 205.544 | 310.59 | 439.276 | 833.569 |
| | | RP | 0.003367 | 123.886 | 170.845 | 253.51 | 417.102 |
| | | DP | 0.00254 | 33.7196 | 42.3111 | 58.5254 | 79.5814 |
Number of matrix vector multiplications per KKT system grouped by value ranges of \(\mu \):

| \(\mu \)-Range | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \([1.0e+02,1.0e+03]\) | 295 | IT | 13 | 57 | 69 | 85 | 146 |
| | | RP | 7 | 21 | 29 | 31 | 45 |
| | | DP | 7 | 21 | 28 | 32.5 | 45 |
| \([1.0e+00,9.9e+01]\) | 146 | IT | 28 | 67 | 76 | 85 | 133 |
| | | RP | 17 | 33 | 34 | 37 | 51 |
| | | DP | 18 | 32 | 33 | 35 | 42 |
| \([1.2e-02,1.0e+00]\) | 159 | IT | 23 | 63 | 107 | 181 | 321 |
| | | RP | 17 | 40 | 81 | 117 | 175 |
| | | DP | 17 | 31 | 56 | 74.5 | 84 |
| \([2.4e-04,8.6e-03]\) | 182 | IT | 18 | 478 | 761 | 987.5 | 1590 |
| | | RP | 13 | 321.5 | 420 | 503 | 726 |
| | | DP | 14 | 69.5 | 89 | 92.5 | 110 |
Condition number estimate of the KKT systems grouped by value ranges of \(\mu \):

| \(\mu \)-Range | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \([1.0e+02,1.0e+03]\) | 295 | IT | 12.93 | 6950 | 7879 | 9417 | 1.496e+06 |
| | | RP | 1.001 | 2.869 | 4.467 | 14.82 | 81.57 |
| | | DP | 1.001 | 1.549 | 3.146 | 5.245 | 13.87 |
| \([1.0e+00,9.9e+01]\) | 146 | IT | 3.102 | 213.5 | 714.8 | 6152 | 4.1e+05 |
| | | RP | 1.434 | 5.384 | 8.388 | 38.71 | 271.3 |
| | | DP | 2.706 | 5.471 | 6.97 | 8.305 | 11.16 |
| \([1.2e-02,1.0e+00]\) | 159 | IT | 2.177 | 404.9 | 1566 | 2739 | 2.984e+04 |
| | | RP | 1.639 | 43.97 | 79.24 | 195.1 | 730.4 |
| | | DP | 2.031 | 6.9 | 8.502 | 10.03 | 12.24 |
| \([2.4e-04,8.6e-03]\) | 182 | IT | 1.179 | 3.755e+04 | 5.674e+04 | 9.099e+04 | 2.553e+05 |
| | | RP | 1.131 | 580.6 | 737.5 | 1039 | 2200 |
| | | DP | 1.169 | 11.08 | 11.61 | 12.17 | 20.08 |
Number of preconditioning columns per KKT system grouped by value ranges of \(\mu \):

| \(\mu \)-Range | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \([1.0e+02,1.0e+03]\) | 295 | RP | 0 | 0 | 0 | 0 | 4 |
| | | DP | 0 | 0 | 0 | 1 | 3 |
| \([1.0e+00,9.9e+01]\) | 146 | RP | 0 | 0 | 0 | 1 | 3 |
| | | DP | 0 | 0 | 0 | 1 | 2 |
| \([1.2e-02,1.0e+00]\) | 159 | RP | 0 | 0 | 2 | 3 | 5 |
| | | DP | 0 | 2 | 3 | 15 | 84 |
| \([2.4e-04,8.6e-03]\) | 182 | RP | 0 | 3 | 4 | 6 | 11 |
| | | DP | 0 | 111.5 | 127 | 148.5 | 175 |
Euclidean norm of the residual of (6) per KKT system grouped by value ranges of \(\mu \):

| \(\mu \)-Range | # | Solver | Min | \(Q_1\) | Median | \(Q_3\) | Max |
|---|---|---|---|---|---|---|---|
| \([1.0e+02,1.0e+03]\) | 295 | IT | 1e-09 | 7.8e-08 | 2.5e-07 | 5e-07 | 9.8e-07 |
| | | RP | 2.7e-11 | 3.6e-09 | 1e-08 | 5e-08 | 5.3e-07 |
| | | DP | 3.4e-11 | 3.8e-09 | 1.2e-08 | 3.8e-08 | 3.9e-07 |
| \([1.0e+00,9.9e+01]\) | 146 | IT | 3.3e-09 | 3.4e-08 | 3.3e-07 | 6.5e-07 | 1e-06 |
| | | RP | 1.2e-12 | 2.3e-08 | 9.1e-08 | 1.8e-07 | 1.1e-06 |
| | | DP | 2.2e-10 | 1.7e-08 | 7.6e-08 | 2.5e-07 | 1.1e-06 |
| \([1.2e-02,1.0e+00]\) | 159 | IT | 6.5e-10 | 1.3e-08 | 1.4e-07 | 4.9e-07 | 9.8e-07 |
| | | RP | 5.4e-10 | 8.5e-09 | 8.8e-08 | 1.7e-07 | 7.7e-07 |
| | | DP | 2e-10 | 1.1e-08 | 1.2e-07 | 2.9e-07 | 8.9e-07 |
| \([2.4e-04,8.6e-03]\) | 182 | IT | 3.5e-09 | 3.4e-08 | 7.8e-08 | 6.5e-07 | 1.3e-06 |
| | | RP | 6.5e-10 | 1.2e-08 | 2.3e-08 | 8.5e-08 | 7.7e-07 |
| | | DP | 7.8e-10 | 1.5e-08 | 3.1e-08 | 1e-07 | 9e-07 |
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Helmberg, C. A preconditioned iterative interior point approach to the conic bundle subproblem. Math. Program. 205, 559–615 (2024). https://doi.org/10.1007/s10107-023-01986-w
Keywords
 Low rank preconditioner
 Quadratic semidefinite programming
 Nonsmooth optimization
 Interior point method