1 Introduction

The class of semi-definite linear complementarity problems (SDLCPs), which contains the class of semi-definite programs (SDPs) as an important subclass, has many real-life applications, for example, in optimal control, estimation and signal processing, communications and networks, statistics, and finance [3]. Semi-definite programming has wide applications in NP-hard combinatorial problems [8] and global optimization, where it is used to find bounds on optimal values and to find approximate solutions. Interior point methods have been proven to be successful in solving linear programs, with many works in the literature devoted to their study since the 1980s. Semi-definite programs are extensions of linear programs to the space of symmetric matrices, and interior point methods have been successfully extended from solving linear programs to solving semi-definite programs with the same polynomial complexity results—see for example [1, 14, 17, 45].

Among different interior point methods (IPMs), primal–dual path following interior point algorithms are the most successful and most widely studied. Due to the difficulty in maintaining symmetry in the linearized complementarity equation when using a primal–dual path following interior point method to solve an SDP, researchers working in the IPM domain have proposed ways to overcome this problem, resulting in different symmetrized search directions [1, 11, 14, 17, 18, 20, 21, 42] being introduced. Among these search directions, the Alizadeh–Haeberly–Overton (AHO), Helmberg–Kojima–Monteiro (HKM) and Nesterov–Todd (NT) directions are better known, with the latter two being implemented in SDP solvers such as SeDuMi and SDPT3. Various studies have been conducted to analyze primal–dual path following interior point algorithms using the HKM search direction, such as [25], and the NT search direction, such as [16], to solve semi-definite programs. The works [19, 27, 30] also give a unified polynomial complexity analysis of interior point algorithms on semi-definite programs using a commutative class of search directions, which includes the HKM and NT directions. More recently, a stream of research [10, 29, 43, 44] has appeared that derives polynomial complexity for a full NT step interior point method in solving linear programs, semi-definite programs and symmetric cone programs. Recent works on interior point methods include [2, 6, 28], which design new interior point algorithms, different from the one considered in this paper, to solve symmetric cone programs.

The focus of this paper is on analyzing an infeasible predictor–corrector primal–dual path following interior point algorithm, using the NT search direction, to solve semi-definite linear complementarity problems. We consider an infeasible interior point algorithm since it is more practical than a feasible interior point algorithm, as it is usually difficult to find an initial interior point iterate which is also feasible. The algorithm considered in this paper was studied in [25] to solve semi-definite programs using the HKM search direction. The analysis in [25] to show global convergence and polynomial complexity cannot be carried over in a straightforward manner to analyze the algorithm using the search direction considered in this paper. Our contributions in this paper include showing polynomial complexity \({\mathcal {O}}(n \ln ( \max \{n\tau _0, \Vert r_0\Vert _2\}/\epsilon ))\) for the infeasible interior point algorithm using the NT search direction to solve a semi-definite linear complementarity problem. Our result complements the result obtained in [25], which considers the algorithm using the HKM search direction to solve semi-definite programs. Our iteration complexity bound is the best iteration bound known so far in the literature for infeasible interior point algorithms using the “narrow” neighborhood. To the best of our knowledge, it is also the first time this polynomial complexity is derived for an infeasible predictor–corrector primal–dual path following interior point algorithm, using the NT search direction, on a semi-definite linear complementarity problem. Furthermore, under the strict complementarity assumption, we provide local convergence results in Sect. 4 that are analogous to those in [34]. Note that superlinear convergence results using interior point methods are hard to obtain; to quote the opening sentences of [22]: “Local superlinear convergence is a natural and very desirable property of many methods in nonlinear optimization. However, for interior-point methods the corresponding analysis is not trivial”. It is worthwhile mentioning that among these local convergence results, we show that for the important class of linear semi-definite feasibility problems, only a suitably chosen initial iterate is needed for superlinear convergence, unlike what is generally believed to be needed to achieve superlinear convergence, namely, for iterates to get close to the central path by repeatedly solving the corrector-step linear system in an iteration (see for example [13, 22]). We should also mention that although local convergence results using the NT search direction have been established in [16] by “narrowing” the central path neighborhood, the algorithm considered there generates feasible iterates, while here we consider an infeasible algorithm, which is more practical, and the analysis for the infeasible case is more complicated than that for the feasible case. Finally, our results in this paper indicate that using the NT search direction in an interior point algorithm to solve semi-definite linear complementarity problems is as good as using the HKM search direction, both from the “polynomial complexity” and the “local convergence” point of view.

1.1 Facts, notations and terminology

The space of symmetric \(n \times n\) matrices is denoted by \(S^n\). The cone of positive semi-definite (resp., positive definite) symmetric matrices is denoted by \(S^n_+\) (resp., \(S^n_{++}\)). The identity matrix is denoted by \(I_{n \times n}\), where n stands for the size of the matrix. We omit the subscript when the size of the identity matrix is clear from the context. Given a symmetric matrix G, \(\lambda _{\min }(G)\) and \(\lambda _{\max }(G)\) denote the minimum and maximum eigenvalues of G, respectively.

Given matrices G and K in \(\mathfrak {R}^{n_1 \times n_2}\), the inner product, \(G \bullet K\), between the two matrices is defined to be \(G \bullet K := {{\mathrm{Tr}}}(G^TK) = {{\mathrm{Tr}}}(GK^T)\), where \({{\mathrm{Tr}}}(\cdot )\) is the trace of a square matrix. \(\Vert \cdot \Vert _2\) for a vector in \(\mathfrak {R}^n\) refers to its Euclidean norm, and for a matrix in \(\mathfrak {R}^{n_1 \times n_2}\), it refers to its operator norm. On the other hand, \(\Vert G \Vert _F := \sqrt{G \bullet G}\), for \(G \in \mathfrak {R}^{n_1 \times n_2}\), refers to the Frobenius norm of G.

For a matrix \(G \in \mathfrak {R}^{n_1 \times n_2}\), we denote its component in the ith row and the jth column by \(G_{ij}\). \(G_{i \cdot }\) denotes the ith row of G and \(G_{\cdot j}\) the jth column of G. In the case when G is partitioned into blocks of submatrices, \(G_{ij}\) refers to the submatrix in the corresponding (i, j) position.

Given square matrices \(G_i \in \mathfrak {R}^{n_i \times n_i}, i =1, \ldots , N\), \({\mathrm{Diag}}(G_1, \ldots , G_N)\) is a square matrix with \(G_i, i = 1, \ldots , N\), as its main diagonal blocks arranged in accordance to the way they are lined up in \({\mathrm{Diag}}(G_1, \ldots , G_N)\). All the other entries in \({\mathrm{Diag}}(G_1, \ldots , G_N)\) are zeroes.

Given \(X \in S^n\), \({\mathrm{svec}}(X)\) is defined to be

$$\begin{aligned} {\mathrm{svec}}(X) := (X_{11}, \sqrt{2}X_{21}, \ldots , \sqrt{2}X_{n1}, X_{22}, \sqrt{2} X_{32}, \ldots , \sqrt{2} X_{n2}, \ldots , X_{nn})^T \in \mathfrak {R}^{{\tilde{n}}}, \end{aligned}$$

where \({\tilde{n}} := n(n+1)/2\). \({\mathrm{svec}}(\cdot )\) sets up a one-to-one correspondence between \(S^n\) and \(\mathfrak {R}^{{\tilde{n}}}\).

Note that for all \(X, Y \in S^n\), \(X \bullet Y = {\mathrm{svec}}(X)^T {\mathrm{svec}}(Y)\). Hence, \(\Vert X \Vert _F = \Vert {\mathrm{svec}}(X) \Vert _2\) for \(X \in S^n\).
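As an illustration of this correspondence, the following is a minimal numpy sketch (our own helper, not taken from the paper or from any particular solver) of \({\mathrm{svec}}(\cdot )\), together with a numerical check of the identity \(X \bullet Y = {\mathrm{svec}}(X)^T {\mathrm{svec}}(Y)\).

```python
import numpy as np

def svec(X):
    """svec as defined above: columnwise, diagonal entry first,
    strictly lower-triangular entries scaled by sqrt(2)."""
    n = X.shape[0]
    out = []
    for j in range(n):
        out.append(X[j, j])
        out.extend(np.sqrt(2.0) * X[j + 1:, j])
    return np.array(out)

# numerical check of X • Y = svec(X)^T svec(Y) on random symmetric matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); X = (A + A.T) / 2
B = rng.standard_normal((4, 4)); Y = (B + B.T) / 2
assert np.isclose(np.trace(X @ Y), svec(X) @ svec(Y))
assert svec(X).size == 4 * 5 // 2      # ñ = n(n+1)/2
```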

Given \(G, K \in \mathfrak {R}^{n \times n}\), \(G \otimes _s K\) is a square matrix of size \({\tilde{n}}\) defined by

$$\begin{aligned} (G \otimes _s K) {\mathrm{svec}}(H) := \frac{1}{2} {\mathrm{svec}}(KHG^T + GHK^T), \ \forall \ H \in S^n. \end{aligned}$$

Fact 1

(Appendix of [39]) Let \(G, K, L \in \mathfrak {R}^{n \times n}\).

  1. (a)

    \(G \otimes _s K = K \otimes _s G\) and \((G \otimes _s K)^T = G^T \otimes _s K^T\).

  2. (b)

    \((G \otimes _s K)(L \otimes _s L) = (GL)\otimes _s(KL)\) and \((L \otimes _s L)(G \otimes _s K)= (LG) \otimes _s (LK)\).

  3. (c)

    Suppose G and K are commuting symmetric matrices, and let \(\{x_i\}\) be their common basis of eigenvectors with corresponding eigenvalues \(\lambda _i^G\) and \(\lambda _i^K\). Then \(G \otimes _s K\) is symmetric and has the set of eigenvalues given by \(\left\{ \frac{1}{2}(\lambda _i^G \lambda _j^K + \lambda _j^G \lambda _i^K) \right\} \). Also, \({\mathrm{svec}}\left( \frac{1}{2}(x_i x_j^T + x_j x_i^T)\right) \) is an eigenvector corresponding to the eigenvalue \(\frac{1}{2} (\lambda _i^G \lambda _j^K + \lambda _j^G \lambda _i^K)\) of \(G \otimes _s K\).
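To make the definition of \(G \otimes _s K\) and Fact 1(c) concrete, here is a hedged numpy sketch (the routines svec, smat and skron are our own names; smat is the inverse of svec) that forms \(G \otimes _s K\) as an \({\tilde{n}} \times {\tilde{n}}\) matrix from its defining action and checks the eigenvalue description in Fact 1(c) numerically for a pair of commuting symmetric matrices.

```python
import numpy as np

def svec(X):
    n = X.shape[0]
    return np.array([X[j, j] if i == j else np.sqrt(2.0) * X[i, j]
                     for j in range(n) for i in range(j, n)])

def smat(x, n):
    X = np.zeros((n, n)); k = 0
    for j in range(n):
        for i in range(j, n):
            X[i, j] = X[j, i] = x[k] if i == j else x[k] / np.sqrt(2.0)
            k += 1
    return X

def skron(G, K):
    """Matrix of G ⊗_s K on svec coordinates, built column by column."""
    n = G.shape[0]; nt = n * (n + 1) // 2
    M = np.zeros((nt, nt))
    for col in range(nt):
        e = np.zeros(nt); e[col] = 1.0
        H = smat(e, n)                       # basis element of S^n
        M[:, col] = svec(0.5 * (K @ H @ G.T + G @ H @ K.T))
    return M

# Fact 1(c): commuting symmetric G, K sharing the eigenvectors of Q
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
lG, lK = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
G, K = Q @ np.diag(lG) @ Q.T, Q @ np.diag(lK) @ Q.T
pred = sorted(0.5 * (lG[i] * lK[j] + lG[j] * lK[i])
              for i in range(3) for j in range(i, 3))
assert np.allclose(sorted(np.linalg.eigvalsh(skron(G, K))), pred)
```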

Fact 2

([12]) For \(G, K \in \mathfrak {R}^{n \times n}\), \(\Vert GK \Vert _F \le \min \{\Vert G\Vert _F \Vert K\Vert _2, \Vert G\Vert _2 \Vert K\Vert _F\}\).

Fact 3

For \(x \in \mathfrak {R}, x \ge 0\), we have

$$\begin{aligned} \frac{\sqrt{1 + x} - 1}{\sqrt{1 + x} + 1} \le \frac{\sqrt{x}}{1 + \sqrt{x}}. \end{aligned}$$
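One way to verify Fact 3 (a short argument we add here for completeness): for \(x > 0\),

$$\begin{aligned} \frac{\sqrt{1 + x} - 1}{\sqrt{1 + x} + 1} = \frac{x}{(\sqrt{1 + x} + 1)^2} \quad {\mathrm{and}} \quad \frac{\sqrt{x}}{1 + \sqrt{x}} = \frac{x}{\sqrt{x} + x}, \end{aligned}$$

so the inequality is equivalent to \((\sqrt{1 + x} + 1)^2 = x + 2\sqrt{1 + x} + 2 \ge x + \sqrt{x}\), which holds since \(2\sqrt{1 + x} + 2 \ge 2\sqrt{x} \ge \sqrt{x}\); the case \(x = 0\) is immediate.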

Let \(f: \Omega \rightarrow E\) and \(g: \Omega \rightarrow \mathfrak {R}_{++}\) be functions, where \(\Omega \) is an arbitrary set and E is a normed vector space with norm \(\Vert \cdot \Vert \). For a subset \({\hat{\Omega }} \subseteq \Omega \), we write \(f(w) = {\mathcal {O}}(g(w))\) for all \(w \in {\hat{\Omega }}\) to mean that \(\Vert f(w) \Vert \le M g(w)\) for all \(w \in {\hat{\Omega }}\), where \(M > 0\) is a positive constant. Suppose \(E = S^n\). Then we write \(f(w) = \Theta (g(w))\) if for all \(w \in {\hat{\Omega }}\), \(f(w) \in S^n_{++}\), \(f(w) = {\mathcal {O}}(g(w))\) and \(f(w)^{-1} = {\mathcal {O}}(1/g(w))\). The subset \({\hat{\Omega }}\) should be clear from the context. For example, \({\hat{\Omega }} = (0,{\hat{w}})\) for some \({\hat{w}} > 0\) or \({\hat{\Omega }} = \{w_k\ ; \ k \ge 0\}\), where \(w_k \rightarrow 0\) as \(k \rightarrow \infty \). In the latter case, we write \(f(w_k) = o(g(w_k))\) to mean that \(\Vert f(w_k) \Vert /g(w_k) \rightarrow 0\), as \(k \rightarrow \infty \).

2 A primal–dual path following interior point algorithm on an SDLCP

We consider a semi-definite linear complementarity problem (SDLCP), which is the problem of finding a solution, \((X,Y)\), to the following system:

$$\begin{aligned} XY= & {} 0, \end{aligned}$$
(1)
$$\begin{aligned} {\mathcal {A}}(X) + {\mathcal {B}}(Y)= & {} q, \end{aligned}$$
(2)
$$\begin{aligned} X, Y\in & {} S^n_+, \end{aligned}$$
(3)

where \(q \in \mathfrak {R}^{{\tilde{n}}}\) and \({\mathcal {A}}, {\mathcal {B}}: S^n \rightarrow \mathfrak {R}^{{\tilde{n}}}\) are linear operators mapping \(S^n\) to the space \(\mathfrak {R}^{{\tilde{n}}}\), \({\tilde{n}} := n(n+1)/2\). \({\mathcal {A}}\) and \({\mathcal {B}}\) take the form

$$\begin{aligned} {\mathcal {A}}(X) = (A_1 \bullet X, \ldots , A_{{\tilde{n}}} \bullet X)^T,\ {\mathcal {B}}(Y) = (B_1 \bullet Y, \ldots , B_{{\tilde{n}}} \bullet Y)^T, \end{aligned}$$
(4)

where \(A_i, B_i \in S^n\) for all \(i = 1, \ldots , {\tilde{n}}\). We also call the system (1)–(3) an SDLCP.

The following assumptions are assumed to hold for the system (1)–(3) in this and the next section; in Sect. 4 we replace Assumption 1(b) by Assumption 2, while Assumptions 1(a), (c) are still assumed there.

Assumption 1

  1. (a)

    System (1)–(3) is monotone. That is, \({\mathcal {A}}(X) + {\mathcal {B}}(Y) = 0\) for \(X, Y \in S^n \Rightarrow X \bullet Y \ge 0\).

  2. (b)

    There exists \((X^1,Y^1) \in S^n_{++} \times S^n_{++}\) such that \({\mathcal {A}}(X^1) + {\mathcal {B}}(Y^1) = q\).

  3. (c)

    \(\{ {\mathcal {A}}(X) + {\mathcal {B}}(Y)\ ; \ X, Y \in S^n \} = \mathfrak {R}^{{\tilde{n}}}\).

The first assumption [Assumption 1(a)] is satisfied for the class of semi-definite programs (SDPs), with equality \(X \bullet Y = 0\) holding instead of the inequality. The second assumption ensures that (2) is satisfied by some positive definite matrix pair, while the last assumption is a technical assumption that can be satisfied for any SDP. Note that Assumption 1(b) is only used in this paper to ensure the existence of a solution to the SDLCP (1)–(3).

An SDP in its primal and dual form is given by

$$\begin{aligned} \begin{array}{lll} ({\mathcal {P}}) &{} \min &{} C \bullet X \\ &{} {\mathrm{subject\ to}} &{} A_i \bullet X = b_i, \ \ i = 1, \ldots , m, \\ &{} &{} X \in S^n_+, \\ ({\mathcal {D}}) &{} \max &{} \sum \nolimits _{i=1}^m b_i y_i \\ &{} {\mathrm{subject\ to}} &{} \sum \nolimits _{i=1}^m y_i A_i + Y = C, \\ &{} &{} Y \in S^n_+. \end{array} \end{aligned}$$

In the above formulation of an SDP, we may assume without loss of generality that \(A_i, i = 1, \ldots , m\), are linearly independent.

An SDP is a special case of an SDLCP by letting \(A_i = 0\) for \(i = m+1, \ldots , {\tilde{n}}\), \(B_i = 0\) for \(i = 1, \ldots , m\), in (4). \(B_i, i = m+1, \ldots , {\tilde{n}}\), in (4) are chosen to be linearly independent and belong to the subspace in \(S^n\) orthogonal to the space spanned by \(A_i, i = 1, \ldots , m\).
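The following is a hedged numpy sketch of this reduction: the \(A_i\) are padded with zeros, the remaining \(B_i\) are taken as an orthonormal basis of the orthogonal complement of \({\mathrm{span}}\{A_1, \ldots , A_m\}\) (obtained here from an SVD), and the vector q stacks b with \(B_i \bullet C\); this last choice is our reading of how the dual feasibility constraint \(Y = C - \sum _i y_i A_i\) is encoded, and is not spelled out in the text above. All routine names are ours.

```python
import numpy as np

def svec(X):
    n = X.shape[0]
    return np.array([X[j, j] if i == j else np.sqrt(2.0) * X[i, j]
                     for j in range(n) for i in range(j, n)])

def sdp_to_sdlcp(A_list, b, C):
    """Cast SDP data (A_1..A_m, b, C) as SDLCP data (script-A, script-B, q)
    acting on svec coordinates, following the reduction described above."""
    n = C.shape[0]; nt = n * (n + 1) // 2; m = len(A_list)
    A_rows = np.array([svec(A) for A in A_list])     # rows svec(A_i)^T, m x nt
    # orthonormal basis of the orthogonal complement of span{svec(A_i)}
    _, _, Vt = np.linalg.svd(A_rows)
    comp = Vt[m:]                                    # (nt - m) x nt
    scrA = np.vstack([A_rows, np.zeros((nt - m, nt))])
    scrB = np.vstack([np.zeros((m, nt)), comp])
    q = np.concatenate([b, comp @ svec(C)])          # assumed encoding of dual feasibility
    return scrA, scrB, q
```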

Primal–dual path following interior point algorithms can be used to solve an SDLCP. We consider an infeasible predictor–corrector primal–dual path following interior point algorithm, as found in [25, 34], in this paper. In [25, 34], the search direction used is the Helmberg–Kojima–Monteiro (HKM) search direction [11, 14, 17], while in this paper, we consider the algorithm using the Nesterov–Todd (NT) search direction [20, 21]. The difference between the two search directions is the way “symmetrization” is done on (1). For an invertible matrix \(P \in \mathfrak {R}^{n \times n}\), the similarly transformed symmetrization operator \(H_P(\cdot )\), introduced in [45], is given by

$$\begin{aligned} H_P(U) := \frac{1}{2}(PUP^{-1} + (PUP^{-1})^T), \end{aligned}$$

where \(U \in \mathfrak {R}^{n \times n}\). Hence, \(H_P(\cdot )\) is a map from \(\mathfrak {R}^{n \times n}\) to \(S^n\). Different search directions correspond to different P, as will be explained later.
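For concreteness, here is a minimal sketch (ours) of \(H_P(\cdot )\) in numpy; for large problems one would avoid forming \(P^{-1}\) explicitly, but the direct formula suffices for illustration.

```python
import numpy as np

def H(P, U):
    """H_P(U) = (P U P^{-1} + (P U P^{-1})^T) / 2, a map from R^{n x n} to S^n."""
    PUPinv = P @ U @ np.linalg.inv(P)
    return 0.5 * (PUPinv + PUPinv.T)
```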

Infeasible primal–dual path following interior point algorithms work on the principle that the iterates generated by the algorithm “follow” an (infeasible) central path \((X(\mu ),Y(\mu ))\), \(\mu > 0\), which is the unique solution to

$$\begin{aligned} XY= & {} \mu I, \\ {\mathcal {A}}(X) + {\mathcal {B}}(Y)= & {} q + \frac{\mu }{\mu _0} r_0, \\ X, Y\in & {} S^n_{++}, \end{aligned}$$

where

$$\begin{aligned} r_0 := {\mathcal {A}}(X_0) + {\mathcal {B}}(Y_0) - q, \end{aligned}$$

for some \(X_0, Y_0 \in S^n_{++}\) and \(\mu _0 > 0\). Here, \(X_0, Y_0\) are such that \(X_0 Y_0 = \mu _0 I\). The existence and uniqueness of this central path follows from Theorem 2.3 in [35]. It also follows from Theorem 2.4 in [35] that there exists a solution \((X^*, Y^*)\) to the SDLCP (1)–(3). Although these theorems apply to the feasible central path when \(r_0 = 0\) in [35], they can be easily shown to hold for an infeasible central path when \(r_0 \not = 0\) by assuming Assumptions 1(a)–(c). We leave their proofs as exercises for the reader.

From now onwards, \((X^*,Y^*)\) denotes a solution to the SDLCP (1)–(3).

Using \(H_P(\cdot )\), the above central path \((X(\mu ),Y(\mu ))\), \(\mu > 0\), is also the unique solution to

$$\begin{aligned} H_P(XY)= & {} \mu I, \\ {\mathcal {A}}(X) + {\mathcal {B}}(Y)= & {} q + \frac{\mu }{\mu _0} r_0, \\ X, Y\in & {} S^n_{++}, \end{aligned}$$

since we have for \(X, Y \in S^n_{++}\) and \(\mu > 0\),

$$\begin{aligned} H_P(XY) = \mu I \Leftrightarrow XY = \mu I. \end{aligned}$$

Below we describe the infeasible predictor–corrector primal–dual path following interior point algorithm considered in this paper. This algorithm is the same as that in [25], although there is a wider choice for \(\beta _1, \beta _2\) here. The key to the algorithm is solving the following system of linear equations:

$$\begin{aligned} H_P(X \Delta Y + \Delta X Y)= & {} \sigma \tau I - H_P(XY), \end{aligned}$$
(5)
$$\begin{aligned} {\mathcal {A}}(\Delta X) + {\mathcal {B}}(\Delta Y)= & {} - {\bar{r}}, \end{aligned}$$
(6)

for \(\Delta X, \Delta Y \in S^n\), where \(\tau > 0\), \(0 \le \sigma \le 1\) and \({\bar{r}} \in \mathfrak {R}^{{\tilde{n}}}\). Different choices of \(\tau \), \(\sigma \) and \({\bar{r}}\) result in the different steps of Algorithm 1 described below.

The following (narrow) neighborhood of the central path is used in this paper:

$$\begin{aligned} {\mathcal {N}}_1(\beta , \tau ) = \{ (X, Y) \in S^n_{++} \times S^n_{++} \ ; \Vert H_P(XY) - \tau I \Vert _F \le \beta \tau \}, \end{aligned}$$
(7)

where \(\tau > 0\) and \(0< \beta < 1\).
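A small sketch (ours) of the membership test behind (7); the scaling matrix P is passed in explicitly since, as noted next, it is tied to the pair (X, Y).

```python
import numpy as np

def in_neighborhood(X, Y, P, tau, beta):
    """Test whether (X, Y) lies in N_1(beta, tau) for the given P, as in (7)."""
    n = X.shape[0]
    H = P @ X @ Y @ np.linalg.inv(P)
    H = 0.5 * (H + H.T)                                   # H_P(XY)
    pos_def = np.linalg.eigvalsh(X).min() > 0 and np.linalg.eigvalsh(Y).min() > 0
    return pos_def and np.linalg.norm(H - tau * np.eye(n), 'fro') <= beta * tau
```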

Note that the P that appears in (5) and (7) is not chosen arbitrarily, but is related to \(X, Y \in S^n_{++}\), as we will see after describing the algorithm we are analyzing in this paper.

Algorithm 1

Given \(\epsilon > 0\). Choose \(\beta _1 < \beta _2\), with \(\beta _2^2/(2(1-\beta _2)) \le \beta _1< \beta _2< \beta _2/(1-\beta _2) < 1\). Choose \((X_0,Y_0) \in {\mathcal {N}}_1(\beta _1,\tau _0)\) with \(\mu _0 = \tau _0 = X_0 \bullet Y_0/n\). For \(k = 0, 1, \ldots \), do (a1) through (a3):

(a1):

If \(\max \{ X_k \bullet Y_k, \Vert r_k\Vert _2 \} \le \epsilon \), where \(r_k = {\mathcal {A}}(X_k) + {\mathcal {B}}(Y_k) - q\), then report \((X_k,Y_k)\) as an approximate solution to the system (1)–(3), and terminate.

(a2):

(Predictor Step) Find the solution \((\Delta X_k^p, \Delta Y_k^p)\) of the linear system (5), (6), with \(X = X_k, Y = Y_k, P = P_k\), \(\sigma = 0\), \(\tau = \tau _k\) and \({\overline{r}} = r_k\). Define

$$\begin{aligned} {\hat{X}}_k = X_k + {\hat{\alpha }}_k \Delta X_k^p,\ \ {\hat{Y}}_k = Y_k + {\hat{\alpha }}_k \Delta Y_k^p, \end{aligned}$$

where the steplength \({\hat{\alpha }}_k\) satisfies

$$\begin{aligned} \alpha _{k,1} \le {\hat{\alpha }}_k \le \alpha _{k,2}. \end{aligned}$$
(8)

Here,

$$\begin{aligned} \alpha _{k,1}= & {} \frac{2}{\sqrt{1 + 4 \delta _k/(\beta _2 - \beta _1)} + 1}, \end{aligned}$$
(9)
$$\begin{aligned} \delta _k= & {} \frac{1}{\tau _k} \Vert H_{P_k}(\Delta X_k^p \Delta Y_k^p) \Vert _F, \end{aligned}$$
(10)

and

$$\begin{aligned} \alpha _{k,2}= & {} \max \{ {\tilde{\alpha }} \in [0,1]\ ;\ (X_k + \alpha \Delta X_k^p, Y_k + \alpha \Delta Y_k^p)\nonumber \\&\in {\mathcal {N}}_1(\beta _2,(1-\alpha )\tau _k) \ \forall \ \alpha \in [0, {\tilde{\alpha }}] \}. \end{aligned}$$
(11)

If \({\hat{\alpha }}_k = 1\), then \(({\hat{X}}_k,{\hat{Y}}_k)\) solves the system (1)–(3); terminate.

(a3):

(Corrector Step) Find the solution \((\Delta X_k^c, \Delta Y_k^c)\) of the linear system (5), (6), with \(X = {\hat{X}}_k, Y = {\hat{Y}}_k, P = {\hat{P}}_k\), \(\sigma = (1 - {\hat{\alpha }}_k)\), \(\tau = \tau _k\) and \({\overline{r}} = 0\). Set

$$\begin{aligned} \begin{array}{c} X_{k+1} = {\hat{X}}_k + \Delta X_k^c,\ \ Y_{k+1} = {\hat{Y}}_k + \Delta Y_k^c, \\ \tau _{k+1} = (1 - {\hat{\alpha }}_k) \tau _k. \end{array} \end{aligned}$$

Set \(k+1 \rightarrow k\) and go to Step (a1).

The above algorithm is an infeasible predictor–corrector primal–dual path following interior point algorithm. \(P_k\) in the algorithm is chosen such that \(P_k X_k Y_k P_k^{-1} \in S^n\). Examples of \(P_k\) that satisfy this are \(P_k = Y_k^{1/2}\), and \(P_k\) such that \(P_k^T P_k = W_k^{-1}\) with \(W_k Y_k W_k = X_k\). The former corresponds to the dual HKM search direction, while the latter corresponds to the NT search direction. \({\hat{P}}_k\) in the algorithm is also chosen to satisfy \({\hat{P}}_k {\hat{X}}_k {\hat{Y}}_k {\hat{P}}_k^{-1} \in S^n\).
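The following is a hedged numpy sketch of one way to form the NT scaling point: W is computed from the closed form \(W = X^{1/2}(X^{1/2} Y X^{1/2})^{-1/2} X^{1/2}\), which satisfies \(WYW = X\), and P is taken as \(W^{-1/2}\), one of several possible factors with \(P^T P = W^{-1}\); the paper does not prescribe a particular factorization, so this is only an illustration.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def nt_scaling(X, Y):
    """Return (W, P) with W Y W = X and P^T P = W^{-1} (here P = W^{-1/2})."""
    Xh = sym_sqrt(X)
    W = Xh @ np.linalg.inv(sym_sqrt(Xh @ Y @ Xh)) @ Xh
    P = np.linalg.inv(sym_sqrt(W))
    return W, P

# sanity checks: W Y W = X, and P X Y P^{-1} is symmetric, so H_P(XY) = P X Y P^{-1}
rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4)); X = M @ M.T + np.eye(4)
N = rng.standard_normal((4, 4)); Y = N @ N.T + np.eye(4)
W, P = nt_scaling(X, Y)
assert np.allclose(W @ Y @ W, X)
S = P @ X @ Y @ np.linalg.inv(P)
assert np.allclose(S, S.T)
tau = np.trace(X @ Y) / 4
print(np.linalg.norm(S - tau * np.eye(4), 'fro'))   # distance used in the neighborhood (7)
```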

We remark that \((\Delta X_k^p, \Delta Y_k^p)\) and \((\Delta X_k^c, \Delta Y_k^c)\) in Algorithm 1 exist and are unique. This follows by observing that, when (5) and (6) are written together as a single matrix–vector equation, the coefficient matrix is invertible, which holds because \((X_k, Y_k) \in {\mathcal {N}}_1(\beta _1,\tau _k)\) (by Proposition 4) and \(({\hat{X}}_k,{\hat{Y}}_k) \in {\mathcal {N}}_1(\beta _2, (1 - {\hat{\alpha }}_k) \tau _k)\), respectively. (We leave the details of the existence and uniqueness of \((\Delta X_k^p, \Delta Y_k^p)\) and \((\Delta X_k^c, \Delta Y_k^c)\) to the reader.) Furthermore, in the above algorithm, we note that there is a wider range of choices for \(\beta _1\) and \(\beta _2\) compared with the algorithm in [25].

Let us make an observation on our choice of \(P_k, {\hat{P}}_k\) in the following proposition:

Proposition 1

Suppose \(P \in \mathfrak {R}^{n \times n}\) is an invertible matrix with \(P X Y P^{-1} \in S^n\), where \(X, Y \in S^n_{++}\). Then \(PXP^{T}\) and \(P^{-T} Y P^{-1}\) have a common set of eigenvectors with corresponding real positive eigenvalues \(\lambda _i^X\) and \(\lambda _i^Y\), \(i = 1, \ldots , n\), respectively. Also, \(PXYP^{-1}\) has the same set of eigenvectors with corresponding eigenvalues \(\lambda _i^X \lambda _i^Y\), \(i = 1, \ldots , n\).

Proof

Since \(PXYP^{-1} \in S^n\), this implies that \(PXP^T, P^{-T}YP^{-1} \in S^n_{++}\) commute. Hence they share a common set of eigenvectors with corresponding real positive eigenvalues \(\lambda _i^X, \lambda _i^Y, i = 1, \ldots , n\). Furthermore, it is easy to see that \(PXYP^{-1} = (PXP^{T})(P^{-T}YP^{-1})\) has the same set of eigenvectors with corresponding eigenvalues \(\lambda _i^X \lambda _i^Y, i = 1, \ldots , n\). \(\square \)

From now onwards, we consider Algorithm 1 using the NT search direction in Steps (a2) and (a3) of the algorithm, which means that \(P_k\) and \({\hat{P}}_k\) in these steps satisfy \(P_k^T P_k = W_k^{-1}\) with \(W_k Y_k W_k = X_k\), and \({\hat{P}}_k^T {\hat{P}}_k = {\hat{W}}_k^{-1}\) with \({\hat{W}}_k {\hat{Y}}_k {\hat{W}}_k = {\hat{X}}_k\), respectively. Proposition 1 then applies to \(P_k\) and \({\hat{P}}_k\) given in this way since they satisfy \(P_k X_k Y_k P_k^{-1}, {\hat{P}}_k {\hat{X}}_k {\hat{Y}}_k {\hat{P}}_k^{-1} \in S^n\), respectively. Furthermore, there are different ways in which \(P_k\) and \({\hat{P}}_k\) can be chosen to form the NT search direction in these steps of Algorithm 1. In Sect. 4, using a particular choice of \(P_k\), we establish two sufficient conditions for superlinear convergence using the algorithm on SDLCPs that satisfy the strict complementarity assumption.

We also require the P that appears in (7) to be related to X, Y in (7) by \(P^T P = W^{-1}\) with \(WYW = X\).

The following are properties satisfied by \((X_k,Y_k)\) and \(({\hat{X}}_k,{\hat{Y}}_k)\) in Algorithm 1, which are useful in the analysis, given later in the paper, of the convergence behavior of the iterates generated by the algorithm.

Proposition 2

Let \((X,Y) \in {\mathcal {N}}_1(\beta ,\tau )\), where \(\tau > 0\), \(0< \beta < 1\), and suppose P satisfies \(P^T P = W^{-1}\) with \(WYW = X\). Then

$$\begin{aligned} PXP^T= & {} P^{-T}YP^{-1} \end{aligned}$$
(12)
$$\begin{aligned} X \bullet Y \le n(1 + \beta )\tau , \quad (1 - \beta )\tau \le \lambda _{\mathrm{{min}}}(XY) \le \lambda _{\mathrm{{max}}}(XY)\le & {} (1 + \beta )\tau , \end{aligned}$$
(13)
$$\begin{aligned} \Vert PXP^T\Vert _F \le \sqrt{(1+\beta )n\tau }, \quad \Vert P^{-T}YP^{-1} \Vert _F\le & {} \sqrt{(1+ \beta )n \tau }, \end{aligned}$$
(14)
$$\begin{aligned} \Vert PXP^T\Vert _2 \le \sqrt{(1+\beta )\tau }, \quad \Vert P^{-T}YP^{-1} \Vert _2\le & {} \sqrt{(1+ \beta ) \tau }, \end{aligned}$$
(15)
$$\begin{aligned} \Vert (PXP^{T})^{-1/2} \Vert _2 \le \frac{1}{((1 - \beta )\tau )^{1/4}}, \quad \Vert (P^{-T}YP^{-1})^{-1/2} \Vert _2\le & {} \frac{1}{((1 - \beta ) \tau )^{1/4}},\nonumber \\ \end{aligned}$$
(16)
$$\begin{aligned} \Vert [I \otimes _s (P X P^T)]^{-1} \Vert _2\le & {} \frac{1}{\sqrt{(1 - \beta )\tau }}. \end{aligned}$$
(17)

Proof

The relation (12) follows immediately from \(P^T P = W^{-1}\), \(W Y W = X\) and then taking inverses.

Since \((X,Y) \in {\mathcal {N}}_1(\beta ,\tau )\), we have \(X, Y \in S^n_{++}\) and

$$\begin{aligned} \Vert H_P(XY) - \tau I \Vert _F \le \beta \tau . \end{aligned}$$

By Proposition 1 and observing that \(H_P(XY) = PXYP^{-1}\), we have using the notations in Proposition 1 that

$$\begin{aligned} \sqrt{\sum _{i=1}^n (\lambda _i^X \lambda _i^Y - \tau )^2} \le \beta \tau . \end{aligned}$$

Hence, for all \(i = 1, \ldots , n\),

$$\begin{aligned} (1-\beta )\tau \le \lambda _i^X \lambda _i^Y \le (1 + \beta )\tau , \end{aligned}$$
(18)

therefore (13) follows.

By (12), \(\lambda _i^X = \lambda _i^Y\) for all \(i = 1, \ldots , n\). The second inequality in (18) then implies (14), (15). Also, from (18) and \(\lambda _i^X = \lambda _i^Y\), we have

$$\begin{aligned} \frac{1}{\lambda _i^X} = \frac{1}{\lambda _i^Y} \le \frac{1}{\sqrt{(1-\beta )\tau }} \end{aligned}$$

for all \(i = 1, \ldots , n\). Inequalities in (16) then follow.

Using Fact 1(c), \(I \otimes _s (PXP^T)\) is symmetric and

$$\begin{aligned} \lambda _{\min }(I \otimes _s (PXP^T)) = \min _{i,j=1,\ldots , n} \frac{1}{2}(\lambda _i^X + \lambda _j^X) \ge \sqrt{(1-\beta )\tau }. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert [I \otimes _s (P X P^T)]^{-1} \Vert _2 = \frac{1}{\lambda _{\min }(I \otimes _s (PXP^T))} \le \frac{1}{\sqrt{(1-\beta )\tau }}, \end{aligned}$$

which shows (17). \(\square \)

The following technical result leads to Proposition 4, which ensures that the set in (11) is nonempty.

Proposition 3

Let \((X,Y) \in {\mathcal {N}}_1(\beta , \tau )\), where \(\tau > 0\), \(0< \beta < 1\), and let P satisfy \(P^TP = W^{-1}\) with \(WYW = X\). Suppose \(U, V \in S^n\) are such that

$$\begin{aligned} {\mathcal {A}}(U) + {\mathcal {B}}(V)= & {} 0, \end{aligned}$$
(19)
$$\begin{aligned} H_P(XV + UY)= & {} R, \end{aligned}$$
(20)

then

$$\begin{aligned} \Vert (P \otimes _s P) {\mathrm{svec}}(U) \Vert _2^2 + \Vert (P^{-T} \otimes _s P^{-T}) {\mathrm{svec}}(V) \Vert _2^2 \le \frac{1}{(1 - \beta )\tau } \Vert {\mathrm{svec}}(R) \Vert _2^2. \end{aligned}$$

Proof

Relation (20) can be written as

$$\begin{aligned} (P \otimes _s (P^{-T}Y)) {\mathrm{svec}}(U) + ((PX) \otimes _s P^{-T}) {\mathrm{svec}}(V) = {\mathrm{svec}}(R). \end{aligned}$$

The latter can in turn be expressed as

$$\begin{aligned}&[I \otimes _s (P^{-T} Y P^{-1})](P \otimes _s P){\mathrm{svec}}(U) + [I \otimes _s (P X P^T)](P^{-T} \otimes _s P^{-T}){\mathrm{svec}}(V)\nonumber \\&\quad = {\mathrm{svec}}(R), \end{aligned}$$
(21)

by Fact 1(b).

Since P satisfies \(P^TP = W^{-1}\) with \(WYW=X\), by (12), we have \(PXP^{T} = P^{-T} Y P^{-1}\). Hence, from (21), taking the inverse of \(I \otimes _s (P X P^T)\), we get

$$\begin{aligned} (P \otimes _s P){\mathrm{svec}}(U) + (P^{-T} \otimes _s P^{-T}){\mathrm{svec}}(V) = [I \otimes _s (P X P^T)]^{-1}{\mathrm{svec}}(R). \end{aligned}$$
(22)

Hence,

$$\begin{aligned}&\Vert (P \otimes _s P){\mathrm{svec}}(U) \Vert ^2_2 + \Vert (P^{-T} \otimes _s P^{-T}){\mathrm{svec}}(V) \Vert _2^2 \\&\quad \le \Vert (P \otimes _s P){\mathrm{svec}}(U) \Vert ^2_2 + 2 {\mathrm{svec}}(U)^T {\mathrm{svec}}(V) + \Vert (P^{-T} \otimes _s P^{-T}){\mathrm{svec}}(V) \Vert _2^2 \\&\quad = \Vert (P \otimes _s P){\mathrm{svec}}(U) + (P^{-T} \otimes _s P^{-T}){\mathrm{svec}}(V) \Vert _2^2 \\&\quad = \Vert [I \otimes _s (P X P^T)]^{-1}{\mathrm{svec}}(R) \Vert ^2_2 \\&\quad \le \Vert [I \otimes _s (P X P^T)]^{-1} \Vert _2^2 \Vert {\mathrm{svec}}(R) \Vert _2^2 \\&\quad \le \frac{1}{(1 - \beta )\tau } \Vert {\mathrm{svec}}(R) \Vert _2^2, \end{aligned}$$

where the first inequality follows from (19) and Assumption 1(a), the first equality follows from (22), and the last inequality follows from Proposition 2. \(\square \)

The following result shows that iterates \((X_k,Y_k)\) generated by Algorithm 1 always belong to the narrow neighborhood of the central path (7).

Proposition 4

For all \(k \ge 0\), \((X_k, Y_k) \in {\mathcal {N}}_1(\beta _1,\tau _k)\).

Proof

We prove the proposition by induction. It is easy to see that \((X_0, Y_0) \in {\mathcal {N}}_1(\beta _1, \tau _0)\). Hence, the proposition holds for \(k = 0\). Suppose the proposition holds for \(k = k_0\), where \(k_0 \ge 0\), that is, \((X_{k_0},Y_{k_0}) \in {\mathcal {N}}_1(\beta _1, \tau _{k_0})\). We wish to show that

$$\begin{aligned} \Vert H_{P_{k_0+1}}(X_{k_0+1}Y_{k_0+1}) - \tau _{k_0+1}I \Vert _F \le \beta _1 \tau _{k_0+1}, \end{aligned}$$

which then proves the proposition by induction. First observe that

$$\begin{aligned}&\Vert H_{P_{k_0+1}}(X_{k_0+1}Y_{k_0+1}) - \tau _{k_0+1}I \Vert _F \nonumber \\&\quad = \Vert P_{k_0+1} X_{k_0+1} Y_{k_0+1} P_{k_0+1}^{-1} - \tau _{k_0+1}I \Vert _F \nonumber \\&\quad = \Vert P_{k_0+1} [X_{k_0+1}Y_{k_0+1} - \tau _{k_0+1}I] P_{k_0+1}^{-1} \Vert _F \nonumber \\&\quad \le \Vert H_{{\hat{P}}_{k_0} P_{k_0+1}^{-1}}[ P_{k_0+1}[X_{k_0+1}Y_{k_0+1} - \tau _{k_0+1}I]P_{k_0+1}^{-1}] \Vert _F \nonumber \\&\quad = \Vert H_{{\hat{P}}_{k_0}}(X_{k_0+1}Y_{k_0+1}) - \tau _{k_0+1}I \Vert _F, \end{aligned}$$
(23)

where the first equality follows since \(P_{k_0+1} X_{k_0+1}Y_{k_0+1} P_{k_0+1}^{-1} \in S^n\), and the inequality follows from Lemma 2.2 in [25], again using \(P_{k_0+1} X_{k_0+1} Y_{k_0+1} P_{k_0+1}^{-1} \in S^n\). Next, observe that

$$\begin{aligned}&H_{{\hat{P}}_{k_0}}(X_{k_0+1}Y_{k_0+1}) - \tau _{k_0+1}I \\&\quad = H_{{\hat{P}}_{k_0}}(({\hat{X}}_{k_0} + \Delta X_{k_0}^c)({\hat{Y}}_{k_0} + \Delta Y_{k_0}^c)) - \tau _{k_0+1} I \\&\quad = H_{{\hat{P}}_{k_0}}({\hat{X}}_{k_0} {\hat{Y}}_{k_0}) + H_{{\hat{P}}_{k_0}}({\hat{X}}_{k_0} \Delta Y_{k_0}^c + \Delta X_{k_0}^c {\hat{Y}}_{k_0}) + H_{{\hat{P}}_{k_0}}(\Delta X_{k_0}^c \Delta Y_{k_0}^c) - \tau _{k_0+1} I \\&\quad = H_{{\hat{P}}_{k_0}}(\Delta X_{k_0}^c \Delta Y_{k_0}^c), \end{aligned}$$

where the third equality holds since \((\Delta X_{k_0}^c, \Delta Y_{k_0}^c)\) is the solution to the linear system (5), (6), in which \( X = {\hat{X}}_{k_0}, Y = {\hat{Y}}_{k_0}, P = {\hat{P}}_{k_0}, \sigma = 1, \tau =\tau _{k_0+1}\) and \({\bar{r}} = 0\). Hence,

$$\begin{aligned}&\Vert H_{{\hat{P}}_{k_0}}(X_{k_0+1}Y_{k_0+1}) - \tau _{k_0+1} I \Vert _F \nonumber \\&\quad = \Vert H_{{\hat{P}}_{k_0}}(\Delta X_{k_0}^c \Delta Y_{k_0}^c) \Vert _F \nonumber \\&\quad \le \Vert {\hat{P}}_{k_0} \Delta X_{k_0}^c \Delta Y_{k_0}^c {\hat{P}}_{k_0}^{-1} \Vert _F \nonumber \\&\quad \le \Vert {\hat{P}}_{k_0} \Delta X_{k_0}^c {\hat{P}}_{k_0}^{T} \Vert _F \Vert {\hat{P}}_{k_0}^{-T} \Delta Y_{k_0}^c {\hat{P}}_{k_0}^{-1} \Vert _F \nonumber \\&\quad = \Vert {\mathrm{svec}}({\hat{P}}_{k_0} \Delta X_{k_0}^c {\hat{P}}_{k_0}^T) \Vert _2 \Vert {\mathrm{svec}}({\hat{P}}_{k_0}^{-T} \Delta Y_{k_0}^c {\hat{P}}_{k_0}^{-1}) \Vert _2 \nonumber \\&\quad = \Vert ({\hat{P}}_{k_0} \otimes _s {\hat{P}}_{k_0}){\mathrm{svec}}(\Delta X_{k_0}^c) \Vert _2 \Vert ({\hat{P}}_{k_0}^{-T} \otimes _s {\hat{P}}_{k_0}^{-T}) {\mathrm{svec}}(\Delta Y_{k_0}^c) \Vert _2 \nonumber \\&\quad \le \frac{1}{2}[ \Vert ({\hat{P}}_{k_0} \otimes _s {\hat{P}}_{k_0}){\mathrm{svec}}(\Delta X_{k_0}^c) \Vert _2^2 + \Vert ({\hat{P}}_{k_0}^{-T} \otimes _s {\hat{P}}_{k_0}^{-T}) {\mathrm{svec}}(\Delta Y_{k_0}^c) \Vert _2^2], \end{aligned}$$
(24)

where the second inequality follows from Fact 2 and \(\Vert U \Vert _2 \le \Vert U\Vert _F\) for \(U \in S^n\).

Since \((\Delta X_{k_0}^c, \Delta Y_{k_0}^c)\) is the solution to the linear system (5), (6), in which \(X = {\hat{X}}_{k_0}, Y = {\hat{Y}}_{k_0}, P = {\hat{P}}_{k_0}, \sigma = 1, \tau =\tau _{k_0+1}\) and \({\bar{r}} = 0\), where \(({\hat{X}}_{k_0},{\hat{Y}}_{k_0}) \in {\mathcal {N}}_1(\beta _2, \tau _{k_0+1})\), it follows from Proposition 3 that

$$\begin{aligned}&\Vert ({\hat{P}}_{k_0} \otimes _s {\hat{P}}_{k_0}){\mathrm{svec}}(\Delta X_{k_0}^c) \Vert _2^2 + \Vert ({\hat{P}}_{k_0}^{-T} \otimes _s {\hat{P}}_{k_0}^{-T}) {\mathrm{svec}}(\Delta Y_{k_0}^c) \Vert _2^2 \\&\quad \le \frac{1}{(1- \beta _2)\tau _{k_0+1}}\Vert {\mathrm{svec}}(H_{{\hat{P}}_{k_0}}({\hat{X}}_{k_0}{\hat{Y}}_{k_0}) - \tau _{k_0+1}I) \Vert _2^2. \end{aligned}$$

Therefore, the above inequality, together with (23) and (24), leads to

$$\begin{aligned}&\Vert H_{P_{k_0+1}}(X_{k_0+1}Y_{k_0+1}) - \tau _{k_0+1}I \Vert _F \\&\quad \le \frac{1}{2(1- \beta _2)\tau _{k_0+1}}\Vert {\mathrm{svec}}(H_{{\hat{P}}_{k_0}}({\hat{X}}_{k_0}{\hat{Y}}_{k_0}) - \tau _{k_0+1}I) \Vert _2^2 \\&\quad \le \frac{\beta _2^2}{2(1-\beta _2)} \tau _{k_0+1}, \end{aligned}$$

where the last inequality holds since \(({\hat{X}}_{k_0},{\hat{Y}}_{k_0}) \in {\mathcal {N}}_1(\beta _2, \tau _{k_0+1})\).

Since \(\beta _1, \beta _2 \in (0,1)\) are chosen such that \(\beta _2^2/(2(1-\beta _2)) \le \beta _1\), we have from above, \((X_{k_0+1},Y_{k_0+1}) \in {\mathcal {N}}_1(\beta _1,\tau _{k_0+1})\). By induction, the proposition is proved. \(\square \)

As mentioned earlier, the above proposition ensures that \(\alpha _{k,2}\) given by (11) is meaningful. The following result allows us to say more about \(\alpha _{k,2}\) and also shows that we can always find \({\hat{\alpha }}_k\) that satisfies (8) in Algorithm 1.

Proposition 5

We have \(\alpha _{k,1} \le \alpha _{k,2}\).

Proof

The proposition is proved by showing that for all \(0 \le \alpha \le \alpha _{k,1}\),

$$\begin{aligned} (X_k + \alpha \Delta X_k^p, Y_k + \alpha \Delta Y_k^p) \in {\mathcal {N}}_1(\beta _2, (1 - \alpha ) \tau _k). \end{aligned}$$

We have for all \(0 \le \alpha \le \alpha _{k,1}\)

$$\begin{aligned}&\Vert H_{P_k}((X_k + \alpha \Delta X_k^p)(Y_k + \alpha \Delta Y_k^p)) - (1 - \alpha )\tau _k I \Vert _F \\&\quad = \Vert H_{P_k}(X_k Y_k) + \alpha H_{P_k}(X_k \Delta Y_k^p + \Delta X_k^p Y_k) + \alpha ^2 H_{P_k}(\Delta X_k^p \Delta Y_k^p) {-} (1 {-} \alpha )\tau _k I \Vert _F \\&\quad = \Vert (1 - \alpha )[H_{P_k}(X_k Y_k) - \tau _k I] + \alpha ^2 H_{P_k}(\Delta X_k^p \Delta Y_k^p) \Vert _F \\&\quad \le (1 - \alpha )\Vert H_{P_k}(X_k Y_k) - \tau _k I \Vert _F + \alpha ^2 \Vert H_{P_k}(\Delta X_k^p \Delta Y_k^p) \Vert _F \\&\quad \le (1 - \alpha )\beta _1 \tau _k + \alpha ^2 \Vert H_{P_k}(\Delta X_k^p \Delta Y_k^p) \Vert _F \\&\quad \le \beta _2 (1 - \alpha )\tau _k, \end{aligned}$$

where the second equality holds as \((\Delta X_k^p, \Delta Y_k^p)\) satisfies the linear system (5), (6), with \(X = X_k, Y = Y_k, P = P_k\), \(\sigma = 0\), \(\tau = \tau _k\) and \({\overline{r}} = r_k\), the second inequality holds as \((X_k, Y_k) \in {\mathcal {N}}_1(\beta _1,\tau _k)\), and the last inequality follows from (9), (10). Hence, since \(X_k, Y_k \in S^n_{++}\) and \(\beta _2 < 1\), we see that if \(0< \alpha \le \alpha _{k,1} < 1\), \(X_k + \alpha \Delta X_k^p, Y_k + \alpha \Delta Y_k^p \in S^n_{++}\).

Now, since \(X_k + \alpha \Delta X_k^p, Y_k + \alpha \Delta Y_k^p \in S^n_{++}\), the following holds.

$$\begin{aligned}&\Vert H_{P_k(\alpha )}((X_k + \alpha \Delta X_k^p)(Y_k + \alpha \Delta Y_k^p)) - (1 - \alpha )\tau _k I \Vert _F \\&\quad = \Vert P_k(\alpha )[(X_k + \alpha \Delta X_k^p)(Y_k + \alpha \Delta Y_k^p) - (1 - \alpha ) \tau _k I]P_k(\alpha )^{-1} \Vert _F \\&\quad \le \Vert H_{P_k P_k(\alpha )^{-1}}(P_k(\alpha )[(X_k + \alpha \Delta X_k^p)(Y_k + \alpha \Delta Y_k^p) - (1 - \alpha ) \tau _k I]P_k(\alpha )^{-1}) \Vert _F \\&\quad = \Vert H_{P_k}( (X_k + \alpha \Delta X_k^p)(Y_k + \alpha \Delta Y_k^p)) - (1 - \alpha ) \tau _k I \Vert _F, \end{aligned}$$

where \(P_k(\alpha )\) is such that \(P_k(\alpha )^T P_k(\alpha ) = W_k(\alpha )^{-1}\) with \(W_k(\alpha )(Y_k + \alpha \Delta Y_k^p) W_k(\alpha ) = X_k + \alpha \Delta X_k^p\) and hence \(P_k(\alpha )(X_k + \alpha \Delta X_k^p)(Y_k + \alpha \Delta Y_k^p)P_k(\alpha )^{-1} \in S^n\), so that the above inequality holds by Lemma 2.2 in [25].

Putting everything together, if \(0 \le \alpha \le \alpha _{k,1}\), we have \((X_k + \alpha \Delta X_k^p, Y_k + \alpha \Delta Y_k^p) \in {\mathcal {N}}_1(\beta _2, (1 - \alpha )\tau _k)\), as required. \(\square \)

We remark that the above proposition implies that \(\alpha _{k,2}\) given by (11) is always positive.

3 Global convergence and polynomial complexity of interior point algorithm

In this section, we show global convergence and iteration complexity results for iterates \(\{(X_k,Y_k)\}\) generated by Algorithm 1, by considering the “duality gap”, \(\mu _k := X_k \bullet Y_k/n\), and the “feasibility gap”, \(r_k = {\mathcal {A}}(X_k) + {\mathcal {B}}(Y_k) - q\). The following proposition relates \({\hat{\alpha }}_k\), \(\tau _k\) and \((X_k,Y_k)\) generated by Algorithm 1 with \(\mu _k\) and \(r_k\). First, we have the following definition.

Definition 1

Define for \(k \ge 0\), \(\psi _k = \Pi _{j=0}^{k} (1 - {\hat{\alpha }}_j)\). Also, define \(\psi _{-1} = 1\).

Proposition 6

For all \(k \ge 0\), \((1 - \beta _1)\tau _k \le \mu _k \le (1 + \beta _1)\tau _k\) and \({\mathcal {A}}(X_k) + {\mathcal {B}}(Y_k) - q = r_k = \psi _{k-1} r_0\).

Proof

By Proposition 4, for all \(k \ge 0\), \((X_k, Y_k) \in {\mathcal {N}}_1(\beta _1, \tau _k)\). Hence,

$$\begin{aligned} \Vert H_{P_k}(X_k Y_k) - \tau _k I \Vert _F \le \beta _1 \tau _k. \end{aligned}$$

It then follows from Proposition 1 that

$$\begin{aligned} \sqrt{\sum _{i=1}^n (\lambda _i^{X_k} \lambda _i^{Y_k} - \tau _k)^2} \le \beta _1 \tau _k, \end{aligned}$$

where \(\lambda _i^{X_k} \lambda _i^{Y_k}\), \(i = 1, \ldots , n\) are the eigenvalues of \(P_k X_k Y_k P_k^{-1}\). Therefore, for all \(i = 1, \ldots , n\),

$$\begin{aligned} (1 - \beta _1)\tau _k \le \lambda _i^{X_k} \lambda _i^{Y_k} \le (1 + \beta _1)\tau _k. \end{aligned}$$

The first result in the proposition then follows by noting that \(X_k \bullet Y_k = \sum _{i=1}^{n} \lambda _i^{X_k} \lambda _i^{Y_k}\).

Next, we show that

$$\begin{aligned} {\mathcal {A}}(X_k) + {\mathcal {B}}(Y_k) = q + \psi _{k-1} r_0 \end{aligned}$$
(25)

by induction on \(k \ge 0\).

Equality in (25) holds for \(k = 0\). Suppose (25) holds for \(k = k_0\) for some \(k_0 \ge 0\). Then

$$\begin{aligned}&{\mathcal {A}}(X_{k_0+1}) + {\mathcal {B}}(Y_{k_0+1}) \\&\quad = {\mathcal {A}}({\hat{X}}_{k_0}) + {\mathcal {B}}({\hat{Y}}_{k_0}) \\&\quad = {\mathcal {A}}(X_{k_0} + {\hat{\alpha }}_{k_0} \Delta X^p_{k_0}) + {\mathcal {B}}(Y_{k_0} + {\hat{\alpha }}_{k_0} \Delta Y^p_{k_0}) \\&\quad = {\mathcal {A}}(X_{k_0}) + {\mathcal {B}}(Y_{k_0}) + {\hat{\alpha }}_{k_0}(q - {\mathcal {A}}(X_{k_0}) - {\mathcal {B}}(Y_{k_0})) \\&\quad = {\hat{\alpha }}_{k_0} q + (1 - {\hat{\alpha }}_{k_0})({\mathcal {A}}(X_{k_0}) + {\mathcal {B}}(Y_{k_0})) \\&\quad = {\hat{\alpha }}_{k_0} q + (1 - {\hat{\alpha }}_{k_0})(q + \psi _{k_0-1} r_0) \\&\quad = q + \psi _{k_0} r_0, \end{aligned}$$

where the first equality follows from \(X_{k_0+1} = {\hat{X}}_{k_0} + \Delta X_{k_0}^c, Y_{k_0+1} = {\hat{Y}}_{k_0} + \Delta Y_{k_0}^c\) and \((\Delta X_{k_0}^c,\Delta Y_{k_0}^c)\) satisfying (6) with \({\bar{r}} = 0\), the third equality follows from \((\Delta X_{k_0}^p,\Delta Y_{k_0}^p)\) satisfying (6) with \({\bar{r}} = r_{k_0}\), and the fifth equality follows by the induction hypothesis. Hence, (25) holds for \(k = k_0 + 1\), and by induction, (25) holds for all \(k \ge 0\). \(\square \)

To show global convergence and iteration complexity results using \(\mu _k\) and \(r_k\), we only need to investigate the behavior of \(\psi _{k-1}\), since \(\tau _k = \psi _{k-1} \tau _0\), \(\mu _k \le (1 + \beta _1) \tau _k\) and \(r_k = \psi _{k-1} r_0\). By the definition of \(\psi _k\) in Definition 1, this is achieved by analyzing \({\hat{\alpha }}_j\). We instead consider \(\alpha _{j,1}\), given by (9), since it is a lower bound for \({\hat{\alpha }}_j\). Since \(\alpha _{j,1}\) is expressed in terms of \(\delta _j\), to analyze \(\alpha _{j,1}\), we only need to analyze \(\delta _j\). We have the following upper bound on \(\delta _j\).

Lemma 1

For all \(k \ge 0\), we have

$$\begin{aligned} \delta _k \le L_x L_y, \end{aligned}$$

where

$$\begin{aligned} L_x= & {} \frac{1}{\sqrt{1-\beta _1}} \left[ \beta _1 + \sqrt{n} + (2 + \beta _1 + \varsigma ) n \left( \varsigma _x + \frac{\sqrt{1 + \beta _1}}{\sqrt{1 - \beta _1}}(\varsigma _x + \varsigma _y) \right) \right] , \qquad \end{aligned}$$
(26)
$$\begin{aligned} L_y= & {} \frac{1}{\sqrt{1-\beta _1}} \left[ \beta _1 + \sqrt{n} + (2 + \beta _1 + \varsigma ) n \left( \varsigma _y + \frac{\sqrt{1 + \beta _1}}{\sqrt{1 - \beta _1}}(\varsigma _x + \varsigma _y) \right) \right] . \end{aligned}$$
(27)

Here,

$$\begin{aligned} \varsigma:= & {} \frac{X_0 \bullet Y^*+ X^*\bullet Y_0}{X_0 \bullet Y_0}, \\ \varsigma _x:= & {} 1 + \Vert P_0 X^*P_0^T \Vert _F \Vert (P_0X_0P_0^T)^{-1/2} \Vert _2^2, \\ \varsigma _y:= & {} 1 + \Vert P_0^{-T}Y^*P_0^{-1} \Vert _F \Vert (P_0^{-T} Y_0 P_0^{-1})^{-1/2} \Vert _2^2. \end{aligned}$$

We prove the above lemma in Sect. 3.1, as its proof is quite involved.

Based on Lemma 1, we have the following global convergence theorem using Algorithm 1.

Theorem 1

Given \((X_0,Y_0) \in {\mathcal {N}}_1(\beta _1,\tau _0)\), we have \(\mu _k \rightarrow 0\) and \(r_k \rightarrow 0\) as \(k \rightarrow \infty \), and hence any accumulation point of the sequence \(\{(X_k,Y_k)\}\) generated by Algorithm 1 is a solution to the SDLCP (1)–(3).

Proof

Since \((X_0,Y_0) \in S^n_{++} \times S^n_{++}\) and \((X^*, Y^*)\) is a solution to the SDLCP (1)–(3), it is easy to see that \(L_x\) and \(L_y\) given by (26), (27) respectively are positive constants, since \(\varsigma \), \(\varsigma _x\) and \(\varsigma _y\) are constants. Therefore, by the relation between \(\delta _j\) and \(L_x, L_y\) in Lemma 1, \(\delta _j\) is bounded above by a positive constant (that depends on \(X_0, Y_0, X^*, Y^*\)), say, L, independent of \(j \ge 0\). From (9), we therefore have

$$\begin{aligned} \alpha _{j,1} \ge \frac{2}{\sqrt{1 + 4L/(\beta _2 - \beta _1)} + 1}, \end{aligned}$$

for all \(j \ge 0\). Since

$$\begin{aligned} \psi _k= & {} \Pi _{j=0}^{k} (1 - {\hat{\alpha }}_j) \\\le & {} \Pi _{j=0}^{k} (1 - \alpha _{j,1}) \\\le & {} \left( 1 - \frac{2}{\sqrt{1 + 4L/(\beta _2 - \beta _1)} + 1} \right) ^{k+1}, \end{aligned}$$

\(\psi _k\) tends to zero as \(k \rightarrow \infty \).

Therefore, the theorem is proved by applying Proposition 6. \(\square \)

Let us now state an iteration complexity result using Algorithm 1.

Theorem 2

Let \((X_0, Y_0) \in {\mathcal {N}}_1(\beta _1,\tau _0)\) in Algorithm 1 be chosen such that there exists \((X^*, Y^*)\) a solution to the SDLCP (1)–(3) with \(\max \{ \Vert P_0 X^*P_0^{T} \Vert _F, \Vert P_0^{-T} Y^*P_0^{-1} \Vert _F \} \le \sqrt{\tau _0}\). Then given \(\epsilon > 0\), if Algorithm 1 does not stop at Step (a2) before the kth iteration with a solution to the SDLCP (1)–(3), it stops at the kth iteration in Step (a1), where \(k = {\mathcal {O}}(n \ln ( \max \{n\tau _0, \Vert r_0\Vert _2\}/\epsilon ))\), with an \(\epsilon \)-approximate solution to the SDLCP (1)–(3).

Proof

We are given \((X_0, Y_0)\) in \({\mathcal {N}}_1(\beta _1, \tau _0) \subset S^n_{++} \times S^n_{++}\), and \((X^*, Y^*)\) in \(S^n_+ \times S^n_+\) such that \(\max \{ \Vert P_0 X^*P_0^T \Vert _F,\)\(\Vert P_0^{-T} Y^*P_0^{-1} \Vert _F \} \le \sqrt{\tau _0}\).

Observe that

$$\begin{aligned} \varsigma= & {} \frac{X_0 \bullet Y^*+ X^*\bullet Y_0}{X_0 \bullet Y_0} \\\le & {} \frac{\Vert P_0 X_0 P_0^{T} \Vert _F \Vert P_0^{-T} Y^*P_0^{-1} \Vert _F + \Vert P_0 X^*P_0^T \Vert _F \Vert P_0^{-T} Y_0 P_0^{-1}\Vert _F}{X_0 \bullet Y_0} \\\le & {} \frac{2\tau _0 \sqrt{(1+\beta _1)n}}{n \tau _0} \\= & {} 2\sqrt{\frac{1 + \beta _1}{n}}, \end{aligned}$$

where the second inequality holds by (14) (since \((X_0, Y_0) \in {\mathcal {N}}_1(\beta _1,\tau _0)\)), \(\max \{ \Vert P_0 X^* P_0^{T} \Vert _F,\)\(\Vert P_0^{-T} Y^*P_0^{-1} \Vert _F \} \le \sqrt{\tau _0}\) and \(X_0 \bullet Y_0 = n \tau _0\).

Also,

$$\begin{aligned}&\varsigma _x = 1 + \Vert P_0 X^*P_0^T \Vert _F \Vert (P_0 X_0 P_0^{T})^{-1/2} \Vert _2^2 \le 1 + \frac{1}{\sqrt{1 - \beta _1}}, \\&\varsigma _y = 1 + \Vert P_0^{-T} Y^*P_0^{-1} \Vert _F \Vert (P_0^{-T} Y_0 P_0^{-1})^{-1/2} \Vert _2^2 \le 1 + \frac{1}{\sqrt{1 - \beta _1}}, \end{aligned}$$

using (16) and \(\max \{ \Vert P_0 X^*P_0^T \Vert _F, \Vert P_0^{-T} Y^*P_0^{-1} \Vert _F \} \le \sqrt{\tau _0}\).

Therefore, \(L_x, L_y\) given by (26), (27) respectively are less than or equal to \(L_0 n\), where \(L_0\) is a large enough number that depends only on \(\beta _1\). Hence, by Lemma 1, (8) and (9),

$$\begin{aligned} {\hat{\alpha }}_j \ge \frac{2}{\sqrt{1 + 4L_0^2n^2/(\beta _2 - \beta _1)} + 1} \end{aligned}$$

for all \(j \ge 0\).

Therefore,

$$\begin{aligned} \psi _{k-1}= & {} \Pi _{j=0}^{k-1} (1 - {\hat{\alpha }}_j) \\\le & {} \left( 1 - \frac{2}{\sqrt{1 + 4L_0^2n^2/(\beta _2 - \beta _1)} + 1}\right) ^{k} \\= & {} \left( \frac{\sqrt{1 + 4L_0^2n^2/(\beta _2 - \beta _1)} - 1}{\sqrt{1 + 4L_0^2n^2/(\beta _2 - \beta _1)} + 1}\right) ^{k} \\\le & {} \left( \frac{2L_0n}{\sqrt{\beta _2 - \beta _1} + 2L_0 n}\right) ^k, \end{aligned}$$

where the last inequality holds by Fact 3.

Since \(\max \{X_k \bullet Y_k, \Vert r_k\Vert _2 \} \le \epsilon \) is needed for the algorithm to terminate at Step (a1), where \(X_k \bullet Y_k/n = \mu _k \le (1 + \beta _1) \tau _k = (1+ \beta _1) \psi _{k-1} \tau _0\) and \(r_k = \psi _{k-1} r_0\), a sufficient condition for termination at Step (a1) at the kth iteration is when k satisfies

$$\begin{aligned} \left( \frac{2L_0n}{\sqrt{\beta _2 - \beta _1} + 2L_0 n}\right) ^k \max \{(1 + \beta _1)n \tau _0, \Vert r_0\Vert _2 \} \le \epsilon . \end{aligned}$$

That is,

$$\begin{aligned} k \ge \ln \left( \frac{\max \{(1+\beta _1)n\tau _0, \Vert r_0\Vert _2 \}}{\epsilon } \right) \Big / \ln \left( \frac{\sqrt{\beta _2 - \beta _1} + 2L_0n}{2L_0 n} \right) \end{aligned}$$

and the result then follows. \(\square \)

3.1 Proof of Lemma 1

Note that

$$\begin{aligned} \delta _k= & {} \frac{1}{\tau _k} \Vert H_{P_k}(\Delta X^p_k \Delta Y_k^p) \Vert _F \nonumber \\\le & {} \frac{1}{\tau _k} \min \{ \Vert P_k \Delta X_k^p P^T_k \Vert _2 \Vert P_k^{-T} \Delta Y_k^p P_k^{-1}\Vert _F, \Vert P_k \Delta X_k^p P^T_k \Vert _F \Vert P_k^{-T} \Delta Y_k^p P_k^{-1}\Vert _2 \} \nonumber \\\le & {} \frac{1}{\tau _k} \Vert P_k \Delta X^p_k P^{T}_k \Vert _F \Vert P^{-T}_k \Delta Y^p_k P^{-1}_k \Vert _F, \end{aligned}$$
(28)

where the first inequality holds by Fact 2. To show Lemma 1, we analyze \(\Vert P_k \Delta X^p_k P^{T}_k \Vert _F\) and \(\Vert P^{-T}_k \Delta Y^p_k P^{-1}_k \Vert _F\) that appear in (28) further by bounding them from above as given in the following proposition.

Proposition 7

We have

$$\begin{aligned} \Vert P_k \Delta X^p_k P^{T}_k \Vert _F\le & {} t_x + \frac{t}{\sqrt{(1 - \beta _1)\tau _k}}, \end{aligned}$$
(29)
$$\begin{aligned} \Vert P^{-T}_k \Delta Y^p_k P^{-1}_k \Vert _F\le & {} t_y + \frac{t}{\sqrt{(1 - \beta _1)\tau _k}}, \end{aligned}$$
(30)

where

$$\begin{aligned} t_x:= & {} \psi _{k-1}\Vert (P_k \otimes _s P_k){\mathrm{svec}}(X_0 - X^*)\Vert _2, \\ t_y:= & {} \psi _{k-1}\Vert (P_k^{-T} \otimes _s P_k^{-T}){\mathrm{svec}}(Y_0 - Y^*)\Vert _2, \\ t:= & {} \Vert {\mathrm{svec}}(-H_{P_k}(X_k Y_k) + \psi _{k-1}H_{P_k}(X_k(Y_0 - Y^*) + (X_0 - X^*)Y_k))\Vert _2. \end{aligned}$$

Proof

We first observe that \((\Delta X^p_k, \Delta Y^p_k)\) satisfies

$$\begin{aligned} {\mathcal {A}}(\Delta X^p_k) + {\mathcal {B}}(\Delta Y^p_k) = q - {\mathcal {A}}(X_k) - {\mathcal {B}}(Y_k). \end{aligned}$$
(31)

Now, by Proposition 6,

$$\begin{aligned} {\mathcal {A}}(X_k) + {\mathcal {B}}(Y_k) = (1 - \psi _{k-1})q + \psi _{k-1}({\mathcal {A}}(X_0) + {\mathcal {B}}(Y_0)), \end{aligned}$$

and with \({\mathcal {A}}(X^*) + {\mathcal {B}}(Y^*) = q\), from (31), we obtain

$$\begin{aligned} {\mathcal {A}}(\Delta X^p_k) + {\mathcal {B}}(\Delta Y^p_k) = \psi _{k-1}({\mathcal {A}}(X^*- X_0) + {\mathcal {B}}(Y^*- Y_0)). \end{aligned}$$

That is,

$$\begin{aligned} {\mathcal {A}}(\Delta X^p_k + \psi _{k-1}(X_0 - X^*)) + {\mathcal {B}}(\Delta Y^p_k + \psi _{k-1}(Y_0 - Y^*)) = 0. \end{aligned}$$
(32)

On the other hand,

$$\begin{aligned}&H_{P_k}(X_k(\Delta Y^p_k + \psi _{k-1}(Y_0 - Y^*)) + (\Delta X^p_k + \psi _{k-1} (X_0 - X^*))Y_k) \nonumber \\&\quad = H_{P_k}(X_k \Delta Y^p_k + \Delta X^p_k Y_k) + \psi _{k-1}H_{P_k}(X_k(Y_0 - Y^*) + (X_0 - X^*)Y_k) \nonumber \\&\quad = -H_{P_k}(X_k Y_k) + \psi _{k-1}H_{P_k}(X_k(Y_0 - Y^*) + (X_0 - X^*)Y_k), \end{aligned}$$
(33)

where the last equality holds since \((\Delta X^p_k, \Delta Y^p_k)\) satisfies (5) with \(X = X_k, Y = Y_k, \Delta X = \Delta X^p_k, \Delta Y = \Delta Y^p_k, \sigma = 0\). Since \((X_k,Y_k) \in {\mathcal {N}}_1(\beta _1,\tau _k)\), Proposition 3 can be applied to (32), (33). We have from the proposition

$$\begin{aligned}&\Vert (P_k \otimes _s P_k) {\mathrm{svec}}(\Delta X^p_k + \psi _{k-1}(X_0 - X^*)) \Vert _2 \nonumber \\&\quad \le \frac{1}{\sqrt{(1 - \beta _1)\tau _k}}\Vert {\mathrm{svec}}(-H_{P_k}(X_k Y_k) +\psi _{k-1}H_{P_k}(X_k(Y_0 - Y^*) \nonumber \\&\qquad + (X_0 - X^*)Y_k)) \Vert _2, \end{aligned}$$
(34)
$$\begin{aligned}&\Vert (P_k^{-T} \otimes _s P_k^{-T}) {\mathrm{svec}}(\Delta Y^p_k + \psi _{k-1}(Y_0 - Y^*)) \Vert _2 \nonumber \\&\quad \le \frac{1}{\sqrt{(1 - \beta _1)\tau _k}}\Vert {\mathrm{svec}}(-H_{P_k}(X_k Y_k) + \psi _{k-1}H_{P_k}(X_k(Y_0 - Y^*)\nonumber \\&\qquad + (X_0 - X^*)Y_k)) \Vert _2. \end{aligned}$$
(35)

The proposition then follows by applying the triangle inequality to (34), (35), together with some algebraic manipulations. \(\square \)

Lemma 1 is then proved by appropriately bounding \(t_x\), \(t_y\) and t, which appear in the upper bounds on \(\Vert P_k \Delta X_k^p P_k^T \Vert _F\) and \(\Vert P_k^{-T} \Delta Y_k^p P_k^{-1} \Vert _F\) in the above proposition. We need the following results to achieve this.

Proposition 8

For all \(k \ge 0\), \(Y_k \bullet X_0 + Y_0 \bullet X_k \le (2 + \beta _1 + \varsigma ) n \tau _0\).

Proof

By Proposition 6,

$$\begin{aligned} {\mathcal {A}}(X_k) + {\mathcal {B}}(Y_k) = q + \psi _{k-1} r_0. \end{aligned}$$

Hence,

$$\begin{aligned} {\mathcal {A}}(X_k - (1 - \psi _{k-1}) X^*- \psi _{k-1} X_0) + {\mathcal {B}}(Y_k - (1 - \psi _{k-1}) Y^*- \psi _{k-1} Y_0) = 0. \end{aligned}$$

Assumption 1(a) implies that

$$\begin{aligned} (X_k - (1 - \psi _{k-1}) X^*- \psi _{k-1} X_0) \bullet (Y_k - (1 - \psi _{k-1}) Y^*- \psi _{k-1} Y_0) \ge 0, \end{aligned}$$

from which we have

$$\begin{aligned} Y_k \bullet X_0 + Y_0 \bullet X_k \le \frac{1}{\psi _{k-1}}X_k \bullet Y_k + \psi _{k-1} X_0 \bullet Y_0 + (Y_0 \bullet X^*+ Y^*\bullet X_0). \end{aligned}$$

The result then follows from the definition of \(\varsigma \), (13) (since \((X_k, Y_k) \in {\mathcal {N}}_1(\beta _1,\tau _k)\)), \(\tau _k = \psi _{k-1}\tau _0\), where \(\psi _{k-1} \le 1\), and \(X_0 \bullet Y_0 = n \mu _0 = n \tau _0\). \(\square \)

Remark 1

Since \((X_0,Y_0) \in S^n_{++} \times S^n_{++}\), we see easily from the above proposition that \(\{ (X_k, Y_k)\ ; \ k \ge 0 \}\) is bounded.

Using Proposition 8, the following holds.

Proposition 9

For all \(k \ge 0\),

$$\begin{aligned}&\Vert P_k X_0 P_k^T \Vert _F \le (2 + \beta _1 + \varsigma ) \Vert (P_k^{-T} Y_k P_k^{-1})^{-1/2} \Vert _2^2 n \tau _0, \\&\Vert P_k^{-T} Y_0 P_k^{-1} \Vert _F \le (2 + \beta _1 + \varsigma ) \Vert (P_k X_k P_k^{T})^{-1/2} \Vert _2^2 n \tau _0, \\&\Vert P_k X^*P_k^T \Vert _F \le (2 + \beta _1 + \varsigma ) \Vert (P_k^{-T} Y_k P_k^{-1})^{-1/2} \Vert _2^2 \times \\&\quad \Vert P_0 X^*P_0^T \Vert _F \Vert (P_0 X_0 P_0^T)^{-1/2} \Vert ^2_2 n \tau _0, \\&\Vert P_k^{-T} Y^*P_k^{-1} \Vert _F \le (2 + \beta _1 + \varsigma ) \Vert (P_k X_k P_k^{T})^{-1/2} \Vert _2^2 \times \\&\quad \Vert P_0^{-T} Y^*P_0^{-1} \Vert _F \Vert (P_0^{-T} Y_0 P_0^{-1})^{-1/2} \Vert ^2_2 n \tau _0. \end{aligned}$$

Proof

It suffices to prove the first inequality since the proofs for the last three inequalities are similar.

We have

$$\begin{aligned} \Vert P_k X_0 P_k^T \Vert _F\le & {} \Vert (P_k X_0 P_k^T)^{1/2}\Vert _F^2 \\= & {} \Vert (P_k X_0 P_k^T)^{1/2} (P_k^{-T} Y_k P_k^{-1})^{1/2} (P_k^{-T} Y_k P_k^{-1})^{-1/2} \Vert _F^2 \\\le & {} \Vert (P_k X_0 P_k^T)^{1/2} (P_k^{-T} Y_k P_k^{-1})^{1/2} \Vert _F^2 \Vert (P_k^{-T} Y_k P_k^{-1})^{-1/2} \Vert _2^2 \\= & {} (X_0 \bullet Y_k) \Vert (P_k^{-T} Y_k P_k^{-1})^{-1/2} \Vert _2^2 \\\le & {} (2 + \beta _1 + \varsigma ) \Vert (P_k^{-T} Y_k P_k^{-1})^{-1/2} \Vert _2^2 n \tau _0, \end{aligned}$$

where the first inequality holds since \(P_k X_0 P_k^T \in S^n_{++}\), the second inequality holds by Fact 2, and the last inequality holds by Proposition 8. \(\square \)

With the above, we are ready to prove Lemma 1 by providing suitable upper bounds for \(t_x, t_y\) and t as given below.

Proposition 10

$$\begin{aligned} t\le & {} \tau _k \left[ \beta _1 + \sqrt{n} + \frac{\sqrt{1+\beta _1}(2 + \beta _1 + \varsigma )n}{\sqrt{1-\beta _1}}(\varsigma _x + \varsigma _y) \right] , \end{aligned}$$
(36)
$$\begin{aligned} t_x\le & {} \frac{(2 + \beta _1 + \varsigma )n \sqrt{\tau _k}}{\sqrt{1 - \beta _1}}\varsigma _x, \end{aligned}$$
(37)
$$\begin{aligned} t_y\le & {} \frac{(2 + \beta _1 + \varsigma )n \sqrt{\tau _k}}{\sqrt{1 - \beta _1}}\varsigma _y. \end{aligned}$$
(38)

Proof

We first prove the upper bound on t. We have

$$\begin{aligned} t= & {} \Vert {\mathrm{svec}}(-H_{P_k}(X_k Y_k) + \psi _{k-1}H_{P_k}(X_k(Y_0 - Y^*) + (X_0 - X^*)Y_k))\Vert _2 \nonumber \\= & {} \Vert -H_{P_k}(X_k Y_k) + \psi _{k-1}H_{P_k}(X_k(Y_0 - Y^*) + (X_0 - X^*)Y_k)\Vert _F \nonumber \\\le & {} \Vert H_{P_k}(X_k Y_k) \Vert _F + \psi _{k-1} \Vert H_{P_k}(X_k(Y_0 - Y^*) + (X_0 - X^*)Y_k)\Vert _F. \end{aligned}$$
(39)

Since \((X_k, Y_k) \in {\mathcal {N}}_1(\beta _1,\tau _k)\),

$$\begin{aligned} \Vert H_{P_k}(X_k Y_k) - \tau _k I \Vert _F \le \beta _1 \tau _k. \end{aligned}$$

Hence,

$$\begin{aligned} \Vert H_{P_k}(X_k Y_k) \Vert _F \le (\beta _1 + \sqrt{n}) \tau _k. \end{aligned}$$

On the other hand,

$$\begin{aligned}&\Vert H_{P_k}(X_k(Y_0 - Y^*) + (X_0 - X^*)Y_k)\Vert _F \\&\quad \le \Vert P_k(X_k(Y_0 - Y^*))P_k^{-1} \Vert _F + \Vert P_k ((X_0 - X^*)Y_k)P_k^{-1} \Vert _F. \end{aligned}$$

Now,

$$\begin{aligned} \Vert P_k(X_k(Y_0 - Y^*))P_k^{-1} \Vert _F\le & {} \Vert P_k X_k P_k^T \Vert _2 \Vert P_k^{-T} (Y_0 - Y^*)P_k^{-1} \Vert _F \\\le & {} \Vert P_k X_k P_k^T \Vert _2 [\Vert P_k^{-T} Y_0 P_k^{-1} \Vert _F + \Vert P_k^{-T} Y^*P_k^{-1} \Vert _F ] \\\le & {} \Vert P_k X_k P_k^T \Vert _2 \Vert (P_k X_k P_k^T)^{-1/2} \Vert _2^2 (2 + \beta _1 + \varsigma )n \tau _0 \varsigma _y \\\le & {} \frac{\sqrt{1 + \beta _1} (2 + \beta _1 + \varsigma ) n \tau _0}{\sqrt{1 - \beta _1}} \varsigma _y, \end{aligned}$$

where the third inequality follows by Proposition 9, and the last inequality follows from (15), (16). Similarly,

$$\begin{aligned} \Vert P_k ((X_0 - X^*)Y_k)P_k^{-1} \Vert _F \le \frac{\sqrt{1 + \beta _1} (2 + \beta _1 + \varsigma ) n \tau _0}{\sqrt{1 - \beta _1}} \varsigma _x. \end{aligned}$$

Putting everything together, we have (36).

In a similar way, we can show (37) and (38). \(\square \)

From (28), (29), (30), using Proposition 10, Lemma 1 is proved.

4 Local convergence study of interior point algorithm

In this section, we investigate the local convergence behavior of iterates \(\{ (X_k, Y_k) \}\) generated by Algorithm 1.

We first need an assumption as follows.

Assumption 2

There exists a strictly complementary solution \((X^*,Y^*)\) to the SDLCP (1)–(3). That is, \((X^*,Y^*)\) satisfies \(X^*+ Y^*\in S^n_{++}\).

Assumption 2 is usually made when studying the local convergence behavior of interior point algorithms on semi-definite linear complementarity problems [13, 16, 24, 25, 34]. The paper [5] considers the asymptotic behavior of the central path for an SDP when this assumption is relaxed.

In this section, we consider an SDLCP (1)–(3) that satisfies Assumptions 1(a),(c) and Assumption 2.

\((X^*,Y^*)\) now denotes a strictly complementary solution to the SDLCP (1)–(3). Since \(X^*, Y^*\) commute, they are jointly diagonalizable by some orthogonal matrix Q. Applying this orthogonal similarity transformation to the matrices in the SDLCP (1)–(3), we may assume without loss of generality that

$$\begin{aligned} X^*= \left[ \begin{array}{cc} \Lambda ^{X^*} &{} 0 \\ 0 &{} 0 \end{array} \right] , \ \ \ \ \ Y^*= \left[ \begin{array}{cc} 0 &{} 0 \\ 0 &{} \Lambda ^{Y^*} \end{array} \right] , \end{aligned}$$

where \(\Lambda ^{X^*} = {\mathrm{Diag}}(\lambda _1^{X^*}, \ldots , \lambda _{k_0}^{X^*}) \in S^{k_0}_{++}\) and \(\Lambda ^{Y^*} = {\mathrm{Diag}}(\lambda _1^{Y^*}, \ldots , \lambda _{n-k_0}^{Y^*}) \in S^{n-k_0}_{++}\).

Henceforth, whenever we partition a matrix \(U \in S^n\), it is always understood that it is partitioned as \(\left[ \begin{array}{cc} U_{11} &{} U_{12} \\ U_{12}^T &{} U_{22} \end{array} \right] \), where \(U_{11} \in S^{k_0}\), \(U_{22} \in S^{n - k_0}\) and \(U_{12} \in \mathfrak {R}^{k_0 \times (n - k_0)}\).

We study local superlinear convergence using Algorithm 1 in the sense of

$$\begin{aligned} \frac{\mu _{k+1}}{\mu _k} \rightarrow 0, \ {\mathrm{as}}\ k \rightarrow \infty . \end{aligned}$$
(40)

This is equivalent to

$$\begin{aligned} \frac{\tau _{k+1}}{\tau _k} \rightarrow 0,\ {\mathrm{as}}\ k \rightarrow \infty , \end{aligned}$$
(41)

by Proposition 6. Note that we have \(\tau _k \rightarrow 0\) as \(k \rightarrow \infty \), by Theorem 1 and Proposition 6.

Superlinear convergence in the sense of (40) is intimately related to local convergence behavior of iterates, as investigated for example in [23]. The following can be verified easily.

Proposition 11

A sufficient condition for (41) to hold is

$$\begin{aligned} \delta _k = \frac{1}{\tau _k} \Vert H_{P_k}(\Delta X_k^p \Delta Y_k^p) \Vert _F \rightarrow 0,\ {\mathrm{as}}\ k \rightarrow \infty ,\ {\mathrm{or\ equivalently,}}\ \Vert H_{P_k}(\Delta X_k^p \Delta Y_k^p) \Vert _F = o(\tau _k).\nonumber \\ \end{aligned}$$
(42)

Proof

Note that (41) holds if \(\alpha _{k,1} \rightarrow 1\) as \(k \rightarrow \infty \). By (9), this sufficient condition is equivalent to \(\delta _k \rightarrow 0\) as \(k \rightarrow \infty \), where \(\delta _k\) is given by (10). Hence, the proposition is proved. \(\square \)

In the rest of this section, we show that superlinear convergence using Algorithm 1 can be achieved if a certain condition holds for \((X_k, Y_k)\), or for a certain subclass of SDLCPs. This is done by showing that the above sufficient condition (42) holds. Towards this end, we transform the system of equations, (5), (6), that relates \((\Delta X_k^p, \Delta Y_k^p)\) to \((X_k,Y_k)\) into an equivalent system of equations, and then analyze the resulting system.

From Step (a1) of Algorithm 1, where the system of Eqs. (5), (6) is used with \(\Delta X = \Delta X_k^p, \Delta Y = \Delta Y_k^p, X = X_k, Y = Y_k, P = P_k, \sigma = 0, \tau = \tau _k\) and \({\overline{r}} = r_k = (\tau _k/\tau _0)r_0\), we note that \((\Delta X_k^p, \Delta Y_k^p)\) satisfies

$$\begin{aligned}&{\mathscr {A}}{\mathrm{svec}}(\Delta X_k^p) + {\mathscr {B}}{\mathrm{svec}}(\Delta Y_k^p) = -\frac{\tau _k}{\tau _0} r_0, \end{aligned}$$
(43)
$$\begin{aligned}&H_{P_k}(X_k \Delta Y_k^p + \Delta X_k^p Y_k) = -H_{P_k}(X_k Y_k), \end{aligned}$$
(44)

where \({\mathscr {A}}: \mathfrak {R}^{{\tilde{n}}} \rightarrow \mathfrak {R}^{{\tilde{n}}}\) and \({\mathscr {B}}: \mathfrak {R}^{{\tilde{n}}} \rightarrow \mathfrak {R}^{{\tilde{n}}}\) are defined by \({\mathscr {A}}{\mathrm{svec}}(U) := {\mathcal {A}}(U)\) and \({\mathscr {B}}{\mathrm{svec}}(U) := {\mathcal {B}}(U)\), respectively, for \(U \in S^n\). Here, \(({\mathscr {A}}\ {\mathscr {B}})\) has full row rank equal to \({\tilde{n}}\) by Assumption 1(c).

We observe that (43), (44) can be written in the following way.

$$\begin{aligned} \left[ \begin{array}{cc} {\mathscr {A}} &{} {\mathscr {B}} \\ I &{} W_k \otimes _s W_k \end{array} \right] \left[ \begin{array}{c} {\mathrm{svec}}(\Delta X_k^p) \\ {\mathrm{svec}}(\Delta Y_k^p) \end{array} \right] = - \left[ \begin{array}{c} \frac{\tau _k}{\tau _0}r_0 \\ {\mathrm{svec}}(X_k) \end{array} \right] , \end{aligned}$$
(45)

by (12).
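
To make the structure of the coefficient matrix in (45) concrete, the following NumPy sketch (not part of the paper; it assumes the common \(\sqrt{2}\)-scaled definition of \({\mathrm{svec}}\) and the standard definition of the symmetrized Kronecker product \(\otimes _s\), which we take to be consistent with the paper's notation) builds the matrix representation of \(W \otimes _s W\) by applying the operator to the \({\mathrm{svec}}\) basis, and checks the identity \((W \otimes _s W)\,{\mathrm{svec}}(U) = {\mathrm{svec}}(WUW)\) for symmetric W, which is how \(W_k \otimes _s W_k\) enters the second block row of (45).

```python
import numpy as np

def svec(U):
    # Stack the upper triangle of a symmetric matrix column by column,
    # scaling off-diagonal entries by sqrt(2) so that svec(U) @ svec(V) = trace(U V).
    n = U.shape[0]
    return np.array([U[i, j] * (1.0 if i == j else np.sqrt(2.0))
                     for j in range(n) for i in range(j + 1)])

def smat(u, n):
    # Inverse of svec.
    U = np.zeros((n, n))
    k = 0
    for j in range(n):
        for i in range(j + 1):
            if i == j:
                U[i, j] = u[k]
            else:
                U[i, j] = U[j, i] = u[k] / np.sqrt(2.0)
            k += 1
    return U

def skron(G, K):
    # Matrix representation of the symmetrized Kronecker product G (x)_s K,
    # defined via (G (x)_s K) svec(U) = svec((K U G^T + G U K^T) / 2).
    n = G.shape[0]
    ntil = n * (n + 1) // 2
    M = np.zeros((ntil, ntil))
    for col in range(ntil):
        e = np.zeros(ntil)
        e[col] = 1.0
        U = smat(e, n)
        M[:, col] = svec(0.5 * (K @ U @ G.T + G @ U @ K.T))
    return M

# Sanity check: for symmetric W and U, (W (x)_s W) svec(U) = svec(W U W).
rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n)); W = W @ W.T + n * np.eye(n)   # symmetric positive definite
U = rng.standard_normal((n, n)); U = (U + U.T) / 2.0
print(np.allclose(skron(W, W) @ svec(U), svec(W @ U @ W)))     # True
```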

The task now is to transform (45) into an equivalent system of equations that allows us to show that superlinear convergence using Algorithm 1 can be ensured if a certain condition on the iterates \((X_k,Y_k)\), as given in Theorem 3, is satisfied, or for a certain subclass of SDLCPs, as given in Theorem 4. First, we observe the following.

Proposition 12

\(X_k = \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(\sqrt{\tau _k}) \\ {\mathcal {O}}(\sqrt{\tau _k}) &{} \Theta (\tau _k) \end{array} \right] ,\ Y_k = \left[ \begin{array}{cc} \Theta (\tau _k) &{} {\mathcal {O}}(\sqrt{\tau _k}) \\ {\mathcal {O}}(\sqrt{\tau _k}) &{} \Theta (1) \end{array} \right] \).

Since the proof of the above proposition follows from proofs of similar results in [26, 36] and the relation between \(\mu _k\) and \(\tau _k\) as stated in Proposition 6, it will not be shown here.

The new system of equations that we are going to derive involves an “iterate” \(({\bar{X}}_k, {\bar{Y}}_k)\) corresponding to \((X_k,Y_k)\), a matrix \({\bar{W}}_k\) corresponding to \(W_k\), and a “predictor step” \((\Delta {\bar{X}}_k^p, \Delta {\bar{Y}}_k^p)\) corresponding to \((\Delta {X}_k^p, \Delta {Y}_k^p)\).

Define

$$\begin{aligned} {\bar{X}}_k:= & {} \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \frac{1}{\sqrt{\tau _k}}I \end{array} \right] {X}_k \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \frac{1}{\sqrt{\tau _k}}I \end{array} \right] , \\ {\bar{Y}}_k:= & {} \left[ \begin{array}{cc} \frac{1}{\sqrt{\tau _k}}I &{} 0 \\ 0 &{} I \end{array} \right] {Y}_k \left[ \begin{array}{cc} \frac{1}{\sqrt{\tau _k}}I &{} 0 \\ 0 &{} I \end{array} \right] , \end{aligned}$$

and

$$\begin{aligned} {\bar{W}}_k := \frac{1}{\sqrt{\tau _k}} \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] W_k \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] . \end{aligned}$$
(46)

Proposition 13

$$\begin{aligned} {\bar{X}}_k = \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] , {\bar{X}}_k^{-1} = \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right]\in & {} S^n_{++}, \nonumber \\ {\bar{Y}}_k = \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] , {\bar{Y}}_k^{-1} = \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right]\in & {} S^n_{++}, \nonumber \\ {\bar{W}}_k = \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] , {\bar{W}}_k^{-1} = \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right]\in & {} S^n_{++}. \end{aligned}$$
(47)

Also, any accumulation point of \(\{{\bar{X}}_k\}, \{{\bar{Y}}_k\}\) or \(\{{\bar{W}}_k\}\), as k tends to infinity, is a symmetric, positive definite matrix.

Proof

Observe by Proposition 12 that

$$\begin{aligned} {\bar{X}}_k= & {} \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] \in S^n_{++}, \\ {\bar{Y}}_k= & {} \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] \in S^n_{++}, \end{aligned}$$

and by (13) (from which we get \(1 - \beta _1 \le \lambda _{\min }({\bar{X}}_k {\bar{Y}}_k) \le \lambda _{\max }({\bar{X}}_k {\bar{Y}}_k) \le 1 + \beta _1\)), any accumulation point of \(\{ {\bar{X}}_k \}, \{ {\bar{X}}_k^{-1}\}\), \(\{ {\bar{Y}}_k \}\) or \(\{ {\bar{Y}}_k^{-1} \}\) is symmetric, positive definite as k tends to infinity, with

$$\begin{aligned} {\bar{X}}_k^{-1}= & {} \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] , \\ {\bar{Y}}_k^{-1}= & {} \left[ \begin{array}{cc} \Theta (1) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] . \end{aligned}$$

By definition of \({\bar{W}}_k\) in (46), we have \({\bar{W}}_k {\bar{Y}}_k {\bar{W}}_k = {\bar{X}}_k\), which implies that \({\bar{W}}_k = {\bar{X}}_k^{1/2}({\bar{X}}_k^{1/2} {\bar{Y}}_k {\bar{X}}_k^{1/2})^{-1/2} {\bar{X}}_k^{1/2}\), from which we see that \(\{ {\bar{W}}_k \}\) is a sequence of symmetric, positive definite matrices and has accumulation points which are symmetric, positive definite as k tends to infinity, since these are so for \(\{ {\bar{X}}_k \}\) and \(\{ {\bar{Y}}_k \}\). Therefore, (47) holds. \(\square \)
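
The formula \({\bar{W}}_k = {\bar{X}}_k^{1/2}({\bar{X}}_k^{1/2} {\bar{Y}}_k {\bar{X}}_k^{1/2})^{-1/2} {\bar{X}}_k^{1/2}\) used in the proof above is the usual NT scaling relation. Below is a minimal NumPy sketch (the helper names sym_power and nt_scaling are ours, for illustration only, and are not taken from the paper's Matlab implementation) that computes this matrix for a generic pair of symmetric positive definite matrices and checks \(W Y W = X\).

```python
import numpy as np

def sym_power(M, p):
    # Matrix power of a symmetric positive definite matrix via eigendecomposition.
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(w ** p) @ Q.T

def nt_scaling(X, Y):
    # NT scaling matrix W = X^{1/2} (X^{1/2} Y X^{1/2})^{-1/2} X^{1/2},
    # which satisfies W Y W = X for symmetric positive definite X, Y.
    Xh = sym_power(X, 0.5)
    return Xh @ sym_power(Xh @ Y @ Xh, -0.5) @ Xh

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); X = A @ A.T + np.eye(n)
B = rng.standard_normal((n, n)); Y = B @ B.T + np.eye(n)
W = nt_scaling(X, Y)
print(np.allclose(W @ Y @ W, X))   # True
```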

By partitioning the matrices \(A_i\) and \(B_i\) in \({\mathcal {A}}\), \({\mathcal {B}}\), as they appear in (4), into the four-block format discussed near the beginning of this section, we perform block Gaussian elimination on \(({\mathscr {A}}\ {\mathscr {B}})\) so that \({\mathscr {A}}, {\mathscr {B}}\) can be rewritten as

$$\begin{aligned} \left[ \begin{array}{c} \left( {\mathrm{svec}} \left( \begin{array}{cc} (A_1)_{11} &{} (A_1)_{12} \\ (A_1)_{12}^T &{} (A_1)_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (A_{i_1})_{11} &{} (A_{i_1})_{12} \\ (A_{i_1})_{12}^T &{} (A_{i_1})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} (A_{i_1+1})_{12} \\ (A_{i_1+1})_{12}^T &{} (A_{i_1+1})_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} (A_{i_1+i_2})_{12} \\ (A_{i_1+i_2})_{12}^T &{} (A_{i_1+i_2})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (A_{i_1+i_2+1})_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (A_{{\tilde{n}}})_{22} \end{array} \right) \right) ^T \end{array} \right] , \ \ \left[ \begin{array}{c} \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_1)_{11} &{} (B_1)_{12} \\ (B_1)_{12}^T &{} (B_1)_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{i_1})_{11} &{} (B_{i_1})_{12} \\ (B_{i_1})_{12}^T &{} (B_{i_1})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{i_1+1})_{11} &{} (B_{i_1+1})_{12} \\ (B_{i_1+1})_{12}^T &{} 0 \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{i_1+i_2})_{11} &{} (B_{i_1+i_2})_{12} \\ (B_{i_1+i_2})_{12}^T &{} 0 \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{i_1+i_2+1})_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{{\tilde{n}}})_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) \right) ^T \end{array} \right] ,\nonumber \\ \end{aligned}$$
(48)

respectively. This technique has been used for example in [33, 36]; see also [15, 26]. We will take \({\mathscr {A}}, {\mathscr {B}}\) to be expressed as in (48) from now on. Note that \(({\mathscr {A}}\ {\mathscr {B}}) \in \mathfrak {R}^{{\tilde{n}} \times 2 {\tilde{n}}}\) has full row rank by Assumption 1(c), and \({\mathscr {A}}, {\mathscr {B}}\) also satisfy

$$\begin{aligned} {\mathscr {A}}u + {\mathscr {B}}v = 0 \ {\mathrm{for}}\ u, v \in \mathfrak {R}^{{\tilde{n}}} \Rightarrow u^T v \ge 0. \end{aligned}$$
(49)

The implication in (49) holds by Assumption 1(a).

Remark 2

In the case of an SDP, \({\mathscr {A}}\) and \({\mathscr {B}}\) are written as

$$\begin{aligned} {\mathscr {A}} = \left[ \begin{array}{c} {\mathscr {A}}_1 \\ 0 \end{array} \right] , \quad {\mathscr {B}} = \left[ \begin{array}{c} 0 \\ {\mathscr {B}}_1 \end{array} \right] , \end{aligned}$$
(50)

where \({\mathscr {A}}_1\) consists of m rows and \({\mathscr {B}}_1\) consists of \({\tilde{n}} - m\) rows. As discussed in [34], by performing block Gaussian elimination, \({\mathscr {A}}_1\) and \({\mathscr {B}}_1\) are given by

$$\begin{aligned} \left[ \begin{array}{c} \left( {\mathrm{svec}} \left( \begin{array}{cc} (A_1)_{11} &{} (A_1)_{12} \\ (A_1)_{12}^T &{} (A_1)_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (A_{j_1})_{11} &{} (A_{j_1})_{12} \\ (A_{j_1})_{12}^T &{} (A_{j_1})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} (A_{j_1+1})_{12} \\ (A_{j_1+1})_{12}^T &{} (A_{j_1+1})_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} (A_{j_1+j_2})_{12} \\ (A_{j_1+j_2})_{12}^T &{} (A_{j_1+j_2})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (A_{j_1+j_2+1})_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (A_{m})_{22} \end{array} \right) \right) ^T \end{array} \right] , \ \ \left[ \begin{array}{c} \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_1)_{11} &{} (B_1)_{12} \\ (B_1)_{12}^T &{} (B_1)_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{k_1})_{11} &{} (B_{k_1})_{12} \\ (B_{k_1})_{12}^T &{} (B_{k_1})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{k_1+1})_{11} &{} (B_{k_1+1})_{12} \\ (B_{k_1+1})_{12}^T &{} 0 \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{k_1+k_2})_{11} &{} (B_{k_1+k_2})_{12} \\ (B_{k_1+k_2})_{12}^T &{} 0 \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{k_1+k_2+1})_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{{\tilde{n}}-m})_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) \right) ^T \end{array} \right] , \end{aligned}$$

respectively. Note that the way \({\mathscr {A}}\), \({\mathscr {B}}\) for an SDP are written in (50) is different from that for an SDLCP, see (48). We can however write them in the form of (48) by appropriately interchanging rows in \(({\mathscr {A}} \ {\mathscr {B}})\) for the SDP.

Now, in order to transform the equation system (45) to an equivalent system, let us define \(\bar{{\mathscr {A}}}(\tau ) \in \mathfrak {R}^{{\tilde{n}} \times {\tilde{n}}}\) and \(\bar{{\mathscr {B}}}(\tau ) \in \mathfrak {R}^{{\tilde{n}} \times {\tilde{n}}}\) to be

$$\begin{aligned} \left[ \begin{array}{c} \left( {\mathrm{svec}} \left( \begin{array}{cc} (A_1)_{11} &{} \sqrt{\tau }(A_1)_{12} \\ \sqrt{\tau }(A_1)_{12}^T &{} \tau (A_1)_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (A_{i_1})_{11} &{} \sqrt{\tau }(A_{i_1})_{12} \\ \sqrt{\tau }(A_{i_1})_{12}^T &{} \tau (A_{i_1})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} (A_{i_1+1})_{12} \\ (A_{i_1+1})_{12}^T &{} \sqrt{\tau }(A_{i_1+1})_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} (A_{i_1+i_2})_{12} \\ (A_{i_1+i_2})_{12}^T &{} \sqrt{\tau }(A_{i_1+i_2})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (A_{i_1+i_2+1})_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (A_{{\tilde{n}}})_{22} \end{array} \right) \right) ^T \end{array} \right] , \ \ \left[ \begin{array}{c} \left( {\mathrm{svec}} \left( \begin{array}{cc} \tau (B_1)_{11} &{} \sqrt{\tau }(B_1)_{12} \\ \sqrt{\tau }(B_1)_{12}^T &{} (B_1)_{22} \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} \tau (B_{i_1})_{11} &{} \sqrt{\tau }(B_{i_1})_{12} \\ \sqrt{\tau }(B_{i_1})_{12}^T &{} (B_{i_1})_{22} \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} \sqrt{\tau }(B_{i_1+1})_{11} &{} (B_{i_1+1})_{12} \\ (B_{i_1+1})_{12}^T &{} 0 \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} \sqrt{\tau }(B_{i_1+i_2})_{11} &{} (B_{i_1+i_2})_{12} \\ (B_{i_1+i_2})_{12}^T &{} 0 \end{array} \right) \right) ^T \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{i_1+i_2+1})_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) \right) ^T \\ \vdots \\ \left( {\mathrm{svec}} \left( \begin{array}{cc} (B_{{\tilde{n}}})_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) \right) ^T \end{array} \right] , \end{aligned}$$
(51)

respectively, for \(\tau \ge 0\).

The following proposition, whose proof can be found for example in [32, 33, 36], relates \({\mathscr {A}}\) with \(\bar{{\mathscr {A}}}(\tau )\) and \({\mathscr {B}}\) with \(\bar{{\mathscr {B}}}(\tau )\):

Proposition 14

$$\begin{aligned} {\mathscr {A}} \left( \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \sqrt{\tau }I \end{array} \right] \otimes _s \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \sqrt{\tau }I \end{array} \right] \right) = {\mathrm{Diag}}( I_{i_1 \times i_1}, \sqrt{\tau }I_{i_2 \times i_2}, \tau I_{({\tilde{n}} - i_1 - i_2) \times ({\tilde{n}} - i_1 - i_2)}) \bar{{\mathscr {A}}}(\tau )\nonumber \\ \end{aligned}$$
(52)

and

$$\begin{aligned} {\mathscr {B}} \left( \left[ \begin{array}{cc} \sqrt{\tau }I &{} 0 \\ 0 &{} I \end{array} \right] \otimes _s \left[ \begin{array}{cc} \sqrt{\tau }I &{} 0 \\ 0 &{} I \end{array} \right] \right) = {\mathrm{Diag}}( I_{i_1 \times i_1}, \sqrt{\tau }I_{i_2 \times i_2}, \tau I_{({\tilde{n}} - i_1 - i_2) \times ({\tilde{n}} - i_1 - i_2)}) \bar{{\mathscr {B}}}(\tau ).\nonumber \\ \end{aligned}$$
(53)

An important property of \(\bar{{\mathscr {A}}}(\tau ), \bar{{\mathscr {B}}}(\tau )\) is given below.

Proposition 15

For \(\tau \ge 0\), \((\bar{{\mathscr {A}}}(\tau )\ \bar{{\mathscr {B}}}(\tau ))\) has full row rank, and

$$\begin{aligned} \bar{{\mathscr {A}}}(\tau )u + \bar{{\mathscr {B}}}(\tau )v = 0 \ { {for}}\ u, v \in \mathfrak {R}^{{\tilde{n}}} \Rightarrow u^T v \ge 0. \end{aligned}$$
(54)

Proof

That \((\bar{{\mathscr {A}}}(\tau )\ \bar{{\mathscr {B}}}(\tau ))\) has full row rank for all \(\tau \ge 0\) follows from Assumption 1(c) and the way block Gaussian elimination is performed on \(({\mathscr {A}}\ {\mathscr {B}})\) to obtain \({\mathscr {A}}, {\mathscr {B}}\) in the form (48); see explanations for example in [36].

That the implication in (54) holds for \(\tau > 0\) follows from Assumption 1(a) and (52), (53). For \(\tau = 0\), (54) holds by following the proof of Proposition 2.4 in [36] (see also the proof of Theorem 3.13 in [26]). \(\square \)

Because of the way \({\mathscr {A}}, {\mathscr {B}}\) are now structured, we have

Proposition 16

q has the following form

$$\begin{aligned} q = \left[ \begin{array}{c} q^1 \\ 0 \\ 0 \end{array} \right] \in \mathfrak {R}^{{\tilde{n}}}, \end{aligned}$$

where \(q^1 \in \mathfrak {R}^{i_1}\).

Proof

By Proposition 6, the following equation holds

$$\begin{aligned} {\mathscr {A}}{\mathrm{svec}}(X_k) + {\mathscr {B}}{\mathrm{svec}}(Y_k) = q + \frac{\tau _k}{\tau _0} r_0. \end{aligned}$$

This equation is equivalent to

$$\begin{aligned}&\bar{{\mathscr {A}}}(\tau _k){\mathrm{svec}}({\bar{X}}_k) + \bar{{\mathscr {B}}}(\tau _k){\mathrm{svec}}({\bar{Y}}_k) \\&\quad = {\mathrm{Diag}}\left( I_{i_1 \times i_1}, \frac{1}{\sqrt{\tau _k}}I_{i_2 \times i_2}, \frac{1}{\tau _k}I_{({\tilde{n}} - i_1 - i_2) \times ({\tilde{n}} - i_1 - i_2)} \right) \left( q + \frac{\tau _k}{\tau _0} r_0 \right) . \end{aligned}$$

Since the left hand side of the above equation and \(\frac{\tau _k}{\tau _0} {\mathrm{Diag}}\Big ( I_{i_1 \times i_1}, \frac{1}{\sqrt{\tau _k}}I_{i_2 \times i_2}, \frac{1}{\tau _k}I_{({\tilde{n}} - i_1 - i_2) \times ({\tilde{n}} - i_1 - i_2)} \Big ) r_{0}\) are bounded as k tends to infinity, we conclude that q must take the form as given in the proposition. \(\square \)

Remark 3

From the proof of the above proposition, we see that \(({\bar{X}}_k, {\bar{Y}}_k)\) satisfies

$$\begin{aligned} \bar{{\mathscr {A}}}(\tau _k){\mathrm{svec}}({\bar{X}}_k) + \bar{{\mathscr {B}}}(\tau _k){\mathrm{svec}}({\bar{Y}}_k) = q + \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k}, \end{aligned}$$

where

$$\begin{aligned} {\bar{r}}_{0,k} := {\mathrm{Diag}}\left( I_{i_1 \times i_1}, \frac{1}{\sqrt{\tau _k}}I_{i_2 \times i_2}, \frac{1}{\tau _k}I_{({\tilde{n}} - i_1 - i_2) \times ({\tilde{n}} - i_1 - i_2)} \right) r_0. \end{aligned}$$
(55)

Finally, by defining \(\Delta {\bar{X}}_k^p\) and \(\Delta {\bar{Y}}_k^p\) to be

$$\begin{aligned} \Delta {\bar{X}}_k^p:= & {} \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \frac{1}{\sqrt{\tau _k}}I \end{array} \right] \Delta {X}_k^p \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \frac{1}{\sqrt{\tau _k}}I \end{array} \right] , \end{aligned}$$
(56)
$$\begin{aligned} \Delta {\bar{Y}}_k^p:= & {} \left[ \begin{array}{cc} \frac{1}{\sqrt{\tau _k}}I &{} 0 \\ 0 &{} I \end{array} \right] \Delta {Y}_k^p \left[ \begin{array}{cc} \frac{1}{\sqrt{\tau _k}}I &{} 0 \\ 0 &{} I \end{array} \right] , \end{aligned}$$
(57)

we obtain from (45) the following equivalent system of equations:

$$\begin{aligned} \left[ \begin{array}{cc} \bar{{\mathscr {A}}}(\tau _k) &{} \bar{{\mathscr {B}}}(\tau _k) \\ I &{} {\bar{W}}_k \otimes _s {\bar{W}}_k \end{array} \right] \left[ \begin{array}{c} {\mathrm{svec}}(\Delta {\bar{X}}_k^p) \\ {\mathrm{svec}}(\Delta {\bar{Y}}_k^p) \end{array} \right] = - \left[ \begin{array}{c} \frac{\tau _k}{\tau _0}{\bar{r}}_{0,k} \\ {\mathrm{svec}}({\bar{X}}_k) \end{array} \right] . \end{aligned}$$
(58)

Let us take the inverse of the matrix on the left hand side of the above equation, which can be shown to exist for \(\tau _k > 0\). Define

$$\begin{aligned} {\mathcal {G}}_k := {\mathscr {B}} - {\mathscr {A}}({W}_k \otimes _s {W}_k) \end{aligned}$$

and

$$\begin{aligned} \bar{{\mathcal {G}}}_k := \bar{{\mathscr {B}}}(\tau _k) - \bar{{\mathscr {A}}}(\tau _k)({\bar{W}}_k \otimes _s {\bar{W}}_k). \end{aligned}$$

Note that \({\mathcal {G}}_k\) and \(\bar{{\mathcal {G}}}_k\) are related by

$$\begin{aligned}&{\mathcal {G}}_k \left( \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] \otimes _s \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] \right) \\&\quad = {\mathrm{Diag}}\left( I_{i_1 \times i_1}, \frac{1}{\sqrt{\tau _k}}I_{i_2 \times i_2}, \frac{1}{\tau _k}I_{({\tilde{n}} - i_1 - i_2) \times ({\tilde{n}} - i_1 - i_2)} \right) \bar{{\mathcal {G}}}_k. \end{aligned}$$

Taking the inverse of the matrix on the left-hand side of (58), we obtain

$$\begin{aligned} \left[ \begin{array}{c} {\mathrm{svec}}(\Delta {\bar{X}}_k^p) \\ {\mathrm{svec}}(\Delta {\bar{Y}}_k^p) \end{array} \right]= & {} - \left[ \begin{array}{cc} -({\bar{W}}_k \otimes _s {\bar{W}}_k) \bar{{\mathcal {G}}}_k^{-1} &{} I + ({\bar{W}}_k \otimes _s {\bar{W}}_k) \bar{{\mathcal {G}}}_k^{-1} {\mathscr {A}} \\ \bar{{\mathcal {G}}}_k^{-1} &{} - \bar{{\mathcal {G}}}_k^{-1} {\mathscr {A}} \end{array} \right] \left[ \begin{array}{c} \frac{\tau _k}{\tau _0}{\bar{r}}_{0,k} \\ {\mathrm{svec}}({\bar{X}}_k) \end{array} \right] \nonumber \\= & {} \frac{1}{2} \left[ \begin{array}{c} -({\bar{W}}_k \otimes _s {\bar{W}}_k) \bar{{\mathcal {G}}}_k^{-1}\left( q - \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \right) - {\mathrm{svec}}({\bar{X}}_k) \\ \bar{{\mathcal {G}}}_k^{-1}\left( q - \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \right) - {\mathrm{svec}}({\bar{Y}}_k) \end{array} \right] , \end{aligned}$$
(59)

where to obtain the second equality, we use the identity

$$\begin{aligned} -\bar{{\mathcal {G}}}_k^{-1} \left( \frac{\tau _k}{\tau _0}{\bar{r}}_{0,k} \right) + \bar{{\mathcal {G}}}_k^{-1} {\mathscr {A}} {\mathrm{svec}}({\bar{X}}_k) = \frac{1}{2} \bar{{\mathcal {G}}}_k^{-1} \left( q - \frac{\tau _k}{\tau _0}{\bar{r}}_{0,k} \right) -\frac{1}{2} {\mathrm{svec}}({\bar{Y}}_k). \end{aligned}$$

Given that we are using (59), which involves \(\Delta {\bar{X}}^p_k, \Delta {\bar{Y}}^p_k\), to derive meaningful conditions for superlinear convergence using Algorithm 1, let us now express (42) in terms of them. We know that \(P_k^T P_k = W_k^{-1}\) and by (46), we can let

$$\begin{aligned} P_k = \frac{1}{\tau _k^{1/4}} {\bar{W}}_k^{-1/2} \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] . \end{aligned}$$
(60)
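
As a quick consistency check of the choice (60): since \({\bar{W}}_k^{-1/2}\) and \(\left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] \) are symmetric, and since (46) gives \({\bar{W}}_k^{-1} = \sqrt{\tau _k}\, \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] ^{-1} W_k^{-1} \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] ^{-1}\), we have

$$\begin{aligned} P_k^T P_k = \frac{1}{\sqrt{\tau _k}} \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] {\bar{W}}_k^{-1} \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] = W_k^{-1}, \end{aligned}$$

so (60) is indeed consistent with \(P_k^T P_k = W_k^{-1}\).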

In the following lemma, we provide a sufficient condition for superlinear convergence using Algorithm 1 in terms of \(\Delta {\bar{X}}_k^p, \Delta {\bar{Y}}_k^p\).

Lemma 2

If

$$\begin{aligned} \Delta {\bar{X}}_k^p \Delta {\bar{Y}}_k^p = o(1), \end{aligned}$$
(61)

then superlinear convergence in the sense of (40) using Algorithm 1 follows.

Proof

A sufficient condition for superlinear convergence in the sense of (40) is (42). Now observe that for (42) to hold, it is sufficient to have

$$\begin{aligned} P_k \Delta X_k^p \Delta Y_k^p P_k^{-1} = o(\tau _k). \end{aligned}$$
(62)

By (47), \(P_k\) given by (60) satisfies

$$\begin{aligned} P_k = \frac{1}{\tau _k^{1/4}}\left[ \begin{array}{cc} \Theta (\sqrt{\tau _k}) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(\sqrt{\tau _k}) &{} \Theta (1) \end{array} \right] , P_k^{-1} = \tau _k^{1/4} \left[ \begin{array}{cc} \Theta \left( \frac{1}{\sqrt{\tau _k}} \right) &{} {\mathcal {O}}\left( \frac{1}{\sqrt{\tau _k}} \right) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] . \end{aligned}$$
(63)

By (63), (62) holds if and only if

$$\begin{aligned} \Delta X_k^p \Delta Y_k^p = \left[ \begin{array}{cc} o(\tau _k) &{} o(\sqrt{\tau _k}) \\ o(\tau _k^{3/2}) &{} o(\tau _k) \end{array} \right] . \end{aligned}$$
(64)

Hence, since (62) is sufficient for (40) to hold, the lemma follows by applying (56) and (57) to (64). \(\square \)
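
For concreteness, here is the block multiplication behind the sufficiency of (64) for (62): the scalar factors \(\tau _k^{-1/4}\) and \(\tau _k^{1/4}\) in (63) cancel, so

$$\begin{aligned} P_k \Delta X_k^p \Delta Y_k^p P_k^{-1} = \left[ \begin{array}{cc} \Theta (\sqrt{\tau _k}) &{} {\mathcal {O}}(1) \\ {\mathcal {O}}(\sqrt{\tau _k}) &{} \Theta (1) \end{array} \right] \left[ \begin{array}{cc} o(\tau _k) &{} o(\sqrt{\tau _k}) \\ o(\tau _k^{3/2}) &{} o(\tau _k) \end{array} \right] \left[ \begin{array}{cc} \Theta \left( \frac{1}{\sqrt{\tau _k}} \right) &{} {\mathcal {O}}\left( \frac{1}{\sqrt{\tau _k}} \right) \\ {\mathcal {O}}(1) &{} \Theta (1) \end{array} \right] = \left[ \begin{array}{cc} o(\tau _k) &{} o(\tau _k) \\ o(\tau _k) &{} o(\tau _k) \end{array} \right] , \end{aligned}$$

with all orders understood entrywise, which is exactly (62).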

The system of equations in (59) relates \((X_k,Y_k)\), through \(({\bar{X}}_k,{\bar{Y}}_k)\), to \((\Delta {\bar{X}}_k^p, \Delta {\bar{Y}}_k^p)\), and gives us a way to verify conditions on \((X_k,Y_k)\) for superlinear convergence using Algorithm 1 by showing that (61) holds. Before we provide such a condition in Theorem 3 below, let us observe the following.

Proposition 17

$$\begin{aligned} \lim _{k \rightarrow \infty } \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} = \frac{1}{\tau _0} \bar{{\mathscr {A}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (X_0)_{22} \end{array} \right) + \frac{1}{\tau _0} \bar{{\mathscr {B}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} (Y_0)_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) . \end{aligned}$$

Proof

We have

$$\begin{aligned}&\lim _{k \rightarrow \infty } \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \\&\quad = \lim _{k \rightarrow \infty } \frac{1}{\tau _0} {\mathrm{Diag}}(\tau _k I_{i_1 \times i_1}, \sqrt{\tau _k} I_{i_2 \times i_2}, I_{({\tilde{n}} - i_1 - i_2) \times ({\tilde{n}} - i_1 - i_2)}) r_0 \\&\quad = \lim _{k \rightarrow \infty } \frac{1}{\tau _0} \bar{{\mathscr {A}}}(\tau _k) \left( \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] \otimes _s \left[ \begin{array}{cc} \sqrt{\tau _k}I &{} 0 \\ 0 &{} I \end{array} \right] \right) {\mathrm{svec}}(X_0) \\&\quad \ \ \ \ + \frac{1}{\tau _0} \bar{{\mathscr {B}}}(\tau _k) \left( \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \sqrt{\tau _k}I \end{array} \right] \otimes _s \left[ \begin{array}{cc} I &{} 0 \\ 0 &{} \sqrt{\tau _k}I \end{array} \right] \right) {\mathrm{svec}}(Y_0) \\&\quad = \frac{1}{\tau _0} \bar{{\mathscr {A}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (X_0)_{22} \end{array} \right) + \frac{1}{\tau _0} \bar{{\mathscr {B}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} (Y_0)_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) , \end{aligned}$$

where the first equality holds by definition of \({\bar{r}}_{0,k}\) in (55), and the second equality holds by (52), (53) and the structure of q in Proposition 16. \(\square \)

Theorem 3

Suppose

$$\begin{aligned} \frac{X_k Y_k}{\sqrt{\tau _k}} \rightarrow 0,\ as\ k \rightarrow \infty , \end{aligned}$$
(65)

then Algorithm 1 is a superlinearly convergent algorithm in the sense of (40).

Proof

By Proposition 12, (65) implies that

$$\begin{aligned} (X_k Y_k)_{12} = o(\sqrt{\tau _k}), \end{aligned}$$

which further implies that

$$\begin{aligned} (X_k)_{12} = o(\sqrt{\tau _k}), \ (Y_k)_{12} = o(\sqrt{\tau _k}), \end{aligned}$$
(66)

by Claim 1, given in “Appendix”.

Let \(({\bar{X}}^*, {\bar{Y}}^*)\) be any accumulation point of \(\{ ({\bar{X}}_k,{\bar{Y}}_k) \}\). Then (66) implies that

$$\begin{aligned} ({\bar{X}}^*)_{12} = ({\bar{Y}}^*)_{12} = 0. \end{aligned}$$

The corresponding accumulation point \({\bar{W}}^*\) of \(\{ {\bar{W}}_k \}\) then satisfies \(({\bar{W}}^*)_{12} = 0\).

Hence,

$$\begin{aligned}&(\bar{{\mathscr {A}}}(0)({\bar{W}}^*\otimes _s {\bar{W}}^*) + \bar{{\mathscr {B}}}(0)){\mathrm{svec}} \left( \begin{array}{cc} ({\bar{Y}}^*)_{11} &{} 0 \\ 0 &{} ({\bar{Y}}^*)_{22} \end{array} \right) \nonumber \\&\quad = \bar{{\mathscr {A}}}(0){\mathrm{svec}} \left( \begin{array}{cc} ({\bar{X}}^*)_{11} &{} 0 \\ 0 &{} ({\bar{X}}^*)_{22} \end{array} \right) + \bar{{\mathscr {B}}}(0){\mathrm{svec}} \left( \begin{array}{cc} ({\bar{Y}}^*)_{11} &{} 0 \\ 0 &{} ({\bar{Y}}^*)_{22} \end{array} \right) . \end{aligned}$$
(67)

Furthermore, we have

$$\begin{aligned}&q + \lim _{k \rightarrow \infty } \left( \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \right) \nonumber \\&\quad = q + \frac{1}{\tau _0} \bar{{\mathscr {A}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (X_0)_{22} \end{array} \right) + \frac{1}{\tau _0} \bar{{\mathscr {B}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} (Y_0)_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) , \end{aligned}$$
(68)

which follows from Proposition 17.

Note that

$$\begin{aligned} \bar{{\mathscr {A}}}(0){\mathrm{svec}}({\bar{X}}^*) + \bar{{\mathscr {B}}}(0){\mathrm{svec}}({\bar{Y}}^*) = q + \lim _{k \rightarrow \infty } \left( \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k}\right) , \end{aligned}$$

by Remark 3. From this equality, it then follows from (67), (68), \(({\bar{X}}^*)_{12} = ({\bar{Y}}^*)_{12} = 0\), the structure of q and that of \(\bar{{\mathscr {A}}}(0), \bar{{\mathscr {B}}}(0)\) that

$$\begin{aligned}&-\bar{{\mathscr {A}}}(0){\mathrm{svec}} \left( \begin{array}{cc} -({\bar{X}}^*)_{11} &{} 0 \\ 0 &{} ({\bar{X}}^*)_{22} \end{array} \right) + \bar{{\mathscr {B}}}(0){\mathrm{svec}} \left( \begin{array}{cc} -({\bar{Y}}^*)_{11} &{} 0 \\ 0 &{} ({\bar{Y}}^*)_{22} \end{array} \right) \\&\quad = q - \frac{1}{\tau _0} \bar{{\mathscr {A}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (X_0)_{22} \end{array} \right) - \frac{1}{\tau _0} \bar{{\mathscr {B}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} (Y_0)_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) \end{aligned}$$

Therefore,

$$\begin{aligned} (\bar{{\mathscr {B}}}(0) - \bar{{\mathscr {A}}}(0)({\bar{W}}^*\otimes _s {\bar{W}}^*)){\mathrm{svec}}\left( \begin{array}{cc} - ({\bar{Y}}^*)_{11} &{} 0 \\ 0 &{} ({\bar{Y}}^*)_{22} \end{array} \right) = q - \lim _{k \rightarrow \infty } \left( \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \right) . \end{aligned}$$

Hence, in the limit as k tends to infinity, \(\bar{{\mathcal {G}}}_k^{-1} \left( q - \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \right) \) tends to

$$\begin{aligned} {\mathrm{svec}}\left( \begin{array}{cc} - ({\bar{Y}}^*)_{11} &{} 0 \\ 0 &{} ({\bar{Y}}^*)_{22} \end{array} \right) . \end{aligned}$$

Therefore, by (59),

$$\begin{aligned} \Delta {\bar{Y}}_k^p = \left( \begin{array}{cc} -\Theta (1) &{} o(1) \\ o(1) &{} o(1) \end{array} \right) . \end{aligned}$$

In a similar way, we can show that

$$\begin{aligned} \Delta {\bar{X}}_k^p = \left( \begin{array}{cc} o(1) &{} o(1) \\ o(1) &{} -\Theta (1) \end{array} \right) . \end{aligned}$$

Since (61) holds in this case, the theorem is proved. \(\square \)

In Theorem 3, we provide a sufficient condition for superlinear convergence using Algorithm 1 on any SDLCP that satisfies Assumptions 1(a), (c) and Assumption 2. This sufficient condition is similar to that found in [24, 34] using Algorithm 1 with the HKM search direction. We have shown in the above theorem that the same condition is also sufficient for superlinear convergence using Algorithm 1 with the NT search direction. A superlinear convergence result has been established in [16] using the NT search direction by "narrowing" the neighborhood of the central path, although in [16] a feasible algorithm is considered, while here we consider an infeasible algorithm, with more involved analysis.

In the following theorem, we give another sufficient condition for superlinear convergence using Algorithm 1 on SDLCPs that have a certain structure.

Theorem 4

Let \({\mathscr {A}}, {\mathscr {B}}\) be such that for all \(1 \le i \le {\tilde{n}}\), if \({\mathscr {A}}_{i \cdot } \not = 0\), then \({\mathscr {B}}_{i \cdot } = 0\) (or equivalently, if \({\mathscr {B}}_{i \cdot } \not = 0\), then \({\mathscr {A}}_{i \cdot } = 0\)). Furthermore, let q satisfy either one of the following two conditions:

  1. For all \(1 \le i \le i_1\), if \({\mathscr {B}}_{i \cdot } \not = 0\), then \(q_i = 0\).

  2. For all \(1 \le i \le i_1\), if \({\mathscr {A}}_{i \cdot } \not = 0\), then \(q_i = 0\).

Suppose \((X_0,Y_0)\) is chosen such that

$$\begin{aligned} for\ all \ i_1 + i_2 + 1 \le i \le {\tilde{n}},\ {\mathscr {A}}_{i \cdot } {\mathrm{svec}}(X_0) = 0, \end{aligned}$$

if q satisfies the first condition, and

$$\begin{aligned} for\ all \ i_1 + i_2 + 1 \le i \le {\tilde{n}},\ {\mathscr {B}}_{i \cdot } {\mathrm{svec}}(Y_0) = 0, \end{aligned}$$

if q satisfies the second condition, then iterates generated by Algorithm 1 converge superlinearly in the sense of (40).

Proof

We only need to prove the theorem when the first condition on q is satisfied; the proof when the second condition on q holds is similar.

By Remark 3 and \({\bar{W}}_k {\bar{Y}}_k {\bar{W}}_k = {\bar{X}}_k\), we have

$$\begin{aligned} (\bar{{\mathscr {A}}}(\tau _k)({\bar{W}}_k \otimes _s {\bar{W}}_k) + \bar{{\mathscr {B}}}(\tau _k)){\mathrm{svec}}({\bar{Y}}_k) = q + \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k}. \end{aligned}$$

Let \(({\bar{X}}^*, {\bar{Y}}^*)\) be any accumulation point of \(\{ ({\bar{X}}_k,{\bar{Y}}_k) \}\) as k tends to infinity, with \({\bar{W}}^*\) the corresponding accumulation point of \(\{ {\bar{W}}_k \}\). Then,

$$\begin{aligned} q + \lim _{k \rightarrow \infty } \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} = (\bar{{\mathscr {A}}}(0)({\bar{W}}^*\otimes _s {\bar{W}}^*) + \bar{{\mathscr {B}}}(0)){\mathrm{svec}}({\bar{Y}}^*). \end{aligned}$$
(69)

On the other hand,

$$\begin{aligned}&q + \lim _{k \rightarrow \infty } \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \nonumber \\&\quad = q + \frac{1}{\tau _0} \bar{{\mathscr {A}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (X_0)_{22} \end{array} \right) + \frac{1}{\tau _0} \bar{{\mathscr {B}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} (Y_0)_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) , \end{aligned}$$
(70)

which follows from Proposition 17.

By conditions imposed on \(({\mathscr {A}},{\mathscr {B}},q)\) and \((X_0,Y_0)\) in the theorem, (51), and the structure of q in Proposition 16, it follows from (69) and (70) that

$$\begin{aligned}&(\bar{{\mathscr {B}}}(0) - \bar{{\mathscr {A}}}(0)({\bar{W}}^*\otimes _s {\bar{W}}^*)){\mathrm{svec}}({\bar{Y}}^*) \\&\quad = - q + \frac{1}{\tau _0} \bar{{\mathscr {A}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} (X_0)_{22} \end{array} \right) + \frac{1}{\tau _0} \bar{{\mathscr {B}}}(0) {\mathrm{svec}} \left( \begin{array}{cc} (Y_0)_{11} &{} 0 \\ 0 &{} 0 \end{array} \right) . \end{aligned}$$

Hence, in the limit as k tends to infinity, \(\bar{{\mathcal {G}}}_k^{-1}\left( q - \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \right) \) tends to \(- {\mathrm{svec}}({\bar{Y}}^*)\) and \(({\bar{W}}_k \otimes _s {\bar{W}}_k)\bar{{\mathcal {G}}}_k^{-1}\left( q - \frac{\tau _k}{\tau _0} {\bar{r}}_{0,k} \right) \) tends to \(- {\mathrm{svec}}({\bar{X}}^*)\). Therefore, (59) implies that

$$\begin{aligned} \Delta {\bar{X}}_k^p = o(1),\ \Delta {\bar{Y}}_k^p = {\mathcal {O}}(1), \end{aligned}$$

from which we conclude that (61) holds, which then implies that Algorithm 1 is a superlinearly convergent algorithm. \(\square \)

Theorem 4 tells us that for a certain subclass of SDLCPs, if a suitable starting point \((X_0,Y_0)\) is chosen, then we have superlinear convergence using Algorithm 1 on the SDLCP. We can apply this theorem to an important subclass of semi-definite programs known as linear semi-definite feasibility problems. A linear semi-definite feasibility problem is a semi-definite program where \(C = 0\) in (\({\mathcal {P}}\)) or \(b_j = 0\), \(j = 1, \ldots , m\), in (\({\mathcal {D}}\)).

Corollary 1

For a linear semi-definite feasibility problem, when \(C = 0\) in (\({\mathcal {P}}\)), if \(X_0\) is chosen such that for all \(i_1 + i_2 + 1 \le i \le {\tilde{n}}\), \({\mathscr {A}}_{i \cdot } {\mathrm{svec}}(X_0) = 0\), and when \(b_j = 0\), \(j = 1, \ldots , m\), in (\({\mathcal {D}}\)), if \(Y_0\) is chosen such that for all \(i_1 + i_2 + 1 \le i \le {\tilde{n}}\), \({\mathscr {B}}_{i \cdot }{\mathrm{svec}}(Y_0) = 0\), then Algorithm 1 is a superlinearly convergent algorithm on the linear semi-definite feasibility problem.

Proof

When \(C = 0\) in (\({\mathcal {P}}\)), it is easy to check that \(({\mathscr {A}}, {\mathscr {B}}, q)\) satisfies the conditions in Theorem 4, with q satisfying the first condition there. When \(b_j = 0\), \(j = 1, \ldots , m\), in (\({\mathcal {D}}\)), it is also easy to check that \(({\mathscr {A}}, {\mathscr {B}}, q)\) satisfies the conditions in Theorem 4, with q satisfying the second condition there. The corollary then follows from Theorem 4. \(\square \)

Corollary 1 states a result similar to Theorem 5.1 in [34]. The latter holds for the HKM search direction, while Corollary 1 applies to the NT search direction. It is worthwhile to note that, to solve the linear semi-definite feasibility problem, the assumption that the interior of the dual feasible region is nonempty is usually made in the literature [4, 7, 9, 38, 41]. In the above corollary, we do not need such an assumption to show superlinear convergence using Algorithm 1; only the strict complementarity assumption and a suitable initial iterate are needed. In fact, we see from Corollary 1 that if the linear semi-definite feasibility problem has a primal feasible region with nonempty relative interior in the case when \(C = 0\) in (\({\mathcal {P}}\)), and has a dual feasible region with nonempty interior in the case when \(b_j = 0\), \(j = 1, \ldots , m\), in (\({\mathcal {D}}\)), then Algorithm 1 is always a superlinearly convergent algorithm, irrespective of the starting point \((X_0, Y_0)\). This is so because the conditions in the corollary are satisfied trivially, as \(k_0 = n\) in the former case and \(k_0 = 0\) in the latter case. That is, in these cases, the matrices that we consider are no longer partitioned into four blocks, and we have \(i_1 = {\tilde{n}}\) and \(i_2 = 0\) in both cases.

5 Numerical study

In this section, we report on the numerical results obtained upon applying Algorithm 1 to solve instances of SDLCP (1)–(3). The algorithm is implemented in Matlab R2018a scripts and is run on a personal computer with an Intel(R) Core(TM) i5-4460 CPU and 8 GB of memory. Existing SDP solvers, such as SeDuMi [37] and SDPT3 [40], provide an option to include a second order term in the equation system that determines the corrector step of the interior point algorithm. Including the second order term tends to enhance the practical efficiency of the algorithm. However, we decided to perform our numerical experiments without introducing a second order term in the equation system that determines the corrector step in Algorithm 1. This is mainly because, for the instances of SDLCP (1)–(3) we tested, we observed more numerical warnings from our Matlab programs as iterates approached an optimal solution when the second order term was included. It is also worthwhile to note that, with or without the second order term in the corrector-step equation system, the number of iterations needed to solve an instance of SDLCP (1)–(3), namely, a linear semi-definite feasibility problem (LSDFP) with

$$\begin{aligned} \begin{array}{l} A_1 = \left( \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \end{array} \right) , \quad A_2 = \left( \begin{array}{cccc} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \end{array} \right) , \quad A_3 = \left( \begin{array}{cccc} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \end{array} \right) , \\ A_4 = \left( \begin{array}{cccc} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} -1 \end{array} \right) , \quad A_5 = \left( \begin{array}{cccc} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 &{} 1 \end{array} \right) , \\ C = 0, b_1 = 1, b_i = 0,\ i = 2, 3, 4, 5, \end{array} \end{aligned}$$
(71)

using our implemented algorithm differs only by one iteration, when the tolerance \(\epsilon \) is set to \(10^{-10}\).

In all our numerical experiments, we set \(\beta _1 = 0.3, \beta _2 = 0.45\) and the tolerance \(\epsilon \) to be \(10^{-10}\) in Algorithm 1.

We generate random instances of SDLCP (1)–(3) by first generating diagonal matrices \(D_{\hat{{\mathscr {A}}}}, D_{\hat{{\mathscr {B}}}} \in \mathfrak {R}^{{\tilde{n}} \times {\tilde{n}}}\), where \({\tilde{n}} = n(n+1)/2\), such that the main diagonal entries of \(D_{\hat{{\mathscr {B}}}}\) are randomly taken from the uniform distribution between \(-5\) and \(-1\), and the main diagonal entries of \(D_{\hat{{\mathscr {A}}}}\) are zero or nonzero with equal probability. If a main diagonal entry of \(D_{\hat{{\mathscr {A}}}}\) is nonzero, then it is randomly assigned a value from the uniform distribution between 0 and 4. We obtain the matrices \(\hat{{\mathscr {A}}}, \hat{{\mathscr {B}}} \in \mathfrak {R}^{{\tilde{n}} \times {\tilde{n}}}\) by

$$\begin{aligned} \hat{{\mathscr {A}}} = VD_{\hat{{\mathscr {A}}}}U, \quad \hat{{\mathscr {B}}} = VD_{\hat{{\mathscr {B}}}}U, \end{aligned}$$

where \(U, V \in \mathfrak {R}^{{\tilde{n}} \times {\tilde{n}}}\) are randomly generated orthogonal matrices, see [31]. The matrices \({\mathscr {A}}, {\mathscr {B}} \in \mathfrak {R}^{{\tilde{n}} \times {\tilde{n}}}\) are obtained from \(\hat{{\mathscr {A}}}, \hat{{\mathscr {B}}}\) by interchanging the corresponding columns of the latter matrices when a random number drawn from the uniform (0, 1) distribution is less than 0.5, and keeping these columns otherwise. The matrices \({\mathscr {A}}, {\mathscr {B}}\) thus obtained satisfy Assumption 1(a),(c). On the other hand, Assumption 1(b) is satisfied by setting q to be

$$\begin{aligned} {\mathscr {A}}({\mathrm{svec}}(I_{n \times n})) + {\mathscr {B}}({\mathrm{svec}}(I_{n \times n})). \end{aligned}$$

Hence, \(X^1, Y^1\) in Assumption 1(b) are both equal to the identity matrix. We set

$$\begin{aligned} X_0 = \eta I_{n \times n}, \quad Y_0 = \eta I_{n \times n}, \end{aligned}$$

where

$$\begin{aligned} \eta = \max \left\{ 10, \sqrt{n}, n\max _{1 \le i \le {\tilde{n}}} \left\{ \frac{1 + |q_i|}{1+ \Vert {\mathscr {A}}_{i\cdot }\Vert _2}, \frac{1 + |q_i|}{1 + \Vert {\mathscr {B}}_{i\cdot } \Vert _2} \right\} \right\} . \end{aligned}$$

This choice of initial iterate \((X_0,Y_0)\) is motivated by [40].
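
A minimal NumPy sketch of this instance-generation procedure and of the initial iterate is given below (the paper's experiments use Matlab scripts; the function names, the QR-based construction of random orthogonal matrices, and the \({\mathrm{svec}}\) convention are our own choices for illustration, assumed consistent with the description above).

```python
import numpy as np

def svec(U):
    # svec with sqrt(2) scaling of off-diagonal entries (as in the earlier sketch).
    n = U.shape[0]
    return np.array([U[i, j] * (1.0 if i == j else np.sqrt(2.0))
                     for j in range(n) for i in range(j + 1)])

def random_orthogonal(m, rng):
    # One standard recipe (QR of a Gaussian matrix); the paper cites [31] for its own construction.
    Q, R = np.linalg.qr(rng.standard_normal((m, m)))
    return Q * np.sign(np.diag(R))

def generate_instance(n, rng):
    ntil = n * (n + 1) // 2
    d_B = rng.uniform(-5.0, -1.0, size=ntil)                    # diagonal of D_Bhat
    d_A = np.where(rng.random(ntil) < 0.5, 0.0,
                   rng.uniform(0.0, 4.0, size=ntil))            # diagonal of D_Ahat
    U, V = random_orthogonal(ntil, rng), random_orthogonal(ntil, rng)
    A_hat, B_hat = V @ np.diag(d_A) @ U, V @ np.diag(d_B) @ U
    A, B = A_hat.copy(), B_hat.copy()
    swap = rng.random(ntil) < 0.5                               # interchange corresponding columns
    A[:, swap], B[:, swap] = B_hat[:, swap], A_hat[:, swap]
    q = A @ svec(np.eye(n)) + B @ svec(np.eye(n))               # so (X^1, Y^1) = (I, I)
    return A, B, q

def initial_iterate(n, A, B, q):
    eta = max(10.0, np.sqrt(n),
              n * max(max((1.0 + abs(q[i])) / (1.0 + np.linalg.norm(A[i, :])),
                          (1.0 + abs(q[i])) / (1.0 + np.linalg.norm(B[i, :])))
                      for i in range(q.size)))
    return eta * np.eye(n), eta * np.eye(n)

rng = np.random.default_rng(2024)
A, B, q = generate_instance(5, rng)
X0, Y0 = initial_iterate(5, A, B, q)
```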

For each n from 5 to 15, we attempt to solve, using Algorithm 1, 100 randomly generated instances of SDLCP (1)–(3) of size n, with the initial iterate \((X_0,Y_0)\) as described in the previous paragraph. For each n, we compute the average number of iterations taken and the average runtime for the algorithm to terminate, over those instances that give real-valued \((X_k,Y_k)\) upon termination of the algorithm. We denote by \(I_n\) the number of these instances. Our results are given in Table 1.

Table 1 Performance of implemented interior point algorithm on SDLCP (1)–(3)

To compare our implementation with existing solvers, we apply it to the LSDFP with data given by (71): our algorithm needs 12 iterations and a runtime of 0.02 s before termination, while SDPT3 needs 7 iterations with a runtime of 0.13 s as reported by the software (OPTIONS.gaptol in SDPT3 is set to \(10^{-10}\)), and SeDuMi needs 5 iterations with a runtime of 1.28 s as reported by the software (pars.eps in SeDuMi is set to \(10^{-14}\)). The initial iterate \((X_0,Y_0)\) used for our implemented algorithm on this LSDFP is

$$\begin{aligned} X_0 = 10 \left( \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} -0.5 \\ 0 &{} 0 &{} -0.5 &{} 1 \end{array} \right) , \quad Y_0 = 10I. \end{aligned}$$
(72)

The same initial iterate is also used when solving this problem using SDPT3.

We now report on our numerical investigation on the local convergence of Algorithm 1 when it is used to solve the LSDFP with data given by (71). It is easy to check that the LSDFP satisfies Assumption 2 with

$$\begin{aligned} X^*= \left( \begin{array}{cccc} x_1 &{} x_{2} &{} 0 &{} 0 \\ x_{2} &{} 1 - x_{1} &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \end{array} \right) , \quad Y^*= \left( \begin{array}{cccc} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} y_1 + y_2 &{} y_3 \\ 0 &{} 0 &{} y_3 &{} y_3 - y_2 \end{array} \right) , \end{aligned}$$

where \(x_1, x_2, y_1, y_2, y_3\) are constrained so that these matrices are positive semi-definite. It is also easy to check that when the initial iterate \((X_0,Y_0)\) in Algorithm 1 is chosen as in (72), the condition in Corollary 1 is satisfied. As predicted by our theoretical result in the corollary, the numerical results given in Table 2 show that the iterates generated by the implemented algorithm converge superlinearly when solving this LSDFP. It is interesting to note that with other choices of initial iterate \((X_0,Y_0)\), we also observe superlinear convergence of the iterates generated by the implemented algorithm.

Table 2 Convergence behavior of iterates generated by implemented interior point algorithm on LSDFP with data given by (71)

6 Conclusion

In this paper, we consider an infeasible predictor–corrector primal–dual path following interior point algorithm, using the Nesterov–Todd (NT) search direction, to solve a semi-definite linear complementarity problem (SDLCP). Global convergence of the algorithm is shown, and an iteration complexity bound which is polynomial in n, the size of the matrices involved, is also provided. This complexity bound is the best known so far for infeasible interior point algorithms using the “narrow” neighborhood when solving SDPs. Furthermore, we study superlinear convergence of the algorithm under the strict complementarity assumption. Two sufficient conditions are provided for this to occur: the first is on the behavior of the iterates generated by the algorithm, while the second is on the structure of the SDLCP. We finally report on preliminary numerical results obtained upon implementing the interior point algorithm and using it to solve SDLCPs that are not necessarily SDPs.