## 1 Introduction

### 1.1 Low-rank approximation

Low-rank matrix approximation replaces a matrix with a nearby matrix of lower rank. Let $$A\in {\mathbb {R}}^{m\times n}$$; then, a rank-$$k$$ approximation of A is given by

$$A \approx BC$$

where $$B\in {\mathbb {R}}^{m\times k}$$ and $$C\in {\mathbb {R}}^{k\times n}$$. Low-rank matrix approximation appears in many applications such as data mining [5] and machine learning [14]. It also plays an important role in tensor decompositions [12].
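
As a concrete illustration (added here, not part of the original text), such a factorization can be computed with numpy via a truncated SVD; the sizes m, n, k below are arbitrary choices.

```python
import numpy as np

# Build a rank-k factorization A ~= B @ C from a truncated SVD.
# The matrix and the rank k are illustrative choices.
rng = np.random.default_rng(0)
m, n, k = 8, 5, 2

A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

B = U[:, :k] * s[:k]   # m x k factor (left singular vectors scaled)
C = Vt[:k, :]          # k x n factor

err = np.linalg.norm(A - B @ C, 2)   # 2-norm truncation error
```

As recalled below in this section, this particular choice of B and C attains the smallest possible 2-norm truncation error.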

This paper discusses truncation errors of low-rank matrix approximation using QR decomposition with pivoting, or pivoted QR. In this study, rounding errors are not considered, and the norm used is the 2-norm unless otherwise stated. $$A\in {\mathbb {R}}^{m\times n}$$ (without loss of generality, we assume that $$m\ge n$$) is approximated by a product of $$B\in {\mathbb {R}}^{m\times k}$$ and $$C\in {\mathbb {R}}^{k\times n}$$, and the truncation error is defined as $$\Vert A-BC\Vert _2$$.

It is well-known that for any matrix $$A\in {\mathbb {R}}^{m\times n}$$ ($$m\ge n$$), there are orthogonal matrices $$U \in {\mathbb {R}}^{m\times m}$$ and $$V\in {\mathbb {R}}^{n\times n}$$ and a diagonal matrix $$\varSigma \in {\mathbb {R}}^{n\times n}$$ with nonnegative diagonal elements that satisfy

\begin{aligned} A = U\begin{pmatrix} \varSigma \\ O \end{pmatrix}V^T.\ \end{aligned}

This is a singular value decomposition (SVD) of A. We define $$\sigma _i(A)$$ for $$i = 1$$, 2, ..., n, satisfying

\begin{aligned} {\mathrm{diag }}(\sigma _1(A),\sigma _2(A),\dots ,\sigma _n(A)) = \varSigma , \end{aligned}

and assume that $$\sigma _1(A) \ge \sigma _2(A) \ge \dots \ge \sigma _n(A) \ge 0$$ without loss of generality. The $$\sigma _i$$ values are singular values of A. A has rank k if and only if $$\sigma _k(A) > 0 = \sigma _{k+1}(A)$$. Let

\begin{aligned} \varSigma _k = {\mathrm{diag}}(\sigma _1(A),\sigma _2(A),\dots ,\sigma _k(A)). \end{aligned}

Then,

\begin{aligned} \min _{{\mathrm{rank}}(X)\le k}\Vert A-X\Vert _2&=\left\| A-U\begin{pmatrix} \varSigma _k &{}\quad O\\ O &{}\quad O \end{pmatrix}V^T\right\| _2\\&=\sigma _{k+1}(A) \end{aligned}

holds [8]. Therefore, this is a rank-k approximation of A with the smallest truncation error in the 2-norm. We define the truncation error of low-rank approximation by SVD as

\begin{aligned} SVD_k(A) = \sigma _{k+1}(A). \end{aligned}

The amount of computation required to calculate SVD is $$O(nm \min (n,m))$$.
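
A quick numerical check (an illustration added here, not taken from the paper): the truncated SVD attains error $$\sigma _{k+1}(A)$$, and an arbitrary rank-k factorization does no better.

```python
import numpy as np

# Check that the rank-k truncated SVD attains error sigma_{k+1}(A),
# and that some other rank-k product B @ C cannot beat it.
rng = np.random.default_rng(1)
m, n, k = 10, 6, 3

A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

X_best = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation
best_err = np.linalg.norm(A - X_best, 2)  # equals s[k] = sigma_{k+1}(A)

B = rng.standard_normal((m, k))           # arbitrary rank-k factors
C = rng.standard_normal((k, n))
other_err = np.linalg.norm(A - B @ C, 2)
```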

Pivoted QR was proposed by Golub in 1965 [7]. Because the amount of computation required to calculate the low-rank approximation by pivoted QR is O(nmk), it is cheaper than SVD and hence useful in many applications such as solving rank-deficient least squares problems [2]. It consists of QR decomposition and pivoting. For any matrix A, there exist $$Q\in {\mathbb {R}}^{m\times n}$$ and an upper triangular matrix $$R\in {\mathbb {R}}^{n\times n}$$ that satisfy $$A=QR$$ and $$Q^TQ=I_n$$. This is a QR decomposition of A. We use pivoting to determine the permutation matrix $$\varPi _{grd}$$ and apply the QR decomposition algorithm to $$A\varPi _{grd}$$. The subscript grd signifies the greedy method, explained below. Hereafter, Q and R denote the factors of a QR decomposition $$A\varPi _{grd}=QR$$. Let Q and R be partitioned as

\begin{aligned}Q=\begin{pmatrix} Q_{1k}&Q_{2k} \end{pmatrix}, R=\begin{pmatrix} R_{1k} &{}\quad R_{2k}\\ O &{}\quad R_{3k} \end{pmatrix} \end{aligned}

where $$Q_{1k}\in {\mathbb {R}}^{m\times k}$$ and $$R_{1k}\in {\mathbb {R}}^{k\times k}$$. Then, we can approximate A to $$Q_{1k}\begin{pmatrix} R_{1k}&R_{2k} \end{pmatrix}\varPi _{grd}^T$$ and

\begin{aligned}\Vert A-Q_{1k}\begin{pmatrix} R_{1k}&R_{2k} \end{pmatrix}\varPi _{grd}^T\Vert _2 = \Vert R_{3k}\Vert _2 \end{aligned}

holds. We define the truncation error of low-rank approximation by pivoted QR as

\begin{aligned} pivotQR_k(A) = \Vert R_{3k}\Vert _2. \end{aligned}

In this study, the greedy method is used in pivoting to make $$\Vert R_{3k}\Vert _2$$ small. Pivoting is performed such that the elements of $$R = (r_{ij})$$ satisfy the following inequalities [1, p. 103]:

\begin{aligned} r_{ll}^2 \ge \sum _{i=l}^jr_{ij}^2 \quad (l=1,2,\dots ,n-1,\ j=l+1,l+2,\dots ,n). \end{aligned}
(1)

In the error analysis below, condition (1) is used only for $$l=1$$, 2, ..., k; the cases $$l=k+1$$, $$k+2$$, ..., $$n-1$$ are not needed.
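
The greedy pivoting rule can be sketched as follows (a minimal numpy sketch, not the paper's implementation): at each step l, the trailing column with the largest remaining orthogonalized norm is moved to position l, which yields an upper triangular factor satisfying condition (1).

```python
import numpy as np

def greedy_perm(A):
    # Greedy column pivoting via modified Gram-Schmidt: at step l, move the
    # trailing column with the largest remaining norm into position l.
    W = A.astype(float).copy()
    n = W.shape[1]
    perm = np.arange(n)
    for l in range(n):
        j = l + int(np.argmax(np.linalg.norm(W[:, l:], axis=0)))
        W[:, [l, j]] = W[:, [j, l]]
        perm[[l, j]] = perm[[j, l]]
        q = W[:, l] / np.linalg.norm(W[:, l])
        W[:, l + 1:] -= np.outer(q, q @ W[:, l + 1:])
    return perm

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 5))
perm = greedy_perm(A)
Q, R = np.linalg.qr(A[:, perm])

# condition (1): r_ll^2 >= sum_{i=l..j} r_ij^2 for all j > l (0-indexed here)
n = A.shape[1]
ok = all(R[l, l] ** 2 + 1e-10 >= np.sum(R[l:j + 1, j] ** 2)
         for l in range(n - 1) for j in range(l + 1, n))
```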

The greedy method of pivoting is not always optimal. QR decompositions of $$A\varPi _{RR}$$, where $$\varPi _{RR}$$ is chosen such that $$R_{RR}$$ has a small lower right block and where $$Q_{RR}R_{RR}$$ is a QR decomposition of $$A\varPi _{RR}$$, are called rank-revealing QR (RRQR). The following theorem was shown by Hong et al. in 1992 [9].

### Theorem 1

Let $$m\ge n > k$$, and $$A\in {\mathbb {R}}^{m\times n}$$. Then, there exists a permutation matrix $$\varPi \in {\mathbb {R}}^{n\times n}$$ such that the diagonal blocks of $$R = \begin{pmatrix}R_1 &{}\quad R_2\\ O &{}\quad R_3\end{pmatrix}$$, the upper triangular factor of the QR decomposition of $$A\varPi$$ with $$R_1\in {\mathbb {R}}^{k\times k}$$, satisfy the following inequality:

\begin{aligned} \Vert R_3\Vert _2 \le \sqrt{k(n-k)+\min (k,n-k)} \ \sigma _{k+1}(A). \end{aligned}

Finding the optimal permutation matrix is not practical from the viewpoint of computational complexity.

### 1.2 Truncation error of pivoted QR

Pivoted QR sometimes results in a large truncation error. A well-known example is due to Kahan [10]; we do not reproduce it here. In 1968, Faddeev et al. [6] showed that

\begin{aligned} pivotQR_{n-1}(A)\le \frac{\sqrt{4^n+6n-1}}{3} \ SVD_{n-1}(A). \end{aligned}

Furthermore,

\begin{aligned} pivotQR_{k}(A)\le \frac{n\sqrt{4^k+6k-1}}{3} \ SVD_{k}(A) \end{aligned}

holds [3].

However, in a survey in 2017, it was stated that “very little is known in theory about its behaviour” [13, p. 2218] with regard to pivoted QR; thus, there is still room for further research on pivoted QR.

Our previous work showed that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is $$\sqrt{\frac{4^{n-1}+2}{3}}$$ when an $$m\times n$$ ($$m\ge n$$) matrix is approximated by a matrix of rank $$n-1$$, i.e., for $$k = n - 1$$ [11]. The tight upper bound for general k is proved in the rest of this paper.

We assume that all matrices and vectors in this paper are real; however, the discussion extends easily to the complex case, and the same results hold.

## 2 Preliminaries

In this section, we define the notation and establish the basic properties used to analyze the truncation errors. First, we introduce the operator $${\mathrm{resi}}$$.

### Proposition 1

[1, p. 16] For $$A\in {\mathbb {R}}^{m\times n}$$, there exists $$X\in {\mathbb {R}}^{n\times m}$$ that satisfies

\begin{aligned} AXA=A,XAX=X,(AX)^T=AX,(XA)^T=XA \end{aligned}

and X is uniquely determined by the four conditions.

### Definition 1

For $$A\in {\mathbb {R}}^{m\times n}$$ ($$m\ge n$$), the generalized inverse (Moore–Penrose inverse) of A is defined as the $$X\in {\mathbb {R}}^{n\times m}$$ that satisfies the four conditions in Proposition 1 and is denoted by $$A^{\dagger }$$.

The following notation is closely related to the truncation error of pivoted QR.

### Definition 2

Let $$A\in {\mathbb {R}}^{m\times n}$$ ($$m\ge n$$) and $$B\in {\mathbb {R}}^{m \times l}$$. We define $${\mathrm{resi}}(A,B)$$ as

\begin{aligned}{\mathrm{resi}}(A,B) = B-AA^{\dagger }B. \end{aligned}

We denote the inner product of two vectors $$\varvec{x}$$ and $$\varvec{y}$$ as $$(\varvec{x},\varvec{y})$$.

### Example 1

For $$\varvec{x}\in {\mathbb {R}}^{n}$$ and $$\varvec{y}\in {\mathbb {R}}^{n}$$, if $$\varvec{x} \ne \varvec{0}$$, then the following holds:

\begin{aligned} {\mathrm{resi}}(\varvec{x},\varvec{y}) = \varvec{y} - \frac{(\varvec{x},\varvec{y})}{\Vert \varvec{x}\Vert ^2}\varvec{x}. \end{aligned}
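
A small numpy check (illustrative, not from the paper) of Definition 2 and Example 1: computing $${\mathrm{resi}}$$ via the pseudoinverse reproduces the projection formula for a single nonzero vector.

```python
import numpy as np

def resi(A, B):
    # resi(A, B) = B - A A^+ B (Definition 2)
    return B - A @ np.linalg.pinv(A) @ B

rng = np.random.default_rng(3)
x = rng.standard_normal((6, 1))
y = rng.standard_normal((6, 1))

r1 = resi(x, y)
r2 = y - (x.T @ y) / (x.T @ x) * x   # formula of Example 1

# The residual is orthogonal to x.
orth = np.abs(x.T @ r1).max()
```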

The following lemma will be used to characterize $${\mathrm{resi}}$$.

### Lemma 1

Let $$A\in {\mathbb {R}}^{m\times n}$$ $$(m\ge n)$$ and $$B\in {\mathbb {R}}^{m\times l}$$. For $$X\in {\mathbb {R}}^{n\times l}$$,

\begin{aligned} A^TAX-A^TB = O \Leftrightarrow {\mathrm{resi}}(A,B) = B - AX \end{aligned}

holds.

### Proof

If $${\mathrm{resi}}(A,B) = B-AX$$ holds, then

\begin{aligned} A^TAX-A^TB&= -A^T{\mathrm{resi}}(A,B)=(A^TAA^{\dagger }-A^T)B\\&=(A^T(AA^{\dagger })^T-A^T)B=((AA^{\dagger }A)^T-A^T)B=O \end{aligned}

holds. If $$A^TAX-A^TB = O$$ holds, then

\begin{aligned}&{\mathrm{resi}}(A,B) -B+AX = AX-AA^{\dagger }B = AA^{\dagger }AX-AA^{\dagger }B\\&\quad = (AA^{\dagger })^TAX-(AA^{\dagger })^TB = A^{\dagger T}(A^TAX-A^TB) = O \end{aligned}

holds. $$\square$$

### Lemma 2

[1, p. 5] Let $$A\in {\mathbb {R}}^{m\times n}$$ $$(m\ge n)$$, $$\varvec{b}\in {\mathbb {R}}^{m}$$, and $$\varvec{x}\in {\mathbb {R}}^{n}$$. $$\Vert \varvec{b}-A\varvec{x}\Vert \le \Vert \varvec{b}-A\varvec{y}\Vert$$ holds for any $$\varvec{y}\in {\mathbb {R}}^n$$ if and only if $$A^T(A\varvec{x}-\varvec{b})=\varvec{0}$$ holds.

Using Lemmas 1 and 2, we can obtain the following lemma.

### Lemma 3

Let $$A\in {\mathbb {R}}^{m\times n}$$ $$(m\ge n)$$, $$\varvec{b}\in {\mathbb {R}}^{m}$$, and $$\varvec{x}\in {\mathbb {R}}^{n}$$. $$\Vert \varvec{b}-A\varvec{x}\Vert \le \Vert \varvec{b}-A\varvec{y}\Vert$$ holds for any $$\varvec{y}\in {\mathbb {R}}^n$$ if and only if $${\mathrm{resi}}(A,\varvec{b}) = \varvec{b} - A\varvec{x}$$ holds.

### Lemma 4

Let $$m\ge n > k$$, $$A\in {\mathbb {R}}^{m\times n}$$, and $$B\in {\mathbb {R}}^{m\times l}$$. Let A be partitioned as

\begin{aligned}A = \begin{pmatrix} A_{1k}&A_{2k} \end{pmatrix}\end{aligned}

where $$A_{1k}\in {\mathbb {R}}^{m\times k}$$. Then,

\begin{aligned} {\mathrm{resi}}(A,B)={\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) \end{aligned}

holds.

### Proof

From the definition of $${\mathrm{resi}}$$, we can see that

\begin{aligned} {\mathrm{resi}}(A_{1k},A_{2k})= & {} A_{2k}-A_{1k}X, \end{aligned}
(2)
\begin{aligned} {\mathrm{resi}}(A_{1k},B)= & {} B-A_{1k}Y \end{aligned}
(3)

and

\begin{aligned} {\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B))={\mathrm{resi}}(A_{1k},B)-{\mathrm{resi}}(A_{1k},A_{2k})Z \end{aligned}
(4)

hold where $$X = A_{1k}^{\dagger }A_{2k}$$, $$Y = A_{1k}^{\dagger }B$$ and $$Z = {\mathrm{resi}}(A_{1k},A_{2k})^{\dagger }{\mathrm{resi}}(A_{1k},B)$$. Thus,

\begin{aligned} {\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) = B - A_{1k}Y - A_{2k}Z + A_{1k}XZ \end{aligned}
(5)

holds from (2), (3), and (4). Lemma 1 proves

\begin{aligned} A_{1k}^TA_{2k}=A_{1k}^TA_{1k}X, \end{aligned}
(6)

from (2),

\begin{aligned} A_{1k}^TB=A_{1k}^TA_{1k}Y \end{aligned}
(7)

from (3), and

\begin{aligned} {\mathrm{resi}}(A_{1k},A_{2k})^T({\mathrm{resi}}(A_{1k},B)-{\mathrm{resi}}(A_{1k},A_{2k})Z)=O \end{aligned}
(8)

from (4). We can see that

\begin{aligned}&A_{1k}^T{\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) \nonumber \\&\quad = \ A_{1k}^T(B-A_{1k}Y-A_{2k}Z+A_{1k}XZ) \nonumber \\&\quad = \ O \end{aligned}
(9)

from (5), (6), and (7). We can see that

\begin{aligned}&A_{2k}^T{\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) \nonumber \\&\quad = \ (A_{2k}-A_{1k}X)^T{\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) \nonumber \\&\quad = \ {\mathrm{resi}}(A_{1k},A_{2k})^T({\mathrm{resi}}(A_{1k},B)-{\mathrm{resi}}(A_{1k},A_{2k})Z) \nonumber \\&\quad = \ O \end{aligned}
(10)

from (2), (4), (8), and (9). Then, (9) and (10) can be combined as

\begin{aligned}&\begin{pmatrix} A_{1k}^T\\ A_{2k}^T \end{pmatrix}{\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) \nonumber \\&\quad = \ A^T{\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) = O. \end{aligned}
(11)

Next, (5) can be rewritten as

\begin{aligned}{\mathrm{resi}}({\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)) = B-A\begin{pmatrix} Y-XZ\\ Z \end{pmatrix}.\end{aligned}

From this and (11), we have

\begin{aligned}A^T\left( B-A\begin{pmatrix} Y-XZ\\ Z \end{pmatrix}\right) =O.\end{aligned}

Application of Lemma 1 to this proves the lemma. $$\square$$
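
Lemma 4 can be spot-checked numerically (an added illustration; the sizes are arbitrary):

```python
import numpy as np

def resi(A, B):
    # resi(A, B) = B - A A^+ B (Definition 2)
    return B - A @ np.linalg.pinv(A) @ B

rng = np.random.default_rng(4)
m, n, k, l = 9, 6, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, l))
A1k, A2k = A[:, :k], A[:, k:]

# Lemma 4: resi(A, B) = resi(resi(A1k, A2k), resi(A1k, B))
lhs = resi(A, B)
rhs = resi(resi(A1k, A2k), resi(A1k, B))
ok = np.allclose(lhs, rhs)
```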

QR decomposition and $${\mathrm{resi}}$$ have the following relation. Note that QR in this lemma is without pivoting.

### Lemma 5

Let $$m\ge n > l$$, $$A\in {\mathbb {R}}^{m\times n}$$, and $$A=QR$$ be a QR decomposition partitioned as

\begin{aligned}A=\begin{pmatrix} A_{1l}&A_{2l} \end{pmatrix}, \ Q=\begin{pmatrix} Q_{1l}&Q_{2l} \end{pmatrix}, \ R=\begin{pmatrix} R_{1l} &{}\quad R_{2l}\\ O &{}\quad R_{3l} \end{pmatrix} \end{aligned}

where $$A_{1l}\in {\mathbb {R}}^{m\times l},Q_{1l}\in {\mathbb {R}}^{m\times l},R_{1l}\in {\mathbb {R}}^{l\times l}$$. If $${\mathrm{rank}}(A_{1l}) = l$$ holds, then

\begin{aligned}{\mathrm{resi}}(A_{1l},A_{2l}) = Q_{2l}R_{3l} \end{aligned}

holds.

### Proof

We have

\begin{aligned} A_{1l} = Q_{1l}R_{1l}, \ A_{2l}=Q_{1l}R_{2l} + Q_{2l}R_{3l}. \end{aligned}

Let

\begin{aligned} X = R_{1l}^{-1}R_{2l}. \end{aligned}

Then, we have

\begin{aligned} Q_{2l}R_{3l} = A_{2l} - A_{1l}X. \end{aligned}

Furthermore,

\begin{aligned} A_{1l}^T(A_{2l} - A_{1l}X) = A_{1l}^TQ_{2l}R_{3l} = R_{1l}^TQ_{1l}^TQ_{2l}R_{3l}=R_{1l}^TOR_{3l}=O \end{aligned}

holds. Application of Lemma 1 to this proves the lemma. $$\square$$
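
A numerical spot check of Lemma 5 (added for illustration), using numpy's reduced QR decomposition:

```python
import numpy as np

def resi(A, B):
    # resi(A, B) = B - A A^+ B (Definition 2)
    return B - A @ np.linalg.pinv(A) @ B

rng = np.random.default_rng(5)
m, n, l = 8, 5, 2
A = rng.standard_normal((m, n))
Q, R = np.linalg.qr(A)            # reduced QR: Q is m x n, R is n x n
Q2l, R3l = Q[:, l:], R[l:, l:]

# Lemma 5: resi(A1l, A2l) = Q2l R3l when A1l has full column rank
ok = np.allclose(resi(A[:, :l], A[:, l:]), Q2l @ R3l)
```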

Let the columns of A be denoted by $$\varvec{a}_1$$, $$\varvec{a}_2$$, ..., $$\varvec{a}_n$$, and let $$A\varPi _{grd}$$ be partitioned as

\begin{aligned}A\varPi _{grd} = \begin{pmatrix} \varvec{a}_{\pi _1}&\varvec{a}_{\pi _2}&\dots&\varvec{a}_{\pi _n} \end{pmatrix} = \begin{pmatrix} A_{1k}&A_{2k} \end{pmatrix} \end{aligned}

where $$A_{1k}\in {\mathbb {R}}^{m\times k}$$. From Lemma 5, we can see that

\begin{aligned} (1)&\Leftrightarrow \left\| \begin{pmatrix} r_{ll}&r_{l+1l}&\dots&r_{nl} \end{pmatrix}^T\right\| ^2 \ge \left\| \begin{pmatrix} r_{lj}&r_{l+1j}&\dots&r_{nj} \end{pmatrix}^T\right\| ^2\\&\Leftrightarrow \left\| Q_{2(l-1)}\begin{pmatrix} r_{ll}&r_{l+1l}&\dots&r_{nl} \end{pmatrix}^T \right\| ^2 \ge \left\| Q_{2(l-1)}\begin{pmatrix} r_{lj}&r_{l+1j}&\dots&r_{nj} \end{pmatrix}^T\right\| ^2\\&\Leftrightarrow \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{\pi _1}&\dots&\varvec{a}_{\pi _{l-1}} \end{pmatrix},\varvec{a}_{\pi _l}\right) ^T\right\| ^2 \ge \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{\pi _1}&\dots&\varvec{a}_{\pi _{l-1}} \end{pmatrix},\varvec{a}_{\pi _{j}}\right) ^T\right\| ^2 \end{aligned}

for $$l=1$$, 2, ..., k and $$j=l+1$$, $$l+2$$, ..., n and

\begin{aligned} pivotQR_k(A) = \Vert R_{3k}\Vert _2 = \Vert Q_{2k}R_{3k}\Vert _2 = \Vert {\mathrm{resi}}(A_{1k},A_{2k})\Vert _2 \end{aligned}

if $${\mathrm{rank}}(A_{1k}) = k$$ holds. The last equation shows that, as long as $${\mathrm{rank}}(A_{1k}) = k$$ holds, the value of $$pivotQR_k(A)$$ is determined only by $$A_{1k}$$ and $$A_{2k}$$, or equivalently by $$\varPi _{grd}$$, and is independent of how (by which algorithm) the QR decomposition is computed.
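
The identity $$pivotQR_k(A) = \Vert R_{3k}\Vert _2 = \Vert {\mathrm{resi}}(A_{1k},A_{2k})\Vert _2$$ can be checked numerically; the greedy pivoting below is a compact sketch (not an efficient implementation) that selects, at each step, the column with the largest residual norm.

```python
import numpy as np

def resi(A, B):
    # resi(A, B) = B - A A^+ B (Definition 2)
    return B - A @ np.linalg.pinv(A) @ B

rng = np.random.default_rng(6)
m, n, k = 9, 6, 3
A = rng.standard_normal((m, n))

# Greedy pivoting sketch: pick, at each step, the column with the largest
# residual norm against the columns already chosen.
perm, rest = [], list(range(n))
for _ in range(k):
    if perm:
        score = lambda c: np.linalg.norm(resi(A[:, perm], A[:, [c]]))
    else:
        score = lambda c: np.linalg.norm(A[:, c])
    j = max(rest, key=score)
    perm.append(j)
    rest.remove(j)
perm += rest          # the order of the trailing n-k columns is irrelevant

Ap = A[:, perm]
Q, R = np.linalg.qr(Ap)
e_qr = np.linalg.norm(R[k:, k:], 2)                      # ||R_3k||_2
e_resi = np.linalg.norm(resi(Ap[:, :k], Ap[:, k:]), 2)   # ||resi(A1k,A2k)||_2
```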

## 3 Evaluation from above

We bound $$\frac{pivotQR_k(A)}{SVD_k(A)}$$ from above in this section. Since $$pivotQR_k(A) = SVD_k(A) = 0$$ holds if $${\mathrm{rank}}(A) \le k$$ holds, we only consider the case $${\mathrm{rank}}(A) > k$$. Let $$A=U\varSigma V^T$$ be an SVD. Since $$A\varPi _{grd} = U\varSigma (\varPi _{grd}^TV)^T$$ and $$(\varPi _{grd}^TV)^T(\varPi _{grd}^TV)=I_n$$ hold, $$U\varSigma (\varPi _{grd}^TV)^T$$ is an SVD of $$A\varPi _{grd}$$. Then, we can see that

\begin{aligned}SVD_k(A) = \sigma _{k+1}(A) = \sigma _{k+1}(A\varPi _{grd}). \end{aligned}

Hereafter, we change what A denotes: the previous $$A\varPi _{grd}$$ is now written as A. Let $$A\in {\mathbb {R}}^{m\times n}$$ satisfying

\begin{aligned}&\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{1}&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_{i}\right) \right\| \nonumber \\&\quad \ge \ \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{1}&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_{j}\right) \right\| \ (i=1,\dots ,k, \ j=i+1,\dots ,n) \end{aligned}
(12)

be partitioned as

\begin{aligned}A = \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_n \end{pmatrix} = \begin{pmatrix} A_{1k}&A_{2k} \end{pmatrix} \end{aligned}

where $$A_{1k} \in {\mathbb {R}}^{m\times k}$$ and $${{\mathrm{rank}}}(A_{1k}) = k$$. Our aim is to compare $$\sigma _{k+1}(A) = SVD_k(A)$$ and $$\Vert {\mathrm{resi}}(A_{1k},A_{2k})\Vert _2=pivotQR_k(A)$$.

### Lemma 6

Let $$m\ge n$$, $$A\in {\mathbb {R}}^{m\times n}$$, and $$B\in {\mathbb {R}}^{m\times l}$$. For any $$\varvec{v}\in {\mathbb {R}}^{l}$$,

\begin{aligned} {\mathrm{resi}}(A,B)\varvec{v} = {\mathrm{resi}}(A,B\varvec{v}) \end{aligned}

holds.

### Proof

From the definition of $${\mathrm{resi}}$$,

\begin{aligned} {\mathrm{resi}}(A,B) = B-AA^{\dagger }B \end{aligned}

holds. Thus,

\begin{aligned} {\mathrm{resi}}(A,B)\varvec{v} = B\varvec{v}-AA^{\dagger }B\varvec{v} = {\mathrm{resi}}(A,B\varvec{v}) \end{aligned}

holds. $$\square$$

We can see that

\begin{aligned} \Vert {\mathrm{resi}}(A_{1k},A_{2k})\Vert _2&= \max _{\varvec{z}\in {\mathbb {R}}^{n-k}, \Vert \varvec{z}\Vert =1}\Vert {\mathrm{resi}}(A_{1k},A_{2k})\varvec{z}\Vert \\&=\max _{\varvec{z}\in {\mathbb {R}}^{n-k}, \Vert \varvec{z}\Vert =1}\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert \end{aligned}

from the definition of the 2-norm and Lemma 6. Now, we introduce a key theorem of this paper.

### Theorem 2

Let $$m\ge n > 1$$, $$A\in {\mathbb {R}}^{m\times n}$$, $${\mathrm{rank}}(A)=n$$, and A be partitioned as

\begin{aligned} A = \begin{pmatrix}\varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{n}\end{pmatrix}. \end{aligned}

We define $$\hat{A_i}$$ as

\begin{aligned}\hat{A_i} = \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1}&\varvec{a}_{i+1}&\dots&\varvec{a}_{n} \end{pmatrix}\end{aligned}

for $$i=1$$, 2, ..., n, and $$\varvec{d}_i$$ as

\begin{aligned} \varvec{d}_i = {\mathrm{resi}}(\hat{A_i},\varvec{a}_i) \end{aligned}

for $$i=1$$, 2, ..., n. Then, $$\varvec{d}_i \ne \varvec{0}$$ for $$i=1$$, 2, ..., n and

\begin{aligned} \frac{\Vert \varvec{a}_1\Vert }{\Vert \varvec{d}_1\Vert }\le \sum _{i=2}^n\frac{\Vert \varvec{a}_i\Vert }{\Vert \varvec{d}_i\Vert } \end{aligned}

hold.

### Proof

Since $${\mathrm{rank}}(A) = n$$, $$\{\varvec{a}_1,\varvec{a}_2,\dots ,\varvec{a}_n\}$$ is linearly independent. Because $$\varvec{d}_i$$ is a linear combination of $$\{\varvec{a}_1,\varvec{a}_2,\dots ,\varvec{a}_n\}$$ with the coefficient of $$\varvec{a}_i$$ being 1, $$\varvec{d}_i \ne \varvec{0}$$ holds for $$i=1$$, 2, ..., n. From the definition of $${\mathrm{resi}}$$,

\begin{aligned} \varvec{d}_1 = \varvec{a}_1 - \hat{A_1}\varvec{x}_1 \end{aligned}

holds, where $$\varvec{x}_1 = \hat{A_1}^{\dagger }\varvec{a}_1$$. Let $$\varvec{x}_1 = \begin{pmatrix} x_{12}&x_{13}&\dots&x_{1n} \end{pmatrix}^T$$. Let i be one of 2, 3, ..., n. We can see that

\begin{aligned} \Vert \varvec{d}_i\Vert \le \left\| \varvec{a}_i - \hat{A_i}\begin{pmatrix} \frac{1}{x_{1i}} \\ -\frac{x_{12}}{x_{1i}} \\ \vdots \\ -\frac{x_{1i-1}}{x_{1i}} \\ -\frac{x_{1i+1}}{x_{1i}} \\ \vdots \\ -\frac{x_{1n}}{x_{1i}} \end{pmatrix}\right\| = \frac{\Vert \varvec{d}_1\Vert }{|x_{1i}|} \end{aligned}

holds if $$x_{1i} \ne 0$$ from Lemma 3. Thus,

\begin{aligned} |x_{1i}| \le \frac{\Vert \varvec{d}_1\Vert }{\Vert \varvec{d}_i\Vert } \end{aligned}
(13)

holds. Inequality (13) holds trivially if $$x_{1i} = 0$$ as well. We define $$\varvec{y}\in {\mathbb {R}}^{m}$$ as

\begin{aligned} \varvec{y} = \varvec{d}_1 + x_{1n}\varvec{a}_n = \varvec{a}_1-\sum _{i=2}^{n-1}x_{1i}\varvec{a}_i. \end{aligned}

Since $$\{\varvec{a}_1,\varvec{a}_2,\dots ,\varvec{a}_{n-1}\}$$ is linearly independent, $$\varvec{y}\ne \varvec{0}$$ holds. As Lemma 1 gives $$\hat{A_1}^T\varvec{d}_1=\varvec{0}$$, we have $$(\varvec{a}_n,\varvec{d}_1) = 0$$. Thus,

\begin{aligned} x_{1n} = \frac{(\varvec{a}_n,\varvec{y})}{\Vert \varvec{a}_n\Vert ^2} \end{aligned}

holds. We can see that

\begin{aligned} \Vert \varvec{d}_n\Vert \le \left\| \varvec{a}_n - \frac{(\varvec{y},\varvec{a}_n)}{\Vert \varvec{y}\Vert ^2}\varvec{y}\right\| \end{aligned}

holds from Lemma 3 because $$\varvec{y}$$ is a linear combination of $$\varvec{a}_i$$ $$(i=1, 2, \dots , n-1)$$. Since

\begin{aligned} \Vert \varvec{d}_n\Vert ^2= & {} \frac{\Vert \varvec{d}_n\Vert ^2}{\Vert \varvec{d}_1\Vert ^2}\left\| \varvec{y}-\frac{(\varvec{a}_n,\varvec{y})}{\Vert \varvec{a}_n\Vert ^2}\varvec{a}_n\right\| ^2 = \frac{\Vert \varvec{d}_n\Vert ^2}{\Vert \varvec{d}_1\Vert ^2}\Vert \varvec{y}\Vert ^2\left( 1 - \frac{(\varvec{a}_n,\varvec{y})^2}{\Vert \varvec{y}\Vert ^2\Vert \varvec{a}_n\Vert ^2}\right) ,\\&\left\| \varvec{a}_n- \frac{(\varvec{y},\varvec{a}_n)}{\Vert \varvec{y}\Vert ^2}\varvec{y}\right\| ^2 = \Vert \varvec{a}_n\Vert ^2\left( 1 - \frac{(\varvec{a}_n,\varvec{y})^2}{\Vert \varvec{y}\Vert ^2\Vert \varvec{a}_n\Vert ^2}\right) \end{aligned}

and $$\Vert \varvec{d}_n\Vert > 0$$ hold,

\begin{aligned} \frac{\Vert \varvec{a}_n\Vert }{\Vert \varvec{d}_n\Vert } \ge \frac{\Vert \varvec{y}\Vert }{\Vert \varvec{d}_1\Vert } \end{aligned}

holds. Furthermore, since

\begin{aligned} \Vert \varvec{y}\Vert&\ge \Vert \varvec{a}_1\Vert -\sum _{i=2}^{n-1}|x_{1i}|\Vert \varvec{a}_i\Vert \\&\ge \Vert \varvec{a}_1\Vert -\sum _{i=2}^{n-1}\frac{\Vert \varvec{d}_1\Vert }{\Vert \varvec{d}_i\Vert }\Vert \varvec{a}_i\Vert \end{aligned}

holds from (13),

\begin{aligned} \frac{\Vert \varvec{a}_n\Vert }{\Vert \varvec{d}_n\Vert } \ge \frac{\Vert \varvec{a}_1\Vert }{\Vert \varvec{d}_1\Vert }-\sum _{i=2}^{n-1}\frac{\Vert \varvec{a}_i\Vert }{\Vert \varvec{d}_i\Vert } \end{aligned}

holds, and the theorem has been proved. $$\square$$
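
Theorem 2 can be spot-checked numerically (an added illustration with an arbitrary full-column-rank matrix):

```python
import numpy as np

def resi(A, B):
    # resi(A, B) = B - A A^+ B (Definition 2)
    return B - A @ np.linalg.pinv(A) @ B

rng = np.random.default_rng(7)
m, n = 7, 4
A = rng.standard_normal((m, n))   # full column rank with probability 1

# ratio[i] = ||a_i|| / ||d_i|| with d_i = resi(A-hat_i, a_i)
ratio = []
for i in range(n):
    Ahat = np.delete(A, i, axis=1)
    d = resi(Ahat, A[:, [i]])
    ratio.append(np.linalg.norm(A[:, i]) / np.linalg.norm(d))

# Theorem 2: ratio[0] <= ratio[1] + ... + ratio[n-1]
```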

We now recall a key theorem by Hong et al.

### Theorem 3

[9, p. 218] Let $$m\ge n > l$$, $$A\in {\mathbb {R}}^{m\times n}$$ and $$A = QR = U\varSigma V^T$$ be a QR decomposition and an SVD, respectively. Let R and V be partitioned as

\begin{aligned}R = \begin{pmatrix} R_{1l} &{}\quad R_{2l}\\ O &{}\quad R_{3l} \end{pmatrix},\quad V = \begin{pmatrix} V_{1l} &{}\quad V_{2l}\\ V_{3l} &{}\quad V_{4l} \end{pmatrix} \end{aligned}

where $$R_{1l}\in {\mathbb {R}}^{l\times l}$$ and $$V_{1l}\in {\mathbb {R}}^{l\times l}$$. Then,

\begin{aligned} \Vert R_{3l}\Vert _2 \ \sigma _{n-l}(V_{4l}) \le \sigma _{l+1}(A)\end{aligned}

holds.

In the present study, this theorem is only used for $$l = n-1$$. The following lemma provides an inequality between $${\mathrm{resi}}$$ and the singular value.

### Lemma 7

Under the same assumptions as Theorem 2,

\begin{aligned} 1\le (\sigma _n(A))^2\sum _{i=1}^n\frac{1}{\Vert \varvec{d}_i\Vert ^2} \end{aligned}

holds.

### Proof

Let $$A=U\varSigma V^T$$ be an SVD partitioned as

\begin{aligned}V = \begin{pmatrix} V_1&\varvec{v}_2 \end{pmatrix}, \ \varvec{v}_2 = \begin{pmatrix} v_{21}&v_{22}&\dots&v_{2n} \end{pmatrix}^T \end{aligned}

where $$V_1\in {\mathbb {R}}^{n\times (n-1)}$$. Let $$\varvec{e}_i$$ be the ith column of $$I_n$$ for $$i=1$$, 2, ..., n. Define a permutation matrix $$\varPi _i$$ as

\begin{aligned}\varPi _i = \begin{pmatrix} \varvec{e}_1&\dots&\varvec{e}_{i-1}&\varvec{e}_{i+1}&\dots&\varvec{e}_n&\varvec{e}_i \end{pmatrix} \end{aligned}

for $$i=1$$, 2, ..., n. Since

\begin{aligned}\begin{pmatrix} \hat{A_i}&\varvec{a}_i \end{pmatrix} = A\varPi _i = U\varSigma (\varPi _i^TV)^T \end{aligned}

and $$(\varPi _i^TV)^T(\varPi _i^TV) = I_n$$, $$U\varSigma (\varPi _i^TV)^T$$ is one SVD of $$\begin{pmatrix} \hat{A_i}&\varvec{a}_i \end{pmatrix}$$. Let $$A\varPi _i = Q_iR_i$$ be a QR decomposition partitioned as

\begin{aligned}Q_i = \begin{pmatrix} Q_{i1}&\varvec{q}_{i2} \end{pmatrix},R_i = \begin{pmatrix} R_{i1} &{} \varvec{r}_{i2}\\ O &{} r_{i3} \end{pmatrix} \end{aligned}

where $$Q_{i1}\in {\mathbb {R}}^{m\times (n-1)},R_{i1}\in {\mathbb {R}}^{(n-1)\times (n-1)}$$. Using Theorem 3,

\begin{aligned} \sigma _n(A) = \sigma _n(A\varPi _i) \ge |v_{2i}| \ |r_{i3}| \end{aligned}

holds. We can see that

\begin{aligned} \Vert \varvec{d}_i\Vert = \Vert {\mathrm{resi}}(\hat{A_i},\varvec{a}_i)\Vert = \Vert r_{i3}\varvec{q}_{i2}\Vert = |r_{i3}| \end{aligned}

holds from Lemma 5. Thus,

\begin{aligned} \sigma _n(A) \ge |v_{2i}| \ \Vert \varvec{d}_i\Vert \end{aligned}

holds. Then,

\begin{aligned} 1 = \sum _{i=1}^n(v_{2i})^2\le (\sigma _n(A))^2\sum _{i=1}^n\frac{1}{\Vert \varvec{d}_i\Vert ^2} \end{aligned}

holds. $$\square$$
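
Lemma 7 likewise admits a quick numerical check (an added illustration):

```python
import numpy as np

def resi(A, B):
    # resi(A, B) = B - A A^+ B (Definition 2)
    return B - A @ np.linalg.pinv(A) @ B

rng = np.random.default_rng(8)
m, n = 7, 4
A = rng.standard_normal((m, n))
sigma_n = np.linalg.svd(A, compute_uv=False)[-1]   # smallest singular value

total = sum(
    1.0 / np.linalg.norm(resi(np.delete(A, i, axis=1), A[:, [i]])) ** 2
    for i in range(n)
)
bound = sigma_n ** 2 * total    # Lemma 7 asserts bound >= 1
```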

### Proposition 2

Let $$m\ge n > k$$ and let $$A \in {\mathbb {R}}^{m\times n}$$ satisfy (12) and be partitioned as

\begin{aligned}A = \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_n \end{pmatrix} = \begin{pmatrix} A_{1k}&A_{2k} \end{pmatrix} \end{aligned}

where $$A_{1k} \in {\mathbb {R}}^{m\times k}$$. Let A satisfy $${\mathrm{rank}}(A_{1k})=k$$. Then, for all $$\varvec{z}\in {\mathbb {R}}^{n-k}$$ with $$\Vert \varvec{z}\Vert =1$$,

\begin{aligned} \Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert \le \sqrt{\frac{4^k-1}{3}(n-k)+1} \ \sigma _{k+1}(A) \end{aligned}

holds.

### Proof

From (12) and Lemma 6, the following holds for $$i=1$$, 2, ..., k:

\begin{aligned} (n-k) \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{1}&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_{i}\right) \right\| ^2&\ge \sum _{j=k+1}^n \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{1}&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_{j}\right) \right\| ^2 \nonumber \\&=\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{1}&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{2k}\right) \right\| _F^2 \nonumber \\&\ge \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{1}&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{2k}\right) \right\| _2^2 \nonumber \\&\ge \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_{1}&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{2k}\varvec{z}\right) \right\| ^2. \end{aligned}
(14)

Define $$A'$$ as

\begin{aligned}A' = \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_k&A_{2k}\varvec{z} \end{pmatrix}. \end{aligned}

If $${\mathrm{rank}}(A') \ne k+1$$, then $$\{ \varvec{a}_1, \varvec{a}_2, \dots , \varvec{a}_k, A_{2k}\varvec{z}\}$$ is linearly dependent. Since $${\mathrm{rank}}(A_{1k}) = k$$, $$\{\varvec{a}_1 , \varvec{a}_2 , \dots , \varvec{a}_k\}$$ is linearly independent, and $$A_{2k}\varvec{z}$$ can be expressed as a linear combination of $$\{\varvec{a}_1, \varvec{a}_2, \dots , \varvec{a}_k\}$$. Then, we have $${\mathrm{resi}}(A_{1k},A_{2k}\varvec{z}) = \varvec{0}$$ from Lemma 3, and the conclusion holds. Therefore, we only consider the case $${\mathrm{rank}}(A') = k+1$$ in the remainder of this proof. We define $$\varvec{d}'_i$$ as

\begin{aligned} \varvec{d}'_i = {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1}&\varvec{a}_{i+1}&\dots&\varvec{a}_k&A_{2k}\varvec{z} \end{pmatrix},\varvec{a}_i\right) \ (i=1,2,\dots ,k). \end{aligned}

From Lemma 4, we can see that

\begin{aligned}\varvec{d}'_j = {\mathrm{resi}}\left( {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{ijk}'\right) ,{\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_j\right) \right) \end{aligned}

holds for $$i=1$$, 2, ..., k and $$j=i$$, $$i+1$$, ..., k, where $$A_{ijk}' = \begin{pmatrix} \varvec{a}_{i}&\dots&\varvec{a}_{j-1}&\varvec{a}_{j+1}&\dots&\varvec{a}_k&A_{2k}\varvec{z} \end{pmatrix}$$, and

\begin{aligned}&{\mathrm{resi}}(A_{1k},A_{2k}\varvec{z}) \\&\quad = \ {\mathrm{resi}}\left( {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},\begin{pmatrix} \varvec{a}_{i}&\dots&\varvec{a}_{k} \end{pmatrix}\right) ,{\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{2k}\varvec{z}\right) \right) \end{aligned}

holds for $$i=1$$, 2, ..., k. Using Theorem 2 on $${\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},\begin{pmatrix} \varvec{a}_{i}&\varvec{a}_{i+1}&\dots&\varvec{a}_k&A_{2k}\varvec{z} \end{pmatrix}\right)$$, we can see that

\begin{aligned}&\frac{\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_i\right) \right\| }{\left\| {\mathrm{resi}}\left( {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{iik}'\right) ,{\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_i\right) \right) \right\| } \\&\quad \le \ \sum _{j=i+1}^k\frac{\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_j\right) \right\| }{\left\| {\mathrm{resi}}\left( {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{ijk}'\right) ,{\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_j\right) \right) \right\| }\\&\qquad + \ \frac{\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{2k}\varvec{z}\right) \right\| }{\left\| {\mathrm{resi}}\left( {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},\begin{pmatrix} \varvec{a}_{i}&\dots&\varvec{a}_{k} \end{pmatrix}\right) ,{\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{2k}\varvec{z}\right) \right) \right\| } \end{aligned}

holds. Thus,

\begin{aligned}&\frac{\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_i\right) \right\| }{\Vert \varvec{d}'_i\Vert } \\&\quad \le \ \sum _{j=i+1}^k\frac{\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_j\right) \right\| }{\Vert \varvec{d}'_j\Vert }+\frac{\left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},A_{2k}\varvec{z}\right) \right\| }{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert }\\&\quad \le \ \left\| {\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},\varvec{a}_i\right) \right\| \left( \sum _{j=i+1}^k\frac{1}{\Vert \varvec{d}'_j\Vert }+\frac{\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert }\right) \end{aligned}

holds for $$i = 1$$, 2, ..., k from (12) and (14). Thus,

\begin{aligned} \frac{1}{\Vert \varvec{d}'_i\Vert } \le \sum _{j=i+1}^k\frac{1}{\Vert \varvec{d}'_j\Vert }+\frac{\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert } \quad (i=1,2,\dots ,k) \end{aligned}
(15)

holds. We want to show that

\begin{aligned} \frac{1}{\Vert \varvec{d}'_i\Vert } \le \frac{2^{k-i}\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert } \quad (i=1,2,\dots ,k) \end{aligned}
(16)

and prove this by induction, in the order $$i=k$$, $$k-1$$, ..., 1. Applying (15) for $$i=k$$ gives

\begin{aligned} \frac{1}{\Vert \varvec{d}'_k\Vert } \le \frac{\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert } = \frac{2^{k-k}\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert }. \end{aligned}

Thus, (16) holds for $$i=k$$. Next, we prove that (16) holds for $$i=l$$, assuming that it holds for $$i=l+1$$, $$l+2$$, ..., k. We can see that

\begin{aligned} \frac{1}{\Vert \varvec{d}'_l\Vert }&\le \sum _{j=l+1}^k\frac{1}{\Vert \varvec{d}'_j\Vert }+\frac{\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert }\\&\le \frac{\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert }\left( \sum _{j=l+1}^k2^{k-j}+1\right) \\&= \frac{2^{k-l}\sqrt{n-k}}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert } \end{aligned}

holds from (15) and the induction hypothesis. Thus, (16) has been shown for $$i=1$$, 2, ..., k. Using Lemma 7 on $$A'$$,

\begin{aligned} 1&\le (\sigma _{k+1}(A'))^2 \left( \sum _{i=1}^k\frac{1}{\Vert \varvec{d}'_i\Vert ^2}+\frac{1}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert ^2}\right) \\&\le \frac{(\sigma _{k+1}(A'))^2}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert ^2} \left( (n-k)\sum _{i=1}^k 4^{k-i}+1\right) \\&= \frac{(\sigma _{k+1}(A'))^2}{\Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert ^2} \left( \frac{4^k-1}{3}(n-k)+1\right) \end{aligned}

holds. Thus,

\begin{aligned} \Vert {\mathrm{resi}}(A_{1k},A_{2k}\varvec{z})\Vert \le \sqrt{\frac{4^k-1}{3}(n-k)+1} \ \sigma _{k+1}(A') \end{aligned}

holds. Now, if we can show that

\begin{aligned} \sigma _{k+1}(A') \le \sigma _{k+1}(A), \end{aligned}

then the proof is complete. Considering the fact that

\begin{aligned} \sigma _{k+1}(A) = \max _{\varTheta ,{\mathrm{dim}}\varTheta = k+1}\min _{\varvec{x}\in \varTheta ,\Vert \varvec{x}\Vert =1}\Vert A\varvec{x}\Vert , \end{aligned}

we want a subspace $$\varTheta$$ that satisfies

\begin{aligned} \min _{\varvec{x}\in \varTheta ,\Vert \varvec{x}\Vert =1}\Vert A\varvec{x}\Vert \ge \sigma _{k+1}(A'). \end{aligned}

Let

\begin{aligned}\varTheta ' = {\mathrm{span}}\left\{ \varvec{e}_1 , \varvec{e}_2 , \dots , \varvec{e}_k , \begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix} \right\} .\end{aligned}

Then, we have $${\mathrm{dim}}(\varTheta ') = k+1$$ since $$\left\{ \varvec{e}_1 , \varvec{e}_2 , \dots , \varvec{e}_k , \begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix} \right\}$$ is linearly independent. Let $$\varvec{y} = (y_i)\in {\mathbb {R}}^{k+1}$$. Since $$\begin{pmatrix} \varvec{e}_1 &{} \varvec{e}_2 &{} \dots &{} \varvec{e}_k &{} \begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix} \end{pmatrix}^T\begin{pmatrix} \varvec{e}_1 &{} \varvec{e}_2 &{} \dots &{} \varvec{e}_k &{} \begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix} \end{pmatrix}=I_{k+1}$$ holds,

\begin{aligned} \Vert \varvec{y}\Vert = 1 \Leftrightarrow \left\| \sum _{i=1}^ky_i\varvec{e}_i + y_{k+1}\begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix}\right\| =1 \end{aligned}
(17)

holds. For every $$\varvec{y}\in {\mathbb {R}}^{k+1}$$ that satisfies the right-hand side of (17),

\begin{aligned} \left\| A\left( \sum _{i=1}^ky_i\varvec{e}_i + y_{k+1}\begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix}\right) \right\| = \Vert A'\varvec{y}\Vert \ge \sigma _{k+1}(A') \end{aligned}

holds. Then,

\begin{aligned} \sigma _{k+1}(A) \ge \min _{\varvec{x}\in \varTheta ',\Vert \varvec{x}\Vert =1}\Vert A\varvec{x}\Vert \ge \sigma _{k+1}(A') \end{aligned}

holds. $$\square$$

Thus, we have proved that

\begin{aligned} pivotQR_k(A) \le \sqrt{\frac{4^k-1}{3}(n-k)+1} \ SVD_k(A). \end{aligned}
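As an aside (not part of the proof), this bound is easy to probe numerically. The sketch below, with illustrative helper names of our own, computes $$pivotQR_k(A)$$ as $$\Vert R_{3k}\Vert _2$$ from scipy's pivoted QR and $$SVD_k(A)$$ as $$\sigma _{k+1}(A)$$ from numpy's SVD, and checks the inequality on a random matrix:

```python
import numpy as np
from scipy.linalg import qr

def pivot_qr_error(A, k):
    # A P = Q R with column pivoting; the rank-k truncation error of
    # pivoted QR is the 2-norm of the trailing block R_{3k} = R[k:, k:].
    _, R, _ = qr(A, pivoting=True)
    return np.linalg.norm(R[k:, k:], 2)

def svd_error(A, k):
    # the optimal rank-k error is the (k+1)-th singular value
    return np.linalg.svd(A, compute_uv=False)[k]

rng = np.random.default_rng(0)
m, n, k = 30, 20, 5
A = rng.standard_normal((m, n))
bound = np.sqrt((4.0**k - 1.0) / 3.0 * (n - k) + 1.0)
ratio = pivot_qr_error(A, k) / svd_error(A, k)
```

For generic matrices the ratio stays far below the bound; the next section constructs matrices that approach it.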

## 4 Evaluation from below

In this section, we show that the inequality proved in the previous section is tight. An example of a matrix $$R_h$$, with a real-valued parameter h, that satisfies

\begin{aligned} \frac{pivotQR_k(R_h)}{SVD_k(R_h)}\xrightarrow [h\rightarrow 0]{} \sqrt{\frac{4^k-1}{3}(n-k)+1} \end{aligned}

is shown. $$R_h$$ is as follows:

\begin{aligned}R_h = \begin{pmatrix}\begin{array}{cccc} 1 &{}\quad 0 &{}\quad \dots &{}\quad 0\\ 0 &{}\quad h &{}\quad \ddots &{}\quad \vdots \\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad 0\\ 0 &{}\quad \dots &{}\quad 0 &{}\quad h^k \end{array} &{}\quad {O}\\ {O} &{}\quad {O} \end{pmatrix} \begin{pmatrix}\begin{array}{cccccc} 1 &{}\quad -\sqrt{1-h^2} &{}\quad \dots &{}\quad -\sqrt{1-h^2} &{}\quad \dots &{}\quad -\sqrt{1-h^2}\\ 0 &{}\quad 1 &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad -\sqrt{1-h^2} &{}\quad \dots &{}\quad -\sqrt{1-h^2}\\ 0 &{}\quad \dots &{}\quad 0 &{}\quad 1 &{}\quad \dots &{}\quad 1 \end{array}\\ {O} \end{pmatrix} .\end{aligned}

The Kahan matrix is [10]

\begin{aligned}K_n = \begin{pmatrix} 1 &{}\quad 0 &{}\quad \dots &{}\quad 0\\ 0 &{}\quad h &{}\quad \ddots &{}\quad \vdots \\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad 0\\ 0 &{}\quad \dots &{}\quad 0 &{}\quad h^{n-1} \end{pmatrix}\begin{pmatrix} 1 &{}\quad -\sqrt{1-h^2} &{}\quad \dots &{}\quad -\sqrt{1-h^2}\\ 0 &{}\quad 1 &{}\quad \ddots &{}\quad \vdots \\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad -\sqrt{1-h^2}\\ 0 &{}\quad \dots &{}\quad 0 &{}\quad 1 \end{pmatrix}. \end{aligned}

Thus, $$R_h$$ coincides with the Kahan matrix in the case $$m = n = k+1$$ and is an extension of it otherwise.
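For reference, the Kahan matrix is easy to generate in the experiments' environment (Python with numpy); this is a small illustrative helper of ours, not code from the experiments:

```python
import numpy as np

def kahan(n, h):
    # K_n = diag(1, h, ..., h^(n-1)) @ U, where U is unit upper triangular
    # with every strictly upper entry equal to -sqrt(1 - h^2)
    c = np.sqrt(1.0 - h * h)
    U = np.eye(n) - c * np.triu(np.ones((n, n)), k=1)
    return np.diag(h ** np.arange(n)) @ U

K = kahan(4, 0.5)  # upper triangular with diagonal 1, h, h^2, h^3
```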

### Proposition 3

Let $$m\ge n > k$$. Define $$\varSigma _h \in {\mathbb {R}}^{m\times n}$$, $$(w_{hij}) = W_h \in {\mathbb {R}}^{n\times n}$$, and $$R_h\in {\mathbb {R}}^{m\times n}$$ as follows:

\begin{aligned}\varSigma _h= & {} \begin{pmatrix} {\mathrm{diag}}(1,h,\dots ,h^{k},0,0,\dots ,0)\\ O \end{pmatrix},\\ w_{hij}= & {} {\left\{ \begin{array}{ll} 1 &{}\quad (i=j \ {\mathrm{and}} \ 1\le i \le k) \ {\mathrm{or}} \ (i=k+1 \ {\mathrm{and}} \ k+1\le j \le n), \\ -\sqrt{1-h^2} &{}\quad (i < j \ {\mathrm{and}} \ 1 \le i \le k), \\ 0 &{}\quad {\mathrm{otherwise}} \end{array}\right. } \end{aligned}

and $$R_h=\varSigma _h W_h$$ where $$0< h < 1$$. Then,

\begin{aligned} \lim _{h\rightarrow 0}\frac{pivotQR_k(R_h)}{SVD_k(R_h)} = \sqrt{\frac{4^k-1}{3}(n-k)+1} \end{aligned}

holds.
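Before turning to the proof, the proposition can be sanity-checked numerically for a moderate h (for very small h, $$\sigma _{k+1}(R_h)$$ becomes hard to compute; see Sect. 5). The sketch below, with the illustrative name `build_R_h` of our own, builds $$R_h$$ from the definition and uses the identity $$pivotQR_k(R_h)=h^k\sqrt{n-k}$$ derived in the proof:

```python
import numpy as np

def build_R_h(m, n, k, h):
    # Sigma_h and W_h exactly as in Proposition 3 (indices shifted to 0-based)
    c = np.sqrt(1.0 - h * h)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if (i == j and i < k) or (i == k and j >= k):
                W[i, j] = 1.0
            elif i < j and i < k:
                W[i, j] = -c
    Sigma = np.zeros((m, n))
    Sigma[: k + 1, : k + 1] = np.diag(h ** np.arange(k + 1))
    return Sigma @ W

m, n, k, h = 6, 5, 2, 1e-4
R = build_R_h(m, n, k, h)
bound = np.sqrt((4.0**k - 1.0) / 3.0 * (n - k) + 1.0)
# pivotQR_k(R_h) = h^k * sqrt(n-k); SVD_k(R_h) = sigma_{k+1}(R_h)
ratio = (h**k * np.sqrt(n - k)) / np.linalg.svd(R, compute_uv=False)[k]
```

For this choice of parameters the ratio is already close to the bound $$\sqrt{\frac{4^k-1}{3}(n-k)+1}$$, here $$\sqrt{16}=4$$.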

### Proof

Let $$Q = \begin{pmatrix} I_n\\ O \end{pmatrix} \in {\mathbb {R}}^{m \times n}$$ and $$R = {\mathrm{diag}}(1,h,\dots ,h^{k},0,0,\dots ,0)W_h \in {\mathbb {R}}^{n \times n}$$. Since R is upper triangular and $$Q^TQ=I_n$$ holds, $$R_h = QR$$ is a QR decomposition of $$R_h$$. We check (1) for this R. Since

\begin{aligned} (\text{left side of (1)}) =&\ h^{2l-2} = (1-h^2)\sum _{i=l}^{\min (j, k + 1)-1}h^{2i-2} + h^{2\min (j , k + 1)-2} \\ =&\ (\text{right side of (1)}), \end{aligned}

(1) holds for $$l=1$$, 2, ..., $$k+1$$, $$j=l+1$$, $$l+2$$, ..., n. Obviously (1) also holds for $$l=k+2$$, $$k+3$$, ..., $$n-1$$, $$j=l+1$$, $$l+2$$, ..., n. As in Sect. 2, let R be partitioned as

\begin{aligned}R = \begin{pmatrix} R_{1k} &{}\quad R_{2k}\\ O &{}\quad R_{3k} \end{pmatrix} \end{aligned}

where $$R_{1k} \in {\mathbb {R}}^{k\times k}$$. Then,

\begin{aligned} pivotQR_k(R_h) = \Vert R_{3k}\Vert _2 \end{aligned}

holds. Define $$V\in {\mathbb {R}}^{(n-k)\times (n-k)}$$ and $$\varvec{v}_1\in {\mathbb {R}}^{n-k}$$ as follows:

\begin{aligned}V = \begin{pmatrix} \varvec{v}_1&\varvec{v}_2&\dots&\varvec{v}_{n-k} \end{pmatrix},\quad \varvec{v}_1 = \frac{1}{\sqrt{n-k}}\begin{pmatrix} 1&1&\dots&1 \end{pmatrix}^T \end{aligned}

where $$\varvec{v}_2$$, $$\varvec{v}_3$$, ..., $$\varvec{v}_{n-k}$$ are chosen such that $$V^TV = I_{n-k}$$ holds. We can choose them freely as long as this is satisfied. Since

\begin{aligned} R_{3k} = h^{k}\sqrt{n-k}\begin{pmatrix} \varvec{v}_1&\varvec{0}&\dots&\varvec{0} \end{pmatrix}^T \end{aligned}

holds, we obtain $$\Vert R_{3k}\Vert _2 = h^{k}\sqrt{n-k}$$. We now consider the value of $$SVD_k(R_h) = \sigma _{k+1}(R_h)$$. Considering the fact that

\begin{aligned} \sigma _{k+1}(R_h) = \min _{\varTheta ,{\mathrm{dim}}\varTheta = n-k}\max _{\varvec{x}\in \varTheta ,\Vert \varvec{x}\Vert =1}\Vert R_h\varvec{x}\Vert , \end{aligned}

we want a subspace $$\varTheta$$ for which $$\max _{\varvec{x}\in \varTheta ,\Vert \varvec{x}\Vert =1}\Vert R_h\varvec{x}\Vert$$ is small. Since $$\varvec{v}_1^T\varvec{v}_i=0$$ holds for $$i=2$$, 3, ..., $$n-k$$,

\begin{aligned}R_h\begin{pmatrix} \varvec{0}\\ \varvec{v}_i \end{pmatrix} = \begin{pmatrix} R_{2k}\\ R_{3k}\\ O \end{pmatrix} \varvec{v}_i = \begin{pmatrix} -\sqrt{(n-k)(1-h^2)} \ h^0\\ -\sqrt{(n-k)(1-h^2)} \ h^1\\ \vdots \\ -\sqrt{(n-k)(1-h^2)} \ h^{k-1}\\ \sqrt{n-k} \ h^k\\ 0\\ 0\\ \vdots \\ 0 \end{pmatrix}\varvec{v}_1^T\varvec{v}_i = \varvec{0} \end{aligned}

holds for $$i=2$$, 3, ..., $$n-k$$. We define $$y_j = 1$$ for $$j=k+1$$, ..., n and define $$y_j$$ from $$j=k$$ down to $$j=1$$ as $$y_j = \sqrt{1-h^2}\sum _{i=j+1}^n y_i$$. We define $$\varvec{y}\in {\mathbb {R}}^n$$ as $$\begin{pmatrix} y_1&y_2&\dots&y_n \end{pmatrix}^T$$. Then,

\begin{aligned}R_h\varvec{y} = \begin{pmatrix} 0&0&\dots&0&(n-k)h^k&0&0&\dots&0 \end{pmatrix}^T \end{aligned}

holds. Since

\begin{aligned} \lim _{h\rightarrow 0}\Vert \varvec{y}\Vert&= \left\| \begin{pmatrix} (n-k)2^{k-1}&(n-k)2^{k-2}&\dots&(n-k)2^0&1&1&\dots&1 \end{pmatrix}\right\| \\&= \sqrt{\frac{4^k-1}{3}(n-k)^2+n-k} \end{aligned}

holds,

\begin{aligned} \lim _{h\rightarrow 0}\frac{1}{h^k}\left\| R_h\frac{\varvec{y}}{\Vert \varvec{y}\Vert }\right\| = \frac{\sqrt{n-k}}{\sqrt{\frac{4^k-1}{3}(n-k)+1}} \end{aligned}

holds. Let

\begin{aligned}\varTheta ' = {\mathrm{span}}\left\{ \frac{\varvec{y}}{\Vert \varvec{y}\Vert } , \begin{pmatrix} \varvec{0}\\ \varvec{v}_2 \end{pmatrix} , \begin{pmatrix} \varvec{0}\\ \varvec{v}_3 \end{pmatrix} , \dots , \begin{pmatrix} \varvec{0}\\ \varvec{v}_{n-k} \end{pmatrix} \right\} . \end{aligned}

Then, we have $${\mathrm{dim}}(\varTheta ') = n-k$$. Since

\begin{aligned}&\begin{pmatrix} \frac{\varvec{y}}{\Vert \varvec{y}\Vert } &{} \begin{pmatrix} \varvec{0}\\ \varvec{v}_2 \end{pmatrix} &{} \begin{pmatrix} \varvec{0}\\ \varvec{v}_3 \end{pmatrix} &{} \dots &{}\begin{pmatrix} \varvec{0}\\ \varvec{v}_{n-k} \end{pmatrix} \end{pmatrix}^T\begin{pmatrix} \frac{\varvec{y}}{\Vert \varvec{y}\Vert } &{} \begin{pmatrix} \varvec{0}\\ \varvec{v}_2 \end{pmatrix} &{} \begin{pmatrix} \varvec{0}\\ \varvec{v}_3 \end{pmatrix} &{} \dots &{}\begin{pmatrix} \varvec{0}\\ \varvec{v}_{n-k} \end{pmatrix} \end{pmatrix}\\&\quad = I_{n-k} \end{aligned}

holds, we have

\begin{aligned} \sum _{i=1}^{n-k}z_i^2=1\Leftrightarrow \left\| z_{1}\frac{\varvec{y}}{\Vert \varvec{y}\Vert } + \sum _{i=2}^{n-k}z_i\begin{pmatrix} \varvec{0}\\ \varvec{v}_i \end{pmatrix}\right\| =1 \end{aligned}
(18)

for $$z_1$$, $$z_2$$, ..., $$z_{n-k}\in {\mathbb {R}}$$. Thus, if the right-hand side of (18) holds, we have

\begin{aligned}&\frac{1}{h^k}\left\| R_h\left( z_{1}\frac{\varvec{y}}{\Vert \varvec{y}\Vert } + \sum _{i=2}^{n-k}z_i\begin{pmatrix} \varvec{0}\\ \varvec{v}_i \end{pmatrix}\right) \right\| \\&\quad = \ \frac{|z_{1}|}{h^k} \ \left\| R_h\frac{\varvec{y}}{\Vert \varvec{y}\Vert }\right\| \le \frac{1}{h^k}\left\| R_h\frac{\varvec{y}}{\Vert \varvec{y}\Vert }\right\| \xrightarrow [h\rightarrow 0]{}\frac{\sqrt{n-k}}{\sqrt{\frac{4^k-1}{3}(n-k)+1}}. \end{aligned}

Then, we have

\begin{aligned} \lim _{h\rightarrow 0}\frac{\sigma _{k+1}(R_h)}{h^k} \le \lim _{h\rightarrow 0}\frac{1}{h^k}\max _{\varvec{x}\in \varTheta ',\Vert \varvec{x}\Vert =1}\Vert R_h\varvec{x}\Vert \le \frac{\sqrt{n-k}}{\sqrt{\frac{4^k-1}{3}(n-k)+1}}. \end{aligned}

Thus,

\begin{aligned} \lim _{h\rightarrow 0}\frac{pivotQR_k(R_h)}{SVD_k(R_h)}\ge \sqrt{\frac{4^k-1}{3}(n-k)+1} \end{aligned}

holds, and combining this with the upper bound established in the previous section proves the proposition. $$\square$$

## 5 Numerical experiments

### 5.1 Experiments

Because $$SVD_{k}(R_h)$$ cannot be calculated numerically when h is very small, we prepare a different matrix for the experiments. Here, we present the matrix for the case $$k=n-1$$.

### Proposition 4

[11] Let $$m\ge n$$ and $$\epsilon _i\in {\mathbb {R}}$$ satisfy $$0<\epsilon _i < 1$$ $$(i=1, 2, \dots , n)$$. Let $$\{\varvec{v}_i\}_{i=1}^n\in {\mathbb {R}}^n$$ be

\begin{aligned} v_{j,i}={\left\{ \begin{array}{ll} 0 &{}\quad (j < i)\\ -1-\epsilon _i &{}\quad (j=i)\\ 1-\epsilon _i &{}\quad (j > i) \end{array}\right. } \end{aligned}

where $$v_{j,i}$$ is the j-th element of $$\varvec{v}_i$$. Let $$\{\varvec{w}_i\}_{i=1}^n$$ be the result of applying Gram-Schmidt orthonormalization to $$\{\varvec{v}_i\}_{i=1}^n$$. Define $$\varSigma \in {\mathbb {R}}^{m\times n}$$ and $$W \in {\mathbb {R}}^{n\times n}$$ as follows:

\begin{aligned}\varSigma = \begin{pmatrix} {\mathrm{diag}}(\sigma _1,\sigma _2,\dots ,\sigma _{n})\\ O \end{pmatrix},\\W = \begin{pmatrix} \varvec{w}_1&\varvec{w}_2&\dots&\varvec{w}_n \end{pmatrix} ,\end{aligned}

where $$\sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _n > 0$$. Let $$A=\varSigma W^T$$. Then,

\begin{aligned} \lim _{(\epsilon _1,\epsilon _2,\dots ,\epsilon _n)\rightarrow (0,\dots ,0)}\lim _{\sigma _{n-1}\rightarrow \infty }\lim _{\sigma _{n-2}\rightarrow \infty }\dots \lim _{\sigma _{1}\rightarrow \infty }\frac{pivotQR_{n-1}(A)}{SVD_{n-1}(A)} = \sqrt{\frac{4^{n-1}+2}{3}} \end{aligned}

holds.
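As a quick consistency check (our own remark), this limit agrees with the bound of Sect. 3 at $$k = n-1$$, since $$\frac{4^{n-1}-1}{3}\cdot 1 + 1 = \frac{4^{n-1}+2}{3}$$. A one-line verification in exact integer arithmetic:

```python
# For k = n-1, the general bound (4^k - 1)/3 * (n - k) + 1 (squared, inside
# the root) reduces to (4^(n-1) + 2)/3, the quantity in Proposition 4.
ok = all(
    (4**(n - 1) - 1) // 3 * 1 + 1 == (4**(n - 1) + 2) // 3
    for n in range(2, 30)
)
```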

This proposition provides matrices for which the ratio converges to the least upper bound only in the case $$k = n-1$$. In this paper, matrices that are conjectured to converge to the least upper bound in the case $$k < n-1$$ are defined in an analogous manner as follows.

### Conjecture 1

Let $$m\ge n > k$$ and $$\epsilon _i\in {\mathbb {R}}$$ satisfy $$0< \epsilon _i < 1$$ $$(i=1, 2, \dots , n)$$. Let $$\{\varvec{v}_i\}_{i=1}^n\in {\mathbb {R}}^n$$ be

\begin{aligned}v_{j,i}={\left\{ \begin{array}{ll} -1-\epsilon _i &{}\quad (j = i) \ {\mathrm{or}} \ ((i = k+1) \ {\mathrm{and}} \ (k+1 \le j))\\ 1-\epsilon _i &{}\quad (j > i) \ {\mathrm{and}} \ (i \le k)\\ 0 &{}\quad ({\mathrm{otherwise}}) \end{array}\right. } \end{aligned}

where $$v_{j,i}$$ is the j-th element of $$\varvec{v}_i$$. Let $$\{\varvec{w}_i\}_{i=1}^n$$ be the result of applying Gram-Schmidt orthonormalization to $$\{\varvec{v}_i\}_{i=1}^n$$. Define $$\varSigma \in {\mathbb {R}}^{m\times n}$$ and $$W \in {\mathbb {R}}^{n\times n}$$ as follows:

\begin{aligned}\varSigma= & {} \begin{pmatrix} {\mathrm{diag}}(\sigma _1,\dots ,\sigma _{k+1},0,\dots ,0)\\ O \end{pmatrix},\\ W= & {} \begin{pmatrix} \varvec{w}_1&\varvec{w}_2&\dots&\varvec{w}_n \end{pmatrix}, \end{aligned}

where $$\sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _{k+1} > 0$$. Let $$A=\varSigma W^T$$. Then,

\begin{aligned} \lim _{(\epsilon _1,\epsilon _2,\dots ,\epsilon _n)\rightarrow (0,\dots ,0)}\lim _{\sigma _{k}\rightarrow \infty }\lim _{\sigma _{k-1}\rightarrow \infty }\dots \lim _{\sigma _{1}\rightarrow \infty }\frac{pivotQR_{k}(A)}{SVD_{k}(A)} = \sqrt{\frac{4^k-1}{3}(n-k)+1} \end{aligned}

holds. $$\square$$

$$\{\varvec{v}_i\}_{i=1}^n$$ is as follows:

\begin{aligned} \begin{pmatrix} \varvec{v}_1&\varvec{v}_2&\dots&\varvec{v}_n \end{pmatrix} = \begin{pmatrix} -1-\epsilon _1 &{} 1-\epsilon _1 &{} \dots &{} \dots &{} 1-\epsilon _1 &{} \dots &{} 1-\epsilon _1\\ 0 &{} -1-\epsilon _2 &{} 1-\epsilon _2 &{} \dots &{} 1-\epsilon _2 &{} \dots &{} 1-\epsilon _2\\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \ddots &{} \ddots &{} \ddots \\ \vdots &{} \ddots &{} \ddots &{} -1-\epsilon _k &{} 1-\epsilon _k &{} \dots &{} 1-\epsilon _k\\ \vdots &{} \ddots &{} \ddots &{} 0 &{} -1-\epsilon _{k+1} &{} \dots &{} -1-\epsilon _{k+1}\\ \vdots &{} \ddots &{} \ddots &{} 0 &{} 0 &{} \ddots &{} 0\\ 0 &{} \ddots &{} \ddots &{} 0 &{} 0 &{} 0 &{} -1-\epsilon _{n}\\ \end{pmatrix}. \end{aligned}

Because the limits cannot be taken in numerical experiments, we assign $$s^{n-i}$$ to $$\sigma _i$$ for $$i=1$$, 2, ..., $$k+1$$ and $$s^{-1}$$ to $$\epsilon _i$$ for $$i=1$$, 2, ..., n. Numerical experiments were performed for $$n=2$$, 3, ..., 25 and $$k=1$$, 2, ..., $$n-1$$ with $$m = n$$. We calculate u as follows:

\begin{aligned} u = \frac{\sqrt{\frac{4^k-1}{3}(n-k)+1}}{\frac{pivotQR_{k}(A)}{SVD_{k}(A)}}-1. \end{aligned}

Because we want $$\frac{pivotQR_{k}(A)}{SVD_{k}(A)}\rightarrow \sqrt{\frac{4^k-1}{3}(n-k)+1}$$ as $$s \rightarrow \infty$$, we expect $$u \rightarrow 0$$ as $$s \rightarrow \infty$$.
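The experiment can be sketched as follows. This is our own minimal reimplementation with illustrative names and small parameters ($$n=6$$, $$k=2$$, $$s=10^5$$ rather than the full ranges above); it uses the double-QR orthonormalization described in Sect. 5.2 and computes the pivoted-QR error as $$\Vert R_{3k}\Vert _2$$:

```python
import numpy as np
from scipy.linalg import qr

def build_A(n, k, s):
    # Conjecture 1 with sigma_i = s^(n-i), eps_i = 1/s, and m = n
    eps = 1.0 / s
    V = np.zeros((n, n))
    for i in range(n):              # column i is v_{i+1} of the text
        for j in range(n):
            if j == i or (i == k and j >= k):
                V[j, i] = -1.0 - eps
            elif j > i and i < k:
                V[j, i] = 1.0 - eps
    # Gram-Schmidt orthonormalization emulated by two QR decompositions
    Q, _ = np.linalg.qr(V)
    W, _ = np.linalg.qr(Q)
    sigma = np.zeros(n)
    sigma[: k + 1] = float(s) ** (n - 1 - np.arange(k + 1))
    return np.diag(sigma) @ W.T

n, k, s = 6, 2, 1.0e5
A = build_A(n, k, s)
_, R, _ = qr(A, pivoting=True)
ratio = np.linalg.norm(R[k:, k:], 2) / np.linalg.svd(A, compute_uv=False)[k]
u = np.sqrt((4.0**k - 1.0) / 3.0 * (n - k) + 1.0) / ratio - 1.0
```

The exact value of u depends on the parameters and on rounding; it should be small and nonnegative up to floating-point error.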

### 5.2 Environment

Python 3.5.2, scipy 1.4.1, and numpy 1.13.3 were used, with double-precision arithmetic throughout. We used numpy's SVD and scipy's pivoted QR. Gram-Schmidt orthonormalization was emulated by applying the scipy QR decomposition twice: let $$V = \begin{pmatrix} \varvec{v}_1&\dots&\varvec{v}_n \end{pmatrix}$$, and let $$V = QR$$ and $$Q = WR'$$ be QR decompositions. The i-th column of W is taken as $$\varvec{w}_i$$ for $$i=1$$, 2, ..., n.

### 5.3 Results

For $$s = 10^9$$, the maximum value of u is $$2.3482077194\cdot 10^{-5}$$, and $$u \ge 0$$ holds in all cases. The results for $$m = n = 2$$, 3, ..., 25, $$k = n-1$$, $$\lfloor \frac{n}{2}\rfloor$$, $$\lfloor \log _2 n\rfloor$$, and $$s = 10^9$$ are shown in Fig. 1. We can see that the ratio comes close to the least upper bound and that u increases monotonically with n and k. The results for $$m = n = 25$$, $$k = n-1$$, $$\lfloor \frac{n}{2}\rfloor$$, $$\lfloor \log _2 n\rfloor$$, and $$s=10$$, $$10^2$$, ..., $$10^9$$ are shown in Fig. 2. We can see that u decreases monotonically with s. From these results, the numerical solutions appear to converge to the least upper bound.

## 6 Conclusion

We compared the 2-norm of the truncation error of pivoted QR with that of SVD and obtained the following theorem:

### Theorem 4

Let $$m\ge n > k$$. For any $$A\in {\mathbb {R}}^{m\times n}$$,

\begin{aligned} pivotQR_k(A)\le \sqrt{\frac{4^k-1}{3}(n-k)+1} \ SVD_k(A) \end{aligned}

holds. Furthermore, for $$t\in {\mathbb {R}}$$ that satisfies $$t < \sqrt{\frac{4^k-1}{3}(n-k)+1}$$, there exists $$A\in {\mathbb {R}}^{m\times n}$$ that satisfies

\begin{aligned} pivotQR_k(A) > t \ SVD_k(A). \end{aligned}

$$\square$$

This theorem states that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is $$\sqrt{\frac{4^k-1}{3}(n-k)+1}$$ when an $$m\times n$$ $$(m\ge n)$$ matrix is approximated by a matrix of rank k. Furthermore, an example where the ratio converges to the least upper bound is found. We also found, through numerical experiments for small n, examples where the ratio is close to the least upper bound.

## 7 Future work

We can see that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is $$O(2^k\sqrt{n})$$. However, this upper bound is attained only by contrived examples [4]. We expect that the bound may become significantly smaller under suitable restrictions on the matrices, and we intend to identify such a property.