Least upper bound of truncation error of low-rank matrix approximation algorithm using QR decomposition with pivoting

Low-rank approximation by QR decomposition with pivoting (pivoted QR) is known to be less accurate than singular value decomposition (SVD); however, its amount of calculation is smaller than that of SVD. The least upper bound of the ratio of the truncation error, defined by ‖A − BC‖₂, using pivoted QR to that using SVD is proved to be √((4^k − 1)/3 · (n − k) + 1) for A ∈ ℝ^{m×n} (m ≥ n), approximated as a product of B ∈ ℝ^{m×k} and C ∈ ℝ^{k×n}, in this study.


Low-rank approximation
Low-rank matrix approximation involves approximating a matrix by a matrix whose rank is less than that of the original matrix. Let A ∈ ℝ^{m×n}; then, a rank-k approximation of A is given by A ≈ BC, where B ∈ ℝ^{m×k} and C ∈ ℝ^{k×n}. Low-rank matrix approximation appears in many applications such as data mining [5] and machine learning [14]. It also plays an important role in tensor decompositions [12].

This paper discusses truncation errors of low-rank matrix approximation using QR decomposition with pivoting, or pivoted QR. In this study, rounding errors are not considered, and the norm used is the 2-norm unless stated otherwise. A ∈ ℝ^{m×n} (without loss of generality, we assume that m ≥ n) is approximated by a product of B ∈ ℝ^{m×k} and C ∈ ℝ^{k×n}, and the truncation error is defined by ‖A − BC‖₂.
It is well known that for any matrix A ∈ ℝ^{m×n} (m ≥ n), there are orthogonal matrices U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} and a diagonal matrix Σ ∈ ℝ^{n×n} with nonnegative diagonal elements that satisfy A = UΣV^T. This is a singular value decomposition (SVD) of A. We define the singular values σ_i(A) for i = 1, 2, ..., n, satisfying σ_1(A) ≥ σ_2(A) ≥ ⋯ ≥ σ_n(A) ≥ 0, and min over rank-k B, C of ‖A − BC‖₂ = σ_{k+1}(A) holds [8]. Therefore, the SVD truncated after k terms is a rank-k approximation of A whose 2-norm truncation error is the smallest. We define the truncation error of low-rank approximation by SVD as SVD_k(A) := σ_{k+1}(A). The amount of computation required to calculate the SVD is O(nm min(n, m)).
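As an illustration of this optimality, the sketch below (an illustrative example, not code from this paper) computes a rank-k approximation by truncating an SVD with NumPy and checks that its 2-norm error equals σ_{k+1}(A):

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k approximation A ~ B @ C obtained by truncating the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U[:, :k] * s[:k]   # m x k factor, columns scaled by singular values
    C = Vt[:k, :]          # k x n factor
    return B, C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
k = 2
B, C = truncated_svd(A, k)
sigma = np.linalg.svd(A, compute_uv=False)
# the 2-norm truncation error equals sigma_{k+1}(A) (0-indexed: sigma[k])
err = np.linalg.norm(A - B @ C, 2)
print(np.isclose(err, sigma[k]))  # True
```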
Pivoted QR was proposed by Golub in 1965 [7]. Because the amount of computation required to calculate a low-rank approximation by pivoted QR is O(nmk), it is cheaper than SVD and hence useful in many applications, such as solving rank-deficient least squares problems [2]. It consists of QR decomposition and pivoting. For any matrix A, there exist Q ∈ ℝ^{m×n} and an upper triangular matrix R ∈ ℝ^{n×n} that satisfy A = QR and Q^T Q = I_n. This is a QR decomposition of A. We use pivoting to determine a permutation matrix Π_grd and apply the QR decomposition algorithm to AΠ_grd. The subscript grd signifies the greedy method, explained below. Hereafter, we redefine QR as a QR decomposition AΠ_grd = QR. Let Q and R be partitioned as

Q = (Q_{1k} Q_{2k}), R = ((R_{1k}, R_{2k}); (0, R_{3k})), where Q_{1k} ∈ ℝ^{m×k} and R_{1k} ∈ ℝ^{k×k}. Then, we can approximate A by Q_{1k}(R_{1k} R_{2k})Π_grd^T, and ‖A − Q_{1k}(R_{1k} R_{2k})Π_grd^T‖₂ = ‖R_{3k}‖₂ holds. We define the truncation error of low-rank approximation by pivoted QR as pivotQR_k(A) := ‖R_{3k}‖₂. In this study, the greedy method is used to make ‖R_{3k}‖₂ small in pivoting. Pivoting is performed such that the elements of R = (r_{ij}) satisfy the following inequalities [1, p. 103]: r_{ll}² ≥ Σ_{i=l}^{j} r_{ij}² for l = 1, 2, ..., n − 1 and j = l, l + 1, ..., n. (1) Condition (1) is not used to analyze the error for l = k + 1, k + 2, ..., n − 1.
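The relation between the blocks of the pivoted QR decomposition and the truncation error ‖R_{3k}‖₂ can be checked numerically. The sketch below (an illustration, not the paper's code) uses scipy.linalg.qr with pivoting=True, which performs the standard greedy column pivoting, on a random matrix:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
k = 2

# Column-pivoted QR: A[:, piv] = Q @ R
Q, R, piv = qr(A, pivoting=True)

B = Q[:, :k]                  # Q_1k
C = np.empty((k, A.shape[1]))
C[:, piv] = R[:k, :]          # (R_1k R_2k) with the permutation undone

# The truncation error equals the 2-norm of the lower-right block R_3k
err = np.linalg.norm(A - B @ C, 2)
print(np.isclose(err, np.linalg.norm(R[k:, k:], 2)))  # True
```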
The greedy method of pivoting is not always optimal. QR decompositions AΠ_RR = Q_RR R_RR, where Π_RR is chosen such that R_RR has a small lower-right block, are called rank-revealing QR (RRQR). The following theorem was shown by Hong et al. in 1992 [9]. Theorem 1 Let m ≥ n > k and A ∈ ℝ^{m×n}. Then, there exists a permutation matrix Π such that the upper triangular factor R of the QR decomposition of AΠ, partitioned with R_1 ∈ ℝ^{k×k} in the upper-left block, satisfies the following inequality for its lower-right block R_3: ‖R_3‖₂ ≤ √(k(n − k) + min(k, n − k)) · σ_{k+1}(A). Finding the optimal permutation matrix is not practical from the viewpoint of computational complexity.

Truncation error of pivoted QR
Pivoted QR sometimes results in a large truncation error. A well-known example was given by Kahan [10]; we do not reproduce his work here. In 1968, Faddeev et al. [6] showed an upper bound on the ratio of the truncation error of pivoted QR to that of SVD, and a related bound holds as well [3]. However, a survey in 2017 stated that "very little is known in theory about its behaviour" [13, p. 2218] with regard to pivoted QR, so there is still room for further research on pivoted QR.
Our previous work showed that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is √((4^{n−1} + 2)/3) in case an m × n (m ≥ n) matrix is approximated by a matrix whose rank is n − 1, i.e., for k = n − 1 [11]. The tight upper bound for all k is proved in the rest of this paper.
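For consistency, substituting k = n − 1 into the general bound recovers the bound of the previous work:

```latex
\sqrt{\frac{4^{k}-1}{3}(n-k)+1}\;\Bigg|_{k=n-1}
  = \sqrt{\frac{4^{n-1}-1}{3}\cdot 1 + 1}
  = \sqrt{\frac{4^{n-1}-1+3}{3}}
  = \sqrt{\frac{4^{n-1}+2}{3}}.
```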
We assume that all matrices and vectors in this paper are real numbers; however, we can easily extend the discussion in this paper to complex numbers, and the same results can be obtained.

Preliminaries
In this section, we define the notation and examine the basic properties needed to analyze the truncation errors. First, we introduce the concept of resi.
Proposition 1 [1, p. 16] For A ∈ ℝ^{m×n}, there exists X ∈ ℝ^{n×m} that satisfies AXA = A, XAX = X, (AX)^T = AX, and (XA)^T = XA, and X is uniquely determined by these four conditions. Definition 1 For A ∈ ℝ^{m×n} (m ≥ n), the generalized inverse of A is defined as the X ∈ ℝ^{n×m} that satisfies the four conditions in Proposition 1 and is denoted by A†.
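These are the standard Moore–Penrose conditions; the following sketch (an illustration, not from the paper) verifies them numerically for X = A† computed with np.linalg.pinv:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
X = np.linalg.pinv(A)  # the generalized (Moore-Penrose) inverse A†

# The four conditions of Proposition 1
print(np.allclose(A @ X @ A, A),        # A X A = A
      np.allclose(X @ A @ X, X),        # X A X = X
      np.allclose((A @ X).T, A @ X),    # A X is symmetric
      np.allclose((X @ A).T, X @ A))    # X A is symmetric
```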
The following notation is closely related to the truncation error of pivoted QR.

Definition 2 Let A ∈ ℝ^{m×n} and b ∈ ℝ^m. We define resi(A, b) := min over x ∈ ℝ^n of ‖b − Ax‖₂ = ‖(I_m − AA†)b‖₂, the 2-norm of the residual of b with respect to the column space of A.
We denote the inner product of two vectors x and y as (x, y). Example 1 For x ∈ ℝ^n and y ∈ ℝ^n, if x ≠ 0, then the following holds: resi(x, y) = √(‖y‖² − (x, y)²/‖x‖²). The following lemma will be used to identify resi.
from (3), and from (4). We can see that from (5), (6), and (7). We can see that from (2), (4), (8), and (9). Then, (9) and (10) can be combined as Next, (5) can be rewritten as From this and (11), we have Application of Lemma 1 to this proves the lemma. ◻ QR decomposition and resi have the following relation. Note that QR in this lemma is without pivoting.

Lemma 5
Let m ≥ n > l , A ∈ ℝ m×n , and A = QR be a QR decomposition partitioned as

where A_{1k} ∈ ℝ^{m×k}. From Lemma 5, we can see that for l = 1, 2, ..., k and j = l + 1, l + 2, ..., n, and if rank(A_{1k}) = k holds. The last equation suggests that, as long as rank(A_{1k}) = k holds, the value of pivotQR_k(A) is determined only by A_{1k} and A_{2k}, or equivalently by Π_grd, and is independent of how (or by what algorithm) the QR decomposition is computed.

Evaluation from above
We bound the truncation error of pivoted QR from above in this section.

holds.
Proof From the definition of resi, holds. Thus,

holds. ◻
We can see that from the definition of 2-norm and Lemma 6. Now, we introduce an essential theorem of this paper.
holds. This (13) also holds if x_{1i} = 0. We define y ∈ ℝ^m as follows, writing A = (a_1 a_2 … a_n).
Since {a_1, a_2, …, a_{n−1}} is linearly independent, y ≠ 0 holds. As Lemma 1 gives Â_1^T d_1 = 0, we have (a_n, d_1) = 0. Thus, holds. We can see that holds from Lemma 3, because y is a linear combination of a_i (i = 1, 2, …, n − 1). Since and ‖d_n‖ > 0 hold, holds. Furthermore, since holds from (13), holds, and the theorem has been proved. ◻ We refer to an essential theorem by Hong et al.
Theorem 3 [9, p. 218] Let m ≥ n > l, A ∈ ℝ^{m×n}, and A = QR = UΣV^T be a QR decomposition and an SVD, respectively. Let R and V be partitioned as holds.
In the present study, this theorem is only used for l = n − 1 . The following lemma provides an inequality between resi and the singular value.

Lemma 7 Under the same assumptions as Theorem 2,
holds.

Proof Let A = UΣV^T be an SVD partitioned as
where V_1 ∈ ℝ^{n×(n−1)}. Let e_i be the i-th column of I_n for i = 1, 2, ..., n. Define a permutation matrix Π_i as for i = 1, 2, ..., n. Since

Proposition 2
Let m ≥ n > k and A ∈ ℝ^{m×n} satisfy (12), with A partitioned as A = (A_{1k} A_{2k}), where A_{1k} ∈ ℝ^{m×k}. Let A satisfy rank(A_{1k}) = k. Then, for all z ∈ ℝ^{n−k} with ‖z‖ = 1,

holds.
Proof From (12) and Lemma 6, the following holds for i = 1, 2, ..., k: Define A′ as If rank(A′) ≠ k + 1, then {a_1, a_2, …, a_k, A_{2k}z} is linearly dependent. Since rank(A_{1k}) = k, {a_1, a_2, …, a_k} is linearly independent, and A_{2k}z can be expressed as a linear combination of {a_1, a_2, …, a_k}. Then, we have resi(A_{1k}, A_{2k}z) = 0 from Lemma 3, and the conclusion holds. Therefore, we only consider the case rank(A′) = k + 1 in the remainder of this proof. We define d′_i as From Lemma 4, we can see that holds for i = 1, 2, ..., k and j = i, i + 1, ..., k, where A′_{ijk} = (a_i … a_{j−1} a_{j+1} … a_k A_{2k}z) and Â = (a_1 a_2 … a_{i−1}), (a_i a_{i+1} … a_k A_{2k}z); we can see that holds. Thus, holds for i = 1, 2, ..., k from (12) and (14). Thus, holds. We want to show that and prove this using induction in the order i = k, k − 1, ..., 1. Applying (15) for i = k gives holds. For all y ∈ ℝ^{k+1} that satisfy the right-hand side of (17), holds. Then, holds. ◻ Thus, we have proved that

Evaluation from below
In this section, we show that the inequality proved in the previous section is tight. We exhibit a matrix R_h with a real-valued parameter h whose error ratio converges to the least upper bound. R_h is the same as the Kahan matrix [10] in case m = n = k + 1 and is an extension of the Kahan matrix otherwise.
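As a concrete illustration (the classical Kahan matrix itself, not the paper's R_h, whose definition is omitted here), the sketch below builds the Kahan matrix and compares the truncation errors of pivoted QR and SVD for k = n − 1; since SVD is optimal, the ratio is always at least 1:

```python
import numpy as np
from scipy.linalg import qr

def kahan(n, theta):
    """Kahan matrix: diag(1, s, ..., s^{n-1}) times a unit upper triangular
    matrix with -c above the diagonal, where s = sin(theta), c = cos(theta)."""
    s, c = np.sin(theta), np.cos(theta)
    U = np.eye(n) - c * np.triu(np.ones((n, n)), 1)
    return np.diag(s ** np.arange(n)) @ U

n, k = 8, 7
K = kahan(n, 1.2)
Q, R, piv = qr(K, pivoting=True)
pivot_err = np.linalg.norm(R[k:, k:], 2)          # pivotQR_k(K)
svd_err = np.linalg.svd(K, compute_uv=False)[k]   # SVD_k(K) = sigma_{k+1}
ratio = pivot_err / svd_err
print(ratio >= 1.0)  # True: SVD gives the smallest 2-norm error
```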

Experiments
Because SVD k (R h ) cannot be calculated numerically when h is very small, we prepare another matrix for the experiments. Here, we present the matrix in case k = n − 1.

holds.
This proposition introduces matrices that converge to the least upper bound only in case k = n − 1 . In this paper, matrices that conjecturally converge to the least upper bound in case k < n − 1 are defined in an analogous manner as follows.
. Define ∈ ℝ m×n and W ∈ ℝ n×n as follows: is as follows: Because we want

Environment
Python 3.5.2, scipy 1.4.1, and numpy 1.13.3 were used. The data type used in these experiments was double precision. We used the SVD of numpy and the pivoted QR of scipy. Gram–Schmidt orthonormalization was performed twice using the scipy QR decomposition. Let V = (v_1 … v_n), and let V = QR and Q = WR′ be QR decompositions. The i-th column of W is assigned to w_i for i = 1, 2, ..., n.
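The two-pass orthonormalization described above can be sketched as follows (a minimal reconstruction, not the authors' script): the QR decomposition is applied twice, and the columns w_i of the second orthogonal factor W are used.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)
# Ill-conditioned column set; a second pass guards against loss of orthogonality
V = rng.standard_normal((50, 10)) @ np.diag(10.0 ** -np.arange(10))

Q1, R1 = qr(V, mode='economic')   # first pass: V = Q1 @ R1
W, R2 = qr(Q1, mode='economic')   # second pass: Q1 = W @ R2

# The columns of W are orthonormal to machine precision
print(np.linalg.norm(W.T @ W - np.eye(10)) < 1e-12)  # True
```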

Results
The maximum value of u is 2.3482077194 × 10⁻⁵, and u ≥ 0 holds in all cases where s = 10⁹. The results for m = n = 2, 3, ..., 25, k = n − 1, ⌊n/2⌋, ⌊log₂ n⌋, and s = 10⁹ are shown in Fig. 1. We can see that the ratio nearly attains the least upper bound and that u increases monotonically with n and k. The results for m = n = 25, k = n − 1, ⌊n/2⌋, ⌊log₂ n⌋, and s = 10, 10², ..., 10⁹ are shown in Fig. 2. We can see that u decreases monotonically with s. From these results, we can see that the numerical solutions are likely to converge to the least upper bound.

Conclusion
We compared the 2-norm of the truncation error of pivoted QR to that of SVD and obtained the following theorem. Theorem 4 Let m ≥ n > k. For any A ∈ ℝ^{m×n}, pivotQR_k(A) ≤ √((4^k − 1)/3 · (n − k) + 1) · SVD_k(A) holds. Furthermore, for t ∈ ℝ that satisfies t < √((4^k − 1)/3 · (n − k) + 1), there exists A ∈ ℝ^{m×n} that satisfies pivotQR_k(A) > t · SVD_k(A). ◻ This theorem states that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is √((4^k − 1)/3 · (n − k) + 1) when an m × n (m ≥ n) matrix is approximated by a matrix whose rank is k. Furthermore, an example where the ratio converges to the least upper bound is found. We also found an example where the ratio is close to the least upper bound through a numerical experiment in case n is small.
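The bound of Theorem 4 can be sanity-checked numerically (an illustrative sketch, not the paper's experiment): for random matrices, the observed ratio must lie between 1 and √((4^k − 1)/3 · (n − k) + 1).

```python
import numpy as np
from scipy.linalg import qr

def bound(n, k):
    """Least upper bound of pivotQR_k / SVD_k from Theorem 4."""
    return np.sqrt((4.0**k - 1.0) / 3.0 * (n - k) + 1.0)

rng = np.random.default_rng(4)
m, n, k = 12, 9, 3
for _ in range(100):
    A = rng.standard_normal((m, n))
    Q, R, piv = qr(A, pivoting=True)
    ratio = np.linalg.norm(R[k:, k:], 2) / np.linalg.svd(A, compute_uv=False)[k]
    assert 1.0 - 1e-9 <= ratio <= bound(n, k)
print("all ratios within [1, bound]")
```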

Future work
We can see that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is O(2^k √n). However, this upper bound is attained only by a contrived example [4]. We expect that the upper bound may become significantly smaller by adding some restrictions to the matrices. We intend to find such a property.