Sublinear Cost Low Rank Approximation via Subspace Sampling

  • Conference paper
Mathematical Aspects of Computer and Information Sciences (MACIS 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11989)

Abstract

Low Rank Approximation (LRA) of a matrix is a hot research subject, fundamental for Matrix and Tensor Computations and Big Data Mining and Analysis. Computations with LRA can be performed at sublinear cost, that is, by using far fewer memory cells and arithmetic operations than an input matrix has entries. Although every sublinear cost algorithm for LRA fails to approximate the worst-case inputs, we prove that our sublinear cost variations of a popular subspace sampling algorithm output accurate LRA of a large class of inputs.

Namely, they do so with high probability (whp) for a random input matrix that admits LRA. In other papers we propose and analyze other sublinear cost algorithms for LRA and Linear Least Squares Regression. Our numerical tests are in good accordance with our formal results.

Notes

  1.

    Here and hereafter “Gaussian matrices” stands for “Gaussian random matrices” (see Definition 1). “SRHT and SRFT” are the acronyms for “Subsampled Randomized Hadamard and Fourier transforms”. Rademacher matrices are filled with iid variables, each equal to 1 or \(-1\) with probability 1/2.

  2.

    Subpermutation matrices are full-rank submatrices of permutation matrices.

  3.

    \({{\,\mathrm{nrank}\,}}(W)\) denotes numerical rank of W (see Appendix A.1).

  4.

    We defined Algorithm 4 in Remark 4.

  5.

    Additive white Gaussian noise is statistical noise having a probability density function (PDF) equal to that of the Gaussian (normal) distribution.

References

  1. Björck, Å.: Numerical Methods in Matrix Computations. TAM, vol. 59. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-05089-8

  2. Chen, Z., Dongarra, J.J.: Condition numbers of Gaussian random matrices. SIAM J. Matrix Anal. Appl. 27, 603–620 (2005)

  3. Cichocki, A., Lee, N., Oseledets, I., Phan, A.-H., Zhao, Q., Mandic, D.P.: Tensor networks for dimensionality reduction and large-scale optimization: part 1, low-rank tensor decompositions. Found. Trends® Mach. Learn. 9(4–5), 249–429 (2016)

  4. Davidson, K.R., Szarek, S.J.: Local operator theory, random matrices, and Banach spaces. In: Johnson, W.B., Lindenstrauss, J. (eds.) Handbook on Geometry of Banach Spaces, pp. 317–368. North Holland (2001)

  5. Edelman, A.: Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl. 9(4), 543–560 (1988)

  6. Edelman, A., Sutton, B.D.: Tails of condition number distributions. SIAM J. Matrix Anal. Appl. 27(2), 547–560 (2005)

  7. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)

  8. Goreinov, S., Oseledets, I., Savostyanov, D., Tyrtyshnikov, E., Zamarashkin, N.: How to find a good submatrix. In: Olshevsky, V., Tyrtyshnikov, E. (eds.) Matrix Methods: Theory, Algorithms, Applications (dedicated to the memory of Gene Golub), pp. 247–256. World Scientific Publishing, New Jersey (2010)

  9. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)

  10. Kishore Kumar, N., Schneider, J.: Literature survey on low rank approximation of matrices. Linear Multilinear Algebra 65(11), 2212–2244 (2017). arXiv:1606.06511v1 [math.NA] 21 June 2016

  11. Luan, Q., Pan, V.Y.: CUR LRA at sublinear cost based on volume maximization. In: Slamanig, D., et al. (eds.) MACIS 2019, LNCS 11989, pp. xx–yy. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-43120-49. arXiv:1907.10481 (2019)

  12. Luan, Q., Pan, V.Y.: Randomized approximation of linear least squares regression at sublinear cost. arXiv:1906.03784, 10 June 2019

  13. Mahoney, M.W.: Randomized algorithms for matrices and data. Found. Trends Mach. Learn. 3(2) (2011)

  14. Osinsky, A.: Rectangular maximum volume and projective volume search algorithms. arXiv:1809.02334, September 2018

  15. Pan, C.-T.: On the existence and computation of rank-revealing LU factorizations. Linear Algebra Appl. 316, 199–222 (2000)

  16. Pan, V.Y., Luan, Q.: Refinement of low rank approximation of a matrix at sublinear cost. arXiv:1906.04223, 10 June 2019

  17. Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L.: Primitive and Cynical Low Rank Approximation, Preprocessing and Extensions. arXiv:1611.01391, 3 November 2016

  18. Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L.: Superfast Accurate Low Rank Approximation. Preprint, arXiv:1710.07946, 22 October 2017

  19. Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L.: CUR Low Rank Approximation at Sublinear Cost. arXiv:1906.04112, 10 June 2019

  20. Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L.: Low rank approximation at sublinear cost by means of subspace sampling. arXiv:1906.04327, 10 June 2019

  21. Pan, V.Y., Qian, G., Yan, X.: Random multipliers numerically stabilize Gaussian and block Gaussian elimination: proofs and an extension to low-rank approximation. Linear Algebra Appl. 481, 202–234 (2015)

  22. Pan, V.Y., Zhao, L.: New studies of randomized augmentation and additive preprocessing. Linear Algebra Appl. 527, 256–305 (2017)

  23. Pan, V.Y., Zhao, L.: Numerically safe Gaussian elimination with no pivoting. Linear Algebra Appl. 527, 349–383 (2017)

  24. Sankar, A., Spielman, D., Teng, S.-H.: Smoothed analysis of the condition numbers and growth factors of matrices. SIAM J. Matrix Anal. Appl. 28(2), 446–476 (2006)

  25. Tropp, J.A.: Improved analysis of subsampled randomized Hadamard transform. Adv. Adapt. Data Anal. 3(1–2), 115–126 (2011). (Special issue "Sparse Representation of Data and Images")

  26. Tropp, J.A., Yurtsever, A., Udell, M., Cevher, V.: Practical sketching algorithms for low-rank matrix approximation. SIAM J. Matrix Anal. Appl. 38, 1454–1485 (2017)

  27. Woolfe, F., Liberty, E., Rokhlin, V., Tygert, M.: A fast randomized algorithm for the approximation of matrices. Appl. Comput. Harmonic. Anal. 25, 335–366 (2008)


Acknowledgements

We were supported by NSF Grants CCF–1116736, CCF–1563942, CCF–1733834 and PSC CUNY Award 69813 00 48.

Author information

Corresponding author

Correspondence to Victor Y. Pan.

Appendix

A Background on Matrix Computations

A.1 Some Definitions

  • An \(m\times n\) matrix M is orthogonal if \(M^*M=I_n\) or \(MM^*=I_m\).

  • For \(M=(m_{i,j})_{i,j=1}^{m,n}\) and two sets \(\mathcal I\subseteq \{1,\dots ,m\}\) and \(\mathcal J\subseteq \{1,\dots ,n\}\), define the submatrices \(M_{\mathcal I,:}:=(m_{i,j})_{i\in \mathcal I; j=1,\dots , n}, M_{:,\mathcal J}:=(m_{i,j})_{i=1,\dots , m;j\in \mathcal J},\) and  \(M_{\mathcal I,\mathcal J}:=(m_{i,j})_{i\in \mathcal I;j\in \mathcal J}.\)

  • \({{\,\mathrm{rank}\,}}(M)\) denotes the rank of a matrix M.

  • \(\min _{|E|\le \epsilon |M|}{{\,\mathrm{rank}\,}}(M+E)\) is the \(\epsilon \)-rank of M; it is the numerical rank, \({{\,\mathrm{nrank}\,}}(M)\), if \(\epsilon \) is small in context.

  • Set \(\sigma _j(M)=0\) for \(j>r\) in the SVD of M to obtain \(M_{r}\), the rank-r truncation of M.

  • \(\kappa (M)=||M||~||M^+||\) is the spectral condition number of M.
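
The following minimal NumPy sketch illustrates the last three notions (numerical rank, rank-r truncation, and the spectral condition number); the tolerance eps and the test matrix are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def nrank(M, eps=1e-6):
    """Numerical rank: number of singular values exceeding eps * ||M|| (spectral norm)."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > eps * s[0]))

def truncation(M, r):
    """Rank-r truncation M_r: keep the r largest singular values, zero out the rest."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

def kappa(M):
    """Spectral condition number ||M|| * ||M^+|| = sigma_max / sigma_min."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

rng = np.random.default_rng(0)
# a 100 x 80 matrix of numerical rank 5 under a tiny perturbation
M = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 80))
M += 1e-8 * rng.standard_normal((100, 80))
print(nrank(M), np.linalg.norm(M - truncation(M, 5), 2))
```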

A.2 Auxiliary Results

Next we recall some relevant auxiliary results (we omit the proofs of two well-known lemmas).

Lemma 2

[The norm of the pseudo inverse of a matrix product]. Suppose that \(A\in \mathbb R^{k\times r}\), \(B\in \mathbb R^{r\times l}\) and the matrices A and B have full rank \(r\le \min \{k,l\}\). Then \(|(AB)^+| \le |A^+|~|B^+|\).

Lemma 3

(The norm of the pseudo inverse of a perturbed matrix, [B15, Theorem 2.2.4]). If \({{\,\mathrm{rank}\,}}(M+E)={{\,\mathrm{rank}\,}}(M)=r\) and \(\eta =||M^+||~||E||<1\), then

$$\frac{1}{\sqrt{r}}||(M+E)^+||_F\le ||(M+E)^+||\le \frac{1}{1-\eta } ||M^+||.$$

Lemma 4

(The impact of a perturbation of a matrix on its singular values, [GL13, Corollary 8.6.2]). For \(m\ge n\) and a pair of \({m\times n}\) matrices M and \(M+E\) it holds that

$$|\sigma _j(M+E)-\sigma _j(M)|\le ||E||~\mathrm{for}~j=1,\dots ,n. $$

Theorem 8

(The impact of a perturbation of a matrix on its top singular spaces, [GL13, Theorem 8.6.5]). Let \(g:=\sigma _{r}(M)-\sigma _{r+1}(M)>0~\mathrm{and}~||E||_F\le 0.2g.\) Then for the left and right singular spaces associated with the r largest singular values of the matrices M and \(M+E\), there exist orthogonal matrix bases \( B_{r,\mathrm{left}}(M)\), \(B_{r,\mathrm{right}}(M)\), \( B_{r,\mathrm{left}}(M+E)\), and \(B_{r,\mathrm{right}}(M+E)\) such that

$$\max \{||B_{r,\mathrm{left}}(M+E)- B_{r,\mathrm{left}}(M)||_F,||B_{r,\mathrm{right}}(M+E)-B_{r,\mathrm{right}}(M)||_F\}\le \textstyle \frac{4||E||_F}{g}.$$

For example, if \(\sigma _{r}(M)\ge 2\sigma _{r+1}(M)\), which implies that \(g\ge 0.5~\sigma _{r}(M)\), and if \(||E||_F\le 0.1~ \sigma _{r}(M)\), then the upper bound on the right-hand side is at most \(8||E||_F/\sigma _r(M)\).
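
The next NumPy sketch (our own illustrative test; the matrix, its spectral gap, and the perturbation size are arbitrary choices satisfying the hypotheses) checks Lemma 4 directly and a looser consequence of Theorem 8: if some orthogonal bases of the top-r singular subspaces differ by at most \(4||E||_F/g\) in the Frobenius norm, then the orthogonal projectors onto these subspaces differ by at most twice that bound.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 200, 150, 10

# test matrix with a prescribed gap g = sigma_r - sigma_{r+1} = 4, plus a small E
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.r_[np.linspace(10.0, 5.0, r), np.linspace(1.0, 0.1, n - r)]
M = U @ np.diag(sigma) @ V.T
E = 0.002 * rng.standard_normal((m, n))        # keeps ||E||_F <= 0.2 g

sM = np.linalg.svd(M, compute_uv=False)
sME = np.linalg.svd(M + E, compute_uv=False)
assert np.max(np.abs(sME - sM)) <= np.linalg.norm(E, 2) + 1e-12   # Lemma 4

g = sM[r - 1] - sM[r]
B = np.linalg.svd(M)[0][:, :r]                 # top-r left singular subspace of M
Bp = np.linalg.svd(M + E)[0][:, :r]            # ... and of M + E
proj_dist = np.linalg.norm(Bp @ Bp.T - B @ B.T, "fro")
print(proj_dist, "<=", 8 * np.linalg.norm(E, "fro") / g)
```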

A.3 Gaussian and Factor-Gaussian Matrices of Low Rank and Low Numerical Rank

Lemma 5

[Orthogonal invariance of a Gaussian matrix]. Suppose that k, m, and n are three positive integers, \(k\le \min \{m,n\},\) \(G_{m,n}{\mathop {=}\limits ^{d}}\mathcal G^{m\times n}\), \(S\in \mathbb R^{k\times m}\), \(T\in \mathbb R^{n\times k}\), and S and T are orthogonal matrices. Then \(SG_{m,n}\) and \(G_{m,n}T\) are Gaussian matrices.

Definition 2

[Factor-Gaussian matrices]. Let \(r\le \min \{m,n\}\) and let \(\mathcal G_{r,B}^{m\times n}\), \(\mathcal G_{A,r}^{m\times n}\), and \(\mathcal G_{r,C}^{m\times n}\) denote the classes of matrices \(G_{m,r}B\), \(AG_{r,n}\), and \(G_{m,r} C G_{r,n}\), respectively, which we call left, right, and two-sided factor-Gaussian matrices of rank r, respectively, provided that \(G_{p,q}\) denotes a \(p\times q\) Gaussian matrix, \(A\in \mathbb R^{m\times r}\), \(B\in \mathbb R^{r\times n}\), and \(C\in \mathbb R^{r\times r}\), and A, B and C are well-conditioned matrices of full rank r.

Theorem 9

The class \(\mathcal G_{r,C}^{m\times n}\) of two-sided \(m\times n\) factor-Gaussian matrices \(G_{m,r} C G_{r,n}\) does not change if in its definition we replace the factor C by a well-conditioned diagonal matrix \(\varSigma ={{\,\mathrm{diag}\,}}(\sigma _j)_{j=1}^r\) such that \(\sigma _1\ge \sigma _2\ge \dots \ge \sigma _r>0\).

Proof

Let \(C=U_C\varSigma _C V_C^*\) be the SVD of C. Then \(A=G_{m,r}U_C{\mathop {=}\limits ^{d}}\mathcal G^{m\times r}\) and \(B=V_C^*G_{r,n}{\mathop {=}\limits ^{d}}\mathcal G^{r\times n}\) by virtue of Lemma 5, and so \(G_{m,r} C G_{r,n} =A\varSigma _C B\) for \(A{\mathop {=}\limits ^{d}}\mathcal G^{m\times r}\) and \(B{\mathop {=}\limits ^{d}}\mathcal G^{r\times n}\) with A independent of B.

Definition 3

The relative norm of a perturbation of a Gaussian matrix is the ratio of the perturbation norm and the expected value of the norm of the matrix (estimated in Theorem 11).

We refer to all three matrix classes above as factor-Gaussian matrices of rank r, to their perturbations within a relative norm bound \(\epsilon \) as factor-Gaussian matrices of \(\epsilon \)-rank r, and to their perturbations within a small relative norm as factor-Gaussian matrices of numerical rank r to which we also refer as perturbations of factor-Gaussian matrices.

Clearly \(||(A\varSigma )^+||\le ||\varSigma ^{-1}||~||A^+||\) and \(||(\varSigma B)^+||\le ||\varSigma ^{-1}||~||B^+||\) for a two-sided factor-Gaussian matrix \(M=A\varSigma B\) of rank r of Definition 2, and so whp such a matrix is both left and right factor-Gaussian of rank r.
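
A short NumPy sketch of Definitions 2 and 3 follows (our own illustrative code; for simplicity it scales the perturbation by the computed spectral norm of the matrix rather than by the expected norm of Theorem 11).

```python
import numpy as np

rng = np.random.default_rng(2)

def two_sided_factor_gaussian(m, n, sigma, eps=0.0):
    """G_{m,r} diag(sigma) G_{r,n}: a two-sided factor-Gaussian matrix of rank
    r = len(sigma); for eps > 0 it is perturbed within relative norm eps and so
    has numerical rank r (Definitions 2 and 3)."""
    r = len(sigma)
    M = rng.standard_normal((m, r)) @ np.diag(sigma) @ rng.standard_normal((r, n))
    if eps > 0.0:
        E = rng.standard_normal((m, n))
        M = M + (eps * np.linalg.norm(M, 2) / np.linalg.norm(E, 2)) * E
    return M

# a 300 x 200 perturbation of a two-sided factor-Gaussian matrix of numerical rank 5
M = two_sided_factor_gaussian(300, 200, np.linspace(2.0, 1.0, 5), eps=1e-8)
print(np.linalg.svd(M, compute_uv=False)[:7])  # five dominant singular values, then a sharp drop
```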

Theorem 10

Suppose that \(\lambda \) is a positive scalar, \(M_{k,l}\in \mathbb R^{k\times l}\), and G is a \(k\times l\) Gaussian matrix, where \(k-l \ge l+2 \ge 4\). Then we have

$$ \mathbb E~|| (M_{k, l} + \lambda G)^+ || \le \frac{e \sqrt{k-l}}{\lambda (k-2l)} ~\text { and }~ \mathbb E~|| (M_{k, l} + \lambda G)^+||_F \le \frac{1}{\lambda }\sqrt{\frac{l}{k-2l-1}}. $$

Proof

Let \(M_{k,l}=U\varSigma V^*\) be the full SVD of \(M_{k,l}\), where \(U\in \mathbb R^{k\times k}\), \(V\in \mathbb R^{l\times l}\), U and V are orthogonal matrices, \(\varSigma =(D~|~O_{l,k-l})^*\), and D is an \(l\times l\) diagonal matrix. Write \(W_{k,l}:=U^*(M_{k,l}+\lambda G)V\) and observe that \(U^*M_{k,l}V=\varSigma \) and \(U^*GV = \begin{bmatrix} G_1 \\ G_2 \end{bmatrix}\) is a \(k\times l\) Gaussian matrix by virtue of Lemma 5. Hence

$$\sigma _l(W_{k,l}) =\sigma _l\Big (\begin{bmatrix} D + \lambda G_1 \\ \lambda G_2 \end{bmatrix}\Big ) \ge \max \{ \sigma _l(D+\lambda G_1),\lambda \sigma _l( G_2) \},$$

and so \(|W_{k,l}^+|\le \min \{|(D+ \lambda G_1)^+|,|\lambda G_2^+|\}.\) Recall that \(G_1 {\mathop {=}\limits ^{d}}\mathcal G^{l\times l}\) and \(G_2 {\mathop {=}\limits ^{d}}\mathcal G^{(k-l)\times l}\) are independent, and now Theorem 10 follows because \(|(M_{k,l}+\lambda G_{k,l})^+|= |W_{k,l}^+|\) and by virtue of claims (iii) and (iv) of Theorem 12.
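
A quick Monte Carlo sanity check of the two bounds of Theorem 10 (our own illustrative test; the dimensions k and l, the fixed matrix M, the scalar \(\lambda \), and the number of trials are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(3)
k, l, lam = 40, 6, 1.0                      # k - l >= l + 2 >= 4
M = 3.0 * rng.standard_normal((k, l))       # an arbitrary fixed k x l matrix

trials = 2000
sp = np.empty(trials)
fro = np.empty(trials)
for t in range(trials):
    P = np.linalg.pinv(M + lam * rng.standard_normal((k, l)))
    sp[t] = np.linalg.norm(P, 2)
    fro[t] = np.linalg.norm(P, "fro")

print(sp.mean(), "<=", np.e * np.sqrt(k - l) / (lam * (k - 2 * l)))
print(fro.mean(), "<=", np.sqrt(l / (k - 2 * l - 1)) / lam)
```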

A.4 Norms of a Gaussian Matrix and Its Pseudo Inverse

\(\varGamma (x)= \int _0^{\infty }\exp (-t)t^{x-1}dt\) denotes the Gamma function.

Theorem 11

[Norms of a Gaussian matrix. See [DS01, Theorem II.7] and our Definition 1].

(i) Probability \(\{\nu _{\mathrm{sp},m,n}>t+\sqrt{m}+\sqrt{n}\}\le \exp (-t^2/2)\) for \(t\ge 0\), \(\mathbb E(\nu _{\mathrm{sp},m,n})\le \sqrt{m}+\sqrt{n}\).

(ii) \(\nu _{F,m,n}\) obeys the \(\chi \)-distribution with mn degrees of freedom, \(\mathbb E(\nu _{F,m,n}^2)=mn\), and probability density \(\frac{2x^{mn-1}\exp (-x^2/2)}{2^{mn/2}\varGamma (mn/2)}.\)

Theorem 12

[Norms of the pseudo inverse of a Gaussian matrix (see Definition 1)].

  1. (i)

    Probability \(\{\nu _{\mathrm{sp},m,n}^+\ge m/x^2\}<\frac{x^{m-n+1}}{\varGamma (m-n+2)}\) for \(m\ge n\ge 2\) and all positive x,

  2. (ii)

    Probability \(\{\nu _{F,m,n}^+\ge t\sqrt{\frac{3n}{m-n+1}}\}\le t^{n-m}\) and Probability \(\{\nu _{\mathrm{sp},m,n}^+\ge t\frac{e\sqrt{m}}{m-n+1}\}\le t^{n-m}\) for all \(t\ge 1\) provided that \(m\ge 4\),

  3. (iii)

    \(\mathbb E((\nu ^+_{F,m,n})^2)=\frac{n}{m-n-1}\) and \(\mathbb E(\nu ^+_{\mathrm{sp},m,n})\le \frac{e\sqrt{m}}{m-n}\) provided that \(m\ge n+2\ge 4\),

  4. (iv)

    Probability \(\{\nu _{\mathrm{sp},n,n}^+\ge x\}\le \frac{2.35\sqrt{n}}{x}\) for \(n\ge 2\) and all positive x; furthermore, the same bound holds with \(\nu _{\mathrm{sp},n,n}^+\) replaced by \(||(M_{n,n}+G_{n,n})^+||\) for any \(n\times n\) matrix \(M_{n,n}\) and an \(n\times n\) Gaussian matrix \(G_{n,n}\).

Proof

See [CD05, Proof of Lemma 4.1] for claim (i), [HMT11, Proposition 10.4 and equations (10.3) and (10.4)] for claims (ii) and (iii), and [SST06, Theorem 3.3] for claim (iv).

Theorem 12 implies reasonable probabilistic upper bounds on the norm \(\nu _{m,n}^+\) even when the difference \(|m-n|\) is close to 0; whp the upper bounds of Theorem 12 on the norm \(\nu ^+_{m,n}\) decrease very fast as \(|m-n|\) grows from 1.
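
A quick Monte Carlo check of the expectation bounds of Theorem 11(i) and Theorem 12(iii) (our own illustrative test; the dimensions and the trial count are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, trials = 50, 20, 2000                 # m >= n + 2 >= 4

norm_sp, pinv_sp, pinv_fro2 = np.empty(trials), np.empty(trials), np.empty(trials)
for t in range(trials):
    G = rng.standard_normal((m, n))
    P = np.linalg.pinv(G)
    norm_sp[t] = np.linalg.norm(G, 2)
    pinv_sp[t] = np.linalg.norm(P, 2)
    pinv_fro2[t] = np.linalg.norm(P, "fro") ** 2

print(norm_sp.mean(), "<=", np.sqrt(m) + np.sqrt(n))        # Theorem 11 (i)
print(pinv_sp.mean(), "<=", np.e * np.sqrt(m) / (m - n))     # Theorem 12 (iii)
print(pinv_fro2.mean(), "~", n / (m - n - 1))                # Theorem 12 (iii), an equality
```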

B Small Families of Hard Inputs for Sublinear Cost LRA

Any sublinear cost LRA algorithm fails on the following small families of LRA inputs.

Example 1

Let \(\varDelta _{i,j}\) denote an \(m\times n\) matrix of rank 1 filled with 0s except for its (i, j)th entry filled with 1. The mn such matrices \(\{\varDelta _{i,j}\}_{i,j=1}^{m,n}\) form a family of \(\delta \)-matrices. We also include the \(m\times n\) null matrix \(O_{m,n}\) filled with 0s into this family. Now any fixed sublinear cost algorithm does not access the (i, j)th entry of its input matrices for some pair of i and j. Therefore it outputs the same approximation of the matrices \(\varDelta _{i,j}\) and \(O_{m,n}\), with an undetected error at least 1/2. Arrive at the same conclusion by applying the same argument to the set of \(mn+1\) small-norm perturbations of the matrices of the above family and to the \(mn+1\) sums of the latter matrices with any fixed \(m\times n\) matrix of low rank. Finally, the same argument shows that a posteriori estimation of the output errors of an LRA algorithm applied to the same input families cannot run at sublinear cost.
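
A toy illustration of this argument (our own code, not the paper's algorithm): a "sublinear" routine that probes a fixed 10 x 10 block of entries returns identical outputs on the null matrix and on a \(\delta \)-matrix whose nonzero entry it never reads.

```python
import numpy as np

m, n = 100, 100
I, J = range(10), range(10)            # the routine probes only 100 of the 10,000 entries

def toy_sublinear_lra(M):
    """Probe only the entries M[i, j] with i in I and j in J; if a nonzero probe
    is found, return the rank-1 cross through it, otherwise return the zero matrix."""
    for i in I:
        for j in J:
            if M[i, j] != 0:
                return np.outer(M[:, j], M[i, :]) / M[i, j]
    return np.zeros((m, n))

O = np.zeros((m, n))
Delta = np.zeros((m, n))
Delta[50, 50] = 1.0                    # a delta-matrix whose nonzero entry is never probed

# identical outputs on O and Delta, hence an undetected error of norm 1 on one of them
print(np.array_equal(toy_sublinear_lra(O), toy_sublinear_lra(Delta)))
```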

The example actually covers randomized LRA algorithms as well. Indeed, suppose that with a positive constant probability an LRA algorithm does not access K entries of an input matrix. Apply this algorithm to two matrices of low rank whose difference at all these K entries is equal to a large constant C. Then, clearly, with a positive constant probability the algorithm has errors at least C/2 at at least K/2 of these entries. The paper [LPa] shows, however, that accurate LRA of a matrix that admits sufficiently close LRA can be computed at sublinear cost in two successive Cross-Approximation (C-A) iterations (cf. [GOSTZ10]), provided that we avoid choosing a degenerate initial submatrix, which is precisely the problem with the matrix families of Example 1. Thus we readily compute close LRA if we recursively perform C-A iterations and avoid degeneracy at some C-A step.
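
A minimal sketch of two such C-A passes (our own illustrative code, assuming NumPy/SciPy; column-pivoted QR stands in for the volume maximization of [GOSTZ10] and [LPa], and the index sets and test matrix are arbitrary choices). The routine reads only a few thin row and column panels of M before the final full-matrix error check.

```python
import numpy as np
from scipy.linalg import qr

def ca_lra(M, r, J0, passes=2):
    """Cross-approximation sketch: starting from column indices J0, alternately
    pick r row and r column indices by column-pivoted QR of the current thin
    panel, then return the CUR approximation built on the final cross."""
    J = np.asarray(J0)
    for _ in range(passes):
        I = qr(M[:, J].T, mode="economic", pivoting=True)[2][:r]  # rows from the column panel
        J = qr(M[I, :], mode="economic", pivoting=True)[2][:r]    # columns from the row panel
    return M[:, J] @ np.linalg.pinv(M[np.ix_(I, J)]) @ M[I, :]

rng = np.random.default_rng(5)
# a 500 x 400 input that admits close LRA (numerical rank 8)
M = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 400))
M += 1e-6 * rng.standard_normal(M.shape)
approx = ca_lra(M, 8, J0=rng.choice(400, 8, replace=False))
print(np.linalg.norm(M - approx) / np.linalg.norm(M))
```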

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Pan, V.Y., Luan, Q., Svadlenka, J., Zhao, L. (2020). Sublinear Cost Low Rank Approximation via Subspace Sampling. In: Slamanig, D., Tsigaridas, E., Zafeirakopoulos, Z. (eds.) Mathematical Aspects of Computer and Information Sciences. MACIS 2019. Lecture Notes in Computer Science, vol 11989. Springer, Cham. https://doi.org/10.1007/978-3-030-43120-4_9

  • DOI: https://doi.org/10.1007/978-3-030-43120-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43119-8

  • Online ISBN: 978-3-030-43120-4
