ISAAC 2008: Algorithms and Computation pp 414-423

Deterministic Sparse Column Based Matrix Reconstruction via Greedy Approximation of SVD

• Ali Çivril
• Malik Magdon-Ismail
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5369)

Abstract

Given a matrix A ∈ ℝ^{m×n} of rank r, and an integer k < r, the top k singular vectors provide the best rank-k approximation to A. When the columns of A have specific meaning, it is desirable to find (provably) “good” approximations to A_k which use only a small number of columns in A. Proposed solutions to this problem have thus far focused on randomized algorithms. Our main result is a simple greedy deterministic algorithm with guarantees on the performance and the number of columns chosen. Specifically, our greedy algorithm chooses c columns from A with $$c = O\left(\frac{k^2 \log k}{\epsilon^2}\, \mu^2(A) \ln\left(\frac{\sqrt{k}\,\|A_k\|_F}{\epsilon\,\|A - A_k\|_F}\right)\right)$$ such that

$$\|A - C_{gr}C_{gr}^{+}A\|_F \leq \left(1+\epsilon\right)\|A - A_k\|_F,$$

where $$C_{gr}$$ is the matrix composed of the c columns, $$C_{gr}^+$$ is the pseudo-inverse of $$C_{gr}$$ ($$C_{gr}C_{gr}^+A$$ is the best reconstruction of A from $$C_{gr}$$), and μ(A) is a measure of the coherence in the normalized columns of A. The running time of the algorithm is O(SVD(A_k) + mnc), where SVD(A_k) is the running time complexity of computing the first k singular vectors of A. To the best of our knowledge, this is the first deterministic algorithm with performance guarantees on the number of columns and a (1 + ε) approximation ratio in Frobenius norm. The algorithm is quite simple and intuitive and is obtained by combining a generalization of the well-known sparse approximation problem from information theory with an existence result on the possibility of sparse approximation. Tightening the analysis along either of these two dimensions would yield improved results.
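The flavor of greedy column selection under the Frobenius-norm objective above can be sketched as follows. This is a simplified matching-pursuit-style heuristic, not the paper's exact SVD-guided procedure; the function name and the scoring rule are illustrative assumptions. At each step it picks the column whose deflated direction best explains the current residual, then projects that direction out.

```python
import numpy as np

def greedy_columns(A, c):
    """Greedily pick c columns of A to shrink ||A - C C^+ A||_F.

    A minimal sketch of greedy (matching-pursuit-style) column subset
    selection, assuming the simple residual-deflation scoring below;
    it is NOT the authors' exact algorithm.
    """
    m, n = A.shape
    chosen = []
    # Residual of A after projecting onto the span of the chosen columns.
    R = A.astype(float).copy()
    for _ in range(c):
        col_norms = np.linalg.norm(R, axis=0)
        safe = np.where(col_norms > 1e-12, col_norms, np.inf)
        # score_j = ||R^T r_j|| / ||r_j||; its square is the drop in the
        # squared Frobenius residual if column j is added next.
        scores = np.linalg.norm(R.T @ R, axis=0) / safe
        scores[chosen] = -np.inf          # never re-pick a column
        j = int(np.argmax(scores))
        chosen.append(j)
        # Deflate: remove the component of every column along R[:, j].
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)
    return A[:, chosen], chosen
```

After c steps, the remaining residual R equals A minus its projection onto the span of the selected columns, so its Frobenius norm is exactly the reconstruction error $$\|A - C_{gr}C_{gr}^+A\|_F$$ that the paper's guarantee bounds.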

Keywords

Greedy Algorithm · Failure Probability · Singular Vector · Frobenius Norm · Deterministic Algorithm
