Abstract
Low-rank binary matrix approximation is a generic problem in which one seeks a good approximation of a binary matrix by another binary matrix with some specific properties. Here, a good approximation means that the difference between the two matrices is small in some matrix norm. The required property of the approximating matrix could be a small number of distinct columns, a small binary rank, or a small Boolean rank. Unfortunately, most variants of these problems are NP-hard. Motivated by this, we initiate the systematic algorithmic study of low-rank binary matrix approximation from the perspective of parameterized complexity. We show in which cases and under what conditions the problem is fixed-parameter tractable, admits a polynomial kernel, and can be solved in parameterized subexponential time.
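The three target properties named in the abstract can be made concrete on a toy instance. The following is a minimal Python sketch, not code from the paper: the function names, the example matrices, and the choice of counting differing entries as the distance measure are all illustrative assumptions.

```python
def hamming_distance(A, B):
    """Number of entries in which two equal-sized binary matrices differ."""
    return sum(a != b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def distinct_columns(A):
    """Number of pairwise different columns of A."""
    return len(set(zip(*A)))


def gf2_rank(A):
    """Rank of A over GF(2), computed by Gaussian elimination with XOR."""
    rows = [list(r) for r in A]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a 1 in this column.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def boolean_product(U, V):
    """Product of U and V where addition is logical OR (1 + 1 = 1)."""
    inner = len(V)
    return [
        [int(any(U[i][t] and V[t][j] for t in range(inner)))
         for j in range(len(V[0]))]
        for i in range(len(U))
    ]


# Hypothetical 3 x 3 input matrix.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

# A is the Boolean product of a 3 x 2 and a 2 x 3 matrix,
# so its Boolean rank is at most 2.
U = [[1, 0],
     [1, 1],
     [0, 1]]
V = [[1, 1, 0],
     [0, 1, 1]]

# B differs from A in a single entry (row 2, column 3).
B = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]

print(boolean_product(U, V) == A)
print(gf2_rank(A), gf2_rank(B))
print(hamming_distance(A, B), distinct_columns(A))
```

In this toy instance the matrix A has rank 3 over GF(2) but Boolean rank at most 2, and flipping a single entry yields a matrix B of GF(2)-rank 2. The sketch only verifies a given factorization; it does not search for a best approximation, which is the hard part the paper addresses.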
Notes
We are grateful to the anonymous reviewer who pointed out to us that the running time of our algorithm can be improved from the original \( 2^{{\mathcal {O}}(r\sqrt{k\log {(k+r)}})}\cdot nm\) to \( 2^{{\mathcal {O}}( \sqrt{rk\log {(k+r)}\log r})}\cdot nm\).
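As a rough sanity check that is not part of the original note, the two exponents can be compared directly if one ignores the constants hidden in the \({\mathcal {O}}\)-notation. For \(r\ge 2\):

\[ \frac{\sqrt{rk\log (k+r)\log r}}{r\sqrt{k\log (k+r)}} = \sqrt{\frac{\log r}{r}} \le 1, \]

so the improved exponent is smaller by a factor of about \(\sqrt{r/\log r}\).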
Acknowledgements
We thank Daniel Lokshtanov, Syed Mohammad Meesum and Saket Saurabh for helpful discussions on the topic of the paper. We are also very grateful to the anonymous reviewers, whose suggestions helped us to improve our results.
Funding
The research leading to these results has been supported by the Research Council of Norway via the projects “CLASSIS” (grant 249994) and “MULTIVAL” (grant 263317).
Additional information
Responsible editor: Pauli Miettinen.
This work was done within the CEDAS center in Bergen. A preliminary version of this paper appeared as an extended abstract in the proceedings of ICALP 2018.
Cite this article
Fomin, F.V., Golovach, P.A. & Panolan, F. Parameterized low-rank binary matrix approximation. Data Min Knowl Disc 34, 478–532 (2020). https://doi.org/10.1007/s10618-019-00669-5
DOI: https://doi.org/10.1007/s10618-019-00669-5