
Parameterized low-rank binary matrix approximation

Data Mining and Knowledge Discovery

Abstract

Low-rank binary matrix approximation is a generic problem in which one seeks a good approximation of a binary matrix by another binary matrix with some specific properties. Here, a good approximation means that the difference between the two matrices, measured in some matrix norm, is small. The desired properties of the approximating binary matrix could be: a small number of distinct columns, a small binary rank, or a small Boolean rank. Unfortunately, most variants of these problems are NP-hard. Motivated by this, we initiate the systematic algorithmic study of low-rank binary matrix approximation from the perspective of parameterized complexity. We show in which cases and under what conditions the problem is fixed-parameter tractable, admits a polynomial kernel, and can be solved in parameterized subexponential time.
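The three target properties can be made concrete with a minimal Python sketch (standard library only). It assumes the matrix norm is the entrywise ℓ0 distance, i.e. the number of differing entries (for 0/1 matrices this coincides with the ℓ1 distance and the squared Frobenius distance), and it reads "binary rank" as rank over GF(2); both are assumptions of this sketch. All function names are hypothetical. The Boolean rank itself is not computed; the sketch only forms a Boolean product, whose Boolean rank is at most its inner dimension.

```python
from typing import List

# Hypothetical helper names, for illustration only.
Matrix = List[List[int]]  # a 0/1 matrix stored as a list of rows


def l0_distance(A: Matrix, B: Matrix) -> int:
    """Number of entries in which A and B differ, i.e. the l0 norm of A - B."""
    return sum(a != b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def num_distinct_columns(B: Matrix) -> int:
    """Number of pairwise different columns of B."""
    return len(set(zip(*B)))


def gf2_rank(M: Matrix) -> int:
    """Rank of M over GF(2), by Gaussian elimination on row bitmasks."""
    # Bit j of rows[i] holds M[i][j]; XOR of two masks is row addition mod 2.
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in M]
    rank = 0
    for j in range(len(M[0]) if M else 0):
        # Look for an unused row with a 1 in column j to act as the pivot.
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> j) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear column j in every other row.
        for i in range(len(rows)):
            if i != rank and (rows[i] >> j) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def boolean_product(U: Matrix, V: Matrix) -> Matrix:
    """Boolean product: entry (i, j) is the OR over l of U[i][l] AND V[l][j]."""
    return [[int(any(u and v for u, v in zip(row, col))) for col in zip(*V)]
            for row in U]


if __name__ == "__main__":
    U = [[1, 0], [1, 1], [0, 1]]
    V = [[1, 1, 0], [0, 1, 1]]
    B = boolean_product(U, V)               # Boolean rank at most 2 by construction
    A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # B with one entry flipped
    print(B)                                # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    print(l0_distance(A, B))                # 1
    print(gf2_rank(B))                      # 3
    print(num_distinct_columns(B))          # 3
```

In this example B is the Boolean product of a 3×2 and a 2×3 matrix, so its Boolean rank is at most 2, while its rank over GF(2) is 3. The two rank notions can therefore give rise to genuinely different approximation problems for the same input A.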

Fig. 1

Notes

  1. We are grateful to the anonymous reviewer who pointed out to us that the running time of our algorithm can be improved from the original \( 2^{{\mathcal {O}}(r\sqrt{k\log {(k+r)}})}\cdot nm\) to \( 2^{{\mathcal {O}}( \sqrt{rk\log {(k+r)}\log r})}\cdot nm\).
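A direct comparison of the two exponents, ignoring the constants hidden in the \(\mathcal{O}\)-notation and assuming \(r \ge 2\) so that \(\log r > 0\), shows the size of the gain:

```latex
\frac{r\sqrt{k\log(k+r)}}{\sqrt{rk\log(k+r)\log r}}
  = \frac{\sqrt{r^{2}\,k\log(k+r)}}{\sqrt{r\,k\log(k+r)\log r}}
  = \sqrt{\frac{r}{\log r}}
```

So the improved exponent is smaller by a factor of \(\sqrt{r/\log r}\), which grows with \(r\).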


Acknowledgements

We thank Daniel Lokshtanov, Syed Mohammad Meesum and Saket Saurabh for helpful discussions on the topic of the paper. We are also very grateful to the anonymous reviewers whose suggestions helped us to improve our results.

Funding

The research leading to these results has been supported by the Research Council of Norway via the projects “CLASSIS” (grant 249994) and “MULTIVAL” (grant 263317).

Author information

Correspondence to Petr A. Golovach.

Additional information

Responsible editor: Pauli Miettinen.

The work was done within the CEDAS center in Bergen. The preliminary version of this paper appeared as an extended abstract in the proceedings of ICALP 2018.

About this article

Cite this article

Fomin, F.V., Golovach, P.A. & Panolan, F. Parameterized low-rank binary matrix approximation. Data Min Knowl Disc 34, 478–532 (2020). https://doi.org/10.1007/s10618-019-00669-5

  • DOI: https://doi.org/10.1007/s10618-019-00669-5
