Data Mining and Knowledge Discovery, Volume 17, Issue 1, pp 39–56

The Boolean column and column-row matrix decompositions

  • Pauli Miettinen


Matrix decompositions are used for many data mining purposes. One of these purposes is to find a concise but interpretable representation of a given data matrix. Different decomposition formulations have been proposed for this task, many of which assume a certain property of the input data (e.g., nonnegativity) and aim to preserve that property in the decomposition. In this paper we propose new decomposition formulations for binary matrices, namely the Boolean CX and CUR decompositions. They are natural combinations of two previously presented decomposition formulations. We also consider two subproblems of these decompositions and present a rigorous theoretical study of them. We give algorithms for the decompositions and for the subproblems, and study their performance via extensive experimental evaluation. We show that even simple algorithms can give accurate and intuitive decompositions of real data, thus demonstrating the power and usefulness of the proposed decompositions.
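To make the Boolean CX formulation concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes the usual definitions: the factor C consists of k actual columns of the binary input matrix A, X is a binary matrix, the product C∘X is taken over the Boolean semiring (so 1 + 1 = 1), and the error is the number of entries in which A and C∘X differ. All function names are hypothetical, and the exhaustive search is only feasible for toy inputs.

```python
from itertools import combinations, product


def boolean_product(C, X):
    """Boolean matrix product: entry (i, j) is OR over l of (C[i][l] AND X[l][j])."""
    k, m = len(X), len(X[0])
    return [[int(any(row[l] and X[l][j] for l in range(k))) for j in range(m)]
            for row in C]


def error(A, B):
    """Number of entries in which two equally sized binary matrices differ."""
    return sum(a != b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def select_columns(A, cols):
    """Build C from the chosen columns of A (the defining constraint of CX)."""
    return [[row[c] for c in cols] for row in A]


def best_x_column(C, target):
    """Try all 2^k binary vectors x; keep the one whose Boolean product with C
    disagrees with the target column in the fewest entries."""
    k = len(C[0])
    best_err, best_x = None, None
    for x in product((0, 1), repeat=k):
        approx = [int(any(c and xi for c, xi in zip(row, x))) for row in C]
        err = sum(a != t for a, t in zip(approx, target))
        if best_err is None or err < best_err:
            best_err, best_x = err, list(x)
    return best_x


def boolean_cx_bruteforce(A, k):
    """Exhaustive search over all k-subsets of columns (assumes 1 <= k <= #columns).
    Returns (error, chosen column indices, X)."""
    m = len(A[0])
    best = None
    for cols in combinations(range(m), k):
        C = select_columns(A, cols)
        x_cols = [best_x_column(C, [row[j] for row in A]) for j in range(m)]
        X = [[x_cols[j][l] for j in range(m)] for l in range(k)]
        err = error(A, boolean_product(C, X))
        if best is None or err < best[0]:
            best = (err, cols, X)
    return best


# Toy input: the middle column is the Boolean OR of the two outer columns.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
err, cols, X = boolean_cx_bruteforce(A, 2)
print(err, cols, X)
```

On this toy matrix, two columns reconstruct A exactly only because the overlapping 1s in the middle row combine to 1 under Boolean arithmetic; under ordinary arithmetic the same product would contain a 2 in that position.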


Keywords: Matrix decompositions, Approximation, CX decomposition, CUR decomposition, Boolean decompositions





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
