From-below Boolean matrix factorization algorithm based on MDL

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Over the past few years, Boolean matrix factorization (BMF) has become an important direction in data analysis. The minimum description length (MDL) principle has been successfully adapted to BMF for model order selection. Nevertheless, a BMF algorithm that performs well with respect to the standard BMF quality measures is still missing. In this paper, we propose a novel from-below Boolean matrix factorization algorithm based on formal concept analysis. The algorithm uses the MDL principle as the criterion for factor selection. Through various experiments, we show that the proposed algorithm outperforms existing state-of-the-art BMF algorithms from several standpoints.
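To make "from-below factorization with MDL-based factor selection" concrete, here is a minimal Python sketch. It illustrates the general idea under stated assumptions and is not the authors' MDLGreConD. Factors are formal concepts (C, D), that is, maximal all-ones rectangles of the input matrix I, grown greedily one column at a time in the style of GreConD. Because each factor covers only 1s of I, the Boolean product of the factors never turns a 0 into a 1, so the approximation is from below. The MDL part is a hypothetical toy encoding and not the one used in the paper: a factor costs |C|·log2(m) + |D|·log2(n) bits, each uncovered 1 costs log2(mn) bits, and a factor is kept only if it shortens the total description. The function names `extent`, `intent`, `mdl_from_below`, and `boolean_product` are likewise hypothetical.

```python
from math import log2


def extent(I, D):
    """Rows that have a 1 in every column of D (the derivation D↓)."""
    return {i for i, row in enumerate(I) if all(row[j] for j in D)}


def intent(I, C):
    """Columns that have a 1 in every row of C (the derivation C↑)."""
    return {j for j in range(len(I[0])) if all(I[i][j] for i in C)}


def mdl_from_below(I):
    """Greedy from-below BMF with a toy MDL acceptance test (hypothetical cost model)."""
    m, n = len(I), len(I[0])
    # U: the 1s of I not yet covered by any chosen factor.
    U = {(i, j) for i in range(m) for j in range(n) if I[i][j]}
    factors = []
    while U:
        # Grow an intent D one column at a time (GreConD style), closing it
        # to a formal concept each time, while the number of newly covered
        # 1s strictly increases.
        C, D, best_cov = set(), set(), 0
        while True:
            best = None
            for j in range(n):
                if j in D:
                    continue
                C2 = extent(I, D | {j})
                D2 = intent(I, C2)
                cov = sum(1 for i in C2 for k in D2 if (i, k) in U)
                if cov > best_cov:
                    best_cov, best = cov, (C2, D2)
            if best is None:
                break
            C, D = best
        # Toy MDL test: bits saved by no longer listing the covered 1s,
        # minus bits spent to list the factor's rows and columns.
        gain = best_cov * log2(m * n) - (len(C) * log2(m) + len(D) * log2(n))
        if gain <= 0:
            break  # the factor does not shorten the description, so stop
        factors.append((tuple(sorted(C)), tuple(sorted(D))))
        U -= {(i, k) for i in C for k in D}
    return factors


def boolean_product(factors, m, n):
    """Boolean product A∘B, where factor l contributes the rectangle C_l × D_l."""
    R = [[0] * n for _ in range(m)]
    for C, D in factors:
        for i in C:
            for k in D:
                R[i][k] = 1
    return R


blocks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(mdl_from_below(blocks))  # [((0, 1), (0, 1)), ((2, 3), (2, 3))]

noisy = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
print(mdl_from_below(noisy))  # [((0, 1, 2), (0, 1, 2))]
```

In the `noisy` example, the isolated 1 at position (3, 3) could be covered by the concept ({3}, {3}). Under the toy cost, however, the 4 bits saved by covering it equal the 4 bits needed to describe the factor, so it is left as error. This is the kind of trade-off an MDL criterion makes when deciding how many factors to keep.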

Notes

  1. MDLGreConD stands for Minimum Description Length Greedy Concept on Demand.

  2. GreConD stands for Greedy Concept on Demand.

  3. Breast Cancer Wisconsin (Original).

Acknowledgements

We thank the anonymous reviewers for their comments.

Author information

Correspondence to Martin Trnecka.

About this article

Cite this article

Makhalova, T., Trnecka, M. From-below Boolean matrix factorization algorithm based on MDL. Adv Data Anal Classif 15, 37–56 (2021). https://doi.org/10.1007/s11634-019-00383-6

  • DOI: https://doi.org/10.1007/s11634-019-00383-6