Abstract
During the past few years Boolean matrix factorization (BMF) has become an important direction in data analysis. The minimum description length (MDL) principle has been successfully adapted to BMF for model order selection. Nevertheless, a BMF algorithm that achieves good results with respect to the standard BMF quality measures is still missing. In this paper, we propose a novel from-below Boolean matrix factorization algorithm based on formal concept analysis. The algorithm uses the MDL principle as a criterion for factor selection. In various experiments we show that the proposed algorithm outperforms, from different standpoints, existing state-of-the-art BMF algorithms.
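To make the term "from-below" concrete, the following minimal sketch shows the Boolean matrix product and a check that a factorization never covers a zero of the input. It is only an illustration of the general notions, not the algorithm proposed in the paper; the function names and the small example matrices are assumptions introduced here.

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 iff some factor l has A[i, l] = B[l, j] = 1."""
    return (A.astype(int) @ B.astype(int)) > 0

def is_from_below(I, A, B):
    """True if the product A o B has no 1 in a position where I has a 0."""
    return not np.any(boolean_product(A, B) & ~I)

def uncovered_ones(I, A, B):
    """Number of 1s of I that the product A o B leaves uncovered."""
    return int(np.sum(I & ~boolean_product(A, B)))

# Hypothetical 3x3 input matrix and a two-factor decomposition of it.
I = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=bool)
A = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=bool)   # object-factor matrix
B = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=bool)  # factor-attribute matrix

if __name__ == "__main__":
    print("from below:", is_from_below(I, A, B))
    print("uncovered with both factors:", uncovered_ones(I, A, B))
    print("uncovered with first factor only:", uncovered_ones(I, A[:, :1], B[:1, :]))
```

Each factor corresponds to a rectangle of 1s in the input; a from-below method adds such rectangles one at a time, and a selection criterion (MDL in the paper) decides which ones to keep.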
Notes
MDLGreConD is an abbreviation of Minimum Description Length Greedy Concept on Demand.
GreConD is an abbreviation of Greedy Concept on Demand.
Breast Cancer Wisconsin (Original).
Acknowledgements
We thank the anonymous reviewers for their comments.
Cite this article
Makhalova, T., Trnecka, M. From-below Boolean matrix factorization algorithm based on MDL. Adv Data Anal Classif 15, 37–56 (2021). https://doi.org/10.1007/s11634-019-00383-6
DOI: https://doi.org/10.1007/s11634-019-00383-6