Impact of Boolean factorization as preprocessing methods for classification of Boolean data



We explore the use of Boolean matrix factorization for data preprocessing in the classification of Boolean data. In our previous work, we demonstrated that a preprocessing step that replaces the original Boolean attributes with factors, i.e., new Boolean attributes obtained from the original ones by Boolean matrix factorization, can improve classification quality. This paper examines how the various Boolean factorization methods proposed in the literature affect classification quality. In particular, we compare five factorization methods, present experimental results, and outline issues for future research.
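To make the preprocessing idea concrete, the following minimal sketch shows a Boolean matrix product and one possible way of replacing original attributes with factors. It is an illustration under stated assumptions, not the method evaluated in the paper: the toy matrix `I`, the factor matrix `B`, and the helper names `boolean_product` and `to_factor_space` are all hypothetical, and the rule "a factor applies to an object if the object has every attribute of that factor" is assumed here as one natural choice.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 if some factor l has
    A[i][l] = 1 and B[l][j] = 1, and 0 otherwise."""
    k = len(B)          # number of factors
    m = len(B[0])       # number of original attributes
    return [
        [int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
        for row in A
    ]


def to_factor_space(row, B):
    """Describe one object by factors instead of original attributes.

    Assumption for this sketch: factor l applies to the object if the
    object has every attribute that factor l contains.
    """
    return [int(all(x >= b for x, b in zip(row, factor))) for factor in B]


# Hypothetical toy data: 4 objects described by 4 Boolean attributes.
I = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

# Hypothetical factor-attribute matrix with 2 factors, chosen by hand.
B = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

# Object-factor matrix: the new, lower-dimensional Boolean data set.
A = [to_factor_space(row, B) for row in I]

# For this toy example the two factors reproduce the data exactly.
assert boolean_product(A, B) == I
```

In this sketch a classifier would be trained on the rows of `A` (two factor attributes) rather than on the rows of `I` (four original attributes); a previously unseen object is mapped with the same `to_factor_space` call before classification.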


Keywords

Matrix decomposition, Factor analysis, Formal concept analysis

Mathematics Subject Classifications (2010)

15A23 03C45 46L36 62H25 65F30 68T30 68W25 





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Data Analysis and Modeling Lab (DAMOL), Department of Computer Science, Palacky University, Olomouc, Czech Republic
