Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2015: Machine Learning and Knowledge Discovery in Databases, pp. 36-52

Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: Complexity Beyond Blocks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9285)

Abstract

Matrix factorizations are a popular tool for mining regularities from data. There are many ways to interpret a factorization, but one that is particularly suited to data mining uses the fact that a matrix product can be interpreted as a sum of rank-1 matrices. The factorization of a matrix then becomes the task of finding a small number of rank-1 matrices whose sum is a good representation of the original matrix. Seen this way, many problems in data mining can be expressed as matrix factorizations, given appropriate definitions of what a rank-1 matrix and a sum of rank-1 matrices mean. This paper develops a unified theory, based on generalized outer product operators, that encompasses many pattern set mining tasks. The focus is on the computational aspects of the theory, in particular the computational complexity and approximability of many problems related to generalized matrix factorizations. The results apply immediately to a large number of data mining problems and, it is hoped, will allow future results and algorithms to be generalized as well.
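The rank-1 view described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: the function names (`outer`, `combine`, `factor_product`), the example matrices, and the choice of the Boolean operations (AND as product, OR as sum) as the second instance are all assumptions made for illustration. It only shows how swapping the definitions of "product" and "sum" changes what the same pair of factors represents.

```python
from typing import Callable, List

Matrix = List[List[int]]
Op = Callable[[int, int], int]


def outer(b: List[int], c: List[int], mul: Op) -> Matrix:
    """Generalized outer product: entry (i, j) is mul(b[i], c[j])."""
    return [[mul(bi, cj) for cj in c] for bi in b]


def combine(blocks: List[Matrix], add: Op) -> Matrix:
    """Fold a list of equally sized matrices entrywise with `add`."""
    result = [row[:] for row in blocks[0]]
    for block in blocks[1:]:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                result[i][j] = add(result[i][j], value)
    return result


def factor_product(B: Matrix, C: Matrix, mul: Op, add: Op) -> Matrix:
    """Combine k rank-1 matrices: column l of B paired with row l of C."""
    k = len(C)
    blocks = [outer([row[l] for row in B], C[l], mul) for l in range(k)]
    return combine(blocks, add)


# Hypothetical 3x2 and 2x3 binary factors (illustrative data only).
B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 1, 0], [0, 1, 1]]

# Ordinary arithmetic: the usual matrix product B @ C.
standard = factor_product(B, C, lambda x, y: x * y, lambda x, y: x + y)

# Boolean operations (an assumed instance): overlapping blocks stay at 1.
boolean = factor_product(B, C, lambda x, y: x & y, lambda x, y: x | y)

print(standard)
print(boolean)
```

With ordinary arithmetic, the entry where the two rank-1 blocks overlap becomes 2; with the Boolean operations it stays 1, so each rank-1 matrix reads as a rectangular pattern and the factorization as a set of possibly overlapping patterns.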

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Max-Planck-Institut für Informatik, Saarbrücken, Germany
