Detecting Bicliques in GF[q]

  • Jan Ramon
  • Pauli Miettinen
  • Jilles Vreeken
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8188)

Abstract

We consider the problem of finding planted bicliques in random matrices over GF[q]. That is, our input matrix is the GF[q]-sum of an unknown biclique (a rank-1 matrix) and a random matrix. We study different models for the random graphs and characterize the conditions under which the planted biclique can be recovered. We also show empirically that a simple heuristic reliably recovers the planted bicliques whenever our theory predicts that they are recoverable.

Existing methods can detect bicliques of size \(O(\sqrt{N})\), while finding the largest biclique is NP-hard. Real graphs, however, are typically extremely sparse and seldom contain such large bicliques. Further, the noise can destroy parts of the planted biclique. We investigate the practical question of how small a biclique can be, and how much noise can be present, such that the biclique can still be identified approximately correctly. Our derivations show that, with high probability, planted bicliques of size logarithmic in the network size can be detected in data following the Erdős-Rényi model and two bipartite variants of the Barabási-Albert model.
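
As an illustration of the input model described above, the following minimal sketch builds a matrix that is the sum of a rank-1 biclique matrix and a random noise matrix. It makes several assumptions that are not taken from the paper: it fixes q = 2 (so the GF[q]-sum is an XOR), it uses independent Bernoulli noise in the spirit of the Erdős-Rényi model, and the function name `planted_biclique_gf2` and its parameters are hypothetical. It does not implement the recovery heuristic mentioned in the abstract.

```python
import random


def planted_biclique_gf2(n_rows, n_cols, rows, cols, noise_p, seed=0):
    """Return the GF(2)-sum of a planted biclique and Bernoulli(noise_p) noise.

    rows, cols: index sets of the planted biclique (hypothetical parameters).
    noise_p:    probability that a noise entry equals 1 (assumed noise model).
    """
    rng = random.Random(seed)
    row_set, col_set = set(rows), set(cols)
    matrix = []
    for i in range(n_rows):
        line = []
        for j in range(n_cols):
            # Rank-1 signal: 1 exactly on the rows x cols block.
            signal = 1 if (i in row_set and j in col_set) else 0
            # Independent random noise entry.
            noise = 1 if rng.random() < noise_p else 0
            # Addition in GF(2) is addition modulo 2 (XOR).
            line.append((signal + noise) % 2)
        matrix.append(line)
    return matrix


if __name__ == "__main__":
    # A 2 x 3 biclique planted in a 6 x 8 matrix with 10% noise.
    for line in planted_biclique_gf2(6, 8, [0, 2], [1, 3, 4], 0.1):
        print("".join(map(str, line)))
```

Because the noise is added in GF(2), a noise entry of 1 inside the planted block turns that entry into 0, which is one way to see how noise can destroy parts of the planted biclique.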

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jan Ramon (1)
  • Pauli Miettinen (2)
  • Jilles Vreeken (3)
  1. Department of Computer Science, KU Leuven, Belgium
  2. Max-Planck Institute for Informatics, Saarbrücken, Germany
  3. Dept. of Mathematics and Computer Science, University of Antwerp, Belgium
