Annals of Operations Research

, Volume 216, Issue 1, pp 145–161 | Cite as

A branch-and-bound approach for maximum quasi-cliques

  • Foad Mahdavi Pajouh
  • Zhuqi Miao
  • Balabhaskar Balasundaram
Article

Abstract

Detecting quasi-cliques in graphs is a useful tool for detecting dense clusters in graph-based data mining. Particularly in large-scale data sets that are error-prone, cliques are overly restrictive and impractical. Quasi-clique detection has been accomplished using heuristic approaches in various applications of graph-based data mining in protein interaction networks, gene co-expression networks, and telecommunication networks. Quasi-cliques are not hereditary, in the sense that every subset of a quasi-clique need not be a quasi-clique. This lack of heredity introduces interesting challenges in the development of exact algorithms to detect maximum cardinality quasi-cliques. The only exact approaches for this problem are limited to two mixed integer programming formulations that were recently proposed in the literature. The main contribution of this article is a new combinatorial branch-and-bound algorithm for the maximum quasi-clique problem.

Keywords

Clique Quasi-clique Cluster detection Graph-based data mining 

References

  1. Abello, J., Pardalos, P. M., & Resende, M. G. C. (1999). On maximum clique problems in very large graphs. In J. Abello & J. Vitter (Eds.), DIMACS series on discrete mathematics and theoretical computer science: Vol. 50. External memory algorithms and visualization (pp. 119–130). Providence: American Mathematical Society. Google Scholar
  2. Abello, J., Resende, M. G. C., & Sudarsky, S. (2002). Massive quasi-clique detection. In S. Rajsbaum (Ed.), LATIN 2002: proceedings of the 5th Latin American symposium on theoretical informatics (pp. 598–612). London: Springer. CrossRefGoogle Scholar
  3. Adamic, L., & Huberman, B. (2000). Power-law distribution of the World Wide Web. Science, 287, 2115a. CrossRefGoogle Scholar
  4. Alba, R. D. (1973). A graph-theoretic definition of a sociometric clique. The Journal of Mathematical Sociology, 3(1), 113–126. CrossRefGoogle Scholar
  5. Almaas, E., & Barabási, A. L. (2006). Power laws in biological networks. In E. Koonin, Y. I. Wolf, & G. P. Karev (Eds.), Power laws, scale-free networks and genome biology (pp. 1–11). New York: Springer. CrossRefGoogle Scholar
  6. Balasundaram, B., Butenko, S., & Trukhanov, S. (2005). Novel approaches for analyzing biological networks. Journal of Combinatorial Optimization, 10(1), 23–39. CrossRefGoogle Scholar
  7. Balasundaram, B., Butenko, S., & Hicks, I. V. (2011). Clique relaxations in social network analysis: the maximum k-plex problem. Operations Research, 59(1), 133–142. CrossRefGoogle Scholar
  8. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512. CrossRefGoogle Scholar
  9. Barabási, A. L., Albert, R., & Jeong, H. (2000). Scale-free characteristics of random networks: the topology of the World Wide Web. Physica. A, 281(1–4), 69–77. CrossRefGoogle Scholar
  10. Batagelj, V., & Mrvar, A. (2006). Pajek datasets: Reuters terror news network. Online: http://vlado.fmf.uni-lj.si/pub/networks/data/CRA/terror.htm. Accessed March 2008.
  11. Boginski, V., Butenko, S., & Pardalos, P. M. (2003). On structural properties of the market graph. In A. Nagurney (Ed.), Innovation in financial and economic networks. London: Edward Elgar. Google Scholar
  12. Boginski, V., Butenko, S., & Pardalos, P. (2006). Mining market data: a network approach. Computers & Operations Research, 33, 3171–3184. CrossRefGoogle Scholar
  13. Bomze, I. M., Budinich, M., Pardalos, P. M., & Pelillo, M. (1999). The maximum clique problem. In D. Z. Du & P. M. Pardalos (Eds.), Handbook of combinatorial optimization (pp. 1–74). Dordrecht: Kluwer Academic. CrossRefGoogle Scholar
  14. Broido, A., & Claffy, K. C. (2001). Internet topology: connectivity of IP graphs. In S. Fahmy & K. Park (Eds.), Scalability and traffic control in IP networks (pp. 172–187). Bellingham: SPIE. CrossRefGoogle Scholar
  15. Brunato, M., Hoos, H., & Battiti, R. (2008). On effectively finding maximal quasi-cliques in graphs. In V. Maniezzo, R. Battiti, & J. P. Watson (Eds.), Lecture notes in computer science: Vol. 5313. Learning and intelligent optimization (pp. 41–55). Berlin: Springer. CrossRefGoogle Scholar
  16. Bu, D., Zhao, Y., Cai, L., Xue, H., Zhu, X., Lu, H., Zhang, J., Sun, S., Ling, L., Zhang, N., Li, G., & Chen, R. (2003). Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Research, 31(9), 2443–2450. CrossRefGoogle Scholar
  17. Carlson, J. M., & Doyle, J. (1999). Highly optimized tolerance: a mechanism for power laws in designed systems. Physical Review E, 60(2), 1412–1427. CrossRefGoogle Scholar
  18. Carraghan, R., & Pardalos, P. (1990). An exact algorithm for the maximum clique problem. Operations Research Letters, 9, 375–382. CrossRefGoogle Scholar
  19. Chung, F., & Lu, L. (2006). CBMS lecture series. Complex graphs and networks. Providence: American Mathematical Society. Google Scholar
  20. Cook, D. J., & Holder, L. B. (2000). Graph-based data mining. IEEE Intelligent Systems, 15(2), 32–41. CrossRefGoogle Scholar
  21. Corman, S., Kuhn, T., McPhee, R., & Dooley, K. (2002). Studying complex discursive systems: centering resonance analysis of organizational communication. Human Communication Research, 28(2), 157–206. Google Scholar
  22. Corneil, D., & Perl, Y. (1984). Clustering and domination in perfect graphs. Discrete Applied Mathematics, 9, 27–39. CrossRefGoogle Scholar
  23. Dimacs (1995). Cliques, coloring, and satisfiability: second Dimacs implementation challenge. Online: http://dimacs.rutgers.edu/Challenges/. Accessed March 2007.
  24. Erdös, P., & Rényi, A. (1959). On random graphs. Publicationes Mathematicae, 6, 290–297. Google Scholar
  25. Faloutsos, M., Faloutsos, P., & Faloutsos, C. (1999). On power-law relationships of the Internet topology. In Proceedings of the ACM-SIGCOMM conference on applications, technologies, architectures, and protocols for computer communication, Cambridge (pp. 251–262). CrossRefGoogle Scholar
  26. Feige, U., Kortsarz, G., & Peleg, D. (2001). The dense k-subgraph problem. Algorithmica, 29, 410–421. CrossRefGoogle Scholar
  27. Gagneur, J., Krause, R., Bouwmeester, T., & Casari, G. (2004). Modular decomposition of protein-protein interaction networks. Genome Biology, 5(8), R57. CrossRefGoogle Scholar
  28. Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12), 7821–7826. CrossRefGoogle Scholar
  29. Grossman, J., Ion, P., & Castro, R. D. (1995). The Erdös number project. Online: http://www.oakland.edu/enp/. Accessed March 2007.
  30. IBM Corporation (2010). IBM ILOG CPLEX Optimizer 12.2. http://www.ibm.com/software/integration/optimization/cplex-optimizer/. IBM Academic Initiative. Accessed June 2011.
  31. Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., & Sakaki, Y. (2001). A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences of the United States of America, 98(8), 4569–4574. CrossRefGoogle Scholar
  32. Jiang, D., & Pei, J. (2009). Mining frequent cross-graph quasi-cliques. ACM Transactions on Knowledge Discovery from Data, 2(4), 16. CrossRefGoogle Scholar
  33. Kortsarz, G., & Peleg, D. (1993). On choosing a dense subgraph. In Proceedings of the 34th annual IEEE symposium on foundations of computer science (pp. 692–701). Piscataway: IEEE Comput. Soc. Google Scholar
  34. Kreher, D. L., & Stinson, D. R. (1998). Combinatorial algorithms: generation, enumeration, and search (1st ed.). Boca Raton: CRC Press. Google Scholar
  35. Leskovec, J., & Horvitz, E. (2008). Planetary-scale views on a large instant-messaging network. In Proceeding of the 17th international conference on World Wide Web. WWW ’08 (pp. 915–924). New York: ACM. CrossRefGoogle Scholar
  36. Lu, H., Zhu, X., Liu, H., Skogerb, G., Zhang, J., Zhang, Y., Cai, L., Zhao, Y., Sun, S., Xu, J., Bu, D., & Chen, R. (2004). The interactome as a tree—an attempt to visualize the protein-protein interaction network in yeast. Nucleic Acids Research, 32(16), 4804–4811. CrossRefGoogle Scholar
  37. Luce, R. D. (1950). Connectivity and generalized cliques in sociometric group structure. Psychometrika, 15(2), 169–190. CrossRefGoogle Scholar
  38. Mokken, R. J. (1979). Cliques, clubs and clans. Quality and Quantity, 13(2), 161–173. CrossRefGoogle Scholar
  39. Östergård, P. R. J. (2002). A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 120, 197–207. CrossRefGoogle Scholar
  40. Patillo, J., Veremyev, A., Butenko, S., & Boginski, V. (2012). On the maximum quasi-clique problem. Discrete Applied Mathematics. doi:10.1016/j.dam.2012.07.019. Google Scholar
  41. Pei, J., Jiang, D., & Zhang, A. (2005a). Mining cross-graph quasi-cliques in gene expression and protein interaction data. In Proceedings of the 21st international conference on data engineering. ICDE 2005 (pp. 353–356). Google Scholar
  42. Pei, J., Jiang, D., & Zhang, A. (2005b). On mining cross-graph quasi-cliques. In Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. KDD ’05 (pp. 228–238). New York: ACM. CrossRefGoogle Scholar
  43. Peng, X., Langston, M. A., Saxton, A. M., Baldwin, N. E., & Snoddy, J. R. (2007). Detecting network motifs in gene co-expression networks through integration of protein domain information. In P. McConnell, S. M. Lin, & P. Hurban (Eds.), Methods of microarray data analysis V (pp. 89–102). New York: Springer. CrossRefGoogle Scholar
  44. Seidman, S. B., & Foster, B. L. (1978). A graph theoretic generalization of the clique concept. The Journal of Mathematical Sociology, 6, 139–154. CrossRefGoogle Scholar
  45. Simonite, T. (2011). Bracing for the data deluge. http://www.technologyreview.com/business/37506/. Accessed May 2011.
  46. Spirin, V., & Mirny, L. A. (2003). Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences of the United States of America, 100(21), 12123–12128. CrossRefGoogle Scholar
  47. Washio, T., & Motoda, H. (2003). State of the art of graph-based data mining. ACM SIGKDD Explorations Newsletter, 5(1), 59–68. CrossRefGoogle Scholar
  48. West, D. (2001). Introduction to graph theory. Upper Saddle River: Prentice-Hall. Google Scholar
  49. Zeng, Z., Wang, J., Zhou, L., & Karypis, G. (2006). Coherent closed quasi-clique discovery from large dense graph databases. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’06 (pp. 797–802). New York: ACM. CrossRefGoogle Scholar
  50. Zeng, Z., Wang, J., Zhou, L., & Karypis, G. (2007). Out-of-core coherent closed quasi-clique mining from large dense graph databases. ACM Transactions on Database Systems, 32, 13. CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Foad Mahdavi Pajouh
    • 1
  • Zhuqi Miao
    • 1
  • Balabhaskar Balasundaram
    • 1
  1. 1.School of Industrial Engineering & ManagementOklahoma State UniversityStillwaterUSA

Personalised recommendations