A branch-and-bound approach for maximum quasi-cliques
Quasi-clique detection is a useful tool for finding dense clusters in graph-based data mining. In large-scale, error-prone data sets in particular, requiring a cluster to be a clique is overly restrictive and impractical. Heuristic approaches have been used to detect quasi-cliques in various applications of graph-based data mining in protein interaction networks, gene co-expression networks, and telecommunication networks. Quasi-cliques are not hereditary, in the sense that a subset of a quasi-clique need not itself be a quasi-clique. This lack of heredity introduces interesting challenges in the development of exact algorithms to detect maximum cardinality quasi-cliques. The only existing exact approaches for this problem are two mixed integer programming formulations that were recently proposed in the literature. The main contribution of this article is a new combinatorial branch-and-bound algorithm for the maximum quasi-clique problem.
Keywords: Clique, Quasi-clique, Cluster detection, Graph-based data mining
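The lack of heredity mentioned in the abstract can be illustrated with a small sketch. The abstract does not state a formal definition, so the code below assumes the common edge-density definition: a vertex set is a γ-quasi-clique if at least a fraction γ of its vertex pairs are adjacent. The function names, the example graph, the threshold 0.8, and the convention that sets with fewer than two vertices count as dense are all illustrative assumptions, not taken from the article.

```python
from itertools import combinations


def edge_density(edges, subset):
    """Fraction of vertex pairs in `subset` that are joined by an edge."""
    nodes = sorted(set(subset))
    if len(nodes) < 2:
        # Assumed convention: sets with fewer than two vertices are dense.
        return 1.0
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(nodes, 2))
    present = sum(1 for p in pairs if frozenset(p) in edge_set)
    return present / len(pairs)


def is_quasi_clique(edges, subset, gamma):
    """Density-based test (assumed definition): density >= gamma."""
    return edge_density(edges, subset) >= gamma


# Hypothetical example: a 4-cycle 0-1-2-3 with the chord {0, 2}.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
GAMMA = 0.8

# All four vertices: 5 of the 6 possible pairs are adjacent.
whole = is_quasi_clique(EDGES, {0, 1, 2, 3}, GAMMA)
# The subset {1, 3}: the only pair is non-adjacent.
part = is_quasi_clique(EDGES, {1, 3}, GAMMA)

print(whole, part)
```

Under these assumptions the full vertex set passes the test while its subset {1, 3} fails it, so a search that builds candidates by adding vertices cannot discard a set merely because it currently fails the density test, which is the kind of pruning that exact clique algorithms rely on.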