Knowledge-Guided Maximal Clique Enumeration

  • Steve Harenberg
  • Ramona G. Seay
  • Gonzalo A. Bello
  • Rada Y. Chirkova
  • P. Murali Doraiswamy
  • Nagiza F. Samatova
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10086)

Abstract

Maximal clique enumeration is a long-standing problem in graph mining and knowledge discovery. Numerous classic algorithms exist for solving this problem. However, these algorithms focus on enumerating all maximal cliques, which may be computationally impractical and much of the output may be irrelevant to the user. To address this issue, we introduce the problem of knowledge-biased clique enumeration, a query-driven formulation that reduces output space, computation time, and memory usage. Moreover, we introduce a dynamic state space indexing strategy for efficiently processing multiple queries over the same graph. This strategy reduces redundant computations by dynamically indexing the constituent state space generated with each query. Experimental results over real-world networks demonstrate this strategy’s effectiveness at reducing the cumulative query-response time. Although developed in the context of maximal cliques, our techniques could possibly be generalized to other constraint-based graph enumeration tasks.

References

  1. 1.
  2. 2.
    Abello, J., Resende, M.G.C., Sudarsky, S.: Massive Quasi-Clique detection. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 598–612. Springer, Heidelberg (2002). doi:10.1007/3-540-45995-2_51 CrossRefGoogle Scholar
  3. 3.
    Angles, R.: A comparison of current graph database models. In: Workshops Proceedings of the IEEE 28th International Conference on Data Engineering, ICDE, pp. 171–177 (2012)Google Scholar
  4. 4.
    Bast, H., Funke, S., Matijevic, D., Sanders, P., Schultes, D.: In transit to constant time shortest-path queries in road networks. In: Proceedings of the Nine Workshop on Algorithm Engineering and Experiments, ALENEX (2007)Google Scholar
  5. 5.
    Bron, C., Kerbosch, J.: Finding all cliques of an undirected graph (algorithm 457). Commun. ACM 16(9), 575–576 (1973)CrossRefMATHGoogle Scholar
  6. 6.
    Creamer, G., Rowe, R., Hershkop, S., Stolfo, S.J.: Segmentation and automated social hierarchy detection through email network analysis. In: Advances in Web Mining and Web Usage Analysis, 9th International Workshop on Knowledge Discovery on the Web, WebKDD 2007, pp. 40–58 (2007)Google Scholar
  7. 7.
    Cui, W., Xiao, Y., Wang, H., Lu, Y., Wang, W.: Online search of overlapping communities. In: Proceedings of the International Conference on Management of Data, SIGMOD 2013, pp. 277–288 (2013)Google Scholar
  8. 8.
    Cui, W., Xiao, Y., Wang, H., Wang, W.: Local search of communities in large graphs. In: International Conference on Management of Data, SIGMOD, pp. 991–1002 (2014)Google Scholar
  9. 9.
    Davis, A.P., Murphy, C.G., Johnson, R., Lay, J.M., Lennon-Hopkins, K., Saraceni-Richards, C.A., Sciaky, D., King, B.L., Rosenstein, M.C., Wiegers, T.C., Mattingly, C.J.: The comparative toxicogenomics database: update 2013. Nucleic Acids Res. 41(D1), D1104–D1114 (2013)CrossRefGoogle Scholar
  10. 10.
    Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth, A., Lin, J., Minguez, P., Bork, P., von Mering, C., Jensen, L.J.: STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41(D1), D808–D815 (2013)CrossRefGoogle Scholar
  11. 11.
    Goldenberg, S.B., Shapiro, L.J.: Physical mechanisms for the association of El Niño and West African rainfall with Atlantic major hurricane activity. J. Clim. 9(6), 1169–1187 (1996)CrossRefGoogle Scholar
  12. 12.
    Iordanov, B.: HyperGraphDB: a generalized graph database. In: Web-Age Information Management, pp. 25–36 (2010)Google Scholar
  13. 13.
    Jia, L., Ye, J., Haiyan, L.V., Wang, W., Zhou, C., Zhang, X., Xu, J., Wang, L., Jia, J.: Genetic association between polymorphisms of pen2 gene and late onset Alzheimer’s disease in the north chinese population. Brain Res. 1141, 10–14 (2007)CrossRefGoogle Scholar
  14. 14.
    Klotzbach, P.J.: On the Madden-Julian oscillation-Atlantic hurricane relationship. J. Clim. 23(2), 282–293 (2010)CrossRefGoogle Scholar
  15. 15.
    Modani, N., Dey, K.: Large maximal cliques enumeration in large sparse graphs. In: Proceedings of the 15th International Conference on Management of Data (2009)Google Scholar
  16. 16.
    Moon, J.W., Moser, L.: On cliques in graphs. Israel J. Math. 3(1), 23–28 (1965)MathSciNetCrossRefMATHGoogle Scholar
  17. 17.
    Sozio, M., Gionis, A.: The community-search problem and how to plan a successful cocktail party. In: Proceedings of the 16th International Conference on Knowledge Discovery and Data Mining, KDD. pp. 939–948 (2010)Google Scholar
  18. 18.
    Steinhaeuser, K., Chawla, N.V., Ganguly, A.R.: Complex networks as a unified framework for descriptive analysis and predictive modeling in climate science. Stat. Anal. Data Min. 4(5), 497–511 (2011)MathSciNetCrossRefGoogle Scholar
  19. 19.
    Tsourakakis, C.E., Bonchi, F., Gionis, A., Gullo, F., Tsiarli, M.A.: Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: The 19th International Conference on Knowledge Discovery and Data Mining, KDD, pp. 104–112 (2013)Google Scholar
  20. 20.
    Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a graph database and a relational database: a data provenance perspective. In: Proceedings of the 48th Annual Southeast Regional Conference, p. 42 (2010)Google Scholar
  21. 21.
    Vimont, D.J., Kossin, J.P.: The Atlantic meridional mode and hurricane activity. Geophysical Research Letters 34(7) (2007)Google Scholar
  22. 22.
    Wei, F.: TEDI: efficient shortest path query answering on graphs. In: Graph Data Management: Techniques and Applications., pp. 214–238 (2011)Google Scholar
  23. 23.
    Wilkinson, K., Sayers, C., Kuno, H.A., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: Proceedings of The First International Workshop on Semantic Web and Databases, SWDB, pp. 131–150 (2003)Google Scholar
  24. 24.
    Xiao, Y., Wu, W., Pei, J., Wang, W., He, Z.: Efficiently indexing shortest paths by exploiting symmetry in graphs. In: 12th International Conference on Extending Database Technology, EDBT, pp. 493–504 (2009)Google Scholar
  25. 25.
    Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: 12th International Conference on Data Mining, ICDM, pp. 745–754 (2012)Google Scholar
  26. 26.
    Zhang, B., Park, B.H., Karpinets, T., Samatova, N.F.: From pull-down data to protein interaction networks and complexes with biological relevance. Bioinformatics 24(7), 979–986 (2008)CrossRefGoogle Scholar
  27. 27.
    Zhu, F., Yan, X., Han, J., Yu, P.S.: gPrune: a constraint pushing framework for graph pattern mining. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 388–400. Springer, Heidelberg (2007). doi:10.1007/978-3-540-71701-0_38 CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Steve Harenberg
    • 1
    • 2
  • Ramona G. Seay
    • 1
    • 2
  • Gonzalo A. Bello
    • 1
    • 2
  • Rada Y. Chirkova
    • 1
  • P. Murali Doraiswamy
    • 3
  • Nagiza F. Samatova
    • 1
    • 2
  1. 1.North Carolina State UniversityRaleighUSA
  2. 2.Oak Ridge National LaboratoryOak RidgeUSA
  3. 3.Duke UniversityDurhamUSA

Personalised recommendations