Abstract
We present a fast algorithm for finding large common subgraphs, which can be exploited for detecting structural and functional relationships between biological macromolecules. Many fast algorithms exist for finding a single maximum common subgraph. We show with an example that this gives limited information, motivating the less studied problem of finding many large common subgraphs covering different areas. As the latter is also hard, we give heuristics that improve performance by several orders of magnitude. As a case study, we validate our findings experimentally on protein graphs with thousands of atoms.
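To make the object of study concrete, the following minimal sketch checks whether a given vertex mapping between two graphs describes a common connected induced subgraph. It is only an illustration of the definition, not the algorithm presented in the paper. The function name `is_common_connected_induced_subgraph`, the adjacency-dictionary representation, and the toy graphs are assumptions made for this example.

```python
from collections import deque


def is_common_connected_induced_subgraph(adj_g, adj_h, mapping):
    """Check a candidate mapping (hypothetical helper, illustration only).

    adj_g, adj_h: dict mapping each vertex to the set of its neighbours
                  (undirected graphs).
    mapping:      dict sending selected vertices of G to vertices of H.
    """
    if not mapping:
        return False
    # The mapping must be injective: no two G-vertices share an H-vertex.
    if len(set(mapping.values())) != len(mapping):
        return False
    nodes = list(mapping)
    # Induced condition: a pair is adjacent in G iff its image is adjacent in H.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if (v in adj_g[u]) != (mapping[v] in adj_h[mapping[u]]):
                return False
    # Connectivity: breadth-first search restricted to the selected vertices of G.
    # Because adjacency is preserved, the image in H is connected as well.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj_g[u]:
            if v in mapping and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(mapping)


# Toy graphs (assumed for illustration): G is a path a-b-c-d,
# H is a triangle 1-2-3 with a pendant vertex 4 attached to 3.
G = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
H = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

ok = is_common_connected_induced_subgraph(G, H, {"b": 2, "c": 3, "d": 4})
not_induced = is_common_connected_induced_subgraph(G, H, {"a": 1, "b": 2, "c": 3})
not_connected = is_common_connected_induced_subgraph(G, H, {"a": 1, "d": 4})
```

In this toy instance the first mapping is accepted, the second fails because vertices 1 and 3 are adjacent in H while a and c are not adjacent in G, and the third fails because the selected vertices are not connected.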
Acknowledgments
Work partially supported by projects MIUR PRIN 2012C4E3KT (all authors except LT, LV) and UNIPI PRA_2015_0058 (authors RG, LT).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Conte, A., Grossi, R., Marino, A., Tattini, L., Versari, L. (2017). A Fast Algorithm for Large Common Connected Induced Subgraphs. In: Figueiredo, D., Martín-Vide, C., Pratas, D., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2017. Lecture Notes in Computer Science, vol 10252. Springer, Cham. https://doi.org/10.1007/978-3-319-58163-7_4
DOI: https://doi.org/10.1007/978-3-319-58163-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58162-0
Online ISBN: 978-3-319-58163-7
eBook Packages: Computer Science, Computer Science (R0)