# Automatic Inference of Graph Transformation Rules Using the Cyclic Nature of Chemical Reactions

## Abstract

Graph transformation systems have the potential to be realistic models of chemistry, provided a comprehensive collection of reaction rules can be extracted from the body of chemical knowledge. A first key step for rule learning is the computation of atom-atom mappings, i.e., the atom-wise correspondence between products and educts of all published chemical reactions. This can be phrased as a maximum common edge subgraph problem with the constraint that transition states must have cyclic structure. We describe a search tree method well suited for small edit distances and an integer linear program best suited for general instances. Using a manually curated database of biochemical reactions as an example, we demonstrate that it is feasible to compute atom-atom maps at large scale. In this context we also address the network completion problem.
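To make the problem statement concrete, the following is a minimal brute-force sketch. It is neither the search tree method nor the integer linear program described in the paper. It assumes unlabeled bonds (no bond orders), suppressed carbon-bound hydrogens, and a simplified reading of the cyclicity constraint: the broken and formed bonds together must form one simple cycle. The function names and the toy reaction (ethanol + HCl → chloroethane + water) are illustrative choices, not taken from the source.

```python
from itertools import permutations


def changed_edges(educt_edges, product_edges, mapping):
    """Bonds broken or formed under `mapping` (educt atom -> product atom)."""
    mapped = {frozenset((mapping[u], mapping[v])) for u, v in educt_edges}
    target = {frozenset(e) for e in product_edges}
    return mapped ^ target  # symmetric difference = changed bonds


def is_single_cycle(edges):
    """True if the edges form exactly one simple cycle (assumed criterion)."""
    if not edges:
        return False
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if any(len(nbrs) != 2 for nbrs in adj.values()):
        return False
    # All degrees are 2; a single cycle additionally needs connectivity.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def best_cyclic_mappings(educt_labels, educt_edges, product_labels, product_edges):
    """Enumerate label-preserving bijections; keep those with the fewest
    changed bonds among mappings whose changed bonds form a single cycle.
    Fewest changed bonds is equivalent to most common (preserved) edges."""
    n = len(educt_labels)
    best, best_changed = [], None
    for perm in permutations(range(n)):
        if any(educt_labels[i] != product_labels[perm[i]] for i in range(n)):
            continue
        diff = changed_edges(educt_edges, product_edges, perm)
        if not is_single_cycle(diff):
            continue
        if best_changed is None or len(diff) < best_changed:
            best, best_changed = [perm], len(diff)
        elif len(diff) == best_changed:
            best.append(perm)
    return best, best_changed


# Toy reaction: C-C-O-H + H-Cl -> C-C-Cl + H-O-H
educt_labels = ["C", "C", "O", "H", "H", "Cl"]
educt_edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
product_labels = ["C", "C", "Cl", "H", "O", "H"]
product_edges = [(0, 1), (1, 2), (3, 4), (4, 5)]

mappings, n_changed = best_cyclic_mappings(
    educt_labels, educt_edges, product_labels, product_edges
)
print(mappings, n_changed)
```

In this toy case all four label-preserving mappings preserve two bonds, so edge counting alone cannot separate them. The assumed cycle check rejects the two mappings that swap the carbon atoms and keeps the two that differ only in the equivalent hydrogens of water. Exhaustive enumeration is only viable for a handful of atoms.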

## Keywords

Chemistry, Atom-atom mapping, Maximum common edge subgraph, Integer linear programming, Network completion

## Notes

### Acknowledgments

This work was supported in part by the Volkswagen Stiftung (project no. I/82719), the COST-Action CM1304 “Systems Chemistry”, and the Danish Council for Independent Research, Natural Sciences.
