# Enumerating and indexing many-body intramolecular interactions: a graph theoretic approach

## Abstract

The central idea is a recursive mapping of \(n\)-body intramolecular interactions to \((n+1)\)-body terms that is consistent with the molecular topology. Iterative application of the line graph transformation is identified as a natural and elegant tool to accomplish the recursion. The procedure readily generalizes to arbitrary \(n\)-body potentials. In particular, the method yields a complete characterization of \(4\)-body interactions. The hierarchical structure of atomic index lists for each interaction order \(n\) is compactly expressed as a directed acyclic graph. A pseudo-code description of the generating algorithm is given. With suitable data structures (e.g., edge lists or adjacency matrices), automatic enumeration and indexing of \(n\)-body interactions can be implemented straightforwardly to handle large biomolecular systems. Explicit examples are discussed, including a chemically relevant effective potential model of taurocholate bile salt.

## Keywords

Computer simulation, Molecular modelling, Graph theory

## Mathematics Subject Classification

82-08 05A15 94C15

## 1 Introduction

The implementation of computer code for realistically simulating the configurations and motion of molecular objects requires modelling of many-body through-bond interactions. In turn, it is necessary to identify the participating atoms in each interaction. This paper presents a novel and exhaustive enumeration procedure exploiting the line graph transformation of the graph that encodes the molecular structure. In principle, by virtue of the recursive nature of the algorithm, straightforward extension to arbitrarily high order interactions is possible.

The application of graph theoretic methods to the general study of molecular structure, and to equilibrium statistical mechanics in particular, is not new. Significant examples include the development of molecular branching rules [1], the enumeration of isomers and the definition of topological indexes [2], as well as the analysis of discrete lattice models [3] and Mayer’s cluster decomposition of the \(2\)-body configuration integral [4].

*define* the molecular framework, while the higher order terms in (1) serve to model the more or less restricted molecular flexibility associated with bond hybridization or electronic delocalisation (e.g., aromaticity and resonance structures). The selection of these higher order potentials is often based on chemical intuition, and the potentials are typically supplied to simulation software as user-defined input. Mature and widely used simulation packages (e.g., GROMACS, LAMMPS, NAMD) invariably support highly optimized force fields (e.g., CHARMM, AMBER, OPLS, MMFF) that faithfully represent detailed atomistic structures. An alternative approach is to input only the molecular topology, then systematically generate all possible many-body index lists from this information and invite the user to select non-zero force constants and appropriate functional forms for the required potentials. This paradigm is more natural for the implementation of coarse-grained models derived from thermodynamic considerations (e.g., MARTINI [11]). To facilitate this semi-automatic procedure, a graph theoretic construction is developed here to exhaustively enumerate and index arbitrary \(n\)-body intramolecular interactions starting from the description of \(2\)-body adjacency.

The central idea of this work is the correspondence between the hierarchy of \(n\)-body intramolecular interactions and iteration of the line graph [12] transformation \(L(G)\) on a connected “molecular” graph \(G\).

## 2 Graph theory

*signless* Laplace matrix of \(G\) is [16]

We recall the following definitions and terminology. A cycle graph \(C_{r}\) comprises \(p=r\) vertices, all of degree \(2\), connected in a closed chain by \(q=r\) edges. Removing a single edge produces a path graph \(P_{r}\) (of order \(p=r\) and size \(q=r-1\)) with two terminal vertices of degree \(1\). The complete graph \(K_{r}\) on \(p=r\) vertices is maximally connected with \(q={\textstyle {\frac{1}{2}}}r(r-1)\) edges such that \(\bigl \{v_{i},\ v_{j}\bigr \}\in E\bigl (K_{r}\bigr )\) for all distinct \(v_{i},\ v_{j}\in V\bigl (K_{r}\bigr )\). A graph \(G=\bigl (V(G),\ E(G)\bigr )\) is \(k\)-partite if the vertices can be partitioned into \(k\) disjoint sets, so that \(V(G)=\cup _{r=1}^{k}V_{r}(G)\) where \(V_{r}(G)\cap V_{s}(G)=\emptyset \) for \(r\ne s\), with no edge joining two vertices of the same set. The complete bipartite graph \((k=2)\) is denoted \(K_{m,n}\) with \( V\bigl (K_{m,n}\bigr )=V_{1}\bigl (K_{m,n}\bigr )\cup V_{2}\bigl (K_{m,n}\bigr ) \) and order \(p=m+n\) such that \(m=\bigl |V_{1}\bigl (K_{m,n}\bigr )\bigr |\) and \(n=\bigl |V_{2}\bigl (K_{m,n}\bigr )\bigr |\); every vertex of \(V_{1}\) is adjacent to every vertex of \(V_{2}\), so the size is \(q=mn\).
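The order and size stated for each of these families can be checked directly. The following is a minimal sketch, assuming graphs are stored simply as sets of two-element `frozenset` edges on integer vertex labels; the function names are illustrative and are not taken from the paper.

```python
from itertools import combinations


def cycle_graph(r):
    """Edge set of the cycle C_r on vertices 0..r-1 (r >= 3)."""
    return {frozenset((i, (i + 1) % r)) for i in range(r)}


def path_graph(r):
    """Edge set of the path P_r on vertices 0..r-1."""
    return {frozenset((i, i + 1)) for i in range(r - 1)}


def complete_graph(r):
    """Edge set of K_r: every pair of distinct vertices is joined."""
    return {frozenset(pair) for pair in combinations(range(r), 2)}


def complete_bipartite_graph(m, n):
    """Edge set of K_{m,n}: parts are {0..m-1} and {m..m+n-1}."""
    return {frozenset((i, m + j)) for i in range(m) for j in range(n)}


def degrees(vertices, edges):
    """Map each vertex to the number of edges incident on it."""
    return {v: sum(v in e for e in edges) for v in vertices}


print(len(cycle_graph(5)))                  # size q = r = 5
print(len(path_graph(5)))                   # size q = r - 1 = 4
print(len(complete_graph(5)))               # size q = r(r - 1)/2 = 10
print(len(complete_bipartite_graph(2, 3)))  # size q = m * n = 6
```

An edge-set representation is used here only because it keeps the later line graph construction short; an adjacency matrix would serve equally well.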

We will also have occasion to consider directed graphs \(G=\bigl (V(G),\ A(G)\bigr )\) where edges are replaced by arrows specified by ordered pairs \(\bigl (v_{i},\ v_{j}\bigr )\in A(G)\) and oriented with the tail at vertex \(v_{i}\) pointing towards the head at vertex \(v_{j}\). Associated with each vertex \(v_{i}\in V(G)\) are two disjoint neighbor sets \(S_{i}^{-}(G)=\bigl \{v_{j}:\,\bigl (v_{j},\ v_{i}\bigr )\in A(G)\bigr \}\) and \(S_{i}^{+}(G)=\bigl \{v_{j}:\,\bigl (v_{i},\ v_{j}\bigr )\in A(G)\bigr \}\) where \(S_{i}=S_{i}^{-}\cup S_{i}^{+}\). At most one of \(S_{i}^{-}\) or \(S_{i}^{+}\) may be empty. The indegree of vertex \(v_{i}\in V(G)\) is \(\mathrm{deg}^{-}\bigl (v_{i}\bigr )=\bigl |S_{i}^{-}(G)\bigr |\) and the outdegree is \(\mathrm{deg}^{+}\bigl (v_{i}\bigr )=\bigl |S_{i}^{+}(G)\bigr |\). If \(S_{i}^{-}=\emptyset \) so that \(\mathrm{deg}^{-}\bigl (v_{i}\bigr )=0\) then vertex \(v_{i}\) is called a source and, similarly, if \(S_{i}^{+}=\emptyset \) so that \(\mathrm{deg}^{+}\bigl (v_{i}\bigr )=0\) then vertex \(v_{i}\) is called a sink.
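The directed-graph terminology can be illustrated with a short sketch. It assumes the usual convention that the indegree of a vertex counts the arrows whose head is at that vertex and the outdegree counts the arrows whose tail is at it; the helper names and the four-vertex example are hypothetical.

```python
def neighbor_sets(vertices, arrows):
    """Return (predecessors, successors) for a directed graph.

    Each arrow is an ordered pair (tail, head). The predecessors of v are
    the tails of arrows entering v; the successors are the heads of arrows
    leaving v.
    """
    preds = {v: set() for v in vertices}
    succs = {v: set() for v in vertices}
    for tail, head in arrows:
        succs[tail].add(head)
        preds[head].add(tail)
    return preds, succs


def sources_and_sinks(vertices, arrows):
    """Sources have indegree 0; sinks have outdegree 0."""
    preds, succs = neighbor_sets(vertices, arrows)
    sources = sorted(v for v in vertices if not preds[v])
    sinks = sorted(v for v in vertices if not succs[v])
    return sources, sinks


# Hypothetical example: arrows 1 -> 3, 2 -> 3 and 3 -> 4.
VERTICES = [1, 2, 3, 4]
ARROWS = [(1, 3), (2, 3), (3, 4)]

print(sources_and_sinks(VERTICES, ARROWS))  # ([1, 2], [4])
```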

## 3 Intramolecular interactions

- 1. if \(G\cong C_{r}\) (a cycle graph on \(r\) vertices), then \(L^{n}(G)\cong G\) for all \(n\in {\mathbb {N}}\) (cycle graphs are the only connected graphs for which \(L(G)\) is isomorphic to \(G\));

- 2. if \(G\cong K_{1,3}\) (the complete bipartite “claw” graph), then \(L^{n}(G)\cong C_{3}\) (a triangle) for all \(n\in {\mathbb {N}}\);

- 3. if \(G\cong P_{r}\) (a path graph on \(r\) vertices), then \(L^{n}(G)\cong P_{\max \{0,r-n\}}\) so each subsequent graph is a shorter path until eventually the sequence terminates at the trivial null graph;

- 4. otherwise, \(G\) is a “prolific” graph [20] so that the sizes of the graphs in the sequence eventually increase without bound,

$$\begin{aligned} \left| V\bigl (L^{n}(G)\bigr ) \right| \rightarrow \infty \quad \text {as} \quad n \rightarrow \infty . \end{aligned}$$
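The four cases can be observed numerically by iterating the line graph construction and recording the order \(\bigl |V\bigl (L^{n}(G)\bigr )\bigr |\) at each step. The sketch below is an illustration under stated assumptions rather than the paper's algorithm: graphs are pairs `(vertices, edges)` with `frozenset` edges, the helper names are hypothetical, and only orders are compared (no isomorphism test is attempted). The "prolific" example is a triangle with one pendant vertex, chosen here for convenience.

```python
from itertools import combinations


def graph(edge_list):
    """Build (vertices, edges) from a list of vertex pairs."""
    edges = {frozenset(e) for e in edge_list}
    vertices = set().union(*edges) if edges else set()
    return vertices, edges


def line_graph(g):
    """L(G): one vertex per edge of G, adjacent when the edges share an endpoint."""
    _, edges = g
    new_vertices = set(edges)
    new_edges = {frozenset((e, f)) for e, f in combinations(edges, 2) if e & f}
    return new_vertices, new_edges


def orders(g, n_max):
    """Orders |V(L^n(G))| for n = 0, 1, ..., n_max."""
    result = []
    for _ in range(n_max + 1):
        result.append(len(g[0]))
        g = line_graph(g)
    return result


CYCLE5 = graph([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
CLAW = graph([(0, 1), (0, 2), (0, 3)])
PATH4 = graph([(0, 1), (1, 2), (2, 3)])
PAW = graph([(0, 1), (1, 2), (2, 0), (0, 3)])  # triangle plus pendant vertex

print(orders(CYCLE5, 4))  # [5, 5, 5, 5, 5]
print(orders(CLAW, 4))    # [4, 3, 3, 3, 3]
print(orders(PATH4, 4))   # [4, 3, 2, 1, 0]
print(orders(PAW, 4))     # [4, 4, 5, 8, 18]
```

Because each vertex of \(L^{n}(G)\) is stored as a nested `frozenset` of lower-level vertices, the underlying atoms of any high-order vertex remain recoverable, at the cost of memory that grows quickly for prolific graphs.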

### 3.1 Enumeration

### 3.2 Indexing

## 4 Examples

### 4.1 A toy model: methylcyclopropane

For added clarity, the edge index associated with each adjacent vertex pair is indicated by the superscript. The DAG representing the line graph hierarchy obtained from these matrices is shown in Fig. 2. The atomic index sequences automatically generated on the DAG, particularly at the \(4\)-body level \((n=4)\), confirm the informal analysis of the molecular graph. It is easy to show that the complete graph on five vertices \(K_{5}\) is a minor of \(L^{3}(G)\), whence it follows from the theorem of Wagner [23] that \(L^{3}(G)\) is nonplanar. All of \(G\), \(L(G)\) and \(L^{2}(G)\) are manifestly planar (see Fig. 2), so the line index is \(\xi (G)=3\), in accord with the result of Ghebleh and Khatirinejad [21].
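As a rough cross-check on the number of vertices at each level, the order of each iterate follows from the degree sequence of its predecessor, since \(\bigl |V\bigl (L(H)\bigr )\bigr |=\bigl |E(H)\bigr |\) and \(\bigl |E\bigl (L(H)\bigr )\bigr |=\sum _{v\in V(H)}\binom{\mathrm{deg}(v)}{2}\) for any simple graph \(H\). The working below assumes that the molecular graph \(G\) is a triangle of ring carbons with a single pendant methyl vertex, giving \(p=q=4\); this united-atom reading is an assumption made for illustration and is not taken from the figures. The degree sequences are then \((3,2,2,1)\) for \(G\), \((3,3,2,2)\) for \(L(G)\) and \((4,3,3,3,3)\) for \(L^{2}(G)\), so that

$$\begin{aligned} \bigl |E\bigl (L(G)\bigr )\bigr |&=\binom{3}{2}+2\binom{2}{2}+\binom{1}{2}=3+2+0=5,\\ \bigl |E\bigl (L^{2}(G)\bigr )\bigr |&=2\binom{3}{2}+2\binom{2}{2}=6+2=8,\\ \bigl |E\bigl (L^{3}(G)\bigr )\bigr |&=\binom{4}{2}+4\binom{3}{2}=6+12=18. \end{aligned}$$

Under this assumption the orders of \(G\), \(L(G)\), \(L^{2}(G)\) and \(L^{3}(G)\) are \(4\), \(4\), \(5\) and \(8\), corresponding to the levels \(n=1,\ldots ,4\). Note that \(L^{3}(G)\) would have \(p=8\) and \(q=18=3p-6\), so the elementary edge bound \(q\le 3p-6\) for planar graphs is met with equality and cannot by itself exclude planarity; a minor argument of the kind invoked above is needed.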

### 4.2 Bile salt: taurocholate

Vertex sequences associated with the directed acyclic graph (DAG) description of the line graph hierarchy for the effective bile salt model of Vila Verde and Frenkel [24]

| \(N_{n}\) | \(n=1\) | \(n=2\) | \(n=3\) | \(n=4\) | |
|---|---|---|---|---|---|
| 1 | 1 | (1, 2) | ((1, 2), (2, 3)) | (((1, 2), (2, 3)), ((1, 2), (2, 4))) | i |
| 2 | 2 | (2, 3) | ((1, 2), (2, 4)) | (((1, 2), (2, 3)), ((2, 3), (2, 4))) | i |
| 3 | 3 | (2, 4) | ((2, 3), (2, 4)) | (((1, 2), (2, 4)), ((2, 3), (2, 4))) | i |
| 4 | 4 | (3, 5) | ((2, 3), (3, 5)) | (((1, 2), (2, 3)), ((2, 3), (3, 5))) | p |
| 5 | 5 | (4, 5) | ((2, 4), (4, 5)) | (((2, 3), (2, 4)), ((2, 3), (3, 5))) | p |
| 6 | 6 | (5, 6) | ((3, 5), (4, 5)) | (((1, 2), (2, 4)), ((2, 4), (4, 5))) | p |
| 7 | 7 | (6, 7) | ((3, 5), (5, 6)) | (((2, 3), (2, 4)), ((2, 4), (4, 5))) | p |
| 8 | 8 | (7, 8) | ((4, 5), (5, 6)) | (((2, 3), (3, 5)), ((3, 5), (4, 5))) | p |
| 9 | 9 | (8, 9) | ((5, 6), (6, 7)) | (((2, 4), (4, 5)), ((3, 5), (4, 5))) | p |
| 10 | 10 | (1, 10) | ((6, 7), (7, 8)) | (((2, 3), (3, 5)), ((3, 5), (5, 6))) | p |
| 11 | 11 | (3, 11) | ((7, 8), (8, 9)) | (((3, 5), (4, 5)), ((3, 5), (5, 6))) | i |
| 12 | 12 | (4, 12) | ((1, 2), (1, 10)) | (((2, 4), (4, 5)), ((4, 5), (5, 6))) | p |
| 13 | | | ((2, 3), (3, 11)) | (((3, 5), (4, 5)), ((4, 5), (5, 6))) | i |
| 14 | | | ((3, 5), (3, 11)) | (((3, 5), (5, 6)), ((4, 5), (5, 6))) | i |
| 15 | | | ((2, 4), (4, 12)) | (((3, 5), (5, 6)), ((5, 6), (6, 7))) | p |
| 16 | | | ((4, 5), (4, 12)) | (((4, 5), (5, 6)), ((5, 6), (6, 7))) | p |
| 17 | | | | (((5, 6), (6, 7)), ((6, 7), (7, 8))) | p |
| 18 | | | | (((6, 7), (7, 8)), ((7, 8), (8, 9))) | p |
| 19 | | | | (((1, 2), (2, 3)), ((1, 2), (1, 10))) | p |
| 20 | | | | (((1, 2), (2, 4)), ((1, 2), (1, 10))) | p |
| 21 | | | | (((1, 2), (2, 3)), ((2, 3), (3, 11))) | p |
| 22 | | | | (((2, 3), (2, 4)), ((2, 3), (3, 11))) | p |
| 23 | | | | (((2, 3), (3, 5)), ((2, 3), (3, 11))) | i |
| 24 | | | | (((2, 3), (3, 5)), ((3, 5), (3, 11))) | i |
| 25 | | | | (((3, 5), (4, 5)), ((3, 5), (3, 11))) | p |
| 26 | | | | (((3, 5), (5, 6)), ((3, 5), (3, 11))) | p |
| 27 | | | | (((2, 3), (3, 11)), ((3, 5), (3, 11))) | i |
| 28 | | | | (((1, 2), (2, 4)), ((2, 4), (4, 12))) | p |
| 29 | | | | (((2, 3), (2, 4)), ((2, 4), (4, 12))) | p |
| 30 | | | | (((2, 4), (4, 5)), ((2, 4), (4, 12))) | i |
| 31 | | | | (((2, 4), (4, 5)), ((4, 5), (4, 12))) | i |
| 32 | | | | (((3, 5), (4, 5)), ((4, 5), (4, 12))) | p |
| 33 | | | | (((4, 5), (5, 6)), ((4, 5), (4, 12))) | p |
| 34 | | | | (((2, 4), (4, 12)), ((4, 5), (4, 12))) | i |
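The column lengths in the table (12, 12, 16 and 34 entries) can be regenerated from the bond list alone. The following is a minimal sketch, not the paper's pseudo-code: the bond list is copied from the \(n=2\) column, while the function names and the nested-tuple representation are choices made here. The geometric labels "star" (three bonds meeting at one atom) and "chain" are also introduced here; reading the flags i and p in the final column as improper and proper dihedrals is an assumption, since that column is unlabelled.

```python
from itertools import combinations

# Bond list copied from the n = 2 column of the table.
BONDS = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6), (6, 7), (7, 8),
         (8, 9), (1, 10), (3, 11), (4, 12)]


def next_level(nodes):
    """Pair up nodes of one level that share a component (adjacent in the line graph)."""
    return [(a, b) for a, b in combinations(nodes, 2) if set(a) & set(b)]


def shape(quad):
    """Classify a 4-body node ((b1, b2), (b3, b4)) by the geometry of its bonds."""
    bonds = {bond for angle in quad for bond in angle}
    atoms = set().union(*(set(bond) for bond in bonds))
    if len(atoms) == 3:
        return "ring"   # three bonds closing a triangle (absent in this model)
    common = set.intersection(*(set(bond) for bond in bonds))
    return "star" if common else "chain"


atoms = sorted({a for bond in BONDS for a in bond})
angles = next_level(BONDS)   # 3-body terms: pairs of bonds sharing an atom
quads = next_level(angles)   # 4-body terms: pairs of angles sharing a bond

print(len(atoms), len(BONDS), len(angles), len(quads))  # 12 12 16 34
print(sum(shape(q) == "star" for q in quads))           # 12
print(sum(shape(q) == "chain" for q in quads))          # 22
```

The 12 stars and 22 chains agree in number with the rows flagged i and p respectively. The enumeration order produced by `combinations` need not match the row order of the table.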

## 5 Conclusion

The line graph transformation provides a practical and elegant theoretical tool for exhaustively enumerating and indexing many-body intramolecular interactions. Starting from a suitable graphical representation of a molecular structure, an explicit pseudo-code description of the recursive line graph algorithm has been given that automatically generates complete canonical lists of atomic indexes for each interaction order. No attempt has been made to computationally optimize this algorithm or the associated data structures; clarity of exposition is the main objective here. We anticipate that the main application will involve embedding the algorithm within a Monte Carlo or molecular dynamics simulation code, where other implementation details will determine the most efficient realization. In accord with common practice, intramolecular interactions up to order \(4\) have been considered (bonds, bends and dihedrals), but the method can be extended to arbitrarily many atomic centers. Higher order interactions will involve increasingly many sub-type variations and polycyclic structures. Two specific examples have been discussed: a toy model of methylcyclopropane and a published effective potential model of taurocholate bile salt [24] that is relevant for the study of digestive processes in the human lower gastrointestinal tract.

## Notes

### Acknowledgments

This work was financially supported by the Biotechnology and Biological Sciences Research Council through its core strategic grant to the Institute of Food Research. We also thank Andrew Watson for critical reading of the manuscript.

## References

- 1. D. Bonchev, J. Mol. Struct. (Theochem) **336**, 137 (1995)
- 2. D.H. Rouvray, A.T. Balaban, in *Applications of Graph Theory*, ed. by R.J. Wilson, L.W. Beinecke (Academic Press, London, 1979), p. 177
- 3. H.N.V. Temperley, in *Applications of Graph Theory*, ed. by R.J. Wilson, L.W. Beinecke (Academic Press, London, 1979), p. 121
- 4. J.-P. Hanson, I.R. McDonald, *Theory of Simple Liquids*, 2nd edn. (Academic Press, London, 1986), pp. 79–92
- 5. B. Kirchner, Phys. Rep. **440**, 1 (2007)
- 6. J. Hutter, WIRE: Comp. Mol. Sci. **2**, 604 (2012)
- 7. R.D. Kohn, A.D. Becke, R.G. Parr, J. Phys. Chem. **100**, 12974 (1996)
- 8. R. Penfold, S. Abbas, S. Nordholm, Fluid Phase Equilib. **120**, 39 (1996)
- 9. J.T. Padding, A.A. Louis, Phys. Rev. E **74**, 031402 (2006)
- 10. M.P. Allen, in *Computational Soft Matter: From Synthetic Polymers to Proteins*, ed. by N. Attig, K. Binder, H. Grubmüller, K. Kremer (John von Neumann Institute for Computing, Jülich, 2004), p. 1
- 11. S.J. Marrink, H.J. Risselada, S. Yefimov, D.P. Tieleman, A.H. de Vries, J. Phys. Chem. B **111**, 7812 (2007)
- 12. R.L. Hemminger, L.W. Beinecke, in *Selected Topics in Graph Theory*, ed. by L.W. Beinecke, R.J. Wilson (Academic Press, London, 1978), p. 271
- 13. F. Harary, *Graph Theory* (Addison-Wesley, Reading, 1969)
- 14. J. Clark, D.A. Holton, *First Look at Graph Theory* (World Scientific, Singapore, 1991)
- 15. G. Sabidussi, Math. Zeitschr. **76**, 385 (1961)
- 16. A.E. Brouwer, W.H. Haemers, *Spectra of Graphs* (Springer, New York, 2012)
- 17. P.J. Flory, *Statistical Mechanics of Chain Molecules* (Carl Hanser Verlag, Munich, 1989)
- 18. R.E. Tuzun, D.W. Noid, B.G. Sumpter, J. Comp. Chem. **18**, 1513 (1997)
- 19. A.C.M. van Rooij, H.S. Wilf, Acta Math. Hungar. **16**, 263 (1965)
- 20. M. Knor, P. Potočnik, R. Škrekovski, Discrete Appl. Math. **160**, 2234 (2012)
- 21. M. Ghebleh, M. Khatirinejad, Discrete Math. **308**, 144 (2012)
- 22. R.J. Wilson, *Introduction to Graph Theory*, 3rd edn. (Longman, Harlow, 1985)
- 23. R. Diestel, *Graph Theory*, 3rd edn. (Springer, Heidelberg, 2005)
- 24. A. Vila Verde, D. Frenkel, Soft Matter **6**, 3815 (2010)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.