A new approach to the design of acyclic chemical compounds using skeleton trees and integer linear programming

Intelligent systems are applied in a wide range of areas, and computer-aided drug design is a highly important one. One major approach to drug design is the inverse QSAR/QSPR (quantitative structure-activity and structure-property relationship), for which a method that uses both artificial neural networks (ANN) and mixed integer linear programming (MILP) has been proposed recently. This method consists of two phases: a forward prediction phase, and an inverse, inference phase. In the prediction phase, a feature function f over chemical compounds is defined, whereby a chemical compound G is represented as a vector f(G) of descriptors. Following, for a given chemical property \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi$$\end{document}, using a dataset of chemical compounds with known values for property \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi$$\end{document}, a regressive prediction function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\psi$$\end{document} is computed by an ANN. It is desired that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\psi (f(G))$$\end{document} takes a value that is close to the true value of property \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi$$\end{document} for the compound G for many of the compounds in the dataset. In the inference phase, one starts with a target value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y^*$$\end{document} of the chemical property \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi$$\end{document}, and then a chemical structure \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^*$$\end{document} such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\psi (f(G^*))$$\end{document} is within a certain tolerance level of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y^*$$\end{document} is constructed from the solution to a specially formulated MILP. This method has been used for the case of inferring acyclic chemical compounds. With this paper, we propose a new concept on acyclic chemical graphs, called a skeleton tree, and based on it develop a new MILP formulation for inferring acyclic chemical compounds. Our computational experiments indicate that our newly proposed method significantly outperforms the existing method when the diameter of graphs is up to 8. In a particular example where we inferred acyclic chemical compounds with 38 non-hydrogen atoms from the set {C, O, S} times faster.


Introduction
Computer-aided drug design is one of the most significant areas for application of recently developed methods in artificial intelligence. One particular approach that has attracted extensive studies is the inverse QSAR/QSPR (quantitative structure-activity and structure-property relationship) [16,23]. The task of QSAR/QSPR is to compute a regression function between the structure of chemical compounds and some chemical activity and/or property of interest. The structure of chemical compounds is commonly represented in the form of undirected graphs, and the regression function is computed by using statistical machine learning methods from a set of training data of pairs of known molecular compounds and their activities/properties. The inverse QSAR/QSPR then, given such a regression function, asks to infer the structure of a chemical compound that would exhibit certain activity or property, perhaps while obeying some additional constraints. A common method to the inverse QSAR/QSPR is to formulate an optimization problem that asks to find a chemical graph that maximizes or minimizes a particular objective function under various constraints.
Directly handling chemical graphs in statistical methods and machine learning methods poses a difficult challenge, and therefore it is common to represent chemical compounds by numerical vectors, called a set of descriptors, or a set of features. There have been several methods that have been developed for deriving graph structures that are optimal or close to optimal for a given objective function [10,16,20]. In addition to getting one solution that is optimal or close to optimal, it is often required to infer or enumerate graph structures that satisfy a given feature vector. There have been various methods developed for solving the task of enumerating graph structures [7,11,14,18]. In addition, the computational complexity of the enumeration task has been studied [1,17].

Related work
Undoubtedly Artificial Neural Networks (ANNs) and their application in deep learning have enjoyed unprecedentedly rapid progress recently. Applications of these technologies to the problem of the inverse QSAR/QSPR include variational autoencoders [8,15], recurrent neural networks [22,26], and grammar variational autoencoders [13]. These applications standardly involve training a neural network with a set of known compound/activity data. Then, the inverse QSAR/QSPR is solved by solving the problem of inverting a trained neural network, commonly done through statistical methods. However, one of the major drawbacks of statistical methods is that there is no guarantee that an obtained solution will be optimal or exact.
A recently proposed approach based on mixed integer linear programming (MILP) [3], comes with a mathematical guarantee for the optimality of the derived solution. Since the proposed method [3] relies on linear programming, the activation functions of the neurons in an ANN are represented as piece-wise linear functions, and therefore ReLU functions can be represented without any loss, whereas sigmoid functions must be approximated.
The MILP-based method for inverting trained ANNs [3] has recently been combined with methods for efficient enumeration of tree-like graphs, e.g., the algorithm proposed by Fujiwara et al. [7], into a two-phase framework for inverse QSAR/QSPR [4,6].
The first phase in the framework solves (I) PREDICTION PROBLEM, by constructing a prediction, or a regression, function by using an ANN N. In this phase, given a set of chemical compounds, that is, chemical graphs G, and known values a(G) for a certain chemical property , each chemical compound G in the set is represented by a feature vector f(G).
These feature vectors are used as inputs for training the ANN N , to obtain a prediction function N in such a way that a(G) is predicted as N ðf ðGÞÞ.
The second phase solves (II) INVERSE PROBLEM. Starting with a given a target value y Ã for a chemical property , in stage (II-a), a feature vector x Ã is computed based on the trained ANN N under the constraint that N ðx Ã Þ is within a certain tolerance range close to y Ã . Following, (II-b) a set of chemical structures G Ã is generated under the condition that f ðG Ã Þ ¼ x Ã . Stage (II-a) in the methods of a combined framework [4,6] is based on an MILP formulation, which incorporates the one due to Akutsu and Nagamochi [3]. In particular, the MILP formulation proposed by Azam et al. [4] guarantees that for a given trained ANN N and a desired target value y Ã , either: admits a corresponding chemical structure G Ã , or (ii) no chemical structure exists for the given target value y Ã when no feature vector is inferred from the ANN N .
Notable related works on the inverse QSAR/QSPR include the frameworks and results reported by Sumita et al. [24], as well as Takeda et al. [25]. However, there are certain drawbacks to these frameworks as compared to the combined framework described above [4,6].
The work due to Sumita et al. [24] is noteworthy since it is reported that the finally obtained structures have been synthesized and their properties experimentally tested. A major drawback of this approach is that it relies on a Monte-Carlo based simulation, which is reported to take on the order of days of computation time.
On the other hand, Takeda et al. [25] propose a framework for constructing a regression function, solving the inverse problem on the regression function to obtain the descriptors of a desired chemical compound, and enumerating several chemical compounds with some desired properties. In this work, the descriptors used as arguments to construct a regression function are general sub-structure frequency vectors, which is a disadvantage, since such descriptors are dependent on the features of the training set. As opposed to general sub-structures, the framework on which we build [4,6] uses graph-theoretical descriptors, which easily preserves explainability. Further, Takeda et al. [25] propose a custom-implemented gradient search method to solve the problem of inverting the regression function, which is not guaranteed to arrive at a globally optimal, and hence, exact solution. As opposed to that, using a solution to an MILP formulation offers an exact solution to the problem of inverting the regression function constructed by an ANN.

Our contribution
With this paper, we propose a new MILP formulation, which when included in the combined framework for the inverse QSAR/QSPR [4,6], serves the purpose to infer acyclic chemical compounds with a bounded degree. To this purpose, we introduce the concept of skeleton trees, which are trees with the maximum number of vertices for a given diameter and degree. Then, an acyclic chemical graph to be inferred is constructed as an induced subgraph of a skeleton tree.
The aim for introducing a new MILP formulation is due to the fact that solving an MILP is known to be a computationally difficult problem. Even though modern-day commercial solvers such as CPLEX [9] are highly effective in practice, our intuition is that there is room for improvement, especially by taking into account the special structure of acyclic chemical compounds with a limited degree. Here we note that chemical graphs with diameter up to 11 and degree at most 3, and diameter at most 8 and maximum degree equal to 4 account for about 35% and 18%, respectively, out of all acyclic chemical graphs with 200 or fewer non-hydrogen atoms registered in the PubChem chemical database. Further, those figures are about 63% and 40% with respect to the acyclic chemical graphs with 200 or fewer non-hydrogen atoms with degree at most 3 and maximum degree 4, respectively.
We report computation experiments comparing the performance of our new approach with the method due to Azam et al. [4] over several chemical properties. The results of our experiments, presented in Section 6, indicate that the new method proposed with this paper consistently outperforms the previous method [4] in terms of running time for target compounds with a limited number of chemical elements and a small diameter.

Preliminaries
Let the sets of real and non-negative integer numbers be denoted by R and Z, respectively. For two integers a and b, let [a, b] denote the closed interval between a and b, that is, the set of integers i with a i b.
Graphs Let H ¼ ðV; EÞ be a graph with a set V of vertices and a set E of edges. For a vertex v 2 V, let N H ðvÞ denote the set of neighbors of v in H. Then, the degree deg H ðvÞ of v is defined to be the size jN H ðvÞj of N H ðvÞ. We define the length of a path to be the number of edges in the path. The distance dist H ðu; vÞ between two vertices u; v 2 V is defined to be the minimum length of a path in H whose endpoints are u and v. The diameter diaðHÞ of H is defined to be the maximum distance between two vertices in H. The sum-distance smdtðHÞ of H is defined to be the sum of distances over all vertex pairs. Chemical graphs We represent the graph structure of a chemical compound in a hydrogen-suppressed model as a vertex-labeled multi-graph. Let Λ be a set of labels, and each label represent a chemical element, such as C (carbon), O (oxygen), N (nitrogen), etc. Since we work with hydrogensuppressed models, we assume that Λ does not contain H (hydrogen). For a chemical element a 2 Λ, let massðaÞ and valðaÞ denote its mass and valence, respectively. In our model, we round the ten-fold atomic mass value down to the nearest integer, i.e., we take mass Ã ðaÞ ¼ b10 Á massðaÞc, a 2 Λ. Let the set Λ of labels be totally-ordered based on the mass of the corresponding elements, and we write ab for chemical elements a; b 2 Λ with massðaÞmassðbÞ. For a tuple ¼ ða; b; kÞ 2 Λ Â Λ Â ½1; 3, let denote the tuple ðb; a; kÞ. Let Γ Λ Â Λ Â ½1; 3 be a set of tuples ¼ ða; b; kÞ such that ab, and set Γ > ¼ f j 2 Γ g, Γ ¼ ¼ fða; a; kÞ j a 2 Λ; k 2 ½1; 3g and Γ ¼ Γ [ Γ ¼ . We denote by a tuple ¼ ða; b; kÞ 2 Γ a pair of atoms with labels a and b which are connected by a bond of multiplicity k.
We define a chemical graphs in a hydrogen-suppressed model to be a tuple G ¼ ðH; ; Þ of a graph H ¼ ðV; EÞ, a mapping : V ! Λ and a mapping : E ! ½1; 3 such that the following conditions are satisfied: (i) H is connected; and (ii) for each vertex u 2 V it holds that P e¼uv2E ðeÞ val ððuÞÞ.
We note that nearly 55% of the acyclic chemical graphs with at most 200 non-hydrogen atoms that are registered in the chemical database PubChem 1 [12] have degree at most 3 in their hydrogen-suppressed model. Figure 1 illustrates an example of a chemical graph G ¼ ðH; ; Þ.
Descriptors To define feature vectors, we use only graph-theoretical descriptors. This choice serves our purpose to design an algorithm for constructing graphs. Henceforth, we define the feature vector f(G) of a chemical graph G ¼ ðH ¼ ðV; EÞ; ; Þ to be a numerical vector that consists of the following eight kinds of descriptors:

A method for inferring chemical graphs
We review the framework for the inverse QSAR/QSPR [4] that employs both ANNs and MILPs. The framework is schematically illustrated in Fig. 2. Let G be a given chemical compound, represented by a chemical graph G ¼ ðH; ; Þ, and let denote a specified chemical property such as boiling point. We denote by a(G) the observed value of the property for chemical compound G. In the first phase of the two-phase framework, we solve (I) PREDICTION PROBLEM for the inverse QSAR/QSPR through the following three steps, as schematically illustrated in Fig. 2. 1. Gather a dataset D ¼ fðG i ; aðG i ÞÞ j i ¼ 1; 2; . . . ; mg of pairs of a chemical graph G i and the value aðG i Þ. We fix two values a; a 2 R so that a aðG i Þ a,i ¼ 1; 2; . . . ; m. 2. Choose a class of graphs G to be a set of chemical graphs such that G fG i j i ¼ 1; 2; . . . ; mg. Introduce a feature function f : G ! R k for a positive integer k. We call f(G) the feature vector of G 2 G, and call each entry of vector f(G) a descriptor of G. 3. Using the dataset D, train an ANN N to construct a regression prediction function N that given a vector in x 2 R k , returns a real value N ðxÞ with a N ðxÞ a and such that N ðf ðGÞÞ takes a value nearly equal to a(G) for many of the chemical graphs in the dataset D.
In the second phase, we solve (II) INVERSE PROBLEM for the inverse QSAR/QSPR through the following two inference problems.

(II-a) Inference of Vectors
In order to tackle Problem (II-a), we use the following result.
Theorem 1 [3] Let N be an ANN with a piecewise-linear activation function for an input vector x 2 R k , n A denote the number of nodes in the architecture and n B denote the total number of break-points over all activation functions. Then there is an MILP Mðx; y; C 1 Þ that consists of variable vectors x 2 R k , y 2 R , and an auxiliary variable vector z 2 R p for some integer p ¼ Oðn A þ n B Þ and a set C 1 of Oðn A þ n B Þ constraints on these variables such that In addition, we introduce a variable vector g 2 R h , for some integer h, and a set C 2 of constraints on x and g such that ðx Ã ; g Ã Þ is feasible to the MILP Mðx; g; C 2 Þ if and only if g Ã forms a chemical graph G Ã 2 G with f ðG Ã Þ ¼ x Ã (see [4] for details). Finally, we note that by using MILPs, it is not difficult to introduce additional linear constraints or to fix some of the variables to specified constants.
To address Problem (II-b), we design a branch-and-bound algorithm, akin to the work of Fujiwara et al. [7] for enumerating acyclic chemical compounds.
The second phase comprises the following two steps.
G : a class of chemical

a: property function
Step 2 Step 1 Step 3 Step 4 Fig. 2 An illustration of a property function a, a feature function f, a prediction function N and an MILP that either delivers a vector ðx Ã ; g Ã Þ that forms a chemical graph G Ã 2 G such that N ðf ðG Ã ÞÞ ¼ y Ã (or að G Ã Þ ¼ y Ã ) or detects that no such chemical graph G Ã exists in G 4. Formulate Problem (II-a) as the above MILP Mðx; y; g; C 1 ; C 2 Þ taking into account the class G of graphs and the

Figure 2 illustrates Steps 4 and 5.
In the MILP formulation Mðx; g; C 2 Þ proposed by Azam et al. [4] in order to construct an acyclic chemical graphG Ã with n vertices, we choose as edges a subset of n À 1 vertex pairs from an n Â n adjacency matrix, that is, a subset of n À 1 edges from a complete graph K n on n vertices. In Section 4, we introduce an MILP formulation Mðx; g; C 2 Þ in which a graph G Ã is constructed as an induced subgraph of a larger acyclic graph, which we call "a skeleton tree," formally introduced in Section 4.

Skeleton trees
Before introducing our MILP formulation for inferring chemical graphs in Step 4 of the framework outlined in Section 3, we introduce the concept of skeleton trees. Based on this concept, we effectively reduce the number of variables and constraints, and thus the computational complexity and time needed to solve the formulation in practice.
For an integer D, let T ½D;3 (resp., T ½D;4 ) denote the set of trees H with diaðHÞ ¼ D and whose maximum degree is at most 3 (resp., equal to 4). We define the skeleton tree T y ½D;d , d 2 f3; 4g, to be a tree in T ½D;d with the maximum number of vertices. Let n max ðD; dÞ denote the number of vertices in T y ½D;d . Then, we assume that by convention the vertices and the edges in the skeleton tree T y ½D;d ¼ ðV y ¼ fv 1 ; v 2 ; . . . ; v n max ðD;dÞ g; E y ¼ fe 1 ; e 2 ; . . . ; e n max ðD;dÞÀ1 gÞ are indexed in an ordering σ as follows: (i) T ½D;d is rooted at vertex v 1 , and for any vertex v i and a child v j of v i it holds that i < j; (ii) Each edge e j joins two vertices v jþ1 and v k with k j, and tailðjÞ denotes the index k of the parent v k of vertex v jþ1 ; and . . . ; e D Þ is one of the longest paths in the tree T y ½D;d . Figure 3 gives an illustration of an ordering σ as described above for the skeleton trees T y ½3;4 in Fig. 3(a) and T y ½4;4 in Fig.  3(b). For each i ¼ 1; 2; . . . ; n max ðD; dÞ, let N σ ðiÞ denote the set of indices j of edges e j incident to vertex v i , and dist σ ði; jÞ . . . ; e D g E , and an integer i ¼ 2; 3; . . . ; D, we denote by H ðiÞ the subtree of H rooted at v i and induced by its descendants except the vertex v iþ1 and the descendants of v iþ1 . An illustration is given in Fig. 4 (a).
For a rooted tree T ¼ ðV; EÞ and a vertex v 2 V, we denote by prt T ðvÞ the parent of v, and by Cld T ðvÞ the set of children of v in T.

A proper form for subtrees
For integers D ! 2 and d 2 f3; 4g, let T denote T y ½D;d and B its base path. Let K be a rooted subtree of T with EðBÞ EðKÞ. For a vertex v 2 VðT ÞnVðBÞ, we define the s-value s(v; K) of v with respect to K as follows: and jCldðv; KÞj < jCldðv; T Þj "; and 3. sðv; KÞ ¼ min u2Cldðv;KÞ sðu; KÞ þ 1 otherwise.
Let H be a subtree of T with EðBÞ EðHÞ. We call H an sproper tree, if for each integer i 2 ½2; D, the subtree H ðiÞ is an s-left heavy tree and one of the following conditions holds: (a-1) d ¼ 3 and jVðH ð2Þ Þj ! jVðH ðDÞ Þj; An illustration of an s-proper tree and non-s-proper trees is shown in Fig. 4. Recall that B denotes the base path in T. We define an s-proper form of H to be a subtree H 0 such that (i) EðBÞ EðH 0 Þ; (ii) there is an isomorphism from H 0 to H such that ðuÞ 2 VðBÞ for any vertex u 2 VðBÞ; and (iii) H 0 is an s-proper tree. Notice that an s-proper form of a subtree H is not necessarily unique.
Theorem 2 Every subtree H of T y ½D;d with EðBÞ EðHÞ has an s-proper form.
Proof We set G :¼ H. If G is an s-proper tree then G is an sproper form of H and we are done. Therefore, assume that G is not an s-proper tree. If G has a subtree G ðiÞ for some i 2 ½2; D that is non-s-left heavy due to a vertex v j 2 VðG ðiÞ Þ, then we can re-order the descendant subtrees of the children of v j so that the s-value of its children from left to right is non-increasing, since it will not change the s-value of v j . Let G Ã denote the tree obtained by applying this re-ordering operation. Clearly there exists an isomorphism from G Ã to H such that ðuÞ 2 VðBÞ for any vertex u 2 VðBÞ, since we only re-order the descendant subtrees of the children of a vertex in G. Then set G :¼ G Ã and repeat the same operation of re-ordering until all subtrees G ðiÞ ; i 2 ½2; D of G are s-left heavy trees. Next, for the subtree G, if one of conditions (a-1) and (a-2) is satisfied, then G is an s-proper form of H. Otherwise, i.e., when none of conditions (a-1) and (a-2) is satisfied, we can get an s-proper form of H by switching G ðiÞ and G ðDþ2ÀiÞ , i 2 ½2; bD=2c þ 1, which completes the proof.

A proper set based on s-proper form
Let P prc be a set of ordered index pairs (i, j) with D þ 2 i < j n max . We call P prc proper if the next conditions hold: (c-1) For each subtree H of T y ½D;d with EðBÞ EðHÞ, there is at least one subtree H 0 with EðBÞ EðH 0 Þ such that 1. there is an isomorphism from H 0 to H such that ðuÞ 2 VðBÞ for any vertex u 2 VðBÞ; and 2. for each pair ði; jÞ 2 P prc , if e j 2 EðH 0 Þ then e i 2 EðH 0 Þ; and (c-2) For each pair of edges e i and e j in T y ½D;d such that e i is the parent e j , there exists a sequence ði 1 ; i 2 Þ; ði 2 ; i 3 Þ; . . . ; ði kÀ1 ; i k Þ of index pairs in P prc such that i 1 ¼ i and i k ¼ j.
Note that a given skeleton tree does not necessarily have a unique proper set P prc . In the remainder of this section, we give a construction method for a proper set P prc based on s-proper form.     Let T denote T y ½D;d . We define P 0 prc of T to be the set of ordered index pairs (i, j) such that either (i) v jþ1 is the first child of v iþ1 or; (ii) j ¼ i þ 1 and v iþ1 and v iþ2 share the same parent in T.
In Fig. 5 (a) and (b), we illustrate an example of ordered index pairs (i, j) that satisfy conditions (i) and (ii), respectively, with e i at level t À 1 and e j at level t, t 2 ½3; bD=2c þ 1.
For d ¼ 3 and edges at level 2, we define P 00 ð3Þ 2 to be the set fðD þ 1; ðd À 2ÞðD À 2Þ þ D þ 1 ¼ 2D À 1Þg . For d ¼ 3 and edges at level 4, we define P 00 ð3Þ 4 to be the set of ordered index pairs (i, j) such that (i) v iþ1 ; v jþ1 2 VðT ðpÞ Þ for some p 2 ½2; D; and (ii) v iþ1 and v jþ1 are each the h-th child of their parents in T for some h 2 ½1; d À 1.
Theorem 3 For two integers, D ! 2 and d 2 f3; 4g, the set P 0 prc [ P 00 ðdÞ prc is proper for the tree T y ½D;d .
Proof Let T denote the tree T y ½D;d , and let P ¼ P 0 prc [ P 00 ðdÞ prc . To show that P is proper, we need to show that P satisfies conditions (c-1) and (c-2). Let H be a subtree of T with fe 1 ; e 2 ; . . . . We next show that condition (c-1)(b) holds for P 0 prc and P 00 ðdÞ prc separately. Let ði; jÞ 2 P 0 prc such that v jþ1 is the first child of v iþ1 . Ife j 2 EðGÞ but e i = 2EðGÞ, then G would be disconnected, which is a contradiction. Let ði; jÞ 2 P 0 prc such that j ¼ i þ 1 and v iþ1 and v iþ2 share the same parent in T. This implies that there exists an integer p 2 ½2; D such that v iþ1 ; v iþ2 2 VðT ðpÞ Þ. Let K denote G ðpÞ . If e j 2 EðGÞ but e i = 2EðGÞ, then sðv iþ1 ; KÞ ...   prc such that v jþ1 is the first child ofv iþ1 and edgee j is at levelt ! 3; (b) An element of P 0 prc such that j ¼ i þ 1 and v iþ1 and v iþ2 share the same parent in T y ½D;d and edge e j is at levelt ! 2; (c) The element ðD þ 1; ðd À 2ÞðD À 2Þ þ D þ1 ¼ 2D À 1Þ of P 00 ð3Þ 2 and edge e Dþ1 and e 2DÀ1 are at level 2; (d) An element (i, j) ofP 00 ð3Þ 4 such that v iþ1 ; v jþ1 2 VðT ðpÞ Þ for some p 2 ½2; D and v iþ1 and v jþ1 are each the h-th child of their parents in T y ½D;d for some h 2 ½1; d À 1; (e) The elements ðD þ 1; ðd À 2ÞðD À2Þ þ D þ 1 ¼ 3D À 3Þ and ðD þ2; ðd À 2ÞðD À 2Þ þ D þ 2 ¼ 3D À 2Þ in P 00 ð4Þ 2 ; and (f) An element (i, j) of P 00 ð4Þ 3 such that v iþ1 ; v jþ1 2 VðT ðpÞ Þ for some p 2 ½2; D and v iþ1 and v jþ1 are each the h-th child of their parents in T y ½D;d for some h 2 ½1; d À 1 ¼ 0 and sðv iþ2 ; KÞ ! 1 holds. This implies that sðv iþ1 ; KÞ < sðv iþ2 ; KÞ, which contradicts the fact that K is an s-left heavy tree. Hence, P 0 prc satisfies condition (c-1)(b). Let d ¼ 3 and ði; jÞ 2 P 00 ðdÞ prc . This implies that ði; jÞ 2 P 00 ð3Þ 2 or ði; jÞ 2 P 00 ð3Þ 4 . Let ði; jÞ 2 P 00 ð3Þ 2 , then i ¼ D þ 1 and j ¼ ðd À 2ÞðD À 2Þ þD þ 1. Notice that v iþ1 2 VðT ð2Þ Þ and v jþ1 2 VðT ðDÞ Þ. If e j 2 EðGÞ but e i = 2EðGÞ, then jVðG ð2Þ Þj < jVðG ðDÞ Þj would hold, which contradicts the fact that G is an s-proper tree. Let ði; jÞ 2 P 00 ð3Þ 4 , then it holds that levelðe i Þ ¼ levelðe j Þ ¼ 4 and v iþ1 and v jþ1 are in the same rooted subtree T ðpÞ for some integer p 2 ½2; D . Let K denote G ðpÞ . Since d ¼ 3 , there exists a positive integer u such that the four edges e u ; e uþ1 ; e uþ2 and e uþ3 are at level four. Note that ðu; u þ 1Þ and ðu þ 2; u þ 3Þ are the elements of P 0 prc since the vertices in the pairs ðv uþ1 ; v uþ2 Þ and ðv uþ3 ; v uþ4 Þ have the same parents. This implies that the condition v iþ1 and v jþ1 are each the h-th child of their parents in T for some h 2 ½1; d À 1 can only be true for ði; jÞ ¼ ðu; u þ 2Þ or ðu þ 1; u þ 3Þ. Let ði; jÞ ¼ ðu; u þ 2Þ. If e uþ2 2 EðGÞ but e u = 2EðGÞ, then it holds that e uþ1 = 2EðGÞ since ðu; u þ 1Þ 2 P 0 prc . Letv x and v y denote the parents of v uþ1 and v uþ3 in T, respectively. Then sðv x ; KÞ ¼ 1 and sðv y ; KÞ ! 1 would hold, which implies that sðv y ; KÞ ! sðv x ; KÞ. Then we can get another s-proper form H 00 ¼ ðV 00 ; E 00 Þ by switching the two subtrees rooted at v x and v y in G. Clearly by the construction of H 00 , it holds that e u 2 E 00 if e uþ2 2 E 00 and E 00 satisfies all those conditions that are satisfied by E(G). In such a case, we set G :¼ H 00 . Let ði; jÞ ¼ ðu þ 1; u þ 3Þ. If e uþ3 2 EðGÞ but e uþ1 = 2 EðGÞ, then it holds that e uþ2 2 EðGÞ since ðu þ 2; u þ 3Þ 2 P 0 prc and e u 2 EðGÞ since we have shown that ðu; u þ 2Þ 2 P 0 0 ðdÞ prc . Let v x and v y denote the parents of v uþ1 and v uþ3 in T, respectively. Then sðv y ; KÞ ! 2 and sðv x ; KÞ ¼ 1 would hold, which implies that sðv x ; KÞ < sðv y ; KÞ. Notice that v x and v y have the same parent in T by the choice of u. This and sðv x ; KÞ < sðv y ; KÞ contradicts the fact that K is an s-left heavy tree. This implies that e uþ1 2 EðGÞ if e uþ3 2 EðGÞ holds.
Let v x and v y denote the parents of v uþ1 and v uþ4 in T, respectively. Then sðv x ; KÞ ¼ 1 and sðv y ; KÞ ! 1 would hold, which implies that sðv y ; KÞ ! sðv x ; KÞ. Then we can get another s-proper form H 00 ¼ ðV 00 ; E 00 Þ by switching the two subtrees rooted at v x and v y in G. Clearly by the construction of H 00 , it holds that e uþ1 2 E 00 if e uþ4 2 E 00 and E 00 satisfies all those conditions that are satisfied by E(G). In such a case, we set G :¼ H 00 . Let ði; jÞ ¼ ðu þ 2; u þ 5Þ. If e uþ5 2 EðGÞ but e uþ2 = 2EðGÞ, then it holds that e uþ4 ; e uþ3 2 EðGÞ since ðu þ 4 ; u þ 5Þ; ðu þ 3; u þ 4Þ 2 P 0 prc and e uþ1 ; e u 2 EðGÞ since we have shown that ðu þ 1; u þ 4Þ; ðu; u þ 3Þ 2 P 00 ðdÞ prc .
Let v x and v y denote the parents of v uþ1 and v uþ4 in T, respectively. Then sðv y ; KÞ ! 2 and sðv x ; KÞ ¼ 1 would hold, which implies that sðv x ; KÞ < sðv y ; KÞ. Notice that v x and v y have the same parent in T by the choice of u. This and sðv x ; KÞ < sðv y ; KÞ contradicts the fact that K is an s-left heavy tree. This implies that e uþ2 2 EðGÞ if e uþ5 2 EðGÞ holds. Hence, in each of the cases P 00 ðdÞ prc satisfies condition (c-1)(b). Next we prove that P satisfies condition (c-2). Let i 1 ¼ i, i 2 þ 1 be the index of the first child of v iþ1 and i 2 ; i 3 ; . . . ; i k ¼ j be consecutive integers. Then the sequence ði 1 ; i 2 Þ; ði 2 ; i 3 Þ; . . . ; ði kÀ1 ; i k Þ are ordered index pairs in P such that i 1 ¼ i and i k ¼ j. Hence P is a proper set, which completes the proof.

An algorithm to calculate a proper set
In this section, we give an algorithm to compute a proper set based on Theorem 3. In Algorithm GENPPRC(D , d), the variables P 1 ; P 2 ; P 3 ; P 4 and P 5 store the sets defined in Section 4.2, P 0 prc , P 00 ð3Þ 2 , P 00 ð3Þ 4 , P 00 ð4Þ 2 , and P 00 ð4Þ 3 , respectively. For an edge e 2 EðT y ½D;d Þ, the variable level[e] stores the level of e.

MILPs for representing acyclic chemical graphs
In this section, we propose a new MILP formulation Mðx; g ; C 2 Þ as used in Step 4 of the method introduced in Section 3. For our purpose, we consider acyclic chemical graphs where each vertex has degree at most 3 or the maximum degree is 4. We formulate the MILP Mðx; g; C 2 Þ so that the underlying graph H is an induced subgraph of the skeleton tree T y ½dia Ã ;d max introduced in Section 4, and moreover, it holds that fv 1 ; v 2 ; . . . ; v dia Ã þ1 g V. We remark that in order to reduce the number of graph-isomorphic solutions to this MILP, for a skeleton tree, we make use of precedence constraints based on the proper set P prc as formalized in Section 4.2.
For a technical reason, we introduce a dummy chemical element , and denote by Γ 0 the set of dummy tuples ð; ; kÞ ,ð; a; kÞ andða; ; kÞ (a 2 Λ,k 2 ½0; 3 ). To represent elements in an MILP, we encode these elements a into some integers denoted by ½a, where we assume that ½ ¼ 0. For simplicity, we also denote n Ã by n and n max ð dia Ã ; d max Þ by n max . Our new formulation is given as follows.

Experimental results
The main aim of our experiments is to compare implementations of the MILP formulations proposed in Section 5 and the one due to Azam et al. [4] in Step 4 of the method for the inverse QSAR/QSPR [4]. The results of this main experiment are presented in Section 6.2, after giving our findings on the construction of regression functions by training ANNs in Section 6.1.
We executed the experiments on a PC with Intel Core i5 CPU running at 1.6 GHz and 8 GB of RAM, under the Mac OS 10.14.4 operating system. For a study case, we selected three chemical properties: heat of atomization (HA), octanol/ water partition coefficient (KOW) and heat of combustion (HC).

Experiments on Phase 1
In this section we present our experiments conducted on Phase 1, that is, the forward phase of the framework for the inverse QSAR/QSPR.

In
Step 1, we collected a dataset D of acyclic chemical graphs for HA made available by Roy and Saha [19]. For the properties KOW and HC, we collected data available from the hazardous substances data bank (HSDB) from PubChem. We choose label set Λ to be such that each element in Λ appears as a chemical element in at least one of the chemical graphs in the dataset D; and similarly, we choose the set Γ to be the set of all tuples ¼ ða; b; kÞ 2 Λ Â Λ Â ½1; 3 appearing in at least one of the chemical graphs indataset D. In Step 2, we set a graph class G to be the set of all acyclic chemical graphs that are possible to be constructed with elements from the sets Λ and Γ chosen in Step 1. In Step 3, we used the MLPRegressor tool from the Python package scikit-learn 2 (version 0.24.2) to construct ANNs N , and we set ReLU as the activation function of neurons. We tested several different architectures of ANNs for each chemical property. With simple preliminary experiments we identified promising ranges for hyperparameter values, and then performed a grid search over the following hyperparameter values: number of hidden layers in f1; 2; 3; 4; 5g, number of nodes per hidden layer in f7; 10; 15; 30; 50g, learning rate η in f0:00025; 0:0005; 0:001; 0:002; 0:004g, and regularization term in f10 À5 ; 2 Â 10 À5 ; 4 Â 10 À5 ; 8 Â10 À5 ; 1:6 Â 10 À4 g.
The maximum number of training epochs was set to 10 8 due to the moderately small number of training data. Since our initial experiments derived satisfactory results, other model parameters were used with their default values provided by scikitlearn. We used 5-fold cross validation to evaluate the performance of the trained ANNs, where a given dataset D is randomly partitioned into five subsets D i , i 2 ½1; 5. The evaluation is given in terms of the coefficient of determination R 2 , which for a collection ða 1 ; a 2 ; . . . ; a p Þ of p real values with average b a ¼ 1 p P p i¼1 a i that are associated with a collection ð y 1 ; y 2 ; . . . ; y p Þ of values predicted by a regression model, gives a model error as i¼1 ða i Àb aÞ 2 : Table 1 shows the size and range of values in the datasets that we used for each chemical property, as well as results on Phase 1. The notation and symbols used in Table 1 are as follows: : the tested chemical property, one of HA, KOW, and HC; |D|: the number of data points in the collected dataset D for a chemical property ; Λ: the set of all chemical elements that appear in at least one of the chemical graphs in the dataset D; n; n: the minimum and maximum number of vertices in a chemical graph G ¼ ðH; ; Þ over the dataset D; a; a: the minimum and maximum values of a(G) over the dataset D; K: the number of descriptors in f(G) for a chemical property , where K ¼ jΛj þ jΓ j þ 12 for our feature vector f(G); Arch.: the size of hidden layers of ANNs, where h10iÂ1 (resp., h30iÂ2 ) means an architecture (K, 10, 1) with an input layer with K nodes, one hidden layer with 10 nodes (resp., two hidden layers, each with 30 nodes), and an output layer with a singe node; η: the learning rate chosen for training the ANN; : the regularization term used for training the ANN; Ltime: the average time, in seconds (s), to construct ANNs for each trial; Test R 2 : the coefficient of determination averaged over the five test sets for the corresponding combination of hyperparameter values.
Note that the parameters given in Table 1 for Step 3 are those that achieved the highest average coefficient of determination over the test set in the cross-validation trials. As can be observed in Table 1, we cannot draw a conclusion as to whether a certain hyperparameter of an ANN has a predictable influence on the performance of the ANN model. For different chemical properties, and in fact, for the case of property HC, even for a single property observed over a different dataset of chemical compounds, noticeably different hyperparameter combinations achieve the best performance, i.e., the highest coefficient of determination over an unobserved test set.

Experiments on Phase 2
In this section we delve into our main interest with this study, namely the inverse phase of the combined framework for the inverse QSAR/QSPR [4,5], and in particular, Step 4, inverting a trained ANN by solving an MILP formulation.
We call the MILP formulation due to Azam et al. [4] based on an adjacency matrix the AM method, and the MILP formulation based on skeleton tree presented in Section 5 the ST method. We use the CPLEX (ILOG CPLEX version 12.9) [9] solver to solve MILP instances formulated in the framework. We performed experiments for each of the properties HA, KOW, and HC as follows. For several pairs ðd max ; dia Ã Þ of integers d max 2 f3; 4g and dia Ã 2 ½6; 13, choose each integer n Ã 2 ½14; n max ðdia Ã , d max Þ and six target values y Ã i , i 2 ½1; 6. We attempted to solve the six MILP instances by using the AM and ST methods. We started by setting n Ã ¼ 14, and then gradually increased n Ã up to n max ðdia Ã ; d max Þ. Whenever the running time while solving at least one of the six instances reached a time limit set to be 300 seconds, we stopped further attempts to solve the MILP instances with each of the two methods.
We present our findings in Tables 2 and 3, as well as Figure 4, where we summarize the results from our experiments, in particular, the computation time of the AM and ST methods in Step 4 for property HA. The notation used is as follows: For property HA, additionally, we executed the AM method for instances with n Ã ¼ 36, n Ã ¼ 38, and n Ã ¼ 40, dia Ã ¼ 6, and d max ¼ 4 without imposing a time limit. The respective computation times were 21,962 seconds for n Ã ¼ 36, 124,903 seconds for n Ã ¼ 38 , and 148,672 seconds for n Ã ¼ 40 . Meanwhile, the computation time for the ST method was 2.133 seconds for the instances with n Ã ¼ 38, which means that for this range of instance size, the ST method was 58,557 times faster than the AM method.
We give a short comment summary on the results of the experiments with instances of KOW and HC: The ST method outperformed the AM method for the cases of ð ¼ KOW; jΛ j ¼ 3;  From the experimental results, we observe that the ST method completed Step 4 in shorter time than the AM method did when the diameter of graphs was up to around 11 for d max ¼ 3, and 8 for d max ¼ 4. In particular, it can be seen from Tables 2 and 3, as well as Fig. 6, that under such conditions, the ST method could handle chemical graphs with number n Ã of non-hydrogen vertices up to 48 in reasonable CPU time, whereas the AM method could only handle chemical graphs with n Ã < 30. Therefore, the results of computational experiments suggest that the ST method can handle a much larger number of chemical graph than the AM method can. Finally, recall that chemical graphs with diameter up to 11 for d max ¼ 3 and 8 for d max ¼ 4 account for about 35 % and 18 %, respectively, out of all acyclic chemical graphs with 200 or fewer non-hydrogen atoms registered in the PubChem chemical database, and about 63 % and 40 % out of the acyclic chemical graphs with 200 or fewer non-hydrogen atoms with d max ¼ 3 and d max ¼ 4, respectively.

Concluding remarks
With this work, we presented a new MILP formulation for inferring acyclic chemical graphs. Our MILP formulation can be directly incorporated in the method for the inverse QSAR/QSPR proposed to Azam et al. [4]. One drawback of the formulation given by Azam et al. [4] is that in order to represent a tree on n vertices, subsets of vertex pairs over an n Â n adjacency matrix are used, requiring the same number of variables in the MILP formulation. With the aim to reduce the number of variables in the MILP formulation, we introduced the concept of skeleton trees, which are trees with the maximum number of vertices for fixed diameter and maximum degree. Then, in our method, a target tree is chosen as an induced subgraph of a skeleton tree. In this way, whenever  the target acyclic graphs have a limited diameter, we significantly reduce the number of variables used in our MILP formulation, and thereby also the time needed to solve it in practice when the number of chemical elements is relatively small. The results on some computational experiments confirm this, i.e., we observe that the proposed method is more efficient than the previously proposed method. Even though the MILP formulation presented in this paper targets the class G of acyclic chemical graphs, we note that a similar formulation can be applied to the acyclic part of any chemical graph, regardless of the number of cycles it has. Based on the idea of prescribing a tree that serves as a supergraph of a target acyclic chemical graph, Azam et al. [5] and Akutsu and Nagamochi [2] have developed methods for inferring chemical acyclic graphs with a larger diameter and cyclic chemical graphs with any cycle index, respectively, where the proposed method/systems are available at GitHub https://github.com/ku-dml/mol-infer.
As future work it would be interesting to explore a way of defining the graph topology of a desired chemical graph, i.e., generate target chemical graphs with a fixed scaffold [21]. Another line of research would be to explore different methods for constructing a regressive prediction function, for example convolutional ANNs, different types of multiple linear regression, or decision trees.
Funding This research was supported in part by the Japan Society for the Promotion of Science, Japan, under Grant #18H04113.

Declarations
Conflicts of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Fig. 6 The average computation time of the AM and ST methods for HA