1 Introduction

In the early stages of drug discovery, hit-to-lead optimization is conducted to identify lead compounds with sufficient affinity and desirable pharmacological properties from hit compounds that bind weakly to the target receptor [1]. The alchemical free energy perturbation (FEP) calculation [2,3,4] is a popular tool for lead optimization because it uses a perturbation-based approach to calculate, in principle exactly, the relative binding free energies of candidate compounds toward the target receptor from molecular dynamics simulations [5, 6].

Closed thermodynamic cycles are crucial for minimizing errors in relative binding free energy calculations [7,8,9]. The sum of the relative free energies along a thermodynamically closed path is theoretically zero, and the deviation from zero in a closed cycle is called the cycle closure error or hysteresis [7]. Cycle closure methods, which reduce errors based on closed cycles, can improve the prediction accuracy of perturbation calculations. Therefore, to guarantee the accuracy of FEP calculations, the FEP graph must include as many closed cycles as computational resources allow. Figure 1 illustrates an example FEP graph for the TYK2 target of the benchmark set by Wang et al. [2], generated with Lomap [7].

Fig. 1 Example FEP graphs on the Wang benchmark dataset (target: TYK2). Drawn using Lomap and the NetworkX library

Here, we introduce the types of FEP graphs and their generation methods. First, the most common FEP graph is the star graph, which connects all compounds to only one reference compound. No particular graph construction algorithm is required, and the perturbation calculation is only performed \(N-1\) times. However, the accuracy of the FEP calculation cannot be guaranteed because no single closed thermodynamic cycle exists. PyAutoFEP [10], an open-source tool for automating FEP calculations, provides an implementation that builds wheel-shaped and optimal graphs using ant colony optimization.
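As a toy check of this property, the following NetworkX snippet (with an arbitrary number of compounds \(N\)) confirms that a star graph over \(N\) compounds has \(N-1\) edges and contains no closed cycle, so cycle closure corrections cannot be applied:

```python
# Toy illustration: a star-shaped FEP graph has N-1 edges and no closed cycle.
import networkx as nx

N = 8
star = nx.star_graph(N - 1)          # node 0 plays the role of the reference compound
assert star.number_of_edges() == N - 1
assert nx.cycle_basis(star) == []    # no closed thermodynamic cycle exists
```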

DiffNet [8] provides a theoretical paradigm for optimizing the covariance matrix for errors. HiMap [9] also provides a statistically optimal design based on optimization of the covariance matrix, similar to DiffNet.

Lomap [7] is an automated planning algorithm for relative free energy calculations. Lomap computes scores between compound pairs based on several design criteria for proper relative free energy calculations, and FEP graphs are constructed based on these scores. In particular, Lomap constructs an FEP graph under the criterion that every molecule must be part of at least one closed cycle. However, Lomap is designed to generate FEP graphs for only a few dozen compounds, for which it must already consider hundreds of candidate FEP calculations.

Recently, active learning [11,12,13], a machine learning approach for planning experiments for lead optimization, has been applied to relative binding free energy calculations [14,15,16]. Active learning involves hundreds or thousands of extensive FEP calculations. However, conventional Lomap calculations incur a substantial computational cost for FEP graphs of several hundred compounds. Therefore, applying the Lomap graph directly to active learning cases requires significant effort. Conventional active learning methods with FEP only attempt to perform star-shaped FEP calculations without closed cycles [15, 16] or perform more detailed calculations for a limited number of compounds [14]. Therefore, simplifying the introduction of FEP cycles, even for hundreds of compounds, is imperative to support FEP calculations by active learning.

The main reason why Lomap calculations require a considerable amount of time is that the constraints are checked for every edge in the initial graph. Figure 2 shows profiling results for FEP graph generation with the Lomap algorithm on 500 nodes. A significant portion of the execution time of the build_graph function, shown in the first line of Fig. 2, was consumed by the check_constraints function, shown in the fourth line of the same figure.

Fig. 2 Profiling results (cProfile) of the baseline Lomap algorithm (Thompson TYK2 dataset, \(N=500\)). Each function calls the function below it. Drawn with snakeviz

In this paper, we propose an improved Lomap algorithm that introduces a chunk check process, in which edges are processed collectively as much as possible so that the number of constraint checks depends on the number of nodes rather than the number of edges. We also improved the calculation of graph diameters and bridges. The proposed method can construct FEP graphs several hundred times faster for hundreds of compounds, and the output FEP graph is identical to that of the original Lomap.

We published a preliminary version of this work in conference proceedings [17], with limited experiments and discussion. In this paper, we propose a new mechanism for streamlining the constraint checks and add it to the implementation to further speed up Lomap, and we significantly extend the results and discussion with accompanying experiments.

2 Methods

2.1 Lomap algorithm

Lomap is available as an open-source implementation using RDKit (https://github.com/OpenFreeEnergy/Lomap), and this study is based on and refines that implementation. In this section, we describe the algorithm used in the original Lomap. Lomap generates an undirected graph that determines the compound pairs for which relative binding free energy calculations will be conducted. To obtain valid results in FEP calculations, the target compound pairs must be sufficiently similar to each other. The FEP graph therefore uses the link score, an index based on the similarity between two compounds, to evaluate whether a compound pair is valid for FEP calculation. The link score is calculated based on the maximum common substructure (MCS) of a compound pair using the following equation:

$$\begin{aligned} S=\exp [-\beta (N_A+N_B-2N_{MCS})], \end{aligned}$$
(1)

where \(\beta\) is a parameter, and \(N_A\), \(N_B\), and \(N_{\rm MCS}\) are the numbers of heavy atoms in the two input compounds and in their MCS, respectively. The quantity \(N_A+N_B-2N_{\rm MCS}\) is the number of atom insertions and deletions required for the perturbation. The score is defined between 0 and 1, reaching its maximum of 1 when the two compounds are identical. The score also includes corrections, applied as products of other scores, that assess whether a pair is valid for the perturbation calculation, such as having the same net charge and preserving rings as much as possible. A crucial aspect of graph generation is that edges with low link scores should be removed except when they favorably impact a closed cycle. Generating an FEP graph with the Lomap algorithm requires computing the link scores among all the compounds. In this study, the score calculation for each compound pair was not a target of acceleration because it can be parallelized; it is also excluded from the execution time measurements because the scores are calculated in advance.
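As an illustration, the following minimal sketch computes the MCS-based part of the link score in Eq. (1) with RDKit's rdFMCS module; the value of \(\beta\) shown here and the omission of Lomap's rule-based corrections (net charge, ring preservation, and so on) are simplifications for illustration, so this is not a drop-in replacement for Lomap's scorer.

```python
# Minimal sketch of the MCS-based link score of Eq. (1) using RDKit.
# The beta value and the omitted rule-based corrections are illustrative only.
import math
from rdkit import Chem
from rdkit.Chem import rdFMCS

def link_score(smiles_a: str, smiles_b: str, beta: float = 0.1) -> float:
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    n_a, n_b = mol_a.GetNumHeavyAtoms(), mol_b.GetNumHeavyAtoms()
    n_mcs = rdFMCS.FindMCS([mol_a, mol_b]).numAtoms  # atoms in the maximum common substructure
    # S = exp[-beta (N_A + N_B - 2 N_MCS)]
    return math.exp(-beta * (n_a + n_b - 2 * n_mcs))

print(link_score("Cc1ccccc1", "Cc1ccccc1O"))  # a similar pair scores close to 1
```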

Lomap’s FEP graph generation is explained as follows:

  1. Compute a scoring matrix for all compound-to-compound pairs.

  2. Construct the initial graph from only the edges of pairs whose link scores are above the threshold. The initial graph may be partitioned into multiple subgraphs.

  3. Execute the edge removal process for each connected component of the initial graph. First, sort the edges in the subgraph in descending order of link score.

  4. Begin with the edge having the lowest link score and assess whether the constraints are still met upon its removal. If the constraints are satisfied, remove the edge; otherwise, leave it in place. The four constraints are as follows (shown in Fig. 3):

     (a) Connectivity constraint: the subgraph remains connected.

     (b) Cycle constraint: the number of nodes not included in any cycle does not change. This can be determined by checking that the number of bridges in the subgraph does not change.

     (c) Diameter constraint: the diameter of the subgraph is less than or equal to the threshold MAXDIST.

     (d) Distance constraint (if necessary): the distance from the known active ligand is less than or equal to the threshold.

  5. Connect the connected components to each other.

Fig. 3 Constraints for Lomap edge deletion decisions. a Removing an edge leaves node 4 disconnected. b Removing an edge means node 4 no longer belongs to any cycle. c Removing an edge makes the distance from node 4 to node 3 equal to 2, so the constraint is not satisfied (when MAXDIST = 1). d Removing an edge makes the distance from node 2 (the known active ligand) to node 4 equal to 2, so the constraint is not satisfied (for threshold = 1)

Here, Process (4) requires the most time when the graph is large, because it checks every edge in the subgraph to determine whether the edge can be removed, and each check recomputes the graph diameter and the bridges. If the number of nodes in the graph is N and the number of edges is M, the computational complexity of the diameter calculation is O(MN) and that of the bridge calculation is \(O(M+N)\). Because these checks are repeated for each of the M edges, and the edge set changes each time the graph is updated, the overall complexity is at least \(O(M^2N)\). The most significant factor in this cost is that every edge is checked individually to determine whether it can be removed. Moreover, the Lomap algorithm removes edges in order from the worst score, and this order must be preserved, so the graph-generation phase cannot be computed in parallel.
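For reference, the following simplified Python sketch mirrors this per-edge loop for a single connected subgraph using NetworkX; the distance constraint, the per-subgraph handling, and other details of the actual Lomap implementation are omitted, and the "score" edge attribute and the MAXDIST value are illustrative assumptions.

```python
# Simplified sketch of Lomap's per-edge removal loop (step 4) on one connected
# subgraph. The "score" edge attribute and MAXDIST = 6 are illustrative assumptions.
import networkx as nx

MAXDIST = 6

def prune_edges(graph: nx.Graph) -> nx.Graph:
    n_bridges = len(list(nx.bridges(graph)))  # bridge count of the initial graph
    # Process edges from the worst (lowest) link score upward
    for u, v, score in sorted(graph.edges(data="score"), key=lambda e: e[2]):
        graph.remove_edge(u, v)
        ok = (nx.is_connected(graph)                          # connectivity constraint
              and len(list(nx.bridges(graph))) == n_bridges   # cycle constraint
              and nx.diameter(graph) <= MAXDIST)              # diameter constraint
        if not ok:
            graph.add_edge(u, v, score=score)                 # restore the edge
    return graph
```

Each iteration recomputes connectivity, the bridges, and the diameter, which is the source of the \(O(M^2N)\) behavior discussed above.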

2.2 Chunk check process

In this section, we propose FastLomap, which introduces an algorithm called the chunk check process to streamline the Lomap algorithm. Because the number of edges in the initial graph grows with the square of the number of nodes when the link score is not pruned with a threshold, the fraction of edges that cannot be removed scales roughly with the inverse of the number of nodes. In other words, the larger the number of nodes, the greater the proportion of edges that can be removed. Edges that do not satisfy the constraints are unevenly distributed toward the end of the checking process; Figure 4 illustrates this tendency during edge removal. This is because numerous redundant edges exist at the outset of the process, making it easy to satisfy the constraints when edges are removed, whereas as the procedure progresses, the conditions become more stringent and edge removal becomes more difficult. Therefore, the regions with few unremovable edges should be removed in as few checks as possible, while edges in regions with a concentration of unremovable edges need to be examined individually.

Fig. 4 Occurrence of unremovable edges and their cumulative sum across constraint checks (Thompson TYK2 dataset, 500 nodes). Among all possible edge combinations, the 19,066 edges with a link score of 0.4 or higher are checked, leaving 740 edges in the resulting graph

Given these observations, we propose a chunk check process. Figure 5 provides an overview of the chunk check process.

Fig. 5 Overview of chunk check processing

First, let the chunk scale s be a parameter related to the size of the chunk, let the chunk level k be the depth of the layer of the chunk process, and let \(C_{k}\) be the chunk at chunk level k. In addition, for the number of edges M, \(k_{\max}=\lfloor \log _{s}M \rfloor\) is the maximum chunk level, indicating the number of times that chunk processing can be executed recursively. The chunk size at the first level \(k=0\) is \(S_0=s^{k_{\max}}\). For example, when \(M=5100\) and \(s=10\), \(S_0=s^{k_{\max}}=10^3\). The algorithmic procedure for the chunk check process is as follows:

  1. Remove the edges, sorted in order of decreasing link score, in chunks of \(S_0\) edges at a time, checking the constraints as in step (2).

  2. Constraint check. If the constraints are not satisfied (i.e., the chunk contains some unremovable edges), the check fails: all edges in the chunk are restored and the process moves to step (3). The check succeeds if the constraints are still satisfied after all \(S_0\) edges are removed. If the constraint check succeeds for all edges, return to step (1).

  3. With \(S_{k}=\max (\lfloor S_{k-1}/s\rfloor , 1)\), check the constraints for chunks \(C_k\) of \(S_{k}\) edges within chunk \(C_{k-1}\).

  4. Once the unremovable edges have been identified, check the constraints for the remaining edges of chunk \(C_{k-1}\). If the constraints are satisfied, the chunk process for chunk \(C_{k-1}\) is complete; return to step (2). If not, continue with step (3) until the chunk check of all edges in the chunk succeeds.

The chunk check process allows whole large chunks to be removed in a single check in areas with few unremovable edges, and when the chunk scale is large, unremovable edges can be identified quickly in areas where they are concentrated. A simplified code sketch of this process is shown below.
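The following recursive sketch captures the core of the chunk check process, assuming a NetworkX graph, a list of (u, v, attributes) edge tuples sorted worst-score first, and a check(graph) predicate that evaluates the Lomap constraints; the collective re-check of the remaining edges of a chunk after an unremovable edge has been isolated (step 4 above) is omitted for brevity.

```python
# Recursive sketch of the chunk check process. `edges` are (u, v, attrs) tuples
# sorted worst-score first; `check` evaluates the Lomap constraints. Simplified.
import networkx as nx

def chunk_remove(graph: nx.Graph, edges: list, size: int, scale: int, check) -> None:
    for i in range(0, len(edges), size):
        chunk = edges[i:i + size]
        graph.remove_edges_from(chunk)        # optimistically delete the whole chunk
        if check(graph):
            continue                          # every edge in the chunk was removable
        graph.add_edges_from(chunk)           # restore and refine into smaller chunks
        if size == 1:
            continue                          # a single unremovable edge: keep it
        chunk_remove(graph, chunk, max(size // scale, 1), scale, check)
```

For \(M\) edges and chunk scale \(s\), the initial call uses \(S_0 = s^{\lfloor \log_s M \rfloor}\) (for example, \(S_0 = 1000\) for \(M = 5100\) and \(s = 10\)).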

2.3 Streamlining constraint checks

In FastLomap, we accelerated the diameter and cycle constraint checks, which had been bottlenecks. First, we explain the speedup of the diameter constraint. The original Lomap implementation used the NetworkX [18] eccentricity method; we changed it to compute the graph diameter using the bounding diameter algorithm [19, 20], which is available through the usebounds option of the NetworkX diameter function. The diameter calculation normally has O(MN) complexity because it computes all-pairs shortest paths (APSP). The bounding diameter algorithm instead uses the lower and upper boundary relationships between node eccentricity and diameter to determine the exact diameter within a computation time proportional to O(N). Worst cases such as complete graphs and cycle graphs need to be considered, but they are acceptable when constructing FEP graphs.
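The switch can be illustrated with NetworkX alone; both calls below return the same exact diameter for an arbitrary connected test graph (chosen here only for illustration), but the bounded version prunes most of the eccentricity computations.

```python
# Comparing the plain APSP-based diameter with the bounding-diameter algorithm
# (NetworkX `usebounds` option) on an arbitrary connected test graph.
import networkx as nx

G = nx.connected_watts_strogatz_graph(500, 6, 0.3, seed=0)

diam_apsp = max(nx.eccentricity(G).values())   # O(MN): eccentricity of every node
diam_bounded = nx.diameter(G, usebounds=True)  # prunes nodes via eccentricity bounds

assert diam_apsp == diam_bounded
```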

Next, we explain the speedup of the cycle constraint. The original Lomap maintains the cycle constraint by checking that the number of bridges does not change from the initial graph, using a bridge calculation based on the chain decomposition method [21] implemented in NetworkX [18]. We replaced this with Tarjan's bridge-finding algorithm because it is faster. Tarjan's method is an \(O(N+M)\) algorithm that identifies bridges during a single depth-first search by comparing each node's discovery order with the smallest reachable discovery index, called the lowlink. We also evaluate the effects of these constraint check changes through experiments.
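A compact sketch of lowlink-based bridge finding in the style of Tarjan's algorithm is shown below; only the bridge count is needed for the cycle constraint, and a production implementation would use an iterative DFS to avoid Python's recursion limit on large graphs.

```python
# Sketch of lowlink-based (Tarjan-style) bridge finding on an undirected graph.
import networkx as nx

def find_bridges(graph: nx.Graph) -> list:
    disc, low, bridges, timer = {}, {}, [], [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in graph[u]:
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:        # no back edge climbs above u: (u, v) is a bridge
                    bridges.append((u, v))
            elif v != parent:
                low[u] = min(low[u], disc[v])

    for node in graph:
        if node not in disc:
            dfs(node, None)
    return bridges

G = nx.barbell_graph(4, 1)  # two cliques joined by a path; the joining edges are bridges
assert set(map(frozenset, find_bridges(G))) == set(map(frozenset, nx.bridges(G)))
```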

2.4 Code availability

Our proposed improved version of the Lomap code (called FastLomap) is available at https://github.com/ohuelab/FastLomap under the same MIT license as the original Lomap.
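A minimal usage sketch is shown below, assuming FastLomap preserves the original Lomap Python API (DBMolecules, build_matrices, build_graph) as documented for Lomap; the ligand directory name is a placeholder.

```python
# Minimal usage sketch; the ligand directory is a placeholder.
import lomap  # FastLomap is installed as a drop-in replacement for Lomap

db = lomap.DBMolecules("ligands/", output=True)  # load the ligand structures
strict, loose = db.build_matrices()              # pairwise link score matrices
fep_graph = db.build_graph()                     # NetworkX graph with the FEP edges
```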

3 Experiments

3.1 Thompson TYK2 dataset

In this study, we conducted extensive FEP graph construction experiments with 100–1000 nodes. For this purpose, we used a congeneric series of TYK2 inhibitors based on the aminopyrimidine scaffold provided by Thompson et al. [22] and built the following two datasets.

(a) Thompson TYK2 dataset:

From the original dataset, we randomly selected 1000 compounds and constructed experimental subsets of 100 to 1000 nodes.

(b) Thompson TYK2\(_{0.7}\)  dataset:

From the compounds in the original dataset whose link score to the reference ligand (PDB: 4GIH [23]) is 0.7 or higher, experimental subsets of 100 to 1000 compounds were constructed in the same way. This dataset evaluates the effectiveness of the proposed method when the number of edges is extremely large.

Figure 6 illustrates the number of edges in the initial graph against the number of nodes for the Thompson TYK2 and Thompson TYK2\(_{0.7}\) datasets. The edges of the initial graph were pruned at a link score threshold of 0.4. As a result, the number of edges in the Thompson TYK2\(_{0.7}\) dataset, which consists of more similar compounds, was up to five times larger than that in the Thompson TYK2 dataset. Figure 7 shows the distribution of link scores for the two datasets; the peak of the link score distribution for the Thompson TYK2\(_{0.7}\) dataset is higher than that for the Thompson TYK2 dataset. The number of edges in the generated FEP graph for each number of nodes is shown in Fig. 8, and both datasets yield a similar number of edges. Note that the generated graphs are identical for the proposed method and the existing Lomap.

Fig. 6 Number of edges relative to the number of nodes of the initial graph in the Thompson TYK2 datasets

Fig. 7 Histogram of link scores for the Thompson TYK2 datasets

Fig. 8 Number of edges relative to the number of nodes of the generated graph in the Thompson TYK2 datasets

3.2 Wang dataset

The Wang dataset [2] is a benchmark widely used to evaluate relative binding free energy (RBFE) prediction performance and contains several dozen ligands for each of eight targets. Our proposed algorithm assumes that the number of compounds exceeds 100, so its benefit is small for such a dataset; however, we used the Wang dataset to demonstrate that the proposed method can also be applied to a general use case.

3.3 Setup for graph generation

First, we generated complex structures because none were provided for the Thompson TYK2 dataset. The complex structures were generated by superposition onto a reference ligand (PDB: 4GIH [23]) using the alignment tools (very permissive option) of Cresset Flare [24].

All link scores were computed using Lomap’s default 3D mode. Note that the link score calculation for all pairs of 1000 compounds takes 26.4 min in parallel processing on 28 CPUs.

We experimented with the original Lomap calculation as a baseline and with chunk scales \(s=2, 5, 10, 50\) for the proposed method. To generate the FEP graph, we used the default settings of Lomap, that is, a graph diameter of 6, no tree structure allowed and a link score cutoff of 0.4.

4 Results and discussion

4.1 Results of Thompson TYK2 dataset

Figures 9 and 10 show the execution time and the number of constraint checks for each number of nodes in the Thompson TYK2 and Thompson TYK2\(_{0.7}\) datasets, respectively. Here, all methods are compared using the fastest constraint-checking implementation. The baseline method did not complete the calculation within 24 h for \(N\ge 700\) in the Thompson TYK2\(_{0.7}\) dataset. First, we compare the baseline and the proposed methods. For the Thompson TYK2 dataset, the proposed method is up to about 20 times faster at \(N=1000\). For the Thompson TYK2\(_{0.7}\) dataset, FEP graph construction was about one hundred times faster at \(N=600\), and the baseline method could not construct an FEP graph at \(N\ge 700\). The Thompson TYK2\(_{0.7}\) dataset has many edges in the initial graph owing to the weak pruning effect; consequently, the baseline method, which performs as many constraint checks as there are edges, is unsuitable. Hence, the proposed method is effective for cases involving many edges because its number of constraint checks depends on the number of nodes.

Next, we evaluated the effect of the chunk scale. For both datasets, the execution time and the number of constraint checks were lower for \(s=5, 10\) than for \(s=2, 50\). These settings outperform \(s=2\) because a larger chunk scale reduces the maximum chunk level \(k_{\max}\) and identifies unremovable edges more promptly. Conversely, if the chunk scale is too large, such as \(s=50\), the chunk process becomes less effective in areas with few unremovable edges because a single chunk check failure leads to splitting into much finer chunks. Moreover, the number of constraint checks and the execution time do not correspond directly because deleting and restoring edges also consumes time: the number of constraint checks for \(s=50\) is lower than that for \(s=2\) in Fig. 10, but the execution time is comparable owing to the time spent removing and restoring edges.

Fig. 9 Execution time and number of constraint checks in the Thompson TYK2 dataset

Fig. 10 Execution time and number of constraint checks in the Thompson TYK2\(_{0.7}\) dataset

4.2 Results of Wang dataset

Figure 11 shows the execution time and the number of constraint checks for each number of nodes in the Wang dataset. Although the execution time increases with the number of nodes, for \(s=5, 10\) it remains slightly shorter than that of the baseline. Even for the baseline, the process took less than 3 s at most. Therefore, the proposed method has no negative impact on general use cases.

Fig. 11 Execution time and number of constraint checks in the Wang dataset

4.3 Results of streamlining constraint checks

We compared the following four variations of the constraint check implementation (chunk scale \(s=5\)):

Version 1:

Existing Lomap case with no changes to either cycle or diameter constraints.

Version 2:

The case with speedup of cycle constraints only.

Version 3:

The case with speedup of diameter constraints only.

Version 4:

The fastest case with both cycle and diameter constraints changed.

Figure 12 shows the execution time for each number of nodes in the Thompson TYK2 and Thompson TYK2\(_{0.7}\) datasets for the four variations. Figure 13 shows how many times faster each version is compared with Version 1 in the Thompson TYK2 dataset (number of nodes fixed at \(N=500\)). Version 4, which modified both constraint checks, is the fastest, more than 10 times faster than Version 1, which made no changes to the constraint checks. In addition, Version 3, which modified only the diameter constraint check, is faster than Version 2, which modified only the cycle constraint check. Therefore, the change to the diameter constraint check contributes the most to the speedup.

Fig. 12 Execution time for each variation of constraint checks in the Thompson TYK2 dataset

Fig. 13 Speedup factor for each variation of constraint checks in the Thompson TYK2 dataset (number of nodes fixed at \(N=500\))

Fig. 14 Speedup factor for Lomap and FastLomap and for changes in constraint checks in the Thompson TYK2 dataset (number of nodes fixed at \(N=500\))

Finally, Fig. 14 shows how many times faster the proposed FastLomap is compared with the conventional Lomap for Version 1 and Version 4 of the constraint checks (number of nodes fixed at \(N=500\)). FastLomap alone is 7.7 times faster than Lomap, and combined with the faster constraint checks, it is 100 times faster.

5 Conclusion

In this study, we proposed a chunk check process to improve the Lomap algorithm and make it scalable as the number of compounds increases. The proposed method has limited benefit for general use cases with a few dozen compounds; however, it is several dozen times faster when applied to several hundred compounds. In addition to the chunk check process, Lomap can be made about one hundred times faster by streamlining the cycle and diameter constraint checks. Because the output graph of the proposed method is identical to that of the original Lomap algorithm, it can immediately replace existing methods.

Although not addressed in this study, accelerating the link score calculation is an issue for future research, because link scores must be computed for all compound pairs and their cost grows quadratically with the number of compounds. The recently proposed HiMap [9] uses the DBSCAN method for unsupervised clustering of similar compounds; clustering using compound fingerprints and calculating maximum-common-substructure scores only within each cluster may reduce the overall computational cost. In addition, further discussion is required on whether Lomap's graphs are appropriate for large FEP designs; it is crucial to analyze the impact of cycle closure redundancy on the accuracy of perturbation maps in large FEP graphs.

Our proposed method can generate graphs up to several dozen times faster than the original Lomap for cases with fewer edges and up to several hundred times faster for those with numerous edges. The proposed method is applicable not only to the planning of large-scale FEP experiments but is also promising for active learning approaches to relative binding free energy calculations.