1 Introduction

The development of highly effective antibodies for therapeutic applications remains a key challenge in the field of immunology and drug discovery [1, 2]. While natural antibodies undergo affinity maturation in the immune response, their binding capacity often falls short of therapeutic requirements due to inherent limitations in affinity gains [3,4,5,6]. Therefore, it is imperative to develop strategies that can enhance the binding affinity of antibodies for successful therapeutic applications.

An antibody molecule consists of two functional regions: the constant region (Fc), which modulates effector functions, and the variable region (Fv), which is directly involved in antigen recognition. Within the variable region, loop structures known as the complementarity-determining region (CDR) play a pivotal role in recognizing and binding to antigens. Unlike the relatively conserved amino acid sequences found elsewhere in the antibody, the CDR has highly diverse sequences and conformations, which enables antibodies to achieve high antigen specificity.

The CDR itself is composed of six sub-regions, designated L1 to L3 on the light chain and H1 to H3 on the heavy chain. Their lengths vary from approximately 4 to 30 amino acid residues. Given this variability, experimentally optimizing sequences for maximal affinity and specificity is daunting, because it requires assessing an enormous number of sequence combinations. This approach is not only time-consuming but also costly.
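To make the scale of this sequence space concrete, the short calculation below counts the possible sequences for a loop of a given length, assuming that any of the 20 standard amino acids may occupy every position. For a 10-residue loop this already gives 20^10, or about 1.02 × 10^13, sequences.

```python
# Size of the sequence space for a redesigned CDR loop, assuming each of the
# L positions may hold any of the 20 standard amino acids: 20**L sequences.
NUM_AMINO_ACIDS = 20

def sequence_space_size(loop_length: int) -> int:
    """Number of distinct sequences for a loop of the given length."""
    return NUM_AMINO_ACIDS ** loop_length

for length in (4, 10, 30):
    print(f"{length:2d} residues: {sequence_space_size(length):.3e} sequences")
```

Even the shortest CDR loops therefore lie far beyond what can be screened exhaustively, which motivates computational design.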

Recent advancements in computational methods offer promising avenues for improving this cumbersome process. Within the broader field of protein engineering, machine learning-based methods, particularly those involving deep learning language models, have emerged as powerful tools for generating antibody sequences with desirable properties, often at a fraction of the traditional experimental costs [7,8,9]. However, these sequence-based approaches have limitations, especially when applied to highly variable regions like the CDR or when dealing with sequences lacking experimental data for training.

Significant strides have also been made in protein structure prediction, most notably through the development of AlphaFold2 [10] and RoseTTAFold [11]. These tools have revolutionized the field with their ability to predict protein tertiary structures with remarkable accuracy. Building on them, an approach termed “hallucination” has been developed that uses structure prediction for de novo protein design [12]. By incorporating constraints based on 3D structure, hallucination provides a powerful framework for generating proteins tailored to interact with specified ligands.

Motivated by these advancements, our study leverages AfDesign, a protein design method that incorporates AlphaFold2-based structure prediction within the hallucination framework [13]. AfDesign has already been used successfully to design peptides that bind to target proteins [14, 15]. We specifically target the CDRs of existing antibody-antigen complexes for amino acid sequence redesign. Our aim is to exploit the structure prediction capabilities of AlphaFold2 to create antibodies with enhanced binding affinities, even when experimental structural data are lacking. A preliminary version of this work, with limited experiments and discussion, was published in conference proceedings [16]. Here we add new results on applying DDG predictor [17] to structures predicted by AlphaFold2 and expand all sections to clarify the methods and content.

Fig. 1 Schematic of binder hallucination

2 Methods

2.1 Binder hallucination

In this study, we focused on the binder hallucination function of AfDesign and applied it to the antibody sequence design problem. A schematic of the methodology used in this study is shown in Fig. 1. Binder hallucination designs protein sequences using the following mechanism [14, 15]:

  1. The target protein and the sequence length of the binder protein are specified.

  2. The amino acid sequence of the binder is randomly generated at the specified length.

  3. AlphaFold2 predicts the 3D structure of the target-binder complex and outputs pLDDT and pAE, which represent the reliability of the prediction, together with the C\(_{\beta }\) distogram, a distance matrix between C\(_{\beta }\) atoms.

  4. The amino acid sequence is updated to minimize the loss, using dgram-cce, pLDDT, and pAE as loss functions (dgram-cce is the categorical cross-entropy of the distogram).
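The iterative cycle of steps 2 to 4 can be illustrated with a toy example. This is a hypothetical sketch, not AfDesign code: because AlphaFold2 cannot be called here, `toy_loss` is a made-up stand-in for the AlphaFold2-derived loss, and the update rule is a gradient-free greedy search over single point mutations. That search is a swapped-in technique chosen for brevity; AfDesign itself updates the sequence by gradient-based optimization.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_loss(binder_seq):
    """Hypothetical stand-in for the AlphaFold2-derived loss (steps 3-4).

    A real run would predict the target-binder complex with AlphaFold2 and
    combine dgram-cce, pLDDT, and pAE. Here we only measure mismatch against
    an arbitrary made-up motif so that the loop has something to minimize.
    """
    motif = "WYDGSY"  # hypothetical, for illustration only
    return sum(a != b for a, b in zip(binder_seq, motif)) / len(motif)

def hallucinate(length, steps, seed=0):
    rng = random.Random(seed)
    # Step 2: random initial binder sequence of the specified length.
    seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    loss = toy_loss(seq)
    for _ in range(steps):
        # Propose a single point mutation.
        pos = rng.randrange(length)
        candidate = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        # Steps 3-4: score the candidate; keep it if the loss does not increase.
        cand_loss = toy_loss(candidate)
        if cand_loss <= loss:
            seq, loss = candidate, cand_loss
    return seq, loss

best_seq, best_loss = hallucinate(length=6, steps=2000)
print(best_seq, best_loss)
```

The accept-if-not-worse rule makes the loss non-increasing over iterations, which mirrors the goal of step 4 without requiring gradients.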

Binder hallucination is a protein design methodology that concurrently “generates a three-dimensional structure” and “searches for a sequence” through iterative cycles, as detailed in steps 3 and 4. This aims to design a protein with a high likelihood of binding to a target protein.

AfDesign provides a stepwise optimization procedure, design_3stage(), for arriving at a one-hot encoded sequence, which is particularly useful for complex three-dimensional structures. The procedure first optimizes the sequence parameters before the softmax activation (logits), then the output after the softmax activation (soft), and finally the one-hot encoded array (hard). This staged strategy is suited to complex topologies and was used in this study [18].
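The three sequence representations that design_3stage() moves through can be illustrated for a single position as below. This is a simplified plain-Python sketch with hypothetical function names, not AfDesign's API: in AfDesign the representations are differentiable arrays and the transitions between stages are annealed, which this sketch does not model. The iteration counts are the 50-50-5 schedule used in this study.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def softmax(logits):
    """Numerically stable softmax over one position's 20 amino-acid scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(logits):
    """One-hot vector selecting the highest-scoring amino acid."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]

def stage_input(logits, stage):
    """Representation of one sequence position used at each design stage."""
    if stage == "logits":
        return list(logits)        # unnormalized scores (pre-softmax)
    if stage == "soft":
        return softmax(logits)     # probability distribution over amino acids
    if stage == "hard":
        return one_hot(logits)     # a single, discrete amino acid
    raise ValueError(f"unknown stage: {stage}")

# Stage order and iteration counts used in this study (logits-soft-hard = 50-50-5).
SCHEDULE = [("logits", 50), ("soft", 50), ("hard", 5)]

logits = [0.0] * len(AMINO_ACIDS)
logits[AMINO_ACIDS.index("W")] = 2.0
logits[AMINO_ACIDS.index("Y")] = 1.5
for stage, iters in SCHEDULE:
    rep = stage_input(logits, stage)
    print(stage, iters, "iterations; top residue:", AMINO_ACIDS[rep.index(max(rep))])
```

The soft stage keeps the search continuous while pushing probability mass toward one residue, so the final hard stage only has to refine an almost discrete sequence.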

In the binder hallucination module of AfDesign, existing proteins can be partially redesigned by inputting the amino acid sequence of the binder protein and specifying the regions to redesign in step 1. In the present work, we applied binder hallucination to the CDR amino acid sequences of antibodies, using antigens as the targets and antibodies as the binders. We used AfDesign with the aim of generating sequences with higher binding affinities than the original sequences.

2.2 Output values of AlphaFold2

2.2.1 pLDDT (predicted Local Distance Difference Test)

The pLDDT metric is an adaptation of the lDDT score introduced by Mariani et al. [19]. lDDT is commonly used to assess how well a predicted protein structure reproduces a reference structure, without requiring the two to be superimposed. For each residue, it considers the distances between that residue's atoms and the atoms of other residues that lie within a predefined inclusion radius (\(R_0\)) in the reference structure, and computes the fraction of these distances that the predicted structure reproduces within a set of tolerance thresholds. This quantifies the extent to which the predicted structure preserves the local geometry of the reference structure. pLDDT, described below, is AlphaFold2's estimate of this score.
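A simplified per-residue lDDT on C\(_{\alpha }\) coordinates can be sketched as follows. The sketch assumes one atom per residue, an inclusion radius \(R_0\) of 15 Å, and the tolerance thresholds of 0.5, 1, 2, and 4 Å used in the original lDDT definition; it omits refinements such as stereochemical checks and is not AlphaFold2's implementation.

```python
import math

def per_residue_lddt(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified per-residue lDDT on C-alpha coordinates.

    ref, model: lists of (x, y, z) tuples in the same residue order.
    For residue i, every other residue j closer than `cutoff` in the reference
    is a neighbour; the score is the fraction of reference distances d(i, j)
    reproduced in the model within each tolerance, averaged over tolerances.
    """
    n = len(ref)
    scores = []
    for i in range(n):
        neighbors = [j for j in range(n)
                     if j != i and math.dist(ref[i], ref[j]) < cutoff]
        if not neighbors:
            scores.append(0.0)  # no neighbours within R0: undefined, reported as 0 here
            continue
        frac = 0.0
        for t in thresholds:
            kept = sum(abs(math.dist(ref[i], ref[j]) - math.dist(model[i], model[j])) < t
                       for j in neighbors)
            frac += kept / len(neighbors)
        scores.append(frac / len(thresholds))
    return scores

ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]  # last residue displaced
print(per_residue_lddt(ref, model))  # [0.875, 0.75, 0.625]
```

Because only interatomic distances are compared, a rigid translation or rotation of the whole model leaves the score unchanged; multiplying by 100 gives the 0 to 100 scale on which pLDDT is reported.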

AlphaFold2 is trained to predict lDDT-C\(_{\alpha }\), a variant of lDDT that considers only the C\(_{\alpha }\) atom of each residue; the predicted value, pLDDT, is therefore available even when the correct structure is unknown. pLDDT is reported for each residue on a scale from 0 to 100. Regions with low pLDDT values often correspond to intrinsically disordered regions that lack a specific structure [20]. In AfDesign, pLDDT values are normalized to a scale between 0 and 1.
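The normalization, and one plausible way of turning pLDDT into a quantity to minimize, can be sketched as below. The loss form 1 − mean(pLDDT) over the designed residues is an assumption made for illustration rather than a statement of AfDesign's exact implementation, and the pLDDT values are made up.

```python
def normalized_plddt(plddt_0_100):
    """Rescale per-residue pLDDT from AlphaFold2's 0-100 range to 0-1."""
    return [p / 100.0 for p in plddt_0_100]

def plddt_loss(plddt_0_100, positions):
    """Hypothetical loss: 1 - mean normalized pLDDT over the given residue indices.

    Lower is better; 0 would mean full confidence at every listed residue.
    """
    scaled = normalized_plddt(plddt_0_100)
    return 1.0 - sum(scaled[i] for i in positions) / len(positions)

plddt = [92.0, 88.5, 45.0, 60.0]  # made-up per-residue pLDDT values
print(plddt_loss(plddt, positions=[2, 3]))  # low confidence at these residues gives a larger loss
```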

2.2.2 pAE (predicted Aligned Error)

pAE is a matrix whose (i, j) entry is the expected positional error, in Å, of the C\(_{\alpha }\) atom of residue j when the structure predicted by AlphaFold2 is aligned to the experimentally determined structure on the backbone of residue i. When no ground-truth structure is available, a large pAE between residues i and j indicates low confidence in their predicted relative position [21].
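For a target-binder complex, the off-diagonal blocks of the pAE matrix describe how confidently the binder is placed relative to the target. The hypothetical helper below averages those two blocks; it assumes that the target residues come first in the matrix and uses a toy 3 × 3 matrix rather than real AlphaFold2 output.

```python
def mean_interface_pae(pae, target_len):
    """Mean pAE between target residues (first `target_len`) and binder residues.

    pae[i][j]: expected position error of residue j after aligning on residue i.
    Both off-diagonal blocks are averaged because pAE is not symmetric.
    """
    n = len(pae)
    target = range(target_len)
    binder = range(target_len, n)
    values = [pae[i][j] for i in target for j in binder]
    values += [pae[i][j] for i in binder for j in target]
    return sum(values) / len(values)

# Toy matrix: residues 0-1 are the target, residue 2 is the binder.
pae = [[0, 1, 5],
       [1, 0, 7],
       [3, 9, 0]]
print(mean_interface_pae(pae, target_len=2))  # 6.0
```

A lower interface pAE means AlphaFold2 is more certain about where the binder sits on the target, which is why pAE is a natural term in a binder design loss.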

2.2.3 C\(_{\beta }\) distogram

In AfDesign, the distogram predicted by AlphaFold2, which gives for each pair of residues a probability distribution over binned C\(_{\beta }\)–C\(_{\beta }\) distances, can be used. Using dgram-cce, the categorical cross-entropy of this distogram, as a loss function favors designs for which AlphaFold2 predicts complex formation with higher confidence [22].
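A categorical cross-entropy over distogram bins can be sketched as follows. The sketch assumes that each residue pair has a reference C\(_{\beta }\) distance that is converted into a one-hot target bin, and it uses a toy four-bin distogram (AlphaFold2 itself predicts 64 bins). The function names and bin edges are hypothetical and do not reproduce AfDesign's code.

```python
import math

def bin_index(distance, edges):
    """Index of the distance bin; `edges` are ascending bin boundaries in angstroms."""
    for k, edge in enumerate(edges):
        if distance < edge:
            return k
    return len(edges)  # the last bin collects all distances beyond the final edge

def dgram_cce(pred_probs, ref_distances, edges):
    """Mean categorical cross-entropy over residue pairs.

    pred_probs[(i, j)]: predicted probabilities over len(edges) + 1 bins.
    ref_distances[(i, j)]: reference C-beta distance used as the one-hot target.
    """
    total = 0.0
    for pair, dist in ref_distances.items():
        k = bin_index(dist, edges)
        total += -math.log(max(pred_probs[pair][k], 1e-12))  # clamp avoids log(0)
    return total / len(ref_distances)

edges = [4.0, 8.0, 12.0]                 # toy 4-bin distogram
pred = {(0, 1): [0.1, 0.7, 0.1, 0.1]}    # predicted bin probabilities for one pair
ref = {(0, 1): 6.2}                      # reference distance falls in bin 1
print(dgram_cce(pred, ref, edges))       # equals -log(0.7)
```

The loss is small when AlphaFold2 puts most probability on the reference bin, so minimizing it rewards sharp, confident distance predictions.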

2.3 Affinity calculation

In this study, we utilized the change in binding free energy, \(\Delta \Delta G\), as an indicator of antigen-antibody interaction strength. Given the experimental challenges in measuring \(\Delta \Delta G\), we relied on computational predictions for this value. Specifically, we employed DDG predictor, a deep learning-based tool for predicting \(\Delta \Delta G\) values [17, 23].

3 Experimental procedure

3.1 Performance evaluation of DDG predictor

In this study, we employed the deep learning-based DDG predictor to evaluate the binding affinity of the generated antibody-antigen complexes. Because the mutant structures are predicted by AlphaFold2 rather than determined experimentally, we first checked whether the \(\Delta \Delta G\) values output by DDG predictor for AlphaFold2-predicted structures correlate with experimental values.

In this preliminary experiment, we used the AB-Bind database [24], which collects protein structures together with experimental data on changes in binding affinity caused by mutations. From AB-Bind, we extracted 308 mutations across 11 antibody-antigen complexes.

First, we input the amino acid sequences carrying these mutations into AlphaFold2 to predict their 3D structures. Then, we used DDG predictor to predict the \(\Delta \Delta G\) between each predicted mutant structure and the corresponding original structure. Finally, we examined the correlation between the predicted values and the experimental data.
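The final correlation step can be sketched with a plain Pearson coefficient. The values below are invented for illustration and are not AB-Bind data; in practice, a library routine such as scipy.stats.pearsonr returns the coefficient together with a p-value like those reported in Sect. 4.1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ddG values (kcal/mol), not data from AB-Bind.
experimental = [0.5, 1.2, -0.3, 2.0, 0.8]
predicted = [0.4, 0.9, 0.1, 1.5, 1.0]
print(round(pearson_r(experimental, predicted), 3))
```

Computing the coefficient separately on subsets (for example single versus multiple mutations) only requires filtering the paired lists before the call.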

3.2 Determination of loss weights

In the binder hallucination function of AfDesign, the dgram-cce, pLDDT, and pAE described in Sect. 2.2 can be used as loss functions, and AfDesign allows the user to set the weight of each loss. In this experiment, we varied the relative weights of dgram-cce, pLDDT, and pAE losses to redesign the antibody sequences under different conditions and investigated the appropriate weights of these losses. A schematic diagram of how the antibody sequences were generated is shown in Fig. 2.

Fig. 2 Schematic diagram of how antibody sequences are generated using AfDesign and DDG predictor

3.2.1 Protein design tools

We used AfDesign version 1.0.8, a protein design tool available from its GitHub repository [13].

3.2.2 Parameters

The following parameters were used in the experiment. The number of repetitions is the number of iterations of the logits, soft, and hard stages in design_3stage(). In AfDesign, the number of times AlphaFold2 iteratively refines the structure through recycling is specified by num_recycles.

  • Design target: PDB 1VFB

  • Number of repetitions: logits-soft-hard = 50-50-5

  • num_recycles = 0

  • Learning rate: 0.01

  • Optimization algorithm: Adam [25]
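For reference, the Adam update that drives the optimization can be written out as below. This is the textbook algorithm [25] applied to a toy quadratic objective rather than to AfDesign's sequence parameters; the learning rate of 0.01 matches the setting above, whereas beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the usual defaults and are assumed, not taken from AfDesign.

```python
import math

def adam_step(params, grads, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update elementwise and return the new parameter list."""
    state["t"] += 1
    t = state["t"]
    updated = []
    for k, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient.
        state["m"][k] = beta1 * state["m"][k] + (1 - beta1) * g
        state["v"][k] = beta2 * state["v"][k] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized moment estimates.
        m_hat = state["m"][k] / (1 - beta1 ** t)
        v_hat = state["v"][k] / (1 - beta2 ** t)
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated

# Toy objective f(x) = sum((x_i - 1)^2) with gradient 2 * (x_i - 1).
params = [0.0, 3.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
for _ in range(1000):
    grads = [2.0 * (p - 1.0) for p in params]
    params = adam_step(params, grads, state)
print(params)  # both values approach the minimum at 1.0
```

Because Adam rescales each update by a running estimate of the gradient magnitude, the early steps are roughly the size of the learning rate regardless of how large the raw gradients are.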

Ten samples were generated for each of the six CDRs (L1–L3 and H1–H3) under each of six conditions (A–F). For each of the resulting 360 samples, \(\Delta \Delta G\) was predicted with DDG predictor. CDRs were determined according to Chothia's definition [26]. The coordinates of atoms outside the target region were fixed, and AlphaFold2 structure prediction was performed only for the target amino acids.
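Selecting the redesign range requires mapping each CDR onto residue numbers. The helper below is a hypothetical sketch that assumes the antibody chains have already been renumbered in the Chothia scheme (for example with an antibody numbering tool) and uses the Chothia CDR boundaries as they are commonly tabulated; the boundaries should be checked against the definition in [26] before use.

```python
# Chothia CDR boundaries on Chothia-numbered residues (inclusive ranges).
# These follow commonly tabulated values; verify them for your own pipeline.
CHOTHIA_CDRS = {
    "L1": (24, 34), "L2": (50, 56), "L3": (89, 97),
    "H1": (26, 32), "H2": (52, 56), "H3": (95, 102),
}

def cdr_sequence(numbered_chain, cdr):
    """Extract one CDR from a Chothia-numbered light or heavy chain.

    numbered_chain: list of (number, insertion_code, amino_acid) tuples for the
    chain carrying the CDR (light chain for L1-L3, heavy chain for H1-H3).
    Insertion codes (e.g. 31A, 100B) are kept because only the number is compared.
    """
    start, end = CHOTHIA_CDRS[cdr]
    return "".join(aa for number, _ins, aa in numbered_chain if start <= number <= end)

# Hypothetical fragment of a Chothia-numbered heavy chain.
heavy = [(25, "", "S"), (26, "", "G"), (27, "", "F"), (31, "", "S"),
         (31, "A", "Y"), (32, "", "A"), (33, "", "M")]
print(cdr_sequence(heavy, "H1"))  # GFSYA
```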

  • Case A: dgram-cce=1.0

  • Case B: dgram-cce=1.0, pLDDT=0.2, pAE=0.2

  • Case C: dgram-cce=1.0, pLDDT=0.4, pAE=0.4

  • Case D: dgram-cce=1.0, pLDDT=0.6, pAE=0.6

  • Case E: dgram-cce=1.0, pLDDT=0.8, pAE=0.8

  • Case F: dgram-cce=1.0, pLDDT=1.0, pAE=1.0
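Assuming that AfDesign combines the individual losses as a plain weighted sum (any internal normalization is not described here), the six conditions differ only in how much pLDDT and pAE contribute relative to dgram-cce. The hypothetical sketch below encodes Cases A–F, treats the unlisted pLDDT and pAE weights in Case A as zero, and evaluates them on made-up loss values.

```python
# Loss weights for Cases A-F: dgram-cce fixed at 1.0, pLDDT = pAE = w.
CASES = {
    name: {"dgram_cce": 1.0, "plddt": w, "pae": w}
    for name, w in zip("ABCDEF", (0.0, 0.2, 0.4, 0.6, 0.8, 1.0))
}

def total_loss(losses, weights):
    """Weighted sum of the individual loss terms (lower is better)."""
    return sum(weights[term] * value for term, value in losses.items())

losses = {"dgram_cce": 2.1, "plddt": 0.35, "pae": 0.40}  # hypothetical values
for name, weights in CASES.items():
    print(f"Case {name}: total loss = {total_loss(losses, weights):.3f}")
```

Under this assumption, raising the pLDDT and pAE weights only changes the balance of the gradient signal; if those terms already move together with dgram-cce, the designs need not change much.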

3.2.3 Affinity

Using the \(\Delta \Delta G\) values predicted by DDG predictor, we adopted two evaluation metrics: the average \(\Delta \Delta G\) and IMP (IMProved Percentage), the percentage of generated sequences with higher binding affinity than the original amino acid sequence.
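Both metrics can be computed from a list of predicted \(\Delta \Delta G\) values as sketched below. The sketch assumes the convention \(\Delta \Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{original}}\), so that a negative value marks a sequence that binds more strongly than the original; if a predictor reports the opposite sign, the comparison must be flipped. The sample values are hypothetical.

```python
def summarize_ddg(ddg_values):
    """Return (mean ddG, IMP in percent) for a list of predicted ddG values.

    Assumes ddG = dG(mutant) - dG(original), so ddG < 0 means improved binding.
    """
    mean_ddg = sum(ddg_values) / len(ddg_values)
    improved = sum(1 for v in ddg_values if v < 0)
    return mean_ddg, 100.0 * improved / len(ddg_values)

# Ten hypothetical predicted ddG values (kcal/mol) for one CDR and condition.
samples = [-0.8, 0.3, -0.1, 1.2, -0.4, 0.0, 0.6, -1.1, 0.2, 0.9]
mean_ddg, imp = summarize_ddg(samples)
print(f"mean ddG = {mean_ddg:.2f} kcal/mol, IMP = {imp:.0f}%")
```

A sample with \(\Delta \Delta G\) exactly zero is counted as not improved here, which is one reasonable reading of "higher binding affinity than the original".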

3.3 Antibody CDR sequence design with AfDesign

Using the weights for the pLDDT, pAE, and dgram-cce losses determined in Sect. 3.2, we applied AfDesign to the CDRs of 12 complexes manually selected from the Protein Data Bank (PDB) (Table 1). For each CDR, we generated ten samples by redesign with AfDesign and by random mutation of the amino acids. A total of 720 samples were generated with these two methods, and for each sample the predicted \(\Delta \Delta G\) was obtained by inputting the mutant and original PDB files into DDG predictor.

Table 1 Protein Data Bank (PDB) IDs of the 12 antigen-antibody complexes. IDs in bold indicate complexes that were not used for AlphaFold2 training [10] (AlphaFold-Multimer was trained on PDB structures released before 30 April 2018)

The following parameters were used:

  • Number of repetitions: logits-soft-hard = 50-50-5

  • num_recycles = 0

  • Learning rate: 0.01

  • Optimization algorithm: Adam [25]

  • Loss functions and their weights: dgram-cce=1.0, pLDDT=1.0, pAE=1.0

The average \(\Delta \Delta G\) predicted by DDG predictor and IMP, the percentage of generated sequences with higher binding affinity than the original amino acid sequence, were used as evaluation metrics.

Fig. 3 Scatterplot comparing the experimental and predicted values of \(\Delta \Delta G\) for antibody-antigen complexes, with a correlation coefficient of \(R=0.56\). The black dashed diagonal line represents perfect agreement between experimental and predicted values, and the blue dashed lines lie \(\pm 2\) kcal/mol around the diagonal. Blue points represent single (point) mutations in the antibodies, and red points represent multiple mutations

4 Results and discussion

4.1 Performance of DDG predictor

Figure 3 displays a scatterplot comparing the \(\Delta \Delta G\) values predicted by DDG predictor with experimental values for the AB-Bind mutations, using mutant structures predicted by AlphaFold2. The overall Pearson correlation coefficient of \(R=0.56\) (\(p\text{-value } = 3.9\times 10^{-27}\)) indicates a moderate positive correlation between experimental and computational values. Split by mutation type, the Pearson correlation coefficient is \(R=0.48\) (\(p\text{-value } = 1.5\times 10^{-13}\)) for the 209 single mutations and \(R=0.67\) (\(p\text{-value } = 2.7\times 10^{-14}\)) for the 99 multiple mutations. The higher correlation for multiple mutations suggests that the approach may be useful for designs involving several simultaneous substitutions, such as CDR redesign. Pearson correlation does not capture all aspects of prediction accuracy, however, and further validation may be necessary. Overall, these findings appear consistent with previous studies and support the use of computational methods for predicting changes in antibody-antigen binding affinity.

4.2 Loss weights

Figure 4 shows the distribution of \(\Delta \Delta G\) values for the 60 samples generated under each condition when the six CDRs of 1VFB (CDR-L1–3 and CDR-H1–3) were the design targets. The distributions do not differ noticeably among conditions, and no condition yielded a clear advantage in \(\Delta \Delta G\).

Fig. 4 Box-and-whisker plot illustrating the distribution of \(\Delta \Delta G\) values under each condition. In each box, the line represents the median and the cross mark the mean

Table 2 Average \(\Delta \Delta G\) and IMP values (PDB 1VFB) for each condition. Values in bold indicate the best value for each CDR

Figure 5 shows how the three parameters, dgram-cce, pLDDT, and pAE, evolved during the design process under each condition with CDR-H1 as the design target. In Case A, although only dgram-cce was used as a loss, pLDDT and pAE also changed in a correlated manner. Moreover, even when the weights of pLDDT and pAE in the loss function were increased in Cases B and C, the trajectories of these two parameters changed only slightly.

Fig. 5 Evolution of the values of the three parameters when 1VFB CDR-H1 was designed under each condition

Table 2 lists the average \(\Delta \Delta G\) of the ten samples generated under each condition for each CDR, together with the average \(\Delta \Delta G\) and IMP for each condition. Although no condition gave a clear advantage in binding affinity, we used Case F, which showed the best average \(\Delta \Delta G\) and IMP, in the subsequent experiment. Because changing the weights of pLDDT and pAE relative to dgram-cce produced no significant changes, we did not explore further weight settings.

Fig. 6 IMP of sequence design results for AfDesign and random mutation

4.3 Antibody CDR sequence design

Figure 6a compares designs obtained using AfDesign with those obtained by random mutation, in which each position within the same design range was replaced by one of the 20 amino acids with equal probability. When CDR-H1 and CDR-H3 were the design targets, AfDesign generated more sequences with higher binding affinity than the original sequence than random mutation did. However, when CDR-L2 and CDR-H2 were targeted, AfDesign performed worse than random mutation, so the results depended on the CDR.
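The random-mutation baseline can be sketched as follows, under the reading that every position in the design range is resampled independently and uniformly from the 20 standard amino acids. The input sequence and design positions are hypothetical. Under this scheme, a resampled position keeps its original residue with probability 1/20.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_mutant(sequence, design_positions, rng):
    """Resample every design position uniformly from the 20 amino acids."""
    residues = list(sequence)
    for pos in design_positions:
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)

rng = random.Random(42)
original = "GFTFSSYAMS"      # hypothetical heavy-chain segment
cdr_positions = range(0, 7)  # hypothetical design range (positions 0-6)
baseline = [random_mutant(original, cdr_positions, rng) for _ in range(10)]
print(baseline)
```

Residues outside the design range are left untouched, so the baseline and AfDesign explore exactly the same positions.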

Figure 6b shows the results for the three of the 12 complexes in Sect. 3.3 that were not used for AlphaFold2 training. For these complexes, the percentage of generated sequences with higher binding affinity than the original sequence was higher than, or comparable to, that of random mutation in all regions. This suggests that AfDesign can generate mutations predicted to improve binding affinity even for complexes that were not part of AlphaFold2's training data.

Figure 7a and b show the 3D structures of sequences generated using AfDesign with CDR-L1 and CDR-H3, respectively, as the design targets, superimposed on the original structure of PDB ID 7RR0. The version of AfDesign used in this study relied on AlphaFold trained on proteins registered in the PDB in 2019; therefore, the 7RR0 complex, registered in the PDB in 2021, was not used for training. Nevertheless, when regions other than CDR-H3 were designed, the generated sequences tended to adopt structures similar to the original loop structure, as shown in Fig. 7a, even though the complex was not used in AlphaFold2 training. In contrast, designs targeting CDR-H3 tended to adopt a structure slightly different from the original loop, as shown in Fig. 7b. The CDR-H3 loop is difficult to predict because its sequence and structure are more diverse than those of the other CDR regions. We therefore speculate that the lower accuracy of AlphaFold2 in predicting CDR-H3 loop structures causes the generated loops to be slightly offset from the original loop structures.
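The deviation of a redesigned loop from the original can be quantified with a loop RMSD, sketched below as one possible way to put a number on the comparison in Fig. 7. The sketch assumes that the two structures have already been superimposed (for example on the antigen or the antibody framework) and that the loop atoms are listed in matching order, so no further fitting is performed; the coordinates are toy values.

```python
import math

def loop_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) atom coordinates.

    Assumes the structures are already superimposed on a common frame, so the
    value reflects how far the loop is displaced rather than a best-fit error.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

original_loop = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
designed_loop = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # loop shifted by 1 angstrom
print(loop_rmsd(original_loop, designed_loop))  # 1.0
```

Superimposing on the framework rather than on the loop itself is the design choice that lets a rigid offset of the whole loop, like the one described for CDR-H3, show up in the value.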

Fig. 7 Sequence redesign of the PDB ID 7RR0 structure by AfDesign

5 Conclusion

In this study, we used AlphaFold2 to design antibody sequences with improved predicted binding affinity. For several CDRs, notably CDR-H1 and CDR-H3, AfDesign generated a higher proportion of improved sequences than random mutation, and it performed at least as well as random mutation for complexes not used in AlphaFold2 training. This helps address a limitation of related sequence-based studies, which struggled to design sequences for highly variable regions such as CDRs. With dgram-cce, pLDDT, and pAE as loss functions, AfDesign yielded sequences whose predicted structures closely resembled the original loop structures, except for CDR-H3, suggesting that AfDesign is effective and also applicable to complexes not included in AlphaFold2's training data.

However, it is important to note that our results are based on computational evaluations. Hence, biochemical validation is essential to confirm the designed sequences’ binding capabilities with their respective targets. Further investigation is also warranted to fine-tune the approach for broader applications, including a wider variety of antigen-antibody complexes.

This study serves as a stepping stone toward the rapid and efficient development of antibody-based therapeutics, particularly in scenarios where experimental data are scarce. Future work should aim at empirical validation of our findings and expanding the applicability of our approach to a broader range of biological interactions.