1 Introduction

Antibodies play important roles in the immune systems of vertebrates, and the sequences and structural features of antibodies depend on species [1,2,3]. Antibodies are Y-shaped molecules consisting of multiple domains. The arms of the Y-shaped structure are called fragment antigen-binding (Fab) regions and consist of light and heavy chains. Although many antibodies from vertebrates, including camelids, possess both light and heavy chains, camelids also have antibodies that lack light chains. In these heavy chain antibodies, the N-terminal variable region is called the VHH domain. Single-domain VHH antibodies combine the specificity and affinity of conventional antibodies with the high stability and solubility that originate from the nature of single-domain proteins [4, 5]. These properties make single-domain VHH antibodies more useful in various applications than antibodies that have both light and heavy chains.

1.1 Architectures of Antigen-Binding Sites

In conventional antibodies, antigen-binding sites are composed of six complementarity-determining regions (CDR-L1, L2, L3, H1, H2, and H3); in VHH antibodies, the antigen-binding sites consist of only three CDRs (H1, H2, and H3) (Fig. 1a). In both cases, CDR-H3 is formed around the junction of the VH, DH, and JH segments, and hence CDR-H3 is the most diverse CDR and the most important for antigen recognition [6,7,8,9]. The length distributions of CDR-H3 (residues H95–H102 in the Chothia numbering) of VHH and human antibodies were retrieved from the abYsis database (Fig. 1b) [10]. The median length of CDR-H3 in single-domain VHH antibodies is one residue longer than that of human antibodies and four residues longer than those from mouse and rabbit, which are common animals for immunization.

Fig. 1 Structure and length of CDR-H3 in antibodies are diverse. (a) Structures of non-CDR-H3 and CDR-H3 regions in single-domain VHH antibodies are shown in white and magenta, respectively. (b) Length distributions of CDR-H3 in VHH (from camel, llama, and alpaca) and human antibodies. The numbers of antibodies in each category are provided in parentheses

In conventional antibodies, a source of diversity in addition to CDR-H3 is the pairing of VL and VH domains, which together form Fv domains. The VL/VH interfaces are hydrophobic and can adjust their orientations upon antigen binding, resulting in a wide antigen-binding interface [11, 12]. The length of CDR-L1 is correlated with the type of antigen bound [13]: a longer CDR-L1 can form pocket-like antigen-binding sites suitable for recognition of small molecules, whereas a shorter CDR-L1 makes antigen-binding sites flatter, enabling recognition of globular protein antigens. In contrast, single-domain VHH antibodies have only the VH domain by definition. In these domains, residues H37, H44, H45, and H47 (Chothia numbering), the so-called VHH tetrad at the former VL domain interface, are hydrophilic.

1.2 Short History of CDR Classifications

One might expect all CDRs to be diverse because the number of antigens is much higher than the number of gene segments used in V(D)J recombination [14]. However, comparison of crystal structures of antibodies revealed that the conformations of CDRs other than H3 are relatively rigid, leading to their classification into so-called canonical structures [15,16,17,18,19]. As the number of antibody crystal structures has increased dramatically since canonical structures were first reported in the late 1980s [15, 20], there have been several updates that classified and defined new canonical conformations, enabling high-resolution structure prediction from amino acid sequences [16,17,18,19, 21,22,23]. Canonical structures have been defined for each CDR. Therefore, combinations of these canonical conformations can confer diversity on the antigen-binding site. The limited set of combinations observed is called the structural repertoire [24, 25]. Residues in framework regions (FRs) also help maintain canonical structures, which often explains the success or failure of antibody humanization [26, 27]. Although the conformations of canonical structures can be predicted based purely on sequence homology [28], machine learning-based template selection can also be helpful for antibody modeling [29].

Due to their limited conformations, analyses of non-H3 CDRs have enabled accurate structure classification, even when the number of crystal structures was small. In a pioneering study, Chothia and Lesk evaluated the six crystal structures of antibodies available at that time in the Protein Data Bank (PDB) [15]. Because CDR-H3 is more diverse than the other CDRs, it is difficult to define canonical conformations of CDR-H3. Based on careful inspection of 100 antibody crystal structures, Shirai et al. proposed the so-called H3-rules [30, 31], which predict substructures of CDR-H3 from amino acid sequences. Morea et al. arrived at a similar conclusion, namely that charged residues at both ends of CDR-H3 are important for maintaining its conformation [32]. These sequence–structure correlations were further refined and formalized by Kuroda et al. using additional crystal structures [6]. These studies focused on peculiar substructures, such as kinked or extended conformations, at the C-terminal region or “base” of CDR-H3. Weitzner et al. later identified kinked conformations in other protein families, such as PDZ domains, and suggested that the kink conserved in the immunoglobulin heavy chain fold is a critical driver of the observed structural diversity in CDR-H3 because it disrupts the β-strand pairing at the base of the loop structure [7]. The diversity of CDR-H3 continues to hamper accurate structure prediction of antibodies, especially when the CDR-H3 is long (>12 residues) [28].

Most studies have focused on antibodies that possess both light and heavy chains. However, crystal structures of many CDR-H3s of single-domain VHH antibodies are now available. In the following sections, we first review our current knowledge of sequence–structure correlations of single-domain VHH antibodies. Based on the 370 structures in the PDB, we classified CDR-H3s of single-domain VHH antibodies into three distinct structural classes and eight subclasses. We discuss possible correlations between sequences and each structural class. To our knowledge, this is the first attempt to define canonical-like conformations of CDR-H3s in VHH antibodies. These results will enable accurate structure prediction of VHH antibodies and will guide rational antibody engineering.

2 Current Knowledge of Sequence–Structure–Function Correlations of VHH Antibodies

2.1 Diversity of FRs and CDRs

Despite the smaller size of their paratopes, which results from the single-domain architecture, the binding affinities of single-domain VHH antibodies are comparable to those of conventional antibodies. This apparently paradoxical consequence of the absence of VL domains has led to many crystallographic studies of single-domain VHH antibodies and their complexes with antigens. Hence, the number of crystal structures of single-domain antibodies has been gradually increasing and now exceeds 600 in the PDB [23, 33], enabling large-scale structure-based analyses of single-domain VHH antibodies and their complexes with antigens at unprecedented depth [34,35,36,37,38,39]. Although some variations in backbone dihedral angles have been reported [34], FRs in single-domain VHH antibodies are more conserved in terms of sequence and structure than those in conventional antibodies [37]. Furthermore, although the sequence diversities of CDR-H1 and CDR-H2 in single-domain VHH antibodies are similar to those of conventional antibodies, the structural diversities are larger. Although canonical conformations have been reported for single-domain antibodies, there are a significant number of outliers [38]. In addition, CDR-H3 in single-domain VHH antibodies is more diverse in terms of sequence and structure than that of conventional antibodies, as it can occupy a greater range of positions relative to the FRs due to the absence of a VL domain [37]. One previous study attempted to classify structures of CDR-H3 in single-domain antibodies based on 149 crystal structures of single variable domains of heavy chains derived from human, shark, and camelid antibodies; in that study, CDR-H3s were visually classified into extended, flat, or pleated conformations without considering sequence–structure correlations [35].

2.2 Contribution of FRs and CDRs to Antigen Binding

Crystal structures of antibody–antigen complexes have revealed residues in FRs that directly contact antigens. In conventional antibodies, the DE loop in an FR, sometimes referred to as CDR4, interacts with antigens [40]. These FR-mediated interactions are more evident in single-domain VHH antibodies [35,36,37, 39]. Nevertheless, CDR-H3 makes the dominant contribution to antigen recognition, followed by CDR-H2, FRs, and CDR-H1 [39]. A significant proportion of such FR-mediated interactions involve charged residues, and these non-CDR interactions occur at four locations: FR2, the N-terminal region, the CD loop, and the DE loop, in decreasing order of importance [39]. Thus, whereas conventional antibodies tend to use only the six CDRs to recognize antigens, single-domain VHH antibodies employ residues not only in the three CDRs but also in FRs [36].

Epitopes of single-domain VHH antibodies tend to be concave and more ordered than those of conventional antibodies [39], reflecting the fact that many single-domain VHH antibodies have a long protruding CDR-H3. Indeed, crystal structures of single-domain VHH antibodies have revealed that they inhibit enzyme activities either through direct, competitive interactions of the long protruding CDR-H3 with concave catalytic pockets on antigen surfaces [41,42,43] or through an indirect allosteric interaction of a relatively flat interface formed by CDR-H3 [44].

3 Structural Classification of CDR-H3

3.1 Single-Domain VHH Antibodies in the PDB

As of the end of June 2021, there were 677 PDB files that contained structural data on single-domain VHH antibodies. From these, we extracted structures determined by X-ray crystallography, which resulted in 551 antibodies. After removing redundancy based on CDR-H3 sequences (H95–H102), 370 VHH antibodies that have CDR-H3s with no missing residues remained. We use the Chothia numbering scheme throughout the manuscript. Statistics of our dataset are provided in Table 1, and the length distribution of CDR-H3s is shown in Fig. 1b. In general, although proteins can change their conformations upon binding, those changes tend to be small in conventional antibodies [45]. Considering their smaller size and the dominant role of CDR-H3 in antigen binding, conformational change of CDR-H3 may be common in single-domain antibodies. To examine such conformational changes, we also evaluated 44 pairs of antibodies with the same CDR-H3 sequences that were crystallized in both antigen-bound and unbound states. PDB IDs of the structures evaluated are listed in the Notes section.
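
The dataset filtering described above can be illustrated with a minimal sketch. The record layout (`VhhEntry` and its field names), the method string, and the choice to keep the first entry encountered for each unique CDR-H3 sequence are assumptions made for illustration; the text does not specify how representatives were chosen, and the identifiers and sequences in the toy list are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VhhEntry:
    """One VHH chain; the field names are hypothetical, not a PDB schema."""
    pdb_chain: str      # identifier such as "xxx1-A"
    method: str         # experimental method annotated for the entry
    cdr_h3: str         # CDR-H3 sequence (H95-H102, Chothia numbering)
    h3_complete: bool   # True when no CDR-H3 residue is missing


def nonredundant_by_cdr_h3(entries: Iterable[VhhEntry]) -> List[VhhEntry]:
    """Keep X-ray entries with a complete CDR-H3, one per unique sequence."""
    seen = set()
    kept = []
    for entry in entries:
        if entry.method != "X-RAY DIFFRACTION":
            continue  # keep crystal structures only
        if not entry.h3_complete:
            continue  # skip loops with unresolved residues
        if entry.cdr_h3 in seen:
            continue  # an identical CDR-H3 was already kept
        seen.add(entry.cdr_h3)
        kept.append(entry)
    return kept


# Toy records with made-up identifiers and sequences, for illustration only.
toy = [
    VhhEntry("xxx1-A", "X-RAY DIFFRACTION", "AAGYDY", True),
    VhhEntry("xxx2-B", "X-RAY DIFFRACTION", "AAGYDY", True),    # duplicate loop
    VhhEntry("xxx3-A", "SOLUTION NMR", "DRWSTY", True),         # not X-ray
    VhhEntry("xxx4-C", "X-RAY DIFFRACTION", "GSTYWDY", False),  # missing residues
    VhhEntry("xxx5-A", "X-RAY DIFFRACTION", "NLRTYDY", True),
]
print([e.pdb_chain for e in nonredundant_by_cdr_h3(toy)])
```

In this sketch, incomplete loops are discarded before deduplication, so a complete copy of a sequence is never masked by an incomplete one; this ordering is a design choice of the example rather than a documented step of the study.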

Table 1 Statistics of 370 single-domain VHH antibodies used in this study

3.2 Visual Inspection of CDR-H3 Identified Three Structural Classes and Eight Subclasses

In conventional VL/VH antibodies, the base conformations of CDR-H3 are kinked or extended, as defined by the pseudo-dihedral angle formed by four consecutive Cα atoms at the base region of CDR-H3 (θbase) [6]. We plotted the frequencies of θbase in 370 nonredundant single-domain VHH antibodies as well as in 1019 nonredundant human antibodies in the PDB (Fig. 2). As previously reported [6, 31], most of the human antibodies (93%) assume kinked conformations (−120° ≤ θbase ≤ 120°). In contrast, a significant number of single-domain VHH antibodies assume conformations that are extended by this definition (θbase < −120° or θbase > 120°): for single-domain VHH antibodies, 69% adopt kinked conformations and 31% adopt extended conformations. These differences between single-domain and human antibodies may reflect the larger diversity of the former.
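
As a minimal sketch of this base classification, the following code computes a signed pseudo-dihedral angle from four Cα coordinates and applies the thresholds −120° ≤ θbase ≤ 120° (kinked) versus outside that range (extended). The function names are hypothetical, the coordinates in the usage lines are toy values rather than atoms from any structure, and the choice of which four base residues to pass in is left to the caller because it is not spelled out in the text.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pseudo_dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees defined by four consecutive points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def classify_base(theta_base: float) -> str:
    """Kinked if -120 <= theta_base <= 120 degrees, otherwise extended."""
    return "kinked" if -120.0 <= theta_base <= 120.0 else "extended"


# Toy coordinates (not taken from any structure): a 90-degree twist and a trans arrangement.
twisted = pseudo_dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = pseudo_dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(classify_base(twisted), classify_base(trans))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters here because the extended region spans both ends of the ±180° range.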

Fig. 2 Frequency of kinked (−120° ≤ θbase ≤ 120°) and extended (θbase < −120°, θbase > 120°) base conformations of CDR-H3 in single-domain VHH and human antibodies, determined based on kernel density estimation

Based on visual inspection, we identified three structural classes of CDR-H3s in VHH antibodies: one with helical bending structures, another with kinked conformations without helices, and a third with β-hairpin-like extended structures (Fig. 3). The helical bending conformations were further divided into four subclasses based on the locations of the helices. The kinked conformations were further classified into two subclasses, one resembling typical kinked structures in human antibodies and the other bending strongly toward the former VL interface. In these helical and kinked conformations, CDR-H3 interacts with FRs to differing extents, suggesting that helical and kinked conformations are driven by hydrophobic forces between CDR-H3 and the former VL interface. Strikingly, of the 155 structures with a helical bending conformation, 109 have a helix near or at the C-terminal region. The overall conformations are quite similar, especially when the lengths are identical (Fig. 4a). Of the 117 structures with the extended conformation, 107 CDR-H3s have standard β-hairpin structures (Fig. 4b), whereas the other ten have a helical or coil conformation that does not interact with the FRs. Interestingly, although the overall structures are similar within the subclasses, especially for the helical bending conformations, the sequences are quite diverse except for the C-terminal regions. In the C-terminal helical bending structures of 15-residue CDR-H3, most antibodies possess a Tyr residue at position H100G, whereas the corresponding position in the extended conformation is Tyr, Asn, or Ser (Fig. 4).

Fig. 3 Three structural classes and eight subclasses of CDR-H3 in VHH antibodies: helical bending conformations (four subclasses, 155 antibodies), kinked conformations (two subclasses, 98 antibodies), and extended conformations (two subclasses, 117 antibodies)

Fig. 4 Dominant conformations of the helical bending and extended classes. (a) Helical bending conformations of 15-residue CDR-H3 (24 cases), where a helix structure is observed near the C-terminal region. (b) Extended conformations of antibodies with 12-residue CDR-H3s (18 cases). The corresponding sequence conservations of 15-residue and 12-residue CDR-H3s are shown below the ribbon diagrams

In addition to the canonical disulfide bond between positions H22 and H92, which is commonly observed in heavy chains of conventional antibodies, single-domain VHH antibodies have noncanonical disulfide bonds between CDR-H3 residues and residues outside CDR-H3 [46]. In our dataset, there are 63 cases in which CDR-H3 possesses at least one Cys residue, and 62 of these form disulfide bonds with surrounding residues (Table 2). In 38 cases, there is a noncanonical disulfide bond between CDR-H3 and the boundary of CDR-H2 and FR2 (e.g., H50) (Fig. 5a). In 16 cases, there is a noncanonical disulfide bond between CDR-H3 and the boundary of CDR-H1 and FR2 (e.g., H33) (Fig. 5b). In three cases, there is a noncanonical disulfide bond between CDR-H3 and FR2 (e.g., H45) (Fig. 5c). In five cases, there is a noncanonical disulfide bond within CDR-H3 (Fig. 5d). In a single case in our dataset (PDB ID: 6X08) [47], the Cys residue located at H50 does not form a disulfide bond with the Cys residue in CDR-H3 (H100A). These noncanonical disulfide bonds are observed mostly in antibodies of the helical bending or kinked classes, suggesting a role in enhancing the interactions between CDR-H3 and the former VL interface. These disulfide bonds constrain the conformation of CDR-H3, leading to higher thermal stability [48], and may minimize entropy loss upon antigen binding.

Table 2 The numbers and locations of noncanonical disulfide bonds involving residues in CDR-H3 and the surrounding regions

Fig. 5 Noncanonical disulfide bonds between CDR-H3 and residues at (a) H50, (b) H33, (c) H45, and (d) CDR-H3. For (a) and (b), only representative examples are shown. The disulfide bonds are shown as stick models. CDR-H3 is colored magenta. The non-CDR-H3 residues involved in the disulfide bonds are colored cyan

3.3 Conformational Changes of CDR-H3 upon Antigen Binding

Forty-four single-domain VHH antibodies have been crystallized in both antigen-bound and unbound states. To quantify the structural changes upon antigen binding, we computed Cα-RMSDs of CDR-H3 between the bound and unbound states after superposing the Cα atoms of three residues at the N- and C-terminal regions of CDR-H3. Large conformational changes were rarely observed: 80% (35/44) of the antibodies had an RMSD of less than 2.0 Å upon antigen binding (Fig. 6). As notable exceptions, nanobodies L06 [49] and Nb5776 [50] had RMSDs of 9.1 Å and 8.7 Å, respectively, between the antigen-bound (PDB ID: 5E7F and 5DA0, respectively) and unbound states (PDB ID: 5E7B and 5DA4, respectively). The resolutions of the bound-state crystal structures are poor (2.7 Å and 3.2 Å for L06 and Nb5776, respectively), but it is clear that without such large conformational changes the unbound conformation would clash sterically with the antigen, and hence structural changes upon antigen binding are plausible. In addition, even in these cases, the structural classes we defined in this study did not change upon antigen binding.
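
The RMSD bookkeeping in this comparison can be sketched as follows. This example assumes that the bound and unbound CDR-H3 coordinates have already been superposed on the terminal anchor residues by an external tool, so it covers only the Cα-RMSD itself and the fraction of pairs below the 2.0-Å cutoff. The function names and the toy coordinates and RMSD values are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(bound: Sequence[Coord], unbound: Sequence[Coord]) -> float:
    """C-alpha RMSD (angstroms) between two pre-superposed, equal-length loops."""
    if len(bound) != len(unbound) or not bound:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((a - b) ** 2
                  for p, q in zip(bound, unbound)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(bound))


def fraction_below(rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of RMSD values strictly below the cutoff."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)


# Toy values for illustration only; they are not measurements from the dataset.
loop_bound = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
loop_unbound = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(loop_bound, loop_unbound))
print(fraction_below([0.5, 1.9, 2.0, 9.1]))
```

Superposing on a few anchor residues rather than on the whole loop means that the reported RMSD reflects movement of the loop relative to its base and is not reduced by a fit over the loop itself.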

Fig. 6 Frequency of conformational change of CDR-H3 in single-domain VHH antibodies upon antigen binding. The number of antibodies is plotted against the Cα-RMSD of CDR-H3. The protein schematic is of L06 (PDB ID: 5E7F and 5E7B), the antibody with the largest conformational change (9.1 Å), with the CDR-H3 of the bound state in magenta and that of the unbound state in green. The black surface model is the antigen, the lactococcal phage Tuc2009 receptor-binding protein

4 Sequence–Structure Correlations of CDR-H3

4.1 H3-Rules Do Not Hold for CDR-H3 in VHH Antibodies

We next evaluated sequence–structure correlations within each structural class of single-domain VHH antibodies. For conventional antibodies, the H3-rules predict conformations of the base structures of CDR-H3 as well as patterns of hydrogen-bond formation in the β-hairpin from the amino acid sequences [6, 31]. The H3-rules mainly rely on the Asp switch model: when the side chain of the conserved Asp at position H101 points toward the paratope surface, CDR-H3 assumes a kinked conformation, whereas when the side chain points toward the VL/VH interface, CDR-H3 assumes an extended conformation [6]. The direction of the side chain of Asp at position H101 is governed by the presence or absence of Arg/Lys at position H94. In human antibodies, Arg/Lys at position H94 and Asp at position H101 are well conserved (relative frequencies are 84% and 83%, respectively) [10]. Thus, the H3-rules classify sequences of CDR-H3 based on the conservation of Arg/Lys at position H94 and Asp at position H101. Rule i-a states that when there is neither Arg/Lys at position H94 nor Asp at position H101, a kinked conformation is formed. Rule i-b states that when only the Asp at position H101 is present, the antibody adopts an extended conformation. Rule i-c states that when Arg/Lys is present at position H94 and Asp is present at position H101, they form a salt bridge that results in a kinked conformation of CDR-H3. Rule i-d states that Arg or Lys at position H93 forms a salt bridge with the conserved Asp at position H101, resulting in an extended conformation.
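
The four rules summarized above can be written as a small decision function. This is a sketch of the rules as stated in this section, not the full published H3-rules: the function name is hypothetical, checking rule i-d before rule i-c is an assumption (the summary gives no precedence), and a sequence with Arg/Lys at H94 but no Asp at H101 is returned as unassigned because the summary does not cover that case.

```python
from typing import Optional, Tuple

BASIC = frozenset({"R", "K"})  # Arg and Lys, one-letter codes


def h3_rule(h93: str, h94: str, h101: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (rule, predicted base conformation) from residues at H93, H94, H101."""
    has_asp101 = h101 == "D"
    if has_asp101 and h93 in BASIC:
        return ("i-d", "extended")   # assumed to take precedence over rule i-c
    if has_asp101 and h94 in BASIC:
        return ("i-c", "kinked")     # salt bridge between H94 and H101
    if has_asp101:
        return ("i-b", "extended")   # only the Asp at H101 is present
    if h94 not in BASIC:
        return ("i-a", "kinked")     # neither conserved residue is present
    return (None, None)              # case not covered by the summary above


print(h3_rule("A", "R", "D"))
print(h3_rule("A", "A", "D"))
print(h3_rule("A", "A", "Y"))
```

Returning the rule label together with the predicted conformation makes it straightforward to tabulate, per rule, how often an observed base conformation agrees with the prediction.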

In our dataset, 67.1% (684/1019) of human antibodies possess a positively charged residue at position H94 and an Asp at position H101. In contrast, only 6.5% (24/370) of single-domain VHH antibodies have those residues. Therefore, when we classified the sequences of CDR-H3s based on the H3-rules, in stark contrast to human antibodies, few met the rule i-c criteria of an Arg/Lys at position H94 and an Asp at position H101. Instead, sequences of rules i-a and i-b (neither conserved residue present or only the Asp at H101, respectively) were dominant (Table 3). These differences between human and VHH antibodies are partly due to the origin of the germline genes. Human antibodies are derived from a variety of germline genes, whereas single-domain VHH antibodies are derived from only a few germline genes, most of which belong to a single family of IGHV3 genes. Importantly, 38.1% of the rule i-a sequences and 76.5% of the rule i-b sequences of single-domain VHH antibodies assume extended and kinked conformations, respectively, which is the opposite of what the H3-rules predict.

Table 3 Sequence classification of CDR-H3 in antibodies based on H3-rules

4.2 Sequence Features of Each Structural Class of Single-Domain Antibodies

To make single VH domains soluble, nature has used two distinct strategies. In one, the four residues of the VHH-tetrad (H37/H44/H45/H47), which are hydrophobic in human antibodies, are hydrophilic in single-domain VHH antibodies. Mutation of the VHH-tetrad residues has been used previously to make human VH domains soluble [51]. The second strategy involves interaction between CDR-H3 residues and residues of FRs. In this case, the residues of the VHH-tetrad would remain hydrophobic. We therefore first looked for correlations between structural classes and amino acid sequences of the VHH-tetrad. We found a clear distinction in the VHH-tetrad sequences between the helical bending/kinked conformations and the extended ones (Fig. 7). A common structural feature of the former group is that CDR-H3 interacts with the VHH-tetrad, with bulkier hydrophobic residues preferred at positions H37 and H47 of the VHH-tetrad (Fig. 7a).

Fig. 7 Sequence–structure correlation in single-domain VHH antibodies. (a) Sequence conservation at VHH-tetrads of each structural class. The corresponding amino acids in human antibodies are shown for reference. (b) Length distributions of each structural class. The numbers of unique sequences in each class are provided in parentheses

Second, we examined residues of CDR-H3. Because of the length variability, CDR-H3 residues were renumbered from 1 to n, corresponding to positions 95 to 102 of the VH domain in the Chothia numbering scheme [6]. The 15-residue CDR-H3s that assume a bending conformation with a helix near the C-terminal region showed a strong preference for a Tyr residue at position n-1 (Fig. 4a), and this trend was also observed for the other lengths of CDR-H3. The conserved residues of the VHH-tetrad and the residue at position n-1 of CDR-H3 interact with each other at the former VL interface.

Third, the length of CDR-H3 was correlated with structural class (Fig. 7b). For example, when CDR-H3 was 14 residues or longer, the helical bending conformation was dominant; 14 residues is the median length of CDR-H3 in single-domain VHH antibodies in the large sequence database [10]. Extended conformations were more common when CDR-H3 was shorter. These trends may be due to physicochemical properties of the antibodies. Long, extended CDR-H3s will be flexible in solution, and there will be large entropic loss upon antigen binding, which may result in poor binding capability and poor thermodynamic stability.

5 Conclusions and Perspectives

5.1 Implications for Antibody Modeling

Through visual inspection, we were able to identify three structural classes and eight subclasses of CDR-H3s in single-domain VHH antibodies. Based on the theory of canonical structures, structures of conventional antibodies can be predicted from amino acid sequences. Although CDR-H3 is diverse, H3-rules have been useful to characterize CDR-H3 structures in conventional antibodies [52, 53], and C-terminal constraints based on the knowledge have been applied to prediction of CDR-H3 structure in conventional antibodies [28, 54, 55]. Although a previous study demonstrated the difficulty in predicting CDR-H3 in single-domain VHH antibodies [56], our new structural classifications will facilitate structure predictions once we gain the ability to predict the structural class from the amino acid sequences: Approaches such as hidden Markov models could be employed for that purpose.

5.2 Implications for Antibody Engineering and Design

Knowledge provided in this chapter will be useful in antibody engineering and design. For example, synthetic antibody libraries have been developed based on a limited knowledge of sequence–structure correlations [51, 57,58,59,60], and library-based approaches can guide affinity maturation of existing antibodies [61]. Our classifications will be beneficial in library design, as the conformation of the CDR-H3 could be controlled based on the sequence–structure correlations that we identified. Single-domain VHH antibodies in the current PDB mostly bind protein antigens, but there seems to be no correlation between structural classes of CDR-H3 and the type of antigens. However, the number of protein antigens is almost infinite, and shapes of their epitopes are diverse: They can be flat, concave, or even rugged. Protrusion of CDR-H3 observed in the extended conformations would be suited to target a cavity on the antigen surface, as seen in anti-lysozyme antibodies [43], whereas helices can form a flat interface, enabling the antibody to target a flat protein–protein interface. Together with detailed physicochemical measurements of antibody–antigen interactions [62], analyzing these structure–function correlations would be a logical next step toward rational antibody design.

Recently, miniaturization of proteins has attracted much attention. Due to their modular architecture, antibodies are an attractive target for miniaturization. Conventional VL/VH antibodies can be miniaturized into single-domain antibodies through a process called antibody camelization [63]. As antigen-binding capability is realized through CDRs, antibodies can be further miniaturized to only a single CDR [64, 65]. A key here is that conformations of CDRs govern the molecular recognition, and the three-dimensional structure of the CDR in the parent antibody must be preserved to maintain function in a miniaturized version. To realize this, sequence–structure relationships, as we have shown in this chapter, are extremely important.

Antibodies can be obtained through animal immunization or selection from synthetic libraries, but it is often difficult to know a priori where those antibodies bind on the antigen surface. This can be problematic, especially when the targets are viral proteins, which use only a part of the protein surface to invade host cells. Therefore, methods for epitope-specific antibody generation are highly desirable [66], and there have been attempts to design antibodies de novo [67,68,69,70]. In this context, computational analyses should facilitate the design of epitope-specific antibodies. Since antibody structures are well conserved except for CDR-H3, the main challenge in de novo design is to build arbitrary conformations of CDR-H3 that bind to desired antigens. As structure prediction of CDR-H3 from the amino acid sequence is still challenging, complete de novo design of antibodies is not yet possible. However, considering recent advances in high-throughput repertoire-scale experiments, machine learning, and molecular simulations, we expect that fully automated de novo antibody generation will become feasible and will be routinely employed in various fields. Although the direct sequence–structure correlations revealed in our analysis of single-domain VHH antibodies are less well defined than those of conventional antibodies, the length-dependent nature of CDR-H3 conformations in single-domain VHH antibodies can immediately inform the development of new synthetic libraries and de novo antibody design.

6 Notes

The PDB IDs with the associated chain ID of single-domain VHH antibodies used in this study are listed below. The lengths of CDR-H3s are provided in parentheses: 6heq-A (3), 5l21-B (3), 5h8d-A (3), 3zkx-B (3), 3zkx-C (3), 6oq7-C (3), 6ssi-F (3), 5c3l-D (3), 2xxc-B (4), 5e03-A (4), 5ip4-A (4), 5dxw-A (5), 4x7f-C (5), 6ui1-D (5), 5omn-C (5), 6u53-A (6), 6y0e-A (6), 7mfu-B (6), 5m2j-D (6), 7mfv-B (6), 7d2z-A (6), 6lz2-B (6), 7c8v-A (6), 7a4y-A (6), 5f7k-C (6), 6ui1-B (6), 6fv0-F (6), 5o2u-B (6), 6tyl-C (6), 1t2j-A (7), 6gwn-B (7), 5ocl-A (7), 5my6-B (7), 5o04-E (7), 7a4d-C (7), 5dmj-B (7), 5vxl-B (7), 7kji-A (7), 5fv2-A (7), 6qv1-E (7), 5dfz-E (7), 6cnw-A (8), 6h6y-E (8), 1hcv-A (8), 5ja9-A (8), 3k74-B (8), 4orz-C (8), 6xw7-C (8), 6oq8-C (8), 5jmr-A (8), 6csy-N (8), 6fe4-F (8), 6apo-A (9), 6dyx-D (9), 7a48-A (9), 6dbe-A (9), 6xw5-C (9), 6dbd-A (9), 4zg1-A (9), 6qup-B (9), 5f21-B (9), 6u55-A (9), 5g5r-B (9), 4eig-B (9), 3cfi-C (9), 7a4d-A (9), 5usf-C (9), 6gk4-C (9), 6i8h-B (9), 5eul-V (9), 4ksd-B (9), 4b41-A (10), 5nlu-A (10), 5vnv-A (10), 6xzu-A (10), 1yc7-A (10), 4w2p-A (10), 6apq-A (10), 3zhk-A (10), 7kkj-A (10), 6qtl-A (10), 6gju-B (10), 6j7w-A (10), 6r7t-A (10), 6v7z-E (10), 4xt1-C (10), 6h16-B (10), 6c5w-C (10), 7jvb-C (10), 5m2w-A (11), 6i2g-A (11), 6t2j-A (11), 6obe-B (11), 6obc-B (11), 3b9v-A (11), 5i0z-A (11), 1ohq-A (11), 7kjh-A (11), 1sjx-A (11), 5mwn-N (11), 6x04-B (11), 6hhu-G (11), 4cdg-C (11), 5jqh-C (11), 6qx4-C (11), 4pir-F (11), 6v80-E (11), 6o8d-H (11), 2p45-B (12), 3tpk-A (12), 7n0r-C (12), 5lmw-A (12), 4ppt-A (12), 4poy-A (12), 6u50-C (12), 3qsk-B (12), 5m7q-A (12), 1ol0-A (12), 2p4a-B (12), 6obo-C (12), 6rtw-B (12), 4m3j-A (12), 4x7c-C (12), 1zv5-A (12), 5o05-C (12), 6obg-C (12), 6gwn-C (12), 6oc8-A (12), 7a4t-A (12), 6i6j-C (12), 4u3x-A (12), 5lhr-B (12), 7kgj-B (12), 7joo-K (12), 5ja8-B (12), 6obm-B (12), 7bnw-A (12), 6gju-C (12), 5mje-B (12), 7bu7-B (12), 4weu-D (12), 6z20-B (12), 3v0a-C (12), 6tej-C (12), 7c8w-A (12), 3ezj-B (12), 5hvf-B 
(12), 5hm1-A (12), 6war-B (12), 6zbv-B (12), 3p0g-B (12), 2x1o-A (13), 4wem-B (13), 1kxv-C (13), 5ivo-A (13), 6yu8-B (13), 4wen-B (13), 6qgw-B (13), 6x05-K (13), 3k7u-A (13), 6xw4-C (13), 6h72-C (13), 5da4-A (13), 6gjq-B (13), 5y80-B (13), 7kn6-C (13), 7b27-C (13), 5m94-B (13), 6vi4-C (13), 6tyl-J (13), 5nqw-C (13), 3k81-A (13), 6u14-A (14), 6dba-A (14), 6z1v-B (14), 5j1t-C (14), 4z9k-B (14), 1kxq-E (14), 6fpv-A (14), 4nbx-B (14), 6s0y-A (14), 6zrv-B (14), 4qo1-A (14), 7lvu-A (14), 5lmj-A (14), 6ocd-B (14), 4laj-H (14), 5m2i-G (14), 6qgx-B (14), 7a0v-B (14), 3p9w-B (14), 4nc2-B (14), 6qgy-B (14), 2wzp-D (14), 6ehg-C (14), 6o3c-B (14), 5mzv-D (14), 4yga-B (14), 6gkd-C (14), 5hvh-B (14), 5tjw-K (14), 6n4y-E (14), 7aej-D (14), 5nbd-C (14), 6her-B (15), 6ru3-C (15), 5vlv-A (15), 6sge-B (15), 6xxo-A (15), 6itq-A (15), 4gft-B (15), 4qlr-A (15), 5vak-B (15), 5o02-C (15), 4nbz-B (15), 3qxu-A (15), 3ln9-A (15), 5j56-B (15), 3eba-A (15), 6ir1-B (15), 6gjs-B (15), 2vyr-I (15), 4idl-A (15), 4c58-B (15), 6rvc-D (15), 5ocl-B (15), 5f1k-C (15), 5m2m-D (15), 6i8g-B (15), 3k80-A (15), 4aq1-B (15), 6hjx-F (15), 4krm-B (15), 5f7l-B (15), 6h02-B (15), 4kfz-C (15), 5nbl-E (15), 6f2g-B (15), 6b73-C (15), 6nfj-E (15), 6ysq-C (15), 6gci-B (15), 6tyl-D (15), 7k7y-C (15), 5ukb-H (15), 5imm-B (16), 5m13-B (16), 3qyc-A (16), 1u0q-A (16), 7mfu-C (16), 5tp3-A (16), 5m15-C (16), 5bop-A (16), 3eak-A (16), 6ul6-B (16), 5vxm-B (16), 6ck8-A (16), 6oca-D (16), 6waq-A (16), 6uht-C (16), 6uc6-C (16), 6rbb-H (16), 1qd0-A (16), 4grw-E (16), 5lwf-C (16), 6h15-C (16), 6gs4-H (16), 5mp2-C (16), 6c9w-B (16), 6eqi-B (16), 6qv0-E (16), 5uk4-X (16), 6qx4-D (16), 5ojm-K (16), 5tok-D (16), 6x08-K (16), 5e7b-A (17), 6xyf-A (17), 5u64-B (17), 6ir2-B (17), 5lz0-A (17), 1mvf-A (17), 5j57-B (17), 6h70-C (17), 1ri8-A (17), 7kn5-E (17), 5e0q-A (17), 7a50-A (17), 4u7s-A (17), 6z6v-G (17), 7n0i-I (17), 7nkt-B (17), 7k84-B (17), 7nmu-C (17), 4w6w-B (17), 4grw-F (17), 7n0s-D (17), 5sv4-A (17), 7kn7-B (17), 6zg3-E (17), 
6uft-B (17), 6jri-B (17), 5boz-G (17), 6ul4-B (17), 6rpj-B (17), 2x1q-A (18), 2x1p-A (18), 6jb9-A (18), 6cwk-A (18), 2xa3-A (18), 3r0m-A (18), 6jb2-A (18), 4i13-B (18), 4tvs-H (18), 6lr7-B (18), 5n88-A (18), 6zbp-F (18), 4i1n-B (18), 6f0d-A (18), 4b5e-A (18), 4dka-A (18), 4s11-A (18), 5hdo-A (18), 6f5g-B (18), 6v7y-F (18), 3stb-A (18), 6xxn-A (18), 5ovw-G (18), 6z2m-D (18), 4krp-B (18), 6do1-C (18), 6oyh-E (18), 6oq6-D (18), 4n1h-B (18), 5c2u-B (19), 5jds-B (19), 4ej1-C (19), 7now-A (19), 6xw6-C (19), 1kxt-B (19), 7a6o-B (19), 1f2x-K (19), 6ksn-A (19), 2x89-A (19), 5f1o-B (19), 5u65-A (19), 5vxk-B (19), 6rqm-B (19), 3sn6-N (19), 4qgy-A (20), 4w6y-B (20), 7kn5-C (20), 5vm4-A (20), 6rnk-B (20), 6fys-A (20), 5e1h-B (20), 4fhb-D (20), 6ssp-K (20), 5gxb-B (20), 5wb2-D (20), 6qv2-E (20), 4krn-A (21), 5fwo-A (21), 7ldj-E (21), 5nml-C (21), 6knm-A (21), 3g9a-B (23), 5lhn-B (24), 5o0w-E (24), 1zmy-A (24).