Introduction

Monoclonal antibodies (mAbs) are widely used therapeutics against cancer, autoimmune, and infectious diseases1,2,3,4,5. The global mAb market is forecasted to grow to >$ 300 billion in 20256. Despite their commercial success, mAb discovery remains a resource- and time-consuming process resulting in a costly and lengthy clinical approval, hindering accessibility and affordability7,8. A successful mAb molecule should not only show sufficient affinity in its target binding profile but also exhibit a desirable “developability” profile9. The term “developability” refers to a combination of intrinsic physicochemical parameters defined as developability parameters (DPs) that relate to biophysical aspects of antibodies and their formulations—including aggregation, solubility, and stability10,11,12,13,14. The feasibility of an antibody candidate to successfully progress from discovery to development is underpinned by specific DPs, which reflect its manufacturability and druggability10,15. Thus, suboptimal developability is one of the main factors of mAbs failure in preclinical and clinical development stages16,17,18. Therefore, the ability to predict and prospectively design developability properties, in line with clinical and manufacturing requirements, would help by reducing the time and resources invested in developing therapeutic mAbs, thus, boosting their success rate19,20,21.

Traditionally, developability screening is performed through a series of laborious in vitro assays17,22,23. Therefore, major efforts have been invested into developing real-world-relevant in silico tools and machine learning (ML) algorithms that can computationally quantify or predict DP values using antibody sequence and/or structure information7,24,25,26,27,28,29,30,31,32,33,34,35,36. Given the current high throughput of antibody structure prediction at the repertoire scale37,38,39, tools for computational developability determination have become available, which can be used to identify potential design shortfalls during the development and selection of lead mAb candidates40,41,42,43,44,45,46.

In contrast to the design and selection of pharmaceutical mAbs, the natural (or native, used interchangeably) immune system has the capacity to “design” antigen-specific antibodies with physiologically compatible and optimized biochemical properties within days to weeks47,48,49,50. Indeed, it was previously reported that antibodies obtained via, for example, humanized mice exhibit far fewer developability risks compared to mAb candidates obtained from in vitro display campaigns4,17,20,51. In line with these findings, clinical mAbs have been reported to exhibit high sequence identity matches (>70%) with natural antibodies for both heavy and light chains, implying that artificially developed mAbs are not entirely dissimilar from their native counterparts, thus, highlighting the relevance of mining the natural antibody repertoires for therapeutic mAb discovery52,53. These findings motivate the embedding of therapeutic mAb sequences in the developability landscape (here defined as the multidimensional distribution of DP parameter values across DPs and antibodies) of the natural (human and mouse) antibody repertoires to assess the nativeness of their developability profiles (DPLs, a DPL is a set of DP values for a given antibody sequence/structure)54. As such, a mAb candidate with a DPL falling outside the range of its variation in natural antibodies may be assumed unnatural and, therefore, more likely to exhibit undesirable in vivo characteristics53. Recent studies have also pointed to the futility of attempting complete separation between natural antibodies and therapeutic mAbs based solely on DP values55. Similar findings continue to emphasize the valuable knowledge that can be harnessed from interrogating the growing sequence space of naturally sourced antibody sequences to accelerate the engineering and optimization of mAb candidates2,52.

So far, the relationship between the developability landscape of the natural antibody repertoire and that of therapeutic (or, more broadly, human-engineered) antibodies remains unclear. While not all natural antibodies may be suitable as therapeutic candidates from a developability perspective7,55, we lack a large-scale overview of sequence and structure-based natural antibody developability landscapes stratified by antibody isotype and species. Furthermore, given previous low-sample size investigations, we are unaware to what extent sequence changes affect a given DP and which sequence or structure-based design restrictions may limit all-vs-all DP optimization with a given antibody sequence. Furthermore, many studies have focused on extracting developability guidelines from a limited number of successful mAbs, considering them a “gold standard” of desired developability7,10,17,28,56. In addition, most studies have focused on a small number of DPs17,57, and apply their hypotheses to limited antibody datasets comprising of a few 100 s to 1000 s antibody sequences7,10,32,56 or datasets not including patent-submitted antibodies or antibodies that failed during early clinical trials7,10,58. So far, the lack of datasets with sufficient sample size has hindered an in-depth understanding of the plasticity of the antibody developability space.

Understanding the natural antibody landscape could enable the integration of both current and prospective antibody therapeutic candidates, which will improve our interpretation of the disparities in developability between human-engineered mAbs and natural antibodies (Fig. 1). To this aim, we have built an atlas of over two million unique native antibody sequences from human and murine heavy and light chains (≈200,000 per isotype and chain) annotated with DPs. We predicted the 3D structure of all antibodies and calculated 40 sequence-based and 46 structure-based DPs for each antibody. Using correlation and graph theory, we identified a subset of DPs that are maximally different from one another, thus delineating a non-redundant multidimensional antibody developability space. Across all antibody isotypes, we found lower interdependency among structure-based DPs in contrast to sequence-based DPs. Notably, distinct developability landscapes emerged across species (mouse, human) and antibody chains (heavy, light). In addition, we quantified DP sensitivity by analyzing the DP value distribution of mutants with single amino acid substitutions. We also found that the values of DPs measured on the conformational ensembles of antibodies evolve throughout their molecular dynamics (MD). Regarding predictability, our analysis revealed that ML is more successful in predicting the values of sequence-based DPs, indicating a less confined design landscape for structure-based DPs. Our analysis also suggested that the observed developability spaces of human-engineered antibodies are essentially subsets of the broader natural developability space (in terms of the major principal components of variation). While our study relies on computationally predicted developability, in which experimental correspondence may vary15, it serves as a practical and real-world relevant use case of integrating and charting repertoire-wide developability to guide mAb selection and development.

Fig. 1: Redundancy, sensitivity, and predictability of developability parameters in native and human-engineered antibodies.
figure 1

Introduction: The development of therapeutic mAbs takes years, and DPs dictate the selection and design of candidates for (pre-)clinical testing. Here, we analyzed the plasticity of the developability landscapes of natural antibodies in terms of DP redundancy (extent of DP intercorrelation), sensitivity (extent of DP change as a function of antibody sequence change), and predictability (predictability of a given DP based on one or several DPs). Methods: To analyze the constraints on natural antibody developability and to relate these to current human-engineered antibody datasets, we assembled a dataset of over 2 M native antibody sequences (heavy and light chain isotypes, human and murine) and computed 40 sequence- and 46 structure-based DPs. To reduce redundancy, we determined the minimum-weight dominating sets (MWDS) of DP correlation networks. To quantify sensitivity, we analyzed single-amino-acid substituted variants followed by characterization of the impact of sequence variation on DP distribution. To compute predictability and assess the interdependence of DPs, we trained multiple linear regression (MLR) using developability profile (DPL) and protein language model (PLM) embeddings. These embeddings were used to relate native antibodies to human-engineered ones via principal component analysis (PCA). Moreover, we performed classical molecular dynamics simulations to analyze the distributions of antibody DP values and define how the rigid models fit into these distributions. Results: Our results address all three research areas (redundancy, sensitivity, and predictability). Redundancy: We found a lower degree of interdependence among structure DPs than the sequence-based ones for all isotypes of the native dataset, and higher pairwise antibody sequence similarity was not always associated with higher pairwise antibody developability similarity. Native antibody datasets contained species- and chain-specific developability signatures. Sensitivity: We propose methods to quantify the sensitivity of antibody DPs to minimal sequence changes. Predictability: We found that structure-based DPs are less predictable than sequence-based DPs using protein language model (PLM) and multiple linear regression (MLR) embeddings. The comparison between native and human-engineered datasets revealed that human-engineered (therapeutic, patented, and Kymouse) datasets were localized within the native developability landscape.

Results

Developability parameters

We computed 40 sequence-based and 46 structure-based developability parameters (Supplementary Data 1) for each antibody (Fv region sequence) after having predicted their 3D structures using ABodyBuilder (ABB)37. The choice of these DPs was intended to cover a comprehensive array of the physicochemical properties of antibodies based on our understanding of the developability literature and antibody structure. Some developability parameters included in the therapeutic antibody profiler (TAP, e.g. positive and negative charge heterogeneity, sequence-based hydrophobicity)7 were incorporated in our study. All DPs were categorized into groups based on the main physicochemical property. For instance, the instability index and the aliphatic index were included within the “stability” group as they reflect the stability of the antibody (Supplementary Data 1). Other groups include categorical amino acid composition, electro- and photo-chemical parameters, as well as structural interactions and conformational descriptors (detailed in Supplementary Data 1).

Sequence-based DPs show higher association and redundancies compared to structure-based DPs

The diversity and redundancy of the natural DP landscape of human and murine antibody repertoires has not been investigated. To address this knowledge gap, we assembled a dataset of ~2 M non-paired VH and VL native antibody sequences (~170K–200 K sequences for each isotype, human; IgD, IgM, IgG, IgA, IgK, and IgL, murine; IgM, IgG, IgK—Supplementary Fig. 1A).

Prior to the DP analysis, we controlled for the quality of the computational data generated along two axes Briefly, (1) given that the majority of the work was performed on unpaired chain data due to a lack of isotype-stratified paired-chain data, we verified that structure-based DPs measured on a subset of paired-chain antibodies strongly correlate with the corresponding DP values measured on each of the unpaired (heavy and light) single chains separately (median Pearson correlation 0.84–0.9, Supplementary Fig. 2, Supplementary Note 1). (2) We also analyzed, using rigid models and molecular dynamics (MD), the dependence of structure-based DPs on computational antibody structure prediction methods (such as IgFold39 and AbodyBuilder238) (Supplementary Fig. 3, Supplementary Fig. 4, Supplementary Fig. 5) We found that the predominant structure prediction method used in this study (AbodyBuilder37) faithfully replicated conformations within the antibody structure conformational ensemble of a given antibody sequence as determined by MD (Supplementary Fig. 4, Supplementary Fig. 5, Supplementary Note 2), thus validating our strategy to compare antibody developability landscapes.

First, we inquired about the degree of correlation among different DPs within the native antibody dataset. To address this, we examined the pairwise correlation among the values of DPs by isotype and species for the full native dataset (Fig. 2a). Overall, we found that sequence-based parameters were significantly more correlated with one another compared to structure DPs across all isotypes of the native dataset, suggesting greater general association among sequence DPs (median absolute Pearson correlation coefficient 0.14–0.21 for sequence-based DPs, 0 for structure-based DPs, Fig. 2a). Although the degree of correlation for both sequence and structure DPs was relatively low (median absolute Pearson correlation ≤0.21), sequence-based parameters formed larger correlation clusters with high correlation values (>0.6) in the native human IgG dataset (Fig. 2b; boxed in black). Similar clustering patterns with high correlation values were found across non-IgG isotypes for the sequence-based DPs (Supplementary Fig. 6, Supplementary Fig. 7). As a control measure, we repeated this analysis on permuted DPs and reported a drastic loss of correlation and a significant change in the distribution of Pearson correlation coefficient values (Supplementary Fig. 8).

Fig. 2: Sequence-based developability parameters show higher redundancies compared to structure-based parameters.
figure 2

a Absolute pairwise Pearson correlation of sequence and structure developability parameters within the native antibody dataset. Numerical values on the figure represent the median of Pearson correlation for the corresponding subset. Differences were assessed using pairwise Mann-Whitney test with p-value adjustment (Benjamini-Hochberg method). ****p < 0.0001 (Human; IgD: 2.09e-112, IgM: 6.54e-104, IgG: 5.21e-100, IgA: 8.94e-101, IgE: 1.04e-94, IgK: 4.61e-74, IgL: 7.46e-80, Mouse; IgM: 1.276e-98, IgG: 4.76e-101, IgK: 6.9e-88, IgL: 6.68e-71). n = 785 sequence and 1035 structure biologically independent pairwise correlation experiments for each isotype and species combination. b Hierarchical clustering of 40 sequence and 46 structure developability parameters based on pairwise Pearson correlation for 170,473 IgG human antibodies (median of absolute Pearson correlation: 0.02 + /-0.003 SEM). As explained in the inset (top right), each cell within the heatmap reflects the value of Pearson correlation for a pair of DPs. Developability parameters are color-annotated with their corresponding level (sequence or structure), physicochemical property (as detailed in Supplementary Data 1), and dominance status from the ABC-EDA algorithm output at Pearson correlation coefficient threshold of 0.6 (see Methods). Black boxes highlight correlation clusters that contain more than three DPs and exhibit pairwise Pearson correlation coefficient > 0.6. Supplementary Figs. 610A.

Subsequently, we asked which DPs are redundant, as quantified by the previous analysis. To answer this question, we utilized the pairwise Pearson correlation coefficient values from Fig. 2a, b to construct undirected weighted network graphs (Supplementary Fig. 9A, B). Additionally, we employed the hybrid artificial bee colony—estimation of distribution hybrid algorithm (ABC-EDA59) to identify the minimum weighted dominating set (MWDS) among DPs (see Methods). In this context, the MWDS comprises the most uncorrelated DPs that sufficiently reflect the overall developability for a set of antibodies at a given Pearson correlation threshold59. When we conducted this analysis on the native IgG repertoire at an absolute correlation threshold of 0.6, we found that the pairwise relationships among DPs can form subnetworks, doublets, and isolated nodes (Fig. 2b, Supplementary Fig. 9B). At this threshold, subnetworks were largely formed by DPs that reflect similar physicochemical properties regardless of their level (sequence or structure—Supplementary Data 1). For instance, sequence- and structure-based charge and isoelectric point (electrochemical) DPs (n = 20) clustered with the acidic and basic amino acid composition DPs (Supplementary Fig. 9B). The largest subnetwork was predominantly occupied by sequence DPs. Meanwhile, structure-based DPs mainly formed isolated nodes (n = 17) and smaller subnetworks (Supplementary Fig. 9B).

When we repeated this analysis on all the isotypes of the native dataset (Supplementary Fig. 1A), we found that the proportion of isolated nodes to the initial DP count was consistently higher among structure-based DPs, starting from low correlation thresholds (0.1–0.2) across all isotypes, in comparison to sequence-based DPs (Supplementary Fig. 10A). For example, only 12.5%–22.5% of sequence-based DPs (5–9 out of 40) compared to 26.1%–41.3% of structure-based DPs (12–19 out of 46) were classified as isolated nodes at a Pearson correlation threshold of 0.6 (Supplementary Fig. 10A). Meanwhile, we found that the proportion of subnetwork dominant DPs was higher among sequence DPs across all isotypes for the higher correlation thresholds (Pearson correlation > 0.6). For example, 7.5%–12.5% of sequence-based DPs were categorized as dominant nodes at the strictest correlation threshold (0.9) in comparison to structure-based DPs (2.2%–6.5%) (Supplementary Fig. 10A). Collectively, these results emphasize the lower interdependence among structure-based DPs when compared to the sequence-based counterparts.

The native antibody dataset exhibits chain-type and species-specific developability signatures

Next, we asked to what extent the isotypes of the native datasets are similar to one another regarding DP redundancies (MWDS parameters) and associations (DP pairwise correlations). In relation to these driving questions, we also asked to what extent natural antibodies harbor chain (VH,VL) and species-specific (human, mouse) DP differences. To address these questions, we first investigated the similarities in parameter redundancies among the native dataset for a given Pearson correlation value (0.6). Specifically, we explored the pairwise intersection size of the MWDS parameters on both sequence and structure levels for the human and murine antibody datasets, featuring the common isotypes (IgM and IgG) between the two species in our dataset (Fig. 3a), and all the isotypes of the human VH antibodies (Supplementary Fig. 10B).

Fig. 3: The native (human and murine) antibody datasets exhibit chain-specific and species-specific developability signatures.
figure 3

a MWDS intersection size for the human and mouse native datasets. Numerical values on the figure reflect the MWDS count (for an individual subset) and intersection size (for more than one subset). The MWDS for the respective isotypes was identified using the ABC-EDA algorithm (see Methods) at a threshold of absolute Pearson correlation of 0.6. For MWDS intersection size among all human heavy chain subsets, please refer to Supplementary Fig. 10B. b Distance-based hierarchical clustering of isotype-specific pairwise DP correlation matrices (sequence and structure levels). The height of the dendrograms (shown to the left of the dendrograms) represents the correlation distance among the dendrogram tips. c Repertoire-wide principal component analysis (PCA) of the native antibody developability profiles. We performed this analysis for the complete native dataset (left pane; ~2 M sequences) and for the chain-specific datasets (right panels; ~1.2 M sequences in the top panel, ~0.8 M sequences in the bottom panel). The dimensionality of complex developability profiles was reduced to 2D PCA projections. The full value distribution of the corresponding PCs associated with each projection is shown in Supplementary Fig. 10C. Supplementary Fig. 10B, C.

This analysis revealed that both heavy (IgG and IgM) and light chain human antibody datasets displayed a larger MWDS intersection size on both sequence and structure levels than the murine counterparts (Fig. 3a). For instance, human IgM and IgG datasets shared 18 sequence DPs (86% overlap) and 29 structure DPs (90% overlap) in their MWDS sets, whereas the same heavy chain isotypes of the murine dataset shared only 12 sequence DPs (75% overlap) and 23 structure DPs (77% overlap) (Fig. 3a). Moreover, the human heavy chain dataset displayed greater or comparative MWDS intersection size even when considering all five antibody isotypes (71% overlap on sequence level, 84% overlap on structure level) in comparison to the murine heavy chain dataset (IgM and IgG only) (Fig. 3a and Supplementary Fig. 10B). Similarly, the MWDS overlap for the human light chains (IgK and IgL) was greater on both levels (15 DPs—71% overlap on sequence level and 32 DPs—97% overlap on structure level) in comparison to the mouse light chain dataset (14 sequence DPs—7% overlap and 29 structure DPs—88% overlap) (Fig. 3a). Thus, our findings suggest greater consistency among the isotypes in the human antibody dataset when it comes to DP redundancies (MWDS overlap), as opposed to the mouse antibody dataset.

Second, we sought to investigate the similarities in DP associations among the isotypes of the native antibody dataset. To this end, we clustered the isotype-specific datasets based on the distance of their pairwise DP correlation matrices (see Methods). This analysis revealed chain-specific segregation (heavy and light) and, within a given chain, species-specific segregation (human and mouse) of antibody subsets on the structural level (Fig. 3b, right panel). Additionally, the human dataset showed a closer distance among its isotypes within the heavy and light chain clusters (0.03 for heavy chain isotypes, 0.08 for light chain isotypes) than the mouse dataset (0.15 and 0.12 for heavy and light chain clusters, respectively). An equivalent sequence-based analysis (Fig. 3b, left panel) drew a similar conclusion regarding the uniqueness of chain-type developability. However, interspecies isotype-specific clustering occurred among the light chain subsets (Fig. 3b). Similarly to the structure-based DP association clustering, the human heavy chain isotypes showed a smaller distance among themselves (0.08) in comparison to the murine IgM and IgG subsets (0.14) (Fig. 3b). These findings suggest native (human and murine) datasets harbor chain-type-specific developability signatures. Species-specific developability differences were less pronounced, especially for the light-chain antibody subsets.

The aforementioned clustering of antibody datasets based on DP associations led us to investigate whether the antibody species and chain type are key sources of variance in DP values. To this end, we performed a dimensionality reduction analysis on the developability profiles of the native antibodies using a principal component (PC) analysis (PCA) (Fig. 3c). Examining the 2D PCA projections of developability profiles of all native antibodies (~2 M) further emphasized differences in antibody developability by antibody chain type (Fig. 3c). The axes of maximal variance (PC1 and PC2) separated antibody sequences by VH and VL chain (absolute differences of medians = 5.6 and 1.8, respectively—Supplementary Fig. 10C). A similar projection of each chain type subset (heavy; ~1.2 M, light; ~0.8 M) allowed for (partial) species-based distinction of antibodies (Fig. 3c). The influence of the antibody species on developability was more prominent among the heavy chain antibodies (absolute difference of PC1 medians = 4.1) in comparison to the light chain ones (absolute difference of PC1 medians = 3.2) (Supplementary Fig. 10C).

In summary, the isotypes of the human native dataset exhibit greater pairwise relatedness regarding their DP associations and redundancies compared to the murine dataset. Moreover, the antibody chain type and species of origin considerably influence its overall developability.

The sensitivity of sequence-based developability parameters is quantifiable by single-amino acid substitution analysis

Future antibody design will be performed in a multi-objective manner3,60, which means that the design approach optimizes many parameters at once. In certain cases, introducing minor changes might be sufficient to improve the value of a certain developability parameter. However, improving one parameter may compromise another3,61. Therefore, it is interesting to understand to what extent minor sequence changes impact DP values. To address this question, we performed a single-substitution sensitivity analysis of DPs by quantifying the changes in DP value induced by every possible amino acid alteration in the antibody sequence at a time (see Methods, Fig. 4a). Since there are no established methods and metrics to quantify the sensitivity of antibody developability parameters62, we employed two proxy measures to estimate DP sensitivity. First, we define the average sensitivity by the excess kurtosis, and secondly, the potential sensitivity as the range of a DP distribution of an antibody and all its possible single amino acid substituted variants (see Methods). Given the inability of current antibody structure tools, both template-based or de novo deep learning-based, to resolve differences between structures of antibodies with single amino acid variations (Supplementary Fig. 11, Supplementary Note 3), we focused our analysis on sequence-based DPs only.

Fig. 4: Developability parameter sensitivity can be quantified by analyzing mutated variants of wildtype antibodies.
figure 4

a DP values were computed for all possible single amino acid substituted mutants of 500 sampled wildtype human VH antibody sequences (100 sequences sampled per isotype; n = 301,777 independent mutants in total). Values of each DP were scaled and mean-centered. The sensitivity was quantified for each DP by analyzing the DP dispersion of the mutants from their corresponding wildtype. Average sensitivity was measured by excess kurtosis (small kurtosis = high average sensitivity), while potential sensitivity was measured by the range (see Methods). b Average and potential sensitivity of selected sequence-based DPs. c Average and potential sensitivity of DPs from (B) grouped by antibody region in which the mutation occurred. In both (b) and (c), numerical values on the x-axis represent the median of the corresponding sensitivity metric. Supplementary Figs. 11 and 12.

We linked the dispersion of a given DP value distribution (from mutated variants and their corresponding sampled wildtype) to its sensitivity. We employed two metrics to quantify this dispersion (see Methods). The first metric, excess kurtosis63, was implemented as a proxy measure for the average sensitivity. It describes how far the “tailedness” of a given distribution deviates from that of a Gaussian distribution (i.e., excess kurtosis = 0, Fig. 4a). In this context, a positive excess kurtosis indicates a low average sensitivity and a negative excess kurtosis implies a higher sensitivity with an increased proportion of mutant DP values diverging from the wildtype. Strictly, this is only valid under the assumption of a Gaussian distribution and should be considered when estimating sensitivity by excess kurtosis.

The second metric is the range, which is the absolute difference between the smallest and largest values of a distribution after normalization. It was used as a proxy measure for the potential sensitivity of DPs as it reflects the potential extreme changes that can be introduced on DP values as a factor of amino acid sequence change (Fig. 4a). Since only single amino acid substitutions were analyzed, we removed all DPs describing categorical amino acid composition (Supplementary Data 1) where it is trivial to predict the changes in DP values, as well as the length of the sequence (AbChain_length) since it is not affected by substitutions. Because DPs denoting a sequence’s charge at a given pH were clustered in three highly correlated clusters (Fig. 2b), we chose to retain only three of them, which represent acidic, neutral and basic pH (AbChain_4_charge, AbChain_7_charge and AbChain_12_charge respectively). Additionally, all DPs with ‘_percentextcoef’-suffix were analyzed instead of their highly correlated counterparts with ‘_molextcoef’-suffix (Supplementary Data 1).

We found the median excess kurtosis of most sequence-based DPs was >0 and that most DPs exhibit an average sensitivity close to that of a normal distribution (excess kurtosis of a normal distribution = 0, Fig. 4a, Supplementary Fig. 12A). In fact, some parameters, such as the molar extinction coefficient (of cysteine bridges) and the hydrophobic moment, were insensitive on average to substitutions as indicated by their high kurtosis (median excess kurtosis: 6.7, 49.02, and 29.49, respectively; Supplementary Fig. 12). Notably, none of the tested DPs displayed high average sensitivity (median excess kurtosis « 0, Fig. 4a, Supplementary Fig. 12), suggesting that developability is relatively stable to the average single amino acid mutation with few outliers. Nevertheless, since DP values were normalized, small relative shifts induced by a mutation may still have a large effect in practice.

Next, we show that the median range in sequence-based DPs varied between 0.38 for the molecular weight DP (AbChain_mw) and 3.82 for the hydrophobic moment DP (AbChain_hmom—Supplementary Fig. 12B). Notably, some DPs (such as AbChain_4_charge—Fig. 4b first column second row) have a constant or close to constant range, likely due to the fact that the set of all possible single amino acid substitutions covers the entire range of possible DP values. For example, when considering the charge of an antibody, the lowest possible charge results from substituting the most positively charged amino acid with the most negatively charged amino acid and vice versa. Since all amino acids are present in close to all sampled wildtype sequences, their ranges are almost identical as well. In DPs that depend on non-linear amino acid interactions, such as solubility (AbChain_solubility), the range was more diverse (2.33, Fig. 4b).

To investigate the impact of substitutions across antibody regions, we grouped DP values of human heavy chain mutants by the region in which the substitution occurred and calculated their sensitivity metrics separately. We observed that electrochemical DPs such as charge and hydrophobicity (AbChain_4_charge and AbChain_hydrophobicity) exhibited higher potential sensitivity (range) in CDR3 and framework regions (median ranges AbChain_4_charge and AbChain_hydrophobicity respectively; CDR3: 1.65, 1.39, FR1: 1.31, 1.27, FR2: 1.49, 1.4 and FR3: 1.65, 1.43) compared to CDR1 (0.83, 1.06) and CDR2 (1.16, 1.11) and FR4 (0.83, 1.06; Fig. 4c). Although we found the average sensitivity (excess kurtosis) to differ by antibody region (Fig. 4c), there was no apparent general rule that clearly separates framework regions from CDRs. Since short sequences have a higher probability of missing the most charged or polar residues, the potential sensitivities of a given region tend to be lower for shorter regions. The stark differences in range between heavy chain framework regions 1–3 and CDRHs 1–2 may thus be explained by the shorter sequence length of CDRH regions (with the exception of CDRH3).

In summary, our sensitivity analysis suggests that most DPs are comparably ‘normally’ sensitive (close to 0 excess kurtosis: mutant DP distribution as ‘tailed’ as a Gaussian distribution), and some parameters are especially insensitive to the average substitution. Additionally, although average and potential sensitivity differ by antibody region in which a mutation occurs, the differences are not generalizable across DPs.

Antibody sequence similarity does not imply antibody developability similarity

Given that the values of DPs were prone to change with minor sequence changes, we asked to what extent pairwise sequence similarity is related to pairwise developability profile similarity, where the developability profile (DPL) was defined as a numerical vector that carries (sequence and/or structure) DP values in a fixed order for a given antibody sequence (see Methods).

To this end, we first examined the pairwise correlation of antibody developability profiles (developability profile correlation: DPC) alongside the pairwise sequence similarity score (see Methods) for a random sample of 100 natural antibodies from the human IgM dataset that share the IGHV gene family annotation (Fig. 5a). This is to eliminate the role of the V-gene as a factor of variance in our analysis, as up to 80% of sequence similarity can be expected among antibodies that belong to the same IGHV gene family64,65. Sequence-level DPC clusters were often, but not always, accompanied by sequence similarity clusters, while structure-level DPC clusters were independent regarding sequence similarity clusters (Fig. 5a, Supplementary Fig. 13, Supplementary Fig. 14 and Supplementary Fig. 15). To quantify the association between the two metrics (DPC and sequence similarity), we computed the Pearson correlation coefficient between the pairwise DPC matrices and the pairwise sequence similarity matrices (Fig. 5b, Supplementary Fig. 16A). We repeated this analysis for 100 randomly sampled sets of 100 sequences each (within the same IGHV gene family). Samples were taken from all isotypes of the native dataset to account for variation of associations among batches. We restricted the single set (batch) size to 100 antibodies to ensure correlation matrix regularization66. We examined the resulting Pearson correlation values alongside the average sequence similarity score (100 values for each metric for the 100 sampled sets per isotype—Fig. 5b). We found that Pearson correlation coefficients (between DPC and average sequence similarity) tended to be higher on the sequence level (0.2–0.7) than on the structure level (0.1–0.4) across all antibody isotypes (Fig. 5b). For instance, the mean Pearson correlation coefficient for the human IgD dataset was 0.5 on the sequence level and 0.2 on the structure level (Fig. 5b). This finding suggested that similar sequences exhibit higher sequence-based developability similarity compared to structure-based developability similarity. However, higher sequence similarity was not always accompanied by higher Pearson correlation values of DP profiles. For example, although the average sequence similarity of the murine IgL dataset was as high as 0.9, the mean value of Pearson correlation coefficient was only 0.5 (Fig. 5b). We reported a similar Pearson correlation average (0.5) for the human IgE dataset, even though its mean sequence similarity was less than the IgL murine dataset (0.7, Fig. 5b).

Fig. 5: Developability profile similarity is not necessarily associated with sequence similarity.
figure 5

a Pairwise developability profile Pearson correlation (DPC—left panels) alongside the pairwise Levenshtein distance (LD) based-sequence similarity score (right panels—see Methods) for a random sample of n = 100 antibodies from the human IgM dataset (100 × 100 matrices) that share the same IGHV gene family (IGHV1) annotation (shown both for sequence and structure DPLs). Each row and each column represent a single antibody sequence. Rows and columns in the left panels were hierarchically clustered. In the right panels (sequence similarity), rows and columns were ordered in the same order as the corresponding left panel (DPC) for ease of comparison. The distribution of DPC and sequence similarity is shown in Supplementary Fig. 16A. b Pearson correlation between DPC and sequence similarity matrices for 100 sets of randomly sampled non-overlapping 100 antibody sequences (within the same IGHV gene family per batch) from all isotypes of the native dataset (total n = 100 independent experiments of 100 antibodies per experiment). Pearson correlation coefficient values (shown in beige) are presented alongside the corresponding mean sequence similarity values (shown in green) for the same 100 sets. The height of the bars and the numerical values on the figure reflect the mean of the corresponding metric (mean Pearson correlation and the mean sequence similarity). The error bars represent the standard deviation. c Principal component analysis (PCA) of the developability profiles of the native human heavy-chain dataset (n = ~0.8 M antibodies). The developability profiles (DPLs) were utilized as embeddings for this analysis (see Methods). Antibody clusters (1–7) were created for the groups of antibodies that are at least 75% similar in sequence (as determined by USEARCH) and contain at least 10 K antibodies. Antibodies that did not satisfy the clustering conditions were labeled as “non-clustered” (727861 sequences) and sent to the back layer of the figure. For antibody counts per cluster, please refer to Supplementary Fig. 16B. Supplementary Figs. 1316.

Next, we investigated the relationship between antibody developability and sequence similarity using a geometric approach to test the finding that developability profile similarity and sequence similarity are not necessarily associated (Fig. 5a, b). We leveraged the fact that antibody developability profiles are numerical vectors in the developability space (RN), where N represents the number of DPs that compose a single developability profile (see Methods). This space (RN) offers a natural notion of antibody developability diversity. However, it is challenging to inspect due to its high dimensionality. Thus, we applied a dimensionality reduction technique (principal component analysis: PCA) on the developability profile space of the human heavy-chain antibody dataset (Fig. 5c) as it is the largest data subset that belongs to a single species and chain type (~0.8 M antibodies, Supplementary Fig. 1A). Within this subset, we identified seven sequence similarity groups (1–7). Each group encapsulates at least 10 K antibodies and group members exhibit at least 75% pairwise sequence similarity (Supplementary Fig. 16B—see Methods). We found that antibody members that belong to the same sequence similarity group did not occupy a restricted space on the PCA projection plane, suggesting that antibody developability and sequence similarity are not correlated (Fig. 5c). To quantify this observation, we studied the correlation between the pairwise Euclidean distances in the developability space (RN) and the pairwise (normalized) Levenshtein distances for 5000 antibody sequences from the human IgM dataset that share the same IGHV gene family annotation (IGHV1) and belong to the same sequence similarity group (group 1—Supplementary Fig. 16B, C). We found that the two distance measures showed only minimal correlation (Pearson correlation coefficient = 0.18, Supplementary Fig. 16C). As Fv sequences belonging to the same IGHV gene family share high (up to 80%) sequence similarity64,65, most of which is attributed to conserved sequences in the frameworks, CDRH1 and CDRH2 regions67,68, the majority of sequence variation is credited to the CDRH3. As such, the lack of correlation between the pairwise Normalised Levenshtein distance and the pairwise Euclidean distance in the developability space for these antibodies (Supplementary Fig. 16C) indicates an independence of developability profile and CDRH3 sequence similarity among antibodies of the native dataset.

In conclusion, our analysis demonstrated that antibody developability and sequence similarity were largely independent, suggesting that improving the developability profile for a certain therapeutic mAb candidate with a desirable target binding profile may be possible by introducing small changes in its amino acid sequence.

DP predictability implies interdependence and antibody design space restriction

Building on the prior finding that no significant association exists between antibody sequence similarity and developability similarity, we inquired whether missing values of given DPs can be predicted based on the knowledge of the values of other DPs. This is to investigate to what extent the developability space is amenable to orthogonal DP design (Fig. 6a). In this context, high predictability of a given DP would indicate a restriction of the antibody design space, while low predictability could signify a more plastic space with a higher degree of freedom for the values of this DP. Answering the above question would also provide insights into which DPs can be better predicted (with the provided knowledge of the remaining DPs), which may accelerate antibody developability screening. Also, we evaluated the predictability of missing DP values depending on the sole knowledge of the amino acid sequences of antibodies (through their protein language model representations, Fig. 6a). This aims to investigate the feasibility of DP predictability in the absence of other DP data.

Fig. 6: Sequence-based developability parameters are more predictable than structure-based parameters.
figure 6

a Graphical representation of machine learning (ML) approaches used to assess the predictability of DPs. We investigated two scenarios where the missing (deleted) DP values were either all from one (single) DP (ML Task 1) or were randomly missing from several DPs (ML Task 2). For ML Task 1, we compared the predictive accuracy of two different embeddings; single-DP-wise incomplete developability profiles (DPLs) (embedding 1; order of magnitude 101) and PLM vectors (embedding 2; order of magnitude 103). We used these embeddings to train multiple linear regression (MLR) models (separately) to predict the missing DP values in the test set. To enable the comparison between these two embeddings, we used identical training subsamples (in regards to size and antibody identity, see Methods). For ML Task 2, we used cross-DP-wise incomplete developability profiles as input for the multivariate imputation by chained random forests (MICRF) algorithm to predict missing DP values. For both ML tasks, we estimated the prediction accuracy by computing the coefficient of determination (R2) using observed and predicted DP values171. b Comparison of the predictive accuracy of incomplete developability profiles (single-DP-wise incomplete DPLs) and PLM vectors as embeddings for MLR models to predict the values of missing DPs in the test set (ML Task 1). The x-axis reflects the number of antibody sequences (sample size) used for the embedding. For each sample size, we repeated the prediction of missing DPs 20 times (n = 20 independent experiments). The y-axis represents the mean R2 for sequence DPs (left facet) and structure DPs (right facet). Error bars represent the standard deviation of R2. Missing DPs tested in this analysis belonged to the MWDS exclusively, as determined at a Pearson correlation coefficient threshold of 0.6, for the human IgG dataset, summing to 13 sequence DPs and 28 structure DPs (after removing a single element from each doublet and immunogenicity DPs, Supplementary Table 2). c Evaluating the predictability of randomly missing DP values using the MICRF algorithm where cross-DP-wise incomplete developability profiles are used as embeddings. The x-axis reflects the number of antibodies (sample size) used for the embedding. For each sample size, we repeated the prediction of missing DPs 20 times (n = 20 independent experiments). The y-axis represents the mean R2 for sequence DPs (left facet) and structure DPs (right facet) when the proportion of the missing data is either 2% (light blue line) or 4% (dark blue line). Missing DPs tested in this analysis belonged to the MWDS, analogously to (b). Numbers on the x-axis in both (b) and (c) reflect the average values of mean R2. Supplementary Fig. 17.

Importantly, the primary objective of this analysis is to evaluate how various factors (type of ML input, type of predicted DP, the redundancy state of the predicted DPs) affect the predictability of DPs rather than achieving absolute predictive precision given that we did not perform a comprehensive benchmarking of encoding or ML approaches69,70. Ultimately, when comprehensive benchmarking and fine-tuning of multitask ML models is performed on developability data, it might be possible to depend on these models for DP value predictability rather than relying on several in silico tools to achieve the same task.

We initially used MWDS DPs to eliminate collinear dependence that may increase ML prediction accuracy in a non-interesting or trivial manner. To this end, we investigated two scenarios where the missing (deleted) DP values were either all from a single DP (ML Task 1), or randomly missing from several DPs (ML Task 2) (Fig. 6a). In practical terms, ML Task 1 replicates a use case where the values of a single DP—that might be challenging to compute or measure—are missing from all antibodies in the dataset. Meanwhile, as “spotted” (i.e., missing-at-random) data is a real-world problem in biomedical research71,72,73, ML Task 2 replicates the use case of obtaining a developability dataset where the values of several DPs are sporadically missing from the antibodies (Fig. 6a).

For ML Task 1, we compared the predictive accuracy of two types of (input) embeddings to predict the missing DP values via multiple linear regression (MLR) models after defining training and test (sub)sets from the native human VH antibody dataset (Fig. 6a, see Methods). (i) The first embedding is the single-DP-wise incomplete developability profiles (DPLs) which depend on the knowledge of all other DP values included in the training set (low-dimensional embedding—order of magnitude ≈ 101—Fig. 6a). (ii) The second embedding is an amino acid sequence encoding output produced by the protein language model (PLM) ESM-1v, which generates large semantically-rich digital representations of antibodies (high-dimensional embedding—order of magnitude ≈ 103)74,75,76. Unlike DPL-based embedding, PLM-based embedding is entirely unaware of antibody DP values (Fig. 6a) and we aim to test whether these biochemical properties are implicitly contained in it.

We found that the predictive accuracy of MLR models increased with increasing training set size for both embeddings. However, DPL-based models reached their saturation point earlier than PLM-based ones for both sequence and structure DPs (1000 antibodies for DPLs, 20000 for PLM—Fig. 6b). This is also (partly) attributable to the fact that higher dimensional inputs (PLM embeddings) correspond to additional degrees of freedom when training the model. Overall, both embeddings achieved higher prediction accuracy for sequence DPs compared to structure DPs at their saturation points (Fig. 6b). However, the disparity in prediction accuracy between the two embeddings was more pronounced for sequence DPs, with PLM-based embeddings achieving a mean prediction accuracy of 0.92 compared to 0.34 for DPL-based ones (Fig. 6b). Thus, the high predictability enabled by PLM-based embedding on sequence level highlighted its capacity to capture the biophysical properties of antibodies based on amino acid sequence77.

When conducting an analogous analysis including non-MWDS (redundant) DPs, we found notable improvement in DPL-based MLR models to predict missing DP values (mean R2 of 0.88 for sequence DPs and 0.36 for structure DPs) (Supplementary Fig. 17A). This is due to the inherent collinearity among non-MWDs DPs (Fig. 2b) that simplifies DPL-based prediction, resulting in higher prediction accuracy78. In contrast, PLM-based embeddings showed far lesser to no distinct improvements when predicting non-MWDS DPs (mean R2 of 0.96 for sequence DPs and 0.38 for structure DPs) (Supplementary Fig. 17A).

For ML Task 2, we implemented the multivariate imputation by chained random forests (MICRF) algorithm72,79 to evaluate its prediction accuracy to recover randomly missing (deleted) DP values from the developability dataset (Fig. 6A—see Methods). We investigated two cases where either 2% or 4% of DP values are missing from either sequence or structure MWDS parameters (Fig. 6C), as implementing the MICRF algorithm with a greater fraction of missing data could compromise the accuracy and reliability of the imputed (predicted) DP values79. Overall, and similarly to the observations reported from ML Task 1 (Fig. 6b), structure DPs were more challenging to predict, and a larger number of data inputs (antibodies) aided the achievement of higher prediction accuracies (Fig. 6c). However, we noticed (only) subtle differences in the prediction accuracy (mean R2) when comparing the algorithm capacity to restore 2% or 4% of missing DP values (Fig. 6c), outlining its robustness within the advised data loss limits for its application80,81. For example, we reported mean R2 of 0.64 (sequence DPs) and 0.42 (structure DPs) with 2% data loss compared to 0.59 (sequence DPs) and 0.4 (structure DPs) with 4% data loss, at a saturation point of 20000 antibodies (Fig. 6c).

Similarly to ML Task 1, non-MWDS DPs were shown to be easier to predict (Supplementary Fig. 17B). However, the disparity in prediction accuracy between the two classes of DPs (MWDS and non-MWDS) was less pronounced in comparison to DPL-based predictions in ML Task 1. For instance, at 2% data loss, the improvement in prediction accuracy of sequence DPs was (only) ~0.2 higher (0.85 for non-MWDS, 0.64 for MWDS—Supplementary Fig. 17B) compared to ~0.6 in DPL-based predictions in ML Task 1 (0.88 for non-MWDS, 0.34 for MWDS Supplementary Fig. 17A).

Of note, we performed ablation studies82 on both ML tasks, by randomly permuting the values of DPs at the columns in the input data (feature shuffling), and confirmed that the prediction accuracy was diminished (R2 ≤ 0, Fig. 6b, c, see Methods).

In summary, for the ML methods and embeddings applied in this analysis, we found that the structural developability space is less restricted. Additional analyses suggested that variations in structure prediction alone cannot explain this (Supplementary Notes 23). Additionally, this analysis highlighted the potential of PLMs to reflect the sequence-based aspects of antibody developability, if provided with sufficient training data. We want to emphasize that our main goal was not obtaining perfect prediction accuracy of DPs but rather measuring the amount of readily available biological information contained in the two representations examined, i.e. DPL- and PLM-based representations. Finally, we also addressed the presence of potential gaps in developability information in practical settings by showing how two different ML algorithms can be utilized in two different scenarios of missing data.

Patent-submitted, humanized mouse (Kymouse) and therapeutic monoclonal antibodies represent a subset of the natural developability atlas

Previously, clinically approved therapeutic antibodies were used to build classifiers for optimal developability7,36. However, the higher abundance of native antibody data incentivizes the investigation of how human-engineered antibodies relate to their native counterparts. Thus, we investigated to what extent we can detect differences between native and human-engineered antibodies (therapeutic mAbs, Kymouse and PAD). To this end, we examined the proportional abundance of liability sequence motifs and the representation of developable germlines across the native and human-engineered datasets (Supplementary Fig. 18). Liability motifs are short amino acid sequences, which may negatively impact various aspects of antibody developability when present in their CDRs83. Developable germlines represent a group of human immunoglobulin genes (VH and VL), which have been suggested to harbor favorable biophysical properties84,85.

The native dataset was comparable to the human-engineered ones regarding the proportion of antibodies that exhibit liability motifs, and no consistent increased or decreased representation (trend) of these motifs was reported (Supplementary Fig. 18A). For instance, while the native human VL dataset exhibited a higher abundance of the asparagine deamidation motif “NS” (17.8%) compared to PAD and mAbs (15.7%, 16.6%), it exhibited a lower abundance of the proteolysis motif “DP” (0.5%, 2%, 3.8%—Supplementary Fig. 18A). Similarly, the native dataset harbored a comparable representation of the developable germline genes with the exception of IGHV1 members (5.2% in native, 3.1% in Kymouse, 16.6% in PAD, and 21.6% in mAbs) and IGK1 members (13.3% in native, 20.8 in PAD, and 25.9 in mAbs—Supplementary Fig. 18B). These findings suggest a notable resemblance between the native and human-engineered antibody datasets with regard to potential developability-related liability sequence motifs and germline annotations.

Thus, we asked how the native antibody dataset relates to the human-engineered ones in terms of DP values. To this end, we first conducted a distance-based clustering (similar to Fig. 3b) starting from the pairwise DP correlation matrices of the native and human-engineered antibody datasets (Fig. 7a). On the sequence level, we found that the general species-specific (human and mouse) and chain-specific (light and heavy) clustering, as previously reported in our investigation of the native dataset (Fig. 3b; left panel), remained consistent when integrating native with human-engineered antibody datasets (Fig. 7a; top panel). Specifically, the human antibodies (light and heavy chains) from the PAD and mAbs datasets localized within the native human clusters of the same chain type (correlation distance ≤ 0.1—Fig. 7a; top panel). Similarly, the light-chain human-engineered murine antibodies (PAD and mAbs) clustered with the IgK native mouse dataset (correlation distance ≈ 0.1—Fig. 7a; top panel). Kymouse antibodies were the most distant subset, which may be explained by their unique intermediate diversity between mice and humans86.

Fig. 7: Human-engineered antibodies are contained in the developability landscape of natural antibodies.
figure 7

a Distance-based hierarchical clustering of isotype-specific pairwise DP correlation matrices (sequence and structure levels—similar to the analysis shown in Fig. 3a). The height of the dendrograms (shown to the left of the figure) represents the correlation distance among the dendrogram tips. The dashed square in the right (structure-based) panel highlights the native-only dataset. b Top three panels: The positioning of the human-aligned human-engineered VH antibodies (Kymouse; 209,452, PAD; 99,213 and therapeutic mAbs; 329) in the developability profile space of the native human VH dataset (854,418 antibodies) based on a principal component analysis (PCA, see Methods). Bottom two panels: The positioning of the human-aligned human-engineered VL antibodies (PAD; 78,921 and therapeutic mAbs; 320) in the developability space of the native human VL dataset (385,633 antibodies). The hexagonal bins (shown in the back layer) represent the count of native antibodies (scale shown on the top right of the panels), and the human-engineered antibodies are represented as data points. c Evaluating the predictability of sequence (left panel) and structure DPs of the human-aligned human-engineered VH antibodies (Kymouse; 209,452, PAD; 99,213 and therapeutic mAbs; 329), using multiple linear regression (MLR) models trained on native human VH antibodies. As explained in Fig. 6a (ML Task 1), the predictive accuracy of two types of embeddings was tested, including single-DP-wise incomplete developability profiles (DPLs) and ESM-1v protein language model encoding vectors (PLM). MLR models were trained using 1000 antibodies for DPL-based predictions and 20000 antibodies for PLM-based predictions (respective saturation points). Missing DPs tested in this analysis belonged to the MWDS exclusively as determined at a Pearson correlation coefficient threshold of 0.6, for the native human IgG dataset, summing to 13 sequence DPs and 28 structure DPs (Supplementary Table 2). The y-axis represents the mean coefficient of determination (R2) across 20 repetitions (n = 20 independent experiments). Numerical values shown represent the average values of mean R2 across (sequence or structure) DPs. Supplementary Figs. 1821.

On the structure level, we found that the original clustering pattern among the native antibody isotypes is preserved from the analysis conducted on the native-only dataset (Fig. 3b, Fig. 7a; “Native only” zone). Human-engineered subsets clustered apart from the “Native only” zone, maintaining either species-specific or chain-specific clustering. Although suggestive, the results of this analysis should be interpreted with caution due to the imbalance of dataset sizes (number of antibodies) (Supplementary Fig. 1A, B). Indeed, a correlation matrix stabilization study suggested that a dataset size of at least 50 K antibodies is required to stabilize association values among DPs (Supplementary Fig. 19A, C) and the distribution of their pairwise Pearson correlation values (Supplementary Fig. 19A, B, D). This threshold (50 K antibodies) is higher than the count of antibodies included in the murine subsets of the PAD dataset (≈ 24 K for heavy chains, ≈ 21 K for light chains) and all the species-specific and chain-specific subsets within the mAbs dataset (329 for human heavy chains, 320 for human light chains, 62 for murine heavy chains, 71 for murine light chains) (Supplementary Fig. 1B).

Next, we leveraged the principal component analysis (PCA) that we conducted on the developability profiles of native antibodies (Fig. 5c—see Methods) to examine how human-engineered antibodies relate to the native ones in the developability spaces (VH; Fig. 7b: top panels, VL; Fig. 7b: bottom panels). In addition to comparing native and human-engineered antibodies in the developability profile space (dimensionality: 101), we also used sequence embeddings provided by the ESM-1v75 protein language model (PLM, dimensionality: 103) (Supplementary Fig. 20B, Supplementary Fig. 21B). PLM embedding is an alternative to biologically motivated features (such as DPL), learned without supervision from a large pool of proteins87. Protein language modeling enables capturing longer-distance relationships within protein sequences88,89,90. We focused our analysis on human antibodies as they represent the largest species-specific subset among our datasets (≈0.8 M for native VH antibodies, ≈ 0.4 M for native VL antibodies, 99213 for PAD VH antibodies, 78921 for PAD VL antibodies, 329 for VH mAbs and 320 VL mAbs—Fig. 7B, Supplementary Fig. 1A, B).

Overall, we found that human-engineered antibodies (both VH and VL) are majorly contained within both the developability and PLM landscapes of the native antibodies (Fig. 7b, Supplementary Fig. 20B, Supplementary Fig. 21B), suggesting that—for the DPs included in our analysis (see Methods)—the developability and sequence landscapes of human-engineered antibodies merely occupy subspaces of the natural space (in terms of the two main axes of variation studied).

From the native repertoire perspective, VH antibodies coalesced into a single cluster in the VH developability space, and the positioning of the antibodies in this space was independent of both their isotype and IGHV gene family annotation (Supplementary Fig. 20A). In contrast to VH antibodies, native VL antibodies clustered in two distinct clusters in the VL developability space where the majority of IgK antibodies (99.999%) and IgL antibodies (95.6%) occupied the bottom cluster, and a small proportion of IgL antibodies (4.4%) predominantly occupied the top cluster (Supplementary Fig. 21A). This finding suggests that, except for a minor subset of native IgL antibodies, VL sequences (both IgK and IgL) are homogeneous with respect to their developability. This aligns with a recent report where, using the therapeutic antibody profiler (TAP), it was found that native IgL antibodies exhibited comparable structural developability characteristics to those of native IgK antibodies36. Among the human-engineered antibodies, only two (out of 320 VL) therapeutic mAbs (Erenumab and Bentracimab) of the isotype IgL, and only 1345 (1342 IgL and 3 IgK, out of 78,291 VL) patent-submitted antibodies were found to be contained in the top cluster (Fig. 7b), displaying a similar distribution trend between the two clusters as the native VL dataset. It is worth noting that previous studies (including TAP) have previously reported Erenumab to exhibit developability risk factors7,10, which may explain its localization in the top cluster of the VL developability space (Fig. 7b). Bentracimab was not included in these studies as it is not yet clinically approved (Phase III of clinical development as of September 2023)91.

As the human-engineered antibodies were shown to harbor comparable developability and sequence properties to those of the native ones (Fig. 7b, Supplementary Fig. 20B, Supplementary Fig. 21B), we investigated the generalizability of the native-trained multiple linear regression (MLR) models to predict the values of single missing DPs of the human-engineered datasets (Fig. 7c). Specifically, we implemented the MLR models from ML Task 1 (Fig. 6a and b) at the training sample size, which achieved the highest prediction accuracy before plateauing (1000 antibodies for DPL-based predictions, 20 K antibodies for PLM-based predictions) to predict the values of MWDS DPs (sequence and structure) on the human-engineered antibody datasets (Fig. 7c).

We found that the predictability of DP values for the human-engineered antibodies was similar to that of the native antibodies (Fig. 6b, Fig. 7c). For instance, the mean prediction accuracy (mean R2) of sequence DPs was between 0.25–0.33 for DPL-based predictions, and between 0.80–0.87 for PLM-based predictions among human-engineered datasets (Fig. 7c) in comparison to 0.34 and 0.92 (DPL and PLM respectively) for native antibodies (Fig. 6b). Similarly, when considering human-engineered datasets, the mean prediction accuracy of structure DPs ranged from 0.10 to 0.12 for DPL-based predictions and from 0.15 to 0.24 for PLM-based predictions (Fig. 7c). In comparison, the native antibodies exhibited prediction accuracies of 0.12 and 0.27 (respectively—Fig. 7c).

Collectively, our results suggest that the knowledge learned from the native antibodies in regards to their developability and sequence properties is, in part, generalizable to the human-engineered antibody datasets. Nevertheless, it is worth reiterating that the predictability analysis performed on the native and human-engineered datasets aims to inform experimental design and investigate factors of generalizability rather than improving predictability.

Discussion

Previous studies have shown that several experimental DPs may be computationally inferred7,12,32,56,60, which supports the real-world relevance of computer-based developability screening of large antibody sequence libraries. However, discrepancies between computational and experimental DP profiling remain18,30. Furthermore, previous studies using computational profiling varied in the number of DPs, ranging from <10 to >50010,55,57. In this study, we used 86 sequence and structure-based DPs, many of which were pairwise lowly or uncorrelated to one another (Fig. 2a and b,, Supplementary Fig. 6, Supplementary Fig. 7, Supplementary Fig. 9), thus capturing at least a subspace of the multidimensional developability space. So far, the relevant dimensionality of the antibody developability space remains unclear. That said, our analysis can be replicated with any other set of computationally or experimentally determined DPs. While most previous studies focus on small antibody datasets and mostly on the comparison between experimental and computational DPs, this study explores the plasticity of the in silico DP space on large-scale antibody datasets. While our findings may partly depend on the DPs studied, our conclusions derived from DP profiling (Fig. 7b) are consistent with those from PLM-based profiling (Supplementary Fig. 20B, Supplementary Fig. 21B). Specifically, we observe that in both representations, human-engineered antibodies are predominantly localized within regions occupied by natural antibodies and display a tendency to cluster in specific areas rather than dispersing uniformly throughout the space. In other words, our study should be understood as forecasting the types of analyses possible once large-scale experimental or highly validated computational DPs exist, allowing for routine repertoire-scale DP profiling and prediction. In summary, the real-world relevance of our systems approach is grounded in (i) the profiling of a large number of pairwise-independent DP parameters, (ii) sequence and rigid and dynamic structure-based DP analysis (Supplementary Note 2), (iii) diverse experimental datasets ranging from native to human-engineered, and (iv) parameter-independent computational and machine learning analysis. In the future, it would be of interest to add display libraries for comparison92,93,94 as well as a larger number of paired datasets with broad and deep isotype information95,96.

Intercorrelation analysis revealed that structure-based DPs show higher independence (lower pairwise correlation) than sequence-based DPs (Fig. 2). These findings emphasize the significance of structural consideration for therapeutic antibody design3,8,25,97,98. Due to the scarcity of antibody structural information, structure-based developability estimation has previously presented a substantial challenge in developability screening3,20. However, in silico high-throughput antibody structure prediction tools have evolved in speed and accuracy with machine learning algorithms, permitting the screening of structural parameters in large datasets9,38,39,99.

The scale of this work’s analysis facilitated the discovery of parameter redundancies that could accelerate future antibody developability screening processes. For instance, while Chen and colleagues included both the molar extinction coefficient and the extinction coefficient of the variable region sequence (AbChain_molextcoef, AbChain_percentextcoef) and the cysteine bridges (AbChain_cysbridges_molextcoef, AbChain_cysbridges_percenextcoef) as important developability predictors57, we found that one of these coefficients could likely be sufficient to replace the other. Similarly, although Ahmed and colleagues highlighted the importance of structure-based isoelectric point (pI) as an essential developability parameter on a limited-size therapeutic antibody dataset (77 clinical-stage antibodies)10, our analysis suggested that sequence-based pI (AbChain_pI) could potentially replace structure-based pI. We further highlighted the importance of large-scale antibody developability data to stabilize the associations among DPs (Supplementary Fig. 19), and, thus, to reveal such redundancies. Additionally, most developability studies emphasize the relevance of antibody charge in the physiological pH (7.4)7. However, mAbs are usually exposed to a wider range of solution pH (4.8-9) during production and formulation10,100. Also, antibody variable region charge in acidic pH has proven to be a critical factor in IgG mAb pharmacokinetics101,102,103 and product formulation28. Therefore, we included the sequence-based variable region charge measures in 14 pH points (1-14) in our analysis. We found that the charge measures of native IgG sequences formed three correlation clusters with intermediate (0.4–0.7) pairwise correlations, emphasizing the importance of antibody charge considerations on a wider spectrum of pH values (Fig. 2b). Other pairs of potential parameter redundancies are the sequence content of polar and non-polar amino acids (AbChain_polar_content Vs AbChain_nonpolar_content), the sequence content of aliphatic amino acids and the sequence aliphatic index (AbChain_aliphatic_content Vs AbChain_aliphatic_index), the average atomic interaction distance of the antibody structure and the number of Van der Waals clashes (AbStruc_mean_interaction_distance Vs AbStruc_vdw_clashes) and the sequence length and the sequence molecular weight (AbChain_mw Vs AbChain_length) (Fig. 2A). Redundancy-based parameter reduction, analogous to feature selection for ML models, would accelerate future antibody developability investigations by screening for a more comprehensive and sufficiently representative set of parameters.

Our analyses revealed chain-specific developability signatures in relation to DP values and pairwise associations (Fig. 3b, c), emphasizing possible differences in developability design considerations for therapeutic antibody development. Within each chain type, we found that murine and human antibodies occupy distinguishable developability spaces (Fig. 3c), highlighting the importance of transgenic mice for antibody screening and the challenges of antibody humanization efforts104,105,106,107. Interestingly, our findings suggest that the Kymouse (humanized mice) dataset under investigation undersampled the human dataset, both with respect to developability (Fig. 7b) and sequence spaces (Supplementary Fig. 20B), even though lower-level features such as VDJ gene usage and CDR3 length were previously found to overlap86.

In addition to chain type and species, we found that human heavy chain (VH) isotypes harbor high similarities in regard to their pairwise DP associations (Fig. 3b) and redundancies (Supplementary Fig. 10B, Fig. 3a). We found that they aggregated homogeneously in the developability space regardless of their isotype (Supplementary Fig. 20A). Thus, although all currently approved therapeutic mAbs belong to the IgG isotype108, our findings provide the incentive to explore the available native antibody Fv sequence space beyond the isotype annotation for novel mAb discovery. In regards to VL isotypes, human IgK and the majority of IgL antibodies clustered together in the developability space (Supplementary Fig. 21A), questioning the previous association of IgL antibodies with poor developability, in line with recent findings by Raybould and colleagues36, and providing an incentive to re-include IgL sequences in future antibody discovery libraries. Nevertheless, we are aware that our findings involve only the Fv regions of antibody sequences as current antibody-specific structure prediction models do not take into account the Fc region and do not account for the impact of the Fc region on developability103,109.

When examining the impact of sequence similarity on developability similarity, we found that antibodies that are highly similar in sequence can possess dissimilar developability profiles (Fig. 5), which is in line with previous findings110. This suggests that there are degrees of freedom available for therapeutic antibody candidate engineering to optimize developability with minimal changes in antibody sequence.

For instance, if an antibody candidate exhibited optimal antigen binding properties with a suboptimal developability profile, minor sequence changes could result in a substantial improvement (or deterioration) of this profile without major changes to its antigen binding properties. In this context, Petersen and colleagues showed that the native human antibody repertoire can aid the identification of advantageous (or disadvantageous) “universal” (native) framework mutations that could facilitate therapeutic mAb development54. Furthermore, small changes in antibody sequence with large effects on function have been observed for both developability and antibody binding properties3,111, underscoring the importance of simultaneous optimization of multiple design parameters. Since optimization of a specific property often comes with undesirable trade-offs for other properties29,61, studying the effect of substitutions can help with designing antibodies that exhibit improved developability with comparable efficacy.

To directly probe the relationship of sequence to developability, we performed a single amino acid substitution analysis (see Methods). Although changing one amino acid at a time only explores a small fraction of the input factor space62, it allows for the definitive attribution of the change in output to a specific amino acid substitution. Although including all possible variants with two or more substituted amino acids could account for epistatic mutation effects, it would geometrically inflate the combinatorial space beyond current computational feasibility. Because exhaustively mapping the DPs of such large sequence spaces is (currently) computationally infeasible, to explore more than single amino acid substitutions, one must find ways to efficiently and uniformly sample the input space, possibly through latin hypercube sampling or low-discrepancy sequences112,113. Briefly, we found that across all possible single amino acid substituted variants of a sample of 500 human heavy chain antibodies some parameters were especially insensitive to the average mutation. Furthermore, to quantify the sensitivity of a parameter, we measured the dispersion of parameter values of all possible single amino acid substituted variants of an antibody. We used “tailedness” (measured by excess kurtosis) and the range of the distributions as proxy measures for average and maximum potential sensitivity, respectively. Since excess kurtosis is, strictly speaking, defined for normal distributions, its interpretive power depends on the nature of the observed parameter. It may be beneficial in future work to consider additional properties of distributions, such as skewness or diversity (as measured by Shannon entropy). Despite the apparent low sensitivity of charge and hydrophobicity DPs, they have been shown to impact the developability of monoclonal antibodies greatly100. This is, in fact, not contradictory since the “tailedness” does not describe absolute DP shifts, but the proportion of outliers of a DP distribution (average sensitivity) or the range proportional to the original wildtype (WT) sequences (potential sensitivity). Since the sensitivity metrics are an aggregate score averaged over all variants of the sampled antibodies, they serve as a rough global characterization of DPs. Drastic property changes still require more comprehensive sequence- and structure-based local approaches in individual antibodies to model114. Prospectively, although we only studied general trends in sensitivity from the perspective of individual developability parameters, it would be interesting to explore how a mutation impacts values across all DP of an individual antibody.

In this work, we did not perform structure-based substitution analysis. Specifically, given that there is a lack of experimentally determined structures of antibody variants, it remains unclear how accurately antibody structure prediction tools can resolve single amino acid structural differences and reflect them in models (Supplementary Fig. 11, Supplementary Note 3)115.

More generally, in the context of classical MD simulations of antibodies, the conformational landscape typically exhibits minor variations compared to the initial rigid structure116. In MD simulations of five experimentally determined antibodies without antigen (Supplementary Note 2), we found that the structural fluctuations in the CDRH3 loop regions were generally minimal (on the order of 0.1 Å) throughout 100 ns simulations (as determined by RMSD)117. However, the structural variance between the initial (rigid) antibody conformation and the relaxed version was between 1.2 Å and 1.7 Å on average (Supplementary Fig. 4A, Supplementary Note 2). Given the relatively small magnitude of CDRH3 fluctuations and the current limitations of structure prediction methods, it is challenging to detect and accurately represent these small conformational changes in rigid antibody models. The accuracy of current structure prediction tools typically is below the range of these small fluctuations118. However, when it comes to studies involving antibody variants, we cannot overlook these fluctuations, as they reflect the effects of mutations. These computational structure analyses should be combined with experimentally determined antibody structures, such as X-ray or cryo-EM data. While experimental structures contain variations due to differences in crystallization conditions, resolution levels, and inherent protein dynamics, making the same structure slightly different in various studies, computationally predicted structures lack the experimental-related noise found in real structures, causing all models to appear identical outside the mutation site. Given these limitations, we anticipate that the differences in distances between variants, specifically at the mutation site, will be lower than expected in experimental variants and greater than expected in computational models. Our findings suggest that the reality of variant effects lies between the generated models and structures observed experimentally. Furthermore, due to the dynamic nature of proteins, simulated or experimentally determined structural differences of the same protein can overshadow differences caused by mutations, especially in highly flexible antibody loops like CDRH3.

ML models have been shown to be able to predict the biophysical characteristics of proteins29,31,119,120. While their use may not be widespread in the pharmaceutical industry, the emergence of AI/ ML may become routine as part of initial in silico efforts to screen and assess molecular properties and interactions before any experimental efforts23. To efficiently determine biological (and, possibly, clinical) properties of antibodies using machine learning-based methods, it is important first to identify a suitable representation. In our study, we compare two alternative embedding types, one (DPL), which collects (a subset of) DPs into a numerical vector and the other (PLM) obtained by encoding the antibody sequence using a pre-trained neural network (NNs). The former representation mirrors feature-selection approaches to DP prediction, while the latter embraces the latest advances in deep NNs and, in particular, transformer-based models. We compute DP predictions by passing DPL and PLM embedding through linear regressor heads. As shown in Fig. 6b, the predictive power of our models is limited and this holds especially for DPL predictors, even on sequence DPs, which are instead well predicted using the PLM representation. The stark difference in performance is not surprising and can be understood as the combination of two facts. First, we observe that our use of linear heads is a restrictive architectural choice, which is likely over-simplistic: it is natural to expect that most—even though not all—DPs cannot be written as a linear combination of the remaining ones. This limitation is exacerbated by multiple collinear features (as in the full developability profile, Fig. 2) and by a reduced number of dimensions (the number of coordinates of DPLs vs PLMs span two orders of magnitude). Second, as visible in Supplementary Fig. 20C (right), PLM representations retain considerable information about sequence identity and they can be thought of as a learned map from sequences to a latent space. They are, therefore, clearly more suitable to derive sequence DPs and, as our experiments find, also structure DPs, although to a lesser extent. Overall, PLM-based DP predictions generalize well across human-engineered datasets, hinting at a modest amount of overfitting. This, paired with the observation that regressor quality improves with bigger train set sizes (no clear plateaus found for the cardinalities tested in Fig. 6b), indicates a healthy learning trend and suggests that predictions may benefit from more advanced head architectures.

DP predictors also shed light on the relationship between different groups of DPs, e.g., between sequence and structure DPs, and MWDS vs. non-MWDS ones. Both DPL- and PLM-based regressors fail in inferring most structure-based parameters and, with few exceptions (notably AbStruc_psi_angle and AbStruc_unfolded_pI), succeed on the same DPs (e.g., AbStruc_loops, AbStruc_beta_strands, AbStruc_sasa, AbStruc_unfolded_pI, AbStruc_pcharge_hetrgen, AbStruc_psi_angle). This seems to suggest the failure of both representations to capture the complex relations between sequences in linear subspaces of the latent space. We also remark that although the lower correlation between structure DPs seen in Fig. 2A may suggest that noise introduced by structure prediction tools also contributes to the weak prediction outcomes, we were unable to obtain better scores when predicting structure DPs computed on the (few) available solved antibody structures.

Of note, the ML methods employed in this work are by no means exhaustive and only represent a first step toward ML-based DP predictability. It is of interest that already relatively simple ML approaches achieve good prediction accuracy. More generally, our ML approach is to be understood as an example of potential studies that may be performed once it is clearer which DPs are causally linked to downstream antibody candidate success.

When examining the role of structure prediction tool on DP value computation, we reported that structure-based DPs calculated on different antibody structure prediction models correlate overall poorly (Supplementary Fig. 3A, Supplementary Note 2). These observations align with recent reports by others32,36,118. Importantly, we found that although ABB238 and IgFold39 have been found to be superior to ABB37,38,121 (which was used in this work for the majority of the results), the correlation of ABB with all other tools (experimental) in terms of structure-based DPs was low and did not differ to the aforementioned tools. Strikingly, we found that structures obtained by computational structure prediction methods belonged to a common underlying ensemble structural distribution as revealed by MD (Supplementary Fig. 4, Supplementary Note 2). Therefore, the usage of ABB did not considerably bias the results in this paper.

However, it remains challenging to predict the antibody CDRH3 loop regions, which are of great interest due to their involvement in antibody-antigen binding97 as well as developability20. These loop regions are generally more flexible and less structurally conserved compared to stable secondary structure elements like beta sheets118. Consequently, predicting the specific conformations of these loops accurately can be difficult, and it becomes even more challenging to determine when a prediction is correct without combining experimental validation and MD simulations32,118,122.

MD is important for a fuller understanding of antibody systems as it provides insight into the flexibility and fluctuations of antibody structure and developability parameters28,32,36,121,123,124. Specifically, Park and Izadi found that antibody developability surface descriptor parameters (e.g., positive electrostatic potential on the surface of the CDR region125) vary extensively as a function of the structure prediction method used, which is in line with the findings in this manuscript (Supplementary Note 2). However, after Gaussian-accelerated MD (GaMD) simulations and averaging the values of the descriptors through MD frames, the consistency between the descriptors improved32. Similarly, Raybould and colleagues36 evaluated variations in four structure-based Therapeutic Antibody Profiler (TAP)7 scores using molecular dynamics (MD). The mean values of the TAP properties observed during the simulations closely matched an ensemble generated from three TAP predictions based on static Fv models directly generated by ABodyBuilder2. Furthermore, certain structure prediction tools, such as IgFold39, refine the final model using short relaxation using OpenMM126 or PyRosetta127.

In the future, generating multiple models with various refinements could yield a conformational ensemble similar to what is obtained from a full-fledged long MD simulation118,122. Training on dynamic data, such as MD simulation-based conformations, enables the model to capture loop flexibility and variability, resulting in more robust and realistic predictions that can improve currently challenging CDRH3 structure prediction. An ideal ML model trained on dynamic data could streamline antibody structure prediction by eliminating the need for additional MD simulations, saving computational resources, and providing a higher level of confidence in predicted structures and insights into hidden antibody characteristics and developability properties.

Since the development of therapeutic antibodies involves human-directed sequence engineering to attain desirable developability properties, there have been efforts to separate natural from therapeutic antibodies7,55. However, we showed that human-engineered (including therapeutic) antibodies fall within the developability space of natural antibodies (Fig. 7b). Although this could simply be due to the particular selection of DPs and the (major) principal components of variations, we showed that human-engineered antibodies also fall within the sequence space of natural antibodies (Supplementary Fig. 21B). Here, we discuss several possible explanations:

(1) Our findings are consistent with52, who have shown considerable sequence overlap between therapeutic antibodies and NGS-derived natural repertoires, indicating that therapeutic antibody sequences are largely derived from natural sequences with few modifications. This could also explain why MLR models trained on natural DPs predict DPs of human-engineered antibodies with similar accuracy (Fig. 7c). (2) Although the development and formulation of therapeutic antibodies involve distinct challenges from natural antibody generation, both might be subject to converging primary restrictions. As shown by ref. 54, certain framework mutations regularly occurring in clinical-stage therapeutic antibodies are also frequently observed in natural repertoires, indicating converging selection towards common characteristics such as stability. It is conceivable that the sequence space of stable (and non-immunogenic) antibodies is restricted so that natural and human-engineered sequences occupy overlapping subspaces within. (3) There might be considerable engineering of the Fc region128, which is not included in our study, that could separate human-engineered from natural antibodies in sequence space. (4) Lastly, the number of patent-submitted58,129 and therapeutic antibody sequences available is relatively small, which means that there may be potential therapeutics in yet unexplored regions of the sequence space. However, given that sample sizes differ across the human-engineered datasets, more data is needed to study how cross-transferable conclusions are.

Despite being globally similar to the native dataset, DPLs of human-engineered antibodies show distinctive traits. Most notably, they differ in the patterns of correlation found among DPs. This is best seen in Fig. 7b, which is obtained by projecting native DPLs along two axes which are determined in order to be uncorrelated. In this subspace, the cluster originated by native antibodies has a circular shape, i.e., it is isotropic, meaning that the knowledge of one of the two coordinates gives limited information about the other. This property, however, does not hold for the set of engineered antibodies, which form an elongated shape, stretching along a diagonal. A similar cluster shape implies that growing x-values tend to correspond to bigger y-values or, in other terms, that there is a (weak) positive correlation between the two axes. Among human-engineered antibodies, then, alterations to the two parts of DPL, which are projected on each axis, do not happen independently anymore, likely signaling the artificial effect on DPs of the antibody optimization process. In the future, it would be of interest to study how this potential signal of the antibody optimization process is represented in different PLM embeddings, for example, in antibody-specific PLM embeddings, for which evidence is conflicting as to whether general protein or antibody-specific PLM more faithfully represent inter-sequence functional similarity77,130,131

Furthermore, it is interesting to note that human-engineered DPLs do not cover entirely the native DPL cluster (Fig. 7b). Among human-engineered antibodies, it is hence less probable to find some types of DPLs, which appear instead frequently among native ones: this suggests the existence of an unexplored region of DP space. Our conclusion is strengthened by the presence of a similar low-density zone on the right of the cluster of PLM embedding (Supplementary Fig. 20B), which may constitute evidence of under-investigated classes of antibody sequences.

Finally, computational developability profiling, similar to computational antigen binding profiling3,98, will be increasingly useful once the real-world relevance of computational DPs has been further established30,31,32,100. To this end, multiple avenues require further development: (i) insight into differences of paired vs single-chain developability95,96, (ii) faster computation of structural and MD-based DPs132, (iii) negative controls for clinical stage antibodies that have not progressed due to developability issues, (iv) large-scale data where multiple properties for a given antibody are captured to start exploring multiparameter generative design29,61,106,133, and (v) open and unbiased competitions with agreed-upon quality experimental (and simulated69,134) data to quantify the current state-of-the-art in DP predictability, causal relationships and importance ranking vis-a-vis DP impact on antibody developability.

Methods

Antibody datasets

The native antibody dataset

We collected 2,036,789 native human and murine antibody sequences (Supplementary Fig. 1A). Briefly, we assembled non-redundant sequences of human (heavy chains: IgD 173,342, IgM 173,437, IgG 170,473, IgA 171,174, IgE 165,992, and light chains: IgK 198,255, IgL 187,378) and murine (heavy chains: IgM 198,967, IgG 199,326, and light chains: IgK 198,795, IgL 199,650) native antibody variable region sequences (Supplementary Fig. 1A). Antibody sequences were majorly sourced from Observed Antibody Space (1,738,091 sequences)135 in addition to including our own experimentally-generated sequences (total of 298,698 sequences with the IgD, IgK and IgL human datasets) to provide balanced antibody count among all isotypes. Experimental sequences were generated following a protocol inspired by ref. 136, starting from human blood (see Methods), and exported as VDJ clones from raw sequencing files using MiXCR (version 3.0.1)137. OAS sequences were aligned and IMGT-numbered using the IMGT/High V-QUEST138. The kappa and lambda light chain types (IgK and IgL) were often referred to as isotypes in this manuscript to provide a cohesive reading experience.

The human-engineered antibody datasets

(1) The therapeutic antibody dataset (mAb)

A total of 782 therapeutic antibody sequences belonging to the “whole mAb” format were obtained from TheraSAbDAb as per July 2021139. First, all mAb sequences were aligned to both murine and human germlines (species of interest) with the antigen receptor alignment and annotation tool ANARCI140. We used the same IMGT-numbering scheme as implemented in the native dataset sequences numbering to ensure consistent methodology. For each chain, we kept the alignment with the highest germline certainty (human or mouse) by selecting for the highest V gene identity (the sequence identity over the V-region to the most sequence-identical germline). In case V gene identity was equal for murine and human alignments, we kept the alignment with the highest J gene identity (the sequence identity over the J-region to the most sequence-identical germline). To ensure the accuracy of annotations, we excluded sequences with V gene identities <0.7 (Supplementary Fig. 1B).

(2) The patented antibody database (PAD)

Under a non-commercial agreement with NaturalAntibody, we obtained a total of ≈ 240 K unpaired variable region antibody sequences (Supplementary Fig. 1B). Sequences were originally extracted from patent documents from intellectual property organizations and bioinformatic databases (WIPO: world intellectual property organization, USPTO: United States Patent and Trademark Office, EBI: European Bioinformatics Institute, DDBJ: DNA Data Bank of Japan), and mined as explained by ref. 58. Sequences were provided with species and V-gene annotation metadata and IMGT-numbered. We selected for human and murine sequences and classified them into heavy (IgH) and light (IgK and IgL) chains based on the corresponding V-gene annotation resulting in a final count of 223,613 PAD sequences (Supplementary Fig. 1B).

(3) Humanized mouse dataset (Kymouse)

We obtained 209,452 IgM VH sequences from humanized transgenic mice (Kymouse)86. Sequences were downloaded from OAS135, and their structures were predicted with ABB as explained above. Finally, we also measured their sequence and structure DPs.

The in silico mutated antibody dataset

We first sampled 500 wildtype (WT) antibodies from the human native VH dataset (100 samples per isotype for IgM, IgD, IgG, IgA and IgE). We generated all possible acid substitutions for each antibody, resulting in a total of 301,777 sequences for which we predicted/computed their sequence-based developability parameters as detailed above. We used this data to perform the sequence DP sensitivity analysis described in Fig. 4.

Antibody structure prediction

We used ABodyBuilder (ABB) to predict the structures of antibody variable regions37. The high-throughput version of ABB is provided as an image for Vagrant VirtualBox (version 2.2.16), known as SabBox. We ran this version with default parameters using unpaired single chain input (heavy or light) to predict all structures in all datasets, unless mentioned otherwise. The ABB pipeline includes a relaxation with MODELLER after the initial antibody structure prediction.

In silico calculation of developability parameters

Calculation of sequence-based developability parameters

Molecular parameters: the molecular weight of an antibody variable region sequence was calculated using the Peptides R package version 2.4.4141,142. We calculated the antibody length and the average residue weight using custom R scripts143. Amino acid categorical composition: Using the Peptides R package and a custom R script, we calculated the proportional (%) content of amino acid categories142 (Supplementary Table 1) by dividing the occurrences of the amino acids in one group by the length of the antibody variable region sequence. pI and charge: To compute the pI and charge of a given sequence, we used the Peptides R package141 using the “Lehninger” scale between pH = 1 and pH = 14 with step size = 1 (14 data points). Extinction coefficient and molar extinction coefficient (for all and for cysteine bridges): Calculations were performed as mentioned in refs. 144,145 using custom R scripts. Hydrophobicity and hydrophobic moment: To compute hydrophobicity and the hydrophobic moment, we used the Peptides R package142, the scale of choice was Eisenberg. For hydrophobic moment calculation, we selected ten amino acids for the length of the sliding window size based on our understanding of the antibody secondary structure and as explained by ref. 146. We also specified the angle value as 160 as recommended by ref. 147. Instability index and aliphatic index: The aliphatic index is defined as the relative volume occupied by aliphatic side chains (Alanine, Valine, Isoleucine, and Leucine—Supplementary Table 1). It may be regarded as a positive factor for the increase of thermostability of globular proteins148. The instability index was first developed by ref. 149 to reflect the stability of proteins based on their content of certain dipeptides that were found to be associated with degradation tendency. We calculated the instability and aliphatic indices using the Peptides R package141. Protein solubility prediction: We used SoluProt (version 1.0) to predict the solubility index for antibody variable region sequences150. We chose this tool as it allows the consideration of protein expressibility into its solubility score prediction while proving comparable performance compared to other state-of-the-art solubility prediction tools150. Sequences with solubility scores above 0.5 were predicted to be soluble when expressed in Escherichia Coli. Immunogenicity prediction: We used netMHCIIpan version 4.040 to estimate the immunogenicity of the antibody variable region sequences. Briefly, we examined the global immunogenicity of the antibody sequences by predicting their affinities for HLA II supertypes that are found in 98% of the global population111. We obtained the following numerical values from these calculations including (i) minimum rank, (ii) number of weak binders (percentage rank >2 and <10), (iii) number of strong binders (percentage rank < 2), (iv) full average percentage rank and (v) the number of antibody regions where the maximal immunogenic peptide stretches (maximal_immunogenicity_region_span). As these calculations are computationally intensive, we computed the immunogenicity DPs on 10% only of the native, Kymouse and PAD datasets.

Calculation of structure-based developability parameters

First, we added hydrogen atoms to each antibody variable region structure that we previously built with ABodyBuilder using the Reduce software (version 3.24.130724)151. Hydrogenated structures were used as an input (pdb format) to calculate structure-based developability parameters as they were reported to provide a closer chemical representation of their potential in vivo structures152. As detailed in Supplementary Data 1, we used BioPython (version 1.79) to compute secondary structure parameters153, FreeSASA (version 2.1.0) for solvent accessibility predictions154, PROPKA (version 3.4.0) for the calculation of thermodynamic and electrochemical parameters155 and ProDy (version 2.0) to count free and bridged cysteines156. We used custom Python scripts for the calculation of developability index (DI) and spatial aggregation propensity (SAP) following the work of Lauer and colleagues56,157. We used the original Black-Mold hydrophobicity scale as suggested by ref. 124. Finally, we used Arpeggio (version 1.4.1) to calculate interatomic interactions after converting antibody hydrogenated structures to cif format158.

Parameter correlation calculation and visualization

For the analysis in Fig. 2b, Supplementary Fig. 9A, Supplementary Fig. 6 and Supplementary Fig. 7, we computed pairwise developability parameter correlations (Pearson) matrices using the ‘cor()‘ function from the ‘base’ package in R (version 4.0.3). for each isotype/species combination to investigate parameter associations and redundancies. Missing data was accounted for by choosing the argument use.pairwise.obs=complete option in the function. We visualized the correlation matrices using ComplexHeatmap R package version 2.9.4159 and annotated the heatmaps with the threshold-specific ABC-EDA/MWDS algorithm output160.

Determination of the minimum weight dominating set of developability parameters

To investigate the redundancy of developability parameters (DPs), we first constructed undirected weighted network graphs for each species and isotype where the nodes represent DPs and the edge weights represent the pairwise Pearson correlation values (as in Supplementary Fig. 9B). For this purpose, we constructed and visualized networks with Cytoscape 3.9.1161,162. The constructed networks could include up to three distinct classes of pairwise relationships for a given correlation threshold (from 0.1 and 0.9 in intervals of 0.1): (1) isolated nodes representing DPs that have correlation values below the given threshold with all other DPs, (2) doublets (exclusive pairs) where two DPs are solely correlated with each other (above the given threshold), and (3) correlation subnetworks where more than two DPs form correlation clusters.

Secondly, the minimum weight dominating set (MWDS) for a specific network at a given correlation threshold was defined as the set of parameters for which the sum of the associated edge weights is minimal59. Thus, the MWDS is a subset of nodes where each node in the network is either in the MWDS or connected to a member of it by an edge59. Based on this definition, we included all DPs from classes (1) and (2) in the MWDS, where no meaningful selection based only on correlation can be made. However, finding the dominating parameters among class (3) DPs is an NP-hard problem. Thus, we implemented an algorithm described by Shetgaonkar and Singh, which leverages the local optimization of an artificial bee colony algorithm guided by iterative global estimations of distribution (ABC-EDA algorithm) to approximate optimal solutions59. Ultimately, this algorithm classifies class 3 DPs as either “dominant” or “redundant”. The dominant DPs were added to the final MWDS.

Correlation distance dendrograms

This analysis aims to examine the similarities in the pairwise associations of developability parameters among isotypes and species of the native and human-engineered antibody datasets (Fig. 3b and Fig. 7a). For this analysis, we transformed the isotype- and species-specific parameter correlation matrices (sequence and structure parameters separately) into numerical vectors. Subsequently, we quantified the pairwise Pearson correlation distance for these numerical vectors using the ‘get_dist‘ function from the R package factoextra version 1.0.7163. We finally clustered the resulting distance matrices using the hierarchical clustering ‘hclust‘ command following the complete linkage method from the stats package, version 4.0.3164, where the height of the dendrogram represents the Pearson distance (0–1).

Analysis of parameter sensitivity to single amino acid substitution: excess kurtosis and range

Excess kurtosis: To investigate the impact of changes in the amino acid sequence on DP values (sensitivity—analysis in Fig. 4 and Supplementary Fig. 12), we normalized (mean-centered) the DP values of the mutated antibody dataset by subtracting their mean and then dividing by their standard deviation. The DP distributions of all mutants of each wt-antibody were analyzed by calculating and comparing two metrics First, the excess kurtosis, defined as \({Excess\; kurtosis}=\frac{{\mu }_{4}}{{\sigma }^{4}}-3\), where \(\mu\)4 is the fourth central moment and \(\sigma\) the standard deviation of the normalized DP values. An excess kurtosis of 0 represents that of a normal distribution. It increases with peakedness and decreases with uniformity of the distribution, under the assumption that it is bell-shaped. Thus, while a high excess kurtosis indicates a small change induced by single amino acid substitutions on average or low average sensitivity, low excess kurtosis suggests high average sensitivity. The second metric is the range of a distribution, defined as the distance between the highest and lowest DP values and represents the maximum normalized shift that is inducible by a single amino acid substitution.

Pairwise developability profile correlations and pairwise sequence similarity studies

We define the antibody developability profile (DPL) as a numerical vector that carries the values of developability parameters in a fixed order for a given antibody sequence. Developability parameter values were mean-centered and scaled to unit variance (normalized)165. Of note, scaling for a given group of antibody sequences was performed after accounting for chain type and species.

For the analysis in Fig. 5a, b, we defined the pairwise developability profile correlation (DPC) as the Pearson correlation value between a pair of sequence developability profiles. We computed the amino acid-based sequence similarity between two antibody variable region sequences (A and B) as follows:

$${Normalized\; Levenshtein\; distance}=\frac{{LD}(A,B)}{{Max}({length}(A),{length}(B))}{Sequence\; similarity}=1-{Normalised\; Levenshtein\; distance}$$

Where LD(A,B) represents the Levenshtein distance between A and B. We used the stringdist R package version 0.9.8166 and the Levenshtein python package version 0.20.8 to calculate LD167.

In Fig. 5c we inspected the relationship between sequence similarity and normalized developability profile (DPL) similarity by first examining the proximity of human heavy chain antibodies that belong to the same sequence similarity cluster on the 2D PCA projection plane of the developability space (RN), where N is the number of DPs that makes a developability profile. In this analysis, we used the MWDS DPs (which have full values for all antibodies) as identified by the ABC-EDA algorithm for the human IgG dataset at an absolute Pearson correlation coefficient threshold of 0.6, accounting for 46 DPs (N = 43, Supplementary Table 2). The PCA was computed using the Python packages scikit-learn version 1.1168 and dask-ml version 2022.5.27169. Sequence similarity groups were identified using USEARCH version 11.0170 as groups of 10000 (or more) antibodies that share at least 0.75 sequence similarity as defined by Levenshtein distance (Supplementary Fig. 16B).

Then, to quantify the relationship between sequence similarity and developability profile similarity, we studied the correlation between the pairwise Euclidean distance (ED) in the developability space (RN) and the pairwise normalized LD for a sample of 5000 human IgM antibodies as in Supplementary Fig. 16C (computationally intensive process; 12,500,000 data points for each ED and LD calculation). We ensured that these sampled antibodies belong to the same IGHV gene family to exclude the effect of IGHV family variance on the LD computations64,65. Antibody pairs with ED ≥ 15 were considered outliers and were excluded from this analysis (forming 0.8% of total data points).

Of note, the PCAs from this analysis were used to examine the positioning of the human human-engineered antibody datasets within the developability space of the human native antibodies (Fig. 7b) and the role of native antibody isotype and germline gene annotation in the positioning within the developability space (Supplementary Fig. 20A, Supplementary Fig. 21A).

Predictability of developability parameters

We used the human VH antibody sequences (854,418 antibodies) and their computed DP values (normalized; mean-centered) to assess the predictability of developability parameters in two machine-learning tasks (ML task 1 and ML task 2). Of note, we excluded mouse antibodies and VL sequences to avoid species- and chain-specific biases, ensuring greater data homogeneity. In both tasks, the predictability of DPs was assessed by computing the coefficient of determination (R2) between observed and predicted DP values171.

ML task 1; predicting the values of the same (single) missing DP

For this task (Fig. 6a, b, and Supplementary Fig. 17A), we randomly split the human VH antibody sequences into two subsets: (i) training set; containing ~80% of sequences (683,534), which was further subsampled to derive training sets of variable sizes (50, 100, 500, 1000, 10000, 20000). Specifically, for each size, we defined 20 independent training subsets (ii) test set; containing the remaining (~20%) sequences (170,884). Then, we used two types of embeddings to train multiple linear regression (MLR) models: (1) single-DP-wise incomplete developability profiles (low-dimensionality embeddings—order of 10 s) and (2) antibody sequence encodings obtained from the protein language model (PLM) ESM-1v (high dimensionality embeddings—order of 1000 s74,75,76). Beginning-of-sentence (BOS) processing was used to compress the vectors produced by the PLM to avoid biases that might occur by averaging the entire vectors (as detailed below in “Antibody sequence encoding with PLM”). Finally, we compared the predictive power of both embeddings to predict the values of DPs in the test set after deleting single-DP column values at a time. For each type of embedding, we predicted the missing DP using linear regressors, trained with the MSE loss. We did not opt for more complex types of regressors to contrast the occurrence of overfitting. The DP value prediction was repeated 20 times (using the 20 independently trained MLR models per training set size) and the mean R2 was reported.

The native-trained MLR models from this task were then utilized to predict the values of MWDS DPs (Supplementary Table 2) in the human-engineered datasets (shown in Fig. 7c). Models were trained using the training set size, which achieved the highest prediction accuracy before plateauing (1000 antibodies for DPL-based predictions 20 K antibodies for PLM-based predictions).

ML task 2; predicting the values of randomly missing DPs

For this task (Fig. 6a, c and Supplementary Fig. 17B), we randomly deleted (either 2% or 4% of) DP values from subsamples of the human VH antibody sequences (683,534 sequences defined as training set in ML Task 1). We then predicted the deleted (missing) data using the multivariate imputation by chained random forests (MICRF) algorithm79 via the missRanger R package172. We repeated this step 20 times for each subsample size (50, 100, 500, 1000, 10000, 20000 antibodies) and reported the mean R2.

For both ML tasks—and both embedding types implemented in ML Task 1—we performed ablation studies by randomly permuting the column values in the input datasets for the ML models, and confirmed that the prediction accuracy was abolished (R2 ≤ 0).

Antibody sequence encoding with PLM

Antibody sequences were encoded using a Protein Language Model (PLM). PLMs are deep neural networks designed to transform protein sequences into contextual embedding vectors depending on the entirety of the protein sequence. In our experiments, we use the PLM ESM-1v since this model is optimized to predict the effects of mutations on the function of proteins75.

To extract a global, fixed-size representation from PLM embeddings, we use a compression scheme based on the Beginning-Of-Sequence (BOS) token, as is customary for LLMs, e.g., for the [CLS] token in ref. 173. BOS tokens are trained to provide a summarized representation of the entire protein sequence and are, therefore, a natural choice to represent antibodies.

Graphical illustrations

We used BioRender.com to create the illustrations in Fig. 1, Fig. 4a, and Fig. 6a. Antibody structural images were produced in PyMOL v2.5.5174. We generated the remaining figures in RStudio142 using the ‘ggplot2’ package175 and figure panels were aggregated with Adobe Illustrator176.

Statistics and reproducibility

All statistics were calculated using the rstatix R package (version 0.7.2). The suitable statistical test used in each analysis is reported accordingly when applicable. The sample size, the number of iterations, replications and data shuffling are also reported for all experiments included. Bias in developability parameter value prediction was avoided by including only non-redundant MWAS DPs in the predictability experiments.

Experimentally generated native antibody sequences (human IgD, IgK and IgL)

Human subjects, B-cell isolation and RNA extraction

Human peripheral blood was obtained from one healthy volunteer. Sample acquisition was approved by the Regional Ethics Committee of South-Eastern Norway (project 6544) and informed consent was obtained. All ethical regulations relevant to human research participants were followed. Samples were collected in BD Vacutainer® K2 EDTA tubes, and pan B cells were isolated by negative selection using MACSxpress® Whole Blood B Cell Isolation Kit (Miltenyi Biotec). The cells were washed with PBS and remaining erythrocytes were lysed using Red Blood Cell Lysis Solution (Miltenyi Biotec). RNA was extracted using RNeasy Kit (Qiagen). RNA quality and concentration were measured using a Nanodrop spectrophotometer (Thermo Fisher Scientific).

cDNA synthesis

200 ng of RNA was used for cDNA synthesis with 1 µl of 100 µM isotype-specific reverse primers, 1 µl of 10 mM dNTP Mix (Thermo Fisher Scientific) and nuclease-free water up to 14.5 µl. Mixture was incubated for 5 min at 65 °C, briefly placed on ice and centrifuged. Subsequently, 4 µl of 5X RT buffer (Thermo Fisher Scientific), 0.5 µl of RiboLock RNase Inhibitor (Thermo Fisher Scientific), and 1 µl Maxima RT (Thermo Fisher Scientific) were added and cDNA synthesis was performed at 50 °C for 30 min with reaction termination at 85 °C for 5 min. The obtained cDNA was purified using MinElute PCR Purification Kit (Qiagen) and eluted in 20 µl of EB buffer.

5’ Multiplexing (MPTX) PCR

4 µl of cDNA were amplified with 0.5 µl of 100 µM Read2U primer, 1 µl of chain-specific 5′ forward leader primer mix (Supplementary Data 2), 10 µl of KAPA HiFi HotStart ReadyMix (Roche Molecular Systems), and 4.5 µl of nuclease-free water at the following conditions: 96 °C—5 min; 25 cycles of 95 °C—20 s, 68 °C—20 s, 72 °C—20 s; 72 °C—5 min; 4 °C—hold. Amplified product was run in a 1.2% agarose gel in TBE buffer. Bands corresponding to the amplified regions of interest (~480 bp) were cut and purified using QIAquick Gel Extraction Kit (Qiagen) with elution in 20 µl of EB buffer.

NGS library generation

For indexing PCR, 10 ng of the 5’ MTPX product were mixed with 0.5 µl of 100 µM P5_R1 forward indexing primer, 0.5 µl of 100 µM P7_R2 reverse indexing primer containing Illumina index sequence, 12 µl of KAPA HiFi HotStart ReadyMix, and nuclease-free water up to 24 µl and reaction was performed at the following conditions: 96 °C—5 min; 10 cycles of 95 °C—30 s, 68 °C—30 s, 72 °C—30 s; 72 °C—10 min; 4 °C—hold. The resulting libraries were bead-purified with AMPure XP (Beckman Coulter) using a 1:1 beads ratio. The molarity of the libraries was determined using Qubit™ 4 Fluorometer. The final libraries were inspected with BioAnalyzer (average product length 550 bp) and sequenced on the Illumina MiSeq platform (V3 chemistry 300×2 bp). The raw sequencing data is available on the Sequence Read Archive (BioProject number PRJNA1043047).

Validation datasets

The crystal structures dataset

We obtained 859 crystal structures of paired-chain antibodies (Fv regions) from the antibody structure database (AbDb)177. We extracted the amino acid sequences for both (heavy and light) chains and utilized them to further predict the structure of paired and unpaired antibody chains in silico. We used several tools to benchmark our analysis including ABodyBuilder37, ABodyBuilder238, IgFold39, AlphaFold2178, and AlphaFold-multimer179. Both ABB and ABB2 pipelines include a relaxation step after the initial antibody structure prediction. Of note, the alignment process and regional sequence definition (CDRs and FRs) was performed similarly to the processing steps mentioned for the therapeutic antibodies dataset (see Methods). We finally computed the values of structure-based DPs on these structures to perform the rigid model analysis described in Supplementary Fig. 3.

The in silico mutated antibody dataset at CDRs

We subsampled 10 WT antibodies per isotype, except 9 for IgD, and all their corresponding CDR mutants (a total of 30,015 antibody sequences) from the in silico mutated dataset (see Methods), and predicted their structures (including WT antibodies) using IgFold version 0.1.039. We used IgFold for this task as template search-based tools (including ABB) tend to utilize an identical template for antibodies that are within the distance of a single amino acid substitution from their wildtype antibody180. We used this dataset to conduct the structural variance study described in Supplementary Fig. 11A.

The AbDb antibody pairs

To study the structural variance of antibody mutants with a single aa difference, we sourced 10 pairs of antibodies (Supplementary Table 3) from the public AbDb database177. We ensured that each pair of antibodies (1) has the same sequence length and (2) the different (single) amino acid is located in the loops. This dataset was used in the structural variance study described in Supplementary Fig. 11B, C.

Molecular dynamics simulations

We selected five paired-chain (VH and VL) antibody structures from the crystal structures dataset with the best resolutions (Supplementary Table 4) to perform classical MD simulations. Two antibody structures (4TRP and 5WCA) contained missing atoms, which were corrected with the MODELLER’s function “complete_pdb”181. Hydrogens were added to initial antibody structures using Reduce (151). All MD steps were performed in Gromacs v.2022.4182. We used AMBER99SB-ILDN183 force field and transferable intermolecular potential with 3 points184 water model for MD system preparation185. The simulation box was defined as a cube centered around the antibody placed at the 1 nm distance between the solute and the box. All systems contained both Na and Cl ions at 0.1 mol/liter salt concentration. We minimized the energy of each initial system with the steepest descent algorithm186 for 20000 steps. For the equilibration187, we first considered constant volume simulations at 300 K for 100 ps and followed up by constant pressure simulations at 1 bar for another 10 ns. During these equilibration simulations, where applicable, we used the Parrinello-Rahman barostat188 and V-rescale thermostat189 with velocity rescaling using 0.1 ps and 0.1 ps time constants, respectively. During all the simulations, we constrained the length of all bonds using the linear constraint solver (LINCS) algorithm190 and kept the water molecules rigid via the SETTLE algorithm191. We used particle mesh Ewald (PME)192 for treating the electrostatic interactions with a real-space cutoff of 1.0 nm192. We simulated a 100 ns independent production run with 2 fs time step for each system, all continued from the last step of the NPT simulations at 300 K and 1 bar using a 2 fs time step. Further details of the parameter specifications can be found in .mdp files. MD simulations were performed starting from crystal structures and ABB-predicted models for the five selected antibodies (Supplementary Table 4), followed by the computation of structure-based DPs on the convergent frames (4001 frames per antibody per structure origin). This data was used to perform the molecular dynamics analysis described in Supplementary Fig. 4.

Computation of the overlap index (η)

For the analysis in Supplementary Fig. 4C, we used the ‘overlapping’ R package193 to quantify the overlap between the distributions of DP values when measured on the MD-simulated crystal structures and the MD-simulated ABB-predicted structures.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.