Introduction

Monoclonal antibodies are well-established as therapeutic agents and essential tools for biological research1. Monoclonal antibodies bind their target with high affinity and specificity and are amenable to engineering to provide desired antibody functions. However, one of the major obstacles to developing a therapeutic antibody in a conventional IgG format is the challenge of producing a large (~ 150 kDa) heterotetrameric protein with multiple domains and disulfide bonds; human IgG1, for example, consists of 12 domains and has four interchain and 12 intradomain disulfide bonds. In this regard, research on a smaller-sized antibody format has recently garnered significant interest. The antibody fragments and other smaller non-traditional antibodies that have been commercially developed in recent years attest to the usefulness of these formats. Antigen binding fragments (Fab) and single-chain variable fragments (scFvs), for example, can be extensively engineered for higher affinities and better physicochemical properties2,3 and are utilized in various applications, including biotherapeutics.

Since antibody fragments are structurally simpler and substantially smaller than immunoglobulins, they can be functionally expressed in prokaryotic host cells4 and, thus, displayed on bacteriophage (phage display). Antibody phage display is a technology capable of isolating antibody fragments specific to the target antigen in vitro. Many antibody-based therapeutics have been discovered or optimized using phage display technology, proving their usefulness as a technological platform for biopharmaceutical development5,6,7. The quality of the antibody library is crucial for successfully identifying binders with the desired target specificity and properties8,9,10. The functional diversity of the library and the properties of individual clones within the library, such as folding stability, expression level, solubility, and correct assembly into phage particles, determine the quality of an antibody library. The monomeric content is another important issue in developing antibody-based therapeutics. The aggregation propensity of antibody proteins is primarily determined by their variable domain components (VH and VL)11, and antibody fragments, such as scFv, are prone to form dimers and higher molecular weight species12,13. Therefore, for practical applications, developing more stable variable domains with lower aggregation propensity in a multitude of conditions is necessary.

Heavy chain-only antibodies (HCAbs), which lack the first constant domain (CH1) and light chain of conventional IgG, were originally found in the serum of Arabian camels (Camelus dromedarius)14 and subsequently in other camelid species. The variable heavy domain of camelid HCAb, called VHH (variable heavy domain of HCAb) or nanobody, is highly stable, functions autonomously without a light chain counterpart, and represents the smallest antibody-based target-binding protein. Other notable features of these single-domain antibodies (sdAbs) include high expression levels in a variety of prokaryotic and eukaryotic host cells, high solubility, and modularity due to their small molecular size and good stability, which enables sdAbs to be incorporated into bispecific antibodies, chimeric antigen receptors, and other molecular constructs that require target-binding moieties15.

Camelid VHHs have high sequence homology to human VH3 (ref. 16) and can be readily humanized to minimize immunogenicity in therapeutic applications. However, camelids as immunization hosts (e.g., camels, llamas, or alpacas) are not as accessible as common laboratory animals, such as mice or rabbits. Moreover, producing monoclonal antibodies through animal immunization, followed by library construction and phage display selection, is time-consuming and expensive. A synthetic VH sdAb library based on a human variable domain scaffold can be an attractive alternative to camelid immunization, and many studies have reported the construction and validation of such libraries17,18,19,20,21. The major difference between camelid VHH and human VH3 is within the framework 2 (FR2) region, which forms part of the VH–VL interface in conventional antibodies. The nonpolar amino acids in FR2 form hydrophobic interactions with their VL counterparts and are primarily responsible for the poor solubility of unaltered non-camelid VH domains produced as sdAbs. Therefore, many human VH-based sdAbs have ‘camelized’ FR2 sequences for better solubility and lower aggregation.

In this study, we identified a stable human VH domain with enhanced purification yield and monomeric content from FR2-randomized human VH3-23 by thermal challenge followed by phage display selection against protein A, and developed a synthetic human VH domain antibody library based on the engineered scaffold. Our approach differs from previously reported human sdAb libraries, which were constructed either by utilizing known stable variable domain scaffolds22,23 or by optimizing CDR and/or FR sequences through rational design24. In another previous study17, FR2-randomization of the VH domain of trastuzumab yielded stable autonomous VH domains that bear no significant similarity to our scaffold, highlighting the uniqueness of the library constructed in this study. Subsequently, we evaluated the performance of the library by panning it against test antigens and analyzing the affinities and physicochemical properties of the isolated target-binding clones. Our results suggest that the engineered human VH3 domain shows structural and functional compatibility with diverse CDR sequences and is an effective scaffold for constructing highly functional sdAb libraries.

Results

Isolation of stable human VH3 FR2 variants

To develop stable human scaffolds for the VH sdAb library, we first chose the VH domain of a previously isolated scFv antibody with a high expression level and good physicochemical properties. To engineer stable variants of this VH domain (based on the human VH3-23 germline gene) in a sdAb format, the “hallmark” residues of FR2 (residues 37, 44, 45, 47; Kabat numbering)25 were randomized with the NNK degenerate codon (N = A, T, G, or C; K = G or T). The FR2-randomized VH library was generated by overlap extension PCR with an NNK-randomized oligonucleotide primer, and the randomized VH repertoire was ligated into the pComb3X phagemid vector and transformed into TG1 E. coli cells. Finally, an FR2-randomized library with a size of 1.1 × 10⁷ was obtained, which was sufficient to cover the theoretical maximum diversity of four NNK codons (~ 1 × 10⁶).
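The diversity figures quoted above can be checked with a short calculation. The following Python sketch enumerates the NNK codons, counts the amino acids they encode under the standard genetic code, and compares the codon-level diversity of four randomized positions with the reported library size. The Poisson estimate of the fraction of variants sampled is an assumption added here for illustration (equal representation of every codon combination and independent transformants); it is not a calculation reported in the study.

```python
from itertools import product
from math import exp

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# NNK degenerate codon: N = A, C, G, or T; K = G or T.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
encoded = {CODON_TABLE[c] for c in nnk_codons}
amino_acids = encoded - {"*"}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

positions = 4                      # FR2 residues 37, 44, 45, and 47
codon_diversity = len(nnk_codons) ** positions
protein_diversity = len(amino_acids) ** positions

library_size = 1.1e7               # transformants reported in the text
fold_coverage = library_size / codon_diversity

# Assumption (not from the study): Poisson sampling with equal codon frequencies.
expected_fraction_sampled = 1 - exp(-fold_coverage)

print(f"NNK codons: {len(nnk_codons)}, amino acids: {len(amino_acids)}, "
      f"stop codons: {stop_codons}")
print(f"Codon-level diversity: {codon_diversity:,}")
print(f"Protein-level diversity: {protein_diversity:,}")
print(f"Fold coverage: {fold_coverage:.1f}")
print(f"Expected fraction sampled: {expected_fraction_sampled:.5f}")
```

Under these assumptions the library oversamples the codon-level diversity roughly tenfold, which is consistent with the statement that the library size was sufficient to cover the theoretical maximum.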

Stable human VH single domains were enriched from the FR2-randomized library through eight rounds of panning against protein A, a superantigen known to bind antibody VH domains, including human VH3, as well as the Fc portion of immunoglobulins from various species26. After three rounds of panning, the amplified phage pool was heated (at 70 °C for rounds 4–6 and 80 °C for rounds 7 and 8) for 10 min before the protein A binding step. The heating step was expected to preferentially enrich VH variants that either resist heat denaturation or renature rapidly after denaturation. After the last panning round, protein A binders (i.e., clones expressing correctly folded VH sdAb) were identified by ELISA, in which the E. coli periplasmic extracts containing individual VH variants were also heat-treated before binding to surface-adsorbed protein A.

The sequence of ELISA-positive clones from the final panning round showed significant sequence convergence (Table 1). Compared with the clones from the final panning round, randomly selected clones from the pre-panning library did not express well and did not exhibit discrete melting transitions in protein thermal shift (PTS) assay (Supplementary Fig. S1), implying their relative instability. Notably, the consensus FR2 sequence of the ELISA-positive clones showed significant similarity to the FR2 sequences of human Vκ. Positions 37 and 44 were preferentially occupied by Tyr and Pro, respectively, and Pro mostly occupied position 45. In comparison, the consensus amino acids in the corresponding positions of human Vκ germline segment were Tyr, Ala/Ser, and Pro for Vκ residues 36, 43, and 44, respectively. Two isolated FR2 variants (YTPW and YPPG) were well expressed in E. coli, suggesting their potential as human VH sdAbs. Since YTPW was better expressed, it was chosen as a primary candidate for further optimization and library construction. This clone also harbored a spontaneous F27S mutation in FR1 (Kabat definition: the position is in CDR1 according to Chothia CDR definition), which was retained in the subsequent sdAb library construction.

Table 1 The FR2 sequences of ELISA-positive clones from the final panning round of the FR2-randomized library.

Identification of optimized sdAb library scaffold

VH Gln39 forms hydrogen bonds with VL Gln38 in native VH–VL pairing27, and the two Gln39 residues face each other in the VH–VH homodimer at a distance of 6 Å28. Therefore, we reasoned that VH single-domain homodimer formation could be reduced by introducing a Q39E mutation, which would induce charge repulsion. This mutation was introduced to YTPW to produce YETPW. During the Q39E mutation process, we identified a clone with an inadvertent T44S mutation (YESPW). Since this clone was well expressed, and residue 43 in human Vκ2 and Vκ6 (corresponding to residue 44 in human VH) is predominantly Ser, it was also tested as a library scaffold. YETPW, YESPW, and their Gln39 forms (YTPW and YSPW) were expressed in E. coli. Size exclusion chromatography (SEC) analysis of the purified VH sdAbs showed that these variants were monomeric (Supplementary Fig. S2), and YETPW and YESPW were initially chosen to evaluate their performance as library scaffolds, with the expectation that the negative charge on Glu39 could help minimize homodimerization in a wider range of VH sequences. Limited randomization of CDR1 and CDR2 was introduced using degenerate oligonucleotides, and CDR3 was diversified using trinucleotide phosphoramidite chemistry (Supplementary Fig. S3). Unfortunately, the sdAb libraries, with an estimated diversity of > 10⁹, failed to yield binders when panned against several antigens, including hen egg lysozyme (HEL), human serum albumin (HSA), B-cell maturation antigen (BCMA), c-Met, and HER2.

Since the scaffolds containing the Q39E mutation were not optimal, Glu at position 39 was reverted to the original Gln or changed to the positively charged Lys, which could destabilize VH homodimer through charge repulsion. Additionally, position 44 was variegated between Ser and Thr to determine the residue that provides better scaffold properties. These four variants (QT, QS, KT, and KS) (Fig. 1) were expressed in E. coli and tested for the aggregation propensity using SEC. The analysis showed that QT, KT, and KS were eluted as a single peak with a retention time similar to HEL (molecular weights for HEL and VH sdAb are 14.3 and ~ 15 kDa, respectively). In contrast, the other variant QS showed retention times similar to scFv (~ 25 kDa), suggesting the formation of VH homodimers.

Figure 1

SEC analysis of VH scaffolds. (a) Four VH variants (QT, QS, KT, and KS) were expressed in E. coli, purified by Ni–NTA chromatography, and analyzed by SEC. QT, KT, and KS eluted with retention times similar to that of HEL (14.4 kDa). QS eluted with a retention time similar to that of scFv (25 kDa), which was considered indicative of dimerization. The retention time (min) of each VH variant is indicated above the peak. (b) The FR2 sequences and Ni–NTA purification yield of the VH sdAb scaffolds.

For QT, KT, and KS, position 27 was again variegated between Ser and Phe because position 27 in human VH3-23 is Phe; however, it was spontaneously mutated to Ser during the selection process, as mentioned above. It was reasoned that the panning selection of this mutation suggests that residue 27 possibly influences the physicochemical behavior of the VH sdAb. The resulting variants SQT, SKS, FQT, and FKS were produced in E. coli and analyzed using SEC and the protein thermal shift (PTS) assay (Fig. 2). Three variants (SQT, SKS, and FKS) showed monomeric behavior and Tm values > 60 °C. Among these clones, FKS eluted as a sharp monomeric peak from SEC with minimal aggregation or degradation; thus, it was chosen for the construction of the final sdAb library.

Figure 2

SEC analysis of SQT, SKS, FQT, and FKS VH scaffolds. (a) The VH variants SQT, SKS, FQT, and FKS were expressed in E. coli, purified by Ni–NTA chromatography, and analyzed by SEC. Retention times for SQT, SKS, and FKS were similar to that of HEL. FQT exhibited a retention time similar to scFv, which is indicative of homodimer formation. The retention time (min) of each VH variant is indicated above the peak. (b) The FR2 sequences, Ni–NTA purification yields, and Tm values of the four VH scaffolds. *N/A—Not applicable: VH3-23 wild type did not exhibit a discrete melting transition in PTS assay, and Tm value could not be measured.

Construction and validation of the sdAb library

Diversities for CDR1 and CDR2 were introduced to the FKS scaffold using partially degenerate oligonucleotides that reflect amino acid usage of human germline CDR repertoires (Supplementary Fig. S3a). CDR3 was diversified based on the published amino acid composition of functional human CDR-H3 sequences29 by trinucleotide phosphoramidite-based randomization with length variation (9, 14, and 20 amino acids [aa]; Kabat CDR definition) (Supplementary Fig. S3b). Two separate repertoires were designed for the 20 aa CDR3, each with or without an intraloop disulfide bond, thereby simulating the human immunoglobulin D2 segment30 (Supplementary Fig. S3b). CDR positions that are relatively invariable in natural human antibodies (i.e., positions 34, 51, 93, 94, 101, 102, and 103) were not randomized. The randomized oligonucleotides were assembled with the framework regions of FKS using a series of overlap–extension PCR cycles to generate the VH sdAb libraries, ligated into the pComb3X phagemid vector, and transformed into E. coli. The total estimated size of the library, deduced from the number of bacterial transformants, was 1.6 × 10⁹ (Table 2).

Table 2 The size of the final library.

The constructed sdAb library was validated by panning against a panel of test antigens (HEL, HSA, ovalbumin, and cysteinyl tRNA synthetase 1 [CARS1]). Four rounds of panning were performed against each antigen, and target-binding clones were identified by ELISA screening of the panning output clones (Table 3). Several binders with unique sequences and minimal background binding signals were each identified against HSA, ovalbumin, and CARS1, although the panning and screening against HEL failed to yield specific binders. The identified binding clones were then characterized in detail for their affinities and physicochemical properties.

Table 3 Validation of the FKS library.

Characterization of the isolated target-specific sdAbs

The selected sdAbs were expressed and purified from E. coli by immobilized metal ion affinity chromatography (IMAC). Approximately 1–3 mg of sdAb was purified from 50 mL culture (20–70 mg/L purification yield; Table 4). SEC analysis showed that the monomeric contents of the isolated sdAbs were 66%–86% when purified (Supplementary Fig. S4). The retention times of the monomer peaks varied from ~ 21 min for CARS1 #3 and #8 to ~ 26 min for HSA #23, possibly reflecting their CDR3 length variation (Table 4) that could affect not only their molecular weights but also the hydrodynamic radii31,32. It should be noted that during SEC analysis, many clones were not eluted as a discrete peak and thus could not be further characterized (Tables 3, 4). It is possible that these clones were structurally unstable and the partially unfolded protein aggregated or interacted with the stationary phase, trapping them inside the chromatographic column. The kinetic binding parameters of the purified monomeric sdAbs were analyzed by surface plasmon resonance (SPR). The affinities (KD values) ranged from 10⁻⁹ to 10⁻⁷ M, with kon ranges of 10³–10⁵ M⁻¹ s⁻¹ and koff at ~ 10⁻³ s⁻¹ (Table 4 and Supplementary Fig. S5). It is noted that the sensorgrams for anti-CARS1 clones show signs of second-phase binding, which may reduce the quality of the fits and the accuracy of the binding kinetics analysis. For comparison, a panel of 50 monoclonal antibodies raised against 19 different antigens showed average kon, koff, and KD values of 1.9 × 10⁵ M⁻¹ s⁻¹, 2.4 × 10⁻³ s⁻¹, and 1.3 × 10⁻⁸ M, respectively33. Lastly, the thermal stability of the sdAbs was evaluated using PTS assay and phage ELISA. Tm values ranging from 58 to 70 °C were observed, averaging 67 °C (Table 4)34. The sdAbs retained approximately 50% or more of their binding activity to the cognate antigen after heat treatment (70 °C for 2 min), suggesting high thermostability and/or refoldability (Fig. 3).
Compared to the sdAbs isolated from the library and the FKS scaffold itself, unengineered WT VH3-23 exhibited relatively low expression yield (1.46 mg/L culture, Fig. 1b) and was prone to aggregation which prevented PTS or SEC analysis, highlighting the improvement achieved through scaffold optimization.
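The kinetic values above are related through KD = koff / kon for a simple 1:1 binding model. The Python sketch below applies that relation to the average rate constants quoted for the comparison panel of monoclonal antibodies. The function name is hypothetical, and the 1:1 model is an assumption made here for illustration; the text notes that some sensorgrams deviate from single-phase binding. Because an average of ratios is not the same as a ratio of averages, the result is expected to agree with the quoted average KD only approximately.

```python
def dissociation_constant(kon: float, koff: float) -> float:
    """Return KD in M from kon (M^-1 s^-1) and koff (s^-1), assuming 1:1 binding."""
    if kon <= 0:
        raise ValueError("kon must be positive")
    return koff / kon


# Average rate constants quoted in the text for the comparison antibody panel.
kon_ref = 1.9e5    # M^-1 s^-1
koff_ref = 2.4e-3  # s^-1

kd_ref = dissociation_constant(kon_ref, koff_ref)
print(f"KD from the average rate constants: {kd_ref:.2e} M ({kd_ref * 1e9:.1f} nM)")
```

The computed value is close to the 1.3 × 10⁻⁸ M average quoted for that panel.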

Table 4 Sequences, purification yields, and binding kinetics of selected target-specific VH clones.
Figure 3

Binding activities of sdAbs before and after heat treatment. SdAb-displaying phages were heat-treated at 70 °C for 2 min, and their binding activities were compared with non-heat-treated phages. The relative binding activities of heat-treated phages were 92%, 102%, 97%, 80%, 85%, and 48% for CARS1 #1, CARS1 #3, CARS1 #8, ovalbumin #7, ovalbumin #8, and HSA #23, respectively. Average values of triplicate absorbance readings are shown. Error bars indicate standard deviation of the readings.

The binding specificity of the sdAbs was evaluated using ELISA (Fig. 4). When tested against a panel of antigens, including SARS-CoV-2 RBD-Fc, human ACE2, HEL, CARS1, ovalbumin, and HSA, all six clones bound exclusively to their target antigen, demonstrating no significant binding to non-target antigens and confirming their high specificity.

Figure 4

Evaluation of binding specificity by ELISA. The binding specificity of Ni–NTA/SEC-purified sdAbs (5 µM) was evaluated against a panel of antigens (SARS-CoV-2 RBD-Fc, human ACE2, HEL, CARS1, ovalbumin, and HSA) by ELISA. Average values of triplicate absorbance readings are shown. Error bars indicate standard deviation of the readings.

Discussion

Poor physicochemical properties of autonomously expressed non-camelid single variable domains have been a major obstacle to the wider development and application of sdAbs. Structural analysis of HEL-binding human VH sdAb (HEL4) suggests that the extended conformation of CDR3, as well as the side-chain reorientation of Trp47 in FR2 accommodated by Gly35 in CDR1, might have determined its enhanced physicochemical properties35. Other sdAbs with framework sequences identical to HEL4 but different in CDR sequences showed poor biophysical properties22, highlighting the importance of the sequence and the conformation of CDRs and FR2. Furthermore, camelid VHHs have a high homology to the human VH3 family except in FR2, and “camelization” of human VHs in which the hallmark FR2 residues of human VH FR2 were substituted to camelid sequences resulted in improved biophysical properties36. Therefore, we sought to engineer the FR2 sequence of human VH3-23 (DP47), which is highly common and has desirable properties and high homology to camelid VHHs, to generate optimized scaffolds for constructing human sdAb libraries. Specifically, the four FR2 residues (Val37, Gly44, Leu45, Trp47), which together form a part of the VH–VL interface through hydrophobic interactions in normal antibodies but contribute to poor biophysical properties of many non-engineered non-camelid VH sdAbs, were randomized, and the resulting VH library with diversified FR2 was displayed on phage.

M13 bacteriophage retains full infectivity after one-hour heat treatment at 80 °C (ref. 37). Therefore, the FR2-diversified VH phage-displayed library was subjected to brief heating at up to 80 °C before binding selection to immobilized protein A, which is known to have an affinity for correctly folded human VH3 (refs. 19, 38, 39). FR2 variants that are resistant to heat denaturation or capable of rapid renaturation would be preferentially enriched by this selection, and after eight rounds of panning, the output sequences showed convergent patterns. Notably, the consensus FR2 sequence from the eighth round resembled the corresponding region of human Vκ. It was reported that VL domains exhibit better biophysical properties than VH domains as sdAbs13,40,41,42. The observed convergence to VL-like FR2 sequences suggests that the thermal challenge during panning promoted the enrichment of stable sdAb sequences and that the reported stability of VL sdAbs might partly be explained by their FR2 sequences. After a candidate scaffold (YTPW) was identified, the effects of the serendipitously found mutations (F27S and T44S) and the charge introduction (Q39E and Q39K) were probed to determine the final library scaffold FKS (VH3-23 sequence with V37Y/Q39K/G44S/L45P mutations). While these mutations were tested only in VH3-23, it is highly homologous to other members of the human VH3 family (92.5% sequence identity and 97.4% similarity on average in framework regions). Also, the human VH3 family is notably prevalent in rearranged human antibody repertoires43 and shares high sequence similarity to camelid VHHs. Based on these observations, it is plausible that the FR2 mutations found in this study could be applied to the optimization of other VH sdAbs, although further studies are necessary to validate their general applicability. Other scaffolds (SQT and SKS) also showed monomeric content, expression level, and thermal stability comparable to FKS.
Although FKS was chosen as a scaffold for the library construction, the decision was based on a limited set of experimental data, and it is possible that other sequences might also prove effective as sdAb scaffolds.

CDR diversities were designed based on the amino acid usage of human antibody CDRs29, as the detailed knowledge of the interaction and compatibility between the new scaffold and CDRs was lacking. For CDR3, length variations were also introduced: a length of 14 aa (from position 93 to 102 in the Kabat numbering scheme44), the most common length among known human antibody CDR-H3s, as well as shorter (9 aa) and longer (20 aa) lengths were designed. Additionally, a repertoire of 20-aa CDR3s with an intraloop disulfide bond, which is known to stabilize long CDR3s and expand the conformational space of antibody paratope45, was also prepared. By introducing the synthetic CDR diversity to the FKS scaffold, a sdAb library with an estimated size of ~ 10⁹ clones was constructed. After four rounds of panning, target-specific binders were identified against three of the four test antigens (ovalbumin, HSA, CARS1), while no binder was found against HEL. It is possible that the small, positively charged protein (14.6 kDa; pI = 10.7) does not have epitopes suitable for engaging the largely convex paratopes of the sdAb library46. For the other three antigens, target-binding clones were sequenced, expressed, and purified, and their affinities, heat stability, purification yield, and monomeric content were analyzed to validate the performance of the library. The affinities of the isolated sdAbs ranged between 4 and 800 nM, comparable to typical nanobodies47 or murine monoclonal antibodies33. Likewise, monomer contents (66–87%) and Tm (59–70 °C) for these clones were comparable to previously reported values41,48,49,50. It is noted that for the clones with moderate-to-weak affinity (KD > 10⁻⁷ M), the rate of association is slower than in typical antibody–antigen interactions (kon ~ 10³–10⁴ M⁻¹ s⁻¹ vs. ~ 10⁵ M⁻¹ s⁻¹), which is suggestive of the entropic costs of binding associated with the conformational flexibility of CDR loops in the unbound form51,52,53.
These results indicate that the engineered VH scaffold FKS can accommodate diverse CDR sequences and conformations while retaining proper folding. At the same time, the highly variable CDR sequences could also affect the sdAb properties, as evidenced by the variance in purification yields, monomer contents, SEC profiles, and Tm values among clones isolated from the library. Particularly, patches of charged or hydrophobic amino acids could affect domain solubility and stability significantly. For example, CARS1 #8 and ovalbumin #8 clones have a conspicuous patch of positively charged residues (Arg and Lys) in their CDR3s, which likely interact with negative charges on the target epitopes but also might contribute to their biophysical properties such as solubility and monomeric content (through charge repulsion). The CDR3 of CARS1 #8 also features a notable cluster of tyrosines, reflecting the high frequency of tyrosines in long human CDR-H3 repertoires, on which the library diversity was designed. Large-scale analysis of additional CDR sequences derived from the library could provide useful information for the further optimization of the library.

In conclusion, we have identified an optimized, non-camelid FR2 sequence for human VH3-based sdAbs, which exhibits enhanced purification yield, higher heat stability, and increased monomeric content. CDR diversities were introduced to this scaffold, and the resulting sdAb library successfully yielded target-specific binders with desirable properties. The results indicate that the monomeric and stable scaffold is sufficiently flexible to tolerate different CDR loop conformations. Further studies are planned to optimize CDR diversities for better compatibility with the engineered scaffold and improved library performance.

Materials and methods

Construction of synthetic VH sdAb libraries

All oligonucleotides and OE-PCR assembly schemes for library construction are listed in Supplementary Tables S1–S3. A mixture of Taq DNA polymerase (New England Biolabs, Ipswich, MA, USA; 2.5 units/100 µL) and Pfu DNA polymerase (Promega, Madison, WI, USA; 0.6 units/100 µL) was used for the PCR amplification of DNA fragments. For the FR2-randomized VH library, the VH3-23 domain of an anti-RANKL scFv clone 4A5 previously isolated from a phage antibody library54 (unpublished result) was used as a template for randomizing FR2-hallmark residues. The 5′-fragment, including position 37, and the 3′-fragment containing positions 44, 45, and 47, were amplified separately using degenerate oligonucleotide primers. The amplified fragments with randomization at positions 37, 44, 45, and 47 were assembled through overlap-extension PCR (OE-PCR) under the following conditions: initial melting at 94 °C for 5 min; 25 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 30 s, and extension at 72 °C for 30 s; final extension at 72 °C for 7 min. To construct the final sdAb library, CDR diversities were introduced to the selected VH scaffold (FKS scaffold). FR1-CDR1, FR2, CDR2-FR3, CDR3, and FR4 fragments were amplified by PCR and assembled by OE-PCR, both using the same cycling conditions as above. The assembled VH library was digested using the SfiI restriction enzyme (New England Biolabs), ligated into SfiI-digested pComb3X vector55 (a phagemid vector with an amber stop codon before truncated gIII for production of soluble protein in SupE44 mutant strains of E. coli, a 6 × His-tag for purification and an HA-tag for detection), and transformed into TG1 electrocompetent E. coli (Lucigen, Middleton, WI, USA). Transformed cells were plated on LB agarose square plates (500 cm²) supplemented with 2% (w/v) glucose and ampicillin (100 μg/mL). The next day, the amplified cells were harvested and resuspended in 5 mL SB media (Super Broth; 3% (w/v) bactotryptone, 2% yeast extract, and 1% MOPS, pH 7.2) supplemented with 100 μg/mL ampicillin and 2% glucose. A half volume of 50% glycerol solution was added (17% final glycerol), and 1 mL aliquots were snap-frozen in liquid nitrogen and stored at − 80 °C.

The phage library was rescued from the E. coli stock by inoculating a frozen aliquot (1 mL) in 400 mL SB media supplemented with 100 μg/mL ampicillin and 2% glucose and growing the bacteria until the OD600 reached ~ 0.7. The culture was centrifuged, cells were resuspended in 400 mL SB-ampicillin without glucose, and VCSM13 helper phage (Cat. #200251, Agilent Technologies, Santa Clara, CA, USA; 10¹² pfu) was added. After 1 h infection at 37 °C with shaking (80 rpm), kanamycin (70 μg/mL final concentration) was added, and the infected bacteria were cultured overnight (16 h) at 30 °C, 120 rpm. The next morning, the culture was centrifuged at 14,000 × g for 15 min. The supernatant was transferred to a clean centrifuge bottle, PEG8000 (16 g; 4% w/v) and NaCl (12 g; 3% w/v) were added and dissolved with shaking, and the mixture was incubated on ice for > 30 min. The precipitated phage library was centrifuged at 14,000 × g for 30 min and resuspended in 2 mL PBS, 0.5 volume of 50% glycerol was added, and 0.3 mL aliquots (~ 10¹² cfu) were stored at − 80 °C.
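The reagent amounts in the rescue protocol can be cross-checked with simple concentration arithmetic. The Python sketch below recomputes the PEG8000 and NaCl percentages for a 400 mL culture supernatant and the final glycerol concentration after adding 0.5 volume of a 50% stock. The function names are hypothetical, and the calculation assumes that dissolving the solids does not change the volume appreciably and that liquid volumes are additive.

```python
def percent_w_v(mass_g: float, volume_ml: float) -> float:
    """Weight/volume percentage: grams of solute per 100 mL of solution."""
    return 100.0 * mass_g / volume_ml


def percent_after_addition(stock_percent: float, added_volumes: float,
                           sample_volumes: float = 1.0) -> float:
    """Final percentage after adding a stock to a sample (volumes assumed additive)."""
    return stock_percent * added_volumes / (added_volumes + sample_volumes)


supernatant_ml = 400.0  # culture volume stated in the protocol

peg_percent = percent_w_v(16.0, supernatant_ml)    # PEG8000, 16 g
nacl_percent = percent_w_v(12.0, supernatant_ml)   # NaCl, 12 g
glycerol_percent = percent_after_addition(50.0, added_volumes=0.5)

print(f"PEG8000: {peg_percent:.1f}% w/v")
print(f"NaCl: {nacl_percent:.1f}% w/v")
print(f"Glycerol after adding 0.5 volume of 50% stock: {glycerol_percent:.1f}%")
```

The glycerol result, about 17%, matches the final concentration stated for the bacterial library stock, which was prepared with the same half-volume addition.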

Identification of stable autonomous human VH sequences

A total of eight rounds of panning against the protein A superantigen were performed using the FR2-randomized library. Briefly, rounds 1 to 3 were performed using a typical panning method. Protein A antigen (2 μg/mL in 1 mL PBS) was immobilized on an immunotube (Nunc 470319, Thermo Scientific). After immobilization, the antigen-coated tube was blocked for 1 h with 3% skim milk solution in PBS containing 0.05% Tween 20 (mPBS) at room temperature. FR2-randomized library phage stock was blocked using mPBS–0.05% Tween 20 (mPBST, 1 mL total volume) for 1 h, then transferred to the protein A (Pierce Biotechnology, Rockford, IL, USA)-coated immunotube. After incubation at 37 °C for 2 h, unbound phages were removed, and the tube was washed three times with PBST. Bound phages were eluted with 1 mL of 100 mM triethylamine solution and neutralized with 0.5 mL of 1 M Tris–HCl (pH 7.0). Then, 8.5 mL of mid-log phase TG1 E. coli was added to the eluted phage and incubated at 37 °C with slow agitation (120 rpm) for 1 h, centrifuged, and plated and titrated on LB–ampicillin agarose plates supplemented with 2% (w/v) glucose. The next day, the amplified cells were collected, and phages were rescued for the next round. Phage rescue was performed by inoculating 50 µL of the harvested E. coli to 20 mL of SB–ampicillin. VCSM13 helper phage (10¹¹ pfu) was added to mid-log phase culture for infection at 37 °C with gentle agitation (120 rpm) for 1 h. Kanamycin (70 μg/mL) was added, and the bacteria were cultured at 30 °C overnight. The next day, secreted phages in the culture supernatant were precipitated with 5 × PEG solution (20% [w/v] PEG8000 and 15% [w/v] NaCl in deionized water), resuspended in PBS, and used for the next round of panning. In rounds 4 to 6, precipitated phage pools were heated at 70 °C for 2 min and cooled on ice for 5 min before binding to the antigen. In rounds 7 and 8, phage pools were heated at 80 °C for 10 min and cooled on ice for 10 min.
After the heat treatment, panning was performed in the same manner as described above.

Panning and ELISA screening of sdAb library

The constructed library was panned for four rounds against test antigens (HSA, HEL, CARS1, ovalbumin) following the same protocol as the protein A panning described above, except for the heat treatment step. To screen antigen-specific VH binders, soluble sdAbs were produced. Single colonies were picked from the final round output, inoculated into a 96-well microtiter culture plate containing 200 µL SB–ampicillin medium, and grown with shaking at 37 °C until turbid (3–4 h). IPTG (1 mM final concentration) was added to each well, and the plate was incubated with shaking at 30 °C overnight. The next day, the plate was centrifuged, and cell pellets were resuspended in 60 µL of cold 1 × TES buffer (20% [w/v] sucrose, 1 mM EDTA, 50 mM Tris, pH 8.0) and incubated on ice for 30 min. Cold 0.2 × TES (90 µL) was added, and the mixture was incubated on ice for 30 min. The plate was centrifuged, and the supernatant containing E. coli periplasmic extract was transferred to an ELISA plate coated with antigen and blocked with 3% BSA in PBST (25 µL/well). After binding for 1 h and three washes with PBST, anti-HA-HRP antibody (Cat. #2999S, 1:3,000 dilution, Thermo Fisher Scientific, Waltham, MA, USA) was added and incubated at room temperature for 1 h. After washing three times with PBST, binding activity was measured using the chromogenic HRP substrate tetramethylbenzidine (TMB).

Phage ELISA was performed to assess binding activity after heat denaturation. E. coli clones were grown in 20 mL SB–ampicillin until turbid as described above, and VCSM13 helper phage (10^11 pfu) was added. After infection at 37 °C for 1 h, kanamycin was added at 70 µg/mL final concentration, and the cells were cultured overnight at 30 °C. The next day, phages were precipitated with 5 × PEG solution and resuspended in 1 mL PBS. A portion was heated at 70 °C for 2 min, and both heat-treated and non-treated phages were added to the wells of an ELISA plate coated with antigen and blocked with 3% BSA-PBST. Subsequent steps were performed as described above.

Purification of sdAb

For sdAb purification, TG1 E. coli cells harbouring the phagemid DNA were grown in 50 mL of SB–ampicillin medium at 37 °C until the OD600 reached 0.7. Protein expression was induced using 1 mM IPTG, and the culture was incubated at 30 °C with agitation at 220 rpm overnight. The following day, the cell pellets, obtained by centrifugation, were resuspended in 4 mL of cold 1 × TES buffer and incubated on ice for 30 min. Cold 0.2 × TES (6 mL) and PMSF (final concentration: 1 mM) were added, and the mixture was further incubated on ice for 30 min. After centrifugation at 14,000×g for 15 min, 5 mM MgCl2 was added to the cleared supernatant, followed by 100 μL of Ni–NTA agarose beads suspension (Cat. #70666-4, Merck Millipore, Darmstadt, Germany). The suspension was incubated at room temperature with gentle agitation for 1 h and transferred to an empty gravity column (Cat. #7311550, Bio-Rad, Hercules, CA, USA) to allow the liquid to flow through. The beads were washed twice with 5 mL of wash buffer (PBS with 5 mM imidazole, pH 7.4), and the purified sdAb was eluted by adding 400 μL fractions of elution buffer (PBS with 250 mM imidazole, pH 7.4).

Size exclusion chromatography

Size exclusion chromatography was performed using a Superdex™ 75 Increase 10/300 GL column (Cat. #17-5174-01, Cytiva, Marlborough, MA, USA) or, for data shown in Fig. 1, a Superdex™ 200 Increase 10/300 GL column (Cat. #28-9909-44, Cytiva), on an ÄKTA pure system (Cytiva). Purified sdAb (400 μL of 3–5 mg/mL concentration) was injected and run with 1 × PBS buffer at a flow rate of 0.75 mL/min. HEL (Cat. #L6876, Sigma; MW 14.4 kDa) and scFv (anti-RANKL clone 4A5 [see above]; MW 27 kDa including 6 × His and HA-tag) were used as molecular weight standards. Monomeric content was estimated from the ratio of peak areas of monomer and dimer/higher molecular weight species, using Unicorn 7.3 software (Cytiva).
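The peak-area calculation can be illustrated with a minimal sketch. It assumes that monomeric content is expressed as the monomer peak area as a percentage of the total integrated area (monomer plus dimer and higher molecular weight species). The function name and the example areas are hypothetical and are not measured data; peak integration itself was done in the Unicorn software.

```python
def monomer_percent(monomer_area, other_areas):
    """Return the monomer peak area as a percentage of total peak area.

    monomer_area: integrated area of the monomer peak (e.g. mAU*mL)
    other_areas:  integrated areas of dimer / higher-MW peaks
    """
    total = monomer_area + sum(other_areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * monomer_area / total


# Hypothetical integrated areas, for illustration only
example = monomer_percent(95.0, [4.0, 1.0])
print(f"Monomeric content: {example:.1f}%")
```

Expressing the result as a fraction of the total area keeps the value bounded between 0 and 100% regardless of how many aggregate peaks are resolved.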

Protein thermal shift (PTS) assay

The Tm (°C) of purified VH sdAb was measured by Protein Thermal Shift™ (PTS) assay using the StepOnePlus™ Real-Time PCR System (Cat. #4379216, Applied Biosystems). The measurement was performed by mixing 2 µg of sdAb in 1 × PBS (pH 7.4) with assay reagents from the Protein Thermal Shift™ Dye kit (Cat. #4462263, Applied Biosystems) to a total volume of 20 µL, following the manufacturer’s instructions. The real-time PCR instrument was set up as detailed for the 7500 Fast Real-Time PCR system in the Protein Thermal Shift™ Starter Kit User Instructions from Applied Biosystems. Tm values were calculated using Protein Thermal Shift™ Software v1.4 (Cat. #4466038, Applied Biosystems).
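The text does not describe how the software derives Tm from the melt curve. As a rough illustration of one common approach, the sketch below takes Tm as the temperature at which fluorescence rises most steeply (the maximum of the first derivative). This is an assumption made for illustration and is not a description of the Protein Thermal Shift™ Software; the function name and the synthetic curve are hypothetical.

```python
import math


def tm_from_melt_curve(temps, fluorescence):
    """Estimate Tm as the midpoint of the interval with the steepest rise."""
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    best_slope, best_tm = float("-inf"), None
    for i in range(len(temps) - 1):
        # Finite-difference slope between neighbouring readings
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2
    return best_tm


# Synthetic sigmoidal melt curve (illustrative only, not measured data)
temps = [25 + 0.5 * i for i in range(141)]  # 25 to 95 °C in 0.5 °C steps
signal = [1 / (1 + math.exp(-(t - 65.25) / 2.0)) for t in temps]
tm = tm_from_melt_curve(temps, signal)
print(f"Estimated Tm: {tm:.2f} °C")
```

With real data, noise usually makes smoothing or a sigmoidal fit preferable to a raw finite difference; the sketch omits this for brevity.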

Surface plasmon resonance (SPR) assay

SPR analysis was performed using the BIAcore 3000 (GE Healthcare, Piscataway, NJ, USA) instrument at Ewha Fluorescence Core Imaging Center. Antigen (2–5 µg/mL in 10 mM acetate buffer [pH 4, 4.5, or 5]) was immobilized to a level of 1000 response units (RU) in the flow cell of a CM5 sensor chip (GE Healthcare) using the amine coupling method, according to the manufacturer's protocol. sdAbs purified by Ni–NTA affinity chromatography and SEC were diluted in filtered and degassed PBS and run on the antigen-immobilized sensor chip. Binding kinetics were analyzed with BIAevaluation software using the 1:1 Langmuir binding model.
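For reference, the 1:1 Langmuir binding model in its standard textbook form is written out below. The symbols are introduced here for illustration and are not taken from the original text: R is the SPR response (RU), C is the concentration of injected sdAb, R_max is the maximal binding capacity of the surface, and k_a and k_d are the association and dissociation rate constants.

```latex
\frac{dR}{dt} = k_a \, C \, \left( R_{\max} - R \right) - k_d \, R,
\qquad
K_D = \frac{k_d}{k_a}
```

During the dissociation phase, C = 0 and the response decays as R(t) = R_0 e^{-k_d t}, where R_0 is the response at the start of dissociation.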