Main

While antibodies still have central roles in protein therapeutics, progress has been made in drug development using nonantibody-binding proteins that show superior properties in thermal/pH stability, binding affinities, tissue delivery and industrial-scale manufacture1,2,3. The two main approaches are random library selection methods and computational protein design. Perhaps the most successful scaffold for random library selection has been the ankyrin repeat4,5; libraries of designed ankyrin repeat proteins (DARPins) have been used to identify high-affinity binding proteins via high-throughput screening methods, which have had multiple successes in preclinical studies3,5,6. Ankyrin repeat proteins have a repeating architecture with structured, hairpin-shaped loops extending from the helices to an extended binding groove that is geometrically compatible with many globular protein targets. Despite these successes, the global shape diversity of DARPins is limited by the use of a single base scaffold. Computational design of binding proteins does not have this limitation as a wide range of scaffolds can be used, with shapes more optimal to bind the target protein of interest. However, this advantage thus far has come with a different limitation—because of the inherent flexibility and lack of extensive backbone hydrogen bonding of long loop regions, protein binder design has focused on scaffolds and binding sites primarily composed of α helical7 or β strand8 secondary structure, which has limited the achievable local shape diversity.

Results

Design approach

Here we set out to overcome the challenges in de novo design of long loops on the one hand, and the limitations of ankyrin scaffolds in global shape diversity on the other, by computationally designing repeat proteins with multiple long loops buttressed by loop–loop interactions (Fig. 1a). To achieve this goal, we divided the problem into the following two subproblems: first, the generation of repeating scaffold backbone conformations compatible with loop buttressing, and second, the generation of loop backbone conformations compatible with a dense network of hydrogen bonds and hydrophobic interactions between pairs of loops and between the loops and the underlying scaffold.

Fig. 1: Computational design of RBLs.
figure 1

a, Design strategy for generating and stabilizing multiple loops in helical repeat proteins. b, A gallery of diverse designed proteins that pass the in silico design filters. c, Loop buttressing hydrogen bonds in the designed proteins.

We developed a computational method for generating a wide range of repeat protein backbones that are geometrically compatible with the insertion of long loops (Fig. 1a, top row). Previous approaches to designing helical repeat proteins have used fragment assembly methods to assemble repeat units with short loops connecting them9. While these methods can generate considerable diversity, we found that they did not provide sufficient control over backbone positions for designing loop buttressing. Instead, we developed a parametric repeat protein generation method that enables precise control over backbone placement. We generated diverse repeat units consisting of two idealized helices by systematically sampling the lengths of the helices (from 12 to 28 residues) and the six rigid-body degrees of freedom between the two helices. We next sampled the six rigid-body degrees of freedom between repeat units and applied the same transform repeatedly to generate a disconnected repeat protein model. Finally, we connected pairs of sequence-adjacent helices using a native-protein-based loop lookup protocol that grafts on loops (from three to six residues) that best fit onto the termini of the helices10. In extensive model-building experiments, we found that to enable the installation of long loops onto these parametrically generated models, the termini of the helices had to be less than 18 Å apart, and we removed backbone models where the distance between termini was greater than this value. We also eliminated poorly packed models with fewer than 28% of the residues in a buried core. The resulting repeat protein models have well-defined core regions and are slightly curved with little or no twisting between the repeat units.

We next sought to develop a general method for building multiple long loops that buttress one another onto protein scaffolds (Fig. 1a, second and third rows). Surveying the structured, long loops in natural proteins, we observed that they frequently contain β turns with strand-like hydrogen bonds flanking the turn residues, which contribute to the stabilization of the specific loop conformation. We also observed that natural proteins often use helix-capping interactions between the sidechain or backbone on the loop residue and the backbone of the helix from which it emanates; this feature helps specify the orientation of the loop as it leaves the helix. Based on these observations, we constructed and curated libraries of β-turn motifs and helix-capping motifs by clustering four-residue native-protein fragments, and we selected the clusters that fulfilled the requirements of hydrogen bonds as described in Methods. During the loop sampling, these motifs were randomly selected and incorporated into a single loop growing from the C-terminus of a helix. Using generalized kinematic closure11, we then connected the C-terminus of the loop to the N-terminus of the next helix in the backbone model. The resulting loop was then propagated to each repeat unit to generate a complete repeat protein model with multiple long loops. To specifically favor loops that could be buttressed with hydrogen bond networks, we required that models have at least two intraloop backbone-to-backbone hydrogen bonds within each repeat unit and at least one interloop backbone-to-backbone hydrogen bond between the repeat unit neighbors. To favor interactions between the long loops and the helices, we further filtered the models by requiring at least five residues within 8 Å of the closest helical residues. The remaining backbone models following these filtering steps contain long loops arranged in sheet-like structures ready for the installation of additional sidechain-based buttressing interactions.

We designed sequences onto these backbones, focusing on further loop stabilization through buttressing (Fig. 1a, bottom row). We began by scanning each position on the long loops for Asn, Asp, His or Gln placements that form backbone-sidechain bidentate hydrogen bonds between loops or between a loop and a helix, and for Val, Leu, Ile, Met or Phe placements that form loop–helix hydrophobic contacts; amino acids meeting these criteria were kept fixed in subsequent design steps. We then performed four rounds of full combinatorial Rosetta protein sequence design with slowly ramped-up fa_rep weight to promote core packing. A slight compositional bias toward proline was used in the long loop to increase rigidity. The design models were filtered in Rosetta by the number of buried unsatisfied heavy atoms (≤3), core residue hole score (≤−0.015), total score per residue (≤−2), packstat (≥0.5) and average hydrogen bond energy per residue (≤−1) in the buttressed long loops. The rigidity of the design models was evaluated using molecular dynamics simulations and the extent to which the designed sequence encodes the structure by AlphaFold12,13. The in silico validated designs span a diverse range of shapes with different repeat protein curvatures and loop geometries (Fig. 1b) and adopt multiple loop buttressing strategies using loop–helix hydrogen bond networks and loop–loop bidentate hydrogen bonds (Fig. 1c). These designed buttressed loops have significantly more diverse structures than the long, hairpin loops in the native ankyrins (Extended Data Fig. 1) and contain more backbone hydrogen bonds (Extended Data Fig. 2).

Experimental characterization

We expressed 102 selected designs (which we call repeat proteins with buttressed loops (RBLs)) in Escherichia coli and purified them by His tag-immobilized metal affinity chromatography. In total, 77 of the purified proteins were soluble (representative models shown in Fig. 2a), 52 were monodisperse and 46 were monomeric, as indicated by multi-angle light scattering coupled with size-exclusion chromatography (SEC–MALS; Fig. 2b). Forty-four of these proteins showed the expected α-helical circular dichroism (CD) spectrum at 25 °C, remained at least partially folded at 95 °C and recovered nearly all the CD signal when cooled down to 25 °C (Fig. 2c). Fourteen designs were further validated by small-angle X-ray scattering (SAXS; Fig. 2d and Extended Data Fig. 3). The experimental scattering curves agreed with profiles computed from the design models.

Fig. 2: Biophysical characterization of designed helical RBLs.
figure 2

a, Design models of six representative designs. b, SEC traces monitoring absorbance at 280 nm. c, CD spectra collected at 25 °C (blue), 95 °C (orange) and 25 °C after cooling from 95 °C (green). d, Overlay of experimental (black) and theoretical (red) SAXS profiles.

We determined the crystal structure of design RBL4 at 1.8 Å resolution (Fig. 3a–e). RBL4 contains four helix–long-loop–helix repeat units that are sandwiched by two terminal capping helices. Each long loop is anchored on top of the neighboring helices and stabilized by interloop Asn-mediated bidentate hydrogen bond networks as designed, and the design model is in good agreement with the crystal structure with a Cα root-mean-square deviation (RMSD) of 1.7 Å (Fig. 3a). The primary discrepancy between the crystal structure and design model is in the inter-repeat transformation—the design model is slightly curved (smaller superhelical radius), while the crystal structure is nearly flat (larger superhelical radius). Within individual repeat units, there is very close agreement between the crystal and design model, with repeat unit Cα RMSDs for different repeat units ranging from 0.48 to 0.61 Å (Fig. 3b). The designed loop buttressing interactions—the bidentate interloop hydrogen bonds (Fig. 3c) and loop–helix salt bridges (Fig. 3d)—were accurately recapitulated in the crystal structure. B-factors for the loop residues are elevated compared to the helix residues (Extended Data Fig. 4a), but the fit to the electron density shows that the loops are well ordered (Extended Data Fig. 4b).

Fig. 3: Structural characterization by X-ray crystallography.
figure 3

a, Superimposition of crystal structure (yellow) onto the design model of RBL4 (gray). b, Alignment of individual repeat units. ce, Accurately designed loop buttressing interactions: bidentate interloop hydrogen bonds (c), loop–helix salt bridge (d) and loop–helix hydrophobic contacts (e). f, Superimposition of crystal structure (blue) onto the design model of RBL7_C2_3 (gray). g, Superimposition of a monomer unit in the crystal structure onto the design model. hj, Accurately designed loop buttressing interactions: intraloop and interloop hydrogen bonds (h), bidentate interloop hydrogen bonds (i) and loop–helix hydrophobic contacts (j).

Design RBL7 has a similar overall geometry as RBL4, but with a smaller superhelical radius. This design was highly stable and monomeric, with an overall fold validated by SAXS (Fig. 2, second row). We obtained crystals that diffracted poorly with the highest resolution at 4.2 Å. As previous studies suggested that synthetic oligomerization can sometimes assist crystallization14, we sought to generate a dimeric form of RBL7 by introducing a hydrophobic dimer interface. The redesigned protein, RBL7_C2_3, was soluble and dimeric, and we were able to solve the crystal structure at 3 Å resolution. The crystal structure closely matches the design model, with a Cα RMSD over the dimer of 2.9 Å (Fig. 3f) and over the monomer of 1.6 Å (Fig. 3g). The main discrepancies between the crystal and designed structures were in the terminal helices. Similar to design RBL4, the crystal structure confirmed the accuracy of the designed loop buttressing interactions in RBL7_C2_3 (Fig. 3h–j). All of the designed interloop hydrogen bonds at the β turns of long loops were recapitulated in the crystal structure (Fig. 3h). These hydrogen bonds are likely crucial for positioning the long loops and contribute to the close matching between the loops in the design model and those in the crystal structure. Again, the B-factor values of the buttressed loops are slightly elevated in the loops compared to the helices (Extended Data Fig. 4c), but with a good fit to the electron density (Extended Data Fig. 4d).

Design of peptide-binding RBLs

An exciting application of our designed RBLs is to use them as starting points for the computational design of high-affinity binding proteins. This could enable the design of DARPin-like binders for a wide range of targets without the need for large-scale library selection methods. The ability to design a wide diversity of repeating scaffolds with buttressed loops could considerably expand the space of targets. As a first step toward investigating the design of RBL-based binders, we redesigned the extended groove bordered by the buttressed loops to bind extended peptides. To take advantage of the repeating nature of RBLs, we chose to focus on peptides with a repeating sequence motif—in this case, once a repeat unit is designed to bind a particular short peptide, repeat proteins containing multiple copies of this unit should bind peptides with multiple copies of the motif, provided the register between the repeat protein and the peptide can be maintained. Generalizing from the observation that some ankyrin family proteins can bind peptides with a PxLPxI/L (x can be any amino acid) sequence motif15, we sought to design binders for peptide sequences of the form (XYZ)n, where n is the number of repeats, X is a polar residue interacting with residues in the buttressed loop β turns and Y and Z are hydrophobic residues interacting with the helices and the helix–loop joint of RBLs (see Fig. 4b for an example of one peptide repeat unit interacting with an RBL-based peptide binder).

Fig. 4: Designed repeat peptide-binding RBLs.
figure 4

a,d,g, Design models of peptide-binding proteins in complex with (DLP)6 (a), (KLP)6 (d) and (DLS)6 (g). b,e,h, Sequence-specific interactions in the binder–peptide complexes: (DLP)6binder–(DLP)6 (b), (KLP)6binder–(KLP)6 (e) and (DLS)6binder–(DLS)6 (h). c,f,i, Fluorescence polarization measurement of binding between (DLP)6binder–(DLP)6 (c), (KLP)6binder–(KLP)6 (f) and (DLS)6binder–(DLS)6 (i). For each binder, a titration curve is plotted for the binding of each peptide (blue, (DLP)6; orange, (KLP)6; and green, (DLS)6).

To design binders of (XYZ)n peptides, we first docked tripeptide repeats in the polyproline II helix conformation to the binding grooves of RBLs guided by the interactions in peptide-binding ankyrin family proteins in the Protein Data Bank (PDB; Methods), and carried out rigid-body perturbations to diversify the docked poses. For each resulting pose, we used the Rosetta sequence design to generate sequences of both RBL and peptide for optimized binding. We designed 34 proteins to bind six-repeat peptides ((DLP)6, (KLP)6 or (DLS)6), screened them in silico based on protein–protein interaction design filters including AlphaFold12,13 structure recapitulation, obtained synthetic genes encoding the designs and purified the proteins from E. coli expression. In the initial binding screens using split-luciferase assay, seven designs showed clear binding signals. From these designs, we selected the strongest binders for each peptide target and performed fluorescence polarization measurements, which showed the protein–peptide interactions are orthogonal and have high affinities (Fig. 4). All the selected binders (Fig. 4a,d,g) were based on RBL4 with the peptides in nearly identical binding modes. Unlike the natural peptide-binding ankyrins, the designed peptide binders interact specifically with every peptide residue side chain, with the Asp/Lys at the X position forming salt bridges with charged residues from the β-turn tip of RBL, the Leu at the Y position fully buried in the hydrophobic interface and the Pro/Ser interacting with the residues on the bottom of helices (Fig. 4b,e,h). Both the (DLP)6 binder and the (KLP)6 binder bound their target peptides with high affinities (Kd = 1.2 nM and <0.3 nM, respectively) and high specificity (Fig. 4c,f). Neither binder strongly bound (DLS)6, suggesting Pro in the peptide was crucial in the protein–peptide interactions. We sought to rescue the (DLS)6 binding by installing Gln-mediated bidentate hydrogen bonds (Fig. 4h). The resulting design bound (DLS)6 with high affinity (Kd = 2.9 nM) but retained affinity for (DLP)6 (Fig. 4i).

Discussion

There are two primary routes forward for engineering new functions using our designed RBLs. First, by analogy with the many DARPins obtained starting from stabilized consensus ankyrin repeat proteins, it should be readily possible to create binders from RBLs by random library generation in conjunction with yeast display and other selection methods for binding to targets of interest. Second, as demonstrated by our design of peptide-binding proteins, computational design methods can be used to generate binders to a wide variety of targets, taking advantage of the diverse geometries that can be achieved with different buttressed loops on different repeat protein scaffolds.

From a fundamental design perspective, the crystal structures presented here show that computational protein design has advanced to the point that proteins with multiple ordered long loops can now be designed. Key to this success was the design of dense networks of hydrogen bonding and nonpolar interactions within and between the loops and between the loops and the underlying secondary structural elements. Our approach, alone or integrated with additional recent progress in loop design16 and recently developed deep learning approaches for protein design17,18,19,20,21 (which do not directly address the challenge of designing structured long loops), should enable the design of structured loops for binding functions and beyond in a wide variety of scaffolds. For example, for enzyme design, multiple loops emanating from designed TIM barrels22,23,24 could be built to stabilize each other and, together with the residues emerging from the top of the β strands and helices in the TIM barrel structure, form an extensively buttressed catalytic site and associated substrate/transition state binding site.

Methods

Computational design method

We developed our computational protein design protocols using Rosetta25,26 (2019.01) and PyRosetta4 (release 2019.22)27. Our protocol of parametric repeat protein generation started by building an ideal helix H1 (with a length of 12–28 residues) with the MakeBundleHelix mover in Rosetta25,26 and placing it away from the z axis with a given radius and an angle corresponding to its orientation. A second helix, H2 (with a length of 12–28 residues), was then modeled and placed according to the specification of the six rigid-body degrees of freedom for geometry transformation from H1 to H2. By combining H1 and H2 into one pose, we built the first repeat unit R1. Subsequently, we used user-specified six rigid-body degrees of freedom between repeat units to perform a geometric transformation to obtain the second unit R2. We propagated the repeat units based on the number of repeats desired to generate the helical repeat protein backbones. We then connected pairs of sequence-adjacent helices with loops of three to six residues using ConnectChainMover10. To filter the generated repeat protein backbones, we required a maximum distance of 18 Å between the termini of the helices to be connected by buttressed long loops. We also removed the low-quality backbone models with fewer than 28% of the residues in a buried core.

To design buttressed loops, we developed a hybrid method that assembles native structural motifs via kinematic loop closure. To guide the sampling toward the hairpin-shaped conformations, we constructed a motif library that consists of native β turns. A β-turn motif is defined by having a backbone-to-backbone hydrogen bond between the carbonyl group of residue i and the amine group of residue i + 3 (refs. 28,29). In this work, we searched for native β-turn fragments by mining a set of selected PDBs based on 90% maximum sequence identity and a 1.6 Å resolution cutoff from PISCES30. The collected β turns were further clustered by the K-centers algorithm31 at a maximum cluster distance of 0.63 Å, resulting in 180 motif clusters. Using the same approach, we compiled a library of native helical capping motifs to guide the sampling of loops connecting helices in the repeat proteins.

We used GeneralizedKIC11 for loop closure. An extended loop fragment was first constructed by stitching native helical capping motifs (four amino acids), β-turn motifs (four amino acids) and KIC residues (five to ten amino acids) with randomized backbone torsion angles. We chose these lengths because we found limited structural diversity for loops with lengths less than nine amino acids. When the loop length exceeded 14 amino acids, it became significantly more difficult to design buttressing interactions to stabilize the entire loop. The torsion angles of β turns were set according to the motifs sampled from the β-turn library, and the Φ/Ψ torsion angles of nonpivot KIC residues were sampled from the Ramachandran distribution, with omega torsion angles fixed at 180°. All the bond lengths were kept fixed at the ideal lengths. The position of the β-turn was randomly sampled in the loop. In each step of GeneralizedKIC, kinematic loop closure was performed to connect the loop to the intended insertion site. Loop conformations were filtered by backbone steric clashes. We further filtered the models by selecting loops with at least two intraloop backbone-to-backbone hydrogen bonds. To avoid helical conformations, we removed the models predicted to have more than five consecutive helical residues by DSSP32. This ensured the extended β-hairpin shape, which contributed to the loop stability and compatibility for buttressing.

To install the loops of the same conformation in each unit of repeat proteins, we used the RepeatPropagationMover in Rosetta25,26. After filtering out the loops with steric clashes, we computed three metrics to help select the best loop conformations for buttressing—number of interloop backbone-to-backbone hydrogen bonds, loop motif score and direction score. We required at least one interloop backbone-to-backbone hydrogen bond between each pair of neighboring loops to enhance the sequence-independent loop buttressing. To select loops with loop–helix hydrophobic contacts, the motif scores were computed by matching the selected pairs of residues to the known contacting native hydrophobic residue pairs (Val, Leu, Ile, Met and Phe) in PDB33. The scores for the matched residue pairs in the loop regions were then summed to one total score. Only the loops with a negative total motif score were selected. The direction score described the relative orientation of the loops from the rest of the input repeat proteins. Specifically, we defined the following two vectors: vector a started from the center of mass of the two loop terminal residues and pointed to the farthest Cα atom of the loop; vector b started from the same point as a but pointed toward the center of mass of the repeat unit. The direction score was derived by computing the angle between the two vectors.

$${\mathrm{Direction}}\; {\mathrm{score}}={\cos }^{-1}\frac{{{\bf{a}}}{\boldsymbol{\cdot }}{{\bf{b}}}}{\left|{{\bf{a}}}\right|\left|{{\bf{b}}}\right|}$$

The accepted angles ranged from 45° to 135°. We also required at least five residues within 8 Å of the closest helical residues.

Next, we performed a fast sequence design task to identify loop conformations compatible with interloop bidentate hydrogen bond networks. From each propagated set of loops, the loop on the second repeat unit was selected for sequence design. One packing step using PackRotamersMover25,26 was conducted separately for each residue on this loop using amino acids that are compatible with forming sidechain-to-backbone bidentate hydrogen bonds—Asn, Asp, Gln or His. We excluded amino acids with longer side chains (Arg and Lys), as their high entropic cost might diminish the free energy contribution of buttressing. After each packing step, bidentate hydrogen bonds between the packed residue and its neighboring residues were counted. A bidentate hydrogen bond was defined as two separate hydrogen bonds forming between atoms in the functional group of the sidechain from a residue on the loop and the backbone of a neighboring repeat unit. The selected amino acid was kept only if it formed interloop bidentate hydrogen bonds; otherwise, the original amino acid (by default, Ala) was kept. In the case where the one-step packing approach failed to generate any interloop bidentate hydrogen bonds, we used an alternative three-stage scheme to maximize the sampling efficiency of bidentate hydrogen bonds—identifying pseudo-bidentate hydrogen bonds, performing constrained minimization for building hydrogen bonds and evaluating the resulting bidentate hydrogen bonds. We defined that a pseudo hydrogen bond has a donor–acceptor distance <3 Å and a hydrogen bond angle >120°. After propagating the designed residue to all the repeat units, we imposed a harmonic distance constraint between each donor and acceptor atoms with a target distance of 2 Å and a s.d. of 0.5 Å. At the minimization stage, we performed symmetric minimization of the loops to improve the interactions of potential hydrogen bonds. Finally, we used the Rosetta score function to examine if the bidentate hydrogen bonds formed in the minimized loop conformations.

To guide the sequence design, we used LayerSelector to define the core, the boundary and the surface layers and specified the allowed amino acids for each layer. We added residue type constraints to fix the identity of the residues participating in loop buttressing bidentate hydrogen bonds, so the stabilizing interactions obtained during loop sampling would be maintained throughout sequence design. Next, we performed four rounds of sequence design using the FastDesign mover under the repeat-symmetric constraints to ensure the repeat units had the same structures and sequences. To improve the solubility and folding of the designs, we subsequently performed one round of FastDesign to remove the solvent-exposed hydrophobic residues on the terminal repeat units. Only polar residues such as Glu, Gln, Lys and Arg were allowed for this round of design. The designed structures were then refined by minimization in Cartesian space and subsequently filtered by the number of buried unsatisfied heavy atoms (≤3), hole score normalized by total number of core residues (≤−0.015), total score normalized by total number of residues (<−2), packstat (≥0.5) and hydrogen bonding energy of each loop residue (≤−1). Top 10% scoring structures were further tested by in silico validation methods such as molecular dynamics simulations (Cα RMSD < 3 Å), AlphaFold12,13 (PLDDT > 80, Cα RMSD < 3 Å) or RoseTTAFold34 (PLDDT > 80, Cα RMSD < 3 Å). Structural similarity between native ankyrin loops and the designed RBL loops was computed by TM-align35.

We performed molecular dynamics simulations using GROMACS (2018.4)36 with the Amber99SB-ILDN force field37. The design models were solvated in dodecahedron boxes of the explicit TIP3P38 waters with the net charge neutralized. We treated long-range electrostatic interactions with the Particle-Mesh Ewald method39. Both short-range electrostatic interactions and van der Waals interactions used a cutoff of 10 Å. Energy minimization was performed using the steepest descent algorithm. A 1-ns equilibration under the NPT ensemble was subsequently performed with position restraints on the heavy atoms. We used Parrinello–Rahman barostat40 and velocity-rescaling thermostat41 for pressure coupling (1 atm) and temperature coupling (310 K), respectively. For the production runs, we launched three 20-ns trajectories under the NPT ensemble for each design model. The Cα atom RMSD against the design model was computed for analysis.

Protein expression and characterization

Genes encoding the in silico validated designs were synthesized (IDT) and cloned into pET-29b expression vectors. The plasmids were transformed into Lemo21 (DE3) expression E. coli strain (NEB). Protein expression was performed using the auto-induction protocol42 at 37 °C for 24 h in 50 ml or 100 ml culture. During the purification, cells were pelleted at 4,000g for 10 min and resuspended in 25 ml lysis buffer (25 mM Tris–HCl (pH = 8), 150 mM NaCl, 30 mM imidazole, 1 mM DNase and 10 mM lysozyme with Pierce Protease Inhibitor Tablets (Thermo Fisher Scientific)). Sonication was subsequently performed for 2.5 min (10 s on and 10 s off per cycle). The lysate was then centrifuged at 16,000g for 30 min. The supernatant was applied to a gravity flow column packed with Ni-NTA resin (Qiagen), followed by 20 ml wash buffer (25 mM Tris–HCl (pH = 8), 150 mM NaCl and 30 mM imidazole) and 5 ml elution buffer (25 mM Tris–HCl (pH = 8), 150 mM NaCl and 400 mM imidazole). The eluted protein was then concentrated and injected into an Akta Pure FPLC device with a flow rate of 0.75 ml min−1 in the running buffer (25 mM Tris–HCl (pH = 8) and 150 mM NaCl). The typical yield of a monodisperse and thermally stable designed RBL is 1–6 g l−1. To perform SEC–MALS, we prepared the purified protein at ~2 mg ml−1 and injected 100 μl of sample into a Superdex 200 10/300GL column and measured the light scattering signals using a miniDAWN TREOS device (Wyatt Technology). To measure the CD signals, we first prepared the sample at ~0.2 mg ml−1 in 25 mM phosphate buffer in a 1 mm cuvette. A Jasco J-1500 CD spectrometer was used for all CD measurements. We set the range of wavelength from 190 nm to 260 nm and scanned over a three-temperature (25 °C, 95 °C and cooling back to 25 °C) set for each sample. We submitted all samples for SAXS43,44 to Advanced Light Source, LBNL for data collection at the SIBYLS 12.3.1 beamline.

Design and characterization of repeat peptide-binding proteins

We used the recently developed protein interface design method7 for in silico binder docking and design experiments. Docking of repeat peptides to the binder scaffold was guided by the geometric transformation between native ankyrins and their peptide targets in the crystal structures from PDB15. Symmetric sequence design was performed for each docked peptide–protein pair following the same protocol used for designing RBLs. All the designed complexes were computationally tested by AlphaFold with a cutoff of PAE_interaction ≤15 before experimental characterization.

Split-luciferase assay was performed using the Nano-Glo Luciferase Assay System (Promega). The coding sequence of small-BiT was fused to the gene of peptide binders, and the coding sequence of large-BiT was fused to the coding sequence of the target peptide (GenScript). The BiT-fused proteins and peptides were expressed and purified with the same protocol for RBLs. The purified peptide binders and target peptides were titrated in the presence of Nano-Glo substrate in 96-well plates, and the luminescence was measured on a Synergy Neo2 plate reader (Agilent Technologies). To conduct the fluorescence polarization binding assays, we synthesized the repeat peptide fragments with N-terminal tetramethylrhodamine labels. Fluorescence polarization measurements were performed at 25 °C in a Synergy Neo2 plate reader (Agilent Technologies) with a 530/590 nm filter. A series of twofold dilutions of binder–peptide 80-μl mixture were performed in 25 mM Tris–HCl (pH = 8), 150 mM NaCl and 0.05% (vol/vol) Tween 20 in 96-well assay plates. The protein concentrations ranged from 4 μM to 0.47 pM, and the concentration of N-terminal tetramethylrhodamine-labeled peptide was kept at 0.3 nM. The samples were incubated for 3 h before measurement.

Structural characterization by X-ray crystallography

RBL4 was concentrated to 150 mg ml−1 and crystallized by vapor diffusion. Initial crystals formed in the MCSG-2 crystallization screen (Anatrace) and optimized crystals were grown in 100 mM sodium acetate, pH 4.4, and 2% polyethylene glycol 4000. The crystal was cryoprotected with 30% ethylene glycol and flash-cooled in liquid nitrogen. Diffraction was measured at the Advanced Photon Source beamline 23 ID-B. Reflections were indexed, integrated and scaled with autoPROC (1.0.5)45. The structure was solved by molecular replacement in Phaser (2.8.3)46. Initial attempts using the predicted model were unsuccessful due to clashes. A subsequent search for eight copies of a single helix–loop–helix repeat (76–118 residues) identified two copies of the protein in the asymmetric unit. The model was rebuilt using Phenix AutoBuild (1.18.2_3874)47 and completed by iterative rounds of interactive refinement in Coot (0.9.5)48 and reciprocal space refinement in Phenix (1.19.1_4122)49,50,51,52. The final refinement strategy included reciprocal space refinement, individual atomic displacement parameters, Translation/Libration/Screw refinement using parameters determined with TLSMD (13 June 2012)53 and occupancy refinement of alternate conformations. Model geometry was assessed with MolProbity (implemented in Phenix 1.19.1_4122)54. The final model included 99.5% of residues in the favored region of the Ramachandran plot with no outliers.

RBL7_C2_3 was concentrated to 119 mg ml−1 and crystallized by vapor diffusion in 2.4 M sodium malonate, pH 7.0, using the MCSG-1 crystallization screen (Anatrace). The crystal was cryoprotected by the addition of ten volumes of 3.4 M sodium malonate, pH 7.0, and flash-cooled in liquid nitrogen. Reflections were indexed, integrated and scaled with XDS (5 February 2021)55. To solve the structure by molecular replacement, an ensemble of monomer structures was generated by AlphaFold and used as a search ensemble in Phaser (2.8.3). The solution contained eight molecules that formed four homodimers. The model was rebuilt with Phenix AutoBuild (1.19.2_4158) with morphing and completed by iterative rounds of interactive refinement in Coot (0.9.8.6) and reciprocal space refinement in Buster (2.10.4)56 or Phenix (1.20.1_4487). The final refinement strategy in Phenix included reciprocal space refinement, individual atomic displacement parameters, noncrystallographic symmetry restraints and Translation/Libration/Screw refinement using one group per chain. Model geometry was assessed with MolProbity (implemented in Phenix 1.20.1_4487)54. The final model had 98.22% of residues in the favored regions of the Ramachandran plot with no outliers. Composite omit maps were generated in Phenix by sequentially omitting 5% of the final structure model and performing simulated annealing from 5,000 K. Crystallographic software was installed and maintained using SBGrid57.

Data analysis and visualization were performed using Python (3.7)58, seaborn (0.11.2)59, Matplotlib (3.1.3)60, Pandas (0.24.2)61,62 and PyMOL (2.4.1)63.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.