Biological context

Non-ribosomal peptide synthases (NRPS) use amino acids as building blocks for the synthesis of complex natural products. Each amino acid is incorporated into the final product by an individual specialized functional module of the NRPS. In turn, each individual module contains a number of enzymatically distinct catalytic domains. A typical NRPS elongation module contains at least an adenylation (A), a thiolation (T) and a condensation (C) domain. The adenylation (A) domain uses ATP to activate a specific amino acid as aminoacyl adenylate and then transfers the activated amino acid to the neighbouring thiolation (T) domain. The T domain in its holo form contains a phosphopantetheinyl (Ppant) moiety derived from coenzyme A covalently bound to a conserved serine side chain. The activated amino acid reacts with the terminal thiol group of the Ppant moiety to form a thioester. The T domain is followed by a condensation domain (C) which catalyses the formation of peptide bonds between the amino acid bound to the T domain of its own module and an amino acid or a peptide chain bound to the T domain of the downstream module. Additional domains with enzymatic activities such as methylation or epimerization can be included in individual modules and may further modify the peptide product. Finally, the nascent peptide is released by a thioesterase (TE) domain localized at the C-terminus of a specialized termination module. TE domains can release either linear, cyclic or branched cyclic peptides (Süssmuth and Mainz 2017).

The individual functional modules necessary for the stepwise incorporation of each amino acid into the final product can be located either on a single long protein chain or on multiple proteins. When all modules are arranged on a single protein chain their linear order directly predicts (Mootz et al. 2002) the order of amino acid building blocks in the final product. In multi-protein NRPS systems specific non-covalent interactions between the individual protein chains determine the functional assembly of the NRPS complex and thereby the order of amino acids in the synthesized peptide. Hahn and Stachelhaus (Hahn and Stachelhaus 2004) were first in identifying short amino acid sequences at the N- and C-termini of the individual protein chains in a multi-protein NRPS that were capable of mediating specific noncovalent interactions between these protein chains. They referred to these sequences as “communication-mediating (COM) domains”. Functionally similar structural elements, which were named “docking domains (DD)” (Broadhurst et al. 2003), were identified in polyketide synthase (PKS) complexes which possess a multi-modular architecture similar to NRPS systems. The term “docking domain” is now more commonly used for domains that enable non-covalent specific interactions between individual protein chains in both PKS and NRPS complexes. Multiple structurally diverse families of DD architectures have been described so far (Broadhurst et al. 2003; Buchholz et al. 2009; Dorival et al. 2016; Hacker et al. 2018; Watzel et al. 2020; Whicher et al. 2013). Many DD structural families are dominated by α-helical secondary structure elements (Buchholz et al. 2009; Watzel et al. 2020; Whicher et al. 2013). The binding affinities even for specific interactions between DDs are relatively weak with typical dissociation constants for the DD complexes in the range between 5 µM and 25 µM (Dorival et al. 2016; Hacker et al. 2018; Watzel et al. 2020; Whicher et al. 2013). Nevertheless, for many types of DD interactions it has been experimentally demonstrated that the DDs act independently from the other functional domains in an NRPS or PKS in order to mediate the non-covalent interactions between protein chains needed for the functional assembly of functional megasynthase complexes (Dorival et al. 2016; Hacker et al. 2018). However, it is not yet clear if this is true for all types of docking domains.

The PaxS NRPS found in Gram-negative entomopathogenic bacteria from the genus Xenorhabdus is a prototypical example for a multi-protein assembly line. It consists of the three proteins PaxA, PaxB and PaxC. The three proteins interact with each other non-covalently in a unidirectional manner where PaxA is bound by PaxB and PaxB is bound by PaxC. The DD pair mediating the specific interaction between the C-terminus of PaxB and the N-terminus of PaxC has been characterized functionally, structurally and biophysically in previous work (Kegler and Bode 2020; Watzel et al. 2020). A structural basis for the specific interaction between the C-terminus of PaxA and the N-terminus of PaxB has not yet been established. Notably, the putative C-terminal docking domain (CDD) in PaxA was predicted to be very short in comparison to other structurally characterized types of docking domains. Thus, in this case the T domain directly preceding the CDD of PaxA might play a supporting role in establishing a specific non-covalent interaction with the N-terminal docking domain (NDD) of PaxB. As a prerequisite for investigating a putative role of the T domain in this DD interaction and for establishing a structural basis for the specific interaction between PaxA and PaxB in the PaxS NRPS from Xenorhabdus cabanillasii JM26 we present here the NMR resonance assignments for a PaxA T1-CDD di-domain construct and for the PaxB NDD in their free states and in a 1:1 non-covalent complex. The T1-CDD (amino acids 981–1084 of PaxA) comprises 104 amino acids (12 kDa), whereas the NDD (amino acids 1–30 of PaxB) contains only 30 amino acids (3.6 kDa).

Methods and experiments

Cloning, expression and purification

The DNA coding sequences for PaxA T1-CDD and PaxB NDD were generated by PCR amplification using the genomic DNA from Xenorhabdus cabanillasii JM26 as the template. All protein sequences referred to in this work are based on the UniProt Archive (UniParc) entries for PaxA (UPI0003E57C57) and PaxB (UPI000C04EDD1) based on a genome assembly for Xenorhabdus cabanillasii JM26 produced in our group (NCBI: ASM263290v1; GenBank: NJGH00000000 (Tobias et al. 2017)). For protein concentration measurements a codon for a tyrosine residue was added downstream of the PaxB NDD coding sequence via the used primer. The inserts were cloned into a modified pET-11a vector (Hacker et al. 2018) containing the sequence for an N-terminal hexahistidine (His6)-tag followed by a SUMO (SMT3)-tag which also functions as a cleavage site for the ULP1 protease (Malakhov et al. 2004). The PaxA T1-CDD was expressed in its apo form by using Escherichia coli BL21(DE3)ΔentD (Owen et al. 2012) cells preventing post-translational modification of the T domain by the covalent addition of a phosphopantetheinyl moiety, whereas the PaxB NDD was expressed in E. coli BL21(DE3) Gold (Agilent Technologies/Stratagene) cells.

Protein expression was induced at OD600 ~ 0.7 with 1 mM IPTG at 20 °C for ~ 18 h in media supplemented with ampicillin (100 µg/ml). The expression of uniformly 15 N- and 15 N,13C-labelled proteins was achieved by using M9 minimal medium supplemented with 1 g/l 15NH4Cl or 1 g/l 15NH4Cl and 2.5 g/l 13C6-D-glucose (Cambridge Isotope Laboratories). For the stereospecific NMR assignment of γ1/2 and δ1/2 methyl groups of valine and leucine, the PaxA T1-CDD as well as the PaxB NDD were expressed as fractionally 13C-labelled proteins in M9 minimal medium containing a mixture of 0.25 g L−1 13C6-D-glucose and 2.25 g L−1 unlabeled glucose as the sole carbon source (Neri et al. 1989).

Cell lysis of E. coli cells was accomplished by sonication in a buffer containing 50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 1 mM EDTA, 10 mM MgCl2, 2 mM β-mercaptoethanol, Benzonase (Merck) and cOmplete protease inhibitor (Roche). The cell debris was cleared from the lysate by centrifugation (8000 × g, 4 °C, 30 min) and the supernatant was run through a HisTrap HP column (GE Healthcare). With the help of a recombinantly produced His6-tagged ULP1 protease the His6-tagged SUMO (SMT3)-tag was cleaved off from the recombinant fusion proteins, leading to native target protein sequences. Both His6-tagged ULP1 and the SUMO-tag were separated from the protein of interest by a second immobilized metal ion affinity chromatography using a HisTrap HP column, followed by gel filtration chromatography with a HiPrep 16/60 Sephacryl S-100 high resolution column (GE Healthcare). For NMR measurements, the samples of the individual proteins (300 µM protein concentration) and the PaxA T1-CDD:PaxB NDD/PaxB NDD:PaxA T1-CDD complexes (300 µM protein A concentration: 360 µM protein B concentration) were prepared in a buffer containing 50 mM sodium phosphate, pH 6.5, 100 mM NaCl, 2 mM β-mercaptoethanol and 5% (v/v) D2O.

NMR spectroscopy

NMR experiments were recorded at 293 K on Bruker AVANCE III HD 600, 700 and 800 MHz spectrometers, each of them equipped with 5 mm cryogenic triple resonance probes. 1H chemical shifts were internally referenced to DSS, whereas the heteronuclear 13C and 15 N chemical shifts were indirectly referenced with the appropriate conversion factors (Markley et al. 1998). Spectra were processed using TOPSPIN 3.6.2 (Bruker) and analysed with CARA (Keller 2004). The secondary structure of the unbound and bound PaxA T1-CDD and PaxB NDD was derived from TALOS-N (Shen and Bax 2013) based on the chemical shift assignments.

For the backbone assignment of the free PaxA T1-CDD a uniformly 13C,15 N-labelled sample was used and the following triple resonance experiments were recorded: HNCO, HNCA, HNCACB, HBHA(CO)NH, CBCA(CO)NH (Sattler 1999). The backbone and side chain assignments of the unbound 13C,15 N-labelled PaxB NDD were derived from 3D HNCO, HNCACB, HBHA(CO)NH, H(CCO)NH (mixing time 12 ms) and (H)C(CO)NH (mixing time 12 ms) experiments (Sattler 1999). The backbone resonances of 13C,15 N-labelled PaxA T1-CDD/PaxB NDD in complex with a 1.2-fold excess of unlabelled PaxB NDD/PaxA T1-CDD were assigned on the basis of 3D HNCO, HNCA, HNCACB, and CBCA(CO)NH experiments (Sattler 1999). Side chain assignments were obtained from 3D HBHA(CO)NH, H(C)CH-/(H)CCH-TOCSY (mixing times 12 ms) (Sattler 1999) and H(C)CH-COSY (Bax et al. 1990) experiments. Stereospecific assignments of valine γ1/2 and leucine δ1/2 methyl groups of the bound PaxA T1-CDD and free and bound PaxB NDD were determined in 1H,13C-HSQC experiments with a resolution in the 13C dimension of ~ 28 Hz, ~ 26 Hz and ~ 23 Hz, respectively. This allowed the unambiguous discrimination between the signals for the γ2/ δ2 CH3 groups of valine and leucine which appear as singlets in the 13C-dimension and the signals of the γ1/ δ1 CH3 groups which appear as doublets separated by the 1J13C,13C coupling constant of ~ 33 Hz (Neri et al. 1989).

Assignment and data deposition

The protein construct PaxA T1-CDD contains amino acids 981 to 1084 of the PaxA protein from Xenorhabdus cabanillasii JM26. It has a molecular weight of 12 kDa and consists of 104 amino acid residues. Most of the backbone amid signals are well dispersed in the 1H,15 N-HSQC spectrum of the unbound PaxA T1-CDD as is typical for a well-folded protein. The 1H,15 N-HSQC is expected to contain 101 backbone amide signals since there is no observable amide signal for the N-terminal residue D981 and there are two proline residues (P993, P1054) present in the sequence. However, only 99 backbone amide signals were observed and assigned (99/101, 98.0%; Fig. 1a, top). No backbone amide signals were detectable for residues H982 and S1027 most likely due to fast exchange of the respective amide protons with the solvent or due to conformational exchange. S1027 is the residue that would be posttranslationally modified by addition of the Ppant-arm in the native environment. Furthermore, 99% of all Cα (103/104), 98% of all Cβ (98/100), 95% of all CO (99/104) and Hα (99/104) and 94% of all Hβ (94/100) chemical shifts were assigned. The backbone chemical shifts were used to derive the putative secondary structure of the free PaxA T1-CDD. Based on these chemical shifts TALOS-N (Shen and Bax 2013) identified five amino acid stretches with high α-helical propensity (Fig. 1a, bottom). This secondary structure fits well to that of previously described T domain structures, which typically feature a four-helix bundle (Weber et al. 2000). Compared to the canonical four-helix bundle T domain fold the PaxA T1 domain contains an additional very short (3 residues) α-helix in the linker region between the canonical helices α1 and α2. Such an additional helix is also present at this position in some other carrier protein structures (Lohman et al. 2014). According to TALOS-N the C-terminus including the predicted CDD of the PaxA T1-CDD construct is unstructured.

Fig. 1
figure 1

(a, top) 1H,15 N-HSQC spectra and assigned backbone amide signals for the free PaxA T1-CDD and (b, top) the free PaxB NDD. The spectra were recorded at 293 K on a 600 MHz Bruker Avance III spectrometer. The assigned NMR signals are labelled with their respective residue names and numbers and the central spectral region of the PaxA T1-CDD with increased peak overlap is enlarged for a better overview (box). (a, b, bottom) TALOS-N based secondary structure analysis

The PaxB NDD construct contains amino acids 1–30 of the PaxB protein followed by a non-native tyrosine at the C-terminus and has a molecular weight of 3.6 kDa. Surprisingly, the backbone amide signals of the free PaxB NDD are well dispersed in the 1H,15 N-HSQC spectrum (Fig. 1b, top) indicating a folded protein. Overall, 27 backbone amide signals were assigned since the sequence contains one proline residue (P10) and no backbone amide signals were observable for M1, N2 and the non-native Y31. Additionally, 93% of all CO (28/30) and all Cα (30/30), Cβ (30/30), Hα (30/30) and Hβ (30/30) chemical shifts were assigned. The overall side chain assignments were completed to 99.2% for aliphatic protons and to 99.0% for aliphatic carbons. No aromatic side chain NMR signals were assigned since the PaxB NDD contains only a single native aromatic residue (H30) and the non-native Y31. The TALOS-N analysis of the secondary structure suggests the presence of a single continuous α-helix including residues L11 to K22 in the free PaxB NDD (Fig. 1b, bottom). The 13Cγ and 13Cβ chemical shifts of proline P10 are in agreement with a trans conformation for this residue (Schubert et al. 2002). In addition, all valine γ1/2 and leucine δ1/2 methyl groups of the unbound PaxB NDD were stereospecifically assigned.

PaxA T1-CDD and PaxB NDD form a stable 1:1 complex according to analytical gel filtration and NMR titration experiments which is in slow exchange on the NMR time scale. Assignments for both proteins in their bound states were made using samples containing one binding partner in 13C,15 N-labelled form and a 1.2-fold excess of the unlabelled binding partner. The backbone resonances of PaxA T1-CDD bound to the PaxB NDD could be assigned with the same degree of completeness as for the free protein. All expected backbone amide signals except those for H982 and S1027 (99/101, 98.0%; Fig. 2a, top) and all Cα (104/104), 99% of all Cβ (99/100), 95% of all CO (99/104), all Hα (104/104) and 99% of all Hβ (99/100) chemical shifts were assigned. The overall side chain assignments were completed to 98.7% for the remaining aliphatic protons and to 98.2% for aliphatic carbons. All valine γ1/2 and leucine δ1/2 methyl groups were stereospecifically assigned. In addition, 56.3% of all aromatic side chain proton-bound carbon and carbon-bound proton signals were assigned.

Fig. 2
figure 2

(a, top) 1H,15 N-HSQC spectra and assigned backbone amide signals for the bound PaxA T1-CDD and (b, top) the bound PaxB NDD. The spectra were recorded at 293 K on a 600 MHz Bruker Avance III spectrometer. The assigned NMR signals are labelled with their respective residue names and numbers. The central spectral region of the PaxA T1-CDD with increased peak overlap is enlarged for a better overview (box). Assigned sidechain NH2 signals are indicated by horizontal bars and labelled with the respective residue names and numbers. The labelled ϵ imino groups of R1013, R1016 and R1036 in the PaxA T1-CDD and of R16 in the PaxB NDD are folded into the spectrum. (a, b, bottom) TALOS-N based secondary structure analysis

The analysis of the backbone chemical shifts of the PaxA T1-CDD with TALOS-N in its bound state suggests that its secondary structure is very similar to that in its free state but that there is now an additional α-helix present at the very C-terminus suggesting that the predicted docking domain becomes structured upon binding (Fig. 2a, bottom).

The backbone assignment of the bound PaxB NDD (Fig. 2b, top) includes 27 amide backbone signals, 93% of all CO (28/30) and all Cα (30/30), Cβ (30/30), Hα (30/30) and Hβ (30/30) chemical shifts. The overall side chain assignments were completed to 99.2% for aliphatic protons and to 99.0% for aliphatic carbons. All valine γ1/2 and leucine δ1/2 methyl groups were stereospecifically assigned. In addition, 50% of all aromatic side chain proton-bound carbon and carbon-bound proton signals were assigned. The TALOS-N derived secondary structure for the bound PaxB NDD indicates the presence of two α-helices (Fig. 2b, bottom). The proline residue P10 remains in the trans conformation in the bound state according to its 13Cγ and 13Cβ chemical shifts (Schubert et al. 2002).

The assigned chemical shifts have been deposited in the BMRB under the accession numbers 50,594, 34,576 and 34,575 for the free PaxA T1-CDD construct, the free PaxB NDD and the complex, respectively.