Refinement of the stable complex structure
To refine against intermolecular CXMS restraints, we treated each subunit as a rigid body. Any two cross-linked lysine residues were restrained so that their Cα-Cα distance was less than the maximum length of the corresponding cross-linker, using a square-well pseudo-energy potential. BS3 and BS2G covalently link lysine residues <24 Å and <20 Å apart, respectively, as measured between Cα atoms (Lee 2009; Kahraman et al. 2011). Cross-links may also involve the protein N-terminus; when the cross-linker is fully extended, the maximum Cα-Cα distance between an N-terminal residue and a lysine is 15 Å for BS2G and 19 Å for BS3.
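As an illustration of the restraint described above, the following is a minimal sketch of a square-well pseudo-energy term. The upper bounds are the values quoted in the text; the quadratic penalty outside the well, the force constant `k`, and all function and dictionary names are assumptions made for illustration, since the exact functional form is not specified here.

```python
# Upper bounds on the Cα-Cα distance (Å), as quoted in the text.
MAX_CA_CA = {
    ("BS3", "lys-lys"): 24.0,
    ("BS2G", "lys-lys"): 20.0,
    ("BS3", "nterm-lys"): 19.0,
    ("BS2G", "nterm-lys"): 15.0,
}


def square_well_energy(distance, upper, k=1.0):
    """Return zero inside the well and a quadratic penalty beyond `upper`.

    The quadratic wall and the force constant `k` are illustrative
    assumptions, not parameters taken from the source.
    """
    violation = max(0.0, distance - upper)
    return k * violation ** 2


def total_energy(restraints, k=1.0):
    """Sum the penalty over (distance, linker, link_type) tuples."""
    return sum(
        square_well_energy(d, MAX_CA_CA[(linker, kind)], k)
        for d, linker, kind in restraints
    )
```

With this form, a BS3 lysine-lysine pair at 18 Å contributes nothing, while a pair at 26 Å is penalized for its 2 Å violation.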
We then assessed the refinement protocol on the complex between trypsin and bovine pancreatic trypsin inhibitor (BPTI), a stable complex with a KD value of ~60 fmol/L (Marquart et al. 1983; Kastritis et al. 2011). Based on the known structure of the complex (PDB code 2PTC), there can be a maximum of 17 theoretical inter-subunit lysine-lysine cross-links with the BS3 cross-linking reagent (Table S1). Starting from the structures of the free proteins (PDB codes 4GUX and 1JV8 for trypsin and BPTI, respectively), we fixed the coordinates of trypsin and allowed BPTI to freely rotate and translate as a rigid body. With simulated annealing, we refined the complex structure against the CXMS restraints, with an additional van der Waals repulsive term. Calculating one structure takes less than 2 min on a single core of an Intel Xeon 5620 CPU. Repeating the calculation from different starting positions for the two subunits afforded a set of highly converged structures, with an overall root-mean-square deviation (RMSD) for backbone heavy atoms of almost 0 Å. Importantly, the RMS difference between the CXMS model and the crystal structure was only 0.54 Å (Fig. 1).
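The RMS difference reported above can be illustrated with a short sketch. It assumes that the model and the reference are already expressed in the same coordinate frame (for example, after superposition on the fixed subunit) and that the atoms are listed in the same order; how the superposition was actually done is not stated in the text, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each argument is a sequence of (x, y, z) tuples in Å. No superposition
    is performed; both sets are assumed to share one coordinate frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))
```

For backbone comparison, the input would be restricted to backbone heavy atoms of the mobile subunit before calling the function.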
Further assessment of the rigid-body refinement protocol
In practice, however, it is rare to have as many as 17 intermolecular cross-links for a complex the size of trypsin/BPTI (281 residues in total and 18 lysine residues). Often, only a few cross-links can be experimentally identified. To assess how robust the refinement protocol is with fewer CXMS restraints, we obtained CXMS data from published studies (Herzog et al. 2012; Kahraman et al. 2013) for the complex between protein phosphatase 2A catalytic subunit (PP2Ac) and immunoglobulin binding protein 1 (IGBP1). PP2Ac and IGBP1 interact with each other with a KD value of ~300 nmol/L (Jiang et al. 2013), and six intermolecular cross-links were identified between Lys28-Lys158, Lys33-Lys166, Lys35-Lys163, Lys40-Lys158, Lys40-Lys163, and Lys40-Lys166 (from PP2Ac to IGBP1) (Herzog et al. 2012). Starting from the structures of the free PP2Ac (PDB code 2NYL) and IGBP1 (PDB code 3QC1) proteins, we obtained their complex structures by refining against the CXMS distance restraints. The probabilistic distribution of PP2Ac with respect to IGBP1 was computed over all the structural models and shown as an atomic probability map (Schwieters and Clore 2002), which encompassed the known complex structure (Fig. 2A). Importantly, the overall backbone RMS difference between the CXMS models and the crystal structure of the PP2Ac/IGBP1 complex was as small as 2.8 Å (Fig. 2B) (Jiang et al. 2013).
Then what is the minimum number of intermolecular cross-links needed to model the complex structure? With the use of three experimental cross-links involving PP2Ac Lys40 (Lys40-Lys158, Lys40-Lys163, and Lys40-Lys166), the resulting structures took up positions similar to those calculated using the full set of CXMS restraints (Fig. S1A), though they were somewhat more scattered. With only one CXMS restraint, for example from PP2Ac Lys35 to IGBP1 Lys163, the modeling still afforded a set of CXMS models similar to those calculated with the full set of experimental CXMS restraints (Fig. S1B). Thus, the more CXMS restraints were incorporated, the more converged the resulting models were. We also performed the structural refinement using five of the six cross-links, and then back-calculated the Cα-Cα distance for the unused cross-link. Except for the cross-link between PP2Ac Lys28 and IGBP1 Lys158, the calculated distances were mostly within the maximum length stipulated by the corresponding cross-linker (Table S2). Thus, the cross-link between PP2Ac Lys28 and IGBP1 Lys158 afforded a key restraint on the complex structure, and owing to the sparsity of the intermolecular cross-links, the information it carries is not redundantly provided by the other cross-links.
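The leave-one-out test described above can be sketched as follows. The cross-link list is taken from the text; the generator name is hypothetical, and the refinement and distance back-calculation steps are deliberately left out, since they depend on the modeling software.

```python
# PP2Ac-to-IGBP1 cross-links listed in the text (Herzog et al. 2012).
CROSS_LINKS = [
    ("Lys28", "Lys158"),
    ("Lys33", "Lys166"),
    ("Lys35", "Lys163"),
    ("Lys40", "Lys158"),
    ("Lys40", "Lys163"),
    ("Lys40", "Lys166"),
]


def leave_one_out(cross_links):
    """Yield (held_out, remaining) for every cross-link in turn.

    `remaining` would drive the refinement; the Cα-Cα distance of
    `held_out` would then be back-calculated from the refined models.
    """
    for i, held_out in enumerate(cross_links):
        yield held_out, cross_links[:i] + cross_links[i + 1:]


for held_out, remaining in leave_one_out(CROSS_LINKS):
    print(f"refine with {len(remaining)} restraints, validate on {held_out}")
```

A held-out cross-link whose back-calculated distance exceeds the cross-linker length carries information that the other restraints do not supply.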
Using CXMS, we characterized the complex between CDK9 and Cyclin-T1. This complex is responsible for transcription elongation, and its two subunits interact with each other with a KD value of ~300 nmol/L (Baumli et al. 2008). We focused our attention on the intermolecular cross-links that were identified twice or more, for which the probability of being observed by random chance was below 10⁻⁸ for at least one instance and below 10⁻³ for additional instances (a false discovery rate cutoff of 0.05, an E-value cutoff of 10⁻³, spectral count ≥2, and a best E-value cutoff of 10⁻⁸). With these stringent criteria, it would be unlikely that the cross-links were identified by random chance, and the remaining cross-links should be correctly assigned. Three intermolecular cross-links were identified for CDK9/Cyclin-T1 (Table 1) and the corresponding MS2 spectra are shown in Fig. S2. For each, the two linked lysine residues were found within the maximum length of the cross-linker, as calculated from the known structure of the complex (Baumli et al. 2008).
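The selection criteria above can be expressed as a small filter. This is a sketch under stated assumptions: the input is taken to be a list of (residue pair, E-value) identifications that have already passed the 0.05 false-discovery-rate cutoff upstream, and the function name and data layout are hypothetical rather than those of any particular search engine.

```python
from collections import defaultdict


def select_cross_links(spectra, e_cutoff=1e-3, best_e_cutoff=1e-8, min_count=2):
    """Keep residue pairs seen at least `min_count` times below `e_cutoff`,
    with at least one identification below `best_e_cutoff`.

    `spectra` is an iterable of (pair, e_value) tuples, one per spectrum.
    """
    by_pair = defaultdict(list)
    for pair, e_value in spectra:
        if e_value < e_cutoff:
            by_pair[pair].append(e_value)
    return sorted(
        pair
        for pair, e_values in by_pair.items()
        if len(e_values) >= min_count and min(e_values) < best_e_cutoff
    )


# Made-up identifications, for illustration only.
example = [
    ("A", 1e-9), ("A", 1e-4),   # two hits, best below 1e-8: kept
    ("B", 1e-9), ("B", 1e-2),   # second hit fails the E-value cutoff: dropped
    ("C", 1e-5), ("C", 1e-4),   # no hit below 1e-8: dropped
]
print(select_cross_links(example))
```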
Table 1 Intermolecular cross-links observed for transient and fleeting protein complexes
We treated each subunit in CDK9/Cyclin-T1 as a rigid body, and refined against the intermolecular CXMS distance restraints: two cross-linked lysine residues were restrained so that their Cα-Cα distance was less than the maximum length of the corresponding cross-linker, using a square-well energy potential. Since each intermolecular cross-link was observed with both BS2G and BS3 cross-linkers (Table 1), we restrained the Cα-Cα distance to be shorter than the length of BS2G (20 Å for lysine-lysine cross-links and 15 Å for lysine-protein N-terminus cross-links). In the refinement, the coordinates of one subunit, CDK9, were fixed, while the other subunit, Cyclin-T1, was grouped as a rigid body and given full translational and rotational freedom. A single intermolecular CXMS restraint was readily satisfied, but the resulting complex model was poorly converged, with Cyclin-T1 dangling along one side of CDK9 (Fig. S3). As Lys74 and Lys144 are adjacent to each other in CDK9, cross-links of Cyclin-T1 Lys6 to these two residues provided redundant information about the complex structure. Cyclin-T1 Lys100 and CDK9 Lys56 are located on the other side of the complex; as a result, the refinement against the corresponding cross-link restraint afforded a different but overlapping distribution of the complex. With all three restraints used, a narrower distribution was obtained (Fig. 3A). Significantly, the structural models based on CXMS restraints encompassed the known crystal structure of CDK9/Cyclin-T1, and the pairwise RMS difference between the CXMS model and the PDB structure was as small as 2.86 Å (Fig. 3B). Thus, we show that the CDK9/Cyclin-T1 complex can be modeled as a single conformer, based on sparse CXMS distance restraints.
CXMS analyses of transient and fleeting complexes
We then performed CXMS analysis for EIN/HPr and ubiquitin homodimeric complexes using BS2G and BS3. EIN and HPr are involved in signal transduction for bacterial sugar uptake and interact with each other with a KD value of ~7 µmol/L (Suh et al. 2007). Ubiquitin is an important signaling protein in the cell and can noncovalently dimerize with a KD value of ~5 mmol/L (Liu et al. 2012). Using the same stringent criteria described above, intermolecular cross-links for the two complexes are also presented in Table 1, and the corresponding MS2 spectra are shown in Figs. S4 and S5. A total of 13 intermolecular cross-links were identified for EIN/HPr, but only one of them (EIN Lys58 to HPr Lys24) was found to be consistent with the stereospecific complex structure (Garrett et al. 1999). For validation, we also performed CXMS analysis for EIN/HPr using PDH (Leitner et al. 2014) as the cross-linking reagent.
In order to identify intermolecular cross-links between two ubiquitin subunits in a ubiquitin homodimer, we performed CXMS analysis on a mixture of 14N-labeled (natural isotope abundance) and 15N-labeled ubiquitin proteins (Liu et al. 2012). The cross-links between 14N- and 15N-labeled peptides with characteristic MS1 spectra (Fig. S6) should only arise from intermolecular interactions (Taverner et al. 2002). In this way, we identified a total of seven intermolecular cross-links for the ubiquitin homodimer.
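The logic of the mixed-isotope experiment can be stated compactly in code. This is only a sketch of the classification rule: a cross-link joining a 14N peptide to a 15N peptide must connect two different molecules, whereas a same-label pair cannot be assigned from the labels alone. The function name and label strings are hypothetical.

```python
def classify_cross_link(label_a, label_b):
    """Classify a cross-linked peptide pair by its isotope labels.

    Mixed 14N/15N pairs can only come from two different ubiquitin
    molecules; same-label pairs may be intra- or intermolecular.
    """
    if {label_a, label_b} == {"14N", "15N"}:
        return "intermolecular"
    return "ambiguous"


print(classify_cross_link("14N", "15N"))  # mixed labels
print(classify_cross_link("15N", "15N"))  # same label
```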
Ensemble structure refinement of protein encounter complexes
To account for the experimental cross-links and to model the structure of the EIN/HPr complex, we fixed the position of EIN and treated HPr as a rigid body with rotational and translational freedom. The intermolecular cross-links could not be satisfied with a single-conformer representation of the complex, as the restraints were consistently violated with an average violation >8 Å (Fig. 4A). This means that in addition to the stereospecific complex, HPr sampled a multitude of conformations with respect to EIN, which were captured by cross-linking. Thus, we invoked an ensemble representation for the complex: with EIN fixed, HPr was represented as multiple conformers. We treated each intermolecular cross-link as an ambiguous restraint (Nilges 1995), and defined the CXMS energy as averaged over all the conformers in the ensemble, with a steep dependence on the Cα-Cα distance. In this way, a CXMS restraint could be satisfied provided that it was accounted for by at least one conformer in the ensemble. The ensemble refinement showed that a minimum of four conformers was required to fully satisfy the intermolecular CXMS restraints, with an average distance violation close to 0 Å (Fig. 4A). Too large an ensemble size, however, would lead to over-fitting. When five conformers were used to represent the complex, HPr in the additional conformers was found scattered around, making no contribution to the CXMS energy (Fig. S7).
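One way to picture an ambiguous restraint over an ensemble is through an effective distance with inverse-sixth-power weighting, a form commonly used for ambiguous distance restraints. The sketch below assumes that form; the exact expression used for the CXMS energy is not given in the text, and the function names are hypothetical.

```python
def effective_distance(distances):
    """Combine per-conformer Cα-Cα distances (Å, all > 0) into one value.

    The r^-6 sum is an assumed functional form. It is dominated by the
    shortest distance and never exceeds it.
    """
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def ensemble_violation(distances, upper):
    """Violation (Å) of one cross-link restraint over the whole ensemble."""
    return max(0.0, effective_distance(distances) - upper)


# Made-up distances for one cross-link in a four-conformer ensemble:
# only the first conformer brings the two residues within reach.
print(ensemble_violation([18.0, 40.0, 45.0, 50.0], upper=20.0))
```

Because the effective distance cannot exceed the shortest member distance, one conformer within the cross-linker length is enough to satisfy the restraint, which matches the behavior described above. Extra conformers that sit far away change the effective distance very little, which is why they are left unrestrained when the ensemble is too large.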
Using a spherical coordinate system, we projected the positions of HPr with respect to EIN in the CXMS models to lower dimensions. In the 2D plot, HPr was found in four distinct clusters (Fig. 4B), thus explaining the requirement of four conformers in the ensemble. One cluster (SC) contained conformers overlapping with the known complex structure, and therefore accounted for the stereospecific EIN/HPr interactions. HPr was positioned away from the specific interface with EIN in the other three clusters (EC-I, EC-II and EC-III), which represented non-specific interactions between EIN and HPr. Each cluster of conformers accounted for multiple intermolecular cross-links (Table 1).
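The projection to spherical coordinates can be sketched as below. The choice of reference points is an assumption: here the position is taken to be the centroid of HPr and the origin the centroid of EIN, with arbitrary axes, none of which is specified in the text. The function name is hypothetical.

```python
import math


def to_spherical(position, origin=(0.0, 0.0, 0.0)):
    """Convert a Cartesian position (Å) to (r, theta, phi).

    theta is the polar angle from the z axis in degrees (0 to 180);
    phi is the azimuthal angle in the xy plane in degrees (-180 to 180).
    """
    x, y, z = (p - o for p, o in zip(position, origin))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("position coincides with the origin")
    theta = math.degrees(math.acos(z / r))
    phi = math.degrees(math.atan2(y, x))
    return r, theta, phi
```

Plotting (theta, phi) for every conformer gives a 2D map in which conformers that occupy the same region around the fixed subunit fall into the same cluster.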
We could cross-validate the ensemble structure modeled from lysine-lysine cross-links with the CXMS restraints from a different cross-linking reagent, PDH (Leitner et al. 2014). For a pair of PDH cross-linked glutamate residues, the Cα-Cα distance should be less than 22 Å. With high confidence, the PDH cross-links were identified between EIN Glu41 and HPr Glu85 and between EIN Glu67 and HPr Glu85 (Fig. S8). Calculated from the stereospecific complex structure (Garrett et al. 1999), the Cα-Cα distances for these two pairs of residues were 41.2 and 12.9 Å, respectively. Clearly, the cross-link between EIN Glu41 and HPr Glu85 could not be accounted for with the stereospecific complex structure alone. In the four-conformer ensemble structure modeled from BS2G/BS3 CXMS data, however, the averaged Cα-Cα distance between EIN Glu41 and HPr Glu85 was 23.1 ± 4.9 Å.
Previously, the EIN/HPr complex has been characterized with paramagnetic nuclear magnetic resonance (NMR), and it was shown that EIN and HPr form a multitude of encounter complexes, which facilitate the formation of the stereospecific complex (Tang et al. 2006; Fawzi et al. 2010). Protein encounter complexes are of low occupancies and short lifetimes. Previous NMR studies estimated that encounter complexes made up less than 10% of the total EIN/HPr complex, thus putting the apparent KD value for the encounter interactions at >10 mmol/L (Fawzi et al. 2010). Importantly, the distribution of HPr relative to EIN modeled on the basis of CXMS data (Fig. 4C) resembles the EIN/HPr encounter complexes previously depicted using NMR spectroscopy (Fig. 4D).
Ensemble structure refinement of a fleeting complex
Performing CXMS experiments on an equimolar mixture of 15N- and 14N-labeled ubiquitin proteins, we identified five intermolecular cross-links. We fixed the coordinates of one ubiquitin, and allowed the other one to move. A single conformation for the ubiquitin dimer failed to satisfy all the restraints, with average violations of ~2 Å. Hence we represented the ubiquitin dimer with two, three, and four conformers, with C2 non-crystallographic symmetry enforced for each ubiquitin dimer. The CXMS restraints could be satisfied with an N = 2 ensemble. Increasing the size of the ensemble did not improve the agreement between experimental and calculated Cα-Cα distances, and the additional conformers in the N = 3 and N = 4 ensembles scattered around with respect to their dimer partners (Fig. S9). Thus, the N = 2 ensemble was sufficient to describe the dynamic interactions between two ubiquitin proteins.
In the CXMS models, the two ubiquitins adopt a variety of orientations (Fig. 5A), characteristic of fleeting protein-protein interactions (Liu et al. 2016). This also explains why Lys48 in one ubiquitin was able to cross-link to five different lysine residues, except for Lys27 and Lys63, in the other ubiquitin. Importantly, the two subunits interacted at the β-sheet region in the CXMS models, and the distribution of the CXMS models was in good agreement with a previous NMR characterization of the ubiquitin homodimer (Fig. 5B).