Protocol for analyzing protein ensemble structures from chemical cross-links using DynaXL
Chemical cross-linking coupled with mass spectrometry (CXMS) is a powerful technique for investigating protein structures. CXMS has mostly been used to characterize the predominant structure of a protein, whereas cross-links incompatible with a unique structure of a protein or a protein complex are often discarded. We have recently shown that these so-called over-length cross-links actually contain information about protein dynamics. We have thus established a method called DynaXL, which allows us to extract this information from the over-length cross-links and to visualize protein ensemble structures. In this protocol, we present the detailed procedure for using DynaXL, which comprises four steps: identification of high-confidence cross-links, delineation of protein domains/subunits, ensemble rigid-body refinement, and final validation/assessment. The DynaXL method is generally applicable for analyzing the ensemble structures of multi-domain proteins and protein–protein complexes, and is freely available at www.tanglab.org/resources.
Keywords: Chemical cross-linking, DynaXL, Ensemble refinement, Solvent accessible surface distance, Multi-domain protein, Protein–protein complex
Multi-domain proteins and protein–protein complexes often undergo conformational fluctuations (Liu et al. 2015a, b). As a result, ensemble structures corresponding to multiple conformational states are required to fully depict protein dynamics. Protein structural characterization has mostly focused on the predominant structure of a protein. Recently, we and others have shown that the so-called over-length cross-links actually contain information about alternative, and often sparsely populated, conformational states of the protein (Shore et al. 2016). Based on the over-length cross-links, we have developed a computational approach called DynaXL to visualize protein dynamics. Using DynaXL, we were able to characterize the ensemble structures of protein–protein complexes with dissociation constants ranging from nanomolar to millimolar (Gong et al. 2015). We were also able to visualize the open-to-closed movement of multi-domain proteins (Ding et al. 2017).
Overview of DynaXL Algorithm Design
Obtaining highly reliable cross-links using pLink (Yang et al. 2012). The acceptance criteria are as follows: each spectrum should have an E value < 10⁻³, each cross-link must be identified by at least two spectra, and at least one of them should have an E value < 10⁻⁸.
Evaluating the conformational dynamics based on the CXMS data. The domains are defined based on the known structure of the protein and are treated as rigid bodies. For protein complexes, the monomers in the complex are treated as rigid bodies. The protein may exist solely in a single conformation if the known structure already satisfies all experimental cross-links. Otherwise, alternative conformational states of the protein likely exist that give rise to the “over-length” cross-links.
Performing ensemble rigid-body refinement. One of the rigid bodies is kept fixed, and the other rigid body is subjected to translation and rotation. If an N = 2 ensemble cannot satisfy all cross-links, the number of structures in the ensemble is gradually increased. The optimal ensemble size is reached when all CXMS restraints are satisfied. Note that the protein may have additional conformational states that are not captured by, and therefore not manifested in, over-length cross-links.
Assessment of the ensemble structures, either by cross-validation or by corroboration from other experimental data.
Materials and Equipment
Software for cross-link identification
pLink (Yang et al. 2012) is used to search the mass spectra against a database containing the protein sequence and to identify the cross-linked peptides.
Software for protein structure modeling
Xplor-NIH (Schwieters et al. 2003), the software package for biomolecular structure refinement against experimental and knowledge-based restraints, is used here to identify an optimal ensemble structure that can account for all CXMS restraints.
AMBER 14 (Case et al. 2014), the molecular dynamics simulation package, is used here to refine the local conformation before ensemble refinement with Xplor-NIH.
PyMOL (the PyMOL Molecular Graphics System) is the software for illustrating and rendering protein structures.
To identify high-confidence cross-links from the CXMS experiment;
To obtain the known structure from the PDB database;
To define domain boundaries;
To validate domain definition and evaluate local flexibility;
To classify and identify the cross-links (intra-domain vs. inter-domain, intramolecular vs. inter-molecular cross-links), and prepare the CXMS restraints table;
To prepare the starting structure files for Xplor-NIH;
To patch the cross-linker to the protein structure;
To perform ensemble rigid-body refinement against the CXMS restraints;
To cross-validate with a subset of cross-links;
To analyze and validate with other types of data.
Here, we use Ca2+-loaded calmodulin as an example to illustrate how DynaXL is used to account for all CXMS data and to afford the ensemble structures.
Identification of cross-links
A false discovery rate cutoff of 0.05 is applied, followed by an E value cutoff of 10⁻³ at the spectrum level (Yang et al. 2012);
A spectral count ≥ 2 and a best E value < 10⁻⁸ are required for each cross-linked residue pair.
The cross-link spectra that pass the false discovery rate (FDR) cutoff are further filtered with these requirements: (A) each spectrum should have an E value < 10⁻³, and (B) each cross-link should be identified in at least two spectra, at least one of which has an E value < 10⁻⁸.
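The acceptance criteria above can be expressed as a simple filter. The sketch below assumes that the search results have already been exported into a list of hypothetical `SpectrumMatch` records (one per spectrum, with a flag indicating whether it passed the FDR cutoff); it does not read pLink output directly, and the record and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Site = Tuple[str, int]    # (chain or protein ID, residue number)
Pair = Tuple[Site, Site]  # a cross-linked residue pair, stored in sorted order


@dataclass
class SpectrumMatch:
    """Hypothetical record for one cross-link spectrum match (not pLink's format)."""
    site_a: Site
    site_b: Site
    e_value: float
    passes_fdr: bool  # True if this match survived the FDR < 0.05 cutoff


def high_confidence_crosslinks(
    matches: List[SpectrumMatch],
    spectrum_e_max: float = 1e-3,
    best_e_max: float = 1e-8,
    min_spectra: int = 2,
) -> Dict[Pair, List[float]]:
    """Return the residue pairs that pass all three acceptance criteria."""
    groups: Dict[Pair, List[float]] = {}
    for m in matches:
        # Spectrum level: FDR filter first, then the per-spectrum E value cutoff.
        if not m.passes_fdr or m.e_value >= spectrum_e_max:
            continue
        # Sort the two sites so that A-B and B-A count as the same cross-link.
        a, b = sorted((m.site_a, m.site_b))
        groups.setdefault((a, b), []).append(m.e_value)
    # Cross-link level: enough spectra, and at least one highly confident spectrum.
    return {
        pair: e_values
        for pair, e_values in groups.items()
        if len(e_values) >= min_spectra and min(e_values) < best_e_max
    }
```

The cutoffs are exposed as parameters so that the same filter can be rerun with stricter values if false identifications are suspected.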
Assessment of the predominant structure of the protein
The structure of the predominant conformational state of the protein under investigation can be downloaded from the PDB. For proteins without a known structure, the structure can be modeled by homology modeling (Marti-Renom et al. 2000), domain threading (Yang et al. 2015), or fragment splicing (Rohl et al. 2004).
Protein domain boundaries can be defined with protein domain motion analysis (Hayward et al. 1997; Hayward and Berendsen 1998), multi-threading alignment (Xue et al. 2013; Belsom et al. 2016), or the assessment of evolutionary relationships (Cheng et al. 2014, 2015a, b). Note that the domain definition is not fixed; it may be amended based on further calculation and analysis (see below). For protein complexes, each subunit in the complex is treated as an individual rigid body.
Evaluation of protein local flexibility
Structure completion. Atomic coordinates are often missing from PDB files, especially for X-ray structures. The missing parts include flexible loops and the N- and C-terminal tails. In addition, hydrogen atoms are usually absent from relatively low-resolution structures. To complete missing residues, e.g., the first three residues of the calmodulin structure 1CLL, the build-residue function in PyMOL can be used. To complete missing atoms, e.g., side-chain or hydrogen atoms, either the PyMOL build-residue module or the MD simulation software AMBER can be used.
Flexibility evaluation. MD simulation with AMBER can reveal the local flexibility of different parts of the protein by assessing their fluctuations over time. Flexibility can also be assessed from crystallographic B-factors (Shore et al. 2016) and from NMR heteronuclear NOE values (Shore et al. 2016).
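Both flexibility measures above reduce to a per-atom fluctuation. The sketch below is a minimal, dependency-free illustration: `rmsf` computes the root-mean-square fluctuation from trajectory frames that are assumed to be already superimposed on a common reference, and `rmsf_from_bfactor` converts an isotropic B-factor using the standard relation B = (8π²/3)⟨Δr²⟩. In practice, AMBER's trajectory analysis tool cpptraj offers this calculation through its `atomicfluct` command; the functions here only show the arithmetic.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom RMSF; frames[t][i] is the (x, y, z) position of atom i in frame t.

    Assumes every frame has already been superimposed on a common reference
    (e.g. on the C-alpha atoms), so only internal fluctuations remain.
    """
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        # Average position of atom i over the trajectory.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def rmsf_from_bfactor(b_factor: float) -> float:
    """Convert an isotropic B-factor (Å^2) to an RMSF (Å) via B = (8*pi^2/3) * <dr^2>."""
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))
```

Computing both quantities on the same scale (Å) makes it easier to compare MD-derived and crystallographic flexibility for the residues involved in over-length cross-links.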
Identification and classification of the cross-links
The discrepancy between the theoretical solvent-accessible surface distance of the two residues and the maximum length of the cross-linker is small, and the over-length cross-links can be attributed to local dynamics of the protein.
The discrepancy between the theoretical solvent-accessible surface distance of the two residues and the maximum length of the cross-linker is large. Local dynamics alone cannot account for all the over-length cross-links; therefore, the protein must undergo collective domain movement, and further computational analysis is warranted.
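The classification into intra- and inter-domain cross-links and into the two scenarios above can be sketched as follows. This is a hypothetical helper, not part of DynaXL: it takes a precomputed distance for each cross-linked pair (DynaXL works with solvent-accessible surface distances, which this code does not compute), and the `tolerance` that separates a small from a large discrepancy is a user choice that the protocol does not specify. The domain map reuses the calmodulin split applied later in this protocol.

```python
from typing import Dict, Tuple

# Hypothetical domain map for calmodulin, following the split used later in this
# protocol: N-terminal domain 1-77, linker 78-81, C-terminal domain 82-148.
DOMAINS: Dict[str, Tuple[int, int]] = {"N": (1, 77), "linker": (78, 81), "C": (82, 148)}


def domain_of(resid: int, domains: Dict[str, Tuple[int, int]] = DOMAINS) -> str:
    """Return the name of the domain that contains residue `resid`."""
    for name, (first, last) in domains.items():
        if first <= resid <= last:
            return name
    raise ValueError(f"residue {resid} is not assigned to any domain")


def classify_crosslink(res_a: int, res_b: int, distance: float,
                       max_span: float, tolerance: float) -> Tuple[str, str]:
    """Classify one cross-link against the known structure.

    distance:  residue-residue distance in the known structure, assumed to be a
               solvent-accessible surface distance computed by a separate tool.
    max_span:  maximum distance the cross-linker can bridge (same distance type).
    tolerance: hypothetical cutoff separating a "small" from a "large" excess.
    """
    location = "intra-domain" if domain_of(res_a) == domain_of(res_b) else "inter-domain"
    excess = distance - max_span
    if excess <= 0:
        status = "satisfied"
    elif excess <= tolerance:
        status = "local dynamics"   # small discrepancy
    else:
        status = "domain motion"    # large discrepancy: collective movement needed
    return location, status
```

A large excess for an intra-domain pair may instead indicate that the domain boundaries need to be revisited, in line with the note that the domain definition can be amended.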
Ensemble rigid-body refinement against CXMS restraints
The ensemble rigid-body refinement against CXMS restraints is performed when the predominant conformation or the known structure of the protein cannot satisfy all intramolecular inter-domain cross-links identified with high confidence. The protein domains are treated as rigid bodies, and their relative orientations are optimized on the basis of explicitly represented CXMS distance restraints. A similar approach is used to optimize the ensemble structures of protein–protein complexes. The details of the process are as follows.
Prepare the initial structure for Xplor-NIH
Different programs may use different atom naming conventions (especially for hydrogen atoms). Therefore, one should first remove all hydrogen atoms and then use an Xplor-NIH script to re-protonate the protein.
For the assessment of protein local dynamics and domain boundaries, it may be necessary to renumber the residues of the protein, because the first residue handled by the AMBER software is always numbered 1. Please refer to the manual of the respective software.
Xplor-NIH adds an extra oxygen atom to the last residue and names the two terminal oxygen atoms OT1 and OT2 in the PSF file. As a result, the PSF file may be inconsistent with the PDB file provided. A quick solution is to duplicate the last line in the PDB file and to name the last two atoms OT1 and OT2.
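The three clean-up steps for the starting structure operate on fixed-column PDB records and can be scripted. The sketch below is a hypothetical set of helpers working on a list of PDB lines (the standard PDB column layout is assumed, and no PDB library is used). It strips hydrogens, renumbers residues consecutively from 1, and applies the quick OT1/OT2 fix by duplicating the last atom record. It does not replace the Xplor-NIH re-protonation step, and the hydrogen fallback based on atom names is a heuristic.

```python
from typing import List


def is_atom(line: str) -> bool:
    return line.startswith(("ATOM  ", "HETATM"))


def strip_hydrogens(lines: List[str]) -> List[str]:
    """Drop hydrogen atoms so that Xplor-NIH can re-protonate with its own names."""
    kept = []
    for line in lines:
        if is_atom(line):
            element = line[76:78].strip()   # element symbol, columns 77-78
            name = line[12:16].strip()      # atom name, columns 13-16
            # Prefer the element column; fall back to the atom name (heuristic,
            # e.g. names such as "1HB" start with a digit).
            if element == "H" or (not element and name.lstrip("0123456789").startswith("H")):
                continue
        kept.append(line)
    return kept


def renumber_residues(lines: List[str], start: int = 1) -> List[str]:
    """Renumber residues consecutively from `start`, clearing insertion codes."""
    out, mapping = [], {}
    for line in lines:
        if is_atom(line):
            key = (line[21], line[22:27])   # chain ID + residue number + insertion code
            if key not in mapping:
                mapping[key] = start + len(mapping)
            line = f"{line[:22]}{mapping[key]:>4d} {line[27:]}"
        out.append(line)
    return out


def add_terminal_oxygens(lines: List[str]) -> List[str]:
    """Duplicate the last atom record and rename the pair OT1 and OT2."""
    idx = max(i for i, line in enumerate(lines) if is_atom(line))
    last = lines[idx]
    serial = int(last[6:11])
    ot1 = f"{last[:12]} OT1{last[16:]}"
    ot2 = f"{last[:6]}{serial + 1:>5d}{last[11:12]} OT2{last[16:]}"
    return lines[:idx] + [ot1, ot2] + lines[idx + 1:]
```

Applied in the order strip, renumber, add terminal oxygens, the output can then be passed to Xplor-NIH for protonation and PSF generation.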
Patch the cross-linker to protein
In this protocol, we use two common cross-linkers, BS2G and BS3, whose PDB and PSF files are provided in the Supplementary material. For other cross-linking reagents, the PDB files can be generated using the build function in PyMOL, and the corresponding parameter files can be obtained from online servers such as HIC-Up (Kleywegt 2007) or PRODRG (Schuttelkopf and van Aalten 2004).
For subsequent structure optimization, the cross-linker is patched to only one of the two cross-linked residues, i.e., to one of the protein domains. We have found that patching the cross-linker to either domain affords essentially the same results. Amide bonds are formed between the protein side chain and the cross-linker, which requires modification of the corresponding atoms. For example, because the nitrogen atom (Nζ) of the Lys side chain is bonded to three hydrogen atoms, two of these hydrogen atoms must be removed, and the atom types of the remaining nitrogen and hydrogen atoms are modified accordingly.
The segment ID is another important identifier in Xplor-NIH in addition to the residue ID; i.e., residues with the same residue ID (residue number) but different segment IDs correspond to different residues. This situation arises when a residue is cross-linked to different residues in the opposite domain. We therefore assign different segment IDs to the cross-links at the same residue. Physically, multiple cross-links involving the same residue can only form one at a time; accordingly, these iso-residue cross-links are allowed to overlap with each other without incurring van der Waals clashes during the refinement.
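The segment-ID bookkeeping described above can be automated. In this hypothetical sketch, each cross-link is given as a (patched residue, partner residue) pair; cross-links that share the patched residue receive distinct segment IDs (`XL01`, `XL02`, and so on), while cross-links at different residues may reuse an ID because their residue numbers already differ. The `XL` prefix and two-digit counter are illustrative choices, kept within the four-character limit of XPLOR segment IDs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CrossLink = Tuple[int, int]  # (residue carrying the patched linker, partner residue)


def assign_linker_segids(crosslinks: List[CrossLink], prefix: str = "XL") -> Dict[CrossLink, str]:
    """Give each cross-linker copy on the same residue its own segment ID."""
    per_residue: Dict[int, int] = defaultdict(int)  # linker copies already on each residue
    segids: Dict[CrossLink, str] = {}
    for patched, partner in crosslinks:
        if (patched, partner) in segids:
            continue  # the same cross-link listed twice needs only one linker copy
        per_residue[patched] += 1
        segid = f"{prefix}{per_residue[patched]:02d}"
        if len(segid) > 4:
            raise ValueError(f"segment ID {segid!r} exceeds four characters")
        segids[(patched, partner)] = segid
    return segids
```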
Preparation of the CXMS restraints table
With the cross-linker patched to one domain (or one subunit), the cross-linking reaction is simulated by enforcing a distance restraint between the end of the cross-linker and the reactive group of the other cross-linked residue. Specifically, the distance between the carbonyl carbon atom of the cross-linker and the Lys Nζ atom is restrained to lie between 1.3 Å (the covalent bond length) and 5 Å (the sum of the van der Waals radii of the carbon and nitrogen atoms).
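A restraint of this kind can be written in the XPLOR-style distance restraint syntax read by Xplor-NIH, `assign (selection 1) (selection 2) d d_minus d_plus`, which allows distances from d − d_minus to d + d_plus. The sketch below generates one such line for the 1.3 Å to 5 Å window described above. The cross-linker atom name `C1` and the segment IDs in the usage are hypothetical placeholders (the real names come from the cross-linker PDB/PSF files), and how a restraint is shared among ensemble members is not shown.

```python
def crosslink_restraint(linker_segid: str, linker_resid: int, linker_atom: str,
                        target_segid: str, target_resid: int,
                        lower: float = 1.3, upper: float = 5.0) -> str:
    """One XPLOR-style distance restraint between a linker carbonyl C and a Lys NZ."""
    d = (lower + upper) / 2.0   # center of the allowed window
    d_minus = d - lower         # allowed shortening relative to d
    d_plus = upper - d          # allowed lengthening relative to d
    sel_linker = f'(segid "{linker_segid}" and resid {linker_resid} and name {linker_atom})'
    sel_lys = f'(segid "{target_segid}" and resid {target_resid} and name NZ)'
    return f"assign {sel_linker} {sel_lys} {d:.2f} {d_minus:.2f} {d_plus:.2f}"


# Usage (hypothetical names): linker copy XL01 on residue 21, partner Lys 94.
print(crosslink_restraint("XL01", 21, "C1", " ", 94))
```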
Simulated annealing refinement
When all input files and restraint files are prepared, the user can start the ensemble rigid-body refinement against the CXMS restraints. As mentioned above, the ensemble refinement is performed when the given structure cannot satisfy all high-confidence cross-links. The over-length cross-links capture one or more alternative conformational states. The refinement starts with an N = 2 ensemble comprising the predominant conformation and an alternative one. An additional conformer is included if the N = 2 ensemble cannot satisfy all the cross-links, and the process is repeated until the experimental data are fully accounted for.
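```python
from ivm import IVM  # Xplor-NIH internal variable module (rigid-body/torsion dynamics)
dyn = IVM()
dyn.fix(""" segid " " and resi 1:77 """)       # N-terminal domain held fixed
dyn.group(""" segid ALT0 and resi 82:148 """)  # C-terminal domain, conformer 1
dyn.group(""" segid ALT1 and resi 82:148 """)  # C-terminal domain, conformer 2
```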
Here, the N-terminal domain of calmodulin (residues 1–77) is fixed, and the C-terminal domain (residues 82–148) moves as a rigid body. In addition, the flexible linker between them (residues 78–81) has full torsional freedom. Note that in the Xplor-NIH script shown above there are two conformers of the C-terminal domain, marked with the segment IDs ALT0 and ALT1, which correspond to the two conformational states of calmodulin. To speed up the computation, the non-bonded van der Waals interactions within each rigid body are not calculated, and the two conformers, which represent alternative states, do not interact with each other. This is implemented using the following statement.
```
constraints
  ! non-bonded terms are computed only between the selection pairs listed here
  inter = (segid " ") (segid ALT0 and resi 82:148)
  inter = (segid " ") (segid ALT1 and resi 82:148)
  inter = (segid ALT0 and resi 78:82) (segid " " or segid ALT0)
  inter = (segid ALT1 and resi 78:82) (segid " " or segid ALT1)
  weights * 1 end
end
```
The ensemble refinement is carried out by simulated annealing. The system is heated to a relatively high temperature and then slowly cooled down. The structure is refined against the CXMS restraints during the cooling process. The computational process is repeated many times for effective sampling. Finally, the structures with no CXMS violation (satisfying all cross-links) and low energy (no atomic overlap) are selected for further analysis.
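The outer loop of the procedure (start from N = 2, run many annealing trials, keep only violation-free and low-energy structures, and add a conformer only when no trial satisfies all cross-links) can be summarized in a short driver. This is a hypothetical sketch: `run_trial` stands in for one Xplor-NIH simulated-annealing run and is not an Xplor-NIH function, and the default numbers of trials, kept structures, and maximum ensemble size are arbitrary placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Trial:
    energy: float    # total energy after annealing; lower means fewer clashes
    violations: int  # number of CXMS restraints violated by this trial


# Hypothetical callback: (ensemble size N, random seed) -> outcome of one annealing run.
RunTrial = Callable[[int, int], Trial]


def select_structures(trials: List[Trial], keep: int) -> List[Trial]:
    """Keep trials with no CXMS violation and return the `keep` lowest-energy ones."""
    accepted = [t for t in trials if t.violations == 0]
    return sorted(accepted, key=lambda t: t.energy)[:keep]


def grow_ensemble(run_trial: RunTrial, n_trials: int = 100, keep: int = 10,
                  max_size: int = 5) -> Optional[Tuple[int, List[Trial]]]:
    """Return the smallest ensemble size N >= 2 that satisfies all cross-links."""
    for n in range(2, max_size + 1):
        # Repeat the annealing many times with different seeds for effective sampling.
        trials = [run_trial(n, seed) for seed in range(n_trials)]
        best = select_structures(trials, keep)
        if best:
            return n, best
    return None  # no ensemble up to max_size accounts for all cross-links
```

Stopping at the first N that yields violation-free structures mirrors the minimal-ensemble principle of the protocol and limits the over-fitting risk that comes with additional conformers.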
We have found that the explicit representation of the cross-linker not only provides more realistic and stringent restraints but also allows better convergence of the ensemble structures, as compared with straight-line Euclidean distance restraints.
Cross-validation with a subset of cross-links
Cross-validation is performed to verify the accuracy of the ensemble structures. Specifically, a subset of CXMS restraints is removed, and the remaining cross-links are used for ensemble refinement as described above. The resulting ensemble structures are then evaluated against the excluded restraints: an accurate ensemble should also satisfy the cross-links that were not used in the refinement.
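The hold-out scheme can be organized as k-fold cross-validation, which is one possible way of choosing the removed subsets; the protocol itself only requires that some restraints be left out, and setting `n_folds` equal to the number of cross-links gives leave-one-out. In this hypothetical sketch, `refine` stands in for the ensemble refinement and `satisfied` for the check of a single cross-link against the resulting ensemble; neither is an Xplor-NIH function.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

CrossLink = Tuple[int, int]


def cross_validate(
    crosslinks: Sequence[CrossLink],
    refine: Callable[[List[CrossLink]], Any],
    satisfied: Callable[[Any, CrossLink], bool],
    n_folds: int = 5,
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """For each fold, return (number held out, number of held-out links satisfied)."""
    shuffled = list(crosslinks)
    random.Random(seed).shuffle(shuffled)       # reproducible random partition
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    results = []
    for i, held_out in enumerate(folds):
        if not held_out:
            continue                            # more folds than cross-links
        training = [xl for j, fold in enumerate(folds) if j != i for xl in fold]
        ensemble = refine(training)             # refine without the held-out links
        n_ok = sum(1 for xl in held_out if satisfied(ensemble, xl))
        results.append((len(held_out), n_ok))
    return results
```

A held-out cross-link that the ensemble fails to satisfy may point to a false identification or to an insufficient ensemble, two of the limitations discussed below.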
Analysis and validation with other types of experimental data
The over-length cross-links capture alternative conformations of the protein in solution. The ensemble structures obtained by refining against the CXMS restraints may be compared with those obtained from other biochemical and biophysical methods, such as paramagnetic relaxation enhancement (Tang et al. 2006) and small-angle X-ray scattering (Schneidman-Duhovny et al. 2012; Kikhney and Svergun 2015).
Limitations of the DynaXL Method
The ensemble structures obtained based on the CXMS restraints may suffer from certain limitations as described below.
Depending on the quality of the mass spectra, false identification of cross-links may occur. In other words, the experiment may identify incorrect cross-links even when stringent criteria are applied to select high-confidence cross-links. Multiple technical and biological replicates are necessary to minimize false identifications.
Insufficient number of restraints
There may not be a large number of high-confidence cross-links available as restraints. More restraints would certainly enable a researcher to better refine the structure and to detect inconsistencies among the restraints. However, it has been shown that the structural model of a protein complex can be obtained from just a single inter-molecular cross-linking restraint (Gong et al. 2015). Note that the DynaXL approach identifies only the minimum number of ensemble structures that can account for all available CXMS restraints. Should there be additional conformational states that elude cross-linking reactions, DynaXL cannot uncover them.
The ensemble size may have to be increased to account for all the cross-links. Additional conformers introduce additional parameters, which may lead to over-fitting. It is also possible that some over-length cross-links can be satisfied by intra-domain dynamics without the invocation of domain movement. Thus, cross-validation is important.
CXMS has been increasingly used for protein structure modeling. Here, we present a detailed protocol for using DynaXL to explicitly model cross-links and to characterize protein ensemble structures. Chemical cross-linking as well as photo-cross-linking techniques are rapidly evolving (Chiang et al. 2016), and new types of cross-linking reagents (Brodie et al. 2016) with various linker lengths and reactivities are becoming increasingly available, which can afford more spatial information between protein residues. In the age of integrative structural biology, protein ensemble structures can be better visualized through joint refinement against multiple types of experimental input, including but not limited to NMR, cryo-EM, and FRET.
This work was supported by a Grant from the National Key R&D Program of China (2016YFA0501200), Chinese Ministry of Science and Technology (2013CB910200), and National Natural Science Foundation of China (31225007, 31400735 and 31500595).
Compliance with Ethical Standards
Conflict of interest
Zhou Gong, Zhu Liu, Yue-He Ding, Meng-Qiu Dong, and Chun Tang declare that they have no conflict of interest.
Human and animal rights and informed consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
- Case DA, Babin V, Berryman JT, Betz RM, Cai Q, Cerutti DS, Cheatham TE III, Darden TA, Duke RE, Gohlke H, Goetz AW, Gusarov S, Homeyer N, Janowski P, Kaus J, Kolossváry I, Kovalenko A, Lee TS, LeGrand S, Luchko T, Luo R, Madej B, Merz KM, Paesani F, Roe DR, Roitberg A, Sagui C, Salomon-Ferrer R, Seabra G, Simmerling CL, Smith W, Swails J, Walker RC, Wang J, Wolf RM, Wu X, Kollman PA (2014) AMBER 14. University of California, San Francisco
- Lasker K, Forster F, Bohn S, Walzthoeni T, Villa E, Unverdorben P, Beck F, Aebersold R, Sali A, Baumeister W (2012) Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach. Proc Natl Acad Sci USA 109:1380–1387
- Leitner A, Joachimiak LA, Unverdorben P, Walzthoeni T, Frydman J, Forster F, Aebersold R (2014) Chemical cross-linking/mass spectrometry targeting acidic residues in proteins and protein complexes. Proc Natl Acad Sci USA 111:9455–9460
- Lossl P, Kolbel K, Tanzler D, Nannemann D, Ihling CH, Keller MV, Schneider M, Zaucke F, Meiler J, Sinz A (2014) Analysis of nidogen-1/laminin gamma1 interaction by cross-linking, mass spectrometry, and computational modeling reveals multiple binding modes. PLoS ONE 9:e112886
- Tan D, Li Q, Zhang MJ, Liu C, Ma C, Zhang P, Ding YH, Fan SB, Tao L, Yang B, Li X, Ma S, Liu J, Feng B, Liu X, Wang HW, He SM, Gao N, Ye K, Dong MQ, Lei X (2016) Trifunctional cross-linker for mapping protein-protein interaction networks and comparing protein conformational states. Elife 5:e12509
- The PyMOL Molecular Graphics System, Version 1.8. Schrödinger, LLC
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.