13C-detected NMR experiments for automatic resonance assignment of IDPs and multiple-fixing SMFT processing
Intrinsically disordered proteins (IDPs) have recently attracted much interest, due to their role in many biological processes, including signaling and regulation mechanisms. High-dimensional 13C direct-detected NMR experiments have proven exceptionally useful in the case of IDPs, providing spectra with superior peak dispersion. Here, two such novel experiments, recorded with non-uniform sampling, are introduced: 5D HabCabCO(CA)NCO and 5D HNCO(CA)NCO. Together with the 4D (HACA)CON(CA)NCO, an extension of the previously published 3D experiments (Pantoja-Uceda and Santoro in J Biomol NMR 59:43–50, 2014. doi:10.1007/s10858-014-9827-1), they form a set allowing for complete and reliable resonance assignment of difficult IDPs. The processing is performed with the sparse multidimensional Fourier transform, based on the concept of restricting (fixing) some of the spectral dimensions to a priori known resonance frequencies. In our study, a multiple-fixing method was developed, which allows easy access to spectral data. The experiments were tested on a resolution-demanding alpha-synuclein sample. Due to the superior peak dispersion in the high-dimensional spectra and the availability of sequential connectivities between four consecutive residues, the overwhelming majority of resonances could be assigned automatically using the TSAR program.
Keywords: Intrinsically disordered proteins; 13C direct-detection NMR; High-dimensional NMR experiment; Non-uniform sampling; Automatic assignment; Sparse multidimensional Fourier transform
Intrinsically disordered proteins (IDPs) have recently attracted much interest among researchers studying biological mechanisms (Oldfield and Dunker 2014). Notably, ca. 25–30 % of proteins found in Eukaryota are intrinsically disordered (Oldfield et al. 2005). IDPs are extremely flexible and reveal only transient secondary structure, which allows them to play many important roles in living organisms (Wright and Dyson 1999), often connected with signaling and regulation processes. They are also associated with many human diseases, such as cancer (Iakoucheva et al. 2002) and neurodegenerative diseases (Uversky and Fink 2004). Much effort has recently been put into improving experimental techniques for studying IDPs. Importantly, X-ray crystallography, which is often considered the leading method in structural proteomics, cannot be applied: the high mobility of the polypeptide chain usually prevents crystallization. In contrast, NMR provides both structural and dynamic information with atomic resolution. It allows conformational propensities to be determined and regions involved in ligand binding to be identified. This makes NMR the main tool for studying IDPs.
The first step of every NMR-based protein study is the sequence-specific assignment of resonances. A large variety of experiments suitable for the assignment of IDPs is available (Kazimierczuk et al. 2010; Mäntylahti et al. 2011; Zawadzka-Kazimierczuk et al. 2012b; Solyom et al. 2013; Pantoja-Uceda and Santoro 2013; Liu and Yang 2013; Hellman et al. 2014; Pantoja-Uceda and Santoro 2014; Reddy and Hosur 2014; Piai et al. 2014; Yao et al. 2014; Yoshimura et al. 2015). Among them, 13C direct-detected high-dimensional (≥4D) techniques (Nováček et al. 2011, 2012, 2013; Bermel et al. 2012, 2013a) are especially useful, due to their specific features described below.
The high dimensionality of the experiments makes it possible to circumvent the problem of poor chemical shift dispersion, which is caused by the high flexibility of the IDP backbone. However, the high flexibility of IDPs also has a positive aspect: it results in reduced transverse relaxation rates compared to structured proteins of similar size. Thus, even the long pulse sequences employed in high-dimensional experiments are still practical.
13C detection is helpful if amide protons (typically used for NMR signal detection) undergo fast chemical exchange. It also allows efficient assignment of proline-rich proteins. A proline residue gives no signal in HN-detected spectra, which creates a break in chains of sequentially linked residues. Notably, prolines, as disorder-promoting residues, are more abundant in IDPs than in folded proteins (Dunker et al. 2008). Another benefit of carbon detection is that it takes advantage of the peak dispersion in the CO dimension, which is superior to that in the HN dimension usually used for NMR signal detection.
High-dimensional and 13C-detected experiments require a special approach to data acquisition and processing. As the linewidths are inversely proportional to the maximum evolution times (in each spectral dimension), conventional sampling of the evolution-time space generally does not provide sufficiently narrow peaks for ≥4D experiments within a practically achievable experiment duration. This is because the spacing of the equidistant sampling points is dictated by the Nyquist theorem (Nyquist 2002). Therefore, to acquire high-dimensional data, non-uniform sampling (NUS) of the evolution-time space should be employed. One then has to choose a method for processing such data: projection reconstruction (Freeman and Kupče 2012), automated projection spectroscopy, APSY (Hiller and Wider 2012), maximum entropy (Mobli and Hoch 2008), multi-way decomposition (Malmodin and Billeter 2005), multi-dimensional decomposition (Orekhov and Jaravine 2011), multidimensional Fourier transform (Kazimierczuk et al. 2012), or compressed sensing (Holland and Gladden 2014). Finally, there remains the important problem of handling the high-dimensional spectrum. No software for displaying 5D and higher-dimensional spectra is available so far. Moreover, the full-dimensional spectrum would be a very large file (tens of GB or more). Thus, the full-dimensional spectrum is usually not calculated. Some methods come with software for automatic analysis of high-dimensional data [GAPRO for APSY (Hiller et al. 2005), PRODECOMP for multiway decomposition (Malmodin and Billeter 2006)], without giving access to a full spectrum. Another approach is to calculate just the regions of the spectra that contain peaks (without losing any information), as in the sparse multidimensional Fourier transform, SMFT (Kazimierczuk et al. 2009).
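To give a feeling for why the full-dimensional spectrum is usually not calculated, the following minimal sketch estimates the file size of a fully computed spectrum. The grid sizes (2048 points in the direct dimension, 128 points in each indirect dimension) and the 4-byte real data points are illustrative assumptions, not parameters of the experiments described here.

```python
from math import prod

def full_spectrum_size_gb(points_per_dim, bytes_per_point=4):
    """Size in GB (10**9 bytes) of a fully calculated, real-valued spectrum.

    points_per_dim: number of frequency points in each spectral dimension.
    bytes_per_point: storage per point (4 bytes = single precision, assumed).
    """
    return prod(points_per_dim) * bytes_per_point / 1e9

# Hypothetical grids: the point counts below are assumptions for illustration.
size_4d = full_spectrum_size_gb([128, 128, 128, 2048])
size_5d = full_spectrum_size_gb([128, 128, 128, 128, 2048])

print(f"4D: {size_4d:.1f} GB, 5D: {size_5d:.1f} GB")
```

Each added dimension multiplies the size by its number of points, which is why calculating only the peak-containing regions is attractive.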
Despite all the aforementioned advantages, 13C detection is also associated with some experimental limitations. Firstly, 13C detection features lower sensitivity than 1H detection. This is primarily caused by the difference in the gyromagnetic ratios of the 1H and 13C nuclei. The problem has been alleviated by constant improvement in NMR instrumentation, including cryogenically cooled probes optimized for 13C detection. Another problem of carbon-detected experiments is peak splitting caused by the large one-bond homonuclear C–C scalar coupling. This effect can be avoided by applying band-selective carbon homodecoupling during acquisition or so-called virtual decoupling (Bermel et al. 2006a).
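The size of the sensitivity penalty can be estimated with a short calculation. The sketch below relies on the commonly quoted rule of thumb that, for the same excited nucleus, the signal-to-noise ratio scales with the gyromagnetic ratio of the detected nucleus to the power 3/2. This scaling law is an assumption brought in for illustration; it ignores relaxation, probe design and linewidth effects.

```python
# Gyromagnetic ratios in rad s^-1 T^-1 (standard literature values).
GAMMA_1H = 267.522e6
GAMMA_13C = 67.283e6

def detection_sensitivity_ratio(gamma_a, gamma_b):
    """Approximate S/N advantage of detecting nucleus a over nucleus b.

    Assumes S/N is proportional to gamma_detected ** 1.5 with everything
    else (excited nucleus, probe, relaxation) being equal.
    """
    return (gamma_a / gamma_b) ** 1.5

ratio = detection_sensitivity_ratio(GAMMA_1H, GAMMA_13C)
print(f"1H detection is roughly {ratio:.1f} times more sensitive than 13C detection")
```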
Taking advantage of the methods presented above (NUS, SMFT, virtual decoupling), several high-dimensional 13C-detected experiments have recently been developed. To establish the sequential connectivities between the residues, some of the experiments use aliphatic chemical shifts (HAi–CAi and/or HBi–CBi pairs), while others use the better resolved and thus more efficient COi−1–Ni pairs. Some of the techniques employ amide proton excitation; others excite alpha protons, preserving proline signals. All the previously proposed techniques establish the sequential connectivities between two adjacent residues.
In the current study we present two 5D 13C-detected experiments. They are combined with the 4D (HACA)CON(CA)NCO experiment, an extension of the previously published 3D hacaCOncaNCO and 3D hacacoNcaNCO experiments (Pantoja-Uceda and Santoro 2014). The 4D (HACA)CON(CA)NCO experiment serves as a basis spectrum for SMFT processing. This allows us to calculate spectral cross-sections in several ways, which we refer to as multiple fixing. As a result, the 4D (HACA)CON(CA)NCO spectrum provides the sequential connectivities (via COi−1–Ni pairs) between four consecutive residues, making it possible to resolve even difficult cases with a high degree of chemical shift degeneracy. The 5D spectra additionally provide amide proton (HNCO(CA)NCO spectrum) and aliphatic HA, CA, HB, CB (HabCabCO(CA)NCO spectrum) chemical shifts. Moreover, both 5D techniques provide sequential connectivities between two adjacent residues. The set of experiments allows for efficient automatic assignment of IDP resonances.
Materials and methods
Table: Experimental parameters for indirect dimensions of all experiments. Recoverable entries (total experiment durations, in the order listed): 53 h 15 min; 78 h 40 min; 81 h 10 min.
The multidimensional Fourier transform (MFT) (Kazimierczuk et al. 2006), implemented in the ToASTD program, was used for processing of the basis 4D (HACA)CON(CA)NCO experiment. The sparse multidimensional Fourier transform (SMFT) (Kazimierczuk et al. 2009), implemented in the reduced program, with ‘fixed’ frequencies derived from the basis spectrum peak list, was used to obtain 2D cross-sections of all experiments. Both the ToASTD and reduced programs are available from http://nmr.cent3.uw.edu.pl/software. Prior to direct-dimension processing, both programs performed the appropriate handling of in-phase and anti-phase components so that no doublets appeared in the spectrum. For processing of the directly detected dimension, a squared-cosine weighting function was applied prior to Fourier transform with zero-filling to 2048 complex points. No apodization was applied in the indirect dimensions.
The resulting spectra were displayed using the SPARKY software (Goddard and Kneller 2002). The assignment of protein resonances was done automatically using the TSAR program (Zawadzka-Kazimierczuk et al. 2012a), available from http://nmr.cent3.uw.edu.pl/software. The TSAR input files in which the new experiment types are defined are shown in Supplementary Materials.
Results and discussion
As stated above, handling of high-dimensional spectra is problematic due to the large file size and the lack of appropriate software for displaying them. Furthermore, such spectra are mostly empty: in a multidimensional space (of several hundred frequency points in each dimension) typically just several hundred peaks are present (a few peaks per residue). The sparse multidimensional Fourier transform, SMFT (Kazimierczuk et al. 2009), allows the processing to be limited to the regions that contain peaks. The information on the location of these regions is gathered by recording a basis spectrum that shares some dimensions with the high-dimensional spectrum. Thus, for each peak of the basis spectrum, one (or more) cross-section(s) of the high-dimensional spectrum can be calculated by fixing the frequencies obtained from the basis peak list. Up to now, the basis spectrum was chosen in a way that allowed just a single cross-section to be calculated for each peak (Kazimierczuk et al. 2009, 2010; Nováček et al. 2011; Zawadzka-Kazimierczuk et al. 2012b; Bermel et al. 2013a). In the present study, we chose the basis spectrum in a way that allows us to calculate a few different cross-sections for each basis peak. Such multiple fixing makes it easy to expand the spin systems and thus facilitates the establishment of sequential connectivities.
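The idea of fixing frequencies can be illustrated with a minimal numerical sketch. It is not the implementation used in the reduced program; it only shows the principle of a direct Fourier sum over non-uniformly sampled points, in which the fixed dimensions are demodulated at known frequencies and only a 2D plane is evaluated. The three-dimensional toy signal, the sampling schedule, and all names (`smft_cross_section`, `fixed`, `free_axes`) are assumptions made for illustration.

```python
import numpy as np

def smft_cross_section(times, signal, fixed, free_axes):
    """2D cross-section of an N-dimensional spectrum from NUS data.

    times:     (n_points, n_dims) array of evolution times in seconds.
    signal:    (n_points,) complex array of measured values.
    fixed:     {dimension index: frequency in Hz} for the fixed dimensions.
    free_axes: {dimension index: 1D array of frequencies in Hz}, two entries.
    """
    # Demodulate the signal at the a priori known (fixed) frequencies.
    weights = signal.astype(complex)
    for dim, freq in fixed.items():
        weights = weights * np.exp(-2j * np.pi * freq * times[:, dim])
    (d1, f1), (d2, f2) = free_axes.items()
    # Fourier kernels of the two free dimensions, shape (n_freq, n_points).
    k1 = np.exp(-2j * np.pi * np.outer(f1, times[:, d1]))
    k2 = np.exp(-2j * np.pi * np.outer(f2, times[:, d2]))
    # Sum over the sampling points for every (f1, f2) pair.
    return (k1 * weights) @ k2.T / len(signal)

# Toy data: one non-decaying peak at (100, -50, 200) Hz, 300 random time points.
rng = np.random.default_rng(0)
times = rng.uniform(0.0, 0.05, size=(300, 3))
true_freqs = np.array([100.0, -50.0, 200.0])
signal = np.exp(2j * np.pi * times @ true_freqs)

f1 = np.arange(-250, 251, 10.0)
f2 = np.arange(-250, 251, 10.0)
# Fix dimension 2 at the frequency known from a (hypothetical) basis peak list.
plane = smft_cross_section(times, signal, {2: 200.0}, {0: f1, 1: f2})
i, j = np.unravel_index(np.argmax(np.abs(plane)), plane.shape)
print("peak found at", f1[i], "Hz,", f2[j], "Hz")

# Fixing a frequency where no peak is present yields a plane without the peak.
wrong = smft_cross_section(times, signal, {2: 0.0}, {0: f1, 1: f2})
```

In this picture, multiple fixing corresponds to calling the same routine several times for one basis peak, each time with a different choice of which dimensions are fixed and which are left free.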
Experimental techniques and data processing
We developed a set of 13C-detected high-dimensional techniques, consisting of 4D (HACA)CON(CA)NCO, 5D HabCabCO(CA)NCO and 5D HNCO(CA)NCO. It allows the assignment of the following resonances: HN, N, CO, CA, CB, HA and HB. However, not all of these experiments are always needed to obtain the resonance assignment. As shown below, the 5D HNCO(CA)NCO experiment can be excluded and the assignment can still be obtained. All of these experiments are recorded using NUS, which makes it possible to achieve extraordinary resolution in a reasonable experimental time. The first two of the above experiments exploit excitation of aliphatic protons, allowing preservation of proline signals. The last one, 5D HNCO(CA)NCO, excites amide protons and is the only one that yields amide proton chemical shifts. In all three techniques the signal is acquired on the carbon channel, and the detected nuclei are carbonyl carbons.
The 4D (HACA)CON(CA)NCO experiment is processed with the multidimensional Fourier transform, MFT (Kazimierczuk et al. 2006). The obtained 4D spectrum serves as the basis spectrum during SMFT processing. The two other experiments are processed with the sparse multidimensional Fourier transform, SMFT (Kazimierczuk et al. 2009). For such spectra the multiple fixing method can be applied. Interestingly, the basis spectrum itself can also be SMFT-processed (also with multiple fixing), using its own peak list (see below).
The strategy of resonance assignment and multiple fixing method
The strategy of resonance assignment is based on the parallel analysis of cross-sections of various multidimensional spectra. The four-level process, performed automatically by the TSAR program (Zawadzka-Kazimierczuk et al. 2012a), includes: (1) formation of cross-section spin systems (CSSSs; a data structure corresponding to each basis peak, containing the chemical shifts of the nuclei of the i − 2, i − 1, i, i + 1 and i + 2 residues), (2) finding the sequential connectivities between the CSSSs and formation of CSSS chains, (3) recognition of the amino acids that may correspond to each CSSS, and (4) mapping of the CSSS chains onto the protein sequence.
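Steps (1) and (2) of the process above can be sketched as follows. This is a deliberately simplified illustration and not the algorithm implemented in TSAR: the `CSSS` class, the matching rule based only on one N and one CO shift, the tolerances, and the example chemical shifts are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CSSS:
    """Cross-section spin system: shifts keyed by (nucleus, residue offset)."""
    name: str
    shifts: dict = field(default_factory=dict)

def is_successor(a, b, tol_n=0.1, tol_co=0.05):
    """True if b may describe the residue following that of a.

    Compares a's (N, i+1) and (CO, i) shifts with b's (N, i) and (CO, i-1)
    shifts; the tolerances (ppm) are assumed values.
    """
    pairs = [(("N", 1), ("N", 0), tol_n), (("CO", 0), ("CO", -1), tol_co)]
    for key_a, key_b, tol in pairs:
        if key_a not in a.shifts or key_b not in b.shifts:
            return False
        if abs(a.shifts[key_a] - b.shifts[key_b]) > tol:
            return False
    return True

def build_chains(systems):
    """Link CSSSs into chains, using only unambiguous successor matches."""
    successor, has_predecessor = {}, set()
    for a in systems:
        candidates = [b for b in systems if b is not a and is_successor(a, b)]
        if len(candidates) == 1:
            successor[a.name] = candidates[0].name
            has_predecessor.add(candidates[0].name)
    chains = []
    for s in systems:
        if s.name in has_predecessor:
            continue  # not the start of a chain
        chain = [s.name]
        while chain[-1] in successor and successor[chain[-1]] not in chain:
            chain.append(successor[chain[-1]])
        chains.append(chain)
    return chains

# Made-up shifts (ppm) for three consecutive residues.
a = CSSS("A", {("N", 0): 120.0, ("CO", -1): 175.0, ("N", 1): 118.5, ("CO", 0): 176.2})
b = CSSS("B", {("N", 0): 118.5, ("CO", -1): 176.2, ("N", 1): 122.3, ("CO", 0): 174.1})
c = CSSS("C", {("N", 0): 122.3, ("CO", -1): 174.1, ("N", 1): 110.0, ("CO", 0): 177.0})

chains = build_chains([c, a, b])
print(chains)
```

Because each CSSS carries shifts of several neighbouring residues, a real implementation can compare more than one pair per link, which is what makes the chains robust against chemical shift degeneracy.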
It is worth mentioning that not all ways of fixing are equally useful. In planes where the direct dimension is not fixed, ridges of artifacts may appear along the indirect dimension, making such planes more difficult to work with than those with a fixed direct dimension. The contribution of sampling artifacts to the overall spectral noise depends on the sensitivity of the technique. In low-sensitivity experiments with a low dynamic range of peak amplitudes, which typically includes all 13C-detected ones, the artifacts constitute just a tiny proportion of the total spectral noise. For more sensitive techniques, the problem of spectral artifacts can be overcome by using one of the artifact-cleaning methods (Kazimierczuk et al. 2007; Coggins and Zhou 2008; Stanek and Koźmiński 2010). Cross-section types may also differ in the degree of plane overlap, which depends on the resolution in the fixed dimensions. As the direct dimension features the highest resolution, planes obtained without direct-dimension fixing usually overlap more severely than those calculated with direct-dimension fixing. Thus, if two types of fixing yield identical peaks, it is usually more beneficial to use the one with direct-dimension fixing. In the proposed 5D experiments (HabCabCO(CA)NCO and HNCO(CA)NCO), one fixing type (2i and 3i) uses three different frequencies for cross-section calculation, while the two other fixing types (2ii and 3ii, 2iii and 3iii) use just two different frequencies, because dimensions 3 and 5 are fixed based on the same frequency (see Figs. 2b, 3b). Therefore, the sets of frequencies used for fixing 2i and 3i are more unique than those used for the other fixing types, which results in a lower degree of cross-section overlap for fixing 2i and 3i.
In principle, a full-dimensional spectrum contains all the information available from the given experiment. Multiple fixing therefore does not yield any new information, but it provides easy access to the available information. It facilitates the expansion of the CSSSs, which helps to find more sequential connectivities. In consequence, the assignment process is more efficient.
Difficulties in the assignment process may appear when some peaks are missing in the basis spectrum. This can happen due to the low sensitivity of the basis experiment or due to overlap of the COi–Ni+1–Ni–COi−1 and COi–Ni+1–Ni+1–COi peaks, which have opposite signs. In that case the whole CSSS is missing, as the corresponding cross-sections of the high-dimensional spectra are not calculated. This may interrupt the formation of CSSS chains. Nonetheless, as the basis spectrum yields the sequential connectivities via three N–CO pairs, such a gap is not critical. Shorter chains (with a gap between them) can still be sequentially linked, and all the chemical shifts of the “lacking” residue can be obtained from the adjacent CSSSs.
Due to the high dimensionality of the experiments, the strategy is capable of coping with proteins with a high degree of chemical shift overlap. Excitation of aliphatic protons (utilized in two of the three spectra used) and 13C detection make it better suited to proline-rich IDPs than HN-excited and/or HN-detected techniques, which are more sensitive in the case of slow chemical exchange. Compared to previously published strategies employing 13C-detected high-dimensional experiments (Nováček et al. 2011, 2012, 2013; Bermel et al. 2012, 2013b), the proposed strategy seems to be very robust, as the sequential connectivities are established via chemical shifts of seven nuclei types: N, CO, HN, CA, CB, HA and HB. The first three (N, CO, HN) are especially useful in the case of IDPs. They are typically better resolved than the aliphatic carbon and proton chemical shifts, which in IDPs depend strongly on the particular amino acid. Moreover, two of the chemical shifts (N and CO) provide the links between four residues. A similar level of sequential linking was proposed by Bermel et al. (2013c), whose strategy can use (depending on the set of chosen pulse sequences) N and CO chemical shifts of three residues and HN, CA, CB, HA and HB of two residues. The main difference between the two strategies lies in the fixed dimensions, which influence the level of cross-section overlap. In the strategy of Bermel et al., two frequencies (COi−1 and Ni) are fixed for SMFT in the 4D pulse sequences, and CAi−1 is fixed additionally in the 5D pulse sequences. In the strategy proposed in the present article, in one fixing type of the 5D techniques, the COi, Ni and COi−1 frequencies are fixed. Such triples are better resolved than the CAi−1, COi−1 and Ni triples utilized by Bermel et al., which is beneficial regarding the level of cross-section overlap. On the other hand, the other fixing types of the 5D spectra and the basis spectrum utilize COi−1 and Ni pairs, which does not always allow the cross-sections to be resolved.
To sum up, the two approaches are of similar quality.
All presented techniques were tested on the alpha-synuclein protein, and the data were used to assign the resonances. To obtain the assignment, at least one experiment providing the sequential connectivities and one experiment yielding CB chemical shifts (for amino acid recognition) are needed. Thus, the assignment can be done using just the 4D (HACA)CON(CA)NCO and 5D HabCabCO(CA)NCO data (below we call this data set 1). However, such a data set does not yield HN chemical shifts. Therefore, a second data set (data set 2) was also constructed, which additionally includes the 5D HNCO(CA)NCO spectrum. Data set 1 was acquired within ca. 132 h of measurement time, and data set 2 within ca. 213 h. Due to the high signal-to-noise ratio in the 5D spectra, the data were transformed again using a smaller number of increments (for NUS data such an operation does not cause resolution loss). It was found that both the 5D HabCabCO(CA)NCO and 5D HNCO(CA)NCO spectra obtained using 1200 increments (2/3 of the original number of increments) contained an identical set of peaks: no false peak appeared and no true peak disappeared. This means that these experiments could be acquired faster, without any information loss. In such a case, the total measurement time would be <106 h for data set 1 and almost 160 h for data set 2.
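The measurement times quoted above can be checked with simple arithmetic. The sketch assumes that the three durations given in the experimental-parameters table (53 h 15 min, 78 h 40 min and 81 h 10 min) belong, in that order, to the 4D (HACA)CON(CA)NCO, 5D HabCabCO(CA)NCO and 5D HNCO(CA)NCO experiments, and that the measurement time scales linearly with the number of increments. Both are assumptions, although they reproduce the totals stated in the text.

```python
def hours(h, m):
    """Convert hours and minutes to decimal hours."""
    return h + m / 60

# Assumed mapping of the listed durations to the three experiments.
t_4d = hours(53, 15)      # 4D (HACA)CON(CA)NCO, basis spectrum
t_habcab = hours(78, 40)  # 5D HabCabCO(CA)NCO
t_hnco = hours(81, 10)    # 5D HNCO(CA)NCO

set1 = t_4d + t_habcab
set2 = set1 + t_hnco

# Only the 5D experiments are shortened to 2/3 of their increments.
fraction = 1200 / 1800
set1_reduced = t_4d + fraction * t_habcab
set2_reduced = set1_reduced + fraction * t_hnco

print(f"data set 1: {set1:.1f} h, reduced: {set1_reduced:.1f} h")
print(f"data set 2: {set2:.1f} h, reduced: {set2_reduced:.1f} h")
```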
Table: Results obtained with the TSAR program. Recoverable column headers: total measurement time (h); assigned resonances, correct/incorrect (%).
For both data sets the chains of cross-sections were identical. 123 cross-sections were correctly assigned. Most of them (106) were parts of long (≥8 residues) chains, whose assignment is the most reliable. Just three of them were parts of short (1–2 residues) chains. Out of three incorrectly assigned cross-sections, two were in “chains” of length 1 and one was at the very end of a long chain.
In the present study we proposed two new high-dimensional experiments (5D HabCabCO(CA)NCO and 5D HNCO(CA)NCO) and a 4D (HACA)CON(CA)NCO extension of the previously published 3D experiments, dedicated to the resonance assignment of proteins featuring low chemical shift dispersion, such as IDPs. Since all these experiments exploit 13C detection and two of them use HA and HB proton excitation, this set of experiments is suitable for samples with a high content of proline residues or featuring a high level of chemical exchange of amide protons. NUS makes it possible to acquire these experiments in a limited measurement time. Using the sparse multidimensional Fourier transform together with the multiple fixing method provides easy access to the high-dimensional data and facilitates spectral analysis. The TSAR program can perform the assignment automatically. Using a 140-amino-acid-long IDP as an example, it was shown that the proposed techniques enable almost complete automatic resonance assignment.
Anna Zawadzka-Kazimierczuk thanks the Foundation for Polish Science for support with the POMOST program. This work was also supported by the Grant No. IP2012 062772, funded by Polish Ministry of Science and Higher Education for years 2013–2015.
- Goddard TD, Kneller DG (2002) Sparky 3. University of California, San Francisco
- Hellman M, Piirainen H, Jaakola V-P, Permi P (2014) Bridge over troubled proline: assignment of intrinsically disordered proteins using (HCA)CON(CAN)H and (HCA)N(CA)CO(N)H experiments concomitantly with HNCO and i(HCA)CO(CA)NH. J Biomol NMR 58:49–60. doi:10.1007/s10858-013-9804-0
- Nováček J, Janda L, Dopitová R et al (2013) Efficient protocol for backbone and side-chain assignments of large, intrinsically disordered proteins: transient secondary structure analysis of 49.2 kDa microtubule associated protein 2c. J Biomol NMR 56:291–301. doi:10.1007/s10858-013-9761-7
- Yoshimura Y, Kulminskaya NV, Mulder FAA (2015) Easy and unambiguous sequential assignments of intrinsically disordered proteins by correlating the backbone (15)N or (13)C′ chemical shifts of multiple contiguous residues in highly resolved 3D spectra. J Biomol NMR. doi:10.1007/s10858-014-9890-7
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.