High dimensional and high resolution pulse sequences for backbone resonance assignment of intrinsically disordered proteins

Four novel 5D (HACA(N)CONH, HNCOCACB, (HACA)CON(CA)CONH, (H)NCO(NCA)CONH), and one 6D ((H)NCO(N)CACONH) NMR pulse sequences are proposed. The new experiments employ non-uniform sampling that enables achieving high resolution in indirectly detected dimensions. The experiments facilitate resonance assignment of intrinsically disordered proteins. The novel pulse sequences were successfully tested using δ subunit (20 kDa) of Bacillus subtilis RNA polymerase that has an 81-amino acid disordered part containing various repetitive sequences.


Introduction
Intrinsically disordered proteins (IDPs) are a group of macromolecules with peculiar structure, dynamics and interactions (Dyson and Wright 2001; Tompa et al. 2006) that allow them to perform a number of important biological functions (Ward et al. 2004; Dyson and Wright 2005; Tompa 2010). Solution-state NMR spectroscopy appears ideally suited to the study of these proteins, because the fast conformational dynamics of IDPs result in relatively slow transverse relaxation. The problem is that rapid interconversion between the various conformations averages the chemical shifts and leads to very poor peak separation, which makes resonance assignment difficult even for relatively small disordered protein fragments. A promising way to overcome this obstacle is offered by multidimensional NMR methods that use non-uniform sampling of the indirectly detected dimensions (Felli and Brutscher 2009; Coggins et al. 2010; Kazimierczuk et al. 2010a), as such sampling facilitates the acquisition of high-resolution, high-dimensional spectra.
To date, several strategies have been proposed for effective backbone resonance assignment of IDPs. These strategies include 13C detection (Bermel et al. 2006, 2009), automated projection spectroscopy (APSY; Narayanan et al. 2010), HA detection (Mäntylahti et al. 2011), and sparsely sampled 4D (Wen et al. 2011) and 5D experiments (Motáčková et al. 2010; Nováček et al. 2011). Below, we propose a set of new pulse sequences that feature high resolution and high dimensionality resulting from the use of sparse random sampling in the indirectly detected dimensions. The novel experiments, which require slow transverse relaxation because of the multiple coherence transfer steps involved, were designed for IDPs and offer superior peak resolution and ease of resonance assignment for these proteins. The pulse sequences were tested using the 20 kDa δ subunit of B. subtilis RNA polymerase. With an 81-amino-acid unstructured part containing various repetitive sequences, this macromolecule is an excellent example of an IDP whose resonance assignment is extremely difficult using conventional methods.

Methods
The uniformly 13C,15N-labeled sample of the B. subtilis RNA polymerase δ subunit was prepared as described previously (Motáčková et al. 2010). All spectra were acquired on a 0.7 mM protein solution using a Varian NMR System 700 spectrometer equipped with a Performa XYZ PFG unit and the standard 5 mm 1H-13C-15N triple-resonance probehead. High-power 1H, 13C and 15N π/2 pulses of 5.9, 13.5 and 31.0 μs, respectively, were used. Selective CA and CO pulses were realized as phase-modulated (for off-resonance excitation or inversion) sinc shapes, with the B1 field strength adjusted to have a minimal effect on CO and CA, respectively. In all cases, four scans were acquired per data set, with an acquisition time of 85 ms and a relaxation delay of 1.2 s. For processing of the directly detected dimension, a cosine-square weighting function was applied prior to Fourier transform with zero-filling to 2,048 complex points. The experiments were performed using random off-grid Poisson disk sampling, with the sampling density set according to a Gaussian distribution (σ = 0.5) with regard to the maximum evolution time (Kazimierczuk et al. 2008). No apodization was applied in the indirect dimensions. The number of complex points M_i in the frequency domain of the ith indirectly detected dimension was set as M_i ≥ 3 × sw_i × t_i^max, where sw_i is the spectral width and t_i^max the maximum evolution time in that dimension. The sparse multidimensional Fourier transform (SMFT) procedure (Kazimierczuk et al. 2009), with 'fixed' frequencies derived from the 3D HNCO and 4D HNCOCA peak lists, was used to obtain F1/F2 cross-sections in the 5D and 6D experiments, respectively. The remaining relevant experimental parameters are shown in Table 1.
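The point-count rule and the Gaussian-weighted sampling density described above can be illustrated with a minimal Python sketch. It is not the authors' sampling code: the function names are hypothetical, and plain rejection sampling is swapped in for the Poisson disk generator, so no minimum distance between sampling points is enforced. The spectral width and evolution times in the usage lines are placeholder values, not those of Table 1.

```python
import math
import random


def min_frequency_points(sw_hz, t_max_s):
    """Smallest integer M that satisfies M >= 3 * sw * t_max."""
    # round() guards against floating-point noise such as 300.00000000001
    return math.ceil(round(3.0 * sw_hz * t_max_s, 9))


def gaussian_weighted_times(n_points, t_max_s, sigma=0.5, seed=0):
    """Draw off-grid evolution-time vectors by rejection sampling.

    Every coordinate t_i is weighted by exp(-(t_i / t_max_i)**2 / (2 * sigma**2)),
    so short evolution times are sampled more densely than long ones.
    Assumption: simple rejection sampling, without the Poisson disk
    minimum-distance constraint used in the paper.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n_points:
        candidate = [rng.uniform(0.0, t_max) for t_max in t_max_s]
        weight = 1.0
        for t, t_max in zip(candidate, t_max_s):
            weight *= math.exp(-((t / t_max) ** 2) / (2.0 * sigma ** 2))
        if rng.random() < weight:
            points.append(candidate)
    return points


# Placeholder values for illustration only (not taken from Table 1).
example_points = min_frequency_points(sw_hz=2000.0, t_max_s=0.05)
example_schedule = gaussian_weighted_times(5, t_max_s=[0.02, 0.03, 0.05, 0.05])
print(example_points, len(example_schedule))
```

Because the acceptance weight is a product over dimensions, the schedule thins out quickly towards long evolution times in all dimensions at once, which is the qualitative behaviour a Gaussian-weighted density is meant to give.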
The pulse sequences were written using an in-house programming library. The resulting spectra were analyzed using the SPARKY software (Goddard and Kneller 2002). The pulse sequence code for Agilent spectrometers and the SMFT software used for data processing are available from the authors upon request.

Results and discussion
The first two pulse sequences (5D HACA(N)CONH and 5D (HACA)CON(CA)CONH) are depicted schematically in Figs. 2a and 3a, respectively, and the corresponding coherence transfer pathways are given in Figs. 2b and 3b. Both experiments start from the equilibrium magnetization of HA protons. This makes it possible to identify certain chemical shifts of proline residues whose successors' amide protons are detected. In both experiments, the F1/F2 cross-sections are effectively separated owing to the good peak separation in the CO-N subspectra. In the HACA(N)CONH experiment, the sequential connectivities may be obtained from 1HA and 13CA chemical shifts, an approach that usually fails for IDPs because of poor peak separation. On the other hand, the (HACA)CON(CA)CONH spectra allow, at the expense of additional coherence transfer steps, connectivities to be found using 13CO and 15N frequencies, which are more uniformly distributed over the entire spectral band and are therefore more suitable for studies of IDPs. This experiment requires extension of the spectral width in the first two dimensions to accommodate correlations of proline residues. In Figs. 2c and 3c we show an example of 2D cross-sections for the I118-L123 fragment of the disordered part of the δ subunit of B. subtilis RNA polymerase.

Table row (column headings not recovered): Sampling density versus conventional: 7.11 × 10⁻⁶, 2.34 × 10⁻⁶, 2.93 × 10⁻⁶, 1.05 × 10⁻⁸, 7.80 × 10⁻⁷.

Figure caption (beginning not recovered): ... /2 (where Δ stands for the coherence transfer delays listed below, t_i is the evolution time in the ith dimension, and t_i^max is the maximal length of the evolution time delay). Delays were set as follows: Δ'(H-C) = 2.6 ms, Δ(CA-N) = 28.0 ms, Δ(CA-N-CO) = 28.0 ms, Δ(N-CO) = 28.0 ms, and Δ(N-H) = 5.4 ms. A four-step phase cycle was used: φ1 = x, -x; φ2 = 2x, 2(-x); Rec = φ1 + φ2. Simultaneous inversion of CA and CO spins was achieved using a 6-element composite pulse (Shaka 1985). The coherence selection gradients (marked xyz) were applied at the magic angle. The phase ψ was inverted simultaneously with the last gradient pulse. (b) Coherence transfer in the peptide chain. H^N, N, and CO frequencies (filled rectangles) are 'fixed' for Fourier transform. Frames for HA and CA indicate the dimensions of the 2D cross-sections obtained by the SMFT procedure. (c) 2D spectral planes for the δ subunit of B. subtilis RNA polymerase, obtained by the SMFT procedure performed on the 5D randomly sampled signal (Poisson disk sampling) with 'fixed' frequencies taken from the 3D HNCO peak list. Each cross-section contains two cross-peaks: for 'fixed' H^N(i), N(i) and CO(i-1), the peaks correspond to HA(i)-CA(i) and HA(i-1)-CA(i-1) correlations.

J Biomol NMR (2012) 52:329-337

The out-and-back 5D HNCOCACB experiment shown in Fig. 4 correlates 1H^N(i), 15N(i) and 13CO(i-1) with 13CA(i-1) and 13CB(i-1) chemical shifts. In contrast to the established CBCANH, CBCA(CO)NH and HNCACB experiments (Grzesiek and Bax 1992a, 1992b; Wittekind and Müller 1993), the CA and CB evolutions are performed in separate dimensions. This allows the CA-CB coupling evolution delay to be increased to 0.5/J_CACB and therefore doubles the sensitivity (provided that relaxation is ignored), which in the case of IDPs compensates for the extended pulse sequence. Although this sequence does not provide sequential connectivities, it allows CA and CB chemical shifts to be assigned and amino acid residue types to be identified by comparing the respective chemical shifts with typical values for each amino acid.
The 5D (H)NCO(NCA)CONH experiment is depicted schematically in Fig. 5a, together with the coherence transfer pathway in the protein backbone (Fig. 5b). In this case, the magnetization originating from the amide proton is transferred through the amide nitrogen and carbonyl carbon nuclei back to nitrogen and on to two different CA nuclei, then via the respective CO nuclei to the corresponding coupled NH pairs. Here again, 13CO and 15N chemical shifts provide peak resolution in the 'fixed' dimensions F3 and F4, and sequential connectivities are established from 13CO and 15N in dimensions F1 and F2.

Figure caption (beginning not recovered): A four-step phase cycle was used: φ1 = x, -x; φ2 = 2x, 2(-x); Rec = φ1 + φ2. Simultaneous inversion of CA and CO spins was achieved using a 6-element composite pulse (Shaka 1985). The coherence selection gradients (marked by xyz) were applied at the magic angle. The phase ψ was inverted simultaneously with the last gradient pulse. (b) Coherence transfer in the peptide chain. H^N, N, and CO frequencies (filled rectangles) are 'fixed' for Fourier transform. Frames for N and CO indicate the dimensions of the 2D cross-sections obtained by the SMFT procedure. (c) 2D spectral planes for the δ subunit of B. subtilis RNA polymerase, obtained by the SMFT procedure performed on the 5D randomly sampled signal (Poisson disk sampling) with 'fixed' frequencies taken from the 3D HNCO peak list. Each cross-section contains two cross-peaks: for 'fixed' H^N(i), N(i) and CO(i-1), the peaks correspond to N(i)-CO(i-1) and N(i-1)-CO(i-2) correlations.

The 6D (H)NCO(N)CACONH pulse sequence is shown in Fig. 6. This sequence was obtained from the aforementioned 5D variant by introducing constant-time evolution of CA frequencies, i.e. with no increase in overall sequence duration. The extra resolution gained from the increased dimensionality may be crucial for IDP spectra, which feature high chemical shift degeneracy. An example is given in Fig. 7, where the additional dimension resolves peaks that still overlap in the 5D spectra. In this case, however, application of the SMFT procedure requires knowledge of the CA(i-1) chemical shifts. These shifts can be obtained using 5D HNCOCACB, 5D HabCabCONH, or 4D HNCOCA (Zawadzka- ...) experiments. The experiments shown in Figs. 5 and 6 are conceptually similar to the (HACA)CON(CA)CONH experiment shown in Fig. 3, but the (HACA) fragment is replaced by (H)N. This modification enables the use of the band-selective excitation short-transient (BEST) approach, which accelerates acquisition (Schanda et al. 2006; Lescop et al. 2007), but it does not allow the resonances of proline residues to be found. Notably, the (H)NCO(NCA)CONH sequence has a sensitivity advantage over the (HACA)CON(CA)CONH sequence, as the CA → CO coherence transfer in the latter is attenuated by the concurrent 1J_CACB coupling. The respective amplitude transfer functions at this point (i.e. before the first CO evolution period) are given in Eqs. (1) and (2), where n is the number of HA protons (for Gly: n = 2).
We have set Δ_CACO = 6.8 ms as a compromise between J-coupling evolution and relaxation. The same choice was made by Mäntylahti et al. (2011) for other HA-excited experiments and is commonly employed in HSQC-type HN(CA)CO experiments (for references see Sattler et al. 1999). Assuming T2(HN) = 50 ms, T2(N) = 50 ms, T2(HA) = T2(CA) = 20 ms, n = 1, and delay times as given in the figure captions, we obtain 0.50 and 0.36 for I(HN) and I(HACA), respectively. Setting Δ_CACO to 28.5 ms, which is close to 1/J_CACB, with evolution of J_CACO extended to 9.1 ms, further reduces I(HACA) to 0.18. Using the relaxation times T2(HN) = 80 ms, T2(N) = 100 ms, T2(HA) = 40 ms and T2(CA) = 50 ms, which seem likely for IDPs (based on our experience), and Δ_CACO = 6.8 ms, one obtains I(HN) = 0.68 and I(HACA) = 0.49, whereas for Δ_CACO of 28.5 ms I(HACA) = 0.47. The latter option therefore seems impractical, especially since the relaxation rates and coupling constants may not be uniform across the entire molecule. Despite the long duration of the proposed pulse sequences and the high sparsity of the sampling schedules employed, we found all expected peaks for the disordered fragment of the B. subtilis RNA polymerase δ subunit. We did not find any false peaks, i.e. all resonances found were unambiguously assigned in a sequential manner.
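The trade-off between the two choices of the CA-CO transfer delay can be illustrated with a minimal Python sketch of a single transfer step. It does not reproduce the paper's transfer functions (1) and (2). It uses the generic product of an active-coupling sine term, a passive-coupling cosine term and an exponential relaxation term. The function name is hypothetical, and the coupling constants J_CACO = 55 Hz and J_CACB = 35 Hz are assumed typical values, chosen because 1/(2 × 55 Hz) ≈ 9.1 ms and 1/(35 Hz) ≈ 28.6 ms agree with the delays quoted above.

```python
import math


def transfer_factor(t_active_s, t_total_s, t2_s,
                    j_active_hz=55.0, j_passive_hz=35.0):
    """Generic amplitude factor for one CA -> CO transfer step.

    t_active_s : time during which the active coupling (J_CACO) evolves
    t_total_s  : total duration of the period; the passive coupling
                 (J_CACB) and transverse relaxation act over all of it
    t2_s       : transverse relaxation time of CA
    Assumption: textbook product form, not equations (1) and (2) of the paper.
    """
    active = math.sin(math.pi * j_active_hz * t_active_s)
    passive = math.cos(math.pi * j_passive_hz * t_total_s)
    relaxation = math.exp(-t_total_s / t2_s)
    return active * passive * relaxation


for t2_ca in (0.020, 0.050):
    short = abs(transfer_factor(0.0068, 0.0068, t2_ca))
    long_ = abs(transfer_factor(0.0091, 0.0285, t2_ca))
    print(f"T2(CA) = {t2_ca * 1e3:.0f} ms: long/short ratio = {long_ / short:.2f}")
```

Under these assumptions the longer delay roughly halves the transferred amplitude for T2(CA) = 20 ms and leaves it almost unchanged for T2(CA) = 50 ms, which is in line with the reported change of I(HACA) from 0.36 to 0.18 and from 0.49 to 0.47.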
In Fig. 8, the non-specificity of aliphatic 1H and 13C chemical shifts is demonstrated using the E168-E171 correlations for the δ subunit of RNA polymerase from B. subtilis. In the repeated glutamic acid fragment, the aliphatic chemical shifts do not differ sufficiently for sequential assignment, whereas the amide nitrogen and carbonyl carbon chemical shifts enable unambiguous assignment.
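The way sequential connectivities follow from the N/CO cross-sections can be sketched in a few lines of Python. This is a minimal illustration, not the authors' assignment procedure. Each cross-section is assumed to be reduced to its own peak, N(i)-CO(i-1), and its second peak, N(i-1)-CO(i-2). The function name, data layout, tolerances and chemical shift values are all hypothetical.

```python
def link_predecessors(cross_sections, tol_n=0.05, tol_co=0.05):
    """Map each cross-section to the one belonging to the preceding residue.

    cross_sections: dict name -> {"own": (N_i, CO_i-1), "prev": (N_i-1, CO_i-2)},
    shifts in ppm. A link is made only when exactly one other cross-section
    has an "own" peak matching the "prev" peak within the tolerances.
    """
    links = {}
    for name, section in cross_sections.items():
        n_prev, co_prev = section["prev"]
        matches = [
            other
            for other, candidate in cross_sections.items()
            if other != name
            and abs(candidate["own"][0] - n_prev) <= tol_n
            and abs(candidate["own"][1] - co_prev) <= tol_co
        ]
        # Ambiguous or missing matches are left unassigned.
        links[name] = matches[0] if len(matches) == 1 else None
    return links


# Hypothetical shifts (ppm), for illustration only.
sections = {
    "A": {"own": (120.1, 176.3), "prev": (122.4, 175.8)},
    "B": {"own": (122.4, 175.8), "prev": (118.9, 174.2)},
    "C": {"own": (118.9, 174.2), "prev": (125.0, 177.0)},
}
print(link_predecessors(sections))
```

Requiring a unique match in both N and CO is what makes the well-dispersed 15N and 13CO shifts useful here: a pair of coordinates rarely coincides for two residues even when each shift taken alone is degenerate.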

Conclusions
Random sampling and SMFT processing allow the development of novel NMR experiments of high dimensionality and high resolution that would not be feasible using conventional sampling. The new experiments enable simple and unambiguous backbone assignment of IDPs. Importantly, not all of the presented techniques must be used to obtain a complete sequential assignment. One can combine various experiments (including previously published ones) to construct an optimal set for a given protein. The use of the techniques presented in this paper provides sequential connectivities via 13CO and 15N chemical shifts and enables more straightforward sequential assignment than the 5D experiments published previously. Moreover, the separation of individual spin systems on 2D cross-sections could be very useful for a possible automatic assignment algorithm, which would allow fast and simple resonance assignment, also for large IDPs.

Figure caption (beginning not recovered): A four-step phase cycle was used: φ1 = x, -x; φ2 = 2x, 2(-x); Rec = φ1 + φ2. Simultaneous inversion of CA and CO spins was achieved using a 6-element composite pulse (Shaka 1985). The coherence selection gradients (marked by xyz) were applied at the magic angle. The phase ψ was inverted simultaneously with the last gradient pulse. (b) Coherence transfer in the peptide chain. H^N, N, CO and CA frequencies (filled rectangles) are 'fixed' for Fourier transform. Frames for N and CO indicate the dimensions of the 2D cross-sections obtained by the SMFT procedure. (c) 2D spectral planes for the δ subunit of B. subtilis RNA polymerase, obtained by the SMFT procedure performed on the 6D randomly sampled signal (Poisson disk sampling) with 'fixed' frequencies taken from the 4D HNCOCA peak list. Each plane contains two cross-peaks: for 'fixed' H^N(i), N(i), CO(i-1) and CA(i-1), the peaks correspond to N(i)-CO(i-1) and N(i-1)-CO(i-2) correlations.