Amino-acid selective isotope labeling enables simultaneous overlapping signal decomposition and information extraction from NMR spectra

Signal overlap is a major bottleneck for protein NMR analysis. We propose a new method, stable-isotope-assisted parameter extraction (SiPex), to resolve overlapping signals by a combination of amino-acid selective isotope labeling (AASIL) and tensor decomposition. The basic idea of SiPex is that overlapping signals can be decomposed with the help of intensity patterns derived from quantitative fractional AASIL, which also provides amino-acid information. In SiPex, spectra for protein characterization, such as 15N relaxation measurements, are assembled with those for amino-acid information to form a four-order tensor, where the intensity patterns from AASIL contribute to high decomposition performance even if the signals share similar chemical shift values or characterization profiles, such as relaxation curves. The loading vectors of each decomposed component, corresponding to an amide group, represent both the amino-acid and relaxation information. This information link provides an alternative protein analysis method that does not require “assignments” in the general sense, i.e., chemical shift determinations, since the amino-acid information for some of the residues allows unambiguous assignment according to the dual selective labeling. SiPex can also decompose signals in time-domain raw data without Fourier transform, even in non-uniformly sampled data without spectral reconstruction. These features of SiPex should expand biological NMR applications by overcoming their overlapping and assignment problems. Electronic supplementary material: The online version of this article (10.1007/s10858-019-00295-9) contains supplementary material, which is available to authorized users.

assuming that the number of components is 3, the decoded amino acid pairs of components 2 and 3, (N)Q and (P)E, did not appear in the sequence. The negative elements of the loading vector along the 1H dimension of component 1 also implied overfitting (Fig. S2g). Given that the decomposition assuming two components did not indicate overfitting (Fig. S2b, f), we concluded that two components was the best assumption. As this ROI contains two overlapping signals, (L)E16 and (N)V26, the correct number of components is 2.
(Fig. S3), we rejected the assumption of six components. Since the decomposition assuming five components did not indicate a wrong assumption, we concluded that five components is the appropriate assumption. The correct number of components is 5.
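The two checks used above to reject an assumed number of components (a decoded amino-acid pair that is absent from the sequence, and negative elements in a loading vector) can be sketched in a few lines. This is only an illustration: the function names, the toy sequence, and the reading of a decoded pair such as "(N)Q" as residue Q directly preceded by residue N are assumptions made for this example, not the authors' code.

```python
def pair_in_sequence(sequence: str, preceding: str, current: str) -> bool:
    """Return True if `current` directly follows `preceding` somewhere in `sequence`.

    Assumption: a decoded pair written "(N)Q" means residue Q preceded by residue N.
    """
    return (preceding + current) in sequence


def has_negative_elements(loading, tol=0.0):
    """Return True if any element of a loading vector lies below -tol."""
    return any(value < -tol for value in loading)


# Hypothetical toy sequence (NOT the protein studied in the paper).
toy_sequence = "MKLEAANVGT"

# Pairs that occur in the toy sequence support the assumed number of components;
# a pair that never occurs hints at overfitting.
supported = pair_in_sequence(toy_sequence, "L", "E")
suspicious = not pair_in_sequence(toy_sequence, "N", "Q")
```

A small positive `tol` may be useful in practice so that noise-level negative values are not flagged; the appropriate threshold is a judgment call that this sketch leaves open.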
Note S2. Importance of selectively labeled samples for tensor decomposition.
The NMR spectra acquired with SiCode samples can be regarded as a four-order tensor in SiPex (Eq. 9), while those acquired with a uniformly labeled sample can be regarded as a three-order tensor in relaxation measurement by MUNIN (Eq. 2) (Korzhnev et al. 2001). The additional dimension, SiCode, not only provides amino-acid information but also facilitates the decomposition of overlapping signals. In this supplementary note, we discuss the improvements of the decomposition by the additional SiCode dimension.
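For orientation, the two models contrasted in this note can be written in generic polyadic form. The notation here is assumed for this sketch and may differ from Eqs. 2 and 9 of the main text: $R$ is the number of components, and $a_r$, $b_r$, $c_r$, and $d_r$ are the loading vectors of component $r$ along the 1H, 15N, relaxation, and SiCode dimensions, respectively.

```latex
% Three-order tensor (uniformly labeled sample: 1H x 15N x relaxation)
\[
  X_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
\]

% Four-order tensor (SiCode samples: 1H x 15N x relaxation x SiCode)
\[
  X_{ijkl} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}\, d_{lr}
\]
```

The only difference is the extra factor $d_{lr}$, which carries the labeling-dependent intensity pattern of each component.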
The two-dimensional spectra acquired with a uniformly 13C/15N-labeled sample were assembled to form the three-order tensor illustrated in Fig. S8a. Hereafter, we refer to the decomposition of this three-order tensor as the "MUNIN-equivalent". The relaxation measurements utilized for the values, which were in good agreement with those obtained from the conventional methods (Table 1).
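This section does not describe how relaxation values were obtained from the loading vectors along the relaxation dimension. As one simple possibility, assuming a mono-exponential decay and strictly positive loading values, a rate constant could be estimated with a log-linear least-squares fit. The function name, delays, and synthetic values below are hypothetical.

```python
import numpy as np


def fit_decay_rate(delays, loading):
    """Estimate R and I0 in loading ~ I0 * exp(-R * delay) by a log-linear fit.

    Assumption: all loading values are strictly positive (true for R1/R2-type
    decays, not necessarily for heteronuclear NOE data).
    """
    delays = np.asarray(delays, dtype=float)
    loading = np.asarray(loading, dtype=float)
    if np.any(loading <= 0):
        raise ValueError("log-linear fit needs strictly positive loading values")
    # Straight-line fit of log(intensity) against delay: slope = -R.
    slope, intercept = np.polyfit(delays, np.log(loading), 1)
    return -slope, np.exp(intercept)


# Hypothetical delays (s) and a noise-free synthetic decay with R = 1.5 s^-1.
demo_delays = np.array([0.01, 0.05, 0.10, 0.20, 0.40, 0.80])
demo_loading = 2.0 * np.exp(-1.5 * demo_delays)
demo_rate, demo_i0 = fit_decay_rate(demo_delays, demo_loading)
```

A log-linear fit weights noisy low-intensity points heavily, so a nonlinear fit may be preferable for real data.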
The assignment of the components to the residues is based only on the slight differences in the chemical shifts (Fig. S8b). Note that, in contrast, the additional amino-acid information greatly facilitates the assignments in SiPex (Figs. 2d, S1f).
The MUNIN-equivalent of the same ROI as in Fig. S7, containing the actual overlapping signals (L)E16 and (N)V26, was unsuccessful (Fig. S9a, b). The loading vectors of the two components along the 1H dimension had both positive and negative values, which is unlikely in phase-corrected NMR spectra. This failure is probably due to the similar loading vectors of the two components along the relaxation dimension (Fig. S2f). As noted in the main text, the other two dimensions are insufficient for a unique solution, due to rotational ambiguity (Orekhov et al. 2001).
In contrast, the decomposition by SiPex was successful, because there were still three remaining dimensions, 1H, 15N, and SiCode (Fig. S2f).
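The rotational ambiguity mentioned here can be illustrated with a small numerical sketch. When only two dimensions distinguish the components, the data reduce to a matrix product, and any invertible mixing matrix yields a different pair of loading matrices that reproduces the data equally well. The array sizes, random values, and the matrix `T` are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two components: columns of A are 1H loading vectors, columns of B are 15N loading vectors.
A = rng.random((8, 2))
B = rng.random((6, 2))
X = A @ B.T  # two-way data: the sum of two outer products

# Any invertible 2 x 2 matrix T produces a different factor pair with the same product.
T = np.array([[1.0, -0.5],
              [0.3, 1.0]])
A_alt = A @ T
B_alt = B @ np.linalg.inv(T).T

# The alternative factors reproduce the data although they differ from A and B.
same_data = np.allclose(X, A_alt @ B_alt.T)
different_factors = not np.allclose(A, A_alt)
```

With three or more dimensions that carry distinct loading vectors, such a mixing is in general no longer possible, which is consistent with the successful decomposition once the SiCode dimension is added.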
Some tensor decomposition software programs apply constraints, such as non-negativity and unimodality, to the loading vectors. The non-negativity constraint on the loading vectors on all dimensions prevents the appearance of negative signals, and as a consequence avoids the decomposition failure described above. Fig. S9c, d shows the successful decomposition of the same dataset as in Fig. S9a, b when non-negativity constraints were applied. In this case, the application of the non-negativity constraints is reasonable, because the NOE enhancements of these non-terminal residues of the globular protein are expected to be positive. However, in general, the non-negativity constraints on the relaxation dimension are not always applicable, since the signals on heteronuclear NOE spectra may be negative. The decomposition failed when the non-negativity constraints were applied only on the 1H and 15N dimensions (Fig. S9e, f). More flexible constraints, such as applying non-negativity on some elements of the loading vector, should help the decomposition in these cases.
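The idea of applying non-negativity to only some elements of a loading vector can be sketched as a masked projection step. This is not the algorithm of any particular program: the function name, the mask layout, and the example values are hypothetical, and a full solver would apply such a step inside its update loop.

```python
import numpy as np


def project_partial_nonneg(vector, nonneg_mask):
    """Clip to zero only the elements flagged in `nonneg_mask`; leave the rest free."""
    vector = np.asarray(vector, dtype=float)
    nonneg_mask = np.asarray(nonneg_mask, dtype=bool)
    return np.where(nonneg_mask, np.maximum(vector, 0.0), vector)


# Hypothetical loading vector along the relaxation dimension: five relaxation-delay
# points (constrained to be non-negative) followed by two heteronuclear NOE points
# (left unconstrained, since NOE signals may be negative).
loading = np.array([1.0, 0.6, -0.05, 0.2, 0.1, -0.3, 0.8])
mask = np.array([True, True, True, True, True, False, False])
projected = project_partial_nonneg(loading, mask)
```

Only the third element is clipped to zero here; the negative NOE element is left untouched.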
We have proposed a tensor decomposition program that is compatible with such flexible constraints (Ono and Kasai 2018).
We also performed the MUNIN-equivalent of the same ROI shown in Fig

Note S3. Tensor decomposition with a publicly available program.
In this paper, we used a simple, in-house ALS program to solve the PD problem (see Materials and Methods). The concept of SiPex, to extract information from the loading vectors as a result of PD, is independent of the method of tensor decomposition. Therefore, various other programs/algorithms that can solve PD are also applicable. For example, alternating optimization with primal-dual splitting (Ono and Kasai 2018), a faster algorithm compatible with various regularizations, may be more suitable.
One of the publicly available programs that can solve PD is 'N-way toolbox' (Andersson and Bro 2000). It solves PD using ALS implemented with various acceleration methods (Bro 1997). We confirmed that the same results as our in-house program were obtained with N-way toolbox (Fig. S14) with 'random orthogonalized values' as the initialization option. The computation times to decompose the datasets shown in Fig. S14a-e with 100 random initializations were 1.8, 1.3, 78, 59, and 15 seconds, respectively, using dual Xeon E5-2690 v4 CPUs. The corresponding computation times using the in-house program were 27, 26, 422, 399, and 38 seconds, respectively. Therefore, we recommend publicly available programs such as N-way toolbox for practical use.
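To make the procedure concrete, the following is a minimal sketch of unconstrained ALS for a polyadic decomposition with several random initializations, keeping the run with the smallest residual. It is neither the in-house program nor the N-way toolbox: it uses plain Gaussian random starts (not 'random orthogonalized values'), no acceleration, and a fixed iteration count. All function names and the synthetic four-order tensor are assumptions for this example.

```python
import numpy as np


def khatri_rao(mats):
    """Column-wise Kronecker product of matrices that share the same column count."""
    rank = mats[0].shape[1]
    out = mats[0]
    for m in mats[1:]:
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, rank)
    return out


def reconstruct(factors):
    """Rebuild the full tensor from its loading matrices (one per dimension)."""
    shape = tuple(f.shape[0] for f in factors)
    return khatri_rao(factors).sum(axis=1).reshape(shape)


def cp_als(X, rank, n_iter=300, rng=None):
    """Plain ALS: update one loading matrix at a time by linear least squares."""
    rng = np.random.default_rng() if rng is None else rng
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iter):
        for mode in range(X.ndim):
            others = [f for m, f in enumerate(factors) if m != mode]
            kr = khatri_rao(others)
            # Unfold X so that rows index `mode` and columns index all other modes.
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            factors[mode] = np.linalg.lstsq(kr, unfolded.T, rcond=None)[0].T
    rel_err = np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)
    return factors, rel_err


def best_of_restarts(X, rank, n_starts=10, seed=0):
    """Run ALS from several random starts and return the best (factors, error)."""
    rng = np.random.default_rng(seed)
    runs = [cp_als(X, rank, rng=rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])


# Synthetic, noise-free example with two overlapping components (hypothetical values).
h = np.arange(16)[:, None]
n = np.arange(12)[:, None]
t = np.linspace(0.0, 1.0, 6)[:, None]
A = np.exp(-0.5 * ((h - np.array([7.0, 9.0])) / 1.5) ** 2)   # 1H line shapes
B = np.exp(-0.5 * ((n - np.array([5.0, 6.0])) / 1.5) ** 2)   # 15N line shapes
C = np.exp(-t * np.array([1.0, 1.2]))                        # similar decay curves
D = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]])           # distinct labeling patterns
X_demo = reconstruct([A, B, C, D])

demo_factors, demo_error = best_of_restarts(X_demo, rank=2, n_starts=5)
```

The recovered loading vectors are determined only up to scaling and ordering of the components, so they need to be normalized before any comparison with the generating vectors.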

Note S4. Compensation of concentration differences between samples.
Since amino-acid discrimination by SiCode and SiPex depends on the signal intensity ratios of the samples, it is important to compensate for systematic errors between samples (named "intensity disturbances" in the previous report (

(a-c) Decomposition of the dataset shown in Figure 2, assuming that the number of components is (a) 1, (b) 2, and (c) 3. The simulated spectra with signal overlapping, the reconstructed spectra

Decompositions of the dataset shown in Fig. S7 assuming 1 to 3 components are presented as in Fig. S1.

Decompositions of the dataset shown in Fig. 3c assuming 4 to 6 components are presented as in Fig. S1.

Fig. S6. Decomposition of simulated overlapping signals with different intensity levels.
The same experiment as Fig. 2 but the ROI of (R)G75 was multiplied by 0.3 prior to the merging of

properties. On the right of the plots, the extracted information, the relaxation constants of R1 and R2 (in s⁻¹) and the NOE enhancements, is shown.

The 13C and 15N labeling ratios are indicated as percentages.