A simplified recipe for assigning amide NMR signals using combinatorial 14N amino acid inverse-labeling

  • Hidekazu Hiroaki
  • Yoshitaka Umetsu
  • Yo-ichi Nabeshima
  • Minako Hoshi
  • Daisuke Kohda
Article

DOI: 10.1007/s10969-011-9116-0

Cite this article as:
Hiroaki, H., Umetsu, Y., Nabeshima, Y. et al. J Struct Funct Genomics (2011) 12: 167. doi:10.1007/s10969-011-9116-0

Abstract

Assignment of backbone amide proton resonances is one of the most time-consuming stages of any protein NMR study when the protein samples behave non-ideally. A robust and convenient NMR procedure for analyzing spectra of marginal-to-low quality would aid high-throughput structure determination. The 14N selective- and inverse-labeling method is a candidate solution. Here, we present a simplified protocol for assigning protein backbone amide NMR signals. When 14N inversely labeled residues are present in a protein, their backbone NH cross peaks vanish from the protein’s 1H–15N HSQC spectrum, so their chemical shifts can be readily identified by a process of elimination. Some metabolically related amino acids, for example, Ile, Leu, and Val, cannot be inversely labeled individually but can be inversely labeled together. We optimized and simplified the protocol and the M9-based medium formula for the 14N selective- and inverse-labeling method, which requires no additives. Our approach should be cost-effective, because the method can be applied additively and stepwise, even after the proteins of interest have been found to behave non-ideally.

Keywords

Combinatorial inverse-labeling · Aβ(1–40) peptide · NMR sample preparation · Isotope labeling

Abbreviations

HSQC

Heteronuclear single quantum coherence spectroscopy

IPTG

Isopropyl β-D-1-thiogalactopyranoside

IL1β

Interleukin-1β

SOFAST-HMQC

Band-selective optimized flip-angle short-transient heteronuclear multiple quantum coherence spectroscopy

BEST

Band-selective excitation short-transient

TROSY

Transverse relaxation-optimized spectroscopy

Introduction

Developing a robust and cost-effective NMR method for assigning the backbone amide proton resonances of proteins with non-ideal properties is a challenge in structural genomics research. Assignment of the backbone resonances is usually one of the most time-consuming but indispensable stages of any NMR study. A 1H–15N HSQC spectrum (or a variant such as SOFAST-HMQC [1] or TROSY [2]) provides a fingerprint of the protein of interest and therefore serves as the reference spectrum for triple-resonance experiments. The information obtained from a 1H–15N HSQC spectrum is necessary not only for 3D structure determination but also for examining protein dynamics, H/D exchange, and protein–ligand interactions. For proteins of up to about 20 kDa, backbone assignment routinely starts with preparation of a uniformly 13C/15N-labeled, non-deuterated protein. Several pairs of 3D NMR experiments are then recorded with this sample, for example, HNCA/HN(CO)CA, HNCACB/CBCA(CO)NH, and HNCO/HN(CA)CO [3, 4, 5]. In each pair, one experiment provides intraresidue correlations, while the other provides interresidue (sequential) connectivities. Analysis of these data sets therefore leads to sequence-specific assignment of all main-chain signals along the polypeptide sequence.
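The pairwise logic can be illustrated with a minimal sketch, assuming each amide spin system has already been reduced to an intraresidue Cα shift (from HNCA) and a preceding-residue Cα shift (from HN(CO)CA). The `SpinSystem` class, the peak names, the shift values, and the 0.2 ppm tolerance are hypothetical choices for illustration, not part of any published program.

```python
"""Sketch: link HSQC spin systems sequentially by matching C-alpha shifts.

Assumes each spin system carries an intraresidue C-alpha shift (ca_i, e.g.
from HNCA) and a preceding-residue shift (ca_im1, e.g. from HN(CO)CA).
All names, shifts, and the tolerance are illustrative assumptions.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinSystem:
    name: str       # label of the amide cross peak, e.g. "p1"
    ca_i: float     # C-alpha shift of residue i (ppm)
    ca_im1: float   # C-alpha shift of residue i-1 (ppm)


def candidate_predecessors(target, systems, tol=0.2):
    """Spin systems whose intraresidue C-alpha matches target's i-1 shift."""
    return [s.name for s in systems
            if s is not target and abs(s.ca_i - target.ca_im1) <= tol]


systems = [
    SpinSystem("p1", 55.1, 62.3),
    SpinSystem("p2", 58.4, 55.0),
    SpinSystem("p3", 45.2, 58.5),
]
for s in systems:
    # p1 has no predecessor here; p1 precedes p2, and p2 precedes p3
    print(s.name, candidate_predecessors(s, systems))
```

When several spin systems share similar Cα shifts, a target returns more than one candidate; the Cβ and CO matches from the other experiment pairs are normally what break such ties, which is why losing those cross peaks makes the assignment ambiguous.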

Nevertheless, non-ideal properties of a protein sample often hamper analysis of its NMR data because key resonances are missing. Such undesired behaviors include limited solubility, nonspecific self-association, chemical exchange, and internal motion. These properties often result in low signal-to-noise ratios for certain cross peaks in 3D NMR experiments, for example, those arising from 13Cβ nuclei in HNCACB and CBCA(CO)NH. As a result, backbone assignment becomes difficult or even impossible because the connectivities between residues cannot be uniquely identified. Some non-ideal experimental conditions have similar effects. Examples of such conditions of physiological or biophysical interest include high viscosity, NMR analysis of a protein inside a living cell, the presence of lipid micelles, and high concentrations of denaturants or salts.

At present, there are several options for dealing with missing resonances: (a) use a SOFAST or BEST pulse scheme to gain signal intensity per unit experimental time [1, 6]; (b) use nonlinear, sparse sampling in the indirect dimensions of a 3D/4D NMR experiment to minimize measurement time and maximize S/N [7, 8, 9]; and (c) use protein samples that contain selectively 15N-labeled residues [10, 11, 12]. However, each of these solutions has drawbacks, particularly in terms of cost. For the third strategy, for example, the cost and effort of sample preparation are a major concern in projects such as structural genomics that require massive protein preparation and NMR data acquisition. The cost and effort required to introduce selectively labeled amino acids with cell-free translation systems have been reduced considerably, but they remain significant [12].

In this study, we describe a simple method for obtaining the information needed for residue-type assignment that uses “combinatorial inverse-labeling” of sets of specific amino acids, an idea originally proposed by Shortle [13]. In the original method, combinatorial inverse-labeling was carried out in MOPS minimal medium [14] modified so that glycerol was the sole carbon source. A matrix of four amino acid combinations redundantly covering 17 amino acids was designed and inversely labeled. The resulting five HSQC data sets (four inversely labeled and one uniformly labeled) were processed with a “labeling pattern matrix” to identify each amino acid type independently of other 3D experiments. One major drawback of this method is that it requires special processing software.
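The idea of a labeling pattern matrix can be sketched as a binary code: each amino acid is assigned the set of inversely labeled samples in which it is 14N, and the set of samples in which an amide peak disappears is matched against those codes. The four-sample matrix below is a hypothetical toy design covering only seven amino acids; it is not Shortle's published matrix, and the function names are illustrative.

```python
"""Sketch: decode residue types from a combinatorial inverse-labeling matrix.

The sample compositions below are a HYPOTHETICAL toy design, not the matrix
used in the original method.
"""

# Hypothetical design: which amino acids are 14N (unlabeled) in each sample.
SAMPLES = {
    "S1": {"A", "G", "K", "L"},
    "S2": {"A", "R", "V", "L"},
    "S3": {"G", "R", "F", "L"},
    "S4": {"K", "V", "F", "L"},
}
ORDER = list(SAMPLES)  # fixed sample order for the binary codes


def codes():
    """Binary membership code (one bit per sample) for each amino acid."""
    residues = sorted(set().union(*SAMPLES.values()))
    return {aa: tuple(aa in SAMPLES[s] for s in ORDER) for aa in residues}


def decode(vanished_in):
    """Amino acids whose code matches the samples where a peak vanished."""
    pattern = tuple(s in vanished_in for s in ORDER)
    return [aa for aa, code in codes().items() if code == pattern]


table = codes()
assert len(set(table.values())) == len(table), "codes must be unique"
print(decode({"S2", "S3"}))  # a peak missing only from S2 and S3 -> ['R']
```

A peak that disappears from no sample belongs to a residue type outside the design, and a pattern that matches no code points to peak overlap or metabolic scrambling rather than to a residue type.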

Here, we further optimized the method for current high-throughput structure determination. We prepared 15N uniformly labeled and 14N selectively and inversely labeled protein samples as recombinant proteins expressed in Escherichia coli BL21(DE3) grown in M9 minimal medium containing 15NH4Cl as the sole nitrogen source, with or without supplementation with unlabeled (14N) amino acids. In addition, we propose several new combinations of metabolically related amino acids, which we found to be useful.

Materials and methods

Protein techniques

The E. coli expression plasmid for IL1β, which encodes an added N-terminal methionine and places synthesis under control of the ptac promoter, was generously provided by Dr. Y. Kikumoto [15]. 15N-labeled recombinant IL1β and 14N selectively and inversely labeled IL1βs were expressed in E. coli BL21(DE3) transformed with the plasmid. Cells were grown at 30°C in 0.1 L of M9 minimal medium containing 15NH4Cl as the sole nitrogen source, with or without unlabeled amino acid(s). Protein production was induced by adding IPTG to a final concentration of 1 mM when the OD600 of the culture reached 0.4, and cells were harvested 4 h later. Details of the expression protocol are outlined in Table 1. Each cell-free extract was applied to an S-Sepharose Fast Flow column (GE Healthcare Biosciences), and IL1β was eluted with a NaCl gradient in 20 mM sodium phosphate (pH 6.5).
Table 1

The procedure for 14N selective- and inverse-labeling of a protein

1. Prepare 15N M9 medium containing glucose (4 g/L), 15NH4Cl (0.7 g/L), Na2HPO4·12H2O (15 g/L), KH2PO4 (3 g/L), NaCl (0.5 g/L), MgSO4 (0.24 g/L), CaCl2·H2O (15 mg/L), vitamins, nucleobases (each at 100 mg/L), and ampicillin (50 mg/L)

2. Inoculate 1 mL of LB medium with three to five colonies of E. coli BL21(DE3) and grow the culture overnight at 30–37°C

3. Add 1 mL of the overnight culture to 50 mL of 15N M9 medium and incubate at 30–37°C until the OD600 reaches 0.8–1.0 (pre-culture)

4. Add the pre-culture to 450 mL of 15N M9 medium and incubate at 30–37°C until the OD600 reaches 0.2–0.4 (main culture)

5. Add each 14N amino acid to be inversely labeled to a final concentration of 100 mg/L* and incubate the culture at 30–37°C for an additional 30–60 min

6. Induce protein expression by adding IPTG to a final concentration of 0.5–1.0 mM

7. Incubate the culture at 30–37°C for an additional 2–4 h

8. Harvest the cells

* A 20 mg/mL stock solution (~pH 10) is recommended for Tyr and Phe; other amino acids can be added as powders
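The quantities in Table 1 scale linearly with culture volume, so they are easy to tabulate for a given preparation. The sketch below assumes a total of 0.5 L of medium (the 50 mL pre-culture plus the 450 mL main culture of steps 3 and 4) and simply multiplies the Table 1 concentrations; the function names are hypothetical, and the two phosphate salts are labeled generically rather than by formula.

```python
"""Sketch: scale the Table 1 recipe to a chosen culture volume.

Concentrations are copied from Table 1; the function names and the stock
volume calculation for Tyr/Phe are illustrative, not part of the protocol.
"""

# Component concentrations from Table 1, step 1 (grams per litre).
M9_G_PER_L = {
    "glucose": 4.0,
    "15NH4Cl": 0.7,
    "sodium phosphate (Table 1)": 15.0,
    "potassium phosphate (Table 1)": 3.0,
    "NaCl": 0.5,
    "MgSO4": 0.24,
    "CaCl2 hydrate": 0.015,
    "ampicillin": 0.05,
}
AA_MG_PER_L = 100.0      # each 14N amino acid, Table 1 step 5
STOCK_MG_PER_ML = 20.0   # Tyr/Phe stock (~pH 10), Table 1 footnote


def medium_masses(volume_l):
    """Grams of each M9 component for volume_l litres of medium."""
    return {name: conc * volume_l for name, conc in M9_G_PER_L.items()}


def amino_acid_additions(volume_l, residues):
    """Powder mass, or stock volume for Tyr/Phe, of each 14N amino acid."""
    plan = {}
    for aa in residues:
        mg = AA_MG_PER_L * volume_l
        if aa in ("Tyr", "Phe"):
            plan[aa] = f"{mg / STOCK_MG_PER_ML:.2f} mL of 20 mg/mL stock"
        else:
            plan[aa] = f"{mg:.0f} mg powder"
    return plan


print(medium_masses(0.5)["15NH4Cl"])             # grams of 15NH4Cl for 0.5 L
print(amino_acid_additions(0.5, ["Lys", "Tyr"]))
```

For the 0.5 L scale this gives 0.35 g of 15NH4Cl, 50 mg of Lys powder, and 2.5 mL of the Tyr stock; the small volume added with the stock is ignored here.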

The E. coli expression plasmid for human Aβ(1–40) peptide, which encodes an N-terminal maltose-binding protein tag and a linker containing a tobacco etch virus (TEV) protease cleavage site followed by Aβ(1–40), was generously provided by Dr. D. Hamada. 15N-labeled recombinant Aβ(1–40) and 14N selectively and inversely labeled Aβ(1–40)s were expressed in E. coli BL21(DE3) transformed with the plasmid. Cells were grown at 37°C in 0.1 L of M9 minimal medium containing 15NH4Cl as the sole nitrogen source, with or without unlabeled amino acid(s). Protein production was induced by adding IPTG to a final concentration of 1 mM when the OD600 of the culture reached 0.45. The 14N amino acids were added 45 min before IPTG induction. Cells were harvested after 18 h at 20°C to prevent aggregation of Aβ(1–40). Each cell-free extract was applied to an amylose resin column (New England Biolabs), and the fusion protein was eluted with maltose. The fusion protein was digested with TEV protease (New England Biolabs), and the released peptide was purified by reversed-phase HPLC with a linear acetonitrile gradient containing dilute ammonium hydroxide.

NMR spectroscopy

NMR samples contained approximately 0.5 mM fully 13C/15N-labeled protein or 0.1 mM inversely labeled protein in 5% D2O–95% H2O and 20 mM sodium phosphate (pH 5.5). For backbone assignment, HNCA, HN(CO)CA, HNCACB, CBCA(CO)NH, HNCO, HN(CA)CO, and 3D 15N-edited NOESY-HSQC spectra of the fully 13C/15N-labeled proteins were first recorded at 25°C on a 600 MHz Bruker DMX600 or a 600 MHz Bruker AVANCE-III spectrometer [1]. Then, 1H–15N HSQC or 1H–15N SOFAST-HMQC spectra of several inversely labeled proteins were recorded for residue-type assignment, as required by the progress of the backbone assignment. Data were processed with NMRPipe [16]. IL1β backbone assignments were taken from the literature [17]. Aβ(1–40) backbone assignments were taken from the literature [18] and confirmed by this method.

Results and discussion

To obtain the information needed for backbone signal assignment from data sets derived from low-quality 3D NMR spectra, we developed an assignment strategy based on “combinatorial inverse-labeling.” Following this strategy, we prepared 15N uniformly labeled and 14N selectively and inversely labeled IL1βs, expressed in E. coli BL21(DE3) grown in M9 minimal medium containing 15NH4Cl as the sole nitrogen source, with or without supplementation with specific 14N amino acids at 100 mg/L each. The 14N amino acids were added to the cultures approximately 30–60 min before IPTG induction of protein expression. Because the standard E. coli BL21(DE3) strain served as a suitable host, no special auxotrophic host strains were needed. The simplified fermentation protocol is summarized in Table 1.

We carefully selected the combinations of amino acids to be inversely labeled. Specifically, these amino acids are either the end products of isolated metabolic pathways or are synthesized through metabolically related pathways. 15N uniformly labeled IL1β could be expressed in E. coli BL21(DE3), which produces large amounts of recombinant protein in minimal medium [19]; the protein was expressed from plasmid DNA under control of the ptac promoter. We next examined expression of the 14N inversely labeled proteins. We chose three metabolically isolated amino acids (Arg, Lys, and Ala) and four combinations of amino acids (Phe/Tyr, Ile/Val/Leu, His/Trp, and Gly/Ser/Cys) for labeling IL1β. Note that the biosynthetic pathways for His and Trp are distinct from each other, so a protein need not be labeled with both amino acids at once. Nevertheless, we labeled these two amino acids together in this study because His and Trp are usually rare in proteins. Each residue type could be identified by the disappearance or weakening of its cross peak relative to the reference 1H–15N HSQC spectrum (Fig. 1). The intensities of the targeted cross peaks decreased by more than 95% in all spectra, whereas other cross peak intensities were unaffected (Table 2). The incorporation ratio of the 14N amino acids for each labeling scheme was quantified (Supplementary Table 1). Thus, our method can supply a substantial part of the missing information, namely the residue type associated with each amide cross peak, even when many 13Cβ chemical shifts are absent from the HNCACB and CBCA(CO)NH spectra.
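The process of elimination can be sketched as a comparison of peak heights between the reference spectrum and each inversely labeled spectrum. This minimal sketch assumes peaks have already been picked and matched by label in both spectra. Because the inversely labeled samples are more dilute, the ratios are normalized by their median (assuming most peaks are unaffected), and a peak is flagged when its normalized ratio falls below an assumed cutoff of 0.1, looser than the >95% loss observed for the targeted IL1β peaks. Function names and example heights are hypothetical.

```python
"""Sketch: flag amide cross peaks that vanish in inversely labeled spectra.

Peak heights are keyed by cross-peak label; the median normalization and the
0.1 cutoff are illustrative assumptions, not values from the protocol.
"""
from statistics import median


def eliminated_peaks(reference, labeled, cutoff=0.1):
    """Labels whose median-normalized height ratio (labeled/reference) < cutoff."""
    ratios = {p: labeled.get(p, 0.0) / h for p, h in reference.items() if h > 0}
    scale = median(ratios.values())  # most peaks are assumed unaffected
    if scale <= 0:
        raise ValueError("most reference peaks are missing from the labeled spectrum")
    return sorted(p for p, r in ratios.items() if r / scale < cutoff)


def residue_types(reference, samples, cutoff=0.1):
    """Map each peak to the labeled group(s) in whose spectrum it vanished."""
    types = {}
    for group, spectrum in samples.items():
        for p in eliminated_peaks(reference, spectrum, cutoff):
            types.setdefault(p, []).append(group)
    return types


reference = {"p1": 100.0, "p2": 80.0, "p3": 120.0, "p4": 60.0, "p5": 90.0}
arg = {"p1": 20.0, "p2": 16.0, "p3": 0.5, "p4": 12.0, "p5": 18.0}
lys = {"p1": 1.0, "p2": 40.0, "p3": 60.0, "p4": 30.0, "p5": 45.0}
print(residue_types(reference, {"Arg": arg, "Lys": lys}))
```

Because the labeled groups used here are disjoint, a peak flagged in more than one sample would point to overlap or metabolic scrambling and deserves manual inspection, much like the overlapping peaks counted in Table 2.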
Fig. 1

Expanded regions of the 1H–15N HSQC spectra of 14N selectively and inversely labeled IL1βs. The inversely labeled proteins, named according to the 14N-labeled residues they contain, are a Lys, b Arg, c Ala, d (Phe/Tyr), e (Ile/Val/Leu), f (His/Trp), and g (Gly/Ser/Cys). The control spectrum of fully labeled IL1β is shown in h. The full spectral range of (h), with assignments, is shown in Supplementary Fig. 1. The eliminated cross peaks are marked by crosses and residue names

Table 2

Cross peak identification in the 1H–15N HSQC of IL1β inversely labeled with the selected 14N amino acid(s)

Amino acid(s)   Expected cross peaks   Eliminated cross peaks   Overlapping cross peaks(a)
K               15                     14                       1
R               3                      3                        0
A               4                      3                        1
I/V/L           5/11/15                31                       0
H/W             1/1                    2                        0
G/S/C           8/14/2                 23                       1
F/Y             9/4                    9(b)                     0

(a) The positions of these peaks overlap with other peaks; it was not possible to identify them as eliminated peaks by manual inspection

(b) Four of these peaks were barely visible but could be identified as phenylalanine or tyrosine cross peaks

We also applied our method to de novo assignment of several protein NMR spectra, for example, those of the PX domain from human p47phox NADPH oxidase (PX-p47phox, residues 1–128, 130 total residues) [20], the DNA-binding domain of the Drosophila melanogaster gcm transcription factor (gcm-DBD, 158 residues) [21, 22], the TIR domain from MyD88 (residues 148–296, 149 residues) [23, 24], and the second LOV domain (LOV2) from Arabidopsis thaliana phototropin 2 (LOV2phot2, residues 363–500, 140 total residues; manuscript in preparation). In all cases, the 3D NMR data sets analyzed were recorded only with fully 13C/15N-labeled protein samples at 0.3–0.6 mM; perdeuterated samples were not needed. For PX-p47phox, all observable amide signals were assigned when data from samples selectively and inversely labeled with arginine or lysine (designated Arg and Lys, respectively) were used to resolve ambiguities. For gcm-DBD, more than 95% of the backbone signals were unambiguously assigned by incorporating the information from the 1H–15N HSQC spectra of the Arg and Lys samples into an iterative computer-assisted assignment followed by manual examination. These assignments were further confirmed with the 1H–15N HSQC spectra of the (Phe/Tyr) and (His/Trp) samples. Assignment of gcm-DBD was our most difficult test case because many of its 3D NMR spectra had poor signal-to-noise ratios: only 60% of the intraresidue CO, 30% of the interresidue Cβ, and 10% of the intraresidue Cβ cross peaks were observed in the HN(CA)CO, CBCA(CO)NH, and HNCACB spectra, respectively (Fig. 2). For LOV2phot2, more than 95% of the backbone signals were unambiguously assigned when the 1H–15N HSQC data of the (Ile/Val/Leu) and (Phe/Tyr) samples were included in the assignment procedure.
Note that these examples included not only the T7 promoter-driven, pET-based expression system originally described by Shortle [13] but also a tac promoter-based pGEX GST-fusion expression system. We further confirmed expression of inversely labeled samples with a pMAL MBP-fusion system. We found no essential difference among these expression systems.
Fig. 2

An example of backbone assignment of gcm-DBD using inversely labeled protein. Expanded regions of the 1H–15N HSQC spectra of 14N-Arg inversely labeled and fully labeled (reference) gcm-DBD are shown in a and b, respectively. The eliminated signals of R111 and R116 are indicated by open squares. Selected strips of c HNCA and d overlaid HNCO (black)/HN(CA)CO (red) spectra for residues N110 to E118 are shown. Sequential connectivities between strips are indicated by arrows

One possible problem with this combinatorial labeling method is ambiguity of assignment when an eliminated cross peak overlaps another cross peak of any residue type. Usually, resonances can be classified by residue type without ambiguity by comparing the 1H–15N HSQC spectrum of the inversely labeled sample with the reference spectrum. If few backbone resonances remain unassigned, visual examination of the two spectra should suffice. However, it is often difficult even to begin backbone assignment because many of the necessary inter- and intraresidue cross peaks, especially those involving Cβ and CO nuclei, are missing from the data set. These cross peaks are needed to establish connectivities between successive residues, yet the correlations between backbone amides and the intraresidue Cβ and CO nuclei tend to vanish, because for non-ideal samples HNCACB and HN(CA)CO are markedly less sensitive than CBCA(CO)NH and HNCO, respectively. In such cases, the difficulty may be resolved by using an automated assignment program (e.g., AUTOASSIGN [25], MAPPER [26], MARS [27], KUJIRA [28], or Olivia [29]) together with the additional data from inverse-labeling experiments. For the work reported here, we used an in-house program based on a simple exhaustive search algorithm. The input to such programs usually consists of peak lists from six 3D NMR experiments: HNCA, HNCO, HN(CO)CA, HN(CA)CO, CBCA(CO)NH, and HNCACB. The residue-type information from the 1H–15N HSQC spectra of inversely labeled proteins is added to these input data. In our simplified method, elimination data from selected labeling combinations were added stepwise as the assignment progressed.
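A toy version of an exhaustive search shows how residue-type information removes ambiguity. The sketch below is not the authors' in-house program: it assumes each spin system has an intraresidue and a preceding-residue Cα shift plus an optional set of allowed residue types from the inversely labeled spectra, enumerates every ordering of spin systems along a short fragment, and keeps the orderings with the most Cα matches. All names, shifts, and the 0.2 ppm tolerance are hypothetical.

```python
"""Sketch: brute-force sequential assignment with residue-type constraints.

Not the authors' program; names, shifts, and tolerance are hypothetical.
"""
from itertools import permutations


def exhaustive_assign(sequence, systems, tol=0.2):
    """Return (best score, all best-scoring orderings of spin systems).

    sequence: one-letter codes of a short fragment, e.g. "ERKA".
    systems: dict name -> (ca_i, ca_im1, allowed), where allowed is a set of
        one-letter codes from inverse labeling, or None if unknown.
    Score = number of consecutive pairs whose C-alpha shifts match within tol.
    """
    best, best_score = [], -1
    for perm in permutations(systems, len(sequence)):
        # Reject orderings that contradict the inverse-labeling data.
        if any(systems[name][2] is not None and aa not in systems[name][2]
               for name, aa in zip(perm, sequence)):
            continue
        score = sum(abs(systems[perm[k - 1]][0] - systems[perm[k]][1]) <= tol
                    for k in range(1, len(perm)))
        if score > best_score:
            best, best_score = [perm], score
        elif score == best_score:
            best.append(perm)
    return best_score, best


systems = {
    "sE": (56.3, 60.0, None),
    "sR": (56.2, 56.3, {"R"}),   # vanished in the hypothetical (Arg) sample
    "sK": (56.3, 56.2, {"K"}),   # vanished in the hypothetical (Lys) sample
    "sA": (52.5, 56.3, None),
}
no_types = {k: (ca, cam1, None) for k, (ca, cam1, _) in systems.items()}
print(exhaustive_assign("ERKA", no_types))  # Arg/Lys order is ambiguous
print(exhaustive_assign("ERKA", systems))   # residue types leave one answer
```

Because the number of orderings grows factorially, this brute-force form is only practical for short fragments; each added labeling combination shrinks the search further by rejecting orderings early.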
During the assignment procedure, cross peak data from the 3D 15N-edited NOESY spectrum should not be used as input, so that they remain available later to resolve assignment ambiguities. A semi-automatic program may generate many (e.g., 10–30) possible sequential assignments, but the correct one can often be identified from the 15N-edited NOESY connectivities. Finally, backbone assignments can be further confirmed by manually tracing the sequential NOE connectivities. This strategy should benefit the interpretation of spectra of non-ideal protein samples, because NOESY experiments usually have higher signal-to-noise ratios than HNCACB and HN(CA)CO experiments. Our assignment procedure using inversely labeled protein is summarized in Fig. 3.
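Keeping the NOESY data out of the search input lets it act as an independent filter afterwards. A minimal sketch, assuming candidate chains are lists of spin-system names in sequence order and that observed sequential amide–amide (dNN) NOEs are stored as unordered pairs; the function names and data are hypothetical.

```python
"""Sketch: rank candidate sequential assignments by supporting amide-amide NOEs.

Spin-system names and NOE pairs are hypothetical; in practice the pairs would
come from peak picking of the 15N-edited NOESY-HSQC spectrum.
"""


def noe_support(chain, noe_pairs):
    """Count consecutive spin systems in chain linked by an observed HN-HN NOE."""
    return sum(frozenset(pair) in noe_pairs for pair in zip(chain, chain[1:]))


def rank_by_noes(candidates, noe_pairs):
    """Return (support, chain) tuples, best-supported chain first."""
    scored = [(noe_support(c, noe_pairs), c) for c in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Two chains that C-alpha connectivity alone cannot distinguish.
candidates = [("sE", "sR", "sK", "sA"), ("sE", "sK", "sR", "sA")]
noes = {frozenset({"sE", "sR"}), frozenset({"sR", "sK"})}
for support, chain in rank_by_noes(candidates, noes):
    print(support, "-".join(chain))
```

Here the first chain is supported by two sequential NOEs and the second by only one, so the first would be kept for manual confirmation.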
Fig. 3

Scheme for the iterative procedure for assignment of backbone resonances using inversely labeled protein

Another limitation of this method concerns the time between addition of the 14N amino acids and harvesting of the bacteria. For example, some aggregation-prone proteins can be obtained in soluble form only when expressed at low temperature, which requires long culture times. To assess this problem, we prepared (Arg/Lys) and (Ile/Leu/Val) inversely labeled samples of the Aβ(1–40) peptide (40 residues; Fig. 4), a typical aggregation-prone protein. The samples were prepared by expressing a maltose-binding protein fusion of Aβ(1–40) at 20°C for 18 h from a pMAL-c2-based plasmid, followed by tag removal with TEV protease and purification by HPLC. Whereas one lysine and two arginine peaks were eliminated in the spectrum of the (Arg/Lys) sample, no inverse-labeling effect was observed in the spectrum of the (Ile/Leu/Val) sample because of metabolic scrambling. Therefore, long culture times, such as 18 h after IPTG induction, should be avoided.
Fig. 4

Expanded regions of the 14N selectively and inversely labeled Aβ(1–40)s’ 1H–15N SOFAST-HMQC spectra. The inversely labeled proteins, named according to the 14N-labeled residues that they contain, are a (Arg/Lys) and b (Ile/Val/Leu). The control spectrum, which is that of fully labeled Aβ(1–40), is shown in c. The inversely labeled cross peaks are marked by crosses and residue names

The merit of our simplified method compared with existing labeling methods (e.g., residue-specific 15N-labeling methods [30, 31] and a side chain/residue-specific 12C-inverse-labeling method in which the other carbons are 13C-labeled [32]) lies in preparing proteins in which specific, metabolically related amino acids are 14N inversely labeled in combination. Although our method requires two to seven additional protein samples with different 14N inversely labeled amino acids, each sample is needed only at a concentration below 0.1 mM, which suffices for a single 1H–15N HSQC experiment. Such samples can be prepared routinely and economically from a small-scale culture (0.2 L). In addition, inverse-labeling is achieved by expressing the target protein in E. coli BL21(DE3) in normal glucose-based M9 minimal medium; auxotrophic host strains, cell-free expression systems, and specialized glycerol-based MOPS medium are not required. Consequently, the culture conditions need not be re-optimized for each additional inversely labeled sample. We therefore found our method experimentally advantageous: it can be applied additively after the standard NMR assignment experiments with the fully labeled sample, even when some of the 3D NMR spectra prove insufficient to complete the assignment. Finally, the approach is not necessarily limited by sample viscosity (it is suitable for highly viscous samples) or by protein location (it is compatible with in-cell NMR in E. coli) [33].

Conclusion

Our simplified combinatorial 14N selective- and inverse-labeling method has considerable potential for NMR-based structural biology, because the strategy should be widely applicable to proteins with non-ideal properties at any stage of a research project. The information from the spectra of 14N selectively and inversely labeled proteins can be applied stepwise to compensate for an otherwise incomplete 3D NMR data set. Because the residue type of a cross peak is readily identified from its elimination in the 1H–15N HSQC spectrum, the method reduces the ambiguity often present in initial sequential assignment trials and can confirm tentative assignments derived from 3D NMR spectra of marginal quality.

Acknowledgments

We thank those who helped confirm the versatility of the method by applying it to their proteins: Dr. M. Shimizu (gcm-DBD), Mr. M. Itoh (LOV2phot2), and Dr. H. Tochio and Dr. H. Ohnishi (TIR-MyD88). We also thank Dr. D. Hamada for providing the Aβ expression system.

Supplementary material

Supplementary material 1 (PDF 24 kb)
Supplementary material 2 (PDF 194 kb)

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Hidekazu Hiroaki (1, 2, 3)
  • Yoshitaka Umetsu (1, 2)
  • Yo-ichi Nabeshima (4)
  • Minako Hoshi (4)
  • Daisuke Kohda (3, 5)
  1. Division of Structural Biology, Kobe University Graduate School of Medicine, Chuo, Kobe, Japan
  2. The Structural Biology Research Center and Division of Biological Science, Graduate School of Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan
  3. Department of Structural Biology, Biomolecular Engineering Research Institute, Suita, Osaka, Japan
  4. Department of Pathology and Tumor Biology, Kyoto University Graduate School of Medicine, Kyoto, Japan
  5. Medical Institute for Bioregulation, Kyushu University, Hakata, Fukuoka, Japan
