Analysis of nucleoside-binding proteins by ligand-specific elution from dye resin: application to Mycobacterium tuberculosis aldehyde dehydrogenases
- Cite this article as:
- Kim, CY., Webster, C., Roberts, J.K.M. et al. J Struct Funct Genomics (2009) 10: 291. doi:10.1007/s10969-009-9073-z
We show that Cibacron Blue F3GA dye resin chromatography can be used to identify ligands that specifically interact with proteins from Mycobacterium tuberculosis, and that the identification of these ligands can facilitate structure determination by enhancing the quality of crystals. Four native Mtb proteins of the aldehyde dehydrogenase (ALDH) family were previously shown to be specifically eluted from a Cibacron Blue F3GA dye resin with nucleosides. In this study we characterized the nucleoside-binding specificity of one of these ALDH isozymes (recombinant Mtb Rv0223c) and compared these biochemical results with co-crystallization experiments with different Rv0223c-nucleoside pairings. We found that the strongly interacting ligands (NAD and NADH) aided formation of high-quality crystals, permitting solution of the first Mtb ALDH (Rv0223c) structure. Other nucleoside ligands (AMP, FAD, adenosine, GTP and NADP) exhibited weaker binding to Rv0223c, and produced co-crystals diffracting to lower resolution. Difference electron density maps based on crystals of Rv0223c with various nucleoside ligands show that most share the binding site where the natural ligand NAD binds. From the high degree of similarity of sequence and structure compared to human mitochondrial ALDH-2 (BLAST Z-score = 53.5 and RMSD = 1.5 Å), Rv0223c appears to belong to the ALDH-2 class. An altered oligomerization domain in the Rv0223c structure seems to keep this protein as a monomer, whereas native human ALDH-2 is a multimer.
Keywords: Functional analysis; High efficiency in structural genomics; Improvement of crystal quality; Nucleoside binding proteins; Prioritization of targeting; Specificity of ligand binding
Abbreviations: NAD, nicotinamide adenine dinucleotide; NADP, nicotinamide adenine dinucleotide phosphate
Nucleosides and their derivatives in cellular metabolism are well recognized as energy carriers in metabolic transactions, essential chemical links for intracellular signals, and constituents of nucleic acids (DNA and RNA) [1, 2]. The metabolism of nucleosides is vital to a cell’s survival, and about half of all enzymes are nucleoside-dependent, representing one of the largest and most important classes of cellular proteins. Determining structure–function relationships of nucleoside-binding proteins is therefore a significant component of the structural genomics of proteins that are crucial to cell function. Two major challenges in such studies are the large number of target proteins and the problem of sorting proteins according to their nucleoside specificity.
Nucleoside ligand-binding to individual proteins typically occurs with high specificity, and is often sensitive to slight changes at the protein interaction site [4, 5, 6]. Elucidating specific interactions between ligands and proteins helps to derive functional insights for many proteins [7, 8], complementing bioinformatics approaches, and may provide the sole source of functional information for hypothetical proteins.
The aldehyde dehydrogenases (ALDH) comprise a large family of proteins which metabolize various endogenous and exogenous substrates [9, 10]. The human genome contains 19 putative ALDH genes and three pseudogenes. Many of them are regulated in response to oxidative stress and overexpressed in various tumors [11, 12]. ALDH enzymes have multiple catalytic and non-catalytic functions in ester hydrolysis, antioxidant properties, xenobiotic bioactivation and UV light absorption, and also play important roles in embryogenesis, development and neurotransmission [13, 14]. Mutations in these genes cause inborn errors in aldehyde metabolism, such as Sjögren-Larsson syndrome, Type II hyperprolinaemia and gamma-hydroxybutyric aciduria, and pyridoxine-dependent seizures. Human ALDH-2 is important as a nitroglycerin reductase and an activator of NADPH oxidases, and for its major function, the elimination of toxic aldehydes, which lead to lipid peroxidation, protein/enzyme dysfunction, structural damage and apoptosis in alcohol-related disorders such as alcoholic liver disease, heart disease and gastrointestinal cancer.
Table 1. Predicted ALDH family proteins in the M. tuberculosis genome
Protein name (probable) | Class
Aldehyde dehydrogenase (NAD+ dependent) | ALDH-1 or -2
Succinate-semialdehyde dehydrogenase gabD1 (NADP+ dependent) | ALDH-1 or -2
Methylmalonate-semialdehyde dehydrogenase mmsA |
Aldehyde dehydrogenase aldA (NAD+ dependent) | ALDH-1 or -2
Pyrroline-5-carboxylate dehydrogenase rocA |
Succinate-semialdehyde dehydrogenase gabD2 (NADP+ dependent) | ALDH-1 or -2
Piperideine-6-carboxylic acid dehydrogenase PCD (antiquitin) |
In a previous report, we used dye-ligand elution chromatography to screen for nucleoside-binding proteins in Mtb cell extracts and to analyze the specificity of nucleoside-protein interactions. That study identified 26 native Mtb proteins binding to Cibacron Blue resin that were specifically eluted with nucleosides. Four of these 26 proteins were members of the ALDH family, as shown in Table 1. The large number of ALDH proteins in the relatively small Mtb genome, and the many essential functions of ALDH proteins in human cells, suggest potentially critical roles for Mtb ALDHs in survival in the human host environment.
In this report, we purify one of these Mtb ALDHs (Rv0223c), characterize its nucleoside specificity, use ligands that interact with Rv0223c to improve its crystallization, present the first structure of an Mtb ALDH, and show its close structural similarity to human mitochondrial ALDH-2.
Materials and methods
Cloning and expression of putative Mtb ALDHs
Four Mtb ALDH genes (Rv0223c, Rv0458, Rv2858c, and Rv3293) were previously identified as interacting with nucleosides using dye-resin chromatography and ligand-specific elution. Each targeted ALDH gene was amplified by PCR from an M. tuberculosis H37Rv cosmid library as the template with Pfu proof-reading DNA polymerase (Stratagene), using the 5′ NdeI primer, 5′-AGATATACATATG + (N-terminal 21 bases of target sequence)-3′, and the 3′ BamHI primer, 5′-AATTCGGATCC + (C-terminal 23 bases of target sequence)-3′. The primers contain the NdeI (CATATG) and BamHI (GGATCC) recognition sites, respectively. The PCR amplicon was digested with NdeI and BamHI restriction endonucleases (New England BioLabs) and cleaned using a Qiaquick PCR spin column (Qiagen). The product was ligated into a modified pET-28 vector containing a C-terminal 6-His tag, in frame with the BamHI restriction site, using T4 DNA ligase (New England BioLabs), and transformed into BL21(DE3) (Novagen). The expressed proteins contained the C-terminal tag GSHHHHHH, where GS is encoded by the BamHI restriction site (GGATCC). A 3 ml BL21(DE3) cell culture was tested for expression of the heterologous protein by binding to a cobalt-chelated Talon superflow bead slurry (Clontech) and SDS–PAGE analysis.
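The relationship between the BamHI site and the C-terminal tag can be checked with a short translation sketch. The codon table below is deliberately partial, and the choice of CAC for the six histidine codons is an assumption made for illustration; the text does not give the vector's actual His-tag codons.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "GGA": "G",  # glycine
    "TCC": "S",  # serine
    "CAC": "H",  # histidine
    "CAT": "H",  # histidine
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the partial table above.

    Raises ValueError for a length that is not a multiple of three and
    KeyError for any codon missing from the partial table.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


BAMHI_SITE = "GGATCC"   # from the text
HIS6 = "CAC" * 6        # assumed codon choice, not stated in the text

tag_peptide = translate(BAMHI_SITE + HIS6)
print(tag_peptide)
```

Read in frame, GGA codes for glycine and TCC for serine, which is why the restriction site contributes the GS linker ahead of the six histidines.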
Cell culture was performed as described by Studier with some modifications. Transformed cells were inoculated into 3 ml seed culture media (1 mM MgSO4, 0.5% glucose, 17 amino acids at 100 μg/ml each (Na-Glu, Asp, Lys-HCl, Arg-HCl, His-HCl, Ala, Pro, Gly, Thr, Ser, Gln, Asn, Val, Leu, Ile, Phe, Trp), a metal mix (50 μM Fe, 20 μM Ca, 10 μM Mn, 10 μM Zn, and 2 μM each of Co, Cu, Ni, Mo, Se and B), 5 mM PO4, 5 mM Na, 2.5 mM K, 2.5 mM NH4 and 1.25 mM SO4), and grown overnight at 37°C. From the seed culture, 500 μl was inoculated into 500 ml auto-induction media containing 1 mM MgSO4, metal mix (same as seed culture), 0.5% glycerol, 0.5% glucose, 0.2% α-lactose, NPS (same as seed culture), and 35 μg/ml kanamycin. After the cells were grown at 37°C until the OD600 reached 0.5, growth was continued at 20°C for approximately 16 h until the OD600 reached approximately 15. The cells were harvested and stored at −80°C.
The cell pellet was lysed by sonication in 10 ml of buffer A (20 mM Tris–HCl, pH 8.0, and 100 mM NaCl) per gram of cells for 10 min in 30 s pulses at 10°C. The cell debris was removed by ultracentrifugation for 30 min at 38,000 rpm using a Ti 60 rotor (Beckman). The clear supernatant was filtered through a 0.45 μm pore membrane and loaded on a 5 ml Talon superflow affinity column equilibrated with buffer A. After washing with 30 ml buffer A and 20 ml buffer B (20 mM Tris–HCl, pH 8.0, 100 mM NaCl and 20 mM imidazole), the His-tagged Rv0223c (and the other ALDHs) was eluted from the cobalt affinity column using buffer C (20 mM Tris–HCl, pH 8.0, 100 mM NaCl and 300 mM imidazole). The eluted fraction was dialyzed against buffer D (20 mM Tris–HCl, pH 8.0, 100 mM NaCl and 10 mM β-mercaptoethanol) and purified by gel filtration on a Superdex-75 column (GE Healthcare Inc.) using buffer D for equilibration and elution. The peak fractions (monitored at OD280) were analyzed by SDS–PAGE, and the pooled protein fractions were concentrated using a Centricon Plus-20 (Millipore) to as much as 35 mg/ml, as measured by Bradford assay with IgG (Bio-Rad) as a standard. The purity of each protein was estimated to be higher than 95% based on densitometry of SDS–PAGE gels.
Screening of Rv0223c for interactions with multiple nucleosides and nucleotides
Recombinant proteins were evaluated for their ligand-binding properties using a modified affinity elution chromatography protocol. Individual proteins were diluted to 2 mg/ml in column buffer (CB, containing 50 mM potassium phosphate, pH 7.5, 1 mM MgCl2 and 2 mM DTT) and adsorbed to multiple small aliquots of F3GA resin (100 μg protein per 10 mg resin) in 2 ml spin-columns (Costar, Fisher Scientific). Binding proceeded for 1 h at 4°C with very gentle vortexing, followed by recovery of unbound protein (flow-through fraction) and washing of the resin (4 × 0.4 ml washes with CB); spin-columns were micro-centrifuged for 30 s at 10,000×g to recover fractions and change solutions. Individual spin-columns containing resin-bound proteins were then incubated (as for protein binding, above) with 50 μl of 1 mM test ligand in CB, and the elution fractions were recovered by centrifugation. Protein that remained bound to the resin was recovered by heating at 95°C for 5 min in 100 μl SDS sample buffer, followed by centrifugation (resin fraction). Aliquots of initial protein, spin-column flow-through and eluate fractions were diluted 1:1 with 2× SDS sample buffer, loaded in equal proportion (equivalent to 1 μg input protein) on 15% gels, and stained with silver.
Rv0223c protein–ligand crystallization and data collection
Crystallization experiments were carried out by the hanging-drop vapor diffusion method at room temperature (25°C) using 24-well plates. Recombinant Rv0223c was tested for the presence of bound nucleotides (see Supplementary Methods); the results indicated that the protein has at most 0.1:1 bound NAD or other nucleotide (see Supplementary Fig. 1). Each protein–ligand solution was prepared by mixing protein (0.68 mM in buffer D) with the corresponding ligand (20 mM in H2O) at a molar ratio of 1:2 protein:ligand. The mixtures were incubated at room temperature for 30 min prior to setting up crystallization experiments. The final concentration of protein in each protein–ligand mixture was between 0.60 and 0.63 mM. The ligands used were NAD, NADH, NADP, NADPH, adenosine, AMP, ADP, ATP, GTP, FAD, FMN and Cibacron Blue F3GA (free dye). Crystals were grown for 3 days at room temperature from drops consisting of 1 μl protein–ligand solution mixed with 1 μl of reservoir solution, equilibrated against a reservoir containing 0.1 M MES (pH 6.0) and 0.8 M ammonium sulfate. For some ligands (e.g. ATP), the effect of Mg++ on crystallization was tested, but it had no noticeable effect, so the crystallization experiments reported here were carried out without Mg++. Native and SeMet Rv0223c-NAD complex crystals were flash-cooled in liquid N2 with the addition of 10% glycerol to the crystallization buffer as cryoprotectant. Three-wavelength selenium multi-wavelength anomalous dispersion (MAD) data were collected at beamline 5.0.2 at the Advanced Light Source (ALS). A native data set at a resolution of 1.8 Å was collected at beamline 8.2.1 at the ALS. Both data sets were processed with the HKL2000 program suite.
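The mixing arithmetic for the protein–ligand solutions can be sketched as follows. The stock concentrations and the 1:2 molar ratio come from the text; the 100 μl protein volume is a hypothetical working volume chosen only for illustration, and the function names are not from the original.

```python
def ligand_volume_ul(protein_ul: float, protein_mM: float,
                     ligand_mM: float, ligand_per_protein: float) -> float:
    """Volume of ligand stock needed for a given ligand:protein molar ratio."""
    protein_nmol = protein_ul * protein_mM  # microlitres x mM = nmol
    return protein_nmol * ligand_per_protein / ligand_mM


def final_protein_mM(protein_ul: float, protein_mM: float, added_ul: float) -> float:
    """Protein concentration after dilution by the added ligand volume."""
    return protein_ul * protein_mM / (protein_ul + added_ul)


protein_ul = 100.0  # hypothetical working volume, not given in the text
ligand_ul = ligand_volume_ul(protein_ul, protein_mM=0.68,
                             ligand_mM=20.0, ligand_per_protein=2.0)
protein_final = final_protein_mM(protein_ul, 0.68, ligand_ul)

print(f"ligand stock to add: {ligand_ul:.1f} ul")
print(f"final protein concentration: {protein_final:.3f} mM")
```

For an exact 1:2 ratio this idealized calculation gives a final protein concentration of about 0.64 mM, slightly above the 0.60 to 0.63 mM reported, so the actual mixing volumes evidently differed a little from this idealized case.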
Structure determination and data analysis of Rv0223c protein–ligand complexes
Initial phasing was carried out with the program SOLVE using the MAD data set. The resulting experimental map was density modified and traced using the program RESOLVE. The protein model was further improved and built with the ARP/WARP package [32, 33] against the 1.8 Å native data. Manual model rebuilding was carried out with the program COOT. The final model of this complex is deposited in the Protein Data Bank (http://www.rcsb.org) as entry 3B4W, and has R/Rfree values of 0.18/0.20 at a resolution of 1.8 Å after refinement with the phenix.refine program from the PHENIX software package. Difference electron density maps were calculated for each Rv0223c-ligand complex by refining the structure of the Rv0223c protein (without ligands or solvent molecules) against the observed structure factor amplitudes for each complex. The resulting crystallographic phases were used to construct an (mFo − DFc)e^(iφc) difference map. The LigandFit algorithm for automated ligand-fitting in PHENIX was used to identify the location of the largest contiguous regions of high density in the difference map. In this algorithm the contour level for identification of contiguous regions of density is set to a level such that the largest region is approximately the size of the anticipated ligand. In this way, the location of this largest region gives an indication of the location of the ligand. The difference electron density for each complex is shown in the region of the NAD in the Rv0223c protein-NAD complex and is displayed with PyMOL. The overall structure comparison between Mtb Rv0223c and human mitochondrial ALDH-2 was performed with the DaliLite program.
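The contour-level idea described above can be illustrated on a toy grid. This is a minimal sketch of the general principle (pick the level at which the largest contiguous region is closest in size to the expected ligand), not the LigandFit implementation; the grid, the candidate levels, the 6-connectivity rule and the function names are all assumptions made for illustration.

```python
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def largest_region(density, level):
    """Return the largest 6-connected set of voxels with density >= level."""
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    seen, best = set(), set()
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if (x, y, z) in seen or density[x][y][z] < level:
                    continue
                # Breadth-first flood fill from this seed voxel.
                region, queue = {(x, y, z)}, deque([(x, y, z)])
                seen.add((x, y, z))
                while queue:
                    cx, cy, cz = queue.popleft()
                    for dx, dy, dz in NEIGHBOURS:
                        px, py, pz = cx + dx, cy + dy, cz + dz
                        if not (0 <= px < nx and 0 <= py < ny and 0 <= pz < nz):
                            continue
                        if (px, py, pz) in seen or density[px][py][pz] < level:
                            continue
                        seen.add((px, py, pz))
                        region.add((px, py, pz))
                        queue.append((px, py, pz))
                if len(region) > len(best):
                    best = region
    return best


def pick_contour_level(density, target_voxels, levels):
    """Choose the level whose largest region is closest in size to the target."""
    return min(levels,
               key=lambda lv: abs(len(largest_region(density, lv)) - target_voxels))


# Toy difference map: flat background, a 4-voxel "ligand" blob, one noise peak.
density = [[[0.5] * 6 for _ in range(6)] for _ in range(6)]
for z in range(1, 5):
    density[1][1][z] = 3.0
density[4][4][4] = 3.5

level = pick_contour_level(density, target_voxels=4, levels=[0.25, 1.0, 2.0, 3.2])
region = largest_region(density, level)
print(level, sorted(region))
```

At too low a level the whole grid merges into one region, and at too high a level only the isolated noise peak survives; the selected level is the one at which the largest region matches the expected ligand size.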
Sequence similarities of four Mtb ALDHs captured by dye-ligand chromatography to each other and to human ALDH proteins
Table 2. Homology analysis of Mtb ALDHs compared with other Mtb ALDHs or human ALDHs by the NCBI Blastp program; entries are given as e-value (seq. id/res)
Analysis of the specificity of ligand–protein interactions with recombinant Rv0223c using dye-ligand chromatography
We used a modified version of our dye-resin/ligand-elution procedure to examine the specificity of ligand binding of one of the four Mtb ALDHs (see Materials and Methods). In this ligand-specific elution screen, recombinant Rv0223c was adsorbed on small aliquots of F3GA resin in spin-columns and assayed for elution by twelve nucleotides and nucleosides using one ligand per column. By using purified recombinant protein and one ligand per column, we identify effects of individual ligands on the stability of the dye-Rv0223c protein complex. We expect that those ligands that cause the Rv0223c protein to elute from the resin are likely to bind specifically to the protein, although non-specific interactions could potentially also cause elution to occur.
The differences in the release of Rv0223c from the affinity resin by various ligands were substantial (Fig. 1), and the elution of the purified recombinant Mtb ALDH with ATP is consistent with the observations from experiments with crude cytosolic extracts (Fig. 1; Table 3 of our previous report). Comparing the methods used for recombinant proteins versus native proteins in cell extracts, we note that the method for cell extracts has some potential complications. As cell extracts are mixtures of many proteins of various abundances, binding and detection of minor proteins may be difficult. Moreover, some proteins retained on the column may be bound via other proteins in native complexes, rather than directly to the dye, while others may have increased or unchanged affinity for the dye when bound to ligands. Finally, the order of elution, especially when several related ligands are used, could potentially influence gel spot intensity and identification of some proteins due to depletion by earlier ligand(s). The cell-extract method is therefore only useful as a positive screen; any protein eluted by a specific ligand is a possible in vivo target of that ligand, whereas failure to detect a given protein is uninformative. In contrast, our analysis of recombinant proteins eliminates most of these variables, and allows the direct comparison of a protein’s affinity for an array of ligands at the dye-binding site.
Overall, Fig. 1 indicates that NAD, NADH, ATP, ADP, AMP and FAD all interact with Rv0223c protein, and that the highest affinity is for NADH and ADP. This pattern was identical in three replicate experiments. Further, it indicates that NADP, NADPH, adenosine, GTP, FMN, and NMN interact only weakly with Rv0223c at the dye-binding site; ligand binding elsewhere in the protein cannot be ruled out.
Crystallization and X-ray diffraction of Mtb Rv0223c with multiple ligands
Table 3. Data collection and refinement statistics. Unit cell parameters a, b, c (Å): 135.111, 135.111, 72.543. Other rows (values not recoverable): resolution limits (Å), no. of unique reflections, no. of protein atoms, no. of solvent atoms, no. of heterogen atoms, rmsd bond lengths (Å), rmsd bond angles (°).
Table 4. Summary of ligand elution from dye and co-crystallization of Rv0223c
The structure of the Rv0223c+NAD complex was determined by MAD phasing at a resolution of 1.8 Å (PDB ID 3B4W; Table 3). The structures of all the complexes, Rv0223c+ligand (NADH, AMP, FAD, adenosine, GTP or NADP) were isomorphic with this one, with less than 1% changes in cell parameters. Consequently, we were able to use the structure of the Rv0223c protein (without ligand) from the Rv0223c+NAD complex to calculate phases and a difference electron density map for each of these structures (after refinement using the observed structure factors for each structure) using the diffraction data indicated in Table 4.
Location of nucleoside ligands in the Rv0223c structures
The other ligand-Rv0223c protein complexes diffracted to 2.3–3.0 Å, and for GTP, NADP and AMP, the strongest difference electron density corresponding to a bound ligand was found at the NAD-binding site (Fig. 3). In the case of adenosine-Rv0223c co-crystals, the second-largest region of contiguous density was at the NAD-binding site, but the density was weak. In a final case (FAD), no clear peaks of difference density were found. The varied electron density of ligands binding to a common site in Rv0223c suggests that the conformation of the ligands at this site may vary and that there may be considerable flexibility in binding at this site. It seems possible that the ligands that are normally involved in the function of Rv0223c might bind in a more specific fashion than those that are binding adventitiously, but we cannot be certain, as we do not know the ligand specificity of Rv0223c necessary for its function.
These structural studies provide independent evidence for various protein–ligand interactions detected using dye-ligand chromatography (Fig. 1). The crystallization data together with structural and biochemical analyses show strong and specific interactions between NAD (or NADH) and Rv0223c, while weaker interactions were also detected between Rv0223c and several other purine nucleosides/nucleotides (adenosine, AMP, GTP, NADP and FAD).
Structural homology between His tagged recombinant Rv0223c and human mitochondrial ALDH
We noticed that Mtb Rv0223c is a monomer, whereas human mitochondrial ALDH is an octamer in crystal lattices [42, 45]. Both samples used for structural analysis were expressed in E. coli, but the Mtb Rv0223c protein has a His-tag at its C-terminus. In the human mitochondrial ALDH, the C-terminus of the protein is part of the oligomerization domain. It is possible that the His-tag of Rv0223c may interfere with oligomerization. However, gel filtration chromatography data with His-tagged and non-tagged Rv0223c proteins indicated that the His-tag does not influence the oligomerization state of the protein as reflected in the monomeric elution times of both proteins from the column (data not shown). It is more likely that the Rv0223c protein is monomeric for another reason, as the C-terminal residues of Mtb Rv0223c (Val486, Thr485, and Tyr484) are on the opposite side of the oligomerization domain from the side which is involved in monomer–monomer contacts based on the human ALDH-2 structure (PDB ID 1CE3). (Note that the His-tag is not visible in the electron density map and is therefore not shown in Fig. 4).
Sequence homology between Mtb ALDH Rv0223c and human mitochondrial ALDH
The partial amino acid sequences of Rv0223c and human mitochondrial ALDH were aligned based on secondary structure for sequence homology analysis, as shown in Fig. 5. The full length Mtb and human sequences are shown (487 and 518 amino acids, respectively), and were determined by NCBI Blastp to be 39% identical and 53% similar. The two ALDH signature sequences identified by Prosite , and including cysteine and glutamic acid active site residues, were conserved in both proteins and found at identical locations.
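Percent identity and similarity figures of this kind are computed column by column over an alignment. The sketch below shows the bookkeeping on a toy pair of aligned strings. The sequences are hypothetical (not Rv0223c or ALDH-2), the residue groups are a crude stand-in for a real substitution matrix, and skipping gap columns in the denominator is a choice made here rather than a statement of how Blastp reports its values.

```python
# Crude, assumed similarity groups; Blastp uses a substitution matrix instead.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def identity_and_similarity(seq_a: str, seq_b: str):
    """Return (percent identity, percent similarity) over non-gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap columns are skipped in this sketch
        aligned += 1
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / aligned, 100.0 * similar / aligned


# Hypothetical aligned fragments, for illustration only.
pct_id, pct_sim = identity_and_similarity("GQIC-EAGS", "GEICTDAGT")
print(f"identity {pct_id:.1f}%, similarity {pct_sim:.1f}%")
```

Similarity is always at least as large as identity, because every identical column also counts as similar.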
The dye-resin chromatography method generates data about the interaction of a protein with specific nucleoside ligand(s), and the identified ligand(s) may help stabilize the protein and improve the target protein’s crystallization (Fig. 2). Structural data obtained from such crystals may also show the interactions of the ligand with the protein. In conjunction with functional information derived from the identification of ligands, this may help elucidate the biochemical role of the protein. Additionally, since many drugs are analogues of nucleosides, proteins bound by nucleoside analogue drugs can be identified through this approach as a biomedical application (Fig. 5 of our previous report).
The native Mtb Rv0223c protein did not crystallize readily. As shown in Fig. 2, co-crystallization with NAD and NADH, which showed strong interaction by dye-ligand chromatography (Fig. 1), greatly improved the crystal quality and yielded structures of the complexes at resolutions of 1.8–2.0 Å (Fig. 3; Table 3). In contrast, NADP, adenosine, GTP, AMP and FAD, which showed relatively weak interactions (Fig. 1), generated lower-resolution (2.3–3.0 Å) structural data. We note that the ligands ADP and ATP, which showed strong interactions with Rv0223c, did not generate crystals (Table 4). Figure 3 reveals that the ligands share a common interaction site within Rv0223c, and that Rv0223c contains a hydrophobic pocket and an adenine recognition motif (composed of Gln217 and Glu239) in the coenzyme domain, which is a common motif for binding adenine-derivative coenzymes. This information, in combination with our biochemical data on protein–ligand interactions (Fig. 1) and crystallization data from protein–ligand mixtures (Fig. 2), supports the conclusion that the binding of each ligand to Rv0223c occurs via contacts of the ligand with both the adenine recognition motif and the hydrophobic pocket. Further, the stability of different protein–ligand crystals and their quality for high-resolution structure determination depend on the degree of stabilization of the protein by the interaction of the nicotinamide moiety of NAD(H) with residues in the catalytic domain.
A major bottleneck in structural genomics projects is the production of crystals suitable for analysis by X-ray diffraction. The demonstration in this work that information on ligands generated by dye-ligand chromatography can be used to identify ligands binding to nucleoside-binding proteins and to improve crystallization may be a significant contribution to overcoming this bottleneck.
There are several extensions to our method that would make the process higher throughput. Using LC-MS (or LC-MS-MS) systems to identify native proteins in cell extracts that are eluted by specific ligands may allow the protein identification process to be finished immediately after the ligand elution. In addition to the Cibacron Blue F3GA dye resin used in our dye-ligand chromatography approach, there are several other dye resins which are known to interact with specific groups of proteins. These resins may be useful in binding groups of related proteins in a high-throughput manner. The techniques described here and in Roberts et al. could be applied to newly sequenced organisms to identify the metabolically important proteins that interact with nucleoside ligands, and to complement the annotation of each gene based on the interacting ligand(s).
The authors are grateful to N. Maes for her technical assistance. We also would like to thank the staff at the BL 5.0.2 and BL 8.2.1 managed by the Berkeley Center for Structural Biology (BCSB) at the ALS for technical support. The BCSB is supported in part by the National Institutes of Health, National Institute of General Medical Sciences. The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under Contract No. DE-AC02-05CH11231. This work was in part supported by the LANL-UCR CARE program (STB-UC:06-29) and the NIGMS Protein Structure Initiative program (NIH U54 GM074946).
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.