Introduction

Nucleosides and their derivatives in cellular metabolism are well recognized as energy carriers in metabolic transactions, essential chemical links for intracellular signals, and constituents of nucleic acids (DNA and RNA) [1, 2]. The metabolism of nucleosides is vital to a cell’s survival, and about half of all enzymes are nucleoside-dependent, representing one of the largest and most important classes of cellular proteins [3]. Determining structure–function relationships of nucleoside-binding proteins is therefore a significant component of the structural genomics of proteins that are crucial to cell function. Two major challenges in such studies are the large number of target proteins and the difficulty of sorting proteins according to their nucleoside specificity.

Nucleoside binding to individual proteins typically occurs with high specificity and is often sensitive to slight changes at the protein interaction site [4–6]. Elucidating specific interactions between ligands and proteins helps to derive functional insights for many proteins [7, 8], complementing bioinformatics approaches, and may provide the sole source of functional information for hypothetical proteins.

The aldehyde dehydrogenases (ALDHs) comprise a large family of proteins that metabolize various endogenous and exogenous substrates [9, 10]. The human genome contains 19 putative ALDH genes and three pseudogenes. Many of them are regulated in response to oxidative stress and overexpressed in various tumors [11, 12]. ALDH enzymes have multiple catalytic and non-catalytic functions, including ester hydrolysis, antioxidant activity, xenobiotic bioactivation and UV light absorption, and they also play important roles in embryogenesis, development and neurotransmission [13, 14]. Mutations in these genes cause inborn errors of aldehyde metabolism, such as Sjögren-Larsson syndrome [15], type II hyperprolinaemia and gamma-hydroxybutyric aciduria [16], and pyridoxine-dependent seizures [17]. Human ALDH-2 acts as a nitroglycerin reductase [18] and an activator of NADPH oxidases [9], but its major function is the elimination of toxic aldehydes, which cause lipid peroxidation, protein/enzyme dysfunction, structural damage and apoptosis in alcohol-related disorders [19] such as alcoholic liver disease [20], heart disease [21] and gastrointestinal cancer [22].

The Mycobacterium tuberculosis (Mtb) genome encodes ten putative ALDH proteins potentially associated with seven different ALDH classes (Table 1), suggesting that they have diversity comparable to that of human ALDHs [13, 14]. The presence of a Rossmann consensus sequence [23] in four of the ten predicted Mtb ALDHs (Table 1) further suggests diversity in nucleotide binding at the cofactor-binding site.

Table 1 Predicted ALDH family proteins in the M. tuberculosis genome^a

In a previous report, we used dye-ligand elution chromatography to screen for nucleoside-binding proteins in Mtb cell extracts and to analyze the specificity of nucleoside–protein interactions [24]. That study identified 26 native Mtb proteins that bound to Cibacron Blue resin and were specifically eluted with nucleosides. Four of these 26 proteins were members of the ALDH family, as shown in Table 1. The large number of ALDH proteins in the relatively small Mtb genome, and the many essential functions of ALDH proteins in human cells, suggest that Mtb ALDHs may play critical roles in the survival of the bacterium in its human host.

In this report, we purify one of these Mtb ALDHs (Rv0223c), characterize its nucleoside specificity, use ligands that interact with Rv0223c to improve its crystallization, present the first structure of an Mtb ALDH, and show its close structural similarity to human mitochondrial ALDH-2.

Materials and methods

Cloning and expression of putative Mtb ALDHs

Four Mtb ALDH genes (Rv0223c, Rv0458, Rv2858c, and Rv3293) were previously identified as interacting with nucleosides using dye-resin chromatography and ligand-specific elution [24]. Each targeted ALDH gene was amplified by PCR with Pfu proof-reading DNA polymerase (Stratagene), using an M. tuberculosis H37Rv cosmid library as the template, the 5′ NdeI primer 5′-AGATATACATATG + (N-terminal 21 bases of target sequence)-3′, and the 3′ BamHI primer 5′-AATTCGGATCC + (C-terminal 23 bases of target sequence)-3′. The NdeI site (CATATG) and the BamHI site (GGATCC) are contained in the 5′ and 3′ primers, respectively. The PCR amplicon was digested with NdeI and BamHI restriction endonucleases (NEB) and cleaned using a QIAquick PCR spin column (Qiagen). The product was ligated with T4 DNA ligase (New England BioLabs) into a modified pET-28 vector containing a C-terminal 6-His tag in frame with the BamHI restriction site, and transformed into BL21(DE3) (Novagen). The expressed proteins contained the C-terminal tag GSHHHHHH, where GS is encoded by the BamHI restriction site (GGATCC). Expression of the heterologous protein was tested in 3 ml BL21(DE3) cultures by binding to a cobalt-chelated Talon Superflow bead slurry (Clontech) followed by SDS–PAGE analysis.
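
The primer layout above can be expressed as a short script. The sketch below is illustrative only and is not the procedure used in this work: the function names are hypothetical, and it assumes that the gene's own start codon is replaced by the ATG within the NdeI site (CATATG) and that the stop codon is omitted so that the BamHI-encoded Gly-Ser and the His tag are translated in frame. The reverse primer carries the reverse complement of the 3′ coding bases.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
NDEI_TAIL = "AGATATACATATG"  # 5' tail of the forward primer; contains the NdeI site CATATG
BAMHI_TAIL = "AATTCGGATCC"   # 5' tail of the reverse primer; contains the BamHI site GGATCC


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(orf: str, n_fwd: int = 21, n_rev: int = 23) -> Tuple[str, str]:
    """Hypothetical helper: (forward, reverse) primers for an ORF given with start and stop codons.

    Assumes the native start codon is replaced by the ATG inside CATATG and the
    stop codon is dropped, so the BamHI-encoded Gly-Ser and His tag stay in frame.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    body = orf[3:-3]  # coding sequence without start and stop codons
    forward = NDEI_TAIL + body[:n_fwd]
    reverse = BAMHI_TAIL + reverse_complement(body[-n_rev:])
    return forward, reverse
```

Reading the reverse primer back onto the coding strand (its reverse complement) places GGATCC directly after the last sense codon, which is what keeps the GSHHHHHH tag in frame under these assumptions.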

Cells were cultured as described by Studier [25], with some modifications. Transformed cells were inoculated into 3 ml of seed culture medium (1 mM MgSO4; 0.5% glucose; 17 amino acids at 100 μg/ml each: Na-Glu, Asp, Lys-HCl, Arg-HCl, His-HCl, Ala, Pro, Gly, Thr, Ser, Gln, Asn, Val, Leu, Ile, Phe and Trp; a metal mix of 50 μM Fe, 20 μM Ca, 10 μM Mn, 10 μM Zn and 2 μM each of Co, Cu, Ni, Mo, Se and B; and NPS salts: 5 mM PO4, 5 mM Na, 2.5 mM K, 2.5 mM NH4 and 1.25 mM SO4) and grown overnight at 37°C. A 500 μl aliquot of the seed culture was used to inoculate 500 ml of auto-induction medium containing 1 mM MgSO4, metal mix (as in the seed culture), 0.5% glycerol, 0.5% glucose, 0.2% α-lactose, NPS (as in the seed culture) and 35 μg/ml kanamycin. Cells were grown at 37°C until the OD600 reached 0.5, and growth was then continued at 20°C for approximately 16 h, until the OD600 reached approximately 15. The cells were harvested and stored at −80°C.
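
Preparing these media from concentrated stocks reduces to the dilution relation C_stock × V_stock = C_final × V_total. A minimal sketch follows for a few components of the auto-induction medium; the stock concentrations are assumptions chosen for illustration (the text does not give them), and only the final concentrations come from the recipe above.

```python
def stock_volume_ml(final_conc, stock_conc, total_ml):
    """Volume of stock (ml) giving final_conc in total_ml; both concentrations in the same units."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final medium")
    return final_conc / stock_conc * total_ml


# (component, final concentration, ASSUMED stock concentration); units match within each row
AUTOINDUCTION_STOCKS = [
    ("MgSO4 (mM)", 1.0, 1000.0),           # assumed 1 M stock
    ("glycerol (%)", 0.5, 50.0),           # assumed 50% stock
    ("alpha-lactose (%)", 0.2, 10.0),      # assumed 10% stock
    ("kanamycin (ug/ml)", 35.0, 35000.0),  # assumed 35 mg/ml stock
]


def stock_volumes(recipe, total_ml):
    """Map each component to the stock volume (ml) needed for total_ml of medium."""
    return {name: stock_volume_ml(final, stock, total_ml) for name, final, stock in recipe}
```

The stock volumes are counted as part of the final volume, so water makes up the remainder.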

The cell pellet was lysed by sonication at 10°C for 10 min (30 s pulses) in 10 ml of buffer A (20 mM Tris–HCl, pH 8.0, and 100 mM NaCl) per gram of cells. Cell debris was removed by ultracentrifugation for 30 min at 38,000 rpm in a Ti 60 rotor (Beckman). The clear supernatant was filtered through a 0.45 μm membrane and loaded on a 5 ml Talon Superflow affinity column equilibrated with buffer A. After washing with 30 ml of buffer A and 20 ml of buffer B (20 mM Tris–HCl, pH 8.0, 100 mM NaCl and 20 mM imidazole), the His-tagged Rv0223c (and the other ALDHs) were eluted from the cobalt affinity column with buffer C (20 mM Tris–HCl, pH 8.0, 100 mM NaCl and 300 mM imidazole). The eluted fraction was dialyzed against buffer D (20 mM Tris–HCl, pH 8.0, 100 mM NaCl and 10 mM β-mercaptoethanol) and further purified by gel filtration on a Superdex-75 column (GE Healthcare Inc.) equilibrated and eluted with buffer D. Peak fractions (monitored at A280) were analyzed by SDS–PAGE, and the pooled protein fractions were concentrated to 35 mg/ml with a Centricon Plus-20 (Millipore); protein concentration was determined by Bradford assay with IgG (Bio-Rad) as the standard. The purity of each protein was estimated to be higher than 95% by densitometry of SDS–PAGE gels [26].
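
The Bradford measurement mentioned above relies on a standard curve. A minimal sketch, assuming a linear A595 response over the range of the IgG standards (in practice the Bradford response becomes non-linear at higher protein concentrations); the function names and the standard values used in the check are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_conc(a595, slope, intercept, dilution=1.0):
    """Concentration of the undiluted sample (same units as the standards) from its A595 reading."""
    return (a595 - intercept) / slope * dilution
```

A concentrated sample is read after dilution into the linear range, and the `dilution` factor scales the result back.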

Screening of Rv0223c for interactions with multiple nucleosides and nucleotides

Recombinant proteins were evaluated for their ligand-binding properties using a modified affinity elution chromatography protocol [27]. Individual proteins were diluted to 2 mg/ml in column buffer (CB: 50 mM potassium phosphate, pH 7.5, 1 mM MgCl2 and 2 mM DTT) and adsorbed to multiple small aliquots of F3GA resin (100 μg protein per 10 mg resin) in 2 ml spin-columns (Costar, Fisher Scientific). Binding proceeded for 1 h at 4°C with very gentle vortexing, followed by recovery of unbound protein (flow-through fraction) and washing of the resin (4 × 0.4 ml washes with CB); spin-columns were micro-centrifuged for 30 s at 10,000×g to recover fractions and change solutions. Individual spin-columns containing resin-bound proteins were then incubated (as for protein binding, above) with 50 μl of 1 mM test ligand in CB, and the elution fractions were recovered by centrifugation. Protein that remained bound to the resin was recovered by heating at 95°C for 5 min in 100 μl SDS sample buffer, followed by centrifugation (resin fraction). Aliquots of the initial protein, spin-column flow-through and eluate fractions were diluted 1:1 with 2× SDS sample buffer and loaded in equal proportions (equivalent to 1 μg of input protein) on 15% gels, which were stained with silver.
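
Loading "in equal proportion" means that each lane represents the same amount of input protein, so the loaded volume scales with the volume of each fraction. A sketch of that arithmetic follows; the 100 μg input, the 50 μl ligand eluate, the 100 μl resin fraction and the 1:1 dilution are from the text, whereas the ~50 μl flow-through volume (the protein volume applied at 2 mg/ml) is an assumption that ignores liquid retained by the resin.

```python
def load_volume_ul(fraction_ul, input_ug, equiv_ug=1.0, dilution=2.0):
    """Volume (ul) of diluted sample that represents equiv_ug of the input protein.

    dilution=2.0 reflects the 1:1 mix with 2x SDS sample buffer; use 1.0 for a
    fraction that was recovered directly in SDS sample buffer (the resin fraction).
    """
    return fraction_ul * equiv_ug / input_ug * dilution


INPUT_UG = 100.0  # 100 ug protein per 10 mg resin (from the text)

# fraction -> (volume in ul, dilution before loading)
FRACTIONS = {
    "flow-through": (50.0, 2.0),  # ASSUMPTION: ~50 ul, i.e. 100 ug applied at 2 mg/ml
    "eluate": (50.0, 2.0),        # 50 ul of 1 mM test ligand (from the text)
    "resin": (100.0, 1.0),        # 100 ul SDS sample buffer (from the text)
}

loads = {name: load_volume_ul(vol, INPUT_UG, dilution=dil) for name, (vol, dil) in FRACTIONS.items()}
```

With these volumes, each fraction works out to about 1 μl of (diluted) sample per lane.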

Rv0223c protein–ligand crystallization and data collection

Crystallization experiments were carried out by the hanging-drop vapor diffusion method [28] at room temperature (25°C) in 24-well plates. Recombinant Rv0223c was tested for the presence of bound nucleotides (see Supplementary Methods); the results indicated that the protein contained at most 0.1 mol of bound NAD or other nucleotide per mol of protein (see Supplementary Fig. 1). Each protein–ligand solution was prepared by mixing protein (0.68 mM in buffer D) with the corresponding ligand (20 mM in H2O) at a 1:2 protein:ligand molar ratio. The mixtures were incubated at room temperature for 30 min before crystallization experiments were set up. The final protein concentration in each protein–ligand mixture was between 0.60 and 0.63 mM. The ligands used were NAD, NADH, NADP, NADPH, adenosine, AMP, ADP, ATP, GTP, FAD, FMN and Cibacron Blue F3GA (free dye). Crystals were grown for 3 days at room temperature from drops consisting of 1 μl of protein–ligand solution mixed with 1 μl of reservoir solution, against a reservoir containing 0.1 M MES (pH 6.0) and 0.8 M ammonium sulfate. For some ligands (e.g., ATP), we tested the effect of Mg2+ on crystallization; because it had no noticeable effect, the crystallization experiments reported here were performed without Mg2+. Native and SeMet Rv0223c–NAD complex crystals were flash-cooled in liquid N2 after addition of 10% glycerol to the crystallization buffer as a cryoprotectant. Three-wavelength selenium multi-wavelength anomalous dispersion (MAD) data were collected at beamline 5.0.2 of the Advanced Light Source (ALS). A native data set at 1.8 Å resolution was collected at ALS beamline 8.2.1. Both data sets were processed with the HKL2000 program suite [29].
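
The mixing step above (protein stock, 20 mM ligand stock, 1:2 protein:ligand molar ratio) is simple volume arithmetic. The helper below is a hypothetical sketch that returns the ligand volume to add and the resulting concentrations, which makes it easy to check how much a given ligand stock dilutes the protein.

```python
def mix_for_ratio(protein_mM, ligand_stock_mM, protein_ul, ratio=2.0):
    """Ligand volume and final concentrations for a given ligand:protein molar ratio.

    Returns (ligand_ul, final_protein_mM, final_ligand_mM, protein_dilution_fraction).
    """
    ligand_ul = protein_ul * protein_mM * ratio / ligand_stock_mM
    total_ul = protein_ul + ligand_ul
    final_protein = protein_mM * protein_ul / total_ul
    final_ligand = ligand_stock_mM * ligand_ul / total_ul
    return ligand_ul, final_protein, final_ligand, ligand_ul / total_ul
```

The last value is the fractional dilution of the protein, the quantity that the Fig. 2 legend keeps below 20%.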

Structure determination and data analysis of Rv0223c protein–ligand complexes

Initial phases were obtained with the program SOLVE [30] using the MAD data set. The resulting experimental map was density-modified and traced with the program RESOLVE [31]. The protein model was further built and improved with the ARP/wARP package [32, 33] against the 1.8 Å native data, and manual model rebuilding was carried out with COOT [34]. After refinement with the phenix.refine program from the PHENIX software package [35], the final model of this complex has R/Rfree values of 0.18/0.20 at 1.8 Å resolution; it is deposited in the Protein Data Bank (http://www.rcsb.org) as entry 3B4W. Difference electron density maps were calculated for each Rv0223c–ligand complex by refining the structure of the Rv0223c protein (without ligands or solvent molecules) against the observed structure factor amplitudes for that complex. The resulting crystallographic phases $\varphi_c$ were used to construct an $(mF_o - DF_c)e^{i\varphi_c}$ difference map [36]. The LigandFit algorithm for automated ligand fitting in PHENIX [37] was used to locate the largest contiguous regions of high density in the difference map. In this algorithm, the contour level used to identify contiguous regions of density is set so that the largest region is approximately the size of the anticipated ligand; the location of this largest region therefore indicates the likely location of the ligand. For each complex, the difference electron density in the region occupied by NAD in the Rv0223c–NAD complex was displayed with PyMOL [38]. The overall structures of Mtb Rv0223c and human mitochondrial ALDH-2 were compared with the DaliLite program [39].
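
Two quantities used in this paragraph, the R/Rfree agreement factors and the difference-map Fourier coefficients, can be written down directly. The sketch below is a simplified illustration, not what phenix.refine computes internally: it assumes |Fc| is already scaled to |Fo| and that the figure-of-merit weight m and the σA-derived factor D are given per reflection. Computing the map itself would additionally require a Fourier transform of these coefficients, which is not shown.

```python
import cmath


def r_factors(f_obs, f_calc, free_flags):
    """Return (R_work, R_free) = sum | |Fo| - |Fc| | / sum |Fo| over each reflection set.

    Assumes f_calc is already on the same scale as f_obs; free_flags marks test-set reflections.
    """
    sums = {False: [0.0, 0.0], True: [0.0, 0.0]}  # flag -> [sum of |Fo - Fc|, sum of Fo]
    for fo, fc, is_free in zip(f_obs, f_calc, free_flags):
        sums[is_free][0] += abs(fo - fc)
        sums[is_free][1] += fo
    r_work = sums[False][0] / sums[False][1]
    r_free = sums[True][0] / sums[True][1]
    return r_work, r_free


def difference_map_coefficient(f_obs, f_calc, phi_calc, m, d):
    """Complex Fourier coefficient (m|Fo| - D|Fc|) * exp(i * phi_c) for one reflection."""
    return (m * f_obs - d * f_calc) * cmath.exp(1j * phi_calc)
```

Because the phases come from the protein-only model, density for an omitted ligand appears as positive difference density in the resulting map.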

Results

Sequence similarities of four Mtb ALDHs captured by dye-ligand chromatography to each other and to human ALDH proteins

Table 2 examines in more detail the degree of sequence identity among these four Mtb ALDHs and their closest human homologs. The four Mtb proteins are more similar in sequence to specific human ALDHs than to each other: when aligned with the NCBI Blastp program, Rv0223c, Rv0458 and Rv2858c give distinctly smaller E-values and higher percentages of identical residues with human ALDH class 1 and 2 proteins than with the other Mtb ALDHs, and Rv3293 shows the same pattern with human ALDH class 7.
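
Percent identity, as reported by Blastp, is the number of identical aligned columns divided by the alignment length. A minimal sketch for two already-aligned sequences follows; counting gap columns in the length is an assumption intended to mirror how BLAST reports identities, and E-values (which depend on BLAST's alignment statistics) are not computed here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical columns / alignment length x 100; gap columns count toward the length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)
```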

Table 2 Sequence comparison of Mtb ALDHs with other Mtb ALDHs and with human ALDHs using the NCBI Blastp program

Analysis of the specificity of ligand–protein interactions with recombinant Rv0223c using dye-ligand chromatography

We used a modified version of our dye-resin/ligand-elution procedure to examine the ligand-binding specificity of one of the four Mtb ALDHs (see Materials and methods). In this ligand-specific elution screen, recombinant Rv0223c was adsorbed on small aliquots of F3GA resin in spin-columns and assayed for elution by twelve nucleotides and nucleosides, using one ligand per column. Using purified recombinant protein and one ligand per column allowed us to identify the effect of each individual ligand on the stability of the dye–Rv0223c complex. We expect that ligands that elute Rv0223c from the resin are likely to bind specifically to the protein, although non-specific interactions could also cause elution [24].

Figure 1 shows that Rv0223c was eluted by NAD and NADH but only weakly by NADP and NADPH. This result is consistent with the dinucleotide preferences of homologous NAD-dependent aldehyde dehydrogenases (EC 1.2.1.3) from other organisms [40] and is discussed further below in the context of the structural data. Additionally, the elution of Rv0223c by AMP, ADP and ATP is consistent with adenylate binding by other dehydrogenases (cf. [3]).

Fig. 1

Ligand-specific elution chromatography of recombinant Rv0223c, a putative Mtb aldehyde dehydrogenase. One-dimensional SDS–PAGE analysis of protein added (Rv0223c), not bound (FT, flow-through fraction) or released from Cibacron Blue F3GA resin by buffer or 12 different ligands shows the preferential release by NAD, NADH, ATP, and ADP

The differences in the release of Rv0223c from the affinity resin by the various ligands were substantial (Fig. 1), and elution of the purified recombinant Mtb ALDH with ATP is consistent with observations from experiments with crude cytosolic extracts (Fig. 1; Table 3 of [24]). Comparing the methods used for recombinant proteins and for native proteins in cell extracts, we note that the cell-extract method has some potential complications. Because cell extracts are mixtures of many proteins of various abundances, binding and detection of minor proteins may be difficult. Moreover, some proteins retained on the column may be bound via other proteins in native complexes rather than directly to the dye, while others may have increased or unchanged affinity for the dye when bound to ligands [24]. Finally, when several related ligands are used, the order of elution could influence gel spot intensity and the identification of some proteins through depletion by earlier ligand(s). The cell-extract method is therefore useful only as a positive screen: any protein eluted by a specific ligand is a possible in vivo target of that ligand, whereas failure to detect a given protein is uninformative. In contrast, our analysis of recombinant proteins eliminates most of these variables and allows direct comparison of a protein’s affinity for an array of ligands at the dye-binding site.

Overall, Fig. 1 indicates that NAD, NADH, ATP, ADP, AMP and FAD all interact with Rv0223c, with the highest apparent affinity for NADH and ADP; this pattern was identical in three replicate experiments. It further indicates that NADP, NADPH, adenosine, GTP, FMN and NMN interact only weakly with Rv0223c at the dye-binding site, although binding of these ligands elsewhere on the protein cannot be ruled out.
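
The comparison in this paragraph is semi-quantitative. If band intensities are measured by densitometry in the eluate and resin lanes (loaded in equal proportions), the fraction of bound protein released by each ligand gives a simple ranking. The sketch below assumes such intensities are available; the ligand labels and values in the check are hypothetical, and release from the dye reflects competition at the dye-binding site rather than a true dissociation constant.

```python
def fraction_released(eluate: float, resin: float) -> float:
    """Share of resin-bound protein released by a ligand, from eluate and resin band intensities."""
    total = eluate + resin
    return eluate / total if total > 0 else 0.0


def rank_ligands(bands):
    """bands: {ligand: (eluate_intensity, resin_intensity)} -> ligands sorted by fraction released."""
    return sorted(bands, key=lambda lig: fraction_released(*bands[lig]), reverse=True)
```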

Crystallization and X-ray diffraction of Mtb Rv0223c with multiple ligands

Ligands that bind specifically to proteins are potentially useful for protein crystallization and structure determination [41]. We therefore examined the effects of the nucleotide/nucleoside ligands shown in Fig. 1 on the crystallization of recombinant Rv0223c. A systematic screen of conditions for crystallizing Rv0223c in the presence of NAD identified one condition that produced crystals of the Rv0223c+NAD complex diffracting to high resolution (1.8 Å; see Materials and methods; Table 3). We then carried out crystallization trials of the purified protein under this condition, either without ligand or with each of the nucleoside ligands used in the dye-ligand chromatography assay. As shown in Fig. 2, ligand-free protein generally yielded very small crystals or none in repeated tests (in one experiment, crystals were obtained that diffracted to a relatively low resolution of about 3 Å). In contrast, co-crystallization with several ligands (NAD, NADH, NADP, adenosine, AMP, GTP and FAD) enhanced crystal formation, yielding crystals that diffracted well enough for structural analysis. Other ligands either hardly affected crystallization (NADPH, ADP and ATP) or promoted precipitation (FMN and non-resin-bound F3GA). NMN was not tested. As summarized in Table 4, the strongly interacting ligands (NAD and NADH) enhanced crystal formation and gave higher-resolution data than the moderately or weakly interacting ligands (AMP, FAD, adenosine, GTP and NADP), although further screening of crystallization conditions may be required for complexes of Rv0223c with ADP, ATP and F3GA.

Table 3 Data collection and refinement statistics

Fig. 2

The improvement of Rv0223c (putative aldehyde dehydrogenase) crystallization by addition of ligands identified by the ligand-specific elution affinity chromatography method. Crystallization experiments were carried out by the hanging-drop vapor diffusion method at room temperature (295 K) in 24-well plates. Each protein–ligand solution was prepared by mixing protein with the corresponding ligand at a 1:2 protein:ligand molar ratio and incubated at room temperature for 30 min. In each protein–ligand mixture, protein dilution was kept below 20% by using an appropriately concentrated solution of each ligand (NAD, NADH, NADP, adenosine, AMP, GTP and FAD). Crystals were grown for 3 days at room temperature from drops consisting of 1 μl of protein–ligand solution mixed with 1 μl of reservoir solution, against a reservoir containing 0.1 M MES (pH 6.0) and 0.8 M ammonium sulfate

Table 4 Summary of ligand elution from dye and co-crystallization of Rv0223c

The structure of the Rv0223c+NAD complex was determined by MAD phasing at 1.8 Å resolution (PDB ID 3B4W; Table 3). The crystals of all the other complexes (Rv0223c with NADH, AMP, FAD, adenosine, GTP or NADP) were isomorphous with this one, with cell parameters differing by less than 1%. We could therefore use the structure of the Rv0223c protein (without ligand) from the Rv0223c+NAD complex to calculate phases and a difference electron density map for each of these structures (after refinement against the observed structure factors for each structure), using the diffraction data indicated in Table 4.
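
The isomorphism criterion used here (cell parameters differing by less than 1%) is easy to check programmatically. A hypothetical sketch, taking unit cells as (a, b, c, α, β, γ) tuples and assuming the same space group and setting; the cell values in the check are invented for illustration and are not those of the Rv0223c crystals.

```python
def max_cell_change_percent(ref_cell, other_cell):
    """Largest relative change (%) over the six cell parameters (a, b, c, alpha, beta, gamma)."""
    return max(abs(o - r) / r * 100.0 for r, o in zip(ref_cell, other_cell))


def is_isomorphous(ref_cell, other_cell, tolerance_percent=1.0):
    """True if every cell parameter differs from the reference by less than tolerance_percent."""
    return max_cell_change_percent(ref_cell, other_cell) < tolerance_percent
```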

Location of nucleoside ligands in the Rv0223c structures

The crystals obtained with NAD, and with its reduced form NADH, were the largest and best-diffracting crystals (resolution range 1.8–2.0 Å) obtained in repeated experiments, and the ligand was most clearly defined in the difference electron density maps for these complexes (Fig. 3). The NADH crystals may, however, contain NAD, because about 50% of the NADH was converted to NAD after about 3 days under the crystallization conditions used. These data suggest that Rv0223c forms its most well-defined complexes with NAD and NADH, consistent with its gene annotation as a putative aldehyde dehydrogenase.

Fig. 3

Structure of Mtb Rv0223c with NAD (center; resolution range 1.8–2.0 Å), and difference electron density maps for the other nucleoside ligands that promoted crystallization of Rv0223c, shown at the NAD-binding site: NADH (resolution range 1.9–2.0 Å), NADP (2.7–2.8 Å), adenosine (2.3–2.6 Å), AMP (2.6 Å), GTP (2.5–2.9 Å) and FAD (2.5–3.0 Å)

The other ligand–Rv0223c complexes diffracted to 2.3–3.0 Å. For GTP, NADP and AMP, the strongest difference electron density corresponding to a bound ligand was found at the NAD-binding site (Fig. 3). For the adenosine–Rv0223c co-crystals, the second-largest region of contiguous density was at the NAD-binding site, but the density was weak. For FAD, no clear peaks of difference density were found. The variable electron density of ligands binding to a common site in Rv0223c suggests that ligand conformation at this site may vary and that binding there may be considerably flexible. Ligands that are normally involved in the function of Rv0223c might bind more specifically than those that bind adventitiously, but this cannot be tested until the ligand specificity required for its function is known.
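
The region-finding idea described in Materials and methods (the contour lowered until the largest contiguous region of difference density reaches roughly the size of the ligand) can be illustrated on a toy grid. This is a simplified sketch, not the LigandFit implementation: it uses 6-connectivity, ignores crystallographic symmetry and periodicity, and stops at the first contour level at which the largest region reaches a target number of grid points. The function names are hypothetical.

```python
from collections import deque

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def largest_region(density, level):
    """Largest 6-connected set of grid points with density >= level.

    density maps (i, j, k) grid indices to map values; the grid is treated as
    non-periodic, unlike a real crystallographic map.
    """
    above = {p for p, rho in density.items() if rho >= level}
    seen, best = set(), set()
    for start in above:
        if start in seen:
            continue
        region = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first flood fill of one connected region
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOURS:
                nb = (i + di, j + dj, k + dk)
                if nb in above and nb not in seen:
                    seen.add(nb)
                    region.add(nb)
                    queue.append(nb)
        if len(region) > len(best):
            best = region
    return best


def locate_ligand(density, ligand_points):
    """Lower the contour from the top until the largest region has >= ligand_points grid points."""
    for level in sorted(set(density.values()), reverse=True):
        region = largest_region(density, level)
        if len(region) >= ligand_points:
            return level, region
    return None, set()
```

On a grid containing a strong but isolated single-point peak and a weaker, ligand-sized blob, the procedure returns the blob, which is the point of sizing regions rather than simply taking the highest peak.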

These structural studies provide independent evidence for various protein–ligand interactions detected using dye-ligand chromatography (Fig. 1). The crystallization data together with structural and biochemical analyses show strong and specific interactions between NAD (or NADH) and Rv0223c, while weaker interactions were also detected between Rv0223c and several other purine nucleosides/nucleotides (adenosine, AMP, GTP, NADP and FAD).

Structural homology between His-tagged recombinant Rv0223c and human mitochondrial ALDH

The structure of Mtb Rv0223c is the first reported structure from the Mtb ALDH family. We compared it with the structure of its human homologue, the NAD-dependent mitochondrial ALDH-2 (PDB ID 1CW3) [42], using the DaliLite server [39] to align the structures (Fig. 4). Human ALDH-2 was chosen for comparison because of its sequence similarity to Rv0223c (Table 2) and because Rv0223c has generally been categorized as a class 2 enzyme [43, 44]. The structure of Rv0223c closely overlapped that of human ALDH-2, with an overall rmsd of 1.5 Å over 184 residues in common. Only minor differences were found, all in peripheral regions of the protein (Fig. 4).
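
The rmsd quoted here is the root-mean-square distance between matched atoms after superposition. DaliLite performs the structural alignment and superposition itself; the minimal sketch below only computes the rmsd for coordinates that are assumed to be already superposed and paired (for example, matched Cα atoms, which is an assumption about the atom set used).

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of already-superposed, paired (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```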

Fig. 4

Comparison of the crystal structure of Mtb Rv0223c (green) with NAD (red) and the structure of human mitochondrial aldehyde dehydrogenase-2 (orange). The two structures are highly similar (RMSD 1.5 Å). The misaligned segments of the Rv0223c structure (magenta) are minor, consisting of residues Pro360, Glu361, Gly362 and Leu363 in the catalytic domain; Asp9, Lys10 and Leu11 in the coenzyme domain; and Gly135, Ser136, Tyr137, Gly138, Gln139 and Ser140 in the oligomerization domain. The three domains are separated by purple lines. The adenine ring of NAD (and NADH) is located in the hydrophobic pocket formed by helices αD (gray) and αE (blue) in the coenzyme domain

The orientation of NAD in Rv0223c was similar to that in human ALDH-2 (data not shown) [45]. As described by Perez-Miller and Hurley, the adenine ring of NAD is located in the hydrophobic pocket formed by helices αD and αE in the coenzyme domain, with the nicotinamide end facing and interacting with residues of the catalytic domain. The residues within 5 Å of NAD (shown in red in Fig. 5) were largely conserved: of the 25 Rv0223c residues within this range, 14 were identical to those in the human protein. The spatial locations of these 14 conserved residues were very close in the two structures (rmsd 1.1 Å, versus 1.5 Å for the structures overall), and their orientations were similar (data not shown), suggesting that these residues are critical for the interaction with NAD.
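
Selecting the residues "within 5 Å" of NAD is a distance-cutoff search. A minimal sketch, assuming protein atoms are supplied as (residue identifier, coordinates) pairs and the ligand as a list of coordinates; the residue labels and coordinates in the check are hypothetical. A residue is counted if any of its atoms lies within the cutoff of any ligand atom, which is one common convention among several.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=5.0):
    """Sorted residue IDs with at least one atom within cutoff (Å) of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)); ligand_atoms: iterable of (x, y, z).
    """
    ligand = list(ligand_atoms)
    near = set()
    for residue_id, xyz in protein_atoms:
        if any(math.dist(xyz, atom) <= cutoff for atom in ligand):
            near.add(residue_id)
    return sorted(near)
```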

Fig. 5

Protein sequence alignment of Mtb Rv0223c and human ALDH-2 based on secondary structure, obtained with the DaliLite server [39]. Residues within 5 Å of NAD(H) are shown in red. The two ALDH signature sequences given by Prosite (PDOC00068) are highlighted in yellow. In the line labeled ‘Ident’, amino acids identical between Rv0223c and human ALDH-2 are indicated by their one-letter codes and similar amino acids by ‘+’. The secondary structure of each protein is indicated according to the DSSP assignment (H, helix; E, extended strand; L, loop or irregular) [50]

We noted that Mtb Rv0223c is a monomer, whereas human mitochondrial ALDH is an octamer in crystal lattices [42, 45]. Both proteins used for structural analysis were expressed in E. coli, but the Mtb Rv0223c protein carries a His-tag at its C-terminus. In human mitochondrial ALDH, the C-terminus is part of the oligomerization domain, so the His-tag of Rv0223c could in principle interfere with oligomerization. However, gel filtration chromatography of His-tagged and untagged Rv0223c indicated that the His-tag does not influence the oligomerization state: both proteins eluted at the position expected for a monomer (data not shown). Rv0223c is more likely monomeric for another reason: its C-terminal residues (Tyr484, Thr485 and Val486) lie on the side of the oligomerization domain opposite to the side involved in monomer–monomer contacts in the human ALDH-2 structure (PDB ID 1CE3). (The His-tag is not visible in the electron density map and is therefore not shown in Fig. 4.)
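
Oligomeric state from gel filtration is usually judged against a calibration curve of log(molecular weight) versus the partition coefficient Kav = (Ve − V0)/(Vt − V0). A minimal sketch follows; the void and total volumes and the calibration standards in the check are hypothetical (the gel filtration data above are not shown), and the approach assumes roughly globular proteins, since elongated proteins elute earlier than their mass predicts.

```python
import math


def kav(ve, v0, vt):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def estimate_mw(ve, standards, v0, vt):
    """Apparent MW from elution volume; standards is a list of (elution volume, MW) pairs."""
    xs = [kav(v, v0, vt) for v, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * kav(ve, v0, vt) + intercept)
```

The apparent molecular weight is then compared with the mass calculated from the sequence to decide between monomer and higher oligomers.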

Sequence homology between Mtb ALDH Rv0223c and human mitochondrial ALDH

The amino acid sequences of Rv0223c and human mitochondrial ALDH-2 were aligned on the basis of secondary structure for sequence homology analysis, as shown in Fig. 5. The full-length Mtb and human sequences (487 and 518 amino acids, respectively) are shown; NCBI Blastp found them to be 39% identical and 53% similar. The two ALDH signature sequences identified by Prosite [46], which include the active-site cysteine and glutamate residues, are conserved in both proteins and found at equivalent locations.
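
Prosite signatures such as the ALDH patterns cited above are written in Prosite's pattern syntax, which maps onto regular expressions. The sketch below translates a simple pattern and scans a sequence; it handles only x, [..], {..}, repeat counts and terminal anchors, reports non-overlapping hits, and the pattern in the check is an invented example rather than either ALDH signature, which should be taken from the Prosite entries themselves. The function names are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (e.g. 'C-x(2)-[DE]-{P}') into a Python regex."""
    out = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # '<' anchors at the N-terminus
        suffix = "$" if element.endswith(">") else ""    # '>' anchors at the C-terminus
        element = element.strip("<>")
        core, _, count = element.partition("(")
        if core == "x":
            rx = "."  # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            rx = core  # single residue or [..] class: same syntax in regex
        if count:
            rx += "{" + count.rstrip(")") + "}"  # x(2) -> .{2}, x(2,4) -> .{2,4}
        out.append(prefix + rx + suffix)
    return "".join(out)


def find_motif(pattern: str, sequence: str):
    """1-based start positions and matched text of non-overlapping hits."""
    return [(m.start() + 1, m.group()) for m in re.finditer(prosite_to_regex(pattern), sequence)]
```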

Discussion

The dye-resin chromatography method generates data on the interaction of a protein with specific nucleoside ligand(s), and the identified ligand(s) may help stabilize the protein and improve its crystallization (Fig. 2). Structural data obtained from such crystals may also reveal how the ligand interacts with the protein. Together with the functional information implied by the identified ligands, this may help elucidate the biochemical role of the protein. Additionally, because many drugs are nucleoside analogues, this approach could be applied biomedically to identify proteins bound by nucleoside-analogue drugs (Fig. 5 of [24]).

Ligand-free Mtb Rv0223c did not crystallize readily. As shown in Fig. 2, co-crystallization with NAD and NADH, which interacted strongly in dye-ligand chromatography (Fig. 1), greatly improved crystal quality and yielded structures of the complexes at 1.8–2.0 Å resolution (Fig. 3; Table 3). In contrast, NADP, adenosine, GTP, AMP and FAD, which showed relatively weak interactions (Fig. 1), gave lower-resolution data (2.3 Å or worse). We note that ADP and ATP, which interacted strongly with Rv0223c, did not yield crystals (Table 4). Figure 3 shows that the ligands share a common interaction site within Rv0223c, and that Rv0223c contains a hydrophobic pocket and an adenine recognition motif (composed of Gln217 and Glu239) in the coenzyme domain, a motif commonly used to bind adenine-derived coenzymes [47]. This information, together with our biochemical data on protein–ligand interactions (Fig. 1) and the crystallization data from protein–ligand mixtures (Fig. 2), supports the conclusion that each ligand binds Rv0223c through contacts with both the adenine recognition motif and the hydrophobic pocket. Further, the stability of the different protein–ligand crystals and their suitability for high-resolution structure determination may depend on the degree to which the protein is stabilized by interactions of the nicotinamide moiety of NAD(H) with residues in the catalytic domain.

A major bottleneck in structural genomics projects is the production of crystals suitable for analysis by X-ray diffraction. The demonstration in this work that information on ligands generated by dye-ligand chromatography can be used to identify ligands binding to nucleoside-binding proteins and to improve crystallization may be a significant contribution to overcoming this bottleneck.

Several extensions could make our method higher throughput. Using LC-MS (or LC-MS/MS) systems to identify native proteins in cell extracts that are eluted by specific ligands [48] could allow proteins to be identified immediately after ligand elution. Besides the Cibacron Blue F3GA resin used in our dye-ligand chromatography approach, several other dye resins are known to interact with specific groups of proteins [49]; these resins may be useful for capturing groups of related proteins in a high-throughput manner. The techniques described here and in Roberts et al. [24] could be applied to newly sequenced organisms to identify metabolically important proteins that interact with nucleoside ligands, and to complement gene annotation on the basis of the interacting ligand(s) [51].