Background

The protein encoded by the ybgI gene of Escherichia coli is 247 residues long and has a molecular weight of 27 kDa. It belongs to the DUF34 family of proteins [1]. No biological function is known for any member of this sequence-related family, which at present contains 67 proteins. One member of this family, yeast NIF3, which has 22% identity with ybgI, is reported to interact with the yeast transcriptional coactivator Ngg1p, but the function of this interaction is not known [2]. It has been suggested that the products of the human gene NIF3L1 and its mouse ortholog Nif3l1, which have 22% identity with ybgI and 37% identity with yeast NIF3, inhibit translocation of Ngg1p to the nucleus, or alternatively that NIF3 binds to Ngg1p in the cytoplasm and enters the nucleus by cotransport [3]. Analysis of gene expression levels in Escherichia coli under genotoxic stress caused by mitomycin C DNA damage showed that expression of ybgI was significantly induced [4]. This protein was included as a target in a structural genomics study [5, 6] focusing on proteins of unknown function. The initial targets for this project were selected from the first completely sequenced bacterial genome, that of Haemophilus influenzae [7]. ybgI is a sequence homolog of Haemophilus influenzae HI0105, with 59% sequence identity. The ybgI protein was cloned and expressed, and its crystal structure was determined to 2.2 Å resolution.

Results and Discussion

The ybgI protein consists of two similar, interlinked α/β domains; both are three-layer (α-β-α) sandwiches, as shown in Figure 1. The first domain has a five-stranded mixed β-sheet with two α-helices on one side and three α-helices on the other. Two of these three α-helices are approximately parallel to the β-strands, and the third is shorter, approximately perpendicular to the β-strands, and leads over to the second domain. The order of the β-strands is 1-4-3-2-11. The second domain also has a central mixed β-sheet, but with six β-strands in the order 5-6-8-9-10-7; the β-sheet is flanked on each side by two α-helices, and an additional short α-helix leads back to domain 1. The crystallographic asymmetric unit contains three dimers. Applying the crystallographic three-fold symmetry shows that the quaternary structure is a toroid formed by three crystallographically related dimers. In the crystals, these toroids stack to form long tubes. The toroidal structure is shown in Figures 2A and 2B.
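As a minimal sketch of how a three-fold axis turns one dimer into the six-chain ring, the Python below rotates a dimer's coordinates by 0°, 120° and 240°. It assumes, for illustration only, that the three-fold axis is the Cartesian z axis through the origin, and it ignores the translational components of the real crystallographic operators; the function names, chain labels and coordinates are hypothetical.

```python
import math

def rotate_about_z(point, angle_deg):
    """Rotate an (x, y, z) point by angle_deg about the z axis through the origin."""
    x, y, z = point
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)

def apply_threefold(dimer_chains):
    """Generate a six-chain ring from one dimer by a pure 3-fold rotation.

    dimer_chains maps a chain label to a list of (x, y, z) coordinates.
    Assumption: the 3-fold axis is the z axis through the origin (hypothetical frame).
    """
    ring = {}
    for k, angle in enumerate((0.0, 120.0, 240.0)):
        for label, coords in dimer_chains.items():
            # Suffix 0, 1, 2 marks which symmetry copy the chain belongs to.
            ring[f"{label}{k}"] = [rotate_about_z(p, angle) for p in coords]
    return ring

# Hypothetical dimer: one atom per chain, related by a 2-fold along x
# ((x, y, z) -> (x, -y, -z)), i.e. perpendicular to the 3-fold axis.
dimer = {"A": [(20.0, 3.0, 5.0)], "B": [(20.0, -3.0, -5.0)]}
ring = apply_threefold(dimer)
print(sorted(ring))
```

Because the 2-fold in this toy frame is perpendicular to the 3-fold, the generated ring has the same up/down equivalence described for the ybgI toroid.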

Figure 1

The crystal structure of ybgI from E. coli. (A) Stereo view of the secondary structure cartoon showing the fold of the polypeptide chain. Domain 1 is shown with helices in blue and β-strands in red, and domain 2 with helices in cyan and β-strands in rose. The strands and helices are numbered sequentially from the N-terminus. This figure was prepared using MOLSCRIPT [31] and Raster3D [32, 33]. (B) Topology diagram of the secondary structure. Helices are represented as rectangles and β-strands as arrows.

Figure 2

(A) Side view of the toroidal structure. The dimers are colored slate blue and cyan; chocolate brown and tan; and green and lime green. (B) Top view of the toroidal structure, with the same coloring as in A. Secondary structure cartoons are shown inside transparent surface representations. These figures were prepared using PyMOL [34].

Searches with CE [8, 9], DALI [10, 11] and SCOP [12, 13] found no other polypeptides with the particular arrangement of mixed β-sheets and α-helices observed in either domain.

The toroid is composed of six polypeptide chains generated by applying the 3-fold symmetry to the dimer. The noncrystallographic 2-fold axes of the dimers are perpendicular to the 3-fold axes. The inside diameter of the toroid is approximately 30 Å, the outside diameter approximately 90 Å, and the height 57 Å. Because of the 2-fold axes, the toroid looks the same when approached from either direction along a 3-fold axis. Superposition of the native and selenomethionine subunit structures gives RMSDs of 0.2–0.3 Å for the Cα atoms. The toroids sit at different z positions on the crystallographic 3-fold axes; in other words, the noncrystallographic 2-fold axes are not coplanar. It is also of interest that the relative positions of the toroids differ between the native and selenomethionine crystals. For instance, the selenomethionine/methionine residue at position 135 of chain E packs against residues 138–141 of chain C in the selenomethionine structure but against residues 140–144 of chain C in the native structure.
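The RMSD quoted above is the square root of the mean squared distance between paired Cα atoms after superposition. The sketch below computes only that final step; it assumes the two coordinate sets are already optimally superposed (for example by a least-squares fit, which is not shown) and paired residue by residue. The coordinates in the example are invented.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) points.

    Assumes both lists are already superposed and list matching atoms
    (e.g. C-alpha atoms of the same residues) in the same order.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Invented C-alpha positions (Å) for two residues in two superposed subunits.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
semet = [(0.3, 0.0, 0.0), (3.8, 0.3, 0.0)]
print(round(rmsd(native, semet), 2))
```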

The most likely location of the active site is a group of conserved residues that includes four histidines (63, 64, 97, 215), two glutamic acids (194, 219), one aspartic acid (101), one asparagine (108), one cysteine (171), one tyrosine (22) and one tryptophan (68). This cluster of residues binds two metal ions, which are 3.3 Å apart in the selenomethionine protein and 2.5 Å apart in the native protein. In the early refinement of the selenomethionine structure these ions were treated as water molecules, and their B-values became very low, indicating that they must be heavier than oxygen. The anomalous Fourier map calculated from the selenomethionine data shows a significant anomalous signal at these positions, though much weaker than that of the selenium atoms. X-ray fluorescence identified Fe in the selenomethionine protein sample. One metal ion is coordinated by H64 Nε2, H215 Nε2 and E219 Oε1; the other is coordinated by D101 Oδ1 and Oδ2, E219 Oε2 and H63 Nε2. This grouping is set back into the inside wall of the toroid and includes residues from both domains. The metal ion sites of the dimer are at opposite ends of a cavity that extends across the dimer interface. This cavity is separated from the center of the toroid by the Y22 residues of the two chains of the dimer, which narrow the access from the center of the toroid to approximately 14 Å. The distances between the metal ions of the two chains of a dimer are 21.9 and 25.6 Å, and the distances between 3-fold related metal ions are 45.0 and 42.5 Å. One of the six putative active sites is shown in Figure 3. In the native protein, the metal positions may be fully or partially occupied by magnesium ions, which are present in both the growth medium and the crystallization solution. An anomalous Fourier map calculated from the native data shows no anomalous signal at these positions, and X-ray fluorescence rules out the presence of Fe, Zn, Cu, Ni and Co.
In the native structure, only 11 metal ion sites (of the 12 possible, two per chain) were included. The electron density at these positions tends to be somewhat smeared, so the appearance of the density and the refined B-factors were used as guides for including or excluding each site. The protein structure around the sites is well defined. The presence of iron in the selenomethionine protein sample may reflect adventitious uptake of iron during preparation, since the procedure adds iron sulfate to the growth medium [14]. The intrinsic metal ions of this protein are not known. The presence of histidine, glutamic acid and aspartic acid in the putative active site, with a glutamic acid bridging the two ions, is in keeping with cocatalytic sites in a number of proteins in which the metal ions are Zn, or Zn together with Fe, Mn or Mg [15]. The constancy of the protein structure around these sites supports the view that they are catalytic rather than structural.
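The coordination pattern described above can be checked from refined coordinates by listing the protein atoms within a distance cutoff of each metal. The sketch below does this with invented coordinates laid out to mimic that pattern (H64, H215 and E219 Oε1 around one ion; D101, E219 Oε2 and H63 around the other), with the two ions 3.3 Å apart as in the selenomethionine structure. The 2.8 Å cutoff and all coordinates are assumptions for illustration, are not chemically refined, and are not taken from the deposited structures.

```python
import math

def ligands_within(metal, atoms, cutoff=2.8):
    """Names of atoms within `cutoff` Å of a metal position, nearest first."""
    hits = sorted((math.dist(metal, xyz), name) for name, xyz in atoms.items())
    return [name for d, name in hits if d <= cutoff]

# Invented coordinates (Å): two metals 3.3 Å apart, bridged by E219
# (OE1 on metal 1, OE2 on metal 2).
metal_1 = (0.0, 0.0, 0.0)
metal_2 = (3.3, 0.0, 0.0)
atoms = {
    "H64 NE2": (-2.1, 0.0, 0.0),
    "H215 NE2": (0.0, 2.1, 0.0),
    "E219 OE1": (0.0, 0.0, 2.2),
    "E219 OE2": (3.3, 0.0, 2.2),
    "D101 OD1": (5.4, 0.0, 0.0),
    "D101 OD2": (3.3, 2.2, 0.0),
    "H63 NE2": (3.3, -2.1, 0.0),
}
print(round(math.dist(metal_1, metal_2), 2))
print(ligands_within(metal_1, atoms))
print(ligands_within(metal_2, atoms))
```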

Figure 3

The putative active site, with the metal ions shown as silver spheres and four water molecules as cyan spheres. This figure was prepared using MOLSCRIPT [31] and Raster3D [32, 33].

An E. coli operon has been identified that includes the nei gene, which codes for endonuclease VIII, and four other genes: ybgI, ybgJ, ybgK and ybgL [16]. Endonuclease VIII is a base excision repair enzyme for oxidative DNA damage. The proteins encoded by ybgJ and ybgK are putative carboxylases, and the protein encoded by ybgL is a putative lactam utilization protein. The inclusion of ybgI in the nei operon is not well conserved in other bacteria.

The highly conserved residues of the DUF34 family are concentrated in two regions of the ybgI structure: at the putative active site and on the side of a groove between the polypeptide chains of the trimer. Figure 4 shows the conserved residues mapped onto the surface of the molecule.

Figure 4

A view looking down into the toroid at the putative active site. A gap between the dimers and a trough lead down toward the site. Surface regions contributed by conserved residues are colored red. As in Figure 3, the metal ions are depicted as silver spheres. This figure was prepared using PyMOL [34].

The toroidal quaternary structure brings to mind many proteins involved in DNA metabolism. In a recent review [17], Hingorani and O'Donnell examine these proteins and speculate that their convergence on a toroidal shape provides an enclosed environment for otherwise chemically unfavorable reactions. These proteins include sliding clamps, helicases that catalyze ATP-fueled DNA unwinding, and exonucleases and topoisomerases that chemically modify DNA. For instance, bacteriophage λ exonuclease is a trimer that forms a toroid with an inner diameter of 30 Å at one end and 15 Å at the other. The exonuclease encircles double-stranded DNA and processively hydrolyzes one of the two strands; it moves in a specific orientation and degrades the 5' strand, so that the product is the 3' strand [18]. In contrast, the ybgI toroid is symmetric: its inner diameter is the same whether approached from above or below.

In a review of di-iron-carboxylate proteins (proteins with di-iron centers bridged by carboxylate residues and oxide/hydroxide groups) [19], the authors grouped the known structures into four structural categories. The first three categories are all variations on helix bundles. The fourth is the α/β sandwich category, which includes the purple acid phosphatases. These proteins have dimetal centers (Fe and Zn) that catalyze the hydrolysis of phosphate esters. An active-site tyrosine, whose oxygen is 2.2 Å from the iron atom, coordinates the iron and is responsible for the purple color. In ybgI, the closest tyrosine is 11 Å from the metal ions.

Conclusions

The quaternary structure, taken together with the induced expression of ybgI in response to DNA damage, its inclusion in an operon with endonuclease VIII, and its sequence homology with the yeast NIF3 protein, appears consistent with a function in DNA repair or in transcription. Comparison of the active site with known structures has not yet yielded a definitive clue to the specific biological function. Biochemical studies to further profile the function of the ybgI protein are in progress.

The atomic coordinates and structure factors of the selenomethionine and native structures of ybgI have been deposited in the Protein Data Bank [20] under accession codes 1NMO and 1NMP, respectively.

Methods

Cloning, expression, and purification

The ybgI gene was amplified by polymerase chain reaction (PCR) from Escherichia coli MG1655 genomic DNA and subcloned into the pDONR201 plasmid using Gateway Technology (Invitrogen). For expression, the coding sequence was transferred into the pDEST14 plasmid by site-specific recombination (Invitrogen). The protein was produced in E. coli strain BL21 Star (DE3) (Invitrogen) transformed with pDEST14. Cells were grown in LB medium containing 100 µg/mL ampicillin at 37°C to an A600 of 0.6 and induced with 1 mM isopropyl β-D-thiogalactoside for 3 hours. The protein was purified in two column chromatography steps using Source 30Q (Pharmacia) and Butyl-650M (Toyopearl).

Crystallization and structure determination

Crystals of the native protein and the selenomethionine derivative were obtained by hanging-drop vapor diffusion at room temperature. The reservoir solution for the native protein contained 0.1 M cacodylate buffer at pH 7.5, 0.1 M magnesium acetate, 15% (w/v) polyethylene glycol 8000 and 5% (v/v) polyethylene glycol 400. The reservoir solution for the selenomethionine protein contained 0.1 M imidazole buffer at pH 8.0, 0.2 M calcium acetate and 15% (w/v) polyethylene glycol 3350. The drops were formed by combining equal volumes of protein solution and reservoir solution. The protein concentrations were 4.7 mg/mL for the native protein and 8.2 mg/mL for the selenomethionine protein. For cryoprotection before data collection, the crystals were passed through a mixture of reservoir solution and saturated lithium formate: equal volumes for the native crystals, and two volumes of reservoir solution to one volume of saturated lithium formate for the selenomethionine derivative [21].
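Because the drops were set up from equal volumes of protein and reservoir solution, and the cryoprotectants from 1:1 or 2:1 volume mixtures, the starting compositions follow from simple volume-weighted dilution. The sketch below computes them; the component names are hypothetical labels, only the volume ratios matter, and the result describes a drop at setup, before vapor diffusion concentrates it toward the reservoir composition.

```python
def mix(parts):
    """Concentrations after mixing solutions.

    parts: list of (volume, {component: concentration}) pairs. Any consistent
    volume unit works; each component keeps whatever concentration unit it was given.
    """
    total_volume = sum(volume for volume, _ in parts)
    final = {}
    for volume, concs in parts:
        for component, conc in concs.items():
            final[component] = final.get(component, 0.0) + conc * volume / total_volume
    return final

# Native drop at setup: 1 volume protein (4.7 mg/mL) + 1 volume reservoir.
native_drop = mix([
    (1.0, {"protein_mg_per_mL": 4.7}),
    (1.0, {"Mg_acetate_M": 0.1, "PEG8000_percent_w_v": 15.0}),
])
# SeMet cryoprotectant: 2 volumes reservoir + 1 volume saturated lithium formate.
semet_cryo = mix([
    (2.0, {"PEG3350_percent_w_v": 15.0}),
    (1.0, {"Li_formate_fraction_of_saturation": 1.0}),
])
print(native_drop)
print(semet_cryo)
```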

Diffraction data were collected at the Advanced Photon Source (APS) South East Regional Collaborative Access Team (SER-CAT) beam line 22ID-D at Argonne National Laboratory. All data were collected at 100 K. Data were collected at three wavelengths for the selenomethionine derivative crystal (0.9795 Å, 0.9793 Å and 0.9780 Å) and at 0.9793 Å for the native crystal. The data were processed using d*TREK [22].
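For reference, X-ray wavelengths convert to photon energies via E = hc/λ, with hc ≈ 12.398 keV·Å. The short sketch below applies this to the three selenomethionine wavelengths, which all fall close to the selenium K absorption edge (about 12.66 keV), consistent with a multiwavelength experiment on selenium; the constant is rounded and the helper name is hypothetical.

```python
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV·Å (rounded)

def wavelength_to_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength in Å (E = hc / λ)."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

for wl in (0.9795, 0.9793, 0.9780):
    print(f"{wl:.4f} Å -> {wavelength_to_kev(wl):.4f} keV")
```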

The selenium sites were found with Shake-N-Bake [23, 24]. The polypeptide has four methionine residues, and there are three dimers (six monomers) in the asymmetric unit, giving 24 possible selenium sites. The 18 highest-ranked sites were entered into SOLVE [25] (http://www.solve.lanl.gov). SOLVE chose the opposite hand and gave a solution with 21 sites. RESOLVE [26] was not able to find the correct noncrystallographic symmetry, but once this had been determined by visual and vector examination of the sites, RESOLVE built the backbone for 911 of the 1482 residues and placed 491 side chains. By superimposing the partial models of the six copies of the polypeptide chain, a nearly complete trace was obtained. CNS [27] was used to refine this model against the data, and the noncrystallographic symmetry restraints were relaxed as refinement progressed. XTALVIEW [28] was used to visualize the structure and to adjust the coordinates manually to improve their agreement with the electron density map. REDUCE and PROBE [29] were used to guide rebuilding and resolve side-chain conformations, and PROCHECK [30] was used to validate the structures.
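Why superimposing the six partial models helps can be illustrated with a toy merge: once the copies share a common frame, a residue missing from one copy can often be taken from another. The sketch below is a hypothetical simplification (residue numbers only, first copy wins), not the procedure actually used, which also involved visual inspection; the function name and example data are invented.

```python
def merge_partial_models(models, n_residues):
    """Merge partial traces of NCS-related copies onto one chain.

    models: list of dicts mapping residue number -> residue record, one dict per
    copy, assumed already superimposed on a common frame. For each residue the
    record from the first copy that contains it is kept (a simplifying rule).
    Returns the merged dict and the residue numbers still missing.
    """
    merged = {}
    for model in models:
        for resnum, record in model.items():
            merged.setdefault(resnum, record)
    missing = [r for r in range(1, n_residues + 1) if r not in merged]
    return merged, missing

# Invented example: three partial copies of a 6-residue chain.
models = [
    {1: "A:1", 2: "A:2"},
    {2: "B:2", 3: "B:3", 4: "B:4"},
    {6: "C:6"},
]
merged, missing = merge_partial_models(models, n_residues=6)
print(sorted(merged), missing)
```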

The selenomethionine and native data are not isomorphous: the a and b unit cell dimensions differ by more than 1%. Consequently, the native structure was solved by molecular replacement using CNS, with the dimer as the search model. Refinement against the diffraction data was also carried out with CNS. As in the selenomethionine structure, noncrystallographic symmetry restraints were used throughout the refinement, but their weight was reduced after the initial rounds. The data and refinement statistics are shown in Table 1.
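The non-isomorphism criterion above amounts to comparing cell edges between the two crystals. A minimal sketch follows, assuming differences are expressed relative to the mean of the two values and using the 1% figure from the text as the threshold; the cell dimensions in the example are invented, since the actual values are given in Table 1.

```python
def percent_differences(cell_1, cell_2):
    """Percent difference of each cell edge (a, b, c), relative to the mean of the pair."""
    return [abs(x - y) / ((x + y) / 2.0) * 100.0 for x, y in zip(cell_1, cell_2)]

def non_isomorphous(cell_1, cell_2, threshold_percent=1.0):
    """True if any cell edge differs by more than threshold_percent."""
    return any(d > threshold_percent for d in percent_differences(cell_1, cell_2))

# Invented cell edges (Å) for two crystals of the same protein.
crystal_1 = (150.0, 150.0, 100.0)
crystal_2 = (152.0, 152.0, 100.2)
print([round(d, 2) for d in percent_differences(crystal_1, crystal_2)])
print(non_isomorphous(crystal_1, crystal_2))
```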

Table 1 X-Ray Data Processing and Refinement Statistics

Metal ion determination

X-ray fluorescence scans were performed at the absorption edges of Zn, Cu, Ni, Co and Fe at the Advanced Photon Source (APS) Industrial Macromolecular Crystallography Association Collaborative Access Team (IMCA-CAT) beam line 17-ID at Argonne National Laboratory. Solution samples of the native and SeMet proteins were used for the scans. The scans showed Fe, but no Zn, Cu, Ni or Co, in the SeMet protein, and none of these metals in the native protein solution.