Background

Pathogenic bacteria rely on a variety of secretion systems to transport virulence factors, proteins that mediate host-pathogen interactions, across their hydrophobic cell membranes to sites where they can interact with the host. Gram-positive bacteria need only transport proteins across a single membrane, but Gram-negative bacteria require specialized secretion machinery that spans both inner and outer membranes. Mycobacterium tuberculosis, the causative agent of tuberculosis, was recently re-classified as a diderm bacterium when it was shown to have an outer membrane bilayer, referred to as the mycomembrane, composed largely of mycolic acids [1]. To transport key virulence factors across both membranes, M. tuberculosis has evolved specialized Type VII secretion systems (T7SS). The T7SSs were discovered through attenuated strains of M. tuberculosis deficient in secretion of EsxA (ESAT-6, early secreted antigenic target of 6 kDa) and are commonly called ESAT-six (ESX) secretion systems [2–4]. In M. tuberculosis there are five gene clusters, named ESX-1 to ESX-5, which encode T7SS. Each gene cluster encodes a number of proteins that are either secreted or are building blocks of the secretion apparatus. ESX-1 is responsible for secretion of the important virulence factors EsxA and EsxB as well as other virulence-associated proteins (e.g., EspB, EspF, EspJ) that are secreted to the cell surface or extracellular milieu based on recognition of a conserved C-terminal signal sequence on the secretion substrates [5–8]. These secreted factors have been linked to mycobacterial virulence through studies of the attenuated Mycobacterium bovis BCG strain [2, 4, 9]; in non-pathogenic Mycobacterium smegmatis the orthologous ESX-1 system is involved in conjugation [10, 11]. ESX-3 is critical for mycobacterial survival due to its role in metal acquisition [12–14]. ESX-5 is important for the secretion of many members of the PE/PPE family of proteins that also play a role in virulence and cell wall integrity [15–17]. The functional roles of ESX-2 and ESX-4 are still unknown, although ESX-4 appears to be the ancestral system from which the other ESX systems have evolved [18].

All ESX gene clusters contain three or four ESX conserved components (Ecc): EccB, EccC, and EccD are present in every cluster, and EccE is present in all ESX systems with the exception of ESX-4 [19]. Multiple copies of each core protein as well as other T7SS-associated proteins are present in the core complex, resulting in a large ~1500 kDa particle [20]. The function of some core components is known. For example, EccC is a member of the FtsK/SpoIIIE-like ATPase family and provides the energy to transport proteins across the mycobacterial membrane(s) [21, 22]. EccD contains an N-terminal cytoplasmic domain followed by 11 predicted transmembrane helices, and may form the cytoplasmic membrane channel through which cargo proteins are secreted. The functions of EccB and EccE within the secretion apparatus are less clear. Both proteins have N-terminal transmembrane elements and large C-terminal regions predicted bioinformatically to be localized in the periplasm, but their molecular structures and interacting partners remain unknown.

Understanding the T7SS architecture is critical for development of new antitubercular agents. Currently, no structural data are available for three of the four conserved components: EccB, EccD, and EccE. In this study we report the molecular structures of the periplasmic domain of EccB1 and the cytoplasmic domain of EccD1 from the ESX-1 cluster. The structures reveal probable functional surfaces of EccB1 and an unexpected dimerization of EccD1. Here we describe these structures in detail and discuss how they might fit into the larger context of the T7SS.

Results and discussion

Structure of EccB1

M. tuberculosis EccB1 (Rv3869) is a 51 kDa protein containing a 40 amino acid (aa) N-terminal domain followed by a single membrane-spanning helix and a ~400 aa C-terminal fold. EccB1 is annotated as a protein domain of unknown function (DUF690) in the Pfam database [23]. To gain further insight into the role of EccB1 within the ESX machinery we determined the crystal structure of the C-terminal domains of EccB1 from M. tuberculosis (EccB1mt) to 1.7 Å resolution and of the orthologous protein (MSMEG_0060; EccB1ms) from the nonpathogenic mycobacterial species M. smegmatis to 3.07 Å resolution. Both EccB1 structures contain a single elongated fold in the shape of a distorted propeller, which has an unanticipated quasi 2-fold symmetry (Fig. 1). A structural comparison shows that the EccB1mt and EccB1ms structures are highly similar, with an r.m.s.d. of 2.7 Å for the superposition of 381 amino acids (Dali Z-score 42.2); there is considerable variability in the conformation of the extensive unstructured loops connecting secondary structure elements, which are themselves relatively well conserved (Fig. 2). Five domains are present in the structures: a core domain flanked by two repeat domains on either side. The central core domain consists of a 6-stranded β-sheet with 5 strands (β7-β19-β18-β5-β6) arranged in anti-parallel fashion and an additional strand (β21) parallel to strand β6 on the periphery of the sheet; the sheet is further stabilized by a disulfide bond between its two central strands (β5 and β18), formed between Cys150 and Cys345 (EccB1mt) and between Cys152 and Cys347 (EccB1ms). The four repeat domains each contain a 4-stranded β-sheet and two α helices (Fig. 1c). Repeat 1 (R1) (residues S74-M124) and repeat 4 (R4) (residues G391-L445) are located between the core domain and the N-terminal transmembrane region, while repeat 2 (R2) (residues E185-P241) and repeat 3 (R3) (residues V267-A320) are located on the opposite side of the core domain, distal to the transmembrane region. The interfaces between the R1/R4 and R2/R3 domains are formed by hydrophobic residues on the N-terminal helices of each repeat that pack against each other and against hydrophobic residues from the proline-rich strands downstream of each repeat’s C-terminal helix. The R2 and R3 domains also pack tightly against the core domain via residues on their N-terminal helices as well as β-sheet residues. The tight packing involving residues on either side of multiple repeat domains gives EccB1 a stable fold with a continuous hydrophobic core and an elongated pseudo-symmetrical shape.

Fig. 1

Overall structure and repeat domains of EccB1mt. a Domain organization of EccB1. The predicted transmembrane helix is indicated by a shaded rectangle. The protein variants used for structure determination are shown as horizontal lines. b Overall structure of EccB1mt. The structure is shown in cartoon representation with the central core domain in grey and repeat domains R1–R4 colored red, orange, green, and blue, respectively. The disulfide bond between Cys150 and Cys345 is shown as yellow spheres. c Repeat domains R1–R4 have a common fold. The isolated repeat domains are shown in the same orientation after superposition of repeats R2–R4 on repeat R1 using Chimera [52]

Fig. 2

Superposition of EccB1mt and EccB1ms structures. a EccB1mt (grey) and EccB1ms (blue) were superimposed using Chimera. b Structure-based sequence alignment of EccB1mt and EccB1ms prepared with ESPript (http://espript.ibcp.fr) [53] with numbering and secondary structure elements derived from the EccB1mt sequence and structure

A comparison of the repeat domains of EccB1mt gives clues to the evolution of the protein. Pairwise sequence alignments of the repeats show that R2, R3, and R4 share 26, 33, and 27 % sequence identity, respectively, with R1. Pairwise alignments comparing each of R2–R4 to all other repeats revealed that only R1 has significant identity to all three other domains (Fig. 3). Therefore, it appears that R1 is the ancestral domain, with R3 sharing more conserved features with R1 than do either R2 or R4. EccB1ms contains a corresponding set of repeats in the same arrangement as seen in EccB1mt: R1 (residues Q75–K127) and R4 (residues G392–L447) are membrane proximal, followed by the central core domain, with R2 (residues Q187–P243) and R3 (residues G267–E323) distal to the membrane.
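
The percent identity values quoted above depend on how identity is counted over an alignment. The sketch below shows one common convention, counting identical residues over alignment columns in which neither sequence has a gap. It is an illustration only: the convention is an assumption (the text does not state which one was used), and the two aligned strings are hypothetical toy sequences, not EccB1 repeat sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumed convention for illustration; other definitions divide by the
    alignment length or by the length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (NOT EccB1 sequence data).
toy_a = "ACDE-FGHIK"
toy_b = "ACDQKFG-IR"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f} % identity over ungapped columns")
```

Because the denominator differs between conventions, identity values for short, gapped repeats such as these are best compared only when computed the same way.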

Fig. 3

Structure-based sequence alignment of repeat domains of EccB1mt. Alignment was rendered using ESPript. Amino acid numbering above the alignment refers to the repeat domain R1 sequence and indicated secondary structure elements are derived from the repeat domain R1 structure

EccB1 does not bear significant sequence similarity to any protein of known structure, and Dali searches using the complete EccB1 structures revealed no proteins with significant structural homology. However, Dali searches using only EccB1mt repeat 1 (S74–P124) revealed weak homology (r.m.s.d. 2.7 Å and Dali Z-score of 5.0) to the N-terminal domain of PlyCB (PDB 4F87, residues 14–70) from streptococcal C1 bacteriophage [24]. Eight PlyCB monomers assemble into a ring that associates with the bacterial cell wall and facilitates phage egress by tethering the degradative PlyCA subunit to the bacterial cell wall. The structural similarity between the two proteins and their common localization to bacterial cell envelope structures are intriguing, but no clues to EccB1 function are apparent from our examination of PlyCB.

Structure of EccD1mt

EccD1mt (Rv3877) is a 54 kDa protein containing an ~110 amino acid (aa) N-terminal ubiquitin-like domain followed by a 30 aa linker and 11 closely spaced transmembrane helices at its C-terminus. The ubiquitin-like domain of EccD1 classifies it as a member of the YukD family within the Pfam database. Based on the characteristics of the transmembrane regions the N-terminal portion of EccD1 is predicted to be localized in the cytoplasm.

We grew crystals of the predicted cytoplasmic domain of EccD1 from M. tuberculosis (cyto-EccD1mt) which diffracted to 1.88 Å. However, we could not obtain crystals of Se-Met containing cyto-EccD1mt and attempts to perform heavy atom soaks of fragile native crystals of cyto-EccD1mt were unsuccessful. Therefore, we obtained crystals and determined the structure of cyto-EccD1mt fused to maltose binding protein (MBP) at a resolution of 2.20 Å by molecular replacement using an MBP structure (PDB ID 1ANF) as the search model [25]. We subsequently solved the 1.88 Å cyto-EccD1mt structure by molecular replacement using the EccD1mt segment of the MBP fusion protein. In both structures EccD1mt residues 20–109 adopt an identical ubiquitin-like fold characterized by a β grasp motif and an anti-parallel β sheet with strands in the order 2,1,5,3,4 (Fig. 4). The MBP fusion protein used as a crystallization aid provides additional crystallization contacts, but it does not perturb the fold of cyto-EccD1mt (Fig. 4d,e). The two EccD1mt structures are superimposable with an r.m.s.d. of 0.7 Å over 90 residues and a Dali Z-score of 18.8.
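
For readers less familiar with the r.m.s.d. values quoted for these superpositions, the following minimal sketch shows how the quantity is computed from two sets of equivalent atom positions. It assumes the two coordinate sets are already superimposed and paired one-to-one (the optimal superposition step performed by programs such as Dali or Chimera is not reproduced here), and the coordinates are hypothetical toy values, not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superimposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total_sq / len(coords_a))


# Hypothetical toy coordinates in Å (NOT from the EccD1 structures):
# every point in set_b is displaced by exactly 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
toy_rmsd = rmsd(set_a, set_b)
print(f"r.m.s.d. = {toy_rmsd:.2f} Å")
```

In practice the value also depends on which atoms are paired (for example Cα atoms only) and on how many residues are included, which is why the number of superimposed residues is reported alongside each r.m.s.d.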

Fig. 4

Structure of the cytoplasmic domain of EccD1mt. a Domain organization of EccD1. The predicted transmembrane helices 1–11 are indicated by shaded rectangles. The protein construct used for crystallization is shown as a horizontal line. b cyto-EccD1mt monomer in cartoon representation colored in rainbow colors from N-terminus (blue) to C-terminus (red). The secondary structure elements are labeled. c cyto-EccD1mt dimer in cartoon representation with acidic residues shown in stick representation (see Fig. 5). d MBP-cyto-EccD1mt dimer in cartoon representation with MBP moieties colored in grey and cyto-EccD1mt domains colored in blue and purple. e A close-up view of the MBP-cyto-EccD1mt dimer. The orientation corresponds to panel c

Interestingly, the asymmetric unit of both crystal forms contains two EccD1 molecules, and in both crystal forms the two EccD1 molecules are arranged as a head-to-tail homodimer stabilized by an extensive interface. The interface is formed by interlocking side chains from β strands 1 and 2 and the N-terminal α-helix of both EccD1 molecules (Fig. 4), and ~650 Å² of each EccD1 molecule (13 % of the total surface) is buried in the interface as calculated with the PISA webserver [26]. The interaction is stabilized by 4 hydrogen bonds and a cluster of buried hydrophobic residues including Met1, Val54, and Val58, resulting in a solvation energy of −13.9 kcal/mol and a Complex Significance Score of 1.0 calculated by the PISA server. The extensive nature of the interface and its recurrence in both crystal forms, with or without the MBP fusion, suggest that EccD1 is a natural homodimer.
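
As a quick consistency check on the interface figures, the two quoted numbers (about 650 Å² buried, corresponding to 13 % of the total surface) together imply a total solvent-accessible surface of roughly 5000 Å² per cyto-EccD1mt molecule. The short sketch below makes that arithmetic explicit; the function name is illustrative, and the implied total is derived here from the rounded values in the text rather than taken from the PISA output.

```python
def implied_total_surface(buried_area: float, buried_fraction: float) -> float:
    """Total surface area implied by a buried area and the fraction it represents."""
    if not 0.0 < buried_fraction <= 1.0:
        raise ValueError("buried_fraction must be in (0, 1]")
    return buried_area / buried_fraction


buried_area = 650.0     # Å² per molecule, approximate value quoted in the text
buried_fraction = 0.13  # 13 % of the total surface, as quoted in the text
total_surface = implied_total_surface(buried_area, buried_fraction)
print(f"implied total surface per molecule: ~{total_surface:.0f} Å²")
```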

Dimerization of cyto-EccD1mt creates a wide open-ended groove bordered on two sides by the α1/β3 loops (Fig. 5). The floor of the groove is formed by the two α helices. Notably, the dimerization interface brings acidic residues (Glu45, Asp49, Asp50, Glu57, Glu60, and Asp61) from both chains into this groove. These acidic residues are not offset by any basic residues in this region and thus create a highly negative surface (Fig. 5b).

Fig. 5

Dimerization of cyto-EccD1mt creates a negatively charged groove. a cyto-EccD1mt dimer is shown in cartoon representation underneath a semitransparent surface. Clustered acidic residues are shown in stick representation. b Electrostatic surface calculated using the APBS server [54] with protonation states at pH 7.0 assigned by PROPKA [55]. The surface was colored +10 eV (blue) to −10 eV (red)

Putative function of EccB1 and EccD1

Mutations in EccB3 of the ESX-3 secretion system have been shown to confer drug resistance in M. tuberculosis [27]. The mutations found to confer resistance (Arg14Leu and Asn24His) occur in the small cytoplasmic domain preceding the transmembrane element of EccB3, a region not present in our EccB1 constructs, which contain only the soluble periplasmic domain. The fact that mutations in this region confer drug resistance indicates an important function for this short region, perhaps in mediating interactions with other cytoplasmically exposed components of the T7SS. The elongated shape and continuous hydrophobic core of EccB1 suggest that it may serve a structural role, perhaps forming part of a structure that spans the inner and outer membrane components of the ESX secretion system. The structural similarities between EccB1 and PlyCB, the viral cell wall binding protein complex, hint that EccB1 may also bind elements of the peptidoglycan layer, but there are not yet any experimental data to support this idea. However, post-translational modification of secreted bacterial proteins with O-linked polysaccharides has been shown to be important for solubility or for maintaining subcellular localization to the cell wall [28, 29]. EccB1 contains 24 putative glycosylation sites, as predicted by the NetOGlyc webserver [30], and many of these are surface-exposed in the EccB1 structures (including Ser143, Thr144, Ser351, and Ser356). While this manuscript was in preparation, ATPase activity of EccB1 was reported [31]. Further studies are needed to define the precise role of EccB1 in the context of a functional ESX-1 secretion complex.

The dimerization of the cytoplasmic domain of EccD1 raises interesting possibilities regarding the nature of the transmembrane pore. Each EccD1 monomer has 11 transmembrane elements; thus, a dimer would have a total of 22 transmembrane elements. Each monomer may form an independent pore, resulting in a pair of closely associated channels, or the transmembrane elements of both monomers may together comprise a single, larger transmembrane channel. The cytoplasmic domain itself is connected to the first transmembrane element by a 30 amino acid linker that may facilitate protein-protein interactions, either with the cytoplasmic EccD1 domain or with other components of the secretion system, or it may simply form an extended tether allowing increased mobility of the ubiquitin-like domains.

The negatively charged groove of the EccD1 dimer suggests that it associates with one or more positively charged partners. It may act to recruit other T7SS components or secretion substrates with positively charged patches into the system, or it may be part of a gating element required to close the channel during periods of inactivity. The residues contributing to the negatively charged groove are not conserved in EccD1 homologs from other ESX systems, indicating that they may serve a system-specific role. Indeed, the ESX-1 locus encodes a variety of secretion substrates not found in the paralogous M. tuberculosis ESX systems, and thus it is likely that the ESX-1 system has structural adaptations to enable the secretion of these substrates [6–8, 32]. As more structures of ESX-1 components are determined, likely partners for interaction with the EccD1 dimer may be revealed.

Conclusions

In summary, we have determined the structures of soluble domains of two integral, conserved components, EccB1 and EccD1, of the ESX-1 secretion channel. Given the importance of the ESX-1 secretion system to mycobacterial virulence, our structures provide crucial information about the molecular makeup of this important protein complex that will aid future drug development efforts.

Methods

Expression and purification of EccB1mt

A construct for expression of the periplasmic domain of EccB1mt (residues 72–463) was designed based on the transmembrane helix predicted by the TOPCONS server [33], secondary structure prediction using the JPred4 server [34], and a sequence alignment of EccB1 orthologs (Additional file 1: Figure S1). The DNA fragment was PCR-amplified from M. tuberculosis H37Rv genomic DNA using primers EccB1_F72_Nco 5′–CACCATGGGCACCAGCCTGTTCACCGACC and EccB1_RS463_Hind 5′–GCAAGCTTACAGCGTGTCGTGCTCGAGCAG, and cloned into a modified pET-22b(+) vector (Novagen), which contains the Escherichia coli DsbA signal sequence, a hexahistidine tag, and a tobacco etch virus (TEV) protease cleavage sequence.

EccB1mt was expressed in the E. coli Rosetta2(DE3) strain using LB media and 0.5 mM IPTG for induction. Cells were harvested after 4 h incubation at 18 °C, resuspended in 20 mM Tris–HCl pH 8.0, 300 mM NaCl buffer, and lysed using a microfluidizer (Avestin). EccB1mt was purified on a Ni-NTA affinity column, incubated with TEV protease to remove the hexahistidine tag, passed over a second Ni-NTA column to remove uncleaved protein, and further purified by size exclusion chromatography using a Superdex 200 column (GE Healthcare). Protein was flash-frozen in liquid nitrogen and stored at −80 °C.

Crystallization and structure determination of EccB1mt

Crystals were grown using the sitting drop vapor diffusion method with precipitant containing 0.1 M Tris–HCl pH 5.6, 15 % PEG2000 MME, 10 mM NiCl2. Crystals were transferred to crystallization solution supplemented with 20 % glycerol, or with 20 % glycerol and 0.5 M NaI [35], and flash-frozen in liquid nitrogen.

Data were collected at the 22-ID beamline at the Advanced Photon Source, Argonne National Laboratory, and processed using XDS [36] and HKL-3000 [37]. Iodide ion positions were determined using SHELXD [38] as implemented in HKL-3000, and phases were calculated using SHARP [39]. The model was built using Buccaneer [40] and Coot [41], and refined with REFMAC5 [42] using TLS groups defined by the TLSMD server [43]. The final structure includes residues 74–458.

Expression and purification of EccB1ms

The periplasmic domain (residues S73-G479) of the MSMEG_0060 gene was PCR-amplified from M. smegmatis mc²155 genomic DNA with the gene-specific primers MsEccB1.For. 5′-AACCTGTATTTCCAGAGTAGTGACCAGCTGCTGGTGG and MsEccB1.Rev. 5′-TTCGGGCTTTGTTAGCAGTTAGCCCTCCCCGCTCG and cloned into the pMAPLe4 expression vector [44], which appends a TEV protease cleavable hexahistidine tag to the N-terminus of the target protein, using the Gibson ISO cloning method [45]. The sequence of the expression clone was verified by DNA sequencing (Genewiz, Piscataway, NJ).

Recombinant protein was overexpressed in E. coli BL21(DE3) grown in 1 L Terrific broth cultures; expression was induced at an OD600 of 1.0 by adding IPTG to 0.5 mM. Cell growth was continued overnight at 18 °C. The following day the cells were harvested by centrifugation, resuspended in Buffer A (20 mM Tris, pH 8.0, 300 mM NaCl, 10 % glycerol) containing 10 mM imidazole, 1 mM EDTA, and Complete protease inhibitor, and lysed by sonication. The lysate was clarified by centrifugation (15,000 × g, 30 min, 4 °C) and the supernatant was loaded on a Ni-NTA affinity column equilibrated in Buffer A. After extensive washing, the bound protein was eluted with Buffer B (Buffer A containing 250 mM imidazole). The target protein was further purified by size exclusion chromatography using a Sephacryl S-100 column (GE Healthcare) equilibrated in Buffer A.

Crystallization and structure determination of EccB1ms

Crystals of EccB1ms were grown using the hanging drop vapor diffusion method by mixing protein and reservoir solution (14 % PEG 8000, 200 mM NaCl, 100 mM phosphate-citrate pH 4.2) at a 1:1 ratio. Crystals were cryoprotected by a brief soak in reservoir solution containing 20 % propylene glycol. Data from a single crystal were collected at beamline 24-ID-C at the Advanced Photon Source, Argonne National Laboratory. The data were processed with XDS [36] and the structure was solved by molecular replacement using the program Phaser [46] and a homology model, prepared with the Phyre2 web server [47], based on the structure of M. tuberculosis EccB1 (PDB ID 4KK7). The structure was refined with BUSTER [48].

Expression and purification of cyto-EccD1mt

A construct for expression of the cytoplasmic domain of EccD1mt (residues 21–109) was designed based on the ubiquitin-like domain predicted by the HHpred server [49]. The DNA fragment was PCR-amplified from M. tuberculosis H37Rv genomic DNA using primers EccD1_F21_Nco 5′–CACCATGGCCACCACCCGGGTGACGATC and EccD1_R109_SpeEcoR 5′–GGGAATTCACTAGTCATGACACCAGAGTCAGCAGTGAC, and cloned into a modified pET-Duet1 vector, which contains an N-terminal hexahistidine tag and TEV protease cleavage sequence. To create a maltose-binding protein (MBP) fusion construct, the same DNA fragment was cloned into a modified pET-22b(+) vector, which contains an N-terminal hexahistidine tag and TEV protease cleavage sequence followed by the MBP sequence. Both cyto-EccD1mt and MBP-cyto-EccD1mt proteins were expressed and purified as described for EccB1mt. Maltose (5 mM) was included in the size-exclusion buffer during purification of the MBP-cyto-EccD1mt variant to obtain ligand-bound MBP [50].
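
The names of the M. tuberculosis cloning primers (Nco, Hind, SpeEcoR) suggest that they carry restriction sites near their 5′ ends. The sketch below checks this by searching each primer for the standard recognition sequences of NcoI (CCATGG), HindIII (AAGCTT), EcoRI (GAATTC), and SpeI (ACTAGT). The mapping from primer name to enzyme is our inference from the names, not something stated in the text, and the helper function is illustrative.

```python
# Standard recognition sequences (all palindromic, so one strand suffices).
SITES = {
    "NcoI": "CCATGG",
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
    "SpeI": "ACTAGT",
}

# Primer sequences copied from the Methods text (5' to 3').
PRIMERS = {
    "EccB1_F72_Nco": "CACCATGGGCACCAGCCTGTTCACCGACC",
    "EccB1_RS463_Hind": "GCAAGCTTACAGCGTGTCGTGCTCGAGCAG",
    "EccD1_F21_Nco": "CACCATGGCCACCACCCGGGTGACGATC",
    "EccD1_R109_SpeEcoR": "GGGAATTCACTAGTCATGACACCAGAGTCAGCAGTGAC",
}


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based index of first occurrence} for sites in the primer."""
    primer = primer.upper()
    return {
        enzyme: primer.find(site)
        for enzyme, site in SITES.items()
        if site in primer
    }


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
for name, hits in site_map.items():
    print(name, hits)
```

Each forward primer contains an NcoI site whose ATG can serve as the start codon, and the EccD1 reverse primer carries both an EcoRI and a SpeI site, consistent with the same fragment being cloned into two different vectors.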

Crystallization and structure determination of MBP-cyto-EccD1mt and cyto-EccD1mt

Crystals of cyto-EccD1mt were obtained by the sitting drop vapor diffusion method using 0.1 M Tris–HCl pH 8.5, 0.2 M magnesium chloride, 30 % PEG4000 as precipitant. Crystals were cryoprotected using crystallization solution supplemented with 10 % glycerol, and vitrified in liquid nitrogen. Crystals grew as thin hexagonal plates and were mounted in cryo-loops with a 60° tilt (MiTeGen) to avoid overlapping reflections along the crystallographic c axis (Table 1). Crystals of MBP-cyto-EccD1mt were obtained by the sitting drop vapor diffusion method using 0.1 M HEPES pH 7.5, 1.4 M sodium citrate, and cryoprotected using crystallization solution supplemented with 20 % glycerol.

Table 1 Diffraction data collection and refinement statistics

Data were collected at the 22-ID beamline at the Advanced Photon Source, Argonne National Laboratory, and processed using XDS [36]. The structure of MBP-cyto-EccD1mt was solved by molecular replacement using Phaser [46] with an MBP structure (PDB ID 1ANF) as the search model [25]. Electron density modification was performed using Parrot [51], and the model was extended using Buccaneer and Coot. The fragment corresponding to cyto-EccD1mt from the structure of MBP-cyto-EccD1mt was used as a search model to solve the structure of cyto-EccD1mt alone using Phaser. The structures were refined using REFMAC5 with TLS groups defined by the TLSMD server.

Availability of supporting data

The structure factors and atomic coordinates have been deposited in the Protein Data Bank under accession codes 4KK7 (EccB1mt), 5CYU (EccB1ms), 4KV2 (cyto-EccD1mt), and 4KV3 (MBP-cyto-EccD1mt).