Background

Isocitrate Dehydrogenase (IDH) enzymes convert isocitrate to oxoglutarate in most living organisms. Based on the cofactor utilized, they may be either Nicotinamide Adenine Dinucleotide (NAD) dependent [EC:1.1.1.41] or NAD phosphate (NADP) dependent [EC:1.1.1.42]. Other members of the family are isopropylmalate dehydrogenase (IMDH) [EC:1.1.1.85], homoisocitrate dehydrogenase (HIDH) [EC:1.1.1.87] and tartrate dehydrogenase [EC:1.1.1.93] [1]. Isocitrate Dehydrogenases are important enzymes essential for survival of all organisms. In humans, mutations in IDHs have been associated with diseases like Glioblastoma [2]. IDH is also important for applications in biotechnology, drug design against pathogens and for general understanding of biochemistry and systems biology.

IDHs are functionally either monomers or dimers. The functionally monomeric type has an active site completely defined by a single protein chain, while the functionally dimeric type has active sites contributed to by residues from both chains. Examples of the functionally monomeric type are the Azotobacter vinelandii IDH [3] [PDB:1ITW] and Corynebacterium glutamicum IDH [PDB:2B0T]. Bacteria such as Mycobacterium tuberculosis [4] and Vibrio [5] have both dimeric type IDHs (IDH1) and monomeric type IDHs (IDH2). Functionally dimeric IDHs are more abundant and diverse. In this study, unless otherwise mentioned, references to IDH from Mycobacterium, Vibrio or any such bacterium refer to the dimeric type IDH.

Previous studies [6, 7] have classified dimeric NADP-dependent IDHs into two groups: Subfamily I (S1-IDH) and Subfamily II (S2-IDH), while NAD-dependent IDHs have been classified as Subfamily III (S3-IDH). There are several unclassified IDHs which do not fall into these three subfamilies. Phylogenetic analysis of increasingly available data [8-10] tends to indicate that cofactor-specificity is not a monophyletic property; i.e., NAD-dependent IDHs may be found in all subgroups and are ancestral to all dimeric IDHs. NADP-dependent IDHs are not found in subfamily III, while the functionally monomeric IDHs are all NADP-dependent.

S1-IDHs are homodimers with two active sites, active in soluble dimeric form, and are found in Prokaryotes. Most are NADP-dependent, such as Escherichia coli IDH [11] and Bacillus subtilis IDH [12]. Some are NAD-dependent, such as Acidithiobacillus thiooxidans IDH [PDB:2D4V] [13] and Hydrogenobacter thermophilus IDH [14].

Subfamily II IDHs are homodimers, and are similar in structure and function to S1-IDHs, but share low sequence identity (15-30%) with them. Subfamily II consists of predominantly eukaryotic IDHs such as Human cytosolic IDH [15]. Bacterial IDHs also belong to subfamily II, such as Thermotoga maritima IDH (TmIDH) [PDB:1ZOR] [16] and Desulfotalea psychrophila IDH (DpIDH) [PDB:2UXQ] and [PDB:2UXR] [17], both of which come from extremophiles, and the recently identified Sinorhizobium meliloti IDH [PDB:3US8]. Most known members of the group are NADP-dependent, but anaerobic bacteria (such as Clostridia) are thought to have NAD-dependent members.

IDHs have various functions in the biochemistry of organisms. Anaerobic bacteria use NAD-dependent IDHs for diverse purposes such as glutamate biosynthesis [18]. In aerobic organisms, IDHs catalyze an irreversible step in the Tricarboxylic Acid cycle (TCA) or Krebs cycle, responsible for respiration. Eukaryotic mitochondria use NAD-dependent IDHs of subfamily III for this purpose. Aerobic bacteria dependent on the Glyoxylate bypass for survival during conditions of glucose starvation have NADP-dependent IDHs that perform this role [8].

To open the Glyoxylate bypass, IDH is inactivated by kinase phosphorylation in enteric bacteria such as Escherichia coli [19, 20], but not in others like Bacillus subtilis [21]. This specificity is facilitated by the interaction of kinase AceK with the AceK Recognition Segment (ARS) of E. coli IDH [20, 22]. Eukaryotic NADP-dependent IDHs replenish pathways concerned with lipid synthesis [23] and oxidative stress repair [24] with NADPH or oxoglutarate. Eukaryotic cells contain at least two kinds of NADP-IDH isoenzymes: cytosolic and mitochondrial. Fungi, plants and various protists may have localized IDH isoenzymes for organelles like chloroplasts, glyoxysomes, peroxisomes etc. This functional diversity in subfamily II implies that the enzymes have evolved diverse catalytic rates and mechanisms of regulation [25].

Regulation by phosphorylation has not been shown to exist in eukaryotic subfamily II IDHs. However, the dimeric NADP-dependent IDH from the pathogenic bacterium Mycobacterium tuberculosis [4, 26, 27] (M.tb IDH or MtIDH1) has been shown to be phosphorylated [26] during the persistent stage. M.tb IDH is closer in sequence identity to eukaryotic IDHs and belongs to subfamily II. The closest homologous resolved structure in the Protein Data Bank [28] belongs to its host, i.e. Human cytosolic IDH, sharing 65.4% identity with MtIDH1. The recently identified Sinorhizobium IDH [PDB:3US8] is a subfamily II bacterial IDH, and has a higher identity at 72.4%, but is not included in this study.
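The identity figures quoted above depend on how identity is counted over an alignment. As a minimal sketch, and assuming (this is not stated in the text) that identity is taken over alignment columns in which neither sequence has a gap, the calculation could look as follows. The function name and the toy sequences are hypothetical and are not taken from the IDH dataset.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (e.g. dividing by the shorter sequence length) give
    different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "MSK-IKVAG"
toy_b = "MSKQIRV-G"
identity = percent_identity(toy_a, toy_b)
```

In the toy pair, seven columns are gap-free and six of them match.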

NADP-dependent IDH1 from Mycobacterium tuberculosis takes part in the TCA cycle, and the organism has a functional glyoxylate bypass. An attempt [26] was made to compare its function with that of Escherichia coli IDH, and to identify the kinase responsible for deactivating IDH1 by phosphorylation. The kinase PknG was seen to be the most likely candidate; it phosphorylated Serine 213 in M.tb IDH1. To decipher the mechanism of deactivation, a homology model of the M.tb IDH1 [27] was constructed.

This structure revealed that the residue targeted for phosphorylation by the kinase PknG is in a different location from that of E. coli IDH [29]. E. coli IDH gets phosphorylated at Serine 113, which is located within the active site cavity and takes part in anchoring the substrate isocitrate. M.tb IDH1 seems to have a remote buried target, where the target Serine, while located close to the active site, does not have a direct role to play in catalysis. Moreover, the mechanism of access to this Serine by any kinase attempting to phosphorylate the residue is unclear.

The mechanism of access to this residue cannot be explained by simulation of the model structure alone; the results need to be compared with other IDH structures to understand the significance of differences in atomic motions. The current study therefore concentrates mainly on dimeric NADP-dependent IDHs from subfamilies I and II and additionally subfamily IV (Table 1), with an emphasis on regulation in dimeric M.tb IDH.

Table 1 IDH representative structures.

Methods

We first extend earlier phylogenetic studies [6, 8-10, 30] using a larger number of sequences and combine this with structural information. Representative dimeric IDH structures were first aligned using the structural alignment tool STAMP [31] to ensure that functional residues (Table 1 for representative list) were aligned. This alignment was then subjected to CLUSTALW [32] realignment, preserving gaps, using the Jalview [33] interface [see Additional file 1]. This was done to ensure that catalytic and important scaffold residues remained aligned as subsequent sequences were added to the initial set.

Full-length reviewed protein sequence ids provided by the ExPASy Enzyme database [34] [EC:1.1.1.42] from UniProt [35] and Protein Data Bank [28] structures were used. BLAST was run on each of these sequences using the UniProt web interface to identify similar sequences. We also added eukaryotic NAD-dependent IDHs, yielding a dataset consisting of 111 dimeric IDH sequences [see Additional File 2].

Average distance (UPGMA) and neighbor joining methods [36] were initially used through the Jalview interface to generate phylogenetic trees (Figure 1). The average distance method tree for dimeric IDH sequences shows four groups of IDHs. While this method yields clustering information about the phenetic similarities or differences between the sequences, it does not necessarily trace the evolutionary pathway [37].
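To make the clustering step concrete, the following is a minimal pure-Python sketch of average-distance (UPGMA) clustering. It is not the Jalview implementation. The pairwise distances are assumed to be supplied by the caller (for example, derived from the alignment), and the labels and values in the toy matrix are hypothetical.

```python
from itertools import combinations


def upgma(labels, distances):
    """Average-linkage (UPGMA) clustering.

    labels:    list of leaf names.
    distances: dict mapping (name_a, name_b) tuples to a distance.
    Returns (tree, merges), where tree is a nested tuple of leaf names and
    merges is a list of (cluster_a, cluster_b, height) in merge order.
    """
    # Each cluster id maps to (subtree, number_of_leaves).
    clusters = {label: (label, 1) for label in labels}
    d = {frozenset(pair): dist for pair, dist in distances.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        new = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((new, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
        merges.append((a, b, height))
    tree, _size = next(iter(clusters.values()))
    return tree, merges


# Hypothetical toy distances between four sequences.
toy = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
    ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
tree, merges = upgma(["A", "B", "C", "D"], toy)
heights = [m[2] for m in merges]
```

Because UPGMA joins clusters purely by average distance, the resulting tree is ultrametric, which is why it reflects phenetic similarity rather than an explicit evolutionary model.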

Figure 1

Phylogenetic tree from UPGMA method. Phylogenetic tree calculated using UPGMA Method. The tree diagram shows phenetic relationship. The alignment used is provided by Additional file 1. The reference table is in Additional file 2.

The IDH dataset is characterized by large variation in sequence identity (15% and above). Yet the overall structures and distinct scaffold and active site residues are conserved. Rate heterogeneity estimation was therefore used with the Maximum likelihood method to account for conserved residues. The required α shape parameter of the gamma distribution for 8 categories was estimated using TREE-PUZZLE [38], and highly similar sequences reported by the program were reduced to one representative.

The program ProML in Phylip [39] was used to calculate the final tree (Figure 2), with the coefficient of variation calculated as 1/√α and with 8 HMM categories. The BLOSUM62 [40] matrix was used where available; where it was not, as in ProML, the compatible PMB matrix [41] was used instead. A phylogenetic tree was also generated for the whole dimeric β-decarboxylase family dataset to check the relative position of the IDHs with respect to the other members of the family [see Additional file 3].
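The conversion from the gamma shape parameter to the coefficient of variation of the substitution rate is a one-line calculation. The sketch below uses hypothetical α values purely for illustration; the value actually estimated for the IDH dataset is not given in this section.

```python
import math


def rate_coefficient_of_variation(alpha: float) -> float:
    """Coefficient of variation of gamma-distributed rates, CV = 1 / sqrt(alpha)."""
    if alpha <= 0:
        raise ValueError("the gamma shape parameter alpha must be positive")
    return 1.0 / math.sqrt(alpha)


# Hypothetical shape parameters (not values from this study).
cv_strong_heterogeneity = rate_coefficient_of_variation(0.25)
cv_weak_heterogeneity = rate_coefficient_of_variation(4.0)
```

A small α corresponds to a large coefficient of variation, i.e. strong rate heterogeneity across sites, which is the situation expected when a few scaffold and active site residues are highly conserved.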

Figure 2

Phylogenetic tree from Maximum likelihood. Phylogenetic tree calculated using Maximum likelihood Method. The tree diagram shows phylogenetic relationships. The alignment used is provided by Additional file 1. The reference table is in Additional file 2.

At most four representative crystal structures were chosen from each group seen in the phylogenetic tree (Table 1), making a total of 9 structures, four each from subfamily I and II and one belonging to neither. An additional homology model of dimeric IDH from Mycobacterium tuberculosis [27] (subfamily II) was also included. The sequence alignment of these 10 structures is shown in Figure 3.

Figure 3

Alignment of dimeric IDH sequences. This is an alignment of sequences given in Table 1. Numbers correspond to residues given in Table 2. The numbers are 1-9 and A-F. Colors correspond to those given in structure markers in other figures. Some C-terminal residues of Thermus thermophilus TtIDH are not shown, as this IDH is longer than other IDHs and the extra region does not align with the other IDH sequences.

Molecular dynamics

In order to examine the consequences of the phylogenetic and structural variations, molecular dynamics simulations were carried out. The structures given in Table 1 were used for this analysis. Ligands, cofactors and divalent ions were removed to make comparisons easier.

AMBER version 9 [42] with the ff99 [43] forcefield was used. Protonation states were assigned to each structure using PDB2PQR [44] with PROPKA [45] at pH 7.0. With the exception of ApIDH, the IDH structures used lacked disulphide bonds. The protein structures were solvated with the TIP3P [46] water model in a truncated octahedral box with a 10 Å buffer, and neutralizing ions were added. Periodic boundary conditions were used. Each system contained approximately 800-830 residues and ~20000 water molecules.

All systems were first minimized with solute restraints for 500 steepest descent (SD) and 500 conjugate gradient (CG) steps, followed by minimization without restraints for an additional 1500 SD and 3000 CG steps. The systems were subsequently heated to 300 K at constant volume. An equilibration run was carried out for 250 ps under constant pressure (NPT) conditions with isotropic box scaling for pressure regulation. The particle mesh Ewald method [47] was used to model the electrostatics. Kinetic and total energies of the system were monitored to ensure stability during equilibration. The root mean squared deviation (RMSD) of atomic coordinates relative to the starting minimized structure was also monitored at this stage. SHAKE [48] was used to enable a timestep of 2 fs. The Langevin thermostat [49] was used.
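The RMSD monitored during equilibration can be illustrated with a short sketch. It assumes that each frame has already been superimposed on the reference structure, since no fitting is performed here; in practice the AMBER tools handle the fitting. The coordinates are hypothetical.

```python
import math


def rmsd(frame, reference):
    """Root mean squared deviation between two equally sized coordinate sets.

    frame, reference: lists of (x, y, z) tuples in the same atom order.
    Assumption: the frame is already fitted onto the reference.
    """
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = 0.0
    for (x, y, z), (rx, ry, rz) in zip(frame, reference):
        squared += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(squared / len(frame))


# Hypothetical two-atom system displaced by 1 Å along z.
reference_structure = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced_frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(displaced_frame, reference_structure)
```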

Simulations were run for 20 ns, and some were extended to up to 30 ns where required to ensure stability. From each simulation, the 15 ns window showing the least variability in the RMSD plot was chosen for analysis. Standard fluctuation analysis and correlation analysis were carried out using the ptraj facility provided in the AMBER suite [50]. Principal component analysis was done using Pcazip [51], and plotted using Bio3d [52]. The RMSD and Radius of Gyration plots are given [see Additional file 4: S2-S3].
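The choice of analysis window can be sketched as a sliding-window search. The text does not specify how variability was measured, so the sketch below assumes the population variance of the RMSD values within each window; the window length is expressed in frames, and the RMSD series is hypothetical.

```python
from statistics import pvariance


def most_stable_window(rmsd_series, window):
    """Return (start_index, variance) of the contiguous window with the
    lowest RMSD variance. Assumption: variance is the stability measure."""
    if window < 1 or window > len(rmsd_series):
        raise ValueError("window must be between 1 and the series length")
    best_start = 0
    best_variance = float("inf")
    for start in range(len(rmsd_series) - window + 1):
        variance = pvariance(rmsd_series[start:start + window])
        if variance < best_variance:
            best_start, best_variance = start, variance
    return best_start, best_variance


# Hypothetical RMSD trace: a drift, a plateau, then a jump.
toy_rmsd = [1.0, 2.0, 3.0, 3.1, 3.0, 3.1, 4.0]
start, variance = most_stable_window(toy_rmsd, window=4)
```

For a 20 ns trajectory saved at a fixed interval, the 15 ns window would correspond to three quarters of the frames; the same function applies with the window length set accordingly.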

Results

Phylogenetic analysis

Phenetic clustering of dimeric IDHs using average distance shows four groups (Figure 1). Subfamily I (S1-IDH) consists of homodimeric, prokaryotic and predominantly NADP-dependent IDHs. Subfamily II (S2-IDH) [9, 53] consists of homodimeric, predominantly eukaryotic and NADP-dependent IDHs shown in Figure 4.

Figure 4

Structures of subfamily I and II. Structures of subfamily I (top) and II (bottom) are shown for comparison. Colors are consistent with regions in Figure 3. Note the difference in the clasp region, the three loops and the ARS-like region. Subfamily I IDHs have α-helices (β-α-β pattern from each subunit). Subfamily II IDHs have an all-β (β-ββ-β) Greek-key motif [57, 58]. Images were made using Chimera [80].

Subfamily III consists of heterodimeric NAD-dependent IDHs, along with a few bacterial members. An additional group, whose members were previously classified as outliers [7, 8], is found to be closer to subfamily III. The resolved structure of Thermus thermophilus IDH (Figure 5) belongs to this group. The structure and alignment show homodimers with 480-500 residues per chain and a unique extended C-terminal region of approximately 100 residues. This suggests that the clade may be regarded as a distinct subfamily IV.

Figure 5

Structures of subfamily III and IV. Structures of subfamily III (top) and IV (bottom) are shown for comparison. Colors are consistent with regions in Figure 3. The sequentially central homologous clasp region (C1) in subfamilies III and IV is reduced to a two-strand anti-parallel sheet (ββ) (residues 148-160 in TtIDH), and is similar in both. The C-terminal region forms a larger domain over the clasp (C2). Images were made with Chimera [80].

Maximum likelihood analysis shows notable differences. NAD-dependent bacterial IDHs are grouped with subfamily III by phenetic clustering, whereas Maximum likelihood analysis places them closer to subfamily I. These may be considered outliers, as they are most likely homodimers like those of subfamily I but do not seem to be part of subfamily I. Subfamily III IDHs are mostly NAD-dependent eukaryotic heterodimers, and some of these outliers may share close common ancestors with them.

Subfamily IV shows two subgroups. One subgroup contains Rickettsia IDH and other bacterial IDHs, while the other has Thermus thermophilus IDH and several putative thermophilic sequences.

Sequence alignment shows regions of conservation and regions where insertions or gaps are prominent between the different subfamilies (Figure 3, Figure 4 and Figure 5). These variable regions will be referred to as: Complementary region 1 (CR1), Phosphorylation loop (Phos-loop), Clasp domain (clasp), ARS-like [52], NADP discriminating loop, nucleotide binding loop and Complementary region 2 (CR2).

The homodimeric IDHs of subfamilies I, II and IV have two active sites present symmetrically, each formed from residues contributed by the larger domain of one subunit and the smaller central domain of the other subunit. These homodimers may be described as pseudo 3D-domain-swapped dimers [54, 55], as a single subunit is not known to be independently active [4]. It has been speculated that higher order oligomers, such as tetramers [7, 30], may exist; however, they retain the homodimer as a basic unit. The prominent cross-over domain forming the interaction between the two subunits is called the clasp domain, as it resembles two hands, each representing a subunit, clasped together (see Figure 4 and Figure 5 for comparative structures).

Subfamily III IDHs form heterodimeric units with one active site and one regulatory site. Yeast NAD-dependent IDH [56] [PDB:3BLV], [PDB:3BLW], [PDB:3BLX] is represented by two sequences in UniProt, [Uniprot:IDH1_YEAST] and [Uniprot:IDH2_YEAST]. Two heterodimers associate by their clasp domains to form tetramers, and two such tetramers associate to form the octamer, which is the biological unit in yeast. The clasp domain (C) is usually formed by at least one β-sheet between the two subunits.

The distinctly different shape of this domain in each subfamily helps to immediately distinguish structurally the four subfamilies of dimeric IDHs. Subfamily IV IDH subunits are longer than other dimeric IDHs. The extra length is accounted for by a long C-terminal region forming a larger clasp-like structure (C2) with motif ββ-α-β-α-ββ, as seen in T. thermophilus (Figure 5). Without the longer C-terminal region, the subfamily IV homodimeric IDHs structurally resemble subfamily III heterodimeric IDHs. The clasp region is known to play a role in higher order oligomer formation and signalling [7, 56].

The various regions which show variations in sequence length are highlighted in the alignment (see Figure 3 and the corresponding color-coded region in Figure 4 and Figure 5). The function of these regions is not apparent from sequence or structural examination, but they clearly classify the different subfamilies. These features may modulate the rate and regulation of the enzyme through the diversity of roles they play in the biochemical cycles of their corresponding organisms.

As an example, the ARS-like region differs greatly in length and associated structure within subfamily I. At least five types can be identified, of which three can be structurally represented (Figure 6). These can be correlated with the bacterial family and the role and associated mode of regulation of IDH in these bacteria. The variation in length is not seen in subfamily II, and this region is reduced in subfamily III and IV.

Figure 6

ARS-like segments in various IDHs. The AceK recognition segment (ARS) in E. coli IDH [22] and ARS-like region sequences and structures in other IDHs. S1-IDHs have at least five groups with different structures, three of which are structurally represented here. Cyanobacteria like Nostoc IDH_ANASP have the longest ARS-like sequence, which is not structurally resolved yet. The shortest S1-type, IDH_STRMU (Streptococcus mutans), may be NAD-dependent. S2-IDHs have a conserved structure, represented by Pig PmIDH. The residues may differ, however, as the alignment between PmIDH and Mycobacterium tuberculosis IDH_MYCTU shows here. The MtIDH sequence has a stretch of glutamates (-EEE-) and is richer in acidic residues. The shortest length is seen in TtIDH, as well as in S3-IDHs. Image was made using Chimera [80] and Jalview [33].

Simulations reveal the dynamic properties of these enzymes and their modes of action. The role in modulation of the enzyme by these regions may be inferred from their dynamic behaviour, allowing us to probe the mechanism of the enzyme further.

Simulations

The major regions of fluctuation correspond mostly to the variable regions in the alignment (Figure 6). Sharp peaks are observed in E.coli (Figure 7) and other S1-IDHs [see Additional file 4: S4 A-D], while broader regions corresponding to the three loops show movement in the α-helix regions for subfamily II [see Additional File 4: S4 E-I]. The third loop or nucleotide-binding loop is more mobile in Eukaryotic IDHs than bacterial IDHs within subfamily II, corresponding to the longer loop in the alignment (Figure 3). These regions are known to have higher crystal B-factors [15, 57, 58] in several structures in comparison with other regions within the protein, implying that they are characterized by higher mobility.
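The per-residue fluctuation profile discussed here is a root mean squared fluctuation (RMSF) about the average position. A minimal sketch follows, assuming one representative atom (for example Cα) per residue and frames that have already been fitted to a common reference; the toy trajectory is hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean squared fluctuation about the mean position.

    trajectory: list of frames, each a list of (x, y, z) tuples with the
    same atom order. Assumption: frames are already superimposed.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared displacement from that average.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Hypothetical two-atom trajectory: atom 0 is rigid, atom 1 oscillates.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
fluctuations = rmsf(toy_trajectory)
```

Peaks in such a profile mark the mobile regions, which is how the fluctuation plots can be matched against the variable regions of the alignment.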

Figure 7

Fluctuations of IDHs. Fluctuations of dimeric IDH. (a) E. coli (EcIDH) and (b) Sus scrofa (PmIDH). The colored regions correspond to the alignment in Figure 3 and the regions in Figure 4. Note that loops in PmIDH have helix structures within them. The numbering is continuous for the whole dimeric protein; the subunit boundary is marked by a thin black line in the centre.

Correlation plots of the two subfamilies, subfamily I and subfamily II (Figure 8 and Figure 9, also [see Additional File 4: S5]), are visually distinct. Correlated movements of large loops in the proteins of subfamily II are more dominant than those in subfamily I. The subfamily IV IDHs show a correlation pattern similar to that of S1-IDHs. This is consistent with the phylogeny data, which show subfamilies I, III and IV to be close to each other.
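The normalized correlation maps can be illustrated with the commonly used dynamic cross-correlation definition, C(i,j) = ⟨Δr_i · Δr_j⟩ / √(⟨Δr_i²⟩⟨Δr_j²⟩), where Δr is the displacement of an atom from its mean position. The sketch below assumes this definition and pre-fitted frames; it is not the ptraj implementation, and the toy trajectory is hypothetical.

```python
import math


def cross_correlation(trajectory):
    """Normalized cross-correlation matrix of atomic displacements.

    trajectory: list of frames, each a list of (x, y, z) tuples.
    Values range from -1 (anti-correlated) to +1 (correlated).
    Atoms that never move are assigned a correlation of 0.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    means = [[sum(frame[i][k] for frame in trajectory) / n_frames
              for k in range(3)] for i in range(n_atoms)]
    # Displacement of every atom from its mean position, per frame.
    disp = [[[frame[i][k] - means[i][k] for k in range(3)]
             for i in range(n_atoms)] for frame in trajectory]

    def mean_dot(i, j):
        return sum(sum(d[i][k] * d[j][k] for k in range(3))
                   for d in disp) / n_frames

    matrix = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(i, n_atoms):
            denom = math.sqrt(mean_dot(i, i) * mean_dot(j, j))
            value = mean_dot(i, j) / denom if denom > 0 else 0.0
            matrix[i][j] = matrix[j][i] = value
    return matrix


# Hypothetical trajectory: atoms 0 and 2 move together, atom 1 opposes them.
toy_frames = [
    [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
correlations = cross_correlation(toy_frames)
```

Negative entries correspond to regions moving in opposite directions, as in an open-close motion, while positive entries away from the diagonal indicate regions that move together despite being distant in sequence.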

Figure 8

Correlation map for S1-IDH. Normalized Correlation map representative for dimeric S1-IDH (E. coli). The symmetric correlation matrix has been split, with the lower triangle showing only negative values and the upper triangle showing only positive values. Numbering of residues is continuous for each dimer (1 to ~800).

Figure 9

Correlation map for S2-IDH. Normalized Correlation map representative for dimeric S2-IDH (Sus scrofa mitochondrial). The S2-IDH map has been annotated. Colored circles within the lower triangle region, representing negative correlations, show the general movement indicated in the inset image, with the color bars corresponding to the color codes in Figure 3, Figure 4 and Figure 7. The regions highlighted in the upper triangle of the matrix show the positive correlations of the loops with each other (green) and with the central region (blue). This graph was plotted using Bio3d [52] and the structure image was made in Chimera [80].

The subfamily II IDHs show prominent negatively and positively correlated motions. Both loops show strong anti-correlation with regions 605-685 (second subunit 190-270, most of the variable region), as seen in the correlation map of PmIDH (Figure 9). The nucleotide-binding loop (371-392) also shows similar correlations. Other negatively correlated regions include the N-terminal residues of both subunits with each other, suggesting a correlated hinged open-close motion. This hints at the possibility that each active site functions in tandem.

Positive correlations are seen as expected near the diagonal and in domains which are sequentially distant, but structurally close and associated, such as regions 605-684 and 190-270 both of which refer to the same region on the different subunits. Most of these correlations are either completely absent or very subdued in S1 type IDHs.

Among subfamily II IDHs, the movement of the NADP-binding loop is pronounced in mitochondrial enzymes, such as PmIDH and YmIDH, and subdued in HcIDH [see Additional file 4: S5]. The Mycobacterium MtIDH1 model was constructed based upon pig PmIDH as a template. However, the correlations of the loops are smaller in the MtIDH1 model than in PmIDH. The NADP discriminating loop, in particular, has much smaller correlations. The cytosolic Human IDH shows very low negatively correlated motion for the NADP discrimination loop with respect to the central domain, in both the active [PDB:1T0L] and inactive [PDB:1T09] forms, whereas in both PmIDH and YmIDH this correlation is very strong (~1.0). The nucleotide-binding loop has less movement in MtIDH and TmIDH than in the Eukaryotic IDHs, as the loop is shorter in the prokaryotes, as can be seen in the alignment in Figure 3.

The loops are subject to large domain motions. Principal component analysis (PCA) of the simulation data was used to see trends in the relative domain motions. The first principal component shows a very high contribution compared to the second and the third in subfamily II IDHs, while the difference is much smaller in subfamily I. In the stable sampled region (15 ns), this difference is subdued, but still discernible [see Additional file 4: S6].
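The contribution of the first principal component can be expressed as the largest eigenvalue of the coordinate covariance matrix divided by its trace. The sketch below estimates this fraction with plain power iteration; it is an illustrative stand-in and not the Pcazip or Bio3d procedure. Each sample is assumed to be one frame flattened into a coordinate vector, and the toy data are hypothetical.

```python
import math


def covariance(samples):
    """Covariance matrix of samples (one flattened coordinate vector per frame)."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    return [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
            for a in range(dim)]


def top_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix.

    Uses power iteration from a uniform start vector. Assumption: the start
    vector is not orthogonal to the leading eigenvector.
    """
    dim = len(matrix)
    vec = [1.0 / math.sqrt(dim)] * dim
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(dim))
               for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in nxt]
        value = norm
    return value


def first_pc_fraction(samples):
    """Fraction of total variance captured by the first principal component."""
    cov = covariance(samples)
    trace = sum(cov[k][k] for k in range(len(cov)))
    if trace == 0.0:
        return 0.0
    return top_eigenvalue(cov) / trace


# Hypothetical 2-D data with most of the spread along the first axis.
toy_samples = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
fraction = first_pc_fraction(toy_samples)
```

A fraction close to 1 means a single collective motion dominates, which is the pattern reported for subfamily II, whereas a flatter spectrum corresponds to the smaller difference seen in subfamily I.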

A porcupine plot [59] of the PCA movements (Figure 10) shows domain motion, which is extensive in S2-IDHs, but attenuated in S1-IDHs. The overall RMSD and gyration plots show two relatively stable regions in S2-IDHs, implying an open and a closed form, but show only one region in S1-IDHs. The transition to a more open form is seen in S2-type IDHs, while bacterial types prefer the closed form. The porcupine plot of motions along the first principal component highlights this transition. Subfamily II IDHs have a pronounced open-close motion, which appears to compensate for the hindrance to entry into the active site that results from the large loops.

Figure 10

Principal Component analysis. Porcupine plots [59] for (a) EcIDH and (b) PmIDH. Only Cα atoms are shown for the first PCA mode. The loop present at the top and bottom of the structure is the ARS region. Subfamily I shows localized loop motion in a rotatory fashion around the central domain. Subfamily II shows tandem motion: as one site closes, the other opens. The loops are mobile, and may play a role in guiding substrate and cofactor to the active site. The summary plots are provided [see Additional file 4].

Subfamily I IDHs do not show this pronounced motion, and the side domains tend to rotate sideways in opposite directions with respect to the central domain. Subsequent PCA modes in PmIDH show pronounced movement of loop 2, the NADP discriminating loop, and movement of the other loops as well. These motions are consistent with what is observed in the correlation plots. The loop regions move towards the region 605-685, which consists of the domain across the opening to the active site.

The motions of the loops appear to effectively open and close the active site (Figure 10). The Complementary regions I and II are so-named because they may explain the differences in the hinge-like motion between subfamilies I and II. Subfamily I has larger CR1 and correspondingly smaller CR2. In contrast, subfamily II has larger CR2 and correspondingly smaller CR1, while subfamily IV is short in both regions. While sequentially distant, these two regions are structural neighbours of each other. They are located close to the hinge region, and may modulate the differences in motion between the subfamilies I and II.

The results show that the modes of action of subfamily I and subfamily II are distinctly different. Although the enzyme has the same basic function, these differences correlate with its overall function in the biochemical pathways of the organism. The loop movements in subfamily II may be exploited for regulation by modulation of the enzyme in eukaryotes, where the enzyme is not involved in respiration, while the ARS region may be exploited for regulation in subfamily I, especially if the enzyme is involved in the respiratory TCA cycle.

Discussion

Phylogeny

Subfamily II IDHs include Eukaryotic IDHs and some bacterial IDHs. Thermotoga maritima and Desulfotalea IDHs, along with some others such as those of Clostridia, form one basal group of bacterial S2-IDHs. The other group of bacterial S2-IDHs consists of alphaproteobacterial IDHs and Actinobacterial IDHs from Bifidobacteria and Actinomycetales. These are closer to the isozymes of Eukaryotes, and many organisms within this subgroup are either endosymbionts or cellular pathogens.

The alphaproteobacterial members, such as Rhizobium IDH [60], the recently resolved Sinorhizobium meliloti [PDB:3US8], Brucella, Bradyrhizobium and Paracoccus have IDHs most closely related to their Eukaryotic homologs, while Actinobacteria like Mycobacteria are more distant. This similarity is in agreement with the Endosymbiont theory of evolution [61, 62] which states that mitochondria evolved from alphaproteobacterial endosymbionts sharing a close common ancestor with Rhizobia and Rickettsia.

The phylogenetic analysis answers an immediate question: what is the reason for the similarity between M. tuberculosis IDH1 and host IDH? This similarity is not a result of gene exchange between host and parasite, and a clear pathway can be traced through evolution. Many alphaproteobacteria, such as Rhizobium, show close common ancestry with eukaryotic mitochondria, while others like Rickettsia have an NAD-dependent IDH of subfamily IV which appears to be close to the subfamily III IDHs present in mitochondria. Most α-proteobacterial IDHs are subfamily II NADP-dependent IDHs, while some are NAD-dependent IDHs which are close to subfamily III or IV. This implies that IDH is one of several proteins, such as kinases [63], within the proteome of these organisms which can be termed eukaryotic-like. Eukaryotic-like genes may aid pathogenesis [64] and endosymbiosis.

Activity regulation

Some important active site residues are listed in Table 2 and can be grouped as those interacting with the substrate isocitrate and those involved in interactions with the cofactor. Residues associated with isocitrate binding [65, 66] are conserved in most IDHs. Among them, S113 and T105 in E. coli IDH are involved in anchoring the substrate isocitrate within the active site. S113 is also the target of phosphorylation in E. coli regulation [66, 67]. The Phos loop is the loop between and including these two residues. This loop is considerably larger in S2-group IDHs, hindering kinase phosphorylation [15, 57, 58]. The larger loop in subfamily II has a prominent α-helix (see alignment in Figure 3 and color-coded regions in Figure 4).

Table 2 Active site residues.

Residues K344 and Y345 in E. coli IDH are NADP-binding residues found to have a strong role in cofactor specificity [10]. The mutant K344D, Y345I makes the enzyme NAD-specific, incapable of using NADP as a cofactor [68]. The loop on which these residues are present is thus called the NADP-Discriminating loop, and the residues in this position can be used to distinguish NADP specificity vs. NAD specificity, making this fact a useful classification criterion [69].

The replacement of positively charged K with negatively charged D is thought to change the interaction with the electronegative phosphate of NADP [68]. This mutation (KY to DI) mimics the residues found in NAD-dependent IDHs in subfamily III and IMDH [68]. Most NADP-dependent IDHs from subfamily I and IV have K and Y, while those of subfamily II have R and H. Monomeric type IDHs and some subfamily I IDHs have K and H, responsible for high NADP-specificity [70]. There are however IDHs with DI in all four subfamilies, mostly at the basal level. The third loop or the nucleotide-binding loop has residues which anchor and guide the nucleotide base of the cofactor [10].
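The residue pairs described above can be turned into a simple lookup for a first-pass cofactor prediction. This is only a heuristic sketch of the classification criterion: the mapping restates the pairs named in the text (K/Y, R/H, K/H and D/I at the positions equivalent to E. coli K344 and Y345), the function and dictionary names are hypothetical, and any other pair is left unclassified rather than guessed.

```python
# Residue pairs at the positions equivalent to E. coli IDH K344 and Y345,
# as described in the text. Labels are descriptive, not authoritative.
DISCRIMINATING_PAIRS = {
    ("K", "Y"): "NADP (typical of subfamilies I and IV)",
    ("R", "H"): "NADP (typical of subfamily II)",
    ("K", "H"): "NADP (monomeric type and some subfamily I)",
    ("D", "I"): "NAD",
}


def predict_cofactor(first: str, second: str) -> str:
    """Heuristic cofactor call from the two NADP-discriminating residues.

    Returns "unclassified" for pairs not covered by the table above.
    """
    key = (first.upper(), second.upper())
    return DISCRIMINATING_PAIRS.get(key, "unclassified")


wild_type_call = predict_cofactor("K", "Y")   # E. coli K344/Y345
mutant_call = predict_cofactor("D", "I")      # K344D/Y345I double mutant
```

Such a lookup requires the two positions to be identified from a reliable alignment first, which is one reason the structure-guided alignment matters for classification.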

The three loops are therefore important for modulating the activity of the enzyme, and may provide clues to its mechanisms of activity. These loops may regulate the entry of substrate on their own, help guide the substrate and cofactor to the active site, or discriminate between similar cofactors (for example, by selecting NADP over NAD), and thus contribute towards tuned regulation, depending on the function of the enzyme within the biochemical pathways of the organism.

Known regulation mechanisms for NADP IDHs include transcription control [71], inhibition by NAD(P)H or ATP (TCA feedback), concerted inhibition by glyoxylate and oxaloacetate [72], phosphorylation by kinase [11], glutathione inhibition [73], specific changes in secondary structure as in Human cytosolic IDH [15], or allosteric regulation as in yeast subfamily III IDH [56]. In eukaryotes, these can be quite different in each case, as isoenzymes may be present for different tasks.

The three loops i.e., the Phos loop, NADP discriminating loop and third nucleotide-binding loop, are prominent with α-helices in subfamily II IDHs. Eukaryotic IDHs have evolved as paralogs within the same cell, within different organelles, and adapted to different biochemical feedback mechanisms. Modulation of the movement of these loops is likely to affect the activity of these enzymes.

Mitochondrial subfamily II IDHs (PmIDH and YmIDH) show anti-correlated motions in all three loops with the domains, while cytosolic IDH (HcIDH) does not show the correlation in the NADP-discrimination loop. However, the first loop shows anti-correlated movement. The cytosolic enzyme may be subjected to feedback concerning the substrate isocitrate.

In mitochondria, the NADP-dependent isoenzymes of subfamily II compete with the efficient NAD-dependent subfamily III enzymes for isocitrate. The substrate is plentiful in mitochondria, so the relative availability of the cofactors NADP and NAD becomes the regulating factor to which subfamily II IDHs may respond.

Sequence lengths within subfamily I are variable. E. coli IDH has a length of 416 residues and B. subtilis IDH is 423 residues long, while Nostoc sp. IDH [Uniprot:IDH_NOSS1] has 471 residues. Most of these differences lie in the ARS of E. coli IDH or in the corresponding ARS-like region [22]. The ARS region in E. coli IDH plays a role in assisting the AceK kinase to phosphorylate its target S113 [22, 74]. The same region in B. subtilis IDH forms a fairly rigid helical hairpin structure which prevents AceK from acting on BsIDH [21].

Subfamily I may be divided into subgroups by their variable regions alone (Figure 6). Assuming the variable region is defined as EcIDH residues 239-275, the length of this region correlates with different families of bacteria. Gram-negative proteobacteria such as E. coli, Burkholderia pseudomallei, Helicobacter pylori and Coxiella burnetii share the structure seen in EcIDH and BpIDH, which is ~36 residues long. These may follow the classic regulation by the kinase AceK seen in E. coli (Class A [22]). Gram-positive bacteria such as B. subtilis [21], and the NAD-dependent Acidithiobacillus thiooxidans IDH [13], all show a large helical hairpin of ~49 residues (Class C [22]). Archaea such as Aeropyrum pernix [75], Sulfolobus tokodaii and Archaeoglobus fulgidus [76] have a short loop with a short helix, of ~37 residues (Class D [22]). In Nostoc, the region is ~84 residues long. Nostoc [Uniprot:IDH_NOSS1] requires IDH for a different role, i.e., nitrogen fixation [77], so its regulation is likely to differ. Aquifex aeolicus IDH has ~32 residues, representing another type of system. The Streptococcus mutans IDH has the shortest sequence in S1.
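The length comparison above depends on mapping a residue range of a reference sequence (here EcIDH 239-275) onto the other sequences of an alignment. A minimal sketch of that bookkeeping is given below, assuming a pairwise slice of a multiple sequence alignment with '-' as the gap character. The function name region_length and the toy sequences are hypothetical and are not real IDH data.

```python
def region_length(reference_aln, target_aln, start, end, gap="-"):
    """Count target residues aligned within reference positions start..end.

    reference_aln and target_aln are two rows of the same alignment
    (equal length, gaps as '-'). start and end are 1-based, inclusive
    residue numbers in the ungapped reference sequence.
    """
    if len(reference_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0  # residue number reached so far in the reference
    first_col = last_col = None
    for col, ref_char in enumerate(reference_aln):
        if ref_char == gap:
            continue
        ref_pos += 1
        if ref_pos == start:
            first_col = col
        if ref_pos == end:
            last_col = col
            break
    if first_col is None or last_col is None:
        raise ValueError("region lies outside the reference sequence")
    # Residues of the target between the two boundary columns, gaps excluded.
    segment = target_aln[first_col:last_col + 1]
    return sum(1 for ch in segment if ch != gap)


# Toy alignment (hypothetical sequences, not real IDH data).
ref = "MKT--AYLLGE"
tgt = "MKTPPAY--GE"
ref_len = region_length(ref, ref, 3, 5)  # reference residues 3..5
tgt_len = region_length(ref, tgt, 3, 5)  # same columns in the target
```

Here the reference region spans three residues, while the target carries a two-residue insertion within the same columns and therefore has five. Applied to a real alignment with start=239 and end=275 on the EcIDH row, the same counting would give the region length for each IDH, subject to the quality of the alignment at the region boundaries.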

Subfamily II IDHs do not show large variations in the length of the ARS-like region, and in S4-IDHs this region is very short. This indicates that the region may have little direct influence on actual enzymatic activity, but may serve in protein-protein interactions concerned with bacterial regulation, as seen in E. coli IDH [20].

Within subfamily II, bacterial IDHs are differentiated from the eukaryotic ones by the length of the nucleotide-binding loop region. The nucleotide-binding loop has a conserved α-helix with a conserved threonine and aspartate (T390 and D392 in EcIDH), and residues around them contribute to cofactor binding [10] and specificity [69]. The nucleotide-binding loop is longer in subfamily II IDHs than in subfamily I, and within subfamily II, bacterial IDHs have shorter loops than eukaryotic IDHs. This makes the helix more mobile in eukaryotic IDHs than in bacterial IDHs.

Conclusions

Implications for Mycobacterium tuberculosis

NADP-dependent IDHs take part in the TCA cycle, and there is provision for a glyoxylate bypass. The ARS region has been shown to play a role in the regulation of IDH in E. coli, and the variation in structure of this region implies similar roles in other IDHs as well. Subfamily II bacterial NADP-dependent IDHs from organisms with a functional glyoxylate cycle, such as Mycobacterium tuberculosis IDH1 [78], perform a function in the bacterial cell similar to that of subfamily I bacterial IDHs. This implies that they may also utilize the ARS-like region, as comparable bacterial IDHs do.

Metabolic flux analysis [79] of the pathway indicates that inactivation of IDH is required for the glyoxylate cycle to function. The kinase responsible for inactivation, PknG, and its target, S213, were determined previously [26]. An attempt was made in a previous study [27] to decipher the effects of phosphorylation of the target serine in comparison with other likely targets. However, it was also found that the target serine remained buried throughout the short 5 ns simulation, and extending the simulation to 30 ns did not result in any exposure of the residue.

The serine residue lies below the variable region helix of the model structure. Correlation plots of all S2-IDHs show a square region, containing the ARS-like region and the adjacent helix, which has high positive correlations and negligible or no negative correlations. For the MtIDH1 model, this same square contains prominent negative correlations, and S213 seems to show this tendency as well with respect to the corresponding residues in the other subunit (Figure 11). Compared with the PmIDH template, this tendency for movement may be attributed to two features: a greater proportion of acidic residues (such as a stretch of three glutamates), on the surface of the modelled structure and particularly in these loops, and the replacement of bulky aromatic residues such as W with the smaller polar residue T at a critical position near S213. The large proportion of negative charges may lead to frustration in the region.

Figure 11

Correlation map for MtIDH1. The region around S213, including the ARS-like region just above it, shows negative correlations not seen in any S2-type IDH simulated here. The ARS-like region in particular shows negative correlations, as do S213 and its immediate vicinity. This movement may be biologically relevant, as it does not appear in any other IDH simulation, particularly those of S2-IDHs, and is unlikely to have arisen by chance.
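The visual comparison of a square region between correlation maps can be complemented by simple summary numbers. The sketch below is offered only as an illustration, not as the procedure followed in this work. It reports the mean, the minimum and the fraction of clearly negative values inside a chosen block of a correlation matrix. The function name block_summary, the cutoff of -0.25 and the toy matrix are assumptions. The residue indices of the ARS-like region and of S213 would have to be supplied by the user.

```python
import numpy as np


def block_summary(corr, rows, cols, threshold=-0.25):
    """Summarize one rectangular block of a correlation map.

    rows and cols are (start, stop) index pairs, 0-based, stop exclusive.
    threshold is an arbitrary cutoff (an assumption of this sketch) below
    which a value is counted as a negative correlation.
    """
    block = np.asarray(corr, dtype=float)[rows[0]:rows[1], cols[0]:cols[1]]
    return {
        "mean": float(block.mean()),
        "min": float(block.min()),
        "fraction_negative": float((block < threshold).mean()),
    }


# Toy 4x4 map (hypothetical values): residues 0-1 and 2-3 form two groups
# that are correlated internally and anti-correlated with each other.
toy_map = np.array([
    [1.0, 0.8, -0.6, -0.5],
    [0.8, 1.0, -0.4, -0.7],
    [-0.6, -0.4, 1.0, 0.9],
    [-0.5, -0.7, 0.9, 1.0],
])
within = block_summary(toy_map, (0, 2), (0, 2))
between = block_summary(toy_map, (0, 2), (2, 4))
```

For the toy map, the block within the first group contains no values below the cutoff, whereas the block between the two groups lies entirely below it. Computing the same summary for the equivalent block in each simulated IDH would allow the MtIDH1 model to be compared with the other S2-IDHs on a common scale.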

Homology modelling, MD simulations and phylogenetic analysis of an important class of enzymes in the metabolic pathway provide clues to the possible mechanism of phosphorylation and functional inactivation of M. tuberculosis IDH in persistent bacteria, leading to the opening of the shunt pathway. Selective, biologically relevant movements of the ARS-like region and the nucleotide-binding loop need to be explored further in the context of the regulation and performance of these enzymes.