Tuberculosis continues to be a devastating pathogen throughout the world, particularly in developing nations. In 2001, the World Health Organization (WHO) estimated 8.5 million new cases of tuberculosis (based on 3.8 million new reported cases) and an estimated 1.8 million deaths from tuberculosis in 2000 [1]. Within the United States, the number of reported cases of tuberculosis has been decreasing, with the exception of a period when the trend reversed in 1986 and peaked in 1992 [2, 3]. This reversal has been attributed principally to HIV/AIDS, immigration from countries with high prevalence of tuberculosis, poverty, homelessness, and multidrug-resistant (MDR) tuberculosis [1, 2]. MDR tuberculosis is generally defined as strains that are resistant to treatment with isoniazid and rifampin [4], two of the key first line antituberculosis drugs [1]. MDR strains of tuberculosis emerged in the early 1990s and have now been found all over the world [4].

Many of the unique properties of tuberculosis are attributable to its metabolism, particularly the complex fatty acids characteristic of the organism. These mycolic acids, phenolic glycolipids, and mycoceric acids confer many of its properties, such as acid-fastness, and are believed to contribute to the resilience of the organism. Mycobacterium tuberculosis can survive in a wide range of environments (many different tissues) and at fairly extreme pHs [5]. One of the most confounding properties of these bacteria is their ability to survive for long periods of time in a dormant stage. The slow doubling time of tuberculosis has further limited the amount of experimental data that can be generated. Many of the first and second line drugs used to treat tuberculosis have metabolic targets, so the development of systems level models of metabolism is anticipated to be of great use in the future.

DNA sequencing of the ~4.4 Mbp genome of Mycobacterium tuberculosis H37Rv (M. tb) in 1998 [6] enabled the pursuit of genome-scale analyses of this microorganism. Its relevance to world health and disease control, together with the need to understand the metabolic function of the organism, calls for a genome-scale metabolic model. Long-term anticipated goals and applications of such models are to understand the growth of mycobacteria under different conditions, to identify strategies to improve growth in vitro (for experimental and diagnostic purposes), and to identify new drug targets for treatment.

In order to gain understanding about the unique characteristics of this important pathogen, we manually reconstructed the metabolic network of M. tb in silico (iNJ 661), from which we developed a model to perform computational analyses and interpret experimental data. These bottom-up reconstructions have been described in the past as biochemically, genetically and genomically structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about its normal metabolic function and to infer new potential targets for drugs.

Results and discussion

The reconstruction process has been described previously [7, 8]; Figure 1 summarizes this process in brief. The network statistics for iNJ 661, which has 661 genes and 939 intra-system reactions, are summarized in Table 1. A biomass objective function was defined from published chemical composition studies of M. tb H37Rv [9-13], with measurements from other mycobacterial strains used where information was lacking. When such information was not found for the specific strain, Mycobacterium bovis was used (for example, to approximate the biomass composition of nucleotides and peptidoglycans [14]). The simple fatty acid compositions were based on studies of the mycobacterial cell wall [15]. The biomass function includes 90 metabolites in addition to growth-associated maintenance ATP costs (see Table 2). [See Additional file 1] for more detailed information.

Table 1 The confidence level for each reaction is based on a scale from 0 to 4, 4 being the highest level of confidence (experimental biochemical evidence supporting the inclusion of a reaction) and 1 being the lowest level of confidence (inclusion of a reaction solely on modeling functionality). Sequence based annotations have a confidence level of 2.
Figure 1

The Gene Index for Mycobacterium tuberculosis H37Rv was downloaded from The Institute for Genomic Research (TIGR) [45]. Reconstruction content was defined based on the sequence annotation, legacy data, the Tuberculist database [31], and ancillary sources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and SEED [47]. Reactions were defined according to the Tuberculist website and KEGG. Legacy data was also used in the process of building the model from the reconstruction. The manual curation aspects of the reconstruction are outlined above and discussed in further detail in [42]. Debugging was started once the first draft of the reconstruction was completed and functional testing (i.e. flux balance analysis calculations, etc.) had begun.

M. tb growth in silico

Flux Balance Analysis (FBA) was used to grow iNJ 661 in silico by maximizing the biomass function defined in Table 2 (see Materials and Methods). iNJ 661 was grown in silico on three different types of media: Middlebrook 7H9 (supplemented with glucose and glycerol) [16], Youmans [17], and the chemically defined rich culture media (CAMR) [18]. Uptake fluxes which define the different media conditions are shown in Table 3, and the computed results under the different conditions are summarized in Table 4. The doubling times are within the range described in the literature [18, 19] and, as expected, the growth rates are higher with the richer media. The minimum doubling time of 14.7/hr described by [18] is within the maximum growth capabilities of iNJ 661.
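The relationship between a computed growth rate and a doubling time follows the standard exponential-growth identity, doubling time = ln(2) / growth rate. The sketch below illustrates that conversion only. It assumes the optimized biomass flux is interpreted as a specific growth rate in 1/hr, and the function names and the example rate are illustrative choices, not values taken from Table 4.

```python
import math


def doubling_time(growth_rate_per_hr):
    """Doubling time (hr) for exponential growth at a specific rate (1/hr)."""
    if growth_rate_per_hr <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / growth_rate_per_hr


def growth_rate(doubling_time_hr):
    """Specific growth rate (1/hr) corresponding to a doubling time (hr)."""
    if doubling_time_hr <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_hr


# Illustrative growth rate only; not a value reported in the paper.
example_rate = 0.05
print(round(doubling_time(example_rate), 2))
```

The two functions are inverses of each other, so a literature doubling time can be converted to a rate and compared directly with the in silico optimum.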

Table 2 The list of biomass components for the initial biomass function. [See Additional file 1] for the additional components added for the extended biomass function and [see Additional file 5] for the explicit names and molecular formulas.
Table 3 *Glucose and glycerol supplementation were added as described in [20].
Table 4 Summary of iNJ 661 biomass production rates (in mmol/hr/g dwt) and doubling times (1/hr) in silico on different media. * from [19], ^ from [18].

As noted by Primm et al [20], M. tb growth on Middlebrook 7H9 media requires supplementation with glucose and glycerol. Glycerol is a component of the biomass function (Table 2) according to the chemical composition measurements performed by [14] on Mycobacterium bovis. In order to grow in silico, therefore, either glycerol must be in the medium or the cell must have some way of producing it from other precursors. iNJ 661 was able to grow when glucose uptake was abrogated. However, it always required glycerol supplementation in the media, which could be due to the biomass requirement or to the need for glycerol in the production of a different essential metabolite. The growth dependence on glycerol and glucose uptake can be seen in the phase plane diagram depicted in Figure 2a. In this figure, the biomass objective function is optimized while the glucose and glycerol uptake fluxes are varied in different combinations. The resulting plot shows how the growth capability of iNJ 661 varies as a function of glucose and glycerol uptake. The solid black lines depict isoclines, along which growth is constant [21]. Further investigations were carried out in order to understand the essentiality of glycerol and non-essentiality of glucose in iNJ 661. Since flux distributions calculated by FBA are non-unique, flux variability analysis (see Methods) can be used to calculate the span of each flux while still achieving the maximum value for the objective function. Reactions in which the maximum flux equals the minimum flux, and that flux is non-zero, are essential for achieving the particular objective (no alternative pathways exist). Flux variability analysis [22] at maximum growth on Middlebrook media with glucose and glycerol supplementation (Table 2) was performed in order to identify the constraining flux(es) that prevent growth on glucose but not glycerol.
The only non-zero fluxes with no flexibility that directly involve the metabolism of glycerol were the glycerol transporter (GLYCt) and glycerol kinase (Rv3696c, GlpK, GLYK). The production of glycerol-3-phosphate via glycerol kinase is essential for membrane and fatty acid metabolism involving fatty-acyl glycerol phosphates and interconversions between CDP-diacyl-glycerol and phosphatidyl-glycerol phosphates. These results are consistent with the importance of membrane metabolism in biomass production.
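The criterion used above (minimum flux equal to maximum flux, and non-zero) can be applied mechanically once flux variability ranges are available. The following is a minimal sketch of that filtering step only; it does not perform the underlying optimizations. The flux ranges are invented for illustration, and `RXN_A` and `RXN_B` are hypothetical placeholder reactions, while `GLYCt` and `GLYK` are the reaction identifiers named in the text.

```python
def rigid_nonzero_fluxes(fva_ranges, tol=1e-9):
    """Return reactions whose min and max flux coincide at a non-zero value.

    fva_ranges maps a reaction id to a (min_flux, max_flux) pair computed
    at the optimal objective value.
    """
    rigid = []
    for rxn, (vmin, vmax) in fva_ranges.items():
        no_flexibility = abs(vmax - vmin) <= tol
        carries_flux = abs(vmin) > tol
        if no_flexibility and carries_flux:
            rigid.append(rxn)
    return sorted(rigid)


# Hypothetical flux variability output; the numbers are illustrative only.
fva = {
    "GLYCt": (1.0, 1.0),   # fixed, non-zero: required for the objective
    "GLYK": (1.0, 1.0),    # fixed, non-zero: required for the objective
    "RXN_A": (0.0, 2.5),   # flexible: alternatives exist
    "RXN_B": (0.0, 0.0),   # fixed at zero: not used at the optimum
}
print(rigid_nonzero_fluxes(fva))
```

A tolerance is used rather than exact equality because the bounds would normally come from a numerical solver.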

Figure 2

a. Phase plane diagram of glycerol versus glucose uptake while optimizing for growth on Middlebrook media. iNJ661 can grow with glycerol serving as the sole carbon source, but not glucose. The open dots are the calculated phase points and the solid black lines indicate isoclines. b. Phase plane diagram for phosphate versus sulfate uptake on Middlebrook media. Although the transport fluxes are very small, they are both necessary for the organism to be able to grow.

The non-zero fluxes with no flexibility directly involving glucose were the glucose transporter (GLCabc, SugABC, Rv1236+Rv1237+Rv1238) and glucokinase (PPGKr, PpgK, Rv2702). Glucose-6-phosphate is subsequently essential for a number of other reactions involving various aspects of membrane metabolism, including the synthesis of trehalose phosphate (TRE6PS, OtsA, Rv3490) and inositol-phosphate (MI1PS, Ino1, Rv0046c), in addition to phosphoglucomutase (PGMT, PgmA, Rv3068c). These three reactions are required not only for optimal growth; they are in fact essential for any growth at all in silico. Only one of these, however, has been suggested to be essential experimentally (Rv3490) [23]. Growing iNJ 661 in Middlebrook media (with glycerol but without glucose supplementation) severely retards the growth rate; however, the model is still viable in silico, in contrast to M. tb growth in vitro. The discrepancy between the in vitro essentiality of glucose (but non-essentiality of reactions requiring glucose derivatives) and the in silico non-essentiality of glucose suggests that there may be non-metabolic functions related to the need for glucose, in addition to alternative metabolic pathways for the production and/or use of glucose-6-phosphate.

Figure 2b characterizes the growth dependence on the uptake rates of inorganic phosphate and sulfate. Limiting uptake of either of these molecules will inhibit growth in silico. Although the uptake fluxes are small compared to the uptake of the carbon sources and oxygen, both are required for growth of M. tb. Consequently, the sulfate and phosphate transporters may serve as bacteriostatic and possibly bactericidal drug targets. A robustness analysis (see Methods) was carried out for biomass production by iNJ 661 with respect to phosphate and sulfate [see Additional file 7]. This was compared to a robustness analysis performed for the in silico E. coli strain iJR 904 [24] under aerobic glucose conditions [see Additional file 7]. The sensitivities of the respective biomass production rates to the uptake fluxes are given by the slopes of the plots for each organism. The sensitivity to the phosphate uptake rate is approximately 1 for both iNJ 661 and iJR 904. However, the slope for sulfate uptake in iNJ 661 is approximately four times that of iJR 904 (about 16 for iNJ 661 compared to about 4 for iJR 904). This difference reflects the relatively larger amount of sulfur as a percent of total biomass in M. tb compared to E. coli, which is consistent with knowledge of M. tb's composition [25]. Collectively, the phase planes and robustness plots suggest that M. tb may be much more sensitive to sulfate depletion than E. coli.
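The sensitivity comparison above amounts to estimating the slope of biomass production against uptake flux for each organism and taking the ratio. The sketch below shows one way to do that with an ordinary least-squares slope. The data points are synthetic, generated from the approximate slopes quoted in the text (16 and 4) rather than taken from Additional file 7, and the calculation is only meaningful in the nutrient-limited, linear part of a robustness curve.

```python
def least_squares_slope(points):
    """Least-squares slope of y against x for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic robustness curves (uptake flux, biomass production rate).
# Built from the approximate slopes in the text; not measured data.
uptakes = [0.000, 0.001, 0.002, 0.003]
tb_curve = [(u, 16.0 * u) for u in uptakes]
ecoli_curve = [(u, 4.0 * u) for u in uptakes]

sensitivity_ratio = least_squares_slope(tb_curve) / least_squares_slope(ecoli_curve)
print(round(sensitivity_ratio, 2))
```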

A remarkable property of M. tb is its ability to grow in culture with carbon monoxide (CO) serving as the sole carbon source [26, 27]. Unlike other mycobacteria that can grow with CO as the sole carbon source, M. tb lacks ribulose-bisphosphate carboxylase (RuBisCO), an enzyme found in plants that enables fixation of carbon dioxide. Although it was argued that a reductive citric acid cycle may enable the fixation of carbon dioxide by M. tb [27, 28], it was observed experimentally that citrate did not affect M. tb growth on CO [27]. We did not expect iNJ 661 to grow on CO, since the only experimentally characterized CO-associated metabolism in M. tb was CO dehydrogenase (CODH). The uptake of glycerol versus the uptake of CO while optimizing for growth can be seen in Figure 3. CO uptake affects the biomass optimization only during minimal uptake of glycerol, over a slight range. The contribution being made in this region is the donation of protons (through the oxidation of CO to CO2). Looking further into this issue, we tested the hypothesis that RuBisCO would enable growth on CO (even though M. tb is not known to have it). We added RuBisCO, in addition to ribose 1-phosphokinase and ribose 1,5-bisphosphate isomerase (to complete the pathway), and optimized for growth on a modified Middlebrook 7H9 media (glycerol and glucose were removed and CO was added). iNJ 661 was unable to grow under these conditions, despite having been given the ability to fix CO2 with ribose sugars. This observation, in conjunction with Figure 2a, suggests that one possibility for how M. tb is able to grow using CO as the carbon source is that it somehow fixes CO into a glycolytic intermediate by a route that does not appear to involve a RuBisCO pathway or a reductive citrate cycle.

Figure 3

Biomass optimization on Middlebrook media without glucose. Carbon monoxide uptake can affect the growth at very low glycerol uptake rates via carbon monoxide dehydrogenase, which generates protons through the oxidation of carbon monoxide to carbon dioxide.

A context for content

This BiGG reconstruction can be used as a 'context for content' in analyzing large biological datasets. Multiple investigators have employed high-throughput data analysis methods to characterize M. tb in different environmental conditions. Sassetti et al [23] used transposon site hybridization (TraSH) to identify the genes needed for optimal growth of M. tb in vitro (referred to as the OptGro dataset throughout the rest of the text), Gao et al [29] identified a set of genes that are consistently expressed under different growth conditions in liquid culture (referred to as the ConstExp dataset throughout the rest of the text), and a 2003 study by Sassetti [30] investigated the set of genes needed for M. tb survival in vivo during infection in mice (referred to as the Infect dataset). The degree of overlap between the three datasets can be seen in Figure 4A; only 2 genes are shared by all three sets, which highlights the variability of gene expression across the different experimental datasets. Although significant differences between in vivo and in vitro patterns of gene expression may be expected, the OptGro and ConstExp datasets were both generated in vitro. The heterogeneity among the datasets suggests that defining an objective function based on in vitro data may not necessarily translate to useful results in vivo (it may constrain the solution space from the wrong directions).

Figure 4

Summary of experimental datasets and their overlap with iNJ 661 and the iNJ 661 gene deletion studies. OptGro refers to the dataset described in [23], ConstExp refers to the dataset described in [29], and Infect refers to the dataset described in [30]. ^ denotes the subset of genes from the experimental dataset contained in iNJ 661. iNJ 661 represents the 661 genes in the reconstruction. iNJ 661 optimal growth represents the subset of genes required for optimal growth with the original objective function. iNJ 661 optimal growth* represents the subset of genes required for optimal growth with the objective function expanded to include vitamins and cofactors. iNJ 661 alternative objective represents the set of genes required for optimal growth using the objective function constructed based on the ConstExp dataset. Panel A: The Venn diagram shows the overlap between the different experimental gene expression datasets. The accompanying chart summarizes the total number of genes in each dataset and how many of those genes are found in the iNJ 661 reconstruction. Panel B: Series of Venn diagrams summarizing the results for the gene deletion studies carried out on iNJ 661 compared to the experimental gene expression datasets.

We evaluated these three datasets for M. tb and compared them to gene deletion analyses for iNJ 661, in which each gene was individually deleted and growth was optimized on Middlebrook media. Any growth rate that fell short of the maximum wild type growth was considered 'sub-optimal'. Table 5a summarizes the 114 false negative results (iNJ 661 gene non-essentiality, but in vitro gene essentiality; 50%) between iNJ 661 and the OptGro dataset (for the full results [see Additional file 4]). The inconsistencies were considered individually and classified into four categories: Not in Objective Function (NOF), Alternate Route (AR), Alternative Locus (AL), and Not Essential (NE). These classifications are not mutually exclusive and were defined to make the results easier to navigate and interpret. A gene was placed in the NOF category if it was identified to be in a biosynthetic pathway (based on network connectivity) for a particular metabolite, such as a vitamin or co-factor. Genes were categorized as AR or AL if there was an alternative biochemical route or an alternative gene annotation, respectively, that could carry out that reaction in iNJ 661 but not in M. tb. Potential AR or AL classifications reflect one of the potential errors due to sequence based annotations, so further biochemical characterization of the gene products would help resolve these cases. Genes that did not fall in these categories were classified as NE. Many of these loci were associated with fluxes in which the flux variability was uniformly zero (both maximum and minimum value).
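The comparison described above can be expressed as two steps: flag every deletion whose optimized growth falls short of the wild type, then compare the flagged set against the experimental set using the definitions of false negative and false positive given in the text. The sketch below assumes the knockout growth rates have already been computed; the gene names and rates are hypothetical placeholders, not results from iNJ 661.

```python
def required_for_optimal_growth(wild_type_rate, knockout_rates, tol=1e-9):
    """Genes whose deletion gives sub-optimal growth (below wild type)."""
    return {g for g, rate in knockout_rates.items()
            if rate < wild_type_rate - tol}


def compare_with_experiment(model_genes, in_silico_required, in_vitro_required):
    """Classify model genes by agreement with an experimental gene set.

    False negative: not required in silico, but required in vitro.
    False positive: required in silico, but not required in vitro.
    Experimental genes absent from the model are ignored.
    """
    in_vitro = in_vitro_required & model_genes
    return {
        "agree_required": in_silico_required & in_vitro,
        "false_positive": in_silico_required - in_vitro,
        "false_negative": in_vitro - in_silico_required,
        "agree_not_required": model_genes - in_silico_required - in_vitro,
    }


# Hypothetical knockout growth rates; wild type grows at 0.05.
knockout_rates = {"geneA": 0.0, "geneB": 0.03, "geneC": 0.05, "geneD": 0.05}
in_silico = required_for_optimal_growth(0.05, knockout_rates)
in_vitro = {"geneA", "geneC", "geneX"}  # geneX is not in the toy model

result = compare_with_experiment(set(knockout_rates), in_silico, in_vitro)
for label in sorted(result):
    print(label, sorted(result[label]))
```

Restricting the experimental set to genes present in the model mirrors the "^" subsets described in the Figure 4 caption.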

Table 5a Classification of False Negative results by subdivision into four overlapping categories for the gene deletion study in iNJ 661 compared to the OptGro dataset. Each row and column lists the number of genes in the respective classes. NOF: Not in Objective Function, AR: Alternate Route, AL: Alternative Locus, NE: Not Essential.

Since quantitative data was not available for vitamins and cofactors, they were not included in the original definition of the objective function (Table 2). However, after the deletion analysis, the majority of the false negative predictions in the NE category by iNJ 661 were reactions associated with the biosynthesis of vitamins and cofactors, such as dihydrofolate and tetrahydrofolate. Sixteen such metabolites were added [see Additional file 1] to the biomass function with coefficients of 0.000001 (small enough to have negligible effects on quantitative growth, but still required for growth). The genes required for optimal growth using the initial and the expanded biomass functions are compared to OptGro in the first Venn diagram of Figure 4B. Although only 16 metabolites were added, the deletion study resulted in 31 more gene loci that matched the OptGro data. This increased the deletion prediction only slightly, from 54% to 56%. The reason for the modest increase is that there were also 18 additional false positive results (iNJ 661 gene essentiality, but in vitro gene non-essentiality; from 46% to 44% for the original and expanded objective functions, respectively).

Genes categorized as AL or AR suggest that there was an error in the sequence based annotation, that there are regulatory control loops that we are not aware of, or that there was a false positive result from the experimental data (TraSH is often used as a screening tool). Sassetti et al [23] discussed one of these cases involving the Rv0505c and Rv3042c loci (serB and serB2 respectively). Both have been assigned the same function based on annotation; however, serB2 was found to be essential while serB was not. There may be many reasons why serB cannot rescue mutations/deletions of serB2, but the detection of such cases implies that having duplicate genes may not confer the degree of robustness that is often assumed. The cases with alternative loci highlight areas where further experimental investigation may provide insight into the growth capabilities of M. tb. The AL cases are enumerated in Table 5b. These loci are worth further experimental investigation to confirm the annotation and to determine the different conditions which may induce or inhibit expression. The false negative prediction for the sulfate transporter (Rv2397c, Rv2398c, Rv2399c, and Rv2400c) is an interesting case to consider, in part because it validates the observations in the phase plane discussed in Figure 2b. Additionally, it identifies an area in which a partially characterized annotation causes erroneous results. The four aforementioned loci form a protein complex in the model in order to catalyze the transport reaction. However, since there was an additional annotation identifying Rv1739c as a sulfate transporter [31], Rv1739c was associated with the reaction in addition to the existing protein complex (Rv2397c + Rv2398c + Rv2399c + Rv2400c). So, although the sulfate transporter is essential for in silico growth of iNJ 661, the lack of a biochemically characterized gene-protein relationship can result in false predictions.
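The sulfate transporter case can be made concrete by evaluating the gene-protein-reaction association as a Boolean rule: the reaction stays available if any one alternative enzyme still has all of its subunits. The sketch below encodes the association described in the text as a list of alternative enzymes, each a set of loci. The data structure and the function name are illustrative assumptions, not the representation used in the reconstruction.

```python
def reaction_active(gpr, deleted_loci):
    """True if at least one alternative enzyme has lost none of its loci.

    gpr is a list of alternatives (isozymes); each alternative is the set
    of loci that must all be present to form the functional protein.
    """
    return any(not (enzyme & deleted_loci) for enzyme in gpr)


# Sulfate transport association as described in the text:
# (Rv2397c + Rv2398c + Rv2399c + Rv2400c) or Rv1739c
sulfate_transport_gpr = [
    {"Rv2397c", "Rv2398c", "Rv2399c", "Rv2400c"},
    {"Rv1739c"},
]

all_loci = sorted(set().union(*sulfate_transport_gpr))
for locus in all_loci:
    # No single deletion removes the reaction, hence the false negatives.
    print(locus, reaction_active(sulfate_transport_gpr, {locus}))

# Losing the complex and the annotated alternative abolishes transport.
print(reaction_active(sulfate_transport_gpr, {"Rv2397c", "Rv1739c"}))
```

Under this rule every single-gene deletion leaves the reaction active, which is why the model predicts non-essentiality for each complex subunit even though the transport reaction itself is required for growth.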

Table 5b Detailed listing of the AL results for the False Negative results for iNJ 661. The first column lists a locus needed for optimal growth according to the OptGro dataset. The second column lists the reactions involved. The third column lists the possible alternative loci in iNJ 661 (commas indicate isozymes, addition symbols indicate formation of protein complexes).

The 103 false positive cases likely reflect incomplete knowledge about the network (alternative pathways exist) or an alternative objective function [see Additional file 4]. Approximately 20 cases involve amino acid pathways. Since amino acids would presumably be required for any kind of growth in lab conditions, these cases likely suggest that alternative pathways or enzymes are present in M. tb that are not in iNJ 661. Almost half of these cases (46) involve fatty acid, membrane, or peptidoglycan metabolism. Since the fatty acid composition in M. tb can change significantly over time in vitro [9], many of these false predictions are likely to be due to changes in the biomass composition (in silico simulations have fixed biomass compositions). These cases highlight areas in which experimental studies documenting the changes in composition and growth rates may yield interesting results. Five of the cases involve loci associated with the transport of phosphate, ammonium, and a sugar. These cases likely involve as yet unidentified alternative transporters in M. tb. Another case involves glycerol kinase (Rv3696c, GlpK, GLYK), which was one of the fluxes with no flexibility (see M. tb growth in silico), since M. tb in vitro requires glycerol supplementation for growth [20]. However, since the OptGro dataset implies that glycerol kinase is not required even for sub-optimal growth, there must be other reactions involving the direct metabolism of glycerol that have not yet been identified for M. tb. The ability to grow using CO as the sole carbon source and to maintain optimal growth even without glycerol kinase activity implies unique and unidentified aspects of central metabolism in the M. tb network. Further experimental investigations may provide clues towards controlling the growth of M. tb (increasing the growth rate in culture and inhibiting growth in vivo).

Out of the 194 genes identified in the Infect dataset, only 17 were shared with the OptGro dataset and 37 were shared with the ConstExp dataset (Figure 4A). Of the three experimental datasets, OptGro and ConstExp had the greatest number of genes in common, 162. However, this is still only about 27% and 29% of the genes in each dataset, respectively. The large variability between all of these datasets highlights the point that further development of Flux Balance Analysis based approaches will require the definition of alternative objective functions to biomass optimization.

Functional assessment independent of objective functions

Biomass functions can help improve predictions under well-defined conditions by constraining the steady state solution space. However, different growth conditions can appreciably alter the composition and the 'objectives' of bacteria, as reflected by Figure 4A and the discussion above. Significant changes in fatty acid composition have been observed even while growing M. tb in culture over the span of weeks [12]. These changes would be reflected as changes in the composition and the coefficients of the biomass function, consequently altering the predictions made by the model. Not all COBRA methods require the definition of objective functions [32], and identifying groups of reactions that operate together can help simplify a network and provide insight into its functionality [33]. Correlated reaction sets have been calculated using sampling [34] and flux coupling [35], and implications for classifying diseases and identifying pathway specific drug targets have been discussed [36]. Applying the same systems view of metabolic networks to pathogens such as M. tb leads to the view that single enzyme drug targets actually knock out complete pathways. As a result, terminating the activity of any other enzyme in that pathway should have the same effect. Similar to the correlated reaction sets calculated by sampling or flux coupling, hard coupled reaction (HCR) sets are defined by mass balance constraints. The definition of HCRs is stricter in the sense that it is based on metabolites with one-to-one connectivity. In contrast to the other approaches, however, it is independent of the exchange reactions with the environment and of any requirements for demand functions. Consequently, the sets can be calculated directly from the stoichiometric matrix. There is a significant degree of overlap between HCRs and the Enzyme Subsets (ES) described by Pfeiffer et al [37]. Since ESs are not constrained by 1:1 connectivity, in principle they may include additional reactions. By the same token, when the same reaction is carried out with multiple co-factors and the reactions are reversible, which is not uncommon in metabolic networks, ESs may add reactions which form loops (similar to Type III Extreme Pathways [38]). Additionally, intermediate steps in the calculation of HCRs allow the inclusion or removal of potentially reversible reactions to be considered.
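One plausible reading of the 1:1 connectivity rule is that a metabolite with exactly one producing and one consuming reaction forces those two reactions to carry flux in a fixed ratio at steady state, and chains of such metabolites merge reactions into a set. The sketch below implements that reading on a toy network. It is an illustration under stated assumptions, not the authors' procedure: reactions are treated as irreversible in the direction written, the 0:2/2:0 reversible cases are ignored, and the reaction and metabolite names are hypothetical.

```python
from collections import defaultdict


def hard_coupled_sets(reactions):
    """Group reactions linked through metabolites with 1:1 connectivity.

    reactions maps a reaction id to {metabolite: stoichiometric coefficient}
    (negative = consumed, positive = produced), i.e. a sparse column of the
    stoichiometric matrix.
    """
    producers, consumers = defaultdict(list), defaultdict(list)
    for rxn, stoich in reactions.items():
        for met, coeff in stoich.items():
            if coeff > 0:
                producers[met].append(rxn)
            elif coeff < 0:
                consumers[met].append(rxn)

    parent = {rxn: rxn for rxn in reactions}

    def find(rxn):
        # Union-find root lookup with path compression.
        while parent[rxn] != rxn:
            parent[rxn] = parent[parent[rxn]]
            rxn = parent[rxn]
        return rxn

    for met in set(producers) | set(consumers):
        if len(producers[met]) == 1 and len(consumers[met]) == 1:
            parent[find(producers[met][0])] = find(consumers[met][0])

    groups = defaultdict(set)
    for rxn in reactions:
        groups[find(rxn)].add(rxn)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Toy network: A -> B -> C, then C branches to D or to E -> F -> G.
toy = {
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"C": -1, "D": 1},
    "R4": {"C": -1, "E": 1},
    "R5": {"E": -1, "F": 1},
    "R6": {"F": -1, "G": 1},
}
print(hard_coupled_sets(toy))
```

In the toy network the branch point at C has one producer and two consumers, so it does not couple anything, which is why R3 stays outside both sets.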

147 HCR sets (1:1 only; additional reversible reactions, 0:2/2:0, were not included, see Methods) were calculated for the network. The average size of each set was 2.93, with median and mode both equal to 2. The largest set, HCR 40, consisted of 13 reactions involved in Cofactor Metabolism and Porphyrin Metabolism. A summary of how the HCRs are split among the 35 subsystems in the network is given in Table 6. Using the Gene-Protein-Reaction relationships which form the foundation of the reconstruction, we mapped the HCRs back to the gene loci (Figure 5). The complete HCR set mapped to 124 loci and 61 HCRs in the OptGro dataset, to 46 loci and 34 HCRs in the ConstExp dataset, and to 21 loci and 18 HCRs in the Infect dataset [see Additional file 2]. The OptGro and ConstExp datasets shared 8 HCR sets (HCR 2, 3, 11, 12, 43, 57, 69, and 122). The intersection of these datasets identifies a subset of reactions that are consistently expressed under different conditions and also required for optimal growth in vitro. These HCR sets may reflect underlying core sets of reactions vital to growth and survival in vitro.

Table 6 HCR sets mapped to the subsystems in the network along with the size of each subsystem.
Figure 5

The stoichiometric matrix is created from the metabolic network. The HCR (Hard Coupled Reaction) sets are calculated directly from the stoichiometric matrix and mapped back to the gene loci. Gene-Protein-Reaction relationships: top box is the gene, the next box is the peptide, the oval represents the functional protein, and the bottom boxes are the reactions catalyzed by the protein.

Many of the drugs used to treat tuberculosis have metabolic targets [1], and one use of this metabolism-based reconstruction is to assist in identifying new and alternative chemotherapy targets for tuberculosis. Mdluli and Spigelman recently discussed the drug targets in M. tb reasonably comprehensively [39]. We used their list (absent the DNA synthesis and Regulatory Protein targets) and mapped the targets to the 147 HCR sets, as seen in Table 7 [see Additional file 3 for the full set]. The final column lists the number of different reactions in the HCR. Other reactions within an HCR that contains a known drug target have the potential to be alternative but metabolically equivalent targets. The drug targets mapped to 25 of the HCR sets. These sets are depicted on a reduced metabolic map of iNJ 661 (Figure 6). An image of the HCR sets mapped to the entire metabolic network will be available on our website [40]. These 25 HCR sets contain all of the 8 HCR sets identified by the overlap between the OptGro and ConstExp data.
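The mapping from known drug targets to HCR sets, and from there to candidate alternative targets, is essentially a lookup through the gene-protein-reaction associations. The sketch below shows that lookup on placeholder data. The HCR identifiers, reaction names, and loci are all hypothetical and do not correspond to entries in Table 7 or Additional file 3.

```python
def alternative_targets(hcr_sets, rxn_to_loci, known_target_loci):
    """For each HCR set containing a known drug target, list the other loci.

    hcr_sets maps an HCR id to its reactions; rxn_to_loci maps a reaction
    to the set of loci associated with it.
    """
    result = {}
    for hcr_id, rxns in hcr_sets.items():
        loci = set().union(*(rxn_to_loci.get(r, set()) for r in rxns))
        known = loci & known_target_loci
        if known:
            result[hcr_id] = {
                "known": sorted(known),
                "candidates": sorted(loci - known),
            }
    return result


# Placeholder data for illustration only.
hcr_sets = {
    "HCR_x": ["rxn1", "rxn2", "rxn3"],
    "HCR_y": ["rxn4", "rxn5"],
}
rxn_to_loci = {
    "rxn1": {"locusA"},
    "rxn2": {"locusB"},
    "rxn3": {"locusC", "locusD"},
    "rxn4": {"locusE"},
    "rxn5": {"locusF"},
}
known_targets = {"locusB"}

print(alternative_targets(hcr_sets, rxn_to_loci, known_targets))
```

Only HCR sets that contain at least one known target are reported, mirroring the restriction of Table 7 to drug targets that map to an HCR.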

Table 7 Mapping the HCR sets to the drug targets described by Mdluli and Spigelman [39]. The first column includes the protein names (adopted from Mdluli and Spigelman), the second column lists the corresponding gene locus, the third is the HCR set number and the fourth lists how many reactions are in the HCR set. Only those drug targets mapped to an HCR are shown. [See Additional file 3] for the full set of HCR sets and the individual reactions they are comprised of; the HCR sets mapped to the above drug targets are highlighted in these data.
Figure 6

A partial metabolic map of iNJ 661 with the 25 drug target HCR sets. Each reaction is numbered and color coded according to the HCR which it belongs to. The number of the HCR sets matches those in the Additional files. Only pathways or parts of pathways which included members of the 25 HCRs are depicted.

Last year Raman et al [41] constructed and analyzed a model of the mycolic acid synthesis pathways and proposed additional targets in fatty acid metabolism. Accounting for the entire metabolic network builds upon this by enabling a more rigorous evaluation of the global growth capabilities of the network and the identification of potential drug targets. The genome-scale model presented here includes the extensive fatty acid metabolism pathways of M. tb, in addition to the rest of the metabolic network. With this full network, we then sought to extend the identification of potential drug targets by taking advantage of the underlying principles of reconstructing metabolic networks to calculate HCR sets.

Interpreting large, complex networks in a functionally meaningful manner is enabled by adopting a hierarchical view of the network, where one identifies groups of reactions that respond in unison to changes in media conditions. Groups of reactions strictly bound together based on mass conservation and stoichiometry constraints can lead to the definition of hierarchical sets of reactions [33, 34]. This principle was applied to causal SNP associated diseases in the human mitochondria, and it was found that reactions in the same co-set had similar disease phenotypes [36]. We applied this same concept to iNJ 661, but rather than looking for disease phenotypes, we sought to identify alternative drug targets.


Over time and with iterative improvements, metabolic network reconstructions have achieved the stage of hypothesis generation and model-driven biological discovery in systems biology [42]. We present an in silico strain of M. tb, iNJ 661, with the anticipation that it will be received by the community in a similar vein and will assist in the discovery process for this remarkable yet devastating pathogen.

A large amount of material has been presented in this manuscript, beginning with the presentation of a genome-scale reconstruction and model of M. tb, iNJ 661, followed by experimental validation and use of the model for integration and analysis of multiple experimental datasets. We summarize some of the main points that highlight areas in which further experimental research may be fruitful.

Growth studies

  • iNJ 661 can grow at rates consistent with experimental data in varying media conditions. Further measurements of the biomass composition of M. tb in well-defined situations are likely to improve the in silico predictions under those conditions.

  • Analysis of the capabilities under maximal growth conditions can help identify critical nutrient targets for killing M. tb (such as the sulfate transporter) or for optimizing growth (in the lab).

  • The pathway enabling growth of M. tb using CO as the carbon source is still unclear. The computational studies suggest that there must be an alternative pathway (to reductive citrate cycle and RuBisCO) which enables entry of CO2 into central metabolism.

  • The seemingly paradoxical need for glucose supplementation in Middlebrook media but non-essentiality of directly related reactions involving the metabolism of glucose-6-phosphate for in vitro growth, together with the almost diametrically opposed results for in silico growth, suggests that there may be additional pathways involving glucose metabolism and potentially non-metabolic functions related to the glucose requirement.

Comparison with and analysis of large datasets

  • The definition of the biomass function will largely dictate the results of optimal growth/gene deletion experiments. Consequently, positive results (in agreement with the experiments) are not as interesting as the false negative (or false positive) results. The false negative results may be due to an error in the annotation of the gene or an underlying physiological mechanism that we are not aware of. In either case, investigating the particular genes experimentally may help clarify the issue.

  • The model can be used as a Context for Biological Content and provides a consistent, coherent framework to analyze various datasets focusing on metabolism.

'Unbiased' analysis

Unbiased analysis methods, methods that calculate properties of the network independent of objective functions, can provide interesting and valuable insights into network capabilities [32]. The definition of HCR sets, in a similar vein to correlated reaction sets from sampling and flux-coupling, can confer a hierarchical structure to metabolic networks, and may assist in the identification of alternative drug targets. Using these models in conjunction with experimental data can assist in screening and identifying new and alternative drug targets.

Taken together, this study adds another high resolution reconstruction of microbial metabolism. There now exist a growing number of such reconstructions that have been particularly useful for basic studies of network properties and for bioprocessing applications [42–44]. As the number of reconstructions of human pathogens grows, we should be able to expand their uses towards improved understanding of pathogenic mechanisms, and the design of interventions and treatments. Large-scale rigorous experimental validation studies should now be performed to further these goals.


Reconstruction and model development of the metabolic network has been described previously [7, 8]. The reconstruction contents will be made available on our website [40] in Excel [see Additional file 5] and SBML formats.


Figure 1 provides a brief summary of the reconstruction and model building process. The sequence-based genome annotation of Mycobacterium tuberculosis H37Rv was downloaded from TIGR [45] and served as the framework of the model. Charge and elementally balanced reactions were added individually based on this annotation, legacy data when available, and updates made in the Tuberculist database [31]. KEGG [46] and the SEED [47] were used as ancillary tools on occasion. Following the initial reconstruction, the gaps were evaluated individually by searching for direct evidence in the literature for their metabolism. Due to the relative sparsity of literature on aspects of central and cofactor metabolism in M. tuberculosis, many of these gaps could not be filled. Legacy data in the form of primary articles, review articles, and textbooks were employed in addition to the database resources [see Additional file 6] during the reconstruction and model building phases. A comprehensive map of iNJ 661's metabolic network was created, organized by lumped subsystems of metabolism [see Additional files 8, 9, 10, 11, 12, 13, 14, 15]. After the debugging process, when the cell could grow (i.e. produce biomass) on different media, the remaining intracellular gaps were evaluated by searching the literature for evidence of metabolic reactions involving that particular metabolite. If no evidence of transport or biochemical transformations of the metabolite in M. tb was found, no additional reactions or transporters were added.

Model formulation

The stoichiometric matrix, S, was constructed based on the reactions described in the reconstruction. The biomass function was defined using available legacy data [see Additional file 1]. Flux Balance Analysis (FBA) simulations were employed during the debugging process towards developing a functional model.

Flux Balance Analysis (FBA)

The metabolic network is represented mathematically by the stoichiometric matrix, S, with m rows and n columns, where there are m metabolites and n reactions in the network. With x denoting the vector of metabolite concentrations and v the vector of reaction fluxes, mass conservation dictates

$$\frac{dx}{dt} = S \cdot v$$

which simplifies to

$$S \cdot v = 0$$

at the steady state. Further constraints can be placed on the system based on thermodynamics (reaction directionality) and limits on substrate uptake rates.
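
For reference, flux balance analysis is conventionally posed as a linear program. The form below is the standard textbook statement and is not necessarily the exact notation used in this work: the symbols $\alpha_j$ and $\beta_j$ are assumed names for the lower and upper limits on flux $v_j$, and $c$ is the vector that selects the objective (here, the biomass function).

$$
\begin{aligned}
\max_{v}\quad & Z = c^{T} v \\
\text{subject to}\quad & S \cdot v = 0, \\
& \alpha_j \le v_j \le \beta_j, \qquad j = 1, \dots, n
\end{aligned}
$$

Under this convention an irreversible reaction has $\alpha_j = 0$, and a substrate uptake limit is imposed by bounding the corresponding exchange flux.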


The oxygen uptake rate measured from the Mycobacterium bovis BCG strain of 25 μL/hr/mg dry weight (0.98 mmol O2/hr/g dry weight) [48] was used. The small molecule carbon sources (glucose, glycerol, etc.) were limited to an uptake rate of 1 mmol/hr/g dry weight and all other allowable substrates were unconstrained. The different in silico culture media are listed in Table 3. Deviations from the standard media include the addition of minute amounts of ferric iron in Youmans media. The oxygen uptake rate was the growth constraining flux in all of the different media.
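
The quoted unit conversion for the oxygen uptake rate can be checked with the ideal gas law. The temperature and pressure of the measurement are not stated here; assuming 37 °C and 1 atm (an assumption made only for this illustration), the molar volume $V_m$ is about 25.45 L/mol, which reproduces the quoted value:

$$
25\ \frac{\mu\text{L}}{\text{hr}\cdot\text{mg}} = 25\ \frac{\text{mL}}{\text{hr}\cdot\text{g}}, \qquad
V_m = \frac{RT}{P} = \frac{0.08206 \times 310.15}{1} \approx 25.45\ \frac{\text{L}}{\text{mol}}, \qquad
\frac{25\ \text{mL}}{25.45\ \text{mL/mmol}} \approx 0.98\ \text{mmol}
$$

so that 25 μL/hr/mg corresponds to roughly 0.98 mmol O2/hr/g dry weight under these assumed conditions.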

Flux Variability Analysis (FVA)

Constraining the biomass objective function, c, to the optimal value calculated in FBA,


every flux in the network is then minimized and maximized. The difference between the maximum and minimum for each flux defines the flux variability for that reaction.
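
As a sketch of how this procedure is usually written, flux variability analysis solves two linear programs per reaction. The notation below follows common usage and is an assumption rather than the authors' exact formulation: $Z^{*}$ denotes the optimal objective value from FBA, and $\alpha_j$, $\beta_j$ are assumed names for the lower and upper flux bounds.

$$
\begin{aligned}
v_j^{\max} &= \max_{v}\ v_j, \qquad v_j^{\min} = \min_{v}\ v_j \\
\text{subject to}\quad & S \cdot v = 0, \qquad c^{T} v = Z^{*}, \qquad \alpha_k \le v_k \le \beta_k \ \ \text{for all } k
\end{aligned}
$$

The flux variability of reaction $j$ is then $v_j^{\max} - v_j^{\min}$; a value of zero indicates a flux that is fixed at the optimum.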

Robustness Analysis

Robustness plots are generated by varying a particular flux through a pre-defined range and recalculating the objective function. The slope of the curve describes the sensitivity of the objective function to that particular flux (over the specified range of values).
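
In linear programming terms, a robustness curve can be sketched as a parametric family of FBA problems. The notation is assumed for illustration: $v_k$ is the flux being varied, $\theta$ is the value at which it is fixed, and $\alpha_j$, $\beta_j$ are assumed names for the lower and upper flux bounds.

$$
Z(\theta) = \max_{v}\ c^{T} v \quad \text{subject to}\quad S \cdot v = 0, \quad \alpha_j \le v_j \le \beta_j, \quad v_k = \theta, \qquad \theta \in [\theta_{\min}, \theta_{\max}]
$$

The sensitivity described above is the slope $dZ/d\theta$. Because $Z(\theta)$ is the optimal value of a linear program with a parameterized constraint, the curve is piecewise linear, so the slope is constant within each segment.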

Hard-Coupled Reaction (HCR) Sets

Denoting the binary form of the stoichiometric matrix as $\hat{S}$, the input/reactant part of the stoichiometric matrix as $\hat{S}_-$, and the output/product part of the stoichiometric matrix as $\hat{S}_+$, it is transparent that

$$S = \hat{S}_- + \hat{S}_+$$

The input and output connectivities of each metabolite can be calculated from $\hat{S}_-$ and $\hat{S}_+$, respectively. The input:output ratio for each metabolite can be determined and all metabolites with 1:1 connectivity will be hard-coupled by the network. Metabolites with 0:2 or 2:0 connectivity reflect metabolites that are hard-coupled in the event that the corresponding reaction is reversible. Metabolites with 0:1 or 1:0 connectivity are either source/sink transporters or blocked reactions.

A general algorithm for calculating hard-coupled reaction sets:

  1. Identify 1:1, 0:2, and 2:0 metabolites

  2. Define HCRs

    a. Identify the sets of reactions corresponding to the metabolites, denote the entire set as HCR0

    b. Join any HCR0 subsets that share 1 or more reactions, denote the entire set as HCR1

    c. i ← 1

  3. while (HCRi-1 ≠ HCRi)

    a. Join any HCRi subsets that share 1 or more reactions, denote the result as HCRi+1

    b. i ← i+1

Since 0:2 or 2:0 reactions may be irreversible, an intermediate processing step may be needed to verify that there is biological evidence supporting catalysis of the reaction in the reverse direction. If one does not wish to consider these potential effects and instead assumes that all of the reactions are reversible, then step 1 can be simplified by finding all of the metabolites with a total connectivity of 2 in $\hat{S}$. Biomass functions were removed from the stoichiometric matrix before calculating the HCR sets. The HCRs calculated for iNJ 661 had 94 0:2/2:0 reactions. Of these, 24 overlapped with the core set of 147 1:1 HCRs. Since the set of 0:2/2:0 reactions did not have additional gene associations (that weren't already in the 1:1 set) and since they had many irreversible reactions throughout, the 1:1 set and the 0:2/2:0 set were not combined (step 1 in the above algorithm outline would involve only identifying the 1:1 metabolites).
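
The algorithm outline above can be illustrated with a short Python sketch. This is not the Mathematica implementation used in this work; it is a minimal illustration under stated assumptions. The function names (`connectivity`, `join_overlapping`, `hcr_sets`), the dictionary representation of the network, and the seven-reaction toy network are all hypothetical. Input connectivity is taken to be the number of reactions that consume a metabolite, and output connectivity the number that produce it.

```python
from collections import defaultdict


def connectivity(stoich):
    """Map each metabolite to (consuming reactions, producing reactions)."""
    consumers, producers = defaultdict(set), defaultdict(set)
    for rxn, coeffs in stoich.items():
        for met, coef in coeffs.items():
            if coef < 0:        # metabolite is a reactant of rxn
                consumers[met].add(rxn)
            elif coef > 0:      # metabolite is a product of rxn
                producers[met].add(rxn)
    mets = set(consumers) | set(producers)
    return {m: (consumers[m], producers[m]) for m in mets}


def join_overlapping(groups):
    """One joining pass: merge groups that share at least one reaction."""
    merged = []
    for group in groups:
        group = set(group)
        disjoint = []
        for other in merged:
            if other & group:
                group |= other          # absorb the overlapping group
            else:
                disjoint.append(other)
        merged = disjoint + [group]
    return merged


def hcr_sets(stoich, include_two_sided=False):
    """Return hard-coupled reaction sets as a sorted list of sorted lists."""
    # Step 1: choose which connectivity patterns seed the sets.
    allowed = {(1, 1)}
    if include_two_sided:               # assumes the reactions are reversible
        allowed |= {(0, 2), (2, 0)}
    # Step 2a: one candidate set of reactions per qualifying metabolite (HCR0).
    current = [cons | prod for cons, prod in connectivity(stoich).values()
               if (len(cons), len(prod)) in allowed]
    # Steps 2b and 3: join overlapping sets until nothing changes.
    while True:
        joined = join_overlapping(current)
        if len(joined) == len(current):
            break
        current = joined
    return sorted(sorted(group) for group in joined)


# Hypothetical toy network: a linear chain A -> B -> C that branches at C.
toy = {
    "R1": {"A": 1},             # source of A
    "R2": {"A": -1, "B": 1},
    "R3": {"B": -1, "C": 1},
    "R4": {"C": -1, "D": 1},
    "R5": {"C": -1, "E": 1},
    "R6": {"D": -1},            # sink for D
    "R7": {"E": -1},            # sink for E
}
print(hcr_sets(toy))  # [['R1', 'R2', 'R3'], ['R4', 'R6'], ['R5', 'R7']]
```

In the toy network, metabolite C has 2:1 connectivity and therefore does not couple the reactions around it, so the linear chain and the two branches form three separate sets. The `include_two_sided` flag corresponds to also seeding from 0:2 and 2:0 metabolites, which is only meaningful when the reactions involved are known to be reversible.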

The reconstruction process, the metabolic maps, and the majority of the calculations were performed using Simpheny™ v1.11 software (Genomatica, Inc.). HCR Sets were calculated with Mathematica© v4.2 (Wolfram Research, Inc.).