Applied Microbiology and Biotechnology, Volume 97, Issue 2, pp 519–539

Genome-scale metabolic model in guiding metabolic engineering of microbial improvement


  • Chuan Xu
    • College of Biological and Environmental Engineering, Zhejiang University of Technology
    • Department of Bioinformatics, College of Life Sciences, Zhejiang University
  • Lili Liu
    • Department of Bioinformatics, College of Life Sciences, Zhejiang University
  • Zhao Zhang
    • Department of Bioinformatics, College of Life Sciences, Zhejiang University
    • Watson Institute of Genome Sciences, Zhejiang University
  • Danfeng Jin
    • Department of Bioinformatics, College of Life Sciences, Zhejiang University
    • Institute of Microbiology, Zhejiang University
  • Juanping Qiu
    • College of Biological and Environmental Engineering, Zhejiang University of Technology
    • Department of Bioinformatics, College of Life Sciences, Zhejiang University
    • Watson Institute of Genome Sciences, Zhejiang University

DOI: 10.1007/s00253-012-4543-9

Cite this article as:
Xu, C., Liu, L., Zhang, Z. et al. Appl Microbiol Biotechnol (2013) 97: 519. doi:10.1007/s00253-012-4543-9


In the past few decades, despite significant achievements in industrial microbial improvement, traditional random mutation and selection, as well as rational metabolic engineering based on local knowledge, have not been able to meet today’s needs. With rapid reconstruction and accurate in silico simulation, the genome-scale metabolic model (GSMM) has become an indispensable tool for studying microbial metabolism and designing strain improvements. In this review, we highlight the application of GSMMs in guiding microbial improvement, focusing on a systematic strategy and its achievements in different industrial fields. The strategy is an iterative process with four steps: essential data acquisition, GSMM reconstruction, constraints-based optimizing simulation, and experimental validation, of which the second and third steps are the centerpiece. The achievements presented here span several industrial application fields, including food and nutrients, biopharmaceuticals, biopolymers, microbial biofuels, and bioremediation. The strategy and its achievements demonstrate the value of GSMMs in guiding the metabolic engineering of industrial microbes. More effort is still required to extend this kind of study.


Genome-scale metabolic model, Systems biology, Metabolic engineering, Microbial improvement, Industrial application


With modern genome-sequencing capabilities, metabolic models have been extended to genome-scale reconstructions. Such a reconstruction attempts to collect every reaction of the target organism by integrating genome annotation and biochemical knowledge into a stoichiometric mathematical model (Palsson 2006). It bridges the gap between genome-derived biochemical information and metabolic phenotype in a principled manner, offering an idealized view of the whole cell (Durot et al. 2009).

The first genome-scale metabolic model (GSMM), for Haemophilus influenzae, was published 13 years ago (Edwards and Palsson 1999). Since then, great strides have been made in GSMM reconstruction, and both efficiency and quality have improved significantly (Feist et al. 2009; Kim et al. 2011b; Notebaart et al. 2006). More recently, the process of building a GSMM has been described in detail (Baart and Martens 2012; Durot et al. 2009; Thiele and Palsson 2010), which further stimulates GSMM reconstruction. With rapid reconstruction and improved quality, the GSMM has become an indispensable tool for studying the systems biology of metabolism. It is widely used for contextualization of high-throughput data, understanding complex biological phenomena, guidance of metabolic engineering, directing hypothesis-driven discovery, interrogation of multi-species relationships, and network property discovery (Durot et al. 2009; Feist and Palsson 2008; Liu et al. 2010; Oberhardt et al. 2009; Palsson 2009; Rocha et al. 2008).

In the past few decades, despite significant achievements in industrial microbial improvement, traditional random mutation and selection, as well as rational metabolic engineering based on local knowledge, have not been able to meet today’s needs. Recently, systems metabolic engineering based on simulation and prediction with mathematical models has provided a reliable method for microbial improvement (Lee et al. 2011b; Park and Lee 2008). As a major tool of this approach, the GSMM has been widely applied to guide metabolic engineering for microbial improvement (Oberhardt et al. 2009; Palsson 2009). GSMM-guided metabolic engineering is a systematic process, generally referred to as the systematic method (Alper et al. 2005a; Alper et al. 2005b) or the in silico-aided metabolic engineering approach in industrial biotechnology (Bro et al. 2006). Many reviews have included this application of GSMMs but described the achievements only briefly (Oberhardt et al. 2009; Palsson 2009; Rocha et al. 2008). Other reviews focused on different aspects, such as strategy and method (Kim et al. 2008a, b), GSMM species (Milne et al. 2009), and chemical materials (Curran and Alper 2012; Lee et al. 2012). Although the industrial use of in silico GSMMs is still in its early stages, some significant successes in microbial improvement have been achieved in recent years. It is therefore timely to collect these successful cases and summarize the application of GSMMs in guiding microbial metabolic engineering.

This review presents a comprehensive overview of GSMMs in guiding microbial improvement, focusing on the systematic strategy and its achievements in different industrial fields. It begins with a brief introduction to the available GSMMs and then highlights the systematic strategy, its implementation methods, and successfully applied cases in industrial microbial improvement. Selected exemplary achievements in each field are described in detail, and potential research directions and applications of GSMMs are outlined as well. Finally, future directions of GSMM-guided metabolic engineering are discussed, highlighting the issues, trends, and opportunities. We hope that the material reviewed here will broaden the use of genome-scale metabolic models to guide metabolic engineering for microbial improvement.

Currently available GSMMs

After 13 years of development, current GSMMs span all three domains of life: Bacteria, Eukaryota, and Archaea (Fig. 1). The most represented domain is Bacteria, in which 54 species have been reconstructed, more than half of them Proteobacteria. Several eukaryotes have also been reconstructed, e.g., the protozoan parasite Plasmodium falciparum (Huthmacher et al. 2010; Plata et al. 2010) and the animals Mus musculus (Nielsen 2008; Selvarasu et al. 2010; Sheikh et al. 2005; Sigurdsson et al. 2010) and Homo sapiens (Duarte et al. 2007; Ma et al. 2007). In addition, recent GSMM publications for Arabidopsis thaliana (de Oliveira Dal’Molin et al. 2010; Mintz-Oron et al. 2012; Poolman et al. 2009) and Zea mays (Saha et al. 2011) have filled the gap in the plant kingdom. Thanks to years of experience with reconstruction, the GSMMs of some model species are constantly upgraded. For instance, Escherichia coli has at least five versions (Archer et al. 2011; Edwards and Palsson 2000; Feist et al. 2007; Orth et al. 2011; Reed et al. 2003), and yeast has nine versions (Osterlund et al. 2012). As the number of reactions, genes, and metabolites increases in the updated versions, the quality of the GSMMs improves to some extent.
Fig. 1

Phylogenetic tree of reconstructed species. This figure shows the phylogenetic relationships between the organisms for which GSMMs have been built. All the species are grouped into phyla, which are labeled in blue font along the shaded areas. The colors represent the three domains. The phylogenetic tree was generated using the semi-automated software iTOL (Letunic and Bork 2011), and phyla were determined using the NCBI taxonomy browser. The GSMM information comes from the website of Palsson’s group, the genome-scale metabolic network database, a published paper (Kim et al. 2012b), and other sources collected by us

The species for which GSMMs have been built cover three areas: basic research, medical biotechnology, and industrial biotechnology (Feist and Palsson 2008; Milne et al. 2009). Some model species such as E. coli and yeast are explored in basic research by theoretical biologists to clarify the evolution of biological networks (Papp et al. 2011) and to discover network characteristics (Oberhardt et al. 2009). As the applied scope of GSMMs extends towards medical biotechnology, particularly anti-pathogen target discovery, GSMM reconstructions for pathogenic organisms have increased rapidly in recent years (Chavali et al. 2012; Kim et al. 2012a). Although many species have been reconstructed for the research fields mentioned above, applications in industrial biotechnology are the biggest motivation for reconstructing GSMMs of sequenced species (Blazeck and Alper 2010; Milne et al. 2009). These models are mainly used to design metabolic engineering strategies that enhance the yield of target products in microbial cell factories and improve the ability of microbes to degrade pollutants in bioremediation (Izallalen et al. 2008). The GSMM-guided metabolic engineering strategy and its achievements therefore show the value and significance of this systematic strategy in both theory and practice.

Strategy for guiding metabolic engineering of microbial improvement

Based on high-quality GSMMs, this systematic strategy uses in silico modeling to predict engineering targets, which are then validated experimentally to obtain microbes with the desired modifications (Park and Lee 2008; Patil et al. 2004; Tyo et al. 2010). From the parent strain to the strain with the desired phenotype, we summarize four cardinal steps: microbial data acquisition, GSMM reconstruction, in silico simulation, and experimental engineering implementation (Fig. 2).
Fig. 2

General process of the GSMM-guided metabolic engineering strategy. The first step is to obtain microbial data on the target microbe at different levels, including genome annotations and biochemical and physiological data for the subsequent GSMM reconstruction, as well as optional high-throughput omics data for in silico analysis. Subsequently, a high-quality GSMM is reconstructed through iterative steps including draft reconstruction, manual curation, and validation. Based on the GSMM, in silico simulation and system-level analysis not only characterize the whole cell and explain experimental results but also predict engineering targets and support rational experimental design. According to the predictions, engineering operations are performed in the wet lab. Generally, genetic engineering is the first choice for improving target microbes directly, including gene (reaction or pathway) knockout, insertion, and amplification. Throughout the engineering process, suitable cultivation and accurate phenotype monitoring are needed as well. If the engineered strain does not show the desired phenotype, further iterations can be started from any step of the workflow. The final strain of this cyclic optimization shows the desired phenotype changes relative to the parent strain

Essential data acquisition

Genome sequencing is the first prerequisite for studying the internal metabolic state of any microbe at the systems scale. Hence, genome sequencing and annotation are essential for the strategy. In recent years, sequencing technology has evolved into the so-called “next-generation sequencing”, which accelerates sequencing, decreases the cost, and, more importantly, improves the accuracy of the data (Hall 2007; Mardis 2008; Schuster 2008). Meanwhile, more and more genes and proteins with verified biological functions are collected in databases such as NCBI (Sayers et al. 2010), KEGG (Kanehisa et al. 2006), BRENDA (Scheer et al. 2011), LIGAND (Goto et al. 1998), and BioCyc (Caspi et al. 2012). These achievements greatly improve genome annotation, providing researchers with more, and more reliable, genome resources for GSMM reconstruction. Other microbial data, such as biochemical and physiological information, are also important for GSMM building and simulation and can be collected from the corresponding databases and the literature. In addition, recent omics technologies have moved biological data acquisition into the high-throughput era, producing massive amounts of microbial information at the transcriptome, proteome, and metabolome levels. It is thus increasingly feasible to reconstruct a high-quality GSMM and to introduce the systematic strategy into microbial improvement.

GSMM reconstruction

GSMM reconstruction is more mature than the reconstruction of other biochemical networks, such as transcriptional and translational networks and transcriptional regulatory networks (Feist et al. 2009). However, it is still time-consuming and labor-intensive. So far, no GSMM has been reconstructed without manual refinement.

The methods of GSMM reconstruction have been summarized and the complete building process described (Baart and Martens 2012; Feist et al. 2009; Thiele and Palsson 2010); briefly, it involves four typical steps. First, a draft is built from genome-annotation data by collecting the gene–enzyme–reaction associations, generally from databases such as KEGG (Kanehisa et al. 2006) and BioCyc (Caspi et al. 2012). This data assembly can be automated by computational tools such as PathAligner (Chen and Hofestadt 2004), the MoVisPP tool (Chen et al. 2011), and Pathway Tools (Karp et al. 2002), but these tools cannot replace manual curation. Second, the draft is improved through a time-consuming, literature-based manual process that verifies whether the genes and reactions in the draft really exist and are correct and identifies others that may have been missed. This step is the most crucial one in the reconstruction, because its quality determines the reliability of the GSMM. Third, all the collected reactions are converted into a mathematical matrix model according to their stoichiometric coefficients (Fig. 3a). If a given network is composed of n reactions involving m metabolites, then the model matrix S is an m × n stoichiometric matrix. The model can then be further refined and analyzed with computational tools, such as gap analysis (Satish Kumar et al. 2007) and constraint-based methods (Price et al. 2004). The final step is to debug the model carefully: the discrepancies between in silico simulations and wet-lab results are used to evaluate, validate, and improve the accuracy of the GSMM. To obtain a high-quality GSMM, the last three steps need to be repeated iteratively. Nevertheless, no present GSMM can represent a natural cell perfectly, so the reconstruction is usually terminated once the model is adequate for its intended purpose (Thiele and Palsson 2010).
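The conversion of a reaction list into an m × n stoichiometric matrix, described as the third step, can be illustrated with a minimal Python sketch. The four-reaction network, the reaction identifiers (EX_A, R1, R2, EX_C), and the helper name build_stoichiometric_matrix are hypothetical and chosen only for illustration; they are not taken from any published GSMM.

```python
# Hypothetical toy network: A is taken up, converted to B, then to C, which is secreted.
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced).
reactions = {
    "EX_A": {"A": 1},            # uptake:    -> A
    "R1":   {"A": -1, "B": 1},   # A -> B
    "R2":   {"B": -1, "C": 1},   # B -> C
    "EX_C": {"C": -1},           # secretion: C ->
}


def build_stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S), where S is an m x n list of lists."""
    reaction_ids = list(reactions)
    metabolites = sorted({met for stoich in reactions.values() for met in stoich})
    # Row i is metabolite i, column j is reaction j; absent metabolites get 0.
    S = [[reactions[rxn].get(met, 0) for rxn in reaction_ids] for met in metabolites]
    return metabolites, reaction_ids, S


metabolites, reaction_ids, S = build_stoichiometric_matrix(reactions)
print("columns:", reaction_ids)
for met, row in zip(metabolites, S):
    print(met, row)
```

Here m = 3 metabolites and n = 4 reactions, so S has three rows and four columns. A real reconstruction would also track compartments, reversibility, and gene associations, which this sketch leaves out.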
Fig. 3

Mathematical representation of GSMM and the key items of manual refinement. a The mathematical basis of GSMM, which describes the conversion from a biochemical network into the model matrix S. The subscript “e” means that the metabolite is in the extracellular space. b A concise depiction of the major items of curation from gene to reaction

The key to reconstruction lies in the iterative refinement, whose major task is to complete and correct the enzymatic reactions and their associated genes. The major items requiring curation are enumerated in Fig. 3b. In addition, the non-enzymatic reactions, such as transport, spontaneous, exchange, demand, and sink reactions, should also be added correctly. During curation, gap analysis is the most immediately helpful approach. In most cases, gaps are resolved by identifying missing enzymes through analysis of incomplete but essential metabolic pathways, by literature searches that reveal previously overlooked phenotypic data, and by analysis of high-throughput omics data (Oberhardt et al. 2009). Many computational methods, such as flux balance analysis (FBA) (Orth et al. 2010) and GapFind/GapFill (Satish Kumar et al. 2007), have been developed to analyze the gaps (Pitkanen et al. 2010). Drawing software that creates organism-specific metabolic maps, such as Cellular Overview (Latendresse and Karp 2011) and MyBioNet (Huang et al. 2011), is also very useful for gap analysis.
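As a rough illustration of what a structural gap check looks for, the sketch below lists dead-end metabolites, i.e., metabolites that are produced but never consumed, or consumed but never produced. It is a simplification and not the GapFind/GapFill procedure itself, which is optimization-based. The sketch assumes that all reactions are irreversible, and the toy network and the function name find_dead_end_metabolites are hypothetical.

```python
# Hypothetical toy network; all reactions are assumed irreversible.
reactions = {
    "EX_A": {"A": 1},            # uptake:    -> A
    "R1":   {"A": -1, "B": 1},   # A -> B
    "R2":   {"B": -1, "C": 1},   # B -> C
    "R3":   {"B": -1, "D": 1},   # B -> D  (nothing consumes D: a gap)
    "EX_C": {"C": -1},           # secretion: C ->
}


def find_dead_end_metabolites(reactions):
    """Return (only_produced, only_consumed) as sorted lists of metabolite names."""
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0:
                produced.add(met)
            elif coeff < 0:
                consumed.add(met)
    return sorted(produced - consumed), sorted(consumed - produced)


only_produced, only_consumed = find_dead_end_metabolites(reactions)
print("produced but never consumed:", only_produced)
print("consumed but never produced:", only_consumed)
```

In this toy case D is flagged, which would prompt the curator to look for a missing consuming, transport, or exchange reaction.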

In addition to the crucial reaction refinements, some cellular parameters representing microbial physiological and biochemical processes are indispensable. One essential parameter is the biomass synthesis reaction, which accounts for all known biomass components (protein, DNA, RNA, lipids, peptidoglycan, glycogen, polyamines, etc.) and their fractional contributions to the total cellular biomass (Thiele and Palsson 2010). Other important parameters, e.g., the P/O ratio, ATP maintenance costs, and the minimal medium composition, also need to be estimated or measured; reported experimental data are the main source. With the model and these essential parameters, constraints-based modeling and analysis can effectively characterize cell metabolism and predict the potential responses to environmental or genetic perturbations.

Constraints-based optimizing simulation

This simulation process is implemented on the GSMM through constraints-based algorithms. So far, dozens of these optimization algorithms have been developed (Kim et al. 2012b; Lewis et al. 2012). They have played a significant role in predicting microbial metabolic capability after genetic manipulations (Lewis et al. 2012).

The cardinal principle and process are illustrated in Fig. 4. In general, the basic framework is composed of variables, constraints, and objectives (Park et al. 2009). Its structure can be written concisely as: maximize/minimize \( f(x) = \sum_{i} (b_i x_i)^{k},\ x_i \in \{ x \mid \alpha_i \leqslant x_i \leqslant \beta_i \} \), where \( x_i \) is a decision variable bounded by \( \alpha_i \) and \( \beta_i \), and \( f(x) \) is the objective function parameterized by the coefficients \( b_i \) and the exponent \( k \). In an in silico GSMM, the whole metabolism of a cell is simplified as \( \mathrm{d}x_i / \mathrm{d}t = \sum_{j} S_{ij} v_j,\ \alpha_j \leqslant v_j \leqslant \beta_j \), where \( S \) contains the stoichiometric coefficients of all the reactions (Fig. 3a), \( x_i \) is the concentration of the ith metabolite, and the metabolic flux \( v_j \) of the jth reaction is subject to the lower and upper bounds \( \alpha_j \) and \( \beta_j \). In most cases, \( v_j \) is set as the decision variable to formulate an objective function \( f(x) \) according to the purpose of the simulation. The constraints imposed on the decision variables and the objective function are commonly obtained from genetic, physiological, and biochemical data, such as regulation and thermodynamics (Fig. 4). Depending on the purpose, the objective function is optimized by mathematical programming, including linear programming, quadratic programming, mixed-integer linear programming, etc.
Fig. 4

Cardinal principle of in silico constraints-based simulation. Without constraints, the flux of a biological network may distribute anywhere. When some constraints are applied to the network, it defines an allowable flux distribution space. Through optimization of an objective function, a single optimal flux distribution can be identified

According to this principle, many algorithms have been developed to simulate biological phenotypes (Lewis et al. 2012; Park et al. 2009). FBA is the most basic and simplest constraints-based method (Orth et al. 2010). It rests on three fundamental assumptions: pseudo-steady state (S·v = 0), mass conservation, and an optimization objective. These constraints shrink the unconstrained flux distribution to a closed, bounded flux space. The biomass synthesis flux (\( v_{\mathrm{biomass}} \)) is then maximized by linear programming to find an optimal flux distribution; the optimal objective value is unique, although alternative flux distributions may achieve it. FBA can perform gene-deletion simulation to investigate gene essentiality, calculate growth rates in a given medium, and predict the yields of important cofactors (Orth et al. 2010).
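A minimal sketch of FBA as a linear program is given below, assuming NumPy and SciPy are available. The three-metabolite network, the flux bounds, and the single-reaction stand-in for biomass synthesis are hypothetical; the solver call is scipy.optimize.linprog, which minimizes its objective, so the biomass flux is maximized by minimizing its negative.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (columns follow reaction_ids):
#   EX_A:    -> A        substrate uptake, limited to 10 flux units
#   R1:      A -> B
#   R2:      A -> C
#   BIOMASS: B + C ->    stand-in for the biomass synthesis reaction
reaction_ids = ["EX_A", "R1", "R2", "BIOMASS"]
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  0, -1],   # B
    [0,  0,  1, -1],   # C
])
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # (lower, upper) per flux

# Objective: maximize v_biomass, written as minimize -v_biomass.
c = np.zeros(len(reaction_ids))
c[reaction_ids.index("BIOMASS")] = -1.0

# Pseudo-steady state S.v = 0 enters as equality constraints.
result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)

growth = -result.fun
fluxes = dict(zip(reaction_ids, result.x))
print("solver message:", result.message)
print(f"optimal biomass flux: {growth:.3f}")
for rxn, value in fluxes.items():
    print(f"  {rxn}: {value:.3f}")
```

Because the biomass reaction needs one B and one C per unit, and each is made from one A, the uptake limit of 10 caps the biomass flux at 5 in this toy case. Genome-scale models are solved the same way, only with thousands of columns.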

Since gene knockout is the most common tool in metabolic engineering, most present constraints-based algorithms have been developed for simulating gene deletion (Table 1). For example, minimization of metabolic adjustment (MOMA) (Segre et al. 2002) is a quadratic programming algorithm that estimates the changes in reaction fluxes when a gene is deleted. It yields the flux state of the mutant that is closest to the flux distribution of the wild-type strain. Similar to MOMA, regulatory on/off minimization (ROOM) predicts flux distributions after gene deletions, but does so by minimizing the number of significant flux changes (Shlomi et al. 2005). As coupling the production of the desired product to cellular growth is one of the most widespread strategies for optimizing product yields, OptKnock (Burgard et al. 2003), a bi-level optimization framework based on mixed-integer linear programming, was developed to identify optimal gene knockout strategies. This framework predicts a phenotype with high production of the desired metabolite at the maximal growth rate. With OptKnock as a starting basis, OptGene (Patil et al. 2005), OptForce (Ranganathan et al. 2010), OptORF (Kim and Reed 2010), OptReg (Pharkya and Maranas 2006), and OptStrain (Pharkya et al. 2004) were developed, and they can also perform gene deletion simulation. Other algorithms with this ability are collected in Table 1.
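The simplest form of deletion simulation can be sketched with plain FBA: a reaction is "deleted" by fixing its flux bounds to zero, and the maximal biomass flux is recomputed. This is not MOMA, ROOM, or OptKnock; MOMA, for instance, would replace the linear objective with a quadratic distance to the wild-type fluxes and needs a quadratic programming solver. The toy network, bounds, and function name max_growth below are hypothetical, and NumPy and SciPy are assumed to be installed.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network with a main route (R1) and a low-capacity bypass (R2):
#   EX_A: -> A ;  R1: A -> B ;  R2: A -> B (at most 4 units) ;  BIOMASS: B ->
reaction_ids = ["EX_A", "R1", "R2", "BIOMASS"]
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  1, -1],   # B
])
wild_type_bounds = [(0, 10), (0, 1000), (0, 4), (0, 1000)]


def max_growth(bounds):
    """Maximal biomass flux under S.v = 0 and the given flux bounds."""
    c = np.zeros(len(reaction_ids))
    c[reaction_ids.index("BIOMASS")] = -1.0   # linprog minimizes
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    return max(0.0, -res.fun) if res.success else 0.0


wild_type_growth = max_growth(wild_type_bounds)

# Delete each non-biomass reaction in turn by forcing its flux to zero.
knockout_growth = {}
for j, rxn in enumerate(reaction_ids):
    if rxn == "BIOMASS":
        continue
    bounds = list(wild_type_bounds)
    bounds[j] = (0, 0)
    knockout_growth[rxn] = max_growth(bounds)

essential = [rxn for rxn, g in knockout_growth.items() if g < 1e-6]
print(f"wild type: {wild_type_growth:.1f}")
for rxn, g in knockout_growth.items():
    print(f"delta {rxn}: {g:.1f}")
print("essential reactions:", essential)
```

In this toy case deleting the uptake reaction abolishes growth, deleting R1 leaves only the bypass capacity, and deleting R2 has no effect. In a real GSMM the deletion would be applied to genes and mapped to reactions through the gene-reaction associations, which this sketch omits.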
Table 1

Applications of algorithms in guiding metabolic engineering

Algorithm | Reference
BioPathway predictor | Yim et al. (2011)
BNICE | Hatzimanikatis et al. (2005)
 | Fowler et al. (2009)
 | Burgard et al. (2004)
 | Kim et al. (2007)
 | Lee et al. (2007)
 | Delgado and Liao (1997)
FSEOF | Choi et al. (2010)
 | Bushell et al. (2006)
 | Lun et al. (2009)
MOMA | Segre et al. (2002)
OptForce | Ranganathan et al. (2010)
OptGene | Patil et al. (2005)
OptKnock | Burgard et al. (2003)
OptORF | Kim and Reed (2010)
OptReg | Pharkya and Maranas (2006)
OptStrain | Pharkya et al. (2004)
ROOM | Shlomi et al. (2005)
 | Yang et al. (2011)
 | Kim et al. (2011a)

Gene amplification is another useful strategy, and some constraints-based frameworks focus on its simulation and prediction (Table 1). For example, flux scanning based on enforced objective flux (FSEOF) identifies gene amplification targets by scanning the changes of all the metabolic fluxes in response to an enforced increase of the flux toward the desired biochemical (Choi et al. 2010). It was validated by identifying amplification targets that improved the production of lycopene in E. coli (Choi et al. 2010). Many FBA-based algorithms such as flux variability analysis (FVA) and flux sensitivity analysis (FSA), some OptKnock derivatives including OptReg, OptORF, and OptForce, and other independent frameworks have been developed to predict gene amplifications and to investigate up- or downregulation of genes in the target organism (Table 1).
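The scanning idea behind FSEOF can be sketched as follows, under simplifying assumptions: the product flux is fixed at a series of increasing values, biomass is maximized at each value, and reactions whose flux rises at every step are reported as candidate amplification targets. The published method uses additional selection criteria that are not reproduced here. The toy network, the enforced levels, and all names are hypothetical, and NumPy and SciPy are assumed to be installed.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network in which biomass and product compete for substrate A:
#   EX_A: -> A ;  R1: A -> B ;  R2: B -> P ;  BIOMASS: A -> ;  EX_P: P ->
reaction_ids = ["EX_A", "R1", "R2", "BIOMASS", "EX_P"]
S = np.array([
    [1, -1,  0, -1,  0],   # A
    [0,  1, -1,  0,  0],   # B
    [0,  0,  1,  0, -1],   # P
])
base_bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
biomass = reaction_ids.index("BIOMASS")
product = reaction_ids.index("EX_P")


def fluxes_at_enforced_product(product_flux):
    """Maximize biomass while the product exchange flux is fixed at product_flux."""
    bounds = list(base_bounds)
    bounds[product] = (product_flux, product_flux)
    c = np.zeros(len(reaction_ids))
    c[biomass] = -1.0                      # linprog minimizes
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


# Scan the enforced product flux upward (maximum possible here is 10).
enforced_levels = [0.0, 2.0, 4.0, 6.0, 8.0]
scan = np.array([fluxes_at_enforced_product(p) for p in enforced_levels])

# Simplified criterion: flux strictly increases at every scan step.
tol = 1e-6
targets = [
    rxn for j, rxn in enumerate(reaction_ids)
    if j != product and np.all(np.diff(scan[:, j]) > tol)
]
print("candidate amplification targets:", targets)
```

In this toy case R1 and R2, the two steps leading to the product, are flagged, while uptake stays constant and biomass falls as more flux is forced toward the product.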

In addition to perturbation of endogenous genes, heterologous pathway assembly and expression is another critical approach for strain improvement. However, the number of constraints-based algorithms able to predict foreign gene insertion is limited at present. OptStrain (Pharkya et al. 2004), an OptKnock derivative based on mixed-integer linear programming, is the most popular one with this ability. To confer non-native functionality on a host organism for a desired phenotype, OptStrain first searches a universal reaction database for the minimal heterologous pathway that achieves the maximum in silico yield of the desired metabolite and then applies the OptKnock framework to the GSMM augmented with that pathway. Other frameworks such as BioPathway predictor (Yim et al. 2011) and BNICE (Hatzimanikatis et al. 2005) have been developed to find an optimal pathway or reaction to insert when redesigning a metabolic network.

Each of these algorithms has particular strengths that suit different aspects of strain design. Selecting an unsuitable algorithm will lead to misleading or erroneous interpretation, so caution is necessary when choosing one or more of them to guide metabolic engineering. In general, the available constraints for the target organism and the purpose of the simulation determine the choice of constraints-based algorithm. In addition, many software packages have been developed for running these algorithms (Copeland et al. 2012; Wiechert 2002), and they contribute greatly to constraints-based modeling. Among them, the COBRA Toolbox (a Matlab-based package) (Schellenberger et al. 2011) has become a near-standard tool in this field. It can perform many algorithms, e.g., FBA, OptKnock, OptStrain, and MOMA. Because the COBRA Toolbox lacks a friendly user interface, other tools with graphical user interfaces, such as MetaFluxNet (Lee et al. 2003), BioMet Toolbox (Cvijovic et al. 2010), and OptFlux (Rocha et al. 2010), are frequently used as well.

Experimental validation

The effort invested in reconstructing GSMMs and developing computational tools aims at predicting reliable engineering targets. These predictions then need to be validated in the wet lab. The engineering process commonly consists of genetic manipulation, strain cultivation, and phenotype measurement, which together involve many experimental methods.

Recombinant DNA techniques are the centerpiece of metabolic engineering (Tyo et al. 2010). Among the achievements of GSMM-guided metabolic engineering, gene knockout is the principal genetic manipulation and is usually carried out by homologous recombination (Capecchi 1989). Insertion mutagenesis (Klinner and Schäfer 2004) and RNAi (Agrawal et al. 2003) are alternative gene-inactivation technologies when homologous recombination is difficult in an organism. Other genetic manipulation strategies, including gene insertion and amplification, have also been applied in these achievements. As the most important and basic experimental techniques of molecular biology, these genetic manipulations recombine the DNA of interest to modify a target industrial microbe, making it possible to create a biological factory for the products of interest (Le Borgne 2012).

In addition to changes in the genome, microbial cultivation is equally essential for strain improvement. Laboratory research commonly starts with batch cultivation. However, to obtain more product from limited bioreactor capacity and to shorten the production cycle for industrial application, more efficient modes, including fed-batch and continuous cultivation, are applied under aerobic or anaerobic conditions (Bro et al. 2006; Brochado et al. 2010; Choi et al. 2010; Lee et al. 2007; Park et al. 2011). Like genetic manipulation, the selection and optimization of cultivation conditions can contribute to strain improvement for industrial applications.

Identification and measurement of the microbial phenotype are the final, challenging, and indispensable tasks. In most cases, the detection techniques concern the identification and quantification of a target metabolite. At present, metabolite quantification depends on either spectrophotometric assays (detection of single molecules) or simple chromatographic separation techniques (detection of molecules in mixtures of low complexity). For example, high-performance liquid chromatography has been used in different modes to analyze metabolites, such as reversed-phase chromatography for detecting spectinomycin (Yan et al. 2009) and size-exclusion chromatography for analyzing heparin (Ziegler and Zaia 2006). To analyze complex mixtures of compounds with high accuracy and sensitivity, more advanced methods, many of which combine chromatographic and spectrometry-based techniques, have been established, such as gas chromatography-mass spectrometry, liquid chromatography-mass spectrometry, and nuclear magnetic resonance (NMR). These representative methods of phenotype measurement greatly speed up the validation of in silico predictions.

Only these experimental techniques can turn predicted microbial improvements into reality. Hence, efficient genetic tools and genetic manipulation systems, appropriate cultivation, and accurate phenotype measurement are prerequisites for applying the GSMM-guided metabolic engineering strategy to strain improvement.

Achievements in different industrial application fields

Although more than 100 GSMMs are available and dozens of constraints-based algorithms have been developed, only a few models and algorithms have been successfully validated in strain design. The achievements of GSMMs in guiding metabolic engineering for microbial improvement are listed in Table 2, and the representative in silico and wet-lab methods are summarized in Fig. 5. To present them in an orderly way, we group them into five industrial application fields: (1) food and nutrients, (2) biopharmaceuticals, (3) biopolymer materials, (4) microbial biofuels, and (5) bioremediation. We also review potential GSMM developments for some important industrial microbes and reconstructed GSMMs that could be put to further practical use.
Table 2

Achievements of industrial strain improvement using GSMM-guided systematic method

Strain | Target | Improved result | Reference
Saccharomyces cerevisiae |  | 25 % improvement | Bro et al. (2006)
Saccharomyces cerevisiae |  | An approximately 85 % increase in the final cubebol titer | Asadollahi et al. (2009)
Saccharomyces cerevisiae |  | Fivefold improvement | Brochado et al. (2010)
Saccharomyces cerevisiae | Formic acid | Threefold higher in log-phase and the extracellular concentration got 16.5-fold increased | Kennedy et al. (2009)
Saccharomyces cerevisiae |  | 2,3-Butanediol titer (2.29 g/L) and yield (0.113 g/g) were achieved. | Ng et al. (2012)
E. coli |  | Nearly 40 % increase | Alper et al. (2005a)
E. coli |  | 8.5-fold increase over the wild strain | Alper et al. (2005b)
E. coli |  | Developed FSEOF algorithm to find gene amplification targets resulting in lycopene yields increased significantly | Choi et al. (2010)
E. coli |  | Over 12-fold improvement | Boghigian et al. (2012)
E. coli |  | E. coli and its DXP pathway were found with the most potential ability beneficial for taxadiene production. | Meng et al. (2011)
E. coli |  | A high yield of 0.378 g of L-valine per gram of glucose | Park et al. (2007)
E. coli |  | A high yield of 32.3 g/L L-valine in fed-batch cultivation | Park et al. (2011)
E. coli | Polylactic acid and its copolymers | PLA, P (3HB-co-LA), and 3HB.P (3HB-co-LA) produced up to 11 wt.%, 56 wt.%, and 46 wt.% from glucose, respectively | Jung et al. (2010)
E. coli | Malic acid | 9.25 g/L could be obtained after 12 h of aerobic cultivation. | Moon et al. (2008)
E. coli |  | A high yield of 0.393 g per gram of glucose and 82.4 g/L threonine by fed-batch culture | Lee et al. (2007)
E. coli | Succinic acid | Increased production by more than sevenfold and the ratio by ninefold | Lee et al. (2005a)
E. coli | Succinic acid | A high yield of 1.29 mol succinate/mol glucose and high productivity | Wang et al. (2006)
E. coli | Lactic acid | Lactate titers ranged from 0.87 to 1.75 g/L and secretion rates were directly coupled to growth rates. | Fong et al. (2005)
E. coli |  | 817 mg/L of leucocyanidin and 39 mg/L (+)-catechin with 10 g/L glucose, a fourfold and twofold increase, respectively | Chemler et al. (2010)
E. coli |  | A fourfold increase in the levels of intracellular malonyl-CoA | Xu et al. (2011)
E. coli |  | Leading to a strain of E. coli capable of producing 18 g/L of this highly reduced, non-natural chemical | Yim et al. (2011)
E. coli |  | Increased by over 660 % for naringenin and by over 420 % for eriodictyol | Fowler et al. (2009)
B. subtilis |  | 2.3-fold improvement | Li et al. (2012)
C. glutamicum |  | 0.55 g per gram of glucose, a titer of 120 g/L lysine, and a productivity of 4.0 g/L/h | Becker et al. (2011)
C. glutamicum |  | The best strain obtained 10 % higher yields. | van Ooyen et al. (2012)
L. lactis |  | At least 15 % higher GFP per cell than the control strain | Oddone et al. (2009)
G. sulfurreducens | Respiratory rate | Successfully increasing electron transfer as a result of higher respiratory rate | Izallalen et al. (2008)
Acinetobacter baylyi ADP1 |  | 5.6-fold more triacylglycerol (milligrams per gram cell dry weight) and the proportion in total lipids was increased by eightfold | Santala et al. (2011)
S. roseosporus |  | Approximately 43.2 % higher than that of the parental strain | Huang et al. (2012)
Fig. 5

Summary of the representative in silico and experimental methods used in the collected achievements. A black box indicates that a particular method (one method per row) was used in the corresponding study (one citation per column); the publication year of each study is marked in the first row of the figure. The achievements come from E. coli, S. cerevisiae, and other microbes, which are labeled in the top left corner. The frequency of use of each method is given on the right. Computational methods are grouped into three categories: mathematical basis, in silico analysis, and corresponding algorithm. Following the engineering process, experimental methods consist of three parts: genetic manipulation, cultivation, and phenotype measurement

Food and nutrients

In the food and nutrients industry, GSMMs are built to improve the yield of fermentation byproducts and to explore metabolic mechanisms and processes. Guided by the systematic strategy, the improvement of E. coli for amino acid and organic acid production has been the most successful attempt (Becker and Wittmann 2012). In addition, GSMMs have been reconstructed for certain non-model microbes, such as lactic acid bacteria (LAB) and Corynebacterium glutamicum, to investigate whole-cell metabolism for strain improvement (Milne et al. 2009; Teusink et al. 2011).

Because of their nutritional importance, amino acids are widely used in nutritional supplements, fertilizers, and the food industry. Based on a GSMM, E. coli was improved for l-valine production through systematic analysis and simulation (Park et al. 2007). In this study, Park et al. first constructed an l-valine-producing base strain by analyzing the metabolic and regulatory information available in the literature. This base strain was then improved stepwise, guided by new information obtained from transcriptome analysis and in silico gene knockout simulation. The final engineered E. coli strain attained a high yield of 0.378 g of l-valine per gram of glucose. In the GSMM-guided part of the process, an E. coli GSMM named MBEL979, comprising 979 metabolic reactions and 814 metabolites, was first obtained by slightly updating iJR904 (Lee et al. 2005b; Reed et al. 2003). Then, through simulation with the MOMA algorithm, the aceF, mdh, and pfkA genes were identified as the best triple knockout targets that could still ensure a reasonable growth rate. After the genes were deleted experimentally, the l-valine concentration of the mutant strain was 2.27-fold higher than that of the corresponding recombinant starting strain, in close agreement with the in silico predictions. Four years later, Park et al. (2011) reported that the l-valine yield was further improved using the GSMM-guided method, this time with flux response analysis (FRA) as the simulation algorithm. In addition to valine, Lee et al. (2007) reported the improvement of a genetically modified l-threonine-overproducing strain, for which FRA was first developed and applied to a GSMM. In summary, these three representative studies demonstrate that the GSMM-guided metabolic engineering strategy has been applied efficiently to improve the prokaryote E. coli for the production of primary metabolites.
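The idea of an in silico knockout screen that keeps only designs with a reasonable growth rate can be illustrated with a deliberately small sketch. It is not the MOMA simulation on MBEL979 used by Park et al. (2007). It assumes a hypothetical toy network in which every reaction converts one metabolite into one other with unit stoichiometry, so that the flux balance problem (maximize the flux to biomass at steady state under capacity bounds) reduces to a maximum-flow computation and needs no LP solver. All metabolite names, gene names (gA to gD), capacities, and the 50 % growth threshold are invented for illustration.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: (substrate, product, maximum flux, gene or None).
# Three isozyme-like routes convert A to B; gD carries the only route to biomass.
TOY_NETWORK = [
    ("ext", "A", 10, None),        # substrate uptake, not gene-associated here
    ("A", "B", 6, "gA"),
    ("A", "B", 4, "gB"),
    ("A", "B", 5, "gC"),
    ("B", "biomass", 10, "gD"),
]


def max_flux(reactions, source, sink, knocked_out=frozenset()):
    """Largest steady-state flux from source to sink (Edmonds-Karp max flow)."""
    cap = {}
    for sub, prod, upper_bound, gene in reactions:
        cap.setdefault(sub, {}).setdefault(prod, 0)
        cap.setdefault(prod, {}).setdefault(sub, 0)    # residual (reverse) edge
        if gene not in knocked_out:                     # a knockout removes the reaction
            cap[sub][prod] += upper_bound
    total = 0
    while True:
        # Breadth-first search for a path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        total += push


def knockout_screen(reactions, genes, source, sink, max_size=2, min_fraction=0.5):
    """Return wild-type growth flux and all knockout sets that keep enough growth."""
    wild_type = max_flux(reactions, source, sink)
    viable = {}
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(genes), size):
            growth = max_flux(reactions, source, sink, frozenset(combo))
            if growth >= min_fraction * wild_type:
                viable[combo] = growth
    return wild_type, viable
```

Calling `knockout_screen(TOY_NETWORK, {"gA", "gB", "gC", "gD"}, "ext", "biomass")` on this toy network gives a wild-type flux of 10 and retains the single knockouts of gA, gB, and gC together with the double knockouts gA/gB and gB/gC, while every set containing gD is rejected as lethal. A genome-scale screen would replace `max_flux` with an FBA or MOMA solve and add a second criterion for product formation.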

Improving the eukaryotic microbe S. cerevisiae for vanillin and malic acid production is another typical example (Brochado et al. 2010; Zelle et al. 2008). Vanillin is one of the most widely used flavoring agents in the food industry and has been produced in engineered S. cerevisiae (Hansen et al. 2009). Recently, Brochado et al. (2010) improved vanillin production in baker’s yeast through in silico design. Several genetic targets were identified with OptGene and OptKnock on the GSMM iFF708 (Forster et al. 2003), with MOMA used as the biological objective function. Subsequently, two of them (PDC1 and GDH1) were selected for experimental verification, resulting in a Δpdc1 mutant with a fivefold increase in production compared with previous work. In another study, S. cerevisiae was engineered to produce 59 g/L of malate, five times higher than in earlier efforts (Zelle et al. 2008). The GSMM iND750 (Duarte et al. 2004) was used as the basis for the 13C flux model, so that the remarkable improvement could be validated by 13C-NMR flux determination. Together, the two studies demonstrate that GSMMs can be applied not only to design metabolic engineering strategies but also to guide further strain improvement through the interpretation of experimental data.
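For readers unfamiliar with the objective formulations named in these studies, the standard statements of FBA and MOMA can be written compactly. The notation is an assumption of this sketch rather than that of the cited papers: S is the stoichiometric matrix, v the flux vector, l and u the lower and upper flux bounds, c the objective weights (typically selecting the biomass reaction), v^wt a wild-type reference flux distribution, and D the set of reactions disabled by the gene deletions.

```latex
% FBA: linear programme at metabolic steady state
\max_{v} \; c^{\top} v
\quad \text{s.t.} \quad S v = 0, \qquad l \le v \le u

% MOMA: mutant fluxes stay as close as possible to the wild-type fluxes
\min_{v} \; \sum_{i} \left( v_i - v_i^{\mathrm{wt}} \right)^{2}
\quad \text{s.t.} \quad S v = 0, \qquad l \le v \le u, \qquad v_j = 0 \;\; \forall j \in D
```

FBA presumes that the cell operates at an optimum of the chosen objective, whereas MOMA presumes that a freshly constructed mutant remains close to the wild-type flux state. This is one common reason for choosing MOMA when knockout phenotypes are evaluated.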

Apart from the engineered model strains (E. coli and S. cerevisiae) mentioned above, a number of microbes are well known as natural producers of materials for the food and nutrients industry. C. glutamicum is one of the most important natural producers of various amino acids, and two high-quality GSMMs have been reconstructed for it (Kjeldsen and Nielsen 2009; Shinfuku et al. 2009). Its ability to overproduce l-lysine has also been remarkably improved by GSMM-guided metabolic engineering (Becker et al. 2011). LAB are another group of useful nutrient-related microbes because of their powerful ability to produce bacteriocins, exopolysaccharides, polyols, vitamins, etc. (Zhu et al. 2009). Lactococcus lactis was the first LAB whose metabolic network was reconstructed at genome scale, to analyze its metabolic capabilities and whole-cell function in aerobic and anaerobic continuous cultures (Oliveira et al. 2005). Based on this GSMM, Oddone et al. (2009) employed the dynamic flux balance analysis (DFBA) algorithm to predict gene targets for increasing the expression of green fluorescent protein (GFP, a model heterologous protein) in L. lactis IL1403. The subsequent wet-lab experiments increased GFP production of L. lactis by 15 %, which validated the model-based prediction to a certain extent. In addition, GSMMs have been reconstructed for other LAB such as Lactobacillus plantarum (Teusink et al. 2006) and Streptococcus thermophilus (Pastink et al. 2009). Hence, applying these GSMMs of non-model organisms to microbial improvement is expected to bring significant progress in the industrial production of food and nutrients.
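The DFBA loop mentioned above can be sketched in a few lines. The example follows the common static-optimization scheme (kinetics set the uptake bound, an inner optimization picks the fluxes, and the extracellular state is advanced in time) but replaces the genome-scale LP with a one-reaction toy model whose optimum is known analytically. It is not the L. lactis model of Oddone et al. (2009), and every parameter value and name below is hypothetical.

```python
def dfba(biomass0, substrate0, hours, dt=0.01,
         vmax=10.0, km=0.5, yield_xs=0.1, yield_ps=1.5):
    """Static-optimization DFBA on a one-reaction toy model (explicit Euler).

    Assumed units: biomass in gDW/L, substrate and product in mmol/L,
    uptake in mmol/gDW/h, yield_xs in gDW/mmol, yield_ps in mmol/mmol.
    Returns a list of (time, biomass, substrate, product) tuples.
    """
    x, s, p = biomass0, substrate0, 0.0
    trajectory = [(0.0, x, s, p)]
    for step in range(1, int(round(hours / dt)) + 1):
        # 1. Kinetics set the uptake bound for this time step (Michaelis-Menten).
        uptake_ub = vmax * s / (km + s)
        # Never consume more substrate than the medium still contains.
        uptake_ub = min(uptake_ub, s / (x * dt))
        # 2. Inner optimization: maximize growth = yield_xs * uptake subject to
        #    0 <= uptake <= uptake_ub. For this toy model the optimum sits at the
        #    bound; a genome-scale model would call an LP solver here.
        uptake = uptake_ub
        growth_rate = yield_xs * uptake
        # 3. Advance the extracellular state with the optimal fluxes.
        consumed = uptake * x * dt
        x += growth_rate * x * dt
        s = max(s - consumed, 0.0)
        p += yield_ps * consumed
        trajectory.append((step * dt, x, s, p))
    return trajectory
```

For instance, `dfba(0.05, 20.0, 10.0)` returns 1001 time points in which substrate falls monotonically while biomass and product rise in fixed proportion to the substrate consumed. In a target-identification study, the same loop would be rerun with modified flux bounds for each candidate gene and the predicted product trajectories compared.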


Biopharmaceuticals

Microbes have long been famous as a source of pharmaceuticals. Many drugs such as penicillin, cephalosporin, and tetracycline are produced by natural or engineered microbes. Microbes are chosen as drug production factories because they offer advantages over total chemical synthesis or extraction from natural resources, including environmental friendliness, low cost, and higher production rates (Lee et al. 2009b). With reconstructed GSMMs as a basis, the production of some biopharmaceuticals has benefited from the GSMM-guided metabolic engineering strategy (Alper et al. 2005a; Alper et al. 2005b; Asadollahi et al. 2009; Boghigian et al. 2012; Meng et al. 2011).

For example, lycopene is a valuable pharmaceutical and dietary nutrient. It is beneficial to human health because of its ability to help prevent cardiovascular disease and cancers of the prostate or gastrointestinal tract (Clinton 1998; Gerster 1997). More than 10 years ago, lycopene and other carotenoids had already received much attention and had been produced in recombinant microorganisms (Farmer and Liao 2000). To explore the guidance of GSMM in metabolic engineering, Alper et al. (2005a) performed an in silico analysis to identify putative genes affecting network properties and cellular phenotype. Using the E. coli GSMM iJE660 (Edwards and Palsson 2000) and the MOMA algorithm, five genes were identified as candidates for experimental validation to improve lycopene production. After single and multiple gene knockout experiments, the lycopene yield of the final engineered strain was nearly 40 % higher than that of the parental strain. Even more striking, the lycopene yield reached an 8.5-fold increase over the recombinant K12 wild type when GSMM-based and combinatorial (transposon-based) methods were combined (Alper et al. 2005b). Recently, Lee’s group also successfully employed the systematic method to identify gene amplification targets in E. coli for enhancing lycopene production (Choi et al. 2010). In addition, GSMM simulations have been applied to the production of taxadiene in E. coli (Boghigian et al. 2012; Meng et al. 2011) and of sesquiterpenes in yeast (Asadollahi et al. 2009). The production of some drug precursors in E. coli, such as the l-valine and l-threonine mentioned above, has also been improved by this method. Thus, among prokaryotic drug expression systems, the model organism E. coli is the best choice for exploring GSMM-guided metabolic engineering.

Among eukaryotic organisms, the model microbe S. cerevisiae is commonly engineered as a microbial cell factory for pharmaceutical production. Sesquiterpene production in S. cerevisiae is a typical successful example of GSMM-guided metabolic engineering (Asadollahi et al. 2009). The GSMM iFF708 (Forster et al. 2003) was employed for constraints-based analyses. With OptGene as the modeling framework and MOMA as the objective function, GDH1, which encodes the NADPH-dependent glutamate dehydrogenase, was identified as the best target gene for improving sesquiterpene biosynthesis in yeast. Deletion of GDH1 resulted in an approximately 85 % increase in the final cubebol titer, but it also significantly decreased the maximum specific growth rate. This disadvantage was then mitigated by overexpression of GDH2. This case suggests that the complexity of eukaryotic organisms may pose a greater obstacle to the GSMM-guided systematic strategy.

Aside from the engineered model strains (E. coli and S. cerevisiae), many microbes are well known for their natural production of biologically active drugs or precursors. However, so far only two popular microbes of this kind have high-quality GSMMs: Bacillus subtilis (Henry et al. 2009; Oh et al. 2007) and Streptomyces coelicolor (Alam et al. 2010; Borodina et al. 2005). Streptomyces bacteria are well-established microbial factories for antibiotics; almost two thirds of all known natural antibiotics are reported to be produced by Streptomyces (Borodina et al. 2008). S. coelicolor A3(2), the best genetically characterized strain in this genus, has become a preferred model organism in Streptomyces research. Jens Nielsen’s group applied the GSMM of S. coelicolor A3(2) to describe its global metabolism (Borodina et al. 2005). It was then predicted that decreased phosphofructokinase activity would lead to an increase in pentose phosphate pathway flux and in the flux to pigmented antibiotics and pyruvate (Borodina et al. 2008). Alam et al. (2010) updated the GSMM and successfully predicted the flux changes that occur when the cell switches from biomass to antibiotic production. Recently, Huang et al. (2012) reconstructed a partial metabolic network of Streptomyces roseosporus based on the GSMM of S. coelicolor and successfully improved the daptomycin yield of the strain using in silico metabolic flux analysis. Thus, the GSMM of S. coelicolor A3(2) has found wide application in model reconstruction and prediction for other Streptomyces species. B. subtilis is another well-characterized microbe with the ability to produce antibiotics. The first GSMM of B. subtilis was reconstructed from a combination of genomic, biochemical, high-throughput phenotype, and gene essentiality data (Oh et al. 2007) and was later updated on the basis of more accurate genome annotations (Henry et al. 2009). Although these works thoroughly investigate the metabolic network characteristics of these strains, there are few reports of successful strain improvement for biopharmaceutical production driven by this systematic strategy.

Furthermore, many other important drug-producing microbes are used in industry yet have no high-quality GSMMs (Lee et al. 2009b). Therefore, in the biopharmaceutical field, both GSMM reconstruction and its application to strain improvement still require further effort.

Biopolymer materials

In the synthetic materials industry, microbes also make important contributions. Many polymer materials and their monomers can be produced by natural or engineered microbes, e.g., poly-3-hydroxyalkanoates (PHAs), polylactic acid (PLA), carboxylic acids, and butanediols (Lee et al. 2011a). Recently, the GSMM-guided metabolic engineering strategy has been successfully implemented to enhance the productivity of useful biopolymers and their precursors (Jung et al. 2010; Ng et al. 2012; Yim et al. 2011).

For example, PLA is a promising biomass-derived polymer that is considered biodegradable, biocompatible, and of low toxicity to humans. PLA can be synthesized by engineered E. coli, but at relatively low efficiency (Yang et al. 2010). To overcome this limitation, Jung et al. (2010) further improved this engineered E. coli on the basis of in silico genome-scale metabolic flux analysis. In this case, MOMA, FBA, and FRA were used for in silico knockout and amplification studies with the GSMM EcoMBEL979. The in silico gene knockout simulation indicated that deleting the adhE gene would give a PLA production rate much higher than the predicted flux of the control strain while maintaining an acceptable growth rate. With the two further genes ackA and ppc deleted, the resulting triple knockout was considered the most beneficial strategy for maximizing the PLA flux. After additional promoter modification, the resulting strain allowed the most efficient production of PLA homopolymer and poly[3-hydroxybutyrate(3HB)-co-LA] copolymers, in good agreement with the in silico simulation results. This study enabled efficient bio-based one-step production of PLA and its copolymers. The strategy is expected to be generally useful for developing other engineered organisms capable of producing various unnatural polymers.

Beyond full-length polymers, the production of monomers in microbial cell factories is an easier biosynthetic route. Several platform monomers, including propanediols, butanediols, diamines, and terpenoids, have been produced in microbes (Curran and Alper 2012; Lee et al. 2011a; Lee et al. 2012). Among them, the production of butanediols, including 2,3-butanediol (Ng et al. 2012) and 1,4-butanediol (Yim et al. 2011), was improved using the GSMM-guided metabolic engineering strategy. In addition, the production of some carboxylic acid monomers, such as formic acid, malic acid, and succinic acid, was also enhanced in engineered S. cerevisiae or E. coli under the guidance of GSMMs (Kennedy et al. 2009; Lee et al. 2005a; Moon et al. 2008; Wang et al. 2006). All these monomers are important raw materials in the synthetic materials industry.

The achievements mentioned above in this industrial field were obtained in model species. In fact, many natural microbes are able to produce biopolymers and their monomers (Lee et al. 2011a; Lee et al. 2012). Pseudomonas putida is a typical microbe of this kind, and two GSMMs have been built for it to investigate the production of the biopolymer PHA (Nogales et al. 2008; Puchalka et al. 2008). However, GSMMs have been reconstructed for only a few of these microbes. Thus, in this application field, the GSMM-guided metabolic engineering strategy has ample room for development, although more effort is needed to reconstruct GSMMs for these natural species.

Microbial fuel

As one of the most important forms of renewable energy, biofuel has gained increasing public and scientific attention, driven by factors such as oil price hikes, environmental concerns, and government subsidies (Stephanopoulos 2007). Microbes make a significant contribution to the production of biofuels, including bio-ethanol, bio-butanol, bio-gasoline, bio-diesel, and bio-hydrogen (Jang et al. 2012). However, native microbes need to be improved because of their low biofuel yields. Recent work has moved toward systems biology strategies for biofuel strain improvement (Gowen and Fong 2011; Mukhopadhyay et al. 2008), in which GSMMs hold great promise for guiding strain design to improve biofuel production by microorganisms (Lee et al. 2008b).

A typical case is the increase in ethanol production by S. cerevisiae achieved by manipulating genetic targets predicted through in silico GSMM-guided simulation (Bro et al. 2006). First, different strategies were evaluated using the published GSMM of S. cerevisiae (Forster et al. 2003). One of them, the insertion of the GAPN gene, was predicted to be the optimal genetic manipulation for ethanol production. In the experiments, the first resulting strain had a 40 % lower glycerol yield on glucose, while the ethanol yield increased by 3 % without affecting the maximum specific growth rate. Subsequently, when the GAPN gene was expressed in a strain harboring xylose reductase and xylitol dehydrogenase, ethanol production was increased by up to 25 %. Although the enhancement of ethanol production is not outstanding, this is at least the first successful attempt to use the GSMM-guided metabolic engineering strategy in the biofuel field. In addition, one recent study reported that isobutanol production in B. subtilis was enhanced using elementary mode analysis based on an updated B. subtilis GSMM (Li et al. 2012). These cases indicate the practicability of this strategy for guiding microbial improvement in the biofuel field.

In recent years, GSMM reconstruction in this field has expanded rapidly (Table 3). Apart from the model microbes and engineered strains, 15 species that can naturally synthesize biofuels have had a total of 21 GSMMs reconstructed. However, successful strain improvements using the GSMM-guided metabolic engineering strategy remain limited. The studies based on these GSMMs concentrate on the analysis of biofuel-related metabolic mechanisms (Feist et al. 2006; Lee et al. 2008a; Roberts et al. 2010), the explanation of metabolic engineering results (Lee et al. 2009a), and in silico predictions for biofuel production that lack further validation by wet experiments (Milne et al. 2011; Ranganathan and Maranas 2010). In addition, high expectations are placed on some important photosynthetic microbes with the ability to produce bio-hydrogen, such as Chlamydomonas reinhardtii and Synechocystis sp. PCC6803 (Ducat et al. 2011; Jones and Mayfield 2012). Seven GSMMs of these photosynthetic species have been developed (Table 3), but their applications are still confined to the exploration of photoautotrophic mechanisms and microbial growth (Chang et al. 2011; Montagud et al. 2010; Nogales et al. 2012).
Table 3

The potential microbes which have been reconstructed into GSMMs in biofuel field

| Species | Description | References |
| --- | --- | --- |
| C. reinhardtii | Photosynthetic organisms as a source of hydrogen | Chang et al. (2011); de Oliveira Dal’Molin et al. (2011) |
| Clostridium acetobutylicum | Of interest for industrial solvent (particularly bio-butanol) production. | Lee et al. (2008a); Senger and Papoutsakis (2008) |
| Clostridium beijerinckii | A sustainable alternative to petroleum-based production of butanol | Milne et al. (2011) |
| Clostridium thermocellum | Biochemically converting plant sugar and cellulose to ethanol | Roberts et al. (2010) |
| Methanococcus jannaschii | Producing many unique cofactors, coenzymes, and enzymes during methanogenesis | Tsoka et al. (2004) |
| Methanosarcina acetivorans | A methanogen capable of producing methane | Satish Kumar et al. (2011) |
| Methanosarcina barkeri | The more obvious use is to produce methane as an alternative fuel | Feist et al. (2006) |
| Micrococcus luteus | Its potential application for production of carotenoids and alkanes | Rokem et al. (2011) |
| Pelobacter propionicus | The organism creates ATP for an energy source and acetate, CO2 and H2 as bio-products | Sun et al. (2010) |
| Pelobacter carbinolicus | Of interest as microbial fuel cells for production of ethanol and acetate | Sun et al. (2010) |
| Rhodobacter sphaeroides | To produce hydrogen, polyhydroxybutyrate or other hydrocarbons | Imam et al. (2011) |
| Rhodoferax ferrireducens | It can utilize a large host of electron donors | Risso et al. (2009) |
| Synechocystis sp. PCC6803 | A Cyanobacterium considered as a candidate photo-biological production platform for bio-hydrogen | Fu (2009); Montagud et al. (2010); Yoshikawa et al. (2011); Nogales et al. (2012); Montagud et al. (2011) |
| Zymomonas mobilis | A leading candidate for ethanol production | Widiastuti et al. (2011); Lee et al. (2010) |

Given the thriving achievements in other fields, the advances in the experimental genetic accessibility of biofuel species, and the increasingly comprehensive knowledge of their metabolic systems, the GSMM-guided metabolic engineering approach is expected to be widely applied to improve biofuel microbes in the future.


Bioremediation

There is no doubt that microbes can degrade or adsorb pollutants, such as nitrobenzene, to remediate polluted environments (Jin et al. 2012). To reduce the toxic effects of environmental pollutants, the microbes used for bioremediation have also benefited from this systematic strategy. So far, nine GSMMs have been reconstructed for seven species of interest in this field (Table 4). Based on these models, not only have the microbial mechanisms of remediating polluted environments been characterized, but some desired strain improvements have also been simulated, predicted, and even achieved experimentally.
Table 4

The representative microbes with GSMMs in application of bioremediation

| Species | Description | References |
| --- | --- | --- |
| Acinetobacter baylyi | Of interest in environmental and biotechnological applications with large-spectrum biodegradation capabilities. | Durot et al. (2008) |
| Dehalococcoides ethenogenes | Widespread application in bioremediation of toxic, persistent, carcinogenic, and ubiquitous ground water pollutants | Islam et al. (2010) |
| Geobacter metallireducens | Used for bioremediation and electricity generation from waste organic matter and renewable biomass | Sun et al. (2009) |
| G. sulfurreducens | Used for bioremediation and electricity generation from waste organic matter and renewable biomass | Mahadevan et al. (2006) |
| Pseudomonas putida | With the ability to degrade organic solvents such as toluene and also to convert styrene oil to biodegradable plastic polyhydroxyalkanoates (PHA) | Puchalka et al. (2008); Sohn et al. (2010); Nogales et al. (2008) |
| Rhodococcus erythropolis | The remarkable catabolic diversity of R. erythropolis makes it an interesting organism for bioremediation and fuel desulfurization. | Aggarwal et al. (2011) |
| Shewanella oneidensis | Its cytochromes have been of particular interest in the field of research due to their potential of bioremediation of heavy metals. | Pinchuk et al. (2010) |

Geobacter spp., natural inhabitants of a diverse range of soils and aquatic sediments (Lovley et al. 2004) that can reduce insoluble metal oxides, are typical microorganisms in bioremediation. Geobacter sulfurreducens has become a model organism of this genus for studying the mechanism of Fe(III) respiration and the process of environmental remediation, because it has the earliest available sequenced genome (Methe et al. 2003), a workable system for genetic manipulation (Coppi et al. 2001), and an in silico GSMM (Mahadevan et al. 2006). The GSMM was reconstructed to investigate its central metabolism and electron transport, revealing that energy conservation with extracellular electron acceptors is limited compared with that associated with intracellular acceptors (Mahadevan et al. 2006). Guided by model predictions, Izallalen et al. (2008) achieved a strain improvement. The OptKnock algorithm (Burgard et al. 2003) was applied to the GSMM of G. sulfurreducens to determine the optimal gene knockouts that would maximally increase respiration rates. The in silico analysis indicated that gene deletions in central metabolism or in fatty acid and amino acid metabolism could increase respiration and cellular ATP demand. Subsequently, the prediction was validated in wet experiments by altering the F1 portion of the membrane-bound F0F1 ATP synthase. The higher respiratory rate increases electron transfer, which benefits the bioremediation ability of the strain. This was the first report of metabolic engineering to improve the respiratory rate of a microorganism.
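The bilevel structure that makes OptKnock different from a plain knockout scan can be stated as follows. This is a simplified sketch: the published formulation carries additional constraints (for example on substrate uptake and minimum growth) that are omitted here, and the symbols are assumptions of this note. S is the stoichiometric matrix, v the flux vector with bounds l and u, y_j a binary variable that is 0 when reaction j is knocked out, and N the maximum number of knockouts allowed. In the Geobacter application, the target flux would correspond to the respiration rate.

```latex
\max_{y} \; v_{\mathrm{target}}
\quad \text{s.t.} \quad
\left[
\begin{aligned}
\max_{v} \;\; & v_{\mathrm{biomass}} \\
\text{s.t.} \;\; & S v = 0, \\
& l_j \, y_j \le v_j \le u_j \, y_j \quad \forall j
\end{aligned}
\right],
\qquad
y_j \in \{0, 1\}, \qquad \sum_{j} \left( 1 - y_j \right) \le N
```

The outer problem chooses knockouts on behalf of the engineer, while the inner problem lets the cell respond by maximizing its own growth. A knockout set scores well only if the target flux is high in the growth-optimal state of the mutant.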

Because the systematic approach to microbial strain improvement in bioremediation plays a significant role in environmental protection (de Lorenzo 2008), GSMMs have been reconstructed for a number of relevant microbial species (Table 4). However, to date, successful cases of this systematic strategy are very limited in this field. Hence, much more exploration is needed in the future.

Future improvements and directions

Indeed, the applied achievements reviewed here were not easy to accomplish, since many obstacles hinder the guidance of GSMM. In summary, challenges in the following three aspects need to be overcome in the near future.

First, building a high-quality GSMM is a time-consuming and labor-intensive process (Durot et al. 2009; Thiele and Palsson 2010). So far, the whole genomes of more than 2,200 species have been completely sequenced (Fig. 6). In theory, this sequence information can be used to build an organism-specific metabolic model at genome scale. However, although the first GSMM (Edwards and Palsson 1999) appeared just 4 years after the first sequenced bacterial genome (Fleischmann et al. 1995), to date only about 130 GSMMs of fewer than 80 species (<4 % of sequenced species) are available, lagging far behind the number of genome-sequenced species (Fig. 6). Although GSMM reconstruction has sped up considerably in recent years, it still cannot keep pace with genome sequencing, especially in the “next-generation sequencing” era. Thus, the first main research focus is to build GSMMs at high speed, so as to extend the scope of the GSMM-guided metabolic engineering strategy in industrial biotechnology.
Fig. 6

Statistics of reconstructions and sequenced genomes. The cumulative numbers of GSMMs, corresponding species, and sequenced genomes published over the past decade are shown. The number of sequenced genomes was taken from the GOLD database in October 2012

For this reason, continuous efforts have been made to automate reconstruction. A few methods, such as PathoLogic (used in the Pathway Tools software) and AUTOGRAPH (Notebaart et al. 2006), can assist automatic GSMM reconstruction. Some computational tools, such as Pathway Tools (Karp et al. 2002), metaSHARK (Pinney et al. 2005), and MetNetMaker (Forth et al. 2010), have been developed to reconstruct a draft GSMM. More recently, the Model SEED (Henry et al. 2010), a web-based resource for the high-throughput generation, optimization, and analysis of genome-scale metabolic models, has greatly accelerated the reconstruction of new GSMMs; it can automate the reconstruction process in approximately 48 h starting from a completed genome sequence. Another reconstruction strategy, based on a refined GSMM template, has also proven feasible (AbuOun et al. 2009; Liao et al. 2011; Vongsangnak et al. 2012). This method identifies orthologous genes between the target species and a template species with an extensively curated metabolic network and then extracts the “orthologous” part of the GSMM from the well-studied species (AbuOun et al. 2009). Although these methods and tools reduce reconstruction time and improve model reliability to some degree, faster and more reliable automatic methods are still urgently needed to meet the requirement of high-throughput GSMM reconstruction for industrial microbes.
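The template-based strategy can be sketched as a simple mapping step. The example assumes that the template model stores, for each reaction, a list of alternative enzyme complexes (isozymes), each given as the set of genes it requires, and that an ortholog table from template genes to target genes is already available (for instance from a reciprocal best-hit search, which is not shown). The function name, data layout, and all identifiers are hypothetical and do not reproduce the procedure of any cited study.

```python
def draft_from_template(template_gpr, orthologs):
    """Carry over template reactions whose enzymes are covered by orthologs.

    template_gpr: reaction id -> list of enzyme complexes (isozymes), each a
                  set of template gene ids that are all required.
    orthologs:    template gene id -> target gene id.
    Returns reaction id -> list of complexes expressed in target gene ids.
    """
    draft = {}
    for rxn_id, complexes in template_gpr.items():
        mapped = [
            frozenset(orthologs[gene] for gene in genes)
            for genes in complexes
            if all(gene in orthologs for gene in genes)
        ]
        if mapped:                      # at least one complete enzyme was found
            draft[rxn_id] = mapped
    return draft


# Hypothetical template annotations and ortholog table (all identifiers invented).
TEMPLATE_GPR = {
    "R_pfk": [{"t0001"}, {"t0002"}],    # two isozymes
    "R_pdh": [{"t0003", "t0004"}],      # two-subunit complex
    "R_mdh": [{"t0005"}],
}
ORTHOLOGS = {"t0001": "x_101", "t0003": "x_205", "t0005": "x_317"}

DRAFT = draft_from_template(TEMPLATE_GPR, ORTHOLOGS)
```

Here `DRAFT` keeps R_pfk (through one isozyme) and R_mdh but drops R_pdh, because only one of its two subunits has an ortholog. Reactions dropped in this way are natural candidates for the manual curation and gap filling that follow a draft reconstruction.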

Second, it is inescapable that none of the currently existing models can provide a complete view of a natural cell (Baginsky et al. 2010). The accuracy of a GSMM and its predictive ability are key issues for the wider application of the GSMM-guided strategy. Thus, manual refinement to ensure high GSMM quality is another important research aspect. In this process, the primary method is to search the literature and databases for physiological and biochemical information on the target organism, so comprehensive and non-redundant databases and sufficient phenotype data are clearly desirable. Rhea, a recently developed biochemical database that integrates data from more than ten different biochemical databases, is a comprehensive resource of expert-curated biochemical reactions; it provides non-redundant chemical reaction information, including enzyme-catalyzed reactions, transport reactions, and spontaneously occurring reactions, for GSMM reconstruction (Alcantara et al. 2012). The development of phenotype microarrays for microbes (Bochner 2009), which attempt to give a global view of cellular phenotypes, has greatly improved the reliability of GSMMs (Oh et al. 2007). In addition, some model-curation strategies, such as the popular gap analysis (Satish Kumar et al. 2007), have played a significant role in filling gaps in GSMMs. A novel method that eliminates erroneous differences between species through comparative systems analysis has also been reported for improving GSMM reconstruction (Oberhardt et al. 2011). These methods and resources are expected to accelerate the reconstruction of high-quality GSMMs to meet industrial interest.
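A first, purely topological step of gap analysis is to list dead-end metabolites, that is, metabolites that the network can consume but never produce, or produce but never consume. The sketch below is a simplified stand-in for illustration only: published gap analysis procedures are optimization-based and also propose reactions to fix the gaps, which this check does not attempt. The toy model, metabolite names, and the `exchanged` argument (metabolites allowed to cross the system boundary) are hypothetical.

```python
def dead_end_metabolites(reactions, exchanged=frozenset()):
    """Find metabolites that are never produced or never consumed.

    reactions: reaction id -> (stoichiometry dict, reversible flag), where
               negative coefficients mark substrates and positive ones products.
    exchanged: metabolites that may enter or leave the system freely.
    Returns (never_produced, never_consumed) as sorted lists.
    """
    producers, consumers = set(), set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0 or reversible:     # a reversible reaction can run backwards
                producers.add(met)
            if coeff < 0 or reversible:
                consumers.add(met)
    metabolites = producers | consumers
    never_produced = sorted(metabolites - producers - set(exchanged))
    never_consumed = sorted(metabolites - consumers - set(exchanged))
    return never_produced, never_consumed


# Hypothetical toy model: the D -> E branch is disconnected from the rest.
TOY_MODEL = {
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),
    "R3": ({"D": -1, "E": 1}, False),
    "R4": ({"C": -1, "F": 1}, True),
}
```

With A and F declared as exchanged, `dead_end_metabolites(TOY_MODEL, exchanged={"A", "F"})` returns `(["D"], ["E"])`, flagging the isolated branch as the place where a producing reaction for D and a consuming reaction for E are missing.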

Beyond increasing the quality of GSMMs, another desirable research direction is the further development of GSMM simulation methods. One of the most important features of constraints-based simulation with a GSMM is that it can readily depict a global metabolic system without kinetic and regulatory information. However, this same simplification limits its ability to predict the true metabolism of a target microbe. To compensate, one promising approach is to combine multiple omics data and thermo-kinetic data with constraints-based simulation. For example, incorporating biochemical kinetic parameters into a GSMM could significantly increase its predictive power (Blazeck and Alper 2010). Based on GSMMs, DFBA was used to identify gene targets for increasing the specific expression of GFP in L. lactis IL1403 and to analyze the ethanol production of S. cerevisiae in fed-batch culture (Hjersted et al. 2007; Oddone et al. 2009). As regulatory genes are important targets in metabolic engineering, incorporating gene regulatory information from transcriptome data can greatly increase prediction accuracy (Åkesson et al. 2004). For instance, the target genes for increasing valine and threonine production in E. coli were successfully identified by genome-scale metabolic simulations using transcriptome profiling data (Lee et al. 2007; Park et al. 2007). Furthermore, 13C-based flux analysis has been used to predict engineering targets and evaluate cellular physiology with relatively high accuracy based on GSMMs (Park et al. 2010; van Ooyen et al. 2012). Thus, a GSMM-guided strategy that incorporates other informative data is expected to be the most efficient method for improving microbes at present and in the near future.

Thirdly, with respect to industrial application, GSMM-guided microbial improvement is expected to expand towards non-model microbes. The collected examples show that this systematic method has been well explored in the model organisms E. coli and S. cerevisiae. However, non-model microbes of interest have the advantage of inherent metabolic and regulatory systems suited to a specific application, which makes them attractive as powerful platforms for material production or other industrial uses. In general, extensive application of this strategy in non-model microbes is hindered by three major obstacles: the cost of sequencing and analysis, the great effort required to search for data and build a GSMM, and the genetic accessibility of the non-model target microbe (Blazeck and Alper 2010). Compared with the cost in the Sanger sequencing era, recent advances in next-generation sequencing have greatly reduced the first of these (Hall 2007; Mardis 2008; Schuster 2008). Other high-throughput biological technologies, e.g., RNA-seq, gene chips, phenotype microarrays, and NMR metabolite detection, provide sufficient data for model building and simulation. Growing efforts to develop genetic manipulation systems for target industrial microbes or genetically close species are creating the prerequisites for applying GSMM-guided metabolic engineering to strain improvement. Benefiting from these advances, some non-model microbes have already been improved by this systematic strategy (Table 2). Although these obstacles still have to be considered before using this strategy, it is expected to become the state-of-the-art technology for the improvement of industrial microbes.

With the rise of high-throughput measurement technologies and the growing number of sequenced genomes, the continued construction of in silico GSMMs will provide increasingly powerful tools for investigating biological systems and designing efficient cell factories. In the future, much more effort should be made to accelerate GSMM reconstruction, improve model simulation, and expand the scope of the GSMM-guided strategy in the improvement of industrially relevant microbes.


The authors are grateful for the support of the Major Special Project of Science and Technology of Zhejiang Province, China (No. 2008C12G2020010), the Fundamental Research Funds for the Central Universities, and the NSFC project, China (No. 30971743).

Copyright information

© Springer-Verlag Berlin Heidelberg 2012