Background

Genome scale metabolic models (GEMs) are widely used to model and compute functional states of cellular metabolism [1] and as scaffolds for integrating various levels of high-throughput data [2]. A crucial step for achieving proper simulations with GEMs is to define a biomass pseudo-reaction [3, 4], which accounts for every single constituent comprising the cellular biomass (proteins, carbohydrates, lipids, etc.). In this step it is challenging to account for lipid requirements, as there are copious different individual lipid species: over 20 different classes of lipids can be produced in a cell, and each specific lipid belonging to any of those classes can contain various combinations of acyl chain groups, each of them with varying length and number of saturations [5]. This can yield over 1000 specific lipid species that the cell can potentially produce. Unsurprisingly, lipid metabolism therefore tends to be the most complicated part of any GEM.

A requirement for formulating the biomass pseudo-reaction are abundance measurements of every single constituent; however, this is seldom available for individual lipid species. Instead, it is more common to measure separately (i) a profile of all different lipid classes, for example by high-performance liquid chromatography [6, 7]; and (ii) a distribution of all different acyl chains, by fatty acid methyl ester (FAME) analysis [8, 9]. Therefore, GEMs have been adapted to handle these data.

The most common approach to represent lipid metabolism in GEMs is to enforce a specific distribution of each individual lipid species, either by using detailed experimental data [10, 11] or by assuming that lipid classes have all the same acyl chain distribution from a single FAME analysis [12, 13]. In both cases however, the model will be fixed to follow a predefined lipid distribution. This is undesirable, as lipid metabolism can show a high level of reorganization [5, 14], hence rendering the model’s predictions of limited use when simulating different experimental conditions, or when looking into the network’s flexibility for satisfying lipid requirements.

A second common approach is to allow any specific lipid to form a corresponding generic lipid class (e.g., “phosphocholine”) and to only constrain those classes with experimental abundances from lipid profiling [15, 16]. The problem with this approach is that experimental abundances from FAME analysis are neglected, and simulations always end up choosing lipid species that cost the least energy, which might not reflect reality, e.g. if there is regulation in place to ensure production of longer chain species. Hence, there is need for an approach that can incorporate both lipid profiling and FAME analysis, but at the same time can allow flexibility in the metabolic network.

In this work, we introduce SLIMEr, a method for correctly representing lipid requirements in GEMs while allowing network flexibility. The approach adds so-called SLIME reactions, which split lipids into their basic components; and lipid pseudo-reactions, that impose constraints on both the lipid classes and the acyl chain distributions. By following this approach, we achieve flux simulations that respect both the lipid class and acyl chain experimental distribution, and at the same time avoid over-constraining the model to only simulate one lipid distribution. We implemented this approach for the consensus GEM of Saccharomyces cerevisiae (budding yeast), a model that has undergone iterative improvements [15, 17,18,19,20] and is currently being hosted at https://github.com/SysBioChalmers/yeast-GEM. We show that the enhanced model: (i) enforces acyl chain requirements while preserving a high degree of network flexibility and an almost equal metabolic energy demand, (ii) better predicts specific lipid distributions, and (iii) computes lipid costs of transitioning between experimental conditions.

Results

Representing lipid constraints with the aid of SLIME reactions

Flux balance analysis (FBA) [21] is based on the following assumptions on metabolism: (i) a cell has a metabolic goal, which we can represent through a mathematical objective function, (ii) under short timescales there is no accumulation of intracellular metabolites, and (iii) metabolic fluxes are bounded to physical constraints, such as thermodynamics and kinetics. Those three assumptions define a basic FBA problem as followed:

$$ {\displaystyle \begin{array}{c}\operatorname{Min}\left({\mathrm{c}}^{\mathrm{T}}\mathrm{v}\right)\\ {}\mathrm{S}\cdot \mathrm{v}=0\\ {}\mathrm{LB}\le \mathrm{v}\le \mathrm{UB}\end{array}} $$
(1)

where S is the stoichiometric matrix, which contains the stoichiometric coefficients for all reactions and metabolites, v is the vector of metabolic fluxes [mmol/gDWh], c is the objective function vector, and LB and UB are the corresponding lower and upper bounds for each of the fluxes (some of them based on experimental values). As we usually wish to simulate growth, a biomass pseudo-reaction is typically added to the stoichiometric matrix as follows:

$$ \mathrm{protein}+\mathrm{carbohydrate}+\mathrm{lipid}+\mathrm{RNA}+\mathrm{DNA}+\dots \to \mathrm{Biomass} $$
(2)

where protein, carbohydrate, lipid, etc., are pseudo-metabolites that are produced from a combination of metabolic components. For example, protein is produced from a protein pseudo-reaction:

$$ {\mathrm{s}}_1\ \mathrm{ala}+{\mathrm{s}}_2\mathrm{cys}+{\mathrm{s}}_3\ \mathrm{asp}+\dots \to \mathrm{protein} $$
(3)

where si are the measured abundances [mmol/gDW] of the corresponding amino acids in yeast. In this study we focus on the lipid pseudo-reaction, which becomes more challenging to formulate, because there are so many different individual species. One option is to define an equivalent reaction to Eq.3 with every single lipid species [10]; however, as these measurements are not available for most organisms, the lipid pseudo-reaction is usually represented as the following instead:

$$ {\mathrm{s}}_1\ \mathrm{PI}+{\mathrm{s}}_2\ \mathrm{PC}+{\mathrm{s}}_3\ \mathrm{TAG}+{\mathrm{s}}_4\ \mathrm{ERG}\dots \to \mathrm{lipid} $$
(4)

where PI (phosphoinositol), PC (phosphocholine), TAG (triglyceride), ergosterol (ERG), etc. represent each of the lipid classes that exist in the model. Most of them represent not one but a plurality of different molecules, each with different combinations of acyl chain lengths and saturations. Therefore, they are also pseudo-metabolites that need to be produced in turn by additional pseudo-reactions. These pseudo-reactions can be constructed either in a restrictive or permissive approach (Fig. 1). The restrictive approach is to enforce the experimental FAME distribution to every single specific lipid species. This can be achieved by creating a generic acyl chain component [12]:

$$ {\mathrm{s}}_1\ \mathrm{Acyl}\ \mathrm{CoA}\left(16:0\right)+{\mathrm{s}}_2\ \mathrm{Acyl}\ \mathrm{CoA}\left(16:1\right)+\dots \to \mathrm{Acyl}\ \mathrm{CoA} $$
(5)

where the si coefficients are fractions inferred from FAME data. This generic Acyl – CoA is then used to form each generic lipid species, forcing then every lipid class to have the same acyl chain distribution. The latter is an important limitation of this approach, considering that the acyl chain distribution can vary significantly across lipid classes [5].

Fig. 1
figure 1

Overview of the process of including SLIME reactions and new lipid pseudo-reactions for a hypothetical model of three lipid classes and two types of acyl chain. The active fluxes after simulating the models are highlighted in light blue, showing that a GEM with a restrictive approach would use the same acyl chain composition for all lipid classes (left upper corner), a GEM with a permissive approach would always choose the cheapest species from each lipid class (left lower corner), and a GEM with SLIME reactions would satisfy both the lipid class and the acyl chain distribution, but choosing freely which specific lipid species to produce for this goal (right side)

On the other hand, the permissive approach for building the pseudo-reactions is to allow any of the specific lipids to form the generic lipid class [15]. For instance, the following set of pseudo-reactions can be defined for PI:

$$ {\displaystyle \begin{array}{c}\mathrm{PI}\ \left(16:0-16:1\right)\to {\mathrm{s}}_1\mathrm{PI}\\ {}\mathrm{PI}\ \left(16:1-16:1\right)\to {\mathrm{s}}_2\mathrm{PI}\\ {}\mathrm{PI}\ \left(16:1-18:1\right)\to {\mathrm{s}}_3\mathrm{PI}\\ {}\vdots \end{array}} $$
(6)

where si can be set to 1 or adapted to represent the cost of producing each specific lipid. The problem of this approach is that it disregards the acyl chain distribution, even if FAME data is available. Therefore, once simulations are computed, the model will always end up preferring the “cheapest” species to produce in terms of carbon and energy, usually corresponding to species with the shortest acyl chains, unless the si coefficients are arbitrarily tuned to favor longer chains.

Additionally, even though for some specific species such as ergosterol the measured abundance [mg/gDW] can be directly transformed to the stoichiometric coefficient in Eq.4 [mmol/gDW], for most lipids the measured abundance cannot be directly converted, as the molecular weight varies between specific lipid species. Hence, average molecular weights need to be estimated in both permissive and restrictive approaches, leading to skewed predictions.

In this study we solve all the problems presented above through two new types of pseudo-reactions, to account for both constraints on lipid classes and on acyl chains. The first pseudo-reactions Split Lipids Into Measurable Entities and are hence referred to as SLIME reactions. As the name suggests, these pseudo-reactions take each specific lipid and split it into its basic components, i.e. its backbone and acyl chains:

$$ {\mathrm{L}}_{\mathrm{i}\mathrm{j}}\to {\mathrm{s}}_{\mathrm{i}}\ {\mathrm{B}}_{\mathrm{i}}+\sum \limits_{\mathrm{k}\in \mathrm{j}}{\mathrm{s}}_{\mathrm{jk}}\ {\mathrm{C}}_{\mathrm{k}} $$
(7)

where Lij is a lipid of class i and chain configuration j, Bi the corresponding backbone, Ck the corresponding chain k, and si and sjk the associated stoichiometry coefficients. These reactions replace any pseudo-reaction of the sort of Eq.5 or Eq.6 that were already present in the model (Fig. 1).

The second type of pseudo-reactions are new lipid pseudo-reactions, which will in turn replace Eq.4, the old lipid pseudo-reaction that only constrained lipid classes. There are now three different lipid pseudo reactions (Fig. 1): the first pulls all backbone species created in Eq.7 into a generic backbone and uses the corresponding abundance data [g/gDW] as stoichiometric coefficients. The second reaction does the same but for the specific acyl chains, with data from FAME analysis [g/gDW], to create a generic acyl chain. Finally, the third reaction merges back together the generic backbone and the generic acyl chain into a generic lipid, which will be used in the biomass pseudo-reaction as in Eq.2.

For the new reactions to be consistent, we need to choose adequate stoichiometric coefficients for Eq.7. If the abundance data would be molar, si would be equal to 1 and sjk would be equal to the number of repetitions of the corresponding acyl chain k in lipid j. However, as the abundance data often comes in mass units, si must be equal to the molecular weight [g/mmol] of the full lipid, and sjk must be equal to the molecular weight of the corresponding acyl chain k, multiplied by the number of repetitions of k in configuration j. By choosing these values we allow the SLIME reactions to convert the molar production of the lipid [mmol/gDWh] into a mass basis [g/gDWh], which in turn will be converted to a lipid turnover [1/h] by the lipid pseudo reactions.

Improved model of yeast

We implemented SLIMEr in the consensus genome-scale model of yeast version 7.8.0 [22], a model which used the previously mentioned permissive approach, and had at the start 2224 metabolites and 3496 reactions. Out of those reactions, 176 corresponded to reactions of the sort of Eq.6, which were replaced by 186 SLIME reactions that cover in total 19 lipid classes and 6 different acyl chains. An additional 27 metabolites (including both specific and generic backbones and acyl chains) and 15 reactions (including transport reactions, lipid pseudo-reactions and exchange reactions) were added to the model, and 10 metabolites and 1 reaction (connected to previously deleted reactions) were removed. The final enhanced model had therefore 2241 metabolites and 3520 reactions, and kept the number of genes and gene-reaction rules constant, as only pseudo-reactions with no gene-reaction rules were modified.

For the reference model, we used both lipid profiling and FAME data at low growth rate and no stress conditions [23]. The lipid profile was rescaled to be proportional to the FAME data, as detailed in the methods section. We can see that by using SLIMEr, the enhanced model was enforced to follow the acyl chain distribution of the experimental data (Fig. 2a), whereas the previous permissive model predicted mostly acyl chains of 16-carbon length (less costly), and only a small amount of 18-carbon length to satisfy the requirement of ergosterol ester (Additional file 1: Figure S1), as ergosteryl oleate is cheaper to produce mass-wise than ergosteryl palmitoleate (Additional file 1: Table S1).

Fig. 2
figure 2

The enhanced GEM with improved constraints on lipid metabolism. a By using SLIMEr, a correct acyl chain composition is enforced. b Breakdown of the acyl chain distribution and variability predicted by the enhanced GEM, for each experimentally detected lipid class. Thick black lines correspond to parsimonious FBA predictions, while the FVA allowed ranges are shown with colored bars

With the enhanced model we also studied in how many ways lipid requirements can be satisfied spending the same amount of energy, by performing flux variability analysis (FVA) (Fig. 2b). Comparing these predictions to the ones of the permissive model (Additional file 1: Figure S1), we saw some reductions in variability, coming mostly from changes in phosphatidylcholine and triglyceride content. However, despite the additional constraints imposed, lipid metabolism could still rearrange itself in a wide amount of combinations, and overall flux variability did not decrease significantly (Additional file 1: Figure S2a). This agrees with experimental observations that lipid metabolism is highly flexible [5]; therefore, handling lipid metabolism with SLIME reactions is preferred over alternative approaches, such as models that constrain single individual lipid species [10, 24], as the latter limit the organism to only one feasible state of lipid metabolism and hence bias results.

Model predictions of specific lipid distributions

To validate model predictions, we used reported data [5] including measurements of 102 specific lipid species. This data was added up to compute the totals of each lipid class and each acyl chain, and these sums were in turn used as input for creating both a permissive and an enhanced model. In the latter case, as a total lipid abundance of 8% was assumed, the acyl chain abundances were rescaled to be proportional to the lipid classes abundances (see the methods section for more details). We then performed random sampling of fluxes for the resulting models, to generate 10,000 specific lipid distributions for each model and for each of the 8 conditions of the study.

Comparing these in silico lipid distributions to the original in vivo measurements (Fig. 3a), the enhanced model improved the average prediction error for all experimental conditions (Additional file 1: Table S2), and overall the simulated lipid distributions came much closer to the experimental values compared to the permissive model (Fig. 3b). Furthermore, simulations with the enhanced model are also superior to simulations with a restrictive model, as the latter approach would not capture the fact that many lipid classes show preference towards few acyl chains. Instead, it would force the model to produce all acyl chains in the same proportion for each lipid class, significantly lowering the quality of predictions (Additional file 1: Figure S3).

Fig. 3
figure 3

Using experimental data to validate the model and predict energy costs. a The lipid composition of 10,000 simulations of the enhanced model achieved with random sampling are presented for each specific lipid species (blue circles), compared to the actual measured experimental values (red bars). b Principal component analysis of all (log transformed) lipid abundance distributions for both the permissive (yellow) and enhanced (blue) models, compared to experimental values (red). Different tonalities of yellow and blue indicate the 8 different simulated conditions and strains. c Carbon costs (continuous lines) and ATP costs (segmented lines) of satisfying the acyl chain distribution at four increasing levels of temperature (30, 33, 36 and 38 °C), NaCl concentrations (0, 200, 400 and 600 mM) and EtOH concentrations (0, 20, 40 and 60 g/L)

It should be noted that even though SLIMEr improved the model’s lipid composition predictions, many other distributions are still predicted to be equally likely for all simulated conditions (Fig. 3b, Additional file 1: Figure S4); which reinforces the previously mentioned idea of a highly flexible lipid network. Furthermore, the fact that yeast picks a certain lipid distribution in vivo for each strain and condition, but has many additional options in silico, points also to a high level of regulation in place to adapt the distribution of lipid species in S. cerevisiae depending on the genetic background and environmental conditions [25].

Energy costs at increasing levels of stress

As a final study, we used lipid data of yeast grown under 9 different stress levels [23] to create both a permissive and enhanced GEM for each of those conditions. We then computed the differences in ATP turnover and carbon requirements between the permissive and enhanced model, which correspond to the extra energy and carbon costs, respectively, required to achieve the given acyl chain distribution in each condition (Fig. 3c). As increasing stress levels are associated to an increase in maintenance energy (Additional file 1: Figure S5) [26], by using SLIMEr we therefore showed an increase in lipid expenses when transitioning from a metabolic state of low energy demand to high energy demand.

In the case of the reference condition, the permissive model could produce 145.9 μmol (ATP)/gDW more than the enhanced model. Also in this condition, the simulated growth-associated ATP maintenance (GAM) without accounting for known polymerization costs of proteins, carbohydrates, RNA and DNA [27] was of 36.96 mmol (ATP)/gDW, which corresponds to the maintenance costs of unspecified functions in the model, such as protein turnover, maintenance of membrane potentials, etc. The ATP cost for achieving correct acyl chain distribution under reference conditions corresponded then to 0.4% of the total costs of processes not included in the model. This is a rather low percentage, which shows that the addition of SLIME reactions will not cause a significant increase in the overall metabolic energy demand, while making the simulated fluxes in lipid metabolism better match experimentally observed distributions (Fig. 3b).

Discussion

As previously mentioned, we did not see a significant reduction in flux variability of predictions compared to the permissive approach (Additional file 1: Figure S2). This is partly explained as in each simulation we maximize the ATP maintenance; therefore, simulations of the permissive model (which did not have constraints on the acyl chain distribution) had a slightly higher ATP maintenance, making simulations overall similarly constrained. Nonetheless, the main advantage of using SLIMEr is not to constrain simulations more, but instead to constrain lipid fluxes such that they better match biologically feasible distributions (Fig. 3b).

It is also important to note that the model does not take other physiological properties into account, such as specific regulation, or curvature and fluidity of membranes as function of lipid composition and/or temperature. It only takes FAME analysis and lipid profile data, and demonstrates that specific lipid distributions from simulations are consistent with these measurements. It would be of interest to account for additional data and processes such as the ones mentioned, but this is beyond the scope of this study.

Even though developed for the consensus GEM of S. cerevisiae, this approach can be extended to any other model and/or organism. The main challenge here is to map all lipids in the model to the corresponding pseudo-metabolites (backbones and chains), as conventions for naming lipids vary a great deal between different databases and models. Introduction of standardized metabolite ids [28, 29] can significantly aid this otherwise laborious task.

Conclusion

With SLIMEr we can now correctly represent biomass requirements from lipid metabolism in genome-scale metabolic models. The approach allows the model to satisfy at the same time requirements on the lipid class and acyl chain distributions, which is a significant improvement compared to only being able to constrain lipid classes [15, 16]. We have also shown the high degree of flexibility in lipid metabolism, which shows that approaches that over-constrain the lipid requirements by enforcing specific concentrations for individual species [10, 11, 24] or forcing a given acyl chain distribution to all species [12, 13] are not suitable for handling this flexibility. Finally, we have demonstrated the use of the expanded model as a tool to compute lipid requirements in varying experimental conditions. We expect the enhanced model to be useful for metabolic engineering applications, particularly for designing strains that can rearrange the chain length distribution of specific lipid classes [30].

Methods

Data used

All data used in this study was collected from literature. For the initial model analysis and the analysis of lipid metabolism under increasing levels of stress, aerobic glucose-limited chemostat data of S. cerevisiae, strain CEN.PK113-7D, growing on minimal media at a growth rate of D = 0.1 h− 1 was used [23]. The mentioned study collected lipid abundance data in mg/gDW for both lipid classes and acyl chains for 1 reference condition plus 9 different conditions of stress (temperature, ethanol and osmotic stress). Additionally, carbohydrate, protein and RNA content [g/gDW] was measured for all stress conditions, together with flux data [mmol/gDWh] for glucose and oxygen uptake, and glycerol, acetate, ethanol, pyruvate, succinate and CO2 production.

For model predictions of specific lipid distributions, we used published data of S. cerevisiae grown aerobically on SD media at maximum growth rate (shake flask cultures), under 8 different conditions: four different BY4741 strains (a wildtype plus three knockout strains), each cultivated at both 24 °C and 37 °C [5]. In that study, the authors introduced a novel quantification method for detecting the abundance of up to 250 singular species of lipids. Out of those, 102 were used in our study, as they had direct correspondence to a species in the GEM employed. Even though not all lipids were accounted for in the model, those 102 species included the ones most abundant in vivo, as such providing high mass-coverage (on average 84% of the total detected lipid abundance) without having to add any additional lipid species and reactions to the model. Abundance values were converted from mol/mol to mg/gDW assuming an 8% lipid abundance in biomass [23] and considering the unmatched lipid percentage previously mentioned. Additionally, we assumed a protein composition of 0.5 g/gDW, an RNA composition of 0.06 g/gDW, a glucose uptake of 20.4 mmol/gDWh, and biomass growth rate of 0.41 h− 1, based on previous batch simulations of the yeast GEM [31].

Model enhancement details

The consensus genome-scale model of yeast, version 7.8.0 [22], was used. Compared to version 7.6 from the published paper [20], the model included a manual curation detailed in previous work [31], a clustered biomass pseudo-reaction, and metabolite formulas added to every lipid. Combining this model together with the experimental data, the following five steps were followed to create a model with SLIME reactions, specific to each experimental condition:

  1. 1.

    Add pseudo-metabolites representing each specific backbone, each specific acyl chain, the generic backbone and the generic acyl chain.

  2. 2.

    Add for each specific lipid species a SLIME reaction. These reactions replace the previous ones of the sort of Eq.6 in the model [15].

  3. 3.

    Add all three new lipid pseudo-reactions (Fig. 1) using the experimental data [g/gDW]. These reactions replace the original lipid pseudo-reaction (Eq.4).

  4. 4.

    Scale either the lipid class or the acyl chain abundance data so that they are proportional, as the approach is based on exact mass balances. For this, an optimization problem is carried out where the coefficients of the corresponding pseudo-reaction are rescaled to minimize to zero the excretion of unused backbones and acyl chains (Additional file 1: Figure S6).

  5. 5.

    Finally, scale any other component in the biomass pseudo-reaction for which there is data, and ensure that the biomass composition adds up to 1 g/gDW [32] by rescaling the total amount of carbohydrates, which was not measured in the datasets employed.

To compare the performance of the new enhanced model, an additional model for each condition was created, which did not have the acyl chain pseudo-reaction, but instead exchange reactions for each acyl chain, so that the model could freely choose the acyl chain distribution. Note that by doing this, the only remaining lipid constraint is the lipid backbone pseudo-reaction, meaning that this alternative model is equivalent to the permissive approach mentioned in the results section. Therefore, we refer to this model as the “permissive” model, and use it to benchmark our analysis. In turn, a comparison to a “restrictive” model is only briefly outlined when predicting specific lipid distributions, as the experimental data showed that the acyl chain distribution in yeast varies considerably across lipid classes (Additional file 1: Figure S7), making the restrictive approach not applicable here.

Simulation details

For all FBA simulations, measured exchange fluxes were used to constrain the model, allowing up to a 5% of deviation from the average measurements, and a parsimonious FBA approach [33] was followed, maximizing first the ATP turnover and then minimizing the total sum of absolute fluxes, in order to find the most compact solution. The obtained ATP turnover value is equal to the sum of the growth associated ATP maintenance (GAM) and the non-growth counterpart (NGAM, equal to 0.7 mmol/gDWh in the original model), and it was used to compare ATP costs from transitioning from one state to another.

The variability of each different lipid species was computed with FVA [34] on each corresponding group of SLIME reactions at a time; e.g., for assessing the variability of C18:0 in PI, FVA was applied on all SLIME reactions producing PI and any C18:0 acyl chains. Variability was also assessed with optGpSampler, an implementation of the artificial centering hit-and-run algorithm for random sampling of metabolic fluxes [35]. Abundances in mg/gDW of each lipid species were then computed from the corresponding SLIME reaction fluxes, multiplied by the molecular weight and divided by the biomass growth rate. All simulations were performed in Matlab® R2018a, using the COBRA toolbox [36], and Gurobi® 7.5 set as optimizer.