PENDISC: A Simple Method for Constructing a Mathematical Model from Time-Series Data of Metabolite Concentrations

The availability of large-scale datasets has led to more effort being made to understand characteristics of metabolic reaction networks. However, because the large-scale data are semi-quantitative, and may contain biological variations and/or analytical errors, it remains a challenge to construct a mathematical model with precise parameters using only these data. The present work proposes a simple method, referred to as PENDISC (arameter stimation in a on-mensionalized -system with onstraints), to assist the complex process of parameter estimation in the construction of a mathematical model for a given metabolic reaction system. The PENDISC method was evaluated using two simple mathematical models: a linear metabolic pathway model with inhibition and a branched metabolic pathway model with inhibition and activation. The results indicate that a smaller number of data points and rate constant parameters enhances the agreement between calculated values and time-series data of metabolite concentrations, and leads to faster convergence when the same initial estimates are used for the fitting. This method is also shown to be applicable to noisy time-series data and to unmeasurable metabolite concentrations in a network, and to have a potential to handle metabolome data of a relatively large-scale metabolic reaction system. Furthermore, it was applied to aspartate-derived amino acid biosynthesis in Arabidopsis thaliana plant. The result provides confirmation that the mathematical model constructed satisfactorily agrees with the time-series datasets of seven metabolite concentrations. Electronic supplementary material The online version of this article (doi:10.1007/s11538-014-9960-8) contains supplementary material, which is available to authorized users.


Supplementary Information 1: Additional information
Let us consider the following S-system equations that are derived from the linear network shown in Fig. S1.
The initial values for the metabolite concentrations are chosen as X 10 = 1.4, X 20 = 2.7, and X 30 = 1.2. The steady-state metabolite concentrations are as follows; X 1 * = 0.4902, X 2 * = 3.0958, and X 3 * =1.9610. In an actual system, these concentrations must be determined from measured values. Firstly, values of 0.5 or -0.5 are assigned to the kinetic orders in Eq. (S1) to obtain the following equations.
In general, the chance of the parameter estimation converging increases as the number of unknown parameters decreases. Fortunately, the PENDISC method can markedly reduce the number of unknown parameters using the constraints derived from the structure of the network. To confirm the validity of this operation, the parameter estimation in the linear metabolic pathway model with inhibition ( Fig. S1) was performed for two cases, one in which the values of A 1 , A 2 , and A 3 are unknown, and a second in which only the value of A 1 is unknown. Figure S2 shows a comparison of the results obtained using the estimated rate constants with the time-series data of metabolite concentrations; the left vertical axis shows the actual metabolite concentration, and the right vertical axis shows the dimensionless metabolite concentration. It should be noted that these two metabolite concentrations are different in magnitude, but vary in the same manner. Eleven concentration data points were used for each metabolite, and the initial values of A i were all set to 5. The rate constants determined for different numbers of unknown parameters are listed in Table S1 (Supplementary Information 2). The agreement between the calculated results and the time-series data is reasonable, but not perfect. This is because the kinetic orders in the S-system equations were set to 0.5 or -0.5. In every case, the calculated lines successfully exhibit time-transient behaviors analogous to the time-series data. The agreement is higher in the case of three unknown rate constants than in the case of one unknown rate constant. This is because the former case has more flexibility for changing the shape of the calculated line, as there are more adjustable parameters. However, it should be noted that when three dimensionless rate constants are unknown, the relevant equations do not satisfy the constraints. In theory, the calculated result can satisfy the constraints only when A 1 is unknown and other dimensionless rate constants are expressed as a function of A 1 . The calculation time was 6.34 s for the case of three unknown parameters, and this shortened to 4.67 s for the case of one unknown parameter.
Evidently, a reduction in the number of unknown parameters is the major advantage of the PENDISC method. When the pathway is linear, all the relevant differential equations are expressed in terms of the dimensionless rate constant of the first metabolite (A 1 ), regardless of the total number of metabolites. If the steady-state values and time-series data are available for all metabolites, the rate constants α i and β i (i = 1, 2, .....,n) can be obtained by simply determining A 1 in order to fit the calculated values to the time-series data for all metabolite concentrations.

Equations for branched pathway model with inhibition and activation
Let us consider the following S-system equations derived from the network structure with a branched pathway shown in Fig The initial values for the metabolite concentrations are chosen as X 10 = 1.4, X 20 = 2.7, X 30 = 1.2, and X 40 = 0.4.
(S6) to obtain the following equations.
Secondly, Eq. (S8) is non-dimensionalized using steady-state concentrations, followed by applying the constraints: with the following dimensionless rate constants.
Lastly, two unknown parameters included in Eq. (S10), A 1 and A 4 , are determined by a nonlinear least-squares method so as to fit the solution to Eq. (S10) to the time-series data.

Equations for aspartate-derived amino acid biosynthesis model
The The initial values for the metabolite concentrations are chosen as with the following dimensionless rate constants.

Supplementary Information 2: Additional tables
Original data without P2 (leave data at t=1.0 out) Original data without P3 (leave data at t=1.5 out) Original data without P4 (leave data at t=2.0 out) Original data without P5 (leave data at t=2.5 out) Original data without P6 (leave data at t=3.0 out) Original data without P7 (leave data at t=3.5 out) Original data without P8 (leave data at t=4.0 out) Original data without P9(leave data at t=4.5 out) Original data without P10 (leave data at t=5.0 out)  The numerical values before the character "A" and after the character "I" indicate the number of unknown parameters and the value used as the initial guesses, respectively. The numerical value before the character "A" indicates the number of unknown parameters and that after the word "Initial" represents the value used as the initial guesses   (Stephanopoulos and Simpson, 1997), (2) Methionine cycle (Reed, et al., 2004), (3) Modified TCA cycle (Shiraishi and Savageau, 1992), (4) Purin metabolism (Curto, et al., 1998)  for different numbers of time-series data. The initial guesses were all set to either 5 or 10 (represented by the symbols "initial 5" and "initial 10," respectively).  time-series data for each metabolite in an aspartate-derived amino acid biosynthesis model. The initial guesses were all set to be 5.