Background

Mathematical modeling has evolved as a powerful paradigm to analyze, and ultimately design, complex biochemical networks [1–5]. Mathematical modeling of biochemical networks is often an iterative process. First, models are formulated from existing biochemical knowledge, and then model parameters are estimated using experimental data [6–8]. Parameter estimation is typically framed as a non-linear optimization problem wherein the residual (or objective function) between experimental measurements and model simulations is minimized using an optimization strategy [9]. Optimal parameter estimates are then used to predict unseen experimental data. If the validation studies fail, model construction and calibration are repeated iteratively until satisfactory results are obtained. As our biological knowledge increases, model formulation may become less of a challenge, but parameter estimation will likely remain difficult.

Parameter estimation is a major challenge to the development of biochemical models. It has been a well studied engineering problem for decades [10–13]. However, the complex dynamics of large biological systems and noisy, often incomplete experimental data sets pose a unique estimation challenge. Optimization problems involving biological systems are often non-linear and multi-modal, i.e., typical models have multiple local minima or maxima [7, 9]. Non-linearity coupled with multi-modality renders local optimization techniques such as pattern search [14], Nelder-Mead simplex methods [15], steepest descent or Levenberg-Marquardt [16] incapable of reliably obtaining globally optimal solutions, as these methods often terminate at local minima. Though deterministic global optimization techniques (for example, algorithms based on branch and bound) can handle non-linearity and multi-modality [17, 18], the absence of derivative information, discontinuous objective functions, non-smooth regions, or a general lack of knowledge about the structure of the objective function hampers these techniques.

Meta-heuristics like Genetic Algorithms (GAs) [19], Simulated Annealing (SA) [20], Evolutionary Programming [21] and Differential Evolution (DE) [22–25] have all shown promise on non-linear multi-modal problems [26]. These techniques make no assumptions about, nor do they require a priori information about, the structure of the objective function. Meta-heuristics are often very effective at finding globally optimal or near optimal solutions. For example, Mendes et al. used SA to estimate rate constants for the inhibition of HIV proteinase [27], while Modchang et al. used a GA to estimate parameters for a model of G-protein-coupled receptor (GPCR) activity [28]. Parameter estimates obtained using the GA stratified the effectiveness of two G-protein agonists, N6-cyclopentyladenosine (CPA) and 5’-N-ethylcarboxamidoadenosine (NECA). Tashkova et al. compared different meta-heuristics for parameter estimation on a dynamic model of endocytosis; DE was the most effective of the approaches tested [29]. Banga and co-workers have also successfully applied scatter search to estimate model parameters [30–32]. Hybrid approaches, which combine meta-heuristics with local optimization techniques, have also become popular. For example, Egea et al. developed the enhanced scatter search (eSS) method [32], which combines scatter search with local search methods, for parameter estimation in biological models [33]. However, despite these successes, a major drawback of most meta-heuristics remains the large number of function evaluations required to explore parameter space. Performing numerous, potentially expensive function evaluations is not desirable (and perhaps not feasible) for many types of biochemical models. Alternatively, Tolson and Shoemaker found, using high-dimensional watershed models, that perturbing only a subset of parameters was an effective strategy for estimating parameters in expensive models [34]. Their approach, called Dynamically Dimensioned Search (DDS), is a simple stochastic single-solution heuristic that estimates nearly optimal solutions within a specified maximum number of function (or model) evaluations. Thus, while meta-heuristics are often effective at estimating globally optimal or nearly optimal solutions, they typically require a large number of function evaluations to converge to a solution.

In this study, we developed Dynamic Optimization with Particle Swarms (DOPS), a novel hybrid meta-heuristic that combines the global search capability of multi-swarm particle swarm optimization with the greedy refinement of dynamically dimensioned search (DDS). The objective of DOPS is to obtain near optimal parameter estimates for large biochemical models within relatively few function evaluations. DOPS uses multi-swarm particle swarm optimization to generate nearly optimal candidate solutions, which are then greedily updated using dynamically dimensioned search. While particle swarm techniques are effective, they have a tendency to become stuck in small local regions and lose swarm diversity, so we combined multi-swarm particle swarm optimization with DDS to escape these local regions and continue towards better solutions [35]. We tested DOPS using a combination of classic optimization test functions, biochemical benchmark problems and real-world biochemical models. First, we tested the performance of DOPS on the Ackley and Rastrigin functions, and published biochemical benchmark problems. Next, we used DOPS to estimate the parameters of a model of the human coagulation cascade. On average, DOPS outperformed other common meta-heuristics like differential evolution, a genetic algorithm, CMA-ES (Covariance Matrix Adaptation Evolution Strategy), simulated annealing, single-swarm particle swarm optimization, and dynamically dimensioned search on the optimization test functions, benchmark problems and the coagulation model. For example, DOPS recovered the nominal parameters for the benchmark problems using an order of magnitude fewer function evaluations than eSS in all cases. It also produced parameter estimates for the coagulation model that predicted unseen coagulation data sets. Thus, DOPS is a promising hybrid meta-heuristic for the estimation of biochemical model parameters in relatively few function evaluations. However, the relative performance of DOPS should be evaluated cautiously; only naive implementations of the other approaches were tested. Thus, it is possible that other optimized meta-heuristics could outperform DOPS on both test and real-world problems.

Results

DOPS explores parameter space using a combination of global methods.

DOPS combines a multi-swarm particle swarm method with the dynamically dimensioned search approach of Shoemaker and colleagues (Fig. 1). The goal of DOPS is to estimate optimal or near optimal parameter vectors for high-dimensional biological models within a specified number of function evaluations. Toward this objective, DOPS begins by using a multi-swarm particle swarm search and then dynamically switches, using an adaptive switching criterion, to the DDS approach. The particle swarm search uses multiple sub-swarms wherein the update to each particle (corresponding to a parameter vector estimate) is influenced by the best particle amongst the sub-swarm, and the current globally best particle. Particle updates occur within sub-swarms for a certain number of function evaluations, after which the sub-swarms are reorganized. This sub-swarm mixing is similar to the regrouping strategy described by Zhao et al. [36]. DOPS switches out of the particle swarm phase based upon an adaptive switching criterion that is a function of the rate of error convergence. If the error represented by the best particle does not decrease for a threshold number of function evaluations, DOPS switches automatically to the DDS search phase. The DDS search is initialized with the globally best particle from the particle swarm phase; thereafter, the particle is greedily updated by perturbing a subset of dimensions for the remaining number of function evaluations. The identity of the parameters perturbed is chosen randomly, with fewer parameters perturbed as the number of function evaluations increases.

Fig. 1

Schematic of the dynamic optimization with particle swarms (DOPS) approach. Top: Each particle represents an N dimensional parameter vector. Particles are given randomly generated initial solutions and grouped into different sub-swarms. Within each swarm, the magnitude and direction of a particle's movement are influenced by the position of the best particle and by its own experience. After every g function evaluations, the particles are mixed and randomly assigned to different swarms. When the error due to the global best particle (the best particle amongst all the sub-swarms) does not drop over a certain number of function evaluations, the swarm search is stopped and the search switches to a Dynamically Dimensioned Search with the global best particle as the initial solution (candidate) vector. Bottom: The candidate vector performs a greedy global search for the remaining number of function evaluations. The search neighborhood is dynamically adjusted by varying the number of dimensions that are perturbed (in black) in each evaluation step. The probability that a dimension is perturbed decreases as the number of function evaluations increases

DOPS minimized benchmark problems using fewer function evaluations.

On average, DOPS performed similarly to or outperformed four other meta-heuristics for the Ackley and Rastrigin test functions (Fig. 2). The Ackley and Rastrigin functions both have multiple local extrema and attain a global minimum value of zero. In each case, the maximum number of function evaluations was fixed at \(\mathcal {N} = 4000\), and \(\mathcal {T} = 25\) independent experiments were run with different initial parameter vectors. DOPS found optimal or near optimal solutions for both the 10-dimensional Ackley (Fig. 2a) and Rastrigin (Fig. 2b) functions within the budget of function evaluations. In each of the 10-dimensional cases, other meta-heuristics such as DDS and DE also performed well. However, DOPS consistently outperformed all other approaches tested. This performance difference was more pronounced as the dimension of the search problem increased; for the 300-dimensional Rastrigin function, DOPS was the only approach to find an optimal or near optimal solution within the function evaluation budget (Fig. 2c). Taken together, DOPS performed at least as well as other meta-heuristics on low dimensional test problems, but was especially suited to large dimensional search spaces.
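Both test functions have standard closed forms; a minimal Python/NumPy sketch of the d-dimensional versions (our own illustration, not code from the DOPS implementation) is:

```python
import numpy as np

def ackley(x):
    """d-dimensional Ackley function; global minimum f(0) = 0."""
    d = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d)
            + 20.0 + np.e)

def rastrigin(x):
    """d-dimensional Rastrigin function; global minimum f(0) = 0."""
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

# e.g., evaluate a random candidate in the 10-dimensional search space
x = np.random.default_rng(1).uniform(-5.12, 5.12, size=10)
print(ackley(x), rastrigin(x))
```

Next, we tested DOPS on benchmark biochemical models of varying complexity.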

Fig. 2

Performance of DOPS and other meta-heuristics for the Ackley and Rastrigin functions. a: Mean scaled error versus the number of function evaluations for the 10-dimensional Ackley function. DOPS, DDS and eSS find optimal or near optimal solutions within the specified number of function evaluations. b: Mean scaled error versus the number of function evaluations for the 10-dimensional Rastrigin function. Nearly all the techniques find optimal or near optimal solutions within the specified number of function evaluations. c: Mean scaled error versus the number of function evaluations for the 300-dimensional Rastrigin function. DOPS is the only algorithm that finds an optimal or near optimal solution within the specified number of function evaluations. In all cases, the maximum number of function evaluations was \(\mathcal {N}\) = 4000. Mean and standard deviation were calculated over \(\mathcal {T}\) = 25 trials. A star denotes that the average value was less than 1E-6

Villaverde and co-workers published a set of benchmark biochemical problems to evaluate parameter estimation methods [33]. They ranked the example problems by computational cost from most to least expensive. We evaluated the performance of DOPS on problems from the least and most expensive categories. The least expensive problem was a metabolic model of Chinese hamster ovary (CHO) cells with 35 metabolites, 32 reactions and 117 parameters [37]. The biochemical reactions were modeled using modular rate laws and generalized Michaelis–Menten kinetics. On the other hand, the most expensive problem was a genome scale kinetic model of Saccharomyces cerevisiae with 261 reactions, 262 variables and 1759 parameters [38]. In both cases, synthetic time series data, generated with known parameter values, was used as training data to estimate the model parameters. For the Saccharomyces cerevisiae model, the time series data consisted of 44 observables, while for the CHO metabolism problem the data corresponded to 13 different metabolite measurement sets. The number of function evaluations was fixed at \(\mathcal {N} = 4000\), and we trained both models against the synthetic experimental data. DOPS produced good fits to the synthetic data (Additional file 1: Figure S1 and Additional file 2: Figure S2), and recapitulated the nominal parameter values using only \(\mathcal {N}\leq ~4000\) function evaluations (Additional file 3: Figure S3). On the other hand, the enhanced scatter search (eSS) with a local optimizer took on the order of \(10^{5}\) function evaluations for the same problems. DOPS required a comparable amount of time (Additional file 4: Figure S4), converged faster (Additional file 5: Figure S5 and Additional file 6: Figure S6), and had lower variability in the best value obtained across multiple runs (Additional file 7: Figure S7) when compared to other meta-heuristics. Thus, DOPS estimated the parameters in benchmark biochemical models, and recovered the original parameters from the synthetic data, using fewer function evaluations. Next, we compared the performance of DOPS with other meta-heuristics for a model of the human coagulation cascade.

DOPS estimated the parameters of a human coagulation model.

Coagulation is an archetype biochemical network that is highly interconnected, containing both negative and positive feedback (Fig. 3). The biochemistry of coagulation, though complex, has been well studied [39–45], and reliable experimental protocols have been developed to interrogate the system [46–49]. Coagulation is mediated by a family of proteases in the circulation, called factors, and a key group of blood cells, called platelets. The central process in coagulation is the conversion of prothrombin (fII), an inactive coagulation factor, to the master protease thrombin (FIIa). Thrombin generation involves three phases: initiation, amplification and termination. Initiation requires a trigger event, for example a vessel injury that exposes tissue factor (TF), which leads to the activation of factor VII (FVIIa) and the formation of the TF/FVIIa complex. Two converging pathways, the extrinsic and intrinsic cascades, then process and amplify this initial coagulation signal. There are several control points in the cascade that inhibit thrombin formation, and eventually terminate thrombin generation. Tissue Factor Pathway Inhibitor (TFPI) inhibits upstream activation events, while antithrombin III (ATIII) neutralizes several of the proteases generated during coagulation, including thrombin. Thrombin itself also plays a role in its own inhibition; thrombin, through interaction with thrombomodulin, protein C and endothelial cell protein C receptor (EPCR), converts protein C to activated protein C (APC), which attenuates the coagulation response by proteolytic cleavage of amplification complexes. Termination occurs after either prothrombin is consumed, or thrombin formation is neutralized by inhibitors such as APC or ATIII. Thus, the human coagulation cascade is an ideal test case; coagulation is challenging because it contains both fast and slow dynamics, but also accessible because of the availability of comprehensive data sets for model identification and validation. In this study, we used the coagulation model of Luan et al. [49], which is a coupled system of non-linear ordinary differential equations where biochemical interactions were modeled using mass action kinetics. The Luan model contained 148 parameters and 92 species, and was validated using 21 published experimental datasets.

Fig. 3

Schematic of the extrinsic and intrinsic coagulation cascade. Inactive zymogens upstream (grey) are activated by exposure to tissue factor (TF) following vessel injury. Tissue factor and activated factor VIIa (FVIIa) form a complex that activates factor X (fX) and IX (fIX). FXa activates downstream factors including factor VIII (fVIII) and fIX. Factor V (fV) is primarily activated by thrombin (FIIa). In addition, we included a secondary fV activation route involving FXa. FXa and FVa form a complex (prothrombinase) on activated platelets that converts prothrombin (fII) to FIIa. FIXa and FVIIIa can also form a complex (tenase) on activated platelets which catalyzes FXa formation. Thrombin also activates upstream coagulation factors, forming a strong positive feedback ensuring rapid activation. Tissue factor pathway inhibitor (TFPI) downregulates FXa formation and activity by sequestering free FXa and TF-FVIIa in a FXa-dependent manner. Antithrombin III (ATIII) inhibits all proteases. Thrombin inhibits itself by binding the surface protein thrombomodulin (TM). The IIa-TM complex catalyzes the conversion of protein C (PC) to activated protein C (APC), which attenuates the coagulation response by the proteolytic cleavage of fV/FVa and fVIII/FVIIIa

DOPS estimated the parameters of a human coagulation model for TF/FVIIa initiated coagulation without anticoagulants (Fig. 4). The objective function was an unweighted linear combination of two error functions, representing coagulation initiated with different concentrations of TF/FVIIa (5pM and 5nM) [46]. The number of function evaluations was restricted to \(\mathcal {N} = 4000\) for each algorithm we tested, and we performed \(\mathcal {T} = 25\) trials of each experiment to collect average performance data (Table 1). DOPS converged faster and had a lower final error compared to the other algorithms (Fig. 5). Within the first 25% of function evaluations, DOPS produced a rapid drop in error followed by a slower but steady decline (Additional file 8: Figure S8b). DOPS switched to the dynamically dimensioned search phase between approximately 500 and 1000 function evaluations; this transition varied from trial to trial since the switch was based upon the local convergence rate. On average, DOPS minimized the coagulation model error to a greater extent than the other meta-heuristics. However, it was unclear if the parameters estimated by DOPS had predictive power on unseen data. To address this question, we used the final parameters estimated by DOPS to simulate data that was not used for training (coagulation initiated with 500pM, 50pM, and 10pM TF/FVIIa). The optimal or near optimal parameters obtained by DOPS predicted unseen coagulation datasets (Fig. 6). The normalized standard error for the coagulation predictions was consistent with the training error, with the exception of the 50pM TF/FVIIa case, which was a factor of 2.65 worse (Table 2). However, this might be expected, as coagulation initiated with 50pM TF/FVIIa was the farthest away from the training conditions. Taken together, DOPS estimated parameter sets with predictive power on unseen coagulation data using fewer function evaluations than other meta-heuristics. Next, we explored how the number of sub-swarms and the switch to DDS influenced the performance of the approach.

Fig. 4

Model fits on experimental data using DOPS. The model parameters were estimated using DOPS. Solid black lines indicate the simulated mean thrombin concentration using parameter vectors from \(\mathcal {T}\) = 25 trials. The grey shaded region represents the 99% confidence estimate of the mean simulated thrombin concentration. The experimental data is reproduced from the synthetic plasma assays of Mann and co-workers. Thrombin generation is initiated by adding TF/FVIIa (5nM, blue; 5pM, red) to synthetic plasma containing 200 μmol/L of phospholipid vesicles (PCPS) and a mixture of coagulation factors (II, V, VII, VIII, IX, X and XI) at their mean plasma concentrations

Fig. 5

Error convergence rates of the nine different algorithms on the coagulation model. The objective error is the mean over \(\mathcal {T}\) = 25 trials. DOPS, SA, PSO and DOPS-PSO have the steepest drop in error during the first 300 function evaluations. Thereafter, the error drop in DDS and SA remains nearly constant, whereas DOPS continues to drop further. In the allotted budget of function evaluations, eSS produces a modest reduction in error. At the end of 4000 function evaluations, DOPS attains the lowest error

Fig. 6

Model predictions on unseen experimental data using parameters obtained from DOPS. The parameter estimates that were obtained using DOPS were tested against data that was not used in the model training. Solid black lines indicate the simulated mean thrombin concentration using parameter vectors from \(\mathcal {T}\) = 25 trials. The grey shaded region represents the 99% confidence estimate of the mean simulated thrombin concentration. The experimental data is reproduced from the synthetic plasma assays of Mann and co-workers. Thrombin generation is initiated by adding Factor VIIa-TF (500pM - blue, 50pM - pink and 10pM - purple, respectively) to synthetic plasma containing 200 μmol/L of phospholipid vesicles (PCPS) and a mixture of coagulation factors (II,V,VII,VIII,IX,X and XI) at their mean plasma concentrations

Table 1 Optimization settings and results for the coagulation problem, the benchmark problems, and the test functions using DOPS
Table 2 Error analysis for the human coagulation model

Phase switching was critical to DOPS performance.

A differentiating feature of DOPS is the switch to dynamically dimensioned search following stagnation of the initial particle swarm phase. We quantified the influence of the number of sub-swarms and the switch to DDS on error convergence by comparing DOPS with and without DDS for different numbers of sub-swarms (Fig. 7). We considered multi-swarm particle swarm optimization with and without the DDS phase for \(\mathcal {N} = 4000\) function evaluations and \(\mathcal {T} = 25\) trials on the coagulation model. We used one, two, four, five and eight sub-swarms, with a total of 40 particles divided evenly amongst the swarms; we did not consider three or seven sub-swarms because 40 particles cannot be divided evenly amongst them. All other algorithm parameters remained the same for all cases. Generally, the higher sub-swarm numbers converged in fewer function evaluations, where the optimum particle partitioning was in the neighborhood of five sub-swarms. However, the difference in convergence rate was qualitatively similar for four, five and eight sub-swarms, suggesting there was an optimal number of particles per swarm beyond which there was no significant advantage. The multi-swarm particle swarm optimization stagnated after 25% of the available function evaluations irrespective of the number of sub-swarms. However, DOPS (with five sub-swarms) switched to DDS after detecting the stagnation. The DDS phase refined the globally best particle to produce significantly lower error on average when compared to multi-swarm particle swarm optimization alone. Thus, the automated switching strategy was critical to the overall performance of DOPS. However, it was unclear if multiple strategy switches could further improve performance.

Fig. 7

Influence of the switching strategy and sub-swarms on DOPS performance for the coagulation model. DOPS begins by using a particle swarm search and then dynamically switches (switch region), using an adaptive switching criterion, to the DDS search phase. We compared the performance of DOPS with and without DDS for different numbers of sub-swarms to quantify the effect of the number of sub-swarms and of the DDS phase. We used one, two, four, five and eight sub-swarms, with a total of 40 particles divided evenly amongst the swarms. The results presented are the average of \(\mathcal {T}\) = 25 trials with \(\mathcal {N}\) = 4000 function evaluations each. The convergence rates with higher swarm numbers are typically higher, but there is no pronounced difference amongst four, five and eight sub-swarms. The multi-swarm search without DDS saturates, while DOPS shows a rapid drop due to the switch to the DDS phase

We explored the performance of DOPS if it was permitted to switch between the PSO (Particle Swarm Optimization) and DDS modes multiple times. This mode (msDOPS) had comparable performance to DOPS on the 10-dimensional Ackley and Rastrigin functions, as well as on the 300-dimensional Rastrigin function. However, msDOPS performed better than DOPS on the CHO metabolism problem (Fig. 8a), with the average functional value being nearly half that of DOPS. To further distinguish DOPS from msDOPS, we compared the performance of each algorithm on the Eggholder function, a difficult function to optimize given its multiple minima [50]. msDOPS outperformed DOPS on the Eggholder function; however, neither version reached the true minimum of -959.6407 on any trial with a budget of \(\mathcal {N} =\) 4000 function evaluations (Fig. 8b). We also explored the performance of msDOPS and DOPS on the 100-dimensional Styblinski-Tang function [51] (Fig. 8c). In this comparison, msDOPS significantly outperformed DOPS, finding the true minimum before exhausting its function evaluation budget, while DOPS did not reach the minimum. Since the performance of msDOPS was promising on these problems, we measured its performance on the coagulation problem. Surprisingly, DOPS performed similarly to msDOPS on the coagulation problem (Fig. 8d); the final average objective value for DOPS reached 0.9413% of the initial functional value, compared to 0.9428% for msDOPS. Taken together, these results indicate that switching plays a key role in DOPS’s performance and that, for some classes of problems, multiple switches between modes produce a faster drop in objective value. However, the coagulation model results suggested the advantage of msDOPS was problem specific.
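For reference, both functions have standard closed forms; a short NumPy sketch (our illustration) is shown below.

```python
import numpy as np

def eggholder(x):
    """2-dimensional Eggholder function on [-512, 512]^2; the global minimum
    is approximately -959.6407 at (512, 404.2319)."""
    x1, x2 = x
    return (-(x2 + 47.0) * np.sin(np.sqrt(abs(x2 + x1 / 2.0 + 47.0)))
            - x1 * np.sin(np.sqrt(abs(x1 - (x2 + 47.0)))))

def styblinski_tang(x):
    """d-dimensional Styblinski-Tang function; the global minimum is at
    x_i ~ -2.903534 in every dimension (about -39.166 per dimension)."""
    return 0.5 * np.sum(x**4 - 16.0 * x**2 + 5.0 * x)

print(eggholder(np.array([512.0, 404.2319])))     # ~ -959.6407
print(styblinski_tang(np.full(100, -2.903534)))   # ~ -3916.6 (100 dimensions)
```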

Fig. 8

Comparison of DOPS and multiswitch DOPS. Performance of DOPS and multiswitch DOPS on the CHO metabolism problem (a), the Eggholder function (b), the 100-dimensional Styblinski-Tang function (c) and the coagulation problem (d). Both methods have the same initial decrease in error, but as the number of function evaluations increases, multiswitch DOPS produces a larger decrease in error. The results presented are the average of \(\mathcal {T}\) = 250 trials for the CHO metabolism problem, \(\mathcal {T}\) = 250 trials on the Eggholder and Styblinski-Tang functions with \(\mathcal {N}\) = 4000 function evaluations each, and \(\mathcal {T}\) = 25 trials for the coagulation problem

Discussion

In this study, we developed dynamic optimization with particle swarms (DOPS), a novel meta-heuristic for parameter estimation. DOPS combined multi-swarm particle swarm optimization, a global search approach, with the greedy strategy of dynamically dimensioned search to estimate optimal or nearly optimal solutions in a fixed number of function evaluations. We tested the performance of DOPS and seven widely used meta-heuristics on the Ackley and Rastrigin test functions, a set of biochemical benchmark problems and a model of the human coagulation cascade. We also compared the performance of DOPS to enhanced Scatter Search (eSS), another widely used meta-heuristic approach. As the number of parameters increased, DOPS outperformed the other meta-heuristics, generating optimal or nearly optimal solutions using significantly fewer function evaluations. We tested the solutions generated by DOPS by comparing the estimated and true parameters in the benchmark studies, and by using the coagulation model to predict unseen experimental data. For both benchmark problems, DOPS retrieved the true parameters in significantly fewer function evaluations than other meta-heuristics. For the coagulation model, we used experimental coagulation measurements under two different conditions to estimate optimal or nearly optimal parameters. These parameters were then used to predict unseen coagulation data; the coagulation model parameters estimated by DOPS predicted the correct thrombin dynamics following TF/FVIIa induced coagulation without anticoagulants. Lastly, we showed that the average performance of DOPS improved when the multi-swarm phase was combined with the dynamically dimensioned search phase, compared to an identical multi-swarm approach alone, and that multiple mode switches could improve performance for some classes of problems. Taken together, DOPS is a promising meta-heuristic for the estimation of parameters in large biochemical models.

Meta-heuristics can be effective tools to estimate optimal or nearly optimal solutions for complex, multi-modal functions. However, meta-heuristics typically require a large number of function evaluations to converge to a solution compared with techniques that use derivative information. DOPS is a combination of particle swarm optimization, which is a global search method, and dynamically dimensioned search, which is a greedy evolutionary technique. Particle swarm optimization uses collective information shared amongst swarms of computational particles to search for global extrema. Several particle swarm variants have been proposed to improve the search ability and rate of convergence. These variations involve different neighborhood structures, multi-swarms or adaptive parameters. Multi-swarm particle swarm optimization with small particle neighborhoods has been shown to be better at searching complex multi-modal landscapes [36]. Multi-swarm methods generate diverse solutions, and avoid rapid convergence to local optima. However, at least for the coagulation problem used in this study, multi-swarm methods stagnated after approximately 25% of the available function evaluations; only the introduction of dynamically dimensioned search improved the rate of error convergence. Dynamically dimensioned search, which greedily perturbs only a subset of parameter dimensions in high dimensional parameter spaces, refined the globally best particle and produced significantly lower error on average when compared to multi-swarm particle swarm optimization alone. However, dynamically dimensioned search, starting from an initial random parameter guess, was not as effective on average as DOPS. The initial solutions generated by the multi-swarm search had a higher propensity to produce good parameter estimates when refined by dynamically dimensioned search. Thus, our hybrid combination of two meta-heuristics produced better results than either constituent approach, and better results than other meta-heuristic approaches on average. This was true not only of the convergence rate on the coagulation problem, but also of the biochemical benchmark problems; DOPS required two orders of magnitude fewer function evaluations compared with enhanced Scatter Search (eSS) to estimate the biochemical benchmark model parameters. What remains to be explored is the performance of DOPS compared to techniques that utilize derivative information, either on their own or in combination with other meta-heuristics, and the performance of DOPS in real-world applications compared with other meta-heuristics such as hybrid genetic algorithms, e.g., see [52]. Gradient methods perform well on smooth convex problems which have either a closed form of the gradient of the function being minimized, or a form that can be inexpensively estimated numerically. While the biological problems DOPS is intended for often do not have this form, perhaps the solutions could be further improved by following (or potentially replacing) the DDS phase with a gradient based technique when applicable. Taken together, the combination of particle swarm optimization and dynamically dimensioned search performed better than either constituent approach alone, and required fewer function evaluations compared with other common meta-heuristics.

Conclusions

DOPS performed well on many different systems with no pre-optimization of algorithm parameters; however, there are many research questions that should be pursued further. DOPS comfortably outperformed existing, widely used meta-heuristics for high dimensional global optimization functions, biochemical benchmark models and a model of the human coagulation system. However, it is possible that highly optimized versions of common meta-heuristics could surpass DOPS; we should compare the performance of DOPS with optimized versions of the other common meta-heuristics on both test and real-world problems to determine if a performance advantage exists in practice. Next, DOPS has a hybrid architecture, thus the particle swarm phase could be combined with other search strategies, such as local derivative based approaches, to improve convergence rates. We could also consider multiple phases beyond particle swarm and dynamically dimensioned search, for example switching to a gradient based search following the dynamically dimensioned search phase. Lastly, we should extend DOPS to treat multi-objective problems. The identification of large biochemical models sometimes requires training using qualitative, conflicting or even contradictory data sets. One strategy to address this challenge is to estimate experimentally constrained model ensembles using multiobjective optimization. Previously, we developed Pareto Optimal Ensemble Techniques (POETs), which integrate simulated annealing with Pareto optimality to identify models near the optimal tradeoff surface between competing training objectives [53]. Since DOPS consistently outperformed simulated annealing on both test and real-world problems, we expect a multi-objective form of DOPS would more quickly estimate solutions which lie along high dimensional trade-off surfaces.

Methods

Optimization problem formulation.

Model parameters were estimated by minimizing the difference between model simulations and \(\mathcal {E}\) experimental measurements. Simulation error is quantified by an objective function \(K(\mathbf{p})\) (typically the Euclidean norm of the difference between simulations and measurements) subject to problem and parameter constraints:

$$ \begin{aligned} & \min_{\mathbf{p}} K(\mathbf{p}) & =& \sum\limits_{i=1}^{\mathcal{E}} \left(g_{i}(t_{i},\mathbf{x,p,u})-y_{i}\right)^{2} \\ & \text{subject to} &&\dot{\mathbf{x}}=\mathbf{f}(t,\mathbf{x}(t,\mathbf{p}),\mathbf{u}(t),\mathbf{p})\\ &&&\mathbf{x}(t_{0}) = \mathbf{x}_{0}\\ &&&\mathbf{c}(t,\mathbf{x,p,u}) \geqslant \mathbf{0} \\ &&& \mathbf{p}^{L} \leqslant \mathbf{p} \leqslant \mathbf{p}^{U}\\ \end{aligned} $$
(1)

The term \(K(\mathbf{p})\) denotes the objective function (sum of squared error), \(t\) denotes time, \(g_{i}(t_{i},\mathbf{x},\mathbf{p},\mathbf{u})\) is the model output for experiment \(i\), while \(y_{i}\) denotes the measured value for experiment \(i\). The quantity \(\mathbf{x}(t,\mathbf{p})\) denotes the state variable vector with initial state \(\mathbf{x}_{0}\), \(\mathbf{u}(t)\) is a model input vector, \(\mathbf{f}(t,\mathbf{x}(t,\mathbf{p}),\mathbf{u}(t),\mathbf{p})\) is the system of model equations (e.g., differential equations or algebraic constraints) and \(\mathbf{p}\) denotes the model parameter vector (the quantity to be estimated). The parameter search (or model simulations) can be subject to \(\mathbf{c}(t,\mathbf{x},\mathbf{p},\mathbf{u})\) linear or non-linear constraints, and parameter bound constraints, where \(\mathbf{p}^{L}\) and \(\mathbf{p}^{U}\) denote the lower and upper parameter bounds, respectively. Optimal model parameters are then given by:

$$ \mathbf{p}^{*} = \arg\min_{\mathbf{p}} K\left(\mathbf{p}\right) $$
(2)

In this study, we considered only parameter bound constraints, and did not include the c(t,x,p,u) linear or non-linear problem constraints. However, additional constraints can be handled, without changing the approach, using a penalty function method.
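To make Eqs. (1) and (2) concrete, the sketch below assembles a sum-of-squares objective \(K(\mathbf{p})\) for a hypothetical one-state decay model using SciPy; the model, data, and bounds are illustrative stand-ins (not the coagulation model), and the grid search only stands in for the meta-heuristic search described below.

```python
import numpy as np
from scipy.integrate import solve_ivp

# hypothetical training data: measurement times t_i and measured values y_i
t_obs = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
y_obs = np.array([1.00, 0.62, 0.38, 0.14, 0.02])

# parameter bound constraints p^L and p^U (illustrative values)
p_lower, p_upper = np.array([1e-3]), np.array([2.0])

def model(t, x, p):
    """Toy stand-in for dx/dt = f(t, x(t,p), u(t), p): first-order decay."""
    return -p[0] * x

def K(p):
    """Objective K(p): sum of squared residuals between the model output
    g_i(t_i, x, p, u) and the measurements y_i (Eq. 1)."""
    sol = solve_ivp(model, (t_obs[0], t_obs[-1]), [1.0], args=(p,), t_eval=t_obs)
    return float(np.sum((sol.y[0] - y_obs)**2))

# Eq. 2: p* = argmin K(p), here approximated by a coarse grid within the bounds
p_grid = np.linspace(p_lower[0], p_upper[0], 201)
p_star = p_grid[np.argmin([K(np.array([p])) for p in p_grid])]
print(p_star)   # close to the decay constant that generated the toy data (~0.5)
```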

Dynamic optimization with particle swarms (DOPS).

DOPS combines multi-swarm particle swarm optimization with dynamically dimensioned search (Fig. 1 and Algorithm 1). The goal of DOPS is to estimate optimal or near optimal parameter vectors for high-dimensional biological models within a specified number of function evaluations. Toward this objective, DOPS begins by using a particle swarm search and then dynamically switches, using an adaptive switching criterion, to a DDS search phase.

Phase 1: Particle swarm phase.

Particle swarm optimization is an evolutionary algorithm that uses a population of particles (solutions) to find an optimal solution [54, 55]. Each particle is updated based on its experience (particle best) and the experience of all other particles within the swarm (global best). The particle swarm phase of DOPS begins by randomly initializing a swarm of \(\mathcal {K}\)-dimensional particles (represented as \(z_{i}\)), wherein each particle corresponds to a \(\mathcal {K}\)-dimensional parameter vector. After initialization, particles were randomly partitioned into k equally sized sub-swarms \(\mathcal {S}_{1},\hdots,\mathcal {S}_{k}\). Particles within each sub-swarm \(\mathcal {S}_{k}\) were updated according to the rule:

$$ {z}_{i,j} = \theta_{1,j-1}{z}_{i,j-1} + \theta_{2}{r}_{1}\left(\mathcal{L}_{i} - {z}_{i,j-1}\right) + \theta_{3}{r}_{2}\left(\mathcal{G}_{k} - {z}_{i,j-1}\right) $$
(3)

where \(\theta_{1},\theta_{2},\theta_{3}\) are adjustable parameters, \(\mathcal {L}_{i}\) denotes the best solution found by particle \(i\) within sub-swarm \(\mathcal {S}_{k}\) for function evaluations \(1\rightarrow j-1\), and \(\mathcal {G}_{k}\) denotes the best solution found over all particles within sub-swarm \(\mathcal {S}_{k}\). The quantities \(\mathbf{r}_{1}\) and \(\mathbf{r}_{2}\) denote uniform random vectors with the same dimension as the number of unknown model parameters (\(\mathcal {K}\times {1}\)). Equation 3 is similar to the general particle swarm update rule; however, it does not contain velocity terms. In DOPS, the parameter \(\theta_{1,j-1}\) is similar to the inertia weight parameter for the velocity term described by Shi and Eberhart [56], who proposed a linearly decreasing inertia weight to improve the convergence properties of particle swarm optimization. Our implementation of \(\theta_{1,j-1}\) is inspired by this and by the decreasing perturbation probability proposed by Tolson and Shoemaker [34]. However, \(\theta_{1,j-1}\) places inertia on the position rather than the velocity, and uses the same rule described by Shi and Eberhart to adaptively change with the number of function evaluations:

$$\begin{array}{@{}rcl@{}} \theta_{1,j}&=&\frac{\left(\mathcal{N}-{j}\right)\left({w}_{max}-{w}_{min}\right)}{\mathcal{N}-{1}} + {w}_{min} \end{array} $$
(4)

where \(\mathcal {N}\) represents the total number of function evaluations, and \(w_{max}\) and \(w_{min}\) are the maximum and minimum inertia weights, respectively. In this study, we used \(w_{max} = 0.9\) and \(w_{min} = 0.4\); however, these values are user configurable and could be changed depending upon the problem being explored. Similarly, \(\theta_{2}\) and \(\theta_{3}\) were treated as constants, where \(\theta_{2} = \theta_{3} = 1.5\); the values of \(\theta_{2}\) and \(\theta_{3}\) control how heavily the particle swarm weighs the previous solutions it has found when generating a new candidate solution. If \(\theta_{2}\gg \theta_{3}\), the new parameter solution will resemble the best local solution found by particle \(i\) (\(\mathcal {L}_{i}\)), while \(\theta_{3}\gg \theta_{2}\) suggests the new parameter solution will resemble the best global solution found so far. While updating the particles, parameter bounds were enforced using reflection boundary conditions (Algorithm 2).
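A minimal NumPy sketch of the update rules in Eqs. (3) and (4), together with a reflection treatment of bound violations, might look as follows; the function names are illustrative, and the details of the actual Algorithm 2 may differ.

```python
import numpy as np

def inertia(j, N, w_max=0.9, w_min=0.4):
    """Linearly decreasing position-inertia weight (Eq. 4)."""
    return (N - j) * (w_max - w_min) / (N - 1) + w_min

def reflect(z, p_lower, p_upper):
    """Reflect bound violations back into the feasible box (cf. Algorithm 2);
    any point still out of bounds after reflection is clamped."""
    z = np.where(z < p_lower, 2.0 * p_lower - z, z)
    z = np.where(z > p_upper, 2.0 * p_upper - z, z)
    return np.clip(z, p_lower, p_upper)

def particle_update(z, local_best, subswarm_best, j, N, p_lower, p_upper,
                    theta2=1.5, theta3=1.5, rng=np.random.default_rng()):
    """One particle update (Eq. 3): position inertia plus attraction toward
    the particle's own best solution and the best solution in its sub-swarm."""
    r1, r2 = rng.random(z.size), rng.random(z.size)
    z_new = (inertia(j, N) * z
             + theta2 * r1 * (local_best - z)
             + theta3 * r2 * (subswarm_best - z))
    return reflect(z_new, p_lower, p_upper)

# e.g., one update of a 3-parameter particle early in the search (j = 10)
lo, hi = np.zeros(3), np.ones(3)
z_next = particle_update(np.array([0.2, 0.8, 0.5]),
                         local_best=np.array([0.1, 0.7, 0.4]),
                         subswarm_best=np.array([0.15, 0.75, 0.45]),
                         j=10, N=4000, p_lower=lo, p_upper=hi)
```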

After every \(\mathcal {M}\) function evaluations, particles were randomly redistributed to a new sub-swarm, and updated according to Eq. (3). This process continued for a maximum of \(\mathcal {F}\mathcal {N}\) function evaluations, where \(\mathcal {F}\) denotes the fraction of function evaluations used during the particle swarm phase of DOPS:

$$ \mathcal{F} = \left(\frac{\text{NP}}{\mathcal{N}}\right)j $$
(5)

The quantity NP denotes the total number of particles in the swarm, \(\mathcal {N}\) denotes the total possible number of function evaluations, while the counter \(j\) denotes the number of successful particle swarm iterations (each costing NP function evaluations). If the simulation error stagnated, e.g., did not change by more than 1% for a specified number of evaluations (default value of four), the swarm phase was terminated and DOPS switched to exploring parameter space using the DDS approach for the remaining \(\left (1-\mathcal {F}\right)\mathcal {N}\) function evaluations.
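A minimal sketch of this stagnation test, under our reading of the 1% / four-evaluation rule above (names are illustrative):

```python
def should_switch(best_errors, patience=4, tol=0.01):
    """Return True when the best-so-far error has improved by less than
    `tol` (relative) over the last `patience` recorded values."""
    if len(best_errors) <= patience:
        return False
    return best_errors[-1] > (1.0 - tol) * best_errors[-(patience + 1)]

# e.g., the swarm phase ends once the error history flattens out
history = [10.0, 4.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(should_switch(history))   # True: < 1% improvement over the window
```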

Phase 2: DDS phase.

Dynamically Dimensioned Search (DDS) is a single-solution search algorithm used to obtain good solutions to high-dimensional search problems within a fixed number of function evaluations. DDS starts as a global search algorithm by perturbing all dimensions; the number of perturbed dimensions is then decreased probabilistically. The probability that a given dimension is perturbed shrinks as the iterations increase (a minimum of one dimension is always perturbed), so the algorithm behaves increasingly like a local search as the number of iterations grows. The perturbation magnitude of each dimension is drawn from a normal distribution with zero mean; the perturbation size parameter used in the original DDS paper and in the current study was 0.2, scaled by the parameter range of each dimension (Eq. 7). DDS performs a greedy search where the solution is updated only if it is better than the previous solution. The combination of perturbing a subset of dimensions along with greedy search indirectly relies on model sensitivity to a specific parameter combination. We refer the reader to the original paper by Tolson and Shoemaker for further details [34].

At the conclusion of the swarm phase, the overall best particle, \(\mathcal {G}_{k}\), over the k sub-swarms was used to initialize the DDS phase. DOPS takes at least \(\left (1-\mathcal {F}\right)\mathcal {N}\) function evaluations during the DDS phase and then terminates the search. For the DDS phase, the best parameter estimate was updated using the rule:

$$ \mathcal{G}_{new}(\mathbf{J})=\left\{\begin{array}{ll} \mathcal{G}(\mathbf{J})+\mathbf{r}_{normal}(\mathbf{J})\sigma(\mathbf{J}), &\ \text{if}\ \mathcal{G}_{new}(\mathbf{J})<\mathcal{G}(\mathbf{J})\\ \mathcal{G}(\mathbf{J}), &\ \text{otherwise} \end{array}\right. $$
(6)

where \(\mathbf{J}\) is a vector representing the subset of dimensions that are being perturbed, \(\mathbf{r}_{normal}\) denotes a normal random vector of the same dimension as \(\mathcal {G}\), and \(\sigma\) denotes the perturbation amplitude:

$$ \sigma = {R}\left(\mathbf{p}^{U} - \mathbf{p}^{L}\right) $$
(7)

where \(R\) is the scalar perturbation size parameter, and \(\mathbf{p}^{U}\) and \(\mathbf{p}^{L}\) are (\(\mathcal {K}\times {1}\)) vectors that represent the maximum and minimum bounds on each dimension, respectively. The set \(\mathbf{J}\) was constructed using a probability function \(\mathcal {P}_{i}\) that represents a threshold for determining whether a specific dimension \(j\) is perturbed; \(\mathcal {P}_{i}\) is a monotonically decreasing function of the number of function evaluations:

$$ \mathcal{P}_{i}={1}-\log\left[\frac{i}{(1-\mathcal{F})\mathcal{N}}\right] $$
(8)

where \(i\) is the current iteration. After \(\mathcal {P}_{i}\) was determined, we drew \(\mathcal {P}_{j}\) from a uniform distribution for each dimension \(j\); if \(\mathcal {P}_{j}<\mathcal {P}_{i}\), dimension \(j\) was included in \(\mathbf{J}\). Thus, the probability that a dimension \(j\) was perturbed was inversely proportional to the number of function evaluations. DDS updates are greedy; \(\mathcal {G}_{new}\) becomes the new solution vector only if it is better than \(\mathcal {G}\).
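Putting Eqs. (6)-(8) together, a single DDS update might look like the following NumPy sketch (our illustration, not the authors' MATLAB implementation). We use the monotonically decreasing perturbation probability of the original DDS paper, \(1-\ln (i)/\ln (m)\), as the threshold (cf. Eq. 8), and simple clipping rather than reflection for bound handling; R = 0.2 as above.

```python
import numpy as np

def dds_step(G, G_err, K, i, m, p_lower, p_upper, R=0.2,
             rng=np.random.default_rng()):
    """One greedy DDS update of the best-known vector G (Eqs. 6 and 7).
    `i` is the current DDS iteration and `m` the DDS evaluation budget."""
    d = G.size
    P_i = 1.0 - np.log(i) / np.log(m)      # decreasing perturbation probability
    J = rng.random(d) < P_i                # dimensions to perturb (cf. Eq. 8)
    if not J.any():
        J[rng.integers(d)] = True          # always perturb at least one dimension
    sigma = R * (p_upper - p_lower)        # perturbation amplitude (Eq. 7)
    candidate = G.copy()
    candidate[J] += rng.normal(0.0, 1.0, int(J.sum())) * sigma[J]
    candidate = np.clip(candidate, p_lower, p_upper)   # simple bound handling
    cand_err = K(candidate)
    if cand_err < G_err:                   # greedy acceptance (Eq. 6)
        return candidate, cand_err
    return G, G_err

# e.g., refine a 2-parameter estimate of a quadratic bowl for 500 evaluations
K_toy = lambda p: float(np.sum((p - 0.3)**2))
G = np.array([0.9, 0.1]); G_err = K_toy(G)
for i in range(1, 501):
    G, G_err = dds_step(G, G_err, K_toy, i, 500, np.zeros(2), np.ones(2))
```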

Multiswitch DOPS

We investigated whether switching search methods more than once would result in better performance; this DOPS variant is referred to as multiswitch DOPS or msDOPS. msDOPS begins with the PSO phase and uses the same criterion as DOPS to switch to the DDS phase. However, msDOPS can switch back to a PSO search when the DDS phase has reduced the functional value to 90% of its initial value. Should the DDS phase fail to improve the functional value sufficiently, this version is identical to DOPS. When the switch from DDS to PSO occurs, we use the best solution from DDS to seed the particle swarm. DOPS and msDOPS source code is available for download under a MIT license at http://www.varnerlab.org.
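The switch-back test itself is a one-liner; a hedged sketch (names illustrative):

```python
def should_return_to_pso(dds_start_error, current_error):
    """msDOPS switch-back test: return to the particle swarm phase once the
    DDS phase has driven the objective to 90% of its value at the switch."""
    return current_error <= 0.9 * dds_start_error
```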

Comparison techniques

The implementations of particle swarm optimization, simulated annealing, and genetic algorithms are those provided in MATLAB R2017a (particleswarm, simulannealbnd and ga). The implementation of DE used was developed by R. Storn and is available at http://www1.icsi.berkeley.edu/~storn/code.html. The version of eSS used was Release 2014B - AMIGO2014bench VERSION WITH eSS MAY-2014-BUGS FIXED - JRB, released by the Process Engineering Group IIM-CSIC. The genetic algorithm, particle swarm, and differential evolution algorithms were run with 40 particles to be directly comparable to the number of particles used in the PSO phase of DOPS. For comparison, the version of CMA-ES used was cmaes.m, Version 3.61.beta from http://cma.gforge.inria.fr/cmaes_sourcecode_page.html. The scripts used to run the comparison methods are also available at http://www.varnerlab.org.