Methods for anaerobic digestion model fitting: a comparison between heuristic and automatic approaches

This article examines whether automatic optimization can outperform manual adjustment. The subject of optimization was a model of temperature-phased anaerobic digestion (TPAD). Three parameters were selected for each reactor in the process chain: the reaction rates for propionate conversion, acetate conversion, and hydrolysis. Overall, both methods provided highly convergent results. However, the total summary error (TSE) for the automatic algorithm was always moderately lower than for the manual one; the difference varied between 16.16 and 57.05 percentage points. The manual method nevertheless has significant advantages: the adjustment was more homogeneous and gave a more uniform fit. Finally, cross-validation was performed to unify the values between the experimental series. The result was a total of 4 values for each optimized constant: one for each of the two temperature points in each of the two methods. Due to inconclusive information about their relative accuracy, averaged values were calculated for use in further research. The recommendation from this article is to combine the best aspects of both methods to achieve the most accurate results.

Nomenclature

x_IT,n: value of the measured parameter for iteration IT and measuring point n, for the tested configuration (x_ITy,n) or the reference iteration (x_IT1,n)
θ: the temperature-dependence coefficient
MRE: mean relative error
ODEs: ordinary differential equations
R1, R2: reactor 1 or reactor 2
T: temperature
TSE: total summary error

Introduction
The computing power of current systems is increasing, leading to a trend in which mathematical modeling becomes a genuine competitor to full-scale experiments [1][2][3]. This also includes the bioenergy sector [4] and the design of related bioprocesses [5]. However, the construction and optimization of a suitable model are not trivial [6] and require considerable human and computational resources; thus, many shortcuts are used to reach the goal. One of them is utilizing automatic algorithms for optimizing model constants, instead of adjusting them manually [7,8]. This approach undeniably saves time, but can it be as precise and trustworthy as the work of an experienced scientist, who does not work in a cold, mechanical way, but uses knowledge to heuristically find the best result? This question lies at the foundation of this work. The manual technique applied here to solve what is, from the mathematical point of view, a nonlinear optimization problem has all the features of heuristic methods [9,10]. These are used to make decisions based on creative thinking, intuition, and logical combinations. As a result, heuristic methods allow finding a way to follow toward the intended target. On the one hand, manual optimization of constants requires a lot of work, as every change in the model must be made by the researcher. It also limits the number of combinations that can be tested, as the speed of human work is limited compared with automatic algorithms. On the other hand, this approach gives the best control over the process, as at every step of the optimization the researcher can draw independent conclusions and change the direction of further tests based on the obtained results [11]. This gives much higher flexibility and potentially protects against certain kinds of errors.
However, it needs to be mentioned that every change in the values of the constants is based on the subjective judgment of the scientist, which may lead to inaccuracy if the predictions are incorrect or initially targeted at achieving a desired result.
The automatic algorithm is much faster than human work, as it can test hundreds of combinations within minutes [12]. It is also fully impartial, as its decisions are made based on independent and universal mathematical definitions [13,14], unaffected by any subjective aspects. However, these advantages come at a high price: in some cases, the algorithm can give results that make sense mathematically but are incorrect from a biological or physical point of view. For example, a very high value of a selected constant can satisfy the optimization query but exceed the range expected in a real experiment. This is a potential trap and shows how important it is to construct a correct, well-studied statement of the function to minimize. Also, if the reference data for curve fitting include discontinuities or are not smooth, they can mislead the algorithm or deteriorate accuracy [15].
The fundamental difference between searching for a solution with exact and heuristic algorithms is that the first approach returns the truly optimal solution (in the ideal case), while the second provides only an approximate one, exact only in particular instances [10]. For this reason, exact methods are applied mostly to cases that are well explored and recognized, while heuristics are more common wherever no algorithms are known that can find the exact solution in acceptable time. The search is carried out along the shortest, most likely way, avoiding less promising paths. The effectiveness of heuristic steps cannot be fully proved theoretically; it is possible only to experimentally show their relevance [16].
In this article, a detailed comparison between manual and automatic optimization of constants is presented. As a reference, a mathematical model of a biogas plant and literature data from real full-scale installations are utilized. To be more specific, the temperature-phased anaerobic digestion (TPAD) system is considered [17,18] (Fig. 1).
This concept uses the differences in environmental preferences between selected groups of microorganisms to split their populations (at least partially) between two steps of the process. The first reactor is kept at a higher temperature (typically around 55 °C) to promote hydrolysis, which is particularly important for hardly decomposable feedstocks. However, these conditions are not suitable for methanogenesis, which leads to limited efficiency of biogas production and potential issues with process stability [19]. Thus, in the second step, the temperature is kept in the mesophilic range (around 35 °C) to promote the further conversion of the intermediate products of hydrolysis into the final fuel. By that means, all processes involved in the anaerobic digestion pathway (hydrolysis, acidogenesis, acetogenesis, and methanogenesis [20]) can occur in optimal conditions.

Materials and methods
The optimization in this article was performed for a biogas plant model of two-phase anaerobic digestion. This installation concept separates the bioprocesses taking place during fermentation into two different tanks. To be more specific, in the case considered in our work, the phasing is performed by keeping a different process temperature in each step. The model was described in our previous works [21,22] and consists of 29 ordinary differential equations (ODEs) for each reactor in the process chain.
Overall, the selection of constants to adjust varies between researchers [23][24][25]. However, some of them repeat in every study, namely the hydrolysis rate (k_dis) and the uptake rates for propionate-degrading (k_m,pro) and acetate-degrading organisms (k_m,ac). The relation between temperature and the first of these parameters is the clearest, as most studies agree that increasing the temperature, which takes place in the first reactor of the considered system, will raise the rate of hydrolysis. For the remaining two parameters, it is more difficult to declare correct correlations, and detailed optimization is required.
Irrespective of the optimization method, manual or automatic, all simulations were performed in the pure MATLAB suite, without the support of Simulink. To keep all calculations reproducible and equivalent, in all cases the same ODE solver was utilized, namely ode15s [26]. This algorithm is dedicated to stiff systems, which fits the requirements of biological simulation well [27]. What is more, it provides relatively high accuracy with very good performance. Both optimization methods were also split into two series: in the first, the full TPAD system was modeled. In the second, two independent steps were considered, one for the mesophilic and one for the thermophilic reactor, both with a shifting feeding rate. Starting from raw biomass in both reactors, instead of the effluent from the previous reactor, prevented potential error accumulation in the modeling of the mesophilic step. In this way, we can be sure that the constants are optimized correctly, while still being tested in the complete process chain.
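The same pattern can be reproduced outside MATLAB with any stiff-capable integrator. The sketch below is illustrative only: it uses SciPy's BDF method (an implicit multi-step scheme comparable in spirit to ode15s) and a toy two-equation stiff system standing in for the 29-ODE-per-reactor digestion model:

```python
from scipy.integrate import solve_ivp

def stiff_rhs(t, y):
    """Toy stiff system with a fast and a slow reaction, standing in
    for the 29-ODE-per-reactor digestion model used in the article."""
    k_fast, k_slow = 1.0e4, 1.0
    return [-k_fast * y[0] + k_slow * y[1],
             k_fast * y[0] - k_slow * y[1]]

# BDF, like ode15s, is an implicit method designed for stiff problems;
# an explicit solver would need extremely small steps here.
sol = solve_ivp(stiff_rhs, t_span=(0.0, 10.0), y0=[1.0, 0.0],
                method="BDF", rtol=1e-6, atol=1e-9)

print(sol.success)                        # solver completed without failure
print(round(float(sum(sol.y[:, -1])), 6))  # total mass stays conserved at 1.0
```

The choice of a stiff solver matters because biological rate constants in such models span several orders of magnitude, which is exactly the regime where explicit integrators become impractically slow.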
As a reference for the simulation results, experimental data from the studies [23,25] were taken. In both series, the biogas was produced from sewage sludge; it is a common substrate, also in co-digestion [28]. For the hydrolysis constant, overall biogas production was chosen as the parameter to fit. For k_m,pro, the concentration of dissolved propionate was taken, and for k_m,ac, similarly, the dissolved acetate concentration. The summary of the model configuration for both series of simulations is presented in Table 1.

A heuristic method for optimization.
The manual method of optimization was relatively simple and thus could not test a high number of parameter combinations [29]. In general, the sequence of steps was as follows: the constants were optimized starting from k_m,pro, while the decision-making factor was the convergence between the experimental points and the model prediction plots. In parallel with the value of k_m,pro, 2 to 3 combinations of the hydrolysis constant were also considered, to check its eventual impact on the concentration of propionate.
If the best-fitting configuration was inside the tested range, the extreme values from both ends were eliminated, and a new range with higher resolution was generated, to define the final optimized value more precisely. If the best value was on the edge of the range, its borders were slid to look for the solution in a new scope. The procedure was repeated until the fit between the experimental and modeled data was arbitrarily assumed to be satisfying. Then, analogous procedures for k_m,ac and finally for k_dis were carried out. A summary of the optimization ranges and the number of values per step is presented in Table 2.
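The narrowing-and-sliding loop described above can be sketched as follows. This is an illustrative Python reimplementation under assumed settings (15 grid points per iteration), with a simple quadratic `error` function standing in for a full model run; the authors' actual procedure was driven by visual convergence of the plots:

```python
import numpy as np

def error(k):
    """Placeholder for the model-vs-experiment error; in the real
    procedure this would be a full simulation run for constant k."""
    return (k - 7.3) ** 2

def refine(lo, hi, n_points=15, n_iters=10):
    """Grid-search a range; zoom in around the best point, or slide
    the window when the best point sits on the edge of the range."""
    for _ in range(n_iters):
        grid = np.linspace(lo, hi, n_points)
        best = grid[np.argmin([error(k) for k in grid])]
        step = (hi - lo) / (n_points - 1)
        if best in (grid[0], grid[-1]):   # best on the edge: slide the window
            lo, hi = best - (hi - lo) / 2, best + (hi - lo) / 2
        else:                             # best inside: zoom in around it
            lo, hi = best - step, best + step
    return best

print(round(refine(0.0, 5.0), 3))  # slides out of [0, 5], converges to 7.3
```

The sliding branch is what lets the procedure escape an initially misjudged range, mirroring the description above of moving the borders when the best value lies at the edge.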
For the first configuration, with two sequential reactors, usually between 110 and 140 combinations of constants for the reactors were tested. For the second system, where the reactors were considered separately and with a shifting feeding rate, more iterations were required: 20-35 per reactor. Considering that around 15 combinations are usually tested in every iteration, the final number of trials can reach 500. The flowchart of a single trial, the same for every constant and optimization series, is presented in Fig. 2.

Algorithm of automatic optimization.
The automatic optimization was performed for the same constants as previously described. However, based on the knowledge from the earlier research, the range of optimization was slightly changed. It was also unified for both reactors in the process chain. These changes were necessary to speed up the optimization algorithm; their summary, as well as the initial conditions for the algorithm, is presented in Table 3.
The automatic optimization can be performed with different algorithms. The Optimization Toolbox for MATLAB provides a wide selection of methods, from problem-based, through solver-based, to multi-objective functions. As the model requires an external ODE solver, the most logical and natural choice is the solver-based approach. In this category, the 5 most common methods are available, and in this study the fmincon algorithm is utilized [30,31]. It can be used for nonlinear multivariable functions, which is important in the considered case. What is more, it allows declaring upper and lower bounds, which is very useful if a rough estimate of the range in which the values should be searched is known. Additionally, instead of sequential optimization, as in the manual method, the constants are changed independently in random order, which can lead to interesting results, impossible to achieve by organized, manual adjustment.
The algorithm needs at least 4 inputs: upper and lower bounds, initial conditions, and a function to minimize. The first three aspects were discussed above, and the last one is strictly connected with the character of the process constants to be optimized. As already mentioned, a set of 3 process parameters, each bound to a different measured value, is considered during optimization. All of them need to be included in the optimization query. The simplest, most elementary function to minimize can look as follows:

fun(k_m,pro, k_m,ac, k_dis) = |Q_gas − Q_gas,exp| + |S_pro − S_pro,exp| + |S_ac − S_ac,exp|   (1)

where S_x means the concentration of dissolved substance x and Q_gas is the biogas output flow from the reactor.
The indicator "exp" marks values from the experiment, while the other parameters are obtained from the simulation. In general, this equation is correct and would lead to acceptable results when extended to a sum of all differences between the measured and modeled values, for all points of time where both are available. However, the orders of magnitude of the selected parameters can vary significantly between them. Thus, it is justified to consider the differences as relative errors. The function to minimize then takes the form:

fun(k_m,pro, k_m,ac, k_dis) = Σ_{n=1..N} ( |Q_gas,n − Q_gas,exp,n| / Q_gas,exp,n + |S_pro,n − S_pro,exp,n| / S_pro,exp,n + |S_ac,n − S_ac,exp,n| / S_ac,exp,n )   (2)

This form of the optimization query is utilized in the further study. It connects the best aspects of both forms described earlier, additionally preventing directional errors caused by the different orders of magnitude of the parts of the objective function. In the second series of optimization, an additional correction is included: the relative error for the propionate concentration is weighted by 0.5, to limit its excessive impact on the biogas curve fitting.
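The relative-error form of the objective can be sketched in code as follows. This is an illustrative Python version with made-up sample arrays; the article's actual objective was a MATLAB function handle passed to fmincon:

```python
import numpy as np

def objective(sim_data, exp_data):
    """Sum of relative errors over all time points and all three fitted
    quantities (biogas flow, propionate and acetate concentration).
    `sim_data` and `exp_data` map quantity names to equal-length arrays."""
    total = 0.0
    for key in ("Q_gas", "S_pro", "S_ac"):
        s, e = np.asarray(sim_data[key]), np.asarray(exp_data[key])
        total += float(np.sum(np.abs(s - e) / e))
    return total

# Hypothetical sample data: perfect biogas fit, 10% errors elsewhere.
exp_data = {"Q_gas": [100.0, 120.0], "S_pro": [2.0, 4.0], "S_ac": [1.0, 2.0]}
sim_data = {"Q_gas": [100.0, 120.0], "S_pro": [2.2, 4.4], "S_ac": [0.9, 1.8]}
print(round(objective(sim_data, exp_data), 3))  # 0 + 0.2 + 0.2 = 0.4
```

Because each term is normalized by the experimental value, a biogas flow in the hundreds and an acid concentration in the units contribute on the same scale, which is exactly the point of moving from absolute differences to relative errors.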
However, fmincon itself is an algorithm that looks for a local minimum, which means the results strongly depend on the initial conditions, and it is unable to determine whether the solution is global or local. For model optimization, it would be preferable to search for a global minimum, or at least its approximation. Thus, the method was not called directly, but through GlobalSearch from the MATLAB Global Optimization Toolbox. This extension runs fmincon multiple times, to test many combinations not only of the optimized variables but also of the initial conditions. In contrast to the similar but simpler MultiStart approach, it not only generates a series of start points but also evaluates them with a score function and rejects those that would not improve the solution [30]. This can potentially save calculation time, which is very important for a rather complicated and demanding biological system. However, it needs to be noted that the GlobalSearch algorithm uses a scatter-search mechanism [32], which is more complicated than the one in MultiStart [33] and cannot run in parallel on multiple CPU cores, which can affect performance in some cases. As a result, GlobalSearch returns a vector of solutions, which represent the points assumed to be the global minimum of the optimization problem.
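The multi-start idea can be illustrated outside MATLAB as well. The sketch below is a simplified MultiStart-style loop in Python: random start points within bounds, a bounded local optimizer as a rough stand-in for fmincon, and a toy one-dimensional double-well objective. It omits GlobalSearch's scatter-search scoring of start points:

```python
import numpy as np
from scipy.optimize import minimize

def toy_objective(x):
    """Asymmetric double well standing in for the model-fitting objective:
    a local minimum near x = 0.96 and the global minimum near x = -1.04."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]

def multi_start(fun, bounds, n_starts=20, seed=0):
    """Run a bounded local optimizer from many random start points and
    keep the best local minimum found. GlobalSearch additionally filters
    candidate start points with a score function before running them."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best = None
    for _ in range(n_starts):
        res = minimize(fun, rng.uniform(lo, hi),
                       method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

res = multi_start(toy_objective, bounds=[(-2.0, 2.0)])
print(round(float(res.x[0]), 2), round(float(res.fun), 2))  # the global well
```

A single local run started in the wrong basin would return the shallower minimum near x = 0.96; keeping the best of many starts is what makes the result an approximation of the global optimum.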

Results of manual and automatic optimization
As discussed, both techniques have benefits and disadvantages. In general, they should provide similar results, at least of the same order of magnitude. And indeed, the results show high convergence in the overall assessment; however, in detail, there are significant variations. Both methods led to an acceptable biogas production profile in the first series, which is presented in Fig. 3. In both cases, the manual adjustment led to higher projected biogas production. However, considering the relative error between the experimental and modeled values, these changes seem justified. In the case of manual optimization of the first reactor (Fig. 3a), the error was equal to 36.60%, while using the model parameters suggested by fmincon leads to a relative error at the level of 30.30%. This difference is statistically significant and leads to the preliminary conclusion that automatic optimization can be an interesting alternative to manual adjustment. The situation is even more interesting when the second reactor is considered (Fig. 3b). In this case, the automatic algorithm also suggested constant values that lead to a lower biogas production profile, but this time the relative error decreases very clearly: from 114.78 to 83.10%. This may be surprising, given how similar both plots are, and shows the advantage of automatic optimization over the manual one. It would be very easy to miss a better solution based only on subjective analysis.
For the other two constants and the measured parameters connected with them, the relation between method and relative error was not so directly proportional. Although the accuracy of the propionate concentration projection increased slightly in both reactors (by 0.31 and 2.9 percentage points, respectively), for acetate a decrease in relative error of around 10 percentage points was determined only in the first case. For the second reactor, the relative error increased by about 8 percentage points. However, this fact does not exclude the usability of automatic methods, as the summary accuracy, considering all parameters, is better. To present this most clearly, a total summary error (TSE) was calculated as:

TSE = δ_Q,gas + δ_S,pro + δ_S,ac   (3)

where δ_x means the mean value of the relative error of parameter x over all available points of time. The summary of the obtained constants and the related TSE for the first series is presented in Table 4.

An analogous analysis was performed for the second series of experimental data. As mentioned earlier, this time both reactors are considered separately, with independent feedstock input, to avoid potential error accumulation. What is more, the biomass flow is not constant but increases during the experiment, which is a benefit when examining the model response to additional changes. Again, both methods led to statistically correct values, while the values from the automatic algorithm result in lower biogas production for the thermophilic reactor. The production for the second, mesophilic one is nearly the same. The details are presented in Fig. 4.
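Under the definition above, the TSE is simply the sum of the three mean relative errors. A minimal sketch with hypothetical data (not the values from the tables):

```python
import numpy as np

def mean_relative_error(sim, exp):
    """Mean relative error (%) over all available points of time."""
    sim, exp = np.asarray(sim), np.asarray(exp)
    return float(np.mean(np.abs(sim - exp) / exp) * 100.0)

def total_summary_error(sim_data, exp_data):
    """TSE: the sum of the mean relative errors of biogas flow,
    propionate and acetate concentration."""
    return sum(mean_relative_error(sim_data[k], exp_data[k])
               for k in ("Q_gas", "S_pro", "S_ac"))

# Hypothetical data: 10% error on biogas, 5% on propionate, 0% on acetate.
exp_data = {"Q_gas": [100.0, 200.0], "S_pro": [2.0, 2.0], "S_ac": [1.0, 1.0]}
sim_data = {"Q_gas": [110.0, 220.0], "S_pro": [2.1, 2.1], "S_ac": [1.0, 1.0]}
print(round(total_summary_error(sim_data, exp_data), 1))  # 10 + 5 + 0 = 15.0
```

Summing the three errors is what allows a single figure of merit per reactor configuration, even when one parameter improves while another deteriorates.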
For the first reactor (Fig. 4a), the overall mean relative error of biogas production was 30.35% for the manual method and 25.10% for the automatic one. Despite weaker fitting in the second part of the simulation, there is very good convergence in the first 30 days. On the other hand, the manual method is more balanced, showing similar fitting in both parts of the process.
For the second reactor (Fig. 4b), the biogas production was nearly the same, independently of the method used to obtain the model parameters. The difference in relative error was just 0.52% in favor of fmincon. This is far too low to determine whether it is statistically relevant, so it had to be assumed that both methods did equally well.
Considering the other optimized parameters, k_m,pro and k_m,ac, the difference is biggest for the propionate concentration, especially in the first reactor. The improvement in relative error between the manual and automatic methods reached over 52 percentage points. For the second reactor, it was more subtle, as the error decreased from 75.46 to 56.21%. Finally, for the dissolved acetate concentration, the accuracy slightly decreased in the first reactor (by around 0.61 percentage points), while it significantly improved for the second tank, by over 37 percentage points. The results are summarized in Table 5.
As previously, in all cases the TSE decreases, which suggests that fmincon, supported by GlobalSearch, can find more precise and accurate values than a researcher using a systematized but manual adjustment.

Cross-validation and final values
The considerations from Section 3.1 led to interesting results; however, they are still separated between series and methods, giving a total of four possible values for each constant in every reactor of the process chain. Additionally, they are correct only for the two fixed temperature points, which is a significant limitation. Thus, there is a need to determine full temperature dependencies for the model, so it can work at any temperature specified by the user, and to find a consensus value between both mentioned methods.
The methodology of calculation was split into two parts. In the first, the unified values of the constants for the two fixed temperatures were calculated within each optimization method. Then, in the second part, a consensus value for each constant was calculated, based on the results of both the fmincon and the heuristic method. Finally, from these constants at fixed conditions, full temperature dependencies were calculated. The matrix of cross-validation between the series (Fig. 5) was the same for both methods. The simple algorithm was based on always taking only one constant and replacing it with its equivalent from the other series. For example, after the first initial calculation with all default values, in the second iteration k_m,pro was replaced with its value from the second series of optimization (where the reactors were independent); then, in the third iteration, k_m,ac was replaced, while the value of k_m,pro was restored to the default for this experiment, and so on. A total of 14 different combinations were calculated.
The results of the calculations following the abovementioned matrix were required to determine the impact of the selected changes on an eventual increase of the relative error. For the reference trials (iterations 1 and 8), the error is assumed to be 0%, and the change for any other iteration is calculated with respect to the corresponding test from these two. For example, the error for iterations 2-7 is calculated as follows:

δ_ITy = (1/N) · Σ_{n=1..N} ( |x_ITy,n − x_IT1,n| / x_IT1,n ) · 100%   (4)

where x_IT1,n means the value of the corresponding parameter for measuring point n in the first iteration, and x_ITy,n in the tested configuration. For iterations 9 to 14, the scheme is the same, only the reference is changed to x_IT8,n. No differences between the automatic and heuristic methods are present in this matter. The summary for both algorithms is presented in Table 6.
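A minimal sketch of this error measure, with hypothetical measurement vectors standing in for the simulated curves:

```python
import numpy as np

def iteration_error(x_ref, x_test):
    """Mean relative change (%) of a simulated parameter between the
    reference iteration (error assumed 0%) and a tested configuration,
    averaged over all measuring points n."""
    x_ref, x_test = np.asarray(x_ref), np.asarray(x_test)
    return float(np.mean(np.abs(x_test - x_ref) / x_ref) * 100.0)

# Hypothetical biogas curves: reference iteration 1 vs tested iteration 2.
x_it1 = [100.0, 150.0, 200.0]
x_it2 = [105.0, 150.0, 190.0]
print(round(iteration_error(x_it1, x_it2), 2))  # mean of 5%, 0% and 5%
```

For iterations 9-14 the same function applies with the iteration-8 curve as `x_ref`, since only the reference changes between the two halves of the matrix.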
As may be noticed, there is no simple pattern in the impact of the method on the result of cross-validation. In the case of k_m,pro, in all optimized configurations the manually adjusted value led to a lower error. However, for k_m,ac the impact is more complicated: in the thermophilic reactors, the manually adjusted values were better, while for the mesophilic ones, those from the automatic algorithm gave a lower mean relative error. An opposite tendency can be observed in the case of the hydrolysis constant, where in both thermophilic reactors the automatic algorithm gives the better value, while for the mesophilic tanks the manually adjusted values led to an overall lower mean relative error.
Based on these dependencies, it is impossible to arbitrarily choose the better method. Both in the raw optimization step and in the cross-validation, the manual and automatic methods have their advantages. Therefore, both will be considered as input to the final results. The next step was to calculate the values of the constants, for each method separately, based on the data from cross-validation. To do so, a weighted arithmetic mean is utilized, in which the mean relative error is treated as the weight:

k_x = (k_x1,n · δ_1,n + k_x2,n · δ_2,n) / (δ_1,n + δ_2,n)   (5)

Here k_x1,n and k_x2,n mean the value of the constant from the first and second series of optimization, respectively, and δ_1,n and δ_2,n are the mean relative errors connected with them. The results for both considered optimization techniques are summarized in Table 7; additionally, the mean value for further calculations is determined. However, although the mean value of a constant, calculated from two series of experimental data and two independent optimization methods, can be treated as a kind of unification, it is still correct only for the two fixed temperature points. There is still a need to determine the full temperature dependencies, to let the model work in any condition programmed by the user. In general, the relation between the temperature and the value of the discussed constants can be approximated as exponential [25]. Referring it to the initial value of the constant at one of the known temperature points (T_0) leads to the following relation:

k_x(T_n) = k_x(T_0) · e^(θ·(T_n − T_0))   (6)

where k_x(T_n) describes the value of the selected constant at any temperature T_n, and k_x(T_0) means its value at the reference temperature (the average value from both methods). The θ is the temperature-dependence coefficient, individual for each of the considered constants.
It can be calculated by rewriting the above equation to include the values of the constant at both temperatures considered in the optimization:

θ = ln( k_x(T_1) / k_x(T_0) ) / (T_1 − T_0)   (7)

where T_0 refers to mesophilic conditions (35 °C) and T_1 to thermophilic (55 °C). According to the adopted nomenclature, all further calculations are related to the values calculated for T_0. The data are summarized in Table 8.
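The recovery of θ from the two optimized temperature points, and the extrapolation of a constant to an arbitrary temperature, can be sketched as follows (illustrative Python with made-up constant values, not the values from Tables 7-8):

```python
import math

def theta_from_two_points(k_T0, k_T1, T0=35.0, T1=55.0):
    """Temperature-dependence coefficient from the exponential relation
    k(T) = k(T0) * exp(theta * (T - T0)), given the constant's value at
    the two optimized temperature points (35 and 55 deg C by default)."""
    return math.log(k_T1 / k_T0) / (T1 - T0)

def k_at_temperature(k_T0, theta, T, T0=35.0):
    """Extrapolate the constant to any temperature T."""
    return k_T0 * math.exp(theta * (T - T0))

# Hypothetical averaged values of a constant at the two fixed temperatures:
k_meso, k_thermo = 0.20, 0.80
theta = theta_from_two_points(k_meso, k_thermo)
print(round(theta, 4))                                   # 0.0693
print(round(k_at_temperature(k_meso, theta, 55.0), 2))   # recovers 0.8
```

With θ fixed by the two calibration temperatures, the model can be run at any intermediate temperature programmed by the user, which is exactly the purpose of this final step.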

Comparison of methods
Determining whether a fully impartial automatic algorithm is better or worse than manual, subjective adjustment by a scientist is not a trivial issue. As our research shows, there is no simple answer to the question of which method provides higher accuracy, better results, or a preferable fit to the experimental data. Certainly, it can be said that both approaches provide results that are convergent in all the most important aspects. What is more, both give constant values of an acceptable order of magnitude, leading to a satisfactory fit between the experimental and simulated data. However, significant differences also need to be noted. For most constants, the temperature tendency was the same, independently of the method. Nevertheless, for the constant describing the uptake rate of acetate-degrading organisms, the manual method suggested an increase with temperature, while the automatic algorithm produced values suggesting that it is almost temperature-independent, or slightly decreasing.
Another interesting observation is related to the nature of both methods. The TSE is always lower for the automatic algorithm, but this does not directly mean that this method is simply better. This is best visible in the second series of optimization, where the input flow of biomass is not constant (Fig. 4). The overall error for the first reactor in this series is 56.3 percentage points lower for the automatic method; however, this is achieved mostly by very precise fitting in the first part of the simulation, while in days 30-50 the accuracy drops significantly. The manual method provides less accurate results in synthetic tests, but the fit is more regular across the entire set of measurement points. This can have significant consequences for the results of further, more complicated simulations.
On the other hand, the errors connected with the absolute nature of the optimization function in the automatic method can be reduced by utilizing more complicated queries. In this research, a comparatively simple form, a sum of relative errors between the experimental and predicted values over all known points, was used. A more complex objective function, utilizing weight factors or other solutions forcing a more uniform fit over the whole range of the simulation, could lead to significantly different results.
Irrespective of the benefits and disadvantages presented above, it should be kept in mind that auto-optimization is a much faster process, as the computer can test hundreds of possible combinations within minutes. What is more, the combinations can be varied in a very complex pattern, which may potentially be more efficient than the simple, sequential manual order presented in this article. Ultimately, it can find solutions that would be missed by a researcher during manual adjustment.
In the presence of the above-mentioned facts, we suggest a cross-validation-based method, which in the final step connects both approaches. This is a compromise between absolute accuracy and scientific sense, between impartiality and experience. It seems to be the equitable outcome of this draw between scientist and computer. The averaged values of the constants, and the temperature-dependence coefficients obtained by this methodology, can be used in further simulations and research.

Conclusions
Our recommendation is to combine the advantages of both approaches whenever possible. The automatic method should be carried out first, as this approach is generally faster and does not require a detailed understanding of the process basics. If the desired accuracy and uniformity of the result are not achieved, the manual method can be utilized independently and eventually combined with the previous calculations. As our results show, while the automatic algorithm achieves higher overall accuracy (in the sense of the TSE), it can neglect some subtle correlations that can be recognized by a scientist. Nevertheless, in some cases where manual optimization is not applicable, for example where there is a high number of parameters to optimize, the automatic method based on the GlobalSearch algorithm can provide good and acceptably accurate results, without the external support of manual adjustment. Both of the examined methods are efficient for the optimization of bioprocess models.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.