The shortcomings of accurate rate estimations in cultivation processes and a solution for precise and robust process modeling
Abstract
The accurate estimation of cell growth or the substrate consumption rate is crucial for understanding the current state of a bioprocess. Rates unveil the actual cell status, making them valuable for quality-by-design concepts. However, in bioprocesses, the real rates are commonly not accessible due to analytical errors. We simulated Escherichia coli fed-batch fermentations, sampled at four different intervals, and added five levels of noise to mimic analytical inaccuracy. We computed stepwise integral estimations, with and without moving average smoothing, and smoothing spline interpolations to compare the accuracy and precision of each method for calculating the rates. We demonstrate that stepwise integration results in low accuracy and precision, especially at higher sampling frequencies. In contrast, a simple smoothing spline function displayed both the highest accuracy and precision regardless of the chosen sampling interval. Based on this, we tested three different options for substrate uptake rate estimations.
Keywords
Bioprocess development; Cubic smoothing spline; Fed-batch fermentation; Growth rate; Substrate uptake rate
Introduction
State variables, such as biomass, substrates, and product, are quantified via offline measurements during cultivation processes of microbial, mammalian and yeast cells to understand how the process states evolve. To shed light on the biological subsystem, i.e., the cell state, as well as the metabolism [4, 6, 8, 12], or to compare different cultivations on the biological level, e.g., for media selection or cell line development [13, 16, 19], specific production/consumption rates are a necessity.
Principle approaches to rate estimation
There are several approaches for estimating rates of a bioprocess [7, 15, 21]. A very simple method is to calculate the first derivative of a cubic smoothing spline function [15, 21]. The result is a continuous rate over the whole course of a bioprocess, such as a fed-batch process, from which a rate value can be derived for every time point.
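The spline-derivative idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes SciPy's `UnivariateSpline`, whose smoothing factor `s` is parameterized differently from the fitting parameter of other spline libraries, and it uses hypothetical noise-free data in which the logarithm of the total biomass grows linearly, so the derivative is the growth rate.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def spline_rate(t, y, smoothing=None):
    """Fit a cubic smoothing spline to y(t); return the fit and dy/dt at t."""
    spline = UnivariateSpline(t, y, k=3, s=smoothing)
    return spline(t), spline.derivative()(t)


# Hypothetical example: exponential growth with mu = 0.1 1/h, no noise.
t = np.linspace(0.0, 20.0, 21)      # sampling times in h
ln_x = np.log(2.0) + 0.1 * t        # logarithm of the total biomass
# smoothing=0.0 interpolates; noisy data would need a positive value.
_, mu = spline_rate(t, ln_x, smoothing=0.0)
```

Because the spline is a continuous function, `spline.derivative()` could equally be evaluated on a finer time grid than the sampling times.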
Although the applicability of this nonparametric method to bioprocess data has long been known [3, 15], it still does not seem to be the method of choice for researchers in upstream bioprocess engineering or related fields of biology. In most cases, the integral approach, a simple stepwise integral estimation, is used [5, 10, 11, 25]. Here, two measurements, one from sampling time point t_{i} and the other from sampling time point t_{i+1}, are used to estimate a rate for the interval (t_{i}, t_{i+1}). The same methodology is then applied to the next interval (t_{i+1}, t_{i+2}) and so on, estimating one rate value for each time interval and resulting in a trend over the course of the cultivation process. This, in turn, means that the rate is assumed to be constant within each sampling interval for which it was calculated, independent of the interval length.
Parameters impacting rate estimation quality
Some parameters have a high impact on the outcome of these rate estimations and, if treated incorrectly, result in false estimations. For instance, dynamic process trends can remain unnoticed if the sampling frequency is too low. In addition, if larger measurement errors are present, the resulting rate is too inaccurate to describe the process. This reduces the accuracy of the rates and weakens any hypothesis on the influence of certain variables or parameters. To make the calculations more applicable, different smoothing approaches for rates can be used. An often described and simple method is the moving average [9, 26]. Here, the rates from several sampling points are smoothed by taking the average value from a sampling window. In addition, more advanced filters such as low-pass and Savitzky–Golay filters have been used retrospectively for rate modeling of bioprocesses [14, 17]. Such advanced filters require settings and appropriate knowledge of the ideal window size and smoothness, which depend on the process they are applied to. With these methods, the true covariance matrix is often underestimated, and the lack of automatic constraints for state variables may lead to suboptimal performance [23].
Accurate estimation of a rate
Key figures in every cultivation process are the growth rate µ, defined as the time derivative of the logarithm of the population size, and the specific substrate uptake rates, which are feed dependent. Although stepwise integral estimation gives a simple estimate of the growth rate, this calculation has several drawbacks. One discrete estimation from one sampling time point to the next is suboptimal for nonlinear trends. Due to inaccurate biomass measurements, which are a particular issue in cell culture cultivations, the estimated cell growth rates vary strongly between samplings, indicating a false process status. In addition, variations in the amount of fed substrate can have a substantial impact on the specific uptake rate estimation due to error propagation. A switch in the cell's behavior is more likely to happen continuously than abruptly. Calculations and model building attempts based on these biased values can therefore be expected to give unreliable results containing much noise. To describe cultivation processes better, continuous rates should be preferred over rates that change abruptly from one interval to the next.
Since the “true” rate is not accessible in a real fermentation process, because of analytical measurement errors [20] and biological differences from cultivation to cultivation, we present a simulated case study in which linear and inhibited cell growth were simulated in silico. Noise was added to the dataset to mimic a range of typical analytical measurement errors. 100 single fed-batch processes were simulated to obtain a statistically meaningful dataset. We compared the performance of the stepwise integral estimation, including post-smoothing with a simple moving average, with that of the cubic smoothing spline function. Different sampling intervals and analytical measurement errors were simulated, and both approaches were evaluated with respect to their precision and accuracy in recovering the real rates. Additionally, we highlight an optimal solution for describing the substrate uptake rates, since estimating them requires the feeding rate and the feed substrate concentration to be taken into account. Any analytical error in this part can have a huge impact on the level of noise in the data.
The combination of different rate calculations applied to data with varying sampling frequencies and analytical deviations is valuable for process understanding and modeling.
Materials and methods
The detailed cultivation settings for the different simulated in silico fed-batch fermentations (Table 1) and all the necessary equations (Eqs. 1–4) are given in the Bioprocess Simulation section of the Online Resource 1.
Noise generation
The coefficient of variation (CV), the ratio of the standard deviation σ to the average value \(\bar{X}\), describes the magnitude of variation for 68.2% of the data.
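A minimal sketch of how such noise could be generated is given below. It assumes zero-mean Gaussian noise whose standard deviation is proportional to the true value; the function name `add_noise`, the constant biomass level and the seed are hypothetical choices for illustration and are not taken from the simulation settings.

```python
import numpy as np


def add_noise(values, cv, rng):
    """Add zero-mean Gaussian noise with sigma = cv * |value| to each value."""
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, cv * np.abs(values))


rng = np.random.default_rng(seed=1)
true_biomass = np.full(100_000, 20.0)        # g/L, hypothetical constant level
noisy = add_noise(true_biomass, cv=0.075, rng=rng)

observed_cv = noisy.std() / noisy.mean()     # sigma divided by the mean
# Share of values within one sigma (20 g/L * 0.075 = 1.5 g/L) of the truth.
share_within_one_sigma = np.mean(np.abs(noisy - 20.0) <= 1.5)
```

With a large number of draws, `observed_cv` approaches the requested CV and roughly 68% of the values fall within one standard deviation of the true value.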
Stepwise integral estimation
As in Takuma et al. [22], µ is estimated for each time interval between two measurements by dividing the current total biomass X(t) by the value of the previous measurement X(t − 1). This equation assumes that µ is constant over the described time interval.
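The stepwise estimate can be sketched as follows. The exact equation is not reproduced here, so the sketch assumes the usual form that follows from a constant µ within an interval: the logarithm of the biomass ratio divided by the interval length. The function name `stepwise_mu` and the example data are hypothetical.

```python
import numpy as np


def stepwise_mu(t, x_total):
    """One growth-rate value per interval: ln(X_i / X_(i-1)) / (t_i - t_(i-1))."""
    t = np.asarray(t, dtype=float)
    x_total = np.asarray(x_total, dtype=float)
    return np.log(x_total[1:] / x_total[:-1]) / np.diff(t)


# Hypothetical noise-free example with a true growth rate of 0.1 1/h.
t = np.array([0.0, 2.0, 4.0, 6.0])       # sampling times in h
x_total = 5.0 * np.exp(0.1 * t)          # total biomass in g
mu = stepwise_mu(t, x_total)             # one value per sampling interval
```

Four samples give three rate values, one per interval, which is why this approach yields a stepwise rather than a continuous rate trend.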
Moving average
Cubic smoothing spline
Specific substrate uptake rate
Option 1
A cubic smoothing spline fit was performed on the total consumption (\(S V - S_{0} V_{0} - \int u_{f} S_{f}\,{\text{d}}t\)) and on the biomass term \(\left( {x V} \right)\).
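Option 1 can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the fed substrate is integrated with the trapezoidal rule, the first sample is taken as the initial state, SciPy's `UnivariateSpline` stands in for the cubic smoothing spline, and the specific uptake rate is taken as the negative derivative of the total consumption divided by the total biomass, as follows from a fed-batch substrate balance. The function name `qs_option1` and all numbers are hypothetical.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def qs_option1(t, s_conc, volume, feed_rate, feed_conc, biomass_total,
               smoothing=None):
    """Specific substrate uptake rate from splines on consumption and biomass."""
    t = np.asarray(t, dtype=float)
    flow = np.asarray(feed_rate, dtype=float) * feed_conc      # g/h fed
    # Cumulative fed substrate (trapezoidal rule), zero at the first sample.
    steps = 0.5 * (flow[1:] + flow[:-1]) * np.diff(t)
    fed = np.concatenate(([0.0], np.cumsum(steps)))
    sv = np.asarray(s_conc, dtype=float) * np.asarray(volume, dtype=float)
    consumption = sv - sv[0] - fed       # S V - S_0 V_0 - integral of u_f S_f
    d_cons = UnivariateSpline(t, consumption, k=3, s=smoothing).derivative()(t)
    xv = UnivariateSpline(t, biomass_total, k=3, s=smoothing)(t)
    return -d_cons / xv


# Hypothetical noise-free fed-batch with a true uptake rate of 0.5 g/(g h).
t = np.linspace(0.0, 20.0, 21)                   # h
feed_rate = np.full_like(t, 0.02)                # L/h
feed_conc = 100.0                                # g/L
volume = 1.0 + 0.02 * t                          # L
biomass_total = np.exp(0.1 * t)                  # g
sv_true = 5.0 + 2.0 * t - 0.5 * (np.exp(0.1 * t) - 1.0) / 0.1
qs = qs_option1(t, sv_true / volume, volume, feed_rate, feed_conc,
                biomass_total, smoothing=0.0)
```

Because the feed is subtracted before the spline fit, noise in the feed term enters the fitted quantity itself and is smoothed together with it; for noisy data a positive `smoothing` value would be needed.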
Option 2
Option 3
For this, an additional variable must be introduced, the dilution rate D, which is defined as the ratio of \(u_{f}\) to \(V\) (Eq. 10). The cubic smoothing spline fit was performed on the substrate concentration term \(\left( S \right)\) and on the biomass term \(\left( {x V} \right)\).
RMSE and MAPE calculation
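The two error metrics can be sketched as follows, assuming the standard definitions of the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of an estimated rate relative to the simulated true rate. The function names and the example values are hypothetical.

```python
import numpy as np


def rmse(true, estimate):
    """Root mean square error between the true and the estimated values."""
    true = np.asarray(true, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((estimate - true) ** 2)))


def mape(true, estimate):
    """Mean absolute percentage error in percent; true values must be nonzero."""
    true = np.asarray(true, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(100.0 * np.mean(np.abs((estimate - true) / true)))


true_rate = [0.10, 0.10, 0.20]
estimated_rate = [0.11, 0.09, 0.20]
error_rmse = rmse(true_rate, estimated_rate)     # about 0.0082
error_mape = mape(true_rate, estimated_rate)     # about 6.7 percent
```

The RMSE keeps the unit of the rate, whereas the MAPE is relative and therefore allows rates of different magnitude to be compared.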
Results
Bioprocess simulation
When a process is performed with exactly the same process parameters for an infinite number of runs and with the exact same time interval at which samples are drawn, random errors are still likely to occur. Depending on the precision of the analytical method, which in turn depends on the device used, different CV values can be expected. The CV of biomass determination, for instance, depends on the method used. Gravimetric dried biomass determination for E. coli is expected to be quite accurate, whereas the measurement of the viable cell count via a microscope using a hemocytometer can be rather imprecise [1, 2]. The generated variations between 2.5 and 12.5% already represent very precise cell measurements. For instance, at 7.5% CV, a biomass of 20 g/L varies by ±1.5 g/L, which is a realistic value (see Fig. 1c, d).
Rate estimations via stepwise integral estimation and elucidation of sampling interval impact
This behavior of the stepwise integration has major implications for the evaluation of the current growth rates. For instance, if the growth rate changed rapidly back and forth due to a modification of the experimental conditions, the stepwise integration approach would not be able to recognize this, and the information would remain hidden because of the weak performance.
Rate estimation via cubic smoothing spline
Methodical comparison: stepwise integral estimation and cubic smoothing spline
The combination of stepwise integration and a moving average is a widely used approach for obtaining smoothed rates. In the following, we elucidate the differences between this combined method and the cubic smoothing spline.
The rate estimations obtained via the cubic smoothing spline outperformed the stepwise integral estimation. While the spline considers the whole dataset, the stepwise integral estimation only takes two consecutive time points into account. Hence, smoothing splines can deal better with the error in the data than stepwise integral estimations. With stepwise integral estimation, the error in the data is propagated further into the rate calculation, whereas the spline fit smooths the data before they are processed further. For this reason, spline functions are more accurate and precise.
However, due to the moving average, a rate change will seem to occur at a different time point than is actually the case. This is a particular problem for nonconstant rates (Fig. 4b). The effect becomes even stronger at lower sampling frequencies. Further, averaging rates over several time points reduces the ability to describe the dynamics of the system, which is exactly what the rates should describe. The more frequently process changes occur and the larger the averaging window is, the more likely they are to be overlooked. Hence, the increased precision is traded for a less descriptive rate.
The user also has to face the so-called endpoint problem. Due to the application of the moving average, the end of the process is not determined. Depending on the window size, the timeline of the rates will inevitably be shorter. Consequently, the use of a moving average will reduce the variation in the prediction, but will also lead to a reduced descriptiveness of the process and to misleading assumptions.
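Both effects can be reproduced with a minimal sketch. It assumes an unweighted moving average that only returns values for complete windows; the function name `moving_average`, the window size and the rate values are hypothetical.

```python
import numpy as np


def moving_average(rates, window):
    """Unweighted moving average; only complete windows are returned."""
    kernel = np.ones(window) / window
    return np.convolve(rates, kernel, mode="valid")


# Hypothetical rate trend with a step change from 0.10 to 0.20 1/h.
rates = np.array([0.10, 0.10, 0.10, 0.20, 0.20, 0.20, 0.20])
smoothed = moving_average(rates, window=3)
# Seven input values yield five smoothed values: window - 1 values are lost,
# and the step is spread over intermediate values instead of one interval.
```

Padding the edges would keep the original length, but the padded values would then rest on assumed rather than measured data.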
Specific substrate uptake rate estimations via the cubic smoothing spline
All three options can, on average, accurately describe the specific substrate uptake rate (Fig. 5d). However, incorporating the feed into the calculation beforehand (Option 1) increased the precision to a great extent, and the feeding noise can be almost completely removed. Interestingly, no significant difference was observed between Options 2 and 3, which use the total amount of substrate and the substrate concentration, respectively (see Fig. 5e). Only at the end of the fed-batch process does Option 2 underestimate the specific substrate uptake rate. However, a variation of only 1% in the feeding system can already have a substantial impact. As a consequence of using the wrong approach, the error will increase almost fourfold (Fig. 5f), from around 5% up to 20% MAPE (Eq. 12). If the feed is not incorporated into the calculation beforehand, as is the case in Options 2 and 3, the feeding error propagates further into the rate estimation.
Discussion
Stepwise integral estimation issues
The key to process development and process modeling is to estimate rates accurately and precisely. On average (n = 100), the stepwise integral approach calculated an accurate rate value. This was expected, considering that a large number of repeated experiments should on average meet the desired target value. However, we demonstrated that the stepwise integral estimation results in large variations. It is not surprising that the inaccuracy rises with an increased sampling frequency [24], but such an increase in variation at higher sampling frequencies was at first sight rather unexpected. Due to the magnitude of the sampling errors, the slope of the linear function will be either more positive or more negative than the real value. Every new sampling point adds its error, and, consequently, the deviation increases over the time course of the cultivation. Therefore, with an increased sampling frequency, the rate estimation error increases although the measurement error remains constant. Since this behavior is counterintuitive, it is most likely overlooked. This is a major disadvantage, since accurate process characterization and the gathering of process know-how require a large dataset and thus a high sampling frequency. Applying a moving average would be a simple way to reduce such variances, but the user would eventually end up with less accurate values. Therefore, rates calculated by stepwise integral estimation should be handled carefully for modeling purposes.
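One way to see why the variation grows with the sampling frequency is a small Monte Carlo sketch. It is an illustration under stated assumptions, not the simulation used in this study: growth is exponential with a constant rate, the noise is independent, multiplicative and Gaussian with a fixed CV, and the stepwise rate is taken as the logarithm of the biomass ratio divided by the interval length. Under these assumptions, first-order error propagation gives a standard deviation of about √2 · CV / Δt, so shortening the interval inflates the scatter while the measurement error stays the same.

```python
import numpy as np


def stepwise_mu_std(dt, cv, mu=0.1, t_end=20.0, runs=2000, seed=0):
    """Standard deviation of stepwise growth-rate estimates over many runs."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, t_end + dt / 2, dt)               # sampling times in h
    x_true = np.exp(mu * t)                              # noise-free biomass
    noise = 1.0 + cv * rng.standard_normal((runs, t.size))
    x_noisy = x_true * noise                             # one row per run
    mu_hat = np.log(x_noisy[:, 1:] / x_noisy[:, :-1]) / dt
    return float(mu_hat.std())


std_slow = stepwise_mu_std(dt=2.0, cv=0.05)   # sampling every 2 h
std_fast = stepwise_mu_std(dt=0.5, cv=0.05)   # sampling every 0.5 h
```

In this sketch, sampling four times as often increases the scatter of the stepwise estimate roughly fourfold.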
Application of cubic spline and specific substrate rate estimation
In this study, we focused on the cubic smoothing spline function as an alternative to rate estimation via stepwise integral estimation. With a reduced precision of the analytical determination, the variation in the estimation also increased, but not to the same extent as when the stepwise integral estimation was applied. In the best case, at a high sampling frequency and a high biomass determination inaccuracy, the CV was lower by around a factor of 4. Moreover, the cubic smoothing spline was not affected by the sampling frequency. In real bioprocesses, a good trade-off between sampling frequency, process dynamics and the analytical error should be considered. For high analytical errors and slow changes in process dynamics, a high sampling frequency does not increase precision and accuracy.
Additionally, we elucidated three different approaches for estimating substrate uptake rates via the established spline fit. If the substrate feed is not incorporated before the cubic spline fit is performed, feed variations can have a substantial impact on the propagated error. Hence, it is important to first calculate the total amount of consumed substrate before the rates are estimated.
The only “drawback” of using the cubic smoothing spline function is that one degree of freedom is present, the fitting parameter p. Therefore, before processing, the optimal p must be reconsidered with respect to the given magnitude of the x ordinate. Another powerful alternative to spline functions can be found in Gaussian processes. It was shown that for processes with high sampling numbers (100–1000), the Gaussian process outperforms the spline function, while for fewer than 100 samples it is vice versa [21]. Typically, mammalian cell culture processes lead to only 10–20 observations. Likewise, microbial fermentations do not comprise such a high sampling frequency either, resulting in only 15–25 observations per process. These considerations and the remarkably easy use of this method, which requires no data pre- or post-processing, clearly demonstrate the advantage of the smoothing spline compared with other methods.
Conclusion

The cubic smoothing spline function:
is easy to apply and to implement for offline analytical purposes,
is to a major extent sample interval independent,
can cope with large analytical variances,
allows the user to assess a rate value at every time point.
In addition, we showed that a small error in the feeding system can have a large impact on the estimation of specific substrate uptake rates. It is therefore important to take the feeding into account before the actual spline fit takes place.
For this level of complexity, the spline is sufficient, and more complex algorithms such as Gaussian processes or functions with more degrees of freedom (e.g., Kalman filters) are not necessary. It is easy to implement in existing code and can add reasonable value to process development and process comparability.
Notes
Acknowledgements
Open access funding provided by University of Natural Resources and Life Sciences Vienna (BOKU). We would like to thank Bilfinger Industrietechnik Salzburg and the Austrian Research Promotion Agency (FFG) for their support (Research Studio Austria, 859219, and Competence Headquarters, 849725).
Compliance with ethical standards
Conflict of interest
The authors have declared no conflicts of interest.
Supplementary material
References
1. Bratbak G, Dundas I (1984) Bacterial dry matter content and biomass estimations. Appl Environ Microbiol 48(4):755–757
2. Cadena-Herrera D, Lara JEE, Ramírez-Ibañez ND, López-Morales CA, Pérez NO, Flores-Ortiz LF, Medina-Rivero E (2015) Validation of three viable-cell counting methods: manual, semi-automated, and automated. Biotechnol Rep 7:9–16. https://doi.org/10.1016/j.btre.2015.04.004
3. Craven P, Wahba G (1978) Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31(4):377–403. https://doi.org/10.1007/BF01404567
4. Ferreira AR, Dias JML, Teixeira AP, Carinhas N, Portela RMC, Isidro IA (2011) Projection to latent pathways (PLP): a constrained projection to latent variables (PLS) method for elementary flux modes discrimination. BMC Syst Biol 5(1):181. https://doi.org/10.1186/175205095181
5. Franz C, Kern J, Karl B (2005) Sensor combination and chemometric modelling for improved process monitoring in recombinant E. coli fed-batch cultivations. J Biotechnol 120:183–196. https://doi.org/10.1016/j.jbiotec.2005.05.030
6. Galleguillos SN, Ruckerbauer D, Gerstl MP, Borth N, Hanscho M, Zanghellini J (2017) What can mathematical modelling say about CHO metabolism and protein glycosylation? Comput Struct Biotechnol J 15:212–221. https://doi.org/10.1016/j.csbj.2017.01.005
7. Glassey J, Gernaey KV, Clemens C, Schulz TW, Oliveira R, Striedner G, Mandenius CF (2011) Process analytical technology (PAT) for biopharmaceuticals. Biotechnol J 6:369–377. https://doi.org/10.1002/biot.201000356
8. Hefzi H, Ang KS, Hanscho M, Borth N, Lee D, Lewis NE (2016) Consensus genome-scale reconstruction of Chinese hamster ovary cell metabolism. Cell Syst 3:434–443. https://doi.org/10.1016/j.cels.2016.10.020
9. Herwig C, Marison I, von Stockar U (2001) Online stoichiometry and identification of metabolic state under dynamic process conditions. Biotechnol Bioeng 75(3):345–354
10. Li J, Jaitzig J, Lu P, Süssmuth RD, Neubauer P (2015) Scale-up bioprocess development for production of the antibiotic valinomycin in Escherichia coli based on consistent fed-batch cultivations. Microb Cell Fact. https://doi.org/10.1186/s129340150272y
11. Mairhofer J, Scharl T, Marisch K, Cserjan-Puschmann M, Striedner G (2013) Comparative transcription profiling and in-depth characterization of plasmid-based and plasmid-free Escherichia coli expression systems under production conditions. Appl Environ Microbiol 79(12):3802–3812. https://doi.org/10.1128/AEM.0036513
12. Niklas J, Schräder E, Sandig V, Noll T, Heinzele E (2011) Quantitative characterization of metabolism and metabolic shifts during growth of the new human cell line AGE1.HN using time resolved metabolic flux analysis. Bioproc Biosyst Eng 34:533–545. https://doi.org/10.1007/s004490100502y
13. Noh SM, Shin S, Lee GM (2018) Comprehensive characterization of glutamine synthetase-mediated selection for the establishment of recombinant CHO cells producing monoclonal antibodies. Sci Rep 1–11. https://doi.org/10.1038/s41598018237209
14. Ohadi K, Legge RL, Budman HM (2014) Development of a soft-sensor based on multi-wavelength fluorescence spectroscopy and a dynamic metabolic model for monitoring mammalian cell cultures. Biotechnol Bioeng 112(1):197–208. https://doi.org/10.1002/bit.25339
15. Oner MD, Erickson LE, Yang SS (1986) Utilization of spline functions for smoothing fermentation data and for estimation of specific rates. Biotechnol Bioeng 28(6):902–918. https://doi.org/10.1002/bit.260280618
16. Pan X, Streefland M, Dalm C (2017) Selection of chemically defined media for CHO cell fed-batch culture processes. Cytotechnology 69:39–56. https://doi.org/10.1007/s1061601600365
17. Paulsson D, Gustavsson R, Mandenius C (2014) Filtering of metabolic heat signals. Sensors 14:17864–17882. https://doi.org/10.3390/s141017864
18. R. J, de Boor C (2006) A practical guide to splines. Math Comput 34(149):325. https://doi.org/10.2307/2006241
19. Sieck JB, Cordes T, Budach WE, Rhiel MH, Suemeghy Z, Leist C, Soos M (2013) Development of a scale-down model of hydrodynamic stress to study the performance of an industrial CHO cell line under simulated production scale bioreactor conditions. J Biotechnol 164(1):41–49. https://doi.org/10.1016/j.jbiotec.2012.11.012
20. Sonnleitner B (2007) Bioanalysis and biosensors for bioprocess monitoring. Springer, Berlin, pp 1–64. https://doi.org/10.1007/3540487735_1
21. Swain PS, Stevenson K, Leary A, Montano-Gutierrez LF, Clark IBN, Vogel J, Pilizota T (2016) Inferring time derivatives including cell growth rates using Gaussian processes. Nat Commun 7(May):1–8. https://doi.org/10.1038/ncomms13766
22. Takuma S, Hirashima C, Piret JM (2007) Dependence on glucose limitation of the pCO_{2} Influences on CHO cell growth. Metab IgG Prod 97(6):1479–1488. https://doi.org/10.1002/bit
23. Ungarala S, Dolence E, Li K (2007) Constrained extended Kalman filter. IFAC Proc 2:63–68
24. Wechselberger P, Herwig C (2012) Model-based analysis on the relationship of signal quality to real-time extraction of information in bioprocesses. AlChE J 28(1):265–275. https://doi.org/10.1002/btpr.700
25. Wechselberger P, Sagmeister P (2013) Real-time estimation of biomass and specific growth rate in physiologically variable recombinant fed-batch processes. Bioproc Biosyst Eng 36:1205–1218. https://doi.org/10.1007/s0044901208484
26. Zahel T, Sagmeister P, Suchocki S, Herwig C (2016) Accurate information from fermentation processes: optimal rate calculation by dynamic window adaptation. ChemIngTech 88(6):798–808. https://doi.org/10.1002/cite.201500085
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.