1 Introduction

Biotechnological processes require real-time monitoring of, ideally, many process variables using hardware sensors. However, several critical information-bearing variables, such as biomass and product concentration, often cannot be monitored in real time, or only with expensive sensor technology. In such cases, soft(ware) sensors are commonly used. In this sensor concept, existing hardware sensors are combined in mathematical models based on statistical correlations and process knowledge to predict the required target variables [1, 2]. This technology has already been applied widely in biotechnological processes. For example, the biomass concentration could be predicted in a process involving Pichia pastoris; to achieve this, a soft sensor model was developed based on various online process parameters, such as the CO2 and O2 concentrations in the exhaust gas, and on actuators, such as the addition of pH correction agents [3]. Another example is the prediction of biomass and product concentration in industrial insulin production with Escherichia coli. Galvanauskas et al. [4] showed that the target variables could be successfully predicted using neural networks and Monod kinetics. The model inputs included the temperature of the reactor heating jacket, the addition of pH correction agents, and variables derived from gas concentrations, such as the oxygen uptake rate (OUR) and the carbon dioxide evolution rate (CER). Soft sensor models for predicting biomass, substrate concentration, and spore yield were also successfully developed for a bioprocess with Clostridium butyricum; the input variables were fermentation time, capacitance, conductivity, pH, initial total sugar concentration, and ammonium and calcium concentrations [5]. Further examples of the successful application of soft sensors in bioprocesses can be found in several review articles [6,7,8,9].
However, soft sensors require regular recalibration to prevent a loss of prediction performance over time due to changing process characteristics [10]. In biotechnological processes, such changes can include varying raw materials, modified process strategies, or biological variability.

In many applications, soft sensors are still recalibrated manually, which is a very time-consuming and expensive process [10]. The automatic recalibration of soft sensors is an alternative, has been the subject of several studies, and is referred to as just-in-time modeling [11,12,13,14,15]. The approach is usually similar: First, a recalibration time is either defined in advance or determined depending on process conditions. Subsequently, historical data sets are selected to recalibrate the soft sensor model. This is necessary because the reference values of the current process would only become available with a significant delay. All these steps can be carried out automatically, with an optional manual specification of initial conditions. The selection can be based on the chronological order of the data sets (temporal similarity criterion) [15] as well as on spatial similarity (distance-based similarity criterion) [11,12,13,14]. Finally, the soft sensor model is recalibrated based on the selected data sets and defined as valid until the subsequent recalibration. When selecting historical data sets, selection based on distance-based similarity criteria is particularly promising for bioprocesses: reliable predictions can be maintained despite sudden changes in process characteristics, provided similar characteristics have already been logged in the past. If the data sets were selected solely based on their chronological occurrence, the prediction performance would be substantially reduced in the case of a raw material change, for example. In the distance-based selection of data sets, the temporal trends of the online variables are matched with the current process. This is often done in combination with multiway principal component analysis (MPCA) and batch-wise unfolding of the online measured variables [16,17,18]. The advantage of MPCA is that low-information noise in the data sets can additionally be removed with the help of an ordinary PCA [19].
For this, the historical process data array only has to be unfolded in advance, which gave the method its name. Similar data sets for the current process can then be detected in this principal component space based on similarity criteria. These criteria can rely on different distance measures, e.g., the Euclidean distance or the Mahalanobis distance [20,21,22].

Nevertheless, the problem with this approach is that particularly temporally similar data sets are selected. This means calibration data sets with equally long lag phases are selected if the current process has a similarly long lag phase. However, these calibration data sets do not necessarily guarantee the best prediction performance. It is conceivable that the curve patterns of the regions between different process-specific landmarks are significantly more relevant than the absolute length of individual sections. In this example, a process with a shorter lag phase but a similarly steep exponential phase could provide a better prediction performance as a calibration data set. To test this hypothesis, the process and the historical data sets must be temporally aligned, which can be done via data synchronization methods.

In this study, two synchronization approaches, dynamic time warping (DTW) and curve registration (CR), suitable for synchronizing all process-specific landmarks, were compared, and the influence of data synchronization on the prediction performance of an automatically recalibrated soft sensor approach was investigated. The two synchronization methods were modified for soft sensor recalibration and applied to the process variables of two different bioprocesses. Firstly, the automatic recalibration concept of the soft sensors selects similar data sets using an MPCA and distance-based similarity criterion. Next, the soft sensor is recalibrated using partial least squares regression (PLSR) with an additional model transition that contains a forgetting factor. A linear model was used as the basic soft sensor model, with all process variables available online and additionally calculated variables (CER and OUR) as input. To evaluate the influence of data synchronization on prediction performance, the normalized root mean squared error (NRMSE) was compared with and without the synchronization approach. The evaluation was performed for the biomass prediction of the P. pastoris process and the protein prediction of a Bacillus subtilis process.

2 Materials and methods

2.1 P. pastoris process—cultivation and hardware

2.1.1 Strain, preculture conditions, and main culture

In a preculture, P. pastoris (DSMZ 70382) was cultured in three 150-mL shake flasks, each containing 50 mL FM22 medium with glycerol, for the main process. The flasks were cultured for 70 h at 150 min−1 and 30 °C. The preculture was then pooled and used as inoculum for the following main culture. The main culture had a volume of 15 L and FM22 with glycerol as a medium. In the initial batch phase for biomass generation, all glycerol was consumed. In the following fed-batch phase, a substrate change to methanol occurred. The methanol addition was also supplemented with 12 mL L−1 PTM4 solution. Methanol concentration (4.5 g L−1), pH (5), pressure (500 mbar), temperature (30 °C), and dissolved oxygen (40%) were controlled. The FM22 medium contained the following [23]: (NH4)2SO4, 5 g L−1; CaSO4·2H2O, 1 g L−1; K2SO4, 14.3 g L−1; KH2PO4, 42.9 g L−1; MgSO4·7H2O, 11.7 g L−1; and glycerol, 40 g L−1. To the FM22 medium, an additional 2 mL L−1 of the PTM4 solution was added: CuSO4·5H2O, 2 g L−1; KI, 0.08 g L−1; MnSO4·H2O, 3 g L−1; Na2MoO4·2H2O, 0.2 g L−1; H3BO3, 0.02 g L−1; CaSO4·2H2O, 0.5 g L−1; CoCl2, 0.5 g L−1; ZnCl2, 7 g L−1; FeSO4·H2O, 22 g L−1; biotin, 0.2 g L−1; and conc. H2SO4, 1.0 mL.

2.1.2 Bioreactor, sensor systems, and reference measurements

The main culture was performed in a stirred tank reactor with a total volume of 42 L (Biostat® Cplus reactor; Sartorius AG). During the processes, the concentration of O2 and CO2 in the exhaust gas (BlueInOne sensor; BlueSens gas sensors GmbH) and the methanol concentration in the reactor (Alcosens sensor; Heinrich Frings GmbH & Co. KG) were monitored in real time, in addition to the standard probes (pH, pressure, and dissolved oxygen). The target variable for soft sensor prediction was biomass concentration in dry cell weight (DCW). As reference values for the soft sensor predictions, samples were taken and analyzed every 2–14 h. To determine DCW in triplicates, centrifuge tubes were pre-weighed, filled with 2 mL of sample, and centrifuged at 21,000 × g. The supernatant was discarded, and the cell pellet was dried at 80 °C for 72 h and subsequently weighed. The process was controlled with the Biostat® Cplus control unit. Data logging was performed using the SIMATIC SIPAT software (Siemens AG). All sensor values, actuator values, and reference values were logged.

2.2 Industrial B. subtilis process—cultivation and hardware

2.2.1 Strain, preculture conditions, and main culture

An optimized preculture cultivation strategy developed by Clariant Produkte (Deutschland) GmbH was implemented to generate an inoculum of B. subtilis for the main culture. The main culture (700 mL) was cultivated using a proprietary high-performance medium specifically designed for industrial cultivation; its detailed composition is not disclosed due to confidentiality agreements. The temperature was modified during the process, and glucose served as the substrate, supplied in an initial batch phase and later fed during the fed-batch phase. Oxygen was continuously provided to the process through a constant inflow (1.5 L min−1) of sterile air via a sparger.

2.2.2 Bioreactor, sensor systems, and reference measurements

Multifors reactors (1.4 L total volume, Infors AG) were used for the processes. These reactors were equipped with standard sensors for pH, pressure, and dissolved oxygen measurements. In-line exhaust gas analysis was performed using a mass spectrometer (Thermo Scientific™ Prima PRO; Thermo Fisher Scientific Inc.). Protein concentration was selected as the target variable for soft sensor prediction. Reference measurements were taken manually by trained laboratory personnel. The protein concentration was determined in triplicates by assessing the target protein’s activity. The data logging and process control were managed using the bioprocess platform software eve® (Infors AG).

2.3 Automatic recalibration of soft sensors with different synchronization methods

The soft sensor development and validation were performed in MATLAB R2023a (The MathWorks Inc.). As a basic prediction model, a linear model was used, with all process variables available online (pO2, pH, temperature, addition of pH correcting agents, addition of substrate, CO2 and O2 concentration in the exhaust gas) as well as additionally calculated variables (CER and OUR, plus their cumulative values) as input. This underlying linear model structure has already been used successfully for several bioprocesses [2], including P. pastoris and B. subtilis [14]. The structure of the algorithm used to recalibrate this soft sensor model is described below.

2.3.1 Structure of the automatic recalibration soft sensor concept

To determine the influence of the synchronization methods (DTW, CR) on the prediction performance of a soft sensor with automatic recalibration, the structure shown in Fig. 1 was used. A data pool of P. pastoris (n = 12) and a data pool of B. subtilis (n = 24) were available. The rough outline of the soft sensor structure is as follows: At the beginning, one data set is removed from the data pool and declared as the current process (query data set). This data set is passed to the algorithm step by step as if it occurred in real time. Additional input variables (CER, OUR, etc.) are then calculated, followed by an optional synchronization using DTW or CR. Next, the most similar data sets (n = 3) are automatically selected using an MPCA and a similarity analysis based on the weighted Euclidean distances between the historical data sets and the current query data set. Finally, the PLSR-based prediction model is recalibrated, and the model is evaluated. At the start, a soft sensor model calibrated with all historical data sets was provided, which was then recalibrated four times per process using the methodology described. More details on the sub-steps are presented in the following chapters.

Fig. 1
figure 1

The automatic recalibration of the soft sensors with different synchronization methods. OUR: oxygen uptake rate, CER: carbon dioxide evolution rate, MPCA: multiway principal component analysis, PLSR: partial least squares regression

2.3.2 Preprocessing of data sets

Initially, calculations were performed to determine additional input variables, namely the OUR and the CER. These calculations required various parameters, including the airflow rate (\({\dot{V}}_{\text{air}}\)), pressure (\(p\)), liquid reactor volume (\({V}_{\text{liquid}}\)), the universal gas constant (\(R=8.314\cdot {10}^{-2}\frac{\text{L bar}}{\text{mol K}}\)), temperature (\(T\)), and the mole fractions of oxygen (\({x}_{\text{O}2}\)) and carbon dioxide (\({x}_{\text{CO}2}\)) at the inlet (indexed as \(\text{in}\)) and outlet (indexed as \(\text{out}\)) [24].

$${\text{CER}} = { }\frac{{\dot{V}_{{{\text{air}}}} \cdot p}}{{V_{{{\text{liquid}}}} \cdot R \cdot T}} \cdot \left( {\frac{{1 - x_{{{\text{O}}2,{\text{ in}}}} - x_{{{\text{CO}}2,{\text{in}}}} }}{{1 - x_{{{\text{O}}2,{\text{out}}}} - x_{{{\text{CO}}2,{\text{out}}}} }} \cdot x_{{{\text{CO}}2,{\text{out}}}} - x_{{{\text{CO}}2,{\text{in}}}} } \right)$$
(1)
$${\text{OUR}} = { }\frac{{\dot{V}_{{{\text{air}}}} \cdot p}}{{V_{{{\text{liquid}}}} \cdot R \cdot T}} \cdot \left( {x_{{{\text{O}}2,{\text{in}}}} - \frac{{1 - x_{{{\text{O}}2,{\text{ in}}}} - x_{{{\text{CO}}2,{\text{in}}}} }}{{1 - x_{{{\text{O}}2,{\text{out}}}} - x_{{{\text{CO}}2,{\text{out}}}} }} \cdot x_{{{\text{O}}2,{\text{out}}}} } \right)$$
(2)

Most of the variables required for calculating the CER and OUR were measured directly with hardware sensors. In addition, the liquid reactor volume (\({V}_{\text{liquid}}\)) was determined by a balance approach, considering the initial volume, the liquids added during the process (such as pH corrector, antifoam, and substrate feed), and the liquids removed from the process (samples). The influence of evaporation could be neglected as an exhaust air condenser was used. Besides the CER and OUR themselves, their cumulative values were also calculated as input variables.
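Equations (1) and (2) can be evaluated directly from the gas analysis. The following minimal Python sketch illustrates this (function and variable names are our own; the study itself used MATLAB):

```python
def cer_our(v_air, p, v_liquid, t_kelvin,
            x_o2_in, x_co2_in, x_o2_out, x_co2_out,
            r=8.314e-2):
    """CER and OUR per Eqs. (1)-(2).

    v_air: aeration rate [L h^-1], p: pressure [bar],
    v_liquid: liquid reactor volume [L], t_kelvin: temperature [K],
    x_*: mole fractions at inlet/outlet, r: gas constant [L bar mol^-1 K^-1].
    """
    molar_term = (v_air * p) / (v_liquid * r * t_kelvin)
    # inert-gas balance corrects for the change in total gas flow
    inert = (1 - x_o2_in - x_co2_in) / (1 - x_o2_out - x_co2_out)
    cer = molar_term * (inert * x_co2_out - x_co2_in)
    our = molar_term * (x_o2_in - inert * x_o2_out)
    return cer, our
```

With typical aerobic cultivation values, both rates come out positive and the respiratory quotient CER/OUR lies near one.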

2.3.3 DTW

The synchronization of the online variables \({x}_{\text{raw}}\) of the historical data sets to the online variables \({x}_{\text{query}}\) of the query data set (validation data set) using DTW was performed iteratively. This data-driven method calculates a distance matrix between the data points of a historical data set and those of the current process. For this, the Euclidean distances between all possible pairs of points are calculated. Then, the optimal warping path through this matrix is searched for, minimizing the cumulative Euclidean distance. This search considers boundary conditions, such as prohibiting backward steps in time. The procedure is repeated iteratively until a termination criterion is reached. Process length and specific landmarks are synchronized between the data sets by applying the calculated warping path to skip values or to use them more than once [25,26,27,28,29]. This process is visualized in Fig. 2.

Fig. 2
figure 2

Synchronization of a data set with the query data set using dynamic time warping. The distance matrix with warping path (gray boxes) is shown in the lower right corner, representing the Euclidean distances between the points of the query data set and the other data set. Left: query data set, top: data set to be synchronized
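The warping just described can be sketched with the classic dynamic-programming formulation. The univariate, single-pass toy version below (our own simplification of the multivariate, iterative procedure detailed in Sect. 3.1) builds the accumulated-distance matrix, backtracks the optimal monotone path, and applies it so that values are skipped or duplicated:

```python
def dtw_path(query, ref):
    """Dynamic-programming DTW: accumulated-distance matrix plus
    backtracking of the optimal path; only monotone steps are allowed
    (no backward moves in time)."""
    n, m = len(query), len(ref)
    INF = float("inf")
    acc = [[INF] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - ref[j - 1])  # pointwise distance
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # backtrack from the end of both series to the start
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
        if step == acc[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif step == acc[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def warp_to_query(ref, path):
    """Apply the warping path: for each query index take the aligned
    reference value, duplicating or skipping values as needed."""
    aligned = {}
    for qi, rj in path:
        aligned[qi] = ref[rj]
    return [aligned[k] for k in sorted(aligned)]
```

The warped output always has the length of the query series, which is exactly what the recalibration concept requires.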

2.3.4 CR

CR was used as an alternative synchronization method to DTW. In this method, the online variables \({x}_{\text{raw}}\) of the historical data sets are aligned to the online variables \({x}_{\text{query}}\) of the query data set. For this, characteristic features (landmarks) are identified in the curves and synchronized (Fig. 3). It is assumed that the curves of the sensor values consist of underlying continuous functions. For synchronization, the curve-specific characteristics of the trajectories, such as extrema or trend reversals, are identified and aligned between the processes. This can be done in raw and derived signal profiles [30,31,32,33]. Regions between the curve-specific characteristics are then linearly compressed or stretched.

Fig. 3
figure 3

Synchronization of a data set with the query data set using curve registration. The maximum of the curve was chosen as an exemplary landmark. For synchronization, more landmarks are used, e.g., turning points and other extrema. Regions between the landmarks are uniformly compressed or stretched
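The stretching step can be illustrated as a piecewise-linear time warp. The sketch below (names and the simple linear interpolation are our own illustration, not the study's implementation) maps the reference landmarks onto the query landmarks and uniformly rescales each segment in between:

```python
def _interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at point x (xs ascending)."""
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            frac = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + frac * (ys[k + 1] - ys[k])
    return ys[-1]

def register_curve(ref, ref_marks, query_marks, n_query):
    """Landmark-based curve registration: segments of `ref` between
    consecutive landmarks are uniformly stretched or compressed so the
    reference landmarks land on the query landmark positions. Landmark
    lists must start at 0 and end at the last index of each curve."""
    out = []
    for t in range(n_query):
        pos = _interp(t, query_marks, ref_marks)        # warped reference position
        out.append(_interp(pos, list(range(len(ref))), ref))  # resample ref there
    return out
```

In contrast to DTW, no values are duplicated here, so the underlying curve shape between landmarks is preserved.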

2.3.5 MPCA

To create a search space for comparing the historical data sets with the query data set, an MPCA [16,17,18,19] was performed. This search space is dimension- and noise-reduced compared to the input variable space. For this, the data array of the data pool (content: batches \(I\), input variables \(J\), and time \(K\)) has to be unfolded from an \(I\times J\times K\) array to an \(I\times JK\) matrix. The query data set is also added as a row in this matrix. As a result, each batch represents one long row in a two-dimensional matrix. After this step, a regular PCA can be performed on the two-dimensional matrix. The scores of each data set's first four principal components are then weighted by the variance they explain. The new dimension-reduced search space created this way is used to identify similar data sets.
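The unfolding and weighting steps can be sketched in a few lines of NumPy (an illustrative reimplementation with our own naming; the study itself used MATLAB):

```python
import numpy as np

def mpca_weighted_scores(batches, query, n_pc=4):
    """Batch-wise unfolding + PCA (MPCA): each J x K batch becomes one long
    row of an (I+1) x JK matrix (historical batches plus the query batch);
    an ordinary PCA is then run via SVD on the mean-centred matrix, and the
    scores of the first n_pc components are weighted by the variance they
    explain."""
    X = np.array([np.asarray(b).ravel() for b in list(batches) + [query]])
    Xc = X - X.mean(axis=0)                      # column-wise mean centring
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pc] * S[:n_pc]              # principal component scores
    weights = S[:n_pc] ** 2 / np.sum(S ** 2)     # fraction of explained variance
    return scores * weights, weights
```

The last row of the returned score matrix belongs to the query batch, so distances to all historical rows can be computed directly in this reduced space.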

2.3.6 Similarity analysis and selection of data sets via k-nearest neighbors

To identify the three most similar data sets, the variance-weighted scores (\(w\bullet t\)) of the query data set were compared with those of the historical data sets. For this, the Euclidean distance \({dk}_{j}\) between the query data set and each historical data set \(j\) was calculated.

$$dk_{j} = \|w \cdot t_{{{\text{query}}}} - w \cdot t_{j} \| = \sqrt {\mathop \sum \limits_{i = 1}^{4} (w_{i} \cdot t_{{{\text{query}}, i}} - w_{i} \cdot t_{j,i} )^{2} }$$
(3)

Now the three lowest Euclidean distances \({dk}_{j}\) could be identified, and thus, the three most similar data sets could be selected.
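Equation (3) and the subsequent selection reduce to a few lines of code; a minimal sketch with hypothetical names:

```python
import math

def select_most_similar(t_query, t_hist, w, k=3):
    """Weighted Euclidean distance in score space (Eq. (3)), followed by
    k-nearest-neighbour selection of historical data sets (here k = 3)."""
    def dk(t_j):
        # variance-weighted Euclidean distance over the first four scores
        return math.sqrt(sum((wi * a - wi * b) ** 2
                             for wi, a, b in zip(w, t_query, t_j)))
    ranked = sorted(range(len(t_hist)), key=lambda j: dk(t_hist[j]))
    return ranked[:k]
```

The returned indices identify the three historical data sets used for recalibration.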

2.3.7 Partial least squares regression

A linear model was used to predict the target variables. All available online-measurable variables of the query process (hardware sensors, actuators, calculated variables) served as input for this model. Recalibration was performed using the selected similar data sets. As a calibration approach, PLSR, which is widely applied in bioprocesses, was used [2]. The advantage of this methodology is that, in addition to the calibration of the prediction model, a dimension reduction of the input variables takes place. This is done by calculating latent variables representing combinations of the input variables. The composition of these latent variables is based on their covariance with the target variable. Iteratively, more and more latent variables are added to the prediction model, and each individual model is evaluated using the mean squared error (MSE). For this purpose, the \(n\) reference values \({y}_{\text{selected},i}\) of the selected data sets are compared with the predictions \({\widehat{y}}_{k,i}\) of the models with an increasing number \(k\) of latent variables.

$${\text{MSE}}_{k}=\frac{1}{n}\sum_{i=1}^{n}{({y}_{\text{selected},i}-{\widehat{y}}_{k,i})}^{2}$$
(4)

The optimal number of latent variables is determined based on the first local minimum of the MSE. Thus, the new, recalibrated prediction model \({f}_{\text{recal}}({x}_{\text{query}}(t))\) is defined. To prevent sharp jumps between the previous model \({f}_{\text{previous}}\) and the recalibrated model, a smooth transition is made for the \(m\) timesteps \({t}_{m}\) in a defined transition period from the recalibration timestamp \({t}_{\text{recal}}\) to the end of the transition phase \({t}_{tr}\) using a forgetting factor \(\lambda\) (changes linearly in the transition period from 1 to 0) between the old and the new models. The transition phase takes up the initial 40% of the time between two recalibrations.

$${f}_{\text{recal},tr }\left({x}_{\text{query}}\left({t}_{m}\right)\right)=\lambda {(t}_{m})\bullet {f}_{\text{previous}}\left({x}_{\text{query}}\left({t}_{m}\right)\right)+(1-\lambda \left({t}_{m}\right))\bullet {f}_{\text{recal}}\left({x}_{\text{query}}\left({t}_{m}\right)\right)$$
(5)

with
$$\lambda \left({t}_{m}\right)=1-\frac{{t}_{m}-{t}_{\text{recal}}}{{t}_{tr}-{t}_{\text{recal}}}$$
(6)
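The hand-over in Eqs. (5) and (6) can be expressed directly; a minimal sketch (names are our own) in which the two models are passed in as callables:

```python
def transition_prediction(f_previous, f_recal, x_query, t_m, t_recal, t_tr):
    """Smooth model hand-over per Eqs. (5)-(6): the forgetting factor
    lambda falls linearly from 1 at t_recal to 0 at t_tr, blending the
    previous and the recalibrated model."""
    lam = 1.0 - (t_m - t_recal) / (t_tr - t_recal)
    lam = min(1.0, max(0.0, lam))   # outside the window, only one model acts
    return lam * f_previous(x_query) + (1.0 - lam) * f_recal(x_query)
```

At \(t_m = t_{\text{recal}}\) the previous model still dominates entirely; at \(t_m = t_{tr}\) the prediction comes solely from the recalibrated model.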

2.3.8 Evaluation of prediction performance via quality parameters

The normalized root mean squared error of prediction (NRMSEP) is used to compare and evaluate the prediction performances of the models with and without the synchronization methodology. Here, the \(n\) reference values \({y}_{i,\text{query}}\) of the query data set are compared with the values \({\widehat{y}}_{i,\text{query}}\) predicted by the models \(f({x}_{\text{query}})\) with and without synchronization.

$$\text{RMSEP}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({y}_{i,\text{query}}- {\widehat{y}}_{i,\text{query}})}^{2}}$$
(7)
$$\text{NRMSEP}=\frac{\text{RMSEP}}{{y}_{\text{max}} -{ y}_{\text{min}}}$$
(8)

3 Results and discussion

To discuss the influence of the synchronization methods on an automatically recalibrating soft sensor, the results are structured as follows: First, the choice of synchronization methods and their specific adaptation to the bioprocesses are discussed (Sect. 3.1). The results of the synchronizations are then presented and discussed. Here, the visual synchronization success is first assessed on two essential input variables for the soft sensor, the carbon dioxide and oxygen concentrations in the exhaust gas of the process. Then, the prediction performance of the soft sensor without synchronization is compared to that with CR and DTW. On the one hand, this is visualized for an example process; on the other hand, the mean NRMSE of the soft sensors for all data sets of a data pool is given, and differences are discussed. This is carried out for the P. pastoris process (Sect. 3.2) and the B. subtilis process (Sect. 3.3). Finally, the results are discussed regarding their transferability between bioprocesses and further potentials of the presented soft sensor concept (Sect. 3.4).

3.1 Selection of synchronization methods and implementation

Two different synchronization methods were used as preprocessing for the automatic recalibration of the soft sensors: DTW and CR. The objective was to increase the prediction performance of the soft sensors. It was therefore essential that not only the overall length of the process data but all process-specific landmarks were synchronized. In the following, the choice of suitable synchronization methods is briefly discussed, and modifications are described.

A comprehensive overview of different synchronization methods for bioprocesses was given by Brunner et al. [34]. They described three different synchronization methods for bioprocess data: Indicator variable techniques, DTW, and CR.

With indicator variable techniques, variables other than time are used to describe the process progress. These can be single variables as well as linear combinations of sensor values. When using linear combination models, these can be trained using partial least squares regression (PLSR) [35, 36]. For this, an additional maturity index (0–100%) is introduced, which describes the process progress. The problem with this method is that not all process-specific landmarks are synchronized; instead, only the process lengths are aligned. The suitability of this methodology for bioprocesses is therefore relatively limited. Consequently, it was not considered in this study.

DTW is suitable for synchronizing all process-specific landmarks and was therefore included in these investigations. A brief overview of the method has already been given in Sect. 2.3.3. Several modifications were made for the specific implementation. The synchronization was carried out iteratively. In the first three iterations, the historical data were synchronized to the query data set and aligned to the averaged trajectory of all data in the following iterations. With this procedure, it is possible to align all historical data sets to the same length as the query data set and to achieve the basic curve shape of the query data set. The subsequent synchronization to the mean curve achieves an even more exact synchronization of the curves without overweighting anomalies in the query data set (e.g., sensor faults). Multiple steps, following Kassidas et al. [25] and González-Martínez et al. [26], were used to perform the DTW. First, the variables were scaled by dividing them by the average range of each variable. This was followed by the first synchronization iteration, which determines the warping path based on the Euclidean distances between the data sets to be synchronized and the reference trajectory. In addition, temporally backward warping steps were prohibited when determining the warping path. Since a multivariate approach with several variables per data set was pursued, a weighting matrix for the next iteration was calculated, which weights variables with consistent trajectories higher. Two termination criteria for the iterations were defined: a maximum number of 20 iterations and a change in the weighting matrix between iterations of less than 1%. Furthermore, as preprocessing for the synchronization with DTW, a preselection of the data sets (n = 10) was performed using MPCA and k-nearest neighbors based on the Euclidean distance of the data sets to the query data set. The procedure was analogous to the methods described in Sects. 2.3.5 and 2.3.6. This step addressed the partly very different process characteristics, which would otherwise promote singularities during the synchronization from the 4th iteration onward. Singularities are defined as a loss of information during the synchronization of processes caused by overly frequent duplication of individual values.

CR is also suitable for synchronizing all process-specific landmarks and was part of this study. Using this methodology, the landmarks of multiple signals can be synchronized depending on curve-specific characteristics. For this study's specific implementation of CR, a principal component analysis was first performed with the variables \({x}_{\text{query}}\) of the query data set. This was done to enable multivariate synchronization. In addition to the scores of the first principal component of the query data set, the scores of the historical data sets were then determined using the calculated loadings [33]. To synchronize the principal component scores, the search for landmarks was not performed analytically; instead, characteristic landmarks were identified with a generic data-driven concept. The pruned exact linear time (PELT) method was used to identify the landmarks. This method is an efficient algorithm for automatically identifying structural changes in data sets. The algorithm traverses the data set step by step, examining possible subsegmentations to identify those that best explain structural changes (landmarks) [37]. Here, significant changes in the mean and the slope of the curve were considered as criteria. Due to the nonlinearity of bioprocesses, numerous landmarks were found in the first principal component of the bioprocesses. Given the variable process characteristics, however, many, and partly differing numbers of, landmarks were detected, which had to be harmonized and reduced. Therefore, a knowledge-based minimum distance between the individual landmarks was specified for each process, which resulted in ten landmarks per total process for the P. pastoris process and six landmarks per complete process for the B. subtilis process. Subsequently, the landmarks found in the historical data sets were synchronized to the query data set. Regions between two landmarks were linearly stretched or compressed.
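The harmonization step, keeping only landmarks that respect a knowledge-based minimum spacing, can be sketched as a single greedy pass (the function name and the greedy rule are our own illustration, not the exact implementation):

```python
def thin_landmarks(landmarks, min_gap):
    """Keep a detected landmark only if it lies at least `min_gap` samples
    after the previously kept landmark; the first landmark is always kept."""
    kept = []
    for lm in sorted(landmarks):
        if not kept or lm - kept[-1] >= min_gap:
            kept.append(lm)
    return kept
```

Applied to the dense changepoint output, such a rule reduces the landmark set to a fixed, process-specific count, as described above.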

3.2 Comparison of the different data synchronization techniques for the prediction of the biomass concentration of the P. pastoris processes

The biomass concentration was selected as the target variable for the soft sensor prediction in P. pastoris, since its real-time availability can be used to optimize control concepts, such as methanol control. In addition to the exhaust gas values shown in this chapter, further process variables of a query process are presented in the supplementary information (see Fig. S1).

Figure 4 shows a query process with a potential calibration data set. The potential calibration data set is shown unsynchronized, after DTW, and after CR. The plot represents the trajectories at the 4th recalibration point in time. Comparing the trajectories, several absolute differences can be observed. On the one hand, there is a slight initial difference, probably due to an inaccurate calibration of the exhaust gas measuring device. Significant absolute differences between the data sets can be recognized later in the process (time window 45–50 h, beginning of the fed-batch phase). In this time frame, the query data set only reaches CO2 concentrations of up to 1.2%, compared to the potential calibration data set with CO2 concentrations of up to 1.5%. This is due to differences in the adjustment of the methanol concentration at the beginning of the fed-batch phase. Analogous differences can be recognized in the O2 curves. Looking at the temporal variability of the processes, there are no major differences in the represented progressions until the end of the batch phase (~ 37 h); only after that do the processes vary in time (transition and fed-batch phases). There are also significant differences between the two synchronized curves. It can be observed that DTW changes the original curve shape more strongly than CR. However, evaluating the prediction performance of the resulting prediction model will show whether this is a valuable synchronization of the curve or overfitting and, thus, a loss of information.

Fig. 4
figure 4

Unsynchronized and synchronized CO2 (A) and O2 (B) trajectories of a historical data set with a query data set from the Pichia pastoris data pool at the last recalibration step. DTW, dynamic time warping; CR, curve registration

In the following, the influence of the synchronization methods on an automatically recalibrated soft sensor for the biomass concentration prediction of the process was considered. Comparing the prediction performance of the soft sensor with and without synchronization (Fig. 5), there are no apparent differences between the predictions in the batch phase. This is not unexpected since there are few temporal differences between the processes during this phase. In the subsequent transition and fed-batch phase, on the other hand, there is a clear deviation of the soft sensor without synchronization methodology. As already observed in Fig. 4, this is an area where synchronization of the historical data sets results in a clear shift of the curves. When now comparing the average prediction performance of the soft sensor for all data sets (each data set acts as a query data set), it results as follows: NRMSEPunsync = 13.5%; NRMSEPDTW = 13.0%; NRMSEPCR = 10.3%. A comparison of the NRMSEP of the recalibration without synchronization step with the recalibration with CR or DTW shows the following: On average, a 24% improvement in prediction performance could be achieved by applying CR during recalibration. However, no such substantial improvement (4%) can be achieved with DTW because of too extensive changes in the synchronized curve profiles and the resulting loss of information. Consequently, there is no optimal selection of the calibration data sets, and thus, regarding the mean NRMSEP, only a comparable prediction performance between DTW and the unsynchronized prediction methodology. In general, the main reason for the differences between the NRMSEPs lies primarily in the more frequent major misestimates in single recalibration steps (as in the example at hours 45–63 of the prediction without synchronization) than the minor differences between the predictions, such as at the end of the batch phase and the transition phase (around hour 37). 
Synchronizing the input variables of the soft sensor with CR can, therefore, lead to fewer major misestimates.
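For reference, the NRMSEP comparison above can be reproduced mechanically. The following is a minimal sketch; it assumes normalization by the range of the reference values, since the exact normalization used for the NRMSEP is not restated here:

```python
import numpy as np

def nrmsep(y_true, y_pred):
    """Normalized root mean squared error of prediction, in percent.
    Assumption: the RMSEP is normalized by the range of the reference values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmsep / (y_true.max() - y_true.min())

# Relative improvement of CR over the unsynchronized recalibration,
# using the mean values reported above (13.5% vs. 10.3%):
improvement = (13.5 - 10.3) / 13.5 * 100  # about 24%
```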

Fig. 5

Biomass predictions of the automatically recalibrated soft sensor with automatic selection of similar data sets, with and without synchronization, in the Pichia pastoris process. DTW, dynamic time warping; CR, curve registration

3.3 Comparison of the different data synchronization techniques for the prediction of the protein concentration of the B. subtilis processes

The B. subtilis bioprocess is an industrial process for commercial target protein production. The industrially most important variable, the protein concentration, cannot be measured directly online; therefore, a soft sensor was used to predict it. For confidentiality reasons, not all process variables can be provided. The process variables shown in the figures have been normalized to the maximum value of the respective measured variable.

A slightly different picture emerges for the B. subtilis process when comparing the CO2 and O2 curves in the exhaust gas of an exemplary query process with those of a potential calibration data set (Fig. 6). The curves start very similarly but show initial temporal differences during the batch phase. The potential calibration data set reaches the first CO2 peak after 13% of the total process duration (the point at which the batch-phase substrate is consumed and thus the end of the batch phase), whereas the query data set reaches this peak after 16% of the total process duration. Subsequently, O2 consumption and CO2 production increase again due to the limited addition of substrate during the fed-batch phase. This results in relatively stable exhaust gas values, which can, however, differ significantly between processes owing to different feeding strategies. In this example, the CO2 production level of the potential calibration data set is almost double that of the query process. Such differences are common for processes with different process characteristics. However, this already reveals a problem with DTW: long singularities occur in places, starting at about 30% of the total process duration. They are caused by the remaining absolute differences between the curve to be synchronized and the query process; despite normalization of the curves before synchronization, these differences persist due to the different feeding strategies. Such information losses do not occur during synchronization with CR: the underlying curve shape is largely preserved, and yet the landmarks are aligned.
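The behavior of DTW on such curves can be illustrated with a minimal implementation. This is a generic textbook DTW (accumulated-cost matrix plus backtracking), not the authors' exact configuration; repeated indices in the resulting path, where one curve's index stalls while the other advances, are the singularities described above:

```python
import numpy as np

def dtw_path(x, y):
    """Minimal DTW: fill the accumulated-cost matrix, then backtrack.
    Returns the optimal alignment path as a list of (i, j) index pairs."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack from (n, m) to (1, 1)
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy curves with both a temporal shift and a persistent level offset,
# as caused by different feeding strategies; the level offset is what
# provokes the stalling behavior in the alignment path.
x = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # query
y = np.array([0.0, 2.0, 2.0, 2.0, 2.0, 2.0])  # historical data set
path = dtw_path(x, y)
```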

Fig. 6

Unsynchronized and synchronized CO2 (A) and O2 (B) trajectories of a historical data set with a query data set from the Bacillus subtilis data pool at the third recalibration step. Axes are in % due to confidentiality agreements. DTW, dynamic time warping; CR, curve registration

The differences between the unsynchronized and synchronized process variables are also noticeable in the predictions. The predictions of the protein concentration vary especially at the third recalibration (Fig. 7). The prediction without synchronization shows overshooting. The prediction with DTW lies closer to the reference points, but its overall shape is atypical for a bioprocess: a decrease in protein concentration (from about 55% of the total process duration) followed by an increase back to the original level (from 80% of the total process duration) is not a plausible prediction while CO2 production and O2 consumption remain constant. Again, the automatically recalibrated soft sensor with CR yields a plausible and steady curve shape. Considering the mean NRMSEP of all 24 data sets of the B. subtilis data pool, prediction performances of NRMSEPunsync = 17.4%, NRMSEPDTW = 20.7%, and NRMSEPCR = 15.9% result. The mean NRMSEP can thus be reduced by 9% using CR compared to the soft sensor without synchronization. With DTW, the prediction performance does not exceed that of the soft sensor without synchronization. The reason is the sometimes pronounced singularities caused by the large variance between processes. These occur mainly in the B. subtilis process, since a substantially wider range of process strategies was used in this data pool than in the P. pastoris data pool.

Fig. 7

Protein predictions of the automatically recalibrated soft sensor with automatic selection of similar data sets, with and without synchronization, in the Bacillus subtilis process. Axes are in % due to confidentiality agreements. DTW, dynamic time warping; CR, curve registration

3.4 Transferability between bioprocesses and further aspects

Both synchronization approaches could be transferred straightforwardly between the two bioprocesses. DTW could be applied directly in the form presented here, without individual adjustments, to both bioprocesses; a transfer to further bioprocesses is thus conceivable. For the transferability of CR, the number of landmarks must be defined manually. In this study, the number of landmarks was set to ten (P. pastoris) and six (B. subtilis). Because the appropriate number varies from process to process, depending on factors such as the general speed of the process or the number of technical process phases (batch, fed-batch), a recommendation for further processes can only be made to a very limited extent. However, iterative approaches are conceivable that automatically test several landmark counts at the outset and select a suitable number based on the synchronization performance. The number chosen here always represents a trade-off between too many landmarks (sensor faults incorrectly detected as landmarks) and too few (not all important landmarks synchronized).
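Such an iterative landmark-count selection could look like the following sketch. Since the CR implementation is not specified here, the registration routine is passed in as a user-supplied function; `toy_register` is a purely illustrative stand-in, not a real curve registration:

```python
import numpy as np

def select_landmark_count(query, ref, register, candidates=range(2, 12)):
    """Try several landmark counts and keep the one whose registration
    brings the reference curve closest to the query (RMSE after warping).
    `register(ref, query, k)` is a user-supplied CR routine."""
    best_k, best_err = None, np.inf
    for k in candidates:
        warped = np.asarray(register(ref, query, k))
        err = np.sqrt(np.mean((warped - np.asarray(query)) ** 2))
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

# Illustrative stand-in: "more landmarks" here simply blends the reference
# toward the query; a real CR routine would warp the time axis instead.
def toy_register(ref, query, k):
    w = min(k, 10) / 10.0
    return (1 - w) * np.asarray(ref) + w * np.asarray(query)

query = np.sin(np.linspace(0, 3, 50))
ref = np.sin(np.linspace(0, 3, 50) - 0.3)
k, err = select_landmark_count(query, ref, toy_register, candidates=range(2, 11))
```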

When DTW and CR were used as preprocessing for the automatic recalibration of the soft sensors, CR led to higher prediction performance in both bioprocesses, whereas DTW did not lead to any significant improvement compared to recalibration without a synchronization method. The poor performance of DTW is mainly due to the complexity of the data sets. Changing process characteristics, such as changing raw materials, modified process strategies, and biological variability, cause the process variables to differ not only in the temporal occurrence of landmarks but also in their absolute values, as shown in Fig. 6 at the beginning of the fed-batch phase of the B. subtilis process (from 25% of the process duration). This exposes the weakness of DTW: with such persistent deviations in the absolute heights of the curve profiles, singularities occur during synchronization. The reason is that, with this method, the process curves are iteratively synchronized by skipping and duplicating individual sensor values until a termination criterion is reached. If the curves differ significantly from one another overall, due to changing process characteristics, the termination criterion is only reached after numerous iterations, and long singularities are generated in the curves up to that point. This loss of information in the process variables subsequently degrades the prediction performance of the soft sensors. This problem does not occur with CR. Here, characteristic landmarks are selected first, and only these are synchronized; the areas between the landmarks are then evenly stretched or compressed. The absolute values of the process variables are therefore less critical for synchronization, which is advantageous when process characteristics change.

The recalibration algorithm applied can be largely automated and transferred straightforwardly to further bioprocesses and target variables, such as other target proteins and by-products. The presented concept thus allows an even more comprehensive validation and optimization of the presented synchronization methods. As an initial condition, only a defined number of equal time intervals had to be specified to determine when the recalibrations are carried out. Beyond the synchronization methods, the general prediction performance of the soft sensor concept could be further improved by various other approaches. As mentioned, the prediction is recalibrated in five fixed sections per process. These prediction windows could be adapted to the process phases by, e.g., automated phase detection [14, 38,39,40]. In process sections selected per phase, the underlying relationships between the variables would then remain constant, yielding better prediction models. Furthermore, the algorithm would become even more automated, since the recalibration points would no longer need to be specified.
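The fixed-interval recalibration described above can be sketched as follows. The similarity measure (plain Euclidean distance on the input trajectories) and the ordinary-least-squares model are simplifications assumed here for illustration, not the paper's exact selection and modelling steps:

```python
import numpy as np

def recalibrate_windows(X_query, pool, n_windows=5, n_select=3):
    """Sketch of fixed-interval recalibration: split the process into
    `n_windows` equal sections; in each, pick the `n_select` most similar
    historical runs and fit a linear soft-sensor model on them.

    X_query : (T, p) input variables of the running process
    pool    : list of (X, y) historical runs, each X of shape (T, p), y of (T,)
    Returns one fitted coefficient vector per window.
    """
    T = X_query.shape[0]
    bounds = np.linspace(0, T, n_windows + 1).astype(int)
    models = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        # rank historical runs by similarity over the current window
        dists = [np.linalg.norm(X[a:b] - X_query[a:b]) for X, _ in pool]
        chosen = np.argsort(dists)[:n_select]
        Xc = np.vstack([pool[i][0][a:b] for i in chosen])
        yc = np.concatenate([pool[i][1][a:b] for i in chosen])
        # linear soft-sensor model: y ~ Xc @ beta (intercept omitted)
        beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        models.append(beta)
    return models
```

Synchronization (DTW or CR) would slot in before the distance computation, so that the similarity ranking compares aligned rather than time-shifted trajectories.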

Linear models were used as the soft sensor model structure in the algorithm presented. As mentioned, the prediction performance of these models could be improved by phase-dependent segmentation. Alternatively, nonlinear models such as artificial neural networks could be used. These models have more complex structures and are also suitable when the underlying relationships are nonlinear [9]. However, a major challenge is the number of reference values in the data sets used for model training: an insufficient number can lead to overfitting during training and thus to poor prediction performance. This can be avoided by automated sampling and processing and, hence, a high number of reference values for model training. Nevertheless, the use of synchronization methods as preprocessing for the selection and calibration of soft sensor models is likely to increase prediction performance regardless of the model structure. Preprocessing the data sets with CR can therefore also be recommended for nonlinear models. The singularities that occur with DTW would most likely remain challenging, as the associated information loss causes problems for all model types.

Additional methods to compensate for sensor faults should also be incorporated to further optimize the robustness of the prediction models. Besides the fault-tolerant fusion of redundant soft sensor models [41], the detection of sensor faults by means of pattern recognition [42, 43], symptom signal methods [36], or multivariate statistical process control [35] is conceivable.

The recalibration concept presented enables the long-term, automated maintenance of soft sensors. However, another factor must be considered before a real-time implementation is realized. In this study, we worked with the data pools of two different bioprocesses, which were created over several months. For long-term real-time use on a bioprocess, however, the data pool must be expanded regularly, because changing process characteristics can only be compensated for automatically in the prediction model if these or similar characteristics have already occurred and are part of the data pool. If this condition is fulfilled, high long-term prediction performance can be achieved by combining synchronization methods and automated recalibration.

4 Conclusion

This study investigated the influence of synchronization methods on an automatically recalibrating soft sensor concept. To this end, two different synchronization methods (DTW and CR) were used as preprocessing for the automatic real-time selection of the most similar data sets, which were subsequently used to recalibrate the current soft sensor model. These studies were performed on two different bioprocesses (P. pastoris and B. subtilis) with different target variables. Comparing the NRMSEP of the soft sensors without a synchronization method and with CR or DTW revealed significant differences in prediction performance. The use of CR reduced the NRMSEP by up to 24%. DTW was less suitable for the synchronization of bioprocess data because of the occasionally large differences between data sets in the data pools, arising from different process characteristics such as new feeding strategies. These differences led to singularities in the synchronized data sets and an associated loss of information, which impaired the predictions of the soft sensors. Nevertheless, the DTW algorithm could be further developed and adapted to the different process characteristics; initial approaches have already been presented here. In the form presented here, however, CR is much more intuitive to implement and requires minimal process knowledge.

Overall, soft sensors allow enhanced bioprocess monitoring and control. Unfortunately, the implementation in the biotechnological industry often fails due to the long-term usability of these sensors [10]. Optimized intelligent recalibration can address this issue and secure soft sensors a permanent place in the biotechnology industry.