Data assimilation as a key step towards the implementation of an efficient management of dissolved oxygen in land-based aquaculture

A data assimilation (DA) methodology, e.g. the continuous-discrete Kalman filter (CD-KF), was applied to the assimilation of dissolved oxygen data, in order to obtain a dynamic estimation of the oxygen demand in a land-based aquafarm. The CD-KF was implemented on a dynamic model, which included as state variables the concentration of dissolved oxygen (DO) and fish respiration rate: the latter was considered as a non-observable stochastic variable. The model was applied to a 1-month long set of observations collected at a raceway rainbow trout farm, including (1) hourly time series of water temperature and dissolved oxygen concentration in the raceway influent and effluent and (2) a daily time series of fish number and fish weight distribution. The results show that the assimilation of DO data led to a dynamic estimate of DO demand which showed changes in the daily mean and the daily pattern: these were related to changes in the feeding regime. Furthermore, the methodology provided accurate short-term predictions of the DO concentration also in the presence of short-term fluctuations, which would be very difficult to relate to external forcings in a mechanistic model. These findings indicate that DA could be effectively used to design and implement efficient and robust control systems for optimizing the oxygen supply, thus contributing to the implementation of Precision Fish Farming in land-based aquafarms.


Introduction
The dynamics of dissolved oxygen (DO) concentration is of key interest in land-based fish farming systems, as the oxygen supply through oxygenation or aeration is critical for fish welfare and growth (Lawson, 1995;Colt & Orwicz, 1991). Fish respiration is thus one of the most important processes to take into account when designing a land-based system: according to (Colt and Maynard, 2019) the daily pattern of fish respiration rate should be considered, in order to prevent low DO concentration conditions. The detection and prediction of short-term changes of the oxygen demand is also a key component of an adaptive oxygen control system that modulates the oxygen supply in response to the current demand.
The Precision Fish Farming (PFF) approach  aims at supporting optimal management decision-taking by feeding predictive models with real-time data concerning environmental variables, e.g. DO concentration, and "animal variables", e.g. fish biomass and size distribution. Royer et al. (2021) applied this approach to a trout farm case study in Northern Italy, showing that this methodology can improve both the economic and environmental sustainability.
Rainbow trout metabolism, in particular the standard metabolic rate, shows a daily dynamics which is affected by both circadian rhythm and feeding regime (Bolliet et al., 2001;Heydarnejad & Purser, 2009). Bolliet et al. (2004) highlighted the importance of adequately phasing the food distribution with internal rhythms associated with nutrient partitioning. Royer et al. (2021) showed that the daily pattern of oxygen concentration in a trout farm can be simulated using a model of DO dynamics: in this study, fish respiration was modelled using a sinusoidal function. However, the daily mean value, phase and amplitude may vary from day to day, due to changes in feeding regimes, fish health and fluctuations in environmental, non-controllable, variables, e.g. water temperature and concentration of suspended particles. Such short-term variability can hardly be explained and predicted using process-based models. Thus, combining these models with a data-driven approach could be a flexible, robust and computationally efficient solution.
Data assimilation (DA) methods combine model predictions and observations in order to improve the estimation of the state of a given system, thus allowing a more reliable forecast (Bocquet, 2014;Niu et al., 2014). This goal is achieved by assuming that the state variables of a process-based model are, in fact, stochastic ones: non-observable variables, such as the parameters of a process-based model, can also be included in the state vector, and, therefore, their value can change in time. These changes can be driven by the information gained any time a new observation is collected. DA was introduced in statistics and engineering (Kalman, 1960), and its popularity has increased in several fields, such as meteorology (Ide et al. 1999), oceanography (Beck, 1987;Pastres et al. 2003) and ecology (Niu et al., 2014;Luo et al., 2011). DA methods are not widely used in the aquaculture field as yet. In this context, the assimilation of real-time observations can markedly improve the robustness and short-term predictive skill of dynamic models, which are the key components for implementing efficient data quality control and management systems, based on the PFF principles. As such, DA-based forecast and control systems could quickly expand in aquaculture, thanks to the availability of low-cost sensors able to monitor both environmental and animal variables .
To our knowledge, dynamic models including DA algorithms were not previously applied to the management of dissolved oxygen supply. The Kalman filter (KF) (Gelb 1974) is one of the most popular DA methods, which allows real-time estimation of 1 3 state variables and parameters of a system based on observational knowledge. The main goals of this study were (1) to test the applicability of the KF method and provide a protocol for its implementation in a land-based trout farm and (2) to show that a DO model including a DA algorithm could provide reliable short-time forecast of fish oxygen demand, thus leading to improve both economic efficiency and fish welfare, as sudden drops of DO could be avoided. The model was tested on a rainbow trout farm, as part of the activities of the GAIN (Green Aquaculture Intensification in Europe) H2020 EU-funded project.

Kalman filter
The Kalman filter (KF) (Gelb, 1974;Kalman, 1960) is a well-known recursive estimation technique that has been used for decades in many different fields, including ecological modelling (Luo et al. 2011;Niu et al. 2014;Dabrowski, 2018). This DA methodology was applied also to aquaculture for monitoring fish behaviour (Pinkiewicza et al. 2011;Mehammer et al. 2018) and forecasting fish growth (Marafioti et al., 2012). The objectives of this method are (i) to improve the prediction of observable output variables, based on the information provided by any new datum, and (ii) to improve the estimation of the remaining state variables, which may include key model parameters. Key to the method is the assumption that the state variables are stochastic: in a process-based model, their dynamic is driven by a set of governing differential equations, whereas the introduction of a DA algorithm allows one to account for deviations from their trajectories, due to processes which are not explicitly taken into account by the mechanistic component of the model.
The KF was originally designed for linear models, but it can also be applied to nonlinear models, by linearizing the model equations. The algorithm can assimilate continuous or discrete data. For instance, Marafioti et al. (2012) implemented a continuous-discrete extended Kalman filter (CD-EKF) in order to deal with non-linearity of the model and discrete measurements of the state.
In this work, as observations are discrete and the model is a linear continuous-time model, the continuous-discrete Kalman filter (Lewis et al. 2016) was used: In Eq. 1a, X is the augmented state (that includes not only the state of the process-based model but also some parameters of this model that becomes, in the KF perspective, part of the state), F is the system dynamics matrix, B is the forcing matrix, U(t) is a vector of external forcings and (t) is a random system noise, such that (t) ∼ N(0, Q) , with Q being the system noise covariance matrix. Equation 1b is the measurement equation in which Z k is the vector of variables measured at time k, H k the state-to-measurement matrix, which project the state space onto the space of the output variables, and (t) ∼ N(0, R) is the measurement noise.
The "best" estimate of the system state between two measurements is given by Eq. 1a. Once a new measurement is collected at time "k", the state is corrected based on Eq. 2: where X k∶k−1 is the value of the augmented state estimated using Eq. 1a, k is the innovation term and X k∶k is the corrected augmented state vector. The innovation term k is a weighted difference between the measurement and the "a priori" estimate of the system state: The weight K , i.e. the "Kalman gain", takes into account the variances of both the system and measurement noise: its dynamic is given by the following matrix differential equation: where P is the covariance matrix of the state and R is the variance of the measurement noise. In order to apply Eq. 4, it is necessary to solve the covariance matrix propagation in Eq. 5, which read as: where Q is the system noise covariance matrix. The covariance matrix P is also updated after the collection of every new measurement using: where I is the identity matrix.
In summary, the CD-KF algorithm includes three steps ( Fig. 1): 1) Eqs. 1a and 5 are used for estimating the state and its covariance between t = "k-1" and t = "k". 2) After collecting a new datum at t = k, (i) the Kalman gain is computed, using Eq. 4.
The augmented state vector and the covariance matrix are corrected using Eqs. 3 and 6, respectively. (2) The three steps of the Kalman filter methodology for data assimilation

Application of CD-KF to the DO model
The KF methodology was applied for developing and testing a dynamic model for the prediction of DO concentration within the raceway and of oxygen demand due to fish respiration. The mechanistic model equations are presented in Table 1, which also explicitly shows the F, H and K matrices of Eqs. 1a and 1b. The DA model includes two state variables, namely: DO concentration, x 1 , and the fish specific respiration rate at 15° C, x 2 . The expected value of DO, Eq. 1 in Table 1, is estimated based on the following assumptions (see also Royer et al., 2021): (1) the raceway is well mixed; (2) fish respiration is the only significant contribution to oxygen consumption; (3) the rate of DO exchange with the atmosphere is proportional to the DO deficit (Ginot & Hervé, 1994). The expected value of the specific respiration rate is assumed to be constant (Table 1): its dynamics are driven entirely by the correction step of the KF algorithm. Water temperature is, in this case, a non-controllable model input: its effect on the respiration rate was modelled using an exponential function (Table 2) (Brigolin et al. 2010).
The matrix H and the explicit formulation of the Kalman gain, K, are given in Table 3, which also presents the parameters which specify the stochastic components of the model, Table 1 Process-based dynamic model simulating the evolution of dissolved oxygen and fish-specific respiration rate State variables Water flow rate [L h −1 ] Vectors and matrices of the model   This study mgh −1 kg −1 Oxygen consumption rate DO sat mgL −1 Oxygen concentration at saturation Ginot and Hervé (1994) S mgL −1 h −1 Oxygen concentration supply rate Initial values of the state variables

Royer et al. (2021)
Table 3 Initial conditions for the covariance state matrix, P, and the system noise covariance one, Q (the latter is assumed to be constant in time), and state-to-measurement and Kalman gain matrices, H and K State covariance Royer et al. (2021) System noise

State-to-measurement matrix and Kalman gain
also called "hyperparameters". Their estimation in real world applications of the KF algorithm could be critical: in this paper, the methodology applied in Pastres et al. (2003) to a model simulating DO dynamics in a water body was used. The value of the parameter R 15 was taken from Royer et al. (2021). The variances of the two state variables of the augmented state vector obtained from the calibration were subsequently used for estimating the system noise, represented by the diagonal matrix Q. The measurement noise, R, was estimated by averaging the absolute value of the differences between two consecutive measurements, in accordance with Woodall and Adams (1989).

External forcings and system observations
The model described in the previous section was tested at a raceway trout farm located in Preore, (Trentino-Alto Adige, Italy): the, purposely designed, monitoring system is described in detail in Royer et al. (2021) and summarized below. The model requires as input time series of concentration of DO in the influent, x in , water temperature, T w , oxygen supply rate, S, and fish biomass, M. The latter is a key, time dependent, input, which connect the specific respiration rate, x 2 , to the oxygen concentration, x 1 : see first state equation in Table 1. In this application, the total biomass was calculated by multiplying the number of fish by the average fish weight. The former daily time series was obtained by subtracting from the initial number of fish the daily mortalities. The fish weight daily distribution was estimated using the non-invasive monitoring system, "Biomass Daily", which is being used to estimate weight distribution in Atlantic salmon cages (Lopez Riveros, 2017) and was tested on rainbow trout during the H2020 project GAIN (Bolzonella et al, 2022).
The concentration of DO and water temperature were monitored at the inlet and outlet with an hourly frequency using two YSI EXO2 probes.
The model was applied to the data set collected from July 3, 2019, to July 31, 2019, in order to test its predictive capability under different feeding regimes. In fact, fish were not fed or underfed during the first 2 weeks, hereafter called "fasting weeks" due to the presence of a pathogen Lactococcus garvieae and regularly fed during the following 2 weeks, hereafter called "feeding weeks" with daily ration ranging from 0.4 (81 kg of feed) to 0.96% (200 kg of feed) of the average fish weight (see also Fig. 2 in the next section).
The model performance was assessed using the root mean square error, RMSE, as a goodness of fit indicator: where n is the number of observations, y k is the DO concentration in the effluent at time t k , and x k−1∶k is the DO concentration forecasted at time t k using Eq. 1.
The time series of the innovation components ( k ) were plotted and compared in terms of relative innovation, i.e. the ration between the innovation and the state estimate before the correction step, in order to assess the predictive skill of the mechanistic part of the model, and to better quantify the innovation itself.
The model and the CD-KF algorithm were coded using R software core (version 3.4.0) within R Studio (version 1.0.143). The differential equations were solved using the deSolve R package (version 1.21). The CD-KF matrix equations were entirely coded by the authors (none of the available KF packages for R was used).

Results
The results of the model application are summarized in Fig. 2, which shows the trajectories of the state variables, and in Fig. 3, which presents their relative innovations. A visual inspection of Fig. 2a, which presents the comparison between the DO estimated prior the correction step and the observations, indicates that the model succeeded in simulating the dynamics of the observations in both the "fasting" and "feeding" weeks. This is confirmed by the RMSE which, overall, was 0.27 [mgL −1 ], i.e. about 3% of the mean DO. The figure also shows that the daily pattern of DO observations was much more regular during the "fasting weeks": in this time window, the model performed better (RMSE = 0.12 mg/L). During the following "feeding weeks", the daily range was, in general, higher, and isolated negative peaks were observed: the model, however, still provided reasonably accurate predictions, with an RMSE of 0.4 mg/L, i.e. about 5% of the mean DO. Figure 2b shows the trajectory of the fish-specific respiration rate and, on the second axis, the daily feed ration, expressed as a percentage of the average fish weight. The respiration ranged from 20.12 to 208.45 [mg kg −1 h −1 ], with an average value of 95.33 [mg kg −1 h −1 ]. The fish oxygen demand is proportional to this non-observable variable, which showed both a clear daily pattern as well as changes in the mean level, which were entirely driven by the correction step and seems to be related mainly to the feeding regime. In fact, the daily range was markedly higher in the "feeding window", compared with the "fasting" one. Furthermore, the mean level also increased soon after the feeding started and showed a sudden decrease when the farmer stopped the feeding again for 3 days, from July 22nd to 25th.

3
This is confirmed by the analysis of the time series of the relative innovations, shown in Fig. 3. In fact, the relative innovation was much lower for both variables during the "fasting weeks", compared to "feeding" ones, in which both innovations showed marked daily oscillations, perturbed by positive spikes: these are related to the irregular pattern of the DO observations and, in particular, to the isolated negative peaks shown in Fig. 2. The autocorrelation functions of both variables, not shown, presented a minimum at the lag of 12-h lag and a maximum at the lag of 24 h; this means that innovation cannot be considered as an uncorrelated "white" noise.
The mean values, standard deviations and ranges of the innovations and of the relative innovations, i.e. the ratio between the innovation νk and the state estimate before the correction, X k−1∶k , are presented in Table 4. The standard deviation and range of the R 15 relative innovation were significantly higher than those concerning DO, indicating that the introduction of the stochastic component leads to improve the model prediction and to estimate a relatively large fluctuation in the oxygen demand.

Discussion
The findings presented in the previous section indicate that the data assimilation method tested in this paper allows one to: 1) obtain reliable predictions of dissolved oxygen concentration in a land-based system, during markedly different feeding regimes 2) detect and obtain a quantitative estimate, in a real-world case study, of the oxygen demand, which results from the complex interplay of the circadian component of fish metabolism and the response to the feeding regime, often perturbed by other factors which could be hardly accounted for in a process-based model The model proposed in this paper requires as an input the fish biomass density, which, in this case study, was estimated using a non-invasive monitoring system that may not be available in many farms. This may seem to limit the applicability of the methodology, but, in fact, the biomass is a smooth function, which could be interpolated from sampled data and/or estimated between samples using growth models (Bolzonella et al., 2022).
The CD-KF algorithm is a rather straightforward and efficient DA methods, which, however, is based on some hypothesis about the stochastic component of the model: in practical applications concerning ecology and environmental sciences, such hypothesis may be difficult to test or may not be fully complied with. In these cases, the KF is non-optimal, but this does not necessarily mean that its performance could be severely deteriorated: as the application of DA to aquafarming is still at a very early stage, in this paper, we tested the original KF formulation. The innovation statistics, see Table 4, indicates that, on average, the DA did not introduce a systematic bias in the state estimate. Furthermore, the standard deviation of the DO innovation was close to the estimated measurement noise, see Table 3, and below 2% of the model estimate, thus suggesting that, overall, the introduction of a stochastic component led to improve the predictive capability of the model, but the mechanistic component could still account for a large fraction of the data variability. The standard deviation of the innovation concerning the fish-specific respiration rate was higher than the DO one but, in relative terms, around 10% of the current system estimate and below 10% of its initial value. The mean respiration rate estimated for the whole period (July 3 to 31) was 95.33 [mg kg −1 h −1 ], while Royer et al. (2021) estimated for the same period a mean of 101.5 [mg kg −1 h −1 ]. The daily mean ranged from 53. 82 to 137.29 [mg kg −1 h −1 ], which is consistent with the findings presented in Alsop and Wood (1997) who measured a rate between 96 and 192 [mg kg −1 h −− ] and Eliason and Farrel (2014) who reported a rate of 75 [mg kg −1 h −1 ]. Furthermore, the peak of 137.29 [mg L −1 ] was close to the value reported by Colt and Watten (1988).
The most severe violation of the basic hypothesis is given, in fact, by the autocorrelation of the innovation time series: in fact, the partial autocovariance function showed a daily pattern, which, however, is already very evident from the time series of R15 in Fig. 2b. In principle, this points to the need of improving the mechanistic component of the model, by introducing a function which could mimic a systematic daily change of the specific respiration rate. Such a solution was suggested in Tudor (1999) and tested on the same data set used in this paper by Royer et al. (2021), who modelled this metabolic rate using the sinusoidal function: where t is the time [h], R m is the average daily rate, A is the amplitude of the daily oscillation and φ is the phase. The implementation of Eq. 8 requires the estimation of three parameters which, however, can hardly be considered as constant in time, as they depend on the feeding regime. In fact, in Royer et al (2021), R m was estimated using the data collected during the "fasting weeks", but A had to be estimated for both the "fasting" and "feeding" weeks. The comparison between the RMSE of the two models already suggests that the DA algorithm performs better than the process-based model: the CD-KF gave an RMSE of 0.12 vs 0.19 mg/L for the "fasting" weeks and of 0.44 vs 0.55 for the "feeding" ones. The improvement is more marked in the "feeding" weeks, which were characterized by higher daily oscillations of DO and higher noise: these were likely due to feeding anticipatory activity (Heydarnejad & Purser, 2009) and feed digestion. The DA model was able to track the effect of these processes and provided a dynamic estimate of the fish respiration rate which showed changes in both the daily mean and daily ranges.
As a result, the simulated DO followed the observations more closely, compared with the sinusoidal respiration model. The differences in the daily pattern of the fish respiration within the two time windows is clearly shown in Fig. 4, which presents its hourly distribution, estimated based on the results provided by the DA algorithm. The sinusoidal model (Eq. 8) performed quite well in the fasting weeks, characterized by a lower amplitude of the daily oscillation and lower variability. However, the same model, even after a retuning of two parameters, did not adequately simulate the daily respiration pattern under normal feeding. This could be due to both the choice of the function and/or the identifiability issues which arise in the simultaneous calibration of the two parameters, A and φ, and could lead to bias/uncertain estimates which are more noticeable as the noise increases.
(8) R 15 = R m + Acos 2 (t∕24) + Fig. 4 Box plot summarizing the hourly distribution of the fish respiration rate: a for the "fasting weeks" time window; b for the "feeding weeks" one. The whole segment represents the range, the box indicates the interquartile range, and the horizontal bar is the median. Green bullets are the mean values In fact, the retuning of the parameters led to estimate the same phase φ for the two time windows, thus failing in simulating the 6-h lag between the feeding time, about 9 a.m., and the respiration peak: this lag, however, is consistent with the literature (Colt & Watten, 1988). The retuning, indeed, led to estimate a higher absolute value of the amplitude A in Eq. 8 but did not allow to track the day-to-day variability of the hourly respiration rate which is shown in Fig. 4b.
The predictive capability of the CD-KF model could be used to design an advanced control system, based on the dynamic tuning of the oxygen supply, taking into account the residence time of the raceway, which was about 45 min. Therefore, a 1-h ahead prediction of the oxygen demand could already provide a useful input to such a system. Furthermore, the model output and, in particular, the innovation time series could provide relevant information for the quality control of the observations: such system could be very relevant for avoiding false alarms, in case sudden drops of DO concentration due to random fluctuation of the signal rather than to a real and potentially dangerous decrease due, for example, to a failure in the supply system. In practice, it may not be easy to identify random outliers by visual inspection or statistic on DO, and it could be easier to process the innovation, which, as can be seen in Fig. 3, shifts from highly positive to highly negative peaks when the observed DO (see Fig. 2a) suddenly drops but then goes back to acceptable levels. Since the parameter R15 has a physiological interpretation, as long as data are collected, it would be possible to establish a relative innovation threshold, for example, 2 times the standard deviation estimated on a representative data set, beyond which the observation is regarded as not reliable or suspicious.

Conclusion
The implementation of aquafarm management practices based on the PFF approach requires the development of dynamic models for connecting the response of the "animal variables" to change in the environmental variables and control systems. Modelling approaches based on data assimilation perfectly fits this framework, as they allow one to update the estimates also of unobservable variables, based on real-time processing of the available observation.
To our knowledge, data assimilation was not previously applied to the estimation of fish respiration rate, a parameter that can hardly be measured in a real faming system. This task was accomplished using a basic version of the continuous-discrete Kalman filter, one of the simplest and most computationally efficient data assimilation method. Its application showed that it is possible to forecast the DO concentration in a trout raceway, as well as to obtain a dynamic estimate the fish respiration rate, using a model with a simple mechanistic component. The model was tested on a comprehensive data set, collected at a rainbow trout farm during a 1-month-long monitoring survey: during this period of time, the feeding regime drastically changed, from fasting to feeding the full ration recommended by the feeding tables. In spite of that, the model provided DO forecasts consistent with measurements as well as the mean and standard deviation of the respiration rate which were in line with the literature, without any need to retune the parameters. Furthermore, the CD-KF algorithm allowed to estimate changes in the mean daily respiration, correlated to the feed ration, as well as a consistent daily pattern, due to the interplay of the circadian rhythm of the fish metabolism and the time of feeding. The model results were also compared with those obtained using a more complex mechanistic model, in which the daily changes of the fish respiration rate were simulated using a sinusoidal function, thus introducing two more but time-invariant parameters. As the model here presented performed better, we believe that the introduction of data-driven stochastic component in a simpler mechanistic model is a more efficient and robust solution to the problem of estimating fish oxygen demand and predicting DO concentration in a real-world land-based aquafarm.
The results presented in this paper suggests that this flexible modelling approach provides robust predictions of the respiration rate without the need of retuning the model in relations to changes in the feeding regime, in terms of quantity, i.e. daily ration, but also quality. This opens the way to the design and implementation of dynamic and efficient control system of the oxygen supply, which could be easily automatized. Such a system could contribute to the improvement of farm's sustainability: besides reducing the management costs, it may also lead to improve fish welfare and appetite, as sudden drops in the DO concentration could be avoided by matching the estimated demand with the supply.