1 Introduction

In many applications, such as the model predictive control of glycemia in diabetic subjects, the system state must be estimated based solely on noisy measurements of the system output. Traditional recursive state estimators/observers, such as the Kalman filter [1], are algorithms based on the prediction and correction of the state estimate by the new output measurement. In other words, the state estimate is corrected (innovated) according to the single step-ahead output prediction error produced by the system model.

It is known from the theory of Kalman filtering that, for an optimal state estimator, the sequence of the output error of the state observer (OESO), also called the innovation sequence, has the properties of Gaussian white noise. However, in the case of suboptimal state estimation, this innovation sequence is correlated [2, 3] and can thus be effectively predicted in real time.

However, why should one even consider using suboptimal state estimation instead of the optimal one? In fact, suboptimal state estimation is usually not a deliberate choice, but rather an inevitable consequence of unknown or poorly estimated parameters of the process noise model. The design of the Kalman filter, and hence its optimality, relies on exact knowledge of the process noise model, whose parameters are highly uncertain in many applications; the entries of the process noise covariance matrix are often used simply as tuning parameters. It can be claimed that if there is a mismatch between the noise model and the actual statistical properties of the system, the OESO forms a correlated sequence.

The main practical motivation for studying the dynamics of the OESO is to predict it in real time and then correct the predictions of the output variable accordingly; the ultimate aim is to incorporate this prediction into the model predictive control. The outlined strategy can thus be seen as a relatively feasible and cheap way to improve prediction and control performance by effectively compensating for the effect of suboptimal state estimation.

Concerning the target application domain outlined in the title, performing highly accurate predictions yields better forecasting of severe hyper- and hypo-glycemia states, as these are the major risks linked to diabetes and its treatment [4, 5]. Another application of the proposed strategy is possible within an implementation of the model predictive control-based artificial pancreas [6, 7] to control glycemia in subjects with type 1 diabetes by automating the insulin dosing.

The rationale for choosing this application domain to demonstrate the effectiveness of the proposed strategy is supported by the fact that in most available studies, e.g. [8,9,10,11,12,13,14,15,16,17], the crucial entries of the covariance matrix of the process noise are considered tuning parameters in the Kalman filter design. Therefore, it can be concluded that the state estimator works suboptimally in such a scenario.

It should be mentioned for completeness that there also exist sophisticated and dedicated methods [18,19,20,21,22] for estimating the noise models, which can potentially eliminate the problem of suboptimal state estimation. Unfortunately, these covariance matrix estimation methods have limited practical applicability, as they often provide biased estimates and typically require very large datasets, which can be infeasible in many applications.

In this paper, we provide important insights into the dynamics of the OESO, not only from the analytical point of view but also from a practical data-driven perspective in which reduced models are considered. The primary motivation for using the reduced structures, such as the autoregressive and the moving average model, rather than the full analytical model, is their feasible prediction and identifiability from the experimentally obtained OESO sequence.

The paper is organized as follows: Sect. 2 introduces the basic preliminaries, the formulation of the stochastic state-space model, and the equations describing the conventional state observer. In Sect. 3, the full analytical model of the OESO is derived to provide the theoretical background. Section 4 considers the reduced autoregressive model and comprises the formulation of the corresponding identification problem and the predictive equation. In Sect. 5, the moving average model is studied in a similar way. Section 6 covers the idea of using the identified models to enhance the output prediction accuracy and their inclusion in the model predictive control. The setup of the experiment aimed at prediction and model predictive control of glycemia in subjects with type 1 diabetes is outlined in Sect. 7, while its results are discussed in Sect. 8.

It should be mentioned in passing that the stochastic nature of glycemia dynamics does not necessarily have to be modeled in the state space by the process noise and the measurement noise, since basic input–output transfer function models like the autoregressive-exogenous, autoregressive moving average with exogenous inputs, and Box–Jenkins models can also be used, as reported in [23,24,25,26]. The stochastic part of these input–output models is usually estimated in one step together with the deterministic submodels as a result of the system identification procedure from the experimental data, so the optimality of the state estimation is not a concern.

An application of the unconstrained model predictive control with the state-space model, along with the state estimation based on the Kalman filter, was presented in [27]. A similar control strategy was reported in [28, 29], but due to the input–output problem formulation, the state estimator was not required. As another example of a similar artificial pancreas scheme, in [30] the Kalman filter was used to estimate the state for the linear model predictive control with disturbance rejection.

In our recent work [31], a novel optimal state estimator was proposed as an alternative to the Kalman filter. However, this algorithm was not based on the traditional recursive correction of the state estimate by the OESO, but it used the generalized least squares formulation of the problem. Also in this case, the optimality of the state estimate depended on the exact knowledge of the process noise model.

From the perspective of the unique contributions of this paper, it is important to remark that in none of the above-referenced studies, nor in the latest comprehensive survey papers [32,33,34], was the strategy of predicting the OESO and compensating for the suboptimality of the state estimation proposed or discussed. It can also be concluded that most authors relied on suboptimal state estimation with ad hoc tuning of the process noise model, hence leaving significant headroom for improving the control performance by targeting this problem.

Unlike the conventional MPC-based artificial pancreas, which typically uses a suboptimal state estimator and thus normally produces a correlated OESO sequence, we propose a new strategy to effectively compensate for this effect. The proposed modification involves the prediction of the OESO to directly correct the prediction of the system free response. It is worth noting that this modification is easy to embed in already existing MPC schemes [32,33,34], adds little computational cost, and requires no hardware modifications. In other words, the proposed OESO models and the prediction/correction strategy can be retrofitted to any advanced MPC-based artificial pancreas design that utilizes a state estimator, at the expense of a relatively straightforward structural modification. Note that the other features of the artificial pancreas, such as the safety algorithms and constraints, do not directly interact with the proposed modification.

2 Model structure and preliminaries

The general discrete-time stochastic state-space empirical model of glycemia dynamics in subjects with type 1 diabetes is postulated as [31]

$$\begin{aligned}&x_{(k+1)}\!=\!A x_{(k)}\!+\!B \begin{bmatrix} u_{(k)} \\ d_{(k)} \end{bmatrix}\!+\!w_{(k)}, \end{aligned}$$
(1a)
$$\begin{aligned}&y_{(k)}\!=\!C x_{(k)}\!+\!v_{(k)}, \end{aligned}$$
(1b)

where \(k \in {\mathbb {N}}\) is the current sample, the output y [mmol/l] stands for the deviation of glycemia from its steady-state value, the input u [U/min] denotes the deviation of the insulin administration rate from the basal rate, and d [g/min] represents the carbohydrate intake rate input. The state vector of this nth order model is denoted \(x ~ \left[ n \! \times \! 1 \right] \), \(w ~ \left[ n \! \times \! 1 \right] \) is the process noise vector and the zero-mean uncorrelated random process \(v \sim {\mathcal {N}}\left( 0,{\mathcal {R}}\right) \) represents the measurement noise of the glucose monitoring device [35].

The parameters of model (1) include the state-transition matrix \(A ~ \left[ n \! \times \! n \right] \), the input matrix \(B ~ \left[ n \! \times \! 2 \right] \), and the output vector \(C ~ \left[ 1 \! \times \! n \right] \). The state-transition matrix A consists of the submatrices \(A^{u}\), \(A^{d}\) and the zero matrices \(\varvec{0}\) of the conforming dimensions as

$$\begin{aligned} A\!=\!\begin{pmatrix} A^{u} &{} \varvec{0} \\ \varvec{0} &{} A^{d} \end{pmatrix}, \end{aligned}$$
(2)

where matrices \(A^{u} ~ \left[ n_u \! \times \! n_u \right] \) and \(A^{d} ~ \left[ n_d \! \times \! n_d \right] \) are in the canonical form and comprise the model coefficients \(a^{\frac{u}{d}}\) such that

$$\begin{aligned} A^{u\!/\!d}\!=\!\begin{pmatrix} -a^{u\!/\!d}_1 &{} \cdots &{} -a^{u\!/\!d}_{n_{u\!/\!d}\!-\!1} &{} -a^{u\!/\!d}_{n_{u\!/\!d}} \\ 1 &{} \cdots &{} 0 &{} 0 \\ \vdots &{} &{} \vdots &{} \vdots \\ 0 &{} \cdots &{} 1 &{} 0 \end{pmatrix}. \end{aligned}$$
(3)

The input matrix B is simply

$$\begin{aligned} B=\begin{bmatrix} B^{u} &{} \varvec{0} \\ \varvec{0} &{} B^{d} \end{bmatrix}, \end{aligned}$$
(4)

where \(\varvec{0}\) are the zero vectors of conforming dimensions and \(B^{u} ~ \left[ n_u \! \times \! 1 \right] \), \(B^{d} ~ \left[ n_d \! \times \! 1 \right] \) are equal to

$$\begin{aligned} B^{u}=B^{d}=\begin{bmatrix} 1&0&\cdots&0 \end{bmatrix}^{\textrm{T}}. \end{aligned}$$
(5)

The output vector C is

$$\begin{aligned} C=\begin{bmatrix} C^{u}&~C^{d} \end{bmatrix}, \end{aligned}$$
(6)

where \(C^{u} ~ \left[ 1 \! \times \! n_u \right] \) and \(C^{d} ~ \left[ 1 \! \times \! n_d \right] \) comprise the model coefficients \(c^{\frac{u}{d}}\) as

$$\begin{aligned} C^{\frac{u}{d}}=\begin{bmatrix} c^{\frac{u}{d}}_1&c^{\frac{u}{d}}_2&\cdots&c^{\frac{u}{d}}_{n_{u/d}} \end{bmatrix}. \end{aligned}$$
(7)

The state vector x also takes the canonical form

$$\begin{aligned} x_{(k)}\!=\!\begin{bmatrix} x^u_{(k)}&\cdots&x^u_{(k-n_u+1)}&x^d_{(k)}&\cdots&x^d_{(k-n_d+1)} \end{bmatrix}^{\textrm{T}}, \end{aligned}$$
(8)

where \(n_u\) and \(n_d\) are the orders of the corresponding submodels, so the overall model order is \(n\!=\!n_u\!+\!n_d\). The state variables \(x^u\), \(x^d\) represent the partial effects of insulin administration and carbohydrate intake, respectively.

Consider that the process noise w in model (1) represents the effect of input uncertainties, so one can write

$$\begin{aligned} w_{(k)}\!=\!\begin{bmatrix} \gamma _{(k)}&0&\cdots&0&\delta _{(k)}&0&\cdots&0 \end{bmatrix}^{\textrm{T}}. \end{aligned}$$
(9)

The first random input \(\gamma _{(k)}\!\sim \!{\mathcal {N}}\left( 0,\sigma ^2_\gamma \right) \) reflects various unmeasurable disturbances, including physiological changes in insulin absorption and action. The second random input \(\delta _{(k)}\!\sim \!{\mathcal {N}}\left( 0,\sigma ^2_\delta \right) \) represents the uncertainty of the meal announcement made by the patient.

Since all stochastic terms in (1) were defined as zero-mean uncorrelated stationary random processes, the covariance matrix \({\mathcal {Q}} ~ \left[ n \! \times \! n \right] \) of the process noise (9) and the variance \({\mathcal {R}}\) of the measurement noise are equal to

$$\begin{aligned}&\begin{aligned} {\mathcal {Q}}&=\textrm{cov}\left( w,w\right) =E\lbrace w_{(k)} w_{(k)}^{\textrm{T}} \rbrace \\ {}&=\textrm{diag}\begin{pmatrix} \sigma ^2_\gamma&0&\cdots&0&\sigma ^2_\delta&0&\cdots&0\end{pmatrix}, \end{aligned} \end{aligned}$$
(10)
$$\begin{aligned}&{\mathcal {R}}=\textrm{var}\left( v\right) =E\lbrace v_{(k)}^2\rbrace . \end{aligned}$$
(11)
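
For illustration, model (1) with the noise statistics (10) and (11) can be simulated by the following minimal numerical sketch. Python/NumPy is our choice here, and the helper name simulate_model is hypothetical, not part of any referenced implementation:

```python
import numpy as np

def simulate_model(A, B, C, Q, R, U, x0, rng):
    """Simulate model (1): x(k+1) = A x + B [u; d] + w, y(k) = C x + v,
    with w ~ N(0, Q) per (10) and v ~ N(0, R) per (11)."""
    N = U.shape[0]
    x = np.asarray(x0, dtype=float).copy()
    y = np.empty(N)
    for k in range(N):
        y[k] = C @ x + rng.normal(0.0, np.sqrt(R))        # output equation (1b)
        w = rng.multivariate_normal(np.zeros(len(x)), Q)  # process noise (9)-(10)
        x = A @ x + B @ U[k] + w                          # state equation (1a)
    return y

# usage: y = simulate_model(A, B, C, Q, R, U, np.zeros(n), np.random.default_rng(0))
```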

2.1 State observer

The state vector x of model (1) is usually estimated using the recursive state observer

$$\begin{aligned} {\hat{x}}_{(k+1)}=A {\hat{x}}_{(k)}+B \begin{bmatrix} u_{(k)} \\ d_{(k)} \end{bmatrix} + K \left[ y_{(k)} - C {\hat{x}}_{(k)} \right] , \end{aligned}$$
(12)

where \({\hat{x}} ~ [n \! \times \! 1]\) is the estimated state and \(K ~ [n \! \times \! 1]\) is the gain vector, which is the subject of the observer design. The design is usually based on the optimal Kalman approach [1, 36] under the stochastic system assumption, or on the pole-placement method if the system is considered deterministic.

The state estimate residual \(e ~ \left[ n \! \times \! 1 \right] \) is defined as

$$\begin{aligned} e_{(k)}=x_{(k)}\!-\!{\hat{x}}_{(k)}. \end{aligned}$$
(13)

Finally, the single step-ahead model output prediction error \(\epsilon \), i.e. the OESO, is

$$\begin{aligned} \epsilon _{(k)}=y_{(k)} - C {\hat{x}}_{(k)}. \end{aligned}$$
(14)
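
A minimal sketch of the observer recursion (12), producing the OESO sequence (14) as a by-product, might look as follows; the helper name run_observer is ours and the NumPy conventions match the sketch above:

```python
import numpy as np

def run_observer(A, B, C, K, U, y, x0):
    """Run the recursive state observer (12) over measured outputs y
    and inputs U, returning the OESO sequence per (14)."""
    x_hat = np.asarray(x0, dtype=float).copy()
    eps = np.empty(len(y))
    for k in range(len(y)):
        eps[k] = y[k] - C @ x_hat                   # OESO, Eq. (14)
        x_hat = A @ x_hat + B @ U[k] + K * eps[k]   # predict + correct, Eq. (12)
    return eps
```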

3 Dynamics of the state observer output error

The model of the state estimate residual e can be derived by substituting \({\hat{x}}_{(k+1)}\) from (12) and \(x_{(k+1)}\) according to (1a) into (13), while expressing the output \(y_{(k)}\) in terms of (1b), yielding

$$\begin{aligned} e_{(k+1)} =\left( A\!-\!KC \right) e_{(k)} \!+\! w_{(k)} \!-\! K v_{(k)}. \end{aligned}$$
(15)

Eq. (15) can be transformed by applying the forward time-shift operator z as \(e_{(k+1)}\!=\!z e_{(k)}\), obtaining

$$\begin{aligned} z e_{(k)} =\left( A\!-\!KC \right) e_{(k)}\!+\!w_{(k)}\!-\!K v_{(k)}. \end{aligned}$$
(16)

The above equation can be rearranged to separate the vector \(e_{(k)}\) as

$$\begin{aligned} e_{(k)}=\left( zI\!-\!A\!+\!KC \right) ^{-1} \left[ w_{(k)} \!-\! K v_{(k)} \right] , \end{aligned}$$
(17)

where I is the identity matrix of conforming dimensions.

According to (14) and (17), \(\epsilon _{(k)}\) satisfies

$$\begin{aligned} \epsilon _{(k)}=C\left( zI\!-\!A\!+\!KC \right) ^{-1} \left[ w_{(k)}\!-\!K v_{(k)} \right] . \end{aligned}$$
(18)

Eq. (18) implies that the dynamics of the OESO is represented by a stochastic system with multiple independent noise inputs. Note that the term \(C\left( zI\!-\!A\!+\!KC \right) ^{-1}\) results in a row vector of rational functions \(\frac{p_i(z)}{s(z)}\) with the common denominator s(z), which is the characteristic polynomial of this system. This consideration yields the transfer function model

$$\begin{aligned} \epsilon _{(k)}= \begin{bmatrix} \dfrac{p_1(z)}{s(z)}&\dfrac{p_2(z)}{s(z)}&\cdots&\dfrac{p_n(z)}{s(z)} \end{bmatrix} \left[ w_{(k)} \!-\! K v_{(k)} \right] , \end{aligned}$$
(19)

where

$$\begin{aligned} s(z)=\textrm{det}\left( zI\!-\!A\!+\!KC \right) . \end{aligned}$$
(20)

If \({\mathcal {Q}}\) is the covariance matrix of the process noise according to (10), then \(w_{(k)}\) can be written as

$$\begin{aligned} w_{(k)}= \begin{pmatrix} \sqrt{{\mathcal {Q}}}&~ \varvec{0} \end{pmatrix} \varvec{\eta }_{(k)}, \end{aligned}$$
(21)

where \(\varvec{0} ~ \left[ n \! \times \! 1 \right] \) is the zero vector and \(\varvec{\eta }~ \left[ (n+1) \! \times \! 1 \right] \) is the vector of uncorrelated noise inputs with unit variance, i.e., \(\textrm{cov}(\varvec{\eta },\varvec{\eta })\!=\!I\),

$$\begin{aligned} \varvec{\eta }_{(k)}=\begin{bmatrix} {\eta _1}_{(k)}&{\eta _2}_{(k)}&\cdots&{\eta _{n}}_{(k)}&{\eta _{n+1}}_{(k)} \end{bmatrix}^{\textrm{T}}, \end{aligned}$$
(22)

and \(\sqrt{{\mathcal {Q}}} ~ \left[ n \! \times \! n \right] \) is the Cholesky decomposition satisfying \({\mathcal {Q}}\!=\!\sqrt{{\mathcal {Q}}} \left( \sqrt{{\mathcal {Q}}}\right) ^{\textrm{T}}\) [37]. Similarly, the measurement noise \(v_{(k)}\) can be replaced by

$$\begin{aligned} v_{(k)}= \begin{pmatrix} \varvec{0}^{\textrm{T}}&\sqrt{{\mathcal {R}}} \end{pmatrix} \varvec{\eta }_{(k)}, \end{aligned}$$
(23)

where \({\mathcal {R}}\) is the variance of the measurement noise according to (11).

Finally, model (19) can be generalized as the sum of \(n\!+\!1\) autoregressive-moving-average (ARMA) models by substituting \(w_{(k)}\) in the terms of (21) and \(v_{(k)}\) from (23) as

$$\begin{aligned} \begin{aligned} \epsilon _{(k)}&= \begin{bmatrix} \dfrac{p_1(z)}{s(z)}&\dfrac{p_2(z)}{s(z)}&\cdots&\dfrac{p_n(z)}{s(z)} \end{bmatrix} \\&\quad \times \left[ \begin{pmatrix} \sqrt{{\mathcal {Q}}}&\varvec{0} \end{pmatrix} \!-\! K \begin{pmatrix} \varvec{0}^{\textrm{T}}&\sqrt{{\mathcal {R}}} \end{pmatrix} \right] \varvec{\eta }_{(k)} \\ {}&= \begin{bmatrix} \dfrac{r_1(z)}{s(z)}&\dfrac{r_2(z)}{s(z)}&\cdots&\dfrac{r_{n+1}(z)}{s(z)} \end{bmatrix} \varvec{\eta }_{(k)} \\ {}&= \textstyle \sum \limits _{i=1}^{n+1}{ \dfrac{r_i(z)}{s(z)}{\eta _i}_{(k)} }. \end{aligned} \end{aligned}$$
(24)

Since the process noise (9) has only two nonzero components and a diagonal covariance matrix (10), the general model (24) can be reduced to

$$\begin{aligned} \begin{aligned} \epsilon _{(k)} =&\dfrac{p_1(z)}{s(z)} \sigma _\gamma {\eta _1}_{(k)} + \dfrac{p_{n_u+1}(z)}{s(z)} \sigma _\delta {\eta _{n_u+1}}_{(k)}\\&- \dfrac{\textstyle \sum \limits _{i=1}^{n}{K_i p_i(z)}}{s(z)}\sqrt{{\mathcal {R}}}{\eta _{n+1}}_{(k)}. \end{aligned} \end{aligned}$$
(25)

However, model (24) or (25) cannot be used to predict the OESO, since the random input vector \(\varvec{\eta }\) as well as the partial outputs \(\frac{r_i(z)}{s(z)}\) are unmeasurable in practice.

Another important paradox concerning this analytical model is that, since the covariance matrix of the process noise is considered unknown in the case of suboptimal state estimation, the analytical model (24) simply cannot be determined. On the contrary, if the covariance matrix of the process noise is known, which implies that the state estimator works optimally, then model (24) can be determined, but it is unnecessary, since the OESO is then uncorrelated and hence cannot be predicted.

Concerning the identification of the full model (24) directly from experimental data, i.e. based on the OESO sequence, this is hardly possible, primarily due to its structure and the large number of parameters to be estimated.

The aforementioned issues with predictability and identifiability are the main motivation for further considering two reduced model structures, particularly the autoregressive and the moving average model.

3.1 Optimality test

To test whether the state estimator works optimally, the sample autocorrelation function of the OESO sequence has to be analyzed. In the case of optimal state estimation, this autocorrelation function should show a character similar to that of the Dirac delta function.

Supposing a finite-length experiment with N samples, the autocorrelation function \(R_{\epsilon \epsilon }(n T_\textrm{s})\) is estimated as [38, 39]

$$\begin{aligned} {\hat{R}}_{\epsilon \epsilon }(n T_\textrm{s})=\dfrac{1}{N-n} \textstyle \sum \limits _{i=1}^{N-n}{\epsilon _{(i)}\epsilon _{(i+n)}}, \end{aligned}$$
(26)

where \(n \in {\mathbb {Z}}\) satisfies \(n \!<\! N\).
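
The estimator (26) is straightforward to implement; the sketch below (with our hypothetical helper autocorr_oeso) computes it for lags 0 to n_max. For a white sequence, the values at nonzero lags should stay within approximately \(\pm 1.96\,{\hat{R}}_{\epsilon \epsilon }(0)/\sqrt{N}\), the usual 95% whiteness band:

```python
import numpy as np

def autocorr_oeso(eps, n_max):
    """Sample autocorrelation (26) of the OESO for lags n = 0..n_max;
    entry R[n] estimates R_eps_eps(n * Ts)."""
    N = len(eps)
    return np.array([eps[:N - n] @ eps[n:] / (N - n)
                     for n in range(n_max + 1)])
```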

4 Autoregressive model

In this section, the dynamics of the OESO will be approximated by the single-input single-output autoregressive model defined as

$$\begin{aligned} \epsilon _{(k)}=\dfrac{1}{q(z)}\eta _{(k)}, \end{aligned}$$
(27)

where \(\eta \!\sim \!{\mathcal {N}}\left( 0,\sigma ^2_\eta \right) \) is a random process with the properties of zero-mean white noise.

The polynomial q(z) of this \(n_\textrm{q}\)th order model is

$$\begin{aligned} q(z)=1 + q_1 z^{-1} + q_2 z^{-2} + \cdots + q_{n_\textrm{q}} z^{-n_\textrm{q}}. \end{aligned}$$
(28)

The equivalent difference equation of model (27) is

$$\begin{aligned} \epsilon _{(k)}=\eta _{(k)} - \textstyle \sum \limits _{i=1}^{n_\textrm{q}}{q_i \epsilon _{(k-i)} }. \end{aligned}$$
(29)

The parameter vector \(\varvec{q}\) is

$$\begin{aligned} \varvec{q}=\begin{bmatrix} q_1&q_2&\cdots&q_{n_\textrm{q}} \end{bmatrix}^{\textrm{T}}. \end{aligned}$$
(30)

4.1 Identification strategy

According to the difference equation (29), the corresponding linear regression system considering N available samples takes the form

$$\begin{aligned} \begin{aligned} \begin{pmatrix} \epsilon _{(1)} \\ \epsilon _{(2)} \\ \vdots \\ \epsilon _{(n_\textrm{q}+1)} \\ \vdots \\ \epsilon _{(N)} \end{pmatrix} =&-\begin{pmatrix} \epsilon _{(0)} &{} 0 &{} \ldots &{} 0 \\ \epsilon _{(1)} &{} \epsilon _{(0)} &{} \ldots &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots \\ \epsilon _{(n_\textrm{q})} &{} \epsilon _{(n_\textrm{q}-1)} &{} \ldots &{} \epsilon _{(0)} \\ \vdots &{} \vdots &{} &{} \vdots \\ \epsilon _{(N-1)} &{} \epsilon _{(N-2)} &{} \ldots &{} \epsilon _{(N-n_\textrm{q})} \end{pmatrix}\\&\times \begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_{n_\textrm{q}} \end{pmatrix} + \begin{pmatrix} \eta _{(1)} \\ \eta _{(2)} \\ \vdots \\ \eta _{(n_\textrm{q}+1)} \\ \vdots \\ \eta _{(N)} \end{pmatrix}, \end{aligned} \end{aligned}$$
(31)

or using the shorthand notation

$$\begin{aligned} \varvec{\epsilon }=H_\mathrm{{AR}}\varvec{q}+\varvec{\eta }, \end{aligned}$$
(32)

where \(\varvec{q}\) is the parameter vector (30), \(H_\mathrm{{AR}} ~ \left[ N \! \times \! n_\textrm{q} \right] \) is the regression matrix and \(\varvec{\epsilon } ~ \left[ N \! \times \! 1 \right] \), \(\varvec{\eta } ~ \left[ N \! \times \! 1 \right] \) are vectors.

The parameter vector \(\varvec{q}\) can be estimated as \(\hat{\varvec{q}}\) in a straightforward way using the least squares method with the optimal parameter estimate determined analytically as [40]

$$\begin{aligned} \hat{\varvec{q}}=\left( H_\mathrm{{AR}}^{\textrm{T}} H_\mathrm{{AR}} \right) ^{-1}H_\mathrm{{AR}}^{\textrm{T}} \varvec{\epsilon }. \end{aligned}$$
(33)
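
The estimate (33) reduces to a few lines of linear algebra. The following sketch drops the first \(n_\textrm{q}\) zero-padded rows of (31) (the covariance form of the least squares problem), which is a simplification of our own and asymptotically equivalent:

```python
import numpy as np

def fit_ar(eps, nq):
    """Least squares estimate (33) of the AR parameters (30).
    Column i of H holds the OESO delayed by i+1 samples, as in (31)."""
    N = len(eps)
    H = np.column_stack([eps[nq - 1 - i : N - 1 - i] for i in range(nq)])
    # model (29): eps = -H q + eta, hence regress eps[nq:] on -H
    q_hat, *_ = np.linalg.lstsq(-H, eps[nq:], rcond=None)
    return q_hat
```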

4.2 Predictive form

For model (27), the explicit prediction formula can be derived from the difference equation (29). The future values of the white noise input are obviously unknown, so, assuming that its statistically unbiased prediction is zero, i.e. \(E\left\{ \eta _{(k+i)}\right\} \!=\!0\), the predictive form for the prediction horizon \(n_\textrm{p}\) is

$$\begin{aligned} \hat{\varvec{\epsilon }}_f= -{M^\epsilon _f(\varvec{q})}^{-1} M^\epsilon _\textrm{p}(\varvec{q}) \varvec{\epsilon }_\textrm{p}, \end{aligned}$$
(34)

where vectors \(\varvec{\epsilon }_\textrm{p} ~ \left[ n_\textrm{q} \! \times \! 1 \right] \) and \(\hat{\varvec{\epsilon }}_f ~ \left[ n_\textrm{p} \! \times \! 1 \right] \) are defined as

$$\begin{aligned} \varvec{\epsilon }_\textrm{p}=\begin{bmatrix} \epsilon _{(k)}&\epsilon _{(k-1)}&\epsilon _{(k-2)}&\cdots&\epsilon _{(k-n_\textrm{q}+1)} \end{bmatrix}^{\textrm{T}}, \end{aligned}$$
(35)
$$\begin{aligned} \hat{\varvec{\epsilon }}_f=\begin{bmatrix} {\hat{\epsilon }}_{(k+1)}&{\hat{\epsilon }}_{(k+2)}&{\hat{\epsilon }}_{(k+3)}&\cdots&{\hat{\epsilon }}_{(k+n_\textrm{p})} \end{bmatrix}^{\textrm{T}}, \end{aligned}$$
(36)

and the matrices \(M^\epsilon _f ~ \left[ n_\textrm{p} \! \times \! n_\textrm{p} \right] \), \(M^\epsilon _\textrm{p} ~ \left[ n_\textrm{p} \! \times \! n_\textrm{q} \right] \) comprise the elements of vector \(\varvec{q}\) (30) as

$$\begin{aligned}&M^\epsilon _f(\varvec{q})= \begin{pmatrix} 1 &{} 0 &{} \cdots &{} 0 &{} \cdots &{} 0 \\ q_{1} &{} 1 &{} \cdots &{} 0 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots &{} &{} \vdots \\ q_{n_\textrm{q}} &{} q_{n_\textrm{q}-1} &{} \cdots &{} 1 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots &{} &{} \vdots \\ 0 &{} 0 &{} \cdots &{} q_{n_\textrm{q}} &{} \cdots &{} 1 \\ \end{pmatrix}, \end{aligned}$$
(37)
$$\begin{aligned}&M^\epsilon _\textrm{p}(\varvec{q})= \begin{pmatrix} q_{1} &{} q_{2} &{} \cdots &{} q_{n_\textrm{q}-1} &{} q_{n_\textrm{q}} \\ q_{2} &{} q_{3} &{} \cdots &{} q_{n_\textrm{q}} &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots &{} \vdots \\ q_{n_\textrm{q}} &{} 0 &{} \cdots &{} 0 &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 0 &{} 0 \\ \end{pmatrix}. \end{aligned}$$
(38)

Since the noise input in (27) is unmeasurable, \(\eta _{(k)}\) can be estimated by rearranging Eq. (29) as

$$\begin{aligned} {\hat{\eta }}_{(k)} = \epsilon _{(k)} + \textstyle \sum \limits _{i=1}^{n_\textrm{q}}{q_i \epsilon _{(k-i)} }. \end{aligned}$$
(39)
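
Rather than assembling the matrices (37) and (38) explicitly, the prediction (34) can equivalently be computed by iterating the difference equation (29) with the future noise replaced by its zero mean, as in this sketch (helper name ours; eps_recent must hold at least \(n_\textrm{q}\) past samples):

```python
import numpy as np

def predict_ar(eps_recent, q, n_p):
    """n_p-step prediction of the AR model (27); equivalent to the
    matrix form (34)-(38)."""
    buf = list(eps_recent)      # buf[-1] = eps(k), buf[-2] = eps(k-1), ...
    preds = []
    for _ in range(n_p):
        e_next = -sum(q[i] * buf[-1 - i] for i in range(len(q)))  # Eq. (29)
        preds.append(e_next)
        buf.append(e_next)      # predicted values feed back into the recursion
    return np.array(preds)
```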

5 Moving average model

In this section, the dynamics of the OESO will be approximated by the moving average model

$$\begin{aligned} \epsilon _{(k)}=g(z)\eta _{(k)}, \end{aligned}$$
(40)

where \(\eta \!\sim \!{\mathcal {N}}\left( 0,\sigma ^2_\eta \right) \) is the zero-mean white noise input.

The polynomial g(z) of the \(n_\textrm{g}\)th order model (40) is

$$\begin{aligned} g(z)= 1 + g_1 z^{-1} + g_2 z^{-2} + \cdots + g_{n_\textrm{g}} z^{-n_\textrm{g}}. \end{aligned}$$
(41)

The difference equation for model (40) can be written as

$$\begin{aligned} \epsilon _{(k)}= \eta _{(k)} + \textstyle \sum \limits _{i=1}^{n_\textrm{g}}{g_i \eta _{(k-i)} }. \end{aligned}$$
(42)

The parameter vector \( \varvec{g} ~ \left[ n_\textrm{g} \! \times \! 1 \right] \) comprising the coefficients \(g_i\) is

$$\begin{aligned} \varvec{g}=\begin{bmatrix} g_1&g_2&\cdots&g_{n_\textrm{g}} \end{bmatrix}^{\textrm{T}}. \end{aligned}$$
(43)

5.1 Identification strategy

It is well known that estimating the moving average processes is more difficult than estimating the autoregressive processes [41]. Since the input noise \(\eta \) is unmeasurable in practice, the straightforward approach based on the least squares minimization of the model single step-ahead prediction error cannot be directly applied in this case.

Therefore, to estimate the coefficient vector (43) using the available OESO sequence, the two-step method of Durbin [41, 42] will be adopted. The first step of this method consists of fitting an autoregressive model to the OESO sequence via the ordinary least squares method in terms of Sect. 4, and then estimating the input noise \(\eta \) by filtering the OESO sequence through the inverse of the identified autoregressive model according to (39).

The second step uses this estimated input noise sequence \({\hat{\eta }}\) to create the regression system and to estimate the parameters of the moving average process in the least squares sense.

The corresponding regression system then takes the form

$$\begin{aligned} \begin{aligned} \begin{pmatrix} \epsilon _{(1)} \\ \epsilon _{(2)} \\ \vdots \\ \epsilon _{(n_\textrm{g}+1)} \\ \vdots \\ \epsilon _{(N)} \end{pmatrix} =&\begin{pmatrix} {\hat{\eta }}_{(0)} &{} 0 &{} \ldots &{} 0 \\ {\hat{\eta }}_{(1)} &{} {\hat{\eta }}_{(0)} &{} \ldots &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots \\ {\hat{\eta }}_{(n_\textrm{g})} &{} {\hat{\eta }}_{(n_\textrm{g}-1)} &{} \ldots &{} {\hat{\eta }}_{(0)} \\ \vdots &{} \vdots &{} &{} \vdots \\ {\hat{\eta }}_{(N-1)} &{} {\hat{\eta }}_{(N-2)} &{} \ldots &{} {\hat{\eta }}_{(N-n_\textrm{g})} \end{pmatrix}\\&\times \begin{pmatrix} g_1 \\ g_2 \\ g_3 \\ \vdots \\ g_{n_\textrm{g}} \end{pmatrix} + \begin{pmatrix} \eta _{(1)} \\ \eta _{(2)} \\ \vdots \\ \eta _{(n_\textrm{g}+1)} \\ \vdots \\ \eta _{(N)} \end{pmatrix}, \end{aligned} \end{aligned}$$
(44)

or using the shorthand notation,

$$\begin{aligned} \varvec{\epsilon }=H_{MA}\varvec{g}+\varvec{\eta }, \end{aligned}$$
(45)

where \(\varvec{g}\) is the parameter vector (43), \(H_{MA} ~ \left[ N \! \times \! n_\textrm{g} \right] \) is the regression matrix and \(\varvec{\epsilon } ~ \left[ N \! \times \! 1 \right] \), \(\varvec{\eta } ~ \left[ N \! \times \! 1 \right] \) are vectors.

The optimal estimate of the parameter vector \(\varvec{g}\) is

$$\begin{aligned} \hat{\varvec{g}}=\left( H_{MA}^{\textrm{T}} H_{MA} \right) ^{-1}H_{MA}^{\textrm{T}} \varvec{\epsilon }. \end{aligned}$$
(46)
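
A compact sketch of the two-step procedure follows, reusing the fit_ar helper from the sketch in Sect. 4.1; the long autoregressive order nq_long = 20 is an illustrative choice of ours, not a value prescribed by [41, 42]:

```python
import numpy as np

def fit_ma_durbin(eps, ng, nq_long=20):
    """Durbin's two-step MA estimate: (i) fit a long AR model and recover
    the noise via inverse filtering (39); (ii) solve the regression (44)-(46)."""
    q = fit_ar(eps, nq_long)                        # step 1 (Sect. 4.1)
    N = len(eps)
    eta = np.array([eps[k] + sum(q[i] * eps[k - 1 - i]
                                 for i in range(min(nq_long, k)))
                    for k in range(N)])             # Eq. (39), zero initial state
    H = np.column_stack([eta[ng - 1 - i : N - 1 - i] for i in range(ng)])
    g_hat, *_ = np.linalg.lstsq(H, eps[ng:], rcond=None)   # Eq. (46)
    return g_hat
```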

5.2 Predictive form

Having the model parameters estimated, the OESO (36) can be predicted. Assuming that the statistically unbiased prediction of the input zero-mean white noise is zero, the predictive form of the moving average model (40) can be derived according to the difference equation (42) as

$$\begin{aligned} \hat{\varvec{\epsilon }}_f= M^\eta _\textrm{p}(\varvec{g}) \hat{\varvec{\eta }}_\textrm{p}, \end{aligned}$$
(47)

where matrix \(M^\eta _\textrm{p} ~ \left[ n_\textrm{p} \! \times \! n_\textrm{g} \right] \) is formed by the elements of vector \(\varvec{g}\) (43) such that

$$\begin{aligned} M^\eta _\textrm{p}(\varvec{g})= \begin{pmatrix} g_{1} &{} g_{2} &{} \cdots &{} g_{n_\textrm{g}-1} &{} g_{n_\textrm{g}} \\ g_{2} &{} g_{3} &{} \cdots &{} g_{n_\textrm{g}} &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots &{} \vdots \\ g_{n_\textrm{g}} &{} 0 &{} \cdots &{} 0 &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 0 &{} 0 \\ \end{pmatrix}, \end{aligned}$$
(48)

and vector \(\hat{\varvec{\eta }}_\textrm{p} ~ \left[ n_\textrm{g} \! \times \! 1 \right] \) comprises the estimated past values of the noise input

$$\begin{aligned} \hat{\varvec{\eta }}_\textrm{p}=\begin{bmatrix} {\hat{\eta }}_{(k)}&{\hat{\eta }}_{(k-1)}&{\hat{\eta }}_{(k-2)}&\cdots&{\hat{\eta }}_{(k-n_\textrm{g}+1)} \end{bmatrix}^{\textrm{T}}. \end{aligned}$$
(49)

In practice, the input noise \(\eta \) cannot be measured, so it has to be estimated by inverse filtering of \(\epsilon \) according to the difference equation (42) as

$$\begin{aligned} {\hat{\eta }}_{(k)}= \epsilon _{(k)} - \textstyle \sum \limits _{i=1}^{n_\textrm{g}}{g_i {\hat{\eta }}_{(k-i)} }. \end{aligned}$$
(50)
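
The prediction (47)–(49) can be sketched directly from the structure of matrix (48); in the sketch below (helper name ours), eta_recent holds the estimated noise (49) ordered from the most recent sample backwards:

```python
import numpy as np

def predict_ma(eta_recent, g, n_p):
    """n_p-step prediction (47): with future noise predicted as zero, only
    the estimated past noise (49) contributes, weighted as in (48).
    eta_recent[0] = eta_hat(k), eta_recent[1] = eta_hat(k-1), ..."""
    ng = len(g)
    preds = np.zeros(n_p)
    for i in range(n_p):            # row i of (48) -> prediction step i+1
        for j in range(ng - i):     # coefficients surviving the time shift
            preds[i] += g[i + j] * eta_recent[j]
    return preds
```

In practice, the sequence \({\hat{\eta }}\) is maintained recursively by (50) as new OESO samples arrive.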

6 Prediction and model predictive control with the OESO compensation

In this section, the model predictive control algorithm is adopted from [31], and an important modification concerning the prediction of the OESO is proposed.

Prediction of the state vector x and the output y can be expressed considering (1) as

$$\begin{aligned}&{\hat{x}}_{(k+i)}=A^i {\hat{x}}_{(k)}+ \textstyle \sum \limits _{j=1}^{i}{A^{i-j}B \begin{bmatrix} u_{(k+j-1)} \\ d_{(k+j-1)} \end{bmatrix} } , \end{aligned}$$
(51)
$$\begin{aligned}&{\hat{y}}_{(k+i)}=C {\hat{x}}_{(k+i)}+{\hat{\epsilon }}_{(k+i)}, \end{aligned}$$
(52)

where \(k \in {\mathbb {N}}\) is the current sample and \(i \in {\mathbb {N}}\), \(i=1, \ldots , n_\textrm{p}\), with \(n_\textrm{p} \in {\mathbb {N}}\) being the prediction horizon.

Notice that in (52), the OESO \({\hat{\epsilon }}\), predicted by the identified autoregressive or moving average model, is taken into account by correcting the output prediction \({\hat{y}}\). This is the most important modification of the traditional prediction and predictive control algorithms, as it allows us to effectively compensate for the suboptimality of the state estimation.

The predictive control minimizes a quadratic cost function of the model-based predictions of chosen system variables over the prediction horizon \(n_\textrm{p}\). The corresponding quadratic form is [43, 44]

$$\begin{aligned} J(\Delta u_f)= \Delta u_f^{\textrm{T}} \varvec{A} \Delta u_f + 2\varvec{b}^{\textrm{T}} \Delta u_f + c, \end{aligned}$$
(53)

where \(\Delta u_f ~ \left[ n_\textrm{c} \! \times \! 1 \right] \) is the vector of future changes of the manipulated variable and \(n_\textrm{c}\) is the control horizon. Matrix \(\varvec{A} ~ \left[ n_\textrm{c} \! \times \! n_\textrm{c} \right] \), vector \(\varvec{b} ~ \left[ n_\textrm{c} \! \times \! 1 \right] \) and scalar c are defined as

$$\begin{aligned}&\varvec{A} = H_f^{\textrm{T}} H_f + I \lambda ^u, \end{aligned}$$
(54a)
$$\begin{aligned}&\varvec{b}^{\textrm{T}} = -\left( y_\textrm{r} - {\hat{y}}_{\textrm{free}} \right) ^{\textrm{T}} H_f, \end{aligned}$$
(54b)
$$\begin{aligned}&c = \left( y_\textrm{r} - {\hat{y}}_{\textrm{free}}\right) ^{\textrm{T}} \left( y_\textrm{r} - {\hat{y}}_{\textrm{free}}\right) , \end{aligned}$$
(54c)

where \(y_\textrm{r} ~ \left[ n_\textrm{p} \! \times \! 1 \right] \) is the reference vector, \({\hat{y}}_{\textrm{free}} ~ \left[ n_\textrm{p} \! \times \! 1 \right] \) is the system free response vector, and scalar \(\lambda ^u\) denotes the weight of the manipulated variable increments penalty.

The free response prediction \({\hat{y}}_{\textrm{free}}\) in (54) is

$$\begin{aligned} \begin{aligned} {\hat{y}}_{\textrm{free}} =&\begin{pmatrix} C A \\ C A^2 \\ \vdots \\ C A^{n_\textrm{p}} \end{pmatrix} {\hat{x}}_{(k)} + {\mathcal {B}}^u \begin{pmatrix} u_{(k - 1)} \\ u_{(k - 1)} \\ \vdots \\ u_{(k - 1)} \end{pmatrix} \\&+ {\mathcal {B}}^d \begin{pmatrix} d_{(k)} \\ d_{(k+1)} \\ \vdots \\ d_{(k + n_\textrm{p} - 1)} \end{pmatrix} + \begin{pmatrix} {\hat{\epsilon }}_{(k + 1)} \\ {\hat{\epsilon }}_{(k + 2)} \\ \vdots \\ {\hat{\epsilon }}_{(k + n_\textrm{p})} \end{pmatrix}. \end{aligned} \end{aligned}$$
(55)

Matrices \({\mathcal {B}}^u ~ \left[ n_\textrm{p} \! \times \! n_\textrm{p} \right] \) and \({\mathcal {B}}^d ~ \left[ n_\textrm{p} \! \times \! n_\textrm{p} \right] \) have the lower triangular form

$$\begin{aligned} {\mathcal {B}}^{\frac{u}{d}} = \begin{pmatrix} C^{\frac{u}{d}} B^{\frac{u}{d}} &{} 0 &{} \ldots &{} 0 \\ C^{\frac{u}{d}} A^{\frac{u}{d}} B^{\frac{u}{d}} &{} C^{\frac{u}{d}} B^{\frac{u}{d}} &{} \ldots &{} 0 \\ \vdots &{} \vdots &{} &{} \vdots \\ C^{\frac{u}{d}} ( A^{\frac{u}{d}} )^{n_\textrm{p} - 1} B^{\frac{u}{d}} &{} C^{\frac{u}{d}} ( A^{\frac{u}{d}} )^{n_\textrm{p} - 2} B^{\frac{u}{d}} &{} \ldots &{} C^{\frac{u}{d}} B^{\frac{u}{d}} \end{pmatrix}, \end{aligned}$$
(56)

where matrices \(A^{\frac{u}{d}}\), \(B^{\frac{u}{d}}\), \(C^{\frac{u}{d}}\) were defined by (3), (5), (7), respectively.

The forced response matrix \(H_f ~ \left[ n_\textrm{p} \! \times \! n_\textrm{c} \right] \) in (54) is

$$\begin{aligned} H_f= {\mathcal {B}}^u \varPsi ~~ \text {where} ~~ \varPsi _{ij} = {\left\{ \begin{array}{ll} 1, &{} \text {for} ~ j \le i,\\ 0, &{} \text {for} ~ j>i.\\ \end{array}\right. } \end{aligned}$$
(57)

The optimization problem (53) can be solved by quadratic programming if linear inequality constraints are considered. For the sake of simplicity, the elements of the reference vector \(y_\textrm{r}\) are equal to an appropriately chosen constant \(G_\textrm{t}\) representing the target glycemia. Pursuing the receding horizon strategy, only the first element of the optimal solution \(\Delta u_f\) is actually applied, so one can write

$$\begin{aligned} u_{(k)}=u_{(k-1)}+\begin{bmatrix} 1&0&\cdots&0 \end{bmatrix} \Delta u_f. \end{aligned}$$
(58)

Note that the manipulated variable has to be constrained: the minimal insulin infusion rate is \(u_{\min }\!=\!0\) U/min, while \(u_{\max }\) is adopted from [27]. The corresponding system of linear inequalities with respect to the decision vector \(\Delta u_f\) can be formed by involving matrix \(\varPsi \) (57) as in [45]

$$\begin{aligned} \begin{bmatrix} -\varPsi \\ +\varPsi \end{bmatrix} \! \Delta u_f \! \le \! \begin{bmatrix} \begin{pmatrix} -1 &{} -1 &{} \ldots &{} -1 \end{pmatrix}^{\textrm{T}} \left( u_{\min }\!-\!u_{(k-1)}\right) \\ \begin{pmatrix} +1 &{} +1 &{} \ldots &{} + 1 \end{pmatrix}^{\textrm{T}} \left( u_{\max }\!-\!u_{(k-1)}\right) \end{bmatrix}. \end{aligned}$$
(59)
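
One iteration of the resulting constrained MPC step might be sketched as follows. For brevity, a general-purpose solver (SciPy's minimize) stands in for the dedicated QP solver that would normally be used, and the function name mpc_move is ours:

```python
import numpy as np
from scipy.optimize import minimize

def mpc_move(H_f, y_free, y_ref, lam_u, u_prev, u_min, u_max):
    """Build the quadratic cost (53)-(54), solve it under the input
    constraints (59), and apply the first move per the receding
    horizon law (58)."""
    n_c = H_f.shape[1]
    A = H_f.T @ H_f + lam_u * np.eye(n_c)          # Eq. (54a)
    b = -(y_ref - y_free) @ H_f                    # Eq. (54b)
    Psi = np.tril(np.ones((n_c, n_c)))             # Eq. (57)
    cost = lambda du: du @ A @ du + 2.0 * b @ du   # constant term c omitted
    cons = ({'type': 'ineq', 'fun': lambda du: Psi @ du - (u_min - u_prev)},
            {'type': 'ineq', 'fun': lambda du: (u_max - u_prev) - Psi @ du})
    du = minimize(cost, np.zeros(n_c), constraints=cons).x
    return u_prev + du[0]                          # Eq. (58)
```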

Compared to [31], the crucial modification of the control algorithm is made here by adding the prediction of the OESO \({\hat{\epsilon }}_{(k+i)}\) to equation (52).

Concerning the safety features of automated insulin therapy, various additional strategies could be considered to enhance the current configuration. To avoid the adverse and dangerous insulin stacking phenomenon, the insulin on board [46,47,48], representing the amount of insulin still active from previously administered doses, can be involved. The dynamics of insulin on board can be represented by simple linear models, as in [49, 50] or [51], and this signal can be used to form a special quadratic penalty added to the cost function (53) of the MPC. Alternatively, hard constraints on the insulin on board signal can be imposed by extending the linear inequalities system (59) of the quadratic program accordingly. Another type of safety feature is to impose hard constraints on the controlled variable by modifying the inequalities system (59) to prevent the risk of hypoglycemia and hyperglycemia, as proposed in [26]. As this strategy can potentially induce control infeasibility, a soft formulation of the controlled variable constraints should be preferred, as proposed in [52, 53]. However, further details are beyond the scope of this paper; for information on safety features, see, e.g. [46, 48, 54]. Also note that these safety features do not directly interact with the proposed strategy of predicting the output error of the suboptimal state estimator, which is the main contribution of the paper.

7 Experimental setup

To validate the proposed strategy and assess its practical effectiveness in application to the problem of prediction and predictive control of glycemia in subjects with type 1 diabetes, a simulation-based experiment was designed and evaluated.

The glycemia response for this experiment was obtained by an in silico approach, simulating the complex physiology-based nonlinear model described in [55, 56] and the references therein. The basal state of this model was determined with respect to the basal glycemia \(G_b\!=\!6\) mmol/l and the corresponding basal insulin administration rate \(v_b\!=\!0.01\) U/min.

The orders of the empirical model (1) were chosen as \(n_u\!=\!n_d\!=\!4\), implying the overall order \(n\!=\!8\). Note that the theory related to the estimation of the model parameters \(a_i^{\frac{u}{d}}\) in (3) and \(c_i^{\frac{u}{d}}\) in (7) is not within the scope of this paper, so we suppose that model (1) was identified with parameters (60). For more details on this topic, we refer the interested reader to our recent works [25, 57].

$$\begin{aligned}&A^{u}\!=\!\begin{bmatrix} 3.2802 &{} -4.0149 &{} 2.1732 &{} -0.4389 \\ 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \end{bmatrix}, \end{aligned}$$
(60a)
$$\begin{aligned}&C^{u}=\begin{bmatrix} -0.0115&-0.0260&-0.0194&-0.0048 \end{bmatrix}, \end{aligned}$$
(60b)
$$\begin{aligned}&A^{d}\!=\!\begin{bmatrix} 3.0195 &{} -3.3977 &{} 1.6901 &{} -0.3138 \\ 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \end{bmatrix}, \end{aligned}$$
(60c)
$$\begin{aligned}&C^{d}=\begin{bmatrix} 0.0320&0.0061&0.0003&0 \end{bmatrix}. \end{aligned}$$
(60d)

The prediction horizon and the control horizon were assumed as \(n_\textrm{p}\!=\!15\), \(n_\textrm{c}\!=\!10\), while the sample time was chosen as \(T_\textrm{s}\!=\!10\) min.

The variance of the measurement noise (11) and the variances of the process noise (10) were empirically adjusted as

$$\begin{aligned} {\mathcal {R}}\!=\!0.01,~~ \sigma ^2_\gamma \!=\! 0.01,~~ \sigma ^2_\delta \!=\! 0.2. \end{aligned}$$
(61)

Note that the variances of the process noise were just empirically tuned to obtain acceptable performance of the Kalman filter; their actual values cannot be determined, since they are not based on any particular physiological mechanism or characteristic of a diabetic subject. The state estimator will therefore perform only suboptimally in this case, and hence the OESO sequence will be correlated.

The observer gain vector K was calculated according to the Kalman filter design [1, 36] while considering the model parameters (60) and the noise model parameters (61) as

$$\begin{aligned} \begin{aligned} K=&\big [\begin{matrix} -0.731&-0.685&-0.634&-0.576 \end{matrix} \\&\qquad \begin{matrix}0.998&0.899&0.778&0.641 \end{matrix}\big ]^{\textrm{T}}. \end{aligned} \end{aligned}$$
(62)
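
For reference, a steady-state observer gain of the form (62) can be computed from the filtering algebraic Riccati equation, e.g. via SciPy. This is a generic sketch of the Kalman design [1, 36], not necessarily the exact routine used in the experiment:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def kalman_gain(A, C, Q, R):
    """Steady-state gain of the predictive observer (12) obtained from
    the filtering DARE (the dual of the control-form Riccati equation)."""
    C2 = np.atleast_2d(C)                    # output row vector [1 x n]
    R2 = np.atleast_2d(R)
    P = solve_discrete_are(A.T, C2.T, Q, R2)
    K = A @ P @ C2.T @ np.linalg.inv(C2 @ P @ C2.T + R2)
    return K.ravel()                         # observer gain vector [n x 1]
```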

Concerning the initial tuning of the proposed empirical models of the OESO, the order of autoregressive model (27) was set as \(n_\textrm{q}=4\), whereas the order of moving average model (40) was chosen as \(n_\textrm{g}=12\).

The experiments were designed to mimic the insulin treatment of a subject with type 1 diabetes during a two-day period. The first investigated problem concerns the prediction of glycemia during standard insulin bolus therapy that was carried out according to the bolus calculator rule (see [58] and the references therein). The second deals with automated insulin dosing managed by the model predictive control algorithm of the artificial pancreas. Both algorithms were modified in terms of Sect. 6.

8 Discussion

In this section, the results of the outlined simulation experiment will be comprehensively analyzed and discussed.

The sequence of the OESO obtained in terms of Eq. (14) during regular insulin treatment, while simultaneously performing the state estimation according to (12) with the observer gain (62), is plotted in Fig. 1. This figure suggests that although the state estimate asymptotically converges to the actual state, the character of the OESO sequence is far from ideal uncorrelated noise.

To prove this, the autocorrelation function \({\hat{R}}_{\epsilon \epsilon }\) of the OESO was estimated according to (26) by processing the sequence from Fig. 1; it is plotted in Fig. 2. Analyzing this autocorrelation function, one can conclude that the OESO sequence is correlated, which confirms that the state estimator works suboptimally due to the empirically adjusted variances of the process noise (61).

Fig. 1 Evolution of the OESO \(\epsilon _{(k)}\) acquired during the experiment

Fig. 2 Estimated autocorrelation function \({\hat{R}}_{\epsilon \epsilon }(n T_\textrm{s})\) of the OESO sequence from Fig. 1

Estimating both reduced models of the OESO by the strategies presented in Sects. 4 and 5 yielded the following coefficients.

Parameter vector \(\varvec{q}\) of the autoregressive model (27):

$$\begin{aligned} \hat{\varvec{q}}= \left[ \begin{array}{ccccc} 1&-0.733&-0.117&0.094&-0.065 \end{array} \right] ^{\textrm{T}}. \end{aligned}$$
(63)

Parameter vector \(\varvec{g}\) of the moving average model (40):

$$\begin{aligned} \begin{aligned} \hat{\varvec{g}} =&\big [ \begin{matrix} 0.98&0.72&0.65&0.46&0.43&0.36&0.24 \end{matrix}\\&\qquad \begin{matrix}0.31&0.32&0.29&0.15&0.11&0.08 \end{matrix} \big ]^{\textrm{T}}. \end{aligned} \end{aligned}$$
(64)

To validate the models, the input noise sequence was estimated by filtering the correlated OESO sequence \(\epsilon \) through the inverse of each identified empirical model, i.e. by q(z) and \(\frac{1}{g(z)}\), respectively: according to (39) for the autoregressive model and according to (50) for the moving average model. The autocorrelation functions \({\hat{R}}_{{\hat{\eta }}{\hat{\eta }}}(n T_\textrm{s})\) of these sequences for both model structures are depicted in Figs. 3 and 4; their Dirac delta function-like character proves the estimated models valid.

Fig. 3 Estimate of the autocorrelation function \({\hat{R}}_{{\hat{\eta }}{\hat{\eta }}}(n T_\textrm{s})\) of the estimated noise input for the autoregressive model

Fig. 4 Estimate of the autocorrelation function \({\hat{R}}_{{\hat{\eta }}{\hat{\eta }}}(n T_\textrm{s})\) of the estimated noise input for the moving average model

Next, the OESO was predicted using the predictive form (34) of the autoregressive model and the predictive form (47) of the moving average model, respectively. Relatively accurate predictions for randomly chosen starting points can be observed in Fig. 5. Both model structures showed almost identical performance and could predict the future evolution of the correlated OESO with satisfying accuracy, considering the highly stochastic nature of this signal.

Fig. 5 Prediction of the OESO using both the autoregressive model (\(\hat{\varvec{\epsilon }}_f^\mathrm{{AR}}\)) and the moving average model (\(\hat{\varvec{\epsilon }}_f^\mathrm{{MA}}\))

The next comparison concerns the practical impact of correcting the glycemia prediction by the predicted OESO, which is the original contribution of the paper. In Fig. 6, one can see the uncorrected prediction of glycemia (\({\hat{G}}\)) representing the conventional strategy, as well as the predictions corrected by the OESO predicted using the autoregressive (\({\hat{G}}^\mathrm{{AR}}\)) and the moving average (\({\hat{G}}^\mathrm{{MA}}\)) model. By a basic visual assessment, one can observe an improvement in the prediction accuracy, primarily at the peaks of the response. Keep in mind that such differences between the uncorrected and the corrected prediction can be critical in situations such as decision making with regard to the application of insulin therapy.

In addition to the graphical assessment, the prediction performance is quantified by the quadratic metric

$$\begin{aligned} Q_\textrm{p}=\dfrac{1}{N}\textstyle \sum \limits _{i=0}^{N}{ \left[ y_{(i)}-{\hat{y}}_{(i)} \right] ^2}, \end{aligned}$$
(65)

which provides a more objective assessment of the prediction performance.

The last part of the experiment is focused on the model predictive control of glycemia in the context of the artificial pancreas implementation, where a positive effect of the proposed predictors on control performance is anticipated. To demonstrate this, Fig. 7 shows the closed-loop glycemia response, where involving the predictions of the OESO visibly improved the control performance in terms of tighter control with respect to the reference value and reduced maximal and minimal observed glycemia, which is especially significant for reducing the risk of hyperglycemia and hypoglycemia. It can be concluded that both predictors performed almost identically, but considerably better than the original case without compensating for the OESO.

Fig. 6 Prediction of glycemia without the OESO compensation G(t) compared to using the autoregressive model \(G^\mathrm{{AR}}(t)\) and the moving average model \(G^\mathrm{{MA}}(t)\)

Fig. 7 Predictive control of glycemia without the OESO compensation G(t) compared to using the autoregressive model \(G^\mathrm{{AR}}(t)\) and the moving average model \(G^\mathrm{{MA}}(t)\)

Keep in mind that since typical values of the OESO are relatively low compared to the magnitude of the controlled variable (see Fig. 1), the proposed strategy naturally yields a limited effect. It can also be claimed that the strength of the desired effect is directly related to the performance level of the state observer and thus to the degree of mismatch between the process noise model and the actual statistical properties of the system. Therefore, for systems with an empirically tuned covariance matrix of the process noise, the strategy proposed in this paper is highly recommended.

The control performance will be quantified by the maximal \(G_{\max }\) and the minimal \(G_{\min }\) observed glycemia, and by the quadratic metric

$$\begin{aligned} Q_\textrm{c}=\dfrac{1}{N}\textstyle \sum \limits _{i=0}^{N}{ \left[ y_{(i)}-G_\textrm{t} \right] ^2}, \end{aligned}$$
(66)

where \(G_\textrm{t}\) is the target glycemia.

The summary of the prediction and control performance metrics obtained during the experiment is documented in Table 1, which also confirms the observations from Figs. 6 and 7.

Table 1 Comparison of the prediction and control performance metrics

To investigate the effect of the tunable parameters of the OESO models, particularly the order \(n_\textrm{q}\) of the autoregressive model (27) and the order \(n_\textrm{g}\) of the moving average model (40), on the resulting performance of the proposed strategy, the experiment was repeated under various configurations, yielding the results summarized in Tables 2 and 3.

It can be concluded that the moving average model of the OESO performs slightly better in both prediction and predictive control than the autoregressive model, while the choice of the corresponding model orders \(n_\textrm{q}\), \(n_\textrm{g}\) also affected the overall performance, yet not consistently for all the metrics considered. However, all studied models performed better than the original uncompensated configuration.

Table 2 Comparison of the prediction and control performance metrics for the autoregressive model
Table 3 Comparison of the prediction and control performance metrics for the moving average model

9 Conclusions

This study stressed that the OESO sequence is correlated in the case of suboptimal state estimation, and that it can be effectively predicted by the autoregressive or the moving average model. These two reduced models provide good predictability and identifiability from experimental data. The predicted OESO was then used to correct the output variable prediction and hence ultimately improve the performance of the model predictive control. It can be concluded that the presented strategy makes it possible to effectively compensate for the suboptimality of the state estimation caused by an inaccurate process noise model in a relatively inexpensive and feasible way.

We also obtained theoretical results demonstrating that the actual dynamics of the OESO is analytically described by a sum of ARMA processes, while in practice this full structure was approximated by the autoregressive and the moving average models.

A promising application of our results would be possible within an implementation of the artificial pancreas in subjects with type 1 diabetes, where maximizing the performance of the model predictive control is of the highest priority. The presented simulation-based experiment demonstrated that the dynamics of the OESO can be effectively predicted by both proposed reduced models, with documented positive effects on the accuracy of the glycemia prediction and on the performance of the predictive control. Keep in mind that although the improvements may not appear significant at first sight, any feasible improvement matters when managing glycemia and can have significant long-term consequences for patient health.

Compared with the conservative structure of the MPC-based artificial pancreas [6, 32,33,34], which utilizes the Kalman filter state estimation with a typically empirically tuned covariance matrix of the process noise and hence normally performs suboptimally, the strategy proposed in this paper additionally involves an easily identifiable prediction model of the OESO to effectively compensate for the adverse effect of suboptimal state estimation by correcting the system free response prediction. Moreover, this structural modification is not computationally demanding and can easily be embedded into existing modular MPC schemes [6, 32,33,34] without involving any hardware adjustments or directly interacting with other features of the artificial pancreas, such as constraints.

In contrast, compared with the alternative solution based on methodological estimation of the covariance matrix of the process noise according to the methods presented in [18,19,20,21,22], which implicitly ensures the optimality of the state estimation, it can be claimed that these methods typically require very large experimental datasets (tens of thousands of samples) to provide reliable and unbiased estimates, whereas the statistical models of the OESO proposed in this paper can be identified from relatively limited experimental data (hundreds of samples).

In a nutshell, the most significant contributions of this work include the derivation of the analytical stochastic ARMA model of the OESO and its reduced approximations, which can be used to improve the performance of a suboptimal state observer in applications where the process noise model is not exactly known or is uncertain. It can be concluded that the actual effect of the proposed strategy depends primarily on the degree of plant-model mismatch of the noise model.