Much of what we have described in the preceding chapters provides the basic tools necessary to build physiological state-space estimators. In this chapter, we will briefly review some additional concepts in state-space estimation, a non-traditional method of estimation, and some supplementary models. These may serve as pointers when building extensions to the models already described.

9.1 State-Space Model with a Time-Varying Process Noise Variance Based on a GARCH(p, q) Framework

Thus far, we have not considered time-varying model parameters. In reality, the human body is not static. Instead, it undergoes changes from time to time (e.g., due to disease conditions or adaptation to new environments). In this section, we will consider a state equation of the form

$$\displaystyle \begin{aligned} x_{k} &= x_{k - 1} + \varepsilon_{k} \end{aligned} $$
(9.1)

where \(\varepsilon _{k} \sim \mathcal {N}(0, \sigma ^{2}_{\varepsilon , k})\). Note that the process noise variance now depends on the time index k. Here we will use concepts from the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) framework to model \(\varepsilon _{k}\). In a general GARCH(p, q) framework, we take

$$\displaystyle \begin{aligned} \varepsilon_{k} &= h_{k}\nu_{k}, \end{aligned} $$
(9.2)

where \(\nu _{k} \sim \mathcal {N}(0, 1)\) and

$$\displaystyle \begin{aligned} h^{2}_{k} &= \alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}, {} \end{aligned} $$
(9.3)

where the \(\alpha _{i}\)’s and \(\beta _{j}\)’s are coefficients to be determined. Now, conditioned on having observed all the sensor readings up to time index \((k - 1)\), we have

$$\displaystyle \begin{aligned} \mathbb{E}[\varepsilon_{k}] &= \mathbb{E}[h_{k}\nu_{k}] = h_{k}\mathbb{E}[\nu_{k}] = h_{k} \times 0 = 0 \end{aligned} $$
(9.4)

and

$$\displaystyle \begin{aligned} \sigma^{2}_{\varepsilon, k} &= V(\varepsilon_{k}) = V(h_{k}\nu_{k}) = h^{2}_{k}V(\nu_{k})= h^{2}_{k} \times 1 = \alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}. {} \end{aligned} $$
(9.5)

As is evident from (9.5), the variance of \(\varepsilon _{k}\) depends on k. If a GARCH(p, q) model is used for the process noise term in the random walk, the predict equations in the state estimation step change to

$$\displaystyle \begin{aligned} x_{k|k - 1} &= x_{k - 1|k - 1} \end{aligned} $$
(9.6)
$$\displaystyle \begin{aligned} \sigma^{2}_{k|k - 1} &= \sigma^{2}_{k - 1|k - 1} + \sigma^{2}_{\varepsilon, k} = \sigma^{2}_{k - 1|k - 1} + \alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}. \end{aligned} $$
(9.7)

The update equations in the state estimation step remain unchanged. Note also that the calculation of \(\sigma ^{2}_{k|k - 1}\) requires the previous process noise terms. In general, these will have to be calculated based on successive differences between the \(x_{k}\) and \(x_{k - 1}\) estimates.
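
For illustration, a minimal MATLAB sketch of this predict step under a GARCH(p, q) process noise model is given below. The variable names, and the use of successive differences of past state estimates to approximate the earlier \(\varepsilon_{k}\) terms, are our own assumptions rather than part of any particular implementation.

```matlab
% Minimal sketch: predict step with a GARCH(p, q) process noise variance.
% Assumed variables (illustrative names):
%   xPrev, vPrev - x_{k-1|k-1} and sigma^2_{k-1|k-1}
%   alpha        - [alpha_0, alpha_1, ..., alpha_q] (row vector)
%   beta         - [beta_1, ..., beta_p] (row vector)
%   epsSqPast    - [eps^2_{k-1}, ..., eps^2_{k-q}], e.g., from successive
%                  differences of past state estimates
%   hSqPast      - [h^2_{k-1}, ..., h^2_{k-p}]
hSq   = alpha(1) + alpha(2:end) * epsSqPast(:) + beta * hSqPast(:);  % (9.3)
xPred = xPrev;                                                       % (9.6)
vPred = vPrev + hSq;                                                 % (9.7)
```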

Moreover, we would have \((p + q + 1)\) additional GARCH terms (the \(\alpha _{i}\)’s and \(\beta _{j}\)’s) to determine at the parameter estimation step. These terms would have to be chosen to maximize the expected log-likelihood

$$\displaystyle \begin{aligned} Q &= \frac{(-1)}{2}\sum_{k = 1}^{K}\mathbb{E}\Bigg[\log(2\pi \sigma^{2}_{\varepsilon, k}) + \frac{(x_{k} - x_{k - 1})^{2}}{\sigma^{2}_{\varepsilon, k}}\Bigg] \end{aligned} $$
(9.8)
$$\displaystyle \begin{aligned} &= \frac{(-1)}{2}\sum_{k = 1}^{K}\mathbb{E}\Bigg\{\log\Bigg[2\pi\Bigg(\alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}\Bigg)\Bigg]\\ &\qquad + \frac{(x_{k} - x_{k - 1})^{2}}{\alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}}\Bigg\}. \end{aligned} $$
(9.9)

The maximization of Q with respect to the GARCH terms is rather complicated. Choosing a GARCH(1, 1) model for \(\varepsilon _{k}\) simplifies the computations somewhat. Additionally, note the recursive form contained within Q. For each value of k, we have terms of the form \(h^{2}_{k - j}\) which contain within them further \(h^{2}\) terms. In general, computing Q is challenging unless further simplifying assumptions are made.

When \(x_{k}\) evolves with time following \(x_{k} = x_{k - 1} + \varepsilon _{k}\), where \(\varepsilon _{k}\) is modeled using a GARCH(p, q) framework, the predict equations in the state estimation step are

$$\displaystyle \begin{aligned} x_{k|k - 1} &= x_{k - 1|k - 1} \end{aligned} $$
(9.10)
$$\displaystyle \begin{aligned} \sigma^{2}_{k|k - 1} &= \sigma^{2}_{k - 1|k - 1} + \alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}. \end{aligned} $$
(9.11)

The parameter estimation step updates for the \((p + q + 1)\) GARCH terms are chosen to maximize

$$\displaystyle \begin{aligned} &\frac{(-1)}{2}\sum_{k = 1}^{K}\mathbb{E}\Bigg\{\log\Bigg[2\pi\Bigg(\alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}\Bigg)\Bigg]\\ &\qquad + \frac{(x_{k} - x_{k - 1})^{2}}{\alpha_{0} + \sum_{i = 1}^{q}\alpha_{i}\varepsilon^{2}_{k - i} + \sum_{j = 1}^{p}\beta_{j}h^{2}_{k - j}}\Bigg\}. \end{aligned} $$
(9.12)
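
One practical way to handle this maximization, under a GARCH(1, 1) assumption, is to evaluate the negative of (9.8) for candidate \((\alpha_{0}, \alpha_{1}, \beta_{1})\) values and hand it to a generic optimizer. The MATLAB sketch below plugs in the smoothed state means \(x_{k|K}\) in place of the full expectations and seeds the \(h^{2}\) recursion with the sample variance of the state differences; both are simplifying assumptions on our part.

```matlab
% Minimal sketch: negative GARCH(1, 1) log-likelihood for the random-walk
% process noise, using smoothed state means xS (x_{k|K}) in place of the
% expectations in (9.8) (a simplifying assumption).
function negQ = garch11NegLogLik(theta, xS)
    a0 = theta(1); a1 = theta(2); b1 = theta(3);
    d    = diff(xS(:));            % eps_k approximated by x_k - x_{k-1}
    K    = numel(d);
    hSq  = var(d) * ones(K, 1);    % seed for the recursion (assumption)
    negQ = 0;
    for k = 2:K
        hSq(k) = a0 + a1 * d(k - 1)^2 + b1 * hSq(k - 1);                 % (9.3)
        negQ   = negQ + 0.5 * (log(2 * pi * hSq(k)) + d(k)^2 / hSq(k));  % -(9.8)
    end
end

% Example use with a generic optimizer:
% thetaHat = fminsearch(@(t) garch11NegLogLik(t, xS), [0.1; 0.1; 0.8]);
```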

9.2 Deriving the Parameter Estimation Step Equations for Terms Related to a Binary Observation

Thus far, we have only considered cases where the probability of binary event occurrence \(p_{k}\) is of the form

$$\displaystyle \begin{aligned} p_{k} &= \frac{1}{1 + e^{-(\beta_{0} + x_{k})}}. \end{aligned} $$
(9.13)

We have also thus far only estimated \(\beta _{0}\) empirically (e.g., based on the average probability of point process event occurrence). Occasionally, however, we will find it helpful to model \(p_{k}\) as

$$\displaystyle \begin{aligned} p_{k} &= \frac{1}{1 + e^{-(\beta_{0} + \beta_{1} x_{k})}} \end{aligned} $$
(9.14)

and determine \(\beta _{0}\) and \(\beta _{1}\) at the parameter estimation step. To do so, we need to consider the probability term to be maximized at this step. Based on (3.27), this probability term is

$$\displaystyle \begin{aligned} \prod_{k = 1}^{K} e^{n_{k}\log\big(\frac{p_{k}}{1 - p_{k}}\big) + \log(1 - p_{k})} &= \prod_{k = 1}^{K} e^{n_{k}(\beta_{0} + \beta_{1}x_{k}) + \log\Big(\frac{1}{1 + e^{\beta_{0} + \beta_{1}x_{k}}}\Big)}. \end{aligned} $$
(9.15)

This yields the expected log-likelihood

$$\displaystyle \begin{aligned} Q = \sum_{k = 1}^{K}\mathbb{E}\Big[n_{k}(\beta_{0} + \beta_{1}x_{k}) - \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k}}\big)\Big]. \end{aligned} $$
(9.16)

As in the case of determining the parameter updates for the terms in a CIF, this expected value is also somewhat complicated. Again, the trick is to perform a Taylor expansion around the mean \(\mathbb {E}[x_{k}] = x_{k|K}\) for each of the individual log terms. After performing this expansion, we end up with terms like \(\mathbb {E}[x_{k} - x_{k|K}]\) and \(\mathbb {E}[(x_{k} - x_{k|K})^{2}]\) which greatly simplify our calculations.

Let us begin by performing a Taylor expansion of the log term around \(x_{k|K}\) [6].

$$\displaystyle \begin{aligned} \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k}}\big) &\approx \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big) + \beta_{1}p_{k|K}(x_{k} - x_{k|K})\\ &+ \frac{\beta_{1}^{2}}{2}p_{k|K}(1 - p_{k|K})(x_{k} - x_{k|K})^{2}. \end{aligned} $$
(9.17)

Note the terms \((x_{k} - x_{k|K})\) and \((x_{k} - x_{k|K})^{2}\) in the expansion. Taking the expected value on both sides,

$$\displaystyle \begin{aligned} \mathbb{E}\Big[\log\big(1 + e^{\beta_{0} + \beta_{1}x_{k}}\big)\Big] \approx &\log\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big) + \beta_{1}p_{k|K}\mathbb{E}\big[x_{k} - x_{k|K}\big] \\&+ \frac{\beta_{1}^{2}}{2}p_{k|K}(1 - p_{k|K})\mathbb{E}\big[(x_{k} - x_{k|K})^{2}\big] \end{aligned} $$
(9.18)
$$\displaystyle \begin{aligned} = &\log\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big) + 0 + \frac{\beta_{1}^{2}}{2}p_{k|K}(1 - p_{k|K})\sigma^{2}_{k|K}. \end{aligned} $$
(9.19)

Therefore,

$$\displaystyle \begin{aligned} Q & \approx\sum_{k = 1}^{K}\Bigg[n_{k}(\beta_{0} + \beta_{1}x_{k|K}) - \log\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big) - \frac{\beta_{1}^{2}}{2}p_{k|K}(1 - p_{k|K})\sigma^{2}_{k|K}\Bigg] . \end{aligned} $$
(9.20)

Now,

$$\displaystyle \begin{aligned} \frac{\partial p_{k|K}}{\partial \beta_{0}} &= \frac{\partial}{\partial \beta_{0}} \Bigg[\frac{1}{1 + e^{-(\beta_{0} + \beta_{1}x_{k|K})}} \Bigg] = \frac{(-1)}{\Big[1 + e^{-(\beta_{0} + \beta_{1}x_{k|K})}\Big]^{2}} \times \Big[-e^{-(\beta_{0} + \beta_{1}x_{k|K})}\Big]\\ &= p_{k|K}(1 - p_{k|K}) . \end{aligned} $$
(9.21)

And similarly,

$$\displaystyle \begin{aligned} \frac{\partial p_{k|K}}{\partial \beta_{1}} &= p_{k|K}(1 - p_{k|K})x_{k|K} . \end{aligned} $$
(9.22)

Taking the partial derivative of Q with respect to \(\beta _{0}\), we have

$$\displaystyle \begin{aligned} \frac{\partial Q}{\partial\beta_{0}} &= \sum_{k = 1}^{K}\Bigg\{n_{k} - \frac{e^{\beta_{0} + \beta_{1}x_{k|K}}}{\big(1 + e^{\beta_{0} + \beta_{1}x_{k|K}}\big)} - \frac{\beta_{1}^{2}\sigma^{2}_{k|K}}{2}\frac{\partial}{\partial \beta_{0}}\Big[p_{k|K}(1 - p_{k|K})\Big]\Bigg\} \end{aligned} $$
(9.23)
$$\displaystyle \begin{aligned} &= \sum_{k = 1}^{K}\Bigg\{n_{k} - p_{k|K} - \frac{\beta_{1}^{2}\sigma^{2}_{k|K}}{2}\frac{\partial}{\partial \beta_{0}}\Big[p_{k|K}(1 - p_{k|K})\Big]\Bigg\} \end{aligned} $$
(9.24)
$$\displaystyle \begin{aligned} &= \sum_{k = 1}^{K}\Bigg[n_{k} - p_{k|K} - \frac{\beta_{1}^{2}\sigma^{2}_{k|K}}{2}(1 - p_{k|K})(1 - 2p_{k|K})p_{k|K}\Bigg] . \end{aligned} $$
(9.25)

And similarly for \(\beta _{1}\), we have

$$\displaystyle \begin{aligned} & \frac{\partial Q}{\partial\beta_{1}}\!=\!\sum_{k = 1}^{K}\Bigg[n_{k}x_{k|K}\!-\!x_{k|K}p_{k|K}\!-\!\frac{\beta_{1}\sigma^{2}_{k|K}}{2}p_{k|K}(1\!-\!p_{k|K}) \big[2 + \beta_{1}x_{k|K}(1 - 2p_{k|K})\big]\Bigg]. \end{aligned} $$
(9.26)

By setting

$$\displaystyle \begin{aligned} \frac{\partial Q}{\partial\beta_{0}} &= 0 \end{aligned} $$
(9.27)
$$\displaystyle \begin{aligned} \frac{\partial Q}{\partial\beta_{1}} &= 0, \end{aligned} $$
(9.28)

we obtain two simultaneous equations with which to solve for \(\beta _{0}\) and \(\beta _{1}\). Note also that the use of \(\beta _{0}\) and \(\beta _{1}\) in \(p_{k}\) causes changes to the filter update equations for \(x_{k|k}\) and \(\sigma ^{2}_{k|k}\).

The parameter estimation step updates for \(\beta _{0}\) and \(\beta _{1}\) when we observe a binary variable \(n_{k}\) are obtained by solving

$$\displaystyle \begin{aligned} &\sum_{k = 1}^{K}\Bigg[n_{k} - p_{k|K} - \frac{\beta_{1}^{2}\sigma^{2}_{k|K}}{2}(1 - p_{k|K})(1 - 2p_{k|K})p_{k|K}\Bigg] = 0 \end{aligned} $$
(9.29)
$$\displaystyle \begin{aligned} &\sum_{k = 1}^{K}\Bigg[n_{k}x_{k|K} - x_{k|K}p_{k|K} - \frac{\beta_{1}\sigma^{2}_{k|K}}{2}p_{k|K}(1 - p_{k|K})\big[2 + \beta_{1}x_{k|K}(1 - 2p_{k|K})\big]\Bigg]\\ &\quad = 0 . \end{aligned} $$
(9.30)
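
These two equations do not have a closed-form solution, so in practice they can be solved numerically. A minimal MATLAB sketch using fsolve (Optimization Toolbox) is shown below; the vector names xS, vS, and n for \(x_{k|K}\), \(\sigma^{2}_{k|K}\), and \(n_{k}\), as well as the starting guess, are our own assumptions.

```matlab
% Minimal sketch: solve (9.29)-(9.30) for beta_0 and beta_1 numerically.
% Assumed column vectors: xS = x_{k|K}, vS = sigma^2_{k|K}, n = n_k.
betaHat = fsolve(@(b) gradEqns(b, xS, vS, n), [0; 1]);

function F = gradEqns(b, xS, vS, n)
    p  = 1 ./ (1 + exp(-(b(1) + b(2) * xS)));                                 % p_{k|K}
    F1 = sum(n - p - (b(2)^2 * vS / 2) .* (1 - p) .* (1 - 2 * p) .* p);       % (9.29)
    F2 = sum(n .* xS - xS .* p ...
             - (b(2) * vS / 2) .* p .* (1 - p) .* (2 + b(2) * xS .* (1 - 2 * p)));  % (9.30)
    F  = [F1; F2];
end
```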

9.3 Extending Estimation to a Vector-Valued State

Thus far, we have also only considered cases where a single state \(x_{k}\) gives rise to the different observations. In a number of applications, we will encounter the need to estimate a vector-valued state \({\mathbf {x}}_{k}\). For instance, we may need to estimate the position of a small animal on a 2D plane from neural spiking observations, or different aspects of emotion from physiological signal features. In each of these cases, \({\mathbf {x}}_{k}\) is multi-dimensional.

Let us first consider the predict equations in the state estimation step. Assume that we have a state \({\mathbf {x}}_{k}\) that varies with time following

$$\displaystyle \begin{aligned} {\mathbf{x}}_{k} &= A{\mathbf{x}}_{k - 1} + B{\mathbf{u}}_{k} + {\mathbf{e}}_{k}, \end{aligned} $$
(9.31)

where A and B are matrices and \({\mathbf {e}}_{k} \sim \mathcal {N}(\mathbf {0}, \varSigma )\) is the process noise. The basic statistical results related to mean and variance in (2.1)–(2.6) simply generalize to the vector case. Thus, the predict equations in the state estimation step become

$$\displaystyle \begin{aligned} {\mathbf{x}}_{k|k - 1} &= A{\mathbf{x}}_{k - 1|k - 1} + B{\mathbf{u}}_{k} \end{aligned} $$
(9.32)
$$\displaystyle \begin{aligned} \varSigma_{k|k - 1} &= A\varSigma_{k - 1|k - 1}A^{\intercal} + \varSigma, \end{aligned} $$
(9.33)

where the covariance (uncertainty) \(\varSigma \) of \({\mathbf {x}}_{k}\) is now a matrix.
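
A minimal MATLAB sketch of this vector predict step, assuming the model matrices and the previous estimates are already available in the workspace:

```matlab
% Minimal sketch: vector-valued predict step.
% Assumed variables: A, B, Sigma (process noise covariance), u = u_k,
% xPrev = x_{k-1|k-1}, Pprev = Sigma_{k-1|k-1}.
xPred = A * xPrev + B * u;          % (9.32)
Ppred = A * Pprev * A' + Sigma;     % (9.33)
```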

Recall also how we derived the update equations in the state estimation step. We calculated the terms that appeared in the posterior \(p(x_{k}|y_{1:k})\) and made a Gaussian approximation to it in order to derive the mean and variance updates \(x_{k|k}\) and \(\sigma ^{2}_{k|k}\). In all of the scalar cases, the log posterior density had the form

$$\displaystyle \begin{aligned} q_{s} &= f(x_{k}) - \frac{(x_{k} - x_{k|k - 1})^{2}}{2\sigma^{2}_{k|k - 1}} + \text{constant}, \end{aligned} $$
(9.34)

where \(f(x_{k})\) was some function of \(x_{k}\). This function could take on different forms depending on whether binary, continuous, or spiking-type observations (or different combinations of them) were present. In each of the cases, the mean and variance were derived based on the first and second derivatives of \(q_{s}\).

There are two different ways of calculating the update step equations in the vector case; a brief numerical sketch of the second approach is given after the list.

  • The first is the traditional approach outlined in [10]. Here, the result that holds for the 1D case is simply extended to the vector case. Regardless of the types of observations (features) that are present in the state-space model, the log posterior is of the form

    $$\displaystyle \begin{aligned} q_{v} &= f({\mathbf{x}}_{k}) - \frac{1}{2}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1})^{\intercal}\varSigma^{-1}_{k|k - 1}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1}) + \text{constant}. \end{aligned} $$
    (9.35)

    The manner in which the updates \({\mathbf {x}}_{k|k}\) and \(\varSigma _{k|k}\) are calculated, however, is quite similar. We simply take the first vector derivative of \(q_{v}\) and solve for where it is \(\mathbf {0}\) to obtain \({\mathbf {x}}_{k|k}\). We then form the Hessian of \(q_{v}\) (the matrix of all the second derivatives) and take its negative inverse to obtain \(\varSigma _{k|k}\).

  • The second approach is slightly different [115]. Note that, based on making a Gaussian approximation to the log posterior, we can write

    $$\displaystyle \begin{aligned} - \frac{1}{2}({\mathbf{x}}_{k}\!-\!{\mathbf{x}}_{k|k})^{\intercal}\varSigma^{-1}_{k|k}({\mathbf{x}}_{k}\!-\!{\mathbf{x}}_{k|k}) &= f({\mathbf{x}}_{k})\!- \!\frac{1}{2}({\mathbf{x}}_{k}\!-\!{\mathbf{x}}_{k|k\!-\!1})^{\intercal}\varSigma^{-1}_{k|k\!-\!1}({\mathbf{x}}_{k}\!-\!{\mathbf{x}}_{k|k\!-\!1})\\ &\qquad + \text{constant}. \end{aligned} $$
    (9.36)

    Let us take the first vector derivative with respect to \({\mathbf {x}}_{k}\) on both sides. This yields

    $$\displaystyle \begin{aligned} - \varSigma^{-1}_{k|k}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k}) &= \frac{\partial f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}} - \varSigma^{-1}_{k|k - 1}({\mathbf{x}}_{k} - {\mathbf{x}}_{k|k - 1}). {} \end{aligned} $$
    (9.37)

    Let us now evaluate this expression at \({\mathbf {x}}_{k} = {\mathbf {x}}_{k|k - 1}\). Do you see that if we substitute \({\mathbf {x}}_{k} = {\mathbf {x}}_{k|k - 1}\) in the above expression, the second term on the right simply goes away? Therefore, we end up with

    $$\displaystyle \begin{aligned} - \varSigma^{-1}_{k|k}({\mathbf{x}}_{k|k - 1} - {\mathbf{x}}_{k|k}) &= \frac{\partial f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}}\Bigg\rvert_{{\mathbf{x}}_{k|k - 1}} \end{aligned} $$
    (9.38)
    $$\displaystyle \begin{aligned} \implies {\mathbf{x}}_{k|k} &= {\mathbf{x}}_{k| k - 1} + \varSigma_{k|k}\frac{\partial f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}}\Bigg\rvert_{{\mathbf{x}}_{k|k - 1}}. \end{aligned} $$
    (9.39)

    This yields the mean state update for \({\mathbf {x}}_{k|k}\). How do we derive the covariance matrix \(\varSigma _{k|k}\)? We simply take the vector derivative of (9.37) again. Note that in this case, \(\frac {\partial ^{2}}{\partial {\mathbf {x}}_{k}^{2}}\) is a matrix of all the second derivative terms. Thus, we obtain

    $$\displaystyle \begin{aligned} \varSigma^{-1}_{k|k} &= -\frac{\partial^{2}f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}^{2}} + \varSigma^{-1}_{k|k - 1} \end{aligned} $$
    (9.40)
    $$\displaystyle \begin{aligned} \implies \varSigma_{k|k} &= \Bigg[-\frac{\partial^{2}f({\mathbf{x}}_{k})}{\partial {\mathbf{x}}_{k}^{2}} + \varSigma^{-1}_{k|k - 1} \Bigg]^{-1}. \end{aligned} $$
    (9.41)
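
The second approach translates into a compact two-line update once the gradient and Hessian of \(f({\mathbf {x}}_{k})\) are available. The MATLAB sketch below assumes function handles gradF and hessF that return these derivatives at a given point (their form depends on the particular observation model), and it evaluates both at \({\mathbf {x}}_{k|k - 1}\); these are our own illustrative assumptions.

```matlab
% Minimal sketch: vector-valued update step via the second approach.
% Assumed inputs: xPred = x_{k|k-1}, Ppred = Sigma_{k|k-1}, and function
% handles gradF(x), hessF(x) for the gradient and Hessian of f(x_k).
Pupd = inv(-hessF(xPred) + inv(Ppred));   % (9.41), derivatives evaluated at x_{k|k-1}
xUpd = xPred + Pupd * gradF(xPred);       % (9.39)
```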

9.4 The Use of Machine Learning Methods for State Estimation

Machine learning approaches can also be used for state estimation (e.g., [116, 117]). In these methods, neural networks or other techniques are utilized to learn a particular state-space model and infer the unobserved state(s) from a dataset. In this section, we will briefly describe how the neural network approach in [116] is used for estimation. In [116], Krishnan et al. considered the general Gaussian state-space model

$$\displaystyle \begin{aligned} x_{k} &\sim \mathcal{N}(f_{\mu_{x}}(x_{k - 1}), f_{\sigma^{2}_{x}}(x_{k - 1})) \end{aligned} $$
(9.42)
$$\displaystyle \begin{aligned} y_{k} &\sim \varPi (f_{y}(x_{k})), \end{aligned} $$
(9.43)

where \(y_{k}\) represents the observations. Both the state equation and the output equation are learned using two separate neural networks (for simplicity, we group both of them together under the title “state-space neural network”—SSNN). A separate recurrent neural network (RNN) is used to estimate \(x_{k}\). Taking \(\psi \) and \(\phi \) to denote the parameters of the state-space model and the RNN, respectively, the networks are trained by maximizing

$$\displaystyle \begin{aligned} \mathbb{E}_{q_{\phi}(x_{1:K}|y_{1:K})}\big[\log p_{\psi}(y_{1:K}|x_{1:K})\big] - \text{KL}\big[q_{\phi}(x_{1:K}|y_{1:K})\,\|\,p_{\psi}(x_{1:K})\big], \end{aligned} $$
(9.44)

where \(p_{\psi }(\cdot )\) and \(q_{\phi }(\cdot )\) denote density functions [116]. In practice, the training is performed within the algorithm by minimizing the negative of this quantity, which we label \(Q_{ML}\). Analogous to the state-space EM algorithms we have seen so far, in this neural network approach, the SSNN replaces the explicit state-space model, the RNN replaces the Bayesian filter, and the weights of the neural networks replace the model parameters. The objective, however, is still to estimate \(x_{k}\) from observations such as \(n_{k}\), \(r_{k}\), and \(s_{k}\). Since neural networks are used to learn the state-space model, more complicated state transitions and input-output relationships are permitted. One of the drawbacks, however, is that a certain degree of interpretability is lost.

Similarities also exist between the terms in \(Q_{ML}\) and the log-likelihood terms we have seen thus far. For instance, when a binary variable \(n_{k}\) is present among the observations \(y_{k}\), \(Q_{ML}\) contains the summation

$$\displaystyle \begin{aligned} -\sum \Big[n_{k}\log\Big(\frac{1}{1 + e^{-f_{n}(x_{k})}}\Big) + (1 - n_{k})\log\Big(1 - \frac{1}{1 + e^{-f_{n}(x_{k})}}\Big)\Big]. \end{aligned} $$
(9.45)

Take a moment to look back at how (3.15) and (3.26) fit in with this summation. In this case, however, \(f_{n}(\cdot )\) is learned by the SSNN (in our other approaches, we explicitly modeled the relationship between \(x_{k}\) and \(p_{k}\) using a sigmoid). Similarly, if a continuous-valued variable \(s_{k}\) is present in \(y_{k}\), there is the summation

$$\displaystyle \begin{aligned} \sum \Bigg\{\frac{1}{2}\log\big[2\pi f_{\sigma^{2}_{s}}(x_{k})\big] + \frac{\big[s_{k} - f_{\mu_{s}}(x_{k})\big]^{2}}{2f_{\sigma^{2}_{s}}(x_{k})}\Bigg\}, {} \end{aligned} $$
(9.46)

where \(f_{\mu _{s}}(\cdot )\) and \(f_{\sigma ^{2}_{s}}(\cdot )\) represent mean and variance functions learned by the SSNN. Again, recall that we had a very similar term at the parameter estimation step for a continuous variable \(s_{k}\).
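
For concreteness, the two summations above could be evaluated as follows once the SSNN outputs are available at each time index; the vector names fn, fmu, and fvar for \(f_{n}(x_{k})\), \(f_{\mu_{s}}(x_{k})\), and \(f_{\sigma^{2}_{s}}(x_{k})\) are illustrative assumptions, and this sketch covers only the loss terms, not the network training itself.

```matlab
% Minimal sketch: the Q_ML terms for one binary and one continuous channel.
% Assumed vectors: n (binary observations), s (continuous observations),
% fn = f_n(x_k), fmu = f_mu_s(x_k), fvar = f_sigma2_s(x_k).
p     = 1 ./ (1 + exp(-fn));                                          % sigmoid of f_n(x_k)
Qbin  = -sum(n .* log(p) + (1 - n) .* log(1 - p));                    % (9.45)
Qcont = sum(0.5 * log(2 * pi * fvar) + (s - fmu).^2 ./ (2 * fvar));   % (9.46)
```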

One of the primary advantages of the neural network approach in [116] is that we no longer need to re-derive all the EM algorithm equations whenever new observations are added; this re-derivation is a notable drawback of the traditional EM approach. Moreover, we can also modify the objective function to

$$\displaystyle \begin{aligned} (1 - \rho)Q_{ML} + \rho\sum (x_{k} - l_{k})^{2}, \end{aligned} $$
(9.47)

where \(l_{k}\) is an external influence and \(0 \leq \rho \leq 1\). This provides the option to perform state estimation while permitting an external influence (e.g., domain knowledge or subject-provided labels) to affect \(x_{k}\).
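
Blending the two objectives in (9.47) is then a one-line computation, assuming \(Q_{ML}\) has already been evaluated (e.g., from terms such as those sketched above) and that x and l hold the state estimates and the external labels:

```matlab
% Minimal sketch: objective with an external influence, as in (9.47).
% Assumed variables: QML (scalar), x = x_k estimates, l = l_k labels, rho in [0, 1].
obj = (1 - rho) * QML + rho * sum((x - l).^2);
```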

9.5 Additional MATLAB Code Examples

In this section we briefly describe the two state-space models in [118] and [30] for which the MATLAB code examples are provided. The equation derivations for these two models require no significant new knowledge. The first of these incorporates one binary observation from skin conductance and one EKG spiking-type observation. The second incorporates one binary observation and two continuous observations. It is almost identical to the model with the same observations described in an earlier chapter but has a circadian rhythm term as \(I_{k}\). The derivation of the state and parameter estimation equations is similar to what we have seen before.

9.5.1 State-Space Model with One Binary and One Spiking-Type Observation

The MATLAB code example for the state-space model with one binary and one spiking-type observation is provided in the “one_bin_one_spk” folder. The model is described in [118] and attempts to estimate sympathetic arousal from binary-valued SCRs and EKG R-peaks (the RR-intervals are modeled using an HDIG-based CIF). The results are shown in Fig. 9.1. The data come from the study described in [119] where subjects had to perform office work-like tasks under different conditions. In the first condition, the subjects were permitted to take as much time as they liked. The other two conditions involved e-mail interruptions and time constraints. Based on the results reported in [118], task uncertainty (i.e., how new the task was) appeared to generate the highest sympathetic arousal responses for the subject considered.

Fig. 9.1

State estimation based on observing one binary and one spiking-type variable. The sub-panels, respectively, depict (a) the skin conductance signal \(z_{k}\) (the green and black dots on top depict the presence or the absence of SCRs, respectively), (b) the RR-interval sequence (orange) and the fit to the HDIG mean (red), (c) the probability of SCR occurrence \(p_{k}\) and its 95% confidence limits, (d) the arousal state \(x_{k}\) and its 95% confidence limits, and (e) the HAI (the regions above 90% and below 10% are shaded in red and green, respectively). Ⓒ 2019 IEEE. Reprinted, with permission, from [118]

9.5.2 State-Space Model with One Binary and Two Continuous Observations with a Circadian Input in the State Equation

Cortisol is known to exhibit circadian variation [120, 121]. Typically, cortisol concentrations in the blood begin to rise in the early morning during the late stages of sleep. Peak values are reached shortly after awakening. Later in the day, cortisol levels tend to drop toward bedtime and usually reach their lowest values in the middle of the night [122, 123]. In [30], a circadian \(I_{k}\) term was assumed to drive \(x_{k}\) so that it evolved with time following

$$\displaystyle \begin{aligned} x_{k} &= \rho x_{k - 1} + I_{k} + \varepsilon_{k}, \end{aligned} $$
(9.48)

where

$$\displaystyle \begin{aligned} I_{k} &= \sum_{i = 1}^{2} \Bigg[a_{i} \sin\Big(\frac{2\pi ik}{1440}\Big) + b_{i}\cos\Big(\frac{2\pi ik}{1440}\Big)\Bigg]. \end{aligned} $$
(9.49)
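
As a quick illustration of (9.48) and (9.49), the circadian input and the resulting one-step state propagation could be computed as below (the time index k is in minutes, so 1440 samples span one day; the variable names are ours):

```matlab
% Minimal sketch: circadian input term and state propagation (9.48)-(9.49).
% Assumed variables: a = [a1, a2], b = [b1, b2], rho, xPrev = x_{k-1}, k (in minutes).
Ik = 0;
for i = 1:2
    Ik = Ik + a(i) * sin(2 * pi * i * k / 1440) + b(i) * cos(2 * pi * i * k / 1440);
end
xNext = rho * xPrev + Ik;   % mean of x_k given x_{k-1}; the noise term eps_k is omitted
```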

The model also considered the upper and lower envelopes of the blood cortisol concentrations as the two continuous variables \(r_{k}\) and \(s_{k}\). The pulsatile secretions formed the binary variable \(n_{k}\). The inclusion of each continuous variable necessitates the determination of three model parameters (two governing the linear fit and the third being the sensor noise variance). In addition, the state-space model in [30] also estimated \(\beta _{0}\) and \(\beta _{1}\) in \(p_{k}\). There are also six more parameters in the state equation: \(\rho \), \(a_{1}\), \(a_{2}\), \(b_{1}\), \(b_{2}\), and \(\sigma ^{2}_{\varepsilon }\). To ease computational complexity, the EM algorithm in [30] treated the four parameters related to the circadian rhythm (\(a_{1}\), \(a_{2}\), \(b_{1}\), and \(b_{2}\)) somewhat differently. Thus, while all the parameters were updated at the parameter estimation step, \(a_{1}\), \(a_{2}\), \(b_{1}\), and \(b_{2}\) were excluded from the convergence criteria. The results are shown in Fig. 9.2. Here, the data were simulated for a hypothetical patient suffering from a type of hypercortisolism (Cushing’s disease) based on the parameters in [124]. Cushing’s disease involves excess cortisol secretion into the bloodstream and may be caused by tumors or prolonged drug use [125]. It is associated with a range of physical and psychological symptoms, including insomnia and fatigue [126,127,128]. The resulting cortisol-related energy state estimates do not have the usual circadian-like patterns seen for a healthy subject. This may partially account for why Cushing’s patients experience daytime bouts of fatigue and nighttime sleeping difficulties.

Fig. 9.2

State estimation based on observing one binary and two continuous variables with a circadian input in the state equation. The sub-panels, respectively, depict (a) the cortisol profile (the green and black dots on top denote the presence or the absence of pulsatile secretions respectively), (b) the first cortisol concentration envelope \(r_{k}\) (green solid) and its estimate (dashed), (c) the second cortisol concentration envelope \(s_{k}\) (mauve solid) and its estimate (dashed), (d) the probability of pulse occurrence \(p_{k}\), and (e) the energy state \(x_{k}\). Ⓒ 2019 IEEE. Reprinted, with permission, from [30]