In this chapter, we will consider the case where \(x_{k}\) evolves with time following a slightly more complicated state equation and gives rise to both a binary observation \(n_{k}\) and a continuous observation \(r_{k}\). Before turning to the derivations, however, we will, as in the previous chapter, first consider a few example scenarios where the need for such a model arises.

In the previous chapter, we considered the estimation of a continuous-valued learning state \(x_{k}\) based on correct/incorrect responses in a sequence of experimental trials. Based on a state-space model consisting of \(x_{k}\) and the binary observations \(n_{k}\), the cognitive learning state of an animal could be estimated over time [4]. Note, however, that it is not just the correct/incorrect responses that contain information regarding the animal’s learning state. How fast the animal responds also reflects changes in learning. For instance, as an animal gradually begins to learn to recognize a specific visual target, not only do the correct answers begin to occur more frequently, but the time taken to respond in each of the trials also starts decreasing (Fig. 4.1). Thus, a state-space model with both a binary observation \(n_{k}\) and a continuous observation \(r_{k}\) was developed in [5] to estimate learning. This was an improvement over the earlier model in [4].

Fig. 4.1

A monkey in a learning experiment with binary-valued correct/incorrect responses where reaction times are recorded. In a behavioral learning experiment, not only do the binary-valued correct/incorrect responses contain information regarding learning, but the time taken to respond in each trial also reflects changes in learning. The state-space model with correct/incorrect responses and reaction times was used in [5] to estimate a cognitive learning state in animal models

This particular state-space model is not limited to cognitive learning; it can be adapted to other applications as well. Human emotion is typically characterized along two axes known as valence and arousal [55]. Valence denotes the pleasant–unpleasant nature of an emotion, while arousal denotes its corresponding activation or excitement. Emotional arousal is closely tied to the activation of the sympathetic nervous system [56, 57]. Changes in arousal can occur regardless of the valence of the emotion (i.e., arousal can be high when the emotion is negative, as in the case of rage, or when it is positive, as in the case of excitement). As we saw in the earlier chapter, skin conductance is a sensitive index of arousal. Changes in emotional valence, on the other hand, often cause changes in facial expressions. Information regarding these facial expressions can be captured via EMG sensors attached to the face. The state-space model with one binary observation \(n_{k}\) and one continuous observation \(r_{k}\) was used in [27] for an emotional valence recognition application based on EMG signals. In [27], Yadav et al. extracted both a binary feature and a continuous feature based on EMG amplitudes and powers from data in an experiment where subjects watched a series of music videos meant to evoke different emotions. Based on the model, they were able to extract a continuous-valued emotional valence state \(x_{k}\) over time. The same model was also used in [28] for detecting epileptic seizures. Here, the authors extracted a binary feature and a continuous feature from scalp EEG signals to detect the occurrence of epileptic seizures. Based on these features, a continuous-valued seizure severity state could be tracked over time. These examples illustrate the possibility of using physiological state-space models for a wide variety of applications.

4.1 Deriving the Predict Equations in the State Estimation Step

Let us now consider the state-space model itself. Assume that \(x_{k}\) varies with time as

$$\displaystyle \begin{aligned}x_{k} &= \rho x_{k - 1} + \varepsilon_{k}, \end{aligned} $$
(4.1)

where \(\rho \) is a constant (forgetting factor) and \(\varepsilon _{k} \sim \mathcal {N}(0, \sigma ^{2}_{\varepsilon })\). As in the previous chapter, we will, for the time being, ignore that this is part of a state-space control system and instead view the equation purely as a relationship between three random variables. As before, we will derive the mean and variance of \(x_{k}\) using basic formulas. We first consider the mean.

$$\displaystyle \begin{aligned} \mathbb{E}[x_{k}] &= \mathbb{E}[\rho x_{k - 1} + \varepsilon_{k}] \end{aligned} $$
(4.2)
$$\displaystyle \begin{aligned} &= \mathbb{E}[\rho x_{k - 1}] + \mathbb{E}[\varepsilon_{k}] \enspace \text{using (2.1)} \end{aligned} $$
(4.3)
$$\displaystyle \begin{aligned} &= \rho\mathbb{E}[x_{k - 1}] + \mathbb{E}[\varepsilon_{k}] \enspace \text{using (2.3)} \end{aligned} $$
(4.4)
$$\displaystyle \begin{aligned} &= \rho\mathbb{E}[x_{k - 1}] \enspace \text{since }\mathbb{E}[\varepsilon_{k}] = 0 \end{aligned} $$
(4.5)
$$\displaystyle \begin{aligned} \therefore \mathbb{E}[x_{k}] &= \rho x_{k - 1|k - 1}. \end{aligned} $$
(4.6)

Next we consider the variance.

$$\displaystyle \begin{aligned} V(x_{k}) &= V(\rho x_{k - 1} + \varepsilon_{k}) \end{aligned} $$
(4.7)
$$\displaystyle \begin{aligned} &= V(\rho x_{k - 1}) + V(\varepsilon_{k}) + 2 Cov(\rho x_{k - 1}, \varepsilon_{k}) \enspace \text{using (2.4)} \end{aligned} $$
(4.8)
$$\displaystyle \begin{aligned} &= V(\rho x_{k - 1}) + V(\varepsilon_{k}) \enspace \text{since }\varepsilon_{k}\text{ is uncorrelated with any of the }x_{k}\text{ terms} \end{aligned} $$
(4.9)
$$\displaystyle \begin{aligned} &= \rho^{2} V(x_{k - 1}) + V(\varepsilon_{k}) \enspace \text{using (2.6)} \end{aligned} $$
(4.10)
$$\displaystyle \begin{aligned} \therefore V(x_{k}) &= \rho^{2} \sigma^{2}_{k - 1|k - 1} + \sigma^{2}_{\varepsilon}. \end{aligned} $$
(4.11)

Now that we know the mean and variance of \(x_{k}\), we can use the fact that it is also Gaussian distributed to state that

$$\displaystyle \begin{aligned} p(x_{k}|n_{1:k - 1}, r_{1:k - 1}) &= \frac{1}{\sqrt{2 \pi \sigma^{2}_{k|k - 1}}}e^{\frac{-(x_{k} - x_{k|k - 1})^{2}}{2 \sigma^{2}_{k|k - 1}}}. \end{aligned} $$
(4.12)

When \(x_{k}\) evolves with time following \(x_{k} = \rho x_{k - 1} + \varepsilon _{k}\), the predict equations in the state estimation step are

$$\displaystyle \begin{aligned} x_{k|k - 1} &= \rho x_{k - 1|k - 1} \end{aligned} $$
(4.13)
$$\displaystyle \begin{aligned} \sigma^{2}_{k|k - 1} &= \rho^{2} \sigma^{2}_{k - 1|k - 1} + \sigma^{2}_{\varepsilon}. \end{aligned} $$
(4.14)
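
As a minimal illustration, the predict step in (4.13) and (4.14) can be written as a small MATLAB function; this is only a sketch, and the function and variable names are illustrative.

    function [x_pred, v_pred] = predict_step(x_prev, v_prev, rho, ve)
    % One-step predict for x_k = rho*x_{k-1} + eps_k with eps_k ~ N(0, ve).
    % Implements (4.13) and (4.14); names are illustrative.
    x_pred = rho * x_prev;          % x_{k|k-1} = rho * x_{k-1|k-1}
    v_pred = rho^2 * v_prev + ve;   % sigma^2_{k|k-1} = rho^2 * sigma^2_{k-1|k-1} + sigma^2_eps
    end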

4.2 Deriving the Update Equations in the State Estimation Step

In the current model, \(x_{k}\) gives rise to a continuous-valued observation \(r_{k}\) in addition to \(n_{k}\). We shall assume that \(x_{k}\) is related to \(r_{k}\) through the linear relationship

$$\displaystyle \begin{aligned} r_{k} &= \gamma_{0} + \gamma_{1}x_{k} + v_{k}, {} \end{aligned} $$
(4.15)

where \(\gamma _{0}\) and \(\gamma _{1}\) are constants and \(v_{k} \sim \mathcal {N}(0, \sigma ^{2}_{v})\) is sensor noise. Our sensor readings \(y_{k}\) now consist of both \(r_{k}\) and \(n_{k}\). What would be the best estimate of \(x_{k}\) once we have observed \(y_{k}\)? Just as in the previous chapter, we will make use of the result in (2.16) to derive this estimate. First, however, we need to note that

$$\displaystyle \begin{aligned} p(r_{k}|x_{k}) &= \frac{1}{\sqrt{2\pi \sigma^{2}_{v}}}e^{\frac{-(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}}{2\sigma^{2}_{v}}}{}. \end{aligned} $$
(4.16)

This follows directly from (4.15), since, conditioned on \(x_{k}\), \(r_{k}\) is Gaussian with mean \(\gamma _{0} + \gamma _{1}x_{k}\) and variance \(\sigma ^{2}_{v}\). We are now ready to derive the best estimate (mean) for \(x_{k}\) and its uncertainty. We will need to make use of \(p(r_{k}|x_{k})\), \(p(n_{k}|x_{k})\), and \(p(x_{k}|n_{1:k - 1}, r_{1:k - 1})\) to derive this estimate, assuming that \(n_{k}\) and \(r_{k}\) are conditionally independent given \(x_{k}\). Note that the posterior \(p(x_{k}|n_{1:k}, r_{1:k})\) now contains an additional exponential term involving \(r_{k}\). Using (2.16), we have

$$\displaystyle \begin{aligned} &p(x_{k}|n_{1:k}, r_{1:k})\\ &\quad = \frac{p(n_{k}|x_{k}, n_{1:k-1}, r_{1:k - 1})p(r_{k}|x_{k}, n_{1:k-1}, r_{1:k - 1})p(x_{k}|n_{1:k - 1}, r_{1:k-1})}{p(n_{k}, r_{k}|n_{1:k - 1}, r_{1:k - 1})} \end{aligned} $$
(4.17)
$$\displaystyle \begin{aligned} &\propto p(n_{k}|x_{k})p(r_{k}|x_{k})p(x_{k}|n_{1:k - 1}, r_{1:k-1}) \end{aligned} $$
(4.18)
$$\displaystyle \begin{aligned} &\propto e^{n_{k}\log(p_{k}) + (1 - n_{k})\log(1 - p_{k})} \times e^{\frac{-(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}}{2\sigma^{2}_{v}}} \times e^{\frac{-(x_{k} - x_{k|k - 1})^{2}}{2 \sigma^{2}_{k|k - 1}}}{}. \end{aligned} $$
(4.19)

Taking the log on both sides, we have

$$\displaystyle \begin{aligned} q &= n_{k}\log(p_{k}) + (1 - n_{k})\log(1 - p_{k}) - \frac{(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}}{2\sigma^{2}_{v}}\\ &\quad - \frac{(x_{k} - x_{k|k - 1})^{2}}{2\sigma^{2}_{k|k - 1}} + \text{constant.} \end{aligned} $$
(4.20)

The mean and variance of \(x_{k}\) can now be derived by taking the first and second derivatives of q. Making use of (3.39), we have

$$\displaystyle \begin{aligned} \frac{dq}{dx_{k}} &= n_{k} - p_{k} + \frac{\gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k})}{\sigma^{2}_{v}} - \frac{(x_{k} - x_{k|k - 1})}{\sigma^{2}_{k|k - 1}} = 0{}. \end{aligned} $$
(4.21)

We will use a small trick to solve for \(x_{k}\) in the equation above: we add and subtract \(\gamma _{1}x_{k|k - 1}\) within the term involving \(r_{k}\).

$$\displaystyle \begin{aligned} n_{k} - p_{k} + \frac{\gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k} + \gamma_{1}x_{k|k - 1} - \gamma_{1}x_{k|k - 1})}{\sigma^{2}_{v}} = \frac{(x_{k} - x_{k|k - 1})}{\sigma^{2}_{k|k - 1}}& \end{aligned} $$
(4.22)
$$\displaystyle \begin{aligned} n_{k} - p_{k} + \frac{\gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k|k - 1})}{\sigma^{2}_{v}} - \frac{\gamma_{1}^{2}}{\sigma^{2}_{v}}(x_{k} - x_{k|k - 1}) = \frac{(x_{k} - x_{k|k - 1})}{\sigma^{2}_{k|k - 1}}& \end{aligned} $$
(4.23)
$$\displaystyle \begin{aligned} n_{k} - p_{k} + \frac{\gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k|k - 1})}{\sigma^{2}_{v}} = (x_{k} - x_{k|k - 1})\Bigg(\frac{1}{\sigma^{2}_{k|k - 1}} + \frac{\gamma_{1}^{2}}{\sigma^{2}_{v}}\Bigg)& \end{aligned} $$
(4.24)
$$\displaystyle \begin{aligned} \frac{\sigma^{2}_{v}(n_{k} - p_{k}) + \gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k|k - 1})}{\sigma^{2}_{v}} = (x_{k} - x_{k|k - 1})\Bigg(\frac{\sigma^{2}_{v} + \gamma_{1}^{2}\sigma^{2}_{k|k - 1}}{\sigma^{2}_{k|k - 1}\sigma^{2}_{v}}\Bigg).& \end{aligned} $$
(4.25)

This yields the mean update

$$\displaystyle \begin{aligned} x_{k} &= x_{k|k - 1} + \frac{\sigma^{2}_{k|k - 1}}{\gamma_{1}^{2}\sigma^{2}_{k|k - 1} + \sigma^{2}_{v}}\Big[\sigma^{2}_{v}(n_{k} - p_{k}) + \gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k|k - 1})\Big]{}. \end{aligned} $$
(4.26)

Again, to make explicit the dependence of \(p_{k}\) on \(x_{k}\) and the fact that this is the estimate of \(x_{k}\) having observed \(n_{1:k}\) and \(r_{1:k}\) (the sensor readings up to time index k), we write

$$\displaystyle \begin{aligned} x_{k|k} &= x_{k|k - 1} + \frac{\sigma^{2}_{k|k - 1}}{\gamma_{1}^{2}\sigma^{2}_{k|k - 1} + \sigma^{2}_{v}}\Big[\sigma^{2}_{v}(n_{k} - p_{k|k}) + \gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k|k - 1})\Big]. \end{aligned} $$
(4.27)

Note that, since \(p_{k|k}\) depends on \(x_{k|k}\), (4.27) does not provide \(x_{k|k}\) in closed form; it is an implicit equation that is solved numerically (e.g., using Newton's method). We next take the second derivative of q, similar to (3.40). This yields

$$\displaystyle \begin{aligned} \frac{d^{2}q}{dx_{k}^{2}} &= -p_{k}(1 - p_{k}) - \frac{\gamma_{1}^{2}}{\sigma^{2}_{v}} - \frac{1}{\sigma^{2}_{k|k -1}}.{} \end{aligned} $$
(4.28)

Based on (2.21), the uncertainty or variance associated with the new state estimate \(x_{k|k}\), therefore, is

$$\displaystyle \begin{aligned} \sigma^{2}_{k|k} &= -\Bigg(\frac{d^{2}q}{dx_{k}^{2}}\Bigg)^{-1} = \Bigg[\frac{1}{\sigma^{2}_{k|k - 1}} + p_{k|k}(1 - p_{k|k}) + \frac{\gamma_{1}^{2}}{\sigma^{2}_{v}}\Bigg]^{-1}{}. \end{aligned} $$
(4.29)

When \(x_{k}\) gives rise to a binary observation \(n_{k}\) and a continuous observation \(r_{k}\), the update equations in the state estimation step are

$$\displaystyle \begin{aligned} x_{k|k} &= x_{k|k - 1} + \frac{\sigma^{2}_{k|k - 1}}{\gamma_{1}^{2}\sigma^{2}_{k|k - 1} + \sigma^{2}_{v}}\Big[\sigma^{2}_{v}(n_{k} - p_{k|k}) + \gamma_{1}(r_{k} - \gamma_{0} - \gamma_{1}x_{k|k - 1})\Big] \end{aligned} $$
(4.30)
$$\displaystyle \begin{aligned} \sigma^{2}_{k|k} &= \Bigg[\frac{1}{\sigma^{2}_{k|k - 1}} + p_{k|k}(1 - p_{k|k}) + \frac{\gamma_{1}^{2}}{\sigma^{2}_{v}}\Bigg]^{-1}. \end{aligned} $$
(4.31)

4.3 Deriving the Parameter Estimation Step Equations

In the previous chapter, we only needed to derive the update equation for the process noise variance \(\sigma ^{2}_{\varepsilon }\) at the parameter estimation step. In the current model, we have a few more parameters. Thus we will need to derive the update equations for \(\rho \), \(\gamma _{0}\), \(\gamma _{1}\), and \(\sigma ^{2}_{v}\) in addition to the update for \(\sigma ^{2}_{\varepsilon }\).

4.3.1 Deriving the Process Noise Variance

The derivation of the process noise variance update is very similar to the earlier case in the preceding chapter. In fact, the only difference from (3.62) is that we will now have \(\rho x_{k - 1}\) in the log-likelihood term instead of \(x_{k - 1}\). We shall label the required log-likelihood term \(Q_{1}\).

$$\displaystyle \begin{aligned} Q_{1} &= \frac{-K}{2}\log\big(2\pi \sigma^{2}_{\varepsilon}\big) - \sum_{k = 2}^{K}\frac{\mathbb{E}\Big[(x_{k} - \rho x_{k - 1})^{2}\Big]}{2\sigma^{2}_{\varepsilon}}{} . \end{aligned} $$
(4.32)

We take the partial derivative of \(Q_{1}\) with respect to \(\sigma ^{2}_{\varepsilon }\) and set it to 0 to solve for the parameter estimation step update.

$$\displaystyle \begin{aligned} \frac{\partial Q_{1}}{\partial \sigma^{2}_{\varepsilon}} = &\frac{-K}{2\sigma^{2}_{\varepsilon}} + \frac{1}{2\sigma^{4}_{\varepsilon}}\sum_{k = 2}^{K}\mathbb{E}\Big[(x_{k} - \rho x_{k - 1})^{2}\Big] = 0 \end{aligned} $$
(4.33)
$$\displaystyle \begin{aligned} \implies \sigma^{2}_{\varepsilon} = &\frac{1}{K}\sum_{k = 2}^{K}\Big\{\mathbb{E}\big[x_{k}^{2}\big] - 2\rho\mathbb{E}\big[x_{k}x_{k - 1}\big] + \rho^{2}\mathbb{E}\big[x_{k - 1}^{2}\big]\Big\} \end{aligned} $$
(4.34)
$$\displaystyle \begin{aligned} = &\frac{1}{K}\Bigg\{\sum_{k = 2}^{K}U_{k} - 2\rho \sum_{k = 1}^{K - 1}U_{k, k + 1} + \rho^{2}\sum_{k = 1}^{K - 1}U_{k}\Bigg\}. \end{aligned} $$
(4.35)

The parameter estimation step update for \(\sigma ^{2}_{\varepsilon }\) when \(x_{k}\) evolves with time following \(x_{k} = \rho x_{k - 1} + \varepsilon _{k}\) is

$$\displaystyle \begin{aligned} \sigma^{2}_{\varepsilon} &= \frac{1}{K}\Bigg\{\sum_{k = 2}^{K}U_{k} - 2\rho \sum_{k = 1}^{K - 1}U_{k, k + 1} + \rho^{2}\sum_{k = 1}^{K - 1}U_{k}\Bigg\}. \end{aligned} $$
(4.36)

4.3.2 Deriving the Forgetting Factor

We will take the partial derivative of \(Q_{1}\) in (4.32) with respect to \(\rho \) and set it to 0 to solve for its parameter estimation step update.

$$\displaystyle \begin{aligned} \frac{\partial Q_{1}}{\partial \rho} &= \frac{(-1)}{2\sigma^{2}_{\varepsilon}}\sum_{k = 2}^{K}\mathbb{E}\big[-2x_{k - 1}(x_{k} - \rho x_{k - 1})\big] \end{aligned} $$
(4.37)
$$\displaystyle \begin{aligned} \implies 0 &= -\sum_{k = 2}^{K}\mathbb{E}\big[x_{k}x_{k - 1}\big] + \rho\sum_{k = 2}^{K}\mathbb{E}\big[x_{k - 1}^{2}\big] \end{aligned} $$
(4.38)
$$\displaystyle \begin{aligned} &= -\sum_{k = 1}^{K - 1}U_{k, k + 1} + \rho\sum_{k = 1}^{K - 1}U_{k} \end{aligned} $$
(4.39)
$$\displaystyle \begin{aligned} \rho &= \sum_{k = 1}^{K - 1}U_{k, k + 1} \Bigg[\sum_{k = 1}^{K - 1}U_{k}\Bigg]^{-1}. \end{aligned} $$
(4.40)

The parameter estimation step update for \(\rho \) when \(x_{k}\) evolves with time following \(x_{k} = \rho x_{k - 1} + \varepsilon _{k}\) is

$$\displaystyle \begin{aligned} \rho &= \sum_{k = 1}^{K - 1}U_{k, k + 1} \Bigg[\sum_{k = 1}^{K - 1}U_{k}\Bigg]^{-1}. \end{aligned} $$
(4.41)
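
As an illustration, a minimal MATLAB sketch of these two updates is given below. It assumes that the smoothed second moments \(U_{k}\) and \(U_{k, k + 1}\) defined in the previous chapter have already been computed and are available as the vectors U (length K) and U_cross (length \(K - 1\)); the function and variable names are illustrative.

    function [rho, ve] = update_rho_ve(U, U_cross)
    % Parameter estimation step updates for rho, (4.41), and sigma^2_eps, (4.36).
    % U(k)       - E[x_k^2 | n_{1:K}, r_{1:K}]       (length K)
    % U_cross(k) - E[x_k x_{k+1} | n_{1:K}, r_{1:K}] (length K - 1)
    K = length(U);
    rho = sum(U_cross) / sum(U(1:(K - 1)));                      % (4.41)
    ve = (sum(U(2:K)) - 2 * rho * sum(U_cross) ...
          + rho^2 * sum(U(1:(K - 1)))) / K;                      % (4.36)
    end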

4.3.3 Deriving the Constant Coefficient Terms

We will next consider the model parameters that are related to \(r_{k}\). Recall from (3.57) that we need to maximize the expected value of the log of the joint probability

$$\displaystyle \begin{aligned} p(x_{1:K} \cap y_{1:K}|\Theta) &= p(y_{1:K}|x_{1:K}, \Theta)p(x_{1:K}|\Theta). \end{aligned} $$
(4.42)

In the current state-space model, \(y_{k}\) comprises both \(n_{k}\) and \(r_{k}\). The probability term containing \(\gamma _{0}\), \(\gamma _{1}\), and \(\sigma ^{2}_{v}\) is

$$\displaystyle \begin{aligned} p(r_{1:K}|x_{1:K}, \Theta) &= \prod_{k = 1}^{K}\frac{1}{\sqrt{2\pi \sigma^{2}_{v}}} e^{\frac{-(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}}{2\sigma^{2}_{v}}}{}. \end{aligned} $$
(4.43)

Let us first take the log of this term followed by the expected value. Labeling this as \(Q_{2}\), we have

$$\displaystyle \begin{aligned} Q_{2} = \frac{-K}{2}\log\big(2\pi \sigma^{2}_{v}\big) - \sum_{k = 1}^{K}\frac{\mathbb{E}\Big[(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}\Big]}{2\sigma^{2}_{v}}{}. \end{aligned} $$
(4.44)

To solve for \(\gamma _{0}\) and \(\gamma _{1}\), we have to take the partial derivatives of \(Q_{2}\) with respect to \(\gamma _{0}\) and \(\gamma _{1}\), set them each to 0, and solve the resulting equations. We first take the partial derivative with respect to \(\gamma _{0}\).

$$\displaystyle \begin{aligned} \frac{\partial Q_{2}}{\partial \gamma_{0}} &= \frac{1}{2\sigma^{2}_{v}}\sum_{k = 1}^{K}2\mathbb{E}\big[r_{k} - \gamma_{0} - \gamma_{1}x_{k}\big] = 0 \end{aligned} $$
(4.45)
$$\displaystyle \begin{aligned} \implies 0 &= \sum_{k = 1}^{K}r_{k} - \gamma_{0}K - \gamma_{1}\sum_{k = 1}^{K}\mathbb{E}\big[x_{k}\big] \end{aligned} $$
(4.46)
$$\displaystyle \begin{aligned} &= \sum_{k = 1}^{K}r_{k} - \gamma_{0}K - \gamma_{1}\sum_{k = 1}^{K}x_{k|K} \end{aligned} $$
(4.47)
$$\displaystyle \begin{aligned} \gamma_{0}K &+ \gamma_{1}\sum_{k = 1}^{K}x_{k|K} = \sum_{k = 1}^{K}r_{k}. \end{aligned} $$
(4.48)

This provides one equation containing the two unknowns \(\gamma _{0}\) and \(\gamma _{1}\). We next take the partial derivative with respect to \(\gamma _{1}\).

$$\displaystyle \begin{aligned} \frac{\partial Q_{2}}{\partial \gamma_{1}} &= \frac{1}{2\sigma^{2}_{v}} \sum_{k = 1}^{K}2\mathbb{E}\big[x_{k}(r_{k} - \gamma_{0} - \gamma_{1}x_{k})\big] = 0 \end{aligned} $$
(4.49)
$$\displaystyle \begin{aligned} \implies 0 &= \sum_{k = 1}^{K}r_{k}\mathbb{E}\big[x_{k}\big] - \gamma_{0}\sum_{k = 1}^{K}\mathbb{E}\big[x_{k}\big] - \gamma_{1}\sum_{k = 1}^{K}\mathbb{E}\big[x_{k}^{2}\big] \\ &= \sum_{k = 1}^{K}r_{k}x_{k|K} - \gamma_{0}\sum_{k = 1}^{K}x_{k|K} - \gamma_{1}\sum_{k = 1}^{K}U_{k} \end{aligned} $$
(4.50)
$$\displaystyle \begin{aligned} \gamma_{0}\sum_{k = 1}^{K}x_{k|K} &+ \gamma_{1}\sum_{k = 1}^{K}U_{k} = \sum_{k = 1}^{K}r_{k}x_{k|K}. \end{aligned} $$
(4.51)

This provides the second equation necessary to solve for \(\gamma _{0}\) and \(\gamma _{1}\).

The parameter estimation step updates for \(\gamma _{0}\) and \(\gamma _{1}\) when we observe a continuous variable \(r_{k} = \gamma _{0} + \gamma _{1}x_{k} + v_{k}\) are

$$\displaystyle \begin{aligned} \Bigg[\begin{array}{c} \gamma_{0} \\ \gamma_{1}\end{array}\Bigg] = &\Bigg[\begin{array}{cc} K & \sum_{k = 1}^{K}x_{k|K} \\ \sum_{k = 1}^{K}x_{k|K} & \sum_{k = 1}^{K}U_{k}\end{array}\Bigg]^{-1} \Bigg[\begin{array}{c}\sum_{k = 1}^{K}r_{k} \\ \sum_{k = 1}^{K}r_{k}x_{k|K}\end{array}\Bigg]. \end{aligned} $$
(4.52)
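
A minimal MATLAB sketch of this update, which simply solves the \(2 \times 2\) linear system in (4.52), is given below; the function name is illustrative, while r0 and r1 follow the naming used in the code examples later in this chapter.

    function [r0, r1] = update_gammas(r, x_smth, U)
    % Parameter estimation step update for gamma_0 and gamma_1, (4.52).
    % r      - continuous observations r_k        (length K)
    % x_smth - smoothed means x_{k|K}             (length K)
    % U      - smoothed second moments E[x_k^2]   (length K)
    r = r(:); x_smth = x_smth(:); U = U(:);
    K = length(r);
    A = [K, sum(x_smth); sum(x_smth), sum(U)];
    b = [sum(r); sum(r .* x_smth)];
    g = A \ b;                       % solve the 2 x 2 normal equations
    r0 = g(1);
    r1 = g(2);
    end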

4.3.4 Deriving the Sensor Noise Variance

The term \(Q_{2}\) in (4.44) also contains the sensor noise variance \(\sigma ^{2}_{v}\).

$$\displaystyle \begin{aligned} Q_{2} = \frac{-K}{2}\log\big(2\pi \sigma^{2}_{v}\big) - \sum_{k = 1}^{K}\frac{\mathbb{E}\Big[(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}\Big]}{2\sigma^{2}_{v}}. \end{aligned} $$
(4.53)

We take its partial derivative with respect to \(\sigma ^{2}_{v}\) and set it to 0 to solve for \(\sigma ^{2}_{v}\).

$$\displaystyle \begin{aligned} \frac{\partial Q_{2}}{\partial \sigma^{2}_{v}} &= \frac{-K}{2\sigma^{2}_{v}} + \frac{1}{2\sigma^{4}_{v}}\sum_{k = 1}^{K}\mathbb{E}\Big[(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}\Big] = 0 \end{aligned} $$
(4.54)
$$\displaystyle \begin{aligned} \implies \sigma^{2}_{v} &= \frac{1}{K}\sum_{k = 1}^{K}\mathbb{E}\Big[(r_{k} - \gamma_{0} - \gamma_{1}x_{k})^{2}\Big] \\ &= \frac{1}{K} \Bigg\{\sum_{k = 1}^{K}r_{k}^{2} + K\gamma_{0}^{2} + \gamma_{1}^{2}\sum_{k = 1}^{K}\mathbb{E}\big[x_{k}^{2}\big] - 2\gamma_{0}\sum_{k = 1}^{K}r_{k} - 2\gamma_{1}\sum_{k = 1}^{K}r_{k}\mathbb{E}\big[x_{k}\big] \\ &\quad + 2\gamma_{0}\gamma_{1}\sum_{k = 1}^{K}\mathbb{E}\big[x_{k}\big]\Bigg\} \\ &= \frac{1}{K} \Bigg\{\sum_{k = 1}^{K}r_{k}^{2} + K\gamma_{0}^{2} + \gamma_{1}^{2}\sum_{k = 1}^{K}U_{k} - 2\gamma_{0}\sum_{k = 1}^{K}r_{k} - 2\gamma_{1}\sum_{k = 1}^{K}r_{k}x_{k|K} \\ &\quad + 2\gamma_{0}\gamma_{1}\sum_{k = 1}^{K}x_{k|K}\Bigg\} . \end{aligned} $$
(4.55)

The parameter estimation step update for \(\sigma ^{2}_{v}\) when we observe a continuous variable \(r_{k} = \gamma _{0} + \gamma _{1}x_{k} + v_{k}\) is

$$\displaystyle \begin{aligned} \sigma^{2}_{v} &= \frac{1}{K} \Bigg\{\sum_{k = 1}^{K}r_{k}^{2} + K\gamma_{0}^{2} + \gamma_{1}^{2}\sum_{k = 1}^{K}U_{k} - 2\gamma_{0}\sum_{k = 1}^{K}r_{k} - 2\gamma_{1}\sum_{k = 1}^{K}r_{k}x_{k|K} \\ &\quad + 2\gamma_{0}\gamma_{1}\sum_{k = 1}^{K}x_{k|K}\Bigg\} . \end{aligned} $$
(4.56)
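
A corresponding MATLAB sketch of this update is given below, reusing the same illustrative conventions (vr follows the naming used in the code examples later in this chapter).

    function vr = update_vr(r, x_smth, U, r0, r1)
    % Parameter estimation step update for sigma^2_v, (4.56).
    r = r(:); x_smth = x_smth(:); U = U(:);
    K = length(r);
    vr = (sum(r.^2) + K * r0^2 + r1^2 * sum(U) ...
          - 2 * r0 * sum(r) - 2 * r1 * sum(r .* x_smth) ...
          + 2 * r0 * r1 * sum(x_smth)) / K;
    end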

4.4 MATLAB Examples

The MATLAB code examples for implementing the EM algorithm described in this chapter are provided in the following folders:

  • one_bin_one_cont\

    • sim\

      • data_one_bin_one_cont.mat

      • filter_one_bin_one_cont.m

    • expm\

      • expm_data_one_bin_two_cont.mat

      • expm_filter_one_bin_one_cont.m

Note that the code implements a slightly different version of the model discussed here, in that the state equation does not contain \(\rho \). Code examples containing \(\rho \) and \(\alpha I_{k}\) are provided in the following chapter for the case where one binary and two continuous observations are present in the state-space model. The current code can easily be modified if \(\rho \) is to be included.

The code for this particular state-space model is an extension of the earlier model. It takes as inputs the variables n and r, which denote \(n_{k}\) and \(r_{k}\), respectively. We use r0, r1, and vr for \(\gamma _{0}\), \(\gamma _{1}\), and \(\sigma ^{2}_{v}\). The code begins by calculating \(\beta _{0}\) and initializing the model parameters.
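
A minimal sketch of what this initialization might look like is shown below; the use of the empirical base rate of \(n_{k}\) to set \(\beta _{0}\) and the specific starting values are illustrative assumptions rather than the exact listing in the code package.

    % Initialization (a sketch; starting values are illustrative assumptions)
    base_prob = sum(n) / length(n);           % empirical probability of n_k = 1
    b0 = log(base_prob / (1 - base_prob));    % beta_0 such that p_k = base_prob when x_k = 0
    ve = 0.005;                               % sigma^2_eps (initial guess)
    r0 = mean(r);                             % gamma_0 (initial guess)
    r1 = 1;                                   % gamma_1 (initial guess)
    vr = var(r);                              % sigma^2_v (initial guess)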

Similar to the code examples in the preceding chapter, we use x_pred, x_updt, and x_smth to denote \(x_{k|k - 1}\), \(x_{k|k}\), and \(x_{k|K}\), respectively. Similarly, v_pred, v_updt, and v_smth denote the corresponding variances \(\sigma ^{2}_{k|k - 1}\), \(\sigma ^{2}_{k|k}\), and \(\sigma ^{2}_{k|K}\). Just as in the earlier case, the code first progresses through the time indices \(k = 1, 2, \ldots , K\) at the state estimation step.
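
A sketch of this forward pass is shown below. It assumes the \(\rho \)-free state equation \(x_{k} = x_{k - 1} + \varepsilon _{k}\) used in the provided code; the initial guesses x0 and v0, the variable ve for \(\sigma ^{2}_{\varepsilon }\), and the helper get_state_update (sketched after the next paragraph) are illustrative.

    % Forward pass (state estimation step) - a sketch with rho = 1
    K = length(n);
    x_pred = zeros(1, K); v_pred = zeros(1, K);
    x_updt = zeros(1, K); v_updt = zeros(1, K);
    x0 = 0; v0 = ve;                              % assumed initial guesses
    for k = 1:K
        if k == 1
            x_pred(k) = x0;                       % (4.13) with rho = 1
            v_pred(k) = v0 + ve;                  % (4.14) with rho = 1
        else
            x_pred(k) = x_updt(k - 1);
            v_pred(k) = v_updt(k - 1) + ve;
        end
        [x_updt(k), v_updt(k)] = get_state_update(x_pred(k), v_pred(k), ...
            n(k), r(k), b0, r0, r1, vr);          % update using both n_k and r_k
    end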

The update for \(x_{k|k}\) at each time index is calculated based on both \(n_{k}\) and \(r_{k}\) using a separate function.
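
A minimal sketch of such a function is given below. Because \(p_{k|k}\) depends on \(x_{k|k}\), (4.30) is solved here with a few Newton-Raphson iterations, with \(p_{k} = [1 + e^{-(\beta _{0} + x_{k})}]^{-1}\) as in the earlier chapters; the function name, solver choice, and iteration settings are illustrative. Since the second derivative in (4.28) is always negative, the log posterior is concave in \(x_{k}\), and the iteration typically converges within a few steps.

    function [x_updt, v_updt] = get_state_update(x_pred, v_pred, n, r, b0, r0, r1, vr)
    % Solves the update equations (4.30) and (4.31) for a single time index.
    % (4.30) is implicit in x_{k|k}, so it is solved with Newton-Raphson here.
    C = v_pred / (r1^2 * v_pred + vr);
    x = x_pred;                                      % starting point for the iteration
    for it = 1:50
        p = 1 / (1 + exp(-(b0 + x)));                % p_k evaluated at the current x
        f = x_pred + C * (vr * (n - p) + r1 * (r - r0 - r1 * x_pred)) - x;
        df = -C * vr * p * (1 - p) - 1;              % derivative of f with respect to x
        x_next = x - f / df;
        if abs(x_next - x) < 1e-12
            x = x_next;
            break;
        end
        x = x_next;
    end
    x_updt = x;
    p = 1 / (1 + exp(-(b0 + x_updt)));
    v_updt = 1 / (1 / v_pred + p * (1 - p) + r1^2 / vr);   % (4.31)
    end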

The smoothing step also remains the same as in the previous chapter (there would have been a change had \(\rho \) been included).
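
For completeness, a sketch of this backward pass is shown below using the same naming conventions; the vector A of smoothing gains is kept since it is reused when computing \(U_{k, k + 1}\) at the parameter estimation step.

    % Backward smoothing pass - a sketch of the fixed-interval smoother with rho = 1
    x_smth = zeros(1, K); v_smth = zeros(1, K); A = zeros(1, K);
    x_smth(K) = x_updt(K);
    v_smth(K) = v_updt(K);
    for k = (K - 1):-1:1
        A(k) = v_updt(k) / v_pred(k + 1);                               % smoothing gain
        x_smth(k) = x_updt(k) + A(k) * (x_smth(k + 1) - x_pred(k + 1));
        v_smth(k) = v_updt(k) + A(k)^2 * (v_smth(k + 1) - v_pred(k + 1));
    end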

The updates for \(\sigma ^{2}_{\varepsilon }\), \(\gamma _{0}\), \(\gamma _{1}\), and \(\sigma ^{2}_{v}\) are calculated at the parameter estimation step.
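
A sketch of this step is shown below. The smoothed moments are formed assuming \(U_{k} = x_{k|K}^{2} + \sigma ^{2}_{k|K}\) and \(U_{k, k + 1} = A_{k}\sigma ^{2}_{k + 1|K} + x_{k|K}x_{k + 1|K}\) as in the previous chapter, and update_gammas and update_vr are the illustrative helpers sketched earlier in this chapter. The state estimation and parameter estimation steps are then repeated until the parameter estimates converge.

    % Parameter estimation step - a sketch
    U = x_smth.^2 + v_smth;                               % E[x_k^2 | n_{1:K}, r_{1:K}]
    U_cross = A(1:(K - 1)) .* v_smth(2:K) ...
              + x_smth(1:(K - 1)) .* x_smth(2:K);         % E[x_k x_{k+1} | n_{1:K}, r_{1:K}]
    ve = (sum(U(2:K)) - 2 * sum(U_cross) ...
          + sum(U(1:(K - 1)))) / K;                       % (4.36) with rho = 1
    [r0, r1] = update_gammas(r, x_smth, U);               % (4.52)
    vr = update_vr(r, x_smth, U, r0, r1);                 % (4.56)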

4.4.1 Application to EMG and Emotional Valence

Running the simulated and experimental data code examples produces the results shown in Fig. 4.2. The experimental data example relates to emotional valence and EMG. Emotion can be characterized along two orthogonal axes known as valence and arousal [55]. Valence refers to the pleasant–unpleasant nature of an emotion. In [27], this state-space model with one binary and one continuous feature was used to estimate emotional valence from EMG signal features. The binary and continuous features were extracted based on the amplitudes and powers of the EMG signal. The data were collected as part of the study described in [58], in which subjects were shown a series of music videos to elicit different emotional responses.

Fig. 4.2

State estimation based on observing one binary and one continuous variable. The left sub-figure depicts estimation on simulated data, and the right sub-figure depicts the estimation of emotional valence from EMG data. The sub-panels on the left, respectively, depict: (a) the binary event occurrences \(n_{k}\); (b) the continuous variable \(r_{k}\) (blue) and its estimate (red); (c) the probability of binary event occurrence \(p_{k}\) (blue) and its estimate (red); (d) the state \(x_{k}\) (blue) and its estimate (red); (e) the QQ plot for the residual error of \(x_{k}\). The sub-panels on the right, respectively, depict: (a) the raw EMG signal; (b) the binary EMG feature \(n_{k}\); (c) the continuous EMG feature \(r_{k}\) (blue) and its estimate (red); (d) the probability of binary event occurrence; (e) the emotional valence state \(x_{k}\). The shaded background colors on the right sub-figure correspond to music videos where subject-provided emotional valence ratings were high