The EM algorithms for state estimation that we consider in this book rely on basic concepts in statistics. In this chapter, we will review some results that will come in useful later on. Many of the concepts are introductory and only require a basic knowledge of probability and statistics. If you are already familiar with what is discussed here, feel free to skip ahead.

2.1 Basic Concepts Related to Mean and Variance

Shown below are some basic statistical results related to mean and variance that will be helpful when deriving the EM algorithm equations.

Basic Statistical Results—Part A

Given the random variables \(X_{k}\) and \(Z_{k}\) and the constants \(\rho \) and \(\alpha \), the following results hold for the mean and variance. We use \(\mathbb {E}[\cdot ]\), \(V(\cdot )\), and \(Cov(\cdot )\) to denote the expected value (mean), the variance, and the covariance, respectively.

$$\displaystyle \begin{aligned} \mathbb{E}[X_{k} + Z_{k}] &= \mathbb{E}[X_{k}] + \mathbb{E}[Z_{k}] {} \end{aligned} $$
(2.1)
$$\displaystyle \begin{aligned} \mathbb{E}[X_{k} + \alpha] &= \mathbb{E}[X_{k}] + \alpha {} \end{aligned} $$
(2.2)
$$\displaystyle \begin{aligned} \mathbb{E}[\rho X_{k}] &= \rho \mathbb{E}[X_{k}] {} \end{aligned} $$
(2.3)
$$\displaystyle \begin{aligned} V(X_{k} + Z_{k}) &= V(X_{k}) + V(Z_{k}) + 2 Cov(X_{k}, Z_{k}) {} \end{aligned} $$
(2.4)
$$\displaystyle \begin{aligned} V(X_{k} + \alpha) &= V(X_{k}) {} \end{aligned} $$
(2.5)
$$\displaystyle \begin{aligned} V(\rho X_{k}) &= \rho^{2} V(X_{k}). {} \end{aligned} $$
(2.6)
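
Because these identities are used repeatedly in the later derivations, a quick numerical sanity check can make them concrete. The sketch below is a minimal Monte Carlo verification in Python; the choice of NumPy, the distributions of \(X_{k}\) and \(Z_{k}\), and the constants are illustrative assumptions rather than anything prescribed by the text. Each pair of printed values agrees up to floating-point rounding, since the identities hold for sample moments as well.

```python
# Monte Carlo sanity check of (2.1)-(2.6). Distributions and constants
# are arbitrary example choices.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
rho, alpha = 2.5, -1.0

# Make X_k and Z_k correlated so the covariance term in (2.4) is nonzero.
x = rng.normal(1.0, 2.0, n)
z = 0.5 * x + rng.normal(-3.0, 1.0, n)
cov_xz = np.cov(x, z, ddof=0)[0, 1]

print(np.mean(x + z), np.mean(x) + np.mean(z))            # (2.1)
print(np.mean(x + alpha), np.mean(x) + alpha)             # (2.2)
print(np.mean(rho * x), rho * np.mean(x))                 # (2.3)
print(np.var(x + z), np.var(x) + np.var(z) + 2 * cov_xz)  # (2.4)
print(np.var(x + alpha), np.var(x))                       # (2.5)
print(np.var(rho * x), rho**2 * np.var(x))                # (2.6)
```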

2.2 Basic Statistical Results Required for Deriving the Update Equations in the State Estimation Step

We will also require two results based on Bayes’ rule and Gaussian distributions, respectively, when deriving the update equations in the state estimation step. The two results are shown below:

  • Result 1:

    $$\displaystyle \begin{aligned} P(A|B \cap C) &= \frac{P(B|A \cap C)P(A|C)}{P(B|C)}. \end{aligned} $$

    We will consider the derivation of this result in two steps:

    • Step 1:

      $$\displaystyle \begin{aligned} P(A|B \cap C) &= \frac{P(A \cap B \cap C)}{P(B \cap C)} \end{aligned} $$
      (2.7)
      $$\displaystyle \begin{aligned} &= \frac{P(A \cap B \cap C)}{P(B \cap C)} \times \frac{P(C)}{P(C)} \end{aligned} $$
      (2.8)
      $$\displaystyle \begin{aligned} &= \frac{P(A \cap B \cap C)}{P(C)} \times \frac{1}{\frac{P(B \cap C)}{P(C)}} \end{aligned} $$
      (2.9)
      $$\displaystyle \begin{aligned} &= \frac{P(A \cap B | C)}{P(B|C)}. {} \end{aligned} $$
      (2.10)
    • Step 2:

      $$\displaystyle \begin{aligned} P(A \cap B | C) &= \frac{P(A \cap B \cap C)}{P(C)} \end{aligned} $$
      (2.11)
      $$\displaystyle \begin{aligned} &= \frac{P(A \cap B \cap C)}{P(C)} \times \frac{P(A \cap C)}{P(A \cap C)} \end{aligned} $$
      (2.12)
      $$\displaystyle \begin{aligned} &= \frac{P(A \cap B \cap C)}{P(A \cap C)} \times \frac{P(A \cap C)}{P(C)} \end{aligned} $$
      (2.13)
      $$\displaystyle \begin{aligned} &= P(B|A \cap C)P(A|C). {} \end{aligned} $$
      (2.14)

    We now substitute the result for \(P(A \cap B | C)\) from (2.14) into (2.10) to obtain

    $$\displaystyle \begin{aligned} P(A|B \cap C) &= \frac{P(A \cap B | C)}{P(B|C)} = \frac{P(B|A \cap C)P(A|C)}{P(B|C)}. \end{aligned} $$
    (2.15)
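
As a concrete check of (2.15), the short sketch below evaluates both sides of the identity on a hypothetical joint distribution over three binary events; the joint probabilities are randomly generated purely for illustration.

```python
# Numerical check of P(A|B ∩ C) = P(B|A ∩ C) P(A|C) / P(B|C) on a toy
# discrete example with binary events A, B, C.
import numpy as np

rng = np.random.default_rng(1)
joint = rng.random((2, 2, 2))
joint /= joint.sum()  # joint[a, b, c] = P(A=a, B=b, C=c)

# Probabilities needed for the conditionals, for the event A=1, B=1, C=1.
p_abc = joint[1, 1, 1]        # P(A, B, C)
p_bc = joint[:, 1, 1].sum()   # P(B, C)
p_ac = joint[1, :, 1].sum()   # P(A, C)
p_c = joint[:, :, 1].sum()    # P(C)

lhs = p_abc / p_bc                                  # P(A | B ∩ C)
rhs = (p_abc / p_ac) * (p_ac / p_c) / (p_bc / p_c)  # Bayes' rule form
print(lhs, rhs)  # equal up to floating-point rounding
```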

Basic Statistical Results—Part B

Letting \(A = X_{k}\), \(B = Y_{k}\), and \(C = Y_{1:k-1}\), we can use the result just shown above to obtain

$$\displaystyle \begin{aligned} P(X_{k}|Y_{1:k}) &= P(X_{k}|Y_{k}, Y_{1:k-1}) = \frac{P(Y_{k}|X_{k}, Y_{1:k-1})P(X_{k}|Y_{1:k-1})}{P(Y_{k}|Y_{1:k-1})}{}. \end{aligned} $$
(2.16)

Recall that we split state estimation into two steps: the predict step and the update step. At the predict step, we derive an estimate of \(x_{k}\) before the sensor reading \(y_{k}\) has been observed. This estimate is based on \(P(X_{k}|Y_{1:k-1})\), since only information available up to time index \((k - 1)\) is used to derive it. At the update step, we refine the predict step estimate based on \(P(X_{k}|Y_{1:k-1})\) to incorporate the new sensor measurement \(y_{k}\); that is, we make use of \(y_{k}\) to obtain a new estimate based on \(P(X_{k}|Y_{1:k})\). The result in (2.16) will come in very useful at the update step, as the sketch below illustrates.
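
To make the two steps concrete, here is a minimal sketch of one predict/update cycle for a scalar Gaussian model. The random-walk state model, the direct observation model, and the noise variances are hypothetical choices for illustration; under these assumptions every density in (2.16) is Gaussian, and the update reduces to a precision-weighted combination of the prediction and the new measurement.

```python
# Hypothetical scalar model: x_k = x_{k-1} + w_k with w_k ~ N(0, var_w),
# observed as y_k = x_k + e_k with e_k ~ N(0, var_e).

def predict(mean_prev, var_prev, var_w):
    """One-step-ahead prior P(X_k | Y_{1:k-1}), before y_k is observed."""
    return mean_prev, var_prev + var_w

def update(mean_pred, var_pred, y, var_e):
    """Posterior P(X_k | Y_{1:k}) obtained via (2.16): precisions add,
    and the posterior mean is a precision-weighted average."""
    var_post = 1.0 / (1.0 / var_pred + 1.0 / var_e)
    mean_post = var_post * (mean_pred / var_pred + y / var_e)
    return mean_post, var_post

m, v = predict(0.0, 1.0, var_w=0.1)    # predict step
m, v = update(m, v, y=0.8, var_e=0.5)  # update step using measurement y_k
print(m, v)
```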

  • Result 2:

    The mean and variance of a Gaussian random variable can be obtained by taking the derivatives of the exponent term of its probability density function (PDF).

    Consider \(X \sim \mathcal {N}(\mu , \sigma ^{2})\). The PDF of X is given by

    $$\displaystyle \begin{aligned} p(x) &= \frac{1}{\sqrt{2 \pi \sigma^{2}}}e^{q} \enspace \text{where} \enspace q = \frac{-(x - \mu)^{2}}{2 \sigma^{2}}. \end{aligned} $$
    (2.17)

    To obtain the mean of X, we take the derivative of q with respect to x and set it to 0 to determine where the maximum of the PDF occurs. Since a Gaussian PDF is symmetric and unimodal, this maximum (the mode) coincides with the mean.

    $$\displaystyle \begin{aligned} \frac{dq}{dx} &= \frac{-2(x - \mu)}{2 \sigma^{2}} = 0 \end{aligned} $$
    (2.18)
    $$\displaystyle \begin{aligned} \implies x &= \mu. {} \end{aligned} $$
    (2.19)

    Therefore, the mean occurs at the value of x at which the derivative of the exponent term is equal to 0.

    We next consider the variance. The second derivative of q with respect to x is

    $$\displaystyle \begin{aligned} \frac{d^{2}q}{dx^{2}} &= \frac{-1}{\sigma^{2}}. \end{aligned} $$
    (2.20)

    Therefore, the variance is given by

    $$\displaystyle \begin{aligned} \sigma^{2} &= -\Bigg(\frac{d^{2}q}{dx^{2}}\Bigg)^{-1}{}. \end{aligned} $$
    (2.21)

Basic Statistical Results—Part C

In all the derivations of the update equations in the state estimation step, we will assume that the density functions are approximately Gaussian. We will also make use of what we have just shown: (i) the mean is given by the value of x at which the first derivative of the exponent term equals 0; (ii) the variance is given by the negative inverse of the second derivative of the exponent term. Both properties are verified symbolically in the sketch below.
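
The following is a short symbolic check of the exponent-term trick, assuming SymPy is available; the quadratic exponent q matches the one defined in (2.17).

```python
# Recover the mean and variance of a Gaussian from its exponent term.
import sympy as sp

x, mu = sp.symbols('x mu')
sigma = sp.symbols('sigma', positive=True)
q = -(x - mu)**2 / (2 * sigma**2)

mean = sp.solve(sp.diff(q, x), x)[0]           # dq/dx = 0  =>  x = mu, as in (2.19)
variance = sp.simplify(-1 / sp.diff(q, x, 2))  # -(d^2q/dx^2)^{-1}, as in (2.21)
print(mean, variance)                          # prints: mu sigma**2
```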

2.3 General Observations Related to Gaussian Random Variables

In general, for a set of independent Gaussian random variables \(Z_{i} \sim \mathcal {N}(\mu _{i}, \sigma ^{2}_{i})\), the following holds true.

$$\displaystyle \begin{aligned} \sum_{i} a_{i} Z_{i} \sim \mathcal{N}\Bigg(\sum_{i} a_{i}\mu_{i}, \sum_{i} a^{2}_{i}\sigma^{2}_{i}\Bigg), \end{aligned} $$
(2.22)

where the \(a_{i}\)’s are constants. Adding a constant to a Gaussian random variable likewise leaves it Gaussian, with a shifted mean and an unchanged variance; this can be verified from first principles using the change of variables formula.

Basic Statistical Results—Part D

In general, for a set of independent Gaussian random variables \(Z_{i} \sim \mathcal {N}(\mu _{i}, \sigma ^{2}_{i})\),

$$\displaystyle \begin{aligned} \sum_{i} a_{i} Z_{i} \sim \mathcal{N}\Bigg(\sum_{i} a_{i}\mu_{i}, \sum_{i} a^{2}_{i}\sigma^{2}_{i}\Bigg){}. \end{aligned} $$
(2.23)
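
As with the earlier results, (2.23) can be checked with a quick Monte Carlo simulation; the constants \(a_{i}\) and the parameters \(\mu _{i}\) and \(\sigma ^{2}_{i}\) below are arbitrary example values.

```python
# Monte Carlo check of (2.23): a linear combination of independent
# Gaussians is Gaussian with the stated mean and variance.
import numpy as np

rng = np.random.default_rng(2)
a = np.array([2.0, -1.0, 0.5])
mu = np.array([1.0, 3.0, -2.0])
sigma2 = np.array([0.5, 2.0, 1.0])

n = 1_000_000
z = rng.normal(mu, np.sqrt(sigma2), size=(n, 3))  # independent Z_i in each row
s = z @ a                                         # realizations of sum_i a_i Z_i

print(s.mean(), a @ mu)        # sample mean vs. sum_i a_i mu_i
print(s.var(), a**2 @ sigma2)  # sample variance vs. sum_i a_i^2 sigma_i^2
```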