1 Introduction

In Part II of this book we addressed the problem of stabilization of networked control systems. In this chapter, and Part III overall, we move beyond stabilization and study optimization of such systems, from the points of view of both encoding and control policies.

The chapter considers the optimal causal encoding/quantization problem for networked control systems. It presents structural results on optimal causal coding of Markov sources in a large class of settings: fully observed and partially observed Markov sources as well as multi-sensor systems and systems driven by control. For the optimal causal coding of a fully observed or a partially observed Markov source, the structure of optimal causal coders is obtained, which features a separation structure. It is also shown that real-time decentralized coding of a partially observed i.i.d. source admits a memoryless optimal solution. Such a result does not, in general, extend to decentralized coding of partially observed Markov sources. We also establish in the chapter the existence of optimal control and quantization policies under appropriate technical conditions. Linear systems with quadratic cost will also be considered.

The contents of the chapter are as follows: In Sect. 10.2, we introduce the problem structure while also revisiting the setup of Sect. 5.2.2. We then present, in Sect. 10.3, structural results on optimal encoders, more precisely on optimal real-time coding of Markov sources when there is only one encoder. We study both fully observed and partially observed settings, as well as systems driven by control. Section 10.4 considers the existence of optimal quantization policies. In Sect. 10.5, we move to a decentralized setting, and show through a counterexample the difficulty one encounters in obtaining structural results for decentralized coding in the absence of a controlled Markov state construction, while providing a separation result when the source is memoryless. We discuss, in Sect. 10.6.1, the case of a partially observed Gaussian source and establish the optimality of a separation structure of estimation/filtering and quantization of the filtering output. We also investigate the optimal quantization and control problem for linear-quadratic-Gaussian (LQG) problems, and building on the developments in Chap. 4, we establish the existence of optimal quantizers and control policies. Finally, Sect. 10.7 considers the structure of optimal coding policies for the case with noisy channels and noiseless feedback. An appendix to the chapter includes proofs of the main results.

For background reading on Markov Decision Processes as well as for a review of the LQG control problem and Kalman filtering, we refer the reader to Appendix D.

2 Policies and Action Spaces for Encoding

We consider a typical causal encoding/quantization setup of the type introduced earlier in Chap. 5 (Sect. 5.2.2). For simplicity in exposition, but without much loss of conceptual generality, we consider the case of only two encoders and within this context introduce the causality and measurability constraints in quantizer design for decentralized systems.

Consider first a control-free partially observed Markov process, defined on a probability space, \((\Omega,\mathcal{F},P)\), and generated by the following scalar discrete-time equations for \(t \geq 0\):

$$\displaystyle\begin{array}{rcl} x_{t+1}& =& f(x_{t},w_{t}),{}\end{array}$$
(10.1)
$$\displaystyle\begin{array}{rcl} y_{t}^{i}& =& {g}^{i}(x_{ t},v_{t}^{i}),{}\end{array}$$
(10.2)

for (Borel) measurable functions f, g i, i = 1, 2, with \(\{w_{t},v_{t}^{i},i = 1,2\}\) zero-mean noise processes with finite second moments, which are independent across time and space. We further have \(x_{t} \in \mathbb{X}\), and \(y_{t}^{i} \in {\mathbb{Y}}^{i}\), where \(\mathbb{X}, {\mathbb{Y}}^{i}\) are Polish spaces. Let an encoder, Encoder i, be located at one end of a measurement channel characterized by (10.2), this being so for i = 1, 2. The encoders transmit their information to a receiver (see Fig. 10.1), over a discrete noiseless channel with finite capacity, and hence, they have to quantize their input.

Fig. 10.1 Partially observed source under a decentralized structure

We let, as before in Sect. 5.2.2, \({\Pi }^{comp,i}\) denote a composite quantization policy for Encoder i, defined as a sequence of functions \(\{Q_{t}^{comp,i},t \geq 0\}\) which are causal such that the quantization output at time t, \(q_{t}^{i}\), under \({\Pi }^{comp,i}\) is generated by a function of its local information, that is, a mapping measurable with respect to the sigma-algebra generated by

$$\displaystyle{I_{t}^{i} =\{ y_{ [0,t]}^{i},q_{ [0,t-1]}^{i},z_{ [0,t-1]}^{i}\},\quad t \geq 1,}$$

and \(I_{0}^{i} =\{ y_{0}^{i}\}\), with image space \(\mathcal{M}_{t}^{i}\), where \(\mathcal{M}_{t}^{i} :=\{ 1,2,\ldots,\vert \mathcal{M}_{t}^{i}\vert \}\), for \(0 \leq t \leq T - 1\) and i = 1, 2. Here \({z}^{i}\) denotes some additional side information available at Encoder i, such as feedback from the receiver.

Let \(\mathbb{I}_{t}^{i}\) denote the space \(I_{t}^{i}\) belongs to; hence

$$\displaystyle{Q_{t}^{comp,i} : \mathbb{I}_{ t}^{i} \rightarrow \mathcal{M}_{ t}^{i}.}$$

As discussed in Sect. 5.2.2, equivalently, we can express the policy Π comp, i as a composition of a quantization policy \({\underline{\gamma }}^{i}\) and a quantizer: A quantization policy of Encoder i, \({\underline{\gamma }}^{i}\), is a sequence of functions \(\{\gamma _{t}^{i}\}\), such that for each \(t \geq 0\), \(\gamma _{t}^{i}\) is a mapping from the information space \(\mathbb{I}_{t}^{i}\) to the space of quantizers \(\mathbb{Q}_{t}^{i}\). A quantizer is subsequently used to generate the quantizer output. Without any loss of generality, a quantizer action will be generated based on the common information at the encoders and the receiver, and the quantizer will map the relevant private information at the encoder to the quantization output. Let the information at the receiver at time t be \(I_{t}^{r} =\{ q_{[0,t-1]}^{1},q_{[0,t-1]}^{2}\}\), for \(t \geq 1\). Let the common information, under feedback, at the encoders and the receiver be \(I_{t}^{c}\). Thus, we can express any measurable composite quantization policy as

$$\displaystyle\begin{array}{rcl} Q_{t}^{comp,i}(I_{ t}^{i}) = (\gamma _{ t}^{i}(I_{ t}^{c}))(I_{ t}^{i} \setminus I_{ t}^{c}),& &{}\end{array}$$
(10.3)

mapping the information space to \(\mathcal{M}_{t}^{i}\).
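
The decomposition in (10.3) can be illustrated with a short sketch. The following Python fragment is only an illustration: the names (select_quantizer, common_info, private_info) and the particular selection rule are our own placeholders, not objects defined in the text; the point is the two-stage structure in which the common information selects a quantizer and the quantizer acts on the private information.

```python
def composite_policy(select_quantizer, common_info, private_info):
    """Sketch of (10.3): the policy picks a quantizer from the common
    information, and the quantizer maps the private information to q_t^i."""
    quantizer = select_quantizer(common_info)  # gamma_t^i(I_t^c)
    return quantizer(private_info)             # applied to the private part of I_t^i

# An arbitrary illustrative selection rule: the size of the common information
# (e.g., past quantizer outputs) sets a bin width for a uniform quantizer.
def select_quantizer(common_info):
    width = 1.0 / (1 + len(common_info))
    return lambda y: int(y // width)

q_t = composite_policy(select_quantizer, common_info=[0, 1], private_info=0.7)
```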

Viewing each encoder as an agent or a decision maker (DM), we let DMi have policy \({\underline{\gamma }}^{i}\) and under this policy generate quantizers \(\{Q_{t}^{i},t \geq 0\}\), \(Q_{t}^{i} \in \mathbb{Q}_{t}^{i}\). Under action \(Q_{t}^{i}\), the encoder generates \(q_{t}^{i}\), as the quantization output at time \(t\).

The receiver (or the controller), upon receiving the information from the encoders, generates its decision at time t, also causally: An admissible causal receiver policy is a sequence of measurable functions \({\underline{\gamma }}^{0} =\{\gamma _{ t}^{0}\}\) such that

$$\displaystyle{\gamma _{t}^{0} :\prod _{ s=0}^{t}\bigg(\mathcal{M}_{ s}^{1} \times \mathcal{M}_{ s}^{2}\bigg) \rightarrow \mathbb{U},\quad \quad t \geq 0,}$$

where \(\mathbb{U}\) denotes the decision set for the receiver.

With the above formulation, one typical objective functional for the decision makers would be the following:

$$\displaystyle\begin{array}{rcl} \inf _{{\mathbf{\Pi }}^{comp}}\inf _{{\underline{\gamma }}^{0}}E_{\nu _{0}}^{{\mathbf{\Pi }}^{comp},{\underline{\gamma }}^{0} }[\sum _{t=0}^{T-1}c(x_{ t},u_{t})],& &{}\end{array}$$
(10.4)

with initial condition distribution \(\nu _{0}\). Here \(c(\cdot,\cdot )\) is a nonnegative, measurable function and \(u_{t} =\gamma _{ t}^{0}(\mathbf{q}_{[0,t]})\) (with \(\mathbf{q} = ({q}^{1},{q}^{2})\)) for \(t \geq 0\).

Before concluding this section, it may be worth emphasizing the operational nature of causality, as different approaches could be adopted. The encoders at any given time can only use their local information to generate the quantization outputs. The receiver, at any given time, can only use its local information to generate its decision/estimate. These happen with zero delay, that is, if there is a common clock at the encoders and the receiver, the receiver at time t needs to make its decision before the realizations \(x_{t+1},y_{t+1}^{1},y_{t+1}^{2}\) have taken place. This corresponds to the zero-delay coding schemes of, for example, Witsenhausen [396] and Linder and Lugosi [236].

3 Single Terminal Case: Optimal Causal Coding of a Partially Observed Markov Source

3.1 Single Terminal, Fully Observed Case

We first consider the single-encoder, fully observed case: In this setup, (10.1)–(10.2) hold with one encoder, that is,

$$\displaystyle\begin{array}{rcl} x_{t+1} = f(x_{t},w_{t}),\quad \quad y_{t} = x_{t},t = 0,1,...\,.& &{}\end{array}$$
(10.5)

Let \(\mathcal{P}(\mathbb{X})\) denote the space of probability measures on \(\mathcal{B}(\mathbb{X})\) under the topology of weak convergence and define \(\pi _{t} \in \mathcal{P}(\mathbb{X})\) to be the regular conditional probability measure given by \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q_{[0,t-1]})\), that is,

$$\displaystyle{\pi _{t}(A) = P(x_{t} \in A\vert q_{[0,t-1]}),\quad A \in \mathcal{B}(\mathbb{X}).}$$

We first state the following theorem on the structure of optimal causal quantization policies, due to Witsenhausen [396].

Theorem 10.3.1 (Witsenhausen [396]). 

For system ( 10.5 ) and optimization problem ( 10.4 ), any composite quantization policy can be replaced, without any loss in performance, by one which only uses x t and q [0,t−1] at time \(t \geq 1\).

Proof.

See Sect. 10.8.1. □ 

The following result is essentially due to Walrand and Varaiya [385]; however, the form below is more general since the spaces considered are not necessarily finite.

Theorem 10.3.2 ([425]). 

For system ( 10.5 ) and optimization problem ( 10.4 ), any composite quantization policy can be replaced, without any loss in performance, by one which only uses the conditional probability measure \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q_{[0,t-1]})\), the state x t, and the time information t, at time \(t \geq 1\).

Proof.

See Sect. 10.8.2. □ 

Remark 10.3.1.

The difference between the structural results above is the following: In the setup of Theorem 10.3.1, the encoder’s memory space is not fixed and keeps expanding as the decision horizon in the optimization, T − 1, increases. In the setup of Theorem 10.3.2, the memory space of an optimal encoder is fixed. In general, the space of probability measures is a very large one; however, different quantization outputs may lead to the same conditional probability measure on the state process, leading to a reduction in the required memory. Furthermore, Theorem 10.3.2 allows one to apply the theory of Markov Decision Processes, an aspect which we will elaborate on further in this chapter.\(\diamond\)

As we observed in Remark 4.7.2, the set [see (4.10)]

$$\displaystyle{\Theta :=\{\zeta \in P({\mathbb{R}}^{n} \times \mathcal{M}) :\zeta = PQ,Q \in \mathcal{Q}\},}$$

(with \(\mathcal{Q}\) denoting the set of \(\vert \mathcal{M}\vert \)-cell quantizers) is the Borel measurable set of the extreme points of the set of probability measures on \({\mathbb{R}}^{n} \times \mathcal{M}\) with a fixed input marginal P. In view of this observation, and since the class of quantization policies which admit the structure suggested in Theorem 10.3.2 is an important one, we henceforth define

$$\displaystyle\begin{array}{rcl} & & \Pi _{W} :=\bigg\{ {\Pi }^{comp} =\{ Q_{ t}^{comp},t \geq 0\} : \quad \exists \gamma _{ t}^{1} : \mathcal{P}(\mathbb{X}) \rightarrow \mathcal{Q} \\ & &\quad \quad Q_{t}^{comp}(I_{ t}) = (\gamma _{t}^{1}(\pi _{ t}))(x_{t}),\forall I_{t}\bigg\}, {}\end{array}$$
(10.6)

to represent this class of policies. Here, the input measure is time varying and is given by \(\pi_t\).
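
For a finite-state Markov chain, a policy in Π W takes a concrete form, sketched below in Python. The transition matrix, the selection rule gamma, and the two-cell quantizer are illustrative assumptions of ours; the sketch only demonstrates the structure of Theorem 10.3.2, in which the belief π t is computed from the past outputs, a quantizer is selected as a function of π t , and the selected quantizer is applied to the realized state x t .

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # illustrative transition kernel P(x_{t+1} | x_t)

def gamma(pi):
    # any measurable map from beliefs to quantizers would do; here a fixed
    # two-cell quantizer separating the two states
    return lambda x: int(x)

def belief_update(pi, Q, q):
    # pi_{t+1}(x') = sum_x P(x_t = x | q_{[0,t]}) P(x' | x), where
    # P(x_t = x | q_{[0,t]}) is proportional to pi_t(x) 1{Q(x) = q}
    post = pi * np.array([1.0 if Q(x) == q else 0.0 for x in range(len(pi))])
    return (post / post.sum()) @ P

pi = np.array([0.5, 0.5])             # prior pi_0
for x_t in [0, 0, 1]:                 # a sample path of the source
    Q_t = gamma(pi)                   # quantizer generated from the belief
    q_t = Q_t(x_t)                    # quantizer applied to the state
    pi = belief_update(pi, Q_t, q_t)  # encoder and receiver both track pi_t
```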

3.2 Partially Observed Markov Source

We consider here the setup of (10.1)–(10.2) but with a single encoder. Thus, the system considered is a discrete-time scalar system described by

$$\displaystyle\begin{array}{rcl} x_{t+1} = f(x_{t},w_{t}),\quad \quad y_{t} = g(x_{t},v_{t}),t = 0,1,...& &{}\end{array}$$
(10.7)

where x t , \(\{w_{t},v_{t}\}\) are as introduced earlier. Let the quantizer, as described earlier, map its information to a finite set \(\mathcal{M}_{t}\). At any given time, the receiver generates a quantity u t as a function of its received information, that is, as a function of \(\{q_{0},q_{1},\ldots,q_{t}\}\). The goal is to obtain a solution to (10.4) subject to constraints on the number of quantizer bins in \(\mathcal{M}_{t}\) and the causality restriction in encoding and decoding.

Now, define \(\tilde{\pi }_{t} \in \mathcal{P}(\mathbb{X})\) to be the regular conditional probability measure (whose existence for every realization of observation variables follows from the fact that both the state and the observation spaces are Polish) given by \(P(dx_{t}\vert y_{[0,t]})\), that is,

$$\displaystyle{\tilde{\pi }_{t}(A) = P(x_{t} \in A\vert y_{[0,t]}),\quad A \in \mathcal{B}(\mathbb{X}).}$$

Under the topology of weak convergence for \(\mathcal{P}(\mathbb{X})\), \(\{\tilde{\pi }_{t}\}\) evolves according to a nonlinear filtering equation (see (10.40); see also [347]) and is itself a Markov process. Let us also define \(\Xi _{t} \in \mathcal{P}(\mathcal{P}(\mathbb{X}))\) as the regular conditional measure

$$\displaystyle{\Xi _{t}(A) = P(\tilde{\pi }_{t} \in A\vert q_{[0,t-1]}),\quad A \in \mathcal{B}(\mathcal{P}(\mathbb{X})).}$$

The following are the main results of this subsection.

Theorem 10.3.3 ([425]). 

For system ( 10.7 ) and optimization problem ( 10.4 ) with c bounded, any composite quantization policy can be replaced, without any loss in performance, by one which only uses \(\{\tilde{\pi }_{t},q_{[0,t-1]}\}\) as a sufficient statistic for \(t \geq 1\) . This can be expressed as a quantization policy which only uses q [0,t−1] to generate a quantizer, where the quantizer uses \(\tilde{\pi }_{t}\) to generate the quantization output at time \(t \geq 1\).

Proof.

See Sect. 10.8.3. □ 

Theorem 10.3.4 ([425]). 

For system ( 10.7 ) and optimization problem ( 10.4 ) with c bounded, any composite quantization policy can be replaced, without any loss in performance, by one which only uses \(\{\Xi _{t},\tilde{\pi }_{t},t\}\) for \(t \geq 1\) . This can be expressed as a quantization policy which only uses \(\{\Xi _{t},t\}\) to generate a quantizer, where the quantizer uses \(\tilde{\pi }_{t}\) to generate the quantization output at time \(t \geq 1\).

Proof.

See Sect. 10.8.4. □ 

A number of remarks are now in order.

Remark 10.3.2.

From the proof of Theorem 10.3.4, we will see that \((\Xi _{t},Q_{t})\) forms a controlled Markov chain. Defining the actions as the quantizers allows one to define a Markov Decision Problem with well-defined cost functions, and state and action spaces.\(\diamond\)

Remark 10.3.3.

The results above can be viewed as direct extensions of the ones in the previous subsection with perfect state measurements. In fact, once one recognizes the fact that \(\{\tilde{\pi }_{t}\}\) forms a Markov source and the cost function can be expressed as \(\tilde{c}(\tilde{\pi },u)\), for some function \(\tilde{c} : \mathcal{P}(\mathbb{X}) \times \mathbb{U} \rightarrow \mathbb{R}\), one could almost directly apply Theorems 10.3.1 and 10.3.2 to recover the structural results above. \(\diamond\)

The results of Theorems 10.3.3 and 10.3.4 are also generalizable to settings where (a) the source is Markov of order m > 0, (b) a finite delay d is allowed at the decoder, and (c) the observation process depends also on past source outputs in a sense described in (10.8) below. For these cases, we consider the following generalization of the source by expanding the state space.

Suppose that the partially observed source is such that either the source is Markov of order m or there is a finite delay d > 0 which is allowed at the decoder. Then we can augment the source to obtain \(z_{t} =\{ x_{[t-\max (d+1,m)+1,t]}\}\). Note that \(\{z_{t}\}\) is Markov. We can thus consider the following representation:

$$\displaystyle\begin{array}{rcl} z_{t+1} =\tilde{ f}(z_{t},\tilde{w}_{t}),\quad \quad y_{t} =\tilde{ g}(z_{t},\tilde{v}_{t}),& &{}\end{array}$$
(10.8)

for some \(\tilde{f},\tilde{g}\), and where \(z_{t} =\{ x_{[t-\max (d+1,m)+1,t]}\} \in {\mathbb{X}}^{\max (d+1,m)}\), and \(\tilde{w}_{t},\tilde{v}_{t}\) are mutually independent, i.i.d. processes.

Any per-stage cost function of the form \(c(x_{t},u_{t})\) can be written as \(\tilde{c}(z_{t},u_{t})\) for some \(\tilde{c}\). For the finite-delay case, the per-stage cost can further be specialized as \(\tilde{c}(x_{t-d},u_{t})\); for the Markov case with memory, it takes the form \(\tilde{c}(x_{[t-m+1,t]},u_{t})\).

Now, by replacing \(\mathbb{X}\) with \({\mathbb{X}}^{\max (d+1,m)}\), let \(\tilde{\pi }_{t} \in \mathcal{P}({\mathbb{X}}^{\max (d+1,m)})\) be given by

$$\displaystyle{\tilde{\pi }_{t}(A) = P(z_{t} \in A\vert y_{[0,t]}),\quad A \in \mathcal{B}({\mathbb{X}}^{\max (d+1,m)})}$$

and \(\Xi _{t} \in \mathcal{P}(\mathcal{P}({\mathbb{X}}^{\max (d+1,m)}))\) be the regular conditional measure defined by

$$\displaystyle{\Xi _{t}(A) = P(\tilde{\pi }_{t} \in A\vert q_{[0,t-1]}),\quad A \in \mathcal{B}(\mathcal{P}({\mathbb{X}}^{\max (d+1,m)})).}$$

Hence, we have the following result, which is a direct extension of Theorems 10.3.3 and 10.3.4. We assume that c is bounded.

Theorem 10.3.5.

Suppose that the partially observed source is such that either the source is Markov of order m or there is a finite delay d > 0 which is allowed at the decoder. With \(z_{t} =\{ x_{[t-\max (d+1,m)+1,t]}\}\) and y t generated by ( 10.8 ), the following holds:

  1. (i)

    Any (causal) composite quantization policy can be replaced, without any loss in performance, by one which only uses \(\{\tilde{\pi }_{t},q_{[0,t-1]}\}\) as a sufficient statistic for \(t \geq 1\) . This can be expressed as a quantization policy which only uses q [0,t−1] to generate a quantizer, where the quantizer uses \(\tilde{\pi }_{t}\) to generate the quantization output at time \(t \geq 1\).

  2. (ii)

    Any (causal) composite quantization policy can be replaced, without any loss in performance, by one which only uses \(\{\Xi _{t},\tilde{\pi }_{t},t\}\) for \(t \geq 1\) . This can be expressed as a quantization policy which only uses \(\{\Xi _{t},t\}\) to generate a quantizer, where the quantizer uses \(\tilde{\pi }_{t}\) to generate the quantization output at time \(t \geq 1\).

For a further case where the decoder’s memory is limited or imperfect, the results apply once the full information considered so far at the receiver is replaced with the limited one, under additional assumptions on how the decoder updates its memory (in particular, (10.42) in the proof of Theorem 10.3.4 does not apply in general). However, a result equivalent to Theorem 10.3.3 applies also in the limited-memory setting. Such memory settings have been considered in [248, 385, 396].

3.3 Structural Results for Systems with Control

Theorem 10.3.2 applies also for Markov sources driven by control. That is, instead of (10.1)–(10.2), consider a system described by the following equations:

$$\displaystyle\begin{array}{rcl} x_{t+1}& =& f(x_{t},u_{t},w_{t}), \\ y_{t}& =& x_{t}. {}\end{array}$$
(10.9)

Suppose that the goal is the minimization of (10.4), with the information restrictions stated in Sect. 10.2.

For this system, we have the following result (which extends the finite state-action space analysis in [250, 384]).

Theorem 10.3.6 ([423]). 

  1. (i)

    For system ( 10.9 ) and optimization problem ( 10.4 ), any composite quantization policy (with a given control policy) can be replaced, without any loss in performance, by one which only uses x t and q [0,t−1] at time \(t \geq 1\), while keeping the control policy unaltered. This can be expressed as a quantization policy which only uses q [0,t−1] to generate a quantizer, where the quantizer uses x t to generate the quantization output at time t.

  2. (ii)

    For system ( 10.9 ) and optimization problem ( 10.4 ), any composite quantization policy can be replaced, without any loss in performance, by one which only uses the conditional probability measure \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q_{[0,t-1]})\), the state x t, and the time information t, at time t. This can be expressed as a quantization policy which only uses \(\{\pi _{t},t\}\) to generate a quantizer, where the quantizer uses x t to generate the quantization output at time t.

Proof.

See Sect. 10.8.5. □ 

The result also applies to the partially observed case with the conditional probability replacing the state as in Theorem 10.3.4. The proof follows from those of Theorems 10.3.4 and 10.3.6.

4 Existence of Optimal Zero-Delay Quantizers

We now discuss the problem of existence of optimal composite quantization policies, given the structural results for a fully observed setting. We assume that the source to be quantized is an \({\mathbb{R}}^{n}\)-valued Markov source. The goal is to minimize the cost

$$\displaystyle{ J_{\pi _{0}}({\Pi }^{comp},{\underline{\gamma }}^{0},T) := E_{\pi _{ 0}}^{{\Pi }^{comp},{\underline{\gamma }}^{0} }{\biggl [\sum _{t=0}^{T-1}c(x_{ t},u_{t})\biggr ]}, }$$
(10.10)

for some \(T \geq 1\), where \(c : {\mathbb{R}}^{n} \times \mathbb{U} \rightarrow \mathbb{R}_{+}\) is a (measurable) stagewise cost function where \(\mathbb{U}\) is an action set.

We have the following assumptions on the source \(\{x_{t}\}\) and the cost function:

Assumption 10.4.1.

  1. (i)

    The evolution of the Markov source \(\{x_{t}\}\) in (10.1)–(10.2) is given by

    $$\displaystyle\begin{array}{rcl} x_{t+1}& =& f(x_{t}) + w_{t},\quad t \geq 0 \\ y_{t}& =& x_{t}, {}\end{array}$$
    (10.11)

    where \(\{w_{t},t \geq 0\}\) is an i.i.d. Gaussian noise sequence and \(f : {\mathbb{R}}^{n} \rightarrow {\mathbb{R}}^{n}\) is measurable and bounded.

  2. (ii)

    The cost function \(c : {\mathbb{R}}^{n} \times \mathbb{U} \rightarrow \mathbb{R}_{+}\) is continuous and bounded.

  3. (iii)

    The initial probability measure π 0 is Gaussian.

  4. (iv)

    \(\mathbb{U}\) is compact (the compactness condition will be relaxed for LQG problems in Sect.  10.6.3).

In this section, we will assume that the number of bins for the quantizers is constant for every time stage such that \(\vert \mathcal{M}_{t}\vert = M\) for all t. As discussed in Sect. 4.7, a quantizer Q with cells \(\{B_{1},\ldots,B_{M}\}\) can be characterized as a stochastic kernel Q from \({\mathbb{R}}^{n}\) to \(\{1,\ldots,M\}\) defined by

$$\displaystyle{Q(i\vert x) = 1_{\{x\in B_{i}\}},\quad i = 1,\ldots,M.}$$

We endow the quantizers with a topology induced by this stochastic kernel interpretation, as in Sect. 4.7. If P is a probability measure on \({\mathbb{R}}^{n}\) and Q is a stochastic kernel from \({\mathbb{R}}^{n}\) to \(\mathcal{M}\), then PQ denotes the resulting joint probability measure on \({\mathbb{R}}^{n} \times \mathcal{M}\). A quantizer sequence \(Q_{n}\) converges to Q weakly at P (\(Q_{n} \rightarrow Q\) weakly at P) if \(P{Q}^{n} \rightarrow PQ\) weakly. Similarly, \(Q_{n}\) converges to Q in total variation at P (\(Q_{n} \rightarrow Q\) at P in total variation) if \(P{Q}^{n} \rightarrow PQ\) in total variation.

Suppose we adopt a quantizer policy in Π W , that is, one admitting the form suggested by Theorem 10.3.2. The properties of conditional probability lead to the following expression for \(\pi _{t}(dx_{t}) = P(dx_{t}\vert q_{[0,t-1]})\) for \(t \geq 1\):

$$\displaystyle\begin{array}{rcl} \pi _{t}(dx_{t}) = \frac{\int _{x_{t-1}}\pi _{t-1}(dx_{t-1})P(q_{t-1}\vert \pi _{t-1},x_{t-1})P(dx_{t}\vert x_{t-1})} {\int _{x_{t-1}}\int _{x_{t}}\pi _{t-1}(dx_{t-1})P(q_{t-1}\vert \pi _{t-1},x_{t-1})P(dx_{t}\vert x_{t-1})}.& & {}\\ \end{array}$$

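On a discretized state space, this recursion takes a simple form. In the sketch below, the row-stochastic matrix K approximating \(P(dx_{t+1}\vert x_{t})\) on a finite grid and the representation of a quantizer by the array of cell indices of the grid points are both assumptions made for illustration only.

```python
import numpy as np

def pi_update(pi, cells, q, K):
    """One step of the recursion above: condition pi_t on {q_t = q}, then predict.

    pi:    weights of pi_t on the grid points
    cells: cells[j] = quantizer cell index of grid point j, so that
           P(q_t | pi_t, x_t) = 1{cells[x_t] = q_t}
    K:     row-stochastic matrix approximating P(dx_{t+1} | x_t) on the grid
    """
    post = pi * (cells == q)        # numerator: pi_t restricted to the selected cell
    return (post / post.sum()) @ K  # normalize, then propagate through the kernel
```
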
Let \(\mathcal{P}\) be the set of probability measures on \({\mathbb{R}}^{n}\) endowed with the topology of weak convergence. The following is a consequence of Theorem 10.3.4 and Remark 10.3.2.

Theorem 10.4.1.

The sequence of conditional measures and the sequence of quantizers, \((\pi _{t},Q_{t})\), form a joint Markov process in \(\mathcal{P}\times \mathcal{Q}\).

Now, under any quantization policy in Π W and for any T ≥ 1, by optimizing the receiver policy given a composite quantization policy in (10.10), we can define

$$\displaystyle{J_{\pi _{0}}({\Pi }^{comp},T) = E_{\pi _{ 0}}^{{\Pi }^{comp} }{\biggl [\sum _{t=0}^{T-1}\tilde{c}(\pi _{ t},Q_{t})\biggr ]},}$$

where, with \(B_{i} = Q_{t}^{-1}(i)\), \(i = 1,\ldots,M\), denoting the cells of Q t , we have

$$\displaystyle\begin{array}{rcl} & & \tilde{c}(\pi _{t},Q_{t}) \\ & & =\sum _{i\in \mathcal{M}}P(q_{t} = i\vert q_{[0,t-1]})\inf _{u\in \mathbb{U}}\bigg(\int P(dx_{t}\vert q_{[0,t-1]},q_{t} = i)c(x_{t},u)\bigg) \\ & & =\sum _{i\in \mathcal{M}}\inf _{u\in \mathbb{U}}\int _{B_{i}}\pi _{t}(dx)c(x,u). {}\end{array}$$
(10.12)
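
In the same discretized setting, the induced cost (10.12) can be evaluated cell by cell: for each cell the best receiver action is chosen, and the resulting costs are summed, with the cell probabilities absorbed into the restriction of π t to each cell. The grid, the quadratic cost, and the finite action set below are illustrative assumptions.

```python
import numpy as np

def tilde_c(pi, cells, grid, c, U):
    """Induced cost (10.12): sum over cells of inf_u of the cell-restricted cost."""
    total = 0.0
    for i in np.unique(cells):
        w = pi * (cells == i)                        # pi_t restricted to cell B_i
        total += min(np.sum(w * c(grid, u)) for u in U)
    return total

grid = np.linspace(-3.0, 3.0, 61)
pi = np.exp(-grid**2 / 2.0); pi /= pi.sum()          # discretized Gaussian belief
cells = (grid > 0).astype(int)                       # a two-cell quantizer
c = lambda x, u: (x - u) ** 2                        # illustrative quadratic cost
U = np.linspace(-3.0, 3.0, 121)                      # compact action set
print(tilde_c(pi, cells, grid, c, U))
```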

As in Sect. 4.7, we restrict the set of quantizers considered by only allowing quantizers having convex quantization bins (cells) \(B_{i}\), \(i = 1,\ldots,M\).

Assumption 10.4.2.

The quantizers have convex codecells with at most a given number of cells, that is, the quantizers live in \(\mathcal{Q}_{c}(M)\), the collection of k-cell quantizers with convex cells where \(1 \leq k \leq M\).

Let \(\Pi _{W}^{C}\) denote the set of all composite quantization policies Π W [defined in (10.6)] which in addition satisfy the condition that all quantizers Q t , t ≥ 0 have convex cells (i.e., \(Q_{t} \in \mathcal{Q}_{c}\) for all t ≥ 0).

We have the following result on the existence of optimal quantizers.

Theorem 10.4.2 ([437]). 

For any \(T \geq 1\) and arbitrary initial condition π 0, under Assumptions  10.4.1 and  10.4.2, there exists a policy in \(\Pi _{W}^{C}\) such that

$$\displaystyle{ \inf _{{\Pi }^{comp}\in \Pi _{W}^{C}}\inf _{{\gamma }^{0}}J_{\pi _{0}}({\Pi }^{comp}{,\gamma }^{0},T) }$$
(10.13)

is achieved. Letting \(J_{T}^{T}(\cdot ) = 0\) and

$$\displaystyle{J_{0}^{T}(\pi _{ 0}) :=\min _{{\Pi }^{comp}\in \Pi _{W}^{C}{,\gamma }^{0}}J_{\pi _{0}}({\Pi }^{comp}{,\gamma }^{0},T),}$$

the dynamic programming recursion

$$\displaystyle{ J_{t}^{T}(\pi _{t}) =\min _{Q_{t}\in \mathcal{Q}_{c}}\bigg(\tilde{c}(\pi _{t},Q_{t}) + E[J_{t+1}^{T}(\pi _{t+1})\vert \pi _{t},Q_{t}]\bigg) }$$
(10.14)

holds for all t = 0,1,…,T − 1.

Proof.

See Sect. 10.8.6. □ 
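
For small horizons and a small finite family of candidate quantizers, the recursion (10.14) can be evaluated by brute force, as the following sketch indicates (continuing the discretized example above, with pi_update and tilde_c as defined there; restricting to a finite quantizer family is a computational simplification of ours, not part of the theorem).

```python
def J(pi, t, T, Q_family, grid, c, U, K):
    """Backward recursion (10.14): J_t(pi_t) = min_Q ( c~(pi_t,Q) + E[J_{t+1}] )."""
    if t == T:
        return 0.0
    best = float("inf")
    for cells in Q_family:                          # candidate quantizers
        cost = tilde_c(pi, cells, grid, c, U)       # per-stage cost under (pi_t, Q)
        for i in np.unique(cells):
            p_i = float(np.sum(pi * (cells == i)))  # P(q_t = i | pi_t, Q_t)
            if p_i > 0.0:
                cost += p_i * J(pi_update(pi, cells, i, K), t + 1, T,
                                Q_family, grid, c, U, K)
        best = min(best, cost)
    return best
```

Since the belief trajectories branch with every quantizer output, the number of evaluations grows exponentially in T; the sketch is only meant to make the recursion concrete.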

5 Multiterminal (Decentralized) Setting

5.1 Memoryless Sources

Let us first consider a special, but important, case of (10.1)–(10.2) when \(\{x_{t},t \geq 0\}\) is an i.i.d. sequence. Further, suppose that the observations are generated by

$$\displaystyle\begin{array}{rcl} y_{t}^{i}& =& {g}^{i}(x_{ t},v_{t}^{i}),{}\end{array}$$
(10.15)

for measurable functions g i, i = 1, 2, with \(\{v_{t}^{1},v_{t}^{2}\}\) (across time) an i.i.d. noise process. We do not require that \(v_{t}^{1}\) and \(v_{t}^{2}\) are independent for a given t. We note that the results presented here are also applicable when the process \(\{v_{t}^{1},v_{t}^{2}\}\) is only independent (across time), but not necessarily identically distributed. One difference with the general setup considered earlier is that we require the observation spaces \({\mathbb{Y}}^{i},i = 1,2\), to be finite spaces; \(\mathbb{X}\) is Polish.

Suppose the goal is again the minimization

$$\displaystyle\begin{array}{rcl} \inf _{{\mathbf{\Pi }}^{comp}}\inf _{{\underline{\gamma }}^{0}}E_{\nu _{0}}^{{\mathbf{\Pi }}^{comp},{\underline{\gamma }}^{0} }[\sum _{t=0}^{T-1}c(x_{ t},u_{t})].& &{}\end{array}$$
(10.16)

Toward this end, we introduce the class of nonstationary memoryless team policies, given by

$$\displaystyle\begin{array}{rcl} & & {\Pi }^{NSM} :=\bigg\{{ \mathbf{\Pi }}^{comp} : P(\mathbf{q}_{ t}\vert \mathbf{y}_{[0,t]}) = P(q_{t}^{1}\vert y_{ t}^{1},t)P(q_{ t}^{2}\vert y_{ t}^{2},t) \\ & & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad = 1_{\{q_{t}^{1}=Q_{t}^{1}(y_{t}^{1})\}}1_{\{q_{t}^{2}=Q_{t}^{2}(y_{t}^{2})\}}, \\ & & \quad \quad \quad \quad \quad \quad Q_{t}^{1} : {\mathbb{Y}}^{1} \rightarrow \mathcal{M}_{ t}^{1},\quad \quad Q_{ t}^{2} : {\mathbb{Y}}^{2} \rightarrow \mathcal{M}_{ t}^{2},\quad t \geq 0\bigg\}, {}\end{array}$$
(10.17)

where \(\{Q_{t}^{1},Q_{t}^{2}\}\) are arbitrary measurable functions.

Theorem 10.5.1 ([425]). 

Consider the minimization problem of ( 10.16 ). An optimal composite quantization policy over all causal policies exists, and it is an element of \({\Pi }^{NSM}\).

Proof.

See Sect. 10.8.7. □ 

The result says that an optimal composite quantization policy admits the product form of a nonstationary memoryless team policy: it ignores the past observations and past quantization outputs without any loss of optimality. We note that this result applies also to the case when the source is memoryless, but not necessarily i.i.d.

Remark 10.5.1.

If there is an entropy constraint on the quantizer outputs, memory in the encoders might be useful for finite horizon problems as it provides common randomness, which cannot be achieved by time-sharing in a finite horizon problem. Neuhoff and Gilbert [292] noted that randomization of two scalar quantizers (operationally achievable through time-sharing) is optimal in causal coding of an i.i.d. source subject to entropy constraints. On the other hand, for the zero-delay setting, when one considers the distortion minimization problem subject to an entropy constraint, György and Linder [184] observed that the distortion-entropy curve is non-convex (leading to a benefit of common randomness which can be used to expand the set of achievable rate and distortion pairs) as we elaborated on in Sect. 5.4. \(\diamond\)

5.2 Markov Sources: Nonclassical Information Structure and a Counterexample Under Signaling

We now consider general Markov sources and show that a separation result of the type seen in the single-terminal case may not hold when there are multiple terminals.

We have the following (negative) result for the two-encoder setup, where the encoders have access to the feedback from the receiver (Fig. 10.1).

Proposition 10.5.1 ([425]). 

Consider the setup in ( 10.1 )–( 10.2 ), and let \(\tilde{\pi }_{t}^{i}(A) = P(x_{t} \in A\vert y_{[0,t]}^{i})\), i = 1,2, and \(A \in \mathcal{B}(\mathbb{X})\) . An optimal composite quantization policy cannot, in general, be replaced by a policy which uses only \(\{\mathbf{q}_{[0,t-1]},\tilde{\pi }_{t}^{i}\}\) to generate \(q_{t}^{i}\) for i = 1,2.

Proof.

It suffices to produce an instance where an optimal policy cannot admit the separated structure. Toward this end, let \(z_{1},z_{2},z_{3}\) be uniformly distributed, independent, binary numbers; and let \(x_{0},x_{1}\) be defined by

$$\displaystyle{x_{0} = \left [\begin{array}{llll} z_{1} & z_{2} & 0&0 \end{array} \right ]^{\prime},\quad x_{1} = \left [\begin{array}{llll} 0&0&z_{2} & z_{3} \end{array} \right ]^{\prime},}$$

such that \(x_{0}(1) = z_{1},x_{0}(2) = z_{2},x_{0}(3) = x_{0}(4) = 0\). Let the observations be given as follows:

$$\displaystyle{y_{t}^{1} = {g}^{1}(x_{ t}) = x_{t}(1) \oplus x_{t}(3) \oplus x_{t}(4),\quad \quad y_{t}^{2} = {g}^{2}(x_{ t}) = x_{t}(1) \oplus x_{t}(2),\quad t = 0,1.}$$

That is,

$$\displaystyle{y_{0}^{1} = \left [\begin{array}{l} z_{1}\end{array} \right ],\quad \quad y_{ 0}^{2} = \left [\begin{array}{l} z_{ 1} \oplus z_{2} \end{array} \right ],}$$

where ⊕ denotes the exclusive-or (XOR) operation, and

$$\displaystyle{y_{1}^{1} = \left [\begin{array}{l} z_{2} \oplus z_{3} \end{array} \right ],\quad \quad y_{1}^{2} = \left [\begin{array}{l} 0 \end{array} \right ].}$$

Let the cost be

$$\displaystyle{E\bigg[{(x_{0}(4) - E[x_{0}(4)\vert \mathbf{q}_{[0]}])}^{2} + {(x_{ 1}(4) - E[x_{1}(4)\vert \mathbf{q}_{[0,1]}])}^{2}\bigg].}$$

That is, the cost is \(E[{(z_{3} - E[z_{3}\vert \mathbf{q}_{[0,1]}])}^{2}]\), where \(q_{t}^{i}\) are the information bits sent to the decoder for t = 0 and 1.

We further restrict the information rates to satisfy \(\vert \mathcal{M}_{0}^{1}\vert = \vert \mathcal{M}_{1}^{1}\vert = \vert \mathcal{M}_{1}^{2}\vert = 2\), \(\vert \mathcal{M}_{0}^{2}\vert = 1\). That is, encoder 2 may only send information at time t = 1.

Under arbitrary causal composite quantization policies, a cost of zero can be achieved as follows: if encoder 1 sends the value \(z_{1}\) to the receiver at time 0 and, at time 1, encoder 1 transmits \(z_{2} \oplus z_{3}\) while encoder 2 transmits \(z_{2}\) (or \(z_{1} \oplus z_{2}\)), the receiver can uniquely identify the value of \(z_{3}\), for every realization of the random variables.

For such a source, an optimal composite policy cannot be written in the separated form; that is, an optimal policy of encoder 2 at time 1 cannot be written as \(h_{1}(\mathbf{q}_{0},\tilde{\pi }_{1}^{2})\) for some measurable function \(h_{1}\). To see this, note the following: The conditional distribution of \(x_{1}\) at encoder 2 at time 1 is such that the conditional measure on \((z_{2},z_{3})\) is uniform and independent, that is, \(P(z_{2} = a,z_{3} = b\vert z_{1} \oplus z_{2}) = 1/4\) for all values of a, b. If a policy of the structure of \(h_{1}\) is adopted, then it is not possible for encoder 2 to recall its past observation to extract the value of \(z_{2}\). This is because \(\tilde{\pi }_{1}^{2}\) will be a distribution only on \(z_{2}\) and \(z_{3}\), which will be uniform and independent, given \(z_{1} \oplus z_{2}\). Thus, the information \(y_{0}^{2}\) will not be available in the memory, and the receiver will have access to at most \(z_{2} \oplus z_{3}\), \(z_{1}\), and \(P(z_{2},z_{3}\vert z_{1} \oplus z_{2})\) (the last variable containing no useful information). The optimal estimator will then be \(E[z_{3}] = 1/2\), leading to a cost of 1 ∕ 4. □ 
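
The argument can be checked numerically by enumerating the eight realizations of \((z_{1},z_{2},z_{3})\). The following sketch compares the memory-based scheme of the proof with the best the separated structure can achieve:

```python
from itertools import product

err_mem = err_sep = 0.0
for z1, z2, z3 in product((0, 1), repeat=3):
    # with memory: q_0^1 = z1; at t = 1, encoder 1 sends z2 XOR z3 and
    # encoder 2 sends its remembered observation y_0^2 = z1 XOR z2
    q0, q11, q12 = z1, z2 ^ z3, z1 ^ z2
    z3_hat = q11 ^ (q0 ^ q12)         # receiver recovers z2, then z3
    err_mem += (z3 - z3_hat) ** 2 / 8.0
    # separated structure: encoder 2's belief on (z2, z3) at t = 1 is uniform,
    # its transmission is uninformative, and the best estimate is E[z3] = 1/2
    err_sep += (z3 - 0.5) ** 2 / 8.0

print(err_mem, err_sep)               # 0.0 and 0.25, matching the proof
```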

5.2.1 Discussion: Connections with Team Decision Theory

Here, we interpret the results of this section in view of optimization for dynamic teams. With the characterization of information structures for dynamic teams provided in Chap. 3, every lossy coding problem is nonclassical, since a receiver cannot recover the information available at the encoder fully, while its information is clearly affected by the coding policy of the encoder. However, in an encoding problem, the problem itself is the transmission of information. We suggest the following: Signaling in a coding problem is the policy of an encoder to use the quantizers/encoding functions to transmit a message to other decision makers or to itself to be used in future stages, through the information sent to the receiver.

We have seen in Chap. 3 that in decentralized decision-making problems, when the information structure is nonclassical, the decision makers might benefit from communicating via their control actions, that is, by signaling. We also note that, in the information theory literature, signaling has been employed in coding for multiple-access channels with feedback, where active information transmission allows for coordination between encoders (see [82, 102, 378]).

The reason for the negative conclusion in Proposition 10.5.1 is that in general for an optimal policy,

$$\displaystyle\begin{array}{rcl} P(q_{t}^{i}\vert \pi _{ t}^{i},\mathbf{q}_{ [0,t-1]},y_{[0,t-1]}^{i})\neq P(q_{ t}^{i}\vert \pi _{ t}^{i},\mathbf{q}_{ [0,t-1]}),& &{}\end{array}$$
(10.18)

when the encoders have engaged in signaling (in contrast with what we will have in the proof of the separation results). The encoders may benefit from using the received past observation variables explicitly.

As we will discuss in detail in Chap. 12, separation results for such dynamic team problems typically require information sharing between the encoders (decision makers), where the shared information is used to establish a sufficient statistic living in a fixed state space and admitting a controlled Markov recursion (hence, such a sufficient statistic can serve as a state for the decentralized system). For the proof of Theorem 10.3.4, we see that Ξ t forms such a state. For the proof of Theorem 10.5.1, we see that information sharing is not needed for the encoders to agree on a sufficient statistic, since the source considered is memoryless. Furthermore, for the multiterminal setting with a Markov source, a careful analysis of the proof of Theorem 10.5.1 reveals that if the encoders agree on \(P(dx_{t}\vert \mathbf{y}_{[0,t-1]})\) through sharing their beliefs for all t ≥ 1, then a separation result involving this joint belief can be obtained. See Chap. 12 for further discussion on this topic and on the belief sharing information pattern.

6 Simultaneous Optimization of LQG Coding and Control Policies: Optimal Quantization and Control

In this section, we consider an important application of the results presented so far. We study an LQG setup, where a sensor encodes its noisy information to a controller/estimator. First, we discuss the case without control; the case with control is considered subsequently.

6.1 Application to the LQG Setup: Separation of Estimation and Quantization

Consider a control-free LQG setup, where a sensor is connected to an estimator over a discrete noiseless channel. Let \(x_{t} \in {\mathbb{R}}^{n},y_{t} \in {\mathbb{R}}^{m}\), and the evolution of the source be given by

$$\displaystyle\begin{array}{rcl} x_{t+1}& =& Ax_{t} + w_{t}, \\ y_{t}& =& Cx_{t} + v_{t},{}\end{array}$$
(10.19)

where \(\{w_{t}\}\) and \(\{v_{t}\}\) are mutually independent, zero-mean i.i.d. Gaussian noise sequences with \(E[w_{t}w_{t}^{\prime}] =: W,E[v_{t}v_{t}^{\prime}] =: V\), and A, C are matrices of appropriate dimensions. The goal is to obtain a solution to the minimization problem

$$\displaystyle\begin{array}{rcl} \inf _{{\Pi }^{comp}}\inf _{{\underline{\gamma }}^{0}}E_{\nu _{0}}^{{\Pi }^{comp},{\underline{\gamma }}^{0} }[\sum _{t=0}^{T-1}(x_{ t} - u_{t})^{\prime}Q(x_{t} - u_{t})],& &{}\end{array}$$
(10.20)

with \(\nu _{0}\) denoting a Gaussian distribution for the zero-mean initial state, and Q > 0 a positive-definite matrix.

The conditional distribution \(\tilde{\pi }_{t}(\cdot ) = P(x_{t} \in \cdot \vert y_{[0,t]})\) is Gaussian for all time stages and hence is uniquely characterized by its mean and covariance matrix; thus \(\tilde{\pi }_{t}\) can be represented by an element of \({\mathbb{R}}^{\frac{{n}^{2}+3n} {2} }\). Furthermore, the nonlinear filter equation described in (D.4) admits a simpler recursion known as the Kalman filter (see Sect. D.2). We have the following result (see Fig. 10.2).

Fig. 10.2 Separation of estimation and quantization: When the source is Gaussian, generated by the linear system (10.19), the cost is quadratic, and the observation channel is Gaussian, the separated structure of the encoder above is optimal. That is, first the encoder runs a Kalman filter and then causally encodes its estimate

Theorem 10.6.1.

For the minimization of the cost in ( 10.20 ), any composite quantization policy can be replaced, without any loss in performance, by one which only uses the output of the Kalman filter and the information available at the receiver.

Proof.

The result can be proven by considering a direct approach, rather than as an application of Theorems 10.3.3 and 10.3.4 (which require bounded costs; however, this assumption can be relaxed for this case), exploiting the specific quadratic nature of the problem. Let, again, \(x_{t} \in {\mathbb{R}}^{n}\) and \(\vert \cdot \vert _{Q}\) denote the norm generated by an inner product of the form \(\langle x,y\rangle _{Q} = {x}^{T}Qy\) for \(x,y \in {\mathbb{R}}^{n}\) for positive-definite Q > 0. The projection theorem for Hilbert spaces implies that the random variable \(x_{t} - E[x_{t}\vert y_{[0,t]}]\) is orthogonal (see Sect. B.4) to the random variables \(\{y_{[0,t]},q_{[0,t]}\}\), where q [0, t] is included due to the Markov chain condition that \(P(dx_{t}\vert y_{[0,t]},q_{[0,t]}) = P(dx_{t}\vert y_{[0,t]})\). We thus obtain the following identity:

$$\displaystyle\begin{array}{rcl} & & E[\vert x_{t} - E[x_{t}\vert q_{[0,t]}]\vert _{Q}^{2}] = E[\vert x_{ t} - E[x_{t}\vert y_{[0,t]}]\vert _{Q}^{2}] \\ & & \quad \quad + E\bigg[\bigg\vert E[x_{t}\vert y_{[0,t]}] - E\bigg[E[x_{t}\vert y_{[0,t]}]\bigg\vert q_{[0,t]}\bigg]\bigg\vert _{Q}^{2}\bigg].{}\end{array}$$
(10.21)

The second term is to be minimized through the choice of the quantizers. Hence, the term \(\bar{m}_{t} := E[x_{t}\vert y_{[0,t]}]\), which is computed through a Kalman filter, is to be quantized (see Fig. 10.2). Recall that by the Kalman filter (see Sect. D.2), with

$$\displaystyle{\Sigma _{0\vert -1} = E[x_{0}x_{0}^{\prime}]}$$

and for \(t \geq 0\)

$$\displaystyle\begin{array}{rcl} & & \Sigma _{t+1\vert t} = A\Sigma _{t\vert t-1}A^{\prime} + W - (A\Sigma _{t\vert t-1}C^{\prime}){(C\Sigma _{t\vert t-1}C^{\prime} + V )}^{-1}(C\Sigma _{ t\vert t-1}A^{\prime}), {}\\ \end{array}$$

the following recursion holds for \(t \geq 0\) and with \(\bar{m}_{-1} = 0\):

$$\displaystyle\begin{array}{rcl} & & \bar{m}_{t} = A\bar{m}_{t-1} + \Sigma _{t\vert t-1}C^{\prime}{(C\Sigma _{t\vert t-1}C^{\prime} + V )}^{-1}(CA(x_{ t-1} -\bar{ m}_{t-1}) + v_{t}). {}\\ \end{array}$$

Thus, the pair \((\bar{m}_{t},\Sigma _{t\vert t-1})\) is a Markov source, where the evolution of Σ t | t − 1 is deterministic. Even though the cost to be minimized is not bounded, since \(\bar{m}_{t}\) itself is a fully observed process, Theorem 10.3.1 can be used to develop the structural result that any causal encoder can be replaced with one which uses \((\bar{m}_{t},\Sigma _{t\vert t-1})\) and the past quantization outputs. Likewise, the proof of Theorem 10.3.2 shows that, for the fully observed Markov source \((\bar{m}_{t},\Sigma _{t\vert t-1})\), any causal coder can be replaced with one which only uses the conditional probability on \(\bar{m}_{t}\) and the realization \((\bar{m}_{t},\Sigma _{t\vert t-1},t)\) at time t. □ 

6.2 Optimal LQG Coding and Control Policies and Separation Results

Here, we consider an LQG setup with control, where a sensor encodes its information to a controller. Let \(x_{t} \in {\mathbb{R}}^{n}\) and the evolution of the system be given by the following:

$$\displaystyle\begin{array}{rcl} x_{t+1}& =& Ax_{t} + Bu_{t} + w_{t}, \\ y_{t}& =& x_{t}. {}\end{array}$$
(10.22)

Here, \(\{w_{t}\}\) is an i.i.d. zero-mean Gaussian noise sequence, \(\{u_{t}\}\) is an \({\mathbb{R}}^{m}\)-valued control action sequence, and A, B are matrices of appropriate dimensions. We assume that the initial state distribution is also Gaussian, denoted by ν 0.

As depicted in Fig. 10.3, we will follow the framework of Sect. 10.2 (in particular, see Theorem 10.3.6).

Fig. 10.3 Joint LQG optimal design of coding and control

Suppose that the goal is the computation of

$$\displaystyle\begin{array}{rcl} \inf _{{\Pi }^{comp}}\inf _{{\underline{\gamma }}^{0}}J({\Pi }^{comp},{\underline{\gamma }}^{0},T),& &{}\end{array}$$
(10.23)

where

$$\displaystyle{J({\Pi }^{comp},{\underline{\gamma }}^{0},T) := \frac{1} {T}E_{\nu _{0}}^{{\Pi }^{comp},{\underline{\gamma }}^{0} }[\sum _{t=0}^{T-1}x_{ t}^{\prime}Qx_{t} + u_{t}^{\prime}Ru_{t}].}$$

Here, Q ≥ 0 is a positive semi-definite matrix and R > 0 is a positive-definite matrix.

6.2.1 Separation of Estimation Error and Control and Dual Effect

We first note that, by Theorem 10.3.6, an optimal composite quantization policy will be within the class Π W .

Toward a solution, we adopt a dynamic programming approach and establish that the optimal controller is linear in its estimate [225]. This holds trivially for the terminal time stage; that it also holds for the earlier time stages follows from dynamic programming, as we show in the following.

First consider the terminal time t = T − 1. For this time stage, to minimize \(E[x_{t}^{\prime}Qx_{t} + u_{t}^{\prime}Ru_{t}]\), the optimal control is \(u_{T-1} = 0\) a.s.

To obtain a solution for t = T − 2, we look for a solution to

$$\displaystyle{\min _{\gamma _{t}^{0}}E\bigg[\bigg(x_{t}^{\prime}Qx_{t} + u^{\prime}_{t}Ru_{t} + E[(Ax_{t} + Bu_{t} + w_{t})^{\prime}Q(Ax_{t} + Bu_{t} + w_{t})\vert \mathcal{I}_{t}^{c},u_{ t}]\bigg)\bigg\vert \mathcal{I}_{t}^{c}\bigg].}$$

By completing the squares and using the orthogonality principle (see Sect. B.4), we obtain that the optimal policy is linear and is given by

$$\displaystyle{u_{T-2} = L_{T-2}E[x_{T-2}\vert q_{[0,T-2]}],}$$

with

$$\displaystyle{L_{T-2} = -{(R + B^{\prime}QB)}^{-1}B^{\prime}QA.}$$

For t < T − 2, to obtain the solutions, we will first establish that the estimation errors are uncorrelated. Toward this end, define for \(1 \leq t \leq T - 1\):

$$\displaystyle{\mathcal{I}_{t}^{c} =\{ q_{ [0,t]},u_{[0,t-1]}\},}$$

and note that

$$\displaystyle{\tilde{m}_{t+1} := E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] = E[Ax_{ t} + Bu_{t} + w_{t}\vert \mathcal{I}_{t+1}^{c}].}$$

It then follows that

$$\displaystyle\begin{array}{rcl} \tilde{m}_{t+1}& =& E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] \\ & =& E[x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t}^{c}] + E[x_{ t+1}\vert \mathcal{I}_{t}^{c}]\vert \mathcal{I}_{ t+1}^{c}] \\ & =& E[x_{t+1}\vert \mathcal{I}_{t}^{c}] + E[x_{ t+1} - E[x_{t+1}\vert \mathcal{I}_{t}^{c}]\vert \mathcal{I}_{ t+1}^{c}] \\ & =& E[Ax_{t} + Bu_{t} + w_{t}\vert \mathcal{I}_{t}^{c}] + E[x_{ t+1} - E[x_{t+1}\vert \mathcal{I}_{t}^{c}]\vert \mathcal{I}_{ t+1}^{c}] \\ & =& A\tilde{m}_{t} + Bu_{t} +\bigg (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - E[x_{ t+1}\vert \mathcal{I}_{t}^{c}]\bigg) \\ & =& A\tilde{m}_{t} + Bu_{t} +\bar{ w}_{t}, {}\end{array}$$
(10.24)

with

$$\displaystyle\begin{array}{rcl} \bar{w}_{t} = (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - E[x_{ t+1}\vert \mathcal{I}_{t}^{c}]).& &{}\end{array}$$
(10.25)

Now, \(\bar{w}_{t}\) is orthogonal to the control action variable \(u_{t}\): control actions are determined by the past quantizer outputs, and iterated expectations show that, conditioned on \(\mathcal{I}_{t}^{c}\), \(\bar{w}_{t}\) is zero mean and orthogonal to \(\mathcal{I}_{t}^{c}\).

Now, going to the earlier time stages, the dynamic programming recursion for linear systems driven by an uncorrelated noise process would normally apply, since the estimate process \(\{\tilde{m}_{t}\}\) is driven by an uncorrelated (though not necessarily independent) noise process \(E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - E[x_{t+1}\vert \mathcal{I}_{t}^{c}]\). However, this lack of independence may be important. Using the completion-of-squares method, we can establish that the optimal controller at time t is linear, provided that the random variable \(\bar{w}_{t}^{\prime}Q\bar{w}_{t}\) does not depend on \(u_{k},k \leq t\), under any policy. A sufficient condition for this is that the encoder is a predictive one. We state this formally as follows (see [283] for a similar, but not identical, construction):

Definition 10.6.1.

A predictive quantizer policy is one in which the quantizer, at every time stage t, subtracts the effect of the past control terms; that is, at time t it has the form \(Q_{t}(x_{t} -\sum _{k=0}^{t-1}{A}^{t-k-1}Bu_{k})\), and the past control terms are added back at the receiver. Hence, the encoder quantizes a control-free process, defined by

$$\displaystyle{\bar{x}_{t+1} = A\bar{x}_{t} + w_{t},}$$

and the receiver generates the quantized estimate and adds \(\sum _{k=0}^{t-1}{A}^{t-k-1}Bu_{k}\) to compute the estimate of the state at time t. ◇
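
A minimal sketch of a predictive quantizer, with the accumulated control effect \(\sum _{k=0}^{t-1}{A}^{t-k-1}Bu_{k}\) maintained recursively, is given below; the quantizer Q and the reconstruction map are placeholders of ours:

```python
import numpy as np

def control_effect(A, B, controls):
    # s_t = sum_{k=0}^{t-1} A^{t-k-1} B u_k, via the recursion s <- A s + B u
    s = np.zeros(A.shape[0])
    for u in controls:
        s = A @ s + B @ u
    return s

def predictive_encode(Q, x_t, A, B, controls):
    return Q(x_t - control_effect(A, B, controls))  # quantize the control-free part

def predictive_decode(reconstruct, q_t, A, B, controls):
    return reconstruct(q_t) + control_effect(A, B, controls)  # add controls back
```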

A predictive encoder is depicted in Fig. 10.4. We have the following key lemma.

Fig. 10.4 For the LQG problem, a predictive encoder is without loss

Lemma 10.6.1 ([423]). 

For problem ( 10.23 ), for any quantizer policy in class Π W (which is without any loss as a result of Theorem  10.3.6 ), there exists a quantizer which satisfies the form of a predictive quantizer (see Definition  10.6.1 ) and attains the same performance under an optimal control policy.

Proof.

See Sect. 10.8.8. □ 

Remark 10.6.1.

We note that the structure in Definition 10.6.1 separates the estimation process from the control process in the sense that the estimation errors are independent of the control actions or policies. Hence, there is no dual effect of the control actions, in the sense that the estimation error at any given time is independent of the past applied control actions.\(\diamond\)

As a consequence of the lack of dual effect, the cost function becomes

$$\displaystyle{J({\Pi }^{comp},{\underline{\gamma }}^{0},T) := \frac{1} {T}E_{\nu _{0}}^{{\Pi }^{comp},{\underline{\gamma }}^{0} }[\sum _{t=0}^{T-1}\tilde{m}_{t}^{\prime}Q\tilde{m}_{t}+u_{t}^{\prime}Ru_{t}+(x_{t}-\tilde{m}_{t})^{\prime}Q(x_{t}-\tilde{m}_{t})].}$$

Theorem 10.6.2.

For the minimization problem ( 10.23 ), the optimal control policy is given by \(u_{t} = L_{t}E[x_{t}\vert q_{[0,t]}]\), where

$$\displaystyle{L_{t} = -{(R + B^{\prime}K_{t+1}B)}^{-1}B^{\prime}K_{t+1}A,}$$

and

$$\displaystyle{P_{t} = A^{\prime}K_{t+1}B{(R + B^{\prime}K_{t+1}B)}^{-1}B^{\prime}K_{t+1}A,}$$
$$\displaystyle{K_{t} = A^{\prime}K_{t+1}A - P_{t} + Q,}$$

with \(K_{T} = P_{T-1} = 0\).
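
The gains of Theorem 10.6.2 are computed by a standard backward Riccati recursion; a sketch with the time-invariant A, B of (10.22) is given below. Note that \(K_{T} = 0\) yields \(L_{T-1} = 0\), consistent with \(u_{T-1} = 0\).

```python
import numpy as np

def lqg_gains(A, B, Q, R, T):
    """Backward recursion of Theorem 10.6.2: returns the gains [L_0, ..., L_{T-1}]."""
    K = np.zeros(A.shape)                    # K_T = 0
    gains = [None] * T
    for t in range(T - 1, -1, -1):
        M = np.linalg.inv(R + B.T @ K @ B)
        gains[t] = -M @ B.T @ K @ A          # L_t
        P = A.T @ K @ B @ M @ B.T @ K @ A    # P_t
        K = A.T @ K @ A - P + Q              # K_t
    return gains                             # u_t = gains[t] @ E[x_t | q_{[0,t]}]
```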

Given the optimal control policy, the following result is obtained after some analysis.

Theorem 10.6.3.

For the minimization problem ( 10.23 ), under an optimal control policy, the optimal cost is given by \(\frac{1} {T}J_{0}({\Pi }^{comp},T)\), where

$$\displaystyle\begin{array}{rcl} & & J_{0}({\Pi }^{comp},T)=E[x_{ 0}^{\prime}K_{0}x_{0}]+E[(x_{0}-E[x_{0}\vert \mathcal{I}_{0}^{c}])^{\prime}(Q+A^{\prime}K_{ 1}A)(x_{0}-E[x_{0}\vert \mathcal{I}_{0}^{c}])] \\ & & \quad \quad +\sum _{ t=1}^{T-1}E[(x_{ t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])^{\prime}(Q + A^{\prime}K_{ t+1}A - K_{t})(x_{t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])] \\ & & \quad \quad +\sum _{ t=0}^{T-1}E[w^{\prime}_{ t}K_{t+1}w_{t}]. {}\end{array}$$
(10.26)

Proof.

See Sect. 10.8.9. □ 

We have thus established the solution to the optimal control problem. We address the optimal quantization problem in the following subsection.

6.3 Existence of Optimal Quantization Policies

Now that we have separated the costs due to control and quantization, under any such composite policy and \(T \in \mathbb{N}\), we can define a cost to be minimized by a composite quantizer policy as

$$\displaystyle{J({\Pi }^{comp},T) = E_{\nu _{ 0}}^{{\Pi }^{comp} }[ \frac{1} {T}(x_{0}^{\prime}K_{0}x_{0} +\sum _{ t=0}^{T-1}c_{ t}(\pi _{t},Q_{t}))],}$$

where

$$\displaystyle{c_{t}(\pi _{t},Q_{t}) =\sum _{i\in \mathcal{M}}\inf _{\gamma _{t}^{0}(i)}\int _{{\mathbb{R}}^{n}}1_{\{q_{t}=i\}}\pi _{t}(d\bar{x})(\bar{x}_{t} -\gamma _{t}^{0}(i))^{\prime}P_{ t}(\bar{x}_{t} -\gamma _{t}^{0}(i)),}$$

where now \({\underline{\gamma }}^{0} =\{\gamma _{ t}^{0},t \geq 0\}\) denotes a receiver policy and \(P_{t} = (Q + A^{\prime}K_{t+1}A - K_{t})\), by (10.59) and \(P_{0} = Q + A^{\prime}K_{1}A\).

We note that here the process \(\bar{x}_{t}\) is the control-free process given by \(\bar{x}_{t+1} = A\bar{x}_{t} + w_{t}\).

Therefore, we consider the setting where, in (10.22), \(u_{t} = 0\) and the quantizer is designed for this system. We note that, as a result of the decoupling from the control actions by the predictive quantization policy (see Definition 10.6.1), the separation results presented in Theorem 10.3.4 directly apply in this context.

In the analysis, we will restrict the quantizers to have convex codecells (see Assumption 10.4.2). As elaborated in Chap. 4, a quantizer can be characterized as a stochastic kernel Q from \(\mathbb{X}\) to \(\{1,\ldots,M\}\) defined by

$$\displaystyle{Q(i\vert x) = 1_{\{x\in B_{i}\}},\quad i = 1,\ldots,M.}$$

We endow the quantizers with a topology induced by this stochastic kernel interpretation. In view of the results of Sect. 4.4.1, we have the following.

Let \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q_{[0,t-1]})\). Recall that the properties of conditional probability lead to the filtering expression in (D.4).

Hence, with \(\mathcal{P}({\mathbb{R}}^{n})\) denoting the set of probability measures on \(\mathcal{B}({\mathbb{R}}^{n})\) under weak convergence, the conditional measure process and the quantization process \((\pi _{t},Q_{t})\) form a joint Markov process in \(\mathcal{P}({\mathbb{R}}^{n}) \times \mathcal{Q}_{c}(M)\), as in Theorem 10.4.1.

We have the following result on the existence of optimal quantizers for the finite horizon setting.

Theorem 10.6.4.

For any T ≥ 1 and arbitrary initial condition π 0, there exists a policy in \(\Pi _{W}^{C}\) such that

$$\displaystyle{ \inf _{{\Pi }^{comp}\in \Pi _{W}^{C}}J_{\pi _{0}}({\Pi }^{comp},T) }$$
(10.27)

is achieved. Letting \(J_{T}^{T}(\cdot ) = 0\) and

$$\displaystyle{J_{0}^{T}(\pi _{ 0}) :=\min _{{\Pi }^{comp}\in \Pi _{W}^{C}{,\gamma }^{0}}J_{\pi _{0}}({\Pi }^{comp}{,\gamma }^{0},T),}$$

the dynamic programming recursion

$$\displaystyle{ TJ_{t}^{T}(\pi _{t}) =\min _{Q_{t}\in \mathcal{Q}_{c}(M)}\bigg(c_{t}(\pi _{t},Q_{t}) + TE[J_{t+1}^{T}(\pi _{t+1})\vert \pi _{t},Q_{t}]\bigg) }$$
(10.28)

holds for all t = 0,1,…,T − 1.

Proof.

See Sect. 10.8.10. □ 

Note that the optimal control policy is linear in the conditional estimate and is given in Theorem 10.6.2.

6.4 Partially Observed Case

We consider now the setup in (10.22) and Fig. 10.3, with a partially observed state, that is, with

$$\displaystyle\begin{array}{rcl} x_{t+1}& =& Ax_{t} + Bu_{t} + w_{t}, \\ y_{t}& =& Cx_{t} + v_{t}, {}\end{array}$$
(10.29)

where, in contrast with (10.22), here \(y_{t} \in {\mathbb{R}}^{m}\) is a noisy observation, \(\{v_{t}\}\) is an i.i.d. zero-mean Gaussian noise sequence, and C is a matrix of appropriate dimensions.

Define \(\bar{m}_{t} := E[x_{t}\vert y_{[0,t]}]\), which is computed through a Kalman filter. With

$$\displaystyle{\Sigma _{0\vert -1} = E[x_{0}x_{0}^{\prime}]}$$

and for t ≥ 0,

$$\displaystyle\begin{array}{rcl} & & \Sigma _{t+1\vert t} = A\Sigma _{t\vert t-1}A^{\prime} + W {}\\ & & \quad \quad - (A\Sigma _{t\vert t-1}C^{\prime}){(C\Sigma _{t\vert t-1}C^{\prime} + V )}^{-1}(C\Sigma _{ t\vert t-1}A^{\prime}), {}\\ \end{array}$$

the following recursion holds for t ≥ 0 and with \(\bar{m}_{-1} = 0\):

$$\displaystyle\begin{array}{rcl} & & \bar{m}_{t} = A\bar{m}_{t-1} + Bu_{t-1} {}\\ & & \quad +\Sigma _{t\vert t-1}C^{\prime}{(C\Sigma _{t\vert t-1}C^{\prime} + V )}^{-1}(CA(x_{t-1} -\bar{m}_{t-1}) + v_{t}). {}\\ \end{array}$$

Now, note that the cost

$$\displaystyle\begin{array}{rcl} \inf _{{\Pi }^{comp}}\inf _{\gamma }J({\Pi }^{comp},\gamma,T)& &{}\end{array}$$
(10.30)

with

$$\displaystyle{J({\Pi }^{comp},\gamma,T) = \frac{1} {T}E_{\nu _{0}}^{{\Pi }^{comp},\gamma }[\sum _{t=0}^{T-1}x_{ t}^{\prime}Qx_{t} + u_{t}^{\prime}Ru_{t}]}$$

can be written equivalently as

$$\displaystyle\begin{array}{rcl} J({\Pi }^{comp},\gamma,T)& =& \frac{1} {T}E_{\nu _{0}}^{{\Pi }^{comp},\gamma }[\sum _{t=0}^{T-1}\bar{m}_{ t}^{\prime}Q\bar{m}_{t} + u_{t}^{\prime}Ru_{t}] {}\\ & +& \frac{1} {T}E_{\nu _{0}}[\sum _{t=0}^{T-1}(x_{ t} -\bar{ m}_{t})^{\prime}Q(x_{t} -\bar{ m}_{t})] {}\\ \end{array}$$

since the quadratic error \((x_{t} -\bar{ m}_{t})^{\prime}Q(x_{t} -\bar{ m}_{t})\) is independent of the coding or the control policy.

Thus, the processes \((\bar{m}_{t},\Sigma _{t+1\vert t})\) and \(u_{t}\) form a controlled Markov chain, and we can invoke Theorem 10.3.6: any causal quantizer policy can, without any loss, be replaced with one in Π W , where the state is now \((\bar{m}_{t},\Sigma _{t+1\vert t})\) instead of x t . Furthermore, any quantizer in Π W can be replaced, without any loss, with a predictive quantizer with the new state \(\bar{m}_{t}\), as a consequence of Lemma 10.6.1 applied to the new state with identical arguments: observe that the past control actions do not affect the evolution of Σ t+1|t .

We have the following result.

Theorem 10.6.5.

For the minimization problem ( 10.30 ), the optimal control policy is given by \(u_{t} = L_{t}E[x_{t}\vert q_{[0,t]}]\), where

$$\displaystyle{L_{t} = -{(R + B^{\prime}K_{t+1}B)}^{-1}B^{\prime}K_{t+1}A,}$$

and

$$\displaystyle{P_{t} = A^{\prime}K_{t+1}B{(R + B^{\prime}K_{t+1}B)}^{-1}B^{\prime}K_{t+1}A,}$$
$$\displaystyle{K_{t} = A^{\prime}K_{t+1}A - P_{t} + Q,}$$

with \(K_{T} = P_{T-1} = 0\).

Given the optimal control policy, the following result is obtained.

Theorem 10.6.6.

For the minimization problem ( 10.30 ), the optimal cost is given by \(\frac{1} {T}J_{0}({\Pi }^{comp},T)\), where

$$\displaystyle\begin{array}{rcl} & & J_{0}({\Pi }^{comp},T)=E[x_{ 0}^{\prime}K_{0}x_{0}]+E[(x_{0}-E[x_{0}\vert \mathcal{I}_{0}^{c}])^{\prime}(Q+A^{\prime}K_{ 1}A)(x_{0}-E[x_{0}\vert \mathcal{I}_{0}^{c}])] \\ & & \quad \quad \quad \quad +\sum _{ t=1}^{T-1}E[(x_{ t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])^{\prime}(Q + A^{\prime}K_{ t+1}A - K_{t})(x_{t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])] \\ & & \quad \quad \quad \quad +\sum _{ t=0}^{T-1}E[(x_{ t} -\bar{ m}_{t})^{\prime}Q(x_{t} -\bar{ m}_{t}) + w^{\prime}_{t}K_{t+1}w_{t}]. {}\end{array}$$
(10.31)

Now that the cost has been separated, the following is a consequence of Theorem 10.6.1.

Theorem 10.6.7.

For the minimization of the cost in ( 10.30 ), any composite quantization policy can be replaced, without any loss in performance, by one which only uses the output of the Kalman filter and the information available at the receiver.

Thus, the optimality of Kalman filtering allows the encoder to use only the conditional estimate and the error covariance matrix, without any loss of optimality (see Fig. 10.5), and the optimal quantization problem admits an explicit formulation.

Fig. 10.5 Separation of estimation and quantization: when the source is Gaussian, generated by the linear system (10.29), the cost is quadratic, and the observation channel is Gaussian, the separated structure of the encoder above (with a predictive encoder) is optimal; here KF denotes the Kalman filter.

7 Case with Noisy Channels and Noiseless Feedback

The results and the general program presented in this chapter also apply to coding over discrete memoryless (noisy) channels (DMCs) with feedback. In this context, consider the setup in Sect. 5.2.2, with one encoder, with \(y_{t} = x_{t}\), and with the channel being a DMC. The equivalent results of Theorems 10.3.1 and 10.3.2 apply with \(q^{\prime}_{t}\) replacing \(q_{t}\), where \(q^{\prime}_{t}\) is the output of the DMC at time t, as we state in the following.

In this context, let again \(\mathcal{P}(\mathbb{X})\) denote the space of probability measures on \(\mathcal{B}(\mathbb{X})\) under the topology of weak convergence and define \(\pi _{t} \in \mathcal{P}(\mathbb{X})\) to be the regular conditional probability measure given by \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q^{\prime}_{[0,t-1]})\), where q′ t is the channel output when the input is q t . That is, \(\pi _{t}(A) = P(x_{t} \in A\vert q^{\prime}_{[0,t-1]}),\quad A \in \mathcal{B}(\mathbb{X})\). The goal is the minimization

$$\displaystyle\begin{array}{rcl} \inf _{{\Pi }^{comp}}\inf _{{\underline{\gamma }}^{0}}E_{\nu _{0}}^{{\mathbf{\Pi }}^{comp},{\underline{\gamma }}^{0} }[\sum _{t=0}^{T-1}c(x_{ t},u_{t})],& &{}\end{array}$$
(10.32)

with initial condition distribution ν 0. Here \(c(\cdot,\cdot )\) is a nonnegative, measurable function and \(u_{t} =\gamma _{ t}^{0}(q^{\prime}_{[0,t]})\). We state the following.

Theorem 10.7.1.

Any composite encoding policy can be replaced, without any loss in performance, by one which only uses x t and q′ [0,t−1] at time t ≥ 1 to generate the channel input q t.

Theorem 10.7.2.

Any composite quantization policy can be replaced, without any loss in performance, by one which only uses the conditional probability measure \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q^{\prime}_{[0,t-1]})\), the state x t, and the time information t, at time t ≥ 1 to generate the channel input q t.

The proofs of these results follow from those of Theorems 10.3.1 and 10.3.2 with almost identical steps, with \(q_{t}\) replaced by \(q^{\prime}_{t}\) in the information available at the receiver and the encoder.

Likewise, for a partially observed setup, extensions of Theorems 10.3.3 and 10.3.4 also apply to this case.
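To illustrate the sufficient statistic of Theorem 10.7.2, the following sketch propagates \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q^{\prime}_{[0,t-1]})\) for a finite-state Markov source whose quantizer output is sent over a binary symmetric channel; the chain, the (identity) quantizer, and the crossover probability are all illustrative assumptions.

```python
import numpy as np

# Sketch: propagating pi_t = P(x_t | q'_{[0,t-1]}) when the quantizer
# output is sent over a binary symmetric channel (BSC). The chain,
# the identity quantizer, and eps are illustrative assumptions.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])               # P(x_{t+1} | x_t)
quant = np.array([0, 1])                 # q_t = quant[x_t]
eps = 0.1                                # BSC crossover probability
DMC = np.array([[1 - eps, eps],
                [eps, 1 - eps]])         # P(q' | q)

def update(pi, q_prime):
    """One Bayes step: incorporate q'_t, then predict x_{t+1}."""
    like = DMC[quant, q_prime]           # P(q'_t | x_t), via the channel
    post = pi * like
    post /= post.sum()                   # P(x_t | q'_{[0,t]})
    return post @ P                      # P(x_{t+1} | q'_{[0,t]})

pi = np.array([0.5, 0.5])                # prior nu_0
for q_prime in [0, 1, 1, 0]:             # a sample record of channel outputs
    pi = update(pi, q_prime)
```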

Remark 10.7.1.

When there is no feedback from the controller or when there is noisy feedback, the analysis requires a Markov chain construction in a larger state space provided memory restrictions are imposed on the decoders. We refer the reader to Teneketzis [361] and Mahajan and Teneketzis [248, 249] for a class of such settings.\(\diamond\)

8 Appendix: Proofs

8.1 Proof of Theorem 10.3.1

At time t = T − 1, the per-stage cost function can be written as follows, where \(\gamma _{t}^{0}\) denotes a fixed receiver policy:

$$\displaystyle\begin{array}{rcl} E[c(x_{t},\gamma _{t}^{0}(q_{ [0,t]}))\vert q_{[0,t-1]}] = E[F(x_{t},q_{[0,t-1]},q_{t})\vert q_{[0,t-1]}]& & {}\\ \end{array}$$

where \(F(x_{t},q_{[0,t-1]},q_{t}) = c(x_{t},\gamma _{t}^{0}(q_{[0,t]}))\).

By the smoothing property of conditional expectation, this is equivalent to the following:

$$\displaystyle{E\bigg[E[F(x_{t},q_{[0,t-1]},q_{t})\vert x_{t},q_{[0,t-1]}]\bigg\vert q_{[0,t-1]}\bigg].}$$

Now, we will apply Witsenhausen’s two-stage lemma [396] to show that a lower bound for the double expectation is obtained by picking \(q_{t}\) as a measurable function of \((x_{t},q_{[0,t-1]})\). Thus, we will find a composite quantization policy which uses only \((x_{t},q_{[0,t-1]})\) and which performs as well as one using the entire memory available at the encoder. To make this precise, let us fix the decision function \(\gamma _{t}^{0}\) at the receiver corresponding to a given composite quantization policy \(Q_{t}^{comp}\) at the encoder, let t = T − 1, and define for every \(k \in \mathcal{M}_{t}\):

$$\displaystyle\begin{array}{rcl} \beta _{k}& :=& \bigg\{x_{t},q_{[0,t-1]} : F(x_{t},q_{[0,t-1]},k) \leq F(x_{t},q_{[0,t-1]},q^{\prime}),\forall q^{\prime}\neq k,q^{\prime} \in \mathcal{M}_{t}\bigg\}. {}\\ \end{array}$$

These sets are Borel, by the measurability of F on \(\mathbb{X}\). Such a construction covers the domain set consisting of \((x_{t},q_{[0,t-1]})\) but with overlaps. It covers the elements in \(\mathbb{X} \times \prod _{t=0}^{T-2}\mathcal{M}_{t}\), since for every element in this product set, there is a minimizing \(k \in \mathcal{M}_{t}\) (\(\mathcal{M}_{t}\) is finite). To avoid the overlaps, we adopt the following technique which was introduced in Witsenhausen [396]. Let there be an ordering of the elements in \(\mathcal{M}_{t}\) as \(1,2,\ldots,\vert \mathcal{M}_{t}\vert \), and for k ≥ 1 in this sequence define a function \(Q_{t}^{comp,{\ast}}\) as

$$\displaystyle{q_{t} = Q_{t}^{comp,{\ast}}(x_{t},q_{[0,t-1]}) = k,\quad \mathrm{if}\ (x_{t},q_{[0,t-1]}) \in \beta _{k} -\cup _{i=1}^{k-1}\beta _{i},}$$

with \(\beta _{0} = \varnothing \). Thus, for any random variable q t appropriately defined on the probability space,

$$\displaystyle\begin{array}{rcl} & & E\bigg[E[F(x_{t},q_{[0,t-1]},q_{t})\vert x_{t},q_{[0,t-1]}]\bigg\vert q_{[0,t-1]}\bigg] \\ & & \quad \quad \quad \quad \geq E\bigg[E[F(x_{t},q_{[0,t-1]},Q_{t}^{comp,{\ast}}(x_{ t},q_{[0,t-1]}))\vert x_{t},q_{[0,t-1]}]\bigg\vert q_{[0,t-1]}\bigg].{}\end{array}$$
(10.33)

Thus, the new composite policy performs at least as well as the original composite coding policy even though it has a restricted structure.

As such, if there is an optimal policy, it can be replaced with one which uses only \(\{x_{t},q_{[0,t-1]}\}\) without any loss of performance while keeping the receiver decision function \(\gamma _{t}^{0}\) fixed.
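In finite settings, the tie-breaking construction above reduces to selecting, for each \((x_{t},q_{[0,t-1]})\), the smallest-index minimizer of F; a toy sketch follows, with an arbitrary illustrative cost table.

```python
import numpy as np

# Toy sketch of the tie-breaking construction: for each (x, q_past),
# pick the smallest-index k minimizing F. The cost table F is an
# arbitrary illustrative example.
rng = np.random.default_rng(1)
n_x, n_hist, n_msg = 4, 3, 2             # |X|, past records, |M_t|
F = rng.random((n_x, n_hist, n_msg))     # F(x, q_past, k)

# argmin returns the first (smallest-index) minimizer, which is
# exactly the rule beta_k minus the union of beta_i for i < k.
Q_star = F.argmin(axis=2)                # Q_star[x, h] in M_t
```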

We have thus obtained the structure of the encoder for the last stage. We iteratively proceed to study the other time stages. In particular, since \(\{x_{t}\}\) is Markov, we could proceed as follows (in essence using Witsenhausen’s three-stage lemma [396]): For a three-stage cost problem, the cost at time t = 2 can be written as, for measurable functions \(c_{2},c_{3}\):

$$\displaystyle\begin{array}{rcl} & & E\bigg[c_{2}(x_{2},\gamma _{2}^{0}(q_{ 1},q_{2}),q_{1},q_{2}) {}\\ & & \quad \quad + E[c_{3}(x_{3},\gamma _{3}^{0}(q_{ 1},q_{2},Q_{3}^{comp,{\ast}}(x_{ 3},q_{2},q_{1}))\vert x_{3},q_{2},x_{2},q_{1},x_{1})]\bigg\vert x_{2},x_{1},q_{2},q_{1}\bigg]. {}\\ \end{array}$$

Since

$$\displaystyle{P(dx_{3},q_{2},q_{1}\vert x_{2},x_{1},q_{2},q_{1}) = P(dx_{3},q_{2},q_{1}\vert x_{2},q_{2},q_{1})}$$

and since under \(Q_{3}^{comp,{\ast}}\), \(q_{3}\) is a function of \(x_{3}\) and \(q_{1},q_{2}\), the expression above equals \(F_{2}(x_{2},q_{2},q_{1})\) for some measurable \(F_{2}(\cdot )\). By a similar argument as above, a composite quantization policy at time 2 which uses \(x_{2}\) and \(q_{1}\) and which performs at least as well as the original policy can be constructed. By similar arguments, for each time t, \(1 \leq t \leq T - 1\), an encoder which uses only \((x_{t},q_{[0,t-1]})\) can be constructed. The encoder at time t = 0 uses \(x_{0}\); here \(\nu _{0}\) is the prior distribution on the initial state.

Now that we have obtained the restricted structure for a composite quantization policy which is without any loss, we can express this as

$$\displaystyle{Q_{t}^{comp}(x_{ t},q_{[0,t-1]}) = {Q}^{q_{[0,t-1]} }(x_{t}),\quad \forall x_{t},q_{[0,t-1]}}$$

such that the quantizer action \({Q}^{q_{[0,t-1]} } \in \mathbb{Q}(\mathbb{X};\mathcal{M}_{t})\) is generated using only q [0, t − 1] and the quantizer outcome is generated by evaluating \({Q}^{q_{[0,t-1]} }(x_{t})\) for every x t . □ 

8.2 Proof of Theorem 10.3.2

At time t = T − 1, the per-stage cost function can be written as

$$\displaystyle\begin{array}{rcl} E[c(x_{t},v_{t}(q_{[0,t]}))\vert q_{[0,t-1]}] = E[\int _{\mathbb{X}}P(dx_{t}\vert q_{[0,t-1]},q_{t})c(x_{t},v_{t}(q_{[0,t-1]},q_{t}))].& & {}\\ \end{array}$$

Thus, at time t = T − 1, an optimal receiver (which is deterministic without any loss of optimality) will use \(P(dx_{t}\vert q_{[0,t]})\) as a sufficient statistic for an optimal decision (or any receiver can be replaced with one which uses this sufficient statistic without any loss). Let us fix a receiver policy which only uses the posterior \(P(dx_{t}\vert q_{[0,t]})\) as its sufficient statistic. Let us further note that

$$\displaystyle\begin{array}{rcl} & & P(dx_{t}\vert q_{[0,t]}) = \frac{P(q_{t},dx_{t}\vert q_{[0,t-1]})} {\int _{x_{t}}P(q_{t},dx_{t}\vert q_{[0,t-1]})} \\ & & = \frac{P(q_{t}\vert x_{t},q_{[0,t-1]})P(dx_{t}\vert q_{[0,t-1]})} {\int _{x_{t}}P(q_{t}\vert x_{t},q_{[0,t-1]})P(dx_{t}\vert q_{[0,t-1]})}.{}\end{array}$$
(10.34)

The term \(P(q_{t}\vert x_{t},q_{[0,t-1]})\) is determined by the quantizer action Q t (this follows from Theorem 10.3.1). Furthermore, given Q t , the relation (10.34) is measurable on \(\mathcal{P}(\mathbb{X})\) (i.e., in \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q_{[0,t-1]})\)) under weak convergence.

To prove this technical argument, consider the numerator in (10.34) and note that the function \(\kappa _{B} : \mathcal{P}(\mathbb{X}) \rightarrow \mathbb{R}\) defined as \(\kappa _{B}(\pi ) =\pi (B)\) is measurable under weak convergence topology as a consequence of Theorem B.2.1, for each \(B \in \mathcal{B}(\mathbb{X})\). By Theorem B.2.2, this implies that the relation in (10.34) is measurable on \(\mathcal{P}(\mathbb{X})\).

Let us denote the quantizer applied, given the past realizations of quantizer outputs as \(Q_{t}^{q_{[0,t-1]} }\). Note that q t is deterministically determined by \((x_{t},Q_{t})\) and the optimal receiver function can be expressed as \(\gamma _{t}^{0}(P(dx_{t}\vert q_{[0,t-1]}),q_{t})\), given \(Q_{t}^{q_{[0,t-1]} }\). The cost at time t = T − 1 can be expressed, given the quantizer \(Q_{t}^{q_{[0,t-1]} }\), for some Borel function G, as \(G(P(dx_{t}\vert q_{[0,t-1]}),Q_{t}^{q_{[0,t-1]} })\), where

$$\displaystyle\begin{array}{rcl} & & G(P(dx_{t}\vert q_{[0,t-1]}),Q_{t}^{q_{[0,t-1]}}) {}\\ & & \quad =\int _{\mathbb{X}}P(dx_{t}\vert q_{[0,t-1]})\sum _{\mathcal{M}_{t}}\bigg(1_{\{q_{t}=Q_{t}^{q_{[0,t-1]}}(x_{t})\}}{\eta }^{{Q}^{q_{[0,t-1]}}}(P(dx_{t}\vert q_{[0,t-1]}),q_{t})\bigg), {}\\ \end{array}$$

with \({\eta }^{{Q}^{q_{[0,t-1]}}}(P(dx_{t}\vert q_{[0,t-1]}),q_{t}) = c(x_{t},\gamma _{t}^{0}(P(dx_{t}\vert q_{[0,t-1]}),q_{t}))\).

Now, one can construct an equivalence class among the past \(q_{[0,t-1]}\) sequences which induce the same \(\pi _{t}\) and, for the final time stage, replace the quantizers in each class with the one achieving the lowest cost among the finitely many elements of that class. An optimal quantization output may thus be generated using \(\pi _{t}(\cdot ) = P(x_{t} \in \cdot \vert q_{[0,t-1]})\) and \(x_{t}\), by extending Witsenhausen’s argument used earlier in the proof of Theorem 10.3.1 to the terminal time stage. Since there are only finitely many past sequences and finitely many \(\pi _{t}\), this leads to a Borel measurable selection, that is, a quantizer measurable in \((\pi _{t},x_{t})\). Hence, the final stage cost can be expressed as \(F_{t}(\pi _{t})\) for some \(F_{t}\), without any performance loss.

The same argument applies for all time stages: At time t = T − 2, the sufficient statistic both for the immediate cost and the cost-to-go is \(P(dx_{t-1}\vert q_{[0,t-1]})\), and thus for the cost impacting the time stage t = T − 1 as a result of the optimality result for \(Q_{T-1}\). To show that the separation result generalizes to all time stages, it suffices to prove that \(\{(\pi _{t},Q_{t})\}\) has a controlled Markov chain form, if the encoders use the structure above.

Now, for t ≥ 1, for all \(B \in \mathcal{B}(\mathcal{P}(\mathbb{X}))\),

$$\displaystyle\begin{array}{rcl} & & P\bigg(P(dx_{t}\vert q_{[0,t-1]}) \in B\bigg\vert P(dx_{s}\vert q_{[0,s-1]}),Q_{s},s \leq t - 1\bigg) \\ & & = P\bigg(\int _{x_{t-1}}P(dx_{t},dx_{t-1}\vert q_{[0,t-1]}) \in B\bigg\vert P(dx_{s}\vert q_{[0,s-1]}),Q_{s},s \leq t - 1\bigg) \\ & & = P\bigg(\bigg\{ \frac{\int _{x_{t-1}}P(dx_{t}\vert x_{t-1})P(q_{t-1}\vert x_{t-1},q_{[0,t-2]})P(dx_{t-1}\vert q_{[0,t-2]})} {\int _{x_{t-1},x_{t}}P(dx_{t}\vert x_{t-1})P(q_{t-1}\vert x_{t-1},q_{[0,t-2]})P(dx_{t-1}\vert q_{[0,t-2]})}\bigg\} \in B \\ & & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \bigg\vert P(dx_{s}\vert q_{[0,s-1]}),Q_{s},s \leq t - 1\bigg) \\ & & =P\bigg(\bigg\{ \frac{\int _{x_{t-1}}P(dx_{t}\vert x_{t-1})P(q_{t-1}\vert x_{t-1},q_{[0,t-2]})P(dx_{t-1}\vert q_{[0,t-2]})} {\int _{x_{t-1},x_{t}}P(dx_{t}\vert x_{t-1})P(q_{t-1}\vert x_{t-1},q_{[0,t-2]})P(dx_{t-1}\vert q_{[0,t-2]})}\bigg\} \in B \\ & & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \bigg\vert P(dx_{t-1}\vert q_{[0,t-2]}),Q_{t-1}\bigg) {}\end{array}$$
(10.35)
$$\displaystyle\begin{array}{rcl} & & =P\bigg(\int _{x_{t-1}}P(dx_{t},dx_{t-1}\vert q_{[0,t-1]}) \in B\bigg\vert P(dx_{t-1}\vert q_{[0,t-2]}),Q_{t-1}\bigg) \\ & & = P\bigg(P(dx_{t}\vert q_{[0,t-1]}) \in B\bigg\vert P(dx_{t-1}\vert q_{[0,t-2]}),Q_{t-1}\bigg). {}\end{array}$$
(10.36)

In the above derivation, (10.35) uses the fact that the term \(P(q_{t-1}\vert x_{t-1},q_{[0,t-2]})\) is uniquely identified by \(P(dx_{t-1}\vert q_{[0,t-2]})\) and Q t − 1, which in turn is uniquely identified by q [0, t − 2] and Q t − 1. Furthermore, (10.36) defines a regular conditional probability measure since for all \(B \in \mathcal{B}(\mathbb{X})\),

$$\displaystyle\begin{array}{rcl} & & \pi _{t}(B) = P(x_{t} \in B\vert q_{[0,t-1]}) {}\\ & & =\int _{x_{t-1}}P(x_{t} \in B,dx_{t-1}\vert q_{[0,t-1]}) {}\\ & & \quad =\int _{x_{t-1}}P(x_{t} \in B\vert x_{t-1})P(dx_{t-1}\vert q_{[0,t-1]}) {}\\ \end{array}$$

is measurable in π t − 1, given Q t − 1 (as a consequence of the measurability of (10.34) in π t ). As a consequence the conditional probability π t (B), \(B \in \mathcal{B}(\mathbb{X})\), is a measurable function of π t − 1, given Q t − 1. By Theorem B.2.2, we conclude that for any measurable function F t of \(P(dx_{t}\vert q_{[0,t-1]})\)

$$\displaystyle\begin{array}{rcl} & & E[F_{t}(P(dx_{t}\vert q_{[0,t-1]}))\vert P(dx_{s}\vert q_{[0,s-1]}),Q_{s},s \leq t - 1] \\ & & \quad \quad = E[F_{t}(P(dx_{t}\vert q_{[0,t-1]}))\vert P(dx_{t-1}\vert q_{[0,t-2]}),Q_{t-1}],{}\end{array}$$
(10.37)

for every given Q t − 1. Once again an equivalence relationship between the finitely many past quantizer outputs, based on the equivalence of the conditional measures \(P(dx_{t-1}\vert q_{[0,t-2]})\) they induce, can be constructed and as a consequence the conditional probability measure π t is measurable in \(\{P(dx_{t-1}\vert q_{[0,t-2]}),Q_{t-1}\}\), given Q t − 1. With the controlled Markov structure, we can essentially follow the same argument for earlier time stages. As such, it suffices that the encoder uses only \((P(dx_{t}\vert q_{[0,t-1]}),t)\) as its sufficient statistic for all time stages, to generate the optimal quantizer. An optimal quantizer uses x t to generate the optimal quantization outputs. □ 
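The controlled Markov recursion for \((\pi _{t},Q_{t})\) established above is easy to visualize for a finite source alphabet: one conditions on the received quantizer cell and then predicts one step ahead. A sketch, with an illustrative chain and binning:

```python
import numpy as np

# Sketch of the (pi_t, Q_t) recursion for a finite alphabet: condition
# on the quantizer cell, then predict. P and the binning are
# illustrative assumptions.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])          # P(x_{t+1} | x_t)
cells = np.array([0, 0, 1])              # quantizer bins for states 0, 1, 2

def next_pi(pi, q):
    """pi_{t+1} from pi_t and the received bin index q_t."""
    post = np.where(cells == q, pi, 0.0) # restrict to the received cell
    post /= post.sum()                   # P(x_t | q_{[0,t]})
    return post @ P                      # P(x_{t+1} | q_{[0,t]})

pi = np.ones(3) / 3                      # prior
pi = next_pi(pi, q=0)
```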

8.3 Proof of Theorem 10.3.3

We transform the problem into a real-time coding problem involving a fully observed Markov source. At time t = T − 1, the per-stage cost function can be written as follows, where \(\gamma _{t}^{0}\) denotes a fixed receiver policy:

$$\displaystyle\begin{array}{rcl} & & E[c(x_{t},\gamma _{t}^{0}(q_{ [0,t]}))\vert q_{[0,t-1]}] \\ & & =\sum _{\mathcal{M}_{t}}P(q_{t} = k\vert q_{[0,t-1]})(\int _{\mathbb{X}}P(dx_{t}\vert q_{[0,t-1]},k)c(x_{t},\gamma _{t}^{0}(q_{ [0,t-1]},k))) \\ & & =\int _{\mathbb{X}}\sum _{\mathcal{M}_{t}}P(dx_{t},q_{t} = k\vert q_{[0,t-1]})c(x_{t},\gamma _{t}^{0}(q_{ [0,t-1]},k)) \\ & & =\int _{\mathcal{P}(\mathbb{X})}\int _{\mathbb{X}}\sum _{\mathcal{M}_{t}}P(dx_{t},q_{t}=k,d\tilde{\pi }_{t}\vert q_{[0,t-1]})c(x_{t},\gamma _{t}^{0}(q_{ [0,t-1]},k)) \\ & & =\int _{\mathcal{P}(\mathbb{X})}\,\int _{\mathbb{X}}\,\,\sum _{\mathcal{M}_{t}}P(d\tilde{\pi }_{t}\vert q_{[0,t-1]})P(dx_{t}\vert \tilde{\pi }_{t})P(q_{t}=k\vert \tilde{\pi }_{t},q_{[0,t-1]})c(x_{t},\gamma _{t}^{0}(q_{ [0,t-1]},k)) \\ & & =\int _{\mathcal{P}(\mathbb{X})}\,\sum _{\mathcal{M}_{t}}\,\,P(d\tilde{\pi }_{t}\vert q_{[0,t-1]})P(q_{t}=k\vert \tilde{\pi }_{t},q_{[0,t-1]})\,\int _{\mathbb{X}}\,P(dx_{t}\vert \tilde{\pi }_{t})c(x_{t},\gamma _{t}^{0}(q_{ [0,t-1]},k)) \\ & & =E[F(\tilde{\pi }_{t},q_{[0,t-1]},q_{t})\vert q_{[0,t-1]}], {}\end{array}$$
(10.38)

where \(\tilde{\pi }_{t}(\cdot )=P(x_{t} \in \cdot \vert y_{[0,t]})\) and \(F(\tilde{\pi }_{t},q_{[0,t-1]},q_{t})=\int _{\mathbb{X}}\,\,\tilde{\pi }_{t}(dx)c(x,\gamma _{t}^{0}(q_{[0,t-1]},q_{t}))\)​.

In the above derivation, the fourth equality follows from the property that

$$\displaystyle{x_{t} \leftrightarrow P(dx_{t}\vert y_{[0,t]}) \leftrightarrow q_{[0,t]}.}$$

We note that \(F(\cdot,\gamma _{t}^{0}(q_{[0,t-1]},q_{t}))\) is measurable by Theorem B.2.1 and the fact that the cost is bounded.

As in the proof of Theorem 10.3.1, one may define q t as a random variable on the probability space such that the joint distribution of \((q_{t},\tilde{\pi }_{t},q_{[0,t-1]})\) matches the characterization that \(q_{t} = Q_{t}^{comp}(y_{[0,t]},q_{[0,t-1]})\), since

$$\displaystyle{P(q_{t}\vert \tilde{\pi }_{t},q_{[0,t-1]}) =\int \limits _{{\mathbb{Y}}^{t+1}}P(q_{t}\vert y_{[0,t]},q_{[0,t-1]})P(y_{[0,t]}\vert \tilde{\pi }_{t},q_{[0,t-1]}).}$$

The cost at the final stage is thus written as \(E[F(\tilde{\pi }_{t},q_{[0,t-1]},q_{t})\vert q_{[0,t-1]}]\), which, by the smoothing property of conditional expectation, is equivalent to the following:

$$\displaystyle{E\bigg[E[F(\tilde{\pi }_{t},q_{[0,t-1]},q_{t})\vert \tilde{\pi }_{t},q_{[0,t-1]}]\bigg\vert q_{[0,t-1]}\bigg].}$$

Now, we will apply Witsenhausen’s two-stage lemma [396] to show that a lower bound for the double expectation is obtained by picking \(q_{t}\) to be a measurable function of \((\tilde{\pi }_{t},q_{[0,t-1]})\). Thus, we will find a composite quantization policy which uses only \((\tilde{\pi }_{t},q_{[0,t-1]})\) and which performs as well as one using the entire memory available at the encoder. Let us fix the decision function \(\gamma _{t}^{0}\) at the receiver corresponding to a given composite quantization policy \(Q_{t}^{comp}\) at the encoder, let t = T − 1, and define for every \(k \in \mathcal{M}_{t}\):

$$\displaystyle\begin{array}{rcl} \beta _{k}& :=& \bigg\{\tilde{\pi }_{t},q_{[0,t-1]} : F(\tilde{\pi }_{t},q_{[0,t-1]},k) \leq F(\tilde{\pi }_{t},q_{[0,t-1]},q^{\prime}),\forall q^{\prime}\neq k,q^{\prime} \in \mathcal{M}_{t}\bigg\}. {}\\ \end{array}$$

These sets are Borel, by the measurability of F on \(\mathcal{P}(\mathbb{X})\). Such a construction covers the domain set consisting of \((\tilde{\pi }_{t},q_{[0,t-1]})\) but with overlaps. It covers the elements in \(\mathcal{P}(\mathbb{X}) \times \prod _{t=0}^{T-2}\mathcal{M}_{t}\), since for every element in this product set, there is a minimizing \(k \in \mathcal{M}_{t}\) (\(\mathcal{M}_{t}\) is finite). To avoid the overlaps, we adopt the following technique which was introduced in Witsenhausen [396]. Let there be an ordering of the elements in \(\mathcal{M}_{t}\) as \(1,2,\ldots,\vert \mathcal{M}_{t}\vert \), and for k ≥ 1 in this sequence define a function \(Q_{t}^{comp,{\ast}}\) as

$$\displaystyle{q_{t} = Q_{t}^{comp,{\ast}}(\tilde{\pi }_{t},q_{[0,t-1]}) = k,\quad \mathrm{if}\ (\tilde{\pi }_{t},q_{[0,t-1]}) \in \beta _{k} -\cup _{i=1}^{k-1}\beta _{i},}$$

with \(\beta _{0} = \varnothing \). Thus, for any random variable q t appropriately defined on the probability space,

$$\displaystyle\begin{array}{rcl} & & E\bigg[E[F(\tilde{\pi }_{t},q_{[0,t-1]},q_{t})\vert \tilde{\pi }_{t},q_{[0,t-1]}]\bigg\vert q_{[0,t-1]}\bigg] \\ & & \quad \quad \quad \quad \geq E\bigg[E[F(\tilde{\pi }_{t},q_{[0,t-1]},Q_{t}^{comp,{\ast}}(\tilde{\pi }_{ t},q_{[0,t-1]}))\vert \tilde{\pi }_{t},q_{[0,t-1]}]\bigg\vert q_{[0,t-1]}\bigg].{}\end{array}$$
(10.39)

Thus, the new composite policy performs at least as well as the original composite coding policy even though it has a restricted structure.

As such, if there is an optimal policy, it can be replaced with one which uses only \(\{\tilde{\pi }_{t},q_{[0,t-1]}\}\) without any loss of performance, while keeping the receiver decision function \(\gamma _{t}^{0}\) fixed. It should now be noted that \(\{\tilde{\pi }_{t}\}\) is a Markov process. Further note that

$$\displaystyle\begin{array}{rcl} P(dx_{t}\vert dy_{[0,t]}) = \frac{\int _{x_{t-1}}P(dy_{t}\vert x_{t})P(dx_{t}\vert x_{t-1})P(dx_{t-1}\vert dy_{[0,t-1]})} {\int _{x_{t-1},x_{t}}P(dy_{t}\vert x_{t})P(dx_{t}\vert x_{t-1})P(dx_{t-1}\vert dy_{[0,t-1]})}& & {}\\ \end{array}$$

and that \(P(dy_{t}\vert \tilde{\pi }_{s},s \leq t - 1) =\int _{x_{t}}P(dy_{t},dx_{t}\vert \tilde{\pi }_{s},s \leq t - 1) = P(dy_{t}\vert \tilde{\pi }_{t-1})\). These imply that the following is a Markov kernel:

$$\displaystyle\begin{array}{rcl} P(d\tilde{\pi }_{t}\vert \tilde{\pi }_{s},s \leq t - 1) = P(d\tilde{\pi }_{t}\vert \tilde{\pi }_{t-1}).& &{}\end{array}$$
(10.40)

We have thus obtained the structure of the encoder for the last stage. We iteratively proceed to study the other time stages. In particular, since \(\{\tilde{\pi }_{t}\}\) is Markov, we could proceed as follows (in essence using Witsenhausen’s three-stage lemma [396]): For a three-stage cost problem, the cost at time t = 2 can be written as, for measurable functions \(c_{2},c_{3}\),

$$\displaystyle\begin{array}{rcl} & & E\bigg[c_{2}(\tilde{\pi }_{2},\gamma _{2}^{0}(q_{ 1},q_{2}),q_{1},q_{2}) {}\\ & & \quad \quad + E[c_{3}(\tilde{\pi }_{3},\gamma _{3}^{0}(q_{ 1},q_{2},Q_{3}^{comp}(\tilde{\pi }_{ 3},q_{2},q_{1}))\vert \tilde{\pi }_{3},q_{2},\tilde{\pi }_{2},q_{1},\tilde{\pi }_{1})]\bigg\vert \tilde{\pi }_{2},\tilde{\pi }_{1},q_{2},q_{1}\bigg]. {}\\ \end{array}$$

Since

$$\displaystyle{P(d\tilde{\pi }_{3},q_{2},q_{1}\vert \tilde{\pi }_{2},\tilde{\pi }_{1},q_{2},q_{1}) = P(d\tilde{\pi }_{3},q_{2},q_{1}\vert \tilde{\pi }_{2},q_{2},q_{1})}$$

and since under \(Q_{3}^{comp,{\ast}}\), \(q_{3}\) is a function of \(\tilde{\pi }_{3}\) and \(q_{1},q_{2}\), the expectation above equals \(E[F_{2}(\tilde{\pi }_{2},q_{2},q_{1})]\) for some measurable \(F_{2}(\cdot )\). Measurability follows since

$$\displaystyle\begin{array}{rcl} & & E\bigg[c_{2}(\tilde{\pi }_{2},\gamma _{2}^{0}(q_{ 1},q_{2}),q_{1},q_{2}) {}\\ & & \quad \quad + E[c_{3}(\tilde{\pi }_{3},\gamma _{3}^{0}(q_{ 1},q_{2},Q_{3}^{comp,{\ast}}(\tilde{\pi }_{ 3},q_{2},q_{1}))\vert \tilde{\pi }_{3},q_{2},\tilde{\pi }_{2},q_{1},\tilde{\pi }_{1})]\bigg\vert \tilde{\pi }_{2},\tilde{\pi }_{1},q_{2},q_{1}\bigg] {}\\ \end{array}$$

is measurable. Thus, a composite quantization policy at time 2 which uses only \(\tilde{\pi }_{2}\) and \(q_{1}\) and which performs at least as well as the original policy can be constructed.

By a similar argument, an optimal encoder at time t, \(1 \leq t \leq T - 1\), uses only \((\tilde{\pi }_{t},q_{[0,t-1]})\). The encoder at time t = 0 uses \(\tilde{\pi }_{0}\), where \(\tilde{\pi }_{0} =\nu _{0}\) is the prior distribution on the initial state.

Now that we have obtained the restricted structure for a composite quantization policy which is without any loss, we can express this as

$$\displaystyle{Q_{t}^{comp}(\tilde{\pi }_{ t},q_{[0,t-1]}) = {Q}^{q_{[0,t-1]} }(\tilde{\pi }_{t}),\quad \forall \tilde{\pi }_{t},q_{[0,t-1]}}$$

such that the quantizer action \({Q}^{q_{[0,t-1]} } \in \mathbb{Q}(\mathcal{P}(\mathbb{X});\mathcal{M}_{t})\) is generated using only \(q_{[0,t-1]}\) and the quantizer outcome is generated by evaluating \({Q}^{q_{[0,t-1]} }(\tilde{\pi }_{t})\) for every \(\tilde{\pi }_{t}\). □ 

8.4 Proof of Theorem 10.3.4

At time t = T − 1, an optimal receiver will use \(P(dx_{t}\vert q_{[0,t]})\) as a sufficient statistic for an optimal decision (or any receiver can be replaced with one which uses this sufficient statistic without any loss). As in the proof of Theorem 10.3.2, let us fix a receiver policy which only uses the posterior \(P(dx_{t}\vert q_{[0,t]})\) as its sufficient statistic. We now note that

$$\displaystyle\begin{array}{rcl} P(dx_{t}\vert q_{[0,t]})& =& \int _{\tilde{\pi }_{t}}P(dx_{t}\vert \tilde{\pi }_{t})P(d\tilde{\pi }_{t}\vert q_{[0,t]}).{}\end{array}$$
(10.41)

Let us note that

$$\displaystyle\begin{array}{rcl} & & P(d\tilde{\pi }_{t}\vert q_{[0,t]}) = \frac{P(q_{t},d\tilde{\pi }_{t}\vert q_{[0,t-1]})} {\int _{\tilde{\pi }_{t}}P(q_{t},d\tilde{\pi }_{t}\vert q_{[0,t-1]})} \\ & & = \frac{P(q_{t}\vert \tilde{\pi }_{t},q_{[0,t-1]})P(d\tilde{\pi }_{t}\vert q_{[0,t-1]})} {\int _{\tilde{\pi }_{t}}P(q_{t}\vert \tilde{\pi }_{t},q_{[0,t-1]})P(d\tilde{\pi }_{t}\vert q_{[0,t-1]})}.{}\end{array}$$
(10.42)

The term \(P(q_{t}\vert \tilde{\pi }_{t},q_{[0,t-1]})\) is determined by the quantizer action Q t (this follows from Theorem 10.3.3). Furthermore, given Q t , the relation (10.42) is measurable on \(\mathcal{P}(\mathcal{P}(\mathbb{X}))\) (i.e., in \(\Xi _{t}(\cdot ) = P(\tilde{\pi }_{t} \in \cdot \vert q_{[0,t-1]})\)) under weak convergence.

This argument, as in the proof of Theorem 10.3.2, follows from the observation that in the numerator of (10.42) the function \(\kappa _{B} : \mathcal{P}(\mathcal{P}(\mathbb{X})) \rightarrow \mathbb{R}\) defined as \(\kappa _{B}(\Xi ) = \Xi (B)\) is measurable under weak convergence topology as a consequence of Theorem B.2.1, for each \(B \in \mathcal{B}(\mathcal{P}(\mathbb{X}))\). By Theorem B.2.2, this implies that the relation in (10.42) is measurable on \(\mathcal{P}(\mathcal{P}(\mathbb{X}))\).

Let us denote the quantizer applied, given the past realizations of quantizer outputs, by \(Q_{t}^{q_{[0,t-1]}}\). Note that \(q_{t}\) is deterministically determined by \((\tilde{\pi }_{t},Q_{t}^{q_{[0,t-1]}})\) and the optimal receiver function can be expressed as \(\gamma _{t}^{0}(\Xi _{t},q_{t})\) (as a measurable function), given \(Q_{t}^{q_{[0,t-1]}}\). The cost at time t = T − 1 can be expressed, given the quantizer \(Q_{t}^{q_{[0,t-1]}}\), for some Borel function G, as \(G(\Xi _{t},Q_{t}^{q_{[0,t-1]}})\), where

$$\displaystyle\begin{array}{rcl} & & G(\Xi _{t},Q_{t}^{q_{[0,t-1]}}) {}\\ & & =\int _{\mathcal{P}(\mathbb{X})}\Xi _{t}(d\tilde{\pi }_{t})\sum _{\mathcal{M}_{t}}1_{\{q_{t}=Q_{t}^{q_{[0,t-1]}}(\tilde{\pi }_{t})\}}{\eta }^{{Q}^{q_{[0,t-1]}}}(\Xi _{t},q_{t}), {}\\ \end{array}$$

with \({\eta }^{{Q}^{q_{[0,t-1]}}}(\Xi _{t},q_{t}) =\int \tilde{\pi }_{t}(dx_{t})c(x_{t},\gamma _{t}^{0}(\Xi _{t},q_{t}))\). As in the proof of Theorem 10.3.2, one can construct an equivalence class among the past \(q_{[0,t-1]}\) sequences which induce the same \(\Xi _{t}\) and, for the final time stage, replace the quantizers \(Q_{t}^{q_{[0,t-1]}}\) in each class with one which achieves the lowest cost among the finitely many elements of that class. Thus, an optimal quantization output may be generated using \(\Xi _{t}(\cdot ) = P(\tilde{\pi }_{t} \in \cdot \vert q_{[0,t-1]})\) and \(\tilde{\pi }_{t}\). Since there are only finitely many past sequences and finitely many \(\Xi _{t}\), this leads to a Borel measurable selection of quantizers, measurable in \((\Xi _{t},\tilde{\pi }_{t})\).

Since such a selection for \(Q_{t}\) uses only \(\Xi _{t}\), the cost \(G(\Xi _{t},Q_{t}^{q_{[0,t-1]}})\) can be replaced with \(F_{t}(\Xi _{t})\) for some \(F_{t}\), without any performance loss.

The same argument applies for all time stages: At time t = T − 2, the sufficient statistic both for the immediate cost and the cost-to-go is \(P(d\tilde{\pi }_{t-1}\vert q_{[0,t-1]})\), and thus for the cost impacting the time stage t = T − 1, as a result of the optimality result for Q T − 1. To show that the separation result generalizes to all time stages, it suffices to prove that \(\{(\Xi _{t},Q_{t})\}\) is a controlled Markov chain, if the encoders use the structure above.

Now, for \(t \geq 1\), for all \(B \in \mathcal{B}(\mathcal{P}(\mathcal{P}(\mathbb{X})))\),

$$\displaystyle\begin{array}{rcl} & & P\bigg(P(d\tilde{\pi }_{t}\vert q_{[0,t-1]}) \in B\bigg\vert P(d\tilde{\pi }_{s}\vert q_{[0,s-1]}),Q_{s},s \leq t - 1\bigg) \\ & & = P\bigg(\int _{\tilde{\pi }_{t-1}}P(d\tilde{\pi }_{t},d\tilde{\pi }_{t-1}\vert q_{[0,t-1]}) \in B\bigg\vert P(d\tilde{\pi }_{s}\vert q_{[0,s-1]}),Q_{s},s \leq t - 1\bigg) \\ & & = P\bigg(\bigg\{ \frac{\int _{\tilde{\pi }_{t-1}}P(d\tilde{\pi }_{t}\vert \tilde{\pi }_{t-1})P(q_{t-1}\vert \tilde{\pi }_{t-1},q_{[0,t-2]})P(d\tilde{\pi }_{t-1}\vert q_{[0,t-2]})} {\int _{\tilde{\pi }_{t-1},\tilde{\pi }_{t}}P(d\tilde{\pi }_{t}\vert \tilde{\pi }_{t-1})P(q_{t-1}\vert \tilde{\pi }_{t-1},q_{[0,t-2]})P(d\tilde{\pi }_{t-1}\vert q_{[0,t-2]})}\bigg\} \in B \\ & & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \bigg\vert P(d\tilde{\pi }_{s}\vert q_{[0,s-1]}),Q_{s},s \leq t - 1\bigg) \\ & & =P\bigg(\bigg\{ \frac{\int _{\tilde{\pi }_{t-1}}P(d\tilde{\pi }_{t}\vert \tilde{\pi }_{t-1})P(q_{t-1}\vert \tilde{\pi }_{t-1},q_{[0,t-2]})P(d\tilde{\pi }_{t-1}\vert q_{[0,t-2]})} {\int _{\tilde{\pi }_{t-1},\tilde{\pi }_{t}}P(d\tilde{\pi }_{t}\vert \tilde{\pi }_{t-1})P(q_{t-1}\vert \tilde{\pi }_{t-1},q_{[0,t-2]})P(d\tilde{\pi }_{t-1}\vert q_{[0,t-2]})}\bigg\} \in B \\ & & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \bigg\vert P(d\tilde{\pi }_{t-1}\vert q_{[0,t-2]}),Q_{t-1}\bigg) {}\end{array}$$
(10.43)
$$\displaystyle\begin{array}{rcl} & & = P\bigg(\int _{\tilde{\pi }_{t-1}}P(d\tilde{\pi }_{t},d\tilde{\pi }_{t-1}\vert q_{[0,t-1]}) \in B\bigg\vert P(d\tilde{\pi }_{t-1}\vert q_{[0,t-2]}),Q_{t-1}\bigg) \\ & & = P\bigg(P(d\tilde{\pi }_{t}\vert q_{[0,t-1]}) \in B\bigg\vert P(d\tilde{\pi }_{t-1}\vert q_{[0,t-2]}),Q_{t-1}\bigg). {}\end{array}$$
(10.44)

Here, (10.43) uses the fact that \(P(q_{t-1}\vert \tilde{\pi }_{t-1},q_{[0,t-2]})\) is identified by \(\{\Xi _{t-1},Q_{t-1}\}\), which in turn is uniquely identified by q [0, t − 2] and Q t − 1. Furthermore, the expression in (10.44) defines a regular conditional probability measure, since for all \(B \in \mathcal{B}(\mathcal{P}(\mathbb{X}))\),

$$\displaystyle\begin{array}{rcl} & & \Xi _{t}(B) = P(\tilde{\pi }_{t} \in B\vert q_{[0,t-1]}) {}\\ & & =\int _{\tilde{\pi }_{t-1}}P(\tilde{\pi }_{t} \in B,d\tilde{\pi }_{t-1}\vert q_{[0,t-1]}) {}\\ & & \quad =\int _{\tilde{\pi }_{t-1}}P(\tilde{\pi }_{t} \in B\vert \tilde{\pi }_{t-1})P(d\tilde{\pi }_{t-1}\vert q_{[0,t-1]}) {}\\ \end{array}$$

is measurable in \(\Xi _{t-1}\), given Q t − 1 (as a consequence of the measurability of (10.42) in \(\Xi _{t}\)). Hence, by Theorem B.2.2, we conclude that for any measurable function F t of Ξ t

$$\displaystyle\begin{array}{rcl} E[F_{t}(\Xi _{t})\vert \Xi _{[0,t-1]},Q_{[0,t-1]}] = E[F_{t}(\Xi _{t})\vert \Xi _{t-1},Q_{t-1}],& & {}\\ \end{array}$$

for every given Q t − 1. Now, once again an equivalence relationship between the finitely many past quantizer outputs, based on the equivalence of the conditional measures \(\Xi _{t-1}\) they induce, can be constructed. With the controlled Markov structure, we can follow the same argument for earlier time stages. Therefore, it suffices that the encoder uses only \((\Xi _{t},t)\) as its sufficient statistic for all time stages, to generate the optimal quantizer. An optimal quantizer uses \(\tilde{\pi }_{t}\) to generate the optimal quantization outputs. □ 

8.5 Proof of Theorem 10.3.6

We note that the analysis in (10.33)–(10.38) applies identically to the case with control, by replacing a receiver policy with a fixed control policy. We can thus obtain the structure of the optimal encoder for the last stage. We iteratively proceed to study the other time stages. The only difference here is that, with control, \(\{x_{t}\}\) is no longer Markov, but \(\{x_{t},u_{t}\}\) forms a controlled Markov chain. For a three-stage cost problem, the cost at time t = 2 can be written as, for measurable functions \(c_{2},c_{3}\),

$$\displaystyle\begin{array}{rcl} & & c_{2}(x_{2},u_{2}(q_{1},q_{2})) + E[c_{3}(x_{3},u_{3}(q_{1},q_{2},Q_{3}^{comp}(x_{ 3},q_{2},q_{1}))\vert q_{2},x_{2},u_{2},q_{1},x_{1})]. {}\\ \end{array}$$

Since

$$\displaystyle{P(dx_{3},q_{2},q_{1}\vert x_{2},u_{2},x_{1},q_{2},q_{1}) = P(dx_{3},q_{2},q_{1}\vert x_{2},u_{2},q_{2},q_{1}),}$$

and u 2 is a function of \(q_{1},q_{2}\) (with the control policy fixed, as mentioned earlier), and since under \(Q_{3}^{comp,{\ast}}\), q 3 is a function of x 3 and \(q_{1},q_{2}\), the expectation above is equal to \(F_{2}(x_{2},q_{2},q_{1})\) for some measurable F 2(. ). Thus, an optimal composite quantization policy at time 2 uses x 2 and q 1. The proof follows identically for other time stages. Hence, we have established an analogue of Theorem 10.3.1.

By observing that \(P(dx_{t}\vert q_{[0,t]})\) is a sufficient statistic for an optimal control policy and the construction of a controlled Markov chain as in (10.35), it follows that the discussion in Theorem 10.3.2 also applies in this case. □ 

8.6 Proof of Theorem 10.4.2

We show that the measurable selection hypothesis (see Sect. D.1.5) applies in the set of states which are visited with probability 1. In particular, the elements in the set of reachable probability measures admit densities which furthermore satisfy the equicontinuity condition in view of Lemma 10.8.1 below.

The following is a key lemma.

Lemma 10.8.1.

For all \(t \geq 1\), π t (dx) is absolutely continuous with respect to the Lebesgue measure, i.e., it has a probability density function, which we will also denote by π t by an abuse of notation. The density function π t is uniformly continuous for every t and the sequence \(\{\pi _{t}\}\) is a uniformly bounded and uniformly equicontinuous family.

Proof.

Let ϕ denote the common density of the Gaussian noise variables w t . Since \(x_{t} = f(x_{t-1}) + w_{t}\) and w t is independent of q [0, t − 1], it is easy to see that the pdf π t of the conditional measure \(P(dx_{t}\vert q_{[0,t-1]})\) is given by

$$\displaystyle{\pi _{t}(z) =\int _{{\mathbb{R}}^{n}}\phi (z - f(x_{t-1}))P(dx_{t-1}\vert q_{[0,t-1]}),\quad z \in {\mathbb{R}}^{n}.}$$

The uniform boundedness of \(\{\pi _{t}\}\) is immediate. Since ϕ is a Gaussian density, there is a C > 0 such that \({\bigl | \frac{\partial } {\partial z_{j}}\phi (z)\bigr |} \leq C\), \(j = 1,\ldots,n\). A standard application of the dominated convergence theorem implies that the partial derivatives of \(\pi _{t}\) exist and they also satisfy \({\bigl | \frac{\partial } {\partial z_{j}}\pi _{t}(z)\bigr |} \leq C\), \(j = 1,\ldots,n\). Since C does not depend on t, the sequence of densities \(\{\pi _{t}\}\) is uniformly equicontinuous. □ 
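Numerically, the representation of \(\pi _{t}\) as a mixture of shifted Gaussian kernels makes these uniform bounds transparent; a brief sketch follows, with an illustrative f and an illustrative discrete stand-in for \(P(dx_{t-1}\vert q_{[0,t-1]})\).

```python
import numpy as np

# Sketch: pi_t as a mixture of shifted Gaussian kernels, so its slope
# is bounded by max|phi'| = 1/sqrt(2*pi*e) uniformly in t. The map f
# and the support/weights are illustrative assumptions.
z = np.linspace(-6.0, 6.0, 1201)
phi = lambda u: np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
f = np.tanh

support = np.array([-1.0, 0.0, 2.0])     # stand-in for P(dx_{t-1}|q_{[0,t-1]})
weights = np.array([0.2, 0.5, 0.3])
pi_t = sum(w * phi(z - f(x)) for w, x in zip(weights, support))

slope = np.abs(np.gradient(pi_t, z)).max()
assert slope <= 1.0 / np.sqrt(2 * np.pi * np.e) + 1e-3
```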

The following lemma is important for the proof.

Lemma 10.8.2 ([437]). 

  1. (a)

    Let \(\{\mu _{n}\}\) be a sequence of density functions on \({\mathbb{R}}^{n}\) which are uniformly equicontinuous and uniformly bounded and assume \(\mu _{n} \rightarrow \mu\) weakly. Then \(\mu _{n} \rightarrow \mu\) in total variation.

  2. (b)

    Let \(\{Q_{n}\}\) be a sequence in \(\mathcal{Q}_{c}\) such that Q n → Q weakly at P. If P admits a density, then \(Q_{n} \rightarrow Q\) in total variation at P.

  3. (c)

    Let \(P_{n}Q_{n} \rightarrow PQ\) weakly, where \(\{Q_{n}\}\) is a sequence in \(\mathcal{Q}_{c}\) . Suppose further that P n → P in total variation where P admits a density. Then \(P_{n}Q_{n} \rightarrow PQ\) in total variation.

  4. (d)

    Let P n → P in total variation where P admits a density function. Let \(\{Q_{n}\}\) be a sequence in \(\mathcal{Q}_{c}\) . Then, there exists some subsequence such that \(P_{k_{n}}Q_{k_{n}} \rightarrow PQ\) for some \(Q \in \mathcal{Q}_{c}\).

Proof.

  1. (a)

    Since \(\mu _{n} \rightarrow \mu\) weakly, the sequence \(\{\mu _{n}\}\) is tight. Then, using a minor modification of Lemma 4.6.3, one can show that μ has a density and that the equicontinuity and uniform boundedness of \(\{\mu _{n}\}\) imply that along some subsequence \(\mu _{k_{n}}(x) \rightarrow \mu (x)\) pointwise for all x. By Scheffé’s theorem [70], \(\mu _{k_{n}}\) converges to μ in \(L_{1}\), which is equivalent to convergence in total variation (a numerical illustration of this Scheffé-type step is sketched after the proof). This convergence holds for the original sequence \(\mu _{n}\) as well, since there cannot be a subsequence which does not converge in \(L_{1}\): Suppose otherwise. Then, there exists a subsequence \(\mu _{m_{n}}\) which converges to μ weakly but does not converge in \(L_{1}\). Then for some ε > 0 and a further subsequence with index m′ n , \(\|\mu _{m^{\prime}_{n}} -\mu \|_{TV } \geq \epsilon\). But along a further subsequence \(\mu _{m^{\prime\prime}_{n}}\) converges to μ in total variation (by the arguments above), leading to a contradiction.

  2. (b)

    It was shown in (4.15) that

    $$\displaystyle\begin{array}{rcl} \|PQ_{n} - PQ\|_{TV }& \leq &\sum _{i=1}^{M}P(B_{ i}^{n} \bigtriangleup B_{ i}), {}\\ \end{array}$$

    where \(B_{1}^{n},\ldots,B_{M}^{n}\) and \(B_{1},\ldots,B_{M}\) are the cells of Q n and Q, respectively, and \(B_{i}^{n} \bigtriangleup B_{i} = (B_{i}^{n} \setminus B_{i}) \cup (B_{i} \setminus B_{i}^{n})\). Since Q has convex cells, the boundary \(\partial B_{i}\) of each cell B i has zero Lebesgue measure, so \(P(\partial B_{i}) = 0\) because P has a density. Since \(\partial (B_{i} \times \{ j\}) = \partial B_{i} \times \{ j\}\) and

    $$\displaystyle{P(A \times \{ j\}) = P(A \cap B_{j}),}$$

    we have

    $$\displaystyle{PQ(\partial (B_{i} \times \{ j\})) = P(\partial B_{i} \cap B_{j}) = 0,}$$

    for all i and j. Thus if \(P{Q}^{n} \rightarrow PQ\) weakly, then \(P{Q}^{n}(B_{i} \times \{ j\}) \rightarrow PQ(B_{i} \times \{ j\})\) by the Portmanteau theorem, which is equivalent to

    $$\displaystyle{P(B_{i} \cap B_{j}^{n}) \rightarrow P(B_{ i} \cap B_{j})}$$

    for all i and j. Since \(B_{1}^{n},\ldots,B_{M}^{n}\) and \(B_{1},\ldots,B_{M}\) are both partitions of \({\mathbb{R}}^{n}\), this implies \(P(B_{i}^{n} \bigtriangleup B_{i}) \rightarrow 0\) for all i, which in turn proves that \(P{Q}^{n} \rightarrow PQ\) in total variation via (4.15).

  3. (c)

    For any measurable \(A \subset {\mathbb{R}}^{n} \times \mathcal{M}\) we have

    $$\displaystyle\begin{array}{rcl} \vert PQ_{n}(A) - PQ(A)\vert &\leq &\vert PQ_{n}(A) - P_{n}Q_{n}(A)\vert {}\\ & &\mbox{ } + \vert P_{n}Q_{n}(A) - PQ(A)\vert{.} {}\\ \end{array}$$

    It is relatively easy to see that \(P_{n} \rightarrow P\) in total variation implies \(\vert P_{n}Q_{n}(A) - PQ_{n}(A)\vert \rightarrow 0\). This follows from the observation that for \(A_{1} =\{ x : (x,y) \in A\}\) and for x ∈ A 1, \(A_{2}(x) =\{ y : (x,y) \in A\}\),

    $$\displaystyle{\vert P_{n}Q_{n}(A) - PQ_{n}(A)\vert \leq \int _{A_{1}}\vert P_{n}(dx) - P(dx)\vert \bigg(\int _{A_{2}(x)}Q_{n}(dy\vert x)\bigg)}$$

    and that

    $$\displaystyle{\int _{A_{1}}\vert P_{n}(dx) - P(dx)\vert \bigg(\int _{A_{2}(x)}Q_{n}(dy\vert x)\bigg) \leq \| P_{n} - P\|_{TV } \rightarrow 0.}$$

    On the other hand, for any A with \(PQ(\partial A) = 0\), we have \(\vert P_{n}Q_{n}(A) - PQ(A)\vert \rightarrow 0\) since \(P_{n}Q_{n} \rightarrow PQ\) weakly. This proves that PQ n  → PQ weakly. But since P admits a density, part (b) now implies that Q n  → Q in total variation.

    Then we have

    $$\displaystyle\begin{array}{llllll} & &\|P_{n}Q_{n}-PQ\|_{TV } {}\\ & =& \sup \limits_{f:\|f\|_{\infty }\leq 1}\left\vert \sum \limits_{i=1}^{M}{\left(\int _{{ \mathbb{R}}^{n}}f(x,i)Q_{n}(i\vert x)P_{n}(dx)-\int _{{\mathbb{R}}^{n}}f(x,i)Q(i\vert x)P(dx)\right)}\right\vert {}\\ &\leq &\sup \limits_{f:\|f\|_{\infty }\leq 1}\left|\sum \limits_{i=1}^{M}{\left(\int _{{ \mathbb{R}}^{n}}f(x,i)Q_{n}(i\vert x)P_{n}(dx)-\int _{{\mathbb{R}}^{n}}f(x,i)Q_{n}(i\vert x)P(dx)\right)}\right| {}\\ & & +\sup \limits_{f:\|f\|_{\infty }\leq 1}\left|\sum \limits_{i=1}^{M}{\left(\int _{{ \mathbb{R}}^{n}}f(x,i)Q_{n}(i\vert x)P(dx)-\int _{{\mathbb{R}}^{n}}f(x,i)Q(i\vert x)P(dx)\right)}\right| {}\\ & =& \sup \limits_{f:\|f\|_{\infty }\leq 1}\left|{\left(\int _{{\mathbb{R}}^{n}}(P_{n}(dx)-P(dx))\sum \limits_{i=1}^{M}f(x,i)Q_{ n}(i\vert x)\right)}\right| {}\\ & & +\sup \limits_{f:\|f\|_{\infty }\leq 1}\big|\sum \limits_{i=1}^{M}\left(\int _{{ \mathbb{R}}^{n}}f(x,i)Q_{n}(i\vert x)P(dx)\right. \\& &-\int _{{\mathbb{R}}^{n}}f(x,i)Q(i\vert x)\left.P(dx)\right)\big| \rightarrow 0 \end{array}$$

    since P n  → P in total variation and Q n  → Q in total variation at P.

  4. (d)

    By (b) above, there exists a subsequence \(Q_{m_{n}}\) such that \(PQ_{m_{n}}\) converges to PQ for some Q. Since \(P_{n} \rightarrow P\), we have that

    $$\displaystyle{\|P_{m_{n}}Q_{m_{n}} - PQ\|_{TV } \leq \| P_{m_{n}}Q_{m_{n}} - PQ_{m_{n}}\|_{TV } +\| PQ_{m_{n}} - PQ\|_{TV } \rightarrow 0,}$$

    and the result follows. □ 
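The Scheffé-type step in part (a) of the proof above can be checked numerically: pointwise convergence of uniformly bounded, equicontinuous densities yields \(L_{1}\) (hence total variation) convergence. A sketch with an illustrative Gaussian family:

```python
import numpy as np

# Numerical illustration of the Scheffe-type argument in part (a):
# pointwise convergence of densities implies L1 (total variation)
# convergence. The family N(1/n, 1) -> N(0, 1) is illustrative.
z = np.linspace(-10.0, 10.0, 20001)
dz = z[1] - z[0]
phi = lambda u, m: np.exp(-(u - m) ** 2 / 2) / np.sqrt(2 * np.pi)

mu = phi(z, 0.0)
for n in (1, 10, 100):
    l1 = np.abs(phi(z, 1.0 / n) - mu).sum() * dz   # = 2 * TV distance
    print(n, l1)                                    # decreases toward 0
```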

8.6.1 A Measurable Selection Condition and the Proof of Theorem 10.4.2

We now provide a relaxation of the measurable selection conditions considered in [194, Theorem 3.3.5] (see also Appendix D). Let \(\mathcal{S}\subset \mathcal{P}({\mathbb{R}}^{n})\) be the set of reachable states for π under any composite coding policy. Note that by Lemma 10.8.1, the set of densities in \(\mathcal{S}\) is uniformly bounded and equicontinuous.

8.6.1.1 Condition D
  1. (i)

    The cost function c(π, Q) is continuous on \(\mathcal{S}\times \mathcal{Q}_{c}\) in the sense that \(P_{n}Q_{n} \rightarrow PQ\) implies that \(c(P_{n},Q_{n}) \rightarrow c(P,Q)\).

  2. (ii)

    \(\mathcal{Q}_{c}\) is compact in total variation at any input π admitting a density.

  3. (iii)
    $$\displaystyle{\int _{\mathcal{P}({\mathbb{R}}^{n})}J_{t+1}^{T}(\pi )P(d\pi _{ t+1}\vert \pi _{t},Q_{t})}$$

    is a continuous function on \(\mathcal{S}\times \mathcal{Q}_{c}\), for the value function \(J_{t+1}^{T}\) at time t defined recursively as

    $$\displaystyle{J_{t}^{T}(\pi ) =\min _{Q\in \mathcal{Q}_{c}}\bigg(c(\pi,Q) + E[J_{t+1}^{T}(\pi _{t+1})\vert \pi _{t} =\pi,Q_{t} = Q]\bigg),}$$

    with \(J_{T}^{T} = 0\).

The proof of the following theorem follows essentially from the dynamic programming equation itself. This is related to (but weaker than) the Measurable Selection Condition 3.3.2 (with a, b, c1) and the subsequent Theorem 3.3.5 in [194], since here we directly consider the value function.

Theorem 10.8.1.

Under Condition D, there exists an optimal (Borel measurable) policy in \(\Pi _{W}\) achieving (10.13).

In view of the preceding theorem, to prove Theorem 10.4.2 it suffices to show that Condition D holds. We note that (ii) in Condition D directly follows from Theorem 4.7.4.
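The value-function recursion in Condition D(iii) is an ordinary backward dynamic program once the state and quantizer sets are finite; the following toy sketch, with illustrative costs and transition kernels, shows its shape.

```python
import numpy as np

# Toy finite analogue of the recursion defining J_t^T in Condition D:
# backward dynamic programming over a finite quantizer set. Costs and
# kernels are illustrative stand-ins.
rng = np.random.default_rng(2)
n_pi, n_Q, T = 5, 3, 10                  # |S|, |Q_c|, horizon
c = rng.random((n_pi, n_Q))              # per-stage cost c(pi, Q)
P = rng.random((n_Q, n_pi, n_pi))        # P[q][i, j] = P(pi' = j | pi = i, q)
P /= P.sum(axis=2, keepdims=True)

J = np.zeros(n_pi)                       # J_T^T = 0
policy = np.zeros((T, n_pi), dtype=int)
for t in reversed(range(T)):
    Qval = c + np.einsum('qij,j->iq', P, J)   # c(pi,Q) + E[J_{t+1} | pi, Q]
    policy[t] = Qval.argmin(axis=1)
    J = Qval.min(axis=1)
```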

An important supporting lemma is the following.

Lemma 10.8.3.

For \(\pi _{n} \in \mathcal{S}\) and \(Q_{n} \in \mathcal{Q}_{c}\), let

$$\displaystyle\begin{array}{rcl} & & \pi ^{\prime}(m,\pi _{n},Q_{n})(C) := P(x_{n+1} \in C\vert \pi _{n},Q_{n},q_{n} = m) \\ & & \quad \quad \quad = \frac{1} {\pi _{n}(B_{m}^{n})}\int _{z\in C}\bigg\{\int \pi _{n}(dx)1_{\{x\in B_{m}^{n}\}}\phi (z-f(x))\bigg\}dz.{}\end{array}$$
(10.45)

As \((\pi _{n},Q_{n}) \rightarrow (\pi,Q)\) in total variation, for every \(m \in \{ 1,\cdots \,,M\},\)

$$\displaystyle{\|\pi ^{\prime}(m,\pi _{n},Q_{n}) -\pi ^{\prime}(m,\pi,Q)\|_{TV } \rightarrow 0,}$$

provided that π(B m ) > 0.

The following lemma is a minor generalization of Theorem 4.5.4.

Lemma 10.8.4.

Let \(P_{n}Q_{n} \rightarrow PQ\) in total variation. Then,

$$\displaystyle\begin{array}{rcl} & & \inf _{\gamma }\int P_{n}(dx)Q_{n}(q\vert x)c(x,\gamma (q)) \\ & & \quad \quad \quad \quad \rightarrow \inf _{\gamma }\int P(dx)Q(q\vert x)c(x,\gamma (q)).{}\end{array}$$
(10.46)

The next result establishes (i) in Condition D, the continuity of \(c(\pi _{t},Q_{t})\) on \(\mathcal{S}\times \mathcal{Q}_{c}\).

Theorem 10.8.2.

c(π,Q) is continuous on \(\mathcal{S}\times \mathcal{Q}_{c}\).

Proof.

Let \(\{(\pi _{n},Q_{n})\}\) be a sequence in \(\mathcal{S}\times \mathcal{Q}_{c}\) such that \(\pi _{n}Q_{n} \rightarrow \pi Q\) weakly. It follows from Lemma 10.8.2(a) that \(\pi _{n} \rightarrow \pi\) in total variation and from Lemma 10.8.2(c) that \(\pi _{n}Q_{n} \rightarrow \pi Q\) in total variation. Then Lemma 10.8.4 implies that \(c(\pi _{n},Q_{n}) \rightarrow c(\pi,Q)\). □ 

Next we establish (iii) in Condition D. We wish to prove that \(E[J_{t}^{T}(\pi _{t})\vert \pi,Q]\) is continuous in \(\pi,Q\). We apply backward induction. At t = T − 1, let

$$\displaystyle{J_{T-1}^{T}(\pi _{T-1}) =\min _{Q_{T-1}\in \mathcal{Q}_{c}}c(\pi _{T-1},Q_{T-1}).}$$

By Theorem 10.8.2 and the compactness of the set of quantizers (by Theorem 4.7.5), there exists an optimal quantizer, \(Q_{T-1}^{{\ast}}\). Furthermore, the following holds:

Lemma 10.8.5.

Let \(F : \mathcal{S}\times \mathcal{Q}_{c} \rightarrow \mathbb{R}\) be continuous on \(\mathcal{S}\times \mathcal{Q}_{c}\) in the sense that \(P_{n}Q_{n} \rightarrow PQ\) implies that \(F(\pi _{n},Q_{n}) \rightarrow F(\pi,Q)\) . Then, the function min Q F(π,Q) is continuous in π.

Proof.

Let π n  → π, Q n be optimal for π n , and Q be optimal for π. Then

$$\displaystyle\begin{array}{rcl} & & \vert \min _{Q}F(\pi _{n},Q) -\min _{Q}F(\pi,Q)\vert \\ & &\leq \max \bigg (F(\pi _{n},Q) - F(\pi,Q),F(\pi,Q_{n}) - F(\pi _{n},Q_{n})\bigg).{}\end{array}$$
(10.47)

The first term above converges to zero since F is continuous in \((\pi,Q)\). The second also converges to zero; suppose otherwise. Then, for some ε > 0, there exists a subsequence such that

$$\displaystyle{F(\pi,Q_{k_{n}}) - F(\pi _{k_{n}},Q_{k_{n}}) \geq \epsilon{.}}$$

Consider the sequence \((\pi _{k_{n}},Q_{k_{n}})\). By Lemma 10.8.2(d), there exists a further subsequence \((\pi _{k^{\prime}_{n}},Q_{k^{\prime}_{n}})\) which converges to \((\pi,Q^{\prime})\) for some Q′. Hence, along this subsequence, we have convergence of \(F(\pi _{k^{\prime}_{n}},Q_{k^{\prime}_{n}})\) as well as of \(F(\pi,Q_{k^{\prime}_{n}})\), leading to a contradiction. □ 

As a consequence, \(J_{T-1}^{T}(\pi _{T-1})\) is continuous in \(\pi _{T-1}\). Consider now t = T − 2; we wish to see whether there is a solution to the following equality:

$$\displaystyle\begin{array}{rcl} J_{T-2}^{T}(\pi _{T-2}) =\min _{Q_{T-2}\in \mathcal{Q}_{c}}\bigg(c(\pi _{T-2},Q_{T-2}) + E[J_{T-1}^{T}(\pi _{T-1})\vert \pi _{T-2},Q_{T-2}]\bigg).& &{}\end{array}$$
(10.48)

Note that

$$\displaystyle\begin{array}{rcl} & & E[J_{T-1}^{T}(\pi _{T-1})\vert \pi _{T-2},Q_{T-2}] {}\\ & & =\sum _{ m=1}^{M}P(\pi ^{\prime}(m,\pi _{T-2},Q_{T-2})\vert \pi _{T-2},Q_{T-2})J_{T-1}^{T}(\pi ^{\prime}(m,\pi _{T-2},Q_{T-2})), {}\\ \end{array}$$

where

$$\displaystyle{P(\pi ^{\prime}(m,\pi _{T-2},Q_{T-2})\vert \pi _{T-2},Q_{T-2}) = P(q_{T-2} = m\vert \pi _{T-2},Q_{T-2})}$$

and

$$\displaystyle\begin{array}{rcl} \pi ^{\prime}(m,\pi _{T-2},Q_{T-2})(dz)=\frac{\int \,\pi _{T-2}(dx_{T-2})P(q_{T-2}=m\vert \pi _{T-2},x_{T-2})P(dz\vert x_{T-2})} {\int \,\,\int \,\,\pi _{T-2}(dx_{T-2})P(q_{T-2}=m\vert \pi _{T-2},x_{T-2})P(dz\vert x_{T-2})},& & {}\\ \end{array}$$

or

$$\displaystyle\begin{array}{rcl} \pi ^{\prime}(m,\pi,Q)(C) = \frac{1} {\pi (B_{m})}\int _{z\in C}\bigg\{\int \pi (dx)1_{\{x\in B_{m}\}}\phi (z - f(x))\bigg\}dz.& &{}\end{array}$$
(10.49)

Lemma 10.8.3 shows that as \(\pi Q_{n} \rightarrow \pi Q\), \(\|\pi ^{\prime}(m,\pi,Q_{n}) -\pi ^{\prime}(m,\pi,Q)\|_{TV } \rightarrow 0\) whenever \(\pi (B_{m}) > 0\). If \(P(q_{T-2} = m\vert \pi _{T-2},Q_{T-2}) = 0\), by the boundedness of the cost, it follows that

$$\displaystyle{P(\pi ^{\prime}(m,\pi _{T-2},Q_{T-2,n})\vert \pi _{T-2},Q_{T-2,n})J_{T-1}^{T}(\pi ^{\prime}(m,\pi _{T-2},Q_{T-2,n})) \rightarrow 0,}$$

since \(P(\pi ^{\prime}(m,\pi _{T-2},Q_{T-2,n})\vert \pi _{T-2},Q_{T-2,n}) \rightarrow 0\).

As a consequence, we have that \(E[J_{T-1}^{T}(\pi _{T-1})\vert \pi _{T-2},Q_{T-2}]\) is continuous in \((\pi _{T-2},Q_{T-2})\), since both of the expressions in (10.48) are continuous. Hence,

$$\displaystyle{J_{T-2}^{T}(\pi _{ T-2}) =\min _{Q_{T-2}}\bigg(c(\pi _{T-2},Q_{T-2}) + E[J_{T-1}^{T}(\pi _{ T-1})\vert \pi _{T-2},Q_{T-2}]\bigg)}$$

exists and by Lemma 10.8.5 is continuous. The recursion applies for all time stages.

This concludes the proof of Theorem 10.4.2. □ 

8.7 Proof of Theorem 10.5.1

The proof is in three steps: (i), (ii), and (iii) below.

Step (i):

In decentralized dynamic decision problems where the decision makers have the same objective (i.e., in team problems), more information provided to the decision makers does not lead to any degradation in performance, since the decision makers can always choose to ignore the additional information (as we saw in Sect. 3.5.2, in view of expansion of information structures). In view of this, let us relax the information structure in such a way that the decision makers now have access to all the previous observations; that is, the information available at encoders 1 and 2 is

$$\displaystyle{I_{t}^{i} =\{ y_{ t}^{i},\mathbf{y}_{ [0,t-1]},\mathbf{q}_{[0,t-1]}\}\quad t \geq 1,\quad i = 1,2.}$$
$$\displaystyle{I_{0}^{i} =\{ y_{ 0}^{i}\},\quad i = 1,2.}$$

The information pattern among the encoders is now the one-step delayed observation sharing pattern. We will show that the past information can be eliminated altogether, to prove the desired result.

Step (ii):

The second step uses the following technical lemma.

Lemma 10.8.6.

Under the relaxed information structure in step (i) above, any decentralized quantization policy at time t, \(1 \leq t \leq T - 1\), can be replaced, without any loss in performance, with one which only uses \((\pi _{t},\mathbf{y}_{t},\mathbf{q}_{[0,t-1]})\), satisfying the following form:

$$\displaystyle\begin{array}{rcl} P(\mathbf{q}_{t}\vert \mathbf{y}_{[0,t]},\mathbf{q}_{[0,t-1]})& =& P(q_{t}^{1}\vert y_{ t}^{1},\mathbf{q}_{ [0,t-1]})P(q_{t}^{2}\vert y_{ t}^{2},\mathbf{q}_{ [0,t-1]}) \\ & & =1_{\{q_{t}^{1}=\bar{{Q}}^{1}(y_{t}^{1},\mathbf{q}_{[0,t-1]})\}}1_{\{q_{t}^{2}=\bar{{Q}}^{2}(y_{t}^{2},\mathbf{q}_{[0,t-1]})\}},{}\end{array}$$
(10.50)

for measurable functions \(\bar{{Q}}^{1}\) and \(\bar{{Q}}^{2}\).

Proof.

Let us fix a composite quantization policy Q comp. At time t = T − 1, the per-stage cost function can be written as

$$\displaystyle\begin{array}{rcl} & & E[\int _{\mathbb{X}}P(dx_{t}\vert \mathbf{q}_{[0,t]})c(x_{t},u_{t})\vert \mathbf{q}_{[0,t-1]}].{}\end{array}$$
(10.51)

For this problem, \(P(dx_{t}\vert \mathbf{q}_{[0,t]})\) is a sufficient statistic for an optimal receiver. Hence, at time t = T − 1, an optimal receiver will use \(P(dx_{t}\vert \mathbf{q}_{[0,t]})\) as a sufficient statistic for an optimal decision as the cost function conditioned on \(\mathbf{q}_{[0,t]}\) is written as \(\int P(dx_{t}\vert \mathbf{q}_{[0,t]})c(x_{t},u_{t})\), where u t is the decision of the receiver. Now, let us fix this decision policy at time t. We now observe that

$$\displaystyle\begin{array}{rcl} & & P(dx_{t}\vert \mathbf{q}_{[0,t]}) =\sum _{{\mathbb{Y}}^{t+1}}P(dx_{t},\mathbf{y}_{[0,t]}\vert \mathbf{q}_{[0,t]}) = \frac{\sum _{{\mathbb{Y}}^{t+1}}P(dx_{t},\mathbf{q}_{t},\mathbf{y}_{[0,t]}\vert \mathbf{q}_{[0,t-1]})} {P(\mathbf{q}_{t}\vert \mathbf{q}_{[0,t-1]})} \\ & =& \frac{\sum _{{\mathbb{Y}}^{t+1}}P(\mathbf{q}_{t}\vert \mathbf{y}_{[0,t]},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t}\vert x_{t})P(dx_{t}\vert \mathbf{y}_{[0,t-1]})P(\mathbf{y}_{[0,t-1]}\vert \mathbf{q}_{[0,t-1]})} {\int _{\mathbb{X},{\mathbb{Y}}^{t+1}}P(\mathbf{q}_{t}\vert \mathbf{y}_{[0,t]},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t}\vert x_{t})P(dx_{t}\vert \mathbf{y}_{[0,t-1]})P(\mathbf{y}_{[0,t-1]}\vert \mathbf{q}_{[0,t-1]})} \\ & =& \frac{\sum _{{\mathbb{Y}}^{t+1}}P(\mathbf{q}_{t}\vert \mathbf{y}_{[0,t]},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t}\vert x_{t})\pi (dx_{t})P(\mathbf{y}_{[0,t-1]}\vert \mathbf{q}_{[0,t-1]})} {\int _{\mathbb{X},{\mathbb{Y}}^{t+1}}P(\mathbf{q}_{t}\vert \mathbf{y}_{[0,t]},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t}\vert x_{t})\pi (dx_{t})P(\mathbf{y}_{[0,t-1]}\vert \mathbf{q}_{[0,t-1]})}. {}\end{array}$$
(10.52)

The term \(P(\mathbf{q}_{t}\vert \mathbf{y}_{[0,t]},\mathbf{q}_{[0,t-1]})\) is determined by the composite quantization policies:

$$\displaystyle\begin{array}{rcl} & & P(\mathbf{q}_{t}\vert \mathbf{y}_{[0,t]},\mathbf{q}_{[0,t-1]}) {}\\ & & = P(q_{t}^{1}\vert y_{ t}^{1},\mathbf{y}_{ [0,t-1]},\mathbf{q}_{[0,t-1]})P(q_{t}^{2}\vert y_{ t}^{2},\mathbf{y}_{ [0,t-1]},\mathbf{q}_{[0,t-1]}) {}\\ & & = 1_{\{q_{t}^{1}=Q_{t}^{comp,1}(y_{t}^{1},\mathbf{y}_{[0,t-1]},\mathbf{q}_{[0,t-1]})\}}1_{\{q_{t}^{2}=Q_{t}^{comp,2}(y_{t}^{2},\mathbf{y}_{[0,t-1]},\mathbf{q}_{[0,t-1]})\}}. {}\\ \end{array}$$

In (10.52), we use the relation \(P(dx_{t}\vert \mathbf{y}_{[0,t-1]}) = P(dx_{t}) =:\pi (dx_{t})\), where π( ⋅) denotes the marginal probability on x t (recall that the source is memoryless).

The above is valid since each encoder knows the past observations of both encoders.

As such, \(P(dx_{t}\vert \mathbf{q}_{[0,t]})\) can be written as, for some function \( \Upsilon \),

$$\displaystyle{\Upsilon (\pi,\mathbf{q}_{[0,t-1]},\mathbf{Q}_{t}^{comp}(.)).}$$

Note that \(\mathbf{q}_{[0,t-1]}\) appears due to the term \(P(\mathbf{y}_{[0,t-1]}\vert \mathbf{q}_{[0,t-1]})\). Now, consider the space of joint mappings at time t, denoted by \(\mathcal{G}_{t}\):

$$\displaystyle{\mathcal{G}_{t} =\{ \mathbf{\Psi }_{t} : \mathbf{\Psi }_{t} =\{ \Psi _{t}^{1},\Psi _{ t}^{2}\},\Psi _{ t}^{i} : {\mathbb{Y}}^{i} \rightarrow \mathcal{M}_{ t}^{i},\quad i = 1,2\}.}$$

For every composite quantization policy there exists a distribution P′ on random variables \((\mathbf{q}_{t},\pi,\mathbf{q}_{[0,t-1]})\) such that

$$\displaystyle\begin{array}{rcl} & & P^{\prime}(\mathbf{q}_{t}\vert \pi,\mathbf{q}_{[0,t-1]}) =\sum _{{({\mathbb{Y}}^{1}\times {\mathbb{Y}}^{2})}^{t+1}}P(\mathbf{q}_{t},\mathbf{y}_{[0,t]}\vert \pi,\mathbf{q}_{[0,t-1]}) \\ & & =\sum _{{({\mathbb{Y}}^{1}\times {\mathbb{Y}}^{2})}^{t+1}}\bigg(P(q_{t}^{1}\vert \mathbf{y}_{ [0,t-1]},y_{t}^{1},\mathbf{q}_{ [0,t-1]},\pi ) \\ & & P(q_{t}^{2}\vert \mathbf{y}_{ [0,t-1]},y_{t}^{2},\mathbf{q}_{ [0,t-1]},\pi )P(y_{t}^{1},y_{ t}^{2})P(\mathbf{y}_{ [0,t-1]}\vert \pi,\mathbf{q}_{[0,t-1]})\bigg).{}\end{array}$$
(10.53)

Furthermore, with every composite quantization policy and every realization of \(\mathbf{y}_{[0,t-1]},\mathbf{q}_{[0,t-1]}\), we can associate an element in the space \(\mathcal{G}_{t}\), \(\mathbf{\Psi }_{\mathbf{y}_{ [0,t-1]},\mathbf{q}_{[0,t-1]}}\), such that the induced stochastic relationship in (10.53) can be obtained:

$$\displaystyle\begin{array}{rcl} & & P^{\prime}(\mathbf{q}_{t}\vert \pi,\mathbf{q}_{[0,t-1]}) =\sum _{{({\mathbb{Y}}^{1}\times {\mathbb{Y}}^{2})}^{t+1}}P(\mathbf{q}_{t},\mathbf{y}_{[0,t]}\vert \pi,\mathbf{q}_{[0,t-1]}) {}\\ & =& \sum _{{({\mathbb{Y}}^{1}\times {\mathbb{Y}}^{2})}^{t+1}}1_{\{\mathbf{\Psi }_{\mathbf{y}_{ [0,t-1]},\mathbf{q}_{[0,t-1]}}(y_{t}^{1},y_{t}^{2})=(q_{t}^{1},q_{t}^{2})\}}P(y_{t}^{1},y_{ t}^{2})P(\mathbf{y}_{ [0,t-1]}\vert \pi,\mathbf{q}_{[0,t-1]}). {}\\ \end{array}$$

We can thus express the cost, for some measurable function F, in the following way:

$$\displaystyle{E[F(\pi,\mathbf{q}_{[0,t-1]},\mathbf{\Psi })\vert \pi,\mathbf{q}_{[0,t-1]}],}$$

where

$$\displaystyle\begin{array}{rcl} & & P(\mathbf{\Psi }\vert \pi,\mathbf{q}_{[0,t-1]}) =\sum _{{({\mathbb{Y}}^{1}\times {\mathbb{Y}}^{2})}^{t}}1_{\{\mathbf{\Psi }=\mathbf{\Psi }_{\mathbf{y}_{ [0,t-1]},\mathbf{q}_{[0,t-1]}}\}}P(\mathbf{y}_{[0,t-1]}\vert \pi,\mathbf{q}_{[0,t-1]}). {}\\ \end{array}$$

Now let t = T − 1 and define for every possible realization \(\mathbf{\Psi }_{t} = (\Psi _{t}^{1},\Psi _{t}^{2}) \in \mathcal{G}_{t}\) (with the decision policy considered earlier fixed):

$$\displaystyle\begin{array}{rcl} & & \beta _{\mathbf{\Psi }_{t}} :=\bigg\{\pi,\mathbf{q}_{[0,t-1]} : F(\pi,\mathbf{q}_{[0,t-1]},\mathbf{\Psi }_{t}) \leq F(\pi,\mathbf{q}_{[0,t-1]},\mathbf{\Psi }^{\prime}_{t}) {}\\ & & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \forall ((\Psi _{t}^{1})^{\prime},(\Psi _{ t}^{2})^{\prime}) \in \mathcal{G}_{ t}\bigg\}. {}\\ \end{array}$$

As we had observed in the proof of Theorem 10.3.3, such a construction covers the domain set consisting of \((\pi,\mathbf{q}_{[0,t-1]})\), though possibly with overlaps. Note that for every \((\pi,\mathbf{q}_{[0,t-1]})\) there exists a minimizing function in \(\mathcal{G}_{t}\), since \(\mathcal{G}_{t}\) is a finite set. Accordingly, fix an ordering of the finitely many elements of \(\mathcal{G}_{t}\) as \(\{\mathbf{\Psi }_{t}(1),\mathbf{\Psi }_{t}(2),\ldots,\mathbf{\Psi }_{t}(k),\ldots \}\), and define a function \(\mathbf{T}_{t}^{{\ast}}\) as

$$\displaystyle\begin{array}{rcl} & & \mathbf{\Psi }_{t}(k) = \mathbf{T}_{t}^{{\ast}}(\pi,\mathbf{q}_{ [0,t-1]}),\mathrm{if}\quad \bigg(\pi,\mathbf{q}_{[0,t-1]}\bigg) \in \beta _{\mathbf{\Psi }_{t}(k)} -\cup _{i=0}^{k-1}\beta _{ \mathbf{\Psi }_{t}(i)}, {}\\ \end{array}$$

with \(\beta _{\mathbf{\Psi }_{t}(0)} = \varnothing \).

Thus, we have constructed a policy which performs at least as well as the original composite quantization policy. It has a restricted structure in that it only uses \((\pi,\mathbf{q}_{[0,t-1]})\) to generate the team action and the local information \(y_{t}^{1},y_{t}^{2}\) to generate the quantizer outputs.
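For concreteness, the following minimal sketch illustrates the construction of \(\mathbf{T}_{t}^{{\ast}}\) on a toy model: the binary alphabets and the stand-in cost F are assumptions made purely for illustration, not part of the development above.

```python
import itertools

Y = [0, 1]                     # observation alphabet of each encoder (assumed binary)
M = [0, 1]                     # message alphabet of each encoder (assumed binary)

# All maps Y -> M, encoded as tuples (value at y=0, value at y=1).
single_maps = list(itertools.product(M, repeat=len(Y)))
G_t = list(itertools.product(single_maps, single_maps))   # finite set of joint mappings

def F(pi, q_hist, psi):
    """Toy stand-in for the measurable cost F(pi, q_[0,t-1], Psi); hypothetical."""
    psi1, psi2 = psi
    return sum(psi1) + 2 * sum(psi2) + sum(q_hist) % 2

def T_star(pi, q_hist):
    """Return the first minimizer of F in the fixed ordering of G_t,
    mirroring the sets beta_{Psi_t(k)} minus the earlier beta's."""
    best = min(F(pi, q_hist, psi) for psi in G_t)
    return next(psi for psi in G_t if F(pi, q_hist, psi) == best)

psi1, psi2 = T_star(pi=None, q_hist=(0, 1, 1))   # team action from (pi, q-history)
y1, y2 = 1, 0                                    # current local observations
q1, q2 = psi1[y1], psi2[y2]                      # quantizer outputs from local data
```

The tie-breaking by a fixed ordering is exactly what makes \(\mathbf{T}_{t}^{{\ast}}\) a well-defined (single-valued) function despite the possible overlaps of the sets \(\beta _{\mathbf{\Psi }_{t}(k)}\).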

Now that we have obtained the structure of the optimal encoders for the last stage, we can sequentially proceed to study the other time stages. Note that given a fixed π, \(\{(\pi,\mathbf{y}_{t})\}\) is i.i.d. and hence Markov. Now, define \(\pi ^{\prime}_{t} = (\pi,\mathbf{y}_{t})\). For a three-stage cost problem, the cost at time t = 2 can be written as, for measurable functions \(c_{2},c_{3}\),

$$\displaystyle\begin{array}{rcl} c_{2}(\pi ^{\prime}_{2},v_{2}(\mathbf{q}_{[1,2]})) + E[c_{3}(\pi ^{\prime}_{3},v_{3}(\mathbf{q}_{[1,2]},Q_{3}(\pi ^{\prime}_{3},\mathbf{q}_{[1,2]})))\vert \pi ^{\prime}_{[1,2]},\mathbf{q}_{[1,2]}].& & {}\\ \end{array}$$

Since \(P(d\pi ^{\prime}_{3},\mathbf{q}_{[1,2]}\vert \pi ^{\prime}_{2},\pi ^{\prime}_{1},\mathbf{q}_{[1,2]}) = P(d\pi ^{\prime}_{3},\mathbf{q}_{[1,2]}\vert \pi ^{\prime}_{2},\mathbf{q}_{[1,2]})\), the expression above equals \(F_{2}(\pi ^{\prime}_{2},\mathbf{q}_{2},\mathbf{q}_{1})\) for some measurable function \(F_{2}\). By a similar argument, an optimal composite quantizer at time t, \(1 \leq t \leq T - 1\), only uses \((\pi,\mathbf{y}_{t},\mathbf{q}_{[0,t-1]})\). An optimal (team) policy generates the quantizers \(Q_{t}^{1},Q_{t}^{2}\) using \((\mathbf{q}_{[0,t-1]},\pi)\), and the quantizers use \(y_{t}^{i}\) to generate the quantizer outputs at time t, for i = 1, 2. □ 

Step (iii):

The final step will complete the proof. At time t = T − 1, an optimal receiver will use \(P(dx_{t}\vert \mathbf{q}_{[0,t]})\) as a sufficient statistic for the optimal decision. We now observe that

$$\displaystyle\begin{array}{rcl} & & P(dx_{t}\vert \mathbf{q}_{[0,t]}) =\sum _{{\mathbb{Y}}^{t+1}}P(dx_{t}\vert \mathbf{y}_{[0,t]})P(\mathbf{y}_{[0,t]}\vert \mathbf{q}_{[0,t]}) {}\\ & & =\sum _{{\mathbb{Y}}^{t+1}}P(dx_{t}\vert \mathbf{y}_{t})P(\mathbf{y}_{[0,t]}\vert \mathbf{q}_{[0,t]}) =\sum _{\mathbb{Y}}P(dx_{t}\vert \mathbf{y}_{t})\sum _{{\mathbb{Y}}^{t}}P(\mathbf{y}_{[0,t]}\vert \mathbf{q}_{[0,t]}) {}\\ & & =\sum _{\mathbb{Y}}P(dx_{t}\vert \mathbf{y}_{t})P(\mathbf{y}_{t}\vert \mathbf{q}_{[0,t]}). {}\\ \end{array}$$

Thus, \(P(dx_{t}\vert \mathbf{q}_{[0,t]})\) is a function of \(P(\mathbf{y}_{t}\vert \mathbf{q}_{[0,t]})\). Now, let us note that

$$\displaystyle\begin{array}{rcl} & & P(\mathbf{y}_{t}\vert \mathbf{q}_{[0,t]}) = \frac{P(\mathbf{q}_{t},\mathbf{y}_{t}\vert \mathbf{q}_{[0,t-1]})} {\sum _{\mathbf{y}_{t}}P(\mathbf{q}_{t},\mathbf{y}_{t}\vert \mathbf{q}_{[0,t-1]})} \\ & & = \frac{P(\mathbf{q}_{t}\vert \mathbf{y}_{t},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t}\vert \mathbf{q}_{[0,t-1]})} {\sum _{\mathbf{y}_{t}}P(\mathbf{q}_{t}\vert \mathbf{y}_{t},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t}\vert \mathbf{q}_{[0,t-1]})} \\ & & = \frac{P(\mathbf{q}_{t}\vert \mathbf{y}_{t},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t})} {\sum _{\mathbf{y}_{t}}P(\mathbf{q}_{t}\vert \mathbf{y}_{t},\mathbf{q}_{[0,t-1]})P(\mathbf{y}_{t})}, {}\end{array}$$
(10.54)

where the term \(P(\mathbf{q}_{t}\vert \mathbf{y}_{t},\mathbf{q}_{[0,t-1]})\) is determined by the quantizer team action \(\mathbf{Q}_{t}^{comp}\). As such, the cost at time t = T − 1 can be expressed as a measurable function \(G(P(\mathbf{y}_{t}),\mathbf{Q}_{t})\). It thus follows that an optimal quantizer policy at the last stage, t = T − 1, may use only \(P(\mathbf{y}_{t})\) to generate the quantizers, where the quantizers use the local information \(y_{t}^{i}\) to generate the quantization outputs. The rest of the proof follows the arguments in the proof of Theorem 10.3.4: At time t = T − 2, the sufficient statistic for the cost function is \(P(dx_{T-2}\vert \mathbf{q}_{[0,T-2]})\), both for the immediate cost and for the cost-to-go (that is, the cost impacting the time stage t = T − 1), as a result of the optimality result for \(Q_{T-1}\) and the memoryless nature of the source dynamics. The same argument applies for all time stages.

Hence, any policy can, without loss, be replaced with one in \(\Pi^{NSM}\). Since there are finitely many policies in this class, an optimal composite quantization policy exists. □ 
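As a concrete illustration of the update (10.54), the following minimal sketch computes \(P(\mathbf{y}_{t}\vert \mathbf{q}_{[0,t]})\) for a memoryless source under a deterministic team action; the alphabet, the marginal, and the quantizer below are illustrative assumptions only.

```python
import numpy as np

p_y = np.array([0.2, 0.3, 0.5])        # P(y_t) over a 3-letter alphabet (assumed)
Q_comp = np.array([0, 0, 1])           # team action: bin index assigned to each y

def posterior_y_given_q(q, p_y, Q_comp):
    # P(q_t | y_t, q_[0,t-1]) is the indicator that y falls in bin q,
    # so (10.54) reduces to restricting p_y to the bin and renormalizing.
    likelihood = (Q_comp == q).astype(float)
    unnormalized = likelihood * p_y
    return unnormalized / unnormalized.sum()

print(posterior_y_given_q(0, p_y, Q_comp))   # [0.4, 0.6, 0.0]
```

Note that, exactly as in the proof, the past quantizer outputs enter only through the team action, and the memorylessness of the source lets \(P(\mathbf{y}_{t}\vert \mathbf{q}_{[0,t-1]})\) be replaced with the marginal \(P(\mathbf{y}_{t})\).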

8.8 Proof of Lemma 10.6.1

We apply dynamic programming. For the final stage, t = T − 1, let \(f_{t}(q_{[0,t-1]}) :=\sum _{ k=0}^{t-1}{A}^{t-k-1}Bu_{k}\) and \(x_{t} =\bar{ x}_{t} + f_{t}(q_{[0,t-1]})\). If the policy is in \(\Pi_{W}\), the composite quantization policy is of the form

$$\displaystyle{Q_{t}(\bar{x}_{t} +\sum _{ k=0}^{t-1}{A}^{t-k-1}Bu_{ k},P(\bar{x}_{t} +\sum _{ k=0}^{t-1}{A}^{t-k-1}Bu_{ k} \in \cdot \vert q_{[0,t-1]})).}$$

For this time stage, let there be an optimal decoder and controller for which a sufficient statistic for the optimal control policy is \(E[x_{t}\vert q_{[0,t]}]\). Observe that

$$\displaystyle\begin{array}{rcl} & & E[\bar{x}_{t} + f_{t}(q_{[0,t-1]})\vert q_{[0,t]}] = E[\bar{x}_{t}\vert q_{[0,t]}] + f_{t}(q_{[0,t-1]}) \\ & & \quad \quad = E[\bar{x}_{t}\vert q_{[0,t-1]},q_{t}] + f_{t}(q_{[0,t-1]}). {}\end{array}$$
(10.55)

The quantization output \(q_{t}\) represents the bin information for \(x_{t}\). By shifting the quantizer bins by \(f_{t}(q_{[0,t-1]})\), a new quantizer which quantizes \(\bar{x}_{t}\) can generate the same bin information on \(x_{t}\) through \(q_{t}\); hence, there is no information loss due to the elimination of the past control actions. Therefore, this new quantizer, by adding \(f_{t}(q_{[0,t-1]})\) to its output, generates the same conditional estimate of the state as the original quantizer. Thus, there exists a quantizer of the form \(\tilde{Q}_{t}(\bar{x}_{t},P(\bar{x}_{t} \in \cdot \vert q_{[0,t-1]}))\) with the following property: the estimation error realization, and hence the estimate, is the same almost surely; and since the conditional estimate is a sufficient statistic (as a consequence of the structure of the cost and the linearity of the system), the cost realization is also identical almost surely. Furthermore, \(\bar{w}_{t}\) is independent of the control actions applied earlier (due to the separated structure).

Consequently, for t = T − 3, since \(u_{T-2}\) is independent of \(\bar{w}_{T-2}\) and \(\bar{w}_{T-1}\), an optimal controller will use \(E[x_{t}\vert q_{[0,t]}]\) as a sufficient statistic, given the structural result above for \(u_{T-1},u_{T-2}\) and the encoder policies. Hence, the analysis above applies for t = T − 4 and, by induction, for all time stages down to t = 0. Under an optimal coding and control policy, the estimation error is, without any loss, independent of the control actions. □ 
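The bin-shift argument above can be checked numerically. The following minimal sketch uses a uniform quantizer (an assumption; any bin structure works the same way) and verifies that quantizing \(\bar{x}_{t}\) with bins shifted by \(f_{t}\), and then adding \(f_{t}\) back, reproduces the bin information generated by quantizing \(x_{t}\) directly.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.5                                   # bin width (hypothetical)

def quantize(x, offset=0.0):
    """Midpoint uniform quantizer with bins shifted by `offset`."""
    return offset + delta * (np.floor((x - offset) / delta) + 0.5)

xbar = rng.normal(size=10_000)                # control-free part of the state
f_t = 1.37                                    # known function of q_[0,t-1] (assumed value)
x = xbar + f_t                                # full state x_t = xbar_t + f_t

q_original = quantize(x)                      # quantize the full state
q_shifted = quantize(xbar, offset=-f_t) + f_t # shifted bins on xbar, then add f_t back

assert np.allclose(q_original, q_shifted)     # identical bin information / estimates
```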

8.9 Proof of Theorem 10.6.3

As a consequence of Theorem 10.6.2, we obtain that, for \(t \geq 0\), the unnormalized value function is given by

$$\displaystyle\begin{array}{rcl} J_{t}(\mathcal{I}_{t}^{c})& =& E[x_{ t}^{\prime}K_{t}x_{t}\vert \mathcal{I}_{t}^{c}] +\sum _{ k=t}^{T-1}\bigg(E[(x_{ k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])^{\prime}Q(x_{ k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])] \\ & & +E[\bar{w}_{k}^{\prime}K_{k+1}\bar{w}_{k}]\bigg), {}\end{array}$$
(10.56)

where the effective noise process is \(\bar{w}_{t} = E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - E[x_{t+1}\vert \mathcal{I}_{t}^{c}]\) with

$$\displaystyle{J({\Pi }^{comp},{\underline{\gamma }}^{0},T) = \frac{1} {T}J_{0}(\mathcal{I}_{0}^{c}).}$$

Given a positive-definite matrix Λ, define an inner product as

$$\displaystyle{\langle z_{1},z_{2}\rangle _{\Lambda } = z_{1}^{\prime}\Lambda z_{2},}$$

and the norm generated by this inner product as \(\vert z\vert _{\Lambda } = \sqrt{z^{\prime}\Lambda z}\). We now note the following:

$$\displaystyle\begin{array}{rcl} & & E\bigg[\vert E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - E[x_{ t+1}\vert \mathcal{I}_{t}^{c}]\vert _{ \Lambda }^{2}\bigg] {}\\ & & = E\bigg[\vert \bigg((E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1}) + (x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t}^{c}])\bigg)\vert _{ \Lambda }^{2}\bigg] {}\\ & & = E[\vert (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1})\vert _{\Lambda }^{2}] + E[\vert (x_{ t+1} - E[x_{t+1}\vert \mathcal{I}_{t}^{c}])\vert _{ \Lambda }^{2}] {}\\ & & \quad \quad + 2E\bigg[\langle (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1}),(x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t}^{c}])\rangle _{ \Lambda }\bigg]. {}\\ \end{array}$$

Note that

$$\displaystyle\begin{array}{rcl} & & E\bigg[\langle (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1}),(x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t}^{c}])\rangle _{ \Lambda }\bigg] \\ & & = E\bigg[-\langle (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1}),(E[x_{t+1}\vert \mathcal{I}_{t}^{c}])\rangle _{ \Lambda } \\ & & \quad \quad \quad \quad \quad +\langle (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1}),(x_{t+1})\rangle _{\Lambda }\bigg] \\ & & = E\bigg[\langle (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1}),(x_{t+1})\rangle _{\Lambda }\bigg] {}\end{array}$$
(10.57)
$$\displaystyle\begin{array}{rcl} & & = -E[\vert (E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - x_{ t+1})\vert _{\Lambda }^{2}],{}\end{array}$$
(10.58)

where (10.57)–(10.58) follow from the orthogonality property of minimum mean-square estimation and the fact that \(E[x_{t+1}\vert \mathcal{I}_{t}^{c}]\) is measurable with respect to \(\sigma (\mathcal{I}_{t+1}^{c})\), the sigma-field generated by \(\mathcal{I}_{t+1}^{c}\). Therefore, we have

$$\displaystyle\begin{array}{rcl} & & E\bigg[\vert E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}] - E[x_{ t+1}\vert \mathcal{I}_{t}^{c}]\vert _{ K_{t+1}}^{2}\bigg] {}\\ & & = E\bigg[\vert x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}]\vert _{ K_{t+1}}^{2}\bigg] + E\bigg[\vert Ax_{t} + w_{t} - AE[x_{t}\vert \mathcal{I}_{t}^{c}]\vert _{ K_{t+1}}^{2}\bigg] {}\\ & & \quad \quad - 2E\bigg[\vert x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}]\vert _{ K_{t+1}}^{2}\bigg] {}\\ & & = -E\bigg[(x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}])^{\prime}K_{ t+1}(x_{t+1} - E[x_{t+1}\vert \mathcal{I}_{t+1}^{c}])\bigg] {}\\ & & \quad \quad + E\bigg[(x_{t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])^{\prime}(A^{\prime}K_{ t+1}A)(x_{t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])\bigg] + E[w_{t}^{\prime}K_{ t+1}w_{t}]. {}\\ \end{array}$$
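As an aside, the orthogonality identity used in (10.57)–(10.58) is easy to verify numerically. The following minimal Monte Carlo sketch does so in a scalar linear-Gaussian model; the model, the weight Λ, and all numerical values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
lam = 2.0                                   # scalar Lambda > 0 (assumed)

x = rng.normal(size=n)                      # x ~ N(0, 1)
y = x + rng.normal(size=n)                  # noisy observation, unit noise
x_hat = 0.5 * y                             # E[x | y] for this model
e = x_hat - x                               # MMSE estimation error
f = x - 0.0                                 # x minus the prior estimate E[x] = 0

cross = np.mean(lam * e * f)                # E<e, f>_Lambda
neg_sq = -np.mean(lam * e * e)              # -E[|e|^2_Lambda]
print(cross, neg_sq)                        # agree up to Monte Carlo error
```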

Returning to the derivation, since \(K_{T} = 0\), the finite-horizon cost can thus be written as

$$\displaystyle\begin{array}{rcl} J_{t}(\mathcal{I}_{t}^{c})& =& E[x_{ t}^{\prime}K_{t}x_{t}\vert \mathcal{I}_{t}^{c}] \\ & & \quad +\sum _{ k=t}^{T-1}E[(x_{ k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])^{\prime}(Q + A^{\prime}K_{ k+1}A)(x_{k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])] \\ & & \quad -\sum _{k=t}^{T-1}E[(x_{ k+1} - E[x_{k+1}\vert \mathcal{I}_{k+1}^{c}])^{\prime}K_{ k+1}(x_{k+1} - E[x_{k+1}\vert \mathcal{I}_{k+1}^{c}])] \\ & & \quad +\sum _{ k=t}^{T-1}E[w_{k}^{\prime}K_{k+1}w_{k}] \\ & =& E[x_{t}^{\prime}K_{t}x_{t}\vert \mathcal{I}_{t}^{c}] +\sum _{ k=t}^{T-1}E[(x_{ k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])^{\prime}(Q + A^{\prime}K_{ k+1}A)(x_{k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])] \\ & & \quad -\sum _{k=t+1}^{T-1}E[(x_{ k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])^{\prime}K_{ k}(x_{k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])] \\ & & \quad +\sum _{ k=t}^{T-1}E[w_{k}^{\prime}K_{k+1}w_{k}] \\ & =& E[x_{t}^{\prime}K_{t}x_{t}\vert \mathcal{I}_{t}^{c}] + E[(x_{ t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])^{\prime}(Q + A^{\prime}K_{ t+1}A)(x_{t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])] \\ & & \quad +\sum _{ k=t+1}^{T-1}E[(x_{ k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])^{\prime}(Q + A^{\prime}K_{ k+1}A - K_{k})(x_{k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])] \\ & & \quad +\sum _{ k=t}^{T-1}E[w_{k}^{\prime}K_{k+1}w_{k}]. {}\end{array}$$
(10.59)
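The reindexing behind the last two equalities in (10.59) is worth making explicit. Writing \(e_{k} := x_{k} - E[x_{k}\vert \mathcal{I}_{k}^{c}]\) (a shorthand used only in this remark),

$$\displaystyle{\sum _{k=t}^{T-1}E[e_{k+1}^{\prime}K_{k+1}e_{k+1}] =\sum _{k=t+1}^{T}E[e_{k}^{\prime}K_{k}e_{k}] =\sum _{k=t+1}^{T-1}E[e_{k}^{\prime}K_{k}e_{k}],}$$

where the k = T term vanishes since \(K_{T} = 0\); combining this sum with the \(Q + A^{\prime}K_{k+1}A\) sum produces the \(Q + A^{\prime}K_{k+1}A - K_{k}\) weights for \(k \geq t + 1\), as displayed.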

Now, with a fixed horizon T and for t < T − 1,

$$\displaystyle\begin{array}{rcl} & & J_{t}(\mathcal{I}_{t}^{c}) = E[x_{ t}^{\prime}K_{t}x_{t}\vert \mathcal{I}_{t}^{c}] + E[(x_{ t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])^{\prime}(Q + A^{\prime}K_{ t+1}A)(x_{t} - E[x_{t}\vert \mathcal{I}_{t}^{c}])] \\ & & \quad \quad +\sum _{ k=t+1}^{T-1}E[(x_{ k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])^{\prime}(Q + A^{\prime}K_{ k+1}A - K_{k})(x_{k} - E[x_{k}\vert \mathcal{I}_{k}^{c}])] \\ & & \quad \quad +\sum _{ k=t}^{T-1}E[w^{\prime}_{ k}K_{k+1}w_{k}]. {}\end{array}$$
(10.60)

Letting t = 0 completes the proof. □ 

8.10 Proof of Theorem 10.6.4

We will show that Condition D used in the proof of Theorem 10.4.2 applies. We need to modify the proof of Theorem 10.4.2 only in view of the unboundedness of the cost, which appears in two contexts: one concerns the continuity of c(π, Q), and the other the weak continuity of the expected value function in the transition kernel. We address both below. To economize the notation, we take \(P_{t} = I\) in the following.

(a) Continuity of c(π, Q)

Continuity under total variation can be extended to unbounded functions, provided a uniform integrability condition holds, as follows:

$$\displaystyle{\lim _{L\rightarrow \infty }\sup _{Q_{n}}\inf _{\gamma }E_{\pi }^{Q_{n} }[(x - Q_{n}(x))^{\prime}(x - Q_{n}(x))1_{\{(x-\gamma (Q_{n}(x)))^{\prime}(x-\gamma (Q_{n}(x)))\geq L\}}] = 0,}$$

where, by an abuse of notation, the infimization \(\inf _{\gamma }\) is not for the truncated expression

$$\displaystyle{E_{\pi }^{Q_{n} }[(x - Q_{n}(x))^{\prime}(x - Q_{n}(x))1_{\{(x-\gamma (Q_{n}(x)))^{\prime}(x-\gamma (Q_{n}(x)))\geq L\}}],}$$

but for the original cost

$$\displaystyle{E_{\pi }^{Q_{n} }[(x - Q_{n}(x))^{\prime}(x - Q_{n}(x))].}$$

Now, by the parallelogram law

$$\displaystyle{(x - Q(x))^{\prime}(x - Q(x)) \leq 2x^{\prime}x + 2\vert \sup _{x}Q(x){\vert }^{2}.}$$

As a consequence, for any \(Q_{n},\pi _{n}\), we have, for some sequence \(D_{n}\),

$$\displaystyle\begin{array}{rcl} & & \sup _{\pi _{n},Q_{n}}\inf _{\gamma }\int \pi _{n}(dx)(x-\gamma (Q_{n}(x)))^{\prime}(x-\gamma (Q_{n}(x)))1_{\{(x-\gamma (Q_{n}(x)))^{\prime}(x-\gamma (Q_{n}(x)))\geq L\}} \\ & & \quad \quad \quad \leq \sup _{\pi _{n}}\inf _{\gamma }\int \pi _{n}(dx)(2x^{\prime}x + 2D_{n})1_{\{2x^{\prime}x\geq L-2D_{n}\}}. {}\end{array}$$
(10.61)

For every π, Q and every sequence \(\pi _{n},Q_{n}\) converging to π, Q, since the bins converge setwise, so do the minimizing quantizer reconstruction levels, in the sense that

$$\displaystyle{\int _{B_{k}^{n}}\pi _{n}(dx)x \rightarrow \int _{B_{k}}\pi (dx)x,\quad 1 \leq k \leq M.}$$

Hence, for some \(D < \infty \),

$$\displaystyle\begin{array}{rcl} & &\sup _{\pi _{n},Q_{n}}\inf _{\gamma }\int \pi _{n}(dx)(x -\gamma (Q_{n}(x)))^{\prime}(x -\gamma (Q_{n}(x)))1_{\{(x-\gamma (Q_{n}(x)))^{\prime}(x-\gamma (Q_{n}(x)))\geq L\}} \\ & & \quad \quad \quad \leq \sup _{\pi _{n},Q_{n}}\inf _{\gamma }\int \pi _{n}(dx)(2x^{\prime}x + 2D_{n})1_{\{2x^{\prime}x\geq L-2D_{n}\}} \\ & & \quad \quad \quad \leq \sup _{\pi _{n}}\inf _{\gamma }\int \pi _{n}(dx)(2x^{\prime}x + 2D)1_{\{2x^{\prime}x\geq L-2D\}}. {}\end{array}$$
(10.62)

Hence, one needs to prove that \(\{\pi _{n}\}\) itself is uniformly integrable. If this holds then, for every ε, there exists an L such that for all \((\pi _{n},Q_{n}) \rightarrow (\pi,Q)\), it follows that

$$\displaystyle\begin{array}{rcl} & & \bigg\vert E_{\pi _{n}}^{Q_{n}}[(x - Q_{ n}(x))^{\prime}(x - Q_{n}(x))1_{\{(x-\gamma (Q_{n}(x)))^{\prime}(x-\gamma (Q_{n}(x)))\leq L\}}] {}\\ & & \quad \quad \quad \quad \quad \quad - E_{\pi _{n}}^{Q_{n} }[(x - Q_{n}(x))^{\prime}(x - Q_{n}(x))]\bigg\vert \leq \epsilon /2 {}\\ \end{array}$$

and that for sufficiently large n, given L,

$$\displaystyle\begin{array}{rcl} & & \bigg\vert E_{\pi _{n}}^{Q_{n} }[(x - Q_{n}(x))^{\prime}(x - Q_{n}(x))1_{\{(x-\gamma (Q_{n}(x)))^{\prime}(x-\gamma (Q_{n}(x)))\leq L\}}] \\ & & \quad \quad \quad - E_{\pi }^{Q}[(x - Q(x))^{\prime}(x - Q(x))1_{\{ (x-\gamma (Q(x)))^{\prime}(x-\gamma (Q(x)))\leq L\}}]\bigg\vert \leq \epsilon /2.{}\end{array}$$
(10.63)

Hence, for every ε > 0 there exists n 0 such that for all \(n \geq n_{0}\),

$$\displaystyle{\vert E_{\pi _{n}}^{Q_{n}}[(x - Q_{ n}(x))^{\prime}(x - Q_{n}(x))] - E_{\pi }^{Q}[(x - Q(x))^{\prime}(x - Q(x))]\vert \leq \epsilon.}$$

Thus, continuity is established under the uniform integrability condition. The following technical lemma addresses the uniform integrability of \(\pi _{n}\).

Lemma 10.8.7 ([434]). 

Let \(\pi _{t,n} \rightarrow \pi _{t}\) be a uniformly integrable sequence. Then, \(\pi ^{\prime}(m,\pi _{t,n},Q_{t,n})\) [defined in ( 10.45 )] is uniformly integrable for \((\pi _{t,n},Q_{t,n}) \rightarrow (\pi _{t},Q_{t})\).
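To make the role of this condition concrete, the following minimal numerical sketch checks that \(\sup _{n}E_{\pi _{n}}[x^{\prime}x\,1_{\{x^{\prime}x\geq L\}}]\) vanishes as L grows for an illustrative family of distributions; the Gaussian family and sample sizes below are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# A family {pi_n} with uniformly bounded second moments (here: Gaussians).
samples = [rng.normal(scale=s, size=200_000) for s in (0.5, 1.0, 1.5)]

for L in (1.0, 10.0, 100.0):
    # Tail contribution E[x^2 1{x^2 >= L}] for each member of the family.
    tails = [np.mean(x**2 * (x**2 >= L)) for x in samples]
    print(L, max(tails))   # the sup over the family shrinks as L grows
```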

(b) Continuity of the Value Function in the Quantizer

We apply backward induction. Let

$$\displaystyle{J_{T-1}^{T}(\pi _{ T-1}) =\min _{Q}c(\pi _{T-1},Q_{T-1}).}$$

We observed in part (a) above that for this case the optimal cost function is continuous in \(\pi _{T-1},Q_{T-1}\), provided π T − 1 varies along a uniformly integrable sequence, and hence by Lemma 10.8.5, the value function \(J_{T-1}^{T}\) is continuous in π T − 1. Now, we wish to see if

$$\displaystyle\begin{array}{rcl} J_{T-2}^{T}(\pi _{ T-2}) =\min _{Q}\bigg(c(\pi _{T-2},Q_{T-2}) + E[J_{T-1}^{T}(\pi _{ T-1})\vert \pi _{T-2},Q_{T-2}]\bigg)& &{}\end{array}$$
(10.64)

is continuous in \((\pi _{T-2},Q_{T-2})\) (along a uniformly integrable sequence). Lemma 10.8.7 implies that for a uniformly integrable sequence \((\pi _{T-2,n},Q_{T-2,n})\) converging to \((\pi _{T-2},Q_{T-2})\), the term \(J_{T-1}^{T}(\pi _{T-1}(m,\pi _{T-2,n},Q_{T-2,n}))\) also converges for every m, and hence continuity is established. It can be shown, by Lemma 10.8.3, that as \(Q_{n} \rightarrow Q\), \(\|\pi ^{\prime}(m,\pi,Q_{n}) -\pi ^{\prime}(m,\pi,Q)\|_{TV } \rightarrow 0\) for any quantizer \(Q\) with M cells of positive measure [434]. Continuity can be established even if the number of cells is fewer than M by a bounding argument: if a bin probability decreases to zero, so does its contribution to the value function (note that an optimal quantizer cannot have fewer than M cells, since splitting a given cell into two yields an admissible quantizer with an additional cell and strictly decreases the value function). Thus, in the update equation

$$\displaystyle\begin{array}{rcl} & & E[J_{T-1}^{T}(\pi _{ T-1})\vert \pi _{T-2},Q_{T-2}] {}\\ & & =\sum _{ m=1}^{M}P(\pi ^{\prime}(m,\pi _{ T-2},Q_{T-2})\vert \pi _{T-2},Q_{T-2})J_{T-1}^{T}(\pi ^{\prime}(m,\pi _{ T-2},Q_{T-2})), {}\\ \end{array}$$

\(E[J_{T-1}^{T}(\pi _{T-1})\vert \pi _{T-2},Q_{T-2}]\) is continuous in \(\pi _{T-2},Q_{T-2}\). By the continuity of \(c(\pi _{T-2},Q_{T-2})\), we have that \(c(\pi _{T-2},Q_{T-2}) + E[J_{T-1}^{T}(\pi _{T-1})\vert \pi _{T-2},Q_{T-2}]\) is continuous in \(\pi _{T-2},Q_{T-2}\); the value function is continuous by Lemma 10.8.5. Thus, (10.64) admits a solution, and

$$\displaystyle{J_{T-3}^{T}(\pi _{ T-3}) =\min _{Q}\bigg(c(\pi _{T-3},Q_{T-3}) + E[J_{T-2}^{T}(\pi _{ T-2})\vert \pi _{T-3},Q_{T-3}]\bigg),}$$

is continuous in π T − 3.

Continuing the same reasoning for the previous time stages, continuity and the existence of an optimal policy follow. □ 
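Finally, the backward induction (10.64) can be made concrete on a finite toy model. The sketch below enumerates all two-cell quantizers on a three-letter alphabet and performs the backward recursion with a centroid-decoder MSE stage cost; since the source is taken i.i.d. here, the recursion collapses to a per-stage minimization, but the code mirrors the structure of the argument. All sizes and the cost are illustrative assumptions.

```python
import itertools
import numpy as np

alphabet = np.array([-1.0, 0.0, 2.0])             # source alphabet (assumed)
p = np.array([0.3, 0.4, 0.3])                     # source marginal pi (assumed)
quantizers = list(itertools.product([0, 1], repeat=len(alphabet)))

def stage_cost(Q):
    """c(pi, Q): MSE when each bin is decoded to its centroid."""
    cost = 0.0
    for b in (0, 1):
        mask = np.array(Q) == b
        if p[mask].sum() > 0:
            centroid = (p[mask] * alphabet[mask]).sum() / p[mask].sum()
            cost += (p[mask] * (alphabet[mask] - centroid) ** 2).sum()
    return cost

T = 3
J = 0.0                                           # terminal condition J_T = 0
for t in reversed(range(T)):                      # backward recursion over stages
    J = min(stage_cost(Q) for Q in quantizers) + J
print(J)                                          # optimal T-stage cost
```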

9 Concluding Remarks

This chapter presented structural results on optimal causal coding of Markov sources in a large class of settings. For the optimal causal coding of a partially observed Markov source, the structure of the optimal causal coders was obtained and shown to admit a separation structure; in particular, separation of estimation (conditional probability computation) and quantization (of this probability) applies in such a setup. We also observed that optimal real-time decentralized coding of a partially observed i.i.d. source admits separation, whereas such a separation result does not, in general, extend to decentralized coding of partially observed Markov sources. The chapter also established the existence of optimal control and quantization policies under some technical conditions.

The joint optimization of encoding and control policies for the LQG problem has also been studied in the chapter, and it has been shown that separation of estimation and control applies, that an optimal quantizer exists under some technical assumptions on the space of policies considered, and that the optimal control policy is linear in the conditional estimate.

The separation result presented in this chapter will likely find many applications in sensor networks and networked control problems where sensors have imperfect observations of a plant to be controlled. One direction still to explore is obtaining explicit results on the optimal policies using computational tools. One promising approach is expert-based systems, which are very effective once a structure is imposed on the designs; see [187] for details.

Theorem 10.3.4 motivates the problem of optimal quantization of probability measures. This remains an interesting problem to be investigated in a real-time coding context, with important practical consequences in control and economics applications. Toward this direction, Graf and Luschgy, in [167, 168], have studied the optimal quantization of probability measures.

10 Bibliographic Notes

Related papers on real-time coding include the following: [292] established that the optimal causal encoder minimizing the data rate subject to a distortion constraint for an i.i.d. sequence is memoryless. If the source is kth-order Markov, then the optimal causal fixed-rate coder minimizing any measurable distortion uses only the last k source symbols, together with the current state of the receiver's memory [396]. Walrand and Varaiya [385] considered the optimal causal coding problem of finite-state Markov sources over noisy channels with feedback. Teneketzis [361] and Mahajan and Teneketzis [249] considered optimal causal coding of Markov sources over noisy channels without feedback. Mahajan and Teneketzis [248] considered optimal causal coding over a noisy channel with noisy feedback. Linder and Zamir [237] considered the causal coding of more general sources, stationary sources, under a high-rate assumption. An earlier reference on quantizer design is [108]. Relevant discussions on optimal quantization, randomized decisions, and optimal quantizer design can be found in [149, 438].

Borkar et al. [74] have studied a related problem of coding of a partially observed Markov source, also regarding the quantizer functions as actions. Nayyar and Teneketzis [289] considered, within a multiterminal setup, decentralized coding of correlated sources when the encoders observe conditionally independent messages given a finitely valued random variable, and obtained separation results for the optimal encoders. Their paper also considers noisy channels. Some related studies include optimal control with multiple sensors and sequential decentralized hypothesis testing problems [375] and multi-access communications with feedback [8].

Existence of optimal quantizers for a one-stage cost problem has been investigated by Abaya and Wise [1], Pollard [309], and Yüksel and Linder [438]. For dynamic vector quantizers, Borkar, Mitter, and Tatikonda [74] obtained existence results for an infinite horizon setting. Mahajan and Teneketzis [250], Teneketzis [361], and Yüksel [418] considered zero-delay coding of Markov sources under various setups. Tatikonda et al. [358] considered general channels in the context of sequential rate distortion and established the result that uniform quantization is asymptotically optimal in the limit of large rates for quadratic distortion criteria. A similar discussion can be found in [427]. Linder and Zamir [237] considered causal coding of stationary sources in the limit of low distortion. Matveev and Savkin [262] established the existence of optimal coding and quantizer policies for the LQG setup under the assumption that the controller is memoryless.

There is a large literature on jointly optimal quantization for the LQG problem dating back to the early 1960s (see, e.g., [108, 232]). References [42, 73, 139, 147, 262, 283, 358, 423] considered optimal LQG quantization and control, with various results on the optimality or lack of optimality of the separation principle. We also note that [425] provides a discussion of optimal quantization of control-free linear Gaussian systems. The LQG system analysis in this chapter builds primarily on [423].

Weissman and Merhav [389] considered optimal causal variable-rate coding under side information and [433] considered optimal variable-rate causal coding under distortion constraints.

In this chapter, we also presented structural results for optimal decentralized coding of i.i.d. sources, considered in [425]. There are algorithmic and asymptotic results available in the literature when the encoders satisfy the optimal structure obtained in the chapter; important contributions in this direction include [141, 172, 390].

A parallel line of work, of a rate-distortion theoretic nature, is the sequential rate distortion framework proposed in [358] and the feedforward setup investigated in [129, 377].

This chapter is also related to Witsenhausen’s indirect rate distortion problem [397] (see also [119]). Further related papers include [20, 39, 53, 119, 138, 201].

Related papers considering multiterminal information theory problems from a team-theoretic angle include [99, 289].

Theorems 10.3.3–10.3.5 and 10.5.1 follow from [418, 425]. Some of these results generalize the approaches in [385, 396]. Theorem 10.4.2 is due to [437]. Fischer [139], Nair et al. [283], and Tatikonda et al. [358] considered the optimal LQG quantization over general channels and established separation results; Theorem 10.6.2 has essentially appeared in [283, 423]. Lemma 10.6.1 is due to [423].