Partially observed optimal stopping problem for discrete-time Markov processes

de Saporta, Benoîte; Dufour, François; Nivot, Christophe

doi:10.1007/s10288-016-0337-8

Partially observed optimal stopping problem for discrete-time Markov processes

Research paper
Published: 21 December 2016

Volume 15, pages 277–302, (2017)
Cite this article

4OR Aims and scope Submit manuscript

Benoîte de Saporta²,
François Dufour¹ &
Christophe Nivot¹

278 Accesses
3 Citations
Explore all metrics

Abstract

This paper is dedicated to the investigation of a new numerical method to approximate the optimal stopping problem for a discrete-time continuous state space Markov chain under partial observations. It is based on a two-step discretization procedure based on optimal quantization. First, we discretize the state space of the unobserved variable by quantizing an underlying reference measure. Then we jointly discretize the resulting approximate filter and the observation process. We obtain a fully computable approximation of the value function with explicit error bounds for its convergence towards the true value function.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Approximation of Infinite Horizon Discounted Cost Markov Decision Processes

Finite horizon continuous-time Markov decision processes with mean and variance criteria

Article 29 September 2018

Risk-sensitive continuous-time Markov decision processes with unbounded rates and Borel spaces

Article 19 October 2019

References

Bally V, Pagès G, Printems J (2005) A quantization tree method for pricing and hedging multidimensional American options. Math Finance 15(1):119–168
Article Google Scholar
Bäuerle N, Rieder U (2011) Markov decision processes with applications to finance. Springer, Heidelberg
Book Google Scholar
Graf S, Luschgy H (2000) Foundations of quantization for probability distributions. Lecture notes in mathematics, vol 1730. Springer, Berlin
Book Google Scholar
Hernández-Lerma O (1989) Adaptive Markov control processes. Applied mathematical sciences, vol 79. Springer, New York
Book Google Scholar
Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes. Applications of mathematics (New York), vol 30. Springer, New York
Book Google Scholar
Pagès G, Pham H, Printems J (2004a) An optimal Markovian quantization algorithm for multi-dimensional stochastic control problems. Stoch Dyn 4(4):501–545
Article Google Scholar
Pagès G, Pham H, Printems J (2004b) Optimal quantization methods and applications to numerical problems in finance. Handbook of computational and numerical methods in finance. Birkhäuser, Boston
Google Scholar
Pham H, Runggaldier W, Sellami A (2005) Approximation by quantization of the filter process and applications to optimal stopping problems under partial observation. Monte Carlo Methods Appl 11(1):57–81
Article Google Scholar
Ye F, Zhou E (2013) Optimal stopping of partially observable markov processes: a filtering-based duality approach. IEEE Trans Autom Control 58(10):2698–2704
Article Google Scholar
Zhou E (2013) Optimal stopping under partial observation: near-value iteration. IEEE Trans Autom Control 58(2):500–506
Article Google Scholar
Zhou E, Fu M, Marcus S (2010) Solving continuous-state pomdps via density projection. IEEE Trans Autom Control 55(5):1101–1116
Article Google Scholar

Download references

Author information

Authors and Affiliations

INRIA, 200 Avenue de la Vieille Tour, 33405, Talence Cedex, France
François Dufour & Christophe Nivot
IMAG, Case Courrier 051, Place Eugène Bataillon, 34095, Montpellier Cedex 5, France
Benoîte de Saporta

Authors

Benoîte de Saporta
View author publications
You can also search for this author in PubMed Google Scholar
François Dufour
View author publications
You can also search for this author in PubMed Google Scholar
Christophe Nivot
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to François Dufour.

Ethics declarations

Conflict of interest

The authors have no potential conflict of interest

Appendix A: Proof of Theorem 1

In order to prove Theorem 1, we need to introduce a new auxiliary control model $\mathbf {M}$ given by the five-tuple $\big (\mathbb {F}, \mathbb {A},T,H,h\big )$ where

(a)
the state space is $\mathbb {F}=\mathbb {X}\times \mathbb {Y}\times \{0,1\}$,
(b)
the action space is $\mathbb {A}=\{0,1\}$,
(c)
the transition probability function is given the stochastic kernel T on $\mathbb {F}$ given $\mathbb {F}\times \mathbb {A}$ defined by $T(B\times C\times D |x,y,z,a)=R(B\times C |x,y) \big [ \delta _{z}(D) I_{\{a=0\}}+ \delta _{1}(D) I_{\{a=1\}} \big ]$ for any $B\in \mathcal {B}(\mathbb {X}), C\in \mathcal {B}(\mathbb {Y}), D\subset \{0,1\}$ and $(x,y,z,a)\in \mathbb {F}\times \mathbb {A}$,
(d)
the cost-per-stage H and the terminal cost h.

Define $\varvec{\Omega }=\mathbb {F}^{N_{0}+1}$ and $\varvec{\mathcal {F}}$ its associated product $\sigma $-algebra. Introduce the coordinate projections $\mathbf {X}_{t}$ (respectively $\mathbf {Y}_{t}$, and $\mathbf {Z}_{t}$) from $\varvec{\Omega }$ to the set $\mathbb {X}$ (respectively $\mathbb {Y}$, and $\{0,1\}$). Consider an arbitrary policy $\pi \in \varPi ^{o}$. Define recursively the action process $\{\mathbf {A}_{t}\}_{t\in \llbracket 0 ; N_{0}-1 \rrbracket }$ by $\mathbf {A}_{t}=\pi _{t}(\mathbf {Y}_{0},\mathbf {Z}_{0},\mathbf {A}_{0},\ldots , \mathbf {Y}_{t-1},\mathbf {Z}_{t-1},\mathbf {A}_{t-1},\mathbf {Y}_{t},\mathbf {Z}_{t})$ for $t\in \llbracket 1 ; N_{0}-1 \rrbracket $ and $\mathbf {A}_{0}=\pi _{0}(\mathbf {Y}_{0},\mathbf {Z}_{0})$. Define the filtration $\{\varvec{\mathcal {F}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }$ by $\varvec{\mathcal {F}}_{t}=\sigma \{\mathbf {X}_{0},\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t}\}$ for $t\in \llbracket 0 ; N_{0} \rrbracket $. According to Bäuerle and Rieder (2011), Hernández-Lerma and Lasserre (1996), there exists a probability measure $\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}$ on $\big ( \varvec{\Omega },\varvec{\mathcal {F}} \big )$ satisfying

(i)
$\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\big ((\mathbf {X}_{0},\mathbf {Y}_{0},\mathbf {Z}_{0})\in B\times C \times D\big )= \delta _{(\mathbf {x},\mathbf {y})}(B\times C) \delta _{0}(D)$,
(ii)
$\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\big ((\mathbf {X}_{t+1},\mathbf {Y}_{t+1},\mathbf {Z}_{t+1})\in B\times C\times D |\varvec{\mathcal {F}}_{t} \big ) =T(B\times C\times D|\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t})$,

for $t\in \llbracket 0 ; N_{0}-1 \rrbracket , B\in \mathcal {B}(\mathbb {X}), C\in \mathcal {B}(\mathbb {Y}), D\subset \{0,1\}$.

The expectation under the probability $\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}$ is denoted by $\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}$. For a policy $\pi \in \varPi ^{o}$, the performance criterion is given by

$$\begin{aligned} {\varvec{\mathcal {H}}_{\mathbf {M}}}(\mathbf {x},\mathbf {y},\pi )= & {} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \left[ \sum _{t=0}^{N_{0}-1} H(\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t}) \right] +\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}\big [ h(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}}) \big ].\nonumber \\ \end{aligned}$$

(39)

The optimization problem we are interested in is to maximize the reward function $\varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )$ over $\varPi ^{o}$ and $\overline{\varvec{\mathcal {H}}}_{\mathbf {M}}(\mathbf {x},\mathbf {y})=\sup _{\pi \in \varPi ^{o}} \varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )$. We first need to prove the following technical lemma.

Lemma 6

For any $t\in \llbracket 0 ; N_{0}\rrbracket $,

$$\begin{aligned} \sigma \{\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {Y}_{t},\mathbf {Z}_{t}\} = \sigma \{\mathbf {Y}_{0},\ldots ,\mathbf {Y}_{t}\}. \end{aligned}$$

Proof

Clearly, $\mathbf {A}_{t}$ is measurable with respect to $\sigma \{\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {Y}_{t},\mathbf {Z}_{t}\}$ for $t\in \llbracket 0 ; N_{0}-1 \rrbracket $. Moreover, from the definition of the transition kernel T, we obtain that $\mathbf {Z}_{t}=I_{\{\mathbf {A}_{t-1}=1\}}+\mathbf {Z}_{t-1}I_{\{\mathbf {A}_{t-1}=0\}}$ for any $t\in \llbracket 1 ; N_{0} \rrbracket $. Indeed, the kernel T sends z onto itself whenever $a=0$ and onto 1 whenever $a=1$. Recalling that $\mathbf {Z}_{0}=0$, it follows easily $\sigma \{\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {Y}_{t},\mathbf {Z}_{t}\} \subset \sigma \{\mathbf {Y}_{0},\ldots ,\mathbf {Y}_{t}\}$ for $t\in \llbracket 0 ; N_{0}\rrbracket $ showing the result. $\square $

The next result shows that the optimization problem defined through $\varvec{\mathcal {M}}$ is equivalent to the initial optimal stopping problem defined in Definition 1.

Proposition 2

The following assertions hold.

(i)
For any control $\ell \in L$, there exist a policy $\pi \in \varPi ^{o}$ such that
$$\begin{aligned} \varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )=\varvec{\mathcal {H}}(\mathbf {x},\mathbf {y},\ell ). \end{aligned}$$
(ii)
For any policy $\pi \in \varPi ^{o}$, there exist a control $\ell \in L$ such that
$$\begin{aligned} \varvec{\mathcal {H}}(\mathbf {x},\mathbf {y},\ell )=\varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi ). \end{aligned}$$

Proof

Regarding item (i), consider a control $\ell =\big ( {\varvec{\Xi }},{\varvec{\mathcal {G}}},{\mathbf {Q}},\{{\varvec{\mathcal {G}}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket },\{\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket },\tau \big )$ in L. On the probability space $\big ( \varvec{\Xi },{\varvec{\mathcal {G}}},{\mathbf {Q}}\big )$, let us define the processes $\{\varvec{\mathcal {A}}_{t}\}_{t\in \llbracket 0 ; N_{0}-1 \rrbracket }$ and $\{\varvec{\mathcal {Z}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }$ by $\varvec{\mathcal {A}}_{t}=I_{\{\tau \le t\}}$ and $\varvec{\mathcal {Z}}_{t}=\varvec{\mathcal {A}}_{t-1}$ for $t\in \llbracket 1 ; N_{0} \rrbracket $ and $\varvec{\mathcal {Z}}_{0}=0$. Introduce the filtrations $\{{\varvec{\mathcal {T}}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }$ by ${\varvec{\mathcal {T}}}_{t}=\sigma \{\varvec{\mathcal {X}}_{0},\varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0},\varvec{\mathcal {A}}_{0},\ldots ,\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t}\}$ and $\{{\varvec{\mathcal {G}}}^{\varvec{\mathcal {Y}}}_{t} \}_{t\in \llbracket 0 ; N_{0} \rrbracket }$ by ${\varvec{\mathcal {G}}}^{\varvec{\mathcal {Y}}}_{t}=\sigma \{\varvec{\mathcal {Y}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t}\}$. Since $\tau $ is an $\{{\varvec{\mathcal {G}}}^{\varvec{\mathcal {Y}}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }$-stopping time, we have $\varvec{\mathcal {T}}_{t}\subset {\varvec{\mathcal {G}}}_{t}^{\varvec{\mathcal {Y}}}$. Moreover, $\varvec{\mathcal {Z}}_{t+1}$ is $\varvec{\mathcal {T}}_{t}$-measurable. Consequently, it is easy to show that

$$\begin{aligned} \mathbf {Q}\big ((\varvec{\mathcal {X}}_{t+1},\varvec{\mathcal {Y}}_{t+1},\varvec{\mathcal {Z}}_{t+1})\in B\times C\times D | \varvec{\mathcal {T}}_{t} \big ) =I_{\{\varvec{\mathcal {Z}}_{t+1}\in D\}} R(B\times C | \varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}). \end{aligned}$$

We have $\{\varvec{\mathcal {A}}_{t}=1\}=\{\varvec{\mathcal {Z}}_{t+1}=1\}$ and $\{\varvec{\mathcal {A}}_{t}=0\}=\{\varvec{\mathcal {Z}}_{t+1}=0\}\subset \{\varvec{\mathcal {A}}_{t-1}=0\}=\{\varvec{\mathcal {Z}}_{t}=0\}$, and so

$$\begin{aligned}&\mathbf {Q}\big ((\varvec{\mathcal {X}}_{t+1},\varvec{\mathcal {Y}}_{t+1},\varvec{\mathcal {Z}}_{t+1})\in B\times C\times D | \varvec{\mathcal {T}}_{t} \big ) \nonumber \\&\quad = \big [I_{\{\varvec{\mathcal {A}}_{t}=0\}} \delta _{\{\varvec{\mathcal {Z}}_{t}\in D\}} +I_{\{\varvec{\mathcal {A}}_{t}=1\}} \delta _{1}(D)\big ] R(B\times C | \varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}) \nonumber \\&\quad = T(B\times C \times D | \varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t}). \end{aligned}$$

(40)

Now, there exists an $\mathbb {A}$-valued measurable mapping $\pi _{t}$ defined on $\mathbb {Y}^{t+1}$ satisfying $\varvec{\mathcal {A}}_{t}=\pi _{t}(\varvec{\mathcal {Y}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t})$ and so,

$$\begin{aligned} \mathbf {Q}(\varvec{\mathcal {A}}_{t}\in F| \sigma \{\varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0},\varvec{\mathcal {A}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t}\} )=\delta _{\pi _{t}(\varvec{\mathcal {Y}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t})}(F), \end{aligned}$$

(41)

for any $t\in \llbracket 0 ; N_{0}-1 \rrbracket $ and $F\subset \mathbb {A}$. Recall that

$$\begin{aligned} \mathbf {Q}\big ((\varvec{\mathcal {X}}_{0},\varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0})\in B\times C \times D\big )= \delta _{(\mathbf {x},\mathbf {y})}(B\times C) \delta _{0}(D) \end{aligned}$$

(42)

for any $B\in \mathcal {B}(\mathbb {X}), C\in \mathcal {B}(\mathbb {Y}), D\subset \{0,1\}$. Combining Eqs. (40)–(42) and by the uniqueness property in the Theorem of Ionescu-Tulcea (see, e.g. Hernández-Lerma and Lasserre 1996, Proposition C.10), it follows that for the control policy $\pi =\{\pi _{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }$

$$\begin{aligned}&\mathbf {Q}\big ((\varvec{\mathcal {X}}_{0}, \varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0},\varvec{\mathcal {A}}_{0},\ldots , \varvec{\mathcal {X}}_{N_{0}-1},\varvec{\mathcal {Y}}_{N_{0}-1},\varvec{\mathcal {Z}}_{N_{0}-1},\varvec{\mathcal {A}}_{N_{0}-1}, \varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}},\varvec{\mathcal {Z}}_{N_{0}})\in H\big )\nonumber \\&\quad =\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\big ((\mathbf {X}_{0},\mathbf {Y}_{0},\mathbf {Z}_{0},\mathbf {A}_{0},\ldots ,\mathbf {X}_{N_{0}-1},\mathbf {Y}_{N_{0}-1},\mathbf {Z}_{N_{0}-1},\mathbf {A}_{N_{0}-1}, \mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}})\in H \big ) \end{aligned}$$

(43)

for any $H\in \varvec{\mathcal {F}}$.

Observe that for $k\in \llbracket 0 ; N_{0}-1 \rrbracket $ we have $\{\tau =k\}=\{\mathbf {Z}_{k}=0\}\cup \{\mathbf {A}_{k}=1\}$ and $\{\tau =N_{0}\}=\{\mathbf {Z}_{N_{0}}=0\}$. Consequently,

$$\begin{aligned} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{\tau },\varvec{\mathcal {Y}}_{\tau })\big ]&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}) I_{\{\tau =t\}}\big ] +\mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}}) I_{\{\tau =N_{0}\}}\big ] \nonumber \\&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}) I_{\{(\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t})=(0,1)\}}\big ]\\&\quad +\mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}}) I_{\{\varvec{\mathcal {Z}}_{N_{0}}=1\}}\big ]. \end{aligned}$$

Now, by using the definitions of H and h we get

$$\begin{aligned} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{\tau },\varvec{\mathcal {Y}}_{\tau })\big ]&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ H(\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t}) \big ]\\&\quad +\mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ h(\varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}},\varvec{\mathcal {Z}}_{N_{0}}) \big ]. \end{aligned}$$

By using Eq. (43), it follows that

$$\begin{aligned} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{\tau },\varvec{\mathcal {Y}}_{\tau })\big ]&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ H(\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t}) \big ] +\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}\big [ h(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}}) \big ], \end{aligned}$$

showing the first claim.

Regarding item (ii), let $\pi $ be a policy in $\varPi ^{o}$. Then, on the probability space $\big (\varvec{\Omega },\varvec{\mathcal {F}},\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})} \big ), \{ \mathbf {X}_{t},\mathbf {Y}_{t} \}_{t\in \llbracket 0 ; N_{0}\rrbracket }$ is an $\{\varvec{\mathcal {F}}_{t}\}_{t\in \llbracket 0 ; N_{0}\rrbracket }$-adapted Markov chain with transition kernel R and with initial distribution $\delta _{(\mathbf {x},\mathbf {y})}$. Introduce the $ \llbracket 0 ; N_{0}\rrbracket $-valued random variable $\tau $ defined by

$$\begin{aligned} \tau ={\left\{ \begin{array}{ll} \inf \{k\in \llbracket 0 ; N_{0}-1\rrbracket : \mathbf {A}_{k}=1 \} &{} \text {if } \{k\in \llbracket 0 ; N_{0}-1\rrbracket : \mathbf {X}_{k}=1 \}\ne \emptyset , \\ N_{0} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

It follows from Lemma 6 that $\tau $ is a stopping time with respect to $\big \{\sigma \{\mathbf {Y}_{0},\ldots ,\mathbf {Y}_{t}\}_{t\in \llbracket 0 ;N_{0}\rrbracket }\big \}$ showing that the control $\lambda $ defined by $\Big (\varvec{\Omega },\varvec{\mathcal {F}},\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}, \{\varvec{\mathcal {F}}_{t}\}_{t\in \llbracket 0 ; N_{0}\rrbracket },\{ \mathbf {X}_{t},\mathbf {Y}_{t} \}_{t\in \llbracket 0 ; N_{0}\rrbracket },\tau \Big )$ belongs to $\Lambda $. Recalling that $\mathbf {Z}_{0}=0$ and that $\mathbf {Z}_{t}=I_{\{\mathbf {A}_{t-1}=1\}}+\mathbf {Z}_{t-1}I_{\{\mathbf {A}_{t-1}=0\}}$ for any $t\in \llbracket 1 ; N_{0} \rrbracket $, we get that $\{\tau =t\}=\{\mathbf {Z}_{t}=0\}\cup \{\mathbf {A}_{t}=1\}$ for $t\in \llbracket 0 ; N_{0}-1 \rrbracket $ and $\{\tau =N_{0}\}=\{\mathbf {Z}_{N_{0}}=0\}$. Now, by using the definitions of H and h it follows that

$$\begin{aligned}&\sum _{t=0}^{N_{0}-1} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ H(\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t}) \big ] +\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}\big [ h(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}}) \big ] \nonumber \\&\quad = \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ \mathbf {H}(\mathbf {X}_{t},\mathbf {Y}_{t}) I_{\{(\mathbf {Z}_{t},\mathbf {A}_{t})=(0,1)\}}\big ] + \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ \mathbf {H}(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}}) I_{\{\mathbf {Z}_{N_{0}}=1\}}\big ] \nonumber \\&\quad = \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ \mathbf {H}(\mathbf {X}_{\tau },\mathbf {Y}_{\tau }) \big ], \end{aligned}$$

implying that $\varvec{\mathcal {H}}(\mathbf {x},\mathbf {y},\ell )=\varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )$ and showing the second claim. $\square $

Proof of Theorem 1

From Theorem 5.3.2 in Bäuerle and Rieder (2011) we get that $\overline{\varvec{\mathcal {H}}}_{\mathbf {M}}(\mathbf {x},\mathbf {y})=\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})$ and so from Proposition 2, it follows that $\overline{\varvec{\mathcal {H}}}(\mathbf {x},\mathbf {y})=\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})$ giving the first equality in Eq. (8). Under Assumptions (A1), B and D, the hypotheses of Theorems 5.3.3 in Bäuerle and Rieder (2011) are satisfied. Therefore, it follows that the Bellman equation $\{v_{k}\}_{ k \in \llbracket 0 ; N_{0}\rrbracket }$ for the model $\mathcal {M}$ is given by

$$\begin{aligned} {\left\{ \begin{array}{ll} v_{0}(\theta ,y,z) = h(\theta ,y,z) \\ v_{k}(\theta ,y,z) = \max _{a\in \mathbb {A}} \big \{ H(\theta ,y,z,a)+Qv_{k-1}(\theta ,y,z,a) \big \} \end{array}\right. } \end{aligned}$$

and satisfies $\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})=v_{N_{0}}(\delta _{\mathbf {x}},\mathbf {y},0)$. However, since $h(\theta ,y,1)=H(\theta ,y,1)=0$, it easy to show that $v_{k}(\theta ,y,1)=0$ for any $(\theta ,y)\in \mathcal {P}(\mathbb {X})\times \mathbb {Y}$ and $k\in \llbracket 0 ; N_{0}\rrbracket $. Moreover, by using the definitions of h, H and the kernel Q we obtain that $v_{0}(\theta ,y,0) = \mathbf {H}(\theta ,y)$ and

$$\begin{aligned} v_{k}(\theta ,y,0)&= \max _{a\in \mathbb {A}} \big \{ H(\theta ,y,0,a)+Qv_{k-1}(\theta ,y,0,a) \big \} \nonumber \\&= \max \big \{ \mathbf {H}(\theta ,y),Sv_{k-1}(\theta ,y) \big \} = \mathfrak {B}v_{k-1}(\theta ,y) \end{aligned}$$

for any $(\theta ,y)\in \mathcal {P}(\mathbb {X})\times \mathbb {Y}$ and $k\in \llbracket 1 ; N_{0}\rrbracket $ implying that $\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})=\mathfrak {B}^{N_{0}}\mathbf {H}(\delta _{\mathbf {x}},\mathbf {y})$ and giving the second equality in Eq. (8). $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

de Saporta, B., Dufour, F. & Nivot, C. Partially observed optimal stopping problem for discrete-time Markov processes. 4OR-Q J Oper Res 15, 277–302 (2017). https://doi.org/10.1007/s10288-016-0337-8

Download citation

Received: 19 July 2016
Revised: 13 November 2016
Published: 21 December 2016
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10288-016-0337-8

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Partially observed optimal stopping problem for discrete-time Markov processes

Abstract

Access this article

Similar content being viewed by others

Approximation of Infinite Horizon Discounted Cost Markov Decision Processes

Finite horizon continuous-time Markov decision processes with mean and variance criteria

Risk-sensitive continuous-time Markov decision processes with unbounded rates and Borel spaces

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Appendix A: Proof of Theorem 1

Lemma 6

Proof

Proposition 2

Proof

Proof of Theorem 1

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Partially observed optimal stopping problem for discrete-time Markov processes

Abstract

Access this article

Similar content being viewed by others

Approximation of Infinite Horizon Discounted Cost Markov Decision Processes

Finite horizon continuous-time Markov decision processes with mean and variance criteria

Risk-sensitive continuous-time Markov decision processes with unbounded rates and Borel spaces

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Appendix A: Proof of Theorem 1

Appendix A: Proof of Theorem 1

Lemma 6

Proof

Proposition 2

Proof

Proof of Theorem 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation