Skip to main content
Log in

Partially observed optimal stopping problem for discrete-time Markov processes

  • Research paper
  • Published:
4OR Aims and scope Submit manuscript

Abstract

This paper is dedicated to the investigation of a new numerical method to approximate the optimal stopping problem for a discrete-time continuous state space Markov chain under partial observations. It is based on a two-step discretization procedure based on optimal quantization. First, we discretize the state space of the unobserved variable by quantizing an underlying reference measure. Then we jointly discretize the resulting approximate filter and the observation process. We obtain a fully computable approximation of the value function with explicit error bounds for its convergence towards the true value function.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Bally V, Pagès G, Printems J (2005) A quantization tree method for pricing and hedging multidimensional American options. Math Finance 15(1):119–168

    Article  Google Scholar 

  • Bäuerle N, Rieder U (2011) Markov decision processes with applications to finance. Springer, Heidelberg

    Book  Google Scholar 

  • Graf S, Luschgy H (2000) Foundations of quantization for probability distributions. Lecture notes in mathematics, vol 1730. Springer, Berlin

    Book  Google Scholar 

  • Hernández-Lerma O (1989) Adaptive Markov control processes. Applied mathematical sciences, vol 79. Springer, New York

    Book  Google Scholar 

  • Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes. Applications of mathematics (New York), vol 30. Springer, New York

    Book  Google Scholar 

  • Pagès G, Pham H, Printems J (2004a) An optimal Markovian quantization algorithm for multi-dimensional stochastic control problems. Stoch Dyn 4(4):501–545

    Article  Google Scholar 

  • Pagès G, Pham H, Printems J (2004b) Optimal quantization methods and applications to numerical problems in finance. Handbook of computational and numerical methods in finance. Birkhäuser, Boston

    Google Scholar 

  • Pham H, Runggaldier W, Sellami A (2005) Approximation by quantization of the filter process and applications to optimal stopping problems under partial observation. Monte Carlo Methods Appl 11(1):57–81

    Article  Google Scholar 

  • Ye F, Zhou E (2013) Optimal stopping of partially observable markov processes: a filtering-based duality approach. IEEE Trans Autom Control 58(10):2698–2704

    Article  Google Scholar 

  • Zhou E (2013) Optimal stopping under partial observation: near-value iteration. IEEE Trans Autom Control 58(2):500–506

    Article  Google Scholar 

  • Zhou E, Fu M, Marcus S (2010) Solving continuous-state pomdps via density projection. IEEE Trans Autom Control 55(5):1101–1116

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to François Dufour.

Ethics declarations

Conflict of interest

The authors have no potential conflict of interest

Appendix A: Proof of Theorem 1

Appendix A: Proof of Theorem 1

In order to prove Theorem 1, we need to introduce a new auxiliary control model \(\mathbf {M}\) given by the five-tuple \(\big (\mathbb {F}, \mathbb {A},T,H,h\big )\) where

  1. (a)

    the state space is \(\mathbb {F}=\mathbb {X}\times \mathbb {Y}\times \{0,1\}\),

  2. (b)

    the action space is \(\mathbb {A}=\{0,1\}\),

  3. (c)

    the transition probability function is given the stochastic kernel T on \(\mathbb {F}\) given \(\mathbb {F}\times \mathbb {A}\) defined by \(T(B\times C\times D |x,y,z,a)=R(B\times C |x,y) \big [ \delta _{z}(D) I_{\{a=0\}}+ \delta _{1}(D) I_{\{a=1\}} \big ]\) for any \(B\in \mathcal {B}(\mathbb {X}), C\in \mathcal {B}(\mathbb {Y}), D\subset \{0,1\}\) and \((x,y,z,a)\in \mathbb {F}\times \mathbb {A}\),

  4. (d)

    the cost-per-stage H and the terminal cost h.

Define \(\varvec{\Omega }=\mathbb {F}^{N_{0}+1}\) and \(\varvec{\mathcal {F}}\) its associated product \(\sigma \)-algebra. Introduce the coordinate projections \(\mathbf {X}_{t}\) (respectively \(\mathbf {Y}_{t}\), and \(\mathbf {Z}_{t}\)) from \(\varvec{\Omega }\) to the set \(\mathbb {X}\) (respectively \(\mathbb {Y}\), and \(\{0,1\}\)). Consider an arbitrary policy \(\pi \in \varPi ^{o}\). Define recursively the action process \(\{\mathbf {A}_{t}\}_{t\in \llbracket 0 ; N_{0}-1 \rrbracket }\) by \(\mathbf {A}_{t}=\pi _{t}(\mathbf {Y}_{0},\mathbf {Z}_{0},\mathbf {A}_{0},\ldots , \mathbf {Y}_{t-1},\mathbf {Z}_{t-1},\mathbf {A}_{t-1},\mathbf {Y}_{t},\mathbf {Z}_{t})\) for \(t\in \llbracket 1 ; N_{0}-1 \rrbracket \) and \(\mathbf {A}_{0}=\pi _{0}(\mathbf {Y}_{0},\mathbf {Z}_{0})\). Define the filtration \(\{\varvec{\mathcal {F}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }\) by \(\varvec{\mathcal {F}}_{t}=\sigma \{\mathbf {X}_{0},\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t}\}\) for \(t\in \llbracket 0 ; N_{0} \rrbracket \). According to Bäuerle and Rieder (2011), Hernández-Lerma and Lasserre (1996), there exists a probability measure \(\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\) on \(\big ( \varvec{\Omega },\varvec{\mathcal {F}} \big )\) satisfying

  1. (i)

    \(\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\big ((\mathbf {X}_{0},\mathbf {Y}_{0},\mathbf {Z}_{0})\in B\times C \times D\big )= \delta _{(\mathbf {x},\mathbf {y})}(B\times C) \delta _{0}(D)\),

  2. (ii)

    \(\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\big ((\mathbf {X}_{t+1},\mathbf {Y}_{t+1},\mathbf {Z}_{t+1})\in B\times C\times D |\varvec{\mathcal {F}}_{t} \big ) =T(B\times C\times D|\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t})\),

for \(t\in \llbracket 0 ; N_{0}-1 \rrbracket , B\in \mathcal {B}(\mathbb {X}), C\in \mathcal {B}(\mathbb {Y}), D\subset \{0,1\}\).

The expectation under the probability \(\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\) is denoted by \(\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}\). For a policy \(\pi \in \varPi ^{o}\), the performance criterion is given by

$$\begin{aligned} {\varvec{\mathcal {H}}_{\mathbf {M}}}(\mathbf {x},\mathbf {y},\pi )= & {} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \left[ \sum _{t=0}^{N_{0}-1} H(\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t}) \right] +\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}\big [ h(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}}) \big ].\nonumber \\ \end{aligned}$$
(39)

The optimization problem we are interested in is to maximize the reward function \(\varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )\) over \(\varPi ^{o}\) and \(\overline{\varvec{\mathcal {H}}}_{\mathbf {M}}(\mathbf {x},\mathbf {y})=\sup _{\pi \in \varPi ^{o}} \varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )\). We first need to prove the following technical lemma.

Lemma 6

For any \(t\in \llbracket 0 ; N_{0}\rrbracket \),

$$\begin{aligned} \sigma \{\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {Y}_{t},\mathbf {Z}_{t}\} = \sigma \{\mathbf {Y}_{0},\ldots ,\mathbf {Y}_{t}\}. \end{aligned}$$

Proof

Clearly, \(\mathbf {A}_{t}\) is measurable with respect to \(\sigma \{\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {Y}_{t},\mathbf {Z}_{t}\}\) for \(t\in \llbracket 0 ; N_{0}-1 \rrbracket \). Moreover, from the definition of the transition kernel T, we obtain that \(\mathbf {Z}_{t}=I_{\{\mathbf {A}_{t-1}=1\}}+\mathbf {Z}_{t-1}I_{\{\mathbf {A}_{t-1}=0\}}\) for any \(t\in \llbracket 1 ; N_{0} \rrbracket \). Indeed, the kernel T sends z onto itself whenever \(a=0\) and onto 1 whenever \(a=1\). Recalling that \(\mathbf {Z}_{0}=0\), it follows easily \(\sigma \{\mathbf {Y}_{0},\mathbf {Z}_{0},\ldots ,\mathbf {Y}_{t},\mathbf {Z}_{t}\} \subset \sigma \{\mathbf {Y}_{0},\ldots ,\mathbf {Y}_{t}\}\) for \(t\in \llbracket 0 ; N_{0}\rrbracket \) showing the result. \(\square \)

The next result shows that the optimization problem defined through \(\varvec{\mathcal {M}}\) is equivalent to the initial optimal stopping problem defined in Definition 1.

Proposition 2

The following assertions hold.

  1. (i)

    For any control \(\ell \in L\), there exist a policy \(\pi \in \varPi ^{o}\) such that

    $$\begin{aligned} \varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )=\varvec{\mathcal {H}}(\mathbf {x},\mathbf {y},\ell ). \end{aligned}$$
  2. (ii)

    For any policy \(\pi \in \varPi ^{o}\), there exist a control \(\ell \in L\) such that

    $$\begin{aligned} \varvec{\mathcal {H}}(\mathbf {x},\mathbf {y},\ell )=\varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi ). \end{aligned}$$

Proof

Regarding item (i), consider a control \(\ell =\big ( {\varvec{\Xi }},{\varvec{\mathcal {G}}},{\mathbf {Q}},\{{\varvec{\mathcal {G}}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket },\{\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket },\tau \big )\) in L. On the probability space \(\big ( \varvec{\Xi },{\varvec{\mathcal {G}}},{\mathbf {Q}}\big )\), let us define the processes \(\{\varvec{\mathcal {A}}_{t}\}_{t\in \llbracket 0 ; N_{0}-1 \rrbracket }\) and \(\{\varvec{\mathcal {Z}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }\) by \(\varvec{\mathcal {A}}_{t}=I_{\{\tau \le t\}}\) and \(\varvec{\mathcal {Z}}_{t}=\varvec{\mathcal {A}}_{t-1}\) for \(t\in \llbracket 1 ; N_{0} \rrbracket \) and \(\varvec{\mathcal {Z}}_{0}=0\). Introduce the filtrations \(\{{\varvec{\mathcal {T}}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }\) by \({\varvec{\mathcal {T}}}_{t}=\sigma \{\varvec{\mathcal {X}}_{0},\varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0},\varvec{\mathcal {A}}_{0},\ldots ,\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t}\}\) and \(\{{\varvec{\mathcal {G}}}^{\varvec{\mathcal {Y}}}_{t} \}_{t\in \llbracket 0 ; N_{0} \rrbracket }\) by \({\varvec{\mathcal {G}}}^{\varvec{\mathcal {Y}}}_{t}=\sigma \{\varvec{\mathcal {Y}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t}\}\). Since \(\tau \) is an \(\{{\varvec{\mathcal {G}}}^{\varvec{\mathcal {Y}}}_{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }\)-stopping time, we have \(\varvec{\mathcal {T}}_{t}\subset {\varvec{\mathcal {G}}}_{t}^{\varvec{\mathcal {Y}}}\). Moreover, \(\varvec{\mathcal {Z}}_{t+1}\) is \(\varvec{\mathcal {T}}_{t}\)-measurable. Consequently, it is easy to show that

$$\begin{aligned} \mathbf {Q}\big ((\varvec{\mathcal {X}}_{t+1},\varvec{\mathcal {Y}}_{t+1},\varvec{\mathcal {Z}}_{t+1})\in B\times C\times D | \varvec{\mathcal {T}}_{t} \big ) =I_{\{\varvec{\mathcal {Z}}_{t+1}\in D\}} R(B\times C | \varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}). \end{aligned}$$

We have \(\{\varvec{\mathcal {A}}_{t}=1\}=\{\varvec{\mathcal {Z}}_{t+1}=1\}\) and \(\{\varvec{\mathcal {A}}_{t}=0\}=\{\varvec{\mathcal {Z}}_{t+1}=0\}\subset \{\varvec{\mathcal {A}}_{t-1}=0\}=\{\varvec{\mathcal {Z}}_{t}=0\}\), and so

$$\begin{aligned}&\mathbf {Q}\big ((\varvec{\mathcal {X}}_{t+1},\varvec{\mathcal {Y}}_{t+1},\varvec{\mathcal {Z}}_{t+1})\in B\times C\times D | \varvec{\mathcal {T}}_{t} \big ) \nonumber \\&\quad = \big [I_{\{\varvec{\mathcal {A}}_{t}=0\}} \delta _{\{\varvec{\mathcal {Z}}_{t}\in D\}} +I_{\{\varvec{\mathcal {A}}_{t}=1\}} \delta _{1}(D)\big ] R(B\times C | \varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}) \nonumber \\&\quad = T(B\times C \times D | \varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t}). \end{aligned}$$
(40)

Now, there exists an \(\mathbb {A}\)-valued measurable mapping \(\pi _{t}\) defined on \(\mathbb {Y}^{t+1}\) satisfying \(\varvec{\mathcal {A}}_{t}=\pi _{t}(\varvec{\mathcal {Y}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t})\) and so,

$$\begin{aligned} \mathbf {Q}(\varvec{\mathcal {A}}_{t}\in F| \sigma \{\varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0},\varvec{\mathcal {A}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t}\} )=\delta _{\pi _{t}(\varvec{\mathcal {Y}}_{0},\ldots ,\varvec{\mathcal {Y}}_{t})}(F), \end{aligned}$$
(41)

for any \(t\in \llbracket 0 ; N_{0}-1 \rrbracket \) and \(F\subset \mathbb {A}\). Recall that

$$\begin{aligned} \mathbf {Q}\big ((\varvec{\mathcal {X}}_{0},\varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0})\in B\times C \times D\big )= \delta _{(\mathbf {x},\mathbf {y})}(B\times C) \delta _{0}(D) \end{aligned}$$
(42)

for any \(B\in \mathcal {B}(\mathbb {X}), C\in \mathcal {B}(\mathbb {Y}), D\subset \{0,1\}\). Combining Eqs. (40)–(42) and by the uniqueness property in the Theorem of Ionescu-Tulcea (see, e.g. Hernández-Lerma and Lasserre 1996, Proposition C.10), it follows that for the control policy \(\pi =\{\pi _{t}\}_{t\in \llbracket 0 ; N_{0} \rrbracket }\)

$$\begin{aligned}&\mathbf {Q}\big ((\varvec{\mathcal {X}}_{0}, \varvec{\mathcal {Y}}_{0},\varvec{\mathcal {Z}}_{0},\varvec{\mathcal {A}}_{0},\ldots , \varvec{\mathcal {X}}_{N_{0}-1},\varvec{\mathcal {Y}}_{N_{0}-1},\varvec{\mathcal {Z}}_{N_{0}-1},\varvec{\mathcal {A}}_{N_{0}-1}, \varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}},\varvec{\mathcal {Z}}_{N_{0}})\in H\big )\nonumber \\&\quad =\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}\big ((\mathbf {X}_{0},\mathbf {Y}_{0},\mathbf {Z}_{0},\mathbf {A}_{0},\ldots ,\mathbf {X}_{N_{0}-1},\mathbf {Y}_{N_{0}-1},\mathbf {Z}_{N_{0}-1},\mathbf {A}_{N_{0}-1}, \mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}})\in H \big ) \end{aligned}$$
(43)

for any \(H\in \varvec{\mathcal {F}}\).

Observe that for \(k\in \llbracket 0 ; N_{0}-1 \rrbracket \) we have \(\{\tau =k\}=\{\mathbf {Z}_{k}=0\}\cup \{\mathbf {A}_{k}=1\}\) and \(\{\tau =N_{0}\}=\{\mathbf {Z}_{N_{0}}=0\}\). Consequently,

$$\begin{aligned} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{\tau },\varvec{\mathcal {Y}}_{\tau })\big ]&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}) I_{\{\tau =t\}}\big ] +\mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}}) I_{\{\tau =N_{0}\}}\big ] \nonumber \\&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t}) I_{\{(\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t})=(0,1)\}}\big ]\\&\quad +\mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}}) I_{\{\varvec{\mathcal {Z}}_{N_{0}}=1\}}\big ]. \end{aligned}$$

Now, by using the definitions of H and h we get

$$\begin{aligned} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{\tau },\varvec{\mathcal {Y}}_{\tau })\big ]&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ H(\varvec{\mathcal {X}}_{t},\varvec{\mathcal {Y}}_{t},\varvec{\mathcal {Z}}_{t},\varvec{\mathcal {A}}_{t}) \big ]\\&\quad +\mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ h(\varvec{\mathcal {X}}_{N_{0}},\varvec{\mathcal {Y}}_{N_{0}},\varvec{\mathcal {Z}}_{N_{0}}) \big ]. \end{aligned}$$

By using Eq. (43), it follows that

$$\begin{aligned} \mathbf {E}^{\mathbf {Q}}_{(\mathbf {x},\mathbf {y})}\big [ \mathbf {H}(\varvec{\mathcal {X}}_{\tau },\varvec{\mathcal {Y}}_{\tau })\big ]&= \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ H(\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t}) \big ] +\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}\big [ h(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}}) \big ], \end{aligned}$$

showing the first claim.

Regarding item (ii), let \(\pi \) be a policy in \(\varPi ^{o}\). Then, on the probability space \(\big (\varvec{\Omega },\varvec{\mathcal {F}},\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})} \big ), \{ \mathbf {X}_{t},\mathbf {Y}_{t} \}_{t\in \llbracket 0 ; N_{0}\rrbracket }\) is an \(\{\varvec{\mathcal {F}}_{t}\}_{t\in \llbracket 0 ; N_{0}\rrbracket }\)-adapted Markov chain with transition kernel R and with initial distribution \(\delta _{(\mathbf {x},\mathbf {y})}\). Introduce the \( \llbracket 0 ; N_{0}\rrbracket \)-valued random variable \(\tau \) defined by

$$\begin{aligned} \tau ={\left\{ \begin{array}{ll} \inf \{k\in \llbracket 0 ; N_{0}-1\rrbracket : \mathbf {A}_{k}=1 \} &{} \text {if } \{k\in \llbracket 0 ; N_{0}-1\rrbracket : \mathbf {X}_{k}=1 \}\ne \emptyset , \\ N_{0} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

It follows from Lemma 6 that \(\tau \) is a stopping time with respect to \(\big \{\sigma \{\mathbf {Y}_{0},\ldots ,\mathbf {Y}_{t}\}_{t\in \llbracket 0 ;N_{0}\rrbracket }\big \}\) showing that the control \(\lambda \) defined by \(\Big (\varvec{\Omega },\varvec{\mathcal {F}},\mathbf {P}^{\pi }_{(\mathbf {x},\mathbf {y})}, \{\varvec{\mathcal {F}}_{t}\}_{t\in \llbracket 0 ; N_{0}\rrbracket },\{ \mathbf {X}_{t},\mathbf {Y}_{t} \}_{t\in \llbracket 0 ; N_{0}\rrbracket },\tau \Big )\) belongs to \(\Lambda \). Recalling that \(\mathbf {Z}_{0}=0\) and that \(\mathbf {Z}_{t}=I_{\{\mathbf {A}_{t-1}=1\}}+\mathbf {Z}_{t-1}I_{\{\mathbf {A}_{t-1}=0\}}\) for any \(t\in \llbracket 1 ; N_{0} \rrbracket \), we get that \(\{\tau =t\}=\{\mathbf {Z}_{t}=0\}\cup \{\mathbf {A}_{t}=1\}\) for \(t\in \llbracket 0 ; N_{0}-1 \rrbracket \) and \(\{\tau =N_{0}\}=\{\mathbf {Z}_{N_{0}}=0\}\). Now, by using the definitions of H and h it follows that

$$\begin{aligned}&\sum _{t=0}^{N_{0}-1} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ H(\mathbf {X}_{t},\mathbf {Y}_{t},\mathbf {Z}_{t},\mathbf {A}_{t}) \big ] +\mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})}\big [ h(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}},\mathbf {Z}_{N_{0}}) \big ] \nonumber \\&\quad = \sum _{t=0}^{N_{0}-1} \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ \mathbf {H}(\mathbf {X}_{t},\mathbf {Y}_{t}) I_{\{(\mathbf {Z}_{t},\mathbf {A}_{t})=(0,1)\}}\big ] + \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ \mathbf {H}(\mathbf {X}_{N_{0}},\mathbf {Y}_{N_{0}}) I_{\{\mathbf {Z}_{N_{0}}=1\}}\big ] \nonumber \\&\quad = \mathbf {E}^{\pi }_{(\mathbf {x},\mathbf {y})} \big [ \mathbf {H}(\mathbf {X}_{\tau },\mathbf {Y}_{\tau }) \big ], \end{aligned}$$

implying that \(\varvec{\mathcal {H}}(\mathbf {x},\mathbf {y},\ell )=\varvec{\mathcal {H}}_{\mathbf {M}}(\mathbf {x},\mathbf {y},\pi )\) and showing the second claim. \(\square \)

Proof of Theorem 1

From Theorem 5.3.2 in Bäuerle and Rieder (2011) we get that \(\overline{\varvec{\mathcal {H}}}_{\mathbf {M}}(\mathbf {x},\mathbf {y})=\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})\) and so from Proposition 2, it follows that \(\overline{\varvec{\mathcal {H}}}(\mathbf {x},\mathbf {y})=\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})\) giving the first equality in Eq. (8). Under Assumptions (A1), B and D, the hypotheses of Theorems 5.3.3 in Bäuerle and Rieder (2011) are satisfied. Therefore, it follows that the Bellman equation \(\{v_{k}\}_{ k \in \llbracket 0 ; N_{0}\rrbracket }\) for the model \(\mathcal {M}\) is given by

$$\begin{aligned} {\left\{ \begin{array}{ll} v_{0}(\theta ,y,z) = h(\theta ,y,z) \\ v_{k}(\theta ,y,z) = \max _{a\in \mathbb {A}} \big \{ H(\theta ,y,z,a)+Qv_{k-1}(\theta ,y,z,a) \big \} \end{array}\right. } \end{aligned}$$

and satisfies \(\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})=v_{N_{0}}(\delta _{\mathbf {x}},\mathbf {y},0)\). However, since \(h(\theta ,y,1)=H(\theta ,y,1)=0\), it easy to show that \(v_{k}(\theta ,y,1)=0\) for any \((\theta ,y)\in \mathcal {P}(\mathbb {X})\times \mathbb {Y}\) and \(k\in \llbracket 0 ; N_{0}\rrbracket \). Moreover, by using the definitions of hH and the kernel Q we obtain that \(v_{0}(\theta ,y,0) = \mathbf {H}(\theta ,y)\) and

$$\begin{aligned} v_{k}(\theta ,y,0)&= \max _{a\in \mathbb {A}} \big \{ H(\theta ,y,0,a)+Qv_{k-1}(\theta ,y,0,a) \big \} \nonumber \\&= \max \big \{ \mathbf {H}(\theta ,y),Sv_{k-1}(\theta ,y) \big \} = \mathfrak {B}v_{k-1}(\theta ,y) \end{aligned}$$

for any \((\theta ,y)\in \mathcal {P}(\mathbb {X})\times \mathbb {Y}\) and \(k\in \llbracket 1 ; N_{0}\rrbracket \) implying that \(\overline{\varvec{\mathcal {H}}}_{\mathcal {M}}(\mathbf {x},\mathbf {y})=\mathfrak {B}^{N_{0}}\mathbf {H}(\delta _{\mathbf {x}},\mathbf {y})\) and giving the second equality in Eq. (8). \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

de Saporta, B., Dufour, F. & Nivot, C. Partially observed optimal stopping problem for discrete-time Markov processes. 4OR-Q J Oper Res 15, 277–302 (2017). https://doi.org/10.1007/s10288-016-0337-8

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10288-016-0337-8

Keywords

Mathematics Subject Classification

Navigation