Asymptotically optimal Bayesian sequential change detection and identification rules

Dayanik, Savas; Powell, Warren B.; Yamazaki, Kazutoshi

doi:10.1007/s10479-012-1121-6


Open access
Published: 12 April 2012

Volume 208, pages 337–370, (2013)

Annals of Operations Research


A Correction to this article was published on 07 October 2021

This article has been updated

Abstract

We study the joint problem of sequential change detection and multiple hypothesis testing. Suppose that the common distribution of a sequence of i.i.d. random variables changes suddenly at some unobservable time to one of finitely many distinct alternatives, and one needs to both detect and identify the change at the earliest possible time. We propose computationally efficient sequential decision rules that are asymptotically either Bayes-optimal or optimal in a Bayesian fixed-error-probability formulation, as the unit detection delay cost or the misdiagnosis and false alarm probabilities go to zero, respectively. Numerical examples are provided to verify the asymptotic optimality and the speed of convergence.


1 Introduction

Sequential change detection and identification refers to the joint problem of sequential change point detection (CPD) and sequential multiple hypothesis testing (SMHT), where one needs to detect, based on a sequence of observations, a sudden and unobservable change as early as possible and identify its cause as accurately as possible. In a Bayesian setup, this problem boils down to optimally solving the trade-off between the expected detection delay and the false alarm and misdiagnosis costs.

Sequential analysis methods such as Wald’s (1947) sequential probability ratio test and Page’s (1954) cumulative sum (CUSUM) procedure were developed for quality control problems, in which a production process may suddenly go out of control at some unknown and unobservable time and one needs to detect the failure time as soon as possible. However, it is often more realistic to assume that a production process consists of multiple processing units, each of which is prone to failure, so that one needs to detect the earliest failure time and accurately identify the failed component.

In economics and biosurveillance, elevated concerns about financial crises and bioterrorism have increased the importance of early warning systems (see Bussiere and Fratzscher 2006 and Heffernan et al. 2004); structural changes need to be detected in time series such as the S&P 500 index for better financial risk management and over-the-counter medication sales for early signs of a possible disease outbreak. There are a number of potential causes of structural changes, and one needs to identify the cause of the change in order to take the most appropriate countermeasures. Although most existing structural change detection methods employ retrospective tests on historical data, online tests are more appropriate in these settings because time-inhomogeneous data arrive sequentially, and the changes must be identified as soon as possible after they occur.

In this paper, we focus on two online Bayesian formulations and propose two computationally efficient and asymptotically optimal strategies inspired by the separate asymptotic analyses of SMHT (Baum and Veeravalli 1994; Dragalin et al. 1999; Dragalin et al. 2000) and CPD (Tartakovsky and Veeravalli 2004).

We suppose that a system starts in regime 0 and suddenly switches at some unknown and unobservable disorder time θ to one of finitely many regimes $\mu\in\mathcal{M}:= \{1,\ldots,M \}$. One observes a sequence of random variables $X=(X_n)_{n\geq1}$ which are, conditionally on θ and μ, independent and distributed according to some cumulative distribution function F₀ before time θ and F_μ at and after time θ; namely,

$$\underbrace{X_1, \ldots, X_{\theta-1}}_{F_0\mbox{\scriptsize -}\mathrm {distributed}},\underbrace{X_\theta, X_{\theta+1} \ldots}_{F_\mu\mbox{\scriptsize -}\mathrm{distributed}}.$$

The objective is to detect the change as quickly as possible and, at the same time, to identify the new regime μ as accurately as possible. More precisely, we want to find a strategy (τ,d), consisting of a detection time τ and a diagnosis rule d, that balances the expected detection delay time against the false alarm and misdiagnosis probabilities. This paper studies the following formulations:

(i)
In the minimum Bayes risk formulation, one minimizes a Bayes risk, which is a weighted sum of the expected detection delay time and the false alarm and misdiagnosis probabilities.
(ii)
In the Bayesian fixed-error-probability formulation, one minimizes the expected detection delay time subject to some small upper bounds on the false alarm and misdiagnosis probabilities.

The precise formulations are given as Problems 1 and 2, respectively, in Sect. 2. A majority of practitioners prefer the Bayesian fixed-error-probability formulation because hard constraints on error probabilities are easier to set up and understand than the costs of detection delay, false alarm, and misdiagnosis in the minimum Bayes risk formulation. The Bayesian fixed-error-probability formulation is often solved by means of its Lagrange relaxation, which turns out to be a minimum Bayes risk problem whose costs are the Lagrange multipliers (or shadow prices) of the false alarm and misdiagnosis constraints. We discuss the correspondence between the optimal solutions of these two formulations in more detail in Sect. 2. Another reason for solving the minimum Bayes risk formulation is that it allows expert opinions about the risks to be included naturally in the solution. Therefore, we study both formulations in this paper.

Finding the optimal solutions under both formulations requires intensive computation. For example, the minimum Bayes risk formulation reduces to an optimal stopping problem, as shown by Dayanik et al. (2008) (see also Lovejoy (1991), White (1991), Borkar (1991), and Runggaldier (1991) for general solution methods for partially observed Markov decision processes, and Burnetas and Katehakis (1997) for adaptive control of Markov decision processes), and the optimal strategy is to stop as soon as the posterior probability process $\Pi=(\Pi_{n}^{(0)},\ldots,\Pi_{n}^{(M)})_{n \geq0}$, where

$$\Pi_n^{(i)} :=\mathbb{P}\{\mbox {The system is in regime } i\mbox{ at time } n \mid X_1,\ldots,X_n \} \quad \mbox {for every } i\in\mathcal{M}_0 \mbox{ and } n \geq0,$$

with $\mathcal{M}_{0} := \mathcal{M}\cup\{0\}$, enters some suitable region of the M-dimensional probability simplex.

Figure 1(a) illustrates the optimal stopping regions for a typical problem with M=2. The process Π starts in the lower-left corner, which corresponds to the “no change” state or regime 0. As observations are made, it progresses through the light-colored region, where raising a change-alarm is suboptimal. If it enters the shaded region in the top corner, then declaring a regime switch from 0 to 1 is optimal. If it enters the shaded region in the lower-right corner, then declaring a regime switch from 0 to 2 is optimal. The first hitting time to one of those shaded regions and the corresponding estimate of the new regime minimize the costs for the minimum Bayes risk formulation.

These shaded regions can in principle be found by dynamic programming methods; see, for example, Derman (1970), Puterman (1994) and Bertsekas (2005). However, those methods are generally computationally intensive because of the curse of dimensionality: the number of points needed to discretize the M-dimensional probability simplex grows exponentially in the number of regimes, and finding an optimal strategy with classical dynamic programming methods tends to be practically impossible in higher dimensions.

Our goal is to obtain a practical solution that is both near-optimal and computationally feasible. We propose two simple and asymptotically optimal strategies by approximating the optimal stopping regions with simpler shapes. In particular, our strategy for the minimum Bayes risk formulation raises a change alarm and estimates the new regime when the posterior probability of at least one of the change types exceeds some predetermined threshold for the first time. In Fig. 1(b), the stopping regions of this strategy correspond to the union of the triangles in the two corners. Those triangular regions determine a stopping and selection strategy, and hence the problem is simplified to designing the triangular regions to minimize the risks.

We give an asymptotic analysis of the change detection and identification problem, which includes SMHT and CPD as special cases. The asymptotic optimality of our strategies can be proved using nonlinear renewal theory after casting the log-likelihood-ratio (LLR) processes

$$\Lambda_n(i,j) := \log\frac{\Pi_n^{(i)}}{\Pi_n^{(j)}}, \quad n\geq1, i \in \mathcal{M}, j \in\mathcal{M}_0\setminus\{i\}, \tag{1}$$

as the sum of suitable random walks and some slowly changing stochastic processes. We show that the r-quick convergence of Lai (1977) for an appropriate subset of the LLR processes in (1) is a sufficient condition for asymptotic optimality. We also pursue higher-order asymptotic approximations for the minimum Bayes risk formulation, inspired by the work of Baum and Veeravalli (1994) on SMHT.

The remainder of the paper is organized as follows. We formulate the Bayesian sequential change detection and identification problem in Sect. 2. In Sect. 3, we propose two sequential change detection and identification strategies and obtain sufficient conditions for their asymptotic optimality in terms of the LLR processes. In Sect. 4 we study certain convergence properties of the LLR processes that are required to implement the asymptotically optimal strategies. In Sect. 5, we obtain higher-order asymptotic approximations for the minimum Bayes risk formulation using nonlinear renewal theory. Section 6 concludes with numerical examples. The proofs and some auxiliary results are presented in the appendix.

2 Problem formulations

Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ hosting a stochastic process $X=(X_n)_{n\geq1}$ taking values in some measurable space $(E,\mathcal{E})$. Let $\theta:\Omega\to\{0,1,\ldots\}$ and $\mu: \Omega\to\mathcal{M}:= \{1,\dots, M\}$ be independent random variables defined on the same probability space with the probability distributions

$$\mathbb{P}\{ \theta= t \} = \begin{cases} p_0, & t=0,\\ (1-p_0)(1-p)^{t-1}p, & t\geq1, \end{cases} \qquad\mbox{and}\qquad \nu_i = \mathbb{P}\{\mu= i \} > 0, \quad i \in\mathcal{M}$$

for some known constants p₀∈[0,1), p∈(0,1), and positive constants $\nu= (\nu_{i})_{i \in\mathcal{M}}$. The random variable θ has an exponential tail with

$$\varrho:= -\lim_{t \uparrow\infty} \frac{\log\mathbb{P}\{ \theta \geq t+1 \}}{t} =\bigl| \log(1-p) \bigr|. \tag{2}$$
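As a quick numerical check of (2), the sketch below obtains $\mathbb{P}\{\theta\geq t+1\}$ by summing the prior probabilities of θ and evaluates $-\log\mathbb{P}\{\theta\geq t+1\}/t$ for growing t. The values p₀ = 0.2 and p = 0.05 and the helper names `theta_pmf` and `tail_rate` are illustrative choices, not part of the model.

```python
import math

def theta_pmf(t, p0, p):
    """Prior probability P{theta = t} of the disorder time."""
    return p0 if t == 0 else (1 - p0) * (1 - p) ** (t - 1) * p

def tail_rate(t, p0, p):
    """-log P{theta >= t+1} / t, with the tail obtained by summing the pmf."""
    tail = 1.0 - sum(theta_pmf(s, p0, p) for s in range(t + 1))
    return -math.log(tail) / t

p0, p = 0.2, 0.05
for t in (10, 50, 200):
    print(t, tail_rate(t, p0, p))
print("limit |log(1 - p)| =", abs(math.log(1 - p)))
```

Because $\mathbb{P}\{\theta\geq t+1\}=(1-p_0)(1-p)^t$, the printed rates exceed $|\log(1-p)|$ by $-\log(1-p_0)/t$, so p₀ affects only the transient and not ϱ.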

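Given μ=i and θ=t, the random variables X₁,X₂,… are conditionally independent, and $(X_n)_{1\leq n\leq t-1}$ and $(X_n)_{n\geq t}$ have common conditional probability density functions f₀ and f_i, respectively, with respect to some σ-finite measure m on $(E,\mathcal{E})$; namely,

$$\mathbb{P}\{X_1\in E_1,\ldots,X_n\in E_n \mid \mu=i,\ \theta=t\} = \prod_{k=1}^{(t-1)\wedge n}\int_{E_k} f_0(x)\,m(\mathrm{d}x)\prod_{k=t}^{n}\int_{E_k} f_i(x)\,m(\mathrm{d}x)$$

for every $i \in\mathcal{M}$, t≥1, n≥1, and $(E_{1}\times\cdots \times E_{n}) \in\mathcal{E}^{n}$, where an empty product equals one. The following assumption removes certain trivial cases; see Remark 4.10 below.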

Assumption 2.1

For every $i\in\mathcal{M}_{0}$ and $j \in\mathcal{M}_{0} \setminus\{i\}$, $0<f_i(X_1)/f_j(X_1)<\infty$ a.s., and F_i and F_j are distinguishable, that is, $\int_{\{x\in E: f_{i}(x)\neq f_{j}(x)\}} f_{i}(x)\,m(\mathrm{d}x) > 0$.

Let $\mathbb{F}= (\mathcal{F}_{n})_{n \geq0}$ denote the filtration generated by X; namely, $\mathcal{F}_{0} = \{\varnothing, \Omega\}$ and $\mathcal {F}_{n} =\sigma(X_{1},\dots,X_{n})$ for every n≥1. A sequential change detection and identification rule (τ,d) is a pair consisting of an $\mathbb{F}$-stopping time τ (in short, $\tau\in \mathbb{F}$) and a random variable $d: \Omega\mapsto\mathcal{M}$ that is measurable with respect to the observation history $\mathcal{F}_{\tau}$ up to the stopping time τ (namely, $d \in\mathcal{F}_{\tau}$). Let

$$\Delta:= \bigl\{ (\tau,d) : \tau\in\mathbb{F}\mbox{ and } d \in \mathcal{F}_\tau \mbox{ is an } \mathcal{M}\mbox{-valued random variable} \bigr\}$$

be the collection of all sequential change detection and identification rules. The objective is to find a strategy (τ,d) that optimally resolves the trade-off between the mth moment

$$D^{(m)}(\tau) := \mathbb{E}\bigl[(\tau- \theta)_+^m \bigr], \tag{3}$$

of the detection delay time (τ−θ)₊ for some m≥1 and the false alarm and misdiagnosis probabilities

(4)

(5)

Here and for the rest of the paper, x₊:=max(x,0) and x₋:=max(−x,0) for any x∈ℝ.

We formulate the optimal trade-offs among (3)–(5) as the following two related problems:

Problem 1

(Minimum Bayes risk formulation)

For fixed m≥1, c>0, and strictly positive constants $a=(a_{ji})_{i \in\mathcal{M}, j \in\mathcal{M}_{0} \setminus\{i\}}$, calculate the minimum Bayes risk $\inf_{(\tau,d)\in\Delta}R^{(c,a,m)}(\tau,d)$, where

$$R^{(c,a,m)}(\tau,d) := c\, D^{(m)}(\tau) + \sum _{i \in\mathcal{M}} \sum_{j\in \mathcal{M}_0 \setminus\{i\}}a_{ji} R_{ji}(\tau,d) \tag{6}$$

is the expected sum of all risks arising from the detection delay time, false alarm and misdiagnosis, and find a strategy (τ^∗,d^∗)∈Δ which attains the minimum Bayes risk, if such a strategy exists.

Problem 2

(Bayesian fixed-error-probability formulation)

For fixed m≥1 and positive constants $\overline{R} = (\overline{R}_{ji})_{i \in\mathcal{M}, j \in \mathcal{M}_{0}\setminus\{i\}}$, calculate the smallest mth moment $\inf_{(\tau,d) \in\Delta(\overline{R})} D^{(m)}(\tau)$ of the detection delay time among all decision rules in

$$\Delta(\overline{R}) := \bigl\{ (\tau,d) \in\Delta: R_{ji}(\tau,d)\leq\overline{R}_{ji}, i \in\mathcal{M}, \, j \in\mathcal{M}_0\setminus \{i\} \bigr\}$$

namely, among the rules whose false alarm and misdiagnosis probabilities do not exceed the predetermined upper bounds, and find a strategy $(\tau^{*},d^{*})\in \Delta(\overline{R})$ that attains the minimum, if such a strategy exists.

Problem 1 can in principle be solved optimally by stochastic dynamic programming. A standard way to solve Problem 2 optimally is through its Lagrange relaxation, which turns out to be an instance of Problem 1, where $a_{ji}$ serves as the Lagrange multiplier of the constraint $R_{ji}(\tau,d)\leq\overline{R}_{ji}$ for every $i \in\mathcal{M}$ and $j \in \mathcal{M}_{0}\setminus\{i\}$. Indeed, if for some a a decision rule $(\tau^*,d^*)\in\Delta$ attains the minimum Bayes risk $\inf_{(\tau,d)\in\Delta}R^{(c,a,m)}(\tau,d)$ and $R_{ji}(\tau^{*},d^{*}) = \overline{R}_{ji}$ for every $i \in\mathcal{M}, \, j \in\mathcal{M}_{0} \setminus\{i\}$, then $(\tau^*,d^*)\in\Delta(\overline{R})$ and, for every $(\tau,d) \in\Delta(\overline{R}) \subseteq\Delta$,

$$c\, D^{(m)}\bigl(\tau^*\bigr) + \sum_{i \in\mathcal{M}}\sum_{j \in\mathcal{M}_0 \setminus \{i\}} a_{ji} R_{ji}\bigl(\tau^*,d^*\bigr) \leq c\, D^{(m)}(\tau) + \sum _{i\in\mathcal{M}} \sum_{j \in\mathcal{M}_0 \setminus\{i\}}a_{ji} R_{ji}(\tau,d)$$

implies that $c (D^{(m)}(\tau^{*})-D^{(m)}(\tau)) \leq\sum_{i \in \mathcal{M}}\sum_{j \in\mathcal{M}_{0} \setminus\{i\}} a_{ji}(R_{ji}(\tau,d)-R_{ji}(\tau^{*},d^{*})) =\sum_{i \in\mathcal{M}} \sum _{j \in \mathcal{M}_{0} \setminus\{i\}} a_{ji} (R_{ji}(\tau,d)-\overline{R}_{ji})\leq0$, and hence, the same (τ^∗,d^∗) rule is also optimal for the Bayesian fixed-error-probability formulation. The asymptotically optimal decision rules proposed for Problems 1 and 2 will likewise be related.

On the one hand, a majority of practitioners favor the formulation in Problem 2 over that in Problem 1, because the hard constraints $R_{ji}(\tau,d) \leq\overline{R}_{ji}, i \in\mathcal{M}, \, j \in \mathcal{M}_{0}\setminus\{i\}$ in Problem 2 are easier to set up and to understand than the (shadow) costs c and a of detection delay, false alarm, and misdiagnosis. On the other hand, some practitioners still find Problem 1 useful for incorporating expert opinions.

As introduced in Sect. 1, let $\Pi=(\Pi_{n}^{(0)},\ldots,\Pi_{n}^{(M)})_{n \geq0}$ be the posterior probability process defined by

$$\Pi_n^{(0)} := \mathbb{P}\{ \theta> n \vert \mathcal{F}_n \} \quad\mbox{and} \quad\Pi_n^{(i)} := \mathbb{P}\{ \theta\leq n, \mu=i \vert \mathcal{F}_n \}, \quad i \in \mathcal{M}, n\geq0.$$

Dayanik et al. (2008) proved that Π is a Markov process satisfying

$$\Pi_n^{(i)} = \frac{\alpha_n^{(i)}(X_1,\ldots,X_n)}{\sum_{j \in \mathcal{M}_0} \alpha^{(j)}_n(X_1,\ldots,X_n)}, \quad i\in \mathcal{M}_0,$$

where

$$\alpha_{n}^{(i)}(x_{1},\ldots,x_{n}) = \begin{cases} (1-p_0)(1-p)^n\prod_{l=1}^n f_0(x_l), & i = 0,\\ p_0 \nu_i \prod_{k=1}^nf_i(x_k) + (1-p_0)p \nu_i\sum_{k=1}^n (1-p)^{k-1} \prod _{l=1}^{k-1} f_0(x_l)\prod_{m=k}^n f_i(x_m), & i \in\mathcal{M}, \end{cases}$$

for every n≥1 and $(x_1,\ldots,x_n)\in E^n$, and

$$\alpha^{(i)}_n(x_1,\ldots,x_n)\,m(\mathrm{d}x_1)\cdots m(\mathrm{d}x_n) = \begin{cases} \mathbb{P}\{\theta>n, X_1\in \mathrm{d}x_1,\ldots,X_n \in \mathrm{d}x_n\}, & i=0,\\ \mathbb{P}\{\theta\leq n, \mu= i, X_1\in \mathrm{d}x_1,\ldots,X_n \in \mathrm{d}x_n\}, & i\in\mathcal{M}. \end{cases}$$
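The closed form above can be updated one observation at a time: one can check by induction that $\alpha_n^{(0)}=\alpha_{n-1}^{(0)}(1-p)f_0(x_n)$ and $\alpha_n^{(i)}=\bigl(\alpha_{n-1}^{(i)}+p\,\nu_i\,\alpha_{n-1}^{(0)}\bigr)f_i(x_n)$ for $i\in\mathcal{M}$, with $\alpha_0^{(0)}=1-p_0$ and $\alpha_0^{(i)}=p_0\nu_i$. The Python sketch below normalizes at every step, so it returns $\Pi_n$ directly, and compares the result with the closed form on a short sequence. The Gaussian densities $f_0=N(0,1)$, $f_1=N(1,1)$, $f_2=N(-1,1)$, the parameter values, and all function names are hypothetical choices made only for illustration.

```python
import math
import random

def normal_pdf(mean):
    """Hypothetical f_i: the N(mean, 1) density."""
    return lambda x: math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def posterior_path(xs, f, p0, p, nu):
    """Return [Pi_0, Pi_1, ..., Pi_n]; f[0] = f_0, f[i] = f_i, nu[i - 1] = nu_i."""
    M = len(f) - 1
    pi = [1.0 - p0] + [p0 * nu[i] for i in range(M)]      # Pi_0 is the prior
    path = [pi]
    for x in xs:
        # One-step alpha recursion applied to Pi_{n-1}; the normalizing constant cancels.
        new = [pi[0] * (1 - p) * f[0](x)]
        new += [(pi[i] + p * nu[i - 1] * pi[0]) * f[i](x) for i in range(1, M + 1)]
        total = sum(new)
        pi = [v / total for v in new]
        path.append(pi)
    return path

def alpha_closed_form(xs, f, p0, p, nu):
    """alpha_n^{(0)}, ..., alpha_n^{(M)} evaluated directly from the display (n = len(xs))."""
    n = len(xs)
    def prod(i, lo, hi):                                  # f_i(x_{lo+1}) ... f_i(x_hi)
        return math.prod(f[i](xs[l]) for l in range(lo, hi))
    alphas = [(1 - p0) * (1 - p) ** n * prod(0, 0, n)]
    for i in range(1, len(f)):
        a = p0 * nu[i - 1] * prod(i, 0, n)
        a += sum((1 - p0) * p * nu[i - 1] * (1 - p) ** (k - 1)
                 * prod(0, 0, k - 1) * prod(i, k - 1, n) for k in range(1, n + 1))
        alphas.append(a)
    return alphas

random.seed(1)
f = [normal_pdf(0.0), normal_pdf(1.0), normal_pdf(-1.0)]
xs = [random.gauss(0.0, 1.0) for _ in range(8)]
path = posterior_path(xs, f, p0=0.1, p=0.05, nu=[0.5, 0.5])
alphas = alpha_closed_form(xs, f, p0=0.1, p=0.05, nu=[0.5, 0.5])
print(path[-1])
print([a / sum(alphas) for a in alphas])   # should agree with path[-1] up to rounding
```

The recursive form costs O(M) per observation, whereas the closed form grows linearly in n; this is one reason why Π is convenient to track online.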

Remark 2.2

Assumption 2.1 implies that $0< \Pi_{n}^{(i)} <1$ a.s. for every finite n≥1 and $i\in\mathcal{M}$.

Let us denote by $\alpha_{n}^{(i)}$ the random variable $\alpha_{n}^{(i)}(X_{1},\ldots,X_{n})$ for every n≥0. Then the LLR processes defined in (1) can be written as

$$\Lambda_n(i,j) = \log\frac{\alpha_n^{(i)}}{\alpha^{(j)}_n}, \quad i \in\mathcal{M}, j \in \mathcal{M}_0 \setminus\{i\}, n \geq1. \tag{7}$$
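Over long horizons the α's underflow or overflow in floating point, so in practice it is safer to propagate their logarithms and read off (7) as a difference. The sketch below applies, in log space, the one-step recursion that follows by induction from the closed form of $\alpha_n^{(i)}$; the unit-variance Gaussian log-densities, the parameter values, and the names `log_alpha_path` and `llr` are hypothetical.

```python
import math
import random

def logaddexp(a, b):
    """log(exp(a) + exp(b)) without overflow; returns -inf if both are -inf."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def safe_log(v):
    return math.log(v) if v > 0 else -math.inf

def log_normal(mean):
    """Hypothetical log f_i: log-density of N(mean, 1)."""
    return lambda x: -0.5 * (x - mean) ** 2 - 0.5 * math.log(2 * math.pi)

def log_alpha_path(xs, log_f, p0, p, nu):
    """Yield [log alpha_n^{(0)}, ..., log alpha_n^{(M)}] for n = 1, 2, ..., len(xs)."""
    M = len(log_f) - 1
    la = [safe_log(1 - p0)] + [safe_log(p0 * nu[i]) for i in range(M)]   # n = 0
    for x in xs:
        la = [la[0] + math.log(1 - p) + log_f[0](x)] + [
            log_f[i](x) + logaddexp(la[i], math.log(p * nu[i - 1]) + la[0])
            for i in range(1, M + 1)]
        yield la

def llr(log_alpha, i, j):
    """Lambda_n(i, j) = log alpha_n^{(i)} - log alpha_n^{(j)}, as in (7)."""
    return log_alpha[i] - log_alpha[j]

random.seed(2)
N = 5000
log_f = [log_normal(0.0), log_normal(1.0), log_normal(-1.0)]
xs = [random.gauss(1.0, 1.0) for _ in range(N)]       # data drawn from f_1 throughout
*_, last = log_alpha_path(xs, log_f, p0=0.1, p=0.05, nu=[0.5, 0.5])
print(llr(last, 1, 0) / N, llr(last, 1, 2) / N)        # finite averages, no underflow
```

With N = 5000 the α's themselves are of order $e^{-10^4}$, far below the smallest positive double, while their logarithms stay well within range.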

In our analyses, it is often very convenient to work under the conditional probability measures:

(8)

(9)

defined for every $i \in\mathcal{M}$, n≥1, and $(E_{1} \times\cdots \times E_{n}) \in \mathcal{E}^{n}$. Let $\mathbb{E}_{i}$ and $\mathbb{E}_{i}^{(t)}$ be the expectations with respect to ℙ_i and $\mathbb{P}^{(t)}_{i}$, respectively. Under $\mathbb{P}_{i}^{(0)}$ and $\mathbb{P}_{i}^{(\infty)}$, the random variables X₁,X₂,… are independent with common probability density functions f_i(⋅) and f₀(⋅), respectively. Because the law of X under $\mathbb{P}_{i}^{(\infty)}$ does not depend on i, we denote by ℙ^(∞) any of the measures $\mathbb{P}_{i}^{(\infty)}$, $i \in\mathcal{M}$. The LLR processes in (1) or (7) play a role in changing probability measures, as the next lemma shows.

Lemma 2.3

(Change of measure)

For every $i \in\mathcal{M}$, every $\mathbb{F}$-stopping time τ, and every $\mathcal{F}_{\tau}$-measurable event F,

The next proposition expresses the key risk components; its proof follows directly from Lemma 2.3 after setting $F:=\{d=i\} \in\mathcal{F}_{\tau}$ for every $i\in\mathcal{M}$.

Proposition 2.4

For every strategy (τ,d)∈Δ, c>0, m≥1, and strictly positive constants $a=(a_{ji})_{i\in\mathcal{M},j\in\mathcal{M}_0\setminus\{i\}}$, we can rewrite (4)–(6) as

where, for every $i \in\mathcal{M}$,

(10)

(11)

(12)

(13)

Here (10)–(12) correspond to the conditional risks given μ=i, written in terms of the process $G_{i}^{(a)} (n)$, which is a linear combination of exponentials of the LLR processes and serves as the Radon-Nikodym derivative.

Remark 2.5

In the remainder, we prove a number of results in the ℙ_i-a.s. sense for given $i \in\mathcal{M}$. These also hold automatically $\mathbb{P}_{i}^{(t)}$-a.s. for every t≥1. Indeed, because ℙ{θ<∞}=1, ℙ{θ=t}>0 for every t≥1 and $\mathbb{P}_{i} (F) = \sum_{t=0}^{\infty}\mathbb {P}\{ \theta = t \} \mathbb{P}_{i}^{(t)} (F)$ for every $F \in\mathcal{F}$, ℙ_i(F)=1 implies $\mathbb{P}^{(t)}_{i}(F)=1$ for every t≥1.

3 Asymptotically optimal sequential detection and identification strategies

We will introduce two strategies that are computationally efficient and asymptotically optimal. The first strategy raises an alarm as soon as the posterior probability of the event that at least one of the change types occurred exceeds some suitable threshold, and is shown to be asymptotically optimal for Problem 1. The second strategy is its variant expressed in terms of the LLR processes and is shown to be asymptotically optimal for Problem 2. The asymptotic performance analyses of both rules rely on the same convergence results for the LLR processes. The proofs for Problems 1 and 2 therefore run in parallel, because in both cases the detection times can be approximated by the first hitting times of certain processes that share the same asymptotic properties.

Definition 3.1

((τ_A,d_A)-strategy for the minimum Bayes risk problem)

For every set $A = (A_{i})_{i \in\mathcal{M}}$ of strictly positive constants, let (τ_A,d_A) be the strategy defined by

(14)

Define the logarithm of the odds-ratio processes as

$$\Phi_n^{(i)} := \log\frac{\Pi_n^{(i)}}{1-\Pi_n^{(i)}} = - \log\biggl[\sum_{j \in\mathcal{M}_0 \setminus \{i\}} \exp\bigl(-\Lambda_n(i,j)\bigr) \biggr] , \quad i \in\mathcal{M}, n \geq1. \tag{15}$$
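Evaluating (15) literally exponentiates $-\Lambda_n(i,j)$, which underflows once the LLRs reach a few hundred. A minimal sketch, assuming the LLR values (or the log α's) are already available: factoring out the smallest LLR gives a log-sum-exp form that is exact in exact arithmetic and numerically stable. The function names are hypothetical.

```python
import math

def log_odds_from_llr(lams):
    """Phi_n^{(i)} = -log sum_j exp(-Lambda_n(i, j)) from the values Lambda_n(i, j), j != i."""
    m = min(lams)                      # factor out exp(-m); remaining exponents are <= 0
    return m - math.log(sum(math.exp(m - lam) for lam in lams))

def log_odds_from_log_alpha(log_alpha, i):
    """Phi_n^{(i)} from [log alpha_n^{(0)}, ..., log alpha_n^{(M)}] using (7) and (15)."""
    lams = [log_alpha[i] - la for j, la in enumerate(log_alpha) if j != i]
    return log_odds_from_llr(lams)

# Posterior (0.2, 0.5, 0.3): Phi^{(1)} = log(0.5 / 0.5) = 0 and Phi^{(2)} = log(0.3 / 0.7).
log_alpha = [math.log(v) for v in (0.2, 0.5, 0.3)]
print(log_odds_from_log_alpha(log_alpha, 1), log_odds_from_log_alpha(log_alpha, 2))

# Large LLRs: exp(-800) and exp(-900) both underflow to 0.0, but the stable form does not.
print(log_odds_from_llr([800.0, 900.0]))
```

The same computation also makes the inequality $\Phi_n^{(i)}\leq\Lambda_n(i,j)$ visible: the subtracted logarithm is of a sum that is at least one.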

Then (14) can be rewritten as

$$\tau_A^{(i)} = \inf\biggl\{ n \geq1 : \frac{1-\Pi_n^{(i)}}{\Pi_n^{(i)}}< A_i \biggr\} = \inf\bigl\{n \geq1: \Phi_n^{(i)}> - \log A_i \bigr\}, \quad i \in\mathcal{M}. \tag{16}$$
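Criterion (16) is equivalent to $\Pi_n^{(i)}>1/(1+A_i)$, so each $\tau_A^{(i)}$ is the first crossing of a fixed level by $\Phi^{(i)}$. The sketch below turns a path of $\Phi_n$ vectors into a stopping time and a decision. Because the display (14) is not reproduced here, it assumes, in line with the verbal description of the first strategy, that $\tau_A=\min_i\tau_A^{(i)}$ and that $d_A$ is the regime whose threshold is crossed, with ties broken by the smallest index (our choice). The function name and the toy path are hypothetical.

```python
import math

def tau_A(phi_path, A):
    """Return (n, i) for the first n >= 1 with Phi_n^{(i)} > -log A_i, or None.

    phi_path[n - 1][i - 1] holds Phi_n^{(i)} and A[i - 1] holds A_i.
    Assumes tau_A = min_i tau_A^{(i)}; ties go to the smallest index i."""
    thresholds = [-math.log(a) for a in A]
    for n, phis in enumerate(phi_path, start=1):
        for i, (phi, h) in enumerate(zip(phis, thresholds), start=1):
            if phi > h:
                return n, i
    return None

toy_phis = [[-3.0, -3.0], [0.5, -2.0], [3.0, 1.0], [4.0, 5.0], [6.0, 2.0]]
print(tau_A(toy_phis, [0.01, 0.01]))   # -log 0.01 is about 4.61: regime 2 crosses first, at n = 4
print(tau_A(toy_phis, [0.1, 0.01]))    # -log 0.1 is about 2.30: regime 1 crosses at n = 3
```

Smaller entries of A raise the thresholds and shrink the stopping regions, which is why the asymptotic analysis below is carried out as ∥A∥↓0.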

The values of A determine the sizes of the polyhedra that approximate the original optimal stopping regions (e.g., the triangular regions when M=2, as in Fig. 1(b)) and need to be chosen so as to minimize the Bayes risk.

Definition 3.2

((υ_B,d_B)-strategy for the Bayesian fixed-error-probability formulation)

For every set $B = (B_{i})_{i \in\mathcal{M}}$ and $B_{i} = (B_{ij})_{j \in\mathcal{M}_{0} \setminus\{i\}}$, $i \in \mathcal{M}$ of strictly positive constants, let (υ_B,d_B) be the strategy defined by

(17)

We show that, after suitable choices of A and B, the strategy (τ_A,d_A) is asymptotically optimal for Problem 1 as c goes to zero, and the strategy (υ_B,d_B) is asymptotically optimal for Problem 2 as

$$\|\overline{R}\| := \max_{i \in\mathcal{M}, j \in\mathcal{M}_0\setminus\{i\}}\overline{R}_{ji}$$

goes to zero, while the ratios $\overline{R}_{ji}/\overline{R}_{ki}$, $j,k\in\mathcal{M}_{0}\setminus\{i\}$, remain bounded away from zero in the sense that

$$\frac{\min_{j \in\mathcal{M}_0 \setminus\{i\}} \overline{R}_{ji}}{\max_{j \in\mathcal{M}_0 \setminus\{i\}} \overline{R}_{ji}} > k_i\quad \mbox{for every }i \in\mathcal{M} \tag{18}$$

for some strictly positive constants $k = (k_{i})_{i \in\mathcal{M}}$. For brevity, this limit mode will still be denoted by “$\|\overline{R}\|\downarrow0$”.

More precisely, we find functions A(c) of the unit sampling cost c in Problem 1 and $B(\overline{R})$ of the upper bounds $(\overline{R}_{ji})_{i\in\mathcal{M},j\in\mathcal {M}_{0}\setminus\{i\}}$ on the false alarm and misdiagnosis probabilities in Problem 2 so that (τ_A(c),d_A(c))∈Δ for every c>0, $(\upsilon_{B(\overline{R})},d_{B(\overline{R})}) \in \Delta(\overline{R})$ for every $\overline{R}>0$, and

(19)

(20)

for every fixed m≥1 and every set $a=(a_{ji})_{i\in\mathcal{M}, j\in \mathcal{M}_{0}\setminus\{i\}}$ of strictly positive constants. Here “$x_\gamma\sim y_\gamma$ as $\gamma\to\gamma_0$” means $\lim_{\gamma\rightarrow \gamma_{0}} {x_{\gamma}} / {y_{\gamma}} = 1$. In fact, we obtain results stronger than (19)–(20): for every $i\in\mathcal{M}$,

(21)

(22)

Remark 3.3

For all $i \in\mathcal{M}$, let $\overline{B}_{i} := \max_{j \in \mathcal{M}_{0}\setminus\{i\}} B_{ij}$, $\underline{B}_{i} := \min_{j \in\mathcal{M}_{0}\setminus\{i\}} B_{ij}$ and $\Psi^{(i)}_{n} := \min_{j \in\mathcal{M}_{0}\setminus\{i\}} \Lambda_{n}(i,j)$, n≥1. Then,

$$\underline{\upsilon}_B^{(i)} \leq\upsilon_B^{(i)}\leq\overline{\upsilon}_B^{(i)} \quad \mbox{for every } i\in \mathcal{M}, \tag{23}$$

where $\underline{\upsilon}_{B}^{(i)} := \inf\{ n \geq1:\Psi^{(i)}_{n} > - \log\overline{B}_{i} \}$ and $\overline{\upsilon}_{B}^{(i)} := \inf\{ n \geq1: \Psi^{(i)}_{n} >- \log\underline{B}_{i} \}$. Notice that (15) implies $\Phi_{n}^{(i)} \leq\Lambda_{n}(i,j)$ for every n≥1 and $j \in\mathcal{M}_{0} \setminus\{i\}$, and hence

$$\Psi^{(i)}_n \geq\Phi_n^{(i)}, \quad n\geq1. \tag{24}$$
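The bounds (23) can be checked mechanically on simulated paths. Since the display (17) is not reproduced here, the sketch below assumes that $\upsilon_B^{(i)}$ is the first n at which $\Lambda_n(i,j)>-\log B_{ij}$ holds simultaneously for all $j\in\mathcal{M}_0\setminus\{i\}$, one reading under which (23) follows from the definitions of $\Psi^{(i)}_n$, $\overline{B}_i$ and $\underline{B}_i$. The LLR paths are toy random walks with positive drift, not LLRs computed from data, and all names are hypothetical.

```python
import math
import random

def first_time(path, pred):
    """inf{n >= 1 : pred(path[n - 1])}, or None if it never happens within the horizon."""
    for n, v in enumerate(path, start=1):
        if pred(v):
            return n
    return None

def upsilon_i(lam_path, B_i):
    """Assumed reading of (17): all Lambda_n(i, j) exceed -log B_ij at the same n.

    lam_path[n - 1][k] = Lambda_n(i, j_k) and B_i[k] = B_{i j_k}."""
    h = [-math.log(b) for b in B_i]
    return first_time(lam_path, lambda lams: all(l > t for l, t in zip(lams, h)))

def sandwich(lam_path, B_i):
    """Return (lower, upsilon, upper) as in (23), with Psi_n = min_j Lambda_n(i, j)."""
    psi = [min(lams) for lams in lam_path]
    lower = first_time(psi, lambda s: s > -math.log(max(B_i)))
    upper = first_time(psi, lambda s: s > -math.log(min(B_i)))
    return lower, upsilon_i(lam_path, B_i), upper

# Deterministic toy path: thresholds are -log 1e-4 ~ 9.21 and -log 1e-3 ~ 6.91.
print(sandwich([[1.0, 10.0], [8.0, 8.0], [10.0, 7.5], [12.0, 12.0]], [1e-4, 1e-3]))

random.seed(3)
lam_path, cur = [], [0.0, 0.0]
for _ in range(400):                                  # two toy LLR random walks
    cur = [cur[0] + random.gauss(0.6, 1.0), cur[1] + random.gauss(1.0, 1.0)]
    lam_path.append(cur)
print(sandwich(lam_path, [1e-4, 1e-3]))
```

On the deterministic path the three times are 2, 3 and 4, so the inequalities in (23) can be strict.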

3.1 Convergence of false alarm and misdiagnosis probabilities and detection delay

As c and $\overline{R}$ decrease to zero in Problems 1 and 2, respectively, we expect the optimal stopping regions to shrink, or equivalently the values of A and B to decrease. We therefore study the asymptotic behavior of the false alarm and misdiagnosis probabilities and of the change detection time as

$$\| A \| := \max_{i \in\mathcal{M}} A_i \quad\mbox{and} \quad\| B\| := \max_{i \in\mathcal{M}, j \in\mathcal{M}_0 \setminus\{i\}} B_{ij}$$

go to zero, and then choose their values as functions of c and $\overline{R}$ so as to obtain asymptotically optimal strategies. In keeping with (18), the limits $\overline{B}_{i} \downarrow0$ for every $i \in\mathcal{M}$ are taken such that

$$ {\underline{B}_i}/{\overline{B}_i}= \frac{\min_{j\in\mathcal{M}_0\setminus\{i\}} B_{ij}}{\max_{j\in\mathcal{M}_0 \setminus\{i\}}B_{ij}} \geq b_i \quad \mbox{for some constants } 0<b_{i}\leq1. \tag{25}$$

We first study the asymptotic behaviors of the false alarm and misdiagnosis probabilities. The upper bounds can be obtained by a direct application of Proposition 2.4.

Proposition 3.4

(Bounds on false alarm and misdiagnosis probabilities)

(i) For every fixed $A = (A_{i})_{i \in\mathcal{M}}$ and $a = (a_{ji})_{i \in \mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}$, we have $R_{i}^{(a)}(\tau_{A},d_{A}) \leq \overline{a}_{i} A_{i}$ for every $i \in\mathcal{M}$, where $\overline{a}_{i} :=\max_{j \in\mathcal{M}_{0}\setminus\{i\}} a_{ji}$, and $R_{ji}(\tau_A,d_A)\leq\nu_iA_i\leq\nu_i\|A\|$ for every $i\in\mathcal{M}$ and $j\in \mathcal{M}_{0}\setminus\{i\}$.

(ii) For every $B =(B_{ij})_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}$, we have $R_{ji}(\upsilon_B,d_B)\leq\nu_iB_{ij}$ for every $i \in\mathcal{M}$ and $j \in \mathcal{M}_{0} \setminus\{i\}$.
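Proposition 3.4 suggests one direct way to make both rules feasible for Problem 2: choosing $B_{ij}:=\overline{R}_{ji}/\nu_i$ gives $R_{ji}(\upsilon_B,d_B)\leq\overline{R}_{ji}$ by part (ii), and choosing $A_i:=\min_{j}\overline{R}_{ji}/\nu_i$ gives $R_{ji}(\tau_A,d_A)\leq\nu_iA_i\leq\overline{R}_{ji}$ by part (i). The helpers below only carry out this arithmetic; their names, the dictionary layout, and the numbers are hypothetical, and whether such choices are also asymptotically efficient is a separate question.

```python
def feasible_B(Rbar, nu):
    """B_ij = Rbar_ji / nu_i; keys of Rbar are (j, i) pairs, keys of the result are (i, j)."""
    return {(i, j): r / nu[i] for (j, i), r in Rbar.items()}

def feasible_A(Rbar, nu):
    """A_i = min over j of Rbar_ji / nu_i."""
    A = {}
    for (j, i), r in Rbar.items():
        A[i] = min(A.get(i, float("inf")), r / nu[i])
    return A

nu = {1: 0.6, 2: 0.4}
Rbar = {(0, 1): 0.01, (2, 1): 0.005, (0, 2): 0.01, (1, 2): 0.004}   # Rbar[(j, i)] bounds R_ji
B, A = feasible_B(Rbar, nu), feasible_A(Rbar, nu)
for (j, i), r in Rbar.items():
    # Proposition 3.4: R_ji(upsilon_B, d_B) <= nu_i B_ij and R_ji(tau_A, d_A) <= nu_i A_i.
    print((j, i), nu[i] * B[(i, j)], nu[i] * A[i], "<=", r)
```

With these numbers every $B_{ij}$ lies in (0,1), as required in Remark 3.10 below.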

Corollary 3.5

(i) $\max_{i \in\mathcal{M}} R^{(a)}_{i} (\tau_{A},d_{A})\downarrow0$ as ∥A∥↓0; (ii) $\max_{i\in\mathcal{M},j\in\mathcal{M}_{0} \setminus\{i\}} R_{ji}(\upsilon_{B}, d_{B}) \downarrow0$ as ∥B∥↓0.

Proposition 3.6

Fix $i \in \mathcal{M}$. We have ℙ_i-a.s. (i) $\tau_{A}^{(i)} \uparrow \infty$ as A_i↓0, (ii) τ_A↑∞ as ∥A∥↓0, (iii) $\upsilon_{B}^{(i)} \uparrow\infty$ as $\overline{B}_{i}\downarrow0$, and (iv) υ_B↑∞ as ∥B∥↓0.

The asymptotic behavior of the detection delay is closely related to the convergence of the average increment Λ_n(i,j)/n. According to the next proposition, Λ_n(i,j)/n converges ℙ_i-a.s. as n↑∞ to some strictly positive constant for every $i\in\mathcal{M}$ and $j \in\mathcal{M}_{0}\setminus\{i\}$. The proof of Proposition 3.7 is deferred to Sect. 4, where the limiting values are analytically expressed in terms of the Kullback-Leibler divergence between the alternative probability measures.

Proposition 3.7

For every $i \in \mathcal{M}$ and $j \in\mathcal{M}_{0} \setminus\{i\}$, we have ℙ_i-a.s. Λ_n(i,j)/n→l(i,j) as n↑∞ for some strictly positive constant l(i,j).

Let us fix any $i \in\mathcal{M}$. We show that, for small values of A and B, the stopping times $\tau_{A}^{(i)}$ and $\upsilon_{B}^{(i)}$ in (14) and (17) are essentially determined by the process Λ(i,j(i)), where

(26)

and ℙ_i-a.s. $\Lambda_{n}(i,j(i))/n \approx\Phi_{n}^{(i)}/n\approx \Psi^{(i)}_{n}/n \approx l(i)$ for sufficiently large n as the next proposition suggests.

Proposition 3.8

For every $i\in\mathcal{M}$, we have ℙ_i-a.s. (i) $\Phi_{n}^{(i)}/n \rightarrow l(i)$ and (ii) $\Psi_{n}^{(i)}/n \rightarrow l(i)$ as n↑∞.

The proof of part (i) follows from Proposition 3.7, and part (ii) follows from part (i) and Baum and Veeravalli (1994, Lemma 5.2). Proposition 3.8 implies the following convergence results.

Lemma 3.9

For every $i \in\mathcal{M}$ and any $j(i) \in \arg \min _{j\in \mathcal{M}_{0}\setminus \{i\}} l(i,j)$, we have ℙ_i-a.s.

$$\begin{aligned} &(\mathrm{i})\quad -\frac{\tau_A^{(i)}}{\log A_i} \stackrel{A_i\downarrow0}{\longrightarrow} \frac{1}{l(i)}, &&(\mathrm{ii})\quad -\frac{(\tau_A^{(i)}-\theta)_+}{\log A_i} \stackrel{A_i \downarrow0}{\longrightarrow} \frac{1}{l(i)},\\ &(\mathrm{iii})\quad -\frac{\upsilon_B^{(i)}}{\log B_{ij(i)}} \stackrel{\overline{B}_i \downarrow0}{\longrightarrow} \frac{1}{l(i)}, &&(\mathrm{iv})\quad -\frac{(\upsilon_B^{(i)}-\theta)_+}{\log B_{ij(i)}}\stackrel{\overline{B}_i\downarrow0}{\longrightarrow} \frac{1}{l(i)}. \end{aligned}$$
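Proposition 3.8 (i) and Lemma 3.9 (i) can be illustrated by simulation. The sketch below fixes θ = 20 and regime i = 1 (sample paths of this kind are covered by Remark 2.5), computes $\Phi_n^{(1)}$ in log space, and compares $\tau_A^{(1)}/(-\log A_1)$ with $N/\Phi_N^{(1)}$ at the end of the horizon; both should approach $1/l(1)$. The Gaussian densities, parameter values, horizon, and names are hypothetical, and a single path only illustrates the almost-sure limits; it proves nothing.

```python
import math
import random

def logaddexp(a, b):
    """log(exp(a) + exp(b)) for finite a, b."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_normal(mean):
    """Hypothetical f_i: log-density of N(mean, 1)."""
    return lambda x: -0.5 * (x - mean) ** 2 - 0.5 * math.log(2 * math.pi)

def phi_path(xs, log_f, p0, p, nu, i):
    """Phi_n^{(i)} for n = 1..len(xs), via the log-alpha recursion, (7) and (15); needs p0 > 0."""
    M = len(log_f) - 1
    la = [math.log(1 - p0)] + [math.log(p0 * nu[k]) for k in range(M)]
    out = []
    for x in xs:
        la = [la[0] + math.log(1 - p) + log_f[0](x)] + [
            log_f[k](x) + logaddexp(la[k], math.log(p * nu[k - 1]) + la[0])
            for k in range(1, M + 1)]
        lams = [la[i] - la[j] for j in range(M + 1) if j != i]
        m = min(lams)
        out.append(m - math.log(sum(math.exp(m - lam) for lam in lams)))
    return out

random.seed(4)
theta, N = 20, 4000
# X_n ~ f_0 = N(0, 1) for n < theta and X_n ~ f_1 = N(1, 1) for n >= theta.
xs = [random.gauss(0.0 if n < theta else 1.0, 1.0) for n in range(1, N + 1)]
log_f = [log_normal(0.0), log_normal(1.0), log_normal(-1.0)]
phis = phi_path(xs, log_f, p0=0.1, p=0.05, nu=[0.5, 0.5], i=1)
rate = phis[-1] / N                      # Phi_N / N, an estimate of l(1)
taus = {}
for A in (1e-5, 1e-20, 1e-80, 1e-200):
    # tau_A^{(1)} from (16): first n with Phi_n^{(1)} > -log A_1.
    taus[A] = next(n for n, v in enumerate(phis, start=1) if v > -math.log(A))
    print(A, taus[A], taus[A] / -math.log(A), 1 / rate)
```

For moderate thresholds the ratio is still inflated by θ and by the pre-change transient; both effects are of order $1/|\log A_1|$ and fade as $A_1\downarrow0$.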

Remark 3.10

We shall always assume that 0<B_ij<1, equivalently −∞<log B_ij<0, for all $i \in\mathcal{M}$ and $j\in\mathcal{M}_{0}\setminus\{i\}$, since we are interested in the limits of certain quantities as ∥B∥↓0. Because (25) implies that $b_{i} \overline{B}_{i} \leq\underline{B}_{i} \leq B_{ij} \leq \overline{B}_{i}$, we have $1 \leq\frac{-\log B_{ij}}{-\log\overline{B}_{i}} \leq\frac{-\log \underline{B}_{i}}{-\log\overline{B}_{i}} \leq\frac{-\log(b_{i}\overline{B}_{i})}{-\log\overline{B}_{i}} = 1+\frac{-\log b_{i}}{-\log\overline{B}_{i}}$, which implies that

$$ 1=\lim_{\overline{B}_i \downarrow0}\frac{\log B_{ij}}{\log \overline{B}_i} =\lim_{\overline{B}_i \downarrow0} \frac{\log\underline{B}_i}{\log \overline{B}_i} =\lim_{\overline{B}_i \downarrow0}\frac{\log B_{ij}}{\log \underline{B}_i} \quad \mbox{for every } i\in\mathcal{M}, j\in \mathcal{M}_0\setminus\{i\}, \tag{27}$$

where the last equality follows from the first two equalities.

Because we want to minimize the mth moment of the detection delay time for any m≥1, we will strengthen the convergence results of Lemma 3.9. Condition 3.11 below for some r≥m is both necessary and sufficient for the L^m-convergences.

Condition 3.11

(Uniform Integrability)

For some r≥m,

(i)
the family $\{(\tau_{A}^{(i)}/(-\log A_{i}))^{r} \}_{A_{i} >0}$ is ℙ_i-uniformly integrable for every $i \in\mathcal{M}$,
(ii)
the family $\{(\upsilon_{B}^{(i)}/(-\log B_{ij(i)}))^{r} \}_{B_{i} > 0}$ is ℙ_i-uniformly integrable for every $i \in\mathcal{M}$.

Lemma 3.12

Let m≥1 be any integer.

(i)
Condition 3.11 (i) holds for some r≥m if and only if $\mathbb{E}_{i}[(\tau^{(i)}_{A})^{m}]<\infty$ for every A_i>0 and
(28)
(ii)
Condition 3.11 (ii) holds for some r≥m if and only if $\mathbb{E}_{i}[(\upsilon^{(i)}_{B})^{m}]<\infty$ for every B_i>0 and
(29)

where the limits $\overline{B}_{i} \downarrow0$ for all $i \in\mathcal{M}$ are taken such that (25) is satisfied.

The proof of Lemma 3.12 follows from Lemma 3.9, Chung (2001, Theorem 4.5.4), Gut (2005, Theorem 5.2), and the inequalities $\tau_{A}^{(i)}-\theta\leq(\tau_{A}^{(i)}-\theta)_{+} \leq \tau_{A}^{(i)}$ and $\upsilon_{B}^{(i)}-\theta\leq (\upsilon_{B}^{(i)}-\theta)_{+} \leq\upsilon_{B}^{(i)}$. Using renewal theory, one can show that Condition 3.11 holds if $\Lambda_n(i,j)=Y_1+\cdots+Y_n$ is a random walk for some sequence $(Y_n)_{n\geq1}$ of i.i.d. random variables with $\mathbb{E}Y_{1} > 0$ and $\mathbb{E}[(Y_{1})^{r}_{-}] < \infty$; see Lai (1975). In the case of SMHT, Λ_n(i,j) is indeed a random walk with positive drift for every $i\in\mathcal{M}$ and $j\in\mathcal{M}_{0}\setminus\{i\}$; see Baum and Veeravalli (1994).

Condition 3.11 is often hard to verify. An alternative sufficient condition can be given in terms of the r-quick convergence. The r-quick convergence of suitable stochastic processes is known to be sufficient for the asymptotic optimalities of certain sequential rules based on non-i.i.d. observations in CPD and SMHT problems. We will show that the r-quick convergence of the LLR processes is also sufficient for the joint sequential change detection and identification problem.

Definition 3.13

(The r-quick convergence)

Let $(\xi_n)_{n\geq0}$ be any stochastic process and r>0. Then $r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} \xi_n \geq c$ if and only if $\mathbb{E}[ (T_{\delta})^{r} ] < \infty$ for every δ>0, where

$$T_\delta:=\inf\Bigl\{ n \geq1: \inf_{m \geq n} \xi_m> c - \delta\Bigr\}, \quad\delta> 0. $$

(30)
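Definition 3.13 can be made concrete on a simulated path. The sketch below is illustrative only: on a path of finite length N one can compute only a truncated version of $T_\delta$ in (30), namely one plus the last time $n\leq N$ at which $\xi_n \leq c-\delta$ (or 1 if there is none), because the infimum over all $m\geq n$ cannot be checked beyond the horizon. The function name and the running-mean example are hypothetical; checking $\mathbb{E}[(T_\delta)^r]<\infty$ would further require replicating such paths.

```python
import random

def truncated_T_delta(xi, c, delta):
    """xi[k] holds xi_{k+1} (times are 1-based). Returns 1 + the last time n <= N
    with xi_n <= c - delta, or 1 if the path never falls to c - delta."""
    last_bad = 0
    for n, value in enumerate(xi, start=1):
        if value <= c - delta:
            last_bad = n
    return last_bad + 1

# Running means of i.i.d. N(0.3, 1) draws converge to 0.3, so T_delta is finite a.s.
rng = random.Random(0)
s, path = 0.0, []
for n in range(1, 10_001):
    s += rng.gauss(0.3, 1.0)
    path.append(s / n)
print(truncated_T_delta(path, c=0.3, delta=0.1))
```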

According to Proposition 3.15, stated below and proved in the appendix, Condition 3.11 holds if $(\Phi_{n}^{(i)}/n)_{n \geq1}$ and $(\Psi_{n}^{(i)}/n)_{n \geq 1}$ converge r-quickly to l(i) under ℙ_i for every $i\in\mathcal{M}$; we state this as a separate condition:

Condition 3.14

For some r≥1, (i) $r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} {\Phi_{n}^{(i)}}/n \geq l(i)$ under ℙ_i, (ii) $r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} {\Psi_{n}^{(i)}}/n \geq l(i)$ under ℙ_i for every $i\in\mathcal{M}$.

Proposition 3.15

Let m≥1. (i) If Condition 3.14 (i) holds for some r≥m, then (28) and Condition 3.11 (i) hold. (ii) If Condition 3.14 (ii) holds for some r≥m, then (29) and Condition 3.11 (ii) hold.

Remark 3.16

Condition 3.14 (i) implies (ii) by (24). Moreover, Condition 3.14 holds if $r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} \Lambda_n(i,j)/n \geq l(i,j)$ under ℙ_i for every $i\in \mathcal{M}$ and $j \in\mathcal{M}_{0} \setminus\{i\}$.

3.2 Asymptotic optimality

We now prove the asymptotic optimality of (τ_A,d_A) for Problem 1 under Condition 3.11 (i) and of (υ_B,d_B) for Problem 2 under Condition 3.11 (ii).

We first derive a lower bound on the expected detection delay under the optimal strategy. This lower bound can be obtained as in CPD and SMHT; see Baum and Veeravalli (1994), Dragalin et al. (1999), Dragalin et al. (2000), Lai (2000), Tartakovsky and Veeravalli (2004) and Baron and Tartakovsky (2006). Combining it with Lemma 3.12 above yields asymptotic optimality for both problems.

Lemma 3.17

For every $i \in\mathcal{M}$, we have

$$\liminf_{\overline{R}_i \downarrow0} \inf_{(\tau,d) \in\Delta (\overline{R})} \frac{D_i^{(m)}(\tau)}{ ({|\log({\overline{R}_{j(i)i}}/ {\nu_i} )|} / {l(i)})^m} \geq1.$$

We now study how to set A in terms of c in order to achieve asymptotic optimality in Problem 1. We see from Proposition 3.4 and Lemma 3.12 that the false alarm and misdiagnosis probabilities decrease faster than the expected delay time and are negligible when A and B are small. Indeed, we have, in view of the definition of the Bayes risk in (10), by Proposition 3.4 and Lemma 3.12, for any $0 < \sigma_{i} < \overline {a}_{i}$ for every $i \in\mathcal{M}$,

$$R^{(c,a,m)}_{i} (\tau_{A},d_{A}) \sim c\biggl( \frac{-\log A_i}{l(i)} \biggr)^m + \sigma_iA_i \sim c \biggl( \frac{-\log A_i}{l(i)} \biggr)^m \quad \mbox{as } A_i \downarrow0. $$

(31)

This motivates us to choose the value of A_i such that it minimizes

$$g^{(c)}_i(x) := c \biggl( \frac{-\log x}{ l(i)}\biggr)^m + \sigma_i x, $$

(32)

over x∈(0,∞). Hence let

$$A_i(c) \in \arg \min _{x \in(0,\infty)}g_i^{(c)}(x),\quad c > 0. $$

(33)

For example, when m=1, setting the derivative $-c/(l(i)x)+\sigma_i$ of (32) to zero gives $A_i(c)=c/(\sigma_i l(i))$. It is easily verified that for every m≥1 we have $A_{i}(c) \stackrel{c \downarrow0}{\longrightarrow} 0$ in such a way that $\log A_i(c)\sim\log c$ as c↓0. Hence we have

$$R^{(c,a,m)}_{i} (\tau_{A(c)},d_{A(c)}) \sim g_i^{(c)} \bigl(A_i(c)\bigr)\sim c \biggl(\frac{- \log c}{l(i)} \biggr)^m \quad\mbox{as } c \downarrow0. $$

(34)
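The choice (33) can be computed numerically for any m. The sketch below is a minimal illustration under two stated assumptions: the search is restricted to thresholds in (0,1), and the minimizer lies there (always true for integer m≥2, and for m=1 exactly when c<σ_i l(i)). On (0,1), $g_i^{(c)}$ is strictly convex, so its minimizer is the unique root of its derivative, found here by bisection in log x. The function names and the parameter values (l(i)=0.5, σ_i=0.8) are hypothetical.

```python
import math

def g(x, c, m, l, sigma):
    """g_i^{(c)}(x) = c * ((-log x) / l)^m + sigma * x, as in (32)."""
    return c * (-math.log(x) / l) ** m + sigma * x

def g_prime(x, c, m, l, sigma):
    """Derivative of g with respect to x on (0, 1)."""
    u = -math.log(x)
    return -c * m * u ** (m - 1) / (l ** m * x) + sigma

def optimal_threshold(c, m, l, sigma, lo=-300.0, iters=200):
    """Minimize g over (0, 1) by bisecting g' in y = log x on (lo, 0).

    Assumes the minimizer lies in (0, 1); for m == 1 this needs c < sigma * l.
    """
    hi = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g_prime(math.exp(mid), c, m, l, sigma) < 0.0:
            lo = mid  # g still decreasing here: the minimizer lies to the right
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

# m = 1 reproduces the closed form c / (sigma * l); both values are about 0.025.
print(optimal_threshold(0.01, 1, 0.5, 0.8), 0.01 / (0.8 * 0.5))
```

For m=2 the ratio $\log A_i(c)/\log c$ computed this way creeps toward 1 only slowly as c decreases, which is consistent with the statement $\log A_i(c)\sim\log c$ but also shows that it is an asymptotic, not a finite-c, relation.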

Consequently, it is sufficient to show that

$$\liminf_{c \downarrow0} \frac{\inf_{(\tau,d) \in\Delta}R^{(c,a,m)}_{i}(\tau,d)}{g_i^{(c)} (A_i(c))} \geq1. $$

(35)

The proof of the asymptotic optimality below is similar to that of Theorem 3.1 in Baron and Tartakovsky (2006) for CPD.

Proposition 3.18

(Asymptotic optimality of (τ_A,d_A) in Problem 1)

Fix m≥1 and a set of strictly positive constants a. Under Condition 3.11 (i) or Condition 3.14 (i) for the given m, the strategy $(\tau_{A(c)},d_{A(c)})$ is asymptotically optimal as c↓0; that is, (21) holds for every $i\in\mathcal{M}$.

It should be remarked here that the asymptotic optimality results hold for any $0 < \sigma_{i} < \overline{a}_{i}$. However, for a higher-order approximation, it is preferable to choose σ_i such that

$${R_i^{(a)}(\tau_A,d_A)} /{A_i} \stackrel{A_i \downarrow0}{\longrightarrow}\sigma_i. $$

(36)

In Sect. 5, we obtain such values of σ_i by means of nonlinear renewal theory.

We now show that (υ_B,d_B) is asymptotically optimal for Problem 2. By Proposition 3.4, if we set

$$B_{ij} (\overline{R}) := {\overline{R}_{ji}} / {\nu_i} \quad \mbox {for every }i \in\mathcal{M}, j \in\mathcal{M}_0\setminus\{i\},$$

then we have $(\upsilon_{B(\overline{R})}, d_{B(\overline{R})}) \in \Delta(\overline{R})$ for every fixed set of positive constants $\overline{R} = (\overline{R}_{ji})_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}$. Because $\upsilon_{B(\overline{R})} \leq\upsilon^{(i)}_{B(\overline{R})}$ and $\overline{R}_{i} \downarrow0$ is equivalent to $B_{ij(i)} (\overline{R})\downarrow0$, Lemma 3.12 (ii) yields

$$\limsup_{\overline{R}_i \downarrow0} \frac{D_i^{(m)}(\upsilon _{B(\overline{R})})}{( {|\log( \overline{R}_{j(i)i} / \nu_i ) |}/{l(i)} )^m} = \limsup_{\overline{R}_i\downarrow0}\frac{D_i^{(m)}(\upsilon_{B(\overline{R})})}{ ({|\log B_{ij(i)} (\overline{R})|} / {l(i)} )^m} \leq1.$$

This together with Lemma 3.17 shows the asymptotic optimality.

Proposition 3.19

(Asymptotic optimality of (υ_B,d_B) in Problem 2)

Fix m≥1. Under Condition 3.11 (ii) or Condition 3.14 (ii) for the given m, the strategy $(\upsilon_{B(\overline{R})},d_{B(\overline{R})})$ is asymptotically optimal as $\|\overline{R}\| \downarrow0$, i.e., (22) holds for every $i \in\mathcal{M}$.

4 The convergence results of the LLR processes

In this section, we prove Proposition 3.7 and obtain the limits l(i,j) for every $i\in\mathcal{M}$ and $j\in\mathcal{M}_{0}\setminus\{i\}$, which can be expressed in terms of the Kullback-Leibler divergences between the pre- and post-change probability density functions and the exponential decay rate ϱ in (2) of the disorder time distribution. Under mild additional conditions, we also show that the convergence holds in L^r for r≥1.

Let us denote the Kullback-Leibler divergence of f_i from f_j by

$$q(i,j) := \int_E \biggl( \log\frac{f_i(x)}{f_j(x)} \biggr)f_i(x) m(\mathrm {d}x), \quad i\in\mathcal{M}, j \in\mathcal{M}_0\setminus \{i\},$$

which always exists and is non-negative. Furthermore, Assumption 2.1 ensures that

$$q(i,j) > 0, \quad i\in\mathcal{M}, j \in\mathcal{M}_0 \setminus\{i\}. $$

(37)

To ensure that $\mathbb{E}_{i}^{(0)} [ \log(f_{0}(X_{1})/f_{j}(X_{1})) ]$ exists for every $i\in\mathcal{M}$, $j \in\mathcal{M}_{0} \setminus\{i\}$, we assume the following.

Assumption 4.1

For every $i \in\mathcal {M}$, we assume that q(i,0)<∞.

Since $\mathbb{E}^{(0)}_{i}[(\log (f_{i}(X_{1})/f_{j}(X_{1})))_{-}] \leq1$ for every $i\in\mathcal{M}$, $j\in \mathcal{M}_{0}\setminus\{i\}$, Assumption 4.1 guarantees the existence of

(38)

4.1 Decomposition of the LLR processes

We will decompose each LLR process (1) into a random walk with positive drift and a stochastic process whose running average vanishes in the limit. In the SMHT case (namely, when p₀=1), for every $i\in\mathcal{M}$ and $j \in \mathcal{M}\setminus\{i\}$,

$$\Lambda_n(i,j) = \log\biggl( \frac{\nu_i \prod^n_{k=1}f_i(X_k)}{\nu_j \prod^n_{k=1} f_j(X_k)} \biggr) = \log \biggl( \frac{\nu_i}{\nu_j} \biggr) + \sum^n_{k=1}\log\biggl( \frac{f_i(X_k)}{f_j(X_k)} \biggr), \quad n\geq1,$$

is a ℙ_i-random walk. Its running average Λ_n(i,j)/n converges ℙ_i-a.s. to the Kullback-Leibler divergence q(i,j) as n↑∞ by the strong law of large numbers (SLLN). Although the processes $(\Lambda(i,j))_{j \in\mathcal{M}_{0}\setminus\{i\}}$ are not ℙ_i-random walks when p₀<1, this observation nonetheless motivates us to approximate them by random walks. Let

$$\Gamma_i := \bigl\{ j \in\mathcal{M}\setminus\{i\}: q(i,j) <q(i,0) +\varrho\bigr\},\quad i\in\mathcal{M}. $$

We show that Λ(i,j) can be approximated by a random walk with drift q(i,j)>0 if j∈Γ_i and with q(i,0)+ϱ>0 otherwise; namely, with drift min(q(i,j),q(i,0)+ϱ) if $j\in\mathcal{M}\setminus\{i\}$ and q(i,0)+ϱ if j=0. Define

(39)

(40)

for every n≥1 and $j\in\mathcal{M}_{0}$. Then it can be checked easily that, for any $j \in\mathcal{M}_{0}\setminus\{i\}$, we have

(41)

By (7), after taking logarithms on both sides, each LLR process can be written as

$$ \Lambda_n(i,j)=\sum_{l=1}^n h_{ij}(X_l)+ \epsilon_n(i,j), \quad j\in\mathcal{M}_0\setminus\{i\},$$

(42)

where

(43)

(44)

Moreover, $\sum_{l=1}^{n} h_{ij}(X_{l})$ can be split into post- and pre-change terms, and we have

$$\Lambda_n (i,j) = \sum_{l=\theta\vee1}^nh_{ij}(X_l) + \sum_{l=1}^{n \wedge(\theta-1)}h_{ij}(X_l) + \epsilon_n (i,j), \quad n\geq1, $$

(45)

for every fixed $j \in\mathcal{M}_{0} \setminus\{i\}$. Notice that the first term in (45) is conditionally a random walk under $\mathbb{P}_{i}^{(t)}$ given θ=t for every t≥0.
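In the SMHT case (p₀=1) the random-walk structure of Λ_n(i,j) can be seen directly. The sketch below is illustrative only and assumes unit-variance Gaussian densities with K=1, as in Sect. 6.1, for which $\log(f_i(x)/f_j(x)) = (\lambda_i-\lambda_j)x - (\lambda_i^2-\lambda_j^2)/2$; the means, priors, horizon and function names are hypothetical. It tracks $\Lambda_n(i,j)/n$ when the observations are drawn from f_i and compares the final value with $q(i,j)=\frac12(\lambda_i-\lambda_j)^2$.

```python
import math
import random

def gaussian_log_ratio(x, mean_i, mean_j):
    """log f_i(x)/f_j(x) for unit-variance normal densities with the given means."""
    return (mean_i - mean_j) * x - 0.5 * (mean_i ** 2 - mean_j ** 2)

def smht_llr_path(n, mean_i, mean_j, nu_i, nu_j, seed=0):
    """Lambda_k(i, j)/k, k = 1..n, when X_1, X_2, ... are i.i.d. N(mean_i, 1) (p0 = 1)."""
    rng = random.Random(seed)
    llr = math.log(nu_i / nu_j)  # Lambda_0(i, j) = log(nu_i / nu_j)
    averages = []
    for k in range(1, n + 1):
        llr += gaussian_log_ratio(rng.gauss(mean_i, 1.0), mean_i, mean_j)
        averages.append(llr / k)
    return averages

path = smht_llr_path(20_000, mean_i=0.8, mean_j=0.3, nu_i=0.1, nu_j=0.9)
print(path[-1], 0.5 * (0.8 - 0.3) ** 2)  # running average vs q(i, j) = 0.125
```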

4.2 The convergence of the LLR processes

Fix $i \in\mathcal{M}$ and $j \in\mathcal{M}_{0} \setminus\{i\}$. In view of (42), we can study the convergence of ${(\sum _{l=1}^{n} h_{ij}(X_{l}))}/n$ and ϵ_n(i,j)/n separately. For the first term, notice that

$$\frac{1}{n} \sum_{l=1}^nh_{ij}(X_l) = \frac{1}{n} \sum_{l=\theta\vee 1}^nh_{ij}(X_l) + \frac{1}{n} \sum_{l=1}^{n \wedge(\theta-1)}h_{ij}(X_l).$$

Because θ is an a.s. finite random variable, the first term on the right-hand side converges $\mathbb{P}^{(t)}_{i}$-a.s. to

(46)

by the SLLN, while the second term converges to zero. Then Remark 2.5 implies Lemma 4.2, and, under some mild additional conditions, Lemma 4.3 below.

Lemma 4.2

For every $i \in\mathcal{M}$ and $j\in\mathcal{M}_{0} \setminus\{i\}$, we have $\frac{1}{n}\sum_{l=1}^{n} h_{ij}(X_{l}) \xrightarrow[n \uparrow\infty]{\mathbb{P}_{i}\mbox{-a.s.}} l(i,j)$.

Lemma 4.3

For every $i \in \mathcal{M}$, $j\in\mathcal{M}_{0} \setminus\{i\}$ and r≥1, we have $\frac{1}{n}\sum_{l=1}^{n} h_{ij}(X_{l}) \xrightarrow[n \uparrow\infty]{L^{r}(\mathbb{P}_{i})} l(i,j)$ if

$$\mathbb{E}^{(\infty)} \bigl \vert h_{ij}(X_1) \bigr \vert ^r <\infty\quad \mbox {and} \quad\mathbb{E}_i^{(0)}\bigl \vert h_{ij}(X_1) \bigr \vert ^r <\infty. $$

(47)

Note that (47) holds if and only if the following condition holds.

Condition 4.4

For every $i \in \mathcal{M}$, $j\in\mathcal{M}_{0}\setminus\{i\}$, and r≥1, suppose that

$$\begin{aligned}
&\mathbb{E}^{(\infty)} \biggl| \log \frac{f_i(X_1)}{f_j(X_1)}\biggr|^r <\infty \quad \mbox{and} \quad \mathbb{E}_i^{(0)} \biggl| \log\frac{f_i(X_1)}{f_j(X_1)}\biggr|^r <\infty \quad \mbox{if } j\in\Gamma_i,\\
&\mathbb{E}^{(\infty)} \biggl| \log\frac{f_i(X_1)}{f_0(X_1)}\biggr|^r <\infty \quad \mbox{and} \quad \mathbb{E}_i^{(0)} \biggl| \log\frac{f_i(X_1)}{f_0(X_1)}\biggr|^r <\infty \quad \mbox{if } j \notin\Gamma_i.
\end{aligned}$$

We now show that ϵ_n(i,j)/n converges ℙ_i-a.s. to zero. The convergence result holds in L^r(ℙ_i) as well for r≥1 under a mild condition. To show this, we first determine the limits of $(L_{n}^{(\cdot)}/n)_{n \geq1}$ and $(K_{n}^{(\cdot)}/n)_{n \geq1}$ as n↑∞ under ℙ_i.

Lemma 4.5

For every $i \in\mathcal{M}$, the following hold under ℙ_i.

(i)
$L_{n}^{(i)}/n \stackrel{n \uparrow\infty}{\longrightarrow} 0$ a.s.
(ii)
$L_{n}^{(j)}/n \stackrel{n \uparrow\infty}{\longrightarrow}[q(i,j)-q(i,0)-\varrho]_{+}$ a.s. for every $j \in \mathcal{M}\setminus\{i\}$.
(iii)
$K_{n}^{(j)}/n \stackrel{n \uparrow\infty}{\longrightarrow}[q(i,j)-q(i,0)-\varrho]_{-}$ a.s. for every $j \in \mathcal{M}\setminus\{i\}$.
(iv)
$L_{n}^{(i)}$ converges a.s. as n↑∞ to a finite random variable $L_{\infty}^{(i)}$.
(v)
$L_{n}^{(j)}$ converges a.s. as n↑∞ to a finite random variable $L_{\infty}^{(j)}$ for every j∈Γ_i.
(vi)
For every $j\in\mathcal{M}$, $(|L^{(j)}_{n}/n|^{r})_{n\geq 1}$ is uniformly integrable for every r≥1, if
$$ \mathbb{E}^{(\infty)}\bigl[{f_0(X_1)}/ {f_j(X_1)}\bigr] <\infty\quad \mbox {and} \quad\mathbb{E}_i^{(0)} \bigl[{f_0(X_1)} / {f_j(X_1)}\bigr] <\infty.$$
(48)
(vii)
For every $j\in\mathcal{M}$, $(|K^{(j)}_{n}/n|^{q})_{n\geq 1}$ is uniformly integrable for every 0≤q≤r, if (48) holds and
$$ \mathbb{E}^{(\infty)}\biggl \vert \log \frac{f_j(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty\quad \mbox {and}\quad \mathbb{E}_i^{(0)} \biggl \vert \log\frac {f_j(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty, \quad\mathit{for\ some\ } r \geq1.$$
(49)

Notice from Lemma 4.5 (i) and (vi) that, for $L_{n}^{(i)}/n$ to converge to zero in L^r under ℙ_i, it is sufficient to have

$$\mathbb{E}^{(\infty)} \bigl[ f_0 (X_1) /f_i(X_1) \bigr] < \infty $$

(50)

because $\mathbb{E}_{i}^{(0)} [ f_{0} (X_{1}) / f_{i}(X_{1}) ] = \int_{E} f_{0}(x)m(\mathrm {d}x) = 1 < \infty$. The characterization of ϵ_n(i,j) in (44) leads to the next convergence result.

Lemma 4.6

For every $i \in\mathcal{M}$ and $j\in\mathcal{M}_{0}\setminus\{i\}$, we have ϵ_n(i,j)/n→0 as n↑∞ ℙ_i-a.s.

Moreover, for a given r≥1, the convergence also holds in L^r(ℙ_i) under the following condition.

Condition 4.7

Given $i\in\mathcal{M}$, $j\in\mathcal{M}_{0}\setminus\{i\}$ and r≥1, we suppose that (50) holds and (i) j∈Γ_i and (48) holds, or (ii) j∉Γ_i or j=0 and (49) holds for the given r.

Lemma 4.8

Fix $i \in\mathcal{M}$, $j\in \mathcal{M}_{0}\setminus\{i\}$ and r≥1. Under Condition 4.7, ϵ_n(i,j)/n→0 as n↑∞ in L^r(ℙ_i).

By combining Lemmas 4.2 and 4.6 with the decomposition (42), Proposition 3.7 indeed holds with l(⋅,⋅) as defined in (46). Moreover, the following L^r convergence result holds by Lemmas 4.3 and 4.8.

Proposition 4.9

For every $i\in\mathcal{M}$ and $j\in\mathcal{M}_{0}\setminus\{i\}$, we have Λ_n(i,j)/n→l(i,j) as n↑∞ in L^r(ℙ_i) for a given r≥1 if Conditions 4.4 and 4.7 hold for that r.

Remark 4.10

(i)
Observe from (46) that we have l(i,j)≤l(i,0) for every $i \in\mathcal{M}$ and $j \in \mathcal{M}_{0}\setminus\{i\}$, and the equality holds if and only if $j \in \mathcal{M}_{0} \setminus(\Gamma_{i} \cup\{i\})$.
(ii)
Because q(i,j)=0 if and only if $\int_{\{x\in E: f_{i}(x) \neq f_{j}(x)\}}f_{i}(x)m(\mathrm {d}x)=0$, Assumption 2.1 guarantees that l(i,j)>0 for every $i \in\mathcal{M}$ and $j \in\mathcal{M}_{0}\setminus\{i\}$.
(iii)
We later assume, in Sect. 5 below for higher-order approximations, that there is a unique $j(i)\in \mathcal{M}_{0}\setminus\{i\}$ such that $l(i) = l(i,j(i)) = \min_{j \in\mathcal{M}_{0} \setminus\{i\}}l(i,j)$ for every $i\in\mathcal{M}$. Then (i) implies l(i)<l(i,0) and q(i,j(i))<q(i,0)+ϱ, and j(i)∈Γ_i and Γ_i≠∅.

Remark 4.11

We proved a number of results on the convergence of the LLR processes. However, those results do not guarantee their r-quick convergence. A sufficient condition derived by means of Jensen’s inequality can be found in our technical report (Dayanik et al. 2011).

5 Higher-order approximations

In this section, we derive a higher-order asymptotic approximation for the minimum Bayes risk in Problem 1 by choosing the values of σ in (31) as discussed in Sect. 3.2. Proposition 3.4 (i) gives an upper bound on $(R_{i}^{(a)} (\cdot,\cdot))_{i \in\mathcal{M}}$, and here we investigate whether there exists some σ such that (36) holds.

5.1 Asymptotic behaviors of the false alarm and misdiagnosis probabilities

Fix $i \in\mathcal{M}$. By (12) and because $\tau_{A} =\tau^{(i)}_{A}$ on {d_A=i,θ≤τ_A<∞}, we have

(51)

(52)

Suppose that $H_{i}^{(a)}(A_{i})$ is bounded from below by some constant b and that $H_{i}^{(a)}(A_{i})$ converges in distribution as A_i↓0 to some random variable $H_{i}^{(a)}$ under ℙ_i. Then, because $x\mapsto e^{-x}$ is continuous and bounded on $[b,\infty)$, we have ${R_{i}^{(a)} (\tau_{A},d_{A})} / {A_{i}} \stackrel{A_{i} \downarrow0}{\longrightarrow} \mathbb {E}_{i} [\exp\{- H_{i}^{(a)} \}]$, and therefore (36) holds with $\sigma_{i} =\mathbb{E}_{i}[ \exp\{- H_{i}^{(a)} \}]$.

Recall that $\tau_{A}^{(i)}$ is the first time the process $\Phi_{n}^{(i)}$ exceeds the threshold $-\log A_i$, and $-\log A_i\uparrow\infty$ if and only if $A_i\downarrow0$. The following lemma shows that this convergence holds provided that the overshoot

$$W_i(A_i) := \Phi_{\tau^{(i)}_A}^{(i)} - (-\log A_i) = \Phi_{\tau ^{(i)}_A}^{(i)} + \log A_i \geq0 $$

(53)

converges in distribution as A_i↓0 to some random variable W_i under ℙ_i.

Lemma 5.1

Fix $i \in\mathcal{M}$. If j(i) is unique and the overshoot W_i(A_i) in (53) converges in distribution as A_i↓0 to some random variable W_i under ℙ_i, then (36) holds with $\sigma_{i} := a_{j(i)i} \mathbb{E}_{i} [ \exp\{- W_{i} \}]$.

In Lemma 5.1 above, σ_i does not depend on $a_{ji}$ for any $j \in\mathcal{M}_{0} \setminus\{i,j(i) \}$; therefore $R_{ji}(\tau_A,d_A)$ is negligible compared with $R_{j(i)i}(\tau_A,d_A)$ for any $j \in\mathcal{M}_{0} \setminus\{i,j(i) \}$ when A is small.

5.2 Nonlinear renewal theory and the overshoot distribution

We now show, via nonlinear renewal theory, that the overshoot (53) converges in distribution whenever j(i) is unique, so that Lemma 5.1 applies, and we obtain its limiting distribution.

Observe that, for every $k \in\mathcal{M}_{0} \setminus\{i\}$,

(54)

(55)

By (45) and (54), we have $\Phi_{n}^{(i)} =\sum_{l=\theta\vee1}^{n} h_{i j(i)} (X_{l}) + \xi_{n}(i,j(i))$, where

(56)

We will take advantage of the fact that, given θ, the process $\sum_{l=\theta\vee1}^{n} h_{i j(i)} (X_{l})$ is conditionally a random walk, while ξ_n(i,j(i)) can be shown to be “slowly-changing”, in the sense that $\xi_{n+1}(i,j(i))-\xi_n(i,j(i))\approx0$ for large n. Hence, at every large n, the increments of ξ_n(i,j(i)) are negligible compared with those of the random walk term $\sum_{l=\theta\vee1}^{n} h_{ij(i)}(X_{l})$. This allows us to obtain the overshoot distribution of the process $\Phi^{(i)}$ at its boundary-crossing time $\tau^{(i)}_{A}$ for small A_i by means of nonlinear renewal theory (Woodroofe 1982; Siegmund 1985). We first give a few definitions and state a fundamental theorem of nonlinear renewal theory.

Definition 5.2

A sequence of random variables $(\xi_n)_{n\geq1}$ is called uniformly continuous in probability (u.c.i.p.) if for every ε>0 there is δ>0 such that $\mathbb{P}\{\max_{0\leq k\leq n\delta}|\xi_{n+k}-\xi_n|\geq\varepsilon\}\leq\varepsilon$ for every n≥1.

Definition 5.3

A sequence of random variables $(\xi_n)_{n\geq1}$ is said to be slowly-changing if it is u.c.i.p. and

$$ \frac{\max\{|\xi_1|,\ldots,|\xi_n| \}}{n}\xrightarrow[n \uparrow\infty]{\mathrm{in\ probability}} 0.$$

(57)

Remark 5.4

If a process converges a.s. to a finite random variable, then it is a slowly-changing process. Moreover, the sum of two slowly-changing processes is also a slowly-changing process.

The following theorem states that, if a process is the sum of a random walk with positive drift and a slowly-changing process, then the overshoot at the first time it exceeds some threshold has the same asymptotic distribution as that of the overshoot of the random walk, as the threshold tends to infinity.

Theorem 5.5

(Woodroofe 1982, Theorem 4.1; Siegmund 1985, Theorem 9.12)

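On some $(\Omega,\mathcal{E},\mathbb{P})$, let $(Z_n)_{n\geq1}$ be a sequence of i.i.d. random variables with some common nonarithmetic distribution and mean $0<\mathbb{E}Z_{1} < \infty$. Let $(\xi_n)_{n\geq1}$ be a slowly-changing process such that $(Z_k)_{k\geq n+1}$ is independent of $(\xi_l)_{1\leq l\leq n}$ for every n≥1. If $\widetilde{T}_{b}:= \inf\{ n \geq1: \sum_{i=1}^{n} Z_{i} - \xi_{n} > b\}$ and $T_{b} :=\inf\{ n \geq1: \sum_{i=1}^{n} Z_{i} > b\}$ for every b≥0, then the overshoots $\sum_{i=1}^{\widetilde{T}_{b}} Z_{i} - \xi_{\widetilde{T}_{b}} - b$ and $\sum_{i=1}^{T_{b}} Z_{i} - b$ have the same limiting distribution as b↑∞.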

We fix $i \in\mathcal{M}$ and obtain the limiting distribution of the overshoot W_i(A_i) as A_i↓0 using Theorem 5.5.

Lemma 5.6

Fix $i \in\mathcal{M}$ and t≥0. If j(i) is unique, then ξ_n(i,j(i)) is slowly-changing under $\mathbb{P}_{i}^{(t)}$.

For every t≥1 and $j(i) \in \arg \min _{j \in\mathcal{M}_{0}\setminus \{i\}} l(i,j)$, define a stopping time,

$$T_i^{(t)} := \inf\Biggl\{ n \geq t: \sum _{l=t}^{n} \log\biggl(\frac {f_i(X_l)}{f_{j(i)}(X_l)} \biggr) >0 \Biggr\},$$

and random variable $W^{(t)}_{i}$ whose distribution is given by

$$ \mathbb{P}_i^{(t)} \bigl\{W_i^{(t)} \leq w \bigr\} = \frac{\int_0^w \mathbb{P}_i^{(t)}\{\sum_{l=t}^{T_i^{(t)}} \log\frac{f_i(X_l)}{f_{j(i)}(X_l)} >s \} \mathrm {d}s}{\mathbb{E}_i^{(t)} [ \sum_{l=t}^{T_i^{(t)}} \log \frac{f_i(X_l)}{f_{j(i)}(X_l)} ]},\quad0\leq w < \infty.$$

(58)

The next lemma follows immediately from Theorem 5.5 and Lemma 5.6.

Lemma 5.7

Fix $i \in\mathcal{M}$ and t≥0. If j(i) is unique, then the overshoot W_i(A_i) converges to $W^{(t)}_{i}$ in distribution under $\mathbb{P}_{i}^{(t)}$ as A_i↓0.

Note that the distribution of $W_{i}^{(t)}$ under $\mathbb{P}^{(t)}_{i}$ is identical to that of $W_{i}^{(0)}$ under $\mathbb{P}^{(0)}_{i}$ for every t≥1, which leads to Lemma 5.8 below.

Lemma 5.8

Fix $i\in\mathcal{M}$. If j(i) is unique, then as A_i↓0 the overshoot W_i(A_i) converges in distribution under ℙ_i to a random variable W_i whose distribution under ℙ_i is identical to that of $W^{(0)}_{i}$ in (58) under $\mathbb{P}^{(0)}_{i}$.

Finally, Lemmas 5.1 and 5.8 prove Proposition 5.9 below.

Proposition 5.9

Fix $i \in \mathcal{M}$ and suppose j(i) is unique. Then ${R_{i}^{(a)} (\tau _{A},d_{A})} /{A_{i}} \stackrel{A_{i} \downarrow0}{\longrightarrow} a_{j(i) i} \mathbb{E}_{i} [ e^{-W_{i}}]$, where W_i is the random variable defined in Lemma 5.8. Therefore, a higher-order approximation for Problem 1 can be achieved by setting in (32)

$$\sigma_i := a_{j(i) i} \mathbb{E}_i \bigl[e^{- W_i} \bigr]. $$

(59)
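The constant in (59) reduces to a functional of the ladder height $H:=\sum_{l=t}^{T_i^{(t)}}\log(f_i(X_l)/f_{j(i)}(X_l))$ appearing in (58). Integrating $e^{-w}$ against the density $\mathbb{P}\{H>w\}/\mathbb{E}[H]$ of (58) and applying Fubini's theorem gives $\mathbb{E}_i[e^{-W_i}]=\mathbb{E}[1-e^{-H}]/\mathbb{E}[H]$, a form that is convenient for Monte Carlo estimation. The sketch below assumes, for illustration only, the unit-variance Gaussian model of Sect. 6.1, under which the increments $\log(f_i(X)/f_{j(i)}(X))$ are $N(\delta^2/2,\delta^2)$ with δ the Euclidean distance between the post-change means under μ=i and μ=j(i). The cost $a_{j(i)i}$, the value of δ and the function names are hypothetical.

```python
import math
import random

def ladder_height(drift, sd, rng):
    """First strictly positive value S_T of a random walk with N(drift, sd^2) steps from 0."""
    s = 0.0
    while True:
        s += rng.gauss(drift, sd)
        if s > 0.0:
            return s

def laplace_overshoot(delta, reps=20_000, seed=0):
    """Monte Carlo estimate of E[exp(-W)] = E[1 - exp(-H)] / E[H], where H is the
    ladder height of the Gaussian LLR walk with N(delta^2 / 2, delta^2) steps."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(reps):
        h = ladder_height(0.5 * delta ** 2, abs(delta), rng)
        num += 1.0 - math.exp(-h)
        den += h
    return num / den

a_cost = 1.0  # hypothetical misdiagnosis cost a_{j(i) i}
delta = 0.5   # hypothetical distance between post-change means under mu = i and mu = j(i)
sigma_i = a_cost * laplace_overshoot(delta)  # sigma_i as in (59), under these assumptions
print(sigma_i)
```

Using the ratio of two sample means, rather than simulating the overshoot of $\Phi^{(i)}$ over a large threshold, avoids long paths; both targets coincide in the limit by Theorem 5.5 and Lemma 5.8.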

6 Numerical examples

To assess the performance of the asymptotically optimal rule, one first needs to compute the optimal solution for comparison. As outlined in Sect. 2, to solve the fixed-error-probability formulation optimally, one must first transform it into a minimum Bayes risk formulation by means of Lagrange relaxation and then solve the latter repeatedly for different values of the Lagrange multipliers. Because this method requires extensive calculations and its details are not of primary interest in this paper, we focus on the minimum Bayes risk formulation and evaluate the performance of the strategy $(\tau_{A(c)},d_{A(c)})$ numerically in the i.i.d. Gaussian case described below. Its asymptotic optimality ensures that the strategy is near-optimal when the unit detection delay cost c is small. Our numerical example suggests that it remains near-optimal even for moderately large values of the unit detection delay cost.

6.1 The Gaussian case

Suppose that the observations $X_{n} = (X_{n}^{(1)},\ldots,X_{n}^{(K)})$, n≥1, form a sequence of K-dimensional Gaussian random vectors. Conditionally on θ and μ, the random variables $X_n^{(k)}$ are mutually independent, with common means $(\lambda_{0}^{(1)}, \ldots,\lambda_{0}^{(K)})$ before θ and $(\lambda_{\mu}^{(1)}, \ldots,\lambda_{\mu}^{(K)})$ at and after θ, and with unit variances at all times. The Kullback-Leibler divergence between the probability density functions under μ=i and μ=j is $q(i,j) = \frac{1}{2} \sum_{k = 1}^{K} ( \lambda_{i}^{(k)} -\lambda_{j}^{(k)} )^{2}$ for every $i\in\mathcal{M}$, $j\in\mathcal{M}_{0}\setminus\{i\}$. Because Conditions 4.4 and 4.7 are satisfied, Propositions 3.7 and 4.9 hold with

$$l(i,j) = \min\Biggl\{ \varrho+ \frac{1}{2} \sum_{k = 1}^K\bigl( \lambda_i^{(k)} - \lambda_0^{(k)}\bigr)^2, \frac{1}{2} \sum_{k= 1}^K\bigl( \lambda_i^{(k)} - \lambda_j^{(k)}\bigr)^2 \Biggr\}, \quad \mbox { $j \in\mathcal{M}\setminus\{i\}$,} $$

(60)

and $l(i,0) = \varrho+ \frac{1}{2} \sum_{k = 1}^{K} (\lambda_{i}^{(k)} - \lambda_{0}^{(k)})^{2}$ for every $i \in \mathcal{M}$.
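Formula (60) is straightforward to evaluate. The sketch below computes l(i,j), the sets Γ_i and the minimizers j(i) for the parameters of the example in Sect. 6.2. The decay rate ϱ in (2) is not restated in this section, so the value $\varrho=-\log(1-p)$ used here is an assumption corresponding to a (zero-modified) geometric disorder time with parameter p; the function names are hypothetical.

```python
import math

def kl_gaussian(mean_a, mean_b):
    """q(a, b) = 0.5 * ||mean_a - mean_b||^2 for unit-variance Gaussian vectors."""
    return 0.5 * sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))

def limits(means, rho):
    """means[0] is the pre-change mean vector and means[i] the post-change one for mu = i.
    Returns l(i, j) from (60), Gamma_i, and the minimizer j(i) over M_0 minus {i}."""
    M = len(means) - 1
    lim, gamma, j_star = {}, {}, {}
    for i in range(1, M + 1):
        to_zero = rho + kl_gaussian(means[i], means[0])  # l(i, 0) = q(i, 0) + rho
        lim[(i, 0)] = to_zero
        gamma[i] = set()
        for j in range(1, M + 1):
            if j == i:
                continue
            q_ij = kl_gaussian(means[i], means[j])
            lim[(i, j)] = min(to_zero, q_ij)
            if q_ij < to_zero:
                gamma[i].add(j)  # j belongs to Gamma_i
        candidates = [j for j in range(M + 1) if j != i]
        j_star[i] = min(candidates, key=lambda j: lim[(i, j)])
    return lim, gamma, j_star

# Example of Sect. 6.2: M = 3, K = 1, p = 0.1.
rho = -math.log(1 - 0.1)  # assumption: geometric prior with decay rate -log(1 - p)
lim, gamma, j_star = limits([(0.0,), (0.2,), (0.3,), (0.8,)], rho)
print(j_star, gamma)
```

In this example every minimizer j(i) is unique and lies in Γ_i, in line with Remark 4.10 (iii).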

6.2 Numerical validation of Proposition 3.7

Let M=3, K=1, p₀=0, p=0.1, (ν₁,ν₂,ν₃)=(1/3,1/3,1/3), and $(\lambda_{0}^{(1)},\lambda_{1}^{(1)},\lambda_{2}^{(1)},\lambda_{3}^{(1)}) = (0,0.2,0.3,0.8)$. The limiting values l(⋅,⋅) in (60) are reported in Table 1. Figure 2 shows sample realizations of $(\Lambda_n(\mu,j)/n)_{n\geq1}$, j∈{0,1,2,3}∖{μ}, and $(\Phi_{n}^{(\mu)}/n)_{n \geq1}$ given (a) μ=1 and θ=10, (b) μ=1 and θ=1000, and (c) μ=2 and θ=10. The sample paths are consistent with the limiting values in Table 1, as expected from Proposition 3.7. As guaranteed by Proposition 3.8, the process $(\Phi_{n}^{(\mu)}/n)_{n\geq1}$ converges to l(μ).

Table 1 The limits l(i,j) of Proposition 3.7 calculated for the numerical example ($\arg \min _{\scriptsize j \in\mathcal{M}_{0} \setminus\{i\}}l(i,j)$ values are indicated in boldface)


Table 2 Numerical comparisons of the optimal and approximate $(\tau_{A(c)},d_{A(c)})$ Bayes risk values


6.3 The numerical comparison of the minimum and asymptotically minimum Bayes risks

We calculate the minimum and asymptotically minimum Bayes risks for the following example. We assume that M=2, K=2, p₀=0, p=0.01, (ν₁,ν₂)=(0.1,0.9), and the mean vectors $\lambda_{0}=(\lambda^{(1)}_{0},\lambda^{(2)}_{0})$ and $\lambda_{i}=(\lambda^{(1)}_{i},\lambda^{(2)}_{i})$, i=1,2 before and after the change, respectively, satisfy

$$\lambda_1^{(1)} =\lambda_0^{(1)}+1.0,\qquad\lambda_2^{(1)} =\lambda_0^{(1)}+1.0,\qquad\lambda_1^{(2)} =\lambda_0^{(2)}+0.0,\qquad\lambda_2^{(2)} =\lambda_0^{(2)}+0.5.$$

Table 2 compares the performance of the strategy $(\tau_{A(c)},d_{A(c)})$ with that of the optimal strategy for fixed $a_{ji}=1$, for every $i \in\mathcal{M}$ and $j \in\mathcal{M}_{0} \setminus\{i\}$, as the unit detection delay cost c decreases. The optimal stopping regions are found by the value iteration described by Dayanik et al. (2008). The Bayes risks of both strategies are estimated via Monte Carlo simulation. For the higher-order approximation we used (59), with $(\sigma_{i})_{i \in \mathcal{M}}$ computed by Monte Carlo methods.

The results are consistent with the asymptotic optimality of $(\tau_{A(c)},d_{A(c)})$: the ratio of the optimal and approximate Bayes risk values, listed in the last column, approaches 1 as c decreases. Moreover, the approximate and the minimum Bayes risk values are close even for large values of c, which we attribute to the higher-order approximation studied in Sect. 5.

Change history

07 October 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10479-021-04269-9

References

Baron, M., & Tartakovsky, A. G. (2006). Asymptotic optimality of change-point detection schemes in general continuous-time models. Sequential Analysis, 25(3), 257–296.
Baum, C. W., & Veeravalli, V. V. (1994). A sequential procedure for multihypothesis testing. IEEE Transactions on Information Theory, 40(6), 1994–2007.
Bertsekas, D. P. (2005). Dynamic programming and optimal control (Vol. I). Belmont: Athena Scientific.
Borkar, V. S. (1991). A remark on control of partially observed Markov chains. Annals of Operations Research, 29(1–4), 429–438.
Burnetas, A. N., & Katehakis, M. N. (1997). Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1), 222–255.
Bussiere, M., & Fratzscher, M. (2006). Towards a new early warning system of financial crises. Journal of International Money and Finance, 25, 953–973.
Chung, K. L. (2001). A course in probability theory (3rd ed.). San Diego: Academic Press.
Dayanik, S., Goulding, C., & Poor, H. V. (2008). Bayesian sequential change diagnosis. Mathematics of Operations Research, 33(2), 475–496.
Dayanik, S., Powell, W. B., & Yamazaki, K. (2011). Asymptotic theory of sequential change detection and identification (Technical report). Center for the Study of Finance and Insurance, Osaka University. Available at http://www-csfi.sigmath.es.osaka-u.ac.jp/en/activity/technicalreport.php.
Derman, C. (1970). Finite state Markovian decision processes. New York: Academic Press.
Dragalin, V. P., Tartakovsky, A. G., & Veeravalli, V. V. (1999). Multihypothesis sequential probability ratio tests. I. Asymptotic optimality. IEEE Transactions on Information Theory, 45(7), 2448–2461.
Dragalin, V. P., Tartakovsky, A. G., & Veeravalli, V. V. (2000). Multihypothesis sequential probability ratio tests. II. Accurate asymptotic expansions for the expected sample size. IEEE Transactions on Information Theory, 46(4), 1366–1383.
Gut, A. (1988). Stopped random walks. Applied probability: a series of the applied probability trust (Vol. 5). New York: Springer.
Gut, A. (2005). Probability: a graduate course. Springer texts in statistics. New York: Springer.
Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M., & Weiss, D. (2004). Syndromic surveillance in public health practice. CDC.
Lai, T. L. (1975). On uniform integrability in renewal theory. Bulletin of the Institute of Mathematics, Academia Sinica, 3(1), 99–105.
Lai, T. L. (1977). Convergence rates and r-quick versions of the strong law for stationary mixing sequences. Annals of Probability, 5(5), 693–706.
Lai, T. L. (2000). Sequential multiple hypothesis testing and efficient fault detection-isolation in stochastic systems. IEEE Transactions on Information Theory, 46(2), 595–608.
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28(1–4), 47–65.
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41, 100–115.
Puterman, M. (1994). Markov decision processes—discrete stochastic dynamic programming. New York: Wiley.
Runggaldier, W. J. (1991). On the construction of ϵ-optimal strategies in partially observed MDPs. Annals of Operations Research, 28(1–4), 81–95.
Siegmund, D. (1985). Sequential analysis. Springer series in statistics. New York: Springer.
Tartakovsky, A. G., & Veeravalli, V. V. (2004). Change-point detection in multichannel and distributed systems. Statist. textbooks monogr. (Vol. 173, pp. 339–370). New York: Dekker.
Wald, A. (1947). Sequential analysis. New York: Wiley.
White, C. C. III (1991). A survey of solution techniques for the partially observed Markov decision process. Annals of Operations Research, 32(1–4), 215–230.
Woodroofe, M. (1982). Nonlinear renewal theory in sequential analysis. CBMS-NSF regional conference series in applied mathematics (Vol. 39). Philadelphia: SIAM.


Acknowledgements

The authors thank Alexander Tartakovsky for the illuminating discussions. We also thank an anonymous referee and the editors for the constructive remarks and suggestions which significantly improved our presentation. The research of Savas Dayanik was supported by the TÜBİTAK Research Grants 109M714 and 110M610. Warren B. Powell was supported in part by the Air Force Office of Scientific Research, contract FA9550-08-1-0195, and the National Science Foundation, contract CMMI-0856153. Kazutoshi Yamazaki was in part supported by Grant-in-Aid for Young Scientists (B)22710143, the Ministry of Education, Culture, Sports, Science and Technology, and Grant-in-Aid for Scientific Research (B)2271014, Japan Society for the Promotion of Science.

Author information

Authors and Affiliations

Departments of Industrial Engineering and Mathematics, Bilkent University, Bilkent, 06800, Ankara, Turkey
Savas Dayanik
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, 08544, USA
Warren B. Powell
Center for the Study of Finance and Insurance, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka, 560-8531, Japan
Kazutoshi Yamazaki


Corresponding author

Correspondence to Kazutoshi Yamazaki.

Additional information

The original online version of this article was revised

Appendix A: Proofs and auxiliary results

A.1 Proof of Remark 2.2

We will prove that

$$ 0 < \prod^n_{k=1}\frac{f_i(X_k)}{f_0(X_k)} < \infty\quad \mbox{for every $i \in \mathcal{M}$}, \tag{61}$$

which implies that ℙ-a.s. $0< \Pi^{(i)}_{n} ={\alpha^{(i)}_{n}}/{(\sum_{j\in\mathcal{M}_{0}} \alpha^{(j)}_{n})} =({\alpha^{(i)}_{n}/\prod^{n}_{k=1} f_{0}(X_{k})})/ ({\sum _{j\in\mathcal{M}_{0}}\alpha^{(j)}_{n}/\prod^{n}_{k=1} f_{0}(X_{k})})<1$ for every $i\in\mathcal{M}$, because ${\alpha^{(0)}_{n}}/{\prod^{n}_{k=1} f_{0}(X_{k})} =(1-p_{0})(1-p)^{n}>0$ and

To prove (61), let $E_i := \{x : 0 < f_i(x)/f_0(x) < \infty\}$ for every $i\in\mathcal{M}$. Then Assumption 2.1 implies that

Because $\mathbb{P}\{\theta\le1,\ \mu=j\}>0$ for every $j \in \mathcal{M}$ and $\mathbb{P}\{\theta>1\}>0$, we must have $\int_{E_{i}} f_{j}(x) m(\mathrm {d}x) =1$ for every $j \in\mathcal{M}_{0}$. Therefore, for every $i\in\mathcal{M}$, $\mathbb{P}\{0 <\prod^{n}_{k=1} \frac{f_{i}(X_{k})}{f_{0}(X_{k})} <\infty\} = \mathbb{P}\{0<\frac{f_{i}(X_{k})}{f_{0}(X_{k})}<\infty\ \forall 1\leq k \leq n \} $ equals

A.2 Proof of Lemma 2.3

Because $\mathbb{P}(F \cap\{\mu=j, \theta\leq\tau< \infty \}) = \sum_{n=0}^{\infty}\mathbb{P}(F \cap\{ \tau = n\} \cap\{\theta\leq n, \mu=j \})=$

the first equality follows. The proof of the second equality is similar.

A.3 Proof of Proposition 3.4

(i) Since $\tau_{A} = \tau_{A}^{(i)}$ on $\{d_A=i,\ \tau_A<\infty\}$, on this event we have $G_{i}^{(a)} (\tau_{A}) \leq\overline{a}_{i} \sum_{j\in \mathcal{M}_{0} \setminus\{i\}} \exp\{-\Lambda_{\tau_{A}^{(i)}}(i,j)\} =\overline{a}_{i} \exp\{-\Phi_{\tau_{A}^{(i)}}^{(i)} \} < \overline{a}_{i}A_{i}$ by (13), where the equality and the last inequality follow from (15) and (16), respectively. Hence, $R_{i}^{(a)} (\tau_{A},d_{A}) = \mathbb{E}_{i}[1_{\{d_{A}=i, \theta\leq\tau_{A} < \infty\}} G_{i}^{(a)} (\tau_{A})] \leq \overline{a}_{i} \,A_{i}$. Moreover, because $\exp\{-\Lambda_{\tau_{A}}(i,j) \} =\Pi^{(j)}_{\tau_{A}}/\Pi^{(i)}_{\tau_{A}} \leq (1-\Pi^{(i)}_{\tau_{A}})/\Pi^{(i)}_{\tau_{A}} < A_{i}$ on the same event, we have $R_{ji}(\tau_{A},d_{A}) = \nu_{i} \mathbb{E}_{i} [1_{\{d_{A}=i,\theta\leq \tau_{A} <\infty\}} \exp\{-\Lambda_{\tau_{A}}(i,j)\}] \leq\nu_{i} A_{i} \leq\nu_{i}\|A\|$.

(ii) Because $\upsilon_{B} = \upsilon_{B}^{(i)}$ on $\{d_B=i,\ \theta\le\upsilon_B<\infty\}$ and $\Lambda_{\upsilon_{B}^{(i)}} (i,j) > - \log B_{ij}$, Proposition 2.4 implies $R_{ji} (\upsilon_{B}, d_{B}) =\nu_{i} \mathbb{E}_{i} [ 1_{\{ d_{B} = i, \theta\leq\upsilon_{B} < \infty \}}\exp\{-\Lambda_{\upsilon_{B}} (i,j)\}] \leq\nu_{i} B_{ij}$.
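The two identities used in (i), $\exp\{-\Lambda_n(i,j)\}=\Pi^{(j)}_n/\Pi^{(i)}_n$ and $\sum_{j\in\mathcal{M}_0\setminus\{i\}}\exp\{-\Lambda_n(i,j)\}=\exp\{-\Phi^{(i)}_n\}=(1-\Pi^{(i)}_n)/\Pi^{(i)}_n$, can be checked numerically. The Python sketch below is illustrative only: it draws arbitrary strictly positive posterior vectors indexed by $\mathcal{M}_0=\{0,1,\ldots,M\}$ instead of computing them from observations, and the helper names are hypothetical. It confirms that $\Pi^{(i)}_n>1/(1+A_i)$ holds exactly when $\Phi^{(i)}_n>-\log A_i$, the equivalence that links the posterior description of $\tau^{(i)}_A$ used with (14) to the bound $\exp\{-\Phi^{(i)}_{\tau^{(i)}_A}\}<A_i$.

```python
import math
import random

def phi(post, i):
    """Phi^(i) = -log sum_{j != i} exp(-Lambda(i, j)), where Lambda(i, j) = log(Pi_i / Pi_j)."""
    neg = [-math.log(post[i] / post[j]) for j in range(len(post)) if j != i]
    top = max(neg)  # log-sum-exp shift to avoid overflow
    return -(top + math.log(sum(math.exp(x - top) for x in neg)))

def check_threshold_equivalence(trials=1000, n_states=4, seed=1):
    """Check exp(-Phi^(i)) = (1 - Pi_i) / Pi_i and {Pi_i > 1/(1+A)} = {Phi^(i) > -log A}."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.random() + 1e-6 for _ in range(n_states)]  # index 0 plays the role of "no change yet"
        total = sum(w)
        post = [x / total for x in w]
        i = rng.randrange(1, n_states)
        A = rng.uniform(0.01, 2.0)
        odds_against = (1.0 - post[i]) / post[i]
        assert math.isclose(math.exp(-phi(post, i)), odds_against, rel_tol=1e-9)
        assert (post[i] > 1.0 / (1.0 + A)) == (phi(post, i) > -math.log(A))
    return trials

print(check_threshold_equivalence())  # prints 1000 when every check passes
```

The log-sum-exp shift is only a numerical safeguard; in exact arithmetic $\Phi^{(i)}_n=\log\bigl(\Pi^{(i)}_n/(1-\Pi^{(i)}_n)\bigr)$.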

A.4 Proof of Proposition 3.6

For (i), because $\tau_{A}^{(i)}$ increases as $A_i\downarrow0$, it is enough to show that there is a subsequence whose limit exists and equals $\infty$, $\mathbb{P}_i$-a.s. Fix $n\ge1$. By (14), we have $\mathbb{P}_{i} \{ \tau_{A}^{(i)} \leq n\} =\mathbb{P}_{i} (\bigcup^{n}_{k=1} \{\Pi^{(i)}_{k} > 1/{(1+A_{i})} \}) \leq \sum^{n}_{k=1} \mathbb{P}_{i} \{\Pi^{(i)}_{k} > 1 /(1+A_{i}) \}$. Therefore, $\limsup_{A_{i} \downarrow0}\mathbb{P}_{i} \{ \tau_{A}^{(i)}\leq n \}\leq\sum_{k=1}^{n} \limsup_{A_{i} \downarrow0} \mathbb{P}_{i} \{ \Pi _{k}^{(i)} >1 /(1+A_{i}) \} \leq\sum_{k=1}^{n} \mathbb{P}_{i} \{ \Pi_{k}^{(i)} = 1\}$, which is zero by Remark 2.2. Namely, $\tau_{A}^{(i)}\rightarrow\infty$ in probability under $\mathbb{P}_i$ as $A_i\downarrow0$. Hence, there is a subsequence of $(A_i)$ along which $\tau_{A}^{(i)} \uparrow\infty$, $\mathbb{P}_i$-a.s., which proves (i).

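Because $\mathbb{P}\{d_A=j,\ \mu=i\}=\mathbb{P}\{d_A=j,\ \theta\le\tau_A<\infty,\ \mu=i\}+\mathbb{P}\{d_A=j,\ \tau_A<\theta,\ \mu=i\}\le R_{ij}(\tau_A,d_A)+R_{0j}(\tau_A,d_A)\le2\nu_jA_j$ by Proposition 3.4 (i), for every fixed $n\ge1$ we have

which goes to zero as $\|A\|\downarrow0$ by (i) and by Proposition 3.4. Namely, $\tau_A\to\infty$ in probability under $\mathbb{P}_i$ as $\|A\|\downarrow0$; therefore, there is a subsequence of $(\tau_A)_{A>0}$ that goes to $\infty$, $\mathbb{P}_i$-a.s., as $\|A\|\downarrow0$. Because $(\tau_A)_{A>0}$ is increasing $\mathbb{P}_i$-a.s. as $\|A\|\downarrow0$, its limit exists and equals $\infty$, $\mathbb{P}_i$-a.s., as well, and (ii) follows.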

Similarly, we have $\mathbb{P}_{i} \{ \upsilon_{B}^{(i)} \leq n \} \leq \sum_{k=1}^{n} \mathbb{P}_{i} \{ \Psi_{k}^{(i)} > - \log\overline{B}_{i}\}$. Because, for every fixed k≥1, $\{\Psi^{(i)}_{k} > - \log \overline{B}_{i} \} = \{\min_{j \in\mathcal{M}_{0}\setminus\{i\}}\Lambda_{k}(i,j) > -\log\overline{B}_{i} \} = \{\max_{j \in\mathcal{M}_{0}\setminus\{i\}} (\Pi^{(j)}_{k}/\Pi^{(i)}_{k}) < \overline{B}_{i} \}\subseteq\{\sum_{j \in\mathcal{M}_{0}\setminus\{i\}} (\Pi^{(j)}_{k}/\Pi^{(i)}_{k}) < M \overline{B}_{i} \} =\{(1-\Pi^{(i)}_{k})/\Pi^{(i)}_{k} < M \overline{B}_{i} \} =\{\Pi^{(i)}_{k} > 1/(1+M \overline{B}_{i}) \}$, we have $\limsup_{\overline{B}_{i} \downarrow0} \mathbb{P}_{i} \{ \upsilon_{B}^{(i)}\leq n\} \leq\sum_{k=1}^{n} \limsup_{\overline{B}_{i} \downarrow0} \mathbb {P}_{i} \{\Pi_{k}^{(i)} > 1/(1+M \overline{B}_{i}) \} \leq\sum_{k=1}^{n} \mathbb {P}_{i} \{\Pi_{k}^{(i)} = 1 \}=0$ by Remark 2.2. Therefore, as in the proof of (i), ℙ_i-a.s. $\upsilon_{B}^{(i)} \rightarrow\infty$ as $\overline{B}_{i}\downarrow0$, and (iii) follows. Furthermore, (iv) is immediate because, for every fixed n≥1, Proposition 3.4 (ii) implies $\mathbb{P}_{i} \{ \upsilon_{B}\leq n\} \leq\mathbb{P}_{i} \{ \upsilon_{B}^{(i)} \leq n \} + \frac{1}{\nu_{i}} \sum_{j \in\mathcal{M}\setminus\{i\}}(R_{0j}(\upsilon_{B},d_{B})+R_{ij}(\upsilon_{B},d_{B})) \leq\mathbb{P}_{i} \{\upsilon_{B}^{(i)} \leq n \} + \frac{1}{\nu_{i} }\sum_{j \in\mathcal{M}\setminus\{i\}} \nu_{j} (B_{j0}+B_{ji})\stackrel{\|B\| \downarrow 0}{\longrightarrow} 0$.
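The inclusion used for (iii) depends only on the fact that the $M$ ratios $\Pi^{(j)}_k/\Pi^{(i)}_k$, $j\in\mathcal{M}_0\setminus\{i\}$, add up to $(1-\Pi^{(i)}_k)/\Pi^{(i)}_k$. A small randomized check in Python is sketched below, with arbitrary posterior vectors and hypothetical helper names; it counts draws in which $\{\max_j \Pi^{(j)}_k/\Pi^{(i)}_k<\overline{B}_i\}$ occurs but $\{\Pi^{(i)}_k>1/(1+M\overline{B}_i)\}$ does not, and that count should always be zero.

```python
import random

def max_ratio_below(post, i, b):
    """Event {max_{j != i} Pi_j / Pi_i < b}."""
    return max(post[j] / post[i] for j in range(len(post)) if j != i) < b

def posterior_above(post, i, b):
    """Event {Pi_i > 1 / (1 + M b)}, where M = len(post) - 1 is the number of ratios."""
    m = len(post) - 1
    return post[i] > 1.0 / (1.0 + m * b)

def count_inclusion_violations(trials=2000, n_states=5, seed=2):
    """Count draws where the first event holds but the second does not (should be 0)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        w = [rng.random() + 1e-9 for _ in range(n_states)]  # index 0 plays the role of "no change yet"
        total = sum(w)
        post = [x / total for x in w]
        i = rng.randrange(1, n_states)
        b = rng.uniform(0.01, 3.0)
        if max_ratio_below(post, i, b) and not posterior_above(post, i, b):
            violations += 1
    return violations

print(count_inclusion_violations())
```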

A.5 Proof of Lemma 3.9

First, (16) implies that $\Phi^{(i)}_{\tau_{A}^{(i)}-1} / {(\tau_{A}^{(i)}-1)} \leq-{\log A_{i}} /{(\tau_{A}^{(i)}-1)}$ and $-{\log A_{i}} /{\tau_{A}^{(i)}} <\Phi^{(i)}_{\tau_{A}^{(i)}} / {\tau_{A}^{(i)}}$. By Proposition 3.8 (i) and Proposition 3.6 (i), we have $l(i) \leq\liminf_{A_{i}\downarrow0} [{(-\log A_{i})} / {(\tau_{A}^{(i)}-1)}]$ and $\limsup_{A_{i} \downarrow0} [{(-\log A_{i})} / {\tau_{A}^{(i)}}] \leq l(i)$, $\mathbb{P}_i$-a.s., which proves (i). Because $\tau _{A}^{(i)}-\theta \leq(\tau_{A}^{(i)}-\theta)_{+} \leq\tau_{A}^{(i)}$ and $\theta/(-\log A_{i}) \xrightarrow[A_{i} \downarrow0]{\mathbb{P}_{i}\text{-a.s.}} 0$, (ii) follows from (i). Similarly, (23) implies that $\Psi^{(i)}_{\upsilon_{B}^{(i)}-1} / {(\upsilon_{B}^{(i)}-1)} \leq -{\log \underline{B}_{i}}/{(\upsilon_{B}^{(i)}-1)}$ and $-{\log\overline{B}_{i}}/ {\upsilon_{B}^{(i)}} < \Psi^{(i)}_{\upsilon_{B}^{(i)}} / {\upsilon_{B}^{(i)}}$. By Proposition 3.8 (ii) and Proposition 3.6 (iii), we have $l(i) \leq \liminf_{\overline{B}_{i} \downarrow0}[ {(-\log\underline{B}_{i})} /{(\upsilon_{B}^{(i)}-1)}]$ and $\limsup_{\overline{B}_{i} \downarrow0}[{(-\log\overline{B}_{i})} / {\upsilon_{B}^{(i)}}] \leq l(i)$, $\mathbb{P}_i$-a.s. If we divide and multiply by $-\log B_{ij(i)}$ before taking the limits and use (27), then (iii) follows; (iv) follows from (iii) because $\upsilon_{B}^{(i)}-\theta\leq (\upsilon_{B}^{(i)}-\theta)_{+} \leq\upsilon_{B}^{(i)}$ and $\theta /(-\log B_{ij(i)}) \xrightarrow[\overline{B}_{i} \downarrow0]{} 0$, $\mathbb{P}_i$-a.s.
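Part (i) says that the threshold-crossing time grows like $(-\log A_i)/l(i)$. The mechanism can be seen in a toy model in which $\Phi^{(i)}_n$ is replaced by a Gaussian random walk with drift $l>0$ and unit variance. This is an assumption made purely for illustration: in the paper $\Phi^{(i)}_n$ is not a random walk, and only $\Phi^{(i)}_n/n\to l(i)$, $\mathbb{P}_i$-a.s., is used. In the Python sketch below, `level` plays the role of $-\log A_i$.

```python
import random

def first_passage_time(level, drift, sd=1.0, seed=0):
    """Smallest n >= 1 with S_n > level for a Gaussian random walk S_n = X_1 + ... + X_n.

    Toy stand-in for tau_A^(i): S_n replaces Phi_n^(i) and `level` replaces -log A_i.
    """
    if drift <= 0:
        raise ValueError("drift must be positive, otherwise the level may never be crossed")
    rng = random.Random(seed)
    s, n = 0.0, 0
    while True:
        n += 1
        s += rng.gauss(drift, sd)
        if s > level:
            return n

drift = 0.5
for level in (50.0, 500.0, 5000.0):
    tau = first_passage_time(level, drift)
    print(level, tau / level)  # tends to 1 / drift = 2.0 as the level grows
```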

A.6 Proof of Proposition 3.15

Fix $i \in\mathcal{M}$. (i) Lemma 3.9 (i) and Fatou’s lemma give the inequality

$$ \liminf_{A_i \downarrow0} \mathbb{E}_{i} \bigl[\bigl( { \tau^{(i)}_A}/ {(-\log A_i)} \bigr)^m \bigr]\geq1/l(i)^{m}. \tag{62}$$

Let us next define $T_{\delta}:= \inf\{ n \geq1: \inf_{k \geq n}(\Phi_{k}^{(i)}/k) > l(i) - \delta\}$ for every $0<\delta<l(i)$. Because, by hypothesis, $\Phi_{n}^{(i)}/n$ converges $m$-quickly ($m\le r$) to $l(i)$ as $n\uparrow\infty$ under $\mathbb{P}_i$, we have $\mathbb{E}_{i} [ ( T_{\delta})^{m} ] < \infty$ for every $0<\delta<l(i)$. On $\{ \tau_{A}^{(i)} > T_{\delta}\}\equiv\{\tau_{A}^{(i)}-1 \geq T_{\delta}\}$, we have ${\Phi^{(i)}_{\tau_{A}^{(i)}-1}}/{(\tau_{A}^{(i)}-1)} \geq l(i) - \delta \Longleftrightarrow\tau_{A}^{(i)} \leq {\Phi^{(i)}_{\tau_{A}^{(i)}-1}}/{(l(i) - \delta)} + 1$. Because $\Phi_{\tau_{A}^{(i)}-1}^{(i)} < - \log A_{i}$ by definition, $\tau _{A}^{(i)} < - {\log A_{i}}/{(l(i) - \delta)} + 1$ on $\{ \tau_{A}^{(i)} > T_{\delta}\}$, and we obtain $\tau_{A}^{(i)} =\tau_{A}^{(i)} 1_{\{\tau_{A}^{(i)} > T_{\delta}\}} + \tau_{A}^{(i)}1_{\{\tau_{A}^{(i)} \leq T_{\delta}\}} < -{\log A_{i}} / {(l(i) -\delta)} + 1 + T_{\delta}$. After dividing both sides by $(-\log A_i)$ and taking the $m$-norm of both sides, Minkowski’s inequality applied to the right-hand side gives

which is finite for every 0<δ<l(i). Then $\limsup_{A_{i}\downarrow0} \mathbb{E}_{i} [({\tau_{A}^{(i)}}/{(-\log A_{i}}))^{m} ]^{1/m}\leq 1/{(l(i)-\delta)}$ for 0<δ<l(i). Letting δ↓0 gives $\limsup_{A_{i} \downarrow0} \mathbb{E}_{i} [({\tau _{A}^{(i)}}/({-\log A_{i}}))^{m}]^{1/m} \leq1/l(i)$, which together with (62) proves (i).
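The pathwise bound $\tau^{(i)}_A < -\log A_i/(l(i)-\delta)+1+T_\delta$ can be checked on simulated paths. The Python sketch below again uses the toy assumption that $\Phi^{(i)}_n$ is a Gaussian random walk with drift $l$ (an illustration, not the model of the paper). Because only a finite horizon $N$ can be simulated, it replaces $T_\delta$ by one plus the last $k\le N$ with $\Phi_k/k\le l-\delta$; whenever $\tau^{(i)}_A\le N$, the argument in the text applies verbatim to this truncated $T_\delta$, so the check should never fail. All function names are hypothetical.

```python
import random

def simulate_path(n_steps, drift, sd=1.0, seed=0):
    """Phi_1, ..., Phi_N for a Gaussian random walk (toy stand-in for Phi_n^(i))."""
    rng = random.Random(seed)
    path, s = [], 0.0
    for _ in range(n_steps):
        s += rng.gauss(drift, sd)
        path.append(s)
    return path

def first_exceedance(path, level):
    """tau = min{n >= 1 : Phi_n > level}, or None if the level is not exceeded within the horizon."""
    for n, value in enumerate(path, start=1):
        if value > level:
            return n
    return None

def truncated_T(path, rate):
    """1 + (last k <= N with Phi_k / k <= rate), so Phi_k / k > rate for every k in [T, N]."""
    last = 0
    for k, value in enumerate(path, start=1):
        if value / k <= rate:
            last = k
    return last + 1

def bound_holds(level, l_rate, delta, n_steps=5000, seed=0):
    """Check tau < level / (l_rate - delta) + 1 + T_delta on one simulated path."""
    path = simulate_path(n_steps, l_rate, seed=seed)
    tau = first_exceedance(path, level)
    if tau is None:
        raise ValueError("horizon too short for this level")
    T = truncated_T(path, l_rate - delta)
    return tau < level / (l_rate - delta) + 1 + T

print(all(bound_holds(100.0, 0.5, 0.1, seed=s) for s in range(20)))
```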

(ii) Lemma 3.9 (iii) and Fatou’s lemma imply that

$$\liminf_{\overline{B}_i \downarrow0} \mathbb{E}_{i} \bigl[ \bigl( {\upsilon^{(i)}_B}/ {(-\log B_{ij(i)})} \bigr)^m \bigr]\geq 1/l(i)^{m}. \tag{63}$$

Let us define $T_{\delta}:= \inf\{ n \geq1: \inf_{k \geq n}({\Psi_{k}^{(i)}} / k) > l(i) - \delta\}$ for every $0<\delta<l(i)$. Because, by hypothesis, $\Psi_{n}^{(i)} / n$ converges $m$-quickly ($m\le r$) to $l(i)$ as $n\uparrow\infty$ under $\mathbb{P}_i$, we have $\mathbb{E}_{i} [ ( T_{\delta})^{m} ] < \infty$ for every $0<\delta<l(i)$. An argument similar to that in part (i) shows that $\overline{\upsilon}^{(i)}_{B} < -\log \underline{B}_{i}/(l(i)-\delta)+1+T_{\delta}$. After dividing both sides by $(-\log\underline{B}_{i})$ and taking the $m$-norm of both sides, an application of Minkowski’s inequality to the right-hand side gives

which is finite for every $0<\delta<l(i)$. Then $\limsup_{\overline{B}_{i} \downarrow0} \mathbb{E}_{i}[({\overline{\upsilon}_{{B}}^{(i)}}/({-\log\underline{B}_{i}}))^{m}]^{1/m} \leq1/{(l(i)-\delta)}$ for $0<\delta<l(i)$. Letting $\delta\downarrow0$ gives $\limsup_{\overline{B}_{i} \downarrow0}\mathbb{E}_{i}[({\overline{\upsilon}_{{B}}^{(i)}}/({-\log\underline{B}_{i}}))^{m}]^{1/m} \leq1/l(i)$. After raising both sides to the power $m$, the inequality ${\upsilon}_{{B}}^{(i)} \leq \overline{\upsilon}_{{B}}^{(i)}$ implies $\limsup_{\overline{B}_{i}\downarrow0} \mathbb{E}_{i} [({{\upsilon}_{{B}}^{(i)}}/({-\log \underline{B}_{i}}))^{m} ] \leq\limsup_{\overline{B}_{i} \downarrow0}\mathbb{E}_{i} [({\overline{\upsilon}_{{B}}^{(i)}}/({-\log\underline {B}_{i}}))^{m}] \leq1/l(i)^{m}$. Dividing and multiplying the left-hand side by $(-\log B_{ij(i)})^m$ before taking the limit gives $\limsup_{\overline{B}_{i} \downarrow0} \mathbb{E}_{i}[({{\upsilon}_{{B}}^{(i)}}/({-\log B_{ij(i)}}))^{m} ] \leq1/l(i)^{m}$, thanks to (27). The last inequality and (63) prove (ii).

A.7 Proof of Remark 3.16

Because Condition 3.14 (i) implies (ii), it is enough to prove the claim for (i). Fix $i \in\mathcal{M}$. For every fixed $\delta>0$ and $n>(2\log M)/\delta$, we have $\Phi_{n}^{(i)}/n > l(i) - \delta \Longleftrightarrow\sum_{j \in\mathcal{M}_{0} \setminus\{i\}}e^{-\Lambda_{n}(i,j)} < e^{-n(l(i)-\delta)} $

Let $T_{\delta}(i) := \inf\{ n \geq1: \inf_{k \geq n}({\Phi_{k}^{(i)}} / k) > l(i) - \delta\}$ and $T_{\delta}(i,j):=\inf\{n\ge1: \inf_{k\ge n}(\Lambda_k(i,j)/k)>l(i,j)-\delta\}$ for $j \in\mathcal{M}_{0} \setminus\{i\}$ and $\delta>0$. Then $T_{\delta}(i) \leq( \max_{j \in\mathcal{M}_{0}\setminus \{i\}} T_{\delta/2}(i,j) ) \lor(2 \log M)/\delta$, and

for every $\delta>0$, because $r\text{-quick-}\liminf_{n\uparrow\infty}(\Lambda_n(i,j)/n)\ge l(i,j)$ under $\mathbb{P}_i$ for every $j \in\mathcal{M}_{0} \setminus\{i\}$. Therefore, $r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} {\Phi_{n}^{(i)}}/n \geq l(i)$ under $\mathbb{P}_i$ for every $i\in\mathcal{M}$.
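The comparison between $\Phi^{(i)}_n=-\log\sum_{j\in\mathcal{M}_0\setminus\{i\}}e^{-\Lambda_n(i,j)}$ and the individual $\Lambda_n(i,j)$ rests on the elementary log-sum-exp bounds $\min_j\Lambda_n(i,j)-\log M\le\Phi^{(i)}_n\le\min_j\Lambda_n(i,j)$, the sum having $M$ terms. Hence, if $\Lambda_n(i,j)/n>l(i,j)-\delta/2\ge l(i)-\delta/2$ for every $j$ and $n>(2\log M)/\delta$, then $\Phi^{(i)}_n/n>l(i)-\delta$, which appears to be the source of the term $(2\log M)/\delta$ in the bound on $T_\delta(i)$. A Python sketch with arbitrary inputs and hypothetical helper names:

```python
import math
import random

def phi_from_lambdas(lambdas):
    """Phi = -log(sum_j exp(-Lambda_j)), computed stably by factoring out exp(-min_j Lambda_j)."""
    lo = min(lambdas)
    return lo - math.log(sum(math.exp(-(lam - lo)) for lam in lambdas))

def check_sandwich(trials=1000, m=4, seed=3):
    """Check min_j Lambda_j - log M <= Phi <= min_j Lambda_j on random inputs (M = m terms)."""
    rng = random.Random(seed)
    for _ in range(trials):
        lambdas = [rng.uniform(-50.0, 50.0) for _ in range(m)]
        phi = phi_from_lambdas(lambdas)
        assert min(lambdas) - math.log(m) - 1e-9 <= phi <= min(lambdas) + 1e-9
    return True

def rate_bound_example(rates, delta):
    """With Lambda_n(i,j)/n > l(i,j) - delta/2 and n > 2 log M / delta, check Phi_n/n > min l - delta."""
    m = len(rates)
    n = math.floor(2 * math.log(m) / delta) + 1             # smallest integer n > (2 log M) / delta
    lambdas = [n * (rate - 0.4 * delta) for rate in rates]  # each Lambda/n exceeds rate - delta/2
    return phi_from_lambdas(lambdas) / n > min(rates) - delta

print(check_sandwich(), rate_bound_example([0.2, 0.5, 0.9], 0.1))
```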

A.8 Proof of Lemma 3.17

The proof requires the following three lemmas.

Lemma A.1

For every $i \in\mathcal{M}$, $j \in\mathcal{M}_{0} \setminus\{i\}$, $L>0$ and $c>1$, we have

Proof

By Proposition 2.4, $R_{ji}(\tau,d) =\nu_{i} \mathbb{E}_{i} [ 1_{\{d= i , \theta\leq\tau< \infty\}}e^{-\Lambda_{\tau}(i,j)}] = \mathbb{E}[ 1_{\{\mu= i , \theta\leq \tau<\infty, d=i \}}\cdot e^{-\Lambda_{\tau}(i,j)}]$, and

for every fixed $B>0$. Hence, we have $\mathbb{P}\{\mu= i, \tau-\theta> L \} \geq\mathbb{P}\{\mu= i, \theta+ L < \tau< \infty\}\geq \mathbb{P}\{ \mu= i, \theta\leq\tau< \infty, d=i \} - e^{B}R_{ji}(\tau,d) - \mathbb{P}\{ \mu= i, \sup_{n \leq\theta+L}\Lambda_{n}(i,j) > B \} = \nu_{i} - \nu_{i} R_{i}^{(1)} (\tau,d) -e^{B} R_{ji}(\tau,d) - \mathbb{P}\{ \mu= i, \sup_{n \leq\theta+L}\Lambda_{n}(i,j) > B \}$. Dividing by $\nu_i=\mathbb{P}\{\mu=i\}$ gives $\mathbb{P}_{i} \{\tau- \theta> L \} \geq1 - R_{i}^{(1)}(\tau,d) - \frac{e^{B}}{\nu_{i}} R_{ji}(\tau,d) - \mathbb{P}_{i} \{\sup_{n\leq \theta+ L}\Lambda_{n}(i,j) > B\}$. By setting $B=cL\,l(i,j)$ and taking the infimum on both sides,

Now the lemma holds because $(\tau,d) \in\Delta(\overline{R})$ implies that $R_{i}^{(1)}(\tau,d) \leq\frac{\sum_{j \in\mathcal{M}_{0}\setminus \{i\}} \overline{R}_{ji}}{\nu_{i}}$ and $R_{ji}(\tau,d) \leq \overline{R}_{ji}$. □

Lemma A.2

For every $i \in\mathcal {M}$ and $c>1$, we have $\mathbb{P}_{i} \{ \sup_{n \leq\theta+ L} \Lambda_{n}(i,j(i)) >c L l(i)\} \stackrel{L \uparrow\infty}{\longrightarrow} 0$.

Proof

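Since $\Lambda_n(i,j(i))/n$ converges $\mathbb{P}_i$-a.s. to $l(i)$ as $n\uparrow\infty$ by Assumption 3.7, there is a $\mathbb{P}_i$-a.s. finite $K_c$ such that $\sup_{n > K_{c}}\frac{\Lambda_{n} (i,j(i))_{+}}{n} = \sup_{n > K_{c}} \frac {\Lambda_{n}(i,j(i))}{n} < ( 1+ (c-1)/2 ) l(i)$, $\mathbb{P}_i$-a.s. Moreover, because $\sup_{n\le\theta+L}\Lambda_n(i,j(i))\le\sup_{n\le K_c}\Lambda_n(i,j(i))_+ + (\theta+L)\sup_{n>K_c}\Lambda_n(i,j(i))_+/n$, we have

$$\mathbb{P}_i\Bigl\{\sup_{n\le\theta+L}\Lambda_n(i,j(i))>cL\,l(i)\Bigr\}\le\mathbb{P}_i(F_L), \tag{64}$$

where $F_{L} := \{ \frac{\sup_{n \leq K_{c}}\Lambda_{n}(i,j(i))_{+}}{L} + \frac{\theta+ L}{L} \sup_{n >K_{c}} \frac{\Lambda_{n}(i,j(i))_{+}}{n} > c l(i) \}$. Because both $K_c$ and $\theta$ are $\mathbb{P}_i$-a.s. finite,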

by Remark 2.2. Thus, $1_{F_{L}} \rightarrow 0$ as L↑∞ ℙ_i-a.s., implying $\mathbb {P}_{i} (F_{L})\stackrel{L \uparrow\infty}{\longrightarrow} 0$, and the claim holds by (64). □

Lemma A.3

For every $0<\delta<1$, $i\in\mathcal{M}$ and $j(i)$, $\liminf_{ \overline{R}_{i} \downarrow0}\inf _{(\tau,d) \in\Delta (\overline{R})} \mathbb{P}_{i} \{ \tau- \theta\geq\delta\frac {|\log( {\overline{R}_{j(i)i}}/ {\nu_{i}} )|}{l(i)} \} \geq1$.

Proof

Fix $0 < \overline{R}_{j(i)i} < \nu_{i}$. Then $- \log({\overline{R}_{j(i)i}}/ {\nu_{i}} ) = |\log({\overline{R}_{j(i)i}}/ {\nu_{i}} )|$. If in Lemma A.1 we set $j=j(i)$ and $L :=L(\overline{R}_{j(i)i}) = \delta{|\log({\overline{R}_{j(i)i}}/ {\nu_{i}} )|}/ {l(i)}$, and choose $c>1$ such that $0<c\delta<1$, then we have

which is $1-o(1)$ as $\overline{R}_{i} \downarrow0$, because $0<1-c\delta<1$ and because $\overline{R}_{i}\downarrow 0$ implies $L\uparrow\infty$, so that Lemma A.2 applies. □
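The role of the condition $0<c\delta<1$ can be made explicit. With $j=j(i)$ one has $l(i,j(i))=l(i)$, so $B=cL\,l(i)=c\delta\,|\log(\overline{R}_{j(i)i}/\nu_i)|$, and, for $0<\overline{R}_{j(i)i}<\nu_i$, the term $(e^{B}/\nu_i)R_{j(i)i}(\tau,d)$ from the proof of Lemma A.1, with $R_{j(i)i}(\tau,d)$ bounded by $\overline{R}_{j(i)i}$, becomes

$$ \frac{e^{B}}{\nu_i}\,\overline{R}_{j(i)i} = \frac{\overline{R}_{j(i)i}}{\nu_i}\Bigl(\frac{\overline{R}_{j(i)i}}{\nu_i}\Bigr)^{-c\delta} = \Bigl(\frac{\overline{R}_{j(i)i}}{\nu_i}\Bigr)^{1-c\delta} \xrightarrow[\overline{R}_{j(i)i}\downarrow0]{} 0, $$

because $1-c\delta>0$.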

Proof of Lemma 3.17

Fix a set of positive constants $\overline{R}$, $0<\delta<1$ and $(\tau,d)\in\Delta(\overline{R})$. By Markov's inequality,

By taking limits on both sides,

which is greater than or equal to δ by Lemma A.3. The claim is proved because 0<δ<1 is arbitrary. □

A.9 Proof of Proposition 3.18

Assume, on the contrary, that $\liminf_{c \downarrow0}\inf_{(\tau,d) \in\Delta} R^{(c,a,m)}_{i}(\tau,d)/{g_{i}^{(c)}(A_{i}(c))} < 1$. Then there exist a decreasing sequence $(c_n)_{n\ge1}$ with $c_n\downarrow0$ and corresponding strategies $(\tau_{c_{n}}^{*},d_{c_{n}}^{*})$ such that

$$\lim_{n \uparrow\infty} \frac{R^{(c_n,a,m)}_{i}(\tau_{c_n}^*,d_{c_n}^*)}{g_i^{(c_n)} (A_i(c_n))} <1. \tag{65}$$

By (34), $\inf_{(\tau,d) \in\Delta}R^{(c_{n},a,m)}_{i}(\tau,d) \leq R^{(c_{n},a,m)}_{i}(\tau_{A(c_{n})},d_{A(c_{n})}) \stackrel{n \uparrow \infty}{\longrightarrow} 0$. Therefore, $\| R(\tau_{c_{n}}^{*},d_{c_{n}}^{*}) \|\stackrel{n \uparrow\infty}{\longrightarrow} 0$, where $R(\tau_{c_{n}}^{*},d_{c_{n}}^{*}) =(R_{ji}(\tau_{c_{n}}^{*},d_{c_{n}}^{*}) )_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}$ are the false alarm and misdiagnosis probabilities corresponding to the strategy $(\tau_{c_{n}}^{*},d_{c_{n}}^{*})$. Consequently, Lemma 3.17 applies and we have $D_{i}^{(m)} (\tau_{c_{n}}^{*}) \geq\inf_{(\tau,d) \in \Delta(R(\tau_{c_{n}}^{*},d_{c_{n}}^{*}))} D_{i}^{(m)} (\tau) \geq ( |\log(R_{j(i)i}(\tau_{c_{n}}^{*},d_{c_{n}}^{*})/\nu_{i} )|/l(i))^{m} (1 + o(1))$, where o(1)↓0 as n↑∞. Finally, $R_{i}^{(c_{n},a,m)} (\tau_{c_{n}}^{*},d_{c_{n}}^{*}) \geq c_{n}D_{i}^{(m)} (\tau_{c_{n}}^{*}) + a_{j(i) i}R_{j(i)i}(\tau_{c_{n}}^{*},d_{c_{n}}^{*})/\nu_{i} \geq$

where the last inequality follows from (33). However, this contradicts (65), and the proof is complete.

A.10 Proof of Lemma 4.3

By Lemma 4.2, it is sufficient to show that $(\vert (1/n) {\sum_{l=1}^{n} h_{ij}(X_{l})} \vert ^{r})_{n \geq1}$ is uniformly integrable under $\mathbb{P}_i$. The running sum $\sum_{l=1}^{n} h_{ij}(X_{l})$ is a random walk under both $\mathbb{P}^{(\infty)}$ and $\mathbb{P}_{i}^{(0)}$, and, because (47) holds, the above sequence is uniformly integrable under both measures; see Gut (1988, Theorem 4.1). Hence, it is also uniformly integrable under $\mathbb{P}_i$, because $\mathbb{E}_{i} Z \leq\mathbb {E}^{(\infty)} Z +\mathbb{E}_{i}^{(0)} Z$ for every nonnegative random variable $Z$.

A.11 Proof of Lemma 4.5

We first prove the following.

Lemma A.4

Let $(\xi_n)_{n\ge1}$ be a positive stochastic process and $T$ an a.s. finite random time defined on the same probability space $(\Omega, \mathcal {E}, \mathbb{P})$. Given $T$, the random variables $(\xi_n)_{n\ge1}$ are conditionally independent, and $(\xi_n)_{1\le n\le T-1}$ and $(\xi_n)_{n\ge T}$ have common conditional probability distributions $\mathbb{P}_\infty$ and $\mathbb{P}_0$, respectively; the expectations with respect to these are denoted by $\mathbb{E}_{\infty}$ and $\mathbb{E}_{0}$. Suppose that $\mathbb{E}_{\infty}[\log \xi_{1}]$ and $\mathbb{E}_{0}[\log\xi_{1}]$ exist, and define

$$ \begin{aligned} &\lambda:=\mathbb{E}_0[\log\xi_1],\qquad \alpha:= \mathbb{E}_{\infty} [\xi_1],\qquad \beta:= \mathbb{E}_0 [ \xi_1],\qquad \gamma:= \max\{\alpha,\beta\},\\ &\Phi_n:= \frac{1}{n} \log\prod^n_{k=1}\xi_k, \qquad \psi_n := \log \Biggl(c+ \sum^{n}_{l=1} e^{l\Phi_l}\Biggr), \qquad\eta_n := \frac{\psi_n}{n}, \quad n\geq1 \end{aligned} \tag{66}$$

for some fixed constant $c>0$. Then the following results hold under $\mathbb{P}$:

(i) We have $\eta_{n} \xrightarrow{n \uparrow\infty} \lambda_{+}$ a.s.
(ii) If $\lambda<0$, then the process $\psi_n$ converges as $n\uparrow\infty$ to a finite limit a.s.
(iii) If $\gamma<\infty$, then $(|\eta_n|^r)_{n\ge1}$ is uniformly integrable for every $r\ge1$.
(iv) If $r\ge1$ and $\max\{\mathbb{E}_{\infty} [|\log\xi_{1}|^{r} ],\mathbb{E}_{0} [|\log\xi_{1}|^{r} ]\}<\infty$, then $(|\Phi_n|^q)_{n\ge1}$ is uniformly integrable for every $0\le q\le r$.
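The quantities in (66) can be evaluated without overflow, because $e^{l\Phi_l}=\prod_{k\le l}\xi_k=e^{\zeta_l}$ with $\zeta_l:=\sum_{k\le l}\log\xi_k$, so $\psi_n$ obeys the recursion $\psi_n=\log(e^{\psi_{n-1}}+e^{\zeta_n})$ with $\psi_0=\log c$. The Python sketch below applies this recursion to a simulated toy sequence in which $\log\xi_k$ is Gaussian with unit variance, with mean $0.1$ before a fixed change time $T$ and mean $\lambda$ afterwards; these distributional choices are assumptions for illustration only, and the function names are hypothetical. It displays the behavior in (i) and (ii): $\eta_n$ is close to $\lambda_+$ for large $n$, and $\psi_n$ stops moving when $\lambda<0$.

```python
import math
import random

def log_add_exp(a, b):
    """log(exp(a) + exp(b)) without overflow."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def psi_path(log_xis, c=1.0):
    """psi_n = log(c + sum_{l <= n} prod_{k <= l} xi_k) for n = 1, ..., len(log_xis)."""
    psi, zeta, out = math.log(c), 0.0, []
    for lx in log_xis:
        zeta += lx                     # zeta_n = n * Phi_n = sum of log xi_k
        psi = log_add_exp(psi, zeta)   # psi_n = log(exp(psi_{n-1}) + exp(zeta_n))
        out.append(psi)
    return out

def simulate_log_xis(n, T, lam_before, lam_after, seed=0):
    """log xi_k ~ N(lam_before, 1) for k < T and N(lam_after, 1) for k >= T (toy model)."""
    rng = random.Random(seed)
    return [rng.gauss(lam_before if k < T else lam_after, 1.0) for k in range(1, n + 1)]

n = 20000
for lam in (0.3, -0.3):
    psi = psi_path(simulate_log_xis(n, 50, 0.1, lam))
    print(lam, psi[-1] / n, psi[-1] - psi[n // 2])  # eta_n versus max(lam, 0); late increment of psi
```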

Proof of Lemma A.4

Let $\zeta_{n} := \log( \prod_{k=1}^{n} \xi_{k} ) = \sum_{k=1}^{n}\log\xi_{k}$. We first prove (i)–(ii) by considering the cases $-\infty<\lambda<0$, $0\le\lambda<\infty$, $\lambda=-\infty$ and $\lambda=\infty$ separately.

Case 1: $-\infty<\lambda<0$. First, because $\eta_{n} \geq (1/n)\log e^{\Phi_{1}} =\Phi_{1}/n = (\log\xi_{1})/n$, we have $\liminf_{n\uparrow\infty}\eta_n\ge0$ a.s. It is, therefore, enough to prove that its limit superior is at most zero.

By the SLLN and because $T$ is a.s. finite, the exceptional set $\Omega_0:=\{\omega\in\Omega:\zeta_n(\omega)/n\not\to\lambda\text{ or }T(\omega)=\infty\}$ has zero measure. Fix $\omega\in\Omega\setminus\Omega_0$ and choose a sufficiently small $\varepsilon>0$ such that $\lambda+\varepsilon<0$. Then we can choose $N_\varepsilon(\omega)\ge T(\omega)$ such that, for every $k\ge N_\varepsilon(\omega)$, $\frac{\zeta_{k}(\omega)-\zeta_{T(\omega)-1}(\omega)}{k-(T(\omega)-1)}< \lambda+ \varepsilon< 0$. For every $n\ge N_\varepsilon(\omega)$,

$$e^{\psi_n(\omega)} = c+ \sum_{k=1}^ne^{\zeta_k(\omega)} = c+ \sum_{k=1}^{N_\varepsilon(\omega)-1}e^{\zeta_k (\omega)} + \sum_{k=N_\varepsilon(\omega)}^{n}e^{\zeta_k (\omega)}. \tag{67}$$

Because $\lambda+\varepsilon<0$,

which equals $e^{\zeta_{T(\omega)-1}(\omega) + (-T(\omega)+1)(\lambda + \varepsilon)} / {(1-e^{\lambda+\varepsilon})}$, and hence (67) is bounded by $B(\omega) :=c+\sum_{k=1}^{N_{\varepsilon}(\omega)-1} e^{\zeta_{k} (\omega)} +({e^{\zeta_{T(\omega)-1}(\omega) + (-T(\omega)+1) (\lambda+\varepsilon)}} )/({1-e^{\lambda+\varepsilon}}) < \infty$, independently of $n$. Therefore, $\limsup_{n\uparrow\infty}\eta_n(\omega)=\limsup_{n\uparrow\infty}\psi_n(\omega)/n\le\limsup_{n\uparrow\infty}\log B(\omega)/n=0$, as desired. Because $\mathbb{P}(\Omega\setminus\Omega_0)=1$, we have $\limsup_{n\uparrow\infty}\eta_n\le0$ a.s. Finally, because $\psi_n(\omega)\le\log B(\omega)$ for every $n\ge N_\varepsilon(\omega)$ for a.e. $\omega$, and because $\psi_n$ is increasing in $n$, $\psi_n$ converges to a finite limit a.s.

Case 2: $0\le\lambda<\infty$. First note that the SLLN and the finiteness of $T$ imply $\eta_{n} \geq\frac{1}{n} \log(\xi_{1} \cdots\xi_{n}) = \frac{1}{n}\sum_{k=1}^{T-1} \log\xi_{k} + \frac{n-T+1}{n} \cdot \frac{1}{n-T+1} \sum_{k=T}^{n} \log\xi_{k} \xrightarrow[n \uparrow\infty]{\text{a.s.}} \lambda$; therefore, $\liminf_{n\uparrow\infty}\eta_n\ge\lambda$ a.s. It is now sufficient to show that $\limsup_{n\uparrow\infty}\eta_n\le\lambda$ a.s.

Fix any realization $\omega\in\Omega\setminus\Omega_0$ and $\varepsilon>0$, where $\Omega_0$ is defined in Case 1. Let $N_\varepsilon(\omega)\ge T(\omega)$ be such that

$$k \geq N_\varepsilon(\omega)\quad\Longrightarrow\quad \frac {\zeta_k(\omega)-\zeta_{T(\omega)-1}(\omega)}{k-(T(\omega)-1)} <\lambda+ \varepsilon. \tag{68}$$

Then for every $n\ge N_\varepsilon(\omega)$,

where the last inequality holds because by (68)

Moreover, for $n \geq\tilde{\tau}_{\varepsilon}(\omega) :=N_{\varepsilon}(\omega) \vee[\log (c+\sum_{k=1}^{N_{\varepsilon}(\omega)-1} e^{\zeta_{k}(\omega)} )/\lambda]$, we have $c e^{-n \lambda}+\sum_{k=1}^{N_{\varepsilon}(\omega)-1} e^{\zeta_{k}(\omega)-n \lambda}\leq1$; thus, letting $A(\omega):=\zeta_{T(\omega)-1}(\omega)+(-T(\omega)+1)(\lambda+\varepsilon)$ gives

Because $\varepsilon>0$ is arbitrary and $\mathbb{P}(\Omega\setminus\Omega_0)=1$, we have $\limsup_{n\uparrow\infty}\eta_n\le\lambda$ a.s.

Case 3: $\lambda=-\infty$. For $m\in(0,1)$ and $n\ge1$, define $\xi_{n}^{(m)} := m \vee\xi_{n} \geq m$. Because $-\infty=\mathbb{E}_{0}[ \log\xi_{1} ]=\mathbb{E}_{0} [ (\log\xi_{1})_{+} ]-\mathbb{E}_{0} [ (\log \xi_{1})_{-} ]$, we have $\mathbb{E}_{0}[ (\log\xi_{1})_{+} ] < \infty$ and $\mathbb{E}_{0} [ (\log \xi_{1})_{-} ] = \infty$. Consequently, $\mathbb{E}_{0} [ (\log \xi_{1}^{(m)})_{+} ] = \mathbb{E}_{0} [(\log m \vee\log\xi_{1})_{+} ] =\mathbb{E}_{0} [(\log \xi_{1})_{+}] < \infty$, and $\mathbb{E}_{0} [ (\log\xi_{1}^{(m)})_{-} ] =\mathbb{E}_{0}[ (\log m \vee\log\xi_{1})_{-} ] = \mathbb{E}_{0}[ ( \log m )_{-} \wedge(\log\xi_{1})_{-} ] \leq (\log m)_{-}< \infty$. Hence, $\lambda^{(m)} := \mathbb{E}_{0} [ \log\xi_{1}^{(m)} ]$ is well-defined and

$$\lambda^{(m)} = \mathbb{E}_0 \bigl[ \bigl(\log \xi_1^{(m)}\bigr)_+ \bigr]-\mathbb{E}_0 \bigl[ \bigl(\log\xi_1^{(m)}\bigr)_- \bigr] = \mathbb{E}_0 \bigl[(\log\xi_1)_+ \bigr]-\mathbb{E}_0 \bigl[ \bigl(\log \xi_1^{(m)}\bigr)_- \bigr] \tag{69}$$

for every $m\in(0,1)$. Because $0 \leq(\log\xi_{1}^{(m)})_{-} = (\log m)_{-} \wedge(\log\xi_{1})_{-} \uparrow(\log\xi_{1})_{-}$ as $m\downarrow0$, the monotone convergence theorem implies that $\lim_{m \downarrow 0} \mathbb{E}_{0} [ (\log\xi_{1}^{(m)})_{-} ] = \mathbb{E}_{0} [(\log\xi_{1})_{-} ] = \infty$. Since $m\mapsto\mathbb{E}_{0}[(\log \xi_{1}^{(m)})_{-}]$ is nonincreasing, there exists $m_0\in(0,1)$ such that, for every $m\in(0,m_0]$, $\mathbb{E}_{0} [(\log \xi_{1}^{(m)})_{-} ] > \mathbb{E}_{0} [ (\log\xi_{1})_{+} ]$ and hence $\lambda^{(m)}\in(-\infty,0)$ by (69). Now define $\psi_{n}^{(m)} := \log(c + \xi_{1}^{(m)} +\xi_{1}^{(m)}\xi_{2}^{(m)} + \cdots+ \xi_{1}^{(m)} \cdots \xi_{n}^{(m)} )$ and $\eta_{n}^{(m)} :=\frac{1}{n} \psi_{n}^{(m)}$ for every $n\ge1$ and $m\in(0,1)$. By Case 1, $\lim_{n \uparrow \infty} \psi_{n}^{(m)} < \infty$ and $\lim_{n \uparrow\infty}\eta_{n}^{(m)} = 0$ a.s. for every $m\in(0,m_0]$.

Because $n\mapsto\psi_n$ is increasing, $\lim_{n\uparrow\infty}\psi_n$ exists; moreover, $\log c \leq\psi_{n} \leq\psi_{n}^{(m_{0})}$ for all $n\ge1$, because $\xi_{n} \leq\xi_{n}^{(m_0)}$. Hence, $\log c\leq\lim_{n \uparrow\infty}\psi_{n} \leq\lim_{n \uparrow \infty}\psi_{n}^{(m_{0})}<\infty$. Therefore, $\lim_{n\uparrow\infty}\psi_n$ is a finite random variable and $\eta_{n} = \psi_{n}/n\stackrel{n \uparrow\infty}{\longrightarrow} 0$ a.s.

Case 4: $\lambda=\infty$. For every $M>1$ and $n\ge1$, define $\xi_{n}^{(M)} := M \wedge\xi_{n} \leq M$. Since $\infty=\mathbb{E}_{0}[ \log\xi_{1} ] = \mathbb{E}_{0} [ (\log\xi_{1})_{+}]-\mathbb{E}_{0}[ (\log\xi_{1})_{-} ]$, we have $\mathbb{E}_{0} [ (\log\xi_{1})_{+}] = \infty$ and $\mathbb{E}_{0} [ (\log\xi_{1})_{-} ] <\infty$. Then $\mathbb{E}_{0} [(\log\xi_{1}^{(M)})_{-} ] = \mathbb{E}_{0} [ (\log M\wedge\log \xi_{1})_{-}] = \mathbb{E}_{0} [ (\log\xi_{1})_{-}] < \infty$, and $\mathbb {E}_{0} [(\log \xi_{1}^{(M)})_{+} ] = \mathbb{E}_{0} [ (\log M \wedge\log\xi_{1})_{+} ]= \mathbb{E}_{0} [ ( \log M )_{+} \wedge(\log\xi_{1})_{+} ] =\mathbb{E}_{0} [ ( \log M ) \wedge(\log\xi_{1})_{+} ]\le\log M<\infty$. Hence, $\lambda^{(M)} := \mathbb{E}_{0} [ \log \xi_{1}^{(M)} ]$ is well-defined and

$$\lambda^{(M)} = \mathbb{E}_0 \bigl[ \bigl(\log \xi_1^{(M)}\bigr)_+ \bigr]-\mathbb{E}_0 \bigl[ \bigl(\log\xi_1^{(M)}\bigr)_- \bigr] = \mathbb{E}_0 \bigl[\bigl(\log\xi_1^{(M)}\bigr)_+ \bigr]-\mathbb{E}_0\bigl[ (\log\xi_1)_- \bigr] \tag{70}$$

for every $M>1$. Because $0 \leq( \log\xi_{1}^{(M)})_{+}= (\log M) \wedge(\log\xi_{1})_{+} \uparrow(\log\xi_{1})_{+}$ as $M\uparrow\infty$, the monotone convergence theorem implies $\lim_{M\uparrow\infty} \mathbb{E}_{0} [ ( \log\xi_{1}^{(M)})_{+} ] = \mathbb {E}_{0} [ (\log\xi_{1} )_{+} ] = \infty$.

Therefore, there exists $M_0>1$ such that, for every $M\ge M_0$, $\mathbb{E}_{0} [ ( \log\xi_{1}^{(M)})_{+} ] > \mathbb{E}_{0}[ ( \log\xi_{1})_{-} ]$, and therefore $\lambda^{(M)}\in(0,\infty)$ by (70). Now define $\psi_{n}^{(M)} := \log( c + \xi_{1}^{(M)} +\xi_{1}^{(M)}\xi_{2}^{(M)} + \cdots+ \xi_{1}^{(M)} \cdots \xi_{n}^{(M)} )$ and $\eta_{n}^{(M)} :=\psi_{n}^{(M)}/ n$ for every $n\ge1$ and $M>1$. By Case 2, $\lim_{n \uparrow\infty}\eta_{n}^{(M)} = \lambda^{(M)}$, $\mathbb{P}$-a.s., for $M\ge M_0$. Because $\xi_{n} \geq M \wedge\xi_{n} = \xi_{n}^{(M)}$, we have $\psi_{n} \geq \psi_{n}^{(M)}$ and $\eta_{n} \geq\eta_{n}^{(M)}$. Therefore, $\liminf_{n\uparrow\infty} \eta_{n} \geq\lim_{n \uparrow\infty} \eta_{n}^{(M)} =\lambda^{(M)}$ for every $M\ge M_0$.

Finally, by (70) and monotone convergence, $\liminf_{n\uparrow\infty}\eta_n\ge\lim_{M\uparrow\infty}\lambda^{(M)} = \lim_{M \uparrow\infty} (\mathbb{E}_{0} [ (\log\xi_{1}^{(M)})_{+}] - \mathbb{E}_{0} [ (\log\xi_{1}^{(M)})_{-} ] )= \mathbb{E}_{0} [ (\log\xi_{1})_{+} ] - \mathbb{E}_{0} [ (\log \xi_{1})_{-} ] = \mathbb{E}_{0} [ \log\xi_{1} ] =\lambda= \infty$, which implies $\lim_{n\uparrow\infty}\eta_n=\lambda=\lambda_+$. This completes the proof of (i)–(ii).

We now prove (iii) using the following sufficient condition for uniform integrability.

Lemma A.5

(Woodroofe 1982)

Let $(X_n)_{n\ge1}$ be a stochastic process and $r\ge1$ be fixed. Then $(|X_n|^r)_{n\ge1}$ is uniformly integrable if $\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{|X_{n}|> x\}\mathrm {d}x <\infty$.

Fix $r\ge1$. We will show that $\int^{\infty}_{0} x^{r-1} \sup _{n\geq 1} \mathbb{P}\{|\eta_{n}|^{r} > x\} \mathrm {d}x =\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{|\psi_{n}| > nx^{1/r}\}\mathrm {d}x <\infty$, which implies the uniform integrability of $(|\eta_n|^r)_{n\ge1}$ by Lemma A.5. Note that $\sup_{n\ge1}\mathbb{P}\{|\psi_n|>nx^{1/r}\}\le\sup_{n\ge1}\mathbb{P}\{\psi_n<-nx^{1/r}\}+\sup_{n\ge1}\mathbb{P}\{\psi_n>nx^{1/r}\}$. Because $\psi_n\ge\log c$, we have $\mathbb{P}\{\psi_n<-nx^{1/r}\}\le\mathbb{P}\{\psi_n<-x^{1/r}\}=0$ for every $x\ge|\log c|^r$ and $n\ge1$, and hence $\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{\psi_{n} < - nx^{1/r}\}\mathrm {d}x\leq\int^{|\log c|^{r}}_{0} x^{r-1} \mathrm {d}x < \infty$.

On the other hand, because $\xi_1,\xi_2,\ldots$ are conditionally independent given $T$ and $\mathbb{E}[\xi_{k} \mid T] = \alpha1_{\{k< T\}} +\beta1_{\{k\geq T\}} \leq\max\{\alpha,\beta\} =\gamma<\infty$, Markov's inequality gives

Therefore, $\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{\psi _{n} > nx^{1/r}\} \mathrm {d}x \leq\int^{\gamma^{r}}_{0} x^{r-1} \mathrm {d}x + (c+\frac{1+\gamma}{\gamma} )\int^{\infty}_{\gamma^{r}} x^{r-1}e^{-(x^{1/r}-\gamma)} \mathrm {d}x =\frac{1}{r}\gamma^{r^{2}} + (c+\frac{1+\gamma}{\gamma} ) r e^{\gamma} \int^{\infty}_{\gamma}y^{r^{2}-1} e^{-y} \mathrm {d}y < \infty$, which completes the proof of (iii).
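The last equality uses the substitution $y=x^{1/r}$, under which $x^{r-1}\,\mathrm{d}x=r\,y^{r^2-1}\,\mathrm{d}y$. For an integer $r$, the remaining integral is an upper incomplete gamma function with the closed form $\int_\gamma^\infty y^{r^2-1}e^{-y}\,\mathrm{d}y=(r^2-1)!\,e^{-\gamma}\sum_{s=0}^{r^2-1}\gamma^s/s!$, so that $r e^{\gamma}\int_\gamma^\infty y^{r^2-1}e^{-y}\,\mathrm{d}y=r\,(r^2-1)!\sum_{s=0}^{r^2-1}\gamma^s/s!$. The Python sketch below (standard library only; the function names are hypothetical) compares a composite Simpson approximation of the original $x$-integral, truncated at $x=(\gamma+60)^r$, with this closed form.

```python
import math

def lhs_integral(r, gamma, n=200000):
    """Composite Simpson rule for the integral of x^(r-1) exp(-(x^(1/r) - gamma)) over [gamma^r, (gamma+60)^r].

    The neglected tail beyond (gamma + 60)^r is of order exp(-60) times a polynomial factor.
    """
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    a, b = gamma ** r, (gamma + 60.0) ** r
    h = (b - a) / n

    def f(x):
        return x ** (r - 1) * math.exp(-(x ** (1.0 / r) - gamma))

    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def closed_form(r, gamma):
    """r e^gamma int_gamma^inf y^(r^2-1) e^(-y) dy = r (r^2-1)! sum_{s < r^2} gamma^s / s! (integer r >= 1)."""
    k = r * r
    return r * math.factorial(k - 1) * sum(gamma ** s / math.factorial(s) for s in range(k))

print(lhs_integral(2, 1.0), closed_form(2, 1.0))  # both approximately 32
```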

Finally, for the proof of (iv), note that $\Phi_{n} = \frac{1}{n}\sum^{n}_{k=1}\log\xi_{k}$ and $|\Phi_{n}|^{r} \leq(\frac{1}{n} \sum^{n}_{k=1}|\log \xi_{k}| )^{r} \leq\frac{1}{n} \sum^{n}_{k=1} |\log\xi_{k}|^{r}$ by Jensen's inequality, since $r\ge1$ and $x\mapsto x^r$ is convex on $\mathbb{R}_+$. Since $\mathbb{E}(|\log \xi_{k}|^{r} \mid T ) \leq\max\{\mathbb{E}_{\infty}[|\log\xi _{1}|^{r}], \mathbb{E}_{0}[|\log \xi_{1}|^{r}]\}$ for every $k\ge1$,

for every $n\ge1$, and $\sup_{n\geq1} \mathbb{E}|\Phi_{n}|^{r}<\infty $. Moreover, for every $\varepsilon>0$, there exists some $\delta>0$ such that $\max\{\mathbb{P}_\infty(A),\mathbb{P}_0(A)\}<\delta$ implies that $\max\{\mathbb{E}_{\infty} [|\log\xi_{1}|^{r} 1_{A}],\mathbb{E}_{0} [|\log\xi_{1}|^{r} 1_{A} ] \} <\varepsilon$, and

for every $n\ge1$, which, together with the boundedness of $(\mathbb{E}|\Phi _{n}|^{r})_{n\geq 1}$, implies that $(|\Phi_n|^r)_{n\ge1}$ is uniformly integrable. This completes the proof of (iv) and the lemma. □

Now we are ready to prove Lemma 4.5. Note that for every $j \in\mathcal{M}$ and $n\ge2$,

if in (66) we set $\xi_{l} := (1-p) \frac {f_{0}(X_{l})}{f_{j}(X_{l})}$ and $c := \frac{p_{0} +(1-p_{0})p}{(1-p_{0})p} > 0$.

Given $\mu=i$ and $\theta=t$ for any fixed $i\in\mathcal{M}$ and $t\ge1$, the random variables $\xi_t,\xi_{t+1},\ldots$ are conditionally i.i.d. with a common distribution that does not depend on $t$; thus, the change time $\theta$ plays the role of the random time $T$ in Lemma A.4. Then, by Lemma A.4 (i) and (38), we have $L_{n}^{(j)}/n \xrightarrow[n\uparrow\infty]{\mathbb{P}_{i}\text{-a.s.}} (\mathbb{E}_{i}^{(0)} [ \log((1-p) \frac{f_{0}(X_{1})}{f_{j}(X_{1})}) ] )_{+} = [ q(i,j)-q(i,0) - \varrho ]_{+}$, which proves (ii) immediately if $j\in\mathcal{M}\setminus\{i\}$, and proves (i) and (iv) by Lemma A.4 (ii) if $j=i$, after noticing that $\mathbb{E}_{i}^{(0)} [ \log((1-p) \frac {f_{0}(X_{1})}{f_{i}(X_{1})} ) ] = q(i,i) -q(i,0) - \varrho= -q(i,0) -\varrho< 0$ by (37). Similarly, if $j\in\Gamma_i$, (v) holds by Lemma A.4 (ii), since $\mathbb{E}_{i}^{(0)} [ \log((1-p) \frac {f_{0}(X_{1})}{f_{j}(X_{1})} ) ] = q(i,j) -q(i,0) - \varrho<0$ by the definition of $\Gamma_i$. By (40), the SLLN and (ii),

which equals [q(i,j)−q(i,0)−ϱ]₋ and proves (iii). For the proof of (vi), note that by Minkowski’s inequality

Because $(|\log[(1-p_0)p]/n|^r)_{n\ge1}$ is bounded and, according to Lemma A.4 (iii), the process $(|\psi_n/n|^r)_{n\ge1}$ is uniformly integrable under $\mathbb{P}_i$ for every $r\ge1$ when (48) is satisfied, we have (vi). Finally, for the proof of (vii), (40) implies

Because (48) holds, $(|L^{(j)}_{n}/n|)_{n\geq1}$ is uniformly integrable by (vi). If we set $\xi_k:=[1/(1-p)][f_j(X_k)/f_0(X_k)]$ for every $k\ge1$ in (66), then (49) and Lemma A.4 (iv) imply that $(|\frac{1}{n}\log\prod^{n}_{k=1}(\frac{1}{1-p}\frac{f_{j}(X_{k})}{f_{0}(X_{k})})|^{r})_{n\geq1}$ is uniformly integrable. Therefore, $(|K^{(j)}_{n}/n|^{r})_{n\geq1}$ is uniformly integrable, and the proof of (vii) is complete.

A.12 Proof of Lemma 5.1

It is sufficient to prove that $H_{i}^{(a)} (A_{i})$ in (52) converges in distribution as $A_i\downarrow0$ to $W_i-\log a_{j(i)i}$ under $\mathbb{P}_i$ and that $H_{i}^{(a)} (A_{i})$ is bounded from below by some constant.

Because $G_{i}^{(a)}(n) = \sum_{j \in\mathcal{M}_{0} \setminus\{i\}} a_{ji}e^{-\Lambda_{n}(i,j)} = (\sum_{j \in\mathcal{M}_{0} \setminus\{i\}}e^{-\Lambda_{n}(i,j)} ) M_{i}(n)$ for every n≥1, where $M_{i}(n) := {(\sum_{j \in\mathcal{M}_{0} \setminus\{i\}} a_{ji}e^{-\Lambda_{n}(i,j)})} / {(\sum_{j \in\mathcal{M}_{0} \setminus\{i\}}e^{-\Lambda_{n}(i,j)})}$, we have by (15)

$$- \log G_i^{(a)}\bigl(\tau^{(i)}_A\bigr) + \log A_i = W_i(A_i) - \log M_i\bigl(\tau_A^{(i)}\bigr). \tag{71}$$

Because $j(i)$ is unique, $\Lambda_{n}(i,j)/n \xrightarrow[n \uparrow\infty]{\mathbb{P}_{i}\text{-a.s.}} l(i,j)$ for every $j \in\mathcal{M}_{0}\setminus\{i\}$, and $l(i)<l(i,j)$ for every $j \in\mathcal{M}_{0}\setminus\{i,j(i)\}$, we have

$$M_i(n) =\frac{a_{j(i) i} + \sum_{j \in\mathcal{M}_0\setminus\{i, j(i)\}} a_{ji} \exp(n [\frac {\Lambda_{n}(i,j(i)) }{n} - \frac{\Lambda_{n}(i,j) }{n}] )}{1+\sum_{j \in\mathcal{M}_0 \setminus\{i, j(i)\}} \exp (n [\frac{\Lambda_{n}(i,j(i)) }{n} - \frac {\Lambda_{n}(i,j) }{n} ] )}\quad \xrightarrow[n\uparrow\infty]{\mathbb{P}_i\text{-a.s.}}\quad a_{j(i)i}.$$

Because $\tau^{(i)}_{A} \xrightarrow[A_{i} \downarrow0]{\mathbb{P}_{i}\text{-a.s.}} \infty$ by Proposition 3.6, this implies $- \log(M_{i} (\tau_{A}^{(i)})) \xrightarrow[A_{i} \downarrow0]{\mathbb{P}_{i}\text{-a.s.}} - \log a_{j(i) i}$. By Proposition 3.4, $\mathbb{P}_{i} \{d_{A} =i, \theta\leq\tau_{A} < \infty\} = 1 - \frac{1}{\nu_{i}} \sum_{j \in \mathcal{M}_{0} \setminus\{i\}} R_{ji} (\tau_{A},d_{A})$ converges to 1 as $A_i\downarrow0$; i.e., $1_{ \{d_{A}=i,\theta\leq\tau_{A} < \infty\}}$ converges in probability under $\mathbb{P}_i$ to 1. These facts, together with the assumption on the convergence of $W_i(A_i)$, give the convergence of $H_{i}^{(a)} (A_{i})$. Finally, because (71) is bounded from below by $-\log\overline{a}_{i}$ and $-\log1_{ \{d_{A}=i, \theta\leq \tau_{A} < \infty\}} \geq0$, $H_{i}^{(a)} (A_{i})$ is bounded from below by $-\log\overline{a}_{i}$.
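$M_i(n)$ is a weighted average of the costs $a_{ji}$ with weights proportional to $e^{-\Lambda_n(i,j)}$, so it always lies between the smallest and the largest of the $a_{ji}$, and its weight concentrates on the index with the smallest growth rate $l(i,j)$, namely $j(i)$. A deterministic Python illustration follows, assuming for simplicity that $\Lambda_n(i,j)=n\,l(i,j)$ exactly (a hypothetical choice; the proof only uses $\Lambda_n(i,j)/n\to l(i,j)$, $\mathbb{P}_i$-a.s.), with made-up rates and costs.

```python
import math

def weighted_cost(n, rates, costs):
    """M_i(n) = sum_j a_ji exp(-Lambda_n(i,j)) / sum_j exp(-Lambda_n(i,j)), with Lambda_n(i,j) = n * l(i,j)."""
    lambdas = [n * rate for rate in rates]
    lo = min(lambdas)                                   # shift for numerical stability
    weights = [math.exp(-(lam - lo)) for lam in lambdas]
    return sum(a * w for a, w in zip(costs, weights)) / sum(weights)

rates = [0.2, 0.5, 0.9]   # l(i, j); the smallest rate plays the role of l(i) = l(i, j(i))
costs = [3.0, 1.0, 2.0]   # a_ji; here a_{j(i) i} = 3.0
for n in (1, 10, 100, 200):
    print(n, weighted_cost(n, rates, costs))  # approaches 3.0 as n grows
```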

A.13 Proof of Lemma 5.6

By Remarks 2.5 and 5.4, it is sufficient to show that $\xi_n(i,j(i))$ converges $\mathbb{P}_{i}^{(t)}$-a.s. to a finite random variable. First, because $j(i)$ is unique, $j(i)\in\Gamma_i$ by Remark 4.10 (iii). Consequently, $\epsilon_n(i,j(i))$ converges $\mathbb{P}_{i}^{(t)}$-a.s. to a finite random variable by Lemma 4.5 (iv) and (v). Second, $\eta_n(i,j(i))$ converges $\mathbb{P}_{i}^{(t)}$-a.s. to zero by Propositions 3.7 and 3.8. Finally, $\lim_{n \uparrow\infty }\sum_{l=1}^{n \wedge(\theta-1)} \log ( {f_{i}(X_{l})}/{f_{j(i)}(X_{l})} )$ exists $\mathbb{P}^{(t)}_{i}$-a.s. and equals the $\mathbb{P}^{(t)}_{i}$-a.s. finite random variable $\sum_{l=1}^{\theta-1} \log({f_{i}(X_{l})}/{f_{j(i)}(X_{l})})$.

A.14 Proof of Lemma 5.8

Let $g:\mathbb{R} \to \mathbb{R}$ be continuous and bounded. By the bounded convergence theorem and Lemma 5.7,

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Dayanik, S., Powell, W.B. & Yamazaki, K. Asymptotically optimal Bayesian sequential change detection and identification rules. Ann Oper Res 208, 337–370 (2013). https://doi.org/10.1007/s10479-012-1121-6


Published: 12 April 2012
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10479-012-1121-6




Change history

07 October 2021
