1 Introduction

Sequential change detection and identification refers to the joint problem of sequential change point detection (CPD) and sequential multiple hypothesis testing (SMHT), where one needs to detect, based on a sequence of observations, a sudden and unobservable change as early as possible and identify its cause as accurately as possible. In a Bayesian setup, this problem boils down to optimally solving the trade-off between the expected detection delay and the false alarm and misdiagnosis costs.

Sequential analysis methods such as Wald’s (1947) sequential probability ratio test and Page’s (1954) cumulative sum were developed for quality control problems, in which a production process may suddenly get out of control at some unknown and unobservable time and one needs to detect the failure time as soon as possible. However, it is more realistic to assume that a production process consists of multiple processing units, each of which is prone to failure, and one needs to detect the earliest failure time and accurately identify the failed component.

In economics and biosurveillance, elevated concerns about financial crises and bioterrorism have increased the importance of early warning systems (see Bussiere and Fratzscher 2006 and Heffernan et al. 2004); structural changes need to be detected in time series such as the S&P 500 index for better financial risk management and over-the-counter medication sales for early signs of a possible disease outbreak. There are a number of potential causes of structural changes, and one needs to identify the cause of the change in order to take the most appropriate countermeasures. Although most existing structural change detection methods employ retrospective tests on historical data, online tests are more appropriate in these settings because time-inhomogeneous data arrive sequentially, and the changes must be identified as soon as possible after they occur.

In this paper, we focus on two online Bayesian formulations and propose two computationally efficient and asymptotically optimal strategies inspired by the separate asymptotic analyses of SMHT (Baum and Veeravalli 1994; Dragalin et al. 1999; Dragalin et al. 2000) and CPD (Tartakovsky and Veeravalli 2004).

We suppose that a system starts in regime 0 and suddenly switches at some unknown and unobservable disorder time θ to one of finitely many regimes \(\mu\in\mathcal{M}:= \{1,\ldots,M \}\). One observes a sequence of random variables X=(Xn)n≥1 which are, conditionally on θ and μ, independent and distributed according to some cumulative distribution function F0 before time θ and Fμ at and after time θ; namely,

$$\underbrace{X_1, \ldots, X_{\theta-1}}_{F_0\mbox{\scriptsize -}\mathrm {distributed}},\underbrace{X_\theta, X_{\theta+1} \ldots}_{F_\mu\mbox{\scriptsize -}\mathrm{distributed}}.$$

The objective is to detect the change as quickly as possible and, at the same time, to identify the new regime μ as accurately as possible. More precisely, we want to find a strategy (τ,d), consisting of a detection time τ and a diagnosis rule d, that minimizes the expected detection delay time and the false alarm and misdiagnosis probabilities. This paper studies the following formulations:

(i) In the minimum Bayes risk formulation, one minimizes a Bayes risk which is the sum of the expected detection delay time and the false alarm and misdiagnosis probabilities.

(ii) In the Bayesian fixed-error-probability formulation, one minimizes the expected detection delay time subject to some small upper bounds on the false alarm and misdiagnosis probabilities.

The precise formulations are given as Problems 1 and 2, respectively, in Sect. 2. A majority of practitioners prefer working with the Bayesian fixed-error-probability formulation because the hard constraints on error probabilities are easier to set up and understand than the costs of detection delay, false alarm, and misdiagnosis in the minimum Bayes risk formulation. The Bayesian fixed-error-probability formulation is often solved by means of its Lagrange relaxation, which turns out to be a minimum Bayes risk problem where the costs are the Lagrange multipliers (or shadow prices) of the false alarm and misdiagnosis constraints. We discuss in more detail the correspondence between the optimal solutions of these two formulations in Sect. 2. Another reason for solving the minimum Bayes risk formulation is that it allows the expert opinions about the risks to be naturally included in the solution. Therefore, we study both formulations in this paper.

Finding the optimal solutions under both formulations requires intensive computations. For example, the minimum Bayes risk formulation reduces to an optimal stopping problem as shown by Dayanik et al. (2008) (see also Lovejoy (1991), White (1991), Borkar (1991), and Runggaldier (1991) for general solution methods available for the partially observed Markov decision processes and Burnetas and Katehakis (1997) for adaptive control for Markov decision processes), and the optimal strategy is to stop as soon as the posterior probability process \(\Pi=(\Pi_{n}^{(0)},\ldots,\Pi_{n}^{(M)})_{n \geq0}\), where

$$\Pi_n^{(i)} :=\mathbb{P}\{\mbox {The system is in regime } i\mbox{ at time } n \mid X_1,\ldots,X_n \} \quad \mbox {for every } i\in\mathcal{M}_0 \mbox{ and } n \geq0,$$

with \(\mathcal{M}_{0} := \mathcal{M}\cup\{0\}\), enters some suitable region of the M-dimensional probability simplex.

Figure 1(a) illustrates the optimal stopping regions for a typical problem with M=2. The process Π starts in the lower-left corner, which corresponds to the “no change” state or regime 0. As observations are made, it progresses through the light-colored region, where raising a change-alarm is suboptimal. If it enters the shaded region in the top corner, then declaring a regime switch from 0 to 1 is optimal. If it enters the shaded region in the lower-right corner, then declaring a regime switch from 0 to 2 is optimal. The first hitting time to one of those shaded regions and the corresponding estimate of the new regime minimize the costs for the minimum Bayes risk formulation.

Fig. 1 (a) The union of the shaded regions forms the optimal stopping region. (b) The dotted triangles are the stopping regions of one of the strategies we propose in this paper

These shaded regions can in principle be found by dynamic programming methods; see, for example, Derman (1970), Puterman (1994) and Bertsekas (2005). However, those methods are generally computationally intensive due to the curse of dimensionality. The state space increases exponentially in the number of regimes, and finding an optimal strategy by using the classical dynamic programming methods tends to be practically impossible in higher dimensions.

Our goal is to obtain a practical solution that is both near-optimal and computationally feasible. We propose two simple and asymptotically optimal strategies by approximating the optimal stopping regions with simpler shapes. In particular, our strategy for the minimum Bayes risk formulation raises a change alarm and estimates the new regime when the posterior probability of at least one of the change types exceeds some predetermined threshold for the first time. In Fig. 1(b), the stopping regions of this strategy correspond to the union of the triangles in the two corners. Those triangular regions determine a stopping and selection strategy, and hence the problem is simplified to designing the triangular regions to minimize the risks.

We give an asymptotic analysis of the change detection and identification problem, of which SMHT and CPD are special cases. The asymptotic optimality of our strategies can be proved using nonlinear renewal theory after casting the log-likelihood-ratio (LLR) processes

$$\Lambda_n(i,j) := \log\frac{\Pi_n^{(i)}}{\Pi_n^{(j)}}, \quad n\geq1, i \in \mathcal{M}, j \in\mathcal{M}_0\setminus\{i\}, $$
(1)

as the sum of suitable random walks and some slowly-changing stochastic processes. We show that the r-quick convergence of Lai (1977) for an appropriate subset of the LLR processes in (1) is a sufficient condition for asymptotic optimality. We also pursue higher-order asymptotic approximations for the minimum Bayes risk formulation as inspired by Baum and Veeravalli (1994)’s work for SMHT.

The remainder of the paper is organized as follows. We formulate the Bayesian sequential change detection and identification problem in Sect. 2. In Sect. 3, we propose two sequential change detection and identification strategies and obtain sufficient conditions for their asymptotic optimality in terms of the LLR processes. In Sect. 4 we study certain convergence properties of the LLR processes that are required to implement the asymptotically optimal strategies. In Sect. 5, we obtain higher-order asymptotic approximations for the minimum Bayes risk formulation using nonlinear renewal theory. Section 6 concludes with numerical examples. The proofs and some auxiliary results are presented in the appendix.

2 Problem formulations

Consider a probability space \((\Omega, \mathcal{F}, \mathbb{P})\) hosting a stochastic process X=(Xn)n≥1 taking values in some measurable space \((E,\mathcal {E})\). Let θ:Ω↦{0,1,…} and \(\mu: \Omega\mapsto\mathcal{M}:= \{ 1,\dots, M \}\) be independent random variables defined on the same probability space with the probability distributions

$$\mathbb{P}\{ \theta= t \} = \left\{ \everymath{\displaystyle}\begin{array}{l@{\quad}l}p_0, &\mbox {if } t=0\cr\noalign{\vspace{3pt}}(1-p_0) (1-p)^{t-1}p, & \mbox {if } t\geq1\end{array}\right\}\quad \mbox {and} \quad\nu_i = \mathbb{P}\{\mu= i \} > 0, \quad i \in\mathcal{M}$$

for some known constants p0∈[0,1), p∈(0,1), and positive constants \(\nu= (\nu_{i})_{i \in\mathcal{M}}\). The random variable θ has an exponential tail with

$$\varrho:= -\lim_{t \uparrow\infty} \frac{\log\mathbb{P}\{ \theta \geq t+1 \}}{t} =\bigl| \log(1-p) \bigr|. $$
(2)
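As a quick numerical sanity check of (2), the following sketch evaluates the zero-modified geometric prior of θ and its tail exactly. The function names `theta_pmf`, `theta_tail`, and `tail_rate`, as well as the parameter values, are illustrative assumptions and do not come from the paper.

```python
import math


def theta_pmf(t: int, p0: float, p: float) -> float:
    """P{theta = t} for the zero-modified geometric prior of Sect. 2."""
    if t == 0:
        return p0
    return (1.0 - p0) * (1.0 - p) ** (t - 1) * p


def theta_tail(t: int, p0: float, p: float) -> float:
    """P{theta >= t + 1} = (1 - p0) * (1 - p)^t for t >= 0."""
    return (1.0 - p0) * (1.0 - p) ** t


def tail_rate(t: int, p0: float, p: float) -> float:
    """Finite-t version of the limit in (2): -log P{theta >= t + 1} / t."""
    return -math.log(theta_tail(t, p0, p)) / t


# Hypothetical parameter values, chosen only for illustration.
p0, p = 0.1, 0.05
rho = abs(math.log(1.0 - p))          # the limit claimed in (2)
rate_at_5000 = tail_rate(5000, p0, p)  # approaches rho as t grows
total_mass = sum(theta_pmf(t, p0, p) for t in range(0, 201)) + theta_tail(200, p0, p)
```

The finite-t rate differs from ϱ by −log(1−p0)/t, which vanishes as t grows, so p0 does not affect the exponential tail rate.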

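Given μ=i and θ=t, the random variables X1,X2,… are conditionally independent, and \((X_{n})_{1\leq n\leq t-1}\) and \((X_{n})_{n\geq t}\) have common conditional probability density functions f0 and fi, respectively, with respect to some σ-finite measure m on \((E,\mathcal {E})\); namely,

$$\mathbb{P}\{X_1 \in E_1,\ldots,X_n \in E_n \mid \theta=t, \mu=i\} = \prod_{k=1}^{(t-1)\wedge n} \int_{E_k} f_0(x) m(\mathrm {d}x) \prod_{l=t}^{n} \int_{E_l} f_i(x) m(\mathrm {d}x)$$

for every \(i \in\mathcal{M}\), t≥1, n≥1, and \((E_{1}\times\cdots \times E_{n}) \in\mathcal{E}^{n}\), where an empty product equals one. The following assumptions remove certain trivial cases; see Remark 4.10 below.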

Assumption 2.1

For every \(i\in\mathcal{M}_{0}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\), 0<fi(X1)/fj(X1)<∞ a.s., and Fi and Fj are distinguishable; \(\int_{\{x\in E: f_{i}(x)\neq f_{j}(x)\}} f_{i}(x)m(\mathrm {d}x) > 0\).

Let \(\mathbb{F}= (\mathcal{F}_{n})_{n \geq0}\) denote the filtration generated by X; namely, \(\mathcal{F}_{0} = \{\varnothing, \Omega\}\) and \(\mathcal {F}_{n} =\sigma(X_{1},\dots,X_{n})\) for every n≥1. A sequential change detection and identification rule (τ,d) is a pair consisting of an \(\mathbb{F}\)-stopping time τ (in short, \(\tau\in \mathbb{F}\)) and a random variable \(d: \Omega\mapsto\mathcal{M}\) that is measurable with respect to the observation history \(\mathcal{F}_{\tau}\) up to the stopping time τ (namely, \(d \in\mathcal{F}_{\tau}\)). Let

$$\Delta:= \bigl\{ (\tau,d) : \tau\in\mathbb{F}\mbox{ and } d \in \mathcal{F}_\tau \mbox{ is an } \mathcal{M}\mbox{-valued random variable} \bigr\}$$

be the collection of all sequential change detection and identification rules. The objective is to find a strategy (τ,d) that solves optimally the trade-off between the mth moment

$$D^{(m)}(\tau) := \mathbb{E}\bigl[(\tau- \theta)_+^m \bigr], $$
(3)

of the detection delay time (τ−θ)+ for some m≥1 and the false alarm and misdiagnosis probabilities

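$$R_{0i}(\tau,d) := \mathbb{P}\{ d=i, \tau< \theta\}, \quad i \in\mathcal{M}, $$
(4)

$$R_{ji}(\tau,d) := \mathbb{P}\{ d=i, \mu=j, \theta\leq\tau< \infty\}, \quad i \in\mathcal{M}, j \in\mathcal{M}\setminus\{i\}. $$
(5)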

Here and for the rest of the paper, \(x_{+}:=\max(x,0)\) and \(x_{-}:=\max(-x,0)\) for any x∈ℝ.

We formulate the optimal trade-offs between (3)–(5) as in the following two related problems:

Problem 1

(Minimum Bayes risk formulation)

For fixed m≥1, c>0, and strictly positive constants \(a=(a_{ji})_{i \in\mathcal{M}, j \in\mathcal{M}_{0} \setminus\{i\}}\), calculate the minimum Bayes risk \(\inf_{(\tau,d)\in\Delta}R^{(c,a,m)}(\tau,d)\), where

$$R^{(c,a,m)}(\tau,d) := c\, D^{(m)}(\tau) + \sum _{i \in\mathcal{M}} \sum_{j\in \mathcal{M}_0 \setminus\{i\}}a_{ji} R_{ji}(\tau,d) $$
(6)

is the expected sum of all risks arising from the detection delay time, false alarm and misdiagnosis, and find a strategy \((\tau^{*},d^{*})\in\Delta\) which attains the minimum Bayes risk, if such a strategy exists.

Problem 2

(Bayesian fixed-error-probability formulation)

For fixed positive constants m≥1 and \(\overline{R} = (\overline{R}_{ji})_{i \in\mathcal{M}, j \in \mathcal{M}_{0}\setminus\{i\}}\), calculate the smallest mth moment \(\inf_{(\tau,d) \in\Delta(\overline{R})} D^{(m)}(\tau)\) of detection delay time among all decision rules in

$$\Delta(\overline{R}) := \bigl\{ (\tau,d) \in\Delta: R_{ji}(\tau,d)\leq\overline{R}_{ji}, i \in\mathcal{M}, \, j \in\mathcal{M}_0\setminus \{i\} \bigr\}$$

with the same predetermined upper bounds on false alarm and misdiagnosis probabilities, and find a strategy \((\tau^{*},d^{*})\in \Delta(\overline{R})\) which attains the minimum, if such a strategy exists.

Problem 1 can in principle be solved optimally by stochastic dynamic programming. A standard way to solve Problem 2 optimally is by working through its Lagrange relaxation, which turns out to be an instance of Problem 1, where aji serves as the Lagrange multiplier of the constraint \(R_{ji}(\tau,d)\leq\overline{R}_{ji}\) for every \(i \in\mathcal{M}\) and \(j \in \mathcal{M}_{0}\setminus\{i\}\). Indeed, if for some a, a decision rule \((\tau^{*},d^{*})\in\Delta\) attains the minimum Bayes risk \(\inf_{(\tau,d)\in\Delta}R^{(c,a,m)}(\tau,d)\) and if \(R_{ji}(\tau^{*},d^{*}) = \overline{R}_{ji}\) for every \(i \in\mathcal{M}, \, j \in\mathcal{M}_{0} \setminus\{i\}\), then for every \((\tau,d) \in\Delta(\overline{R}) \subseteq\Delta\),

$$c\, D^{(m)}\bigl(\tau^*\bigr) + \sum_{i \in\mathcal{M}}\sum_{j \in\mathcal{M}_0 \setminus \{i\}} a_{ji} R_{ji}\bigl(\tau^*,d^*\bigr) \leq c\, D^{(m)}(\tau) + \sum _{i\in\mathcal{M}} \sum_{j \in\mathcal{M}_0 \setminus\{i\}}a_{ji} R_{ji}(\tau,d)$$

implies that \(c (D^{(m)}(\tau^{*})-D^{(m)}(\tau)) \leq\sum_{i \in \mathcal{M}}\sum_{j \in\mathcal{M}_{0} \setminus\{i\}} a_{ji}(R_{ji}(\tau,d)-R_{ji}(\tau^{*},d^{*})) =\sum_{i \in\mathcal{M}} \sum _{j \in \mathcal{M}_{0} \setminus\{i\}} a_{ji} (R_{ji}(\tau,d)-\overline{R}_{ji})\leq0\), and hence, the same rule \((\tau^{*},d^{*})\) is also optimal for the Bayesian fixed-error-probability formulation. The asymptotically optimal decision rules proposed for Problems 1 and 2 will likewise be related.

On the one hand, a majority of practitioners favor the formulation in Problem 2 over that in Problem 1, because the hard constraints \(R_{ji}(\tau,d) \leq\overline{R}_{ji}, i \in\mathcal{M}, \, j \in \mathcal{M}_{0}\setminus\{i\}\) in Problem 2 are easier to set up and to understand than the (shadow) costs c and a of decision delay, false alarm, and misdiagnosis. On the other hand, some practitioners still find Problem 1 useful to incorporate expert opinions.

As we introduced in Sect. 1, let \(\Pi=(\Pi_{n}^{(0)},\ldots,\Pi_{n}^{(M)})_{n \geq0}\) be the posterior probability process defined by

$$\Pi_n^{(0)} := \mathbb{P}\{ \theta> n \vert \mathcal{F}_n \} \quad\mbox{and} \quad\Pi_n^{(i)} := \mathbb{P}\{ \theta\leq n, \mu=i \vert \mathcal{F}_n \}, \quad i \in \mathcal{M}, n\geq0.$$

Dayanik et al. (2008) proved that Π is a Markov process satisfying

$$\Pi_n^{(i)} = \frac{\alpha_n^{(i)}(X_1,\ldots,X_n)}{\sum_{j \in \mathcal{M}_0} \alpha^{(j)}_n(X_1,\ldots,X_n)}, \quad i\in \mathcal{M}_0,$$

where \(\alpha_{n}^{(i)} (x_{1},\ldots,x_{n})\) equals

$$\left\{ \everymath{\displaystyle}\begin{array}{l@{}l@{}l} &(1-p_0) (1-p)^n\prod_{l=1}^n f_0(x_l),&\quad i = 0\\&p_0 \nu_i \prod_{k=1}^nf_i(x_k) + (1-p_0)p \nu_i\sum_{k=1}^n (1-p)^{k-1} \prod _{l=1}^{k-1} f_0(x_l)\prod_{m=k}^n f_i(x_m), &\quad i \in\mathcal{M}\end{array}\right\}$$

for every n≥1 and (x1,…,xn)∈En, and

$$\alpha^{(i)}_n(x_1,\ldots,x_n)m(\mathrm {d}x_1)\cdots m(\mathrm {d}x_n) = \left\{ \everymath{\displaystyle}\begin{array}{l@{}l@{}l} &\mathbb{P}\{\theta>n, X_1\in \mathrm {d}x_1,\ldots,X_n \in \mathrm {d}x_n\}, &\quad i=0\\&\mathbb{P}\{\theta\leq n, \mu= i, X_1\in \mathrm {d}x_1,\ldots,X_n \in \mathrm {d}x_n\}, &\quad i\in\mathcal{M}\end{array}\right\}.$$
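To make the posterior formula concrete, here is a minimal sketch that evaluates \(\alpha_{n}^{(i)}\) in two ways: directly from the closed-form expression above, and through a one-step recursion. The recursion is our own rearrangement of that expression, not a formula quoted from the paper, and the Gaussian densities, parameter values, and function names are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Density = Callable[[float], float]


def gaussian_pdf(mean: float, sd: float = 1.0) -> Density:
    """Gaussian density with the given mean and standard deviation (illustrative f_i)."""
    def pdf(x: float) -> float:
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return pdf


def alpha_direct(xs: Sequence[float], f: Sequence[Density],
                 nu: Sequence[float], p0: float, p: float) -> List[float]:
    """alpha_n^{(i)}, i = 0..M, from the closed-form expression; f[0] is f_0."""
    n = len(xs)
    out = [(1.0 - p0) * (1.0 - p) ** n * math.prod(f[0](x) for x in xs)]
    for i in range(1, len(nu) + 1):
        total = p0 * nu[i - 1] * math.prod(f[i](x) for x in xs)
        for k in range(1, n + 1):  # k is the candidate change time
            pre = math.prod(f[0](x) for x in xs[: k - 1])
            post = math.prod(f[i](x) for x in xs[k - 1:])
            total += (1.0 - p0) * p * nu[i - 1] * (1.0 - p) ** (k - 1) * pre * post
        out.append(total)
    return out


def alpha_recursive(xs: Sequence[float], f: Sequence[Density],
                    nu: Sequence[float], p0: float, p: float) -> List[float]:
    """Same quantities via a one-step update (our rearrangement, an assumption)."""
    a = [1.0 - p0] + [p0 * v for v in nu]  # values at n = 0
    for x in xs:
        a0_prev = a[0]
        a = [a0_prev * (1.0 - p) * f[0](x)] + [
            (a[i] + a0_prev * p * nu[i - 1]) * f[i](x) for i in range(1, len(a))
        ]
    return a


def posterior(alpha: Sequence[float]) -> List[float]:
    """Normalize alpha_n^{(i)} to obtain Pi_n^{(i)}."""
    s = sum(alpha)
    return [a / s for a in alpha]


# Hypothetical example with M = 2 regimes.
f = [gaussian_pdf(0.0), gaussian_pdf(1.0), gaussian_pdf(-1.0)]
nu = [0.6, 0.4]
xs = [0.2, -0.1, 1.3, 0.9, 1.1]
direct = alpha_direct(xs, f, nu, p0=0.1, p=0.05)
recursive = alpha_recursive(xs, f, nu, p0=0.1, p=0.05)
pi = posterior(recursive)
```

The direct evaluation costs O(n²) density evaluations per regime, whereas the recursion costs O(1) per new observation, which is why a sequential implementation would likely prefer the latter.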

Remark 2.2

Assumption 2.1 implies that \(0< \Pi_{n}^{(i)} <1\) a.s. for every finite n≥1 and \(i\in\mathcal{M}\).

Let us denote by \(\alpha_{n}^{(i)}\) the random variable \(\alpha_{n}^{(i)}(X_{1},\ldots,X_{n})\) for every n≥0. Then the LLR processes defined in (1) can be written as

$$\Lambda_n(i,j) = \log\frac{\alpha_n^{(i)}}{\alpha^{(j)}_n}, \quad i \in\mathcal{M}, j \in \mathcal{M}_0 \setminus\{i\}, n \geq1. $$
(7)

In our analyses, it is often very convenient to work under the conditional probability measures:

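$$\mathbb{P}_i\{X_1 \in E_1,\ldots,X_n \in E_n\} := \mathbb{P}\{X_1 \in E_1,\ldots,X_n \in E_n \mid \mu=i\}, $$
(8)

$$\mathbb{P}_i^{(t)}\{X_1 \in E_1,\ldots,X_n \in E_n\} := \mathbb{P}\{X_1 \in E_1,\ldots,X_n \in E_n \mid \mu=i, \theta=t\}, \quad t \geq0, $$
(9)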

defined for every \(i \in\mathcal{M}\), n≥1, \((E_{1} \times\cdots \times E_{n}) \in \mathcal {E}^{n}\). Let \(\mathbb{E}_{i}\) and \(\mathbb{E}_{i}^{(t)}\), respectively, be the expectations with respect to ℙi and \(\mathbb{P}^{(t)}_{i}\). Under \(\mathbb{P}_{i}^{(0)}\) and \(\mathbb{P}_{i}^{(\infty)}\), the random variables X1,X2,… are independent and have common probability density functions fi(⋅) and f0(⋅), respectively. We denote by ℙ(∞) any \(\mathbb {P}_{i}^{(\infty)}\) for any \(i \in\mathcal{M}\). The LLR processes in (1) or (7) play a role in changing probability measures as the next lemma shows.

Lemma 2.3

(Change of measure)

For every \(i \in\mathcal{M}\), an \(\mathbb{F}\)-stopping time τ, and an \(\mathcal{F}_{\tau}\)-measurable event F,

The next proposition introduces the key risk components and its proof follows directly from Lemma 2.3 after setting \(F:=\{d=i\} \in\mathcal{F}_{\tau}\) for every \(i\in\mathcal{M}\).

Proposition 2.4

For every strategy (τ,d)∈Δ, c>0, m≥1 and strictly positive constants \(a=(a_{ji})_{i\in\mathcal{M},j\in\mathcal {M}_{0}\setminus\{i\}}\), we can rewrite (4)–(6) as

where for every \(i \in\mathcal{M}\)

(10)
(11)
(12)
(13)

Here (10)–(12) correspond to the conditional risks given μ=i, written in terms of the process \(G_{i}^{(a)} (n)\), which is a linear combination of the exponents of the LLR processes and serves as the Radon-Nikodym derivative.

Remark 2.5

In the remainder, we prove a number of results in the ℙi-a.s. sense for given \(i \in\mathcal{M}\). These also hold automatically \(\mathbb{P}_{i}^{(t)}\)-a.s. for every t≥1. Indeed, because ℙ{θ<∞}=1, ℙ{θ=t}>0 for every t≥1 and \(\mathbb{P}_{i} (F) = \sum_{t=0}^{\infty}\mathbb {P}\{ \theta = t \} \mathbb{P}_{i}^{(t)} (F)\) for every \(F \in\mathcal{F}\), ℙi(F)=1 implies \(\mathbb{P}^{(t)}_{i}(F)=1\) for every t≥1.

3 Asymptotically optimal sequential detection and identification strategies

We will introduce two strategies that are computationally efficient and asymptotically optimal. The first strategy raises an alarm as soon as the posterior probability of at least one of the change types exceeds some suitable threshold, and is shown to be asymptotically optimal for Problem 1. The second strategy is its variant expressed in terms of the LLR processes and is shown to be asymptotically optimal for Problem 2. The asymptotic performance analyses of both rules depend on the same convergence results of the LLR processes. The proofs for Problems 1 and 2 can be conducted in parallel because the detection times can be approximated by the first hitting times of certain processes that share the same asymptotic properties.

Definition 3.1

((τA,dA)-strategy for the minimum Bayes risk problem)

For every set \(A = (A_{i})_{i \in\mathcal{M}}\) of strictly positive constants, let (τA,dA) be the strategy defined by

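$$\tau_A := \min_{i \in\mathcal{M}} \tau_A^{(i)} \quad\mbox{and} \quad d_A \in \arg \min _{i \in\mathcal{M}} \tau_A^{(i)}, \quad\mbox{where } \tau_A^{(i)} := \inf\biggl\{ n \geq1 : \Pi_n^{(i)} > \frac{1}{1+A_i} \biggr\}, \quad i \in\mathcal{M}. $$
(14)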

Define the logarithm of the odds-ratio processes as

$$\Phi_n^{(i)} := \log\frac{\Pi_n^{(i)}}{1-\Pi_n^{(i)}} = - \log\biggl[\sum_{j \in\mathcal{M}_0 \setminus \{i\}} \exp\bigl(-\Lambda_n(i,j)\bigr) \biggr] , \quad i \in\mathcal{M}, n \geq1. $$
(15)

Then (14) can be rewritten as

$$\tau_A^{(i)} = \inf\biggl\{ n \geq1 : \frac{1-\Pi_n^{(i)}}{\Pi_n^{(i)}}< A_i \biggr\} = \inf\bigl\{n \geq1: \Phi_n^{(i)}> - \log A_i \bigr\}, \quad i \in\mathcal{M}. $$
(16)
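The threshold rule in (16) can be sketched as follows, assuming that the alarm time is the earliest of the per-regime crossing times \(\tau_{A}^{(i)}\) and that the diagnosis is the regime whose log-odds crosses first. The function names and the tie-breaking rule are hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def log_odds(pi_i: float) -> float:
    """Phi_n^{(i)} = log(Pi_n^{(i)} / (1 - Pi_n^{(i)})), for 0 < Pi_n^{(i)} < 1."""
    return math.log(pi_i / (1.0 - pi_i))


def tau_A(posteriors: Sequence[Sequence[float]],
          A: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return (alarm time, declared regime), or None if no threshold is crossed.

    posteriors[n - 1] holds (Pi_n^{(0)}, ..., Pi_n^{(M)}) for n = 1, 2, ...
    A[i - 1] holds the threshold A_i for regime i = 1, ..., M.
    """
    for n, pi in enumerate(posteriors, start=1):
        crossed = [i for i in range(1, len(pi))
                   if log_odds(pi[i]) > -math.log(A[i - 1])]
        if crossed:
            # Tie-break by the largest margin over the threshold (an assumption).
            d = max(crossed, key=lambda i: log_odds(pi[i]) + math.log(A[i - 1]))
            return n, d
    return None


# Hypothetical posterior path with M = 2; A_i = 0.25 means "stop when Pi > 0.8".
path = [(0.8, 0.1, 0.1), (0.5, 0.3, 0.2), (0.1, 0.85, 0.05)]
result = tau_A(path, A=[0.25, 0.25])
```

When every A_i is below one, each threshold on \(\Pi_{n}^{(i)}\) exceeds 1/2, so at most one regime can cross at a given time and the tie-break never comes into play.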

The values of A determine the sizes of the polyhedrons that approximate the original optimal stopping regions, e.g., the triangular regions when M=2 as in Fig. 1(b), and need to be determined so as to minimize the Bayes risk.

Definition 3.2

((υB,dB)-strategy for the Bayesian fixed-error-probability formulation)

For every set \(B = (B_{i})_{i \in\mathcal{M}}\), with \(B_{i} = (B_{ij})_{j \in\mathcal{M}_{0} \setminus\{i\}}\) for \(i \in \mathcal{M}\), of strictly positive constants, let (υB,dB) be the strategy defined by

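$$\upsilon_B := \min_{i \in\mathcal{M}} \upsilon_B^{(i)} \quad\mbox{and} \quad d_B \in \arg \min _{i \in\mathcal{M}} \upsilon_B^{(i)}, \quad\mbox{where } \upsilon_B^{(i)} := \inf\bigl\{ n \geq1 : \Lambda_n(i,j) > -\log B_{ij} \mbox{ for every } j \in\mathcal{M}_0 \setminus\{i\} \bigr\}, \quad i \in\mathcal{M}. $$
(17)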

We show that, after choosing suitable A and B, the strategy (τA,dA) is asymptotically optimal for Problem 1 as c goes to zero, and the strategy (υB,dB) is asymptotically optimal for Problem 2 as

$$\|\overline{R}\| := \max_{i \in\mathcal{M}, j \in\mathcal{M}_0\setminus\{i\}}\overline{R}_{ji}$$

goes to zero, while \(\overline{R}_{ji}/\overline{R}_{ki}\) for every \(j,k\in\mathcal{M}_{0}\setminus\{i\}\) remains bounded away from zero in the sense that

$$\frac{\min_{j \in\mathcal{M}_0 \setminus\{i\}} \overline{R}_{ji}}{\max_{j \in\mathcal{M}_0 \setminus\{i\}} \overline{R}_{ji}} > k_i\quad \mbox {for every }i \in\mathcal{M} $$
(18)

for some strictly positive constants \(k = (k_{i})_{i \in\mathcal {M}}\). This limit mode will still be denoted by “\(\|\overline{R}\|\downarrow0\)” for brevity.

More precisely, we find functions A(c) of the unit sampling cost c in Problem 1 and \(B(\overline{R})\) of the upper bounds \((\overline{R}_{ji})_{i\in\mathcal{M},j\in\mathcal {M}_{0}\setminus\{i\}}\) on the false alarm and misdiagnosis probabilities in Problem 2 so that \((\tau_{A(c)},d_{A(c)}) \in\Delta\) for every c>0, \((\upsilon_{B(\overline{R})},d_{B(\overline{R})}) \in \Delta(\overline{R})\) for every \(\overline{R}>0\), and

(19)
(20)

for every fixed m≥1 and every set \(a=(a_{ji})_{i\in\mathcal {M}, j\in \mathcal{M}_{0}\setminus\{i\}}\) of strictly positive constants. Here “\(x_{\gamma}\sim y_{\gamma}\) as \(\gamma\rightarrow\gamma_{0}\)” means \(\lim_{\gamma\rightarrow \gamma_{0}} {x_{\gamma}} / {y_{\gamma}} = 1\). In fact, we obtain results stronger than (19)–(20); for every \(i\in\mathcal{M}\)

(21)
(22)

Remark 3.3

For all \(i \in\mathcal{M}\), let \(\overline{B}_{i} := \max_{j \in \mathcal{M}_{0}\setminus\{i\}} B_{ij}\), \(\underline{B}_{i} := \min_{j \in\mathcal{M}_{0}\setminus\{i\}} B_{ij}\) and \(\Psi^{(i)}_{n} := \min_{j \in\mathcal{M}_{0}\setminus\{i\}} \Lambda_{n}(i,j)\), n≥1. Then,

$$\underline{\upsilon}_B^{(i)} \leq\upsilon_B^{(i)}\leq\overline{\upsilon}_B^{(i)} \quad \mbox {for every } i\in \mathcal{M} $$
(23)

where \(\underline{\upsilon}_{B}^{(i)} := \inf\{ n \geq1:\Psi^{(i)}_{n} > - \log\overline{B}_{i} \}\) and \(\overline{\upsilon}_{B}^{(i)} := \inf\{ n \geq1: \Psi^{(i)}_{n} >- \log\underline{B}_{i} \}\). Notice that (15) implies \(\Phi_{n}^{(i)} \leq\Lambda_{n}(i,j)\) for every n≥1 and \(j \in\mathcal{M}_{0} \setminus\{i\}\), and hence

$$\Psi^{(i)}_n \geq\Phi_n^{(i)}, \quad n\geq1. $$
(24)
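The identity (15), the inequality (24), and the sandwich (23) can be checked numerically on a small example. The sketch below assumes, consistently with the bounds in (23), that \(\upsilon_{B}^{(i)}\) is the first time at which Λn(i,j) exceeds −log Bij for every j; the posterior path, thresholds, and function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def llr(pi: Sequence[float], i: int, j: int) -> float:
    """Lambda_n(i, j) = log(Pi_n^{(i)} / Pi_n^{(j)})."""
    return math.log(pi[i] / pi[j])


def phi(pi: Sequence[float], i: int) -> float:
    """Log-odds Phi_n^{(i)}, first form in (15)."""
    return math.log(pi[i] / (1.0 - pi[i]))


def phi_from_llr(pi: Sequence[float], i: int) -> float:
    """Second form in (15): minus the log of the sum of exp(-Lambda_n(i, j))."""
    return -math.log(sum(math.exp(-llr(pi, i, j))
                         for j in range(len(pi)) if j != i))


def psi(pi: Sequence[float], i: int) -> float:
    """Psi_n^{(i)}: the smallest LLR of regime i against any other regime."""
    return min(llr(pi, i, j) for j in range(len(pi)) if j != i)


def first_passage(values: Sequence[float], level: float) -> Optional[int]:
    """First n >= 1 with values[n - 1] > level, or None on this finite path."""
    for n, v in enumerate(values, start=1):
        if v > level:
            return n
    return None


def upsilon_i(path: Sequence[Sequence[float]], i: int,
              B_i: Dict[int, float]) -> Optional[int]:
    """Assumed rule: first n with Lambda_n(i, j) > -log B_ij for every j."""
    for n, pi in enumerate(path, start=1):
        if all(llr(pi, i, j) > -math.log(b) for j, b in B_i.items()):
            return n
    return None


# Hypothetical posterior path (M = 2) and thresholds for regime i = 1.
path = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.8, 0.1], [0.02, 0.95, 0.03]]
B_1 = {0: 0.2, 2: 0.05}
psi_path = [psi(pi, 1) for pi in path]
lower = first_passage(psi_path, -math.log(max(B_1.values())))
upper = first_passage(psi_path, -math.log(min(B_1.values())))
middle = upsilon_i(path, 1, B_1)
```

The two forms of Φ agree only because the posterior probabilities sum to one, which is what turns the sum of exp(−Λn(i,j)) into the odds against regime i.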

3.1 Convergence of false alarm and misdiagnosis probabilities and detection delay

As c and \(\overline{R}\) decrease to zero in Problems 1 and 2, respectively, we expect that the optimal stopping regions shrink, or equivalently the values of A and B should decrease. We therefore study the asymptotic behaviors of the false alarm and misdiagnosis probabilities and the change detection time as

$$\| A \| := \max_{i \in\mathcal{M}} A_i \quad\mbox{and} \quad\| B\| := \max_{i \in\mathcal{M}, j \in\mathcal{M}_0 \setminus\{i\}} B_{ij}$$

go to zero, and then adapt their values as functions of c and \(\overline{R}\) so as to attain asymptotically optimal strategies. Here in concordance with (18) the limits \(\overline{B}_{i} \downarrow0\) for every \(i \in\mathcal{M}\) are taken such that

$$ {\underline{B}_i}/{\overline{B}_i}= \frac{\min_{j\in\mathcal{M}_0\setminus\{i\}} B_{ij}}{\max_{j\in\mathcal{M}_0 \setminus\{i\}}B_{ij}} \geq b_i \quad \mbox {for some constants$0<b_{i}\leq1$}.$$
(25)

We first study the asymptotic behaviors of the false alarm and misdiagnosis probabilities. The upper bounds can be obtained by a direct application of Proposition 2.4.

Proposition 3.4

(Bounds on false alarm and misdiagnosis probabilities)

(i) For every fixed \(A = (A_{i})_{i \in\mathcal{M}}\) and \(a = (a_{ji})_{i \in \mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}\), we have \(R_{i}^{(a)}(\tau_{A},d_{A}) \leq \overline{a}_{i} A_{i}\) for every \(i \in\mathcal{M}\), where \(\overline {a}_{i} :=\max_{j \in\mathcal{M}_{0}\setminus\{i\}} a_{ji}\), and \(R_{ji}(\tau_{A},d_{A}) \leq\nu_{i} A_{i} \leq\nu_{i} \|A\|\) for every \(i\in\mathcal{M}\) and \(j\in \mathcal{M}_{0}\setminus\{i\}\).

(ii) For every \(B =(B_{ij})_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}\), we have \(R_{ji}(\upsilon_{B},d_{B}) \leq\nu_{i} B_{ij}\) for every \(i \in\mathcal{M}\) and \(j \in \mathcal{M}_{0} \setminus\{i\}\).

Corollary 3.5

(i) \(\max_{i \in\mathcal{M}} R^{(a)}_{i} (\tau_{A},d_{A})\downarrow0\) as ∥A∥↓0, (ii) \(\max_{i\in\mathcal{M},j\in\mathcal {M}_{0} \setminus\{i\}} R_{ji}(\upsilon_{B}, d_{B}) \downarrow0\) as ∥B∥↓0.

Proposition 3.6

Fix \(i \in \mathcal{M}\). We have ℙi-a.s. (i) \(\tau_{A}^{(i)} \uparrow \infty\) as Ai↓0, (ii) τA↑∞ as ∥A∥↓0, (iii) \(\upsilon_{B}^{(i)} \uparrow\infty\) as \(\overline{B}_{i}\downarrow0\), and (iv) υB↑∞ as ∥B∥↓0.

The asymptotic behavior of the detection delay is closely related to the convergence of the average increment Λn(i,j)/n. According to the next proposition, Λn(i,j)/n converges ℙi-a.s. as n↑∞ to some strictly positive constant for every \(i\in\mathcal{M}\) and \(j \in\mathcal{M}_{0}\setminus\{i\}\). The proof of Proposition 3.7 is deferred to Sect. 4, where the limiting values are analytically expressed in terms of the Kullback-Leibler divergence between the alternative probability measures.

Proposition 3.7

For every \(i \in \mathcal{M}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\), we have ℙi-a.s. Λn(i,j)/n→l(i,j) as n↑∞ for some strictly positive constant l(i,j).

Let us fix any \(i \in\mathcal{M}\). We show that, for small values of A and B, the stopping times \(\tau_{A}^{(i)}\) and \(\upsilon_{B}^{(i)}\) in (14) and (17) are essentially determined by the process Λ(i,j(i)), where

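$$l(i) := \min_{j \in\mathcal{M}_0 \setminus\{i\}} l(i,j) \quad\mbox{and} \quad j(i) \in \arg \min _{j \in\mathcal{M}_0 \setminus\{i\}} l(i,j), $$
(26)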

and ℙi-a.s. \(\Lambda_{n}(i,j(i))/n \approx\Phi_{n}^{(i)}/n\approx \Psi^{(i)}_{n}/n \approx l(i)\) for sufficiently large n as the next proposition suggests.

Proposition 3.8

For every \(i\in\mathcal{M}\), we have ℙi-a.s. (i) \(\Phi _{n}^{(i)}/n \rightarrow l(i)\) and (ii) \(\Psi_{n}^{(i)}/n \rightarrow l(i)\) as n↑∞.

The proof of part (i) follows from Proposition 3.7, and part (ii) follows from part (i) and Baum and Veeravalli (1994, Lemma 5.2). Proposition 3.8 implies the following convergence results.

Lemma 3.9

For every \(i \in\mathcal{M}\) and any \(j(i) \in \arg \min _{j\in \mathcal{M}_{0}\setminus \{i\}} l(i,j)\), we have ℙi-a.s.

$$\everymath{\displaystyle}\begin{array}{r@{\quad}l@{\qquad}r@{\quad}l}(\mathrm{i}) &-\frac{\tau_A^{(i)}}{\log A_i} \stackrel{A_i\downarrow0}{\longrightarrow} \frac{1}{l(i)}, &(\mathrm{ii}) & - \frac{(\tau_A^{(i)}-\theta)_+}{\log A_i} \stackrel{A_i \downarrow0}{\longrightarrow} \frac{1}{l(i)},\cr\noalign{\vspace{3pt}}(\mathrm{iii}) &-\frac{\upsilon_B^{(i)}}{\log B_{ij(i)}} \stackrel{\overline{B}_i \downarrow0}{\longrightarrow} \frac{1}{l(i)}, &(\mathrm{iv}) &-\frac{(\upsilon_B^{(i)}-\theta)_+}{\log B_{ij(i)}}\stackrel{\overline{B}_i\downarrow0}{\longrightarrow} \frac{1}{l(i)}.\end{array}$$

Remark 3.10

We shall always assume that 0<Bij<1 or −∞<logBij<0 for all \(i \in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\backslash\{i\}\) as we are interested in the limits of certain quantities as ∥B∥↓0. Because (25) implies that \(b_{i} \overline{B}_{i} \leq\underline{B}_{i} \leq B_{ij} \leq \overline{B}_{i}\), we have \(1 \leq\frac{-\log B_{ij}}{-\log\overline {B}_{i}} \leq\frac{-\log \underline{B}_{i}}{-\log\overline{B}_{i}} \leq\frac{-\log(b_{i}\overline{B}_{i})}{-\log\overline{B}_{i}} \leq1+\frac{-\log b_{i}}{-\log\overline{B}_{i}}\), which implies that

$$ 1=\lim_{\overline{B}_i \downarrow0}\frac{\log B_{ij}}{\log \overline{B}_i} =\lim_{\overline{B}_i \downarrow0} \frac{\log\underline{B}_i}{\log \overline{B}_i} =\lim_{\overline{B}_i \downarrow0}\frac{\log B_{ij}}{\log \underline{B}_i} \quad \mbox {for every } i\in\mathcal{M}, j\in \mathcal{M}_0\setminus\{i\},$$
(27)

where the last equality follows from the first two equalities.
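A short numerical illustration of (27), under the assumed worst case Bij = bi B̄i permitted by (25): the ratio of logarithms equals 1 + log bi / log B̄i and tends to one as B̄i decreases. The values of `b` and `B_bar` below are hypothetical.

```python
import math


def log_ratio(B_ij: float, B_bar: float) -> float:
    """log B_ij / log B_bar for thresholds in (0, 1)."""
    return math.log(B_ij) / math.log(B_bar)


b = 0.1  # hypothetical lower bound b_i in (25)
B_bars = (1e-2, 1e-4, 1e-8, 1e-16)
# Smallest threshold allowed by (25): B_ij = b * B_bar.
ratios = [log_ratio(b * B_bar, B_bar) for B_bar in B_bars]
```

With these values the ratios are approximately 1.5, 1.25, 1.125, and 1.0625, so the convergence in (27) is only logarithmic in B̄i.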

Because we want to minimize the mth moment of the detection delay time for any m≥1, we will strengthen the convergence results of Lemma 3.9. Condition 3.11 below for some r≥m is both necessary and sufficient for the Lm-convergences.

Condition 3.11

(Uniform Integrability)

For some r≥m,

(i) the family \(\{(\tau_{A}^{(i)}/(-\log A_{i}))^{r} \}_{A_{i} >0}\) is ℙi-uniformly integrable for every \(i \in\mathcal{M}\),

(ii) the family \(\{(\upsilon_{B}^{(i)}/(-\log B_{ij(i)}))^{r} \}_{B_{i} > 0}\) is ℙi-uniformly integrable for every \(i \in\mathcal{M}\).

Lemma 3.12

Let m≥1 be any integer.

(i) Condition 3.11 (i) holds for some r≥m if and only if \(\mathbb{E}_{i}[(\tau^{(i)}_{A})^{m}]<\infty \) for every Ai>0 and

(28)

(ii) Condition 3.11 (ii) holds for some r≥m if and only if \(\mathbb{E}_{i}[(\upsilon ^{(i)}_{B})^{m}]<\infty\) for every Bi>0 and

(29)

where the limits \(\overline{B}_{i} \downarrow0\) for all \(i \in\mathcal{M}\) are taken such that (25) is satisfied.

The proof of Lemma 3.12 follows from Lemma 3.9, Chung (2001, Theorem 4.5.4), Gut (2005, Theorem 5.2) and because \(\tau_{A}^{(i)}-\theta\leq(\tau_{A}^{(i)}-\theta)_{+} \leq \tau_{A}^{(i)}\) and \(\upsilon_{B}^{(i)}-\theta\leq (\upsilon_{B}^{(i)}-\theta)_{+} \leq\upsilon_{B}^{(i)}\). Using renewal theory, one can show that Condition 3.11 holds if Λn(i,j)=X1+⋯+Xn is a random walk for some sequence (Xn)n≥1 of i.i.d. random variables with \(\mathbb{E}X_{1} > 0\) and \(\mathbb{E}[(X_{1})^{r}_{-}] < \infty\); see Lai (1975). In the case of the SMHT, Λn(i,j) is indeed a random walk with positive drift for every \(i\in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\); see Baum and Veeravalli (1994).

Condition 3.11 is often hard to verify. An alternative sufficient condition can be given in terms of the r-quick convergence. The r-quick convergence of suitable stochastic processes is known to be sufficient for the asymptotic optimalities of certain sequential rules based on non-i.i.d. observations in CPD and SMHT problems. We will show that the r-quick convergence of the LLR processes is also sufficient for the joint sequential change detection and identification problem.

Definition 3.13

(The r-quick convergence)

Let (ξn)n≥0 be any stochastic process and r>0. Then \(r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \rightarrow\infty} \xi_{n} \geq c\) if and only if \(\mathbb{E}[ (T_{\delta})^{r} ] < \infty\) for every δ>0, where

$$T_\delta:=\inf\Bigl\{ n \geq1: \inf_{m \geq n} \xi_m> c - \delta\Bigr\}, \quad\delta> 0. $$
(30)
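The random time Tδ in (30) depends on the whole future of the process, so on a finite record it can only be approximated. The sketch below computes a truncated version on a path of length N, namely the smallest n such that every recorded value from n onward exceeds c−δ. The function name and the sample path are hypothetical, and the truncation is an assumption that ignores any excursion after time N.

```python
from typing import Optional, Sequence


def truncated_T_delta(xi: Sequence[float], c: float, delta: float) -> Optional[int]:
    """Smallest n in 1..N with min over n <= m <= N of xi_m > c - delta.

    xi[m - 1] holds xi_m. Returns None when even the last recorded value
    fails to exceed c - delta, in which case T_delta exceeds N.
    """
    level = c - delta
    n = None
    # Scan backwards: the answer is one past the last time xi_m <= level.
    for m in range(len(xi), 0, -1):
        if xi[m - 1] > level:
            n = m
        else:
            break
    return n


# Hypothetical path that settles near c = 1.0 after an early dip.
xi = [0.1, 0.9, 0.4, 0.8, 0.95, 1.0]
t = truncated_T_delta(xi, c=1.0, delta=0.3)
```

Unlike a first-passage time, Tδ is not a stopping time: the value at index 2 already exceeds 0.7 here, yet the later dip at index 3 pushes the truncated Tδ to 4.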

According to Proposition 3.15, stated below and proved in the appendix, Condition 3.11 holds if \((\Phi_{n}^{(i)}/n)_{n \geq1}\) and \((\Psi_{n}^{(i)}/n)_{n \geq 1}\) converge r-quickly to l(i) under ℙi for every \(i\in\mathcal{M}\), which we put together as a different condition:

Condition 3.14

For some r≥1, (i) \(r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} {\Phi_{n}^{(i)}}/n \geq l(i)\) under ℙi, (ii) \(r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} {\Psi_{n}^{(i)}}/n \geq l(i)\) under ℙi for every \(i\in\mathcal{M}\).

Proposition 3.15

Let m≥1. (i) If Condition 3.14 (i) holds for some r≥m, then (28) and Condition 3.11 (i) hold. (ii) If Condition 3.14 (ii) holds for some r≥m, then (29) and Condition 3.11 (ii) hold.

Remark 3.16

Condition 3.14 (i) implies (ii) by (24). Moreover, Condition 3.14 holds if \(r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} \Lambda_{n}(i,j)/n \geq l(i,j)\) under ℙi for every \(i\in \mathcal{M}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\).

3.2 Asymptotic optimality

We now prove the asymptotic optimalities of (τA,dA) and (υB,dB) for Problems 1 and 2 under Condition 3.11 (i) and (ii), respectively.

We first derive a lower bound on the expected detection delay under the optimal strategy. It can be obtained in the same way as in CPD and SMHT; see Baum and Veeravalli (1994), Dragalin et al. (1999), Dragalin et al. (2000), Lai (2000), Tartakovsky and Veeravalli (2004) and Baron and Tartakovsky (2006). This lower bound and Lemma 3.12 can be combined to obtain asymptotic optimality for both problems.

Lemma 3.17

For every \(i \in\mathcal{M}\), we have

$$\liminf_{\overline{R}_i \downarrow0} \inf_{(\tau,d) \in\Delta (\overline{R})} \frac{D_i^{(m)}(\tau)}{ ({|\log({\overline{R}_{j(i)i}}/ {\nu_i} )|} / {l(i)})^m} \geq1.$$

We now study how to set A in terms of c in order to achieve asymptotic optimality in Problem 1. We see from Proposition 3.4 and Lemma 3.12 that the false alarm and misdiagnosis probabilities decrease faster than the expected delay time and are negligible when A and B are small. Indeed, we have, in view of the definition of the Bayes risk in (10), by Proposition 3.4 and Lemma 3.12, for any \(0 < \sigma_{i} < \overline {a}_{i}\) for every \(i \in\mathcal{M}\),

$$R^{(c,a,m)}_{i} (\tau_{A},d_{A}) \sim c\biggl( \frac{-\log A_i}{l(i)} \biggr)^m + \sigma_iA_i \sim c \biggl( \frac{-\log A_i}{l(i)} \biggr)^m \quad \mbox{as } A_i \downarrow0. $$
(31)

This motivates us to choose the value of Ai such that it minimizes

$$g^{(c)}_i(x) := c \biggl( \frac{-\log x}{ l(i)}\biggr)^m + \sigma_i x, $$
(32)

over x∈(0,∞). Hence let

$$A_i(c) \in \arg \min _{x \in(0,\infty)}g_i^{(c)}(x),\quad c > 0. $$
(33)

For example, Ai(c)=c/(σil(i)) when m=1. It can be easily verified that for every m≥1 we have \(A_{i}(c) \stackrel{c \downarrow0}{\longrightarrow} 0\) in such a way that logAi(c)∼logc as c↓0. Hence we have

$$R^{(c,a,m)}_{i} (\tau_{A(c)},d_{A(c)}) \sim g_i^{(c)} \bigl(A_i(c)\bigr)\sim c \biggl(\frac{- \log c}{l(i)} \biggr)^m \quad\mbox{as } c \downarrow0. $$
(34)

Consequently, it is sufficient to show that

$$\liminf_{c \downarrow0} \frac{\inf_{(\tau,d) \in\Delta}R^{(c,a,m)}_{i}(\tau,d)}{g_i^{(c)} (A_i(c))} \geq1. $$
(35)
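The choice of Ai(c) in (33) can be checked numerically. The sketch below minimizes \(g_i^{(c)}\) in (32) by bisection on its first-order condition. It rests on assumptions that are not in the paper: the search is restricted to x∈(0,1), where −log x>0, c is taken small enough that the minimizer is interior, and the names `g` and `threshold` as well as the parameter values in the usage note are purely illustrative.

```python
import math


def g(x, c, sigma, l, m):
    """Objective (32) for 0 < x < 1, where -log x > 0."""
    return c * (-math.log(x) / l) ** m + sigma * x


def threshold(c, sigma, l, m, iters=200):
    """Minimizer of g over (0, 1), found by bisection in u = log x.

    The first-order condition g'(x) = 0 is equivalent to
    sigma * x * l**m = c * m * (-log x)**(m - 1),
    and the difference of the two sides is increasing in x on (0, 1).
    """
    def h(u):
        return sigma * math.exp(u) * l ** m - c * m * (-u) ** (m - 1)

    lo, hi = -700.0, -1e-12
    if h(hi) <= 0.0:
        raise ValueError("c is too large for an interior minimizer in (0, 1)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.exp(hi)
```

For m=1 the routine should reproduce the closed form Ai(c)=c/(σil(i)); for instance, `threshold(0.01, 0.5, 0.2, 1)` can be compared with 0.01/(0.5·0.2). For m>1 no closed form is available, and the bisection is one simple way to obtain the threshold.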

The proof of the asymptotic optimality below is similar to that of Theorem 3.1 in Baron and Tartakovsky (2006) for CPD.

Proposition 3.18

(Asymptotic optimality of (τA,dA) in Problem 1)

Fix m≥1 and a set of strictly positive constants a. Under Conditions 3.11 (i) or 3.14 (i) for the given m, the strategy (τA(c),dA(c)) is asymptotically optimal as c↓0; that is, (21) holds for every \(i\in\mathcal{M}\).

It should be remarked here that the asymptotic optimality results hold for any \(0 < \sigma_{i} < \overline{a}_{i}\). However, for higher-order approximation, it is ideal to choose σi such that

$${R_i^{(a)}(\tau_A,d_A)} /{A_i} \stackrel{A_i \downarrow0}{\longrightarrow}\sigma_i. $$
(36)

In Sect. 5, we achieve this value using nonlinear renewal theory.

We now show that (υB,dB) is asymptotically optimal for Problem 2. By Proposition 3.4, if we set

$$B_{ij} (\overline{R}) := {\overline{R}_{ji}} / {\nu_i} \quad \mbox {for every }i \in\mathcal{M}, j \in\mathcal{M}_0\setminus\{i\},$$

then we have \((\upsilon_{B(\overline{R})}, d_{B(\overline{R})}) \in \Delta(\overline{R})\) for any fixed positive constants \(\overline{R} = (\overline{R}_{ji})_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}\). By Lemma 3.12 (ii), \(\upsilon_{B(\overline{R})} \leq\upsilon^{(i)}_{B(\overline{R})}\) and because \(\overline{R}_{i} \downarrow0\) is equivalent to \(B_{ij(i)} (\overline{R})\downarrow0\),

$$\limsup_{\overline{R}_i \downarrow0} \frac{D_i^m(\upsilon _{B(\overline{R})})}{( {|\log( \overline{R}_{j(i)i} / \nu_i ) |}/{l(i)} )^m} = \limsup_{\overline{R}_i\downarrow0}\frac{D_i^m(\upsilon_{B(\overline{R})})}{ ({|\log B_{ij(i)} (\overline{R})|} / {l(i)} )^m} \leq1.$$

This together with Lemma 3.17 shows the asymptotic optimality.
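The threshold choice \(B_{ij}(\overline{R}) = \overline{R}_{ji}/\nu_i\) and the first-order delay term appearing in the bound above can be sketched in a few lines of Python. The dictionary layout and the function names `thresholds` and `first_order_delay` are hypothetical, and the limits l(i,j) are assumed to be supplied by the user; the sketch only evaluates the expression \((|\log B_{ij(i)}(\overline{R})|/l(i))^m\), it does not simulate the rule.

```python
import math


def thresholds(R_bar, nu):
    """B[(i, j)] = R_bar[(j, i)] / nu[i] for i in M and j in M_0 without i."""
    return {(i, j): r / nu[i] for (j, i), r in R_bar.items()}


def first_order_delay(B, l, i, m=1):
    """Evaluate (|log B_{i j(i)}| / l(i))**m, where j(i) minimizes l(i, .).

    l maps pairs (i, j) to the limits l(i, j); j(i) is taken to be the
    first minimizer found, so ties are broken arbitrarily.
    """
    candidates = [j for (ii, j) in l if ii == i]
    j_star = min(candidates, key=lambda j: l[(i, j)])
    return (abs(math.log(B[(i, j_star)])) / l[(i, j_star)]) ** m
```

For example, with two post-change hypotheses one would pass `R_bar` keyed by (j,i) and `nu` keyed by i, and read off the approximate m-th moment of the detection delay for each i.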

Proposition 3.19

(Asymptotic optimality of (υB,dB) in Problem 2)

Fix m≥1. Under Conditions 3.11 (ii) or 3.14 (ii) for the given m, the strategy \((\upsilon_{B(\overline{R})},d_{B(\overline{R})})\) is asymptotically optimal as \(\|\overline{R}\| \downarrow0\), i.e., (22) holds for every \(i \in\mathcal{M}\).

4 The convergence results of the LLR processes

In this section, we will prove Proposition 3.7 and obtain the limits l(i,j) for every \(i\in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\), which can be expressed in terms of the Kullback-Leibler divergence of the pre- and post-change probability density functions and the exponential decay rate ϱ in (2) of the disorder time probability distribution. Under some mild condition, we show that the convergence also holds in Lr for every r≥1.

Let us denote the Kullback-Leibler divergence of fi from fj by

$$q(i,j) := \int_E \biggl( \log\frac{f_i(x)}{f_j(x)} \biggr)f_i(x) m(\mathrm {d}x), \quad i\in\mathcal{M}, j \in\mathcal{M}_0\setminus \{i\},$$

which always exists and is non-negative. Furthermore, Assumption 2.1 ensures that

$$q(i,j) > 0, \quad i\in\mathcal{M}, j \in\mathcal{M}_0 \setminus\{i\}. $$
(37)

To ensure that \(\mathbb{E}_{i}^{(0)} [ \log(f_{0}(X_{1})/f_{j}(X_{1})) ]\) exists for every \(i\in\mathcal{M}\), \(j \in\mathcal{M}_{0} \setminus\{i\}\), we assume the following.

Assumption 4.1

For every \(i \in\mathcal {M}\), we assume that q(i,0)<∞.

Since \(\mathbb{E}^{(0)}_{i}[(\log (f_{i}(X_{1})/f_{j}(X_{1})))_{-}] \leq1\) for every \(i\in\mathcal{M}\), \(j\in \mathcal{M}_{0}\setminus\{i\}\), Assumption 4.1 guarantees the existence of

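$$\mathbb{E}_i^{(0)} \biggl[ \log\frac{f_0(X_1)}{f_j(X_1)} \biggr] = q(i,j) - q(i,0), \quad i\in\mathcal{M}, j \in\mathcal{M}_0 \setminus\{i\}. $$
(38)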

4.1 Decomposition of the LLR processes

We will decompose each LLR process (1) into some random walk with a positive drift and some stochastic process whose running average increment vanishes in the limit. In the SMHT case (namely, when p0=1), for every \(i\in\mathcal{M}\) and \(j \in \mathcal{M}\setminus\{i\}\),

$$\Lambda_n(i,j) = \log\biggl( \frac{\nu_i \prod^n_{k=1}f_i(X_k)}{\nu_j \prod^n_{k=1} f_j(X_k)} \biggr) = \log \biggl( \frac{\nu_i}{\nu_j} \biggr) + \sum^n_{k=1}\log\biggl( \frac{f_i(X_k)}{f_j(X_k)} \biggr), \quad n\geq1,$$

is a ℙi-random walk. Its running average increment Λn(i,j)/n converges ℙi-a.s. to the Kullback-Leibler divergence q(i,j) as n↑∞ by the strong law of large numbers (SLLN). Although \((\Lambda(i,j))_{j \in\mathcal{M}_{0}\setminus\{i\}}\), for p0<1, are not ℙi-random walks, this observation nonetheless motivates us to approximate them by some random walks. Let

$$\Gamma_i := \bigl\{ j \in\mathcal{M}\setminus\{i\}: q(i,j) <q(i,0) +\varrho\bigr\},\quad i\in\mathcal{M}. $$

We show that Λ(i,j) can be approximated by a random walk with drift q(i,j)>0 if j∈Γi and with q(i,0)+ϱ>0 otherwise; namely, with drift min(q(i,j),q(i,0)+ϱ) if \(j\in\mathcal{M}\setminus\{i\}\) and q(i,0)+ϱ if j=0. Define

(39)
(40)

for every n≥1 and \(j\in\mathcal{M}_{0}\). Then it can be checked easily that, for any \(j \in\mathcal{M}_{0}\setminus\{i\}\), we have

(41)

By (7), after taking logarithms on both sides, each LLR process can be written as

$$ \Lambda_n(i,j)=\sum_{l=1}^n h_{ij}(X_l)+ \epsilon_n(i,j), \quad j\in\mathcal{M}_0\setminus\{i\},$$
(42)

where

(43)
(44)

Moreover, \(\sum_{l=1}^{n} h_{ij}(X_{l})\) can be split into post- and pre-change terms, and we have

$$\Lambda_n (i,j) = \sum_{l=\theta\vee1}^nh_{ij}(X_l) + \sum_{l=1}^{n \wedge(\theta-1)}h_{ij}(X_l) + \epsilon_n (i,j), \quad n\geq1, $$
(45)

for every fixed \(j \in\mathcal{M}_{0} \setminus\{i\}\). Notice that the first term in (45) is conditionally a random walk under \(\mathbb{P}_{i}^{(t)}\) given θ=t for every t≥0.
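The SLLN behavior of the SMHT random walk at the start of this subsection is easy to reproduce by simulation. The sketch below is an illustration only: it assumes scalar unit-variance Gaussian densities (as in the Gaussian case of Sect. 6.1), draws the observations from fi, and uses the hypothetical helper name `llr_average`; none of this is the paper's own code.

```python
import math
import random


def llr_average(mean_i, mean_j, nu_i, nu_j, n, rng):
    """Lambda_n(i, j) / n in the SMHT case for N(mean, 1) densities.

    Observations are drawn i.i.d. from f_i = N(mean_i, 1), so the running
    average should approach q(i, j) = (mean_i - mean_j)**2 / 2.
    """
    total = math.log(nu_i / nu_j)  # prior term log(nu_i / nu_j)
    for _ in range(n):
        x = rng.gauss(mean_i, 1.0)
        # log f_i(x) / f_j(x) for unit-variance Gaussian densities
        total += (mean_i - mean_j) * x - 0.5 * (mean_i ** 2 - mean_j ** 2)
    return total / n
```

With, say, means 0.8 and 0.2 and equal priors, the returned value can be compared with q(i,j)=0.18; the prior term contributes only O(1/n) and vanishes in the limit.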

4.2 The convergence of the LLR processes

Fix \(i \in\mathcal{M}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\). In view of (42), we can explore the convergence for \({(\sum _{l=1}^{n} h_{ij}(X_{l}))}/n\) and ϵn(i,j)/n separately. For the first term, notice that

$$\frac{1}{n} \sum_{l=1}^nh_{ij}(X_l) = \frac{1}{n} \sum_{l=\theta\vee 1}^nh_{ij}(X_l) + \frac{1}{n} \sum_{l=1}^{n \wedge(\theta-1)}h_{ij}(X_l).$$

Because θ is an a.s. finite random variable, the first term on the right-hand side converges \(\mathbb{P}^{(t)}_{i}\)-a.s. to

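$$l(i,j) := \left\{ \begin{array}{ll} q(i,j), & j \in\Gamma_i, \\ q(i,0) + \varrho, & j \in\mathcal{M}_0 \setminus(\Gamma_i \cup\{i\}), \end{array} \right. $$
(46)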

by the SLLN, while the second term converges to zero. Then Remark 2.5 implies Lemma 4.2, and, under some mild additional conditions, Lemma 4.3 below.

Lemma 4.2

For every \(i \in\mathcal {M}\) and \(j\in\mathcal{M}_{0} \setminus\{i\}\), we have \((1/n) {\sum_{l=1}^{n} h_{ij}(X_{l})}\mathop {{\hbox to1cm{\rightarrowfill }}}\limits _{n \uparrow\infty}^{\mathbb{P}_{i}\mbox{\scriptsize-a.s.}} l(i,j)\).

Lemma 4.3

For every \(i \in \mathcal{M}\), \(j\in\mathcal{M}_{0} \setminus\{i\}\) and r≥1, we have \((1/n){\sum_{l=1}^{n} h_{ij}(X_{l})} \mathop {{\hbox to1cm{\rightarrowfill }}}\limits _{n \uparrow \infty}^{L^{r}(\mathbb{P}_{i})} l(i,j)\), if

$$\mathbb{E}^{(\infty)} \bigl \vert h_{ij}(X_1) \bigr \vert ^r <\infty\quad \mbox {and} \quad\mathbb{E}_i^{(0)}\bigl \vert h_{ij}(X_1) \bigr \vert ^r <\infty. $$
(47)

Note that (47) holds if and only if the following condition holds.

Condition 4.4

For every \(i \in \mathcal{M}\), \(j\in\mathcal{M}_{0}\setminus\{i\}\), and r≥1, suppose that

$$\everymath{\displaystyle}\begin{array}{l@{}l@{}l}\mathbb{E}^{(\infty)} \biggl \vert \log \frac{f_i(X_1)}{f_j(X_1)}\biggr \vert ^r <\infty\quad \mbox {and}\quad \mathbb{E}_i^{(0)} \biggl \vert \log\frac {f_i(X_1)}{f_j(X_1)}\biggr \vert ^r <\infty \quad \mbox {if } j\in\Gamma_i,\cr\noalign{\vspace{3pt}}\mathbb{E}^{(\infty)} \biggl \vert \log\frac {f_i(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty\quad \mbox {and} \quad\mathbb{E}_i^{(0)}\biggl \vert \log\frac{f_i(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty \quad \mbox {if } j \notin\Gamma_i.\end{array}$$

We now show that ϵn(i,j)/n converges ℙi-a.s. to zero. The convergence result holds in Lr(ℙi) as well for r≥1 under a mild condition. To show this, we first determine the limits of \((L_{n}^{(\cdot)}/n)_{n \geq1}\) and \((K_{n}^{(\cdot)}/n)_{n \geq1}\) as n↑∞ under ℙi.

Lemma 4.5

For every \(i \in\mathcal{M}\), the following hold under ℙi.

  1. (i)

    \(L_{n}^{(i)}/n \stackrel{n \uparrow\infty}{\longrightarrow} 0\) a.s.

  2. (ii)

    \(L_{n}^{(j)}/n \stackrel{n \uparrow\infty}{\longrightarrow}[q(i,j)-q(i,0)-\varrho]_{+}\) a.s. for every \(j \in \mathcal{M}\setminus\{i\}\).

  3. (iii)

    \(K_{n}^{(j)}/n \stackrel{n \uparrow\infty}{\longrightarrow}[q(i,j)-q(i,0)-\varrho]_{-}\) a.s. for every \(j \in \mathcal{M}\setminus\{i\}\).

  4. (iv)

    \(L_{n}^{(i)}\) converges a.s. as n↑∞ to a finite random variable \(L_{\infty}^{(i)}\).

  5. (v)

    \(L_{n}^{(j)}\) converges a.s. as n↑∞ to a finite random variable \(L_{\infty}^{(j)}\) for every j∈Γi.

  6. (vi)

    For every \(j\in\mathcal{M}\), \((|L^{(j)}_{n}/n|^{r})_{n\geq 1}\) is uniformly integrable for every r≥1, if

    $$ \mathbb{E}^{(\infty)}\bigl[{f_0(X_1)}/ {f_j(X_1)}\bigr] <\infty\quad \mbox {and} \quad\mathbb{E}_i^{(0)} \bigl[{f_0(X_1)} / {f_j(X_1)}\bigr] <\infty.$$
    (48)
  7. (vii)

    For every \(j\in\mathcal{M}\), \((|K^{(j)}_{n}/n|^{q})_{n\geq 1}\) is uniformly integrable for every 0≤q≤r, if (48) holds and

    $$ \mathbb{E}^{(\infty)}\biggl \vert \log \frac{f_j(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty\quad \mbox {and}\quad \mathbb{E}_i^{(0)} \biggl \vert \log\frac {f_j(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty, \quad\mathit{for\ some\ } r \geq1.$$
    (49)

Notice in Lemma 4.5 (vi) that in order for \(L_{n}^{(i)}/n\) to converge in Lr under ℙi to zero, it is sufficient to have

$$\mathbb{E}^{(\infty)} \bigl[ f_0 (X_1) /f_i(X_1) \bigr] < \infty $$
(50)

because \(\mathbb{E}_{i}^{(0)} [ f_{0} (X_{1}) / f_{i}(X_{1}) ] = \int_{E} f_{0}(x)m(\mathrm {d}x) = 1 < \infty\). The characterization of ϵn(i,j) in (44) leads to the next convergence result.

Lemma 4.6

For every \(i \in\mathcal {M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\), we have ϵn(i,j)/n→0 as n↑∞ ℙi-a.s.

Moreover, the convergence holds in Lr under ℙi as well for some r≥1 given the following condition.

Condition 4.7

Given \(i\in\mathcal{M}\), \(j\in\mathcal{M}_{0}\setminus\{i\}\) and r≥1, we suppose that (50) holds and (i) j∈Γi and (48) holds, or (ii) j∉Γi or j=0 and (49) holds for the given r.

Lemma 4.8

Fix \(i \in\mathcal {M}\), \(j\in \mathcal{M}_{0}\setminus\{i\}\) and r≥1. Under Condition 4.7, ϵn(i,j)/n→0 as n↑∞ in Lr(ℙi).

By combining the results in Lemmas 4.2 and 4.6 through the decomposition (42), Proposition 3.7 indeed holds with l(⋅,⋅) as defined in (46). Moreover, the following convergence results hold by Lemmas 4.3 and 4.8.

Proposition 4.9

For every \(i\in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\), we have Λn(i,j)/n→l(i,j) as n↑∞ in Lr(ℙi) for some r≥1 if Conditions 4.4 and 4.7 hold for the given r.

Remark 4.10

  1. (i)

    Observe from (46) that we have l(i,j)≤l(i,0) for every \(i \in\mathcal{M}\) and \(j \in \mathcal{M}_{0}\setminus\{i\}\), and the equality holds if and only if \(j \in \mathcal{M}_{0} \setminus(\Gamma_{i} \cup\{i\})\).

  2. (ii)

    Because q(i,j)=0 if and only if \(\int_{\{x\in E: f_{i}(x) \neq f_{j}(x)\}}f_{i}(x)m(\mathrm {d}x)=0\), Assumption 2.1 guarantees that l(i,j)>0 for every \(i \in\mathcal{M}\) and \(j \in\mathcal{M}_{0}\setminus\{i\}\).

  3. (iii)

    We later assume, in Sect. 5 below for higher-order approximations, that there is a unique \(j(i)\in \mathcal{M}_{0}\setminus\{i\}\) such that \(l(i) = l(i,j(i)) = \min_{j \in\mathcal{M}_{0} \setminus\{i\}}l(i,j)\) for every \(i\in\mathcal{M}\). Then (i) implies l(i)<l(i,0) and q(i,j(i))<q(i,0)+ϱ, and j(i)∈Γi and Γi≠∅.

Remark 4.11

We proved a number of results on the convergence of the LLR processes. However, those results do not guarantee their r-quick convergence. A sufficient condition derived by means of Jensen’s inequality can be found in our technical report (Dayanik et al. 2011).

5 Higher-order approximations

In this section, we derive a higher-order asymptotic approximation for the minimum Bayes risk in Problem 1 by choosing the values of σ in (31) as discussed in Sect. 3.2. Proposition 3.4 (i) gives an upper bound on \((R_{i}^{(a)} (\cdot,\cdot))_{i \in\mathcal{M}}\), and here we investigate whether there exists some σ such that (36) holds.

5.1 Asymptotic behaviors of the false alarm and misdiagnosis probabilities

Fix \(i \in\mathcal{M}\). By (12) and because \(\tau_{A} =\tau^{(i)}_{A}\) on {dA=i, θ≤τA<∞}, we have

(51)
(52)

Suppose that \(H_{i}^{(a)}(A_{i})\) is bounded from below by some constant b and \(H_{i}^{(a)}(A_{i})\) converges as Ai↓0 in distribution to some random variable \(H_{i}^{(a)}\) under ℙi. Then, because \(x \mapsto e^{-x}\) is continuous and bounded on [b,∞), we have \({R_{i}^{(a)} (\tau_{A},d_{A})} / {A_{i}} \stackrel{A_{i} \downarrow0}{\longrightarrow} \mathbb {E}_{i} [\exp\{- H_{i}^{(a)} \}]\), and therefore (36) holds with \(\sigma_{i} =\mathbb{E}_{i}[ \exp\{- H_{i}^{(a)} \}]\).

Recall that \(\tau_{A}^{(i)}\) is the first time the process \(\Phi_{n}^{(i)}\) exceeds the threshold −logAi, and −logAi↑∞⟺Ai↓0. The following lemma shows that the convergence holds on condition that the overshoot

$$W_i(A_i) := \Phi_{\tau^{(i)}_A}^{(i)} - (-\log A_i) = \Phi_{\tau ^{(i)}_A}^{(i)} + \log A_i \geq0 $$
(53)

converges in distribution as Ai↓0 to some random variable Wi under ℙi.

Lemma 5.1

Fix \(i \in\mathcal{M}\). If j(i) is unique and the overshoot Wi(Ai) in (53) converges in distribution as Ai↓0 to some random variable Wi under ℙi, then (36) holds with \(\sigma_{i} := a_{j(i)i} \mathbb{E}_{i} [ \exp\{- W_{i} \}]\).

In Lemma 5.1 above, σi does not depend on aji for any \(j \in\mathcal{M}_{0} \setminus\{i,j(i) \}\) and therefore we see that Rji(τA,dA) is negligible compared with Rj(i)i(τA,dA) for any \(j \in\mathcal{M}_{0} \setminus\{i,j(i) \}\) for small A.

5.2 Nonlinear renewal theory and the overshoot distribution

We now show via nonlinear renewal theory that the hypothesis of Lemma 5.1 indeed holds when j(i) is unique, and we obtain the limiting distribution of the overshoot (53).

Observe that, for every \(k \in\mathcal{M}_{0} \setminus\{i\}\),

(54)
(55)

By (45) and (54), we have \(\Phi_{n}^{(i)} =\sum_{l=\theta\vee1}^{n} h_{i j(i)} (X_{l}) + \xi_{n}(i,j(i))\), where

(56)

We will take advantage of the fact that, given θ, the process \(\sum_{l=\theta\vee1}^{n} h_{i j(i)} (X_{l})\) is conditionally a random walk and ξn(i,j(i)) can be shown to be “slowly-changing”, in the sense that ξn+1(i,j(i))−ξn(i,j(i))≈0 for large n. This implies that the increments of the slowly-changing process ξn(i,j(i)) are negligible compared to those of the random walk term \(\sum_{l=\theta\vee1}^{n} h_{ij(i)}(X_{l})\) for all large n. This result can be used to obtain the overshoot distribution of the process Φ(i) at its boundary-crossing time \(\tau^{(i)}_{A}\) for small Ai by means of nonlinear renewal theory (Woodroofe 1982; Siegmund 1985). Let us first give a few definitions and state a fundamental theorem of nonlinear renewal theory.

Definition 5.2

A sequence of random variables (ξn)n≥1 is called uniformly continuous in probability (u.c.i.p.) if for every ε>0, there is δ>0 such that \(\mathbb{P}\{\max_{0\leq k\leq n\delta}|\xi_{n+k}-\xi_{n}|\geq\varepsilon\}\leq\varepsilon\) for every n≥1.

Definition 5.3

A sequence of random variables (ξn)n≥1 is said to be slowly-changing if it is u.c.i.p. and

$$ \frac{\max\{|\xi_1|,\ldots,|\xi_n| \}}{n}\mathop {{\hbox to1.5cm{\rightarrowfill }}}\limits _{n \uparrow\infty}^{\mathrm{in\ probability}} 0.$$
(57)

Remark 5.4

If a process converges a.s. to a finite random variable, then it is a slowly-changing process. Moreover, the sum of two slowly-changing processes is also a slowly-changing process.

The following theorem states that, if a process is the sum of a random walk with positive drift and a slowly-changing process, then the overshoot at the first time it exceeds some threshold has the same asymptotic distribution as that of the overshoot of the random walk, as the threshold tends to infinity.

Theorem 5.5

(Woodroofe 1982, Theorem 4.1; Siegmund 1985, Theorem 9.12)

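On some \((\Omega,\mathcal {E},\mathbb{P})\), let (Zn)n≥1 be a sequence of i.i.d. random variables with some common nonarithmetic distribution and mean \(0<\mathbb{E}Z_{1} < \infty\). Let (ξn)n≥1 be a slowly-changing process and (Zk)k≥n+1 be independent of (ξl)1≤l≤n for every n≥1. If \(\widetilde{T}_{b}:= \inf\{ n \geq1: \sum_{i=1}^{n} Z_{i} + \xi_{n} > b\}\) and \(T_{b} :=\inf\{ n \geq1: \sum_{i=1}^{n} Z_{i} > b\}\) for every b≥0, then the two overshoots have the same limiting distribution as b↑∞:

$$\lim_{b \uparrow\infty} \mathbb{P}\Biggl\{ \sum_{i=1}^{\widetilde{T}_b} Z_i + \xi_{\widetilde{T}_b} - b \leq w \Biggr\} = \lim_{b \uparrow\infty} \mathbb{P}\Biggl\{ \sum_{i=1}^{T_b} Z_i - b \leq w \Biggr\} = \frac{\int_0^w \mathbb{P}\{\sum_{i=1}^{T_0} Z_i > s \} \mathrm {d}s}{\mathbb{E} [ \sum_{i=1}^{T_0} Z_i ]},\quad0\leq w < \infty.$$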

We fix \(i \in\mathcal{M}\) and obtain the limiting distribution of the overshoot Wi(Ai) as Ai↓0 using Theorem 5.5.

Lemma 5.6

Fix \(i \in\mathcal{M}\) and t≥0. If j(i) is unique, then ξn(i,j(i)) is slowly-changing under \(\mathbb{P}_{i}^{(t)}\).

For every t≥1 and \(j(i) \in \arg \min _{j \in\mathcal{M}_{0}\setminus \{i\}} l(i,j)\), define a stopping time,

$$T_i^{(t)} := \inf\Biggl\{ n \geq t: \sum _{l=t}^{n} \log\biggl(\frac {f_i(X_l)}{f_{j(i)}(X_l)} \biggr) >0 \Biggr\},$$

and random variable \(W^{(t)}_{i}\) whose distribution is given by

$$ \mathbb{P}_i^{(t)} \bigl\{W_i^{(t)} \leq w \bigr\} = \frac{\int_0^w \mathbb{P}_i^{(t)}\{\sum_{l=t}^{T_i^{(t)}} \log\frac{f_i(X_l)}{f_{j(i)}(X_l)} >s \} \mathrm {d}s}{\mathbb{E}_i^{(t)} [ \sum_{l=t}^{T_i^{(t)}} \log \frac{f_i(X_l)}{f_{j(i)}(X_l)} ]},\quad0\leq w < \infty.$$
(58)

The next lemma follows immediately from Theorem 5.5.

Lemma 5.7

Fix \(i \in\mathcal {M}\) and t≥0. If j(i) is unique, then the overshoot Wi(Ai) converges to \(W^{(t)}_{i}\) in distribution under \(\mathbb{P}_{i}^{(t)}\) as Ai↓0.

Note that the distribution of \(W_{i}^{(t)}\) under \(\mathbb{P}^{(t)}_{i}\) is identical to that of \(W_{i}^{(0)}\) under \(\mathbb{P}^{(0)}_{i}\) for every t≥1, which leads to Lemma 5.8 below.

Lemma 5.8

Fix \(i\in\mathcal{M}\). If j(i) is unique, then as Ai↓0 the overshoot Wi(Ai) converges in distribution under ℙi to a random variable Wi whose distribution under ℙi is identical to that of \(W^{(0)}_{i}\) in (58) under \(\mathbb{P}^{(0)}_{i}\).

Finally, Lemmas 5.1 and 5.8 prove Proposition 5.9 below.

Proposition 5.9

Fix \(i \in \mathcal{M}\) and suppose j(i) is unique. Then \({R_{i}^{(a)} (\tau _{A},d_{A})} /{A_{i}} \stackrel{A_{i} \downarrow0}{\longrightarrow} a_{j(i) i} \mathbb{E}_{i} [ e^{-W_{i}}]\), where Wi is the random variable defined in Lemma 5.8. Therefore, a higher-order approximation for Problem 1 can be achieved by setting in (32)

$$\sigma_i := a_{j(i) i} \mathbb{E}_i \bigl[e^{- W_i} \bigr]. $$
(59)
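The constant in (59) can be estimated by Monte Carlo simulation. The following Python sketch does so for a plain random walk with Gaussian log-likelihood-ratio increments, which by Theorem 5.5 has the same limiting overshoot distribution as Φ(i). It is an illustration under assumptions that are not taken from the paper: scalar unit-variance Gaussian densities whose means differ by `d`, a finite threshold `b` standing in for the limit, and the hypothetical function name `mean_exp_neg_overshoot`. Multiplying the result by aj(i)i gives an estimate of σi.

```python
import math
import random


def mean_exp_neg_overshoot(d, b, runs, rng):
    """Monte Carlo estimate of E[exp(-W)] at threshold b.

    The increments log f_i(X)/f_{j(i)}(X), with X ~ N(mean_i, 1) and
    |mean_i - mean_{j(i)}| = d, are Gaussian with mean d**2 / 2 and
    standard deviation |d|. W is the overshoot of the resulting random
    walk at its first passage above b.
    """
    drift, sd = 0.5 * d * d, abs(d)
    acc = 0.0
    for _ in range(runs):
        s = 0.0
        while s <= b:  # run until the walk first exceeds the threshold
            s += rng.gauss(drift, sd)
        acc += math.exp(-(s - b))
    return acc / runs
```

A call such as `mean_exp_neg_overshoot(1.0, 10.0, 20000, random.Random(0))` returns a value in (0,1); increasing `b` checks how close the finite-threshold estimate is to its limit.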

6 Numerical examples

To assess the performance of the asymptotically optimal rule, one first needs to find, for comparison, the optimal solution. As outlined in Sect. 2, to solve the fixed-error-probability formulation optimally, one first needs to transform it into a minimum Bayes risk formulation by means of Lagrange relaxation, and then solve the latter repeatedly for different values of the Lagrange multipliers. Because this method requires extensive calculations and its details are not of primary interest in this paper, we focus on the minimum Bayes risk formulation and evaluate the performance of the strategy (τA(c),dA(c)) numerically in the i.i.d. Gaussian case described below. Its asymptotic optimality ensures that the strategy is near-optimal when the unit detection delay cost c is small. Our numerical example suggests that it is near-optimal even for moderately large values of the unit detection delay cost.

6.1 The Gaussian case

Suppose that the observations \(X_{n} = (X_{n}^{(1)},\ldots,X_{n}^{(K)})\), n≥1 form a sequence of K-tuple Gaussian random variables. Conditionally on θ and μ, they are mutually independent and have common means \((\lambda_{0}^{(1)}, \ldots,\lambda_{0}^{(K)})\) before θ and \((\lambda_{\mu}^{(1)}, \ldots,\lambda_{\mu}^{(K)})\) at and after θ and common variances (1,…,1) at all times. The Kullback-Leibler divergence between the probability density functions under μ=i and μ=j is \(q(i,j) = \frac{1}{2} \sum_{k = 1}^{K} ( \lambda_{i}^{(k)} -\lambda_{j}^{(k)} )^{2}\) for every \(i\in\mathcal{M}\), \(j\in\mathcal{M}_{0}\setminus\{i\}\). Because Conditions 4.4 and 4.7 are satisfied, Propositions 3.7 and 4.9 hold with

$$l(i,j) = \min\Biggl\{ \varrho+ \frac{1}{2} \sum_{k = 1}^K\bigl( \lambda_i^{(k)} - \lambda_0^{(k)}\bigr)^2, \frac{1}{2} \sum_{k= 1}^K\bigl( \lambda_i^{(k)} - \lambda_j^{(k)}\bigr)^2 \Biggr\}, \quad \mbox { $j \in\mathcal{M}\setminus\{i\}$,} $$
(60)

and \(l(i,0) = \varrho+ \frac{1}{2} \sum_{k = 1}^{K} (\lambda_{i}^{(k)} - \lambda_{0}^{(k)})^{2}\) for every \(i \in \mathcal{M}\).

6.2 Numerical validation of Proposition 3.7

Let M=3, K=1, p0=0, p=0.1, (ν1,ν2,ν3)=(1/3,1/3,1/3), and \((\lambda_{0}^{(1)},\lambda_{1}^{(1)},\lambda_{2}^{(1)},\lambda_{3}^{(1)}) = (0,0.2,0.3,0.8)\). The limiting values l(⋅,⋅) in (60) are reported in Table 1. Figure 2 shows sample realizations of (Λn(μ,j)/n)n≥1, j∈{0,1,2,3}∖{μ} and \((\Phi_{n}^{(\mu)}/n)_{n \geq1}\) given (a) μ=1 and θ=10, (b) μ=1 and θ=1000 and (c) μ=2 and θ=10. The sample paths in Fig. 2 are consistent with the limiting values in Table 1, as expected from Proposition 3.7. As guaranteed by Proposition 3.8, the process \((\Phi_{n}^{(i)}/n)_{n\geq1}\) converges to l(i).
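The limits in (60) for this example can be recomputed with a short script. The sketch below is hedged in one important respect: the decay rate ϱ is defined in (2), which is not restated in this section, so the code assumes ϱ=−log(1−p), the value one would expect for a geometric prior with parameter p. The function names `kl_gaussian` and `limits` are hypothetical.

```python
import math


def kl_gaussian(lam_i, lam_j):
    """q(i, j) for unit-variance Gaussian vectors with means lam_i, lam_j."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(lam_i, lam_j))


def limits(lams, rho):
    """Limits l(i, j) from (60); lams[0] is the pre-change mean vector."""
    M = len(lams) - 1
    table = {}
    for i in range(1, M + 1):
        l_i0 = rho + kl_gaussian(lams[i], lams[0])
        table[(i, 0)] = l_i0
        for j in range(1, M + 1):
            if j != i:
                table[(i, j)] = min(l_i0, kl_gaussian(lams[i], lams[j]))
    return table


p = 0.1
rho = -math.log(1.0 - p)  # ASSUMPTION: decay rate of a geometric prior, see (2)
lams = [(0.0,), (0.2,), (0.3,), (0.8,)]  # means from Sect. 6.2, K = 1
table = limits(lams, rho)
for (i, j), value in sorted(table.items()):
    print(f"l({i},{j}) = {value:.5f}")
```

The minimizer of l(i,⋅) in each row identifies j(i); the entries with j≥1 that equal the Kullback-Leibler divergence q(i,j) do not depend on the assumed value of ϱ.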

Fig. 2 Realizations of the processes (Λn(μ,j)/n)n≥1 for every j∈{0,1,2,3}∖{μ} (labeled "process j") and \((\Phi_{n}^{(\mu)}/n)_{n\geq1}\) (labeled "process phi") given that (a) μ=1, θ=10, (b) μ=1, θ=1000, and (c) μ=2, θ=10

Table 1 The limits l(i,j) of Proposition 3.7 calculated for the numerical example (\(\arg \min _{\scriptsize j \in\mathcal{M}_{0} \setminus\{i\}}l(i,j)\) values are indicated in boldface)
Table 2 Numerical comparisons of the optimal and approximate (τA(c),dA(c)) Bayes risk values

6.3 The numerical comparison of the minimum and asymptotically minimum Bayes risks

We calculate the minimum and asymptotically minimum Bayes risks for the following example. We assume that M=2, K=2, p0=0, p=0.01, (ν1,ν2)=(0.1,0.9), and the mean vectors \(\lambda_{0}=(\lambda^{(1)}_{0},\lambda^{(2)}_{0})\) and \(\lambda_{i}=(\lambda^{(1)}_{i},\lambda^{(2)}_{i})\), i=1,2 before and after the change, respectively, satisfy

$$\lambda_1^{(1)} =\lambda_0^{(1)}+1.0,\qquad\lambda_2^{(1)} =\lambda_0^{(1)}+1.0,\qquad\lambda_1^{(2)} =\lambda_0^{(2)}+0.0,\qquad\lambda_2^{(2)} =\lambda_0^{(2)}+0.5.$$
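As a quick check of which hypotheses are hardest to separate in this example, the Gaussian formula for q(i,j) in Sect. 6.1 can be evaluated by hand. The following worked computation is only a sketch; it uses nothing beyond the mean shifts above, (60), and the assumption that ϱ≥0 (the value of ϱ is fixed by (2), which is not restated here).

$$q(1,0) = \tfrac{1}{2}\bigl(1.0^2 + 0.0^2\bigr) = 0.5,\qquad q(2,0) = \tfrac{1}{2}\bigl(1.0^2 + 0.5^2\bigr) = 0.625,\qquad q(1,2) = q(2,1) = \tfrac{1}{2}\bigl(0.0^2 + 0.5^2\bigr) = 0.125.$$

Since 0.125<q(i,0)+ϱ for i=1,2, (60) gives l(1,2)=l(2,1)=0.125, while l(i,0)=q(i,0)+ϱ>0.125. Hence j(1)=2 and j(2)=1 are unique, as required for the higher-order approximation of Sect. 5.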

Table 2 compares the performances of the strategy (τA(c),dA(c)) and the optimal strategy for fixed aji=1 for every \(i \in\mathcal{M}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\) as the unit detection delay cost c decreases. The optimal stopping regions are found by the value iteration described by Dayanik et al. (2008). The Bayes risks of the strategies are estimated via Monte Carlo simulation. For accurate approximations, we used (59), and \((\sigma_{i})_{i \in \mathcal{M}}\) are computed with Monte Carlo methods.

We see that (τA(c),dA(c)) is asymptotically optimal; the ratio of the optimal and approximate Bayes risk values converges to 1 as c↓0 as listed in the last column. Moreover, the approximate and the minimum Bayes risk values are close even for large c values, and this is due to the higher-order approximation as studied in Sect. 5.