Abstract
We study the joint problem of sequential change detection and multiple hypothesis testing. Suppose that the common distribution of a sequence of i.i.d. random variables changes suddenly at some unobservable time to one of finitely many distinct alternatives, and one needs to both detect and identify the change at the earliest possible time. We propose computationally efficient sequential decision rules that are asymptotically either Bayes-optimal or optimal in a Bayesian fixed-error-probability formulation, as the unit detection delay cost or the misdiagnosis and false alarm probabilities go to zero, respectively. Numerical examples are provided to verify the asymptotic optimality and the speed of convergence.
1 Introduction
Sequential change detection and identification refers to the joint problem of sequential change point detection (CPD) and sequential multiple hypothesis testing (SMHT), where one needs to detect, based on a sequence of observations, a sudden and unobservable change as early as possible and identify its cause as accurately as possible. In a Bayesian setup, this problem boils down to optimally solving the trade-off between the expected detection delay and the false alarm and misdiagnosis costs.
Sequential analysis methods such as Wald’s (1947) sequential probability ratio test and Page’s (1954) cumulative sum procedure were developed for quality control problems, in which a production process may suddenly get out of control at some unknown and unobservable time and one needs to detect the failure time as soon as possible. However, it is more realistic to assume that a production process consists of multiple processing units, each of which is prone to failure, and one needs to detect the earliest failure time and accurately identify the failed component.
In economics and biosurveillance, elevated concerns about financial crises and bioterrorism have increased the importance of early warning systems (see Bussiere and Fratzscher 2006 and Heffernan et al. 2004); structural changes need to be detected in time series such as the S&P 500 index for better financial risk management and over-the-counter medication sales for early signs of a possible disease outbreak. There are a number of potential causes of structural changes, and one needs to identify the cause of the change in order to take the most appropriate countermeasures. Although most existing structural change detection methods employ retrospective tests on historical data, online tests are more appropriate in these settings because time-inhomogeneous data arrive sequentially, and the changes must be identified as soon as possible after they occur.
In this paper, we focus on two online Bayesian formulations and propose two computationally efficient and asymptotically optimal strategies inspired by the separate asymptotic analyses of SMHT (Baum and Veeravalli 1994; Dragalin et al. 1999; Dragalin et al. 2000) and CPD (Tartakovsky and Veeravalli 2004).
We suppose that a system starts in regime 0 and suddenly switches at some unknown and unobservable disorder time θ to one of finitely many regimes \(\mu\in\mathcal{M}:= \{1,\ldots,M \}\). One observes a sequence of random variables X=(Xn)n≥1 which are, conditionally on θ and μ, independent and distributed according to some cumulative distribution function F0 before time θ and Fμ at and after time θ; namely,
The objective is to detect the change as quickly as possible, and at the same time to identify the new regime μ as accurately as possible. More precisely, we want to find a strategy (τ,d), consisting of a pair of a detection time τ and a diagnosis rule d, in order to minimize the expected detection delay time and the false alarm and misdiagnosis probabilities. This paper studies the following formulations:
(i) In the minimum Bayes risk formulation, one minimizes a Bayes risk, which is the sum of the expected detection delay time and the false alarm and misdiagnosis probabilities.
(ii) In the Bayesian fixed-error-probability formulation, one minimizes the expected detection delay time subject to some small upper bounds on the false alarm and misdiagnosis probabilities.
The precise formulations are given as Problems 1 and 2, respectively, in Sect. 2. A majority of practitioners prefer working with the Bayesian fixed-error-probability formulation because the hard constraints on error probabilities are easier to set up and understand than the costs of detection delay, false alarm, and misdiagnosis in the minimum Bayes risk formulation. The Bayesian fixed-error-probability formulation is often solved by means of its Lagrange relaxation, which turns out to be a minimum Bayes risk problem in which the costs are the Lagrange multipliers (or shadow prices) of the false alarm and misdiagnosis constraints. We discuss the correspondence between the optimal solutions of these two formulations in more detail in Sect. 2. Another reason for solving the minimum Bayes risk formulation is that it allows expert opinions about the risks to be incorporated naturally into the solution. We therefore study both formulations in this paper.
Finding the optimal solutions under both formulations requires intensive computations. For example, the minimum Bayes risk formulation reduces to an optimal stopping problem as shown by Dayanik et al. (2008) (see also Lovejoy (1991), White (1991), Borkar (1991), and Runggaldier (1991) for general solution methods available for the partially observed Markov decision processes and Burnetas and Katehakis (1997) for adaptive control for Markov decision processes), and the optimal strategy is to stop as soon as the posterior probability process \(\Pi=(\Pi_{n}^{(0)},\ldots,\Pi_{n}^{(M)})_{n \geq0}\), where
with \(\mathcal{M}_{0} := \mathcal{M}\cup\{0\}\), enters some suitable region of the M-dimensional probability simplex.
Figure 1(a) illustrates the optimal stopping regions for a typical problem with M=2. The process Π starts in the lower-left corner, which corresponds to the “no change” state or regime 0. As observations are made, it progresses through the light-colored region, where raising a change-alarm is suboptimal. If it enters the shaded region in the top corner, then declaring a regime switch from 0 to 1 is optimal. If it enters the shaded region in the lower-right corner, then declaring a regime switch from 0 to 2 is optimal. The first hitting time to one of those shaded regions and the corresponding estimate of the new regime minimize the costs for the minimum Bayes risk formulation.
These shaded regions can in principle be found by dynamic programming methods; see, for example, Derman (1970), Puterman (1994) and Bertsekas (2005). However, those methods are generally computationally intensive due to the curse of dimensionality. The state space increases exponentially in the number of regimes, and finding an optimal strategy by using the classical dynamic programming methods tends to be practically impossible in higher dimensions.
Our goal is to obtain a practical solution that is both near-optimal and computationally feasible. We propose two simple and asymptotically optimal strategies by approximating the optimal stopping regions with simpler shapes. In particular, our strategy for the minimum Bayes risk formulation raises a change alarm and estimates the new regime when the posterior probability of at least one of the change types exceeds some predetermined threshold for the first time. In Fig. 1(b), the stopping regions of this strategy correspond to the union of the triangles in the two corners. Those triangular regions determine a stopping and selection strategy, and hence the problem is simplified to designing the triangular regions to minimize the risks.
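Such a corner-region rule can be sketched in a few lines. The sketch below (our own illustration; the function name `first_exit`, the toy posterior path, and the threshold levels are all assumptions, and Definition 3.1 below states the rule more precisely via odds-ratio processes) stops the first time the posterior of some change type i exceeds a level close to one and declares that regime:

```python
# Hedged sketch of a (tau_A, d_A)-type corner rule: raise an alarm the first
# time some change-type posterior Pi^(i) exceeds 1 - A_i, and diagnose regime i.
def first_exit(posteriors, A):
    """posteriors: iterable of vectors (pi_0, pi_1, ..., pi_M);
    A = (A_1, ..., A_M): the smaller A_i, the later the alarm for regime i."""
    for n, pi in enumerate(posteriors, start=1):
        for i, a in enumerate(A, start=1):
            if pi[i] > 1.0 - a:      # posterior of change type i is high enough
                return n, i          # alarm time and diagnosed regime
    return None, None                # no alarm within the horizon

# toy posterior path with M = 2: regime 2 gradually becomes likely
path = [(0.9, 0.05, 0.05), (0.6, 0.1, 0.3), (0.2, 0.05, 0.75), (0.05, 0.02, 0.93)]
print(first_exit(path, A=(0.1, 0.1)))  # -> (4, 2)
```

With thresholds 1 − A_i = 0.9, the rule ignores the first three posterior vectors and stops at step 4, declaring regime 2.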
We give an asymptotic analysis of the change detection and identification problem, of which SMHT and CPD are special cases. The asymptotic optimality of our strategies can be proved using nonlinear renewal theory after casting the log-likelihood-ratio (LLR) processes
as the sum of suitable random walks and some slowly-changing stochastic processes. We show that the r-quick convergence of Lai (1977) for an appropriate subset of the LLR processes in (1) is a sufficient condition for asymptotic optimality. We also pursue higher-order asymptotic approximations for the minimum Bayes risk formulation as inspired by Baum and Veeravalli (1994)’s work for SMHT.
The remainder of the paper is organized as follows. We formulate the Bayesian sequential change detection and identification problem in Sect. 2. In Sect. 3, we propose two sequential change detection and identification strategies and obtain sufficient conditions for their asymptotic optimality in terms of the LLR processes. In Sect. 4 we study certain convergence properties of the LLR processes that are required to implement the asymptotically optimal strategies. In Sect. 5, we obtain higher-order asymptotic approximations for the minimum Bayes risk formulation using nonlinear renewal theory. Section 6 concludes with numerical examples. The proofs and some auxiliary results are presented in the appendix.
2 Problem formulations
Consider a probability space \((\Omega, \mathcal{F}, \mathbb{P})\) hosting a stochastic process X=(Xn)n≥1 taking values in some measurable space \((E,\mathcal {E})\). Let θ:Ω↦{0,1,…} and \(\mu: \Omega\mapsto\mathcal{M}:= \{ 1,\dots, M \}\) be independent random variables defined on the same probability space with the probability distributions
for some known constants p0∈[0,1), p∈(0,1), and positive constants \(\nu= (\nu_{i})_{i \in\mathcal{M}}\). The random variable θ has an exponential tail with
Given μ=i and θ=t, the random variables X1,X2,… are conditionally independent, and (Xn)1≤n≤t−1 and (Xn)n≥t have common conditional probability density functions f0 and fi, respectively, with respect to some σ-finite measure m on \((E,\mathcal {E})\); namely,
for every \(i \in\mathcal{M}\), t≥1, n≥1, and \((E_{1}\times\cdots \times E_{n}) \in\mathcal{E}^{n}\). The following assumptions remove certain trivial cases; see Remark 4.10 below.
Assumption 2.1
For every \(i\in\mathcal{M}_{0}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\), 0<fi(X1)/fj(X1)<∞ a.s., and Fi and Fj are distinguishable, i.e., \(\int_{\{x\in E: f_{i}(x)\neq f_{j}(x)\}} f_{i}(x)m(\mathrm {d}x) > 0\).
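To make the observation model concrete, the following sketch simulates one path under Gaussian densities (our choice; the model allows arbitrary densities f_0, …, f_M), assuming the standard zero-modified geometric form for θ consistent with the constants p0 and p: θ = 0 with probability p0 and otherwise geometric on {1, 2, …}.

```python
# Minimal simulation of the model: X_l ~ f_0 = N(means[0], 1) before theta and
# X_l ~ f_mu = N(means[mu], 1) at and after theta. Gaussian densities and the
# zero-modified geometric prior are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def simulate_path(n, p=0.1, p0=0.0, nu=(0.5, 0.5), means=(0.0, 1.0, -1.0)):
    """Draw (theta, mu, X_1..X_n) under the assumed priors."""
    # theta = 0 with probability p0; otherwise geometric on {1, 2, ...}
    theta = 0 if rng.random() < p0 else int(rng.geometric(p))
    mu = int(rng.choice(len(nu), p=nu)) + 1     # regime in {1, ..., M}
    x = rng.normal(means[0], 1.0, size=n)
    post = np.arange(1, n + 1) >= theta         # observations at/after theta
    x[post] = rng.normal(means[mu], 1.0, size=int(post.sum()))
    return theta, mu, x

theta, mu, x = simulate_path(200)
```

Paths drawn this way can be fed to any candidate detection and identification rule to estimate its delay and error probabilities by Monte Carlo.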
Let \(\mathbb{F}= (\mathcal{F}_{n})_{n \geq0}\) denote the filtration generated by X; namely, \(\mathcal{F}_{0} = \{\varnothing, \Omega\}\) and \(\mathcal {F}_{n} =\sigma(X_{1},\dots,X_{n})\) for every n≥1. A sequential change detection and identification rule (τ,d) is a pair consisting of an \(\mathbb{F}\)-stopping time τ (in short, \(\tau\in \mathbb{F}\)) and a random variable \(d: \Omega\mapsto\mathcal{M}\) that is measurable with respect to the observation history \(\mathcal{F}_{\tau}\) up to the stopping time τ (namely, \(d \in\mathcal{F}_{\tau}\)). Let
be the collection of all sequential change detection and identification rules. The objective is to find a strategy (τ,d) that solves optimally the trade-off between the mth moment
of the detection delay time (τ−θ)+ for some m≥1 and the false alarm and misdiagnosis probabilities
Here and for the rest of the paper, x+:=max(x,0) and x−:=max(−x,0) for any x∈ℝ.
We formulate the optimal trade-offs between (3)–(5) as in the following two related problems:
Problem 1
(Minimum Bayes risk formulation)
For fixed m ≥ 1, c > 0, and strictly positive constants \(a=(a_{ji})_{i \in\mathcal{M}, j \in\mathcal{M}_{0} \setminus\{i\}}\), calculate the minimum Bayes risk \(\inf_{(\tau,d)\in\Delta} R^{(c,a,m)}(\tau,d)\), where
is the expected sum of all risks arising from the detection delay, false alarm, and misdiagnosis, and find a strategy (τ∗,d∗)∈Δ which attains the minimum Bayes risk, if such a strategy exists.
Problem 2
(Bayesian fixed-error-probability formulation)
For fixed m ≥ 1 and strictly positive constants \(\overline{R} = (\overline{R}_{ji})_{i \in\mathcal{M}, j \in \mathcal{M}_{0}\setminus\{i\}}\), calculate the smallest mth moment \(\inf_{(\tau,d) \in\Delta(\overline{R})} D^{(m)}(\tau)\) of the detection delay time among all decision rules in
which satisfy the predetermined upper bounds on the false alarm and misdiagnosis probabilities, and find a strategy \((\tau^{*},d^{*})\in \Delta(\overline{R})\) which attains the minimum, if such a strategy exists.
Problem 1 can in principle be solved optimally by stochastic dynamic programming. A standard way to solve Problem 2 optimally is by working through its Lagrange relaxation, which turns out to be an instance of Problem 1, where aji serves as the Lagrange multiplier of the constraint \(R_{ji}(\tau,d)\leq\overline{R}_{ji}\) for every \(i \in\mathcal{M}\) and \(j \in \mathcal{M}_{0}\setminus\{i\}\). Indeed, if for some a, a decision rule (τ∗,d∗)∈Δ attains the minimum Bayes risk inf(τ,d)∈ΔR(c,a,m)(τ,d) and if \(R_{ji}(\tau^{*},d^{*}) = \overline{R}_{ji}\) for every \(i \in\mathcal{M}, \, j \in\mathcal{M}_{0} \setminus\{i\}\), then for every \((\tau,d) \in\Delta(\overline{R}) \subseteq\Delta\),
implies that \(c (D^{(m)}(\tau^{*})-D^{(m)}(\tau)) \leq\sum_{i \in \mathcal{M}}\sum_{j \in\mathcal{M}_{0} \setminus\{i\}} a_{ji}(R_{ji}(\tau,d)-R_{ji}(\tau^{*},d^{*})) =\sum_{i \in\mathcal{M}} \sum _{j \in \mathcal{M}_{0} \setminus\{i\}} a_{ji} (R_{ji}(\tau,d)-\overline{R}_{ji})\leq0\), and hence, the same (τ∗,d∗) rule is also optimal for the Bayesian fixed-error-probability formulation. The asymptotically optimal decision rules proposed for Problems 1 and 2 will likewise be related.
On the one hand, a majority of practitioners favor the formulation in Problem 2 over that in Problem 1, because the hard constraints \(R_{ji}(\tau,d) \leq\overline{R}_{ji}, i \in\mathcal{M}, \, j \in \mathcal{M}_{0}\setminus\{i\}\) in Problem 2 are easier to set up and to understand than the (shadow) costs c and a of decision delay, false alarm, and misdiagnosis. On the other hand, some practitioners still find Problem 1 useful to incorporate expert opinions.
As introduced in Sect. 1, let \(\Pi=(\Pi_{n}^{(0)},\ldots,\Pi_{n}^{(M)})_{n \geq0}\) be the posterior probability process defined by
Dayanik et al. (2008) proved that Π is a Markov process satisfying
where \(\alpha_{n}^{(i)} (x_{1},\ldots,x_{n})\) equals
for every n≥1 and (x1,…,xn)∈En, and
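The one-step update of the posterior vector can be sketched as follows. The recursion below is a standard Bayes update consistent with a geometric change time (parameter p) and regime prior ν; it is our reconstruction for illustration, not a reproduction of the exact formula in Dayanik et al. (2008):

```python
# Hedged sketch of the one-step posterior update for Pi = (Pi^(0),...,Pi^(M)):
#   Pi^(0) <- (1-p) Pi^(0) f_0(x) / D,
#   Pi^(i) <- (Pi^(i) + p nu_i Pi^(0)) f_i(x) / D,  i = 1..M,
# where D normalizes the vector to sum to one (assumed form).
import numpy as np

def update(pi, x, p, nu, densities):
    """pi: current posterior (length M+1); densities: [f_0, f_1, ..., f_M]."""
    f = np.array([d(x) for d in densities])
    new = np.empty_like(pi)
    new[0] = (1.0 - p) * pi[0] * f[0]
    new[1:] = (pi[1:] + p * np.asarray(nu) * pi[0]) * f[1:]
    return new / new.sum()

# toy example with M = 1: f_0 = N(0,1), f_1 = N(2,1)
phi = lambda m: (lambda x: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi))
pi = np.array([0.95, 0.05])
pi = update(pi, x=2.1, p=0.1, nu=[1.0], densities=[phi(0.0), phi(2.0)])
print(pi)  # posterior mass shifts toward regime 1 after a large observation
```

One observation near the post-change mean already moves most of the posterior mass onto regime 1, illustrating why threshold rules on Π can detect changes quickly.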
Remark 2.2
Assumption 2.1 implies that \(0< \Pi_{n}^{(i)} <1\) a.s. for every finite n≥1 and \(i\in\mathcal{M}\).
Let us denote by \(\alpha_{n}^{(i)}\) the random variable \(\alpha_{n}^{(i)}(X_{1},\ldots,X_{n})\) for every n≥0. Then the LLR processes defined in (1) can be written as
In our analyses, it is often very convenient to work under the conditional probability measures:
defined for every \(i \in\mathcal{M}\), n≥1, \((E_{1} \times\cdots \times E_{n}) \in \mathcal {E}^{n}\). Let \(\mathbb{E}_{i}\) and \(\mathbb{E}_{i}^{(t)}\), respectively, be the expectations with respect to ℙi and \(\mathbb{P}^{(t)}_{i}\). Under \(\mathbb{P}_{i}^{(0)}\) and \(\mathbb{P}_{i}^{(\infty)}\), the random variables X1,X2,… are independent and have common probability density functions fi(⋅) and f0(⋅), respectively. We denote by ℙ(∞) any \(\mathbb {P}_{i}^{(\infty)}\) for any \(i \in\mathcal{M}\). The LLR processes in (1) or (7) play a role in changing probability measures as the next lemma shows.
Lemma 2.3
(Change of measure)
For every\(i \in\mathcal{M}\), an\(\mathbb{F}\)-stopping timeτ, and an\(\mathcal{F}_{\tau}\)-measurable eventF,
The next proposition introduces the key risk components; its proof follows directly from Lemma 2.3 after setting \(F:=\{d=i\} \in\mathcal{F}_{\tau}\) for every \(i\in\mathcal{M}\).
Proposition 2.4
For every strategy (τ,d)∈Δ, c>0, m≥1 and strictly positive constants\(a=(a_{ji})_{i\in\mathcal{M},j\in\mathcal {M}\setminus\{i\}}\), we can rewrite (4)–(6) as
where for every\(i \in\mathcal{M}\)
Here (10)–(12) correspond to the conditional risks given μ=i, written in terms of the process \(G_{i}^{(a)} (n)\), which is a linear combination of the exponents of the LLR processes and serves as the Radon-Nikodym derivative.
Remark 2.5
In the remainder, we prove a number of results in the ℙi-a.s. sense for given \(i \in\mathcal{M}\). These also hold automatically \(\mathbb{P}_{i}^{(t)}\)-a.s. for every t≥1. Indeed, because ℙ{θ<∞}=1, ℙ{θ=t}>0 for every t≥1 and \(\mathbb{P}_{i} (F) = \sum_{t=0}^{\infty}\mathbb {P}\{ \theta = t \} \mathbb{P}_{i}^{(t)} (F)\) for every \(F \in\mathcal{F}\), ℙi(F)=1 implies \(\mathbb{P}^{(t)}_{i}(F)=1\) for every t≥1.
3 Asymptotically optimal sequential detection and identification strategies
We will introduce two strategies that are computationally efficient and asymptotically optimal. The first strategy raises an alarm as soon as the posterior probability of the event that at least one of the change types has occurred exceeds some suitable threshold, and is shown to be asymptotically optimal for Problem 1. The second strategy is a variant expressed in terms of the LLR processes and is shown to be asymptotically optimal for Problem 2. The asymptotic performance analyses of both rules rely on the same convergence results for the LLR processes. The proofs for Problems 1 and 2 can therefore be conducted largely in parallel, because both detection times can be approximated by the first hitting times of processes that share the same asymptotic properties.
Definition 3.1
((τA,dA)-strategy for the minimum Bayes risk problem)
For every set \(A = (A_{i})_{i \in\mathcal{M}}\) of strictly positive constants, let (τA,dA) be the strategy defined by
Define the logarithm of the odds-ratio processes as
Then (14) can be rewritten as
The values of A determine the sizes of the polyhedrons that approximate the original optimal stopping regions, e.g., the triangular regions when M=2 as in Fig. 1(b), and need to be determined so as to minimize the Bayes risk.
Definition 3.2
((υB,dB)-strategy for the Bayesian fixed-error-probability formulation)
For every set \(B = (B_{i})_{i \in\mathcal{M}}\) and \(B_{i} = (B_{ij})_{j \in\mathcal{M}_{0} \setminus\{i\}}\), \(i \in \mathcal{M}\) of strictly positive constants, let (υB,dB) be the strategy defined by
We show that, after choosing suitable A and B, the strategy (τA,dA) is asymptotically optimal for Problem 1 as c goes to zero, and the strategy (υB,dB) is asymptotically optimal for Problem 2 as
goes to zero—while \(\overline{R}_{ji}/\overline{R}_{ki}\) for every \(j,k\in\mathcal{M}_{0}\setminus\{i\}\) remains bounded away from zero in the sense that
for any strictly positive constants \(k = (k_{i})_{i \in\mathcal {M}}\)—and this limit mode will still be denoted by “\(\|\overline{R}\|\downarrow0\)” for brevity.
More precisely, we find functions A(c) of the unit sampling cost c in Problem 1 and \(B(\overline{R})\) of the upper bounds \((\overline{R}_{ji})_{i\in\mathcal{M},j\in\mathcal {M}_{0}\setminus\{i\}}\) on the false alarm and misdiagnosis probabilities in Problem 2 so that (τA(c),dA(c))∈Δ for every c>0, \((\upsilon_{B(\overline{R})},d_{B(\overline{R})}) \in \Delta(\overline{R})\) for every \(\overline{R}>0\), and
for every fixed m≥1 and every set \(a=(a_{ji})_{i\in\mathcal {M}, j\in \mathcal{M}_{0}\setminus\{i\}}\) of strictly positive constants. Here “xγ∼yγ as γ→γ0” means \(\lim_{\gamma\rightarrow \gamma_{0}} {x_{\gamma}} / {y_{\gamma}} = 1\). In fact, we obtain results stronger than (19)–(20); for every \(i\in\mathcal{M}\)
Remark 3.3
For all \(i \in\mathcal{M}\), let \(\overline{B}_{i} := \max_{j \in \mathcal{M}_{0}\setminus\{i\}} B_{ij}\), \(\underline{B}_{i} := \min_{j \in\mathcal{M}_{0}\setminus\{i\}} B_{ij}\) and \(\Psi^{(i)}_{n} := \min_{j \in\mathcal{M}_{0}\setminus\{i\}} \Lambda_{n}(i,j)\), n≥1. Then,
where \(\underline{\upsilon}_{B}^{(i)} := \inf\{ n \geq1:\Psi^{(i)}_{n} > - \log\overline{B}_{i} \}\) and \(\overline{\upsilon}_{B}^{(i)} := \inf\{ n \geq1: \Psi^{(i)}_{n} >- \log\underline{B}_{i} \}\). Notice that (15) implies \(\Phi_{n}^{(i)} \leq\Lambda_{n}(i,j)\) for every n≥1 and \(j \in\mathcal{M}_{0} \setminus\{i\}\), and hence
3.1 Convergence of false alarm and misdiagnosis probabilities and detection delay
As c and \(\overline{R}\) decrease to zero in Problems 1 and 2, respectively, we expect that the optimal stopping regions shrink, or equivalently the values of A and B should decrease. We therefore study the asymptotic behaviors of the false alarm and misdiagnosis probabilities and the change detection time as
go to zero, and then adapt their values as functions of c and \(\overline{R}\) so as to attain asymptotically optimal strategies. Here in concordance with (18) the limits \(\overline{B}_{i} \downarrow0\) for every \(i \in\mathcal{M}\) are taken such that
We first study the asymptotic behaviors of the false alarm and misdiagnosis probabilities. The upper bounds can be obtained by a direct application of Proposition 2.4.
Proposition 3.4
(Bounds on false alarm and misdiagnosis probabilities)
(i) For every fixed \(A = (A_{i})_{i \in\mathcal{M}}\) and \(a = (a_{ji})_{i \in \mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}\), we have \(R_{i}^{(a)}(\tau_{A},d_{A}) \leq \overline{a}_{i} A_{i}\) for every \(i \in\mathcal{M}\), where \(\overline {a}_{i} :=\max_{j \in\mathcal{M}_{0}\setminus\{i\}} a_{ji}\), and Rji(τA,dA)≤νiAi≤νi∥A∥ for every \(i\in\mathcal{M}\) and \(j\in \mathcal{M}_{0}\setminus\{i\}\).
(ii) For every \(B =(B_{ij})_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}\), we have Rji(υB,dB)≤νiBij for every \(i \in\mathcal{M}\) and \(j \in \mathcal{M}_{0} \setminus\{i\}\).
Corollary 3.5
(i) \(\max_{i \in\mathcal{M}} R^{(a)}_{i} (\tau_{A},d_{A})\downarrow0\)as ∥A∥↓0, (ii) \(\max_{i\in\mathcal{M},j\in\mathcal {M}_{0} \setminus\{i\}} R_{ji}(\upsilon_{B}, d_{B}) \downarrow0\)as ∥B∥↓0.
Proposition 3.6
Fix\(i \in \mathcal{M}\). We have ℙi-a.s. (i) \(\tau_{A}^{(i)} \uparrow \infty\)asAi↓0, (ii) τA↑∞ as ∥A∥↓0, (iii) \(\upsilon_{B}^{(i)} \uparrow\infty\)as\(\overline{B}_{i}\downarrow0\), and (iv) υB↑∞ as ∥B∥↓0.
The asymptotic behavior of the detection delay is closely related to the convergence of the average increment Λn(i,j)/n. According to the next proposition, Λn(i,j)/n converges ℙi-a.s. as n↑∞ to some strictly positive constant for every \(i\in\mathcal{M}\) and \(j \in\mathcal{M}_{0}\setminus\{i\}\). The proof of Proposition 3.7 is deferred to Sect. 4, where the limiting values are analytically expressed in terms of the Kullback-Leibler divergence between the alternative probability measures.
Proposition 3.7
For every\(i \in \mathcal{M}\)and\(j \in\mathcal{M}_{0} \setminus\{i\}\), we have ℙi-a.s. Λn(i,j)/n→l(i,j) asn↑∞ for some strictly positive constantl(i,j).
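The convergence in Proposition 3.7 can be checked numerically in the simplest setting, where all observations are post-change (the SMHT case) so that Λn(i,j) is a genuine random walk. For unit-variance normals with means 1 and 0 (a choice of ours), the LLR increment is x − 0.5 in closed form and the limit is the Kullback-Leibler divergence 0.5:

```python
# Numerical check of Lambda_n(i,j)/n -> q(i,j) in the SMHT special case with
# f_i = N(1,1), f_j = N(0,1): log f_i(x)/f_j(x) = x - 0.5, so the running
# average converges a.s. to the KL divergence q(i,j) = 0.5.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(1.0, 1.0, size=n)         # X_l ~ f_i
inc = x - 0.5                            # LLR increments in closed form
avg = inc.cumsum() / np.arange(1, n + 1)
print(avg[-1])                           # close to q(i, j) = 0.5
```

The running average stabilizes near 0.5, as the strong law of large numbers predicts; in the general (p0 < 1) case the same limit emerges only after the random-walk decomposition of Sect. 4.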
Let us fix any \(i \in\mathcal{M}\). We show that, for small values of A and B, the stopping times \(\tau_{A}^{(i)}\) and \(\upsilon_{B}^{(i)}\) in (14) and (17) are essentially determined by the process Λ(i,j(i)), where
and ℙi-a.s. \(\Lambda_{n}(i,j(i))/n \approx\Phi_{n}^{(i)}/n\approx \Psi^{(i)}_{n}/n \approx l(i)\) for sufficiently large n as the next proposition suggests.
Proposition 3.8
For every\(i\in\mathcal{M}\), we have ℙi-a.s. (i) \(\Phi _{n}^{(i)}/n \rightarrow l(i)\)and (ii) \(\Psi_{n}^{(i)}/n \rightarrow l(i)\)asn↑∞.
The proof of part (i) follows from Proposition 3.7, and part (ii) follows from part (i) and Baum and Veeravalli (1994, Lemma 5.2). Proposition 3.8 implies the following convergence results.
Lemma 3.9
For every\(i \in\mathcal{M}\)and any\(j(i) \in \arg \min _{j\in \mathcal{M}_{0}\setminus \{i\}} l(i,j)\), we have ℙi-a.s.
Remark 3.10
We shall always assume that 0<Bij<1 or −∞<logBij<0 for all \(i \in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\backslash\{i\}\) as we are interested in the limits of certain quantities as ∥B∥↓0. Because (25) implies that \(b_{i} \overline{B}_{i} \leq\underline{B}_{i} \leq B_{ij} \leq \overline{B}_{i}\), we have \(1 \leq\frac{-\log B_{ij}}{-\log\overline {B}_{i}} \leq\frac{-\log \underline{B}_{i}}{-\log\overline{B}_{i}} \leq\frac{-\log(b_{i}\overline{B}_{i})}{-\log\overline{B}_{i}} \leq1+\frac{-\log b_{i}}{-\log\overline{B}_{i}}\), which implies that
where the last equality follows from the first two equalities.
Because we want to minimize the mth moment of the detection delay time for any m≥1, we will strengthen the convergence results of Lemma 3.9. Condition 3.11 below, for some r≥m, is both necessary and sufficient for the corresponding Lm-convergences.
Condition 3.11
(Uniform Integrability)
For some r≥m,
(i) the family \(\{(\tau_{A}^{(i)}/(-\log A_{i}))^{r} \}_{A_{i} >0}\) is ℙi-uniformly integrable for every \(i \in\mathcal{M}\),
(ii) the family \(\{(\upsilon_{B}^{(i)}/(-\log B_{ij(i)}))^{r} \}_{B_{i} > 0}\) is ℙi-uniformly integrable for every \(i \in\mathcal{M}\).
Lemma 3.12
Let m ≥ 1 be any integer.
(i) Condition 3.11 (i) holds for some r ≥ m if and only if \(\mathbb{E}_{i}[(\tau^{(i)}_{A})^{m}]<\infty\) for every Ai>0 and (28) holds.
(ii) Condition 3.11 (ii) holds for some r ≥ m if and only if \(\mathbb{E}_{i}[(\upsilon^{(i)}_{B})^{m}]<\infty\) for every Bi>0 and (29) holds,
where the limits \(\overline{B}_{i} \downarrow0\) for all \(i \in\mathcal{M}\) are taken such that (25) is satisfied.
The proof of Lemma 3.12 follows from Lemma 3.9, Chung (2001, Theorem 4.5.4), and Gut (2005, Theorem 5.2), together with the inequalities \(\tau_{A}^{(i)}-\theta\leq(\tau_{A}^{(i)}-\theta)_{+} \leq \tau_{A}^{(i)}\) and \(\upsilon_{B}^{(i)}-\theta\leq (\upsilon_{B}^{(i)}-\theta)_{+} \leq\upsilon_{B}^{(i)}\). Using renewal theory, one can show that Condition 3.11 holds if Λn(i,j)=Y1+⋯+Yn is a random walk for some sequence (Yn)n≥1 of i.i.d. random variables with \(\mathbb{E}Y_{1} > 0\) and \(\mathbb{E}[(Y_{1})^{r}_{-}] < \infty\); see Lai (1975). In the case of SMHT, Λn(i,j) is indeed a random walk with positive drift for every \(i\in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\); see Baum and Veeravalli (1994).
Condition 3.11 is often hard to verify. An alternative sufficient condition can be given in terms of the r-quick convergence. The r-quick convergence of suitable stochastic processes is known to be sufficient for the asymptotic optimalities of certain sequential rules based on non-i.i.d. observations in CPD and SMHT problems. We will show that the r-quick convergence of the LLR processes is also sufficient for the joint sequential change detection and identification problem.
Definition 3.13
(The r-quick convergence)
Let (ξn)n≥0 be any stochastic process and r>0. Then r-quick-lim infn→∞ξn≥c if and only if \(\mathbb{E}[ (T_{\delta})^{r} ] < \infty\) for every δ>0, where \(T_{\delta} := \sup\{ n \geq1: \xi_{n} < c - \delta\}\) (with sup∅ := 0) is the last time the process falls more than δ below c.
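In Definition 3.13, following Lai (1977), T_δ is the last time ξn falls more than δ below the limit c (and 0 if it never does). The sketch below computes T_δ on one simulated path of sample means (the Gaussian model and the parameter values are illustrative assumptions):

```python
# Illustration of the quantity T_delta in the r-quick convergence definition:
# for sample means xi_n of i.i.d. N(1,1) variables, xi_n -> 1 a.s., and
# T_delta is the LAST n at which xi_n < 1 - delta. r-quick convergence asks
# that E[T_delta^r] < infinity for every delta > 0; here we compute T_delta
# on a single path.
import numpy as np

rng = np.random.default_rng(2)
n, delta = 10_000, 0.1
xi = rng.normal(1.0, 1.0, size=n).cumsum() / np.arange(1, n + 1)  # sample means
below = np.flatnonzero(xi < 1.0 - delta)
T_delta = int(below[-1]) + 1 if below.size else 0  # last time xi_n < 1 - delta
print(T_delta)
```

Averaging T_delta**r over many independent paths would give a Monte Carlo estimate of E[T_δ^r], the quantity whose finiteness Condition 3.14 requires.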
According to Proposition 3.15, stated below and proved in the appendix, Condition 3.11 holds if \((\Phi_{n}^{(i)}/n)_{n \geq1}\) and \((\Psi_{n}^{(i)}/n)_{n \geq 1}\) converge r-quickly to l(i) under ℙi for every \(i\in\mathcal{M}\), which we put together as a different condition:
Condition 3.14
For some r≥1, (i) r-quick-liminf_{n↑∞} \(\Phi_{n}^{(i)}/n \geq l(i)\) under ℙi, and (ii) r-quick-liminf_{n↑∞} \(\Psi_{n}^{(i)}/n \geq l(i)\) under ℙi, for every \(i\in\mathcal{M}\).
Proposition 3.15
Letm≥1. (i) If Condition 3.14 (i) holds for somer≥m, then (28) and Condition 3.11 (i) hold. (ii) If Condition 3.14 (ii) holds for somer≥m, then (29) and Condition 3.11 (ii) hold.
Remark 3.16
Condition 3.14 (i) implies (ii) by (24). Moreover, Condition 3.14 holds if r-quick-lim infn↑∞(Λn(i,j)/n)≥l(i,j) under ℙi for every \(i\in \mathcal{M}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\).
3.2 Asymptotic optimality
We now prove the asymptotic optimalities of (τA,dA) and (υB,dB) for Problems 1 and 2 under Condition 3.11 (i) and (ii), respectively.
We first derive a lower bound on the expected detection delay under the optimal strategy; it can be obtained similarly to the CPD and SMHT cases, see Baum and Veeravalli (1994), Dragalin et al. (1999), Dragalin et al. (2000), Lai (2000), Tartakovsky and Veeravalli (2004) and Baron and Tartakovsky (2006). This lower bound and Lemma 3.12 can be combined to obtain asymptotic optimality for both problems.
Lemma 3.17
For every\(i \in\mathcal{M}\), we have
We now study how to set A in terms of c in order to achieve asymptotic optimality in Problem 1. We see from Proposition 3.4 and Lemma 3.12 that the false alarm and misdiagnosis probabilities decrease faster than the expected delay time and are negligible when A and B are small. Indeed, we have, in view of the definition of the Bayes risk in (10), by Proposition 3.4 and Lemma 3.12, for any \(0 < \sigma_{i} < \overline {a}_{i}\) for every \(i \in\mathcal{M}\),
This motivates us to choose the value of Ai such that it minimizes
over x∈(0,∞). Hence let
For example, Ai(c)=c/(σil(i)) when m=1. It can be easily verified that for every m≥1 we have \(A_{i}(c) \stackrel{c \downarrow0}{\longrightarrow} 0\) in such a way that logAi(c)∼logc as c↓0. Hence we have
Consequently, it is sufficient to show that
The proof of the asymptotic optimality below is similar to that of Theorem 3.1 in Baron and Tartakovsky (2006) for CPD.
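The threshold choice A_i(c) can be sanity-checked numerically. The displayed objective above is not reproduced here; based on the m = 1 special case A_i(c) = c/(σ_i l(i)), a plausible form is g(x) = σ_i x + c(−log x / l(i))^m (our reconstruction, not the paper's exact expression), whose minimizer for m = 1 indeed matches c/(σ_i l(i)):

```python
# Grid-search check that the minimizer of the assumed objective
# g(x) = sigma_i * x + c * (-log x / l_i)**m matches A_i(c) = c/(sigma_i l_i)
# when m = 1. The objective's form is an assumption for illustration.
import numpy as np

sigma_i, l_i, c, m = 0.5, 1.0, 1e-4, 1
x = np.linspace(1e-8, 1e-2, 2_000_000)
g = sigma_i * x + c * (-np.log(x) / l_i) ** m
x_star = x[np.argmin(g)]
print(x_star, c / (sigma_i * l_i))  # both approximately 2e-4
```

For m > 1 the minimizer has no closed form of this simple type, but the same grid search applies; only the logarithmic order log A_i(c) ∼ log c matters for the asymptotics.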
Proposition 3.18
(Asymptotic optimality of (τA,dA) in Problem 1)
Fixm≥1 and a set of strictly positive constantsa. Under Conditions 3.11 (i) or 3.14 (i) for the givenm, the strategy (τA(c),dA(c)) is asymptotically optimal asc↓0; that is (21) holds for every\(i\in\mathcal{M}\).
It should be remarked here that the asymptotic optimality results hold for any \(0 < \sigma_{i} < \overline{a}_{i}\). However, for higher-order approximations, it is ideal to choose \(\sigma_{i}\) such that
In Sect. 5, we achieve this value using nonlinear renewal theory.
We now show that (υB,dB) is asymptotically optimal for Problem 2. By Proposition 3.4, if we set
then we have \((\upsilon_{B(\overline{R})}, d_{B(\overline{R})}) \in \Delta(\overline{R})\) for every fixed positive constants \(\overline{R} = (\overline{R}_{ji})_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}\). By Lemma 3.12 (ii), \(\upsilon_{B(\overline{R})} \leq\upsilon^{(i)}_{B(\overline{R})}\) and because \(\overline{R}_{i} \downarrow0\) is equivalent to \(B_{ij(i)} (\overline{R})\downarrow0\),
This together with Lemma 3.17 shows the asymptotic optimality.
Proposition 3.19
(Asymptotic optimality of (υB,dB) in Problem 2)
Fixm≥1. Under Conditions 3.11 (ii) or 3.14 (ii) for the givenm, the strategy\((\upsilon_{B(\overline{R})},d_{B(\overline{R})})\)is asymptotically optimal as\(\|\overline{R}\| \downarrow0\), i.e., (22) holds for every\(i \in\mathcal{M}\).
4 The convergence results of the LLR processes
In this section, we will prove Proposition 3.7 and obtain the limits l(i,j) for every \(i\in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\), which can be expressed in terms of the Kullback-Leibler divergence of the pre- and post-change probability density functions and the exponential decay rate ϱ in (2) of the disorder time probability distribution. Under some mild condition, we show that the convergence also holds in Lr for every r≥1.
Let us denote the Kullback-Leibler divergence of fi from fj by
which always exists and is non-negative. Furthermore, Assumption 2.1 ensures that
To ensure that \(\mathbb{E}_{i}^{(0)} [ \log(f_{0}(X_{1})/f_{j}(X_{1})) ]\) exists for every \(i\in\mathcal{M}\), \(j \in\mathcal{M}_{0} \setminus\{i\}\), we assume the following.
Assumption 4.1
For every \(i \in\mathcal {M}\), we assume that q(i,0)<∞.
Since \(\mathbb{E}^{(0)}_{i}[(\log (f_{i}(X_{1})/f_{j}(X_{1})))_{-}] \leq1\) for every \(i\in\mathcal{M}\), \(j\in \mathcal{M}_{0}\setminus\{i\}\), Assumption 4.1 guarantees the existence of
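The limits l(i,j) are built from Kullback-Leibler divergences q(i,j) = E_i[log f_i(X)/f_j(X)]. As a quick numerical illustration (the Gaussian densities and means are our own choice), a Monte Carlo estimate can be checked against the closed form q = (m_i − m_j)²/2 for unit-variance normals:

```python
# Monte Carlo estimate of the KL divergence q(i,j) between N(m_i,1) and
# N(m_j,1), checked against the closed form (m_i - m_j)**2 / 2.
import numpy as np

rng = np.random.default_rng(3)

def kl_mc(m_i, m_j, n=500_000):
    """Estimate E_i[log f_i(X)/f_j(X)] by sampling X ~ f_i = N(m_i, 1)."""
    x = rng.normal(m_i, 1.0, size=n)
    return float(np.mean(-0.5 * (x - m_i) ** 2 + 0.5 * (x - m_j) ** 2))

q = kl_mc(1.0, -1.0)
print(q)  # closed form: (1 - (-1))**2 / 2 = 2.0
```

The same Monte Carlo recipe applies to any pair of densities f_i, f_j for which the log-likelihood ratio can be evaluated, which is how the constants l(i,j) would be computed in practice when no closed form exists.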
4.1 Decomposition of the LLR processes
We will decompose each LLR process (1) into some random walk with a positive drift and some stochastic process whose running average increment vanishes in the limit. In the SMHT case (namely, when p0=1), for every \(i\in\mathcal{M}\) and \(j \in \mathcal{M}\setminus\{i\}\),
is a ℙi-random walk. Its running average increment Λn(i,j)/n converges ℙi-a.s. to the Kullback-Leibler divergence q(i,j) as n↑∞ by the strong law of large numbers (SLLN). Although \((\Lambda(i,j))_{j \in\mathcal{M}_{0}\setminus\{i\}}\), for p0≠1, are not ℙi-random walks, this observation nonetheless motivates us to approximate them by some random walks. Let
We show that Λ(i,j) can be approximated by a random walk with drift q(i,j)>0 if j∈Γi and with q(i,0)+ϱ>0 otherwise; namely, with drift min(q(i,j),q(i,0)+ϱ) if \(j\in\mathcal{M}\setminus\{i\}\) and q(i,0)+ϱ if j=0. Define
for every n≥1 and \(j\in\mathcal{M}_{0}\). Then it can be checked easily that, for any \(j \in\mathcal{M}_{0}\setminus\{i\}\), we have
By (7), after taking logarithms on both sides, each LLR process can be written as
where
Moreover, \(\sum_{l=1}^{n} h_{ij}(X_{l})\) can be split into post- and pre-change terms, and we have
for every fixed \(j \in\mathcal{M}_{0} \setminus\{i\}\). Notice that the first term in (45) is conditionally a random walk under \(\mathbb{P}_{i}^{(t)}\) given θ=t for every t≥0.
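In the SMHT case the decomposition is immediate: Λn(i,j) is itself a ℙi-random walk, and Λn(i,j)/n → q(i,j) by the SLLN. The following minimal simulation illustrates this convergence; the unit-variance Gaussian densities and the means are a hypothetical choice for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

lam_i, lam_j = 0.8, 0.3            # hypothetical unit-variance Gaussian means
q_ij = 0.5 * (lam_i - lam_j) ** 2  # Kullback-Leibler drift q(i, j)

n = 200_000
x = rng.normal(lam_i, 1.0, size=n)  # observations under P_i, change in effect from time 1
# Increment log f_i(X_l) - log f_j(X_l) for N(m, 1) densities:
increments = (lam_i - lam_j) * x - (lam_i ** 2 - lam_j ** 2) / 2
running_avg = np.cumsum(increments) / np.arange(1, n + 1)

# SLLN: the running average Lambda_n(i, j)/n tends to q(i, j).
assert abs(running_avg[-1] - q_ij) < 0.01
```

Plotting `running_avg` reproduces qualitatively the sample paths shown later in Fig. 2 for the general (p0≠1) case.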
4.2 The convergence of the LLR processes
Fix \(i \in\mathcal{M}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\). In view of (42), we can explore the convergence of \({(\sum _{l=1}^{n} h_{ij}(X_{l}))}/n\) and of ϵn(i,j)/n separately. For the first term, notice that
Because θ is an a.s. finite random variable, the first term on the right-hand side converges \(\mathbb{P}^{(t)}_{i}\)-a.s. to
by the SLLN, while the second term converges to zero. Then Remark 2.5 implies Lemma 4.2, and, under some mild additional conditions, Lemma 4.3 below.
Lemma 4.2
For every \(i \in\mathcal{M}\) and \(j\in\mathcal{M}_{0} \setminus\{i\}\), we have \((1/n) \sum_{l=1}^{n} h_{ij}(X_{l}) \xrightarrow[n \uparrow\infty]{\mathbb{P}_{i}\text{-a.s.}} l(i,j)\).
Lemma 4.3
For every \(i \in\mathcal{M}\), \(j\in\mathcal{M}_{0} \setminus\{i\}\) and r≥1, we have \((1/n)\sum_{l=1}^{n} h_{ij}(X_{l}) \xrightarrow[n \uparrow\infty]{L^{r}(\mathbb{P}_{i})} l(i,j)\), if
Note that (47) holds if and only if the following condition holds.
Condition 4.4
For every \(i \in \mathcal{M}\), \(j\in\mathcal{M}_{0}\setminus\{i\}\), and r≥1, suppose that
We now show that ϵn(i,j)/n converges ℙi-a.s. to zero. The convergence result holds in Lr(ℙi) as well for r≥1 under a mild condition. To show this, we first determine the limits of \((L_{n}^{(\cdot)}/n)_{n \geq1}\) and \((K_{n}^{(\cdot)}/n)_{n \geq1}\) as n↑∞ under ℙi.
Lemma 4.5
For every\(i \in\mathcal{M}\), we have the followings under ℙi.
(i) \(L_{n}^{(i)}/n \stackrel{n \uparrow\infty}{\longrightarrow} 0\) a.s.
(ii) \(L_{n}^{(j)}/n \stackrel{n \uparrow\infty}{\longrightarrow}[q(i,j)-q(i,0)-\varrho]_{+}\) a.s. for every \(j \in\mathcal{M}\setminus\{i\}\).
(iii) \(K_{n}^{(j)}/n \stackrel{n \uparrow\infty}{\longrightarrow}[q(i,j)-q(i,0)-\varrho]_{-}\) a.s. for every \(j \in\mathcal{M}\setminus\{i\}\).
(iv) \(L_{n}^{(i)}\) converges a.s. as n↑∞ to a finite random variable \(L_{\infty}^{(i)}\).
(v) \(L_{n}^{(j)}\) converges a.s. as n↑∞ to a finite random variable \(L_{\infty}^{(j)}\) for every j∈Γi.
(vi) For every \(j\in\mathcal{M}\), \((|L^{(j)}_{n}/n|^{r})_{n\geq 1}\) is uniformly integrable for every r≥1, if
$$ \mathbb{E}^{(\infty)}\bigl[{f_0(X_1)}/ {f_j(X_1)}\bigr] <\infty\quad \mbox {and} \quad\mathbb{E}_i^{(0)} \bigl[{f_0(X_1)} / {f_j(X_1)}\bigr] <\infty.$$(48)
(vii) For every \(j\in\mathcal{M}\), \((|K^{(j)}_{n}/n|^{q})_{n\geq 1}\) is uniformly integrable for every 0≤q≤r, if (48) holds and
$$ \mathbb{E}^{(\infty)}\biggl \vert \log \frac{f_j(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty\quad \mbox {and}\quad \mathbb{E}_i^{(0)} \biggl \vert \log\frac {f_j(X_1)}{f_0(X_1)}\biggr \vert ^r <\infty, \quad\mathit{for\ some\ } r \geq1.$$(49)
Notice in Lemma 4.5 (vi) that in order for \(L_{n}^{(i)}\) to converge in Lr under ℙi to zero, it is sufficient to have
because \(\mathbb{E}_{i}^{(0)} [ f_{0} (X_{1}) / f_{i}(X_{1}) ] = \int_{E} f_{0}(x)m(\mathrm {d}x) = 1 < \infty\). The characterization of ϵn(i,j) in (44) leads to the next convergence result.
Lemma 4.6
For every \(i \in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\), we have ϵn(i,j)/n→0 as n↑∞, ℙi-a.s.
Moreover, the convergence also holds in Lr(ℙi) for a given r≥1 under the following condition.
Condition 4.7
Given \(i\in\mathcal{M}\), \(j\in\mathcal{M}_{0}\setminus\{i\}\) and r≥1, we suppose that (50) holds and that either (i) j∈Γi and (48) holds, or (ii) j∉Γi or j=0, and (49) holds for the given r.
Lemma 4.8
Fix \(i \in\mathcal{M}\), \(j\in\mathcal{M}_{0}\setminus\{i\}\) and r≥1. Under Condition 4.7, ϵn(i,j)/n→0 as n↑∞ in Lr(ℙi).
Combining Lemmas 4.5 and 4.6 shows that Proposition 3.7 indeed holds with l(⋅,⋅) as defined in (46). Moreover, the following convergence result holds by Lemmas 4.5 and 4.8.
Proposition 4.9
For every \(i\in\mathcal{M}\) and \(j\in\mathcal{M}_{0}\setminus\{i\}\), we have Λn(i,j)/n→l(i,j) as n↑∞ in Lr(ℙi) for some r≥1 if Conditions 4.4 and 4.7 hold for the given r.
Remark 4.10
(i) Observe from (46) that we have l(i,j)≤l(i,0) for every \(i \in\mathcal{M}\) and \(j \in \mathcal{M}_{0}\setminus\{i\}\), and the equality holds if and only if \(j \in \mathcal{M}_{0} \setminus(\Gamma_{i} \cup\{i\})\).
(ii) Because q(i,j)=0 if and only if \(\int_{\{x\in E: f_{i}(x) \neq f_{j}(x)\}}f_{i}(x)m(\mathrm {d}x)=0\), Assumption 2.1 guarantees that l(i,j)>0 for every \(i \in\mathcal{M}\) and \(j \in\mathcal{M}_{0}\setminus\{i\}\).
(iii) We later assume, in Sect. 5 below for higher-order approximations, that there is a unique \(j(i)\in \mathcal{M}_{0}\setminus\{i\}\) such that \(l(i) = l(i,j(i)) = \min_{j \in\mathcal{M}_{0} \setminus\{i\}}l(i,j)\) for every \(i\in\mathcal{M}\). Then (i) implies l(i)<l(i,0) and q(i,j(i))<q(i,0)+ϱ, so that j(i)∈Γi and, in particular, Γi≠∅.
Remark 4.11
We have proved a number of results on the convergence of the LLR processes; however, those results do not guarantee their r-quick convergence. A sufficient condition, derived by means of Jensen's inequality, can be found in our technical report (Dayanik et al. 2011).
5 Higher-order approximations
In this section, we derive a higher-order asymptotic approximation for the minimum Bayes risk in Problem 1 by choosing the values of σ in (31) as discussed in the previous section. Proposition 3.4 (i) gives an upper bound on \((R_{i}^{(a)} (\cdot,\cdot))_{i \in\mathcal{M}}\), and here we investigate whether there exists some σ such that (36) holds.
5.1 Asymptotic behaviors of the false alarm and misdiagnosis probabilities
Fix \(i \in\mathcal{M}\). By (12) and because \(\tau_{A} =\tau^{(i)}_{A}\) on {dA=i,θ≤τA<∞}, we have
Suppose that \(H_{i}^{(a)}(A_{i})\) is bounded from below by some constant b and \(H_{i}^{(a)}(A_{i})\) converges as Ai↓0 in distribution to some random variable \(H_{i}^{(a)}\) under ℙi. Then, because x↦e−x is continuous and bounded on [b,∞), we have \({R_{i}^{(a)} (\tau_{A},d_{A})} / {A_{i}} \stackrel{A_{i} \downarrow0}{\longrightarrow} \mathbb {E}_{i} [\exp\{- H_{i}^{(a)} \}]\), and therefore (36) holds with \(\sigma_{i} =\mathbb{E}_{i}[ \exp\{- H_{i}^{(a)} \}]\).
Recall that \(\tau_{A}^{(i)}\) is the first time the process \(\Phi_{n}^{(i)}\) exceeds the threshold −logAi, and −logAi↑∞⟺Ai↓0. The following lemma shows that the convergence holds on condition that the overshoot
converges in distribution as Ai↓0 to some random variable Wi under ℙi.
Lemma 5.1
Fix \(i \in\mathcal{M}\). If j(i) is unique and the overshoot Wi(Ai) in (53) converges in distribution as Ai↓0 to some random variable Wi under ℙi, then (36) holds with \(\sigma_{i} := a_{j(i)i} \mathbb{E}_{i} [ \exp\{- W_{i} \}]\).
In Lemma 5.1 above, σi does not depend on aji for any \(j \in\mathcal{M}_{0} \setminus\{i,j(i) \}\); therefore, Rji(τA,dA) is negligible compared with Rj(i)i(τA,dA) for every \(j \in\mathcal{M}_{0} \setminus\{i,j(i) \}\) when A is small.
5.2 Nonlinear renewal theory and the overshoot distribution
We now show, via nonlinear renewal theory, that the hypothesis of Lemma 5.1 indeed holds when j(i) is unique, and we obtain the limiting distribution of the overshoot (53).
Observe that, for every \(k \in\mathcal{M}_{0} \setminus\{i\}\),
By (45) and (54), we have \(\Phi_{n}^{(i)} =\sum_{l=\theta\vee1}^{n} h_{i j(i)} (X_{l}) + \xi_{n}(i,j(i))\), where
We will take advantage of the fact that, given θ, the process \(\sum_{l=\theta\vee1}^{n} h_{i j(i)} (X_{l})\) is conditionally a random walk and ξn(i,j(i)) can be shown to be “slowly-changing”, in the sense that ξn+1(i,j(i))−ξn(i,j(i))≈0 for large n. This implies that the increments of the slowly-changing process ξn(i,j(i)) are negligible compared to those of the random walk term \(\sum_{l=\theta\vee1}^{n} h_{ij(i)}(X_{l})\) at every large n. This result can be used to obtain the overshoot distribution of the process Φ(i) at its boundary-crossing time \(\tau^{(i)}_{A}\) for small Ai by means of nonlinear renewal theory (Woodroofe 1982; Siegmund 1985). Let us first give a few definitions and state a fundamental theorem of nonlinear renewal theory.
Definition 5.2
A sequence of random variables (ξn)n≥1 is called uniformly continuous in probability (u.c.i.p.) if for every ε>0, there is δ>0 such that ℙ{max0≤k≤nδ|ξn+k−ξn|≥ε}≤ε for every n≥1.
Definition 5.3
A sequence of random variables (ξn)n≥1 is said to be slowly-changing if it is u.c.i.p. and \(n^{-1} \max_{1 \leq k \leq n} |\xi_{k}| \rightarrow0\) in probability as n↑∞.
Remark 5.4
If a process converges a.s. to a finite random variable, then it is a slowly-changing process. Moreover, the sum of two slowly-changing processes is also a slowly-changing process.
The following theorem states that, if a process is the sum of a random walk with positive drift and a slowly-changing process, then the overshoot at the first time it exceeds some threshold has the same asymptotic distribution as that of the overshoot of the random walk, as the threshold tends to infinity.
Theorem 5.5
(Woodroofe 1982, Theorem 4.1; Siegmund 1985, Theorem 9.12)
On some \((\Omega,\mathcal{E},\mathbb{P})\), let (Zn)n≥1 be a sequence of i.i.d. random variables with a common nonarithmetic distribution and mean \(0<\mathbb{E}Z_{1} < \infty\). Let (ξn)n≥1 be a slowly-changing process such that (Zk)k≥n+1 is independent of (ξl)1≤l≤n for every n≥1. Let \(\widetilde{T}_{b}:= \inf\{ n \geq1: \sum_{i=1}^{n} Z_{i} - \xi_{n} > b\}\) and \(T_{b} :=\inf\{ n \geq1: \sum_{i=1}^{n} Z_{i} > b\}\) for every b≥0. Then, as b↑∞, the overshoot \(\sum_{i=1}^{\widetilde{T}_{b}} Z_{i} - \xi_{\widetilde{T}_{b}} - b\) converges in distribution to the same limit as the overshoot \(\sum_{i=1}^{T_{b}} Z_{i} - b\) of the random walk.
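Theorem 5.5 is easy to see in simulation. The sketch below compares the overshoots of a Gaussian random walk and of the same walk perturbed by \(\xi_{n} = \sum_{l \leq n} l^{-2}\), which converges a.s. and is therefore slowly-changing by Remark 5.4; the drift 0.5, the threshold, and the perturbation are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def first_overshoot(b, perturb, rng, mu=0.5, block=512):
    """Overshoot over level b at the first passage time of S_n - xi_n, where
    S_n is a Gaussian random walk with drift mu and, if perturb is set,
    xi_n = sum_{l<=n} 1/l^2 is a slowly-changing (a.s. convergent) process."""
    z = rng.normal(mu, 1.0, size=block)
    s = np.cumsum(z)
    xi = np.cumsum(1.0 / np.arange(1, block + 1) ** 2) if perturb else 0.0
    path = s - xi
    idx = np.argmax(path > b)  # first crossing index (b << mu * block, so a crossing occurs)
    return path[idx] - b

b, reps = 30.0, 5000
over_tilde = [first_overshoot(b, True, rng) for _ in range(reps)]   # perturbed walk, T~_b
over_plain = [first_overshoot(b, False, rng) for _ in range(reps)]  # pure random walk, T_b

# Theorem 5.5: both overshoots share the same limiting distribution as b grows,
# so their sample means should be close already at a moderate threshold.
assert abs(np.mean(over_tilde) - np.mean(over_plain)) < 0.15
```

A two-sample comparison of the full empirical distributions (e.g. a Q-Q plot) shows the same agreement, not just the means.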
We fix \(i \in\mathcal{M}\) and obtain the limiting distribution of the overshoot Wi(Ai) as Ai↓0 using Theorem 5.5.
Lemma 5.6
Fix \(i \in\mathcal{M}\) and t≥0. If j(i) is unique, then ξn(i,j(i)) is slowly-changing under \(\mathbb{P}_{i}^{(t)}\).
For every t≥1 and \(j(i) \in \arg \min _{j \in\mathcal{M}_{0}\setminus \{i\}} l(i,j)\), define a stopping time,
and random variable \(W^{(t)}_{i}\) whose distribution is given by
The next lemma follows immediately from Theorem 5.5.
Lemma 5.7
Fix \(i \in\mathcal{M}\) and t≥0. If j(i) is unique, then the overshoot Wi(Ai) converges to \(W^{(t)}_{i}\) in distribution under \(\mathbb{P}_{i}^{(t)}\) as Ai↓0.
Note that the distribution of \(W_{i}^{(t)}\) under \(\mathbb{P}^{(t)}_{i}\) is identical to that of \(W_{i}^{(0)}\) under \(\mathbb{P}^{(0)}_{i}\) for every t≥1, which leads to Lemma 5.8 below.
Lemma 5.8
Fix \(i\in\mathcal{M}\). If j(i) is unique, then as Ai↓0 the overshoot Wi(Ai) converges in distribution under ℙi to a random variable Wi whose distribution under ℙi is identical to that of \(W^{(0)}_{i}\) in (58) under \(\mathbb{P}^{(0)}_{i}\).
Finally, Lemmas 5.1 and 5.8 prove Proposition 5.9 below.
Proposition 5.9
Fix \(i \in\mathcal{M}\) and suppose j(i) is unique. Then \({R_{i}^{(a)} (\tau _{A},d_{A})} /{A_{i}} \stackrel{A_{i} \downarrow0}{\longrightarrow} a_{j(i) i} \mathbb{E}_{i} [ e^{-W_{i}}]\), where Wi is the random variable defined in Lemma 5.8. Therefore, a higher-order approximation for Problem 1 can be achieved by setting in (32)
6 Numerical examples
To assess the performance of the asymptotically optimal rule, one first needs to find the optimal solution for comparison. As outlined in Sect. 2, in order to solve the fixed-error-probability formulation optimally, one first transforms it into a minimum Bayes risk formulation by means of Lagrange relaxation, and then solves the latter repeatedly for different values of the Lagrange multipliers. Because this method requires extensive calculations and its details are not of primary interest in this paper, we focus on the minimum Bayes risk formulation and evaluate the performance of the strategy (τA(c),dA(c)) numerically in the i.i.d. Gaussian case described below. Its asymptotic optimality ensures that the strategy is near-optimal when the unit detection delay cost c is small. Our numerical example suggests that it remains near-optimal even for moderately large values of c.
6.1 The Gaussian case
Suppose that the observations \(X_{n} = (X_{n}^{(1)},\ldots,X_{n}^{(K)})\), n≥1 form a sequence of K-dimensional Gaussian random vectors. Conditionally on θ and μ, they are mutually independent and have common means \((\lambda_{0}^{(1)}, \ldots,\lambda_{0}^{(K)})\) before θ and \((\lambda_{\mu}^{(1)}, \ldots,\lambda_{\mu}^{(K)})\) at and after θ, and common variances (1,…,1) at all times. The Kullback-Leibler divergence between the probability density functions under μ=i and μ=j is \(q(i,j) = \frac{1}{2} \sum_{k = 1}^{K} ( \lambda_{i}^{(k)} -\lambda_{j}^{(k)} )^{2}\) for every \(i\in\mathcal{M}\), \(j\in\mathcal{M}_{0}\setminus\{i\}\). Because Conditions 4.4 and 4.7 are satisfied, Propositions 3.7 and 4.9 hold with
and \(l(i,0) = \varrho+ \frac{1}{2} \sum_{k = 1}^{K} (\lambda_{i}^{(k)} - \lambda_{0}^{(k)})^{2}\) for every \(i \in \mathcal{M}\).
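For this Gaussian model, the limits in (60) reduce to elementary arithmetic, which can be recomputed directly. The sketch below uses the K=1 parameters of Sect. 6.2; reading the exponential decay rate in (2) of the geometric disorder time as ϱ = −log(1−p) is our assumption.

```python
import numpy as np

# Parameters of Sect. 6.2 (K = 1, M = 3); varrho = -log(1 - p) is our reading
# of the decay rate (2) for a geometric disorder time with parameter p.
p = 0.1
varrho = -np.log(1.0 - p)
lam = [0.0, 0.2, 0.3, 0.8]  # (lambda_0, lambda_1, lambda_2, lambda_3)

def q(i, j):
    """Kullback-Leibler divergence between unit-variance Gaussian densities."""
    return 0.5 * (lam[i] - lam[j]) ** 2

M = 3
for i in range(1, M + 1):
    l_i0 = varrho + q(i, 0)                        # l(i, 0) as given above
    for j in range(1, M + 1):
        if j != i:
            l_ij = min(q(i, j), q(i, 0) + varrho)  # limit l(i, j) in (60)
            print(f"l({i},{j}) = {l_ij:.4f}, l({i},0) = {l_i0:.4f}")
            assert 0 < l_ij <= l_i0                # Remark 4.10 (i)-(ii)
```

The printed values correspond to the entries of Table 1 in Sect. 6.2, and the assertion checks the ordering l(i,j)≤l(i,0) and positivity noted in Remark 4.10.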
6.2 Numerical validation of Proposition 3.7
Let M=3, K=1, p0=0, p=0.1, (ν1,ν2,ν3)=(1/3,1/3,1/3), and \((\lambda_{0}^{(1)},\lambda_{1}^{(1)},\lambda_{2}^{(1)},\lambda_{3}^{(1)}) = (0,0.2,0.3,0.8)\). The limiting values l(⋅,⋅) in (60) are reported in Table 1. Figure 2 shows sample realizations of (Λn(μ,j)/n)n≥1, j∈{0,1,2,3}∖{μ} and \((\Phi_{n}^{(\mu)}/n)_{n \geq1}\) given (a) μ=1 and θ=10, (b) μ=1 and θ=1000, and (c) μ=2 and θ=10. The figures are consistent with the limiting values in Table 1, as expected from Proposition 3.7. As guaranteed by Proposition 3.8, the process \((\Phi_{n}^{(i)}/n)_{n\geq1}\) converges to l(i).
6.3 The numerical comparison of the minimum and asymptotically minimum Bayes risks
We calculate the minimum and asymptotically minimum Bayes risks for the following example. We assume that M=2, K=2, p0=0, p=0.01, (ν1,ν2)=(0.1,0.9), and the mean vectors \(\lambda_{0}=(\lambda^{(1)}_{0},\lambda^{(2)}_{0})\) and \(\lambda_{i}=(\lambda^{(1)}_{i},\lambda^{(2)}_{i})\), i=1,2 before and after the change, respectively, satisfy
Table 2 compares the performances of the strategy (τA(c),dA(c)) and the optimal strategy for fixed aji=1 for every \(i \in\mathcal{M}\) and \(j \in\mathcal{M}_{0} \setminus\{i\}\) as the unit detection delay cost c decreases. The optimal stopping regions are found by the value iteration described by Dayanik et al. (2008). The Bayes risks of the strategies are estimated via Monte Carlo simulation. For accurate approximations, we used (59), with \((\sigma_{i})_{i \in\mathcal{M}}\) computed by Monte Carlo methods.
We see that (τA(c),dA(c)) is asymptotically optimal: the ratio of the optimal and approximate Bayes risk values converges to 1 as c↓0, as listed in the last column. Moreover, the approximate and the minimum Bayes risk values are close even for large c values, thanks to the higher-order approximation studied in Sect. 5.
Change history
07 October 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10479-021-04269-9
References
Baron, M., & Tartakovsky, A. G. (2006). Asymptotic optimality of change-point detection schemes in general continuous-time models. Sequential Analysis, 25(3), 257–296.
Baum, C. W., & Veeravalli, V. V. (1994). A sequential procedure for multihypothesis testing. IEEE Transactions on Information Theory, 40(6), 1994–2007.
Bertsekas, D. P. (2005). Dynamic programming and optimal control (Vol. I). Belmont: Athena Scientific.
Borkar, V. S. (1991). A remark on control of partially observed Markov chains. Annals of Operations Research, 29(1–4), 429–438.
Burnetas, A. N., & Katehakis, M. N. (1997). Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1), 222–255.
Bussiere, M., & Fratzscher, M. (2006). Towards a new early warning system of financial crises. Journal of International Money and Finance, 25, 953–973.
Chung, K. L. (2001). A course in probability theory (3rd ed.). San Diego: Academic Press.
Dayanik, S., Goulding, C., & Poor, H. V. (2008). Bayesian sequential change diagnosis. Mathematics of Operations Research, 33(2), 475–496.
Dayanik, S., Powell, W. B., & Yamazaki, K. (2011). Asymptotic theory of sequential change detection and identification (Technical report). Center for the Study of Finance and Insurance, Osaka University. Available at http://www-csfi.sigmath.es.osaka-u.ac.jp/en/activity/technicalreport.php.
Derman, C. (1970). Finite state Markovian decision processes. New York: Academic Press.
Dragalin, V. P., Tartakovsky, A. G., & Veeravalli, V. V. (1999). Multihypothesis sequential probability ratio tests. I. Asymptotic optimality. IEEE Transactions on Information Theory, 45(7), 2448–2461.
Dragalin, V. P., Tartakovsky, A. G., & Veeravalli, V. V. (2000). Multihypothesis sequential probability ratio tests. II. Accurate asymptotic expansions for the expected sample size. IEEE Transactions on Information Theory, 46(4), 1366–1383.
Gut, A. (1988). Stopped random walks. Applied probability: A series of the applied probability trust, Vol. 5. New York: Springer.
Gut, A. (2005). Probability: a graduate course. Springer texts in statistics. New York: Springer.
Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M., & Weiss, D. (2004). Syndromic surveillance in public health practice. CDC.
Lai, T. L. (1975). On uniform integrability in renewal theory. Bulletin of the Institute of Mathematics, Academia Sinica, 3(1), 99–105.
Lai, T. L. (1977). Convergence rates and r-quick versions of the strong law for stationary mixing sequences. Annals of Probability, 5(5), 693–706.
Lai, T. L. (2000). Sequential multiple hypothesis testing and efficient fault detection-isolation in stochastic systems. IEEE Transactions on Information Theory, 46(2), 595–608.
Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28(1–4), 47–65.
Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41, 100–115.
Puterman, M. (1994). Markov decision processes—discrete stochastic dynamic programming. New York: Wiley.
Runggaldier, W. J. (1991). On the construction of ϵ-optimal strategies in partially observed MDPs. Annals of Operations Research, 28(1–4), 81–95.
Siegmund, D. (1985). Sequential analysis. Springer series in statistics. New York: Springer.
Tartakovsky, A. G., & Veeravalli, V. V. (2004). Change-point detection in multichannel and distributed systems. Statistics: textbooks and monographs, Vol. 173 (pp. 339–370). New York: Dekker.
Wald, A. (1947). Sequential analysis. New York: Wiley.
White, C. C. III (1991). A survey of solution techniques for the partially observed Markov decision process. Annals of Operations Research, 32(1–4), 215–230.
Woodroofe, M. (1982). Nonlinear renewal theory in sequential analysis. CBMS-NSF regional conference series in applied mathematics, Vol. 39. Philadelphia: SIAM.
Acknowledgements
The authors thank Alexander Tartakovsky for the illuminating discussions. We also thank an anonymous referee and the editors for the constructive remarks and suggestions which significantly improved our presentation. The research of Savas Dayanik was supported by the TÜBİTAK Research Grants 109M714 and 110M610. Warren B. Powell was supported in part by the Air Force Office of Scientific Research, contract FA9550-08-1-0195, and the National Science Foundation, contract CMMI-0856153. Kazutoshi Yamazaki was in part supported by Grant-in-Aid for Young Scientists (B)22710143, the Ministry of Education, Culture, Sports, Science and Technology, and Grant-in-Aid for Scientific Research (B)2271014, Japan Society for the Promotion of Science.
Appendix A: Proofs and auxiliary results
1.1 A.1 Proof of Remark 2.2
We will prove that
which implies that ℙ-a.s. \(0< \Pi^{(i)}_{n} ={\alpha^{(i)}_{n}}/{(\sum_{j\in\mathcal{M}_{0}} \alpha^{(j)}_{n})} =({\alpha^{(i)}_{n}/\prod^{n}_{k=1} f_{0}(X_{k})})/ ({\sum _{j\in\mathcal{M}_{0}}\alpha^{(j)}_{n}/\prod^{n}_{k=1} f_{0}(X_{k})})<1\) for every \(i\in\mathcal{M}\), because \({\alpha^{(0)}_{n}}/{\prod^{n}_{k=1} f_{0}(X_{k})} =(1-p_{0})(1-p)^{n}>0\) and
To prove (61), let Ei:={x:0<fi(x)/f0(x)<∞} for every \(i\in\mathcal{M}\). Then Assumption 2.1 implies that
Because ℙ{θ≤1,μ=j}>0 for every \(j\in \mathcal{M}\) and ℙ{θ>1}>0, we must have \(\int_{E_{i}} f_{j}(x) m(\mathrm {d}x) =1\) for every \(j \in\mathcal{M}_{0}\). Therefore, for every \(i\in\mathcal{M}\), \(\mathbb{P}\{0 <\prod^{n}_{k=1} \frac{f_{i}(X_{k})}{f_{0}(X_{k})} <\infty\} = \mathbb{P}\{0<\frac{f_{i}(X_{k})}{f_{0}(X_{k})}<\infty\ \forall 1\leq k \leq n \} \) equals
1.2 A.2 Proof of Lemma 2.3
Because \(\mathbb{P}(F \cap\{\mu=j, \theta\leq\tau< \infty \}) = \sum_{n=0}^{\infty}\mathbb{P}(F \cap\{ \tau = n\} \cap\{\theta\leq n, \mu=j \})=\)
the first equality follows. The proof of the second equality is similar.
1.3 A.3 Proof of Proposition 3.4
(i) Since \(\tau_{A} = \tau_{A}^{(i)}\) on {dA=i,τA<∞}, \(G_{i}^{(a)} (\tau_{A}) \leq\overline{a}_{i} \sum_{j\in \mathcal{M}_{0} \setminus\{i\}} \exp\{-\Lambda_{\tau_{A}^{(i)}}(i,j)\} =\overline{a}_{i} \exp\{-\Phi_{\tau_{A}^{(i)}}^{(i)} \} < \overline{a}_{i}A_{i}\) by (13), where the equality and the last inequality follow from (15) and (16), respectively. Hence, we have \(R_{i}^{(a)} (\tau_{A},d_{A}) = \mathbb{E}_{i}[1_{\{d_{A}=i, \theta\leq\tau_{A} < \infty\}} G_{i}^{(a)} (\tau_{A})] \leq \overline{a}_{i} \,A_{i}\). Because \(\exp\{-\Lambda_{\tau_{A}}(i,j) \} =\Pi^{(j)}_{\tau_{A}}/\Pi^{(i)}_{\tau_{A}} \leq (1-\Pi^{(i)}_{\tau_{A}})/\Pi^{(i)}_{\tau_{A}} < A_{i}\), we have \(R_{ji}(\tau_{A},d_{A}) = \nu_{i} \mathbb{E}_{i} [1_{\{d_{A}=i,\theta\leq \tau_{A} <\infty\}} \exp\{-\Lambda_{\tau_{A}}(i,j)\}] \leq\nu_{i} A_{i} \leq\nu_{i}\|A\|\). (ii) Because \(\upsilon_{B} = \upsilon_{B}^{(i)}\) on {dB=i,θ≤υB<∞}, and \(\Lambda_{\upsilon_{B}^{(i)}} (i,j) > - \log B_{ij}\), Proposition 2.4 implies \(R_{ji} (\upsilon_{B}, d_{B}) =\nu_{i} \mathbb{E}_{i} [ 1_{\{ d_{B} = i, \theta\leq\upsilon_{B} < \infty \}}\exp\{-\Lambda_{\upsilon_{B}} (i,j)\}] \leq\nu_{i} B_{ij}\).
1.4 A.4 Proof of Proposition 3.6
For (i), because \((\tau_{A}^{(i)})\) increases as Ai↓0, it is enough to show that there is a subsequence the limit of which exists and equals ∞, ℙi-a.s. Fix n≥1. By (14), we have \(\mathbb{P}_{i} \{ \tau_{A}^{(i)} \leq n\} =\mathbb{P}_{i} (\bigcup^{n}_{k=1} \{\Pi^{(i)}_{k} > 1/{(1+A_{i})} \}) \leq \sum^{n}_{k=1} \mathbb{P}_{i} \{\Pi^{(i)}_{k} > 1 /(1+A_{i}) \}\). Therefore, \(\limsup_{A_{i} \downarrow0}\mathbb{P}_{i} \{ \tau_{A}^{(i)}\leq n \}\leq\sum_{k=1}^{n} \limsup_{A_{i} \downarrow0} \mathbb{P}_{i} \{ \Pi _{k}^{(i)} >1 /(1+A_{i}) \} \leq\sum_{k=1}^{n} \mathbb{P}_{i} \{ \Pi_{k}^{(i)} = 1\}\), which is zero by Remark 2.2. Namely, \(\tau_{A}^{(i)}\rightarrow\infty\) in probability under ℙi as Ai↓0. Hence, there is a subsequence of (Ai) along which ℙi-a.s. \(\tau_{A}^{(i)} \uparrow\infty\), which proves (i).
Because ℙ{dA=j,μ=i}=ℙ{dA=j,θ≤τA<∞,μ=i}+ℙ{dA=j,τA<θ,μ=i}≤Rij(τA,dA)+R0j(τA,dA)≤2νjAj by Proposition 3.4 (i), for every fixed n≥1, we have
which goes to zero as ∥A∥↓0 by (i) and by Proposition 3.4. Namely, τA→∞ in probability under ℙi as ∥A∥↓0; therefore, there is a subsequence of (τA)A>0 that goes to ∞, ℙi-a.s. as ∥A∥↓0. Because (τA)A>0 is increasing ℙi-a.s. as ∥A∥↓0, its limit exists and equals ∞, ℙi-a.s. as well, and (ii) follows.
Similarly, we have \(\mathbb{P}_{i} \{ \upsilon_{B}^{(i)} \leq n \} \leq \sum_{k=1}^{n} \mathbb{P}_{i} \{ \Psi_{k}^{(i)} > - \log\overline{B}_{i}\}\). Because, for every fixed k≥1, \(\{\Psi^{(i)}_{k} > - \log \overline{B}_{i} \} = \{\min_{j \in\mathcal{M}_{0}\setminus\{i\}}\Lambda_{k}(i,j) > -\log\overline{B}_{i} \} = \{\max_{j \in\mathcal{M}_{0}\setminus\{i\}} (\Pi^{(j)}_{k}/\Pi^{(i)}_{k}) < \overline{B}_{i} \}\subseteq\{\sum_{j \in\mathcal{M}_{0}\setminus\{i\}} (\Pi^{(j)}_{k}/\Pi^{(i)}_{k}) < M \overline{B}_{i} \} =\{(1-\Pi^{(i)}_{k})/\Pi^{(i)}_{k} < M \overline{B}_{i} \} =\{\Pi^{(i)}_{k} > 1/(1+M \overline{B}_{i}) \}\), we have \(\limsup_{\overline{B}_{i} \downarrow0} \mathbb{P}_{i} \{ \upsilon_{B}^{(i)}\leq n\} \leq\sum_{k=1}^{n} \limsup_{\overline{B}_{i} \downarrow0} \mathbb {P}_{i} \{\Pi_{k}^{(i)} > 1/(1+M \overline{B}_{i}) \} \leq\sum_{k=1}^{n} \mathbb {P}_{i} \{\Pi_{k}^{(i)} = 1 \}=0\) by Remark 2.2. Therefore, as in the proof of (i), ℙi-a.s. \(\upsilon_{B}^{(i)} \rightarrow\infty\) as \(\overline{B}_{i}\downarrow0\), and (iii) follows. Furthermore, (iv) is immediate because, for every fixed n≥1, Proposition 3.4 (ii) implies \(\mathbb{P}_{i} \{ \upsilon_{B}\leq n\} \leq\mathbb{P}_{i} \{ \upsilon_{B}^{(i)} \leq n \} + \frac{1}{\nu_{i}} \sum_{j \in\mathcal{M}\setminus\{i\}}(R_{0j}(\upsilon_{B},d_{B})+R_{ij}(\upsilon_{B},d_{B})) \leq\mathbb{P}_{i} \{\upsilon_{B}^{(i)} \leq n \} + \frac{1}{\nu_{i} }\sum_{j \in\mathcal{M}\setminus\{i\}} \nu_{j} (B_{j0}+B_{ji})\stackrel{\|B\| \downarrow 0}{\longrightarrow} 0\).
1.5 A.5 Proof of Lemma 3.9
First, (16) implies that \(\Phi^{(i)}_{\tau_{A}^{(i)}-1} / {(\tau_{A}^{(i)}-1)} \leq-{\log A_{i}} /{(\tau_{A}^{(i)}-1)}\) and \(-{\log A_{i}} /{\tau_{A}^{(i)}} <\Phi^{(i)}_{\tau_{A}^{(i)}} /\allowbreak {\tau_{A}^{(i)}}\). By Proposition 3.8 (i) and Proposition 3.6 (i), we have \(l(i) \leq\liminf_{A_{i}\downarrow0} [{(-\log A_{i})} /\allowbreak {(\tau_{A}^{(i)}-1)}]\) and \(\limsup_{A_{i} \downarrow0} [{(-\log A_{i})} / {\tau_{A}^{(i)}}] \leq l(i)\), ℙi-a.s, which proves (i). Because \(\tau _{A}^{(i)}-\theta \leq(\tau_{A}^{(i)}-\theta)_{+} \leq\tau_{A}^{(i)}\) and \(\theta/(-\log A_{i}) \mathop {{\hbox to1cm{\rightarrowfill }}}\limits _{A_{i} \downarrow0}^{\mathbb{P}_{i}\mbox {-a.s.}} 0\), (ii) follows from (i). Similarly, (23) implies that \(\Psi^{(i)}_{\upsilon_{B}^{(i)}-1} / {(\upsilon_{B}^{(i)}-1)} \leq -{\log \underline{B}_{i}}/{(\upsilon_{B}^{(i)}-1)}\) and \(-{\log\overline{B}_{i}}/ {\upsilon_{B}^{(i)}} < \Psi_{\upsilon_{B}^{(i)}} / {\upsilon_{B}^{(i)}}\). By Proposition 3.8 (ii) and Proposition 3.6 (iii), we have \(l(i) \leq \liminf_{\overline{B}_{i} \downarrow0}[ {(-\log\underline{B}_{i})} /{(\upsilon_{B}^{(i)}-1)}]\) and \(\limsup_{\overline{B}_{i} \downarrow0}[{(-\log\overline{B}_{i})} / {\upsilon_{B}^{(i)}}] \leq l(i)\), ℙi-a.s. If we divide and multiply by −logBij(i) before we take the limits and use (27), then (iii) follows; (iv) follows from (iii) because \(\upsilon_{B}^{(i)}-\theta\leq (\upsilon_{B}^{(i)}-\theta)_{+} \leq\upsilon_{B}^{(i)}\) and \(\theta /(-\log B_{ij(i)}) \stackrel{\overline{B}_{i} \downarrow0}{\longrightarrow} 0\) ℙi-a.s.
1.6 A.6 Proof of Proposition 3.15
Fix \(i \in\mathcal{M}\). (i) Lemma 3.9 (i) and Fatou’s lemma give the inequality
Let us next define \(T_{\delta}:= \inf\{ n \geq1: \inf_{k \geq n}(\Phi_{k}^{(i)}/k) > l(i) - \delta\}\) for every 0<δ<l(i). Because by hypothesis \(\Phi_{n}^{(i)}/n\) converges m-quickly (m≤r) to l(i) as n↑∞ under ℙi, \(\mathbb{E}_{i} [ ( T_{\delta})^{m} ] < \infty\) for every 0<δ<l(i). On \(\{ \tau_{A}^{(i)} > T_{\delta}\}\equiv\{\tau_{A}^{(i)}-1 \geq T_{\delta}\}\), we have \({\Phi^{(i)}_{\tau_{A}^{(i)}-1}}/{(\tau_{A}^{(i)}-1)} \geq l(i) - \delta \Longleftrightarrow\tau_{A}^{(i)} \leq {\Phi^{(i)}_{\tau_{A}^{(i)}-1}}/{(l(i) - \delta)} + 1\). Because \(\Phi_{\tau_{A}^{(i)}-1}^{(i)} < - \log A_{i}\) by definition, \(\tau _{A}^{(i)} < - {\log A_{i}}/{(l(i) - \delta)} + 1\) on \(\{ \tau_{A}^{(i)} > T_{\delta}\}\), and we obtain \(\tau_{A}^{(i)} =\tau_{A}^{(i)} 1_{\{\tau_{A}^{(i)} > T_{\delta}\}} + \tau_{A}^{(i)}1_{\{\tau_{A}^{(i)} \leq T_{\delta}\}} < -{\log A_{i}} / {(l(i) -\delta)} + 1 + T_{\delta}\). After dividing both sides by (−logAi) and taking the m-norm on both sides, Minkowski’s inequality applied to the righthand side gives
which is finite for every 0<δ<l(i). Then \(\limsup_{A_{i}\downarrow0} \mathbb{E}_{i} [({\tau_{A}^{(i)}}/{(-\log A_{i}}))^{m} ]^{1/m}\leq 1/{(l(i)-\delta)}\) for 0<δ<l(i). Letting δ↓0 gives \(\limsup_{A_{i} \downarrow0} \mathbb{E}_{i} [({\tau _{A}^{(i)}}/({-\log A_{i}}))^{m}]^{1/m} \leq1/l(i)\), which together with (62) proves (i).
(ii) Lemma 3.9 (iii) and Fatou’s lemma imply that
Let us define \(T_{\delta}:= \inf\{ n \geq1: \inf_{k \geq n}({\Psi_{k}^{(i)}} / k) > l(i) - \delta\}\) for every 0<δ<l(i). Because by hypothesis \(\Psi_{n}^{(i)} / n\) converges m-quickly (m≤r) to l(i) as n↑∞ under ℙi, we have \(\mathbb{E}_{i} [ ( T_{\delta})^{m} ] < \infty\) for every 0<δ<l(i). Using an argument similar to that in the first part, we can show that \(\overline{\upsilon}^{(i)}_{B} < -\log \underline{B}_{i}/(l(i)-\delta)+1+T_{\delta}\). After dividing both sides by \((-\log\underline{B}_{i})\) and taking the m-norm of both sides, an application of Minkowski's inequality on the right-hand side gives
which is finite for every 0<δ<l(i). Then \(\limsup_{\overline{B}_{i} \downarrow0} \mathbb{E}_{i}[({\overline{\upsilon}_{{B}}^{(i)}}/({-\log\underline{B}_{i}}))^{m}]^{1/m} \leq1/{(l(i)-\delta)}\) for 0<δ<l(i). Letting δ↓0 gives \(\limsup_{\overline{B}_{i} \downarrow0}\mathbb{E}_{i}[({\overline{\upsilon}_{{B}}^{(i)}}/({-\log\underline{B}_{i}}))^{m}]^{1/m} \leq1/l(i)\). After raising both sides to power m, the inequality \({\upsilon}_{{B}}^{(i)} \leq \overline{\upsilon}_{{B}}^{(i)}\) implies \(\limsup_{\overline{B}_{i}\downarrow0} \mathbb{E}_{i} [({{\upsilon}_{{B}}^{(i)}}/({-\log \underline{B}_{i}}))^{m} ] \leq\limsup_{\overline{B}_{i} \downarrow0}\mathbb{E}_{i} [({\overline{\upsilon}_{{B}}^{(i)}}/({-\log\underline {B}_{i}}))^{m}] \leq1/l(i)^{m}\). Dividing and multiplying the left-hand side by (−logBij(i))m prior to taking the limit gives \(\limsup_{\overline{B}_{i} \downarrow0} \mathbb{E}_{i}[({{\upsilon}_{{B}}^{(i)}}/({-\log B_{ij(i)}}))^{m} ] \leq1/l(i)^{m}\) thanks to (27). The last inequality and (63) prove (ii).
1.7 A.7 Proof of Remark 3.16
Because Condition 3.14 (i) implies (ii), it is enough to prove the claim for (i). Fix \(i \in\mathcal{M}\). For every fixed δ>0 and n>(2logM)/δ, we have \(\Phi_{n}^{(i)}/n > l(i) - \delta \Longleftrightarrow\sum_{j \in\mathcal{M}_{0} \setminus\{i\}}e^{-\Lambda_{n}(i,j)} < e^{-n(l(i)-\delta)} \)
Let \(T_{\delta}(i) := \inf\{ n \geq1: \inf_{k \geq n}({\Phi_{k}^{(i)}} / k) > l(i) - \delta\}\) and Tδ(i,j):=inf{n≥1:infk≥n(Λk(i,j)/k)>l(i,j)−δ} for \(j \in\mathcal{M}_{0} \setminus\{i\}\) and δ>0. Then \(T_{\delta}(i) \leq( \max_{j \in\mathcal{M}_{0}\setminus \{i\}} T_{\delta/2}(i,j) ) \lor(2 \log M)/\delta\), and
for every δ>0, because r-quick-lim infn↑∞(Λn(i,j)/n)≥l(i,j) under ℙi for every \(j \in\mathcal{M}_{0} \setminus\{i\}\). Therefore, \(r\mbox {-}\mathrm {quick}\mbox {-}\liminf _{n \uparrow\infty} {\Phi_{n}^{(i)}}/n \geq l(i)\) under ℙi for every \(i\in\mathcal{M}\).
1.8 A.8 Proof of Lemma 3.17
The proof requires the following three lemmas.
Lemma A.1
For every\(i \in\mathcal{M}\), \(j \in\mathcal{M}_{0} \setminus\{i\}\), L>0, c>1, we have
Proof
By Proposition 2.4, \(R_{ji}(\tau,d) =\nu_{i} \mathbb{E}_{i} [ 1_{\{d= i , \theta\leq\tau< \infty\}}e^{-\Lambda_{\tau}(i,j)}] = \mathbb{E}[ 1_{\{\mu= i , \theta\leq \tau<\infty, d=i \}}\cdot e^{-\Lambda_{\tau}(i,j)}]\), and
for every fixed B>0. Hence, we have \(\mathbb{P}\{\mu= i, \tau-\theta> L \} \geq\mathbb{P}\{\mu= i, \theta+ L < \tau< \infty\}\geq \mathbb{P}\{ \mu= i, \theta\leq\tau< \infty, d=i \} - e^{B}R_{ji}(\tau,d) - \mathbb{P}\{ \mu= i, \sup_{n \leq\theta+L}\Lambda_{n}(i,j) > B \} = \nu_{i} - \nu_{i} R_{i}^{(1)} (\tau,d) -e^{B} R_{ji}(\tau,d) - \mathbb{P}\{ \mu= i, \sup_{n \leq\theta+L}\Lambda_{n}(i,j) > B \}\). Dividing by νi=ℙ{μ=i} gives \(\mathbb{P}_{i} \{\tau- \theta> L \} \geq1 - R_{i}^{(1)}(\tau,d) - \frac{e^{B}}{\nu_{i}} R_{ji}(\tau,d) - \mathbb{P}_{i} \{\sup_{n\leq \theta+ L}\Lambda_{n}(i,j) > B\}\). By setting B=cLl(i,j) and taking infimum on both sides,
Now the lemma holds because \((\tau,d) \in\Delta(\overline{R})\) implies that \(R_{i}^{(1)}(\tau,d) \leq\frac{\sum_{j \in\mathcal{M}_{0}\setminus \{i\}} \overline{R}_{ji}}{\nu_{i}}\) and \(R_{ji}(\tau,d) \leq \overline{R}_{ji}\). □
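The first inequality chain in the proof above rests on a decomposition that can be sketched as follows:

```latex
% Split {mu=i, theta<=tau<infty, d=i} according to whether tau exceeds theta+L;
% on {tau <= theta+L}, either Lambda_tau(i,j) <= B or sup_{n<=theta+L} Lambda_n(i,j) > B:
\mathbb{P}\{\mu=i,\ \theta\le\tau<\infty,\ d=i\}
 \le \mathbb{P}\{\mu=i,\ \theta+L<\tau<\infty\}
   + \mathbb{P}\{\mu=i,\ \theta\le\tau<\infty,\ d=i,\ \Lambda_{\tau}(i,j)\le B\}
   + \mathbb{P}\Bigl\{\mu=i,\ \sup_{n\le\theta+L}\Lambda_{n}(i,j)>B\Bigr\}.
```

On the middle event \(1 \le e^{B} e^{-\Lambda_{\tau}(i,j)}\), so its probability is at most \(e^{B}\, \mathbb{E}[ 1_{\{\mu= i , \theta\leq\tau<\infty, d=i \}} e^{-\Lambda_{\tau}(i,j)}] = e^{B} R_{ji}(\tau,d)\) by the identity for \(R_{ji}\) above; rearranging yields the stated lower bound on \(\mathbb{P}\{\mu=i,\ \tau-\theta>L\}\).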
Lemma A.2
For every \(i \in\mathcal {M}\) and c>1, we have \(\mathbb{P}_{i} \{ \sup_{n \leq\theta+ L} \Lambda_{n}(i,j(i)) >c L l(i)\} \stackrel{L \uparrow\infty}{\longrightarrow} 0\).
Proof
Since Λn(i,j(i))/n converges ℙi-a.s. to l(i) as n↑∞ by Assumption 3.7, there is a ℙi-a.s. finite Kc such that \(\sup_{n > K_{c}}\frac{\Lambda_{n} (i,j(i))_{+}}{n} = \sup_{n > K_{c}} \frac {\Lambda_{n}(i,j(i))}{n} < ( 1+ (c-1)/2 ) l(i)\), ℙi-a.s. Moreover, \(\mathbb{P}_{i}\{\sup_{n \leq\theta+L}\Lambda_{n}(i,j(i)) > cLl(i)\} \leq\)
where \(F_{L} := \{ \frac{\sup_{n \leq K_{c}}\Lambda_{n}(i,j(i))_{+}}{L} + \frac{\theta+ L}{L} \sup_{n >K_{c}} \frac{\Lambda_{n}(i,j(i))_{+}}{n} > c l(i) \}\). Because both Kc and θ are ℙi-a.s. finite,
by Remark 2.2. Thus, \(1_{F_{L}} \rightarrow 0\) as L↑∞ ℙi-a.s., implying \(\mathbb {P}_{i} (F_{L})\stackrel{L \uparrow\infty}{\longrightarrow} 0\), and the claim holds by (64). □
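Lemma A.2 can be sanity-checked numerically. The sketch below is a toy abstraction rather than the paper's model: Λn(i,j(i)) is replaced by a Gaussian random walk with drift l (so that Λn/n→l a.s., as in Assumption 3.7), θ by an independent geometric change time, and all parameters (c=1.5, l=0.5, the geometric rate) are illustrative assumptions. The estimated probability should shrink toward zero as L grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sup_exceeds(L, c=1.5, l=0.5, n_rep=2000):
    """Estimate P{ sup_{n <= theta + L} Lambda_n > c * L * l } in a toy model
    where Lambda_n is a Gaussian random walk with drift l (Lambda_n / n -> l a.s.)
    and theta is an independent, a.s. finite geometric change time."""
    hits = 0
    for _ in range(n_rep):
        theta = rng.geometric(0.1)                      # a.s. finite change time
        steps = rng.normal(loc=l, scale=1.0, size=theta + L)
        if np.cumsum(steps).max() > c * L * l:          # sup of the walk up to theta + L
            hits += 1
    return hits / n_rep

for L in (50, 200, 800):
    print(L, sup_exceeds(L))   # the estimate decreases as L grows
```

As L↑∞ the threshold cLl(i) dominates both the drift over θ+L steps and the Gaussian fluctuations, matching the lemma's conclusion.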
Lemma A.3
For every 0<δ<1, \(i\in\mathcal{M}\), and j(i), \(\liminf_{ \overline{R}_{i} \downarrow0}\inf _{(\tau,d) \in\Delta (\overline{R})} \mathbb{P}_{i} \{ \tau- \theta\geq\delta\frac {|\log( {\overline{R}_{j(i)i}}/ {\nu_{i}} )|}{l(i)} \} \geq1\).
Proof
Fix \(0 < \overline{R}_{j(i)i} < \nu_{i}\). Then \(- \log({\overline{R}_{j(i)i}}/ {\nu_{i}} ) = |\log({\overline{R}_{j(i)i}}/ {\nu_{i}} )|\). If in Lemma A.1 we set j=j(i), \(L :=L(\overline{R}_{j(i)i}) = \delta{|\log({\overline{R}_{j(i)i}}/ {\nu_{i}} )|}/ {l(i)}\), and choose c>1 such that 0<cδ<1, then we have
which is 1−o(1) as \(\overline{R}_{i} \downarrow0\), because 0<1−cδ<1 and by Lemma A.2 noting that \(\overline{R}_{i}\downarrow 0\) implies L↑∞. □
Proof of Lemma 3.17
Fix a set of positive constants \(\overline{R}\), 0<δ<1, and (τ,d)∈Δ. By the Markov inequality,
By taking limits on both sides,
which is greater than or equal to δ by Lemma A.3. The claim is proved because 0<δ<1 is arbitrary. □
1.9 A.9 Proof of Proposition 3.18
Assume on the contrary that \(\liminf_{c \downarrow0}\inf_{(\tau,d) \in\Delta} R^{(c,a,m)}_{i}(\tau,d)/{g_{i}^{(c)}(A_{i}(c))} < 1\), implying that there is a decreasing subsequence (cn)n≥1↓0 and corresponding strategies \((\tau_{c_{n}}^{*},d_{c_{n}}^{*})\) such that
By (34), \(\inf_{(\tau,d) \in\Delta}R^{(c_{n},a,m)}_{i}(\tau,d) \leq R^{(c_{n},a,m)}_{i}(\tau_{A(c_{n})},d_{A(c_{n})}) \stackrel{n \uparrow \infty}{\longrightarrow} 0\). Therefore, \(\| R(\tau_{c_{n}}^{*},d_{c_{n}}^{*}) \|\stackrel{n \uparrow\infty}{\longrightarrow} 0\), where \(R(\tau_{c_{n}}^{*},d_{c_{n}}^{*}) =(R_{ji}(\tau_{c_{n}}^{*},d_{c_{n}}^{*}) )_{i \in\mathcal{M}, j \in\mathcal{M}_{0}\setminus\{i\}}\) are the false alarm and misdiagnosis probabilities corresponding to the strategy \((\tau_{c_{n}}^{*},d_{c_{n}}^{*})\). Consequently, Lemma 3.17 applies and we have \(D_{i}^{(m)} (\tau_{c_{n}}^{*}) \geq\inf_{(\tau,d) \in \Delta(R(\tau_{c_{n}}^{*},d_{c_{n}}^{*}))} D_{i}^{(m)} (\tau) \geq ( |\log(R_{j(i)i}(\tau_{c_{n}}^{*},d_{c_{n}}^{*})/\nu_{i} )|/l(i))^{m} (1 + o(1))\), where o(1)↓0 as n↑∞. Finally, \(R_{i}^{(c_{n},a,m)} (\tau_{c_{n}}^{*},d_{c_{n}}^{*}) \geq c_{n}D_{i}^{(m)} (\tau_{c_{n}}^{*}) + a_{j(i) i}R_{j(i)i}(\tau_{c_{n}}^{*},d_{c_{n}}^{*})/\nu_{i} \geq\)
where the last inequality follows from (33). However, this contradicts (65), and the proof is complete.
1.10 A.10 Proof of Lemma 4.3
By Lemma 4.2, it is sufficient to show that \((\vert (1/n) {\sum_{l=1}^{n} h_{ij}(X_{l})} \vert ^{r})_{n \geq1}\) is uniformly integrable under ℙi. The running sum \(\sum_{l=1}^{n} h_{ij}(X_{l})\) is a random walk under both ℙ(∞) and \(\mathbb{P}_{i}^{(0)}\), and it is uniformly integrable under both measures because (47) holds; see Gut (1988, Theorem 4.1). Hence, it is also uniformly integrable under ℙi because \(\mathbb{E}_{i} Z \leq\mathbb {E}^{(\infty)} Z +\mathbb{E}_{i}^{(0)} Z\) for every nonnegative random variable Z.
1.11 A.11 Proof of Lemma 4.5
We first prove the following.
Lemma A.4
Let (ξn)n≥1 be a positive stochastic process and T an a.s. finite random time defined on the same probability space \((\Omega, \mathcal {E}, \mathbb{P})\). Given T, the random variables (ξn)n≥1 are conditionally independent, and (ξn)1≤n≤T−1 and (ξn)n≥T have common conditional probability distributions ℙ∞ and ℙ0, respectively, the expectations with respect to which are denoted by \(\mathbb{E}_{\infty}\) and \(\mathbb{E}_{0}\). Suppose that \(\mathbb{E}_{\infty}[\log \xi_{1}]\) and \(\mathbb{E}_{0}[\log\xi_{1}]\) exist, and define
for some fixed constant c>0. Then the following results hold under ℙ:
(i) We have \(\eta_{n} \stackrel{n \uparrow\infty}{\longrightarrow}\lambda_{+}\) a.s.
(ii) If λ<0, then the process ψn converges as n↑∞ to a finite limit a.s.
(iii) If γ<∞, then (|ηn|r)n≥1 is uniformly integrable.
(iv) If r≥1 and \(\max\{\mathbb{E}_{\infty} [|\log\xi_{1}|^{r} ],\mathbb{E}_{0} [|\log\xi_{1}|^{r} ]\}<\infty\), then (|Φn|q)n≥1 is uniformly integrable for every 0≤q≤r.
Proof of Lemma A.4
Let \(\zeta_{n} := \log( \prod_{k=1}^{n} \xi_{k} ) = \sum_{k=1}^{n}\log\xi_{k}\). We first prove (i)–(ii) by considering the cases −∞<λ<0, 0≤λ<∞, λ=−∞, and λ=∞ separately.
Case 1: −∞<λ<0. First, because \(\eta_{n} \geq (1/n)\log e^{\Phi_{1}} =\Phi_{1}/n = (\log\xi_{1})/n\), we have lim infn↑∞ηn≥0 a.s. It is, therefore, enough to prove that its limit superior is less than or equal to zero.
By the SLLN and because T is a.s. finite, the exceptional set Ω0:={ω∈Ω:ζn(ω)/n↛λ or T(ω)=∞} has zero measure. Fix ω∈Ω∖Ω0 and choose sufficiently small ε>0 such that λ+ε<0. Then we can choose Nε(ω)≥T(ω) such that, for every k≥Nε(ω), \(\frac{\zeta_{k}(\omega)-\zeta_{T(\omega)-1}(\omega)}{k-(T(\omega)-1)}< \lambda+ \varepsilon< 0\). For every n≥Nε(ω),
Because λ+ε<0,
which equals \(e^{\zeta_{T(\omega)-1}(\omega) + (-T(\omega)+1)(\lambda + \varepsilon)} / {(1-e^{\lambda+\varepsilon})}\) and hence (67) is bounded by \(B(\omega) :=c+\sum_{k=1}^{N_{\varepsilon}(\omega)-1} e^{\zeta_{k} (\omega)} +({e^{\zeta_{T(\omega)-1}(\omega) + (-T(\omega)+1) (\lambda+\varepsilon)}} )/({1-e^{\lambda+\varepsilon}}) < \infty\), independently of n. Therefore, lim supn↑∞ηn(ω)=lim supn↑∞(ψn(ω)/n)≤lim supn↑∞(logB(ω)/n)=0, as desired. Because ℙ(Ω∖Ω0)=1, we have lim supn↑∞ηn≤0 a.s. Finally, because ψn(ω)≤logB(ω) for every n≥Nε(ω) for a.e. ω and because ψn is increasing in n, ψn converges to a finite limit a.s.
Case 2: 0≤λ<∞. First note that the SLLN and the finiteness of T imply \(\eta_{n} \geq\frac{1}{n} \log(\xi_{1} \cdots\xi_{n}) = \frac{1}{n}\sum_{k=1}^{T-1} \log\xi_{k} + \frac{n-T+1}{n} \cdot \frac{1}{n-T+1} \sum_{k=T}^{n} \log\xi_{k} \stackrel{n \uparrow\infty}{\longrightarrow} \lambda\) a.s.; therefore, lim infn↑∞ηn≥λ a.s. It is now sufficient to show that lim supn↑∞ηn−λ≤0.
Fix any realization ω∈Ω∖Ω0 and ε>0, where Ω0 is defined in Case 1. Let Nε(ω)≥T(ω) be such that
Then for every n≥Nε(ω),
where the last inequality holds because by (68)
Moreover, for \(n \geq\tilde{\tau}_{\varepsilon}(\omega) :=N_{\varepsilon}(\omega) \vee[\log (c+\sum_{k=1}^{N_{\varepsilon}(\omega)-1} e^{\zeta_{k}(\omega)} )/\lambda]\), we have \(c e^{-n \lambda}+\sum_{k=1}^{N_{\varepsilon}(\omega)-1} e^{\zeta_{k}(\omega)-n \lambda}\leq1\); thus, letting A(ω):=ζT(ω)−1(ω)+(−T(ω)+1)(λ+ε) gives
Because ε>0 is arbitrary and ℙ(Ω∖Ω0)=1, we have lim supn↑∞ηn−λ≤0, a.s.
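Statement (i) of Lemma A.4, covered by Cases 1–2 above, can be illustrated numerically. The sketch below is an assumption-laden toy: it takes T=1 (so every ξk is drawn from the post-change law), log ξk i.i.d. Gaussian with mean λ, and computes ψn via a running log-sum-exp for numerical stability; all parameters are illustrative.

```python
import numpy as np

def eta(lam, n=5000, c=1.0, seed=1):
    """eta_n = psi_n / n, where psi_n = log(c + xi_1 + xi_1*xi_2 + ... + xi_1*...*xi_n)
    and log xi_k are i.i.d. N(lam, 0.5^2), so that E[log xi_1] = lam."""
    rng = np.random.default_rng(seed)
    # zeta_k = log(xi_1 * ... * xi_k), a random walk with drift lam
    zeta = np.cumsum(rng.normal(loc=lam, scale=0.5, size=n))
    # psi_n computed stably as a running log-sum-exp of {log c, zeta_1, ..., zeta_n}
    psi = np.logaddexp(np.log(c), np.logaddexp.accumulate(zeta))
    return psi / np.arange(1, n + 1)

print(eta(-0.3)[-1])   # close to 0   (lambda_+ = 0 when lambda < 0; psi_n also converges)
print(eta(+0.3)[-1])   # close to 0.3 (lambda_+ = lambda when lambda >= 0)
```

With λ<0 the partial sums of products stay bounded (Case 1), so ψn converges and ηn→0; with λ>0 the last product dominates and ηn→λ (Case 2).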
Case 3: λ=−∞. For m∈(0,1), n≥1, define \(\xi_{n}^{(m)} := m \vee\xi_{n} \geq m\). Because \(-\infty=\mathbb{E}_{0}[ \log\xi_{1} ]=\mathbb{E}_{0} [ (\log\xi_{1})_{+} ]-\mathbb{E}_{0} [ (\log \xi_{1})_{-} ]\), we have \(\mathbb{E}_{0}[ (\log\xi_{1})_{+} ] < \infty\) and \(\mathbb{E}_{0} [ (\log \xi_{1})_{-} ] = \infty\). Consequently, \(\mathbb{E}_{0} [ (\log \xi_{1}^{(m)})_{+} ] = \mathbb{E}_{0} [(\log m \vee\log\xi_{1})_{+} ] =\mathbb{E}_{0} [(\log \xi_{1})_{+}] < \infty\), and \(\mathbb{E}_{0} [ (\log\xi_{1}^{(m)})_{-} ] =\mathbb{E}_{0}[ (\log m \vee\log\xi_{1})_{-} ] = \mathbb{E}_{0}[ ( \log m )_{-} \wedge(\log\xi_{1})_{-} ] \leq (\log m)_{-}< \infty\). Hence, \(\lambda^{(m)} := \mathbb{E}_{0} [ \log\xi_{1}^{(m)} ]\) is well-defined and
for every m∈(0,1). Because \(0 \leq(\log\xi_{1}^{(m)})_{-} = (\log m)_{-} \wedge(\log\xi_{1})_{-} \uparrow(\log\xi_{1})_{-}\) as m↓0, the monotone convergence theorem implies that \(\lim_{m \downarrow 0} \mathbb{E}_{0} [ (\log\xi_{1}^{(m)})_{-} ] = \mathbb{E}_{0} [(\log\xi_{1})_{-} ] = \infty\). Therefore, there exists m0∈(0,1) such that for every m≥m0, \(\mathbb{E}_{0} [(\log \xi_{1}^{(m)})_{-} ] > \mathbb{E}_{0} [ (\log\xi_{1})_{+} ]\), and λ(m)∈(−∞,0) by (69). Now define \(\psi_{n}^{(m)} := \log(c + \xi_{1}^{(m)} +\xi_{1}^{(m)}\xi_{2}^{(m)} + \cdots+ \xi_{1}^{(m)} \cdots \xi_{n}^{(m)} )\) and \(\eta_{n}^{(m)} :=\frac{1}{n} \psi_{n}^{(m)}\) for every n≥1 and m∈(0,1). By Case 1, \(\lim_{n \uparrow \infty} \psi_{n}^{(m)} < \infty\) and \(\lim_{n \uparrow\infty}\eta_{n}^{(m)} = 0\) a.s. for every m≥m0.
Because n↦ψn is increasing, limn↑∞ψn exists and \(\log c \leq\psi_{n} \leq\psi_{n}^{(m_{0})}\) for all n≥0 (note \(\xi_{n} \leq\xi_{n}^{(m)}\), n≥0), we have \(\log c\leq\lim_{n \uparrow\infty}\psi_{n} \leq\lim_{n \uparrow \infty}\psi_{n}^{(m_{0})}\). Therefore, limn↑∞ψn is a finite random variable and \(\eta_{n} = \psi_{n}/n\stackrel{n \uparrow\infty}{\longrightarrow} 0\) a.s.
Case 4: λ=∞. For every M>1 and n≥1, define \(\xi_{n}^{(M)} := M \wedge\xi_{n} \leq M\). Since \(\infty=\mathbb{E}_{0}[ \log\xi_{1} ] = \mathbb{E}_{0} [ (\log\xi_{1})_{+}]-\mathbb{E}_{0}[ (\log\xi_{1})_{-} ]\), we have \(\mathbb{E}_{0} [ (\log\xi_{1})_{+}] = \infty\) and \(\mathbb{E}_{0} [ (\log\xi_{1})_{-} ] <\infty\). Then \(\mathbb{E}_{0} [(\log\xi_{1}^{(M)})_{-} ] = \mathbb{E}_{0} [ (\log M\wedge\log \xi_{1})_{-}] = \mathbb{E}_{0} [ (\log\xi_{1})_{-}] < \infty\), and \(\mathbb {E}_{0} [(\log \xi_{1}^{(M)})_{+} ] = \mathbb{E}_{0} [ (\log M \wedge\log\xi_{1})_{+} ]= \mathbb{E}_{0} [ ( \log M )_{+} \wedge(\log\xi_{1})_{+} ] =\mathbb{E}_{0} [ ( \log M ) \wedge(\log\xi_{1})_{+} ]\) ≤logM<∞. Hence, \(\lambda^{(M)} := \mathbb{E}_{0} [ \log \xi_{1}^{(M)} ]\) is well-defined and
for every M≥1. Because \(0 \leq( \log\xi_{1}^{(M)})_{+}= (\log M) \wedge(\log\xi_{1})_{+} \uparrow(\log\xi_{1})_{+}\) as M↑∞, the monotone convergence theorem implies \(\lim_{M\uparrow\infty} \mathbb{E}_{0} [ ( \log\xi_{1}^{(M)})_{+} ] = \mathbb {E}_{0} [ (\log\xi_{1} )_{+} ] = \infty\).
Therefore, there exists M0>1 such that for every M≥M0, \(\mathbb{E}_{0} [ ( \log\xi_{1}^{(M)})_{+} ] > \mathbb{E}_{0}[ ( \log\xi_{1})_{-} ]\) and therefore, λ(M)∈(0,∞) by (70). Now, define \(\psi_{n}^{(M)} := \log( c + \xi_{1}^{(M)} +\xi_{1}^{(M)}\xi_{2}^{(M)} + \cdots+ \xi_{1}^{(M)} \cdots \xi_{n}^{(M)} )\) and \(\eta_{n}^{(M)} :=\psi_{n}^{(M)}/ n\) for every n≥1 and M>1. By Case 2, \(\lim_{n \uparrow\infty}\eta_{n}^{(M)} = \lambda^{(M)}\) ℙ-a.s for M≥M0. Because \(\xi_{n} \geq M \wedge\xi_{n} = \xi_{n}^{(M)}\), we have \(\psi_{n} \geq \psi_{n}^{(M)}\) and \(\eta_{n} \geq\eta_{n}^{(M)}\). Therefore, \(\liminf_{n\uparrow\infty} \eta_{n} \geq\lim_{n \uparrow\infty} \eta_{n}^{(M)} =\lambda^{(M)}\) for every M≥M0.
Finally, by (70) and the monotone convergence theorem, \(\liminf_{n\uparrow\infty} \eta_{n} \geq\lim_{M \uparrow\infty}\lambda^{(M)} = \lim_{M \uparrow\infty} (\mathbb{E}_{0} [ (\log\xi_{1}^{(M)})_{+}] - \mathbb{E}_{0} [ (\log\xi_{1}^{(M)})_{-} ] )= \mathbb{E}_{0} [ (\log\xi_{1})_{+} ] - \mathbb{E}_{0} [ (\log \xi_{1})_{-} ] = \mathbb{E}_{0} [ \log\xi_{1} ] =\lambda= \infty\), which implies limn↑∞ηn=λ=λ+. This completes the proof of (i)–(ii).
We now prove (iii) using the following sufficient condition for uniform integrability.
Lemma A.5
(Woodroofe 1982)
Let (Xn)n≥1 be a stochastic process and r≥1 be fixed. Then (|Xn|r)n≥1 is uniformly integrable if \(\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{|X_{n}|> x\}\mathrm {d}x <\infty\).
Fix r≥1. We will show that \(\int^{\infty}_{0} x^{r-1} \sup _{n\geq 1} \mathbb{P}\{|\eta_{n}|^{r} > x\} \mathrm {d}x =\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{|\psi_{n}| > nx^{1/r}\}\mathrm {d}x <\infty\), which implies the uniform integrability of (|ηn|r)n≥1 by Lemma A.5. Note that \(\sup_{n\geq1} \mathbb{P}\{|\psi_{n}|>nx^{1/r}\} \leq\sup_{n\geq1} \mathbb{P}\{\psi_{n}<-nx^{1/r}\}+\sup_{n\geq1} \mathbb{P}\{\psi_{n}>nx^{1/r}\}\). Because ψn≥logc, we have \(\mathbb{P}\{\psi_{n}<-nx^{1/r}\}\leq\mathbb{P}\{\psi_{n}<-x^{1/r}\}=0\) for every x≥|logc|r and n≥1, and hence \(\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{\psi_{n} < - nx^{1/r}\}\mathrm {d}x\leq\int^{|\log c|^{r}}_{0} x^{r-1} \mathrm {d}x < \infty\).
On the other hand, because ξ1,ξ2,… are conditionally independent given T, and \(\mathbb{E}[\xi_{k} \mid T] = \alpha1_{\{k\leq T-1\}} +\beta1_{\{k\geq T\}} \leq\max\{\alpha,\beta\} = :\gamma<\infty\), the Markov inequality gives
Therefore, \(\int^{\infty}_{0} x^{r-1} \sup_{n\geq1} \mathbb{P}\{\psi _{n} > nx^{1/r}\} \mathrm {d}x \leq\int^{\gamma^{r}}_{0} x^{r-1} \mathrm {d}x + (c+\frac{1+\gamma}{\gamma} )\int^{\infty}_{\gamma^{r}} x^{r-1}e^{-(x^{1/r}-\gamma)} \mathrm {d}x =\frac{1}{r}\gamma^{r^{2}} + (c+\frac{1+\gamma}{\gamma} ) r e^{\gamma} \int^{\infty}_{\gamma}y^{r^{2}-1} e^{-y} \mathrm {d}y < \infty\), which completes the proof of (iii).
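The finiteness of the last integral follows from the substitution \(y = x^{1/r}\) (so \(x = y^{r}\) and \(\mathrm{d}x = r\,y^{r-1}\,\mathrm{d}y\)):

```latex
\int_{\gamma^{r}}^{\infty} x^{r-1} e^{-(x^{1/r}-\gamma)} \,\mathrm{d}x
  = \int_{\gamma}^{\infty} y^{r(r-1)} \, e^{-(y-\gamma)} \, r\, y^{r-1} \,\mathrm{d}y
  = r\, e^{\gamma} \int_{\gamma}^{\infty} y^{r^{2}-1} e^{-y} \,\mathrm{d}y < \infty,
```

the last expression being (up to the constant \(r e^{\gamma}\)) a tail of the Gamma integral \(\Gamma(r^{2})\).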
Finally, for the proof of (iv), note \(\Phi_{n} = \frac{1}{n}\sum^{n}_{k=1}\log\xi_{k}\) and \(|\Phi_{n}|^{r} \leq(\frac{1}{n} \sum^{n}_{k=1}|\log \xi_{k}| )^{r} \leq\frac{1}{n} \sum^{n}_{k=1} |\log\xi_{k}|^{r}\) since r≥1 and x↦xr is convex on x∈ℝ+. Since \(\mathbb{E}(|\log \xi_{k}|^{r} \mid T )\)\(\leq\max\{\mathbb{E}_{\infty}[|\log\xi _{1}|^{r}], \mathbb{E}_{0}[|\log \xi_{1}|^{r}]\}\) for every k≥1,
for every n≥1, and \(\sup_{n\geq1} \mathbb{E}|\Phi_{n}|^{r}<\infty \). Moreover, for every ε>0, there exists some δ>0 such that max{ℙ∞(A),ℙ0(A)}<δ implies that \(\max\{\mathbb{E}_{\infty} [|\log\xi_{1}|^{r} 1_{A}],\mathbb{E}_{0} [|\log\xi_{1}|^{r} 1_{A} ] \} <\varepsilon\), and
for every n≥1, which also implies, together with the boundedness of \((\mathbb{E}|\Phi _{n}|^{r})_{n\geq 1}\), that (|Φn|r)n≥1 is uniformly integrable. This completes the proof of (iv) and the lemma. □

Now we are ready to prove Lemma 4.5. Note that for every \(j \in\mathcal{M}\) and n≥2,
if in (66) we set \(\xi_{l} := (1-p) \frac {f_{0}(X_{l})}{f_{j}(X_{l})}\) and \(c := \frac{p_{0} +(1-p_{0})p}{(1-p_{0})p} > 0\).
Given that μ=i and θ=t for any fixed \(i\in\mathcal{M}\) and t≥1, the random variables ξt,ξt+1,… are conditionally i.i.d. with a common distribution independent of t; thus, the change time θ plays the role of the random time T in Lemma A.4. Then by Lemma A.4 (i) and (38) we have \(L_{n}^{(j)}/n \stackrel{n\uparrow\infty}{\longrightarrow}(\mathbb{E}_{i}^{(0)} [ \log((1-p) \frac{f_{0}(X_{1})}{f_{j}(X_{1})}) ] )_{+} = [ q(i,j)-q(i,0) - \varrho ]_{+}\), ℙi-a.s., which immediately proves (ii) if \(j\in\mathcal{M}\setminus\{i\}\), and proves (i) and (iv) via Lemma A.4 (ii) if j=i, after noticing that \(\mathbb{E}_{i}^{(0)} [ \log((1-p) \frac {f_{0}(X_{1})}{f_{i}(X_{1})} ) ] = q(i,i) -q(i,0) - \varrho= -q(i,0) -\varrho< 0\) by (37). Similarly, if j∈Γi, (v) holds by Lemma A.4 (ii), since \(\mathbb{E}_{i}^{(0)} [ \log((1-p) \frac {f_{0}(X_{1})}{f_{j}(X_{1})} ) ] = q(i,j) -q(i,0) - \varrho<0\) by the definition of Γi. By (40), the SLLN and (ii),
which equals [q(i,j)−q(i,0)−ϱ]− and proves (iii). For the proof of (vi), note that by Minkowski’s inequality
Because (|log[(1−p0)p]/n|r)n≥1 is bounded, and according to Lemma A.4 (iii) the process (|ψn/n|r)n≥1 is uniformly integrable under ℙi for every r≥1 when (48) is satisfied, we have (vi). Finally, for the proof of (vii), (40) implies
Because (48) holds, \((|L^{(j)}_{n}/n|)_{n\geq1}\) is uniformly integrable by (vi). If we set ξk:=[1/(1−p)][fj(Xk)/f0(Xk)] for every k≥1 in (66), then (49) and Lemma A.4 (iv) imply that \((|\frac{1}{n}\log\prod^{n}_{k=1}(\frac{1}{1-p}\frac{f_{j}(X_{k})}{f_{0}(X_{k})})|^{r})_{n\geq1}\) is uniformly integrable. Therefore, \((|K^{(j)}_{n}/n|^{r})_{n\geq1}\) is uniformly integrable, and the proof of (vii) is complete.
1.12 A.12 Proof of Lemma 5.1
It is sufficient to prove that \(H_{i}^{(a)} (A_{i})\) in (52) converges in distribution as Ai↓0 to \(W_{i} - \log a_{j(i)i}\) under ℙi and that \(H_{i}^{(a)} (A_{i})\) is bounded from below by some constant.
Because \(G_{i}^{(a)}(n) = \sum_{j \in\mathcal{M}_{0} \setminus\{i\}} a_{ji}e^{-\Lambda_{n}(i,j)} = (\sum_{j \in\mathcal{M}_{0} \setminus\{i\}}e^{-\Lambda_{n}(i,j)} ) M_{i}(n)\) for every n≥1, where \(M_{i}(n) := {(\sum_{j \in\mathcal{M}_{0} \setminus\{i\}} a_{ji}e^{-\Lambda_{n}(i,j)})} / {(\sum_{j \in\mathcal{M}_{0} \setminus\{i\}}e^{-\Lambda_{n}(i,j)})}\), we have by (15)
Because j(i) is unique and \(\Lambda_{n}(i,j)/n \stackrel{n \uparrow\infty}{\longrightarrow} l(i,j)\) ℙi-a.s. for every \(j \in\mathcal{M}_{0}\setminus\{i\}\) and l(i)<l(i,j) for every \(j \in\mathcal{M}_{0}\setminus\{i,j(i)\}\), we have
Because \(\tau^{(i)}_{A} \stackrel{A_{i} \downarrow0}{\longrightarrow} \infty\) ℙi-a.s. by Proposition 3.6, this implies \(- \log(M_{i} (\tau_{A}^{(i)})) \stackrel{A_{i} \downarrow0}{\longrightarrow} - \log a_{j(i) i}\) ℙi-a.s. By Proposition 3.4, \(\mathbb{P}_{i} \{d_{A} =i, \theta\leq\tau_{A} < \infty\} = 1 - \frac{1}{\nu_{i}} \sum_{j \in \mathcal{M}_{0} \setminus\{i\}} R_{ji} (\tau_{A},d_{A})\) converges to 1 as Ai↓0; i.e., \(1_{ \{d_{A}=i,\theta\leq\tau_{A} < \infty\}}\) converges in probability under ℙi to 1. These, together with the assumption on the convergence of Wi(Ai), show the convergence of \(H_{i}^{(a)} (A_{i})\). Finally, because (71) is bounded from below by \(-\log\overline{a}_{i}\) and \(-\log1_{ \{d_{A}=i, \theta\leq \tau_{A} < \infty\}} \geq0\), \(H_{i}^{(a)} (A_{i})\) is bounded from below by \(-\log\overline{a}_{i}\).
1.13 A.13 Proof of Lemma 5.6
It is sufficient to show that ξn(i,j(i)) converges \(\mathbb{P}_{i}^{(t)}\)-a.s. to a finite random variable by Remarks 2.5 and 5.4. First, because j(i) is unique, j(i)∈Γi by Remark 4.10 (iii). Consequently, ϵn(i,j(i)) converges \(\mathbb{P}_{i}^{(t)}\)-a.s. to a finite random variable by Lemma 4.5 (iv) and (v). Second, ηn(i,j(i)) converges \(\mathbb{P}_{i}^{(t)}\)-a.s. to zero by Propositions 3.7 and 3.8. Finally, \(\lim_{n \uparrow\infty }\sum_{l=1}^{n \wedge(\theta-1)} \log ( {f_{i}(X_{l})}/{f_{j(i)}(X_{l})} )\) exists \(\mathbb{P}^{(t)}_{i}\)-a.s. and equals the \(\mathbb{P}^{(t)}_{i}\)-a.s. finite random variable \(\sum_{l=1}^{\theta-1} \log({f_{i}(X_{l})}/{f_{j(i)}(X_{l})})\).
1.14 A.14 Proof of Lemma 5.8
Let g:ℝ→ℝ be continuous and bounded. By the bounded convergence theorem and Lemma 5.7,
Dayanik, S., Powell, W.B. & Yamazaki, K. Asymptotically optimal Bayesian sequential change detection and identification rules. Ann Oper Res 208, 337–370 (2013). https://doi.org/10.1007/s10479-012-1121-6