Abstract
The problem of estimating the change point in a sequence of independent observations is considered. Hinkley (1970) demonstrated that the maximum likelihood estimate of the change point is associated with a two-sided random walk in which the ascending and descending epochs and heights are the key elements for its evaluation. The aim here is to expand on the information generated by random walks and fluctuation theory and to apply it to the change point formulation. This permits us to obtain computable expressions for the asymptotic distribution of the change point in terms of convolutions and Laplace transforms of the likelihood ratios. Further, if moment expressions of the likelihood ratios are known, explicit representations of the asymptotic distribution of the change point become accessible up to second order in the moments. In addition, the rate of convergence between the finite- and infinite-sample distributions of the change point is established, and it is shown to be of polynomial order.
References
Asmussen S (2003) Applied probability and queues. Applications of mathematics, vol 51, 2nd edn. Springer, New York
Barndorff-Nielsen O (1978) Information and exponential families in statistical theory. Wiley, New York
Bayer N (1996) On the identification of Wiener-Hopf factors. Queueing Syst 23:293–300
Bender EA (1974) Asymptotic methods in enumeration. Siam Rev 16:485–515
Bingham NH (1973) Limit theorems in fluctuation theory. Adv Appl Prob 5:554–569
Bingham NH, Goldie CM, Teugels JL (1987) Regular variation. Encyclopedia of mathematics and its applications. Cambridge University Press, New York
Borovkov AA (1976) Stochastic processes in queueing theory. Springer-Verlag, New York
Bryn-Jones A, Doney RA (2006) A functional limit theorem for random walk conditioned to stay non-negative. J Lond Math Soc 74:244–258
Caravenna F (2005) A local limit theorem for random walk conditioned to stay positive. Probab Theory Relat Fields 133:508–530
Caravenna F, Chaumont L (2008) Invariance principles for random walk conditioned to stay positive. Ann Inst Henri Poincare Probab Statist 44:170–190
Chung KL (1974) A course in probability theory, 2nd edn. North-Holland, Amsterdam
Daniels HE (1954) Saddlepoint approximations in statistics. Ann Math Stat 25:631–650
Daniels HE (1987) Tail probability approximations. Int Stat Rev 55:37–48
Doney RA, Jones EM (2012) Conditioned random walks and Lévy processes. J Lond Math Soc 44:139–150
Embrechts P, Hawkes J (1982) A limit theorem for the tails of discrete infinitely divisible laws with applications to fluctuation theory. J Aust Math Soc Ser A 32:412–422
Esscher F (1932) On approximate computations when the corresponding characteristic functions are known. Skand Akt Tidskr 1963:78–86
Feller W (1971) An introduction to probability theory and its applications, vol 2, 2nd edn. Wiley, New York
Field C, Ronchetti E (1990) Small sample asymptotics. Institute of Mathematical Statistics, Hayward
Fotopoulos SB (2009) The geometric convergence rate of the classical change-point estimate. Stat Prob Lett 79:131–137
Fotopoulos SB, Jandhyala VK (2001) Maximum likelihood estimation of a change point for exponentially distributed random variables. Stat Prob Lett 51:423–429
Fotopoulos SB, Jandhyala VK, Khapalova E (2010) Exact asymptotic distribution of change-point MLE for change in the mean of Gaussian sequences. Ann Appl Stat 4:1081–1104
Fotopoulos SB, Paparas A, Jandhyala VK (2021) Change point detection and estimation methods under gamma series of observations. Statistical Papers (to appear).
Gradshteyn IS, Ryzhik IM (2000) Table of integrals series and products, 6th edn. Academic Press, Cambridge
Hayman WK (1956) A generalization of Stirling’s formula. J Reine Angew Math 196:67–95
Hinkley DV (1970) Inference about the change-point in a sequence of random variables. Biometrika 57:1–17
Iglehart DL (1974) Random walks with negative drift conditioned to stay positive. J Appl Probab 11:742–751
Jandhyala VK, Fotopoulos SB (1999) Capturing the distributional behavior of the maximum likelihood estimator of a change-point. Biometrika 86:129–140
Jandhyala VK, Fotopoulos SB (2001) Rate of convergence of the maximum likelihood estimate of a change-point. Sankhyā Ser A 63:277–285
Jensen JL (1995) Saddlepoint approximations. Oxford University Press, Oxford
Karamata J (1930) Sur un mode de croissance régulière des fonctions. Mathematica 4:38–53
Lugannani R, Rice S (1980) Saddle point approximation for the distribution of the sum of independent random variables. Adv Appl Probab 12:475–490
Martin RJ (2004) Credit portfolio modeling handbook. Credit Suisse First Boston, New York
Meir A, Moon JW (1978) On the altitude of nodes in random trees. Can J Math 30:997–1015
Meir A, Moon JW (1987) On an asymptotic method in enumeration. J Comb. Theory Ser A 51:77–89
Nevzorov VB (1987) The distribution of the maximum term in a sequence of sample means. Theory Probab Appl 32:125–130
Prabhu NU (1980) Stochastic storage processes. Queues, insurance risk and dams, 2nd edn. Springer-Verlag, New York
Resnick S (1992) Adventures in stochastic processes. Birkhäuser, Boston, Basel, Berlin
Rogozin BA (1971) The distribution of the first ladder moment and the height and fluctuation of random walks. Theory Probab Appl 16:575–595
Rojo J (1996) On tail categorization of probability laws. J Am Stat Assoc 91:378–384
Spitzer F (1974) Principles of random walk. Van Nostrand, Princeton
Vatutin VA, Wachtel V (2009) Local probabilities for random walks conditioned to stay positive. Probab Theory Related Fields 143:177–217
Vatutin VA, Wachtel V (2010) Sudden extinction of a critical branching process in random environment. Theory Probab Appl 54:466–484
Veraverbeke N (1977) Asymptotic behavior of Wiener–Hopf factors of a random walk. Stoch Process Appl 5:27–37
Veraverbeke N, Teugels JL (1975) The exponential rate of convergence of the distribution of the maximum of the random walk. J Appl Probab 12:279–288
Acknowledgements
The author thanks the referee for a careful and detailed reading of the article in which some important flaws in the earlier version were discovered and for other useful comments which improved the readability.
Appendices
Appendix A
1.1 Wiener–Hopf formulae and their consequences
Some of the results below emanate from the works of Spitzer (1974), Feller (1971, Chapters XII and XVIII), Chung (1974), Iglehart (1974), Prabhu (1980), and Asmussen (2003), among many others. They describe the behavior of a random walk on the line conditioned to stay positive.
To gain more insight into the significance of Theorem 1.1 and Theorem 1.2, we first concentrate on the asymptotic expressions that appear in the theorems. The following result relates the time at which the maximum occurs to the time the random walk stays positive. The results are a modification of a claim of Borovkov (1976).
Theorem A.1
(Borovkov 1976). The sequence \(\left\{ {Y_{k}^{\left( 1 \right)} \in \mathcal{X}:k \in {\mathbb{N}}} \right\}\) satisfies:
Based on Corollary 6 in Borovkov (1976, p. 93), we have:
Theorem A.2
If \(Y_{k}^{\left( 1 \right)}\), \(k \in {\mathbb{N}}\) are non-lattice, then,
Proof
When \(\left| z \right| < 1\) and \({\text{Im}} \left( \lambda \right) = 0\), then \(\left| {z \, \hat{f}_{{Y^{\left( 1 \right)} }} \left( \lambda \right)} \right| = \left| {z\int_{\mathbb{R}} {e^{i\lambda x} dP\left( {Y^{\left( 1 \right)} \le x} \right)} } \right| < 1\). Set \(\wp_{z}^{\left( 1 \right)} \left( \lambda \right) = 1 - z \, \hat{f}_{{Y^{\left( 1 \right)} }} \left( \lambda \right)\); the function \(\wp_{z}^{\left( 1 \right)} \left( \lambda \right)\) (see, e.g., Spitzer 1974) represents the Wiener–Hopf equation, which is written as,
where,
Since \(Y_{k}^{\left( 1 \right)}\) is non-lattice, \(P\left( {S_{k}^{\left( 1 \right)} = 0} \right) = 0\), \(k \in {\mathbb{N}}\). Thus, (A.1) simplifies to,
Set \(_{u} J_{\infty }^{\left( 1 \right)} = \sup \left\{ {k \in {\mathbb{N}}:S_{k}^{\left( 1 \right)} = \overline{S}_{\infty }^{\left( 1 \right)} } \right\}\) and \(_{l} J_{\infty }^{\left( 1 \right)} = \inf \left\{ {k \in {\mathbb{N}}:S_{k}^{\left( 1 \right)} = \overline{S}_{\infty }^{\left( 1 \right)} } \right\}\). Applying Corollary 6 (Borovkov 1976, p. 93) we have
Since \(P\left( {S_{k}^{\left( 1 \right)} = 0} \right) = 0\), \(k \in {\mathbb{N}}\), it follows that \(_{u} J_{\infty }^{\left( 1 \right)} =_{l} J_{\infty }^{\left( 1 \right)} = J_{\infty }^{\left( 1 \right)}\). This completes the proof of Theorem A.2.\(\hfill\square\)
To see the relationship between \(J_{\infty }^{\left( 1 \right)}\) and the ascending and descending epochs, we have:
Corollary A.1
We have \(P\left( {J_{\infty }^{\left( 1 \right)} = n} \right) = P\left( {T_{ \le }^{\left( 1 \right)} > n} \right)P\left( {T_{ > }^{\left( 1 \right)} = \infty } \right)\).
Proof
Since \(S_{j + 1:n} =_{D} S_{n - j}\), \(j = 0, \cdots ,n\), and since \(\left( {Y_{1}^{\left( 1 \right)} , \cdots ,Y_{n}^{\left( 1 \right)} } \right)\), \(\left( {Y_{n + 1}^{\left( 1 \right)} ,Y_{n + 2}^{\left( 1 \right)} , \cdots } \right)\), \(n \in {\mathbb{N}}\backslash \left\{ 0 \right\}\) are independent, we have,
This completes the proof of Corollary A.1.\(\hfill\square\)
The following two results are important for expressing probabilities, and mixed characteristic and moment generating functions, in terms of the event that the walk stays positive.
Theorem A.3
If \(z \in \left( {0,1} \right)\), \({\text{Im}} \left( \lambda \right) = 0\), and \(Y_{k}^{\left( 1 \right)}\), \(k \in {\mathbb{N}}\), are non-lattice, then,
where \(Q^{\left( 1 \right)} \left( {z,\lambda } \right) = \sum\nolimits_{0 \le n} {z^{n} E\left[ {e^{{i\lambda S_{n}^{\left( 1 \right)} }} I\left( {T_{ \le }^{\left( 1 \right)} > n} \right)} \right]}\).
Proof
From Baxter's equations, which determine the joint distributions of \(\left( {T_{ > }^{\left( 1 \right)} ,S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)\) and \(\left( {T_{ \le }^{\left( 1 \right)} ,S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)\), the following identities hold for \(z \in \left( {0,1} \right)\) and \(\lambda \in {\mathbb{R}}\),
A similar identification of \(1 - E\left[ {z^{{T_{ \le }^{\left( 1 \right)} }} \exp \left( {i\lambda S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)} \right]\) is obtained when \(\left| z \right| \ge 1\) by simply replacing the event \(\left\{ {S_{n}^{\left( 1 \right)} > 0} \right\}\) with its complementary event \(\left\{ {S_{n}^{\left( 1 \right)} \le 0} \right\}\). Note that, in contrast to the series (A.5), the series with the complementary events fails to converge at \(z = 1\). The validity of the representation at \(z = 1\) relies on the fact that the formula contains an exponential of a Laurent series which converges throughout \(\left| z \right| \ge 1\), and emphasizes the fact that \(1 - E\left[ {z^{{T_{ \le }^{\left( 1 \right)} }} \exp \left( {i\lambda S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)} \right]\) has a zero at \(z = 1\) when \(\lambda = 0\) (see, e.g., Bayer 1996),
Call \(\hat{f}_{ > }^{\left( 1 \right)} \left( {1,\lambda } \right): = \hat{f}_{ > }^{\left( 1 \right)} \left( \lambda \right)\), \({\text{Im}} \left( \lambda \right) \ge 0\) and \(\hat{f}_{ \le }^{\left( 1 \right)} \left( {1,\lambda } \right): = \hat{f}_{ \le }^{\left( 1 \right)} \left( \lambda \right)\), \({\text{Im}} \left( \lambda \right) \le 0\). From the Wiener–Hopf factorization, we also have
where \(\hat{f}_{ > }^{\left( 1 \right)}\) and \(\hat{f}_{ \le }^{\left( 1 \right)}\) are the Fourier-Stieltjes transforms of the random variables \(S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)}\) and \(S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)}\), respectively. Writing \(B^{\left( 1 \right)} < \infty\), \(F_{ > }^{\left( 1 \right)}\) and \(F_{ \le }^{\left( 1 \right)}\) for the distribution function of \(S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)}\) and \(S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)}\), the following identity is well known
where \(\hat{f}_{ > }^{\left( 1 \right)}\) and \(\hat{f}_{ \le }^{\left( 1 \right)}\) are the Fourier–Stieltjes transforms of the random variables \(S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)}\) and \(S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)}\), respectively, and \(F^{\left( 1 \right)}\) is expressed with respect to the distribution functions \(F_{ > }^{\left( 1 \right)}\) and \(F_{ \le }^{\left( 1 \right)}\) of \(S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)}\) and \(S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)}\), defined on \(\left( {0,\infty } \right)\) and \(\left( { - \infty ,0} \right]\), respectively. Setting \(z = 1\) and \(\lambda = 0\) in (A.5) and (A.6), we have that,
Since \(A^{\left( 1 \right)} + B^{\left( 1 \right)} = \infty\) and \(E\left[ {Y^{\left( 1 \right)} } \right] < 0\), it follows that,
Incorporating (A.5) and (A.6), the following holds,
Let \(F_{n - 1}^{\left( 1 \right)} = \sigma \left( {S_{0}^{\left( 1 \right)} ,S_{1}^{\left( 1 \right)} , \cdots ,S_{n - 1}^{\left( 1 \right)} } \right)\) denote the natural \(\sigma\)-algebra generated by the walk. The numerator in (A.9) can then be expressed as,
Combining (A.9) and (A.10) completes the proof of Theorem A.3.\(\hfill\square\)
The Baxter equations and Spitzer's formula are among the highest achievements of classical random walk theory. These equations show that the joint distribution of \(\left( {T_{ > }^{\left( 1 \right)} ,S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)\) is determined by the mixed transform \(E\left[ {z^{{T_{ > }^{\left( 1 \right)} }} \exp \left( {i\lambda S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)} \right]\). In other words, this mixed transform can be inverted to obtain the joint probabilities associated with \(\left( {T_{ > }^{\left( 1 \right)} ,S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)\). Similar remarks apply to \(\left( {T_{ \le }^{\left( 1 \right)} ,S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)\). These formulae are rather complex and depend on knowledge of \(F_{{Y^{\left( 1 \right)} }}^{n * }\) for all \(n \ge 0\), as indicated in Theorem A.3 and Corollary A.2 below. However, before we can even apply Baxter's equations, we need to show that,
is bounded above. Note that the right-hand side of (A.11) is just a version of the Lévy–Khintchine representation. Therefore, to show the boundedness of (A.11), the following lemma is of value.
Lemma A.1
Suppose that the measure \(\upsilon\) on \({\mathbb{R}}\) is given by,
that satisfies, \(\upsilon \left( {\left\{ 0 \right\}} \right) = 0\) and \(\int_{\mathbb{R}} {\left( {\left| x \right| \wedge 1} \right)} \upsilon \left( {dx} \right) < \infty\). Then, there exists a random variable \(Y\) with probability measure \(\mu\) that has characteristic function \(\hat{\mu }\left( s \right)\) given by
and this representation is unique.
In view of Lemma A.1, it is then necessary to demonstrate that \(\upsilon \left( {\left\{ 0 \right\}} \right) = 0\) and \(\int_{\mathbb{R}} {\left( {\left| x \right| \wedge 1} \right)} \upsilon \left( {dx} \right) < \infty\) hold for the random walk \(S^{\left( 1 \right)}\) (see also Resnick 1992).
Lemma A.2
We have,
Proof
Note that,
To show that each component on the right-hand side of (A.12) is bounded, we introduce the random variable \(N \sim Geom\left( z \right)\), \(z \in \left( {0,1} \right)\), that is independent of the random walk \(S^{\left( 1 \right)}\). Thus,
Applying similar arguments as above, it can be also shown that,
This completes the proof of Lemma A.2.\(\hfill\square\)
The following corollary is a simple consequence of Theorem A.3.
Corollary A.2
If \(z \in \left( {0,1} \right)\), and \(Y_{k}^{\left( 1 \right)}\), \(k \in {\mathbb{N}}\), are non-lattice, then,
where \(Q^{\left( 1 \right)} \left( z \right) = \sum\nolimits_{0 \le n} {z^{n} P\left( {T_{ \le }^{\left( 1 \right)} > n} \right)}\).
Proof
Setting \(\lambda = 0\) in the Wiener–Hopf Eq. (2.3) yields,

which in turn yields,
Letting \(z \uparrow 1\) in (A.12), we have,
Considering (A.13) and (A.14), it yields from Theorem A.3 when \(\lambda = 0\) that,
Also, from Theorem A.3 setting \(\lambda = 0\), it yields,
This completes the proof of Corollary A.2.\(\hfill\square\)
Appendix B
2.1 Overshoot of the random walk
Let \(K\left( s \right)\) denote the cumulant generating function (CGF) of the random variable \(Y\), defined by
For each \(s \in {\mathbf{S}} = \left\{ {s \in {\mathbb{R}}:K\left( s \right) < \infty } \right\}\), we denote by \(P_{s}\) the probability measure with density \(e^{sx - K\left( s \right)}\) with respect to the probability measure \(P\). In statistical terminology, \(\left\{ {F_{s} :s \in {\mathbf{S}}} \right\}\), the distribution functions under the probability measures \(P_{s}\), represents the exponential family generated by \(F\). Specifically, the random variable \(Y\) and the probability measure \(P\) generate an exponential family of equivalent probability measures \(P_{s}\), \(s \in {\mathbf{S}}\), by,
Let \(\mu \left( s \right) = E_{s} \left[ Y \right]\) and \(\sigma^{2} \left( s \right) = {\text{Var}}_{s} \left( Y \right)\) denote the mean and variance, respectively under the probability measure \(P_{s}\). Let \(K_{s} \left( u \right)\) be the CGF of \(F_{s}\). Then, it follows that,
Since \(K\left( s \right)\) is strictly convex and \({\text{supp}}\left( F \right) \cap \left( {0,\infty } \right) \ne \emptyset\), \(K\left( s \right) \to \infty\) as \(s \to \infty\). Quite often \(K\left( \cdot \right)\) has a finite radius of convergence (exponential family), and in many cases (heavy tails) \(K\left( s \right) = \infty\) for all \(s > 0\). More often, the precise characterization of heavy- or light-tailed distributions is expressed in terms of the survival function \(\overline{F} = 1 - F\). Specifically, when \(\overline{F}\left( {\ln x} \right) = l\left( x \right)\), where \(l\left( \cdot \right)\) is a slowly varying function, \(F\) is heavy tailed; in this case, \(K\left( s \right) = \infty\) for all \(s > 0\) and the expectation may be finite or infinite. When \(\overline{F}\left( {\ln x} \right) = x^{\delta } l\left( x \right)\), \(\delta < 0\), \(F\) is medium tailed; in this case, \(K\left( s \right) < \infty\) for \(0 < s < - \delta\) and the expectation is finite. When \(\overline{F}\left( {\ln x} \right) \in R_{ - \infty }\), where \(R_{ - \infty }\) is the class of functions whose Karamata indices are both \(- \infty\) (see, e.g., Bingham et al. 1987), \(F\) is short tailed; in this case, \(K\left( s \right) < \infty\) for all \(s > 0\) and the expectation is finite. Various other variations of the above categories also exist (see, e.g., Rojo 1996). Here, we assume that the moment generating function is finite and that \(- \infty < E\left[ Y \right] < 0\), which, in turn, implies that we are dealing with short-tailed distributions. Under the short-tail assumption and when the expectation of \(Y\) is strictly negative, there exist \(s_{0} > 0\) with \(K^{\prime}\left( {s_{0} } \right) = 0\) and \(s_{1} > 0\) with \(K\left( {s_{1} } \right) = 0\).
Specifically, it is seen that \(\mu_{{s_{0} }} : = E_{{s_{0} }} \left[ Y \right] = K^{\prime}\left( {s_{0} } \right) = 0\) and if further the convexity property is applied, we also have \(\mu_{{s_{1} }} : = E_{{s_{1} }} \left[ Y \right] =\)\(K^{\prime}\left( {s_{1} } \right) > 0\).
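As a hedged numerical sketch (the Gaussian choice and all numbers below are illustrative assumptions, not taken from the paper), both roots are available in closed form when \(Y \sim N(\mu, \sigma^2)\) with \(\mu < 0\), since then \(K(s) = \mu s + \sigma^2 s^2/2\):

```python
import math

# Illustrative assumption: Gaussian increments Y ~ N(mu, sigma^2) with mu < 0,
# so K(s) = mu*s + sigma^2*s^2/2 and the two roots named in the text are explicit.
mu, sigma = -0.5, 1.0

def K(s):
    return mu * s + 0.5 * sigma ** 2 * s ** 2

def K_prime(s):
    return mu + sigma ** 2 * s

s0 = -mu / sigma ** 2        # K'(s0) = 0: the minimizer of K
s1 = -2.0 * mu / sigma ** 2  # K(s1) = 0: the positive (Lundberg) root

assert abs(K_prime(s0)) < 1e-12
assert abs(K(s1)) < 1e-12 and s1 > 0
assert K(s0) < 0  # strict convexity with K(0) = 0 forces a negative minimum
```

By convexity, \(K'(s_1) > 0\) at the positive root, matching the sign statement above.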
Translating the above remarks to the change point scenario, the CGF of the random variable \(Y^{\left( 1 \right)}\) takes the form \(K_{{Y^{\left( 1 \right)} }} \left( \lambda \right) = \ln \left\{ {\int_{X} {f_{2}^{\lambda } \left( x \right)f_{1}^{1 - \lambda } \left( x \right)dv\left( x \right)} } \right\}\), \(\lambda \in \left( {0,1} \right)\), in which case \(K_{{Y^{\left( 1 \right)} }} \left( 0 \right) = K_{{Y^{\left( 1 \right)} }} \left( 1 \right) = 0\) and \(\exists \, \lambda_{0} \in \left( {0,1} \right)\): \(K_{{Y^{\left( 1 \right)} }} \left( {\lambda_{0} } \right) < 0\), \(K^{\prime}_{{Y^{\left( 1 \right)} }} \left( {\lambda_{0} } \right) = 0\), and \(K^{\prime\prime}_{{Y^{\left( 1 \right)} }} \left( {\lambda_{0} } \right) > 0\); on \(\lambda \in \left( {0,1} \right)\), \(K_{{Y^{\left( 1 \right)} }}\) is infinitely differentiable. Note that \(K^{\prime}_{{Y^{\left( 1 \right)} }} \left( 0 \right)\) and \(K^{\prime}_{{Y^{\left( 1 \right)} }} \left( 1 \right)\) have opposite signs. To further explore the above properties and their implications for the change point, we next introduce the exponential family with parameter restricted to \(\lambda \in \left( {0,1} \right)\) and density given by \(\frac{{dP_{{Y^{\left( 1 \right)} ,\lambda }} }}{{dP_{{Y^{\left( 1 \right)} }} }} = e^{{\lambda x - K_{{Y^{\left( 1 \right)} }} \left( \lambda \right)}}\) with \(P_{{Y^{\left( 1 \right)} ,0}} : = P_{{Y^{\left( 1 \right)} }}\). Note that the expected value and variance under \(P_{{Y^{\left( 1 \right)} ,\lambda }}\) are expressed as \(\mu_{\lambda }^{\left( 1 \right)} : = K^{\prime}_{{Y^{\left( 1 \right)} }} \left( \lambda \right) = E_{\lambda } \left[ {Y^{\left( 1 \right)} } \right]\) and \(\sigma_{\lambda }^{\left( 1 \right)2} : = {\text{Var}}_{\lambda } \left( {Y^{\left( 1 \right)} } \right) = K^{\prime\prime}_{{Y^{\left( 1 \right)} }} \left( \lambda \right)\).
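To make the change-point CGF concrete, consider the hedged illustration of a Gaussian mean change, \(f_1 = N(0,1)\) and \(f_2 = N(\delta,1)\) (an assumption of this sketch, not a case treated here): then \(K_{Y^{(1)}}(\lambda) = \lambda(\lambda-1)\delta^2/2\) in closed form, which exhibits every property listed above. The sketch verifies the closed form against brute-force quadrature:

```python
import math

# Illustrative assumption: f1 = N(0,1), f2 = N(delta,1); then the change-point
# CGF has the closed form K(lam) = lam*(lam - 1)*delta^2/2.
delta = 1.0

def K_closed(lam):
    return lam * (lam - 1.0) * delta ** 2 / 2.0

def K_numeric(lam, lo=-12.0, hi=13.0, n=200000):
    # brute-force midpoint quadrature of  integral f2^lam * f1^(1-lam) dx
    h = (hi - lo) / n
    log_norm = -0.5 * math.log(2 * math.pi)
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        logf1 = -0.5 * x * x + log_norm
        logf2 = -0.5 * (x - delta) ** 2 + log_norm
        total += math.exp(lam * logf2 + (1.0 - lam) * logf1) * h
    return math.log(total)

assert K_closed(0.0) == 0.0 and K_closed(1.0) == 0.0  # K(0) = K(1) = 0
assert K_closed(0.5) < 0.0                            # interior minimum < 0
assert abs(K_numeric(0.3) - K_closed(0.3)) < 1e-6     # quadrature agrees
```

Here \(\lambda_0 = 1/2\) by symmetry, \(K''(\lambda_0) = \delta^2 > 0\), and \(K'(0) = -\delta^2/2 < 0 < K'(1) = \delta^2/2\), matching the sign statement in the text.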
Hence, letting \(x_{0} \in {\text{int}}\,X\), the maximum likelihood estimate \(\hat{\lambda }\left( {x_{0} } \right): = \hat{\lambda }\) under the exponential family is defined as a solution of the equation,
Applying the Kullback–Leibler divergence for the exponential family, the information divergence turns out to be the Legendre-Fenchel transform given by \(\hat{l}\left( {x_{0} } \right)\), i.e.,
Note that the entropy \(\lim_{n \to \infty } n^{ - 1} \log P\left( {n^{ - 1} S_{n}^{\left( 1 \right)} \in A} \right) = s^{\left( 1 \right)} \left( A \right)\), for any Borel set \(A\), is closely related to the Legendre-Fenchel transform via \(s\left( x \right) = - \hat{l}\left( x \right)\), which opens a new direction for how the change point can alternatively be handled.
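A minimal sketch of the Legendre-Fenchel transform \(\hat l(x) = \sup_s\{sx - K(s)\}\), again under the illustrative Gaussian assumption where it reduces to \((x-\mu)^2/(2\sigma^2)\); since the objective is concave in \(s\), the supremum is located by bisection on \(K'(s) = x\):

```python
# Illustrative assumption: Gaussian K(s) = mu*s + sigma^2*s^2/2, for which the
# Legendre-Fenchel transform sup_s { s*x - K(s) } equals (x - mu)^2/(2 sigma^2).
mu, sigma = -0.5, 1.0

def K(s):
    return mu * s + 0.5 * sigma ** 2 * s ** 2

def K_prime(s):
    return mu + sigma ** 2 * s

def rate_function(x):
    # K' is increasing, so solve K'(s) = x by bisection and plug back in
    lo, hi = -1e3, 1e3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K_prime(mid) < x:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return s * x - K(s)

for x in (-1.0, 0.0, 0.7):
    assert abs(rate_function(x) - (x - mu) ** 2 / (2 * sigma ** 2)) < 1e-8
```

The entropy \(s(x) = -\hat l(x)\) of the remark above is then just the negative of this quantity.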
If \(E_{\lambda }\) is the expectation operator corresponding to \(P_{{Y^{\left( 1 \right)} ,\lambda }}\), then for any fixed \(n \in {\mathbb{N}}\)
for all measurable functions \(f:{\mathbb{R}}^{n} \to {\mathbb{R}}\) which are bounded or nonnegative (see Asmussen 2003, XIII). Replacing first \(f\left( {Y_{1}^{\left( 1 \right)} , \cdots ,Y_{n}^{\left( 1 \right)} } \right)\) by \(e^{{ - \lambda S_{n} + nK_{{Y^{\left( 1 \right)} }} \left( \lambda \right)}} f\left( {Y_{1}^{\left( 1 \right)} , \cdots ,Y_{n}^{\left( 1 \right)} } \right)\) and then specializing to an indicator function of a Borel set \(A \subseteq {\mathbb{R}}^{n}\) yields,
The point of formulations such as (B.3) and (B.4) is that in several cases the \(F_{\lambda }\)-distribution has more accessible properties than \(F\), as demonstrated below.
When \(K^{\prime}_{{Y^{\left( 1 \right)} }} \left( 0 \right) = E\left[ {Y^{\left( 1 \right)} } \right] < 0\), the Lundberg equation \(K_{{Y^{\left( 1 \right)} }} \left( \lambda \right) = 0\) gives \(\lambda = 1\) with \(K^{\prime}_{{Y^{\left( 1 \right)} }} \left( 1 \right) > 0\). In what follows we exploit the exponential change of measure corresponding to \(\lambda = 1\) and write \(P_{{Y^{\left( 1 \right)} ,1}} : = P_{1}\). Specifically, the Cramér-Lundberg theory is now applicable and provides asymptotic expressions for the distribution of \(\overline{S}_{\infty }^{\left( 1 \right)}\), which can consequently be used to develop expressions for the distribution of \(\xi_{\infty }\). Applying (2.4), we have \(P\left( G \right) = E_{1} \left[ {e^{{ - \lambda S_{{T_{ > }^{\left( 1 \right)} \left( x \right)}}^{\left( 1 \right)} }} ,G} \right]\), for \(G \in F_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)}\), \(G \subseteq \left\{ {T_{ > }^{\left( 1 \right)} \left( x \right) < \infty } \right\}\) and \(x \in {\mathbb{R}}_{ > }\). Define \(B^{\left( 1 \right)} \left( x \right): = S_{{T_{ > }^{\left( 1 \right)} \left( x \right)}}^{\left( 1 \right)} - x\) to be the overshoot variable. Then Lundberg's inequality (see Asmussen 2003, XIII) is written as,
The argument in (B.5) simply neglects the excess over the boundary. A small refinement produces a version of the Cramér-Lundberg approximation, which is stated without proof in the following theorem (see Veraverbeke 1977 or Asmussen 2003, XIII).
Theorem B.1
If \(B^{\left( 1 \right)} \left( x \right)\) converges in \(P_{1} -\) distribution as \(x \to \infty\), \(\lim_{x \to \infty } B^{\left( 1 \right)} \left( x \right) = B^{\left( 1 \right)} \left( \infty \right)\), then,
where \(C^{\left( 1 \right)} = E_{1} \left[ {e^{{ - B^{\left( 1 \right)} \left( \infty \right)}} } \right]\).
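The Lundberg bound (B.5) can be sanity-checked by simulation. A hedged Monte Carlo sketch, under the illustrative assumption of Gaussian increments \(N(-1/2, 1)\), for which the Lundberg root of \(E[e^{sY}] = 1\) is \(s = 1\):

```python
import math
import random

# Illustrative assumption: increments Y ~ N(-1/2, 1).  The Lundberg equation
# E[exp(s*Y)] = 1, i.e. -s/2 + s^2/2 = 0, has positive root s = 1, so (B.5)
# predicts P(max_n S_n > x) <= exp(-x).
random.seed(12345)
M, n_steps = 10000, 200
count1 = count2 = 0
for _ in range(M):
    s = smax = 0.0
    for _ in range(n_steps):
        s += random.gauss(-0.5, 1.0)
        if s > smax:
            smax = s
    count1 += smax > 1.0
    count2 += smax > 2.0
p1, p2 = count1 / M, count2 / M

assert 0.0 < p2 < p1          # both events occur, and they are nested
assert p1 <= math.exp(-1.0)   # Lundberg bound at x = 1
assert p2 <= math.exp(-2.0)   # Lundberg bound at x = 2
```

Theorem B.1 refines the bound to \(P(\overline S_\infty^{(1)} > x) \sim C^{(1)} e^{-x}\); the truncation to 200 steps is safe here only because the drift is strongly negative, so the maximum is attained early with overwhelming probability.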
The next theorem provides expressions for the limiting overshoot \(B^{\left( 1 \right)} \left( \infty \right)\).
Theorem B.2
For the random walk \(S^{\left( 1 \right)}\), \(B^{\left( 1 \right)} \left( \infty \right)\) exists with respect to \(P_{1}\). In this case the constant \(C^{\left( 1 \right)}\) is given by
From (A.10) and since \(E\left[ {\exp \left( {Y^{\left( 1 \right)} } \right)} \right] = 1\), we also obtain, using Wald's identity, that,
which yields that,
Note, again, that the constant \(C^{\left( 1 \right)}\) is now explicitly determined by the n-fold convolutions of the random walk \(S^{\left( 1 \right)}\).
From Corollary A.2, we have that \(Q^{\left( 1 \right)} \left( z \right)\wp_{{z^{ + } }}^{\left( 1 \right)} \left( 0 \right) = 1\), for \(z \in \left( {0,1} \right)\). Note that when \(f\left( x \right)\) and \(g\left( x \right)\) are n-times differentiable functions, the Leibniz rule states that the product \(h\left( x \right) = f\left( x \right)g\left( x \right)\) is also n-times differentiable and its nth derivative is given by \(h^{\left( n \right)} \left( x \right) = \sum\nolimits_{0 \le k \le n} {\binom{n}{k}f^{{\left( {n - k} \right)}} \left( x \right)g^{\left( k \right)} \left( x \right)}\). Setting \(b_{n}^{\left( 1 \right)} = P\left( {S_{n}^{\left( 1 \right)} > 0} \right)\), \(q_{n}^{\left( 1 \right)} = P\left( {T_{ \le }^{\left( 1 \right)} > n} \right)\) and then applying the Leibniz rule to \(Q^{\left( 1 \right)} \left( z \right)\wp_{{z^{ + } }}^{\left( 1 \right)} \left( 0 \right) = 1\), we obtain that,
Applying Theorem A.3, we also have that \(Q^{\left( 1 \right)} \left( {1, - i\lambda } \right)\wp_{{1^{ + } }}^{\left( 1 \right)} \left( { - i\lambda } \right) = 1\). Setting \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{b}_{n}^{\left( 1 \right)} = E\left[ {e^{{ - S_{n}^{\left( 1 \right)} }} I\left( {S_{n}^{\left( 1 \right)} > 0} \right)} \right]\), \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{q}_{n}^{\left( 1 \right)} = E\left[ {e^{{ - S_{n}^{\left( 1 \right)} }} I\left( {T_{ \le }^{\left( 1 \right)} > n} \right)} \right]\) and then employing the Leibniz rule, as in (B.6), we have that,
Building on the above expressions, we have provided a complete solution of the associated random walk problems. In a way, the results in Appendix A and Appendix B show that the probability distributions of \(\left( {T_{ \le }^{\left( 1 \right)} ,S_{{T_{ \le }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)\), \(\left( {T_{ > }^{\left( 1 \right)} ,S_{{T_{ > }^{\left( 1 \right)} }}^{\left( 1 \right)} } \right)\), \(\overline{S}_{\infty }^{\left( 1 \right)}\) and \(q_{n}^{\left( 1 \right)} = P\left( {T_{ \le }^{\left( 1 \right)} > n} \right)\) are explicitly determined by knowledge of the distribution \(F_{{Y^{\left( 1 \right)} }}\), and consequently one is able to evaluate the n-fold convolutions. Thereby, expressions such as \(E\left[ {e^{{ - S_{n}^{\left( 1 \right)} }} I\left( {T_{ \le }^{\left( 1 \right)} > n} \right)} \right]\) are plainly determined via the quantities \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{b}_{n}^{\left( 1 \right)} = E\left[ {e^{{ - S_{n}^{\left( 1 \right)} }} I\left( {S_{n}^{\left( 1 \right)} > 0} \right)} \right]\), i.e., functionals of the n-fold convolutions of \(F_{{Y^{\left( 1 \right)} }}\).
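The Leibniz-rule recursions can be sketched numerically. The hedged sketch below uses the classical Spitzer identity \(\sum_{n\ge0} q_n^{(1)} z^n = \exp\{\sum_{n\ge1} b_n^{(1)} z^n/n\}\) (consistent with the statement \(Q^{(1)}(z)\,\wp_{z^+}^{(1)}(0) = 1\) used above); differentiating in \(z\) gives \(n\,q_n^{(1)} = \sum_{k=1}^n b_k^{(1)} q_{n-k}^{(1)}\). Gaussian increments are an illustrative assumption, chosen because then \(b_n^{(1)} = \Phi(\mu\sqrt n/\sigma)\) is available in closed form:

```python
import math
from statistics import NormalDist

# Hedged sketch: Spitzer's identity  sum_n q_n z^n = exp( sum_{n>=1} b_n z^n/n )
# with b_n = P(S_n > 0) and q_n = P(T_<= > n).  Differentiating in z (the same
# Leibniz-rule step as in the text) gives  n*q_n = sum_{k=1}^n b_k * q_{n-k}.
# Illustrative assumption: Gaussian increments, so b_n = Phi(mu*sqrt(n)/sigma).
mu, sigma = -0.5, 1.0
Phi = NormalDist().cdf
N = 40
b = [0.0] + [Phi(mu * math.sqrt(n) / sigma) for n in range(1, N + 1)]

q = [1.0]  # q_0 = P(T_<= > 0) = 1
for n in range(1, N + 1):
    q.append(sum(b[k] * q[n - k] for k in range(1, n + 1)) / n)

# q_n is a survival probability: nonincreasing, and -> 0 under negative drift
assert all(q[n] <= q[n - 1] + 1e-15 for n in range(1, N + 1))
assert q[1] == b[1]   # {T_<= > 1} is exactly {S_1 > 0}
assert q[N] < 0.01    # nearly extinct by n = 40
```

Only the marginal probabilities \(b_n^{(1)}\), i.e., functionals of the n-fold convolutions of \(F_{Y^{(1)}}\), enter the recursion, which is precisely the point made in the paragraph above.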
Appendix C
3.1 Edgeworth expansion
For i.i.d. random variables \(X,X_{1} , \cdots ,X_{n}\) with mean \(\mu = E\left[ X \right]\), let \(\overline{X}_{n} = n^{ - 1} \sum\nolimits_{k = 1}^{n} {X_{k} }\). Throughout the appendices, assume that \(X\) is not arithmetic, with cumulative distribution function \(F\). The classical Edgeworth expansion approximates the density of \(U = \sqrt n \left( {\overline{X}_{n} - \mu } \right)\) via the characteristic function as (see Jensen 1995)
where \(\beta = \sum\nolimits_{j = 3}^{m} {\frac{{\left( {it} \right)^{j} \kappa_{j} }}{{j!n^{{{{\left( {j - 2} \right)} \mathord{\left/ {\vphantom {{\left( {j - 2} \right)} 2}} \right. \kern-0pt} 2}}} }}}\) and \(\eta = \omega \frac{{c_{2} E\left[ {\left| X \right|^{m + 1} } \right]}}{{n^{{{{\left( {m - 1} \right)} \mathord{\left/ {\vphantom {{\left( {m - 1} \right)} 2}} \right. \kern-0pt} 2}}} }}\left| t \right|^{m + 1}\), for \(\left| t \right| \le \sqrt n c_{1} E\left[ {\left| X \right|^{m + 1} } \right]\). Here \(\omega \in {\mathbb{C}}\) is such that \(\left| \omega \right| \le 1\). Before implementing (C.1), the exponential \(\exp \left( {\beta + \eta } \right)\) is expanded in a power series to obtain,
for \(\left| t \right| \le \sqrt n c_{3} \left( {\kappa_{2} ,E\left[ {\left| X \right|^{m + 1} } \right]} \right)\), where \(c_{3}\) and \(c_{4}\) are constants depending on \(\kappa_{2}\) and the absolute moment \(E\left[ {\left| X \right|^{m + 1} } \right]\). The \(P_{j}\)'s are orthogonal polynomials with coefficients depending on the cumulants. For \(m = 4\), we have
The theorem below represents the basis of how a density can be recovered from the characteristic function, known as the inversion theorem.
Theorem C.1
(Inversion formula). If the characteristic function \(\gamma \in {\mathbf{L}}_{1} \left( {\mathbb{R}} \right)\), then \(X\) has a bounded continuous density \(f\) with respect to Lebesgue measure, given by
Since \(E\left[ {e^{itU} } \right]\) and \(e^{{ - {{\kappa_{2} t^{2} } \mathord{\left/ {\vphantom {{\kappa_{2} t^{2} } 2}} \right. \kern-0pt} 2}}} \left\{ {1 + \sum\nolimits_{j = 3}^{m} {n^{{ - {{\left( {j - 2} \right)} \mathord{\left/ {\vphantom {{\left( {j - 2} \right)} 2}} \right. \kern-0pt} 2}}} P_{j} \left( {it} \right)} } \right\}\) belong to \({\mathbf{L}}_{1} \left( {\mathbb{R}} \right)\), it follows from Theorem C.1 that \(U\) and the series expansion have a bounded density \(g_{n}\) and a real-valued inversion function \(h_{n}\), given by
satisfying \(\left| {g_{n} \left( x \right) - h_{n} \left( x \right)} \right| = o\left( {n^{{ - {{\left( {m - 2} \right)} \mathord{\left/ {\vphantom {{\left( {m - 2} \right)} 2}} \right. \kern-0pt} 2}}} } \right)\). In addition, since \(\int_{\mathbb{R}} {\left( {it} \right)^{j} e^{ - itx} e^{{ - {{\kappa_{2} t^{2} } \mathord{\left/ {\vphantom {{\kappa_{2} t^{2} } 2}} \right. \kern-0pt} 2}}} dt} = \kappa_{2}^{{ - {j \mathord{\left/ {\vphantom {j 2}} \right. \kern-0pt} 2}}} H_{j} \left( {{x \mathord{\left/ {\vphantom {x {\sqrt {\kappa_{2} } }}} \right. \kern-0pt} {\sqrt {\kappa_{2} } }}} \right)e^{{ - {{x^{2} } \mathord{\left/ {\vphantom {{x^{2} } {2\kappa_{2} }}} \right. \kern-0pt} {2\kappa_{2} }}}}\), \(j \ge 3\), where \(H_{j}\) denotes the Hermite polynomials, for \(m = 4\) the probability density \(g_{n}\) of \(U\) can be approximated as
where \(\lambda_{j} = {{\kappa_{j} } \mathord{\left/ {\vphantom {{\kappa_{j} } {\kappa_{2}^{{{j \mathord{\left/ {\vphantom {j 2}} \right. \kern-0pt} 2}}} }}} \right. \kern-0pt} {\kappa_{2}^{{{j \mathord{\left/ {\vphantom {j 2}} \right. \kern-0pt} 2}}} }}\), \(j \ge 3\), the standardized cumulants. This is known as the Edgeworth expansion for 1-dimension.
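As a worked check of the \(m = 4\) expansion, take \(X = \mathrm{Exp}(1) - 1\) as a hedged illustrative choice (cumulants \(\kappa_2 = 1\), \(\kappa_3 = 2\), \(\kappa_4 = 6\)); the exact density of \(U\) is then a centered, scaled Gamma:

```python
import math

# Worked check of the m = 4 Edgeworth expansion.  Illustrative assumption:
# X = Exp(1) - 1, so kappa_2 = 1, kappa_3 = 2, kappa_4 = 6, and the exact
# density of U = sqrt(n)*Xbar_n is a centered, scaled Gamma(n, 1).
n = 50
k2, k3, k4 = 1.0, 2.0, 6.0
l3, l4 = k3 / k2 ** 1.5, k4 / k2 ** 2  # standardized cumulants lambda_3, lambda_4

def H3(z): return z ** 3 - 3 * z
def H4(z): return z ** 4 - 6 * z ** 2 + 3
def H6(z): return z ** 6 - 15 * z ** 4 + 45 * z ** 2 - 15

def edgeworth(x):
    phi = math.exp(-x * x / (2 * k2)) / math.sqrt(2 * math.pi * k2)
    z = x / math.sqrt(k2)
    return phi * (1 + l3 * H3(z) / (6 * math.sqrt(n))
                  + l4 * H4(z) / (24 * n)
                  + l3 ** 2 * H6(z) / (72 * n))

def exact(u):
    # U = (G - n)/sqrt(n) with G ~ Gamma(n, rate 1)
    g = n + u * math.sqrt(n)
    return math.sqrt(n) * math.exp((n - 1) * math.log(g) - g - math.lgamma(n))

for u in (-1.0, 0.0, 1.5):
    assert abs(edgeworth(u) - exact(u)) < 2e-3
```

The residual is of order \(n^{-3/2}\), consistent with the error statement above for \(m = 4\).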
Appendix D
D.1 Esscher tilting
The basic idea behind the approximations below stems from the work of Esscher (1932). The following abstract formulation will guide the development. Let \(P\) and \(Q\) be equivalent measures. Then, for any Borel set \(A\), we have that,
where \(x_{Q}\) can be any given arbitrary point choice.
To have a sense of how \(x_{Q}\) is chosen, we again refer to Jensen (1995, p. 12). A consequence of (D.1) is a means of determining the density of a statistic \(T\) with respect to some measure \(\nu\). To see this, let \(P_{T}\) be the measure of the statistic and let \(Q_{T}\) be an equivalent measure. Then, applying (D.1) to a single point, we have
The measure \(Q\) is taken such that \(x\) is a central point of the distribution, i.e., \(Q\) is the tilted measure.
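A standard use of this identity, sketched below under illustrative assumptions, is importance sampling: a rare tail event is made central under the tilted measure \(Q = P_{s}\) and the Radon–Nikodym factor \(dP/dQ\) reweights the draws. For Exp(1) data the tilted law is again exponential, with rate \(1 - s\), and \(K(s) = -\log(1 - s)\); all function names are hypothetical.

```python
import math, random

def K(s):
    # cumulant generating function of Exp(1), valid for s < 1
    return -math.log(1.0 - s)

def tilted_tail_estimate(n, x, n_sims=100_000, seed=1):
    """Monte Carlo estimate of P(Xbar_n > x) for Exp(1) data, sampling
    under the Esscher-tilted measure P_s with s = 1 - 1/x (so that the
    tilted mean equals x) and reweighting by dP/dP_s."""
    s = 1.0 - 1.0 / x            # K'(s) = 1/(1 - s) = x
    rate = 1.0 - s
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        S = sum(rng.expovariate(rate) for _ in range(n))
        if S > n * x:
            total += math.exp(-s * S + n * K(s))  # Radon-Nikodym factor
    return total / n_sims

def exact_tail(n, x):
    """P(S_n > nx) for S_n ~ Gamma(n, 1), integer n: by repeated
    integration by parts this is a Poisson(nx) lower-tail sum."""
    t = n * x
    term, acc = math.exp(-t), 0.0
    for k in range(n):
        acc += term
        term *= t / (k + 1)
    return acc
```

Because the event \(\{\overline{X}_{n} > x\}\) is no longer rare under \(P_{s}\), the reweighted estimator has a far smaller relative error than naive simulation.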
In applying (D.2), we let the statistic of interest, \(T = \overline{X}_{n}\), be the sample mean. Let \({\mathbf{S}} = \left\{ {s \in {\mathbb{R}}:K\left( s \right) < \infty } \right\}\) and let \(f_{n}\) be the density function of \(\overline{X}_{n}\), where \(K\) denotes the cumulant generating function of \(X\). The following lemma is of importance.
Lemma D1
For \(s \in {\mathbf{S}}\), we have
where \(f_{n,s}\) is the density of \(\sqrt n \left( {\overline{X}_{n} - x} \right)\) under \(P_{s}\).
Proof
Set \(\frac{{dQ_{{\overline{X}_{n} }} }}{{dP_{{\overline{X}_{n} }} }} = e^{{n\left\{ {sx - K\left( s \right)} \right\}}}\) to be the density of the exponential family, and set \(U = \sqrt n \left( {\overline{X}_{n} - x} \right)\). Applying (D.2) for the exponential measure, we have
where \(S_{n} = \sum\nolimits_{k = 1}^{n} {X_{k} }\).\(\hfill\square\)
To see how the density \(f_{n,s}\) works for any value \(y \in {\mathbb{R}}\), \(y > x\), or how (D.1) is implemented for Borel sets to the right of \(x \in {\mathbb{R}}_{ + }\), it is convenient to consider the cumulative distribution function of the statistic \(T = \overline{X}_{n}\), or even some moment expressions of \(T\). We here confine the estimates to the tail distribution \(P\left( {\overline{X}_{n} > x} \right)\) and to the moment expression \(E\left[ {e^{{ - u\left( {\overline{X}_{n} - x} \right)}} I\left( {\overline{X}_{n} > x} \right)} \right]\), \(x,u \in {\mathbb{R}}_{ + }\). To expedite the computations, assume that \(F_{n,s}\) is absolutely continuous (with respect to Lebesgue measure) with density \(f_{n,s}\) for all \(s \in {\mathbf{S}}\).
Lemma D2
For \(s \in {\mathbf{S}}\), we have
If \(s \in {\mathbf{S}} \cap \left( {0,\infty } \right)\), then
where \(U = \sqrt n \left( {\overline{X}_{n} - x} \right)/\sqrt {\kappa_{2,s} }\) and \(F_{n,s}\) is the cumulative distribution function of \(U\) under \(P_{s}\).
If \(F_{n,s}\) is absolutely continuous (with respect to Lebesgue measure) with density \(f_{n,s}\) for all \(s \in {\mathbf{S}} \cap \left( {0,\infty } \right)\), then,
Proof
We have
Then, substituting \(S_{n} = nx + \sqrt {n\kappa_{2,s} } \,U\), (D.3) follows immediately. Integrating (D.3) by parts yields (D.4). The last statement is trivial.\(\hfill\square\)
To relate the above lemma to the Edgeworth expansion in Appendix C, the random variable \(G = X - \mu_{s}\) is defined under the probability measure \(P_{s}\). Note that its characteristic function is defined as
Then the following lemma is of importance.
Lemma D3
For \(s \in {\mathbf{S}}\) and \(E_{s} \left[ {e^{itU} } \right] \in {\mathbf{L}}_{1} \left( {\mathbb{R}} \right)\),
where \(\mu_{s}\) and \(\kappa_{2,s}\) are the first and the second cumulants under the measure \(P_{s}\).
Proof
Using (D.3) and the inversion formula, Theorem C1, we have
In the expression above, Fubini’s theorem justifies interchanging the order of integration, since all points of the integration contour have a positive real part which is bounded away from zero. Thus,
which leads to the desired result after integrating with respect to \(y\).\(\hfill\square\)
Lastly, the next result of interest is \(E\left[ {e^{{ - u\left( {\overline{X}_{n} - x} \right)}} I\left( {\overline{X}_{n} > x} \right)} \right]\). Again, the idea is to express the integral in terms of the characteristic function so as to connect it with Appendix C.
Lemma D4
For \(s \in {\mathbf{S}} \cap \left( {0,\infty } \right)\) and \(E_{s} \left[ {e^{itU} } \right] \in {\mathbf{L}}_{1} \left( {\mathbb{R}} \right)\),
Proof
We have
Substituting \(P\left( {\overline{X}_{n} > y} \right)\) from Lemma D2 and using Fubini’s theorem, we have
This proves the lemma.\(\hfill\square\)
Appendix E
E.1 Saddlepoint approximation
Let \(T = \overline{X}_{n}\). Here, the parameter \(s\) is chosen via the saddlepoint, which in turn simplifies the computations of Appendix D. Thus, let \(\hat{s}: = \hat{s}\left( x \right)\) be determined by the equation \(K^{\prime}\left( s \right) = x = \mu_{s}\). When the exponential family is regular, that is, when \({\mathbf{S}}\) is open, the family is steep. When the family is neither regular nor steep, the equation \(K^{\prime}\left( s \right) = x\) can still be solved for values of \(x \in {\text{int}}\,{\mathbf{S}}\) (see, e.g., Barndorff-Nielsen 1978, p. 153). To obtain the first-order saddlepoint approximations, we simply replace \(\phi_{s}^{n} \left( {t/\sqrt {n\kappa_{2,s} } } \right)\) by the approximation shown in (C.2). Consider the
where \(P_{j,s}\) are again orthogonal polynomials of the same form as in (C.2), but now indexed by the parameter \(s\). In any event, \(P_{j,s}\) still consists of constant terms multiplied by \(\left( {it} \right)^{k}\), \(k \in {\mathbb{N}}\). Thus, the goal is first to evaluate integrals of the form (see, e.g., Jensen 1995)
before (C.1) is estimated.
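Before turning to the lemmas, the first-order approximation just described can be sketched numerically. Under the standard assumptions, the leading saddlepoint term for the density of \(\overline{X}_{n}\) is \(\sqrt{n/(2\pi K''(\hat{s}))}\, e^{n\{K(\hat{s}) - \hat{s}x\}}\); for Exp(1) data the saddlepoint \(\hat{s} = 1 - 1/x\) is available in closed form and the exact density is a scaled Gamma density, so the relative error can be observed directly. Function names are illustrative.

```python
import math

def K(s):
    # cumulant generating function of Exp(1), s < 1
    return -math.log(1.0 - s)

def saddlepoint_density(x, n):
    """First-order saddlepoint approximation to the density of the
    sample mean of n Exp(1) variables; K'(s) = 1/(1-s) = x gives the
    saddlepoint s_hat = 1 - 1/x in closed form."""
    s_hat = 1.0 - 1.0 / x
    K2 = 1.0 / (1.0 - s_hat) ** 2          # K''(s_hat) = x^2
    return math.sqrt(n / (2 * math.pi * K2)) * math.exp(n * (K(s_hat) - s_hat * x))

def exact_density(x, n):
    """Exact density of Xbar_n: n times the Gamma(n, 1) density at nx."""
    log_f = math.log(n) + (n - 1) * math.log(n * x) - n * x - math.lgamma(n)
    return math.exp(log_f)
```

For this model the relative error works out to \(e^{\theta_n} - 1\) with \(\theta_n \approx 1/(12n)\), uniformly in \(x\), matching the \(O(n^{-1})\) relative-error claim made in the remark at the end of this appendix.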
Lemma E1
(Jensen 1995, p.24).
where \(\overline{\Phi }\left( x \right) = 1 - \Phi \left( x \right) = 1 - \frac{1}{{\sqrt {2\pi } }}\int_{{\left( { - \infty ,x} \right]}} {e^{{ - t^{2} /2}} dt}\).
To have a clear sense of the approximations derived below, we will primarily deal with the terms \(B_{0} ,B_{3} ,B_{4}\) and \(B_{6}\). Thus, some additional details on the expressions of the \(B_{j}\)’s with respect to \(\lambda = \sqrt {n\kappa_{2,s} } \,s\) are necessary to identify their magnitude in relation to the sample size \(n\).
In addition, an expansion of the normal tail distribution will reveal how the \(B_{j}\)’s behave as \(n \to \infty\).
Lemma E2
(Gradshteyn and Ryzhik 2000, 8.254). For \(n \to \infty\), we have that, for \(\lambda = \sqrt {n\kappa_{2,s} } s\),
Further, as \(\lambda = \sqrt {n\kappa_{2,s} } s \to \infty\)
Also, \(B_{3} \left( \lambda \right)/\lambda \to - 3\left( {2\pi } \right)^{ - 1/2}\).
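The \(\lambda \to \infty\) statements rest on the classical asymptotic expansion of the normal tail cited from Gradshteyn and Ryzhik (2000, 8.254), namely \(\overline{\Phi }(\lambda ) \sim \lambda^{-1}\varphi (\lambda )\left( 1 - \lambda^{-2} + 3\lambda^{-4} - \cdots \right)\). A quick numerical check of this expansion, with illustrative function names:

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_tail(x):
    # upper normal tail via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

def mills_expansion(x, terms=3):
    """Partial sums of the asymptotic expansion
    (phi(x)/x) * (1 - 1/x^2 + 3/x^4 - 15/x^6 + ...)."""
    s, c = 0.0, 1.0
    for k in range(terms):
        s += c / x ** (2 * k)
        c *= -(2 * k + 1)   # coefficients 1, -1, 3, -15, ...
    return phi(x) / x * s
```

Each added term improves the relative accuracy by a factor of order \(\lambda^{-2}\), which is what drives the stated limits for the \(B_{j}\)’s.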
To estimate (E.1), or more specifically the tail distribution of the sample mean in Lemma D3 and the restricted moments \(E\left[ {e^{{ - u\left( {\overline{X}_{n} - x} \right)}} I\left( {\overline{X}_{n} > x} \right)} \right]\) in Lemma D4, Lemma E1 is adopted to provide the following.
Lemma E3
For \(s \in {\mathbf{S}} \cap \left( {0,\infty } \right)\) and \(x,u \in {\mathbb{R}}_{ + }\),
where \(\lambda = \sqrt {n\kappa_{2,s} } \,s\).
Remark
The error term can be sharpened when the parameter \(s\) is bounded away from zero. Further, from Lemma E2 it can be shown that \(\frac{{\lambda_{3,s} }}{{3!\sqrt n }}B_{3} \left( {\sqrt {n\kappa_{2,s} } \,s} \right) = O\left( {n^{ - 1} } \right)\) for \(s \in {\mathbf{S}} \cap \left( {0,\infty } \right)\) with \(s > \varepsilon\) for fixed \(\varepsilon > 0\), so that the saddlepoint approximation, corresponding to the \(B_{0}\) term in Lemma E3, has relative error \(O\left( {n^{ - 1} } \right)\) for \(s > \varepsilon\).
Fotopoulos, S.B. The distribution of the maximum likelihood estimates of the change point and their relation to random walks. Stat Inference Stoch Process 27, 335–372 (2024). https://doi.org/10.1007/s11203-023-09304-z