Abstract
In this paper, sequences of trials having three outcomes are studied. The outcomes are labelled as success, failure of type I and failure of type II. A run is called at most \(1+1\) contaminated, if it contains at most one failure of type I and at most one failure of type II. The accompanying distribution for the length of the longest at most \(1+1\) contaminated run is obtained. The proof is based on a powerful lemma of Csáki, Földes and Komlós. Besides a mathematical proof, simulation results supporting our theorem are also presented.
1 Introduction
The problem of the length of the longest head run for n Bernoulli random variables was first raised by T. Varga. The first answer, for the case of a fair coin, was given in the classical paper by Erdős and Rényi [1]. A surprisingly precise answer, the almost sure limit result for the length of the longest run containing at most T tails, was given by Erdős and Révész [2]. Their result is the following. Consider the usual coin tossing experiment with a fair coin. Let \(Z_T(N)\) denote the length of the longest head run containing at most T tails in the first N trials. Let \(\log \) denote the logarithm to base 2 and let \([\,\cdot \,]\) denote the integer part. Let \(h(N)= \log N +T \log \log N- \log \log \log N -\log T! + \log \log e\). Let \(\varepsilon \) be an arbitrary positive number. Then for almost all \(\omega \in \Omega \) there exists a finite \(N_0=N_0(\omega )\) such that \(Z_T(N)\ge [h(N) -2 -\varepsilon ]\) if \(N\ge N_0\); moreover, there exists an infinite sequence \(N_i=N_i(\omega )\) of integers such that \(Z_T(N_i) < [h(N_i) -1 +\varepsilon ]\).
These results later inspired renewed interest in this research area and several subsequent papers appeared. Asymptotic results for the distribution of the number of T-contaminated head runs, for the first hitting time of a T-contaminated head run of fixed length, and for the length of the longest T-contaminated head run were presented by Földes [3]. For the asymptotic distribution of \(Z_T(N)\), Földes [3] presented the following result. For any integer k and \(T\ge 0\)
as \(N\rightarrow \infty \), where \(\{\cdot \}\) denotes the fractional part.
By applying extreme value theory, Gordon et al. [4] obtained the asymptotic behaviour of the expectation and the variance of the length of the longest T-contaminated head run. Also, in the same article, accompanying distributions were obtained for the length of the longest T-contaminated head run.
Fazekas and Suja [5] showed that the accompanying distribution originally obtained by Gordon et al. [4] can also be derived by the approach of Földes [3]. After some probabilistic calculations and algebraic manipulations concerning the approximation of the length of the longest T-contaminated run, and using the main lemma of Csáki et al. [6], a convergence rate was obtained for an accompanying distribution of the longest T-contaminated head run by Fazekas et al. [7] for \(T=1\) and \(T=2\).
Several authors have given in-depth consideration to experiments involving runs in sequences of trinary trials, using a Markov chain approach in their analysis. Such models arise in system-theoretic applications, where components may be in one of the following states: ‘perfect functioning’, ‘partial functioning’ and ‘complete failure’ (see [8, 9]).
In this paper, we study sequences of trials having three outcomes: success, failure of type I and failure of type II. We shall say that a run is at most \(1+1\) contaminated if it includes at most 1 failure of type I and at most 1 failure of type II. Section 2 contains the main results. We give accompanying distributions for the appropriately centralized length of the longest at most \(1+1\) contaminated run, see Theorem 2.1. In Sect. 2, we also present simulation results supporting our theorem. The proofs are presented in Sect. 3. The proofs are based on a powerful lemma by Csáki et al. [6]. For the reader’s convenience, we quote it in Sect. 3. We mention that the manuscript Fazekas et al. [10] is an extended version of the present paper. In that manuscript one can find minor details of the proofs, additional simulation examples and some further results.
2 The longest at most \(1+1\) contaminated run
Let \(X_1,X_2,\ldots ,X_N\) be a sequence of independent random variables with three possible outcomes: 0, \(+1\) and \(-1\), labelled as success, failure of type I and failure of type II, respectively, with the distribution
$$\begin{aligned} P(X_i = 0)=p, \quad P(X_i = +1)=q_1, \quad P(X_i = -1)=q_2, \end{aligned}$$
where \(p+ q_1+q_2 = 1\) and \(p>0\), \(q_1>0\), \(q_2 > 0\).
A sequence of length m is called a pure run if it contains only 0 values. It is called a one-type contaminated run if it contains precisely one nonzero element: either a \(+1\) or a \(-1\). On the other hand, it is called a two-type contaminated run if it contains precisely one \(+1\), and one \(-1\) while the rest of the elements are 0’s.
A run is called at most one + one contaminated (shortly at most \(1+1\) contaminated) if it is either pure, one-type contaminated, or two-type contaminated.
So, for an arbitrary fixed m, let \(A_n = A_{n,m}\) denote the event that an at most \(1+1\) contaminated run starts at the \(n\)th step, that is, the sequence \(X_n, X_{n+1}, \ldots , X_{n+m-1}\) is an at most \(1+1\) contaminated run, and let \(\bar{A}_n\) denote its complement. Clearly,
$$\begin{aligned} P(A_n) = P(A_1) = p^m + m(q_1+q_2)p^{m-1} + m(m-1)q_1 q_2 p^{m-2}. \end{aligned}$$
Let \(\mu (N)\) be the length of the longest at most \(1+1\) contaminated run in \(X_1,X_2,\ldots ,X_N\). Then \(\mu (N) < m\) if and only if no run of length m in \(X_1,X_2,\ldots ,X_N\) is pure, one-type contaminated or two-type contaminated.
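Since the trials are independent, \(P(A_1)\) is the probability that a fixed window of length m contains at most one \(+1\) and at most one \(-1\), namely \(p^m + m(q_1+q_2)p^{m-1} + m(m-1)q_1 q_2 p^{m-2}\) (pure, one-type and two-type contaminated windows). The following sketch, with helper names of our own choosing, checks this closed form against brute-force enumeration:

```python
from itertools import product

def window_prob(m, p, q1, q2):
    """Brute-force probability that m i.i.d. trials form an
    at most 1+1 contaminated run (<= one +1 and <= one -1)."""
    weight = {0: p, 1: q1, -1: q2}
    total = 0.0
    for seq in product((0, 1, -1), repeat=m):
        if seq.count(1) <= 1 and seq.count(-1) <= 1:
            w = 1.0
            for x in seq:
                w *= weight[x]
            total += w
    return total

def closed_form(m, p, q1, q2):
    # pure + one-type contaminated + two-type contaminated windows
    return p**m + m*(q1 + q2)*p**(m-1) + m*(m-1)*q1*q2*p**(m-2)

m, p, q1, q2 = 6, 0.7, 0.2, 0.1
print(abs(window_prob(m, p, q1, q2) - closed_form(m, p, q1, q2)))
```

The enumeration is exponential in m, so it only serves as a sanity check for small window lengths.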
In what follows, we shall use the notation
where
We also need the notation
where \(C=\ln \frac{1}{p}\) and \(\ln \) denotes the natural logarithm. Let
where \(\log \) denotes the logarithm to base 1/p. Let [m(N)] denote the integer part of m(N) and let \(\{m(N)\}= m(N)-[m(N)]\) denote its fractional part. Introduce the function
Theorem 2.1
Let \(p>0\), \(q_1>0\), \(q_2 > 0\) be fixed with \(p+ q_1+q_2 = 1\). Let \(\mu (N)\) be the length of the longest at most \(1+1\) contaminated run in \(X_1,X_2,\ldots ,X_N\). Then, for any integer k,
where \(f(N) = \textrm{O}( g(N))\) means that \(f(N)/g(N)\) is bounded as \(N\rightarrow \infty \).
Analysing the beginning of the proof of Theorem 2.1, we can see that the lemma of Csáki et al. [6] offers a good approximation if p is small, but a worse one if p is close to 1. However, our simulation studies show that the approximation for the longest run is very good for small values of p and remains adequate even when p is close to 1.
Example 2.2
We performed several computer simulation studies for certain fixed values of p, \(q_1\), and \(q_2\). Below N denotes the length of the sequences generated by us and s denotes the number of repetitions of the N-length sequences.
Figures 1, 2, 3 and 4 present the results of the simulations. Each figure shows the empirical distribution of the longest at most \(1+1\) contaminated run and its approximation suggested by Theorem 2.1. An asterisk (\(*\)) denotes the result of the simulation, i.e., the empirical distribution function of the longest at most \(1+1\) contaminated run, while a circle (\(\circ \)) denotes the approximation offered by Theorem 2.1.
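A simulation of this kind can be sketched as follows; the function names and parameter values below are ours, not taken from the paper. The length of the longest at most \(1+1\) contaminated run in a sequence is found with a sliding window that keeps at most one \(+1\) and at most one \(-1\):

```python
import random
from collections import Counter

def longest_contaminated_run(xs):
    """Longest window of xs with at most one +1 and at most one -1."""
    best = left = plus = minus = 0
    for right, x in enumerate(xs):
        if x == 1:
            plus += 1
        elif x == -1:
            minus += 1
        while plus > 1 or minus > 1:   # shrink until the window is valid again
            if xs[left] == 1:
                plus -= 1
            elif xs[left] == -1:
                minus -= 1
            left += 1
        best = max(best, right - left + 1)
    return best

def simulate(N, s, p, q1, q2, seed=0):
    """Empirical distribution of mu(N) over s repetitions."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(s):
        xs = rng.choices([0, 1, -1], weights=[p, q1, q2], k=N)
        counts[longest_contaminated_run(xs)] += 1
    return counts

print(longest_contaminated_run([0, 1, 0, -1, 0, 1, 0, 0]))  # -> 6
```

The two-pointer scan works because the property "at most one \(+1\) and at most one \(-1\)" is preserved when a window is shrunk, so each sequence is processed in linear time.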
3 Proofs
The following lemma of Csáki, Földes and Komlós plays a fundamental role in our proofs.
Lemma 3.1
(Main lemma, stationary case, finite form of Csáki et al. [6]) Let \(X_1,X_2,\dots \) be any sequence of independent random variables, and let \({\mathcal {F}}_{n,m}\) be the \(\sigma \)-algebra generated by the random variables \(X_n,X_{n+1},\dots ,X_{n+m-1}\). Let m be fixed and let \(A_n=A_{n,m}\in {\mathcal {F}}_{n,m}\). Assume that the sequence of events \(A_n = A_{n,m}\) is stationary, that is, \(P(A_{i_1+d} A_{i_2+d} \cdots A_{i_k+d})\) is independent of d.
Assume that there is a fixed number \(\alpha \), \(0<\alpha \le 1\), such that the following three conditions hold for some fixed k with \(2 \le k \le m\) and fixed \(\varepsilon \) with \(0< \varepsilon < \min \{ p/10, 1/42\}\):
- (SI)
  $$\begin{aligned} |P(\bar{A_2} \cdots \bar{A_k}|A_1)-\alpha |<\varepsilon , \end{aligned}$$
- (SII)
  $$\begin{aligned} \sum _{k+1 \le i \le 2m} P(A_i |A_1) <\varepsilon , \end{aligned}$$
- (SIII)
  $$\begin{aligned} P(A_1) < \varepsilon /m. \end{aligned}$$
Then, for all \(N > 1\),
and
Before proceeding with the proof, we shall consider the fulfilment of the conditions given in the Main Lemma for the case \(k=m\) (for fixed m), for \(0<p<1\) and for some \(\varepsilon >0\).
Remark 3.2
First, we consider condition (SIII) and show that it is true for any large enough m. We have
This inequality is true for any positive \(\varepsilon \) if m is large enough.
If \(m \approx \log N\), then \(p^m \approx p^{\log N} = \frac{1}{N}\) and then \(\varepsilon \approx \frac{(\log N)^3}{N}\).
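A quick numeric illustration of this order of magnitude, under the hypothetical choice \(p=1/2\) and \(N=2^{20}\) (so that \(\log N\) to base 1/p is an integer):

```python
import math

p = 0.5
N = 2**20                        # power of 2, so log N (base 1/p) is an integer
m = round(math.log(N, 1/p))      # m = log N = 20
lhs = m**3 * p**m                # order of epsilon: m^3 * p^m
rhs = math.log(N, 1/p)**3 / N    # (log N)^3 / N
print(lhs, rhs)                  # the two quantities agree
```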
Remark 3.3
Now, consider condition (SII). If \(i>m\), then \(A_i\) and \(A_1\) are independent, therefore
which gives precisely the previous assumption in Remark 3.2.
Lemma 3.4
Condition (SI) is satisfied for \(k=m\) in the following form:
where \(\alpha \) is given by (2.1) and \(\varepsilon = \textrm{O}(p^m)\).
Proof
To begin, we shall be required to divide the event \(A_1\) into the following pairwise disjoint parts:
where \(A_1^{(0)}\) is the event that \(X_1,X_2, \ldots , X_m\) is a pure run,
\(A_1^{(+)}(i)\) denotes that \(X_i = +1\), while the rest are zeros,
\(A_1^{(-)}(i)\) denotes that \(X_i = -1\), while the rest are zeros,
finally, \(A_1^{(2)}(i,j)\) denotes that \(X_i = +1\), \(X_j = -1\), while the rest are zeros.
Then
Here, we can obtain the formula for \(\sum _{i=1}^m Y_i^{(-)}\) by interchanging the roles of \(q_1\) and \(q_2\) in the corresponding formula for \(\sum _{i=1}^m Y_i^{(+)}\). Similarly, we can obtain \(\sum _{i>j}^m Y_{i,j}\) by interchanging the roles of \(q_1\) and \(q_2\) in the corresponding formula for \(\sum _{i<j}^m Y_{i,j}\).
1.
\(Y^{(0)} = 0\), since the event is impossible.
2.
Now, calculate \( Y_i^{(+)} = P(A_1^{(+)}(i) \bar{A_2} \cdots \bar{A}_m) \).
We want to evaluate probabilities corresponding to different values of i as follows.
(a)
If \(i=1\), then the event is impossible. So \(Y_1^{(+)} = 0\).
(b)
Let \(1<i<m\). Then the element in position \(m+1\) must be \(+1\). Furthermore, in positions \(m+2,\ldots ,m+i\), it is not possible that all elements are zeros, and also not possible that there is a \(-1\) while the rest of the elements are zeros. So, for this case,
$$\begin{aligned} Y_i^{(+)} = q_1 p^{m-1} q_1 \left( 1 - p^{i-1} - (i-1) q_2 p^{i-2} \right) , \quad \text {if} \quad 1< i<m. \end{aligned}$$ (3.5)
(c)
If \(i=m\), then \(X_{m+1}\) should be \(+1\) and the remaining elements are arbitrary. So
$$\begin{aligned} Y_m^{(+)} = q_1 p^{m-1} q_1. \end{aligned}$$(3.6)
3.
Now, let us turn to \(Y_{i,j}\); first we consider the case when \(i<j\).
(a)
If \(i=1\) and \(j = m\), then \(X_{m+1}\) should be \(-1\) and the remaining elements are arbitrary. So
$$\begin{aligned} Y_{1,m} = q_1 q_2 p^{m-2} q_2. \end{aligned}$$ (3.7)
(b)
Now, let \(i=1\) and \(j<m\). Then \(X_{m+1}\) should be \(-1\). Moreover, it is not possible that all elements in positions \(m+2, \ldots , m+j\) are zeros; nor is it possible that one of them is \(+1\) and the rest are zeros. So
$$\begin{aligned} Y_{1,j} = q_1 q_2 p^{m-2} q_2 \left( 1 - p^{j-1} - p^{j-2}(j-1) q_1 \right) \quad \text {if} \quad 1<j<m. \end{aligned}$$ (3.8)
(c)
Now, let \(i>1\) and \(j=m\). Then \(X_{m+1}\) can be either \(+1\) or \(-1\). If \(X_{m+1}\) is \(-1\), then the remaining elements are arbitrary. However, if \(X_{m+1}\) is \(+1\), then in positions \(m+2, \ldots , m+i\) there should be at least one nonzero element. So
$$\begin{aligned} Y_{i,m} = q_1 q_2 p^{m-2} \left( q_1 \left( 1 - p^{i-1}\right) + q_2 \right) \quad \text {if} \quad i>1,\quad j=m. \end{aligned}$$ (3.9)
(d)
Consider now the case \(i>1\) and \(j<m\). We divide this event into two parts.
First, let \(X_{m+1}= +1\). Then it is not possible that the elements in positions \(m+2, \ldots , m+i\) are all zeros. It is also impossible that there is one \(-1\) among positions \(m+2, \ldots , m+i\) while all of \(m+i+1, \ldots , m+j\) are zeros. So this part of \(Y_{i,j}\) is
$$\begin{aligned} q_1 q_2 p^{m-2} q_1 \left( 1 - p^{i-1} - (i-1) p^{i-2} q_2 p^{j-i}\right) \quad \text {if} \quad i>1,\quad j<m. \end{aligned}$$ (3.10)
Finally, let \(X_{m+1}= -1\). Then it is not possible that all elements in positions \(m+2, \ldots , m+j\) are zeros, nor is it possible that among \(m+2, \ldots , m+j\) there is one \(+1\) and the rest are zeros. So this second part of \(Y_{i,j}\) is
$$\begin{aligned} q_1 q_2 p^{m-2} q_2 \left( 1 - p^{j-1} - (j-1) q_1 p^{j-2} \right) \quad \text {if} \quad i>1,\quad j<m. \end{aligned}$$(3.11)
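The case analysis above can be double-checked numerically. The sketch below (our own helper, with hypothetical parameter values) computes \(Y_i^{(+)} = P(A_1^{(+)}(i) \bar{A_2} \cdots \bar{A}_m)\) by enumerating \(X_{m+1},\ldots ,X_{2m-1}\) and compares it with formulas (3.5) and (3.6):

```python
from itertools import product

def brute_Y_plus(m, i, p, q1, q2):
    """P(X_i = +1, the rest of window 1 is zero, and none of the
    windows 2..m is at most 1+1 contaminated), by enumeration."""
    weight = {0: p, 1: q1, -1: q2}
    window1 = [1 if k == i else 0 for k in range(1, m + 1)]
    total = 0.0
    for tail in product((0, 1, -1), repeat=m - 1):      # X_{m+1},...,X_{2m-1}
        xs = window1 + list(tail)
        # window n (n = 2..m) fails iff it has >= two +1's or >= two -1's
        if all(xs[n:n+m].count(1) > 1 or xs[n:n+m].count(-1) > 1
               for n in range(1, m)):
            w = q1 * p**(m - 1)
            for x in tail:
                w *= weight[x]
            total += w
    return total

m, p, q1, q2 = 4, 0.6, 0.25, 0.15
for i in range(2, m):                                   # formula (3.5), 1 < i < m
    f35 = q1 * p**(m-1) * q1 * (1 - p**(i-1) - (i-1)*q2*p**(i-2))
    assert abs(brute_Y_plus(m, i, p, q1, q2) - f35) < 1e-12
assert abs(brute_Y_plus(m, m, p, q1, q2) - q1*p**(m-1)*q1) < 1e-12  # (3.6)
print("formulas (3.5) and (3.6) match the brute-force probabilities")
```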
Summing (3.5) and (3.6) over i, we get \(\sum _{i=1}^m Y_i^{(+)}\), and then by interchanging the roles of \(q_1\) and \(q_2\) we obtain \(\sum _{i=1}^m Y_i^{(-)}\) as follows:
Here we applied
which can be obtained by differentiating the known formula for the sum of a geometric sequence. Similarly, summing (3.7), (3.8), (3.9), (3.10) and (3.11), together with the corresponding versions obtained by interchanging the roles of \(q_1\) and \(q_2\), we obtain
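For completeness, the differentiation step mentioned above is the standard identity: differentiating the finite geometric sum \(\sum _{i=0}^{m} x^i = \frac{1-x^{m+1}}{1-x}\) with respect to x gives, for \(x \ne 1\),
$$\begin{aligned} \sum _{i=1}^{m} i x^{i-1} = \frac{1-(m+1)x^{m} + m x^{m+1}}{(1-x)^2}, \end{aligned}$$
which is applied here with \(x=p\).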
So
Finally,
Therefore, combining (3.12) and (3.13) and by some simplification, we obtain
So,
Therefore, we have obtained the statement of Lemma 3.4:
where \(\varepsilon = \textrm{O}(p^m)\) and \(\alpha \) is given by (2.1). \(\square \)
Proof of Theorem 2.1
Let \(N_1=N-m+1\), where m will be specified so that \(m\sim \log N\). Then
As \(mP(A_1) \sim m^3 p^m \sim \frac{(\log N)^3}{N}\), so \(e^{\pm 2\,m P(A_1)} = 1 + \textrm{O}\left( \frac{(\log N)^3}{N} \right) \). Similarly, as \(\varepsilon = \textrm{O}\left( p^m\right) \) and \(m \approx \log N\),
Therefore, we can calculate
So we have to calculate
where
Our aim is to find m(N) so that the asymptotic behaviour of \(P(\mu (N)-[m(N)]<k)\) can be obtained. Then
Let \( m= m(N)+k-\left\{ m(N) \right\} \). So
We want to find m(N) so that the remainder term l in the exponent is small. We shall do this step by step, using several Taylor expansions. We try to find m(N) in the form \(\log N+A\), where A will be specified later. So \(m= \log N+A+k-\{ m(N)\}\).
Then, using the Taylor expansion \(\log (x_0+y)= \log x_0 + \frac{y}{C x_0}-\frac{1}{2C}\frac{y^2}{x_0^2} + \frac{1}{3C}\frac{y^3}{\tilde{x}^3}\), where \(\tilde{x}\) is between \(x_0\) and \(x_0+y\) and \(x_0>0\), \(x_0+y>0\), we obtain
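As a numeric sanity check of this expansion (with hypothetical values of p, \(x_0\) and y; \(\log \) is taken to base 1/p as above):

```python
import math

p = 0.3
C = math.log(1/p)                  # C = ln(1/p), so log base 1/p of x is ln(x)/C
x0, y = 50.0, 2.0
exact = math.log(x0 + y) / C       # log(x0 + y) to base 1/p
approx = math.log(x0)/C + y/(C*x0) - y**2/(2*C*x0**2)
print(abs(exact - approx))         # remainder of order y^3/(3*C*x0^3)
```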
Now since \(m = \log N + A +k- \left\{ m(N)\right\} \), we have
Again applying the above Taylor expansion, we infer
Now, we let \(A = 2 \log \log N + B\). Then
Let \(B = \frac{4 \log \log N}{C \log N} + D\). Then
We shall use the Taylor expansion \( \frac{1}{x_0+x}= \frac{1}{x_0} - \frac{x}{x_0^2} + \frac{x^2}{x_0^3} - \frac{x^3}{\tilde{x}^4} \), where \(\tilde{x}\) is between \(x_0\) and \(x_0+x\) and where \(x_0>0\), \(x_0+x >0\). Since \(m = \log N + A + k - \left\{ m(N)\right\} \),
and
Now let
Then
So we have obtained
Now let
To collect those terms which contain \( k- \{m(N)\}\), we use the function H(x) introduced in (2.2). Inserting these expressions, we obtain
So choosing
we have
So we obtain
\(\square \)
4 Discussion
In this paper we studied experiments where two types of failures may occur. Repeating the experiment several times, we considered those runs which contain at most one failure of type I and at most one failure of type II. We were able to find a good approximation for the distribution of the length of the longest such run.
References
P. Erdős, A. Rényi, On a new law of large numbers. J. Anal. Math. 23, 103–111 (1970)
P. Erdős, P. Révész, On the length of the longest head-run, in Topics in Information Theory. Colloq. Math. Soc. János Bolyai, vol. 16, (North-Holland, Amsterdam, 1975), pp.219–228
A. Földes, The limit distribution of the length of the longest head-run. Period. Math. Hung. 10(4), 301–310 (1979)
L. Gordon, M.F. Schilling, M.S. Waterman, An extreme value theory for long head runs. Probab. Theory Relat. Fields 72(2), 279–287 (1986)
I. Fazekas, M. Suja, Limit theorems for contaminated runs of heads. Ann. Univ. Sci. Budapest Sect. Comp. 52, 131–146 (2021)
E. Csáki, A. Földes, J. Komlós, Limit theorems for Erdős–Rényi type problems. Studia Sci. Math. Hungar. 22, 321–332 (1987)
I. Fazekas, B. Fazekas, M.O. Suja, Convergence rate for the longest T-contaminated runs of heads (paper with detailed proofs) (2023). arXiv:2302.06657
S. Eryilmaz, M. Gong, M. Xie, Generalized sooner waiting time problems in a sequence of trinary trials. Stat. Prob. Lett. 115, 70–78 (2016)
M. Koutras, V. Alexandrou, Sooner waiting time problems in a sequence of trinary trials. J. Appl. Probab. 34(3), 593–609 (1997)
I. Fazekas, B. Fazekas, M.O. Suja, Limit theorems for runs containing two types of contaminations (Paper with detailed proofs) (2023). arXiv:2309.11602
Acknowledgements
The authors would like to thank the referee for a careful reading of the paper and for valuable suggestions.
Funding
Open access funding provided by University of Debrecen.
Cite this article
Fazekas, I., Fazekas, B. & Suja, M.O. A limit theorem for runs containing two types of contaminations. Period Math Hung (2024). https://doi.org/10.1007/s10998-024-00600-6