Abstract
In a sequence of independent identically distributed geometric random variables, the sum of the first two record values is distributed as a simple linear combination of geometric variables. It is verified that this distributional property characterizes the geometric distribution. A related characterization conjecture is also discussed. Related discussion in the context of weak records is also provided.
1 Introduction
Consider a sequence of independent identically distributed (i.i.d.) positive integer valued random variables \(\{X_{n}\}_{n=1}^{\infty }\). Denote the corresponding sequence of upper records by \(\{X^{(n)}\}_{n=1}^{\infty }\). Specifically, the first random variable in the sequence is identified as the first record, and the second record is the first subsequent \(X_{n}\) which exceeds \(X_{1}\). It is well known that the record value sequence corresponding to a sequence of geometric random variables has a simple distributional structure. If we define the record spacings sequence \(\{S_{n}\}_{n=1}^{\infty } \) by \(S_{1}=X_{1}=X^{(1)}\) and, for n > 1, \(S_{n}=X^{(n)}-X^{(n-1)}\), then in the geometric case these spacings are independent random variables. Geometric characterizations based on the independence of the record spacings are well known. In the present paper we will consider a simple relationship between the distribution of the first two records and the distribution of the first two \(X_{n}\)'s. Two related conjectured characterizations are described. In addition, parallel results are discussed in the case of weak records.
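To make the definitions of records and record spacings concrete, here is a minimal Python sketch that extracts them from a finite list of observations. The function names `upper_records` and `record_spacings` are illustrative choices, not notation from the paper.

```python
def upper_records(xs):
    """Return the upper records of xs: the first value, then each value
    that strictly exceeds the current record."""
    records = []
    for x in xs:
        if not records or x > records[-1]:
            records.append(x)
    return records


def record_spacings(records):
    """Return S_1 = X^(1) and S_n = X^(n) - X^(n-1) for n > 1."""
    return [r - prev for prev, r in zip([0] + records[:-1], records)]


observations = [2, 1, 2, 5, 3, 5, 9]
records = upper_records(observations)
spacings = record_spacings(records)
```

Ties do not count as upper records here, which matches the requirement that a new record exceed the previous one.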
2 The Conjectured Characterizations
Consider a sequence of i.i.d. positive integer valued random variables \( \{X_{n}\}_{n=1}^{\infty }\) with corresponding upper record sequence \(\{X^{(n)}\}_{n=1}^{\infty }\). If the \(X_{i}\)'s have a common geometric distribution, then because the record spacings are themselves geometrically distributed with homogeneous success probabilities, it follows that
\[X^{(1)}+X^{(2)} \overset {d}{=}X_{1}+2X_{2}. \tag{2.1}\]
After formulating this unusual relationship between the two sequences, \( \{X_{n}\}_{n=1}^{\infty }\) and \(\{X^{(n)}\}_{n=1}^{\infty }\), it becomes plausible that this is a characteristic property of the geometric distribution. Two conjectures were considered.
Conjecture 1.
Suppose that \(X^{(1)}+X^{(2)} \overset {d}{=}X_{1}+2X_{2}\), then \(p_{x}=P(X=x)=p(1-p)^{x-1}\), for each x = 1, 2, ... for some p ∈ (0, 1).
Conjecture 2.
Suppose that, for some positive integer m > 2, \({\sum }_{i=1}^{m} X^{(i)} \overset {d}{=}{\sum }_{i=1}^{m} iX_{i}\), then \(p_{x}=P(X=x)=p(1-p)^{x-1}\), for each x = 1, 2, ... for some p ∈ (0, 1).
Both conjectures are judged to be plausible. Conjecture 2 would appear to be more difficult to resolve. In the next section we will provide a proof of Conjecture 1 under no regularity conditions. A proof of Conjecture 2 remains elusive.
3 Proof of Conjecture 1
Throughout this section we will employ the usual convention, when convenient, of denoting \(1-p\) by \(q\) and denoting \(1-p_{x}\) by \(q_{x}\).
Theorem 1.
If \( \{X_{n}\}_{n=1}^{\infty }\) are i.i.d. positive integer valued random variables with common discrete density function \(f(x)=p_{x}\), x = 1, 2, ..., where \(p_{x}>0\) ∀x so that a record value sequence is well-defined, and if \(X^{(1)}+X^{(2)} \overset {d}{=}X_{1}+2X_{2}\), then \(p_{x}=p(1-p)^{x-1}\), x = 1, 2, ... for some p ∈ (0, 1).
Proof.
First note that the set of possible values of \(X_{1}+2X_{2}\) and of \(X^{(1)}+X^{(2)}\) is the set {3, 4, 5, ...}.
Necessity
It is well-known that if the \(X_{i}\)'s are i.i.d. with a common geometric(p) distribution, then the record spacings \(X^{(m)}-X^{(m-1)}\) are also i.i.d. with a common geometric(p) distribution. Since we can write \(X^{(1)}+X^{(2)}=(X^{(2)}-X^{(1)})+2X^{(1)}\), the result follows.
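The necessity claim can be checked exactly for a specific p. The sketch below uses rational arithmetic and assumes the standard joint law of the first two records, \(P(X^{(1)}=i, X^{(2)}=j)=p_{i}p_{j}/P(X>i)\) for j > i, which follows because the second record is the first later observation exceeding i. The helper names are hypothetical.

```python
from fractions import Fraction


def geometric_pmf(p, n_max):
    """Return {x: p(1-p)^(x-1)} for x = 1, ..., n_max."""
    return {x: p * (1 - p) ** (x - 1) for x in range(1, n_max + 1)}


def prob_linear_combination(pmf, y):
    """P(X_1 + 2 X_2 = y) for i.i.d. X_1, X_2 with the given pmf."""
    return sum(pmf[y - 2 * j] * pmf[j] for j in range(1, (y - 1) // 2 + 1))


def prob_record_sum(pmf, y):
    """P(X^(1) + X^(2) = y), summing p_i * p_(y-i) / P(X > i) over i < y - i."""
    total = Fraction(0)
    for i in range(1, (y - 1) // 2 + 1):
        tail = 1 - sum(pmf[x] for x in range(1, i + 1))  # P(X > i)
        total += pmf[i] * pmf[y - i] / tail
    return total


p = Fraction(1, 3)
pmf = geometric_pmf(p, 30)
agree = all(
    prob_linear_combination(pmf, y) == prob_record_sum(pmf, y)
    for y in range(3, 25)
)
```

Because `Fraction` arithmetic is exact, `agree` compares the two probabilities without rounding error. The check covers only the values of y and the single p chosen here.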
Sufficiency
As in the statement of the theorem we have \(P(X=x)=p_{x}\), x = 1, 2, ....
Assuming that \(X^{(1)}+X^{(2)} \overset {d}{=}X_{1}+2X_{2}\), we wish to prove that \(p_{x}=pq^{x-1}\), x = 1, 2, .... First note that
\[P(X_{1}+2X_{2}=3)=P(X_{1}=1,X_{2}=1)=p_{1}^{2},\]
while
\[P(X^{(1)}+X^{(2)}=3)=P(X^{(1)}=1,X^{(2)}=2)=\frac{p_{1}p_{2}}{q_{1}}.\]
Equating these expressions we may conclude that \(p_{2}=p_{1}q_{1}\). For simplicity of notation we will denote \(p_{1}\) by p. Thus far we have shown that \(p_{1}=p=pq^{1-1}\) and \(p_{2}=pq=pq^{2-1}\). We now argue inductively. Suppose that for some positive even integer 2k, we have \(p_{j}=pq^{j-1}\) for every j ≤ 2k. We claim that in such a case, because of Eq. 2.1, we will also have \(p_{2k+1}=pq^{2k+1-1}\). To see this, consider
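\[P(X_{1}+2X_{2}=2k+2)=\sum_{j=1}^{k}p_{2k+2-2j}\,p_{j}=p^{2}\sum_{j=1}^{k}q^{2k-j}\]
and
\[P(X^{(1)}+X^{(2)}=2k+2)=\sum_{i=1}^{k}\frac{p_{i}\,p_{2k+2-i}}{P(X>i)}=\frac{p\,p_{2k+1}}{q}+p^{2}\sum_{i=2}^{k}q^{2k-i}.\]
Since (2.1) holds, we may conclude that
\[\frac{p\,p_{2k+1}}{q}=p^{2}q^{2k-1},\]
which implies that \(p_{2k+1}=pq^{2k}=pq^{(2k+1)-1}\), as claimed.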
A similar argument will show that if for some positive odd integer 2k − 1, we have \(p_{j}=pq^{j-1}\) for every j ≤ 2k − 1, then because of Eq. 2.1, we will also have \(p_{2k}=pq^{2k-1}\). For this, it is necessary to equate \(P(X_{1}+2X_{2}=2k+1)\) and \(P(X^{(1)}+X^{(2)}=2k+1)\).
It then follows by induction that \(p_{x}=pq^{x-1}\) for every x = 1, 2, ..., i.e. that X has a geometric(p) distribution. □
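The inductive argument can be mirrored numerically. The following sketch, under the assumption that \(P(X^{(1)}=i, X^{(2)}=j)=p_{i}p_{j}/P(X>i)\) for j > i, starts from a given \(p_{1}\) and solves the equality \(P(X_{1}+2X_{2}=y)=P(X^{(1)}+X^{(2)}=y)\) for \(p_{y-1}\), y = 3, 4, .... The function name `solve_pmf_from_identity` is hypothetical.

```python
from fractions import Fraction


def solve_pmf_from_identity(p1, n_max):
    """Solve P(X_1 + 2 X_2 = y) = P(X^(1) + X^(2) = y) for p_(y-1),
    y = 3, 4, ..., starting from p_1. Returns [p_1, ..., p_(n_max)]."""
    pmf = {1: p1}
    tail = {1: 1 - p1}  # tail[i] = P(X > i)
    for y in range(3, n_max + 2):
        # P(X_1 + 2 X_2 = y) only involves p_1, ..., p_(y-2).
        lhs = sum(pmf[y - 2 * j] * pmf[j] for j in range(1, (y - 1) // 2 + 1))
        # Record terms with first record i >= 2 also involve known values only.
        known = sum(
            pmf[i] * pmf[y - i] / tail[i] for i in range(2, (y - 1) // 2 + 1)
        )
        # The i = 1 term is p_1 * p_(y-1) / P(X > 1); solve it for p_(y-1).
        pmf[y - 1] = (lhs - known) * tail[1] / p1
        tail[y - 1] = tail[y - 2] - pmf[y - 1]
    return [pmf[x] for x in range(1, n_max + 1)]


p = Fraction(2, 5)
solved = solve_pmf_from_identity(p, 12)
geometric = [p * (1 - p) ** (x - 1) for x in range(1, 13)]
```

Each equality determines exactly one new probability, which is why \(p_{1}\) alone fixes the whole sequence in the case m = 2.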
4 Discussion Regarding Conjecture 2
The proof of Theorem 1 was less transparent than was expected. Although Conjecture 2 is eminently plausible, the book-keeping necessary to prove the result appears to be daunting and the conjecture remains open. However, if we consider the case in which m = 3, we may argue that the conjecture appears to be unlikely to be true based on the following observations.
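The possible values of \(X_{1}+2X_{2}+3X_{3}\) and of \(X^{(1)}+X^{(2)}+X^{(3)}\) are {6, 7, 8, ...}. If we assume that \(P(X_{1}+2X_{2}+3X_{3}=6)=P(X^{(1)}+X^{(2)}+X^{(3)}=6)\), this implies that
\[p_{1}^{3}=p_{1}\cdot \frac{p_{2}}{1-p_{1}}\cdot \frac{p_{3}}{1-p_{1}-p_{2}},\]
from which we obtain
\[p_{3}=\frac{p_{1}^{2}(1-p_{1})(1-p_{1}-p_{2})}{p_{2}}.\]
Thus \(p_{1}\) and \(p_{2}\) appear to be unconstrained, except that their sum must be less than 1.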
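As a small numerical illustration of this first step for m = 3, the sketch below computes the \(p_{3}\) implied by equality at the value 6. It assumes that the left side equals \(p_{1}^{3}\) (all three observations equal 1) and that the right side equals \(p_{1}\cdot p_{2}/(1-p_{1})\cdot p_{3}/(1-p_{1}-p_{2})\) (records 1, 2, 3). The function name `third_probability` is hypothetical.

```python
from fractions import Fraction


def third_probability(p1, p2):
    """p_3 implied by P(X_1 + 2X_2 + 3X_3 = 6) = P(X^(1) + X^(2) + X^(3) = 6).

    Left side: p_1^3. Right side: p_1 * p_2/(1 - p_1) * p_3/(1 - p_1 - p_2).
    """
    return p1 ** 2 * (1 - p1) * (1 - p1 - p2) / p2


p1 = Fraction(1, 2)
geometric_choice = third_probability(p1, p1 * (1 - p1))  # p_2 = p_1 (1 - p_1)
other_choice = third_probability(p1, Fraction(1, 3))     # a non-geometric p_2
```

With \(p_{1}=1/2\), the geometric choice \(p_{2}=1/4\) returns \(p_{3}=1/8=p_{1}(1-p_{1})^{2}\), while \(p_{2}=1/3\) returns \(p_{3}=1/16\). A single admissible value of \(p_{3}\) does not show that the remaining probabilities stay non-negative and sum to 1.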
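If we consider other possible values, i.e., consider equalities of the form
\[P(X_{1}+2X_{2}+3X_{3}=y+3)=P(X^{(1)}+X^{(2)}+X^{(3)}=y+3),\quad y=4,5,\ldots ,\]
then each new value of y will result in an expression for \(p_{y}\) in terms of \(p_{1},p_{2},\ldots ,p_{y-1}\). However, no obvious constraints on \(p_{1}\) or \(p_{2}\) appear to arise.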
Of course, if \(p_{2}=p_{1}(1-p_{1})\) then subsequent \(p_{j}\)'s appear to be of the geometric form (i.e., \(p_{j}=p_{1}(1-p_{1})^{j-1}\)). However, other choices for \(p_{2}\) would seem to lead to non-geometric solutions.
Cases in which m > 3 exhibit similar problems and, in fact, would appear to admit an even wider variety of non-geometric solutions. It appears that only in the case m = 2 is a characterization possible.
Remark 1.
We have carefully avoided stating that non-geometric solutions will exist in cases in which m > 2, because we have been unable to determine explicitly and completely a convergent non-geometric sequence that satisfies the condition \({\sum }_{i=1}^{m}X^{(i)} \overset {d}{=} {\sum }_{i=1}^{m} iX_{i}.\)
5 An Analogous Weak Record Result
When we turn to investigate record phenomena for sequences of i.i.d. non-negative integer valued random variables, the concept of weak records plays the role usually played by records. An observation in the sequence \(\{X_{i}\}_{i=1}^{\infty }\) is a weak record if it exceeds or equals all the preceding \(X_{i}\)'s in the sequence. In this setting geometric random variables with possible values {0, 1, 2, ...} play a role analogous to that played by positive geometric variables in record value discussions. In this section we will add asterisks to non-negative integer random variables and corresponding weak records to distinguish them from the positive random variables and ordinary records discussed in the previous sections.
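The only change from ordinary records is that ties count. A minimal Python sketch of the definition follows; the function name `weak_records` is an illustrative choice.

```python
def weak_records(xs):
    """Return the weak records of xs: the first value, then each value
    that is greater than or equal to the current weak record."""
    records = []
    for x in xs:
        if not records or x >= records[-1]:
            records.append(x)
    return records


observations = [0, 2, 1, 2, 2, 4]
weak = weak_records(observations)
```

Since the current weak record is the maximum of all earlier observations, comparing with it is equivalent to comparing with every preceding value.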
We thus will consider a sequence \(\{X^{*}_{i}\}_{i=1}^{\infty }\) of non-negative integer valued random variables with a corresponding weak record sequence denoted by \(\{X^{*(i)}\}_{i=1}^{\infty }\) (an introduction to weak records can be found in Arnold et al. (1998)). We will say that a non-negative integer valued random variable \(X^{*}\) has a geometric\(^{*}\) distribution if its discrete density is of the form \(P(X^{*}=k)=p(1-p)^{k}\), k = 0, 1, 2, ..., and we write \(X^{*} \sim geo^{*}(p)\). Parallel to the result for positive geometric variables, it is well-known that the weak record spacings corresponding to geometric\(^{*}\)(p) variables are themselves i.i.d. with a common geometric\(^{*}\)(p) distribution. It is consequently plausible that the following result, analogous to Theorem 1, might be true (this was suggested by a referee). The proof is a close parallel to the proof for ordinary (i.e., positive) geometric variables.
Theorem 2.
If \( \{X^{*}_{n}\}_{n=1}^{\infty }\) are i.i.d. non-negative integer valued random variables with common discrete density function \(f(x)=p_{x}\), x = 0, 1, 2, ..., where \(p_{x}>0\) ∀x so that a weak record value sequence is well-defined, and if \(X^{*(1)}+X^{*(2)} \overset {d}{=}X^{*}_{1}+2X^{*}_{2}\), then \(p_{x}=p(1-p)^{x}\), x = 0, 1, 2, ... for some p ∈ (0, 1).
Proof.
First note that the set of possible values of \(X^{*}_{1}+2X^{*}_{2}\) and of \(X^{*(1)}+X^{*(2)}\) is the set {0, 1, 2, ...}.
Necessity
We use the fact that if the \(X^{*}_{i}\)’s are i.i.d. with a common geometric\(^{*}\)(p) distribution, then the record spacings \(X^{*(m)}-X^{*(m-1)}\) are also i.i.d. with a common geometric\(^{*}\)(p) distribution. Since we can write \(X^{*(1)}+X^{*(2)}=(X^{*(2)}-X^{*(1)})+2X^{*(1)}\), the result follows.
Sufficiency
As in the statement of the theorem we have \(P(X^{*}=x)=p_{x}\), x = 0, 1, 2, .... However, it will be convenient to denote \(p_{0}\) by p ∈ (0, 1).
For convenience we define \(V=X_{1}^{*}+2X_{2}^{*}\) and \(W=X^{*(1)}+X^{*(2)}\). Under the assumption that \(V\overset {d}{=}W\) we wish to prove that \(p_{k}=p(1-p)^{k}\), k = 0, 1, 2, ..., where \(p=P(X^{*}_{1}=0)\). Elementary computations yield the following expressions for the discrete densities of V and W, in which we use the notation \(q_{j}=P(X_{1}^{*} \geq j)\).
Since \(V\overset {d}{=}W\), we can equate (5.1) and (5.2) when k = 2 and conclude that \(p_{1}=p(1-p)\). Next consider an arbitrary k > 2 and assume that, for j < k − 1, it has been verified that \(p_{j}=p(1-p)^{j}\) and \(q_{j}=(1-p)^{j}\). Then by equating (5.1) and (5.2), if k is odd, or by equating (5.3) and (5.4), if k is even, we may conclude that \(p_{k-1}=p(1-p)^{k-1}\). We may thus, by induction, conclude that \(P(X^{*}_{1}=k)=p_{k}=p(1-p)^{k}, \ \ k=0,1,2,\ldots \), i.e., that \(X^{*}_{1} \sim geo^{*}(p)\). □
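The weak record induction can also be traced numerically. The sketch below does not reproduce the paper's displayed expressions for the densities of V and W. It assumes instead the convolution \(P(V=k)=\sum_{j}p_{k-2j}p_{j}\) and the weak record joint law \(P(X^{*(1)}=i, X^{*(2)}=j)=p_{i}p_{j}/q_{i}\) for j ≥ i, both of which follow from the definitions. The function name `solve_weak_pmf` is hypothetical.

```python
from fractions import Fraction


def solve_weak_pmf(p0, n_max):
    """Solve P(V = k) = P(W = k), k = 2, 3, ..., for p_(k-1), given p_0.
    Returns [p_0, ..., p_(n_max)]."""
    pmf = {0: p0, 1: p0 * (1 - p0)}  # k = 2 gives p_1 = p_0 (1 - p_0)
    q = {0: Fraction(1), 1: 1 - p0}  # q[i] = P(X* >= i)
    q[2] = q[1] - pmf[1]
    for k in range(3, n_max + 2):
        # Both sides contain the term p_0 * p_k, which cancels.
        v_rest = sum(pmf[k - 2 * j] * pmf[j] for j in range(1, k // 2 + 1))
        w_known = sum(pmf[i] * pmf[k - i] / q[i] for i in range(2, k // 2 + 1))
        # The i = 1 term of P(W = k) is p_1 * p_(k-1) / q_1; solve for p_(k-1).
        pmf[k - 1] = (v_rest - w_known) * q[1] / pmf[1]
        q[k] = q[k - 1] - pmf[k - 1]
    return [pmf[x] for x in range(0, n_max + 1)]


p = Fraction(1, 4)
solved = solve_weak_pmf(p, 10)
geometric_star = [p * (1 - p) ** k for k in range(11)]
```

The cancellation of the \(p_{0}p_{k}\) terms is the reason the equality at the value k determines \(p_{k-1}\) rather than \(p_{k}\).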
6 Closing Observations
Conjecture 2 continues to be tantalizing. Our arguments in Section 4 strongly suggest that it will not prove to be true. One might try to use simulations to compare the distributions of \(X_{1}+2X_{2}\) and of \(X^{(1)}+X^{(2)}\) using a particular non-geometric distribution for the \(X_{i}\)'s. However, it is highly unlikely that any well-known choice for the distribution of the \(X_{i}\)'s will result in the desired equi-distribution of the two statistics. We believe that the best hope for resolving the problem lies in identifying a convergent non-geometric discrete density as outlined at the end of Section 4.
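A simulation of the kind mentioned above could be organized as in the following sketch. It is illustrative only: the sampler `draw_geometric`, the seed, the sample size and the choice p = 0.5 are all assumptions, and any other sampler of positive integers could be substituted for `draw`.

```python
import random


def draw_geometric(p, rng):
    """Draw from the geometric(p) distribution on {1, 2, ...} by counting trials."""
    x = 1
    while rng.random() >= p:
        x += 1
    return x


def sample_weighted_sum(draw, m, rng):
    """One draw of X_1 + 2 X_2 + ... + m X_m."""
    return sum(i * draw(rng) for i in range(1, m + 1))


def sample_record_sum(draw, m, rng):
    """One draw of X^(1) + ... + X^(m): sample until m upper records occur."""
    current = draw(rng)
    total, count = current, 1
    while count < m:
        x = draw(rng)
        if x > current:
            current = x
            total += x
            count += 1
    return total


rng = random.Random(2023)
draw = lambda r: draw_geometric(0.5, r)
n = 5000
mean_weighted = sum(sample_weighted_sum(draw, 2, rng) for _ in range(n)) / n
mean_records = sum(sample_record_sum(draw, 2, rng) for _ in range(n)) / n
```

For geometric(0.5) observations both sample means should be close to 3/p = 6. Comparing means is only a first check; a comparison of the full empirical distributions would be needed to assess equality in distribution. Waiting for further records can take many draws, so the sketch keeps m small.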
References
Arnold, B.C., Balakrishnan, N. and Nagaraja, H.N. (1998). Records. Wiley, New York.
Acknowledgments
We are grateful for the careful reading and helpful suggestions provided by the reviewers of an earlier version of this paper. In particular, the suggestion that weak records might be considered led to the discussion in Section 5.
Funding
Funding for Jose Villasenor was provided by the Colegio de Postgraduados, Montecillo, Mexico.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Arnold, B.C., Villasenor, J.A. Characterization of the Geometric Distribution Via Linear Combinations of Observations and of Records. Sankhya A 85, 651–657 (2023). https://doi.org/10.1007/s13171-021-00271-2
DOI: https://doi.org/10.1007/s13171-021-00271-2