The Sibling Distribution for Multivariate Life Time Data

A flexible class of multivariate distributions for continuous lifetimes is proposed. The distribution is defined in terms of the age-at-death of m siblings. The expression for the joint density is derived using classical results from mathematical demography. The parameters of the distribution are the age-specific birth and death rates, in addition to a vector of relative death times for the m siblings. For the case of constant birth and death rates we are able to derive an explicit expression for the bivariate sibling density, which is proven to be MTP2, and hence has positive dependence. Further, we show that a special case of the sibling distribution belongs to the Block-Basu class of multivariate distribution. In the general case, with age-dependent birth and death rates, evaluation of the density involves numerical integration, but is still feasible.


Introduction
Classes of multivariate densities for multivariate life time and survival data are well studied in the statistical and demographic literature (Hougaard, 2001;Barreto-Souza and Mayrink, 2019). A common approach for making survival times positively dependent goes via "shared frailties" (Hougaard, 2001, Chpt. 7). A frailty is a latent random variable that proportionally scales the hazard rate in a group of individuals, hence inducing dependence between otherwise independent lifetimes. In the present paper we introduce a new class of latent variable models, named the "sibling distribution", which is defined in terms of the age-at-death for each of m half siblings. There is no information available about their common mother, except that she had m offspring in total, and was alive at a specified point in time, taken to be t = 0 for convenience. The two latent variables of the model are the mother's birth and death times. The building blocks of the sibling distribution are the individual birth rates β(a) and death rates φ(a), where a is the age of an individual. We define the distribution for general functions, β(a) and φ(a), but for most part we shall assume that β(a) and φ(a) are constants as functions of a.
The positive dependence between the sibling's life times comes from conditioning on their (absolute) times of death, in combination with a shared dependence on the mother's life span. The times-of-death become parameters of the distribution. This somewhat implicit construction will be seen to be a mixture distribution, and can be studied using general theory for multivariate dependence (Shaked and Spizzichino, 1998;Khaledi and Kochar, 2001). We prove that the bivariate constant-rate sibling distribution is multivariate totally positive of order 2 (Karlin and Rinott, 1980), which for instance imply that the correlation is positive.
The constant-rate sibling distribution turns out to have marginal distributions that are perturbated exponential distributions. The exponential distribution plays a special role for univariate life times and several multivariate extensions can be found in the literature (Marshall and Olkin, 1967;Freund, 1961;Block and Basu, 1974;Arnold and Strauss, 1988;Gumbel, 1960;Hougaard, 1986;Sarkar, 1987). One of these extensions was introduced by Block and Basu (Block and Basu, 1974). The Block-Basu bivariate lifetime distribution can be derived by omitting the singular part of a bivariate exponential distribution as outlined by Marshall and Olkin (Marshall and Olkin, 1967), but can also be viewed as a reparametrization of Freund's distribution (Freund, 1961). We will see that the constant-rate (birth and death) sibling distribution reduces to a Block-Basu distribution, which will be used to shed light on the sibling distribution.
An alternative route to construction of multivariate life time distributions goes via copulae (Andersen, 2005). The implication also goes in the other direction; our sibling distribution induces a novel symmetric two-parameter copula.
The remaining part of the paper is organized as follows. In Section 2 we introduce the general sibling distribution. Explicit expressions in the bivariate case, along with positive dependence property, are derived in Section 3. In Section 4 we discuss the relationship to the Block-Basu distribution and in Section 5 we address simulation and parameter estimation. Finally, we provide a discussion in Section 6.

The Sibling Age Distribution
Consider a female who over her lifespan is known to have had m offspring. Denote by t j and x j the time of death and age-at-death, respectively, of the The sibling distribution... j'th offspring. The offspring are arbitrarily ordered, not according to the time of birth. We shall view t and x as random variables taking values on the real line. For notational simplicity we let j = 0 refer to the mother, and we condition on the fact that the mother was alive at time t = 0, i.e. on the event that t 0 − x 0 ≤ 0 ≤ t 0 . This assumption should be kept in mind at all times when reading this paper. We define a 0 = x 0 − t 0 as the age of the mother at t = 0, and we denote the joint density of (a 0 , t 0 ) by g(a 0 , t 0 ). Let y j = t j − x j , be the birth time of the j'th offspring. Please refer to Fig. 1 for an illustration of key quantities.
We denote random variables by capital letters. Conditionally on (A 0 , T 0 ) = (a 0 , t 0 ), and hence on where β(a) is the age-specific rate at which the mother produces offspring. We shorten our notation for conditional densities, e.g. we write f Y rather than the full f Y |A 0 ,T 0 . The joint conditional density of (X j , T j ) is where z + = max(z, 0). The constraints on x j express the fact that x j ≥ 0 and y 0 ≤ y j ≤ t 0 , i.e. the offspring must be born in the time window when the mother is alive (see Fig. 1). The latter is related to Eq. 2.3 via the algebraic equivalence (2.4) Figure 1: Birth (y 0 ) and death (t 0 ) times of mother, and corresponding times (y j and t j ) for the j'th offspring. Further, a 0 is the age of the mother at the reference point t = 0 The births and deaths of the m siblings are assumed to be conditionally independent, given (A 0 , T 0 ) = (a 0 , t 0 ), so the joint density of X 1:m = (X 1 , . . . , X m ) and T 1: , where the density f X,T (x j , t j |a 0 , t 0 ) is given by Eq. 2.2. We are now in position to define the sibling distribution as the conditional distribution of X 1:m , given T 1:m = t 1:m .
Remark 1. The sibling distribution of X 1:m has density (2.5) where f X,T and f T are given by Eqs. 2.2 and 2.4, respectively, and g is the joint density of (A 0 , T 0 ) for which we will derive the density (2.7) below.
The parameters of the sibling distribution are t 1:m ∈ R m , in addition to whatever parameters are hidden in the functional forms of the functions β(a) and l(a). Note that the t j are not restricted to be positive, i.e. the offspring may have died before t = 0. Mostly, we shall parameterize l(a) in terms of the age-specific death rate φ(a), which is related to the survival function through the well known relationship l(x) = exp − x 0 φ(a)da . In order to derive an expression for the density g(a 0 , t 0 ), occurring in Eq. 2.5, we use the theory for stable age distributions from mathematical demography (Caswell and Keyfitz, 2005), which we now briefly review. A population in which the age-specific rates φ(a) and β(a) do not change with time will settle into a stable age distribution. Further, the population will grow at a rate r given as the solution to the "characteristic equation" ∞ 0 β(a)l(a)e −ra da = 1.
The stable age distribution has density f A (a) = l(a)e −ra / a 0 l(u)e −ru du, for a ≥ 0. Our point of view is that the mother is randomly selected among all females alive at t = 0, so that the density of A 0 is given by f A . This is yet not taking into account the fact that she has m offspring over her life time. The joint density of A 0 and T 0 is where f T 0 |A 0 (t 0 |a 0 ) = −l (a 0 + t 0 )/l(a 0 ). Conditionally on A 0 and T 0 , and hence on the length of the time period X 0 = A 0 +T 0 that she is alive, her total number of offspring M is Poisson distributed with mean B(x 0 ) = The sibling distribution...
Knowing that the mother had m offspring over her lifetime perturbs the distribution (2.6) as follows We have now completely specified the sibling distribution via its density (2.5). Instead of providing results about its properties in the general case, we turn to a special case in which explicit results can be found. In Section 6 we briefly return with some discussion of the general case.

Constant Birth and Death Rates
We shall refer to the situation β(a) = β and φ(a) = φ for all a, (3.1) as the constant-rate sibling distribution. Under this assumption it follows that the conditional densities (2.1) and (2.2) reduce respectively to Further, the marginal density (2.4) becomes (3.4) and the joint density (2.7) becomes Using these expressions we are able to find an explicit expression for the sibling distribution (2.5) of order m = 2. We shall first assume that t 1 = t 2 which simplifies expressions somewhat. Derivations for the case t 1 = t 2 are very similar. Consider the distribution of the life times X 1 and X 2 , given that T 1 = T 2 = t, where t is the common time of death of the two siblings. We have the following expression for the sibling density, which due to symmetry is presented only for x 1 ≤ x 2 .
Proof. See Appendix A. By integrating over three branches in Eq. 3.6 we obtain the following expression for the normalizing constant: and similarly integration of Eq. 3.7 yields Note that f (x 1 , x 2 ) does not depend on t when t < 0 . When we in addition set β = 0 (interpreted as a limit), X 1 and X 2 become independent, exponentially distributed. Further interpretation of the case that t < 0 is given in Section 4 below. Like the exponential distribution, the constant-rate sibling distribution is closed under change of scale. If we define X j = cX j for c > 0, the parameters of the resulting sibling distribution are φ = c −1 φ, β = c −1 β and t = ct. Hence, we may set φ = 1 and reparameterize the distribution in terms of (c, β, t), which for some purposes is useful.
As seen from Eq. 3.6, the density has a piecewise definition. When using symmetry to include also the case x 1 > x 2 , the definition of the sibling density splits the first quadrant, x 1 , x 2 ≥ 0, into six regions ( Fig. 2 with t 1 = t 2 ). We see that log{f (x 1 , x 2 )} is piecewise linear over these regions, and is continuous (but not differentiable) across the boundaries of the regions. The density is unimodal, with the mode at (x 1 , x 2 ) = (0, 0) when β < φ, and while β > φ the mode is at (x 1 , x 2 ) = (t, t). Figure 3 shows f (x 1 , x 2 ) for three different parameter.  Figure 2: The six different regions of the sibling density when t 1 = 9 and t 2 = 5. The dashed line indicates ages where the siblings are born at the same time (y 1 = y 2 ) In order to present the sibling distribution for the case t 1 = t 2 , it is advantageous to introduce a general piecewise log-linear density f over the regions R 1 , . . . , R 6 in Fig. 2: Here, b 1:6 = (b 1 , . . . , b 6 ), c 1:6 = (c 1 , . . . , c 6 ) and d 1:6 = (d 1 , . . . , d 6 ) are constants satisfying the constraints c 2 , c 3 , c 4 , d 3 , d 4 , d 5 < 0, which are needed for Figure 3: Bivariate sibling density f (x 1 , x 2 ) with parameters φ = 1, β = 0.8, 1.0, 1.2 (left to right) and (t 1 , t 2 ) = (4, 4). The red dots show expected value (μ, μ). The dashed white curve is the contour c = 1 of c(x 1 , x 2 ) given by Eq. 3.15 f to be a proper density. Straightforward, but tedious, integration yields the normalization constant: ( 3.11) where c 1 +d 1 = 0 and c 6 +d 6 = 0 in addition to the constraint c 2 , c 3 , c 4 , d 3 , d 4 , d 5 < 0. The constants C 1 and C 2 in Theorem 1 are special cases of this.
It is easy to derive the moment generating function of Eq. 3.10: where c 1:6 + s 1 = (c 1 + s 1 , . . . , c 6 + s 1 ) and d 1: Moments of X 1 and X 2 of various orders can be obtained by repeated differentiation of M (s 1 , s 2 ) at s 1 = s 2 = 0. The resulting expressions are complex, and not well suited for interpretation, but are nevertheless useful for numerical evaluation.
Theorem 2. With constant rates (3.1), the sibling density (2.5) with m = 2 and t 1 = t 2 has density given by Eq. 3.10 with coefficients as specified in Table 1.
Proof. The proof is very similar to that of Theorem 1 and is omitted. 3.1. Positive Association Intuitively, X 1 and X 2 are positively associated under the sibling distribution, due to their dependence of the lifespan Table 1: Choice of coefficients in Eq. 3.10 yielding the order 2 sibling density The sibling distribution... 347 (−A 0 , T 0 ) of their shared mother. Further, for each marginal (j = 1, 2) we must have that X j is stochastically increasing in the parameter t j . The informal argument for the latter is that since the mother is known to have been alive at t = 0, the larger the death time t j the older (x j ) the individual is likely to be. We now set out to prove these claims rigorously. We start out by proving the positive association between X 1 and X 2 . An appropriate notion of positive association is the so-called Multivariate Total Positivity of order 2 ( . See Karlin and Rinott (1980) for a comprehensive overview of properties of MTP 2 distributions.
Theorem 3. The sibling densities (3.6) and (3.7) are MTP 2 . This is proved in Appendix C using the definition Eq. 3.13 of MTP 2 directly. We believe that an alternative proof may be based on the fact that mixture distributions, of which the numerator of Eq. 2.5 is an example, under certain conditions are MTP 2 (Khaledi and Kochar, 2001;Shaked and Spizzichino, 1998). Using this approach it may be possible to prove that more general sibling distributions than (3.6) are MTP 2 .
MTP 2 is a strong positive dependence property, which among other things imply that cov(X 1 , X 2 ) ≥ 0. Although covariance is (arguably) not the most relevant dependence measure for life times, it nevertheless the most common dependence measure in general, and it is therefore useful to have establish this result.

Marginal Distribution and Copula
The marginal densities in Eq. 3.6 are both given as (3.14) where C 1 is given by Eq. 3.8. As a local measure of dependency between X 1 and X 2 we introduce The c = 1 contour of c(x 1 , x 2 ) is displayed in Fig. 3. The region in which c(x 1 , x 2 ) > 1 is located around the diagonal x 1 = x 2 . This reflects the positive dependence in the sibling distribution.
We can obtain an analytical expression for the cumulative joint distribution F (x 1 , x 2 ) by integrating Eq. 3.6. Similarly, we get an expression for the cumulative marginal distribution function G(x) by integrating Eq. 3.14. Then we can define a copula (Nelsen, 2007), F G −1 (x 1 ), G −1 (x 2 ) , based on the sibling distribution, where G −1 denotes the inverse of G. Because the sibling distribution is closed under change of scale, we set φ = 1, and the copula thus has β and t as free parameters. We do not explore this copula further in this paper.
We next prove that X (X 1 or X 2 ) is stochastically increasing in t, in the sense of the following theorem.
Theorem 4. For any t > t > 0 we have ( 3.16) It turns out to be easier to prove the more general statement that (X 1 , X 2 ) is multivariate stochastically increasing (Shaked and Shanthikumar, 2007, p. 265), which imply Eq. 3.16. The reason this is simpler is that the key quantity, the ratio f (x; t )/f (x; t), which is involved in Theorem 6.B.8. of Shaked and Shanthikumar (2007, p. 265), takes on a simpler form for the bivariate density (3.6) than for the univariate density (3.14). The details of the proof are given in Appendix D.
Stochastic monotonicity of a random variable X implies that E(X) is an increasing function of t (Shaked and Shanthikumar, 2007, p. 4). This means that, for given φ and β, there is a one-to-one correspondence between t and μ = E(X). This fact will play a crucial role when we later devise an estimator for the parameters φ, β and t.
3.3. The Role of β and t Because the family of constant-rate sibling distributions is closed under change of scale we set φ = 1. In this section we will study the effect of varying β and t on two characteristics: the correlation (COR) between X 1 and X 2 and CV(X j ) = Var(X j )/E(X j ). Without conditioning on T j = t, we have that X j is exponentially distributed with rate φ = 1. The process of conditioning on T j can be expected to deduce CV(X j ). Rather than trying to prove this formally, we provide numerical evidence. Figure 4 shows correlation and CV as functions of β (top) and t (bottom), and indeed we see that CV ≤ 1 for all β and t. When β and t are both close to zero we have CV ≈ 1 and COR ≈ 0, which reflects the fact that X 1 and X 2 are then approximately independent and exponentially distributed. For increasing β the correlation increases, but not necessarily monotonically, and approaches an asymptotic level (top-left). From the corresponding plot of The sibling distribution... the CV (top-right) we notice that the overall trend is that the CV decreases as the value of β increases. The decrease is steepest for the highest values of t.
Further, we see from the bottom row of Fig. 4 that the correlation increases as a function of t, except for very small t. It can be shown that when β = φ the correlation approaches 1 as t → ∞. When β = φ, the correlation does not approach 1 as t → ∞, but flattens out at a lower value which depends on the value of β. The CV decrease quickly as a function of t, especially for the higher values of β. When β < φ we see that the CV first decrease, then starts to increase for higher values of t.

Relationship to the Block-Basu Distribution
In this section we clarify the relationship between the constant-rate sibling distribution and the Block-Basu distribution (Block and Basu, 1974), and we shall use this relationship to interpret the sibling distribution. One way of deriving the Block-Basu distribution goes via (Freund, 1961), and we will refer to this as the "Freund interpretation". Let now X 1 and X 2 be the lifetimes of two components assumed to be independently exponentially distributed with rate parameters α 1 and α 2 , respectively. When one of the component fails, the rate for the remaining component changes from α 1 to α * 1 or from α 2 to α * 2 , depending on which component fails first. The resulting density (see Freund (1961)) is When setting we see that Eq. 4.1 reduces to the symmetric sibling density (3.7) with t ≤ 0. Since t = 0 is the time point at which the mother is known to have been alive, we are effectively considering a sibling distribution where both offspring are dying before their mother. The Freund interpretation yields that X (1) = min(X 1 , X 2 ), i.e. the age of the youngest sibling, has an exponential distribution with rate parameter α 1 + α 2 = 2(β + φ). Further, the age difference between the oldest and the youngest, X (2) = max(X 1 , X 2 ) − min(X 1 , X 2 ), has an exponential distribution with rate parameter α * = 2β + φ. In the following paragraphs we interpret the rates (4.2) in the context of the sibling distribution.
If we consider the case with t = 0, we know that the mother and her two offspring were all alive at (or just prior to) t = 0. The Freund interpretation requires us to look backwards in time, starting from t = 0. The "failure" of a component corresponds to an offspring being born. We first look at the event X (1) > x, which can be broken into three sub events: (i) The mother was born prior to t = −x, i.e. a 0 > x. Because the stable age distribution of A 0 is exponential with rate β, we have P (A 0 > x) = exp(−βx).
(ii) Both offspring were born prior to −x, and because we are conditioning on there being m = 2 siblings in total, this implies that there were no additional births in (−x, 0). The latter has probability exp(−βx).

When combining the independent events i)-iii) we get the Freund interpretation
The event X (2) > x can be interpreted similarly, but we must shift our point of view backwards in time to t = −x (1) when the youngest sibling was born. The mother would have to be born prior to t = −(x + x (1) ). Using the stable age distribution of A 0 , this has conditional probability Secondly, there couldn't have been any births between t = − x + x (1) and t = −x (1) , which has probability exp (−βx). Finally, the offspring that was born at t = −x (2) survived from t = − x + x (1) until t = −x (1) , which has probability exp (−φx). In total we get P X (2) > x = exp [−(2β + φ)x], which is the Freund interpretation of the sibling distribution. Similar arguments applies to the situation t < 0.
Finally, we discuss a few additional insights gained from the Freund interpretation (4.2). First, the age difference between the siblings, X 1 − X 2 follows a Laplace distribution with rate 2β + φ. Further, note that α * j → α j = φ as β → 0. Hence, X 1 and X 2 are independent in the limit β → 0, each having an exponential distribution with rate φ.

Simulation, Estimation and Application to Real Data
We first devise an algorithm for sampling (x 1 , x 2 ) from the density (3.6). Rather than sampling directly from Eq. 3.6, which would be feasible albeit a bit technical, we choose to go back to the definition of the sibling distribution. This involves explicitly sampling (a 0 , t 0 ) for the mother. As a byproduct, our algorithm sheds light on the sibling distribution, through an expression for the conditional density of (A 0 , T 0 ), given (T 1 , T 2 ).
We also construct a hybrid moment/maximum likelihood estimator for the parameter vector θ = (β, φ, t). The statistical properties of this estimator are investigated on simulated data.

Simulation
The joint density of (A 0 , T 0 ), (X 1 , T 1 ) and (X 2 , T 2 ) is given by where for notational simplicity we suppress subscripts on densities and range of variables in this section. The quantities on the left-hand side are given by Eqs. 3.3 and 3.5, while the right-hand side is a generic refactoring of the joint density in terms of conditional densities. Dividing through by f (t 1 , t 2 ) in Eq. 5.1 we obtain the target distribution, and the right-hand side (5.1) suggests the following algorithm for sampling (X 1 , X 2 ) conditionally on (T 1 , T 2 ): Step (ii) is straight forward, and is seen from Eq. 3.3 to amount to sampling from an exponential distribution with x j constrained to a certain interval.

Estimation
Consider n observation pairs {(x 1i , x 2i ); i = 1, . . . , n} from Eq. 3.6. While f (x 1 , x 2 ) is continuous as a function of θ = (β, φ, t), it is not differentiable in t at t = x 1 and t = x 2 . This implies that the log-likelihood has 2n points where the derivative is not differentiable. Standard numerical optimization algorithms typically either do not use derivative information at all, or requires the objective function to be continuously differentiable in all variables. The former types of algorithms are slow and unstable, and the latter type are not directly applicable to our setting. We thus devise a special two-stage estimation algorithm.
Because X 1 and X 2 have the same marginal distribution we define the overall sample meanx = (2n) −1 n i=1 (x 1i + x 2i ). We denote by μ(β, φ, t) the expectation of X 1 and X 2 , and impose the constraint μ(β, φ, t) =x on The sibling distribution... the parameter estimation problem. An analytical expression for μ(β, φ, t) is given in Appendix B. The expression is complicated, but nevertheless well suited for numerical evaluation. Recall that we have proven earlier that μ(β, φ, t) is increasing as a function of t.

Simulation Experiment
Using the algorithm of Section 5.1, we sampled n = 1000 observation pairs (x 1 , x 2 ) to which the estimator of Section 5.2 was applied. This process was repeated 1000 times to assess the statistical properties of the estimator. Table 2 shows the results for 20 different parameter combinations (one row per combination). Moments characterizing the distribution of (X 1 , X 2 ), obtained from the moment generating function (3.12), are also given in the table.
From Table 2 we see that the estimator for the parameter vector θ = (β, φ, t) is overall stable with respect to the different combinations of the input parameters. More specifically, for the parameter t in the first column, we see that the mean values of the estimates are all very close to the true value of t, but they are slightly worse in the case when β > φ. The same trend can be seen in the standard deviations, which are higher in these situations. For the parameter β we see that the mean of the estimates are more accurate for higher values of t, but we also see the trend with better predictions when β > φ. The standard deviations are however quite stable for all combinations of the input parameters. The mean values of the estimates for the parameter φ are all very similar and they do not seem to be affected by the different combinations of the input parameters. The standard deviations are higher when t = 2, but otherwise quite similar. Figure 5 shows the marginal density (3.14) for some of the parameter combinations used in Table 2.

Application to Real Data
We have presented the sibling distribution as a distribution for life times, but it may in fact be applied to any set of non-negative quantities with positive dependence. The constant-rate case is applicable only when the CV is less than one, and when t 1 = t 2 the marginals must be the same. We use the "twinData" dataset found in the R-package "OpenMx" (Neale et al., 2016) as an example. These are BMI measurements on twins (around age 18), but nevertheless satisfy the above mentioned restrictions (Table 3).  The column "True" shows the values used in the simulations, while "Mean" and "SD" are, respectively, the average and standard deviation of (β,φ,t) across 1000 repetitions. The three rightmost columns show respectively E(X), CV (X) = Var(X)/E(X), and COV(X1, X2), all calculated using the moment generating function for the true parameter values The dataset consists of BMI measurements for 3569 (male/female, monozygotic/dizygotic) twin pairs. The fitted sibling distribution is unable to accomodate the light left-hand tail of data (Fig. 6). The fitted density is huge perturbation of the unconditional (on T ) distribution of X, which for the constant-rate case is an exponential distribution. This illustrates the flexibility of the distribution. Table 3 shows the parameters of the fitted distribution. The lack of fit is reflected in estimated moments not fitting the empirical moments very well. If we look at the estimated parameter values in Fig. 6 we see that these are "outside in the normal range", in the following sense. The expected life length of an individual is 1/φ = 1/0.94 = 1.12, but The sibling distribution...  Table 2, which gives the parameter setting used for the different density curves the mother-offspring duo spanned (mother's birth to offspring's death) at leastt = 21.77 time units.

Discussion
The idea of a sibling distribution was conceived during our work with the recently invented Close-Kin Mark-Recapture method (Bravington et al., 2016), in which the joint age distribution of half-siblings plays a crucial role. Its usefulness as a distribution for multivariate life time data in general remains to be explored. The fact that it is a mixture (over A 0 and T 0 ) of independent life times makes it amenable to analysis in the framework of Shaked and Spizzichino (1998). However, due to the conditioning on  T j its structure, and in particular the moments, is complicated. Moreover, the non-differentiability of the likelihood with respect to the parameter t prevents straight forward application of maximum likelihood estimation.
Our numerical experiments indicate that the constant-rate (3.1) distribution has CV ≤ 1. This is clearly limiting for a general purpose life time distribution, but this restriction can be removed by choosing a non-constant φ(a). We have not been able to obtain explicit expressions for the sibling density under more general conditions. In general, the sibling density (2.5) can be evaluated numerically. Both the numerator and denominator involves two-dimensional integrals (with respect to a 0 and t 0 ). The integrand of the denominator is a product of m functions on form (2.4), which each involves a one dimensional integral. This is by no means computationally prohibitive, but specially tailored numerical integration schemes would have to be devised in order for the general distribution to be practically useful.
We have shown that the constant-rate sibling distribution with t 1 = t 2 = 0 coincides with the exchangeable Block-Basu distribution (α 1 = α 2 and α * 1 = α * 2 ). The non-exchangeable case is not a sibling distribution, as the sibling framework requires that φ and β are the same for both siblings. Conversely, the sibling distribution with t 1 = t 2 , or t 1 = t 2 > 0, is not a Block-Basu distribution. Also, when allowing age-specific rates, φ(a) and β(a), the sibling distribution is no longer a Block-Basu distribution. Kundu and Gupta (2010) extended the Block-Basu distribution by deriving it from components that were Weibull distributed instead of exponential. The additional shape parameter makes the extended Block-Basu distribution more flexible at the cost of being less computational tractable.
Finally, we have proven that the constant-rate sibling distribution is MTP 2 and stochastically increasing in t. We conjecture, based on literature for mixture distributions (Shaked and Spizzichino, 1998, p. 273;Shaked and Shanthikumar, 2007) that these properties hold for a wider class of sibling distributions, but possibly not for all. Figure 7: Part of the proof of Theorem C. The green dashed line is an example of a level curve (C.2). The red rectangles are used to prove supermodularity and (B, C, C , B ), and it is sufficient to check Eq. C.1 for the two parts. Hence, to check all of i)-iii) it suffices to check all the red sub-rectangles in Fig. 7, which we will now do. First, (A, B, B , A ) lies in a single region (R 6 ) so Eq. C.1 holds. For (B, C, C , B ) we find by looking at the level curves of g that g(C) > g (B) and g(C ) = g(B ), which imply that Eq. C.1 holds. Finally, the (solid black) vertical line x 1 = t splits (C, D, D , C ) in two parts which each line entirely in R 1 and R 2 , respectively. This completes the proof for x 2 < t.
The situation that z 2 > t, i.e. the red part of the figure is moved above the (solid black) horizontal line x 2 = t, follows by symmetry. The remaining case, x 2 > t > z 1 , can be handled by splitting in two the rectangle horizontally at the x-axis, for each of which we know Eq. C.1 holds. Since supermodularity is also additive when splitting a rectangle horizontally, we have completed the proof.
We prove that the conditions in Theorem 6.B.8 in (Shaked and Shanthikumar, 2007) are satisfied. First, it follows from the MTP 2 property that (X 1 , X 2 ) is "associated" in the sense of the theorem (Karlin and Rinott, 1980, Eq. (1.7)). Define the function h(x 1 , x 2 ) = log [f (x 1 , x 2 ; t )/f (x 1 , x 2 ; t)]. The main condition of Theorem 6.B.8 is that h(x 1 , x 2 ) is increasing in (x 1 , x 2 ) for t > t. To verify this we will check that ∂h ∂x 1 ≥ 0 and ∂h ∂x 2 ≥ 0, (D.1) which together with the continuity of h is sufficient. We build on the proof of Theorem 3, and denote the six regions of Fig. 7 associated with t by R 1 , . . . , R 6 . We need to verify Eq. D.1 when (x 1 , x 2 ) ∈ R j ∩R k for different values of j and k. When j = k it follows that h(x 1 , x 2 ) = 0 which implies Eq. D.1. Next, due to the fact that t ≤ t many combinations of j and k cannot occur, and we are left with the following list to check: For all of these combinations (D.1) holds. Note that we have skipped additive terms in h that does not depend on x 1 or x 2 . This completes the proof.