Correlated endpoints: simulation, modeling, and extreme correlations

Leonov, Sergei; Qaqish, Bahjat

doi:10.1007/s00362-017-0960-2

Regular Article
Published: 25 October 2017

Volume 61, pages 741–766, (2020)

Statistical Papers


Abstract

Modeling and simulation of correlated random variables are important for evaluating operating characteristics of experimental designs in various applications, of which clinical trials with multiple endpoints provide an important example. There exist efficient algorithms to address the problem of generating multivariate distributions with given marginals and correlation structure. For model fitting as well as for simulation, it is important to know the feasible range of pairwise correlations, which can be much narrower than the interval $[-\,1,+\,1]$. We provide closed-form expressions for extreme correlations for several classes of bivariate distributions that involve both discrete and continuous endpoints, as well as an algorithm for the construction of such distributions in the discrete case.

Acknowledgements

The authors wish to thank two anonymous referees for their constructive comments on an earlier version of the paper.

Author information

Authors and Affiliations

Innovation Center, ICON Clinical Research, 2100 Pennbrook Parkway, North Wales, PA, 19454, USA
Sergei Leonov
UNC Gillings School of Global Public Health, Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA
Bahjat Qaqish

Corresponding author

Correspondence to Sergei Leonov.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

A.1. Proof of formulas (14), (15), Example 7

Let $f_{-1}=0$ and note that the set $\{f_i\}$ defined in (13) satisfies

$$\begin{aligned} 0 = f_{-1} \le f_0 \le f_1 \le f_2 \le \dots , ~~\lim _i f_i = 1. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) &= \mathrm{E}[G^{-1}(U) F^{-1}(U)] = \int _0^1 G^{-1}(u) F^{-1}(u)\, du \\ &= \sum _{i=0}^{\infty } \int _{f_{i-1}}^{f_i} G^{-1}(u) F^{-1}(u)\, du. \end{aligned} \tag{33}$$

If $u\in (f_{i-1},f_i)$, then $F^{-1}(u) \equiv i$ by definition of the inverse function. Thus, (33) implies that

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) = \sum _{i=0}^{\infty } i \int _{f_{i-1}}^{f_i} G^{-1}(u)\, du = \sum _{i=1}^{\infty } i \int _{f_{i-1}}^{f_i} G^{-1}(u)\, du . \end{aligned} \tag{34}$$

Using the change of variables $x=G^{-1}(u)$, i.e. $u = G(x)$, and noting that $du = g(x)\, dx$, one gets

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) = \sum _{i=1}^{\infty } i \int _{G^{-1}(f_{i-1})}^{G^{-1}(f_i)} x g(x) dx, \end{aligned}$$

which implies (14).

To prove (15), remark that

$$\begin{aligned} \mathrm{E}(\bar{Y}_1 Y_2) = \sum _{i=1}^{\infty } i \int _{f_{i-1}}^{f_i} G^{-1}(1-u) du. \end{aligned}$$

If $x=G^{-1}(1-u)$, then $du = -g(x) dx$ and

$$\begin{aligned} \mathrm{E}(\bar{Y}_1 Y_2) = - \sum _{i=1}^{\infty } i \int _{G^{-1}(1-f_{i-1})}^{G^{-1}(1-f_i)} x g(x) dx = \sum _{i=1}^{\infty } i \int _{G^{-1}(1-f_i)}^{G^{-1}(1-f_{i-1})} x g(x) dx, \end{aligned}$$

which proves (15). Note that $f_i$ is non-decreasing in $i$ and, thus, $G^{-1}(1-f_i) \le G^{-1}(1-f_{i-1})$.
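
As a rough numerical illustration of the first identity in this proof, the sketch below compares the interval-by-interval sum with a direct midpoint-rule evaluation of $\int_0^1 G^{-1}(u) F^{-1}(u)\, du$. The choices here are assumptions made only for the example and do not come from the paper: $X_1$ takes values 0, 1, 2 with probabilities 0.2, 0.5, 0.3, and $X_2$ is unit exponential, so that $G^{-1}(u) = -\ln (1-u)$ and $\int _a^b x e^{-x} dx = (a+1)e^{-a} - (b+1)e^{-b}$. All function names are hypothetical.

```python
import bisect
import math


def exp_quantile(u):
    """G^{-1}(u) for the unit exponential (assumed example); inf at u = 1."""
    return math.inf if u >= 1.0 else -math.log1p(-u)


def exp_partial_mean(a, b):
    """Integral of x * exp(-x) over [a, b]; either limit may be math.inf."""
    def tail(x):
        # (x + 1) * exp(-x), with its limiting value 0 at infinity
        return 0.0 if math.isinf(x) else (x + 1.0) * math.exp(-x)
    return tail(a) - tail(b)


def cumulative(probs):
    """Cumulative probabilities f_0, f_1, ..., last value pinned to exactly 1."""
    cum, running = [], 0.0
    for p in probs:
        running += p
        cum.append(running)
    cum[-1] = 1.0
    return cum


def product_moment_closed(probs):
    """Sum over i of i times the integral of x g(x) between the
    quantiles G^{-1}(f_{i-1}) and G^{-1}(f_i)."""
    total, f_prev = 0.0, 0.0
    for i, f_i in enumerate(cumulative(probs)):
        total += i * exp_partial_mean(exp_quantile(f_prev), exp_quantile(f_i))
        f_prev = f_i
    return total


def product_moment_quadrature(probs, n=200_000):
    """Midpoint-rule value of the integral of G^{-1}(u) F^{-1}(u) over (0, 1)."""
    cum = cumulative(probs)
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        i = bisect.bisect_left(cum, u)  # F^{-1}(u): smallest i with f_i >= u
        total += exp_quantile(u) * i
    return total / n


if __name__ == "__main__":
    pmf = [0.2, 0.5, 0.3]  # assumed example distribution
    print(product_moment_closed(pmf), product_moment_quadrature(pmf))
```

The two values should agree up to the quadrature error, which is dominated by the logarithmic singularity of $G^{-1}$ at $u = 1$.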

A.2. Proof of formulas (16), (17), Example 8

To prove (16), we will need the following property of the standard normal distribution:

$$\begin{aligned} I_{a,b} = \int _a^b \varPhi ^{-1}(u)\, du = \phi [\varPhi ^{-1}(a)] - \phi [\varPhi ^{-1}(b)]. \end{aligned} \tag{35}$$

Indeed, make the transformation of variables, $x = \varPhi ^{-1}(u)$, so that

$$\begin{aligned} u = \varPhi (x),~du = \phi (x)\, dx. \end{aligned} \tag{36}$$

Now it follows from (9) and (36) that

$$\begin{aligned} I_{a,b} = \int _{\varPhi ^{-1}(a)}^{\varPhi ^{-1}(b)} x \phi (x) dx = - \int _{\varPhi ^{-1}(a)}^{\varPhi ^{-1}(b)} d[\phi (x)] = \phi [\varPhi ^{-1}(a)] - \phi [\varPhi ^{-1}(b)], \end{aligned}$$

which proves (35).
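
Property (35) is easy to check numerically. The following is a minimal sketch, not part of the paper, that compares the right-hand side of (35) with a midpoint-rule approximation of the integral on the left. It relies on `statistics.NormalDist` from the Python standard library (available from Python 3.8); the function names and the grid size are assumptions chosen for the illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def phi_at_quantile(u):
    """phi[Phi^{-1}(u)], using the limiting value 0 at u = 0 and u = 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))


def i_ab_closed(a, b):
    """Right-hand side of (35)."""
    return phi_at_quantile(a) - phi_at_quantile(b)


def i_ab_midpoint(a, b, n=20_000):
    """Midpoint-rule approximation of the integral of Phi^{-1}(u) over [a, b]."""
    h = (b - a) / n
    return h * sum(STD_NORMAL.inv_cdf(a + (k + 0.5) * h) for k in range(n))


if __name__ == "__main__":
    print(i_ab_closed(0.2, 0.7), i_ab_midpoint(0.2, 0.7))
```

For limits strictly inside $(0,1)$ the integrand is smooth, so the two values should agree to many digits; the closed form also covers $a=0$ or $b=1$ through the limiting value $\phi (\pm \infty ) = 0$.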

To prove the first formula in (16), since $\mathrm{E}(X_2)= 0$ and $\mathrm{Var}(X_2) = 1$, it is sufficient to show that

$$\begin{aligned} I = \mathrm{E}[\varPhi ^{-1}(U) F^{-1}(U)] = \sum _{i=0}^{\infty } \phi [\varPhi ^{-1}(f_i)]. \end{aligned} \tag{37}$$

It follows from (34) that

$$\begin{aligned} I = \sum _{i=1}^{\infty } i~\int _{f_{i-1}}^{f_i} \varPhi ^{-1}(u) du , \end{aligned}$$

and, therefore, (35) implies:

$$\begin{aligned} I &= \sum _{i=1}^{\infty } i\, \{\phi [\varPhi ^{-1}(f_{i-1})] - \phi [\varPhi ^{-1}(f_i)]\} \\ &= \phi [\varPhi ^{-1}(f_{0})] - \phi [\varPhi ^{-1}(f_1)] + 2 \{\phi [\varPhi ^{-1}(f_{1})] - \phi [\varPhi ^{-1}(f_2)]\} \\ &\quad +\,3 \{\phi [\varPhi ^{-1}(f_{2})] - \phi [\varPhi ^{-1}(f_3)]\} + \dots = \sum _{i=0}^{\infty } \phi [\varPhi ^{-1}(f_i)], \end{aligned} \tag{38}$$

which proves (37) and thus proves (16).
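
The rearrangement in (38) can be spelled out through partial sums. With the shorthand $a_i = \phi [\varPhi ^{-1}(f_i)]$ (introduced here only for this remark, not used in the paper), one has for every $n \ge 1$

$$\begin{aligned} \sum _{i=1}^{n} i\, (a_{i-1} - a_i) = \sum _{i=0}^{n-1} a_i - n\, a_n , \end{aligned}$$

so the last equality in (38) holds provided the remainder $n\, a_n$ tends to zero as $n \rightarrow \infty $. When $f_k = 1$ for some finite $k$, the remainder vanishes identically for $n \ge k$ because $\phi [\varPhi ^{-1}(1)] = 0$.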

Formula (17) also follows from the equalities in (38) since $f_k = 1$ when $X_1$ has a finite support with $P_i=0$ for all $i>k$.
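
To illustrate how the sum in (37) is used, here is a small sketch that computes $\mathrm{corr}[\varPhi ^{-1}(U), F^{-1}(U)] = I / \sqrt{\mathrm{Var}(X_1)}$, which follows from the definition of correlation together with $\mathrm{E}(X_2)=0$ and $\mathrm{Var}(X_2)=1$. It is not the authors' code: the function names are hypothetical, the Poisson distribution is truncated at an assumed cutoff, and the standard library's `statistics.NormalDist` (Python 3.8 or later) stands in for $\varPhi $ and $\phi $.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def phi_at_quantile(u):
    """phi[Phi^{-1}(u)], using the limiting value 0 at u = 0 and u = 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))


def corr_with_normal(probs):
    """corr(Phi^{-1}(U), F^{-1}(U)) for X_1 on {0, 1, ...} with pmf `probs`.

    The numerator is the sum in (37); the denominator is the standard
    deviation of X_1 (the standard normal variable has unit variance).
    """
    mean = sum(i * p for i, p in enumerate(probs))
    var = sum(i * i * p for i, p in enumerate(probs)) - mean ** 2
    running, total = 0.0, 0.0
    for p in probs:
        running += p                       # f_i
        total += phi_at_quantile(running)  # one term of the sum in (37)
    return total / math.sqrt(var)


def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for 0..kmax (truncation is an assumption)."""
    probs = [math.exp(-lam)]
    for i in range(1, kmax + 1):
        probs.append(probs[-1] * lam / i)
    return probs


if __name__ == "__main__":
    print(corr_with_normal([0.5, 0.5]))            # Bernoulli(0.5) example
    print(corr_with_normal(poisson_pmf(2.0, 60)))  # truncated Poisson(2) example
```

For a Bernoulli variable with success probability $1/2$, the sum reduces to the single term $\phi (0)$ and the resulting correlation is $\sqrt{2/\pi }$, strictly less than one.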

A.3. Proof of formula (22)

We present the expectation $\mathrm{E}(X)$ as

$$\begin{aligned} \mathrm{E}(X) = \int _0^1 F^{-1}_X(u)\, du = \int _0^{1-p} F^{-1}_X(u)\, du + \int _{1-p}^1 F^{-1}_X(u)\, du . \end{aligned} \tag{39}$$

If $Y_1 = F^{-1}_X(U)$, $Y_2 = \mathrm{Bern}^{-1}_p(U)$ and $Y_2^- = \mathrm{Bern}^{-1}_{1-p}(1-U)$, then it follows from (20) that

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) = \int _{1-p}^1 F^{-1}_X(u)\, du, ~\text{ and }~~\mathrm{corr}(Y_1,Y_2) = \frac{\mathrm{E}(Y_1 Y_2) -p\mathrm{E}(X)}{\sqrt{p(1-p)\mathrm{Var}(X)}}. \end{aligned}$$

On the other hand, $Y_2^- = I(1-U > p) = I(U < 1- p)$, so that $\mathrm{E}(Y_2^-) = 1-p$, and it follows from (39) that $\mathrm{E}(Y_1 Y_2^-) = \mathrm{E}(X) - \mathrm{E}(Y_1 Y_2)$ and, consequently,

$$\begin{aligned} \mathrm{corr}(Y_1, Y_2^-) &= \frac{\mathrm{E}(X)-\mathrm{E}(Y_1 Y_2) - (1-p)\mathrm{E}(X)}{\sqrt{p(1-p)\mathrm{Var}(X)}} = \frac{-\mathrm{E}(Y_1 Y_2) + p\mathrm{E}(X)}{\sqrt{p(1-p)\mathrm{Var}(X)}} \\ &= -\mathrm{corr}(Y_1,Y_2), \end{aligned}$$

which proves (22).
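
The sign flip established in this proof can be illustrated on a grid of $u$ values. The sketch below is only an illustration under assumed choices that are not taken from the paper: $X$ is unit exponential, so $F^{-1}_X(u) = -\ln (1-u)$, $p = 0.3$, and the expectations are approximated by averages over midpoints of an equally spaced grid on $(0,1)$. The names `bern_inverse`, `pearson` and `extreme_pair` are hypothetical.

```python
import math


def bern_inverse(v, q):
    """Quantile function of a Bernoulli(q) variable: 1 if v > 1 - q, else 0."""
    return 1.0 if v > 1.0 - q else 0.0


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def extreme_pair(p, n=100_000):
    """Grid approximations of corr(Y_1, Y_2) and of the correlation between
    Y_1 and the reversed Bernoulli variable built from 1 - U."""
    us = [(k + 0.5) / n for k in range(n)]          # midpoints of (0, 1)
    y1 = [-math.log1p(-u) for u in us]              # F_X^{-1}(U), unit exponential
    y2 = [bern_inverse(u, p) for u in us]           # Bern_p^{-1}(U)
    y2_minus = [bern_inverse(1.0 - u, 1.0 - p) for u in us]  # Bern_{1-p}^{-1}(1-U)
    return pearson(y1, y2), pearson(y1, y2_minus)


if __name__ == "__main__":
    r_plus, r_minus = extreme_pair(0.3)
    print(r_plus, r_minus)
```

For this assumed example the positive value can be compared with the closed form $-p\ln p/\sqrt{p(1-p)}$, which follows from $\int _{1-p}^1 -\ln (1-u)\, du = p - p\ln p$ together with $\mathrm{E}(X) = \mathrm{Var}(X) = 1$.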

A.4. Example 15. Code for computing matrix {$\pi _{ij}$} in (27) and implementing formula (28)

Matlab code for (27), (28) (the text after % is a comment)

R code for (27), (28)

About this article

Cite this article

Leonov, S., Qaqish, B. Correlated endpoints: simulation, modeling, and extreme correlations. Stat Papers 61, 741–766 (2020). https://doi.org/10.1007/s00362-017-0960-2

Received: 02 June 2017
Revised: 11 October 2017
Published: 25 October 2017
Issue Date: April 2020
DOI: https://doi.org/10.1007/s00362-017-0960-2

Keywords

Mathematics Subject Classification

62H20
