Correlated endpoints: simulation, modeling, and extreme correlations

  • Regular Article
  • Statistical Papers

Abstract

Modeling and simulation of correlated random variables are important for evaluating operating characteristics of experimental designs in various applications, of which clinical trials with multiple endpoints provide an important example. There exist efficient algorithms to address the problem of generating multivariate distributions with given marginals and correlation structure. For model fitting as well as for simulation, it is important to know the feasible range of pairwise correlations, which can be much narrower than the interval \([-\,1,+\,1]\). We provide closed-form expressions for extreme correlations for several classes of bivariate distributions that involve both discrete and continuous endpoints, as well as an algorithm for the construction of such distributions in the discrete case.

Acknowledgements

The authors wish to thank two anonymous referees for their constructive comments on an earlier version of the paper.

Author information

Corresponding author

Correspondence to Sergei Leonov.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

A.1. Proof of formulas (14), (15), Example 7

Let \(f_{-1}=0\) and note that the set \(\{f_i\}\) defined in (13) satisfies

$$\begin{aligned} 0 = f_{-1} \le f_0 \le f_1 \le f_2 \le \dots , ~~\lim _i f_i = 1. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) &= \mathrm{E}[G^{-1}(U) F^{-1}(U)] = \int_0^1 G^{-1}(u) F^{-1}(u)\, du \\ &= \sum_{i=0}^{\infty} \int_{f_{i-1}}^{f_i} G^{-1}(u) F^{-1}(u)\, du. \end{aligned} \tag{33}$$

If \(u\in (f_{i-1},f_i)\), then \(F^{-1}(u) \equiv i\) by definition of the inverse function. Thus, (33) implies that

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) = \sum_{i=0}^{\infty} i \int_{f_{i-1}}^{f_i} G^{-1}(u)\, du = \sum_{i=1}^{\infty} i \int_{f_{i-1}}^{f_i} G^{-1}(u)\, du. \end{aligned} \tag{34}$$

Using the transformation of variables \(x=G^{-1}(u)\), and noting that \(du = g(x) dx\), one gets

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) = \sum _{i=1}^{\infty } i \int _{G^{-1}(f_{i-1})}^{G^{-1}(f_i)} x g(x) dx, \end{aligned}$$

which implies (14).

To prove (15), note that

$$\begin{aligned} \mathrm{E}(\bar{Y}_1 Y_2) = \sum _{i=1}^{\infty } i \int _{f_{i-1}}^{f_i} G^{-1}(1-u) du. \end{aligned}$$

If \(x=G^{-1}(1-u)\), then \(du = -g(x) dx\) and

$$\begin{aligned} \mathrm{E}(\bar{Y}_1 Y_2) = - \sum _{i=1}^{\infty } i \int _{G^{-1}(1-f_{i-1})}^{G^{-1}(1-f_i)} x g(x) dx = \sum _{i=1}^{\infty } i \int _{G^{-1}(1-f_i)}^{G^{-1}(1-f_{i-1})} x g(x) dx, \end{aligned}$$

which proves (15). Note that \(f_i\) is a non-decreasing function of i and, thus, \(G^{-1}(1-f_i) \le G^{-1}(1-f_{i-1})\).
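As a numerical illustration of the two identities above, the following minimal sketch compares the closed-form sums with a midpoint-rule approximation of \(\int_0^1 G^{-1}(u) F^{-1}(u)\, du\) and \(\int_0^1 G^{-1}(1-u) F^{-1}(u)\, du\). It rests on assumptions that are not taken from the paper: \(G\) is the unit exponential distribution, \(X_1\) has a hypothetical three-point distribution, and the \(f_i\) of (13) are read as the cumulative probabilities \(P(X_1 \le i)\). All function names are illustrative.

```python
import math

# Hypothetical pmf of X1 on {0, 1, 2}; f[i] = P(X1 <= i) plays the role of f_i.
pmf = [0.5, 0.3, 0.2]
f = [sum(pmf[: i + 1]) for i in range(len(pmf))]
f[-1] = 1.0  # guard against floating-point rounding


def G_inv(u):
    """Quantile function of the unit exponential (the assumed G)."""
    return math.inf if u >= 1.0 else -math.log1p(-u)


def partial_mean(a, b):
    """Integral of x * g(x) over [a, b] for g(x) = exp(-x)."""
    def anti(x):
        # Antiderivative -(x + 1) * exp(-x), with limit 0 as x -> infinity.
        return 0.0 if math.isinf(x) else -(x + 1.0) * math.exp(-x)
    return anti(b) - anti(a)


def F_inv(u):
    """Generalized inverse of F: smallest i with f_i >= u."""
    for i, fi in enumerate(f):
        if u <= fi:
            return i
    return len(f) - 1


def closed_form(antithetic=False):
    """Sum of i times the integral of x g(x) dx over the limits derived above."""
    total = 0.0
    for i in range(1, len(f)):
        lo, hi = f[i - 1], f[i]
        if antithetic:
            total += i * partial_mean(G_inv(1.0 - hi), G_inv(1.0 - lo))
        else:
            total += i * partial_mean(G_inv(lo), G_inv(hi))
    return total


def midpoint(antithetic=False, n=200_000):
    """Midpoint-rule approximation of the integral over (0, 1)."""
    h = 1.0 / n
    s = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        s += G_inv(1.0 - u if antithetic else u) * F_inv(u)
    return s * h


comonotone = (closed_form(False), midpoint(False))
countermonotone = (closed_form(True), midpoint(True))
print("E(Y1 Y2):     closed form %.5f, quadrature %.5f" % comonotone)
print("E(Y1bar Y2):  closed form %.5f, quadrature %.5f" % countermonotone)
```

In this example the product of the means is \(0.7 \times 1\), so the first expectation should lie above 0.7 and the second below it, in line with positive and negative extreme dependence.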

A.2. Proof of formulas (16), (17), Example 8

To prove (16), we will need the following property of the standard normal distribution:

$$\begin{aligned} I_{a,b} = \int_a^b \varPhi^{-1}(u)\, du = \phi[\varPhi^{-1}(a)] - \phi[\varPhi^{-1}(b)]. \end{aligned} \tag{35}$$

Indeed, make the transformation of variables, \(x = \varPhi ^{-1}(u)\), so that

$$\begin{aligned} u = \varPhi(x),~du = \phi(x)\, dx. \end{aligned} \tag{36}$$

Now it follows from (9) and (36) that

$$\begin{aligned} I_{a,b} = \int _{\varPhi ^{-1}(a)}^{\varPhi ^{-1}(b)} x \phi (x) dx = - \int _{\varPhi ^{-1}(a)}^{\varPhi ^{-1}(b)} d[\phi (x)] = \phi [\varPhi ^{-1}(a)] - \phi [\varPhi ^{-1}(b)], \end{aligned}$$

which proves (35).
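Property (35) is easy to check numerically. The sketch below, which uses only Python's standard-library `statistics.NormalDist`, compares the closed form with a midpoint-rule approximation of the integral; the endpoints \(a = 0.1\), \(b = 0.7\) and the function names are arbitrary illustrative choices.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is phi, inv_cdf is Phi^{-1}


def closed_form_I(a, b):
    """Right-hand side of (35): phi[Phi^{-1}(a)] - phi[Phi^{-1}(b)]."""
    return STD.pdf(STD.inv_cdf(a)) - STD.pdf(STD.inv_cdf(b))


def midpoint_I(a, b, n=20_000):
    """Midpoint-rule approximation of the integral of Phi^{-1}(u) over [a, b]."""
    h = (b - a) / n
    return h * sum(STD.inv_cdf(a + (k + 0.5) * h) for k in range(n))


a, b = 0.1, 0.7  # must satisfy 0 < a < b < 1 for inv_cdf to be defined
print(closed_form_I(a, b), midpoint_I(a, b))
```

For a symmetric interval such as \([0.2, 0.8]\) both sides vanish, since \(\varPhi^{-1}\) is odd about \(u = 1/2\).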

To prove the first formula in (16), since \(\mathrm{E}(X_2)= 0\) and \(\mathrm{Var}(X_2) = 1\), it is sufficient to show that

$$\begin{aligned} I = \mathrm{E}[\varPhi^{-1}(U) F^{-1}(U)] = \sum_{i=0}^{\infty} \phi[\varPhi^{-1}(f_i)]. \end{aligned} \tag{37}$$

It follows from (34) that

$$\begin{aligned} I = \sum _{i=1}^{\infty } i~\int _{f_{i-1}}^{f_i} \varPhi ^{-1}(u) du , \end{aligned}$$

and, therefore, (35) implies:

$$\begin{aligned} I &= \sum_{i=1}^{\infty} i\,\{\phi[\varPhi^{-1}(f_{i-1})] - \phi[\varPhi^{-1}(f_i)]\} \\ &= \phi[\varPhi^{-1}(f_0)] - \phi[\varPhi^{-1}(f_1)] + 2\{\phi[\varPhi^{-1}(f_1)] - \phi[\varPhi^{-1}(f_2)]\} \\ &\quad + 3\{\phi[\varPhi^{-1}(f_2)] - \phi[\varPhi^{-1}(f_3)]\} + \dots = \sum_{i=0}^{\infty} \phi[\varPhi^{-1}(f_i)], \end{aligned} \tag{38}$$

which proves (37) and thus proves (16).

Formula (17) also follows from the equalities in (38) since \(f_k = 1\) when \(X_1\) has a finite support with \(P_i=0\) for all \(i>k\).
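The telescoping step in (38) can be illustrated with a short computation. The sketch below assumes, purely as an example, that \(X_1\) is Poisson with mean \(\lambda = 2\) and that the \(f_i\) are its cumulative probabilities; it evaluates both the weighted sum and the telescoped sum after truncating the support where the tail mass falls below a tolerance. Because \(\mathrm{E}(X_2) = 0\) and \(\mathrm{Var}(X_2) = 1\), dividing \(I\) by \(\sqrt{\mathrm{Var}(X_1)} = \sqrt{\lambda}\) gives the correlation under the comonotone coupling. The function names and the truncation rule are assumptions of this sketch, not part of the paper.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def phi_at_quantile(p):
    """phi(Phi^{-1}(p)), using the limiting value 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return STD.pdf(STD.inv_cdf(p))


def poisson_cdf(lam, tol=1e-12):
    """Cumulative probabilities f_0, f_1, ... until the tail mass is below tol."""
    term = math.exp(-lam)
    total = term
    cdf = [total]
    i = 0
    while 1.0 - total > tol and i < 10_000:
        i += 1
        term *= lam / i
        total += term
        cdf.append(min(total, 1.0))
    return cdf


lam = 2.0  # hypothetical Poisson mean
f = poisson_cdf(lam)
a = [phi_at_quantile(fi) for fi in f]

# First line of (38): sum over i >= 1 of i * (a_{i-1} - a_i).
weighted = sum(i * (a[i - 1] - a[i]) for i in range(1, len(a)))
# Last expression in (38): sum over i >= 0 of a_i.
telescoped = sum(a)

rho_max = telescoped / math.sqrt(lam)
print(weighted, telescoped, rho_max)
```

The two sums differ only by the truncation remainder, which is negligible once the tail probability is small.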

A.3. Proof of formula (22)

We present the expectation \(\mathrm{E}(X)\) as

$$\begin{aligned} \mathrm{E}(X) = \int_0^1 F^{-1}_X(u)\, du = \int_0^{1-p} F^{-1}_X(u)\, du + \int_{1-p}^1 F^{-1}_X(u)\, du. \end{aligned} \tag{39}$$

If \(Y_1 = F^{-1}_X(U), ~Y_2 = Bern^{-1}_p(U)\) and \(Y_2^- = Bern^{-1}_{1-p}(1-U)\), then it follows from (20) that

$$\begin{aligned} \mathrm{E}(Y_1 Y_2) = \int _{1-p}^1 F^{-1}_X(u) du, ~\text{ and }~~\mathrm{corr}(Y_1,Y_2) = \frac{\mathrm{E}(Y_1 Y_2) -p\mathrm{E}(X)}{\sqrt{p(1-p)\mathrm{Var}(X)}}. \end{aligned}$$

On the other hand, \(Y_2^- = I(1-U > p) = I(U < 1- p)\), so it follows from (39) that \(\mathrm{E}(Y_1 Y_2^-) = \int_0^{1-p} F^{-1}_X(u)\, du = \mathrm{E}(X) - \mathrm{E}(Y_1 Y_2)\). Since \(\mathrm{E}(Y_2^-) = 1-p\) and \(\mathrm{Var}(Y_2^-) = p(1-p)\), we consequently obtain

$$\begin{aligned} \mathrm{corr}(Y_1, Y_2^-) &= \frac{\mathrm{E}(X)-\mathrm{E}(Y_1 Y_2) - (1-p)\mathrm{E}(X)}{\sqrt{p(1-p)\mathrm{Var}(X)}} = \frac{-\mathrm{E}(Y_1 Y_2) + p\mathrm{E}(X)}{\sqrt{p(1-p)\mathrm{Var}(X)}} \\ &= -\mathrm{corr}(Y_1,Y_2), \end{aligned}$$

which proves (22).
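The symmetry can be seen in a small deterministic experiment. The sketch below takes \(X\) to be unit exponential and \(p = 0.3\) (both hypothetical choices), evaluates \(F_X^{-1}(u)\), the indicator \(I(u > 1-p)\) and the complementary indicator \(I(u < 1-p)\) on a fine midpoint grid of \(u\) values, and computes the two correlations. For this particular \(X\), \(\mathrm{E}(X) = \mathrm{Var}(X) = 1\) and \(\int_{1-p}^1 F_X^{-1}(u)\, du = p - p\ln p\), so the positive correlation equals \(-p \ln p / \sqrt{p(1-p)}\); this closed form is derived here for the example only.

```python
import math


def corr(xs, ys):
    """Pearson correlation of two equally weighted samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


p = 0.3          # hypothetical success probability of the Bernoulli endpoint
n = 200_000      # number of midpoint grid nodes standing in for U ~ Uniform(0, 1)
grid = [(k + 0.5) / n for k in range(n)]

y1 = [-math.log1p(-u) for u in grid]                  # F_X^{-1}(u), unit exponential
y2 = [1.0 if u > 1.0 - p else 0.0 for u in grid]      # I(u > 1 - p)
y2_minus = [1.0 if u < 1.0 - p else 0.0 for u in grid]  # I(u < 1 - p)

rho_plus = corr(y1, y2)
rho_minus = corr(y1, y2_minus)
closed = -p * math.log(p) / math.sqrt(p * (1.0 - p))
print(rho_plus, rho_minus, closed)
```

Because the two indicators sum to one at every grid node, the two correlations are exact negatives of each other up to rounding, whatever the choice of \(F_X\).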

A.4. Example 15. Code for computing matrix {\(\pi _{ij}\)} in (27) and implementing formula (28)

Matlab code for (27), (28) (the text after % is a comment)

[Figure a: Matlab code listing, supplied as an image in the source and not reproduced here]

R code for (27), (28)

[Figure b: R code listing, supplied as an image in the source and not reproduced here]
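Since formulas (27) and (28) and the original listings do not appear in this text, the following is only a stand-in sketch, not the authors' code. It uses the standard comonotone and antithetic couplings \(Y_1 = F^{-1}(U)\), \(Y_2 = G^{-1}(U)\) or \(G^{-1}(1-U)\) to build a joint probability matrix with given discrete marginals and extreme correlation, then computes that correlation from the matrix. Whether this coincides with (27), (28) is an assumption. The names `extreme_joint` and `correlation`, and the two Bernoulli marginals in the usage lines, are hypothetical.

```python
import math
from itertools import accumulate


def extreme_joint(pmf1, pmf2, maximal=True):
    """Joint pmf matrix pi[i][j] = P(Y1 = i, Y2 = j) with extreme correlation.

    F[i] and F[i + 1] hold the cumulative probabilities just below and at i,
    so Y1 = i corresponds to U in (F[i], F[i + 1]]; likewise for G and Y2.
    """
    F = [0.0] + list(accumulate(pmf1))
    G = [0.0] + list(accumulate(pmf2))
    F[-1] = G[-1] = 1.0  # guard against floating-point rounding
    pi = []
    for i in range(len(pmf1)):
        row = []
        for j in range(len(pmf2)):
            if maximal:
                # Y2 = G^{-1}(U): overlap of (F[i], F[i+1]] and (G[j], G[j+1]].
                lo, hi = max(F[i], G[j]), min(F[i + 1], G[j + 1])
            else:
                # Y2 = G^{-1}(1 - U): overlap with [1 - G[j+1], 1 - G[j]).
                lo, hi = max(F[i], 1.0 - G[j + 1]), min(F[i + 1], 1.0 - G[j])
            row.append(max(0.0, hi - lo))
        pi.append(row)
    return pi


def correlation(pi):
    """Pearson correlation of (Y1, Y2) with values 0, 1, 2, ... and joint pmf pi."""
    rows, cols = len(pi), len(pi[0])
    p1 = [sum(pi[i]) for i in range(rows)]
    p2 = [sum(pi[i][j] for i in range(rows)) for j in range(cols)]
    m1 = sum(i * p for i, p in enumerate(p1))
    m2 = sum(j * p for j, p in enumerate(p2))
    v1 = sum((i - m1) ** 2 * p for i, p in enumerate(p1))
    v2 = sum((j - m2) ** 2 * p for j, p in enumerate(p2))
    exy = sum(i * j * pi[i][j] for i in range(rows) for j in range(cols))
    return (exy - m1 * m2) / math.sqrt(v1 * v2)


# Usage with two hypothetical Bernoulli marginals, P(Y1 = 1) = 0.3, P(Y2 = 1) = 0.6.
pi_max = extreme_joint([0.7, 0.3], [0.4, 0.6], maximal=True)
pi_min = extreme_joint([0.7, 0.3], [0.4, 0.6], maximal=False)
print(correlation(pi_min), correlation(pi_max))
```

For two Bernoulli marginals the resulting range can be checked by hand: here \(P(Y_1 = 1, Y_2 = 1)\) is 0.3 under the maximal coupling and 0 under the minimal one, giving correlations \(\sqrt{0.12/0.42}\) and \(-\sqrt{0.18/0.28}\), a range well inside \([-1, +1]\).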

Cite this article

Leonov, S., Qaqish, B. Correlated endpoints: simulation, modeling, and extreme correlations. Stat Papers 61, 741–766 (2020). https://doi.org/10.1007/s00362-017-0960-2

  • DOI: https://doi.org/10.1007/s00362-017-0960-2
