Abstract
Modeling and simulation of correlated random variables are important for evaluating operating characteristics of experimental designs in various applications, of which clinical trials with multiple endpoints provide an important example. There exist efficient algorithms to address the problem of generating multivariate distributions with given marginals and correlation structure. For model fitting as well as for simulation, it is important to know the feasible range of pairwise correlations, which can be much narrower than the interval \([-\,1,+\,1]\). We provide closed-form expressions for extreme correlations for several classes of bivariate distributions that involve both discrete and continuous endpoints, as well as an algorithm for the construction of such distributions in the discrete case.
Acknowledgements
The authors wish to thank two anonymous referees for their constructive comments on an earlier version of the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
A.1. Proof of formulas (14), (15), Example 7
Let \(f_{-1}=0\) and note that the set \(\{f_i\}\) defined in (13) satisfies
Therefore,
If \(u\in (f_{i-1},f_i)\), then \(F^{-1}(u) \equiv i\) by definition of the inverse function. Thus, (33) implies that
Using the transformation of variables \(x=G^{-1}(u)\), and noting that \(du = g(x) dx\), one gets
which implies (14).
To prove (15), remark that
If \(x=G^{-1}(1-u)\), then \(du = -g(x) dx\) and
which proves (15). Note that \(f_i\) is a non-decreasing function of \(i\) and, thus, \(G^{-1}(1-f_i) \le G^{-1}(1-f_{i-1})\).
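As a numerical sanity check of the argument above, the following sketch evaluates the segment-wise sums that (14) and (15) appear to describe, namely \(\sum_i i \int_{G^{-1}(f_{i-1})}^{G^{-1}(f_i)} x\,g(x)\,dx\) and its antithetic counterpart with limits \(G^{-1}(1-f_i)\) and \(G^{-1}(1-f_{i-1})\), and compares them with a direct quadrature of \(\int_0^1 F^{-1}(u)\,G^{-1}(u)\,du\). The formulas themselves are not displayed in this section, so their exact form is inferred from the proof and should be read as an assumption. The choice of \(G\) as the Exp(1) distribution, the example probabilities, and all function names are hypothetical.

```python
import math


def exp_quantile(u):
    """G^{-1}(u) for the Exp(1) distribution (an illustrative choice of G)."""
    return math.inf if u >= 1.0 else -math.log1p(-u)


def exp_partial_mean(a, b):
    """Integral of x * exp(-x) over [a, b]; b may be infinite."""
    def antiderivative(x):
        return 0.0 if math.isinf(x) else -(x + 1.0) * math.exp(-x)
    return antiderivative(b) - antiderivative(a)


def cumulative(pmf):
    """Return [f_{-1}, f_0, ..., f_k] with f_{-1} = 0 and f_k forced to 1."""
    f = [0.0]
    for p in pmf:
        f.append(f[-1] + p)
    f[-1] = 1.0  # guard against rounding in the running sum
    return f


def extreme_cross_moments(pmf):
    """Segment-wise sums for the largest and smallest E(X1 * X2)."""
    f = cumulative(pmf)
    e_max = e_min = 0.0
    for i in range(len(pmf)):
        lo, hi = f[i], f[i + 1]  # f_{i-1} and f_i
        e_max += i * exp_partial_mean(exp_quantile(lo), exp_quantile(hi))
        e_min += i * exp_partial_mean(exp_quantile(1.0 - hi),
                                      exp_quantile(1.0 - lo))
    return e_max, e_min


def quadrature_cross_moment(pmf, antithetic=False, n=200_000):
    """Midpoint rule for the integral of F^{-1}(u) * G^{-1}(u) over (0, 1).

    With antithetic=True the second factor is G^{-1}(1 - u).
    """
    f = cumulative(pmf)[1:]
    total, i = 0.0, 0
    for k in range(n):
        u = (k + 0.5) / n
        while u > f[i]:  # F^{-1}(u) = i on (f_{i-1}, f_i)
            i += 1
        total += i * exp_quantile(1.0 - u if antithetic else u)
    return total / n


pmf = [0.2, 0.5, 0.3]  # hypothetical P(X1 = 0), P(X1 = 1), P(X1 = 2)
e_max, e_min = extreme_cross_moments(pmf)
mean1 = sum(i * p for i, p in enumerate(pmf))
sd1 = math.sqrt(sum(i * i * p for i, p in enumerate(pmf)) - mean1 ** 2)
# Exp(1) has mean 1 and variance 1, so the correlation bounds are:
rho_max = (e_max - mean1) / sd1
rho_min = (e_min - mean1) / sd1
print(rho_min, rho_max)
```

For an asymmetric \(G\) such as this one, the two bounds need not be symmetric about zero.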
A.2. Proof of formulas (16), (17), Example 8
To prove (16), we will need the following property of the standard normal distribution:
Indeed, make the transformation of variables, \(x = \varPhi ^{-1}(u)\), so that
Now it follows from (9) and (36) that
which proves (35).
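The displayed steps of this argument are not shown above. A short worked version, under the assumption that property (35) is the standard identity for the integral of the normal quantile function, might read as follows. Here \(\varphi\) denotes the standard normal density, a symbol introduced for this sketch, and the only facts used are \(du = \varphi(x)\,dx\) for \(x = \varPhi^{-1}(u)\) and \(\varphi'(x) = -x\,\varphi(x)\).

```latex
\int_a^b \varPhi^{-1}(u)\,du
  = \int_{\varPhi^{-1}(a)}^{\varPhi^{-1}(b)} x\,\varphi(x)\,dx
  = \Bigl[-\varphi(x)\Bigr]_{\varPhi^{-1}(a)}^{\varPhi^{-1}(b)}
  = \varphi\bigl(\varPhi^{-1}(a)\bigr) - \varphi\bigl(\varPhi^{-1}(b)\bigr),
  \qquad 0 \le a < b \le 1 .
```

The boundary terms vanish at \(a = 0\) and \(b = 1\), since \(\varphi(x) \rightarrow 0\) as \(x \rightarrow \pm\infty\).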
To prove the first formula in (16), since \(\mathrm{E}(X_2)= 0\) and \(\mathrm{Var}(X_2) = 1\), it is sufficient to show that
It follows from (34) that
and, therefore, (35) implies:
which proves (37) and thus proves (16).
Formula (17) also follows from the equalities in (38) since \(f_k = 1\) when \(X_1\) has a finite support with \(P_i=0\) for all \(i>k\).
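The closed form can be checked numerically. Applying the integral identity for \(\varPhi^{-1}\) to each segment \((f_{i-1}, f_i)\) gives \(\sum_i i\,[\varphi(\varPhi^{-1}(f_{i-1})) - \varphi(\varPhi^{-1}(f_i))]\), which collapses by summation by parts to \(\sum_i \varphi(\varPhi^{-1}(f_i))\). Dividing by the standard deviation of \(X_1\) then gives the maximal correlation. Formulas (16) and (17) are not displayed in this section, so treating them as this sum is an assumption, and the function names below are hypothetical. The sketch computes both forms so that they can be compared.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, variance 1


def phi_at_quantile(u):
    """phi(Phi^{-1}(u)), using the limiting value 0 at u = 0 and u = 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(u))


def std_dev(pmf):
    """Standard deviation of X1 with P(X1 = i) = pmf[i], i = 0, ..., k."""
    mean = sum(i * p for i, p in enumerate(pmf))
    return math.sqrt(sum(i * i * p for i, p in enumerate(pmf)) - mean ** 2)


def max_corr_closed_form(pmf):
    """Sum of phi(Phi^{-1}(f_i)) over i < k, divided by sd(X1)."""
    f, total = 0.0, 0.0
    for p in pmf[:-1]:  # the last term has f_k = 1 and contributes 0
        f += p
        total += phi_at_quantile(f)
    return total / std_dev(pmf)


def max_corr_segment_sum(pmf):
    """Same quantity from the segment-wise sum, before summation by parts."""
    f_prev, total = 0.0, 0.0
    last = len(pmf) - 1
    for i, p in enumerate(pmf):
        f_curr = 1.0 if i == last else f_prev + p
        total += i * (phi_at_quantile(f_prev) - phi_at_quantile(f_curr))
        f_prev = f_curr
    return total / std_dev(pmf)


binomial_pmf = [0.343, 0.441, 0.189, 0.027]  # Binomial(3, 0.3), for illustration
print(max_corr_closed_form(binomial_pmf), max_corr_segment_sum(binomial_pmf))
```

Because the standard normal distribution is symmetric, the minimal correlation in this setting is the negative of the value returned here. For a Bernoulli(1/2) variable the sum reduces to \(\varphi(0)/0.5 = \sqrt{2/\pi}\).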
A.3. Proof of formula (22)
We present the expectation \(\mathrm{E}(X)\) as
If \(Y_1 = F^{-1}_X(U)\), \(Y_2 = \mathrm{Bern}^{-1}_p(U)\) and \(Y_2^- = \mathrm{Bern}^{-1}_{1-p}(1-U)\), then it follows from (20) that
On the other hand, \(Y_2^- = I(1-U > p) = I(U < 1- p)\) and it follows from (39) that \(\mathrm{E}(Y_1 Y_2^-) = \mathrm{E}(X) - \mathrm{E}(Y_1 Y_2)\) and, consequently,
which proves (22).
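The last step of the proof rests on the fact that the indicator \(I(U < 1-p)\) is the complement of \(I(U > 1-p)\), so the product of \(F^{-1}_X(U)\) with one indicator equals \(F^{-1}_X(U)\) minus its product with the other. The seeded simulation below is a minimal sketch of that relationship, not a reproduction of formula (22), which is not displayed here. The choice of \(F_X\) as the Exp(1) distribution, the value \(p = 0.3\), and the function names are assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def coupled_correlations(p, n=20_000, seed=1):
    """Correlate F_X^{-1}(U) with I(U > 1 - p) and with I(1 - U > p)."""
    rng = random.Random(seed)
    us = [rng.random() for _ in range(n)]
    y1 = [-math.log1p(-u) for u in us]                      # Exp(1) quantile
    y2 = [1.0 if u > 1.0 - p else 0.0 for u in us]          # Bernoulli(p)
    y2_minus = [1.0 if 1.0 - u > p else 0.0 for u in us]    # Bernoulli(1 - p)
    return pearson(y1, y2), pearson(y1, y2_minus)


r_plus, r_minus = coupled_correlations(0.3)
print(r_plus, r_minus)
```

Since the two indicators sum to one for every draw, the two sample correlations are negatives of each other up to rounding. For the Exp(1) case the population value of the first one is \(-p \ln p / \sqrt{p(1-p)}\).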
A.4. Example 15. Code for computing matrix \(\{\pi_{ij}\}\) in (27) and implementing formula (28)
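The listing itself is not included in this section. As a rough stand-in, the sketch below builds a joint probability matrix for two discrete variables with given marginals by the standard quantile coupling, \(\pi_{ij} = \max\{0, \min(F_i, G_j) - \max(F_{i-1}, G_{j-1})\}\), and computes the resulting Pearson correlation. This is an assumption about the general approach, not the authors' code: formulas (27) and (28) are not shown here, the variables are assumed to take the values \(0, 1, \ldots\), and all function names are hypothetical.

```python
import math


def cumulative(pmf):
    """Cumulative probabilities, with the last value forced to 1."""
    out, total = [], 0.0
    for p in pmf:
        total += p
        out.append(total)
    out[-1] = 1.0
    return out


def comonotone_joint(pmf_x, pmf_y):
    """Joint matrix of (F^{-1}(U), G^{-1}(U)): the maximal-correlation coupling."""
    fx, fy = cumulative(pmf_x), cumulative(pmf_y)
    joint = []
    for i in range(len(pmf_x)):
        row = []
        for j in range(len(pmf_y)):
            lo = max(fx[i - 1] if i else 0.0, fy[j - 1] if j else 0.0)
            hi = min(fx[i], fy[j])
            row.append(max(0.0, hi - lo))  # overlap of the two u-intervals
        joint.append(row)
    return joint


def countermonotone_joint(pmf_x, pmf_y):
    """Joint matrix of (F^{-1}(U), G^{-1}(1 - U)): the minimal-correlation coupling."""
    reversed_joint = comonotone_joint(pmf_x, pmf_y[::-1])
    return [row[::-1] for row in reversed_joint]  # restore the order of Y


def correlation(joint):
    """Pearson correlation of (X, Y) with P(X = i, Y = j) = joint[i][j]."""
    ex = sum(i * sum(row) for i, row in enumerate(joint))
    exx = sum(i * i * sum(row) for i, row in enumerate(joint))
    ey = sum(j * p for row in joint for j, p in enumerate(row))
    eyy = sum(j * j * p for row in joint for j, p in enumerate(row))
    exy = sum(i * j * p for i, row in enumerate(joint) for j, p in enumerate(row))
    return (exy - ex * ey) / math.sqrt((exx - ex ** 2) * (eyy - ey ** 2))


px, py = [0.7, 0.3], [0.4, 0.6]  # Bernoulli(0.3) and Bernoulli(0.6), for illustration
print(correlation(countermonotone_joint(px, py)),
      correlation(comonotone_joint(px, py)))
```

For two Bernoulli variables with success probabilities \(p\) and \(q\), these couplings give \(\mathrm{E}(XY) = \min(p, q)\) and \(\mathrm{E}(XY) = \max(0, p + q - 1)\), respectively, which provides a simple check of the construction.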
Cite this article
Leonov, S., Qaqish, B. Correlated endpoints: simulation, modeling, and extreme correlations. Stat Papers 61, 741–766 (2020). https://doi.org/10.1007/s00362-017-0960-2
DOI: https://doi.org/10.1007/s00362-017-0960-2
Keywords
- Correlated endpoints
- Marginal distribution
- Maximal correlation
- Extreme correlation
- Pearson correlation
- Spearman correlation