Abstract
Multivariate matched proportions (MMP) data appear in a variety of contexts, including post-market surveillance of adverse events in pharmaceuticals, disease classification, and agreement between care providers. Such data consist of multiple sets of paired binary measurements taken on the same subject. While recent work proposes methods to address the complexities of MMP data, the issue of sparse response, where no or very few “yes” responses are recorded for one or more sets, remains unaddressed. In existing methods, the presence of sparse response sets results in underestimation of variance components, loss of coverage, and lowered power. Bayesian methods, which have not previously been considered for MMP data, provide a useful framework when sparse responses are present. In particular, the Bayesian probit model in combination with mean model prior specifications provides an elegant solution to the problem of variance underestimation. We examine a multivariate probit-based approach using hierarchical horseshoe-like priors along with a Bayesian functional principal component analysis (FPCA) to model the latent covariance. We show that our approach performs well on MMP data with sparse responses and outperforms existing methods. In a re-examination of a study on the system of care (SOC) framework for children with mental and behavioral disorders, we are able to provide a more complete picture of the relationships in the data. Our analysis provides additional insights into the functioning of the SOC that a previous univariate analysis missed.
References
Knutson KH, Meyer MJ, Thakrar N, Stein BD (2018) Care coordination for youth with mental health disorders in primary care. Clin Pediatr 57:5–10. https://doi.org/10.1177/0009922817733740
Klingenberg B, Agresti A (2006) Multivariate extension of McNemar’s test. Biometrics 62:921–928. https://doi.org/10.1111/j.1541-0420.2006.00525.x
McNemar Q (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12:153–157. https://doi.org/10.1007/BF02295996
Consonni G, La Rocca L (2008) Tests based on intrinsic priors for the equality of two correlated proportions. J Am Stat Assoc 103:1260–1269. https://doi.org/10.1198/01621450800000043
Saeki H, Tango T, Wang J (2017) Statistical inference for noninferiority of difference in proportions of clustered matched-pair data from multiple raters. J Biopharm Stat 27:70–83. https://doi.org/10.1080/10543406.2016.1148709
Westfall PH, Troendle JF, Pennello G (2010) Multiple McNemar tests. Biometrics 66:1185–1191. https://doi.org/10.1111/j.1541-0420.2010.01408.x
Xu J, Yu M (2013) Sample size determination and re-estimation for matched pair designs with multiple binary endpoints. Biom J 55:430–443. https://doi.org/10.1002/bimj.201100231
Lui K-J, Chang K-C (2013) Testing and estimation of proportion (or risk) ratio under the matched-pair design with multiple binary endpoints. Biom J 55:603–616. https://doi.org/10.1002/bimj.201200224
Lui K-J, Chang K-C (2016) Notes on testing noninferiority in multivariate binary data under the matched-pair design. Stat Methods Med Res 25:1272–1289. https://doi.org/10.1177/0962280213477022
Cochran WG (1950) The comparison of percentages in matched samples. Biometrika 37:256–266. https://doi.org/10.2307/2332378
Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719–748. https://doi.org/10.1093/jnci/22.4.719
Jiang Y, Xu J (2017) A comparative study of matched pair designs with two binary endpoints. Stat Methods Med Res 26:2526–2542. https://doi.org/10.1177/0962280215601136
Agresti A (2013) Categorical data analysis, 3rd edn. Wiley, Hoboken
Altham PME (1971) The analysis of matched proportions. Biometrika 58:561–576. https://doi.org/10.2307/2334391
Broemeling LD, Gregurich MA (1996) A Bayesian alternative to the analysis of matched categorical responses. Commun Stat 25:1429–1445. https://doi.org/10.1080/03610929608831777
Ghosh M, Chen M-H, Ghosh A, Agresti A (2000) Hierarchical Bayesian analysis of binary matched pairs data. Stat Sin 10:647–657
Albert J, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88:669–679. https://doi.org/10.1080/01621459.1993.10476321
Albert J, Chib S (1995) Bayesian residual analysis for binary response models. Biometrika 82:747–759. https://doi.org/10.1093/biomet/82.4.747
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge
Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:513–533. https://doi.org/10.1214/06-BA117A
Carvalho CM, Polson NG, Scott JG (2010) The horseshoe estimator for sparse signals. Biometrika 97:465–480. https://doi.org/10.1093/biomet/asq017
Van der Linde A (2008) Variational Bayesian functional PCA. Comput Stat Data Anal 53:517–533. https://doi.org/10.1016/j.csda.2008.09.015
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. Chapman & Hall/CRC, Boca Raton
Chib S, Greenberg E (1998) Analysis of multivariate probit models. Biometrika 85:347–361. https://doi.org/10.1093/biomet/85.2.347
Liu C (2001) Discussion. J Comput Graph Stat 10:75–81. https://doi.org/10.1198/10618600152418746
Zhang X, Boscardin WJ, Belin TR (2006) Sampling correlation matrices in Bayesian models with correlated latent variables. J Comput Graph Stat 15:880–896. https://doi.org/10.1198/106186006X160050
Webb EL, Forster JJ (2008) Bayesian model determination for multivariate ordinal and binary data. Comput Stat Data Anal 52:2632–2649. https://doi.org/10.1016/j.csda.2007.09.008
Goldsmith J, Kitago T (2016) Assessing systematic effects of stroke on motor control using hierarchical function-on-scalar regression. J R Stat Soc Ser C 65:215–236. https://doi.org/10.1111/rssc.12115
Meyer MJ, Morris JS, Gazes RP, Coull BA (2022) Ordinal probit functional outcome regression with application to computer-use behavior in rhesus monkeys. Ann Appl Stat 16:537–550. https://doi.org/10.1214/21-AOAS1513
Gupta AK, Nagar DK (2000) Matrix variate distributions, 2nd edn. Chapman & Hall/CRC, Boca Raton
Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11:89–121. https://doi.org/10.1214/ss/1038425655
Polson NG, Scott JG (2012) On the half-Cauchy prior for a global scale parameter. Bayesian Anal 7:887–902. https://doi.org/10.1214/12-BA730
Wand MP, Ormerod JT, Padoan SA, Frühwirth R (2011) Mean field variational Bayes for elaborate distributions. Bayesian Anal 6:847–900. https://doi.org/10.1214/11-BA631
Agresti A, Coull BA (1998) Approximate is better than ‘exact’ for interval estimation of binomial proportions. Am Stat 52:119–126. https://doi.org/10.1080/00031305.1998.10480550
Agresti A, Caffo B (2000) Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Am Stat 54:280–288. https://doi.org/10.1080/00031305.2000.10474560
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457–511. https://doi.org/10.1214/ss/1177011136
Costello EJ, He J-P, Sampson NA, Kessler RC, Merikangas KR (2014) Services for adolescents with psychiatric disorders: 12-month data from the National Comorbidity Survey-Adolescent. Psychiatr Serv 65:359–366. https://doi.org/10.1176/appi.ps.201100518
Acknowledgements
Partial funding for this work was provided by internal Georgetown University Summer Academic Research Grants.
Ethics declarations
Conflict of interest
The authors have no potential or actual conflicts of interest to declare. No additional data were collected for this research. The original study was retrospective in nature and was approved by the Boston University Institutional Review Board [1].
About this article
Cite this article
Meyer, M.J., Cheng, H. & Knutson, K.H. Bayesian Analysis of Multivariate Matched Proportions with Sparse Response. Stat Biosci 15, 490–509 (2023). https://doi.org/10.1007/s12561-023-09368-8
DOI: https://doi.org/10.1007/s12561-023-09368-8