Bayesian analysis of longitudinal studies with treatment by indication

Mozer, Reagan; Glickman, Mark E.

doi:10.1007/s10742-022-00295-7

Published: 15 December 2022

Volume 23, pages 468–491 (2023)

Health Services and Outcomes Research Methodology

Abstract

In a medical setting, observational studies commonly involve patients who initiate a particular treatment (e.g., medication therapy) and others who do not, and the goal is to draw causal inferences about the effect of treatment on a time-to-event outcome. A difficulty with such studies is that the notion of a treatment initiation time is not well-defined for the control group. In this paper, we propose a Bayesian approach to estimate treatment effects in longitudinal observational studies where treatment is given by indication and thereby the exact timing of treatment is only observed for treated units. We present a framework for conceptualizing an underlying randomized experiment in this setting based on separating the time of indication for treatment, which we model using a latent state-space process, from the mechanism that determines assignment to treatment versus control. Next, we develop a two-step inferential approach that uses Markov Chain Monte Carlo (MCMC) posterior sampling to (1) infer the unobserved indication times for units in the control group, and (2) estimate treatment effects based on inferential conclusions from Step 1. This approach allows us to incorporate uncertainty about the unobserved indication times which induces uncertainty in both the selection of the control group and the measurement of time-to-event outcomes for these controls. We demonstrate our approach to study the effects on mortality of inappropriately prescribing phosphodiesterase type 5 inhibitors (PDE5Is), a medication contraindicated for certain types of pulmonary hypertension, using data from the Veterans Affairs (VA) health care system.

Data availability

All data and replication materials for the simulation study described in Sect. 3.4 are available at https://github.com/reaganmozer/longbayes. However, all data generated and/or analyzed in Sect. 5 are constructed from confidential patient-level data, which cannot be made available due to restrictions set forth in the data use agreement signed by the authors.

Code availability

Pertinent source code related to the findings in Sect. 5 (without the data) is available from the authors upon reasonable request.

Funding

This work was supported in part by the United States Department of Veterans Affairs (VA) Health Services Research and Development (IIR 15-115).

Author information

Authors and Affiliations

Department of Mathematical Sciences, Bentley University, 175 Forest St., Waltham, MA, 02452, USA
Reagan Mozer
Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA, 02138, USA
Mark E. Glickman

Corresponding author

Correspondence to Reagan Mozer.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Posterior sampling

1.1 General sampling scheme

We employ a Gibbs sampler to draw posterior samples from the model described in Sect. 3. In each iteration of the Gibbs sampler, we draw the missing indication times T for units with $M=1$ from the conditional posterior predictive distribution of T given covariates X and the current draw of the parameter $\theta$. The completed indication times can then be used to classify untreated patients into distinct groups of true controls and ineligible controls based on eligibility S, where the true control group consists of patients with $M_i = 1$ and $S_i=1$. For the true controls, we can then calculate values for the potential outcomes Y(0) given the generated values of T. These values are then regarded as the observed potential outcomes, $Y^{obs}=(1-Z)Y(0) + ZY(1)$. We can then update the parameters $\theta _1, \theta _2$ and $\theta _3$ by drawing from the conditional posterior distribution with density function $p(\theta _1,\theta _2,\theta _3|Y^{obs},T,M,X)$.

Posterior inference on the causal effects of interest can be obtained by computing the values of the constructed estimator within each MCMC iteration and summarizing their distribution across the posterior sample. Thus, in each iteration, we can construct a dataset consisting of the observed indication times, the simulated indication times, and all observed potential outcomes, and use these completed data to calculate an estimate of the treatment effect. Alternatively, we could specify a joint distribution for the potential outcomes $Y = (Y(0), Y(1))$ that we could then use to impute the missing potential outcomes $Y^{mis}$ in each iteration by drawing from the conditional distribution with density function $p(Y^{mis}|Y^{obs}, T, X, \theta )$. Repeating this process over many such simulated datasets produces the approximate posterior distribution for all causal effects of interest. In the same way, posterior samples of $\theta$ can provide posterior estimates of the parameters that characterize the data-generating process; this is described in greater detail in Sect. 4.
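As an illustration of the summarizing step, the following minimal Python sketch takes one treatment effect estimate per retained MCMC iteration and reports a posterior mean with a 95% equal-tailed interval. The function name `summarize_draws` and the values in `tau_draws` are hypothetical and are not taken from the paper; the estimator that produces each draw is assumed to have been computed elsewhere from the completed data.

```python
import statistics


def summarize_draws(draws):
    """Posterior mean and 95% equal-tailed interval from per-iteration estimates."""
    # With n=40 the first and last cut points are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), cuts[0], cuts[-1]


# Hypothetical per-iteration treatment effect estimates (illustrative values only).
tau_draws = [0.12, 0.08, 0.15, 0.10, 0.09, 0.14, 0.11, 0.13, 0.07, 0.16]
post_mean, lower, upper = summarize_draws(tau_draws)
print(post_mean, lower, upper)
```

In practice the list would hold thousands of draws after burn-in, and the same function could be applied to any scalar function of the parameters.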

1.2 Full conditionals

For the model described in Sect. 4.1, we employ the Gibbs sampler as the posterior sampling strategy. Using a data augmentation approach, we also let $\Psi _{it}=\theta _{it}+{\varvec{X}}_{it}\varvec{\beta } + \nu _{it}$ where $\nu _{it}\sim N(0,1)$ such that the indication times $T_i$ can be represented as $T_i=\inf \{t\in [0,K]:\Psi _{it} >0 \}$. Note that the latent variables $\Psi _{it}$ are conditionally independent across units i and over t given $\theta _{it}$ and $\varvec{\beta }$.
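The data augmentation representation can be made concrete with a small Python sketch. It assumes discrete periods $t=1,\ldots ,K$ and treats the linear predictor ${\varvec{X}}_{it}\varvec{\beta }$ as a precomputed number per period; the function names `simulate_psi` and `first_crossing` are hypothetical and are not part of the paper's code.

```python
import random


def simulate_psi(theta, xbeta, rng):
    """Latent path Psi_t = theta_t + X_t * beta + nu_t with nu_t ~ N(0, 1)."""
    return [th + xb + rng.gauss(0.0, 1.0) for th, xb in zip(theta, xbeta)]


def first_crossing(psi):
    """Indication time: first period t (1-based) with Psi_t > 0, else None."""
    for t, value in enumerate(psi, start=1):
        if value > 0:
            return t
    return None  # no indication within the K observed periods


rng = random.Random(1)
psi_path = simulate_psi([0.0] * 5, [-0.5] * 5, rng)
print(first_crossing(psi_path))
print(first_crossing([-0.3, -1.2, 0.4, -0.1, 2.0]))
```

A unit whose latent path never exceeds zero has no indication time within the study window, which the sketch encodes as `None`.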

To begin, we set $j=0$ and draw initial values for the parameters $\Theta =(\rho , \varvec{\beta }, \delta _0, \delta _1)$ and the latent variables $\theta _{i,1:K}$ and $\Psi _{i,1:K}$ for all $i=1,\ldots ,N$. The latent variables $\Psi _{i,1:K}$ are then used to determine the initial values for the missing indication times $T_i$. For each iteration, we then proceed as follows:

(a) Draw the latent variables $\Psi _{it}$ from the full conditional distribution given below. Because $T_i$ is the first time at which $\Psi _{it}$ exceeds zero, this is a normal density truncated to $(-\infty , 0]$ when $T_i>t$, a normal density truncated to $(0,\infty )$ when $T_i=t$, and an unrestricted normal density when $T_i<t$.
$$\begin{aligned} p(\Psi _{1:K}|\cdot ) = \prod _{i=1}^N \prod _{t=1}^K 1(\Psi _{it}>0)^{1(T_i=t)} 1(\Psi _{it}\le 0)^{1(T_i > t)} \phi (\Psi _{it} - \theta _{it}-{\varvec{X}}_{it}\varvec{\beta }) \end{aligned}$$
where $\phi (\cdot )$ denotes the probability density function of the standard normal distribution.
(b) Since Eqs. (5) and (6) define a linear state space model, we sample $\theta _{i,1:K}$ using the Kalman filter. The forward conditional is given by:
$$\begin{aligned} p(\theta _{1:K}|\cdot ) = \prod _{i=1}^N\prod _{t=1}^K \phi (\theta _{i,t}-\rho \theta _{i,t-1}) \end{aligned}$$
where $\phi (\cdot )$ denotes the probability density function of the standard normal distribution.
(c) Next, draw the probability of assignment to treatment upon indication from the conditional distribution given by:
$$\begin{aligned} p(\pi |\cdot ) = \prod _{i=1}^N \prod _{t=1}^K \left( \pi _{it}^{Z_i} (1-\pi _{it})^{1-Z_i}\right) ^{1(T_i=t)} \end{aligned}$$
(d) Assuming a general multivariate normal prior for $\varvec{\beta }$ of the form $\varvec{\beta }\sim N(\varvec{\beta }_0,\Sigma _0)$, draw $\varvec{\beta }$ from the multivariate normal distribution $\varvec{\beta }|\cdot \sim N_p(\varvec{\beta }_1, \Sigma _1)$ where
$$\begin{aligned} \varvec{\beta }_1=\Sigma _1\left( \Sigma _0^{-1}\varvec{\beta }_0 + \sum _{i=1}^N\sum _{t=1}^K {\varvec{X}}_{it}(\Psi _{it}-\theta _{it})\right) , \end{aligned}$$
and
$$\begin{aligned} \Sigma _1 = \left( \Sigma _0^{-1}+\sum _{i=1}^N \sum _{t=1}^K {\varvec{X}}_{it}{\varvec{X}}_{it}^T\right) ^{-1}. \end{aligned}$$
(e) Draw the autocorrelation parameter $\rho$ from the full conditional distribution given by the truncated normal distribution
$$\begin{aligned} \rho | \cdot \sim N_{[-1,1]}\left( \frac{\sum _{i=1}^N\sum _{t=1}^K \theta _{i,t}\theta _{i,t-1}}{\sum _{i=1}^N\sum _{t=1}^K \theta _{i,t-1}^2}, \frac{1}{\sum _{i=1}^N\sum _{t=1}^K \theta _{i,t-1}^2}\right) . \end{aligned}$$
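The steps above can be sketched in Python. This is an illustration under stated assumptions and not the authors' implementation: it runs steps (a), (b), (d) and (e) for a single unit ($N=1$) with one scalar covariate, assumes discrete periods $t=1,\ldots ,K$, an initial state $\theta _0\sim N(0,1)$, and a $N(0,100)$ prior on $\beta$. Following the definition of $T_i$ as the first time $\Psi _{it}$ exceeds zero, $\Psi _{it}$ is constrained to be non-positive before $T_i$, positive at $T_i$, and unrestricted afterwards. Step (b) is written as forward filtering with backward sampling, one standard way of drawing a state path with the Kalman filter. Step (c) and the imputation of missing indication times are omitted, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for inverse-CDF draws


def rtruncnorm(mean, sd, lo, hi, rng):
    """Draw from N(mean, sd^2) restricted to [lo, hi] by inverting the CDF."""
    a = STD.cdf((lo - mean) / sd)
    b = STD.cdf((hi - mean) / sd)
    u = min(max(a + (b - a) * rng.random(), 1e-12), 1 - 1e-12)
    # Inverse-CDF sampling is numerically fragile far in the tails, so clamp.
    return min(max(mean + sd * STD.inv_cdf(u), lo), hi)


def draw_psi(theta, x, beta, T, rng):
    """Step (a): Psi_t for t = 1..K; index 0 of each list is unused or theta_0."""
    psi = [0.0]
    for t in range(1, len(theta)):
        m = theta[t] + x[t] * beta
        if t < T:      # not yet indicated: Psi_t <= 0
            psi.append(rtruncnorm(m, 1.0, -math.inf, 0.0, rng))
        elif t == T:   # first crossing: Psi_t > 0
            psi.append(rtruncnorm(m, 1.0, 0.0, math.inf, rng))
        else:          # after indication: unrestricted
            psi.append(rng.gauss(m, 1.0))
    return psi


def draw_theta(psi, x, beta, rho, rng, m0=0.0, c0=1.0):
    """Step (b): forward filter, backward sample theta_0..theta_K."""
    K = len(psi) - 1
    m, c, a, r = [m0], [c0], [None], [None]
    for t in range(1, K + 1):
        a.append(rho * m[t - 1])             # predicted mean
        r.append(rho ** 2 * c[t - 1] + 1.0)  # predicted variance
        gain = r[t] / (r[t] + 1.0)           # observation variance is 1
        m.append(a[t] + gain * (psi[t] - x[t] * beta - a[t]))
        c.append(r[t] * (1.0 - gain))
    theta = [0.0] * (K + 1)
    theta[K] = rng.gauss(m[K], math.sqrt(c[K]))
    for t in range(K - 1, -1, -1):
        b = rho * c[t] / r[t + 1]
        mean = m[t] + b * (theta[t + 1] - a[t + 1])
        theta[t] = rng.gauss(mean, math.sqrt(c[t] / r[t + 1]))
    return theta


def draw_beta(psi, theta, x, rng, beta0=0.0, var0=100.0):
    """Step (d) in the scalar case p = 1."""
    ts = range(1, len(psi))
    prec = 1.0 / var0 + sum(x[t] ** 2 for t in ts)
    num = beta0 / var0 + sum(x[t] * (psi[t] - theta[t]) for t in ts)
    return rng.gauss(num / prec, math.sqrt(1.0 / prec))


def draw_rho(theta, rng):
    """Step (e): normal full conditional truncated to [-1, 1]."""
    ts = range(1, len(theta))
    sxx = sum(theta[t - 1] ** 2 for t in ts)
    sxy = sum(theta[t] * theta[t - 1] for t in ts)
    return rtruncnorm(sxy / sxx, math.sqrt(1.0 / sxx), -1.0, 1.0, rng)


rng = random.Random(0)
K, T = 6, 4                                   # illustrative sizes only
x = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(K)]
theta, beta, rho = [0.0] * (K + 1), 0.0, 0.5
for _ in range(200):
    psi = draw_psi(theta, x, beta, T, rng)
    theta = draw_theta(psi, x, beta, rho, rng)
    beta = draw_beta(psi, theta, x, rng)
    rho = draw_rho(theta, rng)
```

With more than one unit, the sums in `draw_beta` and `draw_rho` would pool over all units, and with a vector of covariates step (d) would use a multivariate normal draw, typically through a linear algebra library.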

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mozer, R., Glickman, M.E. Bayesian analysis of longitudinal studies with treatment by indication. Health Serv Outcomes Res Method 23, 468–491 (2023). https://doi.org/10.1007/s10742-022-00295-7

Received: 19 January 2022
Accepted: 20 November 2022
Published: 15 December 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10742-022-00295-7
