Bayesian pattern-mixture models for dropout and intermittently missing data in longitudinal data analysis

Blozis, Shelley A.

doi:10.3758/s13428-023-02128-y

Open access
Published: 23 May 2023

Behavior Research Methods, Volume 56, pages 1953–1967 (2024)

Shelley A. Blozis (ORCID: orcid.org/0000-0002-8272-3258)

Abstract

Valid inference can be drawn from a random-effects model for repeated measures that are incomplete if whether the data are missing or not, known as missingness, is independent of the missing data. Data that are missing completely at random or missing at random are two data types for which missingness is ignorable. Given ignorable missingness, statistical inference can proceed without addressing the source of the missing data in the model. If the missingness is not ignorable, however, recommendations are to fit multiple models that represent different plausible explanations of the missing data. A popular choice in methods for evaluating nonignorable missingness is a random-effects pattern-mixture model that extends a random-effects model to include one or more between-subjects variables that represent fixed patterns of missing data. Generally straightforward to implement, a fixed pattern-mixture model is one among several options for assessing nonignorable missingness, and when it is used as the sole model to address nonignorable missingness, understanding the impact of missingness is greatly limited. This paper considers alternatives to a fixed pattern-mixture model for nonignorable missingness that are generally straightforward to fit and encourage researchers to give greater attention to the possible impact of nonignorable missingness in longitudinal data analysis. Patterns of both monotonic and non-monotonic (intermittently) missing data are addressed. Empirical longitudinal psychiatric data are used to illustrate the models. A small Monte Carlo data simulation study is presented to help illustrate the utility of such methods.

Psychological and behavioural data are gathered at multiple points in time to study how variables change or develop. Despite efforts to obtain complete data in a longitudinal study, missing data can sometimes be unavoidable. Further, measures can be taken at different times for different subjects, making the data analysis complex. Because random-effects models allow responses to be observed at different points in time between subjects, they naturally handle missing response data. If some values are missing, valid inference depends on whether the missingness (i.e., whether or not the data are missing) is independent of the missing response (Laird, 1988). If independent, the missingness is ignorable. Missingness is not ignorable, however, if it is related to the missing data, even after conditioning on model covariates (Little & Rubin, 2002).

A potentially serious problem with nonignorable missingness is that model inference can be biased, making it essential to address the missingness (Little & Rubin, 2002). For example, if the data for participants who drop from a study tend to differ from those who completed the study, then accounting for subject attrition is informative when modelling the longitudinal data. A problem in evaluating the mechanism giving rise to missing data, however, is that any model applied to empirical data is sensitive to unverifiable assumptions. Indeed, when drawing inference from a random-effects model, it is difficult to be certain whether the missingness is ignorable because the analyst has access to only the observed data, whereas ignorability is a statement about the missing data. Further, the fit of a given model to data that are not complete is based on how well the model fits the observed data and not the unobserved data, creating a challenge in efforts to assess possible nonignorable missingness. For these reasons, recommendations include fitting multiple models that represent plausible explanations of the missing data in a given application (Molenberghs & Kenward, 2007). Importantly, the fit of any model to the unobserved data is not testable, given that only observed data are available for analysis. Inference may proceed by making comparisons between models that differ in their assumptions about the missing data process, while assessing the sensitivity of inference about the longitudinal process under different but plausible mechanisms for the missing data.

Missingness in longitudinal data

Several approaches are available for addressing missingness in longitudinal data, and before describing some of the major frameworks for this, it is useful to introduce notation for the data model of a longitudinal outcome and a separate model for non-response. Consider longitudinal data for a normal outcome variable Y_i = (Y_1i, …, Y_ni)^′, where measures for all i = 1, …, N individuals are planned for n occasions. Interest in Y_i often concerns how the response depends on time and possibly covariates that may vary by occasion or the individual, and thus, a data model is generated based on these considerations. Letting X_i be an n × p matrix that contains study design information (e.g., measures of time when Y_i was observed and covariates), the multivariate density of Y_i is conditional on X_i and a set of unknown parameters γ_y that link Y_i to X_i: f(Y_i| X_i, γ_y). Primary interest generally lies in the inference about the elements contained in γ_y. In a longitudinal study, it is common for measures of the outcome variable to have patterns of incomplete data that vary between individuals, and so a separate model for non-response can be specified to clarify those patterns. Let R_i = (R_1i, …, R_ni)^′ be a set of variables for individual i that indicates missingness in the outcome variable at each occasion t, t = 1, …, n, where R_ti = 1 if Y_ti is observed, and R_ti = 0 if Y_ti is missing. In a given problem, several factors could affect non-response, including the outcome Y_i and covariates. Similar to the data model for Y_i, a multivariate density of R_i is defined: f(R_i| Y_i, X_i, γ_r), where γ_r is a set of unknown parameters that links indicators of non-response to the longitudinal response and covariates. Assuming incomplete longitudinal data, let Y_i now be a full data vector that comprises a set of observed values Y_oi and missing values Y_mi, where the number of missing and observed values can vary between individuals.
Taken together, the joint density of Y_i and R_i is

$$f\left({\boldsymbol{Y}}_i,{\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{\gamma}}_{\textrm{r}}\right).$$

(1)

Rubin (1976) provided a framework for three types of missing data mechanisms, namely missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Under MCAR, the missingness is independent of the observed and missing values of Y_i. Under MAR, the missingness is dependent on Y_oi but is independent of Y_mi. Under MNAR, the missingness is dependent on Y_mi, whether or not it is dependent on Y_oi. These mechanisms can be understood by factorization of the joint density in (1). To that end, factorizations of the joint density based on three major modelling frameworks for missing data are reviewed first before returning to add further clarification to the three missing data mechanisms.

Modelling frameworks for missing data

Three major modelling frameworks for missing data are the selection model, pattern-mixture model and shared parameter model, each distinguished by their factorization of the joint density in (1). For the selection model (cf. Heckman, 1976),

$$f\left({\boldsymbol{Y}}_i,{\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{\gamma}}_{\textrm{r}}\right)=f\left({\boldsymbol{Y}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}}\right)f\left({\boldsymbol{R}}_i|{\boldsymbol{Y}}_i,{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{r}}\right),$$

where the first factor is the marginal density of the longitudinal process that depends on covariates, and the second is the conditional density of the missingness process given the longitudinal response and covariates. For the pattern-mixture model (cf. Little, 1993, 1994, 1995),

$$f\left({\boldsymbol{Y}}_i,{\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{\gamma}}_{\textrm{r}}\right)=f\left({\boldsymbol{Y}}_i|{\boldsymbol{R}}_i,{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}}\right)f\left({\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{r}}\right),$$

where the first factor is the conditional density of the longitudinal outcome given the indicators of missingness and covariates. The second factor is the marginal density of the missingness, which depends on covariates but not on the longitudinal response. For the shared-parameter model (cf. Wu & Carroll, 1988; Wu & Bailey, 1988, 1989),

$$f\left({\boldsymbol{Y}}_i,{\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{\gamma}}_{\textrm{r}},{\boldsymbol{b}}_i\right)=f\left({\boldsymbol{Y}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{b}}_i\right)f\left({\boldsymbol{R}}_i|{\boldsymbol{Y}}_i,{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{r}},{\boldsymbol{b}}_i\right),$$

where the first factor is the density of Y_i that depends on X_i and random effect b_i, and the second is the density of the missingness process that is conditional on Y_i, X_i and random effect b_i. Random effects contained in b_i vary by individual, such as a random intercept or random slope of a linear growth model for Y_i. The factorization of the shared-parameter model follows that of the selection model, with the addition that both factors share the random effect b_i. For example, a data model based on a random-effects growth model that includes a random intercept and random slope could specify that the missingness also depends on these two random effects.

Returning to Rubin’s (1976) framework for the three missing data mechanisms, the mechanisms can be clarified using the selection model framework, though it is noted that the missing data mechanisms are not dependent on a particular framework. Under MCAR, the joint density of Y_i and R_i can be specified as

$$f\left({\boldsymbol{Y}}_i,{\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{\gamma}}_{\textrm{r}}\right)=f\left({\boldsymbol{Y}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}}\right)f\left({\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{r}}\right).$$

The implication of MCAR is that valid inference from the data model can be made independent of the missing data process. Under MAR, the full data vector Y_i is partitioned into its observed Y_oi and missing Y_mi components, and then the joint density is

$$f\left({\boldsymbol{Y}}_i,{\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{\gamma}}_{\textrm{r}}\right)=f\left({\boldsymbol{Y}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}}\right)f\left({\boldsymbol{R}}_i|{\boldsymbol{Y}}_{\textrm{o}i},{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{r}}\right),$$

such that the missing data process depends on observed values of the outcome variable but not those that are missing. The implication of MAR is that valid inference of the longitudinal process can be made independent of the missing data process provided that all available data are analysed. Finally, under MNAR, the joint density is

$$f\left({\boldsymbol{Y}}_i,{\boldsymbol{R}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}},{\boldsymbol{\gamma}}_{\textrm{r}}\right)=f\left({\boldsymbol{Y}}_i|{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{y}}\right)f\left({\boldsymbol{R}}_i|{\boldsymbol{Y}}_i,{\boldsymbol{X}}_i,{\boldsymbol{\gamma}}_{\textrm{r}}\right),$$

such that the missing data process depends on the full data vector that includes observed and missing values of the outcome variable. The implication of MNAR is that valid inference of the longitudinal process cannot be made independent of the missing data process. Herein lies the problem of MNAR in that the observed data alone are not enough to inform the analyst about the missingness. It is therefore up to the analyst to carefully consider possible mechanisms that may account for missing data in a given problem and to then proceed with an understanding that inferences under these assumptions depend on the observed, and not the missing, data.
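The practical difference between the three mechanisms can be illustrated with a small simulation. The sketch below is not part of the article's analysis: the two-occasion design, the correlation of 0.6, the logistic missingness models and all function and variable names are assumptions chosen only for illustration. The second occasion is deleted under each mechanism, and its mean (true value 0) is estimated from complete cases and from a regression on the always-observed first occasion.

```python
import numpy as np


def inv_logit(x):
    """Logistic function mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate_mechanisms(n=20000, seed=1):
    """Delete the second of two correlated occasions under MCAR, MAR and MNAR.

    Returns, per mechanism, the complete-case mean of y2 and a
    regression-adjusted mean that uses y1 for every subject.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.6], [0.6, 1.0]])  # illustrative values only
    y = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y1, y2 = y[:, 0], y[:, 1]

    # Probability that y2 is observed (R = 1) under each mechanism
    p_observed = {
        "MCAR": np.full(n, 0.7),            # unrelated to y1 and y2
        "MAR": inv_logit(1.0 - 1.5 * y1),   # depends on the observed y1 only
        "MNAR": inv_logit(1.0 - 1.5 * y2),  # depends on the possibly missing y2
    }

    results = {}
    for name, p in p_observed.items():
        r2 = rng.random(n) < p
        complete_case = y2[r2].mean()
        # Fit y2 on y1 among observed cases, then predict for all subjects
        slope, intercept = np.polyfit(y1[r2], y2[r2], 1)
        adjusted = (intercept + slope * y1).mean()
        results[name] = (complete_case, adjusted)
    return results


results = simulate_mechanisms()
for name, (cc, adj) in results.items():
    print(f"{name}: complete-case = {cc:.3f}, regression-adjusted = {adj:.3f}")
```

Under these assumptions, both estimates are close to zero under MCAR, only the estimate that uses all available data is close to zero under MAR, and both are biased under MNAR, which mirrors the implications stated above.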

Pattern-mixture models

Among the major frameworks for addressing nonignorable missingness in longitudinal data is a random-effects pattern-mixture model in which one or more between-subject indicator variables are created to represent fixed patterns of missing data and individuals are classified according to these patterns (Hedeker & Gibbons, 1997; Little, 1995; Molenberghs & Kenward, 2007). Under the pattern-mixture model factorization of the joint density between the outcome variable and indicators of missingness, the longitudinal outcome is dependent on the missingness. Thus, under this framework, the longitudinal response is conditioned on patterns that describe when the longitudinal outcome is observed and when it is missing. Specifically, between-subject indicator variables are created to represent patterns of missing data. For example, an indicator variable could represent whether or not a subject has complete data versus any pattern of missing data. In a random-effects pattern-mixture model, pattern effects are fixed. Fixed pattern effects are included in the model for the longitudinal response with the assumption that, conditional on the pattern, the missingness is ignorable. Pattern indicators can also be used to predict coefficients of a growth model, such as a random slope, similar to how other subject-level attributes are used to predict characteristics of change in a longitudinal response. Again, conditional on the pattern of missing data, the missingness is assumed to be ignorable. Random-effects pattern-mixture models are generally straightforward to estimate using computer software designed to estimate random-effects models and have been shown to be effective in addressing nonignorable missing data in longitudinal research (Fitzmaurice et al., 2008; Molenberghs & Kenward, 2007).

The random-effects pattern-mixture model is widely applied in the behavioural sciences, possibly due to its ease of application. For instance, in a review of articles that cited Hedeker and Gibbons’ (1997) article on applications of this model, about 200 peer-reviewed articles reported an application of a random-effects pattern-mixture model to address data that were possibly MNAR. Of those articles, nearly all applied a single model that represented the missingness by a single variable that denoted each subject’s completion status (e.g., a binary indicator of whether or not a study participant completed the planned assessments or a variable equal to the number of assessments completed), and no alternative models to address the missing data were considered. This practice runs counter, however, to recommendations that researchers consider multiple models that represent plausible explanations about the missing data. Thus, the purpose of this paper is to consider extensions of the random-effects pattern-mixture model and illustrate their applications in assessing the impact of missing data on the statistical inference of a random-effects model for longitudinal data.

Illness severity in a sample of patients with schizophrenia

Before describing models that address nonignorable missingness, a longitudinal study is described that motivates the practice of relying on multiple models that differ in their assumptions about the missing data process and the use of a sensitivity analysis to assess the impact of nonignorable missingness under different mechanisms. The data are first analysed by fitting different growth models assuming ignorable missingness. Using the best-fitting model among those considered, models that make different assumptions about the missing data process are applied and compared.

The longitudinal study was designed to examine the effects of psychiatric medications in the treatment of mental illness in a sample of patients with schizophrenia. The National Institute of Mental Health Schizophrenia Collaborative Study was a nationwide controlled study of a psychopharmacological treatment (phenothiazine treatment) in acute schizophrenia (National Institute of Mental Health, Psychopharmacology Service Center Collaborative Study Group, 1964; National Institute of Mental Health, Psychopharmacology Research Branch Collaborative Study Group, 1967). Data were collected from nine public, private and university hospitals. Within hospitals, newly admitted patients diagnosed with schizophrenia and who met the study criteria were randomly assigned to one of four study medications, including a placebo, using a double-blind assignment process. Patients were first stratified according to sex, and within each sex, were randomly assigned to a drug treatment.

Data for the 437 patients reported in Fitzmaurice et al. (2011) and Hedeker and Gibbons (1997) are studied here. A global rating of illness severity was measured by Item 79 of the Inpatient Multidimensional Psychiatric Scale (IMPS) (Lorr & Klett, 1966) and assessed using a 7-point ordinal scale. The response scale had values denoting the severity of illness as: 1 = normal, not at all ill; 2 = borderline mentally ill; 3 = mildly ill; 4 = moderately ill; 5 = markedly ill; 6 = severely ill and 7 = among the most extremely ill. Observed ratings, which include non-integer values falling between values of the measurement scale, were obtained beginning with a baseline assessment and follow-ups spanning up to 6 weeks thereafter. Patients who received any psychiatric drug were combined into one group because there were no detectable differences in illness ratings between these groups (see Hedeker & Gibbons, 1997). The score is analysed here as a continuous variable. Table 1 provides sample descriptives of illness scores, separately for patients assigned to receive the placebo or a drug, for week = baseline, 1, 2, …, 6. Figure 1 displays observed trajectories of nine individuals from each of the two patient groups. Displays of all cases are in Hedeker and Gibbons (1997). Individual differences in the observed trajectories are evident, as is the nonlinearity in the form of change for some patients.

Table 1 Descriptive statistics for mental illness scores (n = 437)

Patterns of missing data

According to the study’s description, the plan was to obtain illness measures at baseline and 1, 3 and 6 weeks following baseline. For a small number of patients (2–5% of the total subject count), scores were also obtained at weeks 2, 4 and 5. Given the data collection plan, it seemed reasonable to assume that missing scores at weeks 2, 4 and 5 for the vast majority of patients were missing by design, and thus, missing completely at random. Data missing by design are missing completely at random if the missingness is independent of both the observed and the missing data (Graham et al., 2006). Assuming missingness in weeks 2, 4 and 5 is ignorable, the analysis proceeds here in addressing patterns of missing data with regard to baseline and weeks 1, 3 and 6, noting that data from all weeks, as available, are included in the reported analyses.

Indicators of patterns of missing data were generated based on data at baseline and weeks 1, 3 and 6, resulting in nine patterns (see Table 2). Pattern 1 reflects complete data at baseline and weeks 1, 3 and 6 and so corresponds to a pattern assumed to reflect ignorable missingness, as discussed earlier. Patterns 2, 3 and 7 correspond to patterns of monotonic dropout. Patterns 4, 5, 6 and 8 show intermittently missing data. Pattern 9 has a pattern of both intermittently missing data and possible attrition. Fisher’s exact test of independence between pattern and treatment group was statistically significant at the .05 level (p = .016). Based on the observed and expected cell counts provided in Table 2, the placebo group has lower than expected counts of patients with complete data at baseline and weeks 1, 3 and 6, whereas the drug group tends to have lower than expected patient counts for patterns involving missing data. Based on this, it may be important to address the missing data when studying differences in illness ratings between the patient groups.
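The classification of patterns described above can be sketched in code. The following is an illustrative assumption rather than the article's procedure: the function names, the labels and the example indicator vectors are hypothetical, and the actual patterns and counts are those in Table 2. Each vector holds response indicators for baseline and weeks 1, 3 and 6 (1 = observed, 0 = missing).

```python
import numpy as np


def classify_pattern(observed):
    """Label one subject's response indicators (1 = observed, 0 = missing)."""
    r = np.asarray(observed, dtype=int)
    if r.all():
        return "complete"
    first_missing = int(np.argmin(r))  # position of the first 0
    # Monotonic dropout: nothing is observed after the first missing value
    if not r[first_missing:].any():
        return "dropout"
    # Otherwise some missing value is followed by an observed one
    if r[-1] == 1:
        return "intermittent"
    return "intermittent+dropout"


def drop_indicator(observed):
    """1 if the final planned assessment is missing, 0 otherwise."""
    return int(observed[-1] == 0)


# Hypothetical indicator vectors for baseline and weeks 1, 3 and 6
examples = [(1, 1, 1, 1), (1, 1, 0, 0), (1, 0, 1, 1), (1, 0, 1, 0)]
for r in examples:
    print(r, classify_pattern(r), drop_indicator(r))
```

A single binary indicator of this kind collapses several distinct patterns into one group, which is why models that use the full set of patterns can lead to different conclusions.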

Table 2 Patterns of missing data (n = 437)

Growth models assuming ignorable missingness

Different growth models were first fit to the illness ratings under the assumption that the missingness was ignorable with the goal of characterizing scores across weeks for the two patient groups. Let y_ti denote an illness rating at week t for patient i. Week was centred to the baseline assessment: week = 0, 1, …, 6. A drug treatment indicator denoted whether or not a patient received one of the active psychiatric medications (Drug_i = 0 if patient i was given a placebo; Drug_i = 1 if patient i was given a psychiatric drug) and was used to predict each coefficient of a given growth model at the subject level. Illness ratings were assumed to follow a random-effects model:

Level 1: y_ti = f(β_i, week_ti) + e_ti
Level 2: β_i = g(γ, Drug_i) + u_i

where, at the first level, f(∙) denotes a function of a set of random coefficients in β_i and the week of assessment, and e_ti is the residual. At the second level, g(∙) denotes a vector-valued function in which the random coefficients at the first level are regressed on the treatment indicator, Drug_i, where γ is a set of fixed effects that link the random coefficients and the treatment indicator; u_i is the set of residuals corresponding to the level-2 regressions. The conditional random effects u_i were assumed to be bivariate normal with means equal to zero and variance-covariance matrix Φ_u:

$${\boldsymbol{\Phi}}_u=\left[\begin{array}{cc}{\phi}_{u_0}& \\ {}{\phi}_{u_1{u}_0}& {\phi}_{u_1}\end{array}\right]$$

Conditional on the treatment effects, the missingness was assumed to be ignorable.

Four different growth functions (see Table 3) were fit to model change in ratings: linear growth, quadratic growth, linear growth using a square root transformation of week, and exponential growth, with all but the linear model used to address possible nonlinearity in the form of change. The quadratic function allows for non-constant change rates, but due to the parabolic shape of the function, scores would be expected to decrease and later increase with time. The linear function that uses the square root of week changes the time metric and helps to linearize the form of the relationship between the outcome and time. The exponential function includes a lower asymptote to capture stability in illness ratings if it is expected that patients would initially show improvement by a decreasing trend in their clinical ratings and later achieve stability in their illness status.

Table 3 Four growth models fitted to illness ratings (n = 437)

To help decipher the information in Table 3, the linear growth model is described as an example. At level 1, the growth function is β_0i + β_1i week_ti, and at level 2, each growth coefficient is regressed on Drug: β_0i = γ₀₀ + γ₀₁Drug_i + u_0i and β_1i = γ₁₀ + γ₁₁Drug_i + u_1i. The intercept of each level 2 regression is the fixed growth coefficient (intercept and slope, respectively) for the placebo group, and the effect of Drug is the difference in the growth coefficient between the placebo and drug group. For example, γ₀₀ is the expected illness rating for the placebo group at baseline, and γ₀₁ is the expected baseline difference in ratings between the placebo and drug group. The residual of each level 2 regression is the random effect conditional on Drug. For example, u_0i is the residual corresponding to the subject-specific intercept of the growth model after conditioning on Drug. Although not shown in Table 3, the level 1 residual variance was modelled using an exponential function to permit heterogeneity of variance between treatment groups: ${\sigma}_{e_i}^2=\exp \left\{{\tau}_0+{\tau}_1{Drug}_i\right\}$.
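As a worked step that follows directly from the level 1 and level 2 equations of the linear growth model, substituting the level 2 regressions into the level 1 growth function gives the combined form

$${y}_{ti}={\gamma}_{00}+{\gamma}_{01}{Drug}_i+\left({\gamma}_{10}+{\gamma}_{11}{Drug}_i\right){week}_{ti}+{u}_{0i}+{u}_{1i}{week}_{ti}+{e}_{ti},$$

so that the expected trajectory is γ₀₀ + γ₁₀week_ti for the placebo group and (γ₀₀ + γ₀₁) + (γ₁₀ + γ₁₁)week_ti for the drug group. Under the exponential variance function, the level 1 residual variance is exp{τ₀} for the placebo group and exp{τ₀ + τ₁} for the drug group, so τ₁ > 0 implies greater residual variance in the drug group.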

Estimation

Maximum likelihood and Bayesian estimation are the major approaches to the estimation of random-effects models. Here, estimation is carried out using PROC MCMC for SAS/STAT software version 9.4 (see Footnote 1). Bayesian estimation was selected for the current analysis due to the greater flexibility in how a random-effects model may be specified, a feature that will become evident when models for nonignorable missingness are described. PROC MCMC is a flexible simulation procedure in which Bayesian estimation is carried out by repeatedly sampling from a posterior distribution using the Markov chain Monte Carlo (MCMC) approach (for details, see Gelfand et al., 1990). The primary sampling mechanism for PROC MCMC is a self-tuned random walk Metropolis algorithm (Chen, 2009). The samples drawn are used to estimate the posterior marginal distributions from which statistical inference of the model parameters may be drawn. In fitting a random-effects model using Bayesian methods, the fixed effects and the random effects are treated as random variables.
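To convey the idea behind a random walk Metropolis sampler, a minimal sketch follows. It is not PROC MCMC and does not reproduce its self-tuning: the proposal step size is fixed, the target is a toy one-parameter posterior (a normal mean with known unit variance and a N(0, 1000) prior), and the data values and all names are hypothetical.

```python
import numpy as np


def random_walk_metropolis(log_post, start, n_iter=20000, step=1.0, seed=0):
    """Sample a one-dimensional posterior with a fixed-step random walk."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    current = float(start)
    current_lp = log_post(current)
    for i in range(n_iter):
        # Propose a move centred on the current value
        proposal = current + step * rng.normal()
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current
    return draws


# Hypothetical data; unit residual variance is assumed known
data = np.array([4.8, 5.1, 5.6, 4.9, 5.3])


def log_post(mu):
    """Unnormalized log posterior: N(0, 1000) prior times normal likelihood."""
    log_prior = -0.5 * mu**2 / 1000.0
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    return log_prior + log_lik


draws = random_walk_metropolis(log_post, start=0.0)
kept = draws[2000:]  # discard burn-in draws
print(f"posterior mean of mu: {kept.mean():.2f}")
```

With a prior this diffuse, the posterior mean is essentially the sample mean of the data, which gives a simple check on the sampler.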

Weakly informative prior distributions were specified for most of the model parameters. Fixed effects were assumed to have Gaussian priors with mean = 0 and variance = 1000. The prior distribution of the intercept of the growth model was restricted to have a lower bound of 1 and an upper bound of 7, given that the illness rating scale was bounded between 1 and 7. The prior of the variance-covariance matrix of the random coefficients at the subject level and those at the pattern level (when applicable) was assumed to be an inverse-Wishart with small degrees of freedom (e.g., quadratic model assumed 3 df). As an exponential model was used for the residual variance at the first level, the parameters of the variance model were assumed to have Gaussian priors with mean = 0 and variance = 1000. A lower bound for effective sample size (ESS) was calculated using the R package mcmcse (Flegal et al., 2021) assuming a 95% confidence level. The observed ESS of each model parameter was compared to this minimum criterion value. Markov chains were run for 10,000,000 iterations with 50,000 burn-in iterations and thinning set to 1000, with thinning done to reduce memory requirements. Specifications were set to meet the minimum ESS needed across the planned models. Model convergence was judged by inspecting trace and autocorrelation plots and meeting the lower bound ESS for a given model. The posterior mean of fixed effects (assuming symmetric posterior distributions) and the posterior mode for variance parameters (assuming non-symmetric posterior distribution), along with 95% highest posterior density intervals (HPDI), for parameter estimates are reported.
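A highest posterior density interval can be computed from MCMC draws as the shortest interval containing the required share of the draws. The sketch below is one common sample-based approach and is not necessarily the algorithm used by PROC MCMC; the function name and the gamma-distributed example draws are assumptions for illustration.

```python
import numpy as np


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    n_in = int(np.ceil(prob * n))  # number of draws the interval must cover
    # Width of every candidate interval spanning n_in consecutive sorted draws
    widths = x[n_in - 1:] - x[: n - n_in + 1]
    start = int(np.argmin(widths))
    return x[start], x[start + n_in - 1]


# Skewed example draws standing in for a variance parameter (hypothetical)
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=10000)
lo, hi = hpd_interval(draws)
q_lo, q_hi = np.quantile(draws, [0.025, 0.975])
print(f"HPD: ({lo:.2f}, {hi:.2f}); equal-tailed: ({q_lo:.2f}, {q_hi:.2f})")
```

For a skewed posterior such as that of a variance parameter, the HPD interval is shorter than the equal-tailed interval and shifted toward the mode, which is one reason to report it alongside a non-mean point summary.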

Results: Growth models assuming ignorable missingness

Indices of fit for the four growth models are given in Table 3. The models take into account possible differences between the treatment groups by including the effects of Drug and assume that the missingness is ignorable. Conditioning on the effects of Drug, the linear model using a square-root transformation of week had the lowest deviance information criterion (DIC; see Footnote 2) value. That a square-root transformation of week provided a more suitable representation than the exponential growth model with a lower asymptote could indicate that illness ratings, across the patient samples, were not tending towards clinical stabilization. Going forward, the illness ratings are described by the linear growth model using the square root transformation of week.
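For readers unfamiliar with the DIC, its standard definition is the posterior mean deviance plus the effective number of parameters, where the latter is the posterior mean deviance minus the deviance evaluated at the posterior mean. The sketch below applies that definition to a toy model with one mean parameter; the data, the draws and all names are hypothetical and unrelated to the values in Table 3.

```python
import numpy as np


def normal_log_lik(mu, y, sigma=1.0):
    """Log likelihood of y under N(mu, sigma^2), with sigma assumed known."""
    y = np.asarray(y, dtype=float)
    terms = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
    return float(terms.sum())


def dic_from_draws(draws, y):
    """Return (DIC, p_D) from posterior draws of the mean parameter."""
    deviances = np.array([-2.0 * normal_log_lik(mu, y) for mu in draws])
    mean_deviance = deviances.mean()
    deviance_at_mean = -2.0 * normal_log_lik(np.mean(draws), y)
    p_d = mean_deviance - deviance_at_mean  # effective number of parameters
    return mean_deviance + p_d, p_d


# Toy data and exact posterior draws under a flat prior (illustration only)
rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, size=50)
draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=5000)
dic_value, p_d = dic_from_draws(draws, y)
print(f"DIC = {dic_value:.1f}, effective parameters = {p_d:.2f}")
```

In this one-parameter example the effective number of parameters is close to 1, as expected. Lower DIC values indicate a better trade-off between fit to the observed data and model complexity.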

Estimates from models assuming ignorable missingness are summarized in the first column of estimates in Table 4. Posterior means of fixed effects and posterior medians for variance parameters, with 95% HPDI, are reported. Based on the estimates, patient groups differed in the expected change over time (${\hat{\gamma}}_{11}$ = −0.65, 95% HPDI: (−0.81, −0.49)) but not in their expected baseline levels (${\hat{\gamma}}_{01}$ = 0.05, 95% HPDI: (−0.14, 0.27)). The residual variance of illness ratings was greater for those assigned to a drug relative to those assigned to the placebo (${\hat{\tau}}_1$ = 0.28, 95% HPDI: (0.07, 0.50)). Between-subject variability in the expected baseline levels (${\hat{\phi}}_{u_0}$ = 0.47, 95% HPDI: (0.36, 0.59)) and in the slopes (${\hat{\phi}}_{u_1}$ = 0.32, 95% HPDI: (0.25, 0.39)) is notable.

Table 4 Bayesian estimates of a growth model for illness ratings under different missingness mechanisms (n=437)

Models for nonignorable missingness

As described earlier, about 23.3% of patients are considered to have (unplanned) incomplete data. Reasons for the missing data are not described in the documentation cited previously for the National Institute of Mental Health Schizophrenia Collaborative Study, so it is reasonable to consider different scenarios that might account for the sources of the missing data, and importantly, study how inferences about the longitudinal process are sensitive to different assumptions.

The patterns in Table 2 are used to formulate models that represent possible missing data processes. Three of the nine patterns (patterns 2, 3 and 7) reflect a monotonic dropout pattern, and four others (patterns 4, 5, 6 and 8) reflect intermittently missing data. The last pattern (pattern 9) reflects intermittently missing data but possible attrition near the end of the planned observation period. Based on the result of a Fisher’s exact test of independence between pattern and treatment group, the patterns of missing data may be informative in the analysis of the illness ratings. The models considered here use information from the patterns in multiple ways with a goal of representing multiple plausible models for the missing data. The goal in fitting multiple models that reflect plausible missing data processes was to evaluate if the parameters of the longitudinal model for the illness ratings were sensitive to the assumptions made about the missingness.

The first model considered is a pattern-mixture random-effects model with a single fixed pattern of dropout. This model uses the 6th week of observation to indicate whether or not a patient provided data at the final planned assessment. Thus, this single indicator groups the 101 individuals with patterns of monotonic dropout and the one individual with a combination of intermittently missing data and dropout patterns. This model assumes that patterns of intermittently missing data are not important and that the effects of monotonic dropout, regardless of the timing, do not differ from each other. The second and third models are random pattern-mixture models that treat the missing data pattern as a random effect. Specifically, the second model uses all nine patterns and assumes that monotonic dropout and intermittently missing data patterns are important, and the third model uses five of the nine patterns to include only monotonic dropout patterns.

Subject attrition is common in longitudinal investigations, and so additional models were specified in which the growth model for illness ratings was estimated jointly with a model for the week when the patient was last observed. In the first model, the timing of dropout was regressed on Drug, and in the second model, was additionally regressed on the random intercept and slope of the growth model. Thus, the latter model links the timing of dropout to the illness ratings through the random growth coefficients that characterize change in the illness ratings, and the former assumes the two processes are independent. The second of these two models is known as a shared parameter model in which coefficients of one model are shared with those of the other model and where estimation of the two models is done jointly (Albert & Follman, 2009; Wu & Carroll, 1988).

Fixed pattern-mixture model

The first model was a random-effects model with a single fixed-pattern effect. An indicator of dropout was assumed to account for differences in the longitudinal trajectories between those who completed treatment and those who did not, defined by whether or not the patient was observed at the sixth week. The indicator, henceforth called Drop, was equal to 1 for patients with patterns k = 2, 3, 7 and 9 in Table 2 (n = 102 (23.3%) patients) and otherwise was equal to 0 (see Hedeker & Gibbons, 1997). To fit this model, the model in Eq. (1) was extended to include the effect of Drop and its interaction with Drug:

$${y}_{tik}={\beta}_{0 ik}+{\beta}_{1 ik}\sqrt{week_{tik}}+{e}_{tik},$$

where

$${\beta}_{0 ik}={\gamma}_{00}+{\gamma}_{01}{Drug}_{ik}+{\gamma}_{02}{Drop}_{ik}+{\gamma}_{03}{Drug}_{ik}\ast {Drop}_{ik}+{u}_{0 ik}$$

$${\beta}_{1 ik}={\gamma}_{10}+{\gamma}_{11}{Drug}_{ik}+{\gamma}_{12}{Drop}_{ik}+{\gamma}_{13}{Drug}_{ik}\ast {Drop}_{ik}+{u}_{1 ik}.$$

The coefficients of the growth model are functions of Drug, Drop and their interaction. The residuals u_0ik and u_1ik of the level 2 equations are conditional random effects. As was done in the model that assumed ignorable missingness and all forthcoming models that assume nonignorable missingness, the two random effects could covary.
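As a worked step, substituting the two values of Drop into the level 2 equations gives the coefficients implied for each group. The display below is only a restatement of the equations above under the stated coding of Drop (0 or 1), not an additional model. For patients with Drop = 0,

$${\beta}_{0 ik}={\gamma}_{00}+{\gamma}_{01}{Drug}_{ik}+{u}_{0 ik},\kern1em {\beta}_{1 ik}={\gamma}_{10}+{\gamma}_{11}{Drug}_{ik}+{u}_{1 ik},$$

and for patients with Drop = 1,

$${\beta}_{0 ik}=\left({\gamma}_{00}+{\gamma}_{02}\right)+\left({\gamma}_{01}+{\gamma}_{03}\right){Drug}_{ik}+{u}_{0 ik},\kern1em {\beta}_{1 ik}=\left({\gamma}_{10}+{\gamma}_{12}\right)+\left({\gamma}_{11}+{\gamma}_{13}\right){Drug}_{ik}+{u}_{1 ik}.$$

Thus γ₀₂ and γ₁₂ are the differences in the intercept and slope between the two dropout groups when Drug = 0, and γ₀₃ and γ₁₃ are the corresponding differences in the Drug effects.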

The growth coefficients were allowed to differ between groups based on the indicator of dropout. To ease the comparison of estimates from this model to other models, the overall population effects are calculated by averaging across patterns, weighted by the proportions of subjects within patterns (cf. Little, 1993; 1995; Hogan & Laird, 1997):

$$\overline{\gamma}={\pi}^{Drop=0}{\gamma}^{Drop=0}+{\pi}^{Drop=1}{\gamma}^{Drop=1}$$

where γ is a fixed growth coefficient, such as the model’s intercept, π^Drop = 0 is the population proportion of individuals with no pattern of dropout, and π^Drop = 1 is the population proportion of individuals with a pattern of dropout. Using the sample proportion of patients with a pattern of dropout (.2334), estimates of the population averages were obtained for the model’s fixed intercept and the effects of $\sqrt{week}$, Drug and the interaction between Drug and $\sqrt{week}$.
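The weighted average can be sketched in a few lines. The function name and the numeric coefficient values below are hypothetical placeholders chosen for illustration and are not estimates from Table 4. Only the dropout proportion .2334 comes from the text. The sketch assumes that the two groups are exhaustive, so that the proportion without dropout is one minus the proportion with dropout, and that the coefficient for Drop = 1 equals the Drop = 0 coefficient plus the corresponding Drop effect.

```python
def population_average(gamma_drop0, drop_effect, pi_drop):
    """Average one fixed coefficient over the two dropout groups.

    gamma_drop0: coefficient for patients with Drop = 0
    drop_effect: additive Drop effect, so the Drop = 1 coefficient is
                 gamma_drop0 + drop_effect
    pi_drop:     proportion of patients with Drop = 1
    """
    gamma_drop1 = gamma_drop0 + drop_effect
    return (1.0 - pi_drop) * gamma_drop0 + pi_drop * gamma_drop1


PI_DROP = 0.2334  # sample proportion of patients with a pattern of dropout

# Hypothetical intercept (5.2) and Drop effect on the intercept (0.3).
averaged_intercept = population_average(5.2, 0.3, PI_DROP)
print(round(averaged_intercept, 4))
```

Because the average is linear in the coefficients, it equals the Drop = 0 coefficient plus the Drop effect scaled by the dropout proportion.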

Random pattern-mixture model

Next, random pattern-mixture models were specified. Two pattern sets were tested, each set assumed to be from a population of missing data patterns. The first, Pattern Set 1, included patterns of intermittently missing data, patterns of monotonic dropout and a combination of the two (pattern 9). If intermittently missing data were missing at random, and thus, ignorable, then it would not be important to include those patterns in an analysis. So, a second set, Pattern Set 2, included only the monotonic patterns of missingness (patterns 2, 3, 7 and 9; as pattern 9 possibly has a pattern of attrition, it was included here as a pattern of dropout). Note that Pattern Set 2 comprises the patterns used to make the indicator Drop, but in this model, the pattern effect is assumed to be random, and as such, the model permits differences in effects due to differences in the timing of when a patient dropped from the study. If a patient was considered to have complete data, then missingness was assumed to be ignorable and their model was specified by Eqs. (1) and (2). Otherwise, patients who had a pattern of missing data had a longitudinal model that included the random pattern effects v_0k and v_1k:

$${y}_{tik}={\beta}_{0 ik}+{\beta}_{1 ik}\sqrt{week_{tik}}+{e}_{tik},$$

where, at the subject level,

$${\beta}_{0 ik}={\gamma}_{00k}+{\gamma}_{01}{Drug}_{ik}+{u}_{0 ik}$$

$${\beta}_{1 ik}={\gamma}_{10k}+{\gamma}_{11}{Drug}_{ik}+{u}_{1 ik},$$

and at the pattern level,

$${\gamma}_{00k}={\gamma}_{00}+{v}_{0k}$$

$${\gamma}_{10k}={\gamma}_{10}+{v}_{1k}.$$

Conditional on a random pattern of missing data, missingness was assumed to be ignorable. The random pattern effects v_0k and v_1k were assumed to be bivariate normal with means equal to zero and variance-covariance matrix Φ_v:

$${\boldsymbol{\Phi}}_v=\left[\begin{array}{cc}{\phi}_{v_0}& \\ {}{\phi}_{v_1{v}_0}& {\phi}_{v_1}\end{array}\right].$$
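To illustrate what the pattern-level equations imply, the sketch below draws pattern effects from a bivariate normal distribution with covariance matrix Φ_v and adds them to the fixed intercept and slope. It is a simulation sketch only, not the estimation procedure used in the paper. The function names and every numeric value are hypothetical, and the draw uses a hand-coded Cholesky factor so that only the Python standard library is needed.

```python
import math
import random


def draw_pattern_effects(n_patterns, phi_v0, phi_v1, phi_v1v0, seed=0):
    """Draw (v_0k, v_1k) per pattern from N(0, Phi_v).

    phi_v0, phi_v1: variances of the intercept and slope pattern effects
    phi_v1v0:       their covariance
    """
    rng = random.Random(seed)
    # Cholesky factor of [[phi_v0, phi_v1v0], [phi_v1v0, phi_v1]].
    l11 = math.sqrt(phi_v0)
    l21 = phi_v1v0 / l11
    l22 = math.sqrt(phi_v1 - l21 ** 2)
    effects = []
    for _ in range(n_patterns):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        effects.append((l11 * z0, l21 * z0 + l22 * z1))
    return effects


def pattern_coefficients(gamma00, gamma10, effects):
    """Pattern-specific intercept and slope: gamma_00k and gamma_10k."""
    return [(gamma00 + v0, gamma10 + v1) for v0, v1 in effects]


# Hypothetical values: nine patterns, illustrative variances and fixed effects.
effects = draw_pattern_effects(9, phi_v0=0.4, phi_v1=0.1, phi_v1v0=0.05)
for k, (g00k, g10k) in enumerate(pattern_coefficients(5.0, -1.0, effects), 1):
    print(k, round(g00k, 3), round(g10k, 3))
```

Each pattern receives its own intercept and slope, while the Drug effects remain common across patterns, as in the subject-level equations above.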

Estimation of the data model for illness ratings was also carried out simultaneously with a model that predicted the log-transformed (to reduce positive skew) value of the week when a patient was last observed, henceforth called ln(MaxWeek). The variable MaxWeek might be considered a proxy for the actual time of dropout from the study. A higher value of ln(MaxWeek) indicates greater time spent in the study. As the models that aim to address nonignorable missingness take into account a patient’s pattern of missingness, the outcome y_tik includes an added subscript k to denote the missing pattern for the individual. The joint model is presented as

$${y}_{tik}={\beta}_{0 ik}+{\beta}_{1 ik}\sqrt{week_{tik}}+{e}_{tik}$$

(1)

where

$${\beta}_{0 ik}={\gamma}_{00}+{\gamma}_{01}{Drug}_{ik}+{u}_{0 ik}$$

$${\beta}_{1 ik}={\gamma}_{10}+{\gamma}_{11}{Drug}_{ik}+{u}_{1 ik},$$

and

$$\ln {(MaxWeek)}_{ik}={\alpha}_0+{\alpha}_1{Drug}_i+{\varepsilon}_{ik}.$$

(2)

The model for ln(MaxWeek)_ik includes an intercept, α₀, a treatment effect, α₁ and the residual of the regression, ε_ik. Similar to the residual variance of the model for illness ratings, the residual variance of the model for the last week of observation was allowed to differ between treatment groups by using an exponential model: ${\sigma}_{\varepsilon_i}^2=\exp \left\{{\kappa}_0+{\kappa}_1{Drug}_i\right\}$. Thus, under this model, the longitudinal process and the timing of dropout are assumed to be independent.
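The exponential variance function can be sketched as follows. The function name and the κ values are hypothetical, and Drug is assumed to be coded 0 or 1. The sketch shows two properties of this parameterization: the variance is positive for any real κ₀ and κ₁, and exp(κ₁) is the ratio of the residual variances between the two treatment groups.

```python
import math


def residual_variance(kappa0, kappa1, drug):
    """Residual variance exp(kappa0 + kappa1 * Drug) for one treatment group."""
    return math.exp(kappa0 + kappa1 * drug)


# Hypothetical values of the variance parameters.
kappa0, kappa1 = -1.2, 0.4
var_drug0 = residual_variance(kappa0, kappa1, drug=0)
var_drug1 = residual_variance(kappa0, kappa1, drug=1)

# The ratio of the two group variances equals exp(kappa1).
print(var_drug1 / var_drug0, math.exp(kappa1))
```

Setting κ₁ = 0 gives a common residual variance for the two groups.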

Shared parameter, random-effects model

Last, a shared parameter random-effects model was fit in which the random intercept and slope of the model for illness ratings were shared parameters in the model for ln(MaxWeek)_ik:

$${y}_{tik}={\beta}_{0 ik}+{\beta}_{1 ik}\sqrt{week_{tik}}+{e}_{tik}$$

where

$${\beta}_{0 ik}={\gamma}_{00}+{\gamma}_{01}{Drug}_{ik}+{u}_{0 ik}$$

$${\beta}_{1 ik}={\gamma}_{10}+{\gamma}_{11}{Drug}_{ik}+{u}_{1 ik},$$

and

$$\ln {(MaxWeek)}_{ik}={\alpha}_0+{\alpha}_1{Drug}_i+{\alpha}_2{\beta}_{0 ik}+{\alpha}_3{\beta}_{1 ik}+{\varepsilon}_{ik},$$

where α₂ and α₃ are the effects of the random intercept β_0ik and slope β_1ik of the longitudinal model on the timing of dropout. Under this model, it is assumed that the timing of dropout is dependent on the subject-specific aspects of the longitudinal trajectory. Thus, nonignorable missingness is accounted for through the relationship between the timing of dropout and aspects of change in the illness ratings that characterize the observed and the missing illness ratings. Conditional on the random coefficients β_0ik and β_1ik, the longitudinal response y_tik and the week of the last observation ln(MaxWeek)_ik are independent. Finally, the residual in the model for the timing of dropout was assumed to be normally distributed with mean equal to 0 and a variance that could differ between treatment groups. Specifically, similar to the model used to represent the residual variance of the growth model for illness ratings, an exponential function was used to model the residual variance for the regression of the timing of dropout: ${\sigma}_{\varepsilon_i}^2=\exp \left\{{\kappa}_0+{\kappa}_1{Drug}_i\right\}$.
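The role of the shared coefficients can be illustrated by simulating ln(MaxWeek) for one patient. This is a sketch of the model equation only, not of the Bayesian estimation. The function name and all parameter values are hypothetical. Setting α₂ = α₃ = 0 recovers the joint model in which the timing of dropout does not depend on the growth coefficients.

```python
import math
import random


def simulate_ln_maxweek(drug, beta0, beta1, alpha, kappa, rng):
    """Draw ln(MaxWeek) for one patient under the shared parameter model.

    alpha: (alpha0, alpha1, alpha2, alpha3)
    kappa: (kappa0, kappa1), parameters of the residual variance function
    """
    a0, a1, a2, a3 = alpha
    k0, k1 = kappa
    mean = a0 + a1 * drug + a2 * beta0 + a3 * beta1
    sd = math.sqrt(math.exp(k0 + k1 * drug))
    return mean + rng.gauss(0.0, sd)


# Hypothetical subject-specific intercept and slope, and hypothetical parameters.
beta0, beta1 = 5.5, -1.2
shared = simulate_ln_maxweek(1, beta0, beta1, (1.5, 0.2, -0.1, 0.3),
                             (-2.0, 0.1), random.Random(1))
independent = simulate_ln_maxweek(1, beta0, beta1, (1.5, 0.2, 0.0, 0.0),
                                  (-2.0, 0.1), random.Random(1))

# With the same residual draw, the two differ by alpha2*beta0 + alpha3*beta1.
print(shared - independent)
```

Using the same seed for both draws isolates the contribution of the shared coefficients from the residual.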

Results

Results from fitting the model that assumed ignorable missingness (described earlier) and those that assumed a mechanism of nonignorable missingness are summarized in Table 4. The posterior mean for fixed effects and the posterior median for variance parameters, with 95% HPDI, are reported for the parameter estimates. For the pattern-mixture model with a fixed pattern effect that reflected whether or not a patient had complete data, two sets of estimates are given: one based on the model as it was parameterized (discussed earlier), and one in which the fixed effects are the population-averaged estimates, as previously discussed. For the random pattern-mixture models, estimates are provided for the model that used Pattern Set 1, in which the pattern effects related to both attrition and intermittently missing data, and for the model that used Pattern Set 2, in which the pattern effects related only to patterns of attrition. Finally, estimates are provided for the joint model for the longitudinal outcome and the timing of dropout, followed by estimates of the shared parameter model.

From the table of parameter estimates, it is clear that similar conclusions can be drawn about the marginal growth parameters for the two patient populations. That is, whether the missingness is considered to be ignorable or nonignorable, similar conclusions are reached about treatment effects on the illness trajectories of the two patient groups. The pattern-mixture model with a fixed pattern effect showed differences in the expected change between those with complete data versus those with incomplete data. The average effects, however, yield estimated population parameters that are close to the estimates resulting from all other models. This is similar to the results reported in Hedeker and Gibbons (1997). Estimates between the two random pattern-mixture models were comparable whether patterns of intermittently missing data were included or not when defining the random pattern effect, and estimates from these two models were comparable to those from all other models that were fit. Under the shared parameter model in which timing of dropout was regressed on the random intercept and slope of the longitudinal model, dropout was not dependent on the random coefficients of the growth model, a result that also suggests ignorable missingness.

A simulation study

To validate the random pattern-mixture model as a viable approach to addressing nonignorable missingness, a small simulation study was conducted. One hundred data sets were generated, each for 400 subjects measured on one to six occasions (coded as wave = 1,…,6), under a random-effects linear growth model for a single normal variable y with a subject-level binary covariate X_1i (simulated as X_1i~N(0, 1) with a cutpoint at 0) and a subject-level continuous covariate X_2i (X_2i~N(0, 1)). Unlike the covariate X_1i that was used in the generating model and the models fitted for analysis, X_2i was only used in the data-generating model to simulate an unmeasured covariate that was related to y_ti through the parameters of the growth model, related to the covariate X_1i, and predicted missingness in y_ti. The response y_ti at wave t for subject i was generated by

$${y}_{ti}={\beta}_{0i}+{\beta}_{1i}\left({wave}_{ti}-1\right)+{e}_{ti}$$

where

$${\beta}_{0i}={\gamma}_{00}+{\gamma}_{01}{X}_{1i}+{\gamma}_{02}{X}_{2i}+{u}_{0i}$$

$${\beta}_{1i}={\gamma}_{10}+{\gamma}_{11}{X}_{1i}+{\gamma}_{12}{X}_{2i}+{u}_{1i},$$

where γ₀₀ = 1, γ₀₁ = 0.5, γ₀₂ = 1, γ₁₀ = 2, γ₁₁ = 0.2 and γ₁₂ = 0.5. Further, X_1i = .5X_2i + e_xi.

The residual at the first level was assumed to be independent and identically distributed (i.i.d.) normal as ${\boldsymbol{e}}_i\sim N\left(\textbf{0},{\sigma}_e^2{\textbf{I}}_{\textbf{6}}\right)$, where ${\sigma}_e^2=0.3$ and I₆ was a 6 × 6 identity matrix. The residuals at the second level were assumed to be independent between subjects and multivariate normal:

$$\left[\begin{array}{c}{u}_{0i}\\ {}{u}_{1i}\end{array}\right]\sim mvn\left(\left[\begin{array}{c}0\\ {}0\end{array}\right],\left[\begin{array}{cc}{\phi}_{u_0}& \\ {}{\phi}_{u_1{u}_0}& {\phi}_{u_1}\end{array}\right]\right),$$

where ${\phi}_{u_0}=1$, ${\phi}_{u_1}=0.5$, and ${\phi}_{u_1{u}_0}=0.1$.
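The data-generating model for the complete responses can be sketched as below, using the parameter values stated above. Two details are assumptions rather than statements from the text: the residual e_xi is given variance 0.75 so that the latent score .5X_2i + e_xi has unit variance before it is dichotomized at 0, and the function and constant names are illustrative. The sketch is not the author's supplementary script.

```python
import math
import random

# Parameter values from the text.
G00, G01, G02 = 1.0, 0.5, 1.0
G10, G11, G12 = 2.0, 0.2, 0.5
SIGMA2_E = 0.3
PHI_U0, PHI_U1, PHI_U1U0 = 1.0, 0.5, 0.1


def simulate_subject(rng):
    """Return (x1, x2, y) with y the six complete responses for one subject."""
    x2 = rng.gauss(0.0, 1.0)
    # ASSUMPTION: Var(e_x) = 0.75 gives the latent score unit variance.
    latent = 0.5 * x2 + rng.gauss(0.0, math.sqrt(0.75))
    x1 = 1 if latent > 0 else 0
    # Correlated random effects via a hand-coded Cholesky factor.
    l11 = math.sqrt(PHI_U0)
    l21 = PHI_U1U0 / l11
    l22 = math.sqrt(PHI_U1 - l21 ** 2)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u0, u1 = l11 * z0, l21 * z0 + l22 * z1
    beta0 = G00 + G01 * x1 + G02 * x2 + u0
    beta1 = G10 + G11 * x1 + G12 * x2 + u1
    sd_e = math.sqrt(SIGMA2_E)
    y = [beta0 + beta1 * (wave - 1) + rng.gauss(0.0, sd_e)
         for wave in range(1, 7)]
    return x1, x2, y


rng = random.Random(2023)
data = [simulate_subject(rng) for _ in range(400)]
print(len(data), data[0][0], [round(v, 2) for v in data[0][2]])
```

Missing values would then be imposed on waves 2 to 6 in a separate step.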

Missingness in y_ti was generated according to a logistic regression model for a set of binary dependent variables that represented missing (R_ti = 1) or not missing (R_ti = 0) in y_ti at waves t = 2,…,6, where missingness depended on the covariates X_1i and X_2i. Letting η_t denote the logit at wave t of the probability that y_ti was missing, η_t by wave was specified as

$${\eta}_2=-1+0.2{X}_{1i}+0.3{X}_{2i}$$

$${\eta}_3=-1+0.4{X}_{1i}+0.6{X}_{2i}$$

$${\eta}_4=-1+0.8{X}_{1i}+1.2{X}_{2i}$$

$${\eta}_5=-1+1.6{X}_{1i}+2.4{X}_{2i}$$

$${\eta}_6=-1+3.2{X}_{1i}+4.8{X}_{2i}.$$

The probability that y_ti was missing when X_1i = 0 and X_2i = 0 was 1/(1 + exp{1}) ≈ .27 at every wave, with the probability of missingness increasing over waves for X_1i = 1 and increased values of X_2i according to the coefficients specified in the equations relating to the logit. This data-generating model therefore generated both monotonic and non-monotonic missingness in y_ti.
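The logit equations translate directly into missingness probabilities through the inverse logit. In the sketch below, the coefficients are those given above, while the function names and the choice to draw the indicators independently across waves, given the covariates, are assumptions made for illustration.

```python
import math
import random

# (intercept, coefficient of X1, coefficient of X2) by wave, from the text.
LOGIT_COEFS = {
    2: (-1.0, 0.2, 0.3),
    3: (-1.0, 0.4, 0.6),
    4: (-1.0, 0.8, 1.2),
    5: (-1.0, 1.6, 2.4),
    6: (-1.0, 3.2, 4.8),
}


def prob_missing(wave, x1, x2):
    """Probability that y is missing at the given wave (waves 2 to 6)."""
    b0, b1, b2 = LOGIT_COEFS[wave]
    eta = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-eta))


def draw_missing(x1, x2, rng):
    """Draw R_t (1 = missing) for waves 2 to 6.

    ASSUMPTION: indicators are independent across waves given x1 and x2.
    """
    return {wave: int(rng.random() < prob_missing(wave, x1, x2))
            for wave in LOGIT_COEFS}


print(round(prob_missing(2, 0, 0), 2))  # 0.27, matching the value in the text
print(draw_missing(1, 0.5, random.Random(5)))
```

Because each wave is drawn separately, a subject can be missing at one wave and observed at a later one, which is how non-monotonic patterns arise in this sketch.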

Four linear growth models, with X_1i as a predictor of the random intercept and slope, were fitted to the simulated data. The first assumed that the missingness was ignorable. The second model included an indicator variable that denoted whether or not an individual had any pattern of monotonic dropout (drop_i= 1; drop_i= 0 otherwise). Thus, this model ignored patterns of intermittently missing data and assumed that the effects of monotonic missingness were equal. The third model included a numeric covariate that denoted the timing of monotonic dropout (timing_i), with this model also ignoring patterns of intermittently missing data. The fourth model clustered subjects by pattern of missing data. For each parameter, the average 95% credible interval is reported along with bias in Table 5. As shown in Table 5, the magnitude of parameter bias is consistently lowest under the random pattern-mixture model that captures patterns of monotonic and non-monotonic missingness in y_ti. Though none of the fitted models was the model that generated the data, the random pattern-mixture model provided parameter estimates of the growth model that were closest to the true parameter values.
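The two summaries reported for each parameter can be computed as in the sketch below. It assumes that bias is defined as the mean of the estimates across replications minus the true value, and that the average credible interval is formed by averaging the lower and upper limits separately. Neither definition is spelled out in the text, and the function name and the numbers in the example are hypothetical.

```python
from statistics import mean


def summarize_replications(estimates, intervals, true_value):
    """Return (bias, (average lower limit, average upper limit)).

    estimates: one point estimate per simulated data set
    intervals: one (lower, upper) 95% credible interval per data set
    """
    bias = mean(estimates) - true_value
    avg_lower = mean(lower for lower, _ in intervals)
    avg_upper = mean(upper for _, upper in intervals)
    return bias, (avg_lower, avg_upper)


# Hypothetical results from three replications for a parameter equal to 1.
bias, avg_interval = summarize_replications(
    estimates=[1.1, 0.9, 1.3],
    intervals=[(0.8, 1.4), (0.6, 1.2), (1.0, 1.6)],
    true_value=1.0,
)
print(bias, avg_interval)
```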

Table 5 Results from 100 simulated data sets under different assumed missingness data mechanisms (n=400 subjects)


Discussion

Missing data in longitudinal studies are common, making it important in many problems for the analysis to address reasons why data are missing, and importantly, how they may impact inference from a longitudinal model. The issue is that the data from subjects with missing data may be different from those of subjects with complete data. If that is the case, then inference from a longitudinal model that assumes ignorable missingness will not reflect the full population that would include a combination of subjects with either complete or incomplete data. If data are missing solely by design (Graham et al., 2006), then it is reasonable to assume that the missingness is ignorable, leaving no need to also account for the missingness in the analysis. If data show patterns of subject attrition, however, then it is advisable to consider different scenarios about the source of the missingness. If correlates of the missingness or the missing data are available, then such variables can be included in the analysis, such as by including correlates as covariates. In other situations where correlates of the missingness or the missing values are not available, the researcher may consider models that reflect nonignorable missingness, including pattern-mixture models, selection models and shared parameter models. This may be done by specifying a pattern-mixture model with one or more fixed pattern effects that allow the marginal effects of a growth model to differ according to groups defined by a finite number of missing data patterns (Hedeker & Gibbons, 1997). Alternatively, the pattern effect may be random (Guo et al., 2004), an option that may be more suitable to problems involving many patterns of missing data, including patterns reflective of intermittently missing data and subject attrition. Another option is to specify a shared parameter model in which the missingness depends on the observed and missing values of the measured outcome (Albert & Follman, 2009; Wu & Carroll, 1988). Here, for example, the missingness was represented by the last week that a patient was observed, and the observed and missing values of the measured illness ratings were represented by the random coefficients of the growth model.

Using a set of empirical data from a longitudinal study, this paper illustrated these major frameworks for dealing with nonignorable missingness. The aim was to show how to compare estimates of a longitudinal model under a range of different possible mechanisms of the missing data to assess whether inference from the model was sensitive to the assumptions made about the missing data (Daniels & Hogan, 2008). A pattern-mixture model was applied in which the pattern of missing data was fixed in one version of the model and assumed to be random in a different version. The version in which the pattern was fixed is the same as one of the models reported in Hedeker and Gibbons (1997). In the version of a pattern-mixture model that assumed a random pattern effect, it was possible to account for patterns of intermittently missing data and patterns of attrition. Although a random pattern-mixture model has been previously considered by Guo et al. (2004) (for a different set of empirical data), their application of the model was one in which the random pattern was designed to capture the effects of subject attrition. The application here proposes use of a random pattern-mixture model as a tool for evaluating whether intermittently missing data are possibly nonignorable.

In addition to fitting a pattern-mixture model with either a fixed or random pattern effect, a shared parameter model was applied in which the timing of dropout was dependent on the random coefficients of the longitudinal model. A shared parameter model in the context of missing data is a special case of a class of models known as selection models in which missingness is predicted by the observed and the missing values of the measured response of the growth models (Albert & Follman, 2009; Wu & Carroll, 1988). Here, the link between missingness and the illness ratings was made through a variable representing the last week a patient was observed and the random effects of the growth model that represented the illness ratings. Thus, in this nonignorable model, the missingness was specified to be related to both the observed and the missing values of the primary response through the random coefficients of the growth model. In a selection model, the assumption is that the parameters of the longitudinal response and dropout are independent after conditioning each on the random effects of a growth model.

Comparisons were made between the estimated fixed effects of the growth model, and inference about the marginal growth parameters did not differ greatly between models. Under the random pattern-mixture model in particular, it did not matter whether the random pattern effect included patterns of attrition alone or a combination of patterns reflecting attrition and intermittently missing data. Thus, including additional patterns to reflect intermittently missing data did not result in a different conclusion about group-level change in the illness ratings. Inference from the marginal longitudinal model also did not differ if dropout was allowed to depend on the random effects of the growth model. Thus, conditioning on the drug treatment effects, the missingness is arguably ignorable for the marginal aspects of illness ratings. This is consistent with the conclusions about the missingness for this particular data set that were presented in Hedeker and Gibbons (1997). (For examples of a sensitivity analysis that does result in differences between models, please see Molenberghs & Kenward, 2007).

The small collection of models considered here is used as a means to model nonignorable missingness in applications of random-effects models for longitudinal data. The work here relied on Bayesian estimation rather than the more common maximum likelihood estimation. Bayesian estimation was used primarily because this approach provides a great deal of flexibility in an analysis, which seemed particularly applicable to the estimation of the pattern-mixture model that assumed a random pattern effect. If the number of patterns of missing data is small, then using Bayesian estimation can permit testing of a model that assumes that the standard deviation of a single random pattern effect follows a half t distribution (Chen et al., 2016). This kind of problem is analogous to fitting three-level models for which the number of random subjects at the highest level is small (Gelman, 2006).
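To give a sense of what a half t prior on a standard deviation looks like, the sketch below draws from it using only the Python standard library. The degrees of freedom and scale are hypothetical choices, since the text does not state the values used, and the function name is illustrative. A t variate is built as a standard normal divided by the square root of a chi-square variate over its degrees of freedom, and its absolute value is taken.

```python
import math
import random


def draw_half_t(df, scale, rng):
    """Draw one value from a half t distribution (always non-negative)."""
    z = rng.gauss(0.0, 1.0)
    # Chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return abs(scale * z / math.sqrt(chi2 / df))


# Hypothetical prior settings: 3 degrees of freedom, scale 1.
rng = random.Random(42)
draws = sorted(draw_half_t(3, 1.0, rng) for _ in range(20000))
print(round(draws[len(draws) // 2], 2), round(draws[int(0.95 * len(draws))], 2))
```

The heavy right tail leaves room for large standard deviations while keeping most prior mass near zero, which is the property that makes this family attractive when few patterns are available.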

The empirical example presented in this paper was used to illustrate different ways in which one might address nonignorable missingness in a longitudinal data analysis that uses a random-effects model. Naturally, there are variations in the specific models that were tested here, such as those considered for a pattern-mixture model with a fixed pattern effect (see Hedeker & Gibbons, 1997). For example, for the illness ratings that were analysed in this report, the patterns of missing data included those reflective of attrition, as well as those reflective of intermittently missing data. Thus, in the analysis of this data set, models for nonignorable missingness were specified to reflect both patterns. Finally, it is also important to mention that although different models may be considered to represent a missing data process, the fact that an analysis may suggest that the missingness is ignorable does not imply certainty in that conclusion. That is, the models that one uses to represent a nonignorable process may not capture the true underlying process. For this reason, researchers must carefully consider including additional variables that may be correlated with either the missingness or the primary variables of a data model. If these added variables are correlated with the missingness or the variables of the data model, then conditioning on their effects may help to account for the missingness (Little & Rubin, 2002).

This paper focused on some of the major frameworks for analysing longitudinal data that are MNAR. These methods represent different ways in which an analyst can model missingness and its possible impact on inference from the substantive model that is often the primary interest. An important shortcoming from the application of any one framework in which an analyst then chooses to model missingness by a single model is that inference from the substantive model is done under the assumption that the model for missingness generated the missing data. Obviously, this strategy ignores the uncertainty about the true source of the missing data, in part from only having observed data available for analysis, but also from considering only a single model to represent the missingness. It is therefore recommended that data be analysed under multiple models of plausible mechanisms of missingness, as was done in this paper, with an understanding of the likely possibility that no one model of those considered accurately captures the missing data process.

One major strategy for handling missing data that was not considered here is multiple imputation (MI). The central idea of MI is to replace missing values in a data set with a set of multiple plausible values from which inference is drawn about the parameters of the marginal model. The process by which data are imputed is not necessarily dependent on the specification of a missingness process, although some approaches to using MI have included aspects of a pattern-mixture model to define the imputation model (e.g., Little & Yau, 1996; Thijs, Molenberghs, Verbeke, & Curran, 2002). A benefit of MI methods is that they can ease the constraints in how the mechanism for missingness is represented in the imputation model, and importantly, permit the inclusion of variables that predict missingness in the imputation process. This is helpful in situations in which there is no interest in including these particular variables as covariates in a longitudinal model. That is, these auxiliary variables can provide valuable information about the missing data during the imputation process but will not interfere with the goals of modelling the longitudinal outcome.

MI is indeed a Bayesian approach to missing data (Schafer, 1997). MI methods have the desirable aspect of being able to include many auxiliary variables to address sources of nonignorable missingness and are naturally not dependent on model specifications regarding particular missing data processes that are inherent to the frameworks considered in this paper. That said, MI also involves uncertainty in the imputation model itself, and methods have been designed to address this (Hinne et al., 2020; Kaplan & Yavuz, 2020) that might also be applied in accounting for nonignorable missingness in the context of longitudinal data.

Data availability

A dataset and scripts for analyses presented in the study are included as Supplementary Materials.

Notes

SAS System for Windows. Copyright © 2016 by SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.
The deviance information criterion (Spiegelhalter et al., 2002) is calculated using the posterior mean estimates of the model parameters where both fixed and random effects are treated as random variables; this is in contrast to a marginal information criterion that is conditional on the fixed effects alone.

References

Albert, P.S., & Follman, D.A. (2009). Shared-parameter models. In: Fitzmaurice, Verbeke & Molenberghs (ed). Longitudinal data analysis. Chapman & Hall / CRC Press. pp. 433–452.
Chen, F. (2009). Bayesian modeling using the MCMC procedure. SAS Institute Inc.
Chen, F., Brown, G., & Stokes, M. (2016). Fitting your favorite mixed models with PROC MCMC. SAS Institute Inc.
Daniels, M. J., & Hogan, J. W. (2008). Missing data in longitudinal studies: Strategies for Bayesian modelling and sensitivity analysis. Chapman & Hall.
Fitzmaurice, G. M., Davidian, M., Verbeke, G., & Molenberghs, M. (2008). Longitudinal data analysis. Chapman and Hall.
Fitzmaurice, G., Laird, N., & Ware, J. (2011). Applied longitudinal analysis (2nd ed.). John Wiley & Sons, Inc.
Flegal, J. M., Hughes, J., Vats, D., Dai, N., Gupta, K, & Maji, U. (2021). Mcmcse: Monte Carlo standard errors for MCMC. R package version 1.5-0.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., & Smith, A. F. M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85(412), 972–985. https://doi.org/10.2307/2289594
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1(3), 515–534. https://doi.org/10.1214/06-BA117A
Graham, J. W., Taylor, B. J., Olchowski, A. E., & Cumsille, P. E. (2006). Planned missing data designs in psychological research. Psychological Methods, 11(4), 323–343. https://doi.org/10.1037/1082-989X.11.4.323
Guo, W., Ratcliffe, S. J., & Ten Have, T. T. (2004). A random pattern-mixture model for longitudinal data with dropouts. Journal of the American Statistical Association, 99(468), 929–937. https://doi.org/10.1198/016214504000000674
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5(4), 475–492.
Hedeker, D., & Gibbons, R. D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2(1), 64–78 https://psycnet.apa.org/buy/1997-07778-004
Hinne, M., Gronau, Q. F., van den Bergh, D., & Wagenmakers, E.-J. (2020). A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science, 3(2), 200–215.
Hogan, J. W., & Laird, N. M. (1997). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16(3), 239–258. https://doi.org/10.1002/(SICI)1097-0258(19970215)
Kaplan, D., & Yavuz, S. (2020). An approach to addressing multiple imputation model uncertainty using Bayesian model averaging. Multivariate Behavioral Research, 55(4), 553–567. https://doi.org/10.1080/00273171.2019.1657790
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7(1-2), 305–315. https://doi.org/10.1002/sim.4780070131
Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88(421), 125–134. https://doi.org/10.1080/01621459.1993.10594302
Little, R. J. A. (1994). A class of pattern-mixture models for normal incomplete data. Biometrika, 81(3), 471–483.
Little, R. J. A. (1995). Modeling the drop-out mechanism in longitudinal studies. Journal of the American Statistical Association, 90(431), 1112–1121. https://doi.org/10.1080/01621459.1995.10476615
Little, R., & Yau, L. (1996). Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics, 52(4), 1324–1333. https://doi.org/10.2307/2532847
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). John Wiley & Sons, Inc.
Lorr, M., & Klett, C. J. (1966). Inpatient multidimensional psychiatric scale: Manual. Consulting Psychologists Press.
Molenberghs, G., & Kenward, M. G. (2007). Missing data in clinical studies. John Wiley & Sons, Ltd.
National Institute of Mental Health Psychopharmacology Service Center Collaborative Study Group. (1964). Phenothiazine treatment of acute schizophrenia: Effectiveness. Archives of General Psychiatry, 10, 246–261 https://pubmed.ncbi.nlm.nih.gov/14089354/
National Institute of Mental Health Psychopharmacology Research Branch Collaborative Study Group. (1967). Differences in clinical effects in three phenothiazines in acute schizophrenia. Diseases of the Nervous System, 28, 369–383.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. https://doi.org/10.2307/2335739
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman and Hall/CRC.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583–616.
Thijs, H., Molenberghs, G., Michiels, B., Verbeke, G., Curran, D. (2002). Strategies to fit pattern‐mixture models. Biostatistics, 3(2), 245–265. https://doi.org/10.1093/biostatistics/3.2.245
Wu, M. C., & Bailey, K. (1988). Analyzing changes in the presence of informative right censoring caused by death and withdrawal. Statistics in Medicine, 7(1-2), 337–346. https://doi.org/10.1002/sim.4780070134
Wu, M. C., & Bailey, K. R. (1989). Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model. Biometrics, 45(3), 939–955. https://doi.org/10.2307/2531694
Wu, M. C., & Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44(1), 175–188. https://doi.org/10.2307/2531905

Author information

Authors and Affiliations

Department of Psychology, University of California, Davis, Davis, CA, USA
Shelley A. Blozis

Corresponding author

Correspondence to Shelley A. Blozis.

Ethics declarations

Conflicts of interest

The author has no relevant financial or non-financial interests to disclose.

Additional information

Open Practices Statement: The data set used in the examples is available in Fitzmaurice et al. (2011) and Hedeker and Gibbons (1997) and as a supplemental file. Scripts to analyse the empirical data and to generate and analyse the simulated data are available as supplemental files.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1

(CSV 23 kb)

ESM 2

(DOCX 13 kb)

ESM 3

(DOCX 12 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Blozis, S.A. Bayesian pattern-mixture models for dropout and intermittently missing data in longitudinal data analysis. Behav Res 56, 1953–1967 (2024). https://doi.org/10.3758/s13428-023-02128-y

Accepted: 13 April 2023
Published: 23 May 2023
Issue Date: March 2024
DOI: https://doi.org/10.3758/s13428-023-02128-y
