Multilevel data are often found in psychological research. The complex pattern of variability in such data allows the use of statistical models that can accommodate multiple sources of variation. In recent years, multilevel models have become the standard tool for analyzing such data structures (Raudenbush & Bryk, 2002). In the social sciences, missing data represent a pervasive problem that has received considerable attention during the last two decades. There is consensus in the methodological literature that methods such as multiple imputation (MI) are much better suited for treating missing data than traditional approaches such as listwise or pairwise deletion (Little & Rubin, 2002; Schafer & Graham, 2002).

Although several book-length treatises have familiarized applied researchers with modern missing-data methods (Allison, 2001; Enders, 2010; Graham, 2012; van Buuren, 2012), less has been said about how to deal with missing values in multilevel research. Previous studies concerned with missing data in multilevel modeling have consistently found that parameter estimates can be seriously distorted if the multilevel structure is not taken into account in the imputation process (Andridge, 2011; van Buuren, 2011). However, these studies have focused on random-intercept models, which assume that the relations between variables do not vary across groups.

In the present article, we focused on random-slope models. These models are frequently used in organizational and educational research to investigate whether relations at Level 1 (e.g., students, employees) vary across Level 2 units (e.g., classes, working teams), or in longitudinal research to assess different developmental trajectories across subjects. For example, Hochweber, Hosenfeld, and Klieme (2014) investigated the relationship between students’ mathematics achievement and their math grades, and Hussong et al. (2008) examined the effects of parents’ alcohol abuse on the development of children’s internalizing behavior.

Using a multivariate mixed-effects model and the software pan, we explored strategies for dealing with missing data in models with random slopes. In three simulation studies, we considered cases in which the outcome variable, the predictor variable, or both variables contain missing data, as well as different sample properties and patterns of missing data.

Missing data in multilevel research

The multivariate mixed-effects model

A few options for treating missing data in multilevel models are available in standard statistical software. The pan package (Schafer & Zhao, 2014) has been recommended for MI of multilevel data (Enders, 2010; Graham, 2012) and is easily accessible through the statistical software R (R Development Core Team, 2014). The statistical model behind pan, which we will refer to as the “pan model,” is the multivariate mixed-effects model presented by Schafer (1997). The pan model is capable of treating multilevel missing data but may also be used to describe both the imputation and analysis of multilevel data. The model reads

$$ {\mathbf{Y}}_j={\mathbf{X}}_j\boldsymbol{\upbeta} +{\mathbf{Z}}_j{\mathbf{b}}_j+{\mathbf{E}}_j, $$
(1)

where j = 1, …, G denotes groups or other observational units at Level 2. Here, the response matrix Y_j of group j is regressed on a design matrix X_j (containing intercept and predictor values) with associated fixed effects β, and a design matrix Z_j with associated group-specific random effects b_j. The random-effects matrix b_j (with columns stacked) is assumed to follow a normal distribution with mean zero and covariance matrix Ψ (independent and identically distributed [iid] across groups). Each row of the error matrix E_j is assumed to follow a normal distribution with mean zero and covariance matrix Σ (iid across individuals). Note that in the pan model, Σ carries no index j and is thus assumed to be the same for each group.

Suppose that our dataset consists of two variables X and Y, both of which are Level 1 variables that have some variation at Level 2. In a special application of the pan model, we may want to estimate the regression of Y on X with varying coefficients across groups—that is, the random-coefficient (RC) model. This model results if we write the outcome Y (e.g., students’ math grades) on the left-hand side of Eq. 1 and the covariate X (e.g., individual achievement) on the right-hand side, and allow for the intercepts and slopes to vary across groups. We will also call the RC model the “analyst’s model” because it fits our supposed research question. Finally, we can express the parameters of the analyst’s model in a single expression θ = (β, Ψ, Σ) and write f(Y | X, θ_X) in short for the RC model.

Missing data terminology

The common classification of missing-data mechanisms found in Rubin (1987) assumes a hypothetical complete data matrix, which is decomposed into observed and unobserved parts Y = (Y_obs, Y_mis) by an indicator matrix R denoting the missing data. If values are missing as a random sample of the hypothetical complete data—that is, P(R | Y) = P(R)—then the data are missing completely at random (MCAR). If the missingness depends on other variables but the data are MCAR with these partialed out—that is, P(R | Y) = P(R | Y_obs)—then the data are missing at random (MAR). These two missing-data mechanisms are often called “ignorable.” An ignorable missing-data mechanism is highly beneficial for MI, because all of the relevant information about the missing values is present in the dataset. This is in contrast to data that are missing not at random (MNAR), where the missingness additionally depends on the missing part of the data even after conditioning on the observed part—that is, P(R | Y) ≠ P(R | Y_obs). For such “nonignorable” missingness, a general approach to the analysis of missing data is not feasible, and strong assumptions must be made about the missing-data mechanism (Carpenter & Kenward, 2013).
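These mechanisms can be made concrete with a small simulation. The following sketch (in Python for illustration; the study itself uses R, and all names here are ours) generates a covariate X and outcome Y and imposes MCAR and MAR missingness indicators on Y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)                                 # fully observed covariate
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)  # standardized outcome

# MCAR: P(R | Y) = P(R) -- missingness is a coin flip, unrelated to the data
r_mcar = rng.random(n) < 0.25

# MAR: P(R | Y) = P(R | Y_obs) -- missingness depends only on the observed x
# (here via a logistic model; the slope and intercept are arbitrary choices)
r_mar = rng.random(n) < 1 / (1 + np.exp(-(2 * x - 1.1)))

# Under MAR, y is missing more often where x is large, but conditional on x
# the missingness carries no further information about y
rate_low_x = r_mar[x < 0].mean()
rate_high_x = r_mar[x > 0].mean()
```

Conditioning the MAR indicator on a variable that is itself incomplete would instead produce an MNAR mechanism, which is why the observed-data requirement matters in practice.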

Multiple imputation for multilevel models

Multiple imputation, as introduced by Rubin (1987), is a convenient procedure for obtaining valid parameter estimates from partially unobserved data that usually relies on the MAR assumption (i.e., the observed values provide sufficient information about the missing-data mechanism). Using MI, the researcher draws independent random samples from the posterior predictive distribution of the missing values given the observed data and a statistical model, thus generating a number of complete datasets to use in further analyses. The final parameter estimates can be obtained according to the rules described by Rubin (1987) simply by averaging over the parameter estimates from all imputed datasets. Applying MI can be subtle and need not always be the most practical choice in multilevel research (Peters et al., 2012; Twisk, de Boer, de Vente, & Heymans, 2013), because its validity is subject to some further conditions.
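The pooling step can be written in a few lines. The function below (Python for illustration, not from the paper) implements Rubin’s (1987) rules for a scalar parameter, averaging the point estimates and combining within- and between-imputation variability:

```python
import numpy as np

def pool_rubin(estimates, squared_ses):
    """Pool M completed-data analyses (Rubin, 1987).

    Point estimate: the average of the M estimates.
    Total variance: within-imputation variance plus (1 + 1/M) times the
    between-imputation variance, reflecting the extra uncertainty that the
    missing data add."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(squared_ses, dtype=float)
    m = len(q)
    q_bar = q.mean()                  # pooled point estimate
    w = u.mean()                      # within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    t = w + (1 + 1 / m) * b           # total variance
    return q_bar, t

# Three hypothetical completed-data slope estimates and squared SEs
q_bar, t = pool_rubin([0.48, 0.52, 0.50], [0.010, 0.012, 0.011])
```

Averaging alone yields the point estimate mentioned above; the total variance is what makes the pooled standard errors honest about the missing data.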

First, with increasing variation and sample size at Level 2, it becomes necessary to include the multilevel structure in the imputation model. Ignoring the multilevel structure using single-level MI may result in biased parameter estimates (Taljaard, Donner, & Klar, 2008; van Buuren, 2011). Second, the analyst’s model has to be considered, and the imputation model must be specified accordingly (Meng, 1994; Schafer, 2003). Broadly speaking, the imputation model must account for the complexity of the desired analysis. If an imputation model is used that does not include variables or parameters that are relevant to the analyst (e.g., slope variance), then the analysis results will be biased. And third, the imputation model must incorporate relevant information about the missing-data process—that is, variables predictive of the missing variables or of the missingness itself (Carpenter & Kenward, 2013)—to make the MAR assumption more plausible (Collins, Schafer, & Kam, 2001). Satisfying these conditions can be cumbersome when varying slopes are of interest. However, little is known about how the quality of parameter estimates in multilevel modeling is affected if one of these conditions is not met.

Missing covariates in models with random slopes

When only the outcome variable contains missing values, MI for a random-coefficient model is straightforward. The imputation model can be specified in pan by writing the outcome on the left-hand side of Eq. 1 and the covariate with fixed and random effects on the right-hand side. This is the previously mentioned RC model, denoted f(Y | X, θ_X). The imputation model is then equivalent to the analyst’s model.

Fewer guidelines are available if a covariate contains missing values. If the outcome is completely observed, then a reversed imputation model may be used. For this model, the covariate is written on the left-hand side of Eq. 1 and the outcome on the right-hand side (with fixed and random effects). We will refer to this as the “reversed RC model” and denote it f(X | Y, θ_Y). This model assumes slope variation, but does so by regressing X on Y, which might induce bias in the parameter estimation. So far, pan has been recommended only for missing covariates whose effect is fixed across groups (Schafer, 1997). Alternatively, for a multivariate imputation model, denoted f(X, Y | θ_0), both variables could be written on the left-hand side of Eq. 1 with random intercepts for both variables. Slope variation is ignored in this model, but in contrast to the conditional models (i.e., the reversed and regular RC models), it is able to account for multivariate patterns of missing data. A further description of these models can be found in Supplement A of the online supplemental materials, which can be downloaded from http://dx.doi.org/10.6084/m9.figshare.1206375.

In three simulation studies, we assessed the performance of conditional and multivariate MI for random-slope models. In Study 1, Study 2, and Study 3 we examined cases in which missing values occurred on the outcome, the covariate, or both variables, respectively. Study 1 attempted to replicate the findings of previous research on partially observed outcome variables. We expected both conditional MI and listwise deletion (LD) to provide approximately unbiased estimates if the outcome was MAR (Carpenter & Kenward, 2013; Little & Rubin, 2002). Study 2 focused on missing covariate data. We expected that the reversed model would recover most parameters of the RC model, but that it might perform poorly for the slope variance. LD was expected to provide biased estimates with MAR and MNAR data. In Study 3 we examined multivariate missing data. We expected that multivariate MI would underestimate the slope variance but would recover most other parameters. We expected the results for LD to be similar to the results from the second study.

Study 1

In the first study, we compared the performance of LD, conditional MI, and multivariate MI when only the outcome had missing values. For conditional MI, both the analyst’s model f(Y | X, θ_X) and the imputation model g(Y | X, ω_X) were RC models, in which ω_X played the same role as θ_X but denoted a distinct set of model parameters. These models were equally complex and fit the clustered structure of the data. Multivariate MI was set up as described earlier, and LD was applied by restricting the analysis to complete cases only.

Simulation and method

Data generation and imposition of missing values

Two standardized normal variables X and Y were simulated. Both varied at two levels, as indicated by their intraclass correlations (ICCs) ρ_X and ρ_Y, respectively. The covariate X was simulated from its within- and between-group portions X_W ∼ N(0, 1 − ρ_X) and X_B ∼ N(0, ρ_X), respectively. Then Y was simulated conditionally on X according to Eq. 1 with fixed effects β = (β0, β1), where β0 was zero due to standardization. The covariance matrix of the random effects was \( \boldsymbol{\Psi} = \left(\begin{array}{cc} \psi_{11}^2 & 0 \\ 0 & \psi_{22}^2 \end{array}\right) \); thus, the intercepts and slopes were uncorrelated. The Level 1 residual variance was Σ = σ². The variables in this study were parameterized by their ICCs rather than their actual variance components. Given the ICC and a slope variance ψ_22², the other variance components followed (see Snijders & Bosker, 2012) as

$$ \begin{aligned} \sigma^2 &= \left(1-\rho_Y\right)-\beta_1^2\left(1-\rho_X\right)-\psi_{22}^2\left(1-\rho_X\right) \\ \psi_{11}^2 &= \rho_Y-\beta_1^2\,\rho_X-\psi_{22}^2\,\rho_X. \end{aligned} $$
(2)
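Eq. 2 can be checked numerically: with standardized variables, the implied components must reconstruct a total variance of 1 for Y. A minimal sketch (Python for illustration; the function name is ours):

```python
def variance_components(rho_x, rho_y, beta1, psi22sq):
    """Solve Eq. 2: the Level 1 residual variance and the intercept variance
    implied by the ICCs, the fixed slope, and the slope variance."""
    sigma2 = (1 - rho_y) - beta1**2 * (1 - rho_x) - psi22sq * (1 - rho_x)
    psi11sq = rho_y - beta1**2 * rho_x - psi22sq * rho_x
    return sigma2, psi11sq

# Example: ICCs of .05, fixed slope .5, slope variance .01 (values from Table 1)
sigma2, psi11sq = variance_components(0.05, 0.05, 0.5, 0.01)

# With Var(X) = 1 and uncorrelated random effects, the total variance of Y is
# beta1^2 * Var(X) + psi22^2 * Var(X) + psi11^2 + sigma^2, which must equal 1
total_var_y = 0.5**2 + 0.01 + psi11sq + sigma2
```

The decomposition simply splits the unit variance of Y into the parts explained by the fixed slope, the random slope, the random intercept, and the Level 1 residual, separately at the two levels.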

Missing values on Y were imposed using a linear model for the latent response variable R*. Values of Y were set to be missing if their respective R* > 0, according to

$$ R^{\ast} = \alpha + \lambda_1 X + \lambda_2 Y + \varepsilon_{R^{\ast}}, $$
(3)

where α is the standard normal quantile corresponding to the desired missing-data probability (e.g., α = −.67 for 25 % missing data), and λ1 and λ2 are used to control the missing-data mechanism. The residuals were normally distributed with mean zero and variance

$$ \sigma_{R^{\ast}}^2 = 1 - \lambda_1^2 - \lambda_2^2 - 2\,\lambda_1\lambda_2\,\mathrm{Cov}\left(X,Y\right). $$
(4)

Table 1 provides an overview of the conditions included in all three studies. The two ICCs were set to be equal—that is, ρ_X = ρ_Y = ρ. In order for Y to be MCAR, we set λ1 = λ2 = 0, and for MAR, we set λ1 = 0.5 and λ2 = 0. For Y to be MNAR, we chose equal values for λ1 and λ2 such that the error variance in R* was the same as in the MAR condition. Hence, with Cov(X, Y) = β1 = 0.5, solving 1 − 3λ² = .75 gave λ1 = λ2 = √(0.25/3) ≈ .289. The conditions were chosen to mimic typical data in psychology and the behavioral sciences (Aguinis, Gottfredson, & Culpepper, 2013; Mathieu, Aguinis, Culpepper, & Chen, 2012; Murray & Blitstein, 2003).
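The imposition step of Eqs. 3 and 4 can be sketched as follows (Python for illustration; the study itself was run in R, and the function name is ours). With standardized X and Y, the residual variance of Eq. 4 scales R* to unit variance, so the expected missing-data rate is Φ(α):

```python
import numpy as np

def impose_missing(x, y, alpha, lam1, lam2, cov_xy, rng):
    """Latent-response model of Eq. 3: y_i is deleted whenever R*_i > 0.
    Eq. 4 supplies the residual variance that standardizes R*."""
    resid_var = 1 - lam1**2 - lam2**2 - 2 * lam1 * lam2 * cov_xy
    r_star = (alpha + lam1 * x + lam2 * y
              + rng.normal(scale=np.sqrt(resid_var), size=len(y)))
    y_out = y.copy()
    y_out[r_star > 0] = np.nan
    return y_out

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)

# MAR condition of Study 1: alpha = -.67 targets roughly 25 % missing on y
y_mar = impose_missing(x, y, alpha=-0.67, lam1=0.5, lam2=0.0, cov_xy=0.5, rng=rng)
```

Setting λ2 = 0 makes the mechanism MAR (missingness depends only on the observed x); a nonzero λ2 would make it MNAR.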

Table 1 Simulation designs of Studies 1–3

In summary, each simulated setting was defined by the number of groups (G), the number of individuals within each group (N), the ICCs of X and Y (ρ), the fixed slope (β1), the slope variance (ψ_22²), the proportion of missing data, and the missing-data mechanism (including the missing-data effects λ1 and λ2). Each setting was replicated 1,000 times.

Imputation and data analysis

The R package pan was used to impute missing values (Schafer & Zhao, 2014). We let pan perform 10,000 burn-in cycles before drawing one imputed dataset for every 200 cycles, leading to M = 50 imputed datasets and 20,000 cycles in total (see Graham, Olchowski, & Gilreath, 2007). Diagnostic plots regarding the convergence behavior of pan’s Gibbs sampler are presented in Supplement B in the online supplemental materials.

Least-informative inverse-Wishart priors for Σ and Ψ were chosen, with Σ ∼ W⁻¹(I_1, 1) and Ψ ∼ W⁻¹(I_2, 2) for conditional MI, and Σ ∼ W⁻¹(I_2, 2) and Ψ ∼ W⁻¹(I_2, 2) for multivariate MI, where I_n denotes the identity matrix of size n. We fit the analyst’s model to each imputed dataset using the R package lme4 (Bates, Maechler, Bolker, & Walker, 2013). The final parameter estimates were obtained according to Rubin’s (1987) rules. We note that choosing least-informative priors implies a prior expectation of variances of .50, which might induce bias in small variance components. However, because noninformative priors are often desirable for MI, the same priors were used throughout the three studies. Possible alternative specifications of the prior distribution will be reviewed in the General Discussion. The computer code for running conditional and multivariate MI, with least-informative or alternative priors, is provided in Supplement C of the supplemental online materials.

Bias and the root-mean-square error (RMSE) were calculated for each condition and each parameter. The bias is the mean difference between a parameter estimate \( \hat{\theta} \) and its true value θ and is crucial for statistical reasoning in general. The RMSE is the root of the mean squared difference between \( \hat{\theta} \) and θ and represents both the accuracy and precision (i.e., the variability) of an estimator. Thus, it is an important measure of practical utility.
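Both criteria are simple to compute from the replicated estimates. A minimal sketch (Python for illustration):

```python
import numpy as np

def bias(estimates, theta):
    """Mean difference between the replicated estimates and the true value."""
    return float(np.mean(estimates) - theta)

def rmse(estimates, theta):
    """Root of the mean squared difference; penalizes both systematic bias
    and sampling variability, so it reflects practical utility."""
    e = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((e - theta) ** 2)))

# An unbiased but variable estimator: zero bias, nonzero RMSE
estimates = [0.40, 0.60, 0.45, 0.55]
```

The example illustrates why both measures are reported: an estimator can be unbiased on average yet still imprecise, and only the RMSE reflects that.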

Results and discussion

Due to the large simulation design, only the most important findings will be reported. Furthermore, only results for 25 % missing data will be reported, because higher rates did not yield qualitatively different results. The complete results for Study 1 are given in Supplement D in the online supplemental materials. Table 2 shows the results of the first study for samples that featured small variance components (i.e., ICC = .05, ψ_22² = .01) for MCAR and MAR data in smaller (N = 10, G = 50) and larger (N = 30, G = 150) samples. Notable values for bias and RMSE are presented in bold. Each bias presented in bold is at least ±5 % off the true value for fixed effects, and ±30 % off for variance components. For parameters whose true value was zero, a threshold of ±.05 was used. For each simulated condition, the highest RMSE is printed in bold, provided that it was markedly larger than that found for the complete datasets (at least twice as large).

Table 2 Study 1: Bias and root-mean squared error (RMSE) for estimates obtained from listwise deletion (LD) and multiple imputation, given small variance components, smaller or larger samples, and missing Y

As can be seen in Table 2, neither LD nor MI produced strongly biased results, but bias emerged under specific conditions for both MI procedures. The multivariate imputation model underestimated the slope variance by as much as 50 % unless it was essentially zero (i.e., .01), but overestimated the intercept variance. Conditional MI (using the RC model) overestimated both the intercept and slope variance (Table 2, top panel). A sufficient sample size reduced bias to acceptable proportions even for the smallest variance components (Table 2, bottom panel). For larger values of the ICC (i.e., .15 and .25) and the slope variance (i.e., .05, .10, and .20), this bias was reduced to essentially zero (see Supplement D). Using LD, the intercept and slope variance were sometimes biased when samples were not sufficiently large. When the data were MNAR, all approaches yielded biased results (see Supplement D).

LD has previously been shown to provide essentially unbiased estimates when the outcome is ignorably missing (e.g., Little & Rubin, 2002). Surprisingly, the imputation models overestimated small random-effects variances in small samples. We argue that this is a side effect of the least-informative prior, which expects variances to be larger, and that bias may be reduced to zero when the prior is set on an appropriate scale (see the General Discussion). From the data at hand, both LD and conditional MI can be recommended for univariate missing data on Y, provided that the sample is sufficiently large or the prior is set on an appropriate scale. Care should be taken when small variance components are to be estimated, because overly noninformative priors may inflate them. The multivariate model is useful if the slope variance is close to zero.

Study 2

In the second study, we examined the performance of MI and LD with missing values on the covariate X. The analyst’s model was again the RC model f(Y | X, θ_X), whereas conditional MI was carried out using the reversed RC model g(X | Y, ω_Y). The two models fit the clustered structure of the data, but differed in the ways that the slope variability was attributed. Multivariate MI and LD were administered as before.

Simulation and methods

The same procedures applied in Study 1 were used to simulate the data and impose missing values on the covariate X, except that MAR missingness now depended on the outcome Y. Imputations were created by pan using the least-informative priors chosen in Study 1. The analyst’s model was fit using lme4, and the bias and RMSE were calculated for each parameter in each setting.

Results

The results of Study 2 are reported in full in Supplement D. Here, we will report the most important findings. Table 3 provides a brief overview of the results for samples that featured small variance components. Estimating the fixed effects of the RC model proved to be more accurate and efficient using MI. Specific difficulties emerged again for small variance components—that is, when samples featured small ICCs or little slope variation. In contrast to the case of missing data on Y, however, estimates of larger slope variances were not necessarily unbiased.

Table 3 Study 2: Bias and root-mean squared error (RMSE) for estimates obtained from listwise deletion (LD) and multiple imputation, given small variance components, smaller or larger samples, and missing X

Fixed effects

As is shown in Table 3, LD led to biased estimates for the fixed effects unless the data were MCAR (see Supplement D). Bias for the fixed intercept varied between −.098 and −.161 with MAR data, and between −.055 and −.101 with MNAR data. The fixed slope was underestimated by approximately 6 %–10 % when the data were not MCAR. The results from MI were essentially unbiased, but the reversed model exhibited a small downward bias across conditions. The RMSE suggested that the estimates obtained from MI were at least as efficient as those obtained by LD across conditions, and more efficient when the data were not MCAR.

Interestingly, the biases from both LD and conditional MI were dependent on the amount of slope variation present in the dataset. As slope variation increased, bias became weaker with LD, and stronger with conditional MI. This result is illustrated in Fig. 1 for small samples (N = 10, G = 150), moderate ICCs (i.e., .15), and MAR data. Nonetheless, the estimates obtained from MI were more accurate and efficient across all conditions.

Fig. 1
figure 1

Bias in estimating the fixed slopes for univariate missing data on X (Study 2) for different missing-data mechanisms, different missing-data techniques, and different amounts of slope variance. ψ_22² = true slope variance; MCAR = missing completely at random; MAR = missing at random; MNAR = missing not at random; CD = complete data; LD = listwise deletion; MV = multivariate imputation; RC = conditional imputation using the reversed RC model

Variance and covariance of random effects

Conditional and multivariate MI underestimated the intercept variance when the ICCs were small, but provided unbiased estimates otherwise. LD followed the same pattern for MCAR data, but otherwise underestimated the intercept variance. This bias was strongest in the MAR condition, weaker with MNAR data, and increased as the ICCs grew larger. Figure 2 (top row) illustrates this finding for different levels of ICC.

Fig. 2
figure 2

Bias in estimating the intercept (top row) and slope (bottom row) variance for univariate missing data on X (Study 2) for different values of the true intraclass correlation (ICC) or slope variance (ψ_22²), respectively, and for different missing-data mechanisms and missing-data techniques. MCAR = missing completely at random; MAR = missing at random; MNAR = missing not at random; CD = complete data; LD = listwise deletion; MV = multivariate imputation; RC = conditional imputation using the reversed RC model

The results for the slope variance differed from those from Study 1. Although conditional MI again overestimated small amounts of slope variation, this bias was much weaker and practically disappeared in larger samples (see Table 3). Moderate slope variation could be estimated almost without bias. In contrast to Study 1, however, large and very large slope variances were not estimated correctly by conditional MI, but increasingly suffered from a downward bias. LD provided practically unbiased estimates of the slope variance if the sample size was sufficiently large. The positive bias for conditional MI was also present with MNAR data, whereas the negative bias was smaller. Figure 2 (bottom row) illustrates these findings for different levels of slope variation.

According to the RMSE, the intercept variance could occasionally be estimated more efficiently using MI, whereas the slope variance could be estimated more accurately using LD. However, these differences were usually very small. Supplement D even suggests that conditional MI occasionally estimated the slope variance more efficiently in small samples.

Other parameters

The covariance between random intercepts and slopes was recovered well across all conditions. The Level 1 residual variance was overestimated using MI (with conditional MI being less biased), but it was underestimated by LD when the data were not MCAR. For higher amounts of slope variation, the bias associated with LD became smaller, whereas the bias associated with MI grew. These patterns were observed with both MAR and MNAR data, but the bias was relatively small.

Discussion

Regarding most parameters of the analyst’s model, better estimates could be obtained using the reversed MI procedure, especially when the covariate X was not MCAR. This was true for the fixed regression coefficients, but also applied to the intercept variance, and even transferred to MNAR data. However, reversed MI seemed to provide unstable estimates of the slope variance, which could be positively or negatively biased. The positive bias for small slope variances became essentially zero as the samples grew larger. For larger slope variances, the bias did not approach zero (as in Study 1), but turned negative regardless of sample size. The negative bias was, however, rather small and could be viewed as negligible, considering that it only occurred for large slope variances, which are rarely found in empirical studies. Furthermore, the overall precision of the estimates, as indicated by the RMSE, was often comparable to LD, because the data were handled more efficiently using MI. The reversed model seemed to share many but not all of the desirable properties of the regular RC model.

The multivariate imputation model is applicable if little slope variation is present in the data, but it will suppress even moderate amounts of slope variation and inflate the Level 1 residual variance. Estimates of the fixed slope obtained from multivariate MI were even less biased and more efficient than those from the reversed MI procedure. LD offered little benefit, since most of its parameter estimates were biased unless the data were MCAR. However, LD provided surprisingly accurate results for the slope variance. Small variance components were again positively biased, but less so than in the previous study. We will return to this point in the General Discussion.

Study 3

In the final study, we examined the performance of MI and LD with multivariate missing data. The analyst’s model was once again the RC model f(Y | X, θ_X), but only the multivariate imputation model g(X, Y | ω_0) could be applied. This imputation model ignores slope variability but may provide reasonable results for the remaining parameters of the analyst’s model.

Simulation and methods

The same procedures used in the previous studies were applied for most tasks. Because the pattern of missing data was no longer univariate, the missing-data model had to be adjusted. We excluded unit nonresponse from our considerations; thus, every participant was expected to have at least one observation on either X or Y. This allowed us to implement the same mechanisms as before (i.e., MCAR, MAR, and MNAR) for both X and Y. For each case, a coin toss decided whether X or Y could be missing (i.e., each was equally likely). The actual missing values were then imposed on either X or Y with the probability given in the simulation design. Thus, the number of missing values in each dataset was the same in all three studies.
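The pattern-assignment step can be sketched as follows (Python for illustration; this shows only the coin-toss logic described above, with the MCAR/MAR/MNAR mechanisms then applied within each pattern, and all names are ours). By construction, no case can lose both variables:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Fair coin per case: True means x is the variable eligible to be missing
x_eligible = rng.random(n) < 0.5

# Impose missingness with the design probability (here 25 %) on the
# eligible variable only, so each variable loses about 12.5 % of its values
miss = rng.random(n) < 0.25
x_missing = x_eligible & miss
y_missing = ~x_eligible & miss
```

Because each case contributes at most one missing value at the overall design rate, the total number of missing values matches the univariate designs of Studies 1 and 2.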

Results and discussion

The results of the third study provided little further insight into the performance of LD and multivariate MI, because the bias and RMSE were usually halfway between those reported in Studies 1 and 2. The results for small variance components are presented in Table 4. The complete results are available in Supplement D. Multivariate MI provided approximately unbiased estimates of all parameters, as long as the slope variance was close to zero and the values were either MCAR or MAR. The slope variance was underestimated by as much as 40 %, especially in larger samples, where more values were imputed under false assumptions. When the data were MNAR, multivariate MI underestimated the fixed regression coefficient, but the bias was relatively small as compared with the true values. The estimates obtained from LD were approximately unbiased when the data were MCAR. When the data were MAR or MNAR, the fixed effects were biased downward and were estimated less efficiently than with multivariate MI, although higher values for the ICC and slope variance reduced this bias with LD (see Supplement D).

Table 4 Study 3: Bias and root-mean squared error (RMSE) for estimates obtained from listwise deletion (LD) and multiple imputation, given small variance components, smaller or larger samples, and missing X and Y

The results of the third study suggest that MI is necessary for the proper estimation of fixed regression coefficients. Unfortunately, pan’s multivariate imputation model could not preserve the slope variance. If the slope variance was small and the number of missing values was not very high, then the bias was relatively small in absolute size. Limiting the analysis to complete cases distorted most parameter estimates but provided reasonable estimates of the slope variance.

General discussion

We investigated the performance of conditional and multivariate MI for univariate and multivariate patterns of missing data. Both conditional MI and LD provided unbiased estimates if only the outcome was missing. Care should be taken if covariates are partially unobserved. Imputing the covariate in a reversed manner accounted for, but also misspecified, the slope variation. Only vague estimates could be obtained for the slope variance, but the bias was not extreme, and the remaining estimates exhibited either no or less bias than would have been obtained by deleting cases. The multivariate imputation model rarely induced any bias but strongly underestimated the slope variance. Thus, it is appropriate only if the true slope variance is close to zero and not too many values are unobserved. We recommend that LD be avoided when covariate data are missing unless the data are strictly MCAR.

As is true for all computer simulations, our study was limited in several ways. The missing-data mechanisms were based on linear models and may behave quite differently in nature. Other implementations are possible, and results may vary especially for MAR and MNAR data (Allison, 2000; Galati & Seaton, 2013). We focused on descriptive measures of approximate performance but ignored statistical inference. Testing for slope variation (LaHuis & Avis, 2007) as well as for the Type I and Type II error rates associated with LD and MI should be a subject of future research. Rather than estimating the slope variance, researchers often wish to explain it using predictor variables at Level 2 (Aguinis et al., 2013; Mathieu et al., 2012). Cross-level interaction effects might be relatively easy to recover, even if the slope variance is not.

Interestingly, small variance components were positively biased across the three studies. We argue that this was due to the standard least-informative prior, which induces bias in small variance components. Ad hoc procedures might combine the specific advantages of LD and MI and lead to less biased and more stable estimates. For example, choosing D⁻¹ = 2 · Ψ_LD as the scale matrix of the inverse-Wishart prior for the covariance matrix of the random effects, where Ψ_LD is an estimate of this covariance matrix obtained from LD, would loosely center the prior distribution around appropriate values. The computer code for this specification is provided in Supplement C of the supplemental online materials. We conducted a small simulation to examine whether the bias for the intercept and slope variance could be reduced by rescaling the prior distribution in this manner. The simulation featured small samples, univariate MAR data for either X or Y, small values for the ICCs, as well as small and very large values for the slope variance. Estimates of small variance components that utilized the adjusted prior did not exhibit any more bias than LD did, and they were often more efficient. The positive bias reported in Studies 1 and 2 could therefore be viewed as an artifact of specifying the least-informative prior. The negative bias for large slope variances in Study 2, however, could not be improved in this manner. Using least-squares or maximum likelihood estimation might further strengthen this approach.

The methodological literature offers alternatives to pan for multilevel MI. It has been suggested that multilevel data be imputed using dummy variables in random-intercept models, but that imputations should be conducted separately for each group if random slopes are involved (Graham, 2009, 2012). However, Andridge (2011) found that the first approach leads to biased results, and unreported simulation results indicated that very large samples are needed to treat even small amounts of missing data with the second approach. Alternative MI procedures include fully conditional specification using chained equations (van Buuren & Groothuis-Oudshoorn, 2011). These procedures might lead to better results, but they may face similar problems with respect to the slope variance. However, recent developments in the context of substantive model-compatible MI have offered promising results for interaction effects and nonlinear terms among covariates that have missing values (Bartlett, Seaman, White, & Carpenter, 2014; von Hippel, 2009). Extending this approach to multilevel MI (Goldstein, Carpenter, & Browne, 2014; Goldstein, Carpenter, Kenward, & Levin, 2009) and applying it to random-slope models should be the subject of future research. Adaptations of the pan model have been proposed by Shin and Raudenbush (2010) and Yucel (2011). The latter approach specifies a joint model that allows the within-group covariance matrix to vary across groups and has recently been discussed by Carpenter and Kenward (2013). However, it is currently not available in standard software and has yet to be evaluated in a systematic manner.

In general, we believe that MI is a flexible and powerful tool that can be used to treat missing data in multilevel research. More research should be conducted to generalize the current formulations of MI and to evaluate recent developments as well as sensible ad hoc solutions to missing data in multilevel models with random slopes.