The popularity of linear mixed effects models is reflected in the variety of names under which this family of statistical models for clustered data is known (e.g., random effects models, multilevel models). In clustered data (e.g., hierarchical data, multilevel data), observations are associated and not independently observed (Dorman 2008). Dependencies among observations in clusters can be expressed as a correlation, where positively correlated observations share similar information (Kenny and Judd 1986) and are not as informative as independent observations (Galbraith et al. 2010). Correct statistical inferences can only be made when the correlation between clustered observations is accounted for. In the well-known multilevel modelling framework (McCulloch et al. 2008), this correlation is modelled by a random effect –also known as a latent variable– and clustered observations are positively correlated because they share the same random effect. The variance of the random effect then determines the strength of the correlation. As a result, the multilevel modelling framework restricts correlations to be positive, since a variance parameter cannot be negative.
However, negative correlations among clustered observations can and do occur (Kenny et al. 2002). For instance, they arise when fixed resources are divided among group members, when dissimilar groups are sampled by intention (non-random sampling), or under competitive social interaction: individuals compete for a scarce and fixed set of resources (e.g., litter mates compete for food, water, and living space), or the speaking time of one individual comes at the expense of another (Pryseley et al. 2011). When observations within clusters are negatively associated, they are less alike than observations from different clusters (Kenny and Judd 1986). From a sampling perspective, this is sometimes referred to as the situation where observations within a cluster are even less alike than under random assignment of observations to clusters (Molenberghs and Verbeke 2007; Verbeke and Molenberghs 2003; Molenberghs and Verbeke 2011). Negative intra-cluster correlations (ICC; see Table 1 for an overview of the frequently used abbreviations) can also be detected in randomized experiments, when evaluating the effects of covariates that vary systematically within each cluster (Norton et al. 1996).
Table 1 List of frequently used abbreviations in the current article

In general, it is well known that ignoring even a small positive clustering effect leads to the incorrect assumption that the observations are independently distributed. It is our aim to extend this knowledge with the current article: we will show that ignoring clustering effects, whether positive or negative, leads to a violation of the independence assumption. Any violation of the independence assumption (positive or negative) results in inaccurate Type-I error rates and thereby in incorrect statistical conclusions (Clarke 2008). Barcikowski (1981) quantified the effects of ignoring small positive correlations among clustered observations in a two-level study design (with a group and an individual level). He showed that, with ten observations per group, ignoring an ICC as small as .01 already inflates the Type-I error: a regression effect tested at a nominal significance level of 5% has an actual Type-I error rate of 6%. Furthermore, Barcikowski showed that the Type-I error increases with increasing values of the ICC: for an ICC of .05 the Type-I error rate is .11, and for an ICC of .40 it is .46. Moreover, increasing the number of observations per group inflates the Type-I error even further (the findings of Barcikowski are in line with those of many others, see for example Clarke 2008; Dorman 2008; Rosner and Grove 1999).
As negative clustering effects are largely unknown to the vast majority of the research community, we conducted a simulation study to detail the bias that occurs when analysing negative clustering effects with the linear mixed effects model, in a similar fashion to Barcikowski (1981). To that end, we demonstrate that ignoring a small negative correlation leads to deflated Type-I errors and to invalid standard errors and confidence intervals in regression analysis. We highlight the importance of understanding these phenomena through an analysis of the data from Lamers et al. (2015). We conclude with an updated reflection on well-known multilevel modelling rules. In the remainder of this section, we discuss negative dependencies between observations in clustered data, show how the linear mixed effects model (LME) deals with negative clustering effects, and reflect on why the LME should include negative variance components. Note that the LME is used only to quantify the bias that arises when negative clustering effects are ignored. We stress that the LME is not designed to model negatively correlated observations and should not be expected to perform properly in those cases. The covariance structure model (CSM) is introduced to deal with negative within-cluster correlations, and this model is used in our real-data example to examine negative clustering effects.
Type-I errors and positive and negative dependencies between observations
The inflation of the Type-I error under violation of the independence assumption can be explained by the variance inflation factor (VIF), also referred to as the design effect (Kish 1965). In the case of cluster sampling, a design effect greater than one indicates a positive within-cluster correlation: observations are not independent of each other. When \(\text {VIF} > 1\), the precision of estimates from a cluster sample is lower than that of estimates based on a simple random sample of the same size. The homogeneity of clustered observations leads to less information in comparison to an independent random sample. When ignoring a small positive ICC, the VIF is underestimated, which leads to an underestimation of the standard errors (i.e. overestimation of the precision); the corresponding confidence intervals (CIs) are then too narrow, and effect sizes will also be incorrect, as they depend on standard error (SE) estimates (Hox et al. 2010; Kenny et al. 2002). When the CI of an estimate is too narrow, the probability of rejecting a correct null hypothesis increases, which corresponds to an inflation of the Type-I error.
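The mechanism can be made concrete with a small numerical illustration. The following Python sketch (the function name and the example values are ours, for illustration only) approximates the actual Type-I error rate of a nominal two-sided 5% z-test when the design effect \(1 + (n-1)\rho\) of a cluster sample is ignored; for small positive ICCs it gives rates close to those reported above, and for a negative ICC it shows the opposite, deflated, behaviour.

```python
from scipy.stats import norm

def actual_type1_error(icc, n, alpha=0.05):
    """Approximate actual Type-I error of a nominal two-sided z-test when the
    design effect (VIF) of a cluster sample with n observations per cluster
    and intra-cluster correlation `icc` is ignored."""
    vif = 1.0 + (n - 1) * icc                 # design effect
    z_nominal = norm.ppf(1.0 - alpha / 2.0)   # naive critical value
    # The naive SE is off by a factor sqrt(vif); rescale the critical value.
    return 2.0 * (1.0 - norm.cdf(z_nominal / vif ** 0.5))

# Ten observations per cluster, as in the example above.
for icc in [0.01, 0.05, -0.05]:
    print(f"ICC = {icc:+.2f}: actual alpha = {actual_type1_error(icc, n=10):.3f}")
```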
Although a few researchers have reported on negative clustering effects (Kenny et al. 2002; Molenberghs and Verbeke 2007, 2011; Pryseley et al. 2011; Oliveira et al. 2017; Verbeke and Molenberghs 2003; El Leithy et al. 2016; Klotzke and Fox 2019a, b; Loeys and Molenberghs 2013), the effects of ignoring negatively clustered observations have hardly been recognized. Because negative clustering effects are not considered by the majority of the multilevel modelling community, these effects are not well understood. This is partly caused by the fact that mixed effects models (to which we refer as ‘LME’, see Table 1) can only describe positive correlations and cannot handle negative correlations among clustered observations (Searle et al. 1992). In the next section, it is explained why negative clustering effects cannot be modelled with the LME, and we reflect on the key principles of negative ICCs.
The LME cannot identify any negative correlation and will assume independently distributed observations. Researchers usually fix negative ICC estimates to zero and ignore any negative correlation within a cluster (Baldwin et al. 2008; Maas and Hox 2005). Furthermore, it is sometimes concluded that negative ICC estimates are caused by a small between-cluster variance (smaller than the within-cluster variance) and that such a small between-group variance can be ignored (Giberson et al. 2005; Krannitz et al. 2015; Langfred 2007). Other researchers relate negative ICC estimates to sampling error (cf. Eldridge et al. 2009), which can be ignored. Others –such as Baldwin et al. (2008), Norton et al. (1996), and Rosner and Grove (1999)– stated that the Type-I error will be deflated when fixing a negative ICC to zero.
The linear mixed effects model and negative dependencies
In this study, we consider two models: the LME and a covariance structure model (CSM, see Table 1). Both models can handle clustered data, where a one-way classification structure is considered. In the one-way classification, a common correlation is assumed among clustered observations, and observations from different clusters are assumed to be independently distributed.
The linear mixed effects model
Without making an explicit distinction between a random variable and a realized value, the LME for the one-way classification is given by
$$\begin{aligned} y_{ij} = \beta _{0} + \beta _{1}X_{ij} + u_{j} + e_{ij}, \end{aligned}$$
(1)
referred to as the random intercept model, where the random effect is assumed to be normally distributed, \(u_{j} \sim {\mathcal {N}}(0,\tau )\), and the error term is also assumed to follow a normal distribution, \(e_{ij} \sim {\mathcal {N}}(0,\sigma ^2)\). A total of \(j = 1,\ldots ,m\) clusters are assumed, each with \(i = 1,\ldots ,n\) observations, which leads to a balanced study design. The common intercept and regression parameter are referred to as \(\beta _0\) and \(\beta _1\), respectively. The outcome \(y_{ij}\) is assumed to be independently distributed given the random effect \(u_j\).
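As an illustration of Equation (1), the following minimal Python sketch generates balanced clustered data from the random intercept model; the parameter values are arbitrary and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 200, 10             # m clusters with n observations each (balanced design)
beta0, beta1 = 0.0, 0.5    # common intercept and regression parameter
tau, sigma2 = 0.3, 1.0     # random intercept variance and error variance

x = rng.normal(size=(m, n))                        # covariate X_ij
u = rng.normal(0.0, np.sqrt(tau), size=(m, 1))     # random effect u_j ~ N(0, tau)
e = rng.normal(0.0, np.sqrt(sigma2), size=(m, n))  # error term e_ij ~ N(0, sigma^2)
y = beta0 + beta1 * x + u + e                      # outcome y_ij as in Equation (1)
```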
It can be shown that the random effect \(u_{j}\) defines a variance-covariance structure for the data. The covariance between two clustered observations is equal to (suppressing the conditioning on \(\mathbf {X}_{j}\))
$$\begin{aligned} cov\left( y_{ij},y_{lj} \right) &= cov\left( E\left( y_{ij}\mid u_j\right) ,E \left( y_{lj} \mid u_j \right) \right) + E\left( cov\left( y_{ij},y_{lj} \mid u_j \right) \right) \\ &= cov\left( \beta _{0} + \beta _{1}X_{ij} + u_{j},\beta _{0} +\beta _{1}X_{lj} + u_{j}\right) + 0 \\ &= cov\left( u_{j}, u_{j}\right) = var\left( u_j \right) = \tau , \end{aligned}$$
(2)
and the variance of an observation equals
$$\begin{aligned} var\left( y_{ij} \right) &= var\left( E\left( y_{ij}\mid u_{j}\right) \right) + E\left( var\left( y_{ij} \mid u_j \right) \right) \\ &= \sigma ^2 + \tau . \end{aligned}$$
(3)
The dependence structure of the observations in the clusters \(\mathbf {y}_j\) modelled by the random effect \(u_j\) is given by
$$\begin{aligned} var\left( \mathbf {y}_{j}\right) = \mathbf {\omega } = \begin{bmatrix} \sigma ^{2} + \tau & \tau & \dots & \tau \\ \tau & \sigma ^{2} + \tau & \dots & \tau \\ \vdots & \vdots & \ddots & \vdots \\ \tau & \tau & \dots & \sigma ^{2} + \tau \end{bmatrix}. \end{aligned}$$
(4)
Thus, \(\mathbf {\omega }=\sigma ^2 \mathbf {I}_n +\tau \mathbf {J}_n\) represents the dependence structure implied by the random effect \(u_j\), where \(\mathbf {J}_n\) is the \(n \times n\) matrix with all elements equal to one and \(\mathbf {I}_n\) is the \(n \times n\) identity matrix.
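The implied structure in Equations (2)–(4) can be checked empirically: for data generated from the random intercept model, the within-cluster covariances should be approximately \(\tau\) and the variances approximately \(\sigma ^2 + \tau\). A self-contained sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, tau, sigma2 = 100_000, 5, 0.3, 1.0

u = rng.normal(0.0, np.sqrt(tau), size=(m, 1))       # random effects
e = rng.normal(0.0, np.sqrt(sigma2), size=(m, n))    # error terms
y = u + e                     # fixed part omitted; it does not affect (2)-(4)

empirical = np.cov(y, rowvar=False)                   # n x n covariance over clusters
implied = sigma2 * np.eye(n) + tau * np.ones((n, n))  # omega = sigma^2 I_n + tau J_n

print(np.round(empirical, 2))   # off-diagonals near tau, diagonals near sigma^2 + tau
print(np.round(implied, 2))
```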
The covariance structure model
An alternative specification of the LME in Equation (1) can be given, in which the covariance structure is modelled directly rather than indirectly through the specification of a random effect. The clustered observations are assumed to be multivariate normally distributed with covariance matrix \(\mathbf {\omega }\),
$$\begin{aligned} \mathbf {y}_j = \beta _0 + \beta _1 \mathbf {X}_j +\mathbf {e}_j, \end{aligned}$$
(5)
where the errors are multivariate normally distributed, \(\mathbf {e}_j \sim {\mathcal {N}}(\mathbf {0},\mathbf {\omega })\). We refer to the model in Equation (5) as the CSM. The development and use of the covariance structure model has a long history, which is intertwined with the development of factor models; classic works in covariance structure modelling can be found in that tradition (e.g., Bock and Bargmann 1966; Jöreskog 1969, 1971). Fox et al. (2017), Klotzke and Fox (2019a), and Klotzke and Fox (2019b) developed a novel Bayesian covariance structure modelling (BCSM) framework, in which dependencies among observations that are usually modelled through random effects are modelled directly through covariance parameters.
When comparing the modelling structure of the CSM (also referred to as BCSM; Klotzke and Fox 2019a, b) with that of the LME, it can be seen that \(\tau\) is restricted to be positive in the model in Equation (1), since it represents a variance parameter. In the model in Equation (5), however, \(\tau\) can also be negative, since it represents a covariance parameter. This makes the CSM more general than the LME: the covariance parameters can be positive or negative, which allows for more flexibility in specifying complex dependence structures (cf. Klotzke and Fox 2019a, b).
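This flexibility can be illustrated by sampling clusters directly from a multivariate normal distribution with covariance matrix \(\mathbf {\omega } = \sigma ^2 \mathbf {I}_n + \tau \mathbf {J}_n\), in which \(\tau\) is chosen negative but small enough in absolute value to keep \(\mathbf {\omega }\) positive definite (see the next section). A minimal Python sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50_000, 10
sigma2, tau = 1.0, -0.08      # negative within-cluster covariance (tau > -sigma2/n = -0.1)

omega = sigma2 * np.eye(n) + tau * np.ones((n, n))
assert np.all(np.linalg.eigvalsh(omega) > 0)              # omega is positive definite

beta0, beta1 = 0.0, 0.5
x = rng.normal(size=(m, n))
e = rng.multivariate_normal(np.zeros(n), omega, size=m)   # e_j ~ N(0, omega)
y = beta0 + beta1 * x + e                                 # Equation (5)

# Empirical check on the residual part (true coefficients are known here).
cov = np.cov(y - beta0 - beta1 * x, rowvar=False)
print(f"average within-cluster covariance ~ {cov[np.triu_indices(n, k=1)].mean():.3f} (tau = {tau})")
print(f"average variance ~ {cov.diagonal().mean():.3f} (sigma^2 + tau = {sigma2 + tau})")
```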
The linear mixed effects model with negative variance components
There are some restrictions on the variance-covariance components in the CSM. From the definition of the error variance it follows directly that \(\sigma ^2\) is restricted to be greater than zero (i.e. \(0<\sigma ^2<\infty\)). However, the covariance parameter \(\tau\) is not necessarily restricted to be greater than zero. Under the CSM, the covariance matrix \(\mathbf {\omega }\) needs to be positive definite, which implies –for balanced designs– the restriction \(n\tau +\sigma ^2 > 0\). This important result follows from Rao (1973, p. 32), where the determinant of a compound symmetry covariance matrix is expressed as
$$\begin{aligned} det\left( \sigma ^2\mathbf {I}_n + \tau \mathbf {J}_n \right) &= det\left( \sigma ^2\mathbf {I}_n \right) \left( 1+\tau \mathbf {1}_n^t \mathbf {1}_n/\sigma ^2 \right) \\ &= \sigma ^{2n}\left( 1 + n\tau /\sigma ^2 \right) = \sigma ^{2(n-1)}\left( n\tau +\sigma ^2 \right) , \end{aligned}$$
(6)
and the covariance matrix is positive definite if the determinant is greater than zero. Since \(\sigma ^{2(n-1)}>0\), it follows that \(\tau\) needs to be greater than \(-\sigma ^2/n\). However, when modelling the covariance structure with the LME, \(\tau\) is restricted to be greater than zero, since it represents the random intercept variance. In the literature, it has been shown that the (unconstrained) maximum likelihood estimate of the random effect variance can become negative (Kenny et al. 2002; Molenberghs and Verbeke 2007, 2011; Pryseley et al. 2011; Oliveira et al. 2017; Verbeke and Molenberghs 2003; El Leithy et al. 2016; Klotzke and Fox 2019a, b; Loeys and Molenberghs 2013). For the (one-way) LME (for balanced groups), two sums of squares are considered to estimate the covariance components \(\tau\) and \(\sigma ^2\),
$$\begin{aligned} SS_{A} &= \sum _{j=1}^{m} n \left( \overline{y}_j - \overline{y} \right) ^2, \\ SS_{E} &= \sum _{j=1}^{m} \sum _{i=1}^{n} \left( y_{ij} - \overline{y}_j \right) ^2. \end{aligned}$$
(7)
Consider the sum of squares \(SS_{A}\), whose mean square \(SS_{A}/(m-1)\) has expected value \(n\tau +\sigma ^2\). The unconstrained estimator can be written as \(\hat{\tau } = SS_{A}/(nm) -\hat{\sigma }^2/n\), with \(\hat{\sigma }^2\) the estimated error variance, which leads to a negative estimate of \(\tau\) if \(\hat{\sigma }^2> SS_{A}/m\). This scenario is often neglected or referred to as statistically incorrect, restricting \(\tau\) to represent a positive covariance among clustered observations.
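The possibility of a negative estimate is easily reproduced. The sketch below generates balanced clusters under a compound symmetry covariance matrix with a mildly negative \(\tau\) and evaluates the sums of squares in Equation (7) and the unconstrained estimator discussed above; the choice \(\hat{\sigma }^2 = SS_{E}/(m(n-1))\) and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 200, 10
sigma2, tau = 1.0, -0.05      # true values; tau > -sigma2/n, so omega is positive definite

omega = sigma2 * np.eye(n) + tau * np.ones((n, n))
y = rng.multivariate_normal(np.zeros(n), omega, size=m)   # m balanced clusters

ybar_j = y.mean(axis=1)       # cluster means
ybar = y.mean()               # grand mean

ss_a = n * np.sum((ybar_j - ybar) ** 2)      # between-cluster sum of squares, Equation (7)
ss_e = np.sum((y - ybar_j[:, None]) ** 2)    # within-cluster sum of squares, Equation (7)

sigma2_hat = ss_e / (m * (n - 1))            # estimate of the error variance
tau_hat = ss_a / (n * m) - sigma2_hat / n    # unconstrained estimate; may be negative

print(f"tau_hat = {tau_hat:.3f} (true tau = {tau})")
```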
For \(\tau > 0\), the ICC is often interpreted as the ratio of the variance explained by the clustering of observations to the total variance in the data, \(\rho = \tau /(\tau + \sigma ^{2})\) (Raudenbush and Bryk 2002; Snijders and Bosker 2012; Oliveira et al. 2017). However, the ICC can also be considered to quantify the degree of resemblance or average similarity of observations within a cluster, or the ‘average correlation’ in each cluster (Kenny and Judd 1986; Kenny et al. 2002). Then, conceptually, a negative covariance \((\tau <0)\) represents a negative ICC. In that case, \(\rho\) becomes negative, and the ICC represents a negative association among clustered observations (i.e. observations within clusters are less alike than observations from different clusters). A negative ICC simply represents the opposite of a positive ICC: if an observation in a cluster is below the population mean, then it is more likely that another value in that cluster is above the population mean if the observations are negatively correlated (Kenny and Judd 1986; Kenny et al. 2002).
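Combining \(\rho = \tau /(\tau + \sigma ^{2})\) with the balanced-design restriction \(\tau > -\sigma ^2/n\) from Equation (6) gives an explicit lower bound for a negative ICC (a short derivation added here for completeness; note that \(\rho\) is increasing in \(\tau\)):

$$\begin{aligned} \rho = \frac{\tau }{\tau + \sigma ^{2}}> \frac{-\sigma ^2/n}{-\sigma ^2/n + \sigma ^2} = -\frac{1}{n-1}, \end{aligned}$$

so that, for example, with \(n = 10\) observations per cluster the ICC cannot be smaller than \(-1/9 \approx -.11\).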
Even though negative clustering effects have been discussed previously by others (cf. Oliveira et al. 2017; Pryseley et al. 2011), there still appears to be a lack of awareness of these effects. As the LME comes with the restriction that observations are positively clustered, several suggestions can be found in the literature that \(\tau\) should be set to zero when the ICC estimate becomes negative (see for example Baldwin et al. 2008; Maas and Hox 2005; Gibson et al. 2015; Krannitz et al. 2015; Langfred 2007; Eldridge et al. 2009). In the next section, we discuss our simulation study, which aims not only to show that fixing the ICC to zero is –in fact– wrong, but also to quantify the bias that arises when negative clustering effects are ignored.