Abstract
Permutation methods offer an acceptable and convenient tool for testing for zero variance components in linear mixed models using the likelihood ratio test. However, when data exhibit heavy-tailed distributions, heavily skewed distributions, or outliers, maximum likelihood estimation may not be the best choice for constructing useful test statistics. In this article, we propose the use of robust rank-based estimation as an alternative. The finite sample distribution of our test statistic is well approximated using suitable permutations of the cluster indices, which are exchangeable when the null hypothesis is true. Empirical results comparing the new test to existing tests indicate that all tests maintain acceptable Type I error rates when data exhibit heavy-tailed or heavily skewed distributions. However, only our new test remains robust in the presence of outliers in the response space. Moreover, it is only in the latter case that the other tests show power competitive with that of our test; in the remaining settings, the new test has clearly superior power.
1 Introduction
Statistical inference using linear mixed-effects (LME) models is encountered in many applications where the data exhibit a clustered structure (Fitzmaurice et al. 2007), account for blocking factors (Kloke et al. 2009), or arise from a two-stage sampling design (Pfeffermann 2013). Inferring the need for random effects, or equivalently testing whether variance components are zero, is an essential task in LME models. Assuming the familiar chi-square distribution for the likelihood ratio test (LRT) statistic is usually criticized because the null value of the variance components lies on the boundary of the parameter space (Self and Liang 1987). The limiting distribution of the LRT statistic is derived in Self and Liang (1987) as a mixture of chi-square distributed random variables for models involving one variance component. For models containing multiple random effects, Stram and Lee (1994) concluded that the asymptotic distribution of the LRT statistic can be affected by the correlation between the random effects. This variance boundary problem is investigated in various studies, including Shapiro (1985, 1988) and Stoel et al. (2006). Using numerical simulations, Fitzmaurice et al. (2007) suggested that even with a large number of clusters, the mixture chi-square distribution is a poor approximation. Crainiceanu and Ruppert (2004) used a simulation-based algorithm to generate the finite sample distribution using an eigendecomposition of the LRT statistic. Recent tests for zero variance components tend to approximate the finite sample distribution of the LRT statistic using permutation methods (Arboretti et al. 2015); see, for example, Fitzmaurice et al. (2007) and Lee and Braun (2012). Permutation methods have also been used in the tests proposed in Drikvandi et al. (2013) and Du and Wang (2020).
In practice, outliers, heavy-tailed distributions, and heavily skewed distributions are evident in various applications exhibiting hierarchical data structures. In such cases, the superiority of likelihood-based estimation is questionable, and hence so is the use of the LRT. On the other hand, to our knowledge, neither robust test procedures for variance components nor a relevant empirical assessment of the robustness of the LRT under such distributional violations has yet been considered in the literature. Robust rank-based estimation of LME models offers an attractive alternative to maximum likelihood estimation (Hettmansperger and McKean 2010; Liu and McKean 2015). Original developments for regression models with independent and identically distributed (iid) errors were considered in Jureckova (1971) and Jaeckel (1972). Kloke et al. (2009) developed the theory for obtaining robust joint-rank (JR) estimators of the unknown parameters under LME models with one variance component. The development therein provides protection against outlying responses, heavy-tailed symmetric distributions, and heavily skewed distributions of the error components. Of note, robust rank-based estimation has not been used in constructing test statistics for testing zero variance components. Bridging this gap provides a reasonable alternative to the LRT when the latter is not the best choice.
The objective of this article is to introduce a robust test that also does not suffer from the variance boundary problem. To this end, we use the robust rank-based estimation method under LME models (Hettmansperger and McKean 2010; Kloke et al. 2009) and introduce a test statistic whose finite sample distribution is well approximated, and whose Type I error rate is therefore controllable, using a permutation method. In other words, we propose a permutation test in which the test statistic is calculated from robust rank-based parameter estimates. We base the calculation of our test statistic on the estimators of the fixed effects and the variance components as prescribed in Kloke et al. (2009). Under the null hypothesis of zero variance components, the cluster indices are simply random labels. Thus, all permutations of those indices are equally likely, ensuring their exchangeability (Fitzmaurice et al. 2007). Since permuting the indices amounts to permuting the pairs \((y,x)\) that include the response and the associated set of explanatory variables, it is also a permutation of the residual errors, which are iid when the null hypothesis holds. Hence, the necessary condition of exchangeability of the residual errors is satisfied. An approximate finite sample distribution of the proposed test statistic is then obtained using the permutation distribution (Pesarin and Salmaso 2010) generated in conjunction with the robust estimation of the parameters of the LME model.
We shed light on situations where rank-based estimation is more efficient (produces smaller standard errors) than maximum likelihood estimation (Kloke et al. 2009; McKean and Kloke 2014; McKean and Hettmansperger 2016). In such situations, our empirical results show that robust rank-based estimation strengthens permutation tests for zero variance components. We emphasize that our development applies to LME models involving a single variance component. We rely on simulation experiments to highlight the superiority of the proposed test under all schemes chosen for comparison. The simulation schemes are chosen to violate many of the standard assumptions whose failure is known to reduce the efficiency of maximum likelihood estimation. With a proper score function for calculating the robust rank-based estimates, the proposed permutation test can be twice as powerful as the remaining tests, or even more.
The rest of this paper is organized as follows. Section 2 introduces the LME model. The proposed test statistic is considered in Sect. 3. In Sect. 4, the results of the simulation study are presented and a summary of the performance of the proposed test is provided. An application to a real dataset is given in Sect. 5. Conclusions of this study are summarized in Sect. 6.
2 Linear mixed-effects model
Consider a data set of \(m\) clusters, with \({n}_{k}\) observations in the kth cluster, \(k=1,\dots ,m\). Let \({{\varvec{Y}}}_{k}\) and \({{\varvec{X}}}_{k}\) denote, respectively, the \({n}_{k}\times 1\) vector of responses and the \({n}_{k}\times p\) design matrix. Let \({b}_{k}\) denote the kth random cluster effect, and \({{\varvec{\epsilon}}}_{k}\) the \({n}_{k}\times 1\) vector of errors. The model for \({{\varvec{Y}}}_{k}\) is
\[{{\varvec{Y}}}_{k}={{\varvec{X}}}_{k}{\varvec{\beta}}+{1}_{k}{b}_{k}+{{\varvec{\epsilon}}}_{k},\quad k=1,\dots ,m, \tag{1}\]
where \({\varvec{\beta}}\) is the vector of regression coefficients that usually contains an intercept term. Alternatively, the model can be written in a compact form as \({\varvec{Y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{Z}}{\varvec{b}}+{\varvec{\epsilon}}\) where \({\varvec{Y}}={({{\varvec{Y}}}_{1}^{{{\prime}}},\dots ,{{\varvec{Y}}}_{m}^{{{\prime}}})}^{{{\prime}}}\), \({\varvec{X}}={({{\varvec{X}}}_{1}^{{{\prime}}},\dots ,{{\varvec{X}}}_{m}^{{{\prime}}})}^{{{\prime}}}\), \({\varvec{\epsilon}}={({{\varvec{\epsilon}}}_{1}^{{{\prime}}},\dots ,{{\varvec{\epsilon}}}_{m}^{{{\prime}}})}^{\mathrm{{\prime}}}\), \({\varvec{b}}={({b}_{1}, \dots ,{b}_{m})}^{\mathrm{{\prime}}}\), and \({\varvec{Z}}=diag({1}_{1},\dots ,{1}_{m})\) such that \({1}_{k}\) denotes an \({n}_{k}\times 1\) vector of ones. Further, denote by \(N={\sum }_{k=1}^{m} {n}_{k}\) the total sample size and let \(E\left({\varvec{\epsilon}}\right)=0\), \(var\left({\varvec{\epsilon}}\right)={\sigma }_{\epsilon }^{2}{\varvec{I}}\), \(E\left({\varvec{b}}\right)=0\), \(var\left({\varvec{b}}\right)={\sigma }_{b}^{2}{\varvec{I}}\), and \(cov\left({\varvec{\epsilon}},{\varvec{b}}\right)=0\). Independence is assumed among the random effects in \({\varvec{b}}\), among the residual errors in \({\varvec{\epsilon}}\), and between \({\varvec{b}}\) and \({\varvec{\epsilon}}\).
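To make the structure of the compact form \({\varvec{Y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{Z}}{\varvec{b}}+{\varvec{\epsilon}}\) concrete, the following Python sketch draws one data set from a random intercept model. It is only an illustration under stated assumptions: the normal distributions for \({b}_{k}\), \({\epsilon }_{kj}\) and the covariates, as well as the function name `simulate_random_intercept`, are hypothetical choices, since the model above only fixes the first two moments.

```python
import random


def simulate_random_intercept(cluster_sizes, beta, sigma_b, sigma_eps, seed=0):
    """Draw (cluster, x, y) rows from Y = X beta + Z b + eps.

    Assumption: b_k, eps_kj and the covariates are normal; the model in the
    text only specifies means, variances and independence.
    """
    rng = random.Random(seed)
    rows = []
    for k, n_k in enumerate(cluster_sizes):
        b_k = rng.gauss(0.0, sigma_b)          # one random effect per cluster
        for _ in range(n_k):
            # First entry is the intercept column; the rest are covariates.
            x = [1.0] + [rng.gauss(0.0, 1.0) for _ in beta[1:]]
            eps = rng.gauss(0.0, sigma_eps)    # residual error
            y = sum(xi * bi for xi, bi in zip(x, beta)) + b_k + eps
            rows.append((k, x, y))
    return rows


rows = simulate_random_intercept([3, 5, 2], beta=[2.0, 0.5],
                                 sigma_b=1.0, sigma_eps=1.0)
print(len(rows))  # total sample size N = n_1 + n_2 + n_3
```

Setting `sigma_b=0.0` generates data under the null hypothesis of no random effects, in which case the cluster labels carry no information.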
The objective of this article is to test whether the random effects are needed in model (1). Thus, the hypothesis of interest can be formulated as
\[{H}_{0}:{\sigma }_{b}^{2}=0\quad \text{versus}\quad {H}_{1}:{\sigma }_{b}^{2}>0. \tag{2}\]
Let \({l}_{{H}_{0}}\) and \({l}_{{H}_{1}}\) denote, respectively, the log-likelihood functions maximised over \({H}_{0}\) and \({H}_{1}\). The LRT statistic is given by
\[LRT=-2\left({l}_{{H}_{0}}-{l}_{{H}_{1}}\right). \tag{3}\]
Crainiceanu and Ruppert (2004) derived a finite sample distribution of the LRT in (3) under the null hypothesis and provided an algorithm for simulating that distribution. Fitzmaurice et al. (2007) proposed a permutation test for variance components using (3), which provides a one-sided p-value and has the correct empirical size regardless of the number of clusters or the cluster size. The latter test randomly permutes the cluster indices, holding the number of observations within each cluster as structured in the original dataset. The authors showed, using simulation studies, that this permutation test controls the Type I error rate when the null hypothesis holds. We follow the same permutation method.
As we focus on situations where the common assumptions underlying maximum likelihood estimation are severely violated, one immediately thinks of robust estimation methods. We mainly consider robust rank-based estimation. The statistical theory for rank-based estimation under (1) is developed in Kloke et al. (2009). We provide a brief overview of this method. The subsequent steps to generate the finite sample distribution of the proposed test statistic are given in Sect. 3.
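For notational convenience, let \(\eta \) denote the intercept term, which is now excluded from \({\varvec{\beta}}\), and rewrite model (1), following the notation in Kloke et al. (2009), such that
\[{{\varvec{Y}}}_{k}={1}_{k}\eta +{{\varvec{X}}}_{k}{\varvec{\beta}}+{{\varvec{e}}}_{k},\quad k=1,\dots ,m, \tag{4}\]
where
\[{{\varvec{e}}}_{k}={1}_{k}{b}_{k}+{{\varvec{\epsilon}}}_{k}. \tag{5}\]
Combining (4) and (5) for all clusters gives
\[{\varvec{Y}}={1}_{N}\eta +{\varvec{X}}{\varvec{\beta}}+{\varvec{e}}, \tag{6}\]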
where \({\varvec{e}}={({{\varvec{e}}}_{1}^{\mathrm{{\prime}}},\dots ,{{\varvec{e}}}_{m}^{\mathrm{{\prime}}})}^{\mathrm{{\prime}}}\). The following assumptions are needed. The random vectors in \({\varvec{e}}\) are independent, and the univariate marginal distribution of the components of \({{\varvec{e}}}_{k}\) is continuous and is the same for all \(k\). Let \({F}_{{\varvec{e}}}(.)\) and \({f}_{{\varvec{e}}}(.)\) denote, respectively, this common distribution function and density function. Further, assume that \({f}_{{\varvec{e}}}(.)\) is absolutely continuous and that the usual regularity (likelihood) conditions hold. Assume further that Huber’s condition holds for the design matrix \({\varvec{X}}\) [i.e. the leverage values get uniformly small as \(N\) grows (Kloke et al. 2009)]. Under an LME modelling framework, the ordinary rank-based estimator of \({\varvec{\beta}}\) is given by
\[{\widehat{{\varvec{\beta}}}}_{\varphi }=\underset{{\varvec{\beta}}}{\mathrm{argmin}}\,{\parallel {\varvec{Y}}-{\varvec{X}}{\varvec{\beta}}\parallel }_{\varphi }, \tag{7}\]
where \({\parallel {\varvec{v}}\parallel }_{\varphi }={\sum }_{t=1}^{N}\left\{a[R({v}_{t})]{v}_{t}\right\}\) for \({\varvec{v}}\in {\mathbb{R}}^{N}\), \(R({v}_{t})\) denotes the rank of \({v}_{t}\) among \({v}_{1},\dots ,{v}_{N}\) and the scores \(a\left[.\right]\) are generated as \(a\left[t\right]=\varphi [t/(N+1)]\) for \(\varphi (u)\) a nondecreasing, bounded, square-integrable function defined on the interval (0,1) such that \(\sum_{t}a[t]=0\), \(\underset{0}{\overset{1}{\int }}\varphi (u)du=0\) and \(\underset{0}{\overset{1}{\int }}{\varphi }^{2}(u)du=1\). The estimator in (7) solves \({{\varvec{S}}}_{{\varvec{X}}}\left({\varvec{\beta}}\right)=0\) where
\[{{\varvec{S}}}_{{\varvec{X}}}\left({\varvec{\beta}}\right)={{\varvec{X}}}^{\prime}a\left[R\left({\varvec{Y}}-{\varvec{X}}{\varvec{\beta}}\right)\right], \tag{8}\]
and \(a[R({\varvec{Y}}-{\varvec{X}}{\varvec{\beta}})]\) denotes the \(N\times 1\) vector of scores evaluated at the ranks of the components of \({\varvec{Y}}-{\varvec{X}}{\varvec{\beta}}\).
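The estimator of the intercept term \(\eta \), denoted by \(\widehat{\eta }\), is given by the median of the residuals,
\[\widehat{\eta }={median}_{k,j}\left\{{y}_{kj}-{{\varvec{x}}}_{kj}^{\prime}{\widehat{{\varvec{\beta}}}}_{\varphi }\right\}. \tag{9}\]
Consequently, the residuals are defined as
\[{\widehat{{\varvec{e}}}}_{JR}={\varvec{Y}}-{1}_{N}\widehat{\eta }-{\varvec{X}}{\widehat{{\varvec{\beta}}}}_{\varphi }. \tag{10}\]
The estimate of \({\sigma }_{b}^{2}\) using these residuals can be calculated as follows. Rewrite model (4) in elementwise form as
\[{y}_{kj}-\eta -{{\varvec{x}}}_{kj}^{\prime}{\varvec{\beta}}={b}_{k}+{\epsilon }_{kj} \tag{11}\]
for \(j=1,\dots ,{n}_{k}\). Since the residuals \({\widehat{e}}_{kj}\) in (10) provide estimates of the left side in (11), a predictor of \({b}_{k}\) for a given cluster, say \(k\), is the median over the \({n}_{k}\) residuals in that cluster. That is, \({\widehat{b}}_{k}={median}_{1\le j\le {n}_{k}}\{{\widehat{e}}_{kj}\}\). The robust estimator of \({\sigma }_{b}^{2}\) is given by
\[{\widehat{\sigma }}_{b}^{2}={\left[MAD\left({\widehat{b}}_{k}\right)\right]}^{2}={\left[1.483\,{median}_{k}\left|{\widehat{b}}_{k}-{median}_{l}\left\{{\widehat{b}}_{l}\right\}\right|\right]}^{2}.\]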
The last formula for \({\widehat{\sigma }}_{b}^{2}\) denotes the squared scaled median absolute deviations of \({\widehat{b}}_{k}\)’s from their overall median. See Kloke et al. (2009) and Liu and McKean (2015) for thorough details and references on the derivation of \({\widehat{\sigma }}_{b}^{2}\) and the rationale behind it.
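As a rough illustration of the two building blocks described in this section, the following Python sketch evaluates the rank-based dispersion \({\parallel {\varvec{v}}\parallel }_{\varphi }\) and the squared scaled MAD of the cluster medians. It is not the implementation used in this article: the Wilcoxon score \(\varphi (u)=\sqrt{12}(u-1/2)\) is one assumed choice of \(\varphi \), the factor 1.483 is the conventional normal-consistency constant for the MAD and is assumed here, ties are ignored, and all function names are hypothetical.

```python
import math
from statistics import median


def wilcoxon_scores(n):
    """Scores a[t] = phi(t / (n + 1)) for phi(u) = sqrt(12) * (u - 1/2)."""
    return [math.sqrt(12.0) * (t / (n + 1) - 0.5) for t in range(1, n + 1)]


def ranks(v):
    """Ranks 1..n of the entries of v (ties are assumed absent)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def rank_dispersion(v):
    """Rank-based norm: sum over t of a[R(v_t)] * v_t."""
    a = wilcoxon_scores(len(v))
    return sum(a[r - 1] * x for r, x in zip(ranks(v), v))


def robust_sigma_b2(residuals, cluster_ids, scale=1.483):
    """Squared scaled MAD of the cluster medians of the residuals."""
    groups = {}
    for e, k in zip(residuals, cluster_ids):
        groups.setdefault(k, []).append(e)
    b_hat = [median(g) for g in groups.values()]   # predicted b_k per cluster
    center = median(b_hat)                         # overall median
    mad = scale * median([abs(b - center) for b in b_hat])
    return mad ** 2
```

The sketch does not perform the minimization over \({\varvec{\beta}}\); it only evaluates the objective for a given vector of residuals. Because the scores sum to zero, `rank_dispersion` is unchanged when a constant is added to every entry, which is why the intercept has to be estimated separately.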
3 New test based on robust estimation
Permutation tests (Pesarin and Salmaso 2010, 2012; Hahn and Salmaso 2017) are nonparametric computationally intensive tests. In regression contexts, permutation tests possess the nominal size (Schmoyer 1994) when the sample data are correctly permuted such that the null distribution of the test statistic is approximated by repeatedly computing its values using each permuted sample. Specifically, those tests assume the exchangeability of the values being permuted (Basso et al. 2009) where exchangeability is less stringent than being iid.
We propose a robust permutation test for (2), utilizing the fact that permutation tests are distribution free. To investigate the robustness, we consider the error components in (1) to follow a symmetric distribution with heavy tails, a heavily skewed distribution, or to contain outliers. To this end, we replace the unknown variance component \({\sigma }_{b}^{2}\) by its robust rank-based estimator \({\widehat{\sigma }}_{b}^{2}\) as described in Sect. 2, which can be calculated from the available data (\({\varvec{Y}},{\varvec{X}}\)). Letting \({\varvec{Z}}=diag({1}_{{n}_{1}},\dots ,{1}_{{n}_{m}})\), the proposed test statistic is given by
\[{T}_{JR}=\mathrm{trace}\left({\widehat{\sigma }}_{b}^{2}{\varvec{Z}}{{\varvec{Z}}}^{\prime}\right), \tag{12}\]
where the test yields a one-sided p-value with the correct Type I error rate under the null hypothesis. As the expression in (12) will be applied to random intercept models, \({T}_{JR}\) is simply proportional to \({\widehat{\sigma }}_{b}^{2}\) since \({T}_{JR}={\widehat{\sigma }}_{b}^{2}{\sum }_{k=1}^{m}{n}_{k}\).
The permutation distribution of \({T}_{JR}\) is needed to calculate the p-value. To construct it, the marginal errors in (6) are permuted: under the null hypothesis, the errors \({\varvec{e}}\) are iid with zero mean and variance equal to \({\sigma }_{\epsilon }^{2}\), and thus they are exchangeable. Note that subtracting the fixed effects term in (6) from \({\varvec{Y}}\) removes the need for the continuous covariates to be identical among the clusters and for the clusters to have an equal number of observations. Hence, the errors can be permuted within and between clusters. Since \(\eta \) and \({\varvec{\beta}}\) need to be replaced by their estimates in practice, the estimated errors are calculated from the alternative model. It is shown by Schmoyer (1994) that, under the null hypothesis, the residuals are also asymptotically exchangeable both within and among clusters. Since \({\widehat{\sigma }}_{b}^{2}\) is a function of the residuals \({\widehat{e}}_{kj}\), as shown below (11), a straightforward permutation distribution for \({T}_{JR}\) can be generated.
Since the number of permutations grows rapidly with \(N={\sum }_{k=1}^{m} {n}_{k}\), we use a general algorithm for obtaining a Monte Carlo estimate of the permutation p-value as follows:

(i) Under \({H}_{0}: {\sigma }_{b}^{2}=0\), calculate \({T}_{JR}\) from the original sample.
(ii) Randomly permute the cluster indices over all clusters, holding fixed the cluster sizes as \({n}_{k}\) in the new permuted sample. Then, recalculate the test statistic, say \({T}_{JR}^{(r)}\), where the superscript \(r\) denotes that the rth permutation sample has been constructed.
(iii) Repeat the process a large number of times, say \(\widetilde{R}\) times, producing \(\widetilde{R}\) test statistics \({T}_{JR}^{(r)}\), \(r=1,\dots , \widetilde{R}\).
(iv) The one-sided p-value, according to steps (i)–(iii), is calculated as the proportion of permutation samples (out of \(\widetilde{R}\)) such that \({T}_{JR}^{(r)}\) exceeds the original sample value of the test statistic.
In implementing the Monte Carlo algorithm, the pooled set of pairs \(\left\{\left({y}_{kj},{{\varvec{x}}}_{kj}\right);k=1,\dots ,m;j=1,\dots ,{n}_{k}\right\}\) is exchangeable when the null hypothesis in (2) is true. The set of all residuals \(\left\{{\widehat{e}}_{kj};k=1,\dots ,m;j=1,\dots ,{n}_{k}\right\}\) is also exchangeable under the null hypothesis because both \(\widehat{\eta }\) and \({\widehat{{\varvec{\beta}}}}_{\varphi }\) are permutation invariant. Indeed, this invariance applies under any suitable regression estimation method when \({\sigma }_{b}^{2}=0\). When the distribution of the error components on the right-hand side of (5) is contaminated, our proposed test is thus based on the invariant values of \(\widehat{\eta }\) and \({\widehat{{\varvec{\beta}}}}_{\varphi }\) together with the robust rank-based estimate \({\widehat{\sigma }}_{b}^{2}\). The generated permutation distribution is valid regardless of (i) the distributional assumptions that are made about the error components in model (1) except for the first two moments, (ii) the estimation method that is used to fit the model, provided that the estimator is invariant to data permutations when the null hypothesis is true, and (iii) the cluster size, \({n}_{k}\), which may change from one cluster to another in unbalanced data. Besides \({T}_{JR}\), the above algorithm also yields the sampling distribution of \({\widehat{\sigma }}_{b}^{2}={\left({\sum }_{k=1}^{m}{n}_{k}\right)}^{-1}{T}_{JR}\).
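Steps (i)–(iv) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code used for the results in this article: the residuals are taken as already computed and, by the permutation invariance discussed above, are held fixed while only the cluster labels are shuffled; the MAD constant 1.483 and the function names `t_jr` and `permutation_pvalue` are hypothetical.

```python
import random
from statistics import median


def t_jr(residuals, cluster_ids, scale=1.483):
    """N times the squared scaled MAD of the cluster medians of the residuals."""
    groups = {}
    for e, k in zip(residuals, cluster_ids):
        groups.setdefault(k, []).append(e)
    b_hat = [median(g) for g in groups.values()]
    center = median(b_hat)
    mad = scale * median([abs(b - center) for b in b_hat])
    return len(residuals) * mad ** 2


def permutation_pvalue(residuals, cluster_ids, n_perm=999, seed=0):
    """Monte Carlo estimate of the one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = t_jr(residuals, cluster_ids)          # step (i)
    labels = list(cluster_ids)
    exceed = 0
    for _ in range(n_perm):                          # step (iii)
        rng.shuffle(labels)                          # step (ii): sizes n_k stay fixed
        if t_jr(residuals, labels) > observed:
            exceed += 1
    return exceed / n_perm                           # step (iv)
```

Shuffling the label vector, rather than resampling labels, is what keeps every cluster size \({n}_{k}\) equal to its value in the original sample.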
4 Simulation study
Simulation experiments are conducted to investigate the performance of the proposed test (\({T}_{JR}\)-test hereafter). The empirical size and power are evaluated and compared to those of the permutation LRT (pLRT) (Fitzmaurice et al. 2007), the LRT and the restricted LRT (RLRT) (Crainiceanu and Ruppert 2004). The simulation setup covers various schemes that focus on violations of the standard distributional assumptions about the error terms that are known to reduce the efficiency of the maximum likelihood estimators.
4.1 Simulation setup
Let the model for the response variable \({y}_{kj}\) given the random effect \({b}_{k}\) be given by
\[{y}_{kj}=\eta +{b}_{k}+{\epsilon }_{kj},\quad k=1,\dots ,m,\; j=1,\dots ,{n}_{k}, \tag{13}\]
where we choose \(m=30, 40\) clusters, \({n}_{k}=3, 10\) observations within a cluster and \(\eta =2\). Assume that the intra-cluster correlation (ICC) takes on the values 0.10, 0.20, and 0.30 where ICC \(={\sigma }_{b}^{2}/({\sigma }_{b}^{2}+{\sigma }_{\epsilon }^{2})\). For every test under consideration, the value ICC = 0 is used to examine the empirical size (Type I error), while the values ICC \(>0\) are used to examine the empirical power. Both size and power are explored under the violation schemes given next. Assume that \({b}_{k}\sim N(0,{\sigma }_{b}^{2})\) and that the residual error term \({\epsilon }_{kj}\) follows, in turn, a symmetric contaminated normal distribution, a skewed contaminated normal distribution, a normal distribution while allowing for the presence of outliers, and a skewed distribution. The detailed setup under each scheme, including the value of \({\sigma }_{\epsilon }^{2}\), is given below.
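Since the simulation design is indexed by the ICC rather than by \({\sigma }_{b}^{2}\), the definition ICC \(={\sigma }_{b}^{2}/({\sigma }_{b}^{2}+{\sigma }_{\epsilon }^{2})\) has to be inverted to obtain the variance of the random effect, which gives \({\sigma }_{b}^{2}=\mathrm{ICC}\cdot {\sigma }_{\epsilon }^{2}/(1-\mathrm{ICC})\). The small Python helper below carries out this algebra; the function name is hypothetical, and which value of \({\sigma }_{\epsilon }^{2}\) is plugged in under the contaminated schemes is an assumption left to the user.

```python
def sigma_b2_from_icc(icc, sigma_eps2):
    """Invert ICC = sigma_b2 / (sigma_b2 + sigma_eps2) for sigma_b2."""
    if not 0.0 <= icc < 1.0:
        raise ValueError("ICC must lie in [0, 1)")
    return icc * sigma_eps2 / (1.0 - icc)


# Random-effect variances implied by the ICC levels when sigma_eps2 = 1.
for icc in (0.0, 0.10, 0.20, 0.30):
    print(icc, sigma_b2_from_icc(icc, sigma_eps2=1.0))
```

For example, ICC = 0.20 with \({\sigma }_{\epsilon }^{2}=1\) corresponds to \({\sigma }_{b}^{2}=0.25\), and ICC = 0 recovers the null hypothesis \({\sigma }_{b}^{2}=0\).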
4.1.1 Symmetric contaminated normal distribution
A symmetric contaminated normal distribution is a mixture of two normal distributions with mixing probabilities \((1-\delta )\) and \(\delta \) where \(0<\delta <1\). For any random variable, say \(\epsilon \), that follows a normal distribution with density function \(g(\epsilon ; \mu , {\sigma }_{\epsilon })\) where \(\mu \) and \({\sigma }_{\epsilon }\) denote, respectively, the mean and the standard deviation of the distribution, the contaminated normal density can be expressed as \({f}^{*}(\epsilon ) = (1-\delta )g(\epsilon ; \mu , {\sigma }_{\epsilon }) + \delta g(\epsilon ; \mu , \lambda {\sigma }_{\epsilon })\) where \(\lambda > 1\) is a parameter that determines the standard deviation of the wider component. In the simulations, we apply the definition of \({f}^{*}(.)\) to the residual errors \({\epsilon }_{kj}\) in (13). We consider \(\delta =20\%\) as a commonly used level of contamination in the distribution of \({\epsilon }_{kj}\) (Kloke et al. 2009), \(\lambda =5\), \(\mu =0\) and \({\sigma }_{\epsilon }^{2}=1\). Table 1 summarizes the simulation results of this scheme.
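Draws from the contaminated normal density \({f}^{*}(.)\) can be generated by first choosing a mixture component and then sampling from it. The Python sketch below illustrates this with the parameter values of this scheme as defaults; the function names are hypothetical and the code is not the generator used for Table 1. The helper `mixture_variance` evaluates the variance of the mixture, \((1-\delta ){\sigma }_{\epsilon }^{2}+\delta {\lambda }^{2}{\sigma }_{\epsilon }^{2}\), which follows from both components sharing the mean \(\mu \).

```python
import random


def r_contaminated_normal(n, delta=0.2, lam=5.0, mu=0.0, sigma=1.0, seed=0):
    """Sample n values from (1 - delta) N(mu, sigma^2) + delta N(mu, (lam*sigma)^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # With probability delta, use the wider (contaminating) component.
        sd = lam * sigma if rng.random() < delta else sigma
        draws.append(rng.gauss(mu, sd))
    return draws


def mixture_variance(delta, lam, sigma):
    """Variance of the two-component mixture with a common mean."""
    return (1.0 - delta) * sigma ** 2 + delta * (lam * sigma) ** 2


eps = r_contaminated_normal(1000)
print(mixture_variance(0.2, 5.0, 1.0))
```

With \(\delta =20\%\) and \(\lambda =5\), the contaminated errors therefore have a variance several times larger than the nominal \({\sigma }_{\epsilon }^{2}=1\), which is what makes this scheme heavy tailed.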
4.1.2 Skewed contaminated normal distribution
Here, we investigate the performance of the tests when \({\epsilon }_{kj}\) are generated from a skewed normal distribution, which can be defined as
\[f(\epsilon )=2\phi (\epsilon )\Phi (s\epsilon ), \tag{14}\]
where \(\phi (\epsilon )\) and \(\Phi (s\epsilon )\) denote, respectively, the standard normal density function evaluated at \(\epsilon \) and the standard normal distribution function evaluated at the point \(s\epsilon \) (Azzalini and Valle 1996). The parameter \(s\) represents the shape/skewness parameter because it regulates the shape of the density function. In the empirical study, \({\epsilon }_{kj}\) are generated from a skewed normal distribution that is contaminated, as defined in Sect. 4.1.1, with level of contamination \(\delta =20\%\), where \(\lambda =5\), \(\mu =0\), \({\sigma }_{\epsilon }^{2}=1\) and skewness parameter equal to 10 (McKean and Kloke 2014). The simulation results of this scheme are given in Table 2.
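For readers who wish to reproduce the uncontaminated skewed normal component, the Python sketch below evaluates the density \(2\phi (\epsilon )\Phi (s\epsilon )\) and draws from it using the standard stochastic representation \(\rho |{U}_{0}|+\sqrt{1-{\rho }^{2}}\,{U}_{1}\), with \({U}_{0},{U}_{1}\) independent standard normal and \(\rho =s/\sqrt{1+{s}^{2}}\). This representation is a known property of the skew-normal family rather than something stated in this article, the symbol \(\rho \) and the function names are introduced here only for illustration, and the contamination step of this scheme is not included.

```python
import math
import random


def skew_normal_pdf(x, shape):
    """Density 2 * phi(x) * Phi(shape * x) of the standard skew-normal."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * x / math.sqrt(2.0)))
    return 2.0 * phi * big_phi


def r_skew_normal(n, shape, seed=0):
    """Draw n values via rho * |U0| + sqrt(1 - rho^2) * U1."""
    rng = random.Random(seed)
    rho = shape / math.sqrt(1.0 + shape ** 2)
    return [rho * abs(rng.gauss(0.0, 1.0))
            + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            for _ in range(n)]


draws = r_skew_normal(5000, shape=10.0)
print(sum(draws) / len(draws))  # positive on average for a positive shape
```

With shape parameter 10, almost all of the mass lies to the right of zero, so the draws are strongly right skewed; a shape of 0 recovers the standard normal density.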
4.1.3 Outliers
Assuming that \({\epsilon }_{kj}\sim N(0,{\sigma }_{\epsilon }^{2})\) where \({\sigma }_{\epsilon }^{2}=0.5\), under this scheme we replace 5% of the residual errors by residual errors drawn from \(N(5,{15}^{2})\). We adopt this replacement for \({\epsilon }_{kj}\) while maintaining \({b}_{k}\sim N(0,{\sigma }_{b}^{2})\). Maximum likelihood estimation is known to produce inefficient estimates in the presence of outliers of this form. Table 3 emphasizes the consequences of this fact by displaying the empirical Type I error rates that are achieved by each of the competing tests. The corresponding empirical power results are also reported.
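The replacement mechanism of this scheme can be sketched in Python as follows. The sketch assumes that exactly 5% of the positions (rounded to the nearest integer) are chosen uniformly at random without replacement; the text does not say whether the 5% is exact or applies per observation, so this is an assumption, and the function name is hypothetical.

```python
import math
import random


def errors_with_outliers(n, sigma2=0.5, frac=0.05,
                         out_mean=5.0, out_sd=15.0, seed=0):
    """Normal errors with a fraction replaced by draws from N(out_mean, out_sd^2)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]
    n_out = round(frac * n)                     # assumed: an exact share of positions
    positions = rng.sample(range(n), n_out)     # chosen without replacement
    for i in positions:
        eps[i] = rng.gauss(out_mean, out_sd)    # outlying residual error
    return eps, sorted(positions)


eps, outlier_positions = errors_with_outliers(200)
print(len(outlier_positions))  # number of replaced errors
```

Because only \({\epsilon }_{kj}\) is altered, the outliers enter the response directly while the random effects \({b}_{k}\) remain normal.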
4.1.4 Skewed distribution
We also investigate the performance of the competing tests when \({\epsilon }_{kj}\) are generated from heavy-tailed or heavily skewed distributions such as the Cauchy distribution with location parameter zero and scale parameter 0.5 [i.e. C(0, 0.5)], the chi-square distribution with 1 degree of freedom and the lognormal distribution with parameters (\(\mu \) = 0, \(\sigma \) = 1). The results for the Cauchy distribution are provided in Table 4, while those for the chi-square and lognormal distributions are provided in Table 5.
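The three error distributions of this scheme can be sampled with the Python standard library alone, as in the sketch below. The Cauchy draws use the inverse distribution function, the chi-square draws with 1 degree of freedom are squared standard normals, and the lognormal draws use `random.lognormvariate`. The function names and the string labels are hypothetical, and whether the errors were centered before use is not stated in the text and is not done here.

```python
import math
import random


def cauchy_quantile(u, loc=0.0, scale=0.5):
    """Inverse distribution function of the Cauchy(loc, scale) distribution."""
    return loc + scale * math.tan(math.pi * (u - 0.5))


def draw_errors(kind, n, seed=0):
    """Draw n errors from one of the distributions of the fourth scheme."""
    rng = random.Random(seed)
    if kind == "cauchy":
        return [cauchy_quantile(rng.random()) for _ in range(n)]
    if kind == "chisq1":
        # The square of a standard normal is chi-square with 1 df.
        return [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
    if kind == "lognormal":
        return [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
    raise ValueError("unknown distribution: " + kind)


for kind in ("cauchy", "chisq1", "lognormal"):
    print(kind, len(draw_errors(kind, 100)))
```

Note that the Cauchy distribution has no finite mean or variance, so the ICC defined through \({\sigma }_{\epsilon }^{2}\) is only a nominal label under that distribution.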
4.2 Simulation results
Although other choices are possible, the simulation results for the proposed test are based on the Wilcoxon score function \(\varphi \left(u\right)=\sqrt{12}[u-(1/2)]\), where \(\varphi \left(u\right)\) is introduced below (7) (Hettmansperger and McKean 2010; Kloke et al. 2009). Applying JR estimation, presented in Sect. 2, to calculate \({\widehat{\sigma }}_{b}^{2}\) under the working model (13) is essential for computing \({T}_{JR}\) as given in (12). Note that the vector of residuals \({\widehat{{\varvec{e}}}}_{JR}\) is calculated under the working model as \({\widehat{{\varvec{e}}}}_{JR}={\varvec{Y}}-{1}_{N}\widehat{\eta }\), where \(\widehat{\eta }={median}_{kj}\{{y}_{kj}\}\). For the remaining tests, we use maximum likelihood estimation as recommended in their corresponding references. To evaluate the size or the power of each test, we generate 10,000 original samples. In addition, 10,000 permutation samples are generated for each original sample to test the null hypothesis and obtain the p-values using the \({T}_{JR}\)-test and the pLRT. The empirical size is calculated as the proportion of samples for which the p-value is less than or equal to the nominal level \(\alpha =5\%\).
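The empirical size and power are both rejection proportions, which the following Python sketch computes from a list of p-values. The second helper gives the binomial Monte Carlo standard error of such a proportion; it is added here only as context for judging how close an empirical size is to the nominal level and is not part of the reported analysis. Both function names are hypothetical.

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Proportion of samples whose p-value is at most alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def monte_carlo_se(rate, n_samples):
    """Binomial standard error of a rejection proportion."""
    return math.sqrt(rate * (1.0 - rate) / n_samples)


print(rejection_rate([0.01, 0.05, 0.20, 0.50]))
print(monte_carlo_se(0.05, 10_000))
```

With 10,000 original samples and a true size of 5%, the standard error is roughly 0.0022, so empirical sizes within about half a percentage point of 5% are compatible with simulation noise.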
Under the first contamination scheme, Table 1 summarizes the empirical sizes (ICC = 0) of the proposed \({T}_{JR}\)-test, which are close to the nominal level \(\alpha =5\%\). The LRT is the next closest test to the nominal level, followed by the RLRT. The empirical power (ICC \(> 0\)) of the \({T}_{JR}\)-test exceeds the power of the remaining tests, with the poorest performance provided by the pLRT. When \(m=30, 40\) and \({n}_{k}=3\), the power of the \({T}_{JR}\)-test increases as the ICC departs from zero, though not in large jumps, at a faster rate than that of the remaining three tests. As the cluster size increases (\({n}_{k}=10\)), both the rate of increase in the power of the \({T}_{JR}\)-test and the gap from the other tests increase, confirming the superiority of the proposed test. The increase in the cluster size is the factor that most discriminates the performance of the competing tests, with the best performance always achieved by the proposed \({T}_{JR}\)-test.
Table 2 presents the results under the second scheme, in which the residual errors have a skewed contaminated normal distribution. The size of each of the four competing tests remains not too distant from the nominal level. The \({T}_{JR}\)-test, in particular, preserves an acceptable performance across the chosen cluster sizes and numbers of clusters. The power of the \({T}_{JR}\)-test remains the highest in all experiments. We also note that the powers of the other three tests remain very close to each other as the value of the ICC increases. Unlike the comparisons made under the first scheme, the pLRT here has power that is competitive with the LRT and the RLRT. Holding all other factors fixed at their levels under this scheme, we note that the skewness imposed on the distribution of the residual errors widens the gap between the \({T}_{JR}\)-test and the remaining tests compared to the situation where the residual errors follow a symmetric contaminated distribution (i.e. Table 1). This considerable discrimination holds for every power comparison (i.e. for every ICC > 0).
As mentioned in Sect. 4.1.3, the third scheme in our simulation experiments is concerned with the presence of outliers in the y-space and its implications for the performance of the competing tests. Table 3 provides the empirical sizes and powers of the four tests. We observe that the presence of outliers has a dramatic effect on the Type I error rates produced by the pLRT, LRT and RLRT (i.e. when ICC = 0). The \({T}_{JR}\)-test is the only robust test, with rates that are close to the nominal level of 5%. The empirical sizes of the remaining three tests are far from this nominal level, indicating how poor and unreliable the performance of these tests may be when outliers are suspected in the available data.
Although the three tests (pLRT, LRT, and RLRT) do not possess correct error rates under the null hypothesis when outliers are present, results on their rejection rates are reported when the alternative hypothesis in (2) holds. As any of the three factors (i.e. the ICC level, the cluster size, and the number of clusters) increases, the corresponding rejection rates increase. Noticeably, when ICC = 0.30, the proportion of rejections of the null hypothesis of a zero variance component using the LRT and the RLRT is either close to the power of the \({T}_{JR}\)-test or even higher. Nevertheless, we recommend the use of the \({T}_{JR}\)-test due to its robust performance in the presence of outliers.
The results of the fourth scheme are provided in Tables 4 and 5. Assuming the Cauchy distribution for \({\epsilon }_{kj}\), we conclude from Table 4 that the \({T}_{JR}\)-test continues to control the Type I error rate when ICC = 0. As in the previous scheme, the other three tests do not guarantee acceptable rejection rates under the null hypothesis. The \({T}_{JR}\)-test continues to outperform the remaining tests in terms of its power under the alternative hypothesis. Indeed, the remaining tests fail to reject the null hypothesis due to the poor estimates produced by the maximum likelihood method under this scheme.
Further investigation under the fourth scheme is provided where \({\epsilon }_{kj}\) are generated from two heavily skewed distributions, namely the \({\chi }_{(1)}^{2}\) and lognormal(0,1) distributions. In Table 5, the empirical sizes of the three competing tests remain unstable but generally improve over their corresponding performance in Table 4. Noticeably, their power improves as we depart from the null hypothesis. The proposed \({T}_{JR}\)-test remains the most powerful test, as is the case in all previous settings.
To sum up, the simulation experiments conducted in this section provide strong evidence, based on size-power comparisons, that favors the use of the proposed \({T}_{JR}\)-test over the other three tests. Our proposal remains robust when the other tests fail to do so, preserving a considerable power increase in all the schemes under consideration as the ICC departs from zero.
5 Rat pup data
In this section, the rat pup dataset (Pinheiro and Bates 2006) is used. The study considers the effect of an experimental compound on the birth weights of 322 pups for 30 mother rats. The data consist of 27 litters, which were randomly assigned to a specific level of treatment (high, low, control), and 322 rat pups were nested within these litters. The study had an unbalanced design such that the number of pups per litter is not the same. The smallest litter had a size of 2 pups while the largest litter had a size of 18 pups. In addition, the number of litters per treatment is not the same (i.e. 10 litters were assigned to the control treatment, 7 to the high dose treatment and 10 litters were assigned to the low dose treatment).
A summary of the weights by treatment and sex is provided in Table 6 and Fig. 1. We note that the experimental treatments (high and low) appear to have a negative effect on mean birth weight. The averages (and also the medians) of the birth weights for the pups born in litters that received the high and low treatments are lower than those of the birth weights for pups born in litters that received the control dose. In addition, the sample means of birth weights of male pups are higher than those of females within all levels of treatment.
Figure 2 describes the litter effect on the rat pup birth weights using 27 box plots such that, from left to right, the first 10 belong to the control level, followed by 7 box plots that belong to the high level, and the last 10 belong to the low level of treatment. The means/medians of the 27 box plots are clearly not the same: the largest means/medians appear in litters 8, 17 and 27 and the smallest means/medians are in litters 1, 11, 12 and 18. Potential outliers are also recognized in both Figs. 1 and 2 since some pups appear to have either lower or higher weights than the other pups that belong to the same group (treatment/litter).
5.1 LME model for the rat pup data
Figure 2 indicates a potential varying litter effect on the distribution of the values of the rat pup birth weights in each litter. Considering this effect to be random, the individual birth weight observation (\({WEIGHT}_{kj}\)) of the jth rat pup within the kth litter can be modeled using the following two-level random intercept regression model:
where \({n}_{k}\) refers to the litter size, which ranges between 2 and 18 pups per litter, \({WEIGHT}_{kj}\) is the response variable, \({TREAT1}_{k}\) and \({TREAT2}_{k}\) denote, respectively, level-2 indicator variables for receiving the high and low levels of treatment, \({SEX}_{kj}\) is a level-1 indicator variable for a female rat pup, and \({LITSIZE}_{k}\) refers to the size of litter \(k\), where \(k=1, \dots , 27\). The random litter effect, \({b}_{k}\), is assumed to have a normal distribution with mean zero and constant variance \({\sigma }_{litter}^{2}\), and the residual error term, \({\epsilon }_{kj}\), is also assumed to have a normal distribution with mean zero and constant variance \({\sigma }_{residuals}^{2}\) (Pinheiro and Bates 2006).
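For reference, a model of the kind described above can be written as follows. This is only a sketch: it assumes that the four listed covariates enter additively with no interaction terms, and the coefficient symbols \({\beta }_{0},\dots ,{\beta }_{4}\) are introduced here for illustration and are not taken from the original display of model (15).

\[
{WEIGHT}_{kj}={\beta }_{0}+{\beta }_{1}{TREAT1}_{k}+{\beta }_{2}{TREAT2}_{k}+{\beta }_{3}{SEX}_{kj}+{\beta }_{4}{LITSIZE}_{k}+{b}_{k}+{\epsilon }_{kj},\quad j=1,\dots ,{n}_{k},\; k=1,\dots ,27.
\]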
5.2 Parameter estimation
Former analyses of this dataset focused on using the restricted maximum likelihood (REML) estimation method to draw inference about the effect of the different treatment levels on the birth weight (Pinheiro and Bates 2006). REML estimation is also the basic method on which the competing tests are based, and it is preferred to maximum likelihood estimation because it takes into account the loss in degrees of freedom due to estimation of the fixed effect parameters (Patterson and Thompson 1971). Nevertheless, REML estimation does not account for the potential effect of outliers and other violations of the distributional assumptions on the efficiency of the estimates and the consequent inference under the LME framework. In the remainder of this section, we highlight the gains from using the robust rank-based estimation method, which estimates both the fixed effects and the variance components with higher efficiency than likelihood-based methods.
The results of fitting model (15) using the REML method and the robust nonparametric JR method are reported in Table 7. The main effects (high vs. control) and (low vs. control) are significant and negative, indicating a negative effect of treatment on the birth weights of rat pups. The litter size is also found to have a significant negative effect on the birth weights of rat pups: birth weights show a strong tendency to decrease as litter size increases.
Estimates of the variance components are also given in Table 7. We note that the JR estimate of \({\sigma }_{litter}^{2}\) has a smaller standard error than the corresponding REML estimate. The same conclusion holds for the estimate of \({\sigma }_{residuals}^{2}\). Next, we examine the effect of the outliers and the distributional assumptions on each estimation method.
5.3 Robustness of estimation methods
Here, we explore two features that might have led to the superiority of the JR estimators over the REML estimators in Table 7. First, we test the normality assumption using the Shapiro–Wilk test. Based on the original data, the Shapiro–Wilk test produces a test statistic of 0.8448 with p-value \(<0.001\), which indicates a violation of the normality assumption. This considerable departure from normality is one plausible reason why the JR fit outperforms the REML fit in Table 7.
The second feature of concern is the presence of potential outliers in the rat pup data as concluded from Fig. 2. In exploring the second feature, we follow the procedures in Kloke et al. (2009) to study the effect of changing the magnitude of the suspicious outliers on the efficiency of the REML and JR fits. The results are provided in Table 8. Moreover, we study the effect of removing these potential outliers, hence reducing the total sample size, by refitting the model to the reduced dataset. The corresponding results are provided in Table 10.
In order to assess the effect of the presence of the potential outliers in the rat pup data, we change their magnitudes in two directions as follows. For pups with weights larger than the majority of the other pups in the same litter, the weights have been doubled. For those with weights smaller than the majority of the other pups in the same litter, the weights have been divided by 2. From the results in Table 8, we note that, under each estimation method, the significance/insignificance status of the fixed effects estimates remained unchanged. However, for the variance components, the REML estimates became less efficient than the corresponding estimates based on the original data. The JR standard errors remain approximately unchanged, confirming their robustness to the presence of the outliers.
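The perturbation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used for Table 8: the text does not state how the suspicious pups were identified, so the sketch assumes the usual box plot rule (values beyond 1.5 times the interquartile range from the quartiles), and the function names `flag_outliers` and `perturb_litter` as well as the example weights are hypothetical.

```python
from statistics import median, quantiles

def flag_outliers(weights):
    """Flag values outside the 1.5*IQR box plot fences (an assumed rule)."""
    q1, _, q3 = quantiles(weights, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [w < lower or w > upper for w in weights]

def perturb_litter(weights):
    """Double flagged high weights and halve flagged low weights in one litter."""
    med = median(weights)
    changed = []
    for w, is_outlier in zip(weights, flag_outliers(weights)):
        if not is_outlier:
            changed.append(w)          # ordinary pups are left untouched
        elif w > med:
            changed.append(2.0 * w)    # unusually heavy pup: weight doubled
        else:
            changed.append(w / 2.0)    # unusually light pup: weight halved
    return changed

# Made-up weights for a single litter (not taken from the rat pup data).
litter = [6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 9.0, 3.0]
changed = perturb_litter(litter)
```

Applying `perturb_litter` litter by litter yields a changed dataset to which both estimation methods can be refitted; a different flagging rule can be substituted without altering the doubling and halving step.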
Table 9 provides a summary of the estimates of the variance components and the intraclass correlation coefficients (ICC) under the REML and JR estimation methods for the original and changed rat pup datasets. The results show that the JR variance components estimates under the changed data are \({\widehat{\sigma }}_{litter}^{2}=0.0035\) and \({\widehat{\sigma }}_{residuals}^{2}=0.0879\), the estimate of the total model variance is \({\widehat{\sigma }}^{2}=0.0914\), where \({\widehat{\sigma }}^{2}={\widehat{\sigma }}_{litter}^{2}+{\widehat{\sigma }}_{residuals}^{2}\), and \(\mathrm{ICC}=0.038\). These are essentially unchanged compared to their corresponding values in the original data and remain smaller than the corresponding results produced by REML estimation.
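As a check on the reported values, and assuming the usual definition of the ICC in a random intercept model as the share of the total variance attributable to the litter effect, the arithmetic for the JR fit under the changed data is:

\[
\mathrm{ICC}=\frac{{\widehat{\sigma }}_{litter}^{2}}{{\widehat{\sigma }}_{litter}^{2}+{\widehat{\sigma }}_{residuals}^{2}}=\frac{0.0035}{0.0035+0.0879}=\frac{0.0035}{0.0914}\approx 0.038.
\]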
Model (15) has been refitted using the REML and JR methods to the reduced data, i.e. after removing the potential outliers from the original data. From Table 10, we conclude that the JR results remain better (in terms of the standard errors of the variance components) than their corresponding REML results. The conclusions made about the estimated fixed effects using both estimation methods do not change.
To sum up, it seems that the violation of the normality assumption, rather than the presence of potential outliers (Fig. 2), is the main reason to advocate the use of the JR method for the original data (Table 7). This conclusion is supported by the analyses of the data after changing the magnitude of these outliers (Table 8) and after excluding them (Table 10).
5.4 Testing litter effect
Testing the need for the random effect is conducted to decide whether the random effects associated with the litter-specific intercepts can be omitted from model (15). Based on the original rat pup dataset, the proposed \({T}_{JR}\)-test is calculated with 5000 permutation samples. The test produces a test statistic of \(0.5796\) with a p-value \(=0.001\). The competing tests are also conducted: the test statistics pLRT, LRT, and RLRT are 84.213 (p-value \(=0.001\)), 89.406 (p-value \(=0.0001\)) and 84.461 (p-value \(=0.002\)), respectively. Thus, all tests reject the null hypothesis at the 5% nominal level, which supports retaining the random litter effects \({b}_{k}\) (\(k=1, \dots , 27\)) in this model. It should be emphasized that the role of the test is to decide whether the variance components are needed in any further inferential procedures about the fixed effects under the potential presence of outliers or the absence of normality. Retaining the variance components also validates the recommendation of using the JR estimation method. For further inferential procedures about the fixed effects under this method, the reader is referred to Kloke et al. (2009).
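The mechanics of a Monte Carlo permutation p-value of this kind can be sketched as follows. The sketch is illustrative only and rests on several assumptions: the statistic `between_cluster_stat` is a simple stand-in (the variance of the cluster means) and is not the \({T}_{JR}\) statistic, no model parameters are re-estimated for each permutation, the add-one convention for the p-value is one common choice that may differ from the one used for the results above, and all names and data are hypothetical.

```python
import random
from statistics import mean, pvariance

def between_cluster_stat(values, labels):
    """Stand-in statistic: variance of the cluster means (not the T_JR statistic)."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    return pvariance([mean(vs) for vs in groups.values()])

def permutation_pvalue(values, labels, stat, n_perm=5000, seed=0):
    """Monte Carlo p-value obtained by randomly permuting the cluster labels."""
    rng = random.Random(seed)
    observed = stat(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one random rearrangement of the cluster indices
        if stat(values, shuffled) >= observed:
            exceed += 1
    # Add-one convention (an assumption): the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)

# Made-up residual-like values with a clear cluster effect (three clusters of four).
values = [0.1, -0.2, 0.0, 0.3, 5.1, 4.8, 5.0, 5.2, 10.2, 9.9, 10.0, 10.1]
labels = [0] * 4 + [1] * 4 + [2] * 4
p_value = permutation_pvalue(values, labels, between_cluster_stat, n_perm=2000)
```

Because the cluster effect in the made-up data is strong, almost no random relabelling reproduces a statistic as large as the observed one, so `p_value` is small and the null hypothesis of no cluster effect would be rejected at the 5% level.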
6 Conclusion
In this article, the proposed variance components test is built from a novel combination of tools that preserves a correct size while producing competitive power using a permutation test. The exchangeability of the cluster indices, hence of the estimated residuals, and the robustness of the estimation of both the fixed effects and the variance of the random effects are jointly utilized. This combination seems to have been overlooked in the literature. Our test statistic seems to be a natural choice for evaluating the nullity of the variance components in the LME model using a permutation-based test. The robust estimation theory for obtaining the test statistic is readily available when the model involves a single variance component. In particular, the robustness of the underlying parameter estimation method keeps the size of the proposed test at an acceptable level, in contrast to the poor size (invalidity) of the competing tests in the presence of outliers. Aside from outliers, the power of the proposed \({T}_{JR}\)-test always exceeds that of its competitors under the remaining simulation schemes.
The proposed test remains limited to LME models involving one random effect per cluster. The lack of robust rank-based estimation theory for general linear mixed models with complex or unknown covariance structures prevents extending our proposal to tests of multiple variance components, including the challenging problem of testing a subset of them. This is a demanding topic for future research. Extensions should at least cover the cases where the subset of random effects under the null hypothesis possesses the nonstandard properties considered in our simulation schemes.
References
Arboretti R, Corain L, Salmaso L, Melas VB, Pepelyshev A, Shpilev P (2015) On the optimal choice of the number of empirical Fourier coefficients for comparison of regression curves. Stat Pap 56(4):981–997
Azzalini A, Valle AD (1996) The multivariate skew-normal distribution. Biometrika 83(4):715–726
Basso D, Pesarin F, Salmaso L, Solari A (2009) Permutation tests for stochastic ordering and ANOVA: theory and applications in R. Springer, New York
Crainiceanu CM, Ruppert D (2004) Likelihood ratio tests in linear mixed models with one variance component. J R Stat Soc Ser B 66:165–185
Drikvandi R, Verbeke G, Khodadadi A et al (2013) Testing multiple variance components in linear mixed-effects models. Biostatistics 14:144–159
Du H, Wang L (2020) Testing variance components in linear mixed modeling using permutation. Multivar Behav Res 55(1):120–136
Fitzmaurice GM, Lipsitz SR, Ibrahim JG (2007) A note on permutation tests for variance components in multilevel generalized linear mixed models. Biometrics 63:942–946
Hahn S, Salmaso L (2017) A comparison of different synchronized permutation approaches to testing effects in two-level two-factor unbalanced ANOVA designs. Stat Pap 58(1):123–146
Hettmansperger TP, McKean JW (2010) Robust nonparametric statistical methods. CRC Press, Boca Raton
Jaeckel LA (1972) Estimating regression coefficients by minimizing the dispersion of the residuals. Ann Math Stat 43:1449–1458
Jureckova J (1971) Nonparametric estimate of regression coefficients. Ann Math Stat 42:1328–1338
Kloke JD, McKean JW, Rashid MM (2009) Rank-based estimation and associated inferences for linear models with cluster correlated errors. J Am Stat Assoc 104:384–390
Lee OE, Braun TM (2012) Permutation tests for random effects in linear mixed models. Biometrics 68:486–493
Liu R, McKean JW (2015) Robust rank-based and nonparametric methods. Springer, Cham
McKean JW, Hettmansperger TP (2016) Rank-based analysis of linear models and beyond: a review. Robust Rank-Based and Nonparametric Methods: Michigan, USA, April 2015: Selected, Revised, and Extended Contributions, pp 1–24
McKean JW, Kloke JD (2014) Efficient and adaptive rank-based fits for linear models with skew-normal errors. J Stat Distrib Appl 1:1–18
Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58(3):545–554
Pesarin F, Salmaso L (2010) Permutation tests for complex data: theory, applications and software. Wiley, Hoboken
Pesarin F, Salmaso L (2012) A review and some new results on permutation testing for multivariate problems. Stat Comput 22(2):639–646
Pfeffermann D (2013) New important developments in small area estimation. Stat Sci 28:40–68
Pinheiro J, Bates D (2006) Mixed-effects models in S and S-PLUS. Springer Science & Business Media, New York
Schmoyer RL (1994) Permutation tests for correlation in regression errors. J Am Stat Assoc 89:1507–1516
Self SG, Liang KY (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82:605–610
Shapiro A (1985) Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika 72(1):133–144
Shapiro A (1988) Towards a unified theory of inequality constrained testing in multivariate analysis. Int Stat Rev 56(1):49–62
Stoel RD, Garre FG, Dolan C et al (2006) On the likelihood ratio test in structural equation modeling when parameters are subject to boundary constraints. Psychol Methods 11(4):439–455
Stram DO, Lee JW (1994) Variance components testing in the longitudinal mixed effects model. Biometrics 50:1171–1177
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
Appendix
1.1 Validity of the permutation test
The test statistic \({T}_{JR}\) is computed based on \({\widehat{\sigma }}_{b}^{2}\) which is a function of the estimated residuals \({\widehat{e}}_{kj}\). Under the null hypothesis of zero variance components, the order by which the cluster indices are arranged in the sampled dataset is just one possible arrangement (permutation) \(\pi \in \Sigma \) where \(\Sigma \) denotes the set of all \(N!\) permutations of cluster indices. Denote by \({T}_{JR}^{\pi }\) the value of \({T}_{JR}\) under permutation \(\pi \). For each permutation \(\pi \), the parameters \(\eta \) and \({\varvec{\beta}}\) are estimated and can be represented by \({\widehat{\eta }}^{\pi }\) and \({\widehat{{\varvec{\beta}}}}_{\varphi }^{\pi }\). Interestingly, under the null hypothesis we have
and
for all \(\pi \in \Sigma \) where \({\widehat{{\varvec{\beta}}}}_{\varphi }\) and \(\widehat{\eta }\) are given in (7) and (9).
The exchangeability requirement for running a permutation test based on \({T}_{JR}\) can be proved based on the assumption of independence (hence the exchangeability) of the errors
under the null hypothesis where (11) reduces to
Unfortunately, the errors \({e}_{kj}\) are unobservable random variables. However, the estimates \({\widehat{e}}_{kj}\) can replace the corresponding errors in approximating the permutation distribution of \({T}_{JR}\) if those estimates are exchangeable too. Observing that
Then, the possible permutations of \({\widehat{e}}_{kj}^{\pi }\) are equiprobable because the cluster indices (i.e. over \(k\) and \(j\)) are equiprobable when the null hypothesis is true. Hence, it suffices to prove the exchangeability of \({\widehat{e}}_{kj}\) to validate the approximation of the permutation test using \({\widehat{e}}_{kj}^{\pi }\) for all \(\pi \in \Sigma \). That is, their joint distribution is the same irrespective of their existing order.
For fixed \(j\) and \(k=1,\dots ,m\), the joint distribution of \({\widehat{e}}_{kj}^{\pi }\) over all clusters is given by
where under the null hypothesis, the last equation (A4) indicates that the estimated residuals \({\widehat{e}}_{kj}^{\pi }\) are independent and identically distributed given \(\widehat{\eta }\) and \({\widehat{{\varvec{\beta}}}}_{\varphi }\). Let \({\pi }^{*}\in \Sigma \) be another permutation. Then,
and due to the conditional independence, then
where the last equation (A5) implies \(f\left({\widehat{e}}_{1j}^{\pi },\dots ,{\widehat{e}}_{mj}^{\pi }\right)=f\left({\widehat{e}}_{1j}^{{\pi }^{*}},\dots ,{\widehat{e}}_{mj}^{{\pi }^{*}}\right)\) for any nonidentical permutations \(\pi \ne {\pi }^{*}\).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
ElHorbaty, Y.S., Hanafy, E.M. A Monte Carlo permutation procedure for testing variance components using robust estimation methods. Stat Papers (2023). https://doi.org/10.1007/s00362023013962
Keywords
 Exchangeability
 Robustness
 Rankbased estimation
 Permutation test
 Outliers