Linear quantile mixed models
DOI: 10.1007/s11222-013-9381-9
- Cite this article as: Geraci, M. & Bottai, M. Stat Comput (2014) 24: 461. doi:10.1007/s11222-013-9381-9
Abstract
Dependent data arise in many studies. Frequently adopted sampling designs, such as cluster, multilevel, spatial, and repeated measures designs, may induce this dependence, which must be properly accounted for in the analysis of the data. In a previous publication (Geraci and Bottai in Biostatistics 8:140–154, 2007), we proposed a conditional quantile regression model for continuous responses in which subject-specific random intercepts were included to account for within-subject dependence in the context of longitudinal data analysis. The approach hinged upon the link between the minimization of weighted absolute deviations, typically used in quantile regression, and the maximization of a Laplace likelihood. Here, we consider an extension of those models to more complex dependence structures in the data, which are modeled by including multiple random effects in the linear conditional quantile functions. We also discuss estimation strategies to reduce the computational burden and inefficiency associated with the Monte Carlo EM algorithm we proposed previously. In particular, the estimation of the fixed regression coefficients and of the random effects’ covariance matrix is based on a combination of Gaussian quadrature approximations and non-smooth optimization algorithms. Finally, a simulation study and a number of applications of our models are presented.
Keywords
Best linear predictor; Clarke’s derivative; Hierarchical models; Gaussian quadrature
1 Introduction
Conditional quantile regression (QR) pertains to the estimation of unknown quantiles of an outcome as a function of a set of covariates and a vector of fixed regression coefficients. Generally, QR estimation makes no assumption about the shape of the distribution of the outcome (Boscovich 1757; Wagner 1959; Barrodale and Roberts 1978; Bassett and Koenker 1978; Koenker and Bassett 1978). Its ability to provide a thorough description of distributional effects has made QR attractive in several fields. See, for example, Yu et al. (2003) and Koenker (2005) for an overview of recent applications.
In the last few years, the need to extend QR from independent to clustered data has led to several, quite distinct approaches. These can be roughly classified into two groups: distribution-free and likelihood-based. The former includes fixed-effects (Koenker 2004; Lamarche 2010; Galvao and Montes-Rojas 2010; Galvao 2011) and weighted (Lipsitz et al. 1997; Karlsson 2008; Fu and Wang 2012) approaches. The latter are mainly based on the asymmetric Laplace (AL) density (Geraci and Bottai 2007; Liu and Bottai 2009; Yuan and Yin 2010; Lee and Neocleous 2010; Farcomeni 2012) or on other parametric distributions (Reich et al. 2010a).
These categories are by no means mutually exclusive. For example, penalty methods such as those proposed by Koenker (2004) may be closely related to the asymmetric Laplace regression with double-exponential random effects suggested by Geraci and Bottai (2007). This relationship, however, has not been fully explored; see Lamarche (2010) for a discussion of the asymptotics of L_{1}-penalized fixed-intercept models. Nor is the classification exhaustive. In the approach by Canay (2011), location-shift fixed effects are eliminated by a transformation. Other approaches might involve modeling the moments of a distribution function using a parametric family (e.g., Rigby and Stasinopoulos 2005) and deriving the quantiles of the response by inversion. However, the spirit of a likelihood-based approach to QR differs from that of a probability-model-fitting exercise, in which a given distribution is assumed to be the ‘true’ distribution.
In Sect. 2, we briefly review Geraci and Bottai’s AL-based approach (Geraci and Bottai 2007, hereafter GB) and we introduce a generalization of the model proposed therein. In Sect. 3, we describe an estimation process based on numerical integration and nonsmooth optimization. Inferential issues are also discussed. A simulation study is presented in Sect. 4. We then conclude with some applications (Sect. 5) and final remarks (Sect. 6). All computations were performed using the package lqmm (Geraci 2012) for the statistical programming environment R (R Development Core Team 2012).
2 Linear quantile mixed models
2.1 Random-effects models with asymmetric Laplace error
Obviously, a model with random intercepts only cannot account for between-cluster heterogeneity associated with given explanatory variables. This is the case, for example, with growth curves, in which both the intercepts and the slopes of temporal trajectories differ by subject because of genetic and environmental effects. In the next section, we extend our model to include multiple random effects.
2.2 Model generalization
There are several reasons why an analyst might consider a random-effects approach for their data while wanting to go beyond a location-shift model. A quantile regression approach allows for departures from the location-shift assumptions typical of LMMs, in which covariates do not affect the shape of the distribution. Here, we follow the same approach as GB, but we provide a framework within which the modeling of the regression quantiles of clustered continuous outcomes is more general and estimation is computationally more efficient.
We introduce the convenient assumption that the \(y_{i} = (y_{i1},\ldots,y_{in_{i}})'\), i=1,…,M, conditionally on a q×1 vector of random effects u_{i}, are independently distributed according to a joint AL with location and scale parameters given by \(\mu^{(\tau)}_{i} = X_{i}\theta^{(\tau)}_{x} + Z_{i}u_{i}\) and σ^{(τ)}, where \(\theta^{(\tau)}_{x}\in\mathbb{R}^{p}\) is a vector of unknown fixed effects. The skew parameter τ is set a priori and defines the quantile level to be estimated. We also assume that u_{i}=(u_{i1},…,u_{iq})′, for i=1,…,M, is a random vector independent of the model’s error term and distributed according to p(u_{i}|Ψ^{(τ)}), where Ψ^{(τ)} is a q×q covariance matrix. Note that all the parameters are τ-dependent; in particular, the random-effects vector u depends on τ through Ψ^{(τ)}. We will omit the superscript τ when this is not a source of confusion. Moreover, we assume that the random effects are zero-median vectors. Other scenarios may include the case in which u is not zero-centered (see, for example, Koenker 2005, p. 281) and/or not symmetric. A discussion of proposed distributions for the random effects is deferred to the next section.
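The link between the Laplace likelihood and quantile estimation can be checked numerically. The Python sketch below assumes the standard AL parameterization f(y|μ,σ,τ) = τ(1−τ)/σ · exp{−ρ_τ((y−μ)/σ)}, with check function ρ_τ(r) = r(τ − I(r<0)); this form is consistent with the AL variance formula quoted in Sect. 3.3. The function names are ours. For fixed σ and a common location μ, maximizing the AL log-likelihood is the same as minimizing the summed check loss, so the maximizer is a sample τ-quantile.

```python
import numpy as np

def check_loss(r, tau):
    """Check function rho_tau(r) = r * (tau - I(r < 0))."""
    return r * np.where(r < 0, tau - 1.0, tau)

def al_loglik(mu, y, sigma, tau):
    """Sum of AL(mu, sigma, tau) log-densities over the sample y (assumed parameterization)."""
    n = y.size
    return n * np.log(tau * (1 - tau) / sigma) - check_loss((y - mu) / sigma, tau).sum()

rng = np.random.default_rng(42)
y = rng.normal(size=101)
tau, sigma = 0.75, 1.0

# The log-likelihood is concave and piecewise linear in mu, with kinks at the data,
# so its maximum is attained at an observation: evaluating it there is enough.
ll = np.array([al_loglik(m, y, sigma, tau) for m in y])
mu_hat = y[np.argmax(ll)]

print(mu_hat, np.quantile(y, tau))   # the same order statistic for n = 101, tau = 0.75
```

With n=101 and τ=0.75, nτ is not an integer, so the minimizer of the check loss is unique (the 76th order statistic). This is also the value `np.quantile` returns with its default linear interpolation.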
Clearly, since inference in model (2) is carried out separately for each τ, quantile crossing may arise within the convex hull of X, perhaps as a result of model misspecification (Koenker 2005, p. 55). As explained by Lum and Gelfand (2012), the stronger constraint of a joint quantile specification is replaced by the weaker stochastic ordering of the asymmetric Laplace. Ad hoc solutions to avoid quantile crossing have been proposed (He 1997; Zhao 2000).
3 Estimation
GB proposed fitting the random-intercept QR model by means of a Monte Carlo EM algorithm. On the one hand, this approach avoids evaluating a multidimensional integral; on the other hand, it can be computationally demanding. In a simulation study, Alhamzawi et al. (2011) compared Gibbs sampling with GB’s EM algorithm and with Yuan and Yin’s (2010) MCMC algorithm. In terms of relative bias, the three approaches performed comparably. However, the EM algorithm showed poorer computational efficiency, and modeling more complex random-effects structures further increases the computational requirements. In this section, we explore alternative strategies based on different optimization techniques.
Theorem 1
(Prékopa 1973)
If p(u_{i}|Ψ) is log-concave in u, the integrand in (4) will be log-concave in y and Prékopa’s theorem will apply to p(y_{i}|θ_{x},σ,Ψ). As we shall see, the distributions of the random effects considered in this study are log-concave. The joint log-likelihood, however, is not a concave function of the scale and variance parameters, which must therefore be reparameterized before the above results can be applied.
3.1 Numerical integration
Because a q-dimensional product rule built from K nodes per dimension requires K^{q} evaluations of the integrand, it entails a ‘curse of dimensionality’: the number of evaluations grows exponentially with the number of random effects. For example, a 5-dimensional quadrature rule based on 20 nodes needs 20^{5} = 3,200,000 function evaluations, and adding only one random effect raises this to 20^{6} = 64,000,000, let alone the total number of evaluations needed for convergence if the parameter estimation algorithm is iterative (as will be seen to be the case). Fortunately, satisfactory accuracy is typically achieved with fewer than 20 nodes.
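To make the arithmetic concrete, here is a minimal sketch of a product Gauss–Hermite rule for q independent standard normal random effects. It is our own helper, not code taken from lqmm, and it assumes the random effects have already been standardized to N(0, I_q). numpy's `hermgauss` returns nodes and weights for the weight function exp(−x²). The change of variable u = √2·x, together with division of the weights by √π, turns it into a rule for expectations under N(0,1).

```python
import itertools
import numpy as np
from numpy.polynomial.hermite import hermgauss

def product_gauss_hermite(K, q):
    """Tensor-product Gauss-Hermite rule for E[g(u)] with u ~ N(0, I_q).

    Returns a (K**q, q) array of nodes and K**q weights that sum to one."""
    x, w = hermgauss(K)                 # 1-D rule for the weight function exp(-x^2)
    nodes_1d = np.sqrt(2.0) * x         # change of variable u = sqrt(2) * x
    weights_1d = w / np.sqrt(np.pi)     # normalise to a probability rule
    nodes = np.array(list(itertools.product(nodes_1d, repeat=q)))
    weights = np.array([np.prod(c) for c in itertools.product(weights_1d, repeat=q)])
    return nodes, weights

# Number of integrand evaluations K**q for the two examples in the text
for K, q in [(20, 5), (20, 6)]:
    print(K, q, K ** q)                 # 3200000 and 64000000

# Sanity check on a small grid: E[u1^2 + u2^2] = 2 for a bivariate standard normal
nodes, weights = product_gauss_hermite(K=5, q=2)
print(nodes.shape, weights @ (nodes ** 2).sum(axis=1))
```

The grid is stored explicitly, which makes the exponential growth visible. Building it is feasible only for small K and q; the 20-node, 5- and 6-dimensional cases are counted here but not constructed.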
The choice of an appropriate distribution for the random effects is not straightforward. As recognized by GB, robustness issues apply to the distribution of y|u as well as to that of u. As a robust alternative to the Gaussian choice, we suggested the use of the symmetric Laplace. This choice led to a regression model which, after a rescaling (Geraci and Bottai 2007, p. 146), was similar to the penalized model proposed by Koenker (2004). As noted by Koenker (2004), using L_{1} penalties in fixed-effects models is convenient, as it yields an elegant form of the penalized objective function that can be solved by linear programming algorithms. See also Koenker and Mizera (2004) for bivariate quantile smoothing using L_{1} penalties.
In the following, we focus explicitly on two types of distributions, namely Gaussian and Laplacian. It is easy to verify that these assumptions correspond to applying, respectively, a Gauss–Hermite and a Gauss–Laguerre quadrature to the integral in (4). More general considerations can be made regarding the use of symmetric as well as asymmetric kernels belonging to the exponential family.
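For the Laplacian case, the sketch below assumes a one-dimensional symmetric Laplace random effect with density exp(−|u|/b)/(2b); the scale symbol b is our notation. Folding the integral onto [0, ∞) gives E[g(u)] = ½∫₀^∞ [g(bt) + g(−bt)] e^{−t} dt. numpy's `laggauss` rule, which targets the weight e^{−t}, then applies directly. This illustrates the Gauss–Laguerre idea only; it does not reproduce the paper's own approximation.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

def laplace_expectation(g, b, K=10):
    """Approximate E[g(u)] for u ~ Laplace(0, b), density exp(-|u|/b) / (2b).

    Substituting u = +/- b*t gives E[g(u)] = 0.5 * int_0^inf [g(b t) + g(-b t)] e^{-t} dt,
    which a K-node Gauss-Laguerre rule approximates."""
    t, w = laggauss(K)                   # nodes/weights for the weight e^{-t} on [0, inf)
    return 0.5 * np.sum(w * (g(b * t) + g(-b * t)))

b = 1.5
print(laplace_expectation(lambda u: u ** 2, b))   # exact value 2 * b**2 = 4.5
print(laplace_expectation(np.abs, b))             # exact value b = 1.5
```

Both checks are polynomial in t after the substitution, so a 10-node rule reproduces them up to rounding error.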
3.1.1 Normal random effects
The integral above can be recognized as a normal-Laplace convolution (Reed 2006). This type of distribution is known in a special form in meta-analysis (Demidenko 2004).
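For a single random intercept, the marginal likelihood of one cluster can be approximated with a Gauss–Hermite rule once the normal kernel is standardized. The Python sketch below makes several assumptions. It takes y_ij | u_i ∼ AL(x′_ij θ + u_i, σ, τ), with density τ(1−τ)/σ · exp{−ρ_τ((y−μ)/σ)}, and u_i ∼ N(0, ψ_u²). It restricts attention to q=1, and the function names are ours, not the lqmm API. The conditional log-likelihood of the whole cluster is evaluated at each node, and the weighted sum is taken on the log scale.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def al_logpdf(y, mu, sigma, tau):
    """Log-density of AL(mu, sigma, tau): tau(1-tau)/sigma * exp(-rho_tau((y-mu)/sigma))."""
    r = (y - mu) / sigma
    return np.log(tau * (1 - tau) / sigma) - r * np.where(r < 0, tau - 1.0, tau)

def cluster_loglik(y, X, theta, sigma, psi_u, tau, K=7):
    """Gauss-Hermite approximation of log p(y_i) in a random-intercept sketch:
    y_ij | u ~ AL(x_ij' theta + u, sigma, tau), u ~ N(0, psi_u^2)."""
    x, w = hermgauss(K)
    u_nodes = np.sqrt(2.0) * psi_u * x          # nodes on the scale of u
    log_w = np.log(w / np.sqrt(np.pi))          # normalised weights (they sum to one)
    eta = X @ theta                              # fixed part x_ij' theta, shape (n_i,)
    # conditional log-likelihood of the whole cluster at each node, shape (K,)
    cond = np.array([al_logpdf(y, eta + u, sigma, tau).sum() for u in u_nodes])
    a = log_w + cond
    m = a.max()
    return m + np.log(np.exp(a - m).sum())      # log-sum-exp for numerical stability

# Usage: one cluster with n_i = 4 observations, intercept and slope
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = np.array([0.3, 1.2, 1.9, 3.4])
print(cluster_loglik(y, X, theta=np.array([0.0, 1.0]), sigma=0.5, psi_u=1.0, tau=0.5))
```

Two simple checks hold for this sketch. As ψ_u → 0 it reduces to the plain AL log-likelihood. For n_i = 1 the approximate normal-Laplace density integrates to one over y, because each node contributes a proper AL density and the weights sum to one.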
3.1.2 Robust random effects
The use of (7) for correlated random effects is problematic. Even though it is easy to rescale Ψ to a diagonal matrix, the joint density does not factorize into q AL variates (Eltoft et al. 2006) and, therefore, the q-dimensional quadrature cannot be based on q successive applications of one-dimensional rules. For this reason, we do not currently consider any of the available proposals for a generalized Laplace distribution suitable for our purposes. See, for example, Liu and Bottai (2009) for some results on the use of Kotz et al.’s (2000) multivariate Laplace distribution.
3.2 Nonsmooth optimization
The nondifferentiability of the loss function ρ_{τ} at points where \(y_{ij}-x'_{ij}\theta_{x}-z'_{ij}u = 0\) interferes with the standard theory of smooth optimization. Subgradient optimization and derivative-free optimization techniques (e.g., coordinate and pattern-search methods, modified Nelder–Mead methods, implicit filtering) have been developed to tackle such nonstandard optimization problems. Here, we briefly study Clarke’s derivative of the loss function as a starting point for a nonsmooth optimization approach. In his book, first published in 1983, Clarke (1990) developed a general theory of nonsmooth analysis that leads to a powerful and elegant approach to mathematical programming. We focus our attention on the theorems for Lipschitz functions, which play an important role in Clarke’s treatise.
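For reference, the generalized gradient of the check function ρ_τ(r) = r(τ − I(r<0)) can be written out explicitly. Because ρ_τ is convex and Lipschitz with constant max(τ, 1−τ), its Clarke generalized gradient coincides with the ordinary convex subdifferential. The expression below is our own summary for fixed u, not a result quoted from the text. The second identity follows from the sum and affine chain rules for convex functions, with the sum understood as a Minkowski sum of sets.

```latex
\[
\partial\rho_{\tau}(r)=
\begin{cases}
\{\tau-1\}, & r<0,\\
[\tau-1,\,\tau], & r=0,\\
\{\tau\}, & r>0,
\end{cases}
\qquad
\partial_{\theta_{x}}\sum_{i,j}\rho_{\tau}\bigl(y_{ij}-x'_{ij}\theta_{x}-z'_{ij}u\bigr)
=-\sum_{i,j}x_{ij}\,\partial\rho_{\tau}\bigl(y_{ij}-x'_{ij}\theta_{x}-z'_{ij}u\bigr).
\]
```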
Using the subgradient approach of Rockafellar (1970), Koenker (2005) gives optimality conditions of the quantile regression problem for independent data. Here, we consider the gradient search algorithm for Laplace likelihood originally developed by Bottai and Orsini (2012) for censored Laplace regression (Bottai and Zhang 2010), which works as follows.
1. Set θ=θ^{0}; δ=δ^{0}; σ=σ^{0}; k_{1}=0.
2. If \(\ell_{\mathrm{app}}\{\theta^{k_{1}} - \delta^{k_{1}}s(\theta^{k_{1}}),\sigma^{0}\} \geq \ell_{\mathrm{app}}\{\theta^{k_{1}},\sigma^{0}\}\)
   - (a) then set \(\delta^{k_{1}+1} = a\delta^{k_{1}}\);
   - (b) else if \(\ell_{\mathrm{app}}\{\theta^{k_{1}},\sigma^{0}\} - \ell_{\mathrm{app}}\{\theta^{k_{1}} - \delta^{k_{1}}s(\theta^{k_{1}}),\sigma^{0}\} < \omega_{1}\)
     - (i) then return \(\theta^{k_{1}+1}\); stop;
     - (ii) else set \(\theta^{k_{1}+1} = \theta^{k_{1}} - \delta^{k_{1}}s(\theta^{k_{1}})\); \(\delta^{k_{1}+1} = b\delta^{k_{1}}\).
3. Set k_{1}=k_{1}+1; go to step 2.
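A minimal Python sketch of steps 1–3 follows. It covers the simplest case of independent data with σ held fixed at σ^0, so that ℓ_app reduces to the check-loss objective Σρ_τ((y − Xθ)/σ), the negative AL log-likelihood up to a constant. Several details are our assumptions rather than statements in the text. We read ℓ_app as a quantity to be minimized and s(θ) as a subgradient of it. We take a ∈ (0,1) to shrink the step and b ≥ 1 to enlarge it. We add a maximum number of iterations and a lower bound on δ as safeguards, because a step along a subgradient need not decrease the objective at a kink.

```python
import numpy as np

def check_loss(r, tau):
    """Check function rho_tau(r) = r * (tau - I(r < 0))."""
    return r * np.where(r < 0, tau - 1.0, tau)

def objective(theta, y, X, sigma, tau):
    """l_app for independent data: sum of rho_tau((y - X theta) / sigma),
    i.e. the negative AL log-likelihood up to an additive constant."""
    return check_loss((y - X @ theta) / sigma, tau).sum()

def subgradient(theta, y, X, sigma, tau):
    """A subgradient s(theta) of the objective: -X' psi_tau(r) / sigma,
    with psi_tau(r) = tau - I(r < 0)."""
    r = (y - X @ theta) / sigma
    return -X.T @ np.where(r < 0, tau - 1.0, tau) / sigma

def gradient_search(y, X, tau, theta0, sigma0=1.0, delta0=0.01,
                    a=0.5, b=1.0, omega1=1e-8, max_iter=1000):
    theta, delta = np.asarray(theta0, dtype=float), delta0     # step 1
    for _ in range(max_iter):                 # safeguard added in this sketch
        s = subgradient(theta, y, X, sigma0, tau)
        candidate = theta - delta * s
        l_old = objective(theta, y, X, sigma0, tau)
        l_new = objective(candidate, y, X, sigma0, tau)
        if l_new >= l_old:                    # step 2(a): no decrease, shrink the step
            delta *= a
            if delta < 1e-14:                 # safeguard: stuck at a kink
                break
        elif l_old - l_new < omega1:          # step 2(b)(i): negligible decrease, stop
            return candidate
        else:                                 # step 2(b)(ii): accept and rescale the step
            theta, delta = candidate, b * delta
    return theta                              # step 3 is the loop itself

# Usage: intercept-only model, so the solution should be close to a sample quantile
rng = np.random.default_rng(1)
y = rng.standard_normal(201)
X = np.ones((201, 1))
theta_hat = gradient_search(y, X, tau=0.75, theta0=[0.0])
print(theta_hat[0], np.quantile(y, 0.75))
```

Near the optimum, the step keeps overshooting the kink at the minimizing order statistic, so δ is repeatedly shrunk. The decrease eventually falls below ω₁, and that triggers the stopping rule.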
3.3 Interpretation of parameters
Consider the random-intercept model introduced in (1). Its marginal distribution is given by \(y \sim N(\mu, \psi^{2}_{u} ZZ' + \psi^{2}I)\). The parameters of such a model have a straightforward interpretation: μ is the mean effect at the population level; \(\psi^{2}_{u}\) measures the dispersion of the cluster-specific random effects and, for a given error variance, determines the ICC, \(\psi^{2}_{u}/(\psi^{2}_{u}+\psi^{2})\); and ψ^{2} is the error variance.
Let us now turn to linear quantile mixed models. Our starting point is conditional. The marginal likelihood (4), estimated under (5) or (6), implicitly assumes that the τth regression quantiles of the clusters ‘gravitate’ around a common regression quantile. In the case of the normal intercept model, this would be μ+ψΦ^{−1}(τ), clearly different from the τth quantile of the marginal model, \(\mu+\sqrt{\psi^{2}_{u}+\psi^{2}}\varPhi^{-1}(\tau)\). Oberhofer and Haupt (2005) showed that the unconditional quantile estimator for dependent random variables is asymptotically unbiased.
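A small numerical illustration of this distinction follows. The values μ=0, ψ_u=ψ=1 and τ=0.9 are our own choice. The conditional quantile μ+ψΦ^{−1}(τ) is the median over clusters of the cluster-specific τth quantiles μ+u_i+ψΦ^{−1}(τ), because u_i has median zero. The marginal τth quantile is instead inflated by the random-intercept variance. The Monte Carlo part simulates the normal intercept model y_ij = μ + u_i + e_ij.

```python
import numpy as np
from statistics import NormalDist

mu, psi_u, psi, tau = 0.0, 1.0, 1.0, 0.9
z = NormalDist().inv_cdf(tau)

conditional_q = mu + psi * z                          # common quantile the clusters gravitate around
marginal_q = mu + np.sqrt(psi_u**2 + psi**2) * z      # tau-th quantile of the marginal model
print(round(conditional_q, 3), round(marginal_q, 3))  # 1.282 1.812

# Monte Carlo check with y_ij = mu + u_i + e_ij
rng = np.random.default_rng(0)
M, n = 20000, 10
u = rng.normal(0.0, psi_u, size=(M, 1))
y = mu + u + rng.normal(0.0, psi, size=(M, n))
print(np.quantile(y, tau))                     # close to marginal_q
print(np.median(mu + u[:, 0] + psi * z))       # median of cluster-specific quantiles, close to conditional_q
```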
The scale parameter σ does not, in general, have a straightforward interpretation, since the use of the Laplace distribution for the conditional response stems from the convenience of manipulating a likelihood rather than from the observation that the data are indeed Laplacian. But what if they are? Consider the linear median mixed model (τ=0.5). The AL density evaluated at μ, its τth quantile, equals τ(1−τ)/σ. The variance of the quantile estimator of q_{i⋅}(0.5) based on normal approximations is therefore \(\frac{\tau (1-\tau)}{n_{i} [\frac{\tau(1-\tau)}{\sigma} ]^{2}} = \frac{4\sigma^{2}}{n_{i}}\). The relative efficiency of this estimator under the Laplacian and normal hypotheses is then \(\frac{4\sigma^{2}}{n_{i}}/\frac{\pi \psi^{2}}{2n_{i}}=8\sigma^{2}/\pi\psi^{2}\). Note that if y∼AL(μ,σ,τ), then \(\operatorname{var}(y) = \frac{\sigma^{2}(1-2\tau+2\tau^{2})}{(1-\tau)^{2}\tau^{2}}\) (Yu and Zhang 2005), hence \(\operatorname{var}(y) = 8\sigma^{2}\) for τ=0.5. The median estimator would therefore achieve the same asymptotic efficiency under the two distributions if, in terms of variances, 8σ^{2}/ψ^{2}=π or, equivalently, in terms of scale parameters, if \(\sigma/\psi=\sqrt{\pi/8}\approx 0.63\).
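The quantities in this paragraph are easy to check numerically. In the Python sketch below, the AL sampler uses the standard representation μ + σ(E₁/τ − E₂/(1−τ)) with E₁, E₂ independent Exp(1) variables. This representation is our choice for the check and is not taken from the text. It has P(y<μ)=τ and reproduces the quoted variance formula, which equals σ²(1/τ² + 1/(1−τ)²). The values σ=1 and ψ=2 are arbitrary.

```python
import math
import numpy as np

def al_variance(sigma, tau):
    """Variance of AL(mu, sigma, tau), as quoted in the text (Yu and Zhang 2005)."""
    return sigma**2 * (1 - 2*tau + 2*tau**2) / ((1 - tau)**2 * tau**2)

def sample_al(mu, sigma, tau, size, rng):
    """Draw AL(mu, sigma, tau) variates as mu + sigma*(E1/tau - E2/(1 - tau)), E1, E2 ~ Exp(1)."""
    e1, e2 = rng.exponential(size=size), rng.exponential(size=size)
    return mu + sigma * (e1 / tau - e2 / (1 - tau))

print(al_variance(1.0, 0.5))                      # 8.0, i.e. var(y) = 8 sigma^2 at the median

# Asymptotic variances of the median estimator (per n_i = 1) and their ratio
sigma, psi = 1.0, 2.0
v_laplace = 4 * sigma**2                          # 4 sigma^2 / n_i
v_normal = math.pi * psi**2 / 2                   # pi psi^2 / (2 n_i)
print(v_laplace / v_normal, 8 * sigma**2 / (math.pi * psi**2))   # equal

# Scale ratio sigma/psi at which the two asymptotic variances coincide
print(math.sqrt(math.pi / 8))                     # about 0.627

# Monte Carlo check of the variance formula and of P(y < mu) = tau at tau = 0.25
rng = np.random.default_rng(1)
y = sample_al(0.0, 1.0, 0.25, 1_000_000, rng)
print(y.var(), al_variance(1.0, 0.25), np.mean(y < 0))
```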
3.4 Inference
The LQMM in (2), as well as GB’s random-intercept model, assumes a constant scale parameter; under this assumption, asymptotic approximations applied to heteroscedastic data would lead to incorrect inference. However, GB’s standard error calculation relied on bootstrap resampling, which makes large-n approximations irrelevant (Kim and Yang 2011). Including variance weights might be a way to address this limitation in our models. See Koenker (2005, p. 94) for a discussion of likelihood ratio tests based on the asymmetric Laplace residuals.
The bootstrap is a very flexible approach and is often used in quantile estimation (Parzen et al. 1994; Buchinsky 1995; He and Hu 2002; Kocherginsky et al. 2005; Bose and Chatterjee 2003; Koenker 2005; Feng et al. 2011; Canay 2011). Here, we consider a block bootstrap approach to assess the uncertainty in conditional quantile estimates. R bootstrap samples are obtained by resampling the cluster index i=1,…,M with replacement. The standard errors are computed as the square roots of the diagonal elements of the covariance matrix \(\hat{V} = \operatorname{Cov}(B)\), where B is an R×(p+1) matrix whose rows are the bootstrap estimates of (θ_{x},σ)′. Similarly, standard errors for Ψ can be obtained after appropriately transforming the bootstrap estimates of θ_{z}.
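The resampling scheme fits in a few lines of Python. The estimator passed in is a hypothetical placeholder: here it returns pooled sample quantiles, standing in for an LQMM fit that would return (θ_x, σ)′. Only the cluster-level resampling and the Cov(B) step follow the description above. The data-generating model in the usage example is also our choice.

```python
import numpy as np

def block_bootstrap_se(clusters, estimator, R=50, seed=0):
    """Block bootstrap: resample whole clusters (indices i = 1..M) with replacement,
    refit, and return sqrt(diag(Cov(B))) together with the R x d matrix B."""
    rng = np.random.default_rng(seed)
    M = len(clusters)
    B = np.vstack([
        np.atleast_1d(estimator([clusters[i] for i in rng.integers(0, M, size=M)]))
        for _ in range(R)
    ])
    V = np.atleast_2d(np.cov(B, rowvar=False))    # d x d bootstrap covariance
    return np.sqrt(np.diag(V)), B

def pooled_quantiles(clusters, taus=(0.5, 0.9)):
    """Hypothetical stand-in estimator: sample quantiles of the pooled responses."""
    return np.quantile(np.concatenate(clusters), taus)

# Usage: M = 100 clusters of n = 5 observations from y_ij = u_i + e_ij
rng = np.random.default_rng(123)
clusters = [rng.normal(rng.normal(), 1.0, size=5) for _ in range(100)]
se, B = block_bootstrap_se(clusters, pooled_quantiles, R=50)
print(B.shape, se)
```

Resampling whole clusters, rather than individual observations, keeps the within-cluster dependence intact in every bootstrap sample.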
Model selection can be conducted by using traditional measures such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). A Laplacian likelihood ratio test can be used for testing linear hypotheses of the type H_{0}: θ_{2}=0, with partitioned coefficients vector \(\theta_{x} = (\theta'_{1},\theta'_{2} )'\). See Koenker and Machado (1999) for additional details.
For the sake of simplicity, consider the LMM y=u+ε, with u∼N(0,I) and ε∼N(0,I), which entails an ICC of 0.5. As seen previously, the random effects play a pure location-shift role. We would therefore expect a good predictor of u in LQMMs to provide a prediction \(\hat{u}\) that is close to the best linear unbiased predictor (BLUP) of u from an LMM.
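For this toy LMM the BLUP has a closed form. Consider a cluster of n_i observations with u_i ∼ N(0, ψ_u²) and ε_ij ∼ N(0, ψ²). The BLUP is û_i = ψ_u² 1′(ψ_u² 11′ + ψ² I)^{−1}(y_i − μ) = n_i ψ_u²/(n_i ψ_u² + ψ²)·(ȳ_i − μ). With μ=0 and ψ_u=ψ=1, this reduces to n_i/(n_i+1)·ȳ_i. The short Python check below compares the two expressions on simulated data; the function names and the cluster size are ours.

```python
import numpy as np

def blup_matrix(y_i, mu, psi_u2, psi2):
    """BLUP of a random intercept: psi_u^2 * 1' (psi_u^2 11' + psi^2 I)^{-1} (y_i - mu)."""
    n = y_i.size
    one = np.ones(n)
    V = psi_u2 * np.outer(one, one) + psi2 * np.eye(n)
    return psi_u2 * one @ np.linalg.solve(V, y_i - mu)

def blup_shrinkage(y_i, mu, psi_u2, psi2):
    """Equivalent closed form: shrink the cluster mean towards mu."""
    n = y_i.size
    return n * psi_u2 / (n * psi_u2 + psi2) * (y_i.mean() - mu)

rng = np.random.default_rng(7)
u = rng.normal()
y_i = u + rng.normal(size=5)                  # y = u + eps, u ~ N(0,1), eps ~ N(0,1)
print(blup_matrix(y_i, 0.0, 1.0, 1.0), blup_shrinkage(y_i, 0.0, 1.0, 1.0))   # equal
```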
The residuals \(\hat{e} = y - X\hat{\theta}^{(\tau)}_{x} - Z\hat{u}^{(\tau)}_{\mathrm{eBLP}}\) did not satisfy the optimality condition of Koenker and Bassett (1978, Theorem 3.4). An ad hoc adjustment might consist in constraining the estimation of u in (11) to satisfy this condition. A predictor of the random effects that takes these preliminary results into account still needs to be developed.
4 Simulation study
Simulation study scenarios. Distributions: N(0,ψ^{2}), normal with mean 0 and variance ψ^{2}; t_{g}, t with g degrees of freedom; \(\chi^{2}_{g}\), chi-square with g degrees of freedom; St_{g}, skew t with g degrees of freedom (Azzalini and Capitanio 2003)
Model description | (n,M) | u | v | e | γ |
---|---|---|---|---|---|
(1) location-shift symmetric | (5,50) | N(0,5) | – | N(0,5) | 0 |
(2) location-shift symmetric | (5,50) | t_{3} | – | N(0,5) | 0 |
(3) location-shift heavy-tailed | (5,50) | N(0,5) | – | t_{3} | 0 |
(4) location-shift heavy-tailed | (5,50) | t_{3} | – | t_{3} | 0 |
(5) location-shift asymmetric | (5,50) | N(0,5) | – | \(\chi^{2}_{2}\) | 0 |
(6) location-shift asymmetric | (5,50) | t_{3} | – | \(\chi^{2}_{2}\) | 0 |
(7) location-shift symmetric \(\operatorname{cor}(u,v) = 0\) | (5,50) | N(0,5) | N(0,5) | N(0,5) | 0 |
(8) location-shift heavy-tailed \(\operatorname{cor}(u,v) = 0\) | (5,50) | t_{3} | t_{3} | t_{3} | 0 |
(9) location-shift symmetric | (10,100) | N(0,5) | – | N(0,5) | 0 |
(10) location-shift symmetric | (20,200) | N(0,5) | – | N(0,5) | 0 |
(11) location-shift heavy-tailed \(\operatorname{cor}(u,v) = 0\) | (10,100) | t_{3} | t_{3} | t_{3} | 0 |
(12) location-shift heavy-tailed \(\operatorname{cor}(u,v) = 0\) | (20,200) | t_{3} | t_{3} | t_{3} | 0 |
(13) location-shift asymmetric \(\operatorname{cor}(u,v) = 0\) | (20,200) | t_{3} | t_{3} | \(\chi^{2}_{2}\) | 0 |
(14) heteroscedastic symmetric | (10,100) | N(0,5) | – | N(0,5) | 0.25 |
(15) heteroscedastic heavy-tailed | (10,100) | t_{3} | – | t_{3} | 0.25 |
(16) heteroscedastic asymmetric | (10,100) | t_{3} | – | \(\chi^{2}_{2}\) | 0.25 |
(17) location-shift symmetric \(\operatorname{cor}(u,v) > 0\) | (10,100) | N(0,5) | N(0,5) | N(0,5) | 0 |
(18) location-shift symmetric \(\operatorname{cor}(u,v) < 0\) | (10,100) | N(0,5) | N(0,5) | N(0,5) | 0 |
(19) location-shift heavy-tailed \(\operatorname{cor}(u,v) > 0\) | (10,100) | t_{3} | t_{3} | t_{3} | 0 |
(20) location-shift heavy-tailed \(\operatorname{cor}(u,v) < 0\) | (10,100) | St_{3} | St_{3} | t_{3} | 0 |
(21) location-shift heavy-tailed \(\operatorname{cor}(u,v) > 0\) | (10,100) | St_{3} | St_{3} | t_{3} | 0 |
(22) location-shift with 5 % contamination | (10,100) | N(0,5) | – | \(\chi^{2}_{2}\) + N(0,50) | 0 |
(23) location-shift with 5 % contamination | (10,100) | N(0,5)+N(0,50) | – | \(\chi^{2}_{2}\) | 0 |
We also assessed the coverage rate of the median estimator’s confidence intervals at the nominal 90 % level. Data were generated as in scenario (3) (Table 1) but with fixed design points (i.e., drawn only once), either for γ=0 or γ=0.25 and for three sample sizes (n,M): (5,50), (10,100), and (10,200). The standard errors were calculated using R=50 bootstrap replications. The estimates were obtained using a Gauss–Hermite quadrature with K=17 nodes and Nelder-Mead optimization.
Finally, we evaluated the Gauss–Hermite quadrature for several quadrature grid sizes. Data were generated from \(y_{ij}=\sum_{k=1}^{q}(\beta_{k}+u_{i,k})x_{ij,k}+e_{ij}\), u_{i,k}∼N(0,5), e_{ij}∼N(0,5), with n=5 and M=50. The number q of fixed and random effects ranged from 1 (intercept-only model, x_{ij,1}=1) to 6, with x_{ij,k}=δ_{i}+ζ_{ij} as described above for k=2,…,6. LQMMs were estimated for τ∈{0.5,0.75,0.9} using K=5 or K=17 nodes.
In all cases, datasets were generated independently 500 times. In what follows, we report selected results for the sake of brevity. All results are available upon request from the first author.
Relative bias of \(\hat{\theta}_{x}^{(\tau)}\) for τ∈{0.5,0.75,0.9}
Model | Intercept, τ=0.5 | Intercept, τ=0.75 | Intercept, τ=0.9 | x, τ=0.5 | x, τ=0.75 | x, τ=0.9 | z, τ=0.5 | z, τ=0.75 | z, τ=0.9 |
---|---|---|---|---|---|---|---|---|---|
(1) | −0.000 | 0.001 | 0.002 | −0.002 | −0.003 | 0.000 | 0.007 | 0.019 | −0.018 |
(2) | 0.000 | 0.001 | 0.000 | 0.001 | −0.004 | 0.001 | 0.018 | 0.041 | −0.017 |
(3) | 0.000 | 0.003 | 0.008 | 0.005 | 0.004 | −0.002 | 0.013 | 0.034 | 0.013 |
(4) | 0.000 | 0.002 | 0.004 | 0.004 | −0.000 | −0.002 | −0.006 | 0.034 | −0.024 |
(5) | 0.002 | 0.005 | 0.003 | 0.001 | 0.004 | 0.001 | 0.010 | 0.056 | 0.072 |
(6) | 0.002 | 0.003 | 0.002 | 0.004 | 0.007 | 0.011 | −0.018 | 0.055 | 0.048 |
(7) | −0.000 | 0.001 | 0.003 | 0.005 | 0.006 | −0.009 | −0.002 | 0.047 | −0.107 |
(8) | −0.000 | 0.010 | 0.039 | −0.013 | −0.013 | −0.004 | 0.038 | −0.002 | −0.038 |
(9) | −0.000 | 0.000 | 0.002 | 0.002 | −0.001 | −0.001 | −0.007 | 0.002 | −0.007 |
(10) | −0.000 | −0.001 | 0.002 | −0.000 | 0.002 | −0.001 | −0.002 | 0.001 | −0.008 |
(11) | −0.000 | 0.011 | 0.057 | −0.010 | 0.006 | −0.006 | −0.006 | −0.008 | 0.008 |
(12) | 0.000 | 0.009 | 0.021 | −0.016 | −0.018 | −0.020 | −0.005 | −0.011 | −0.005 |
(13) | 0.002 | 0.006 | 0.015 | 0.012 | −0.009 | 0.018 | −0.004 | −0.007 | −0.009 |
(14) | −0.000 | 0.000 | 0.001 | −0.006 | −0.005 | −0.072 | 0.011 | 0.012 | 0.001 |
(15) | 0.000 | 0.001 | 0.002 | 0.009 | 0.010 | −0.005 | −0.011 | −0.007 | −0.018 |
(16) | 0.002 | 0.001 | 0.000 | 0.018 | 0.034 | 0.024 | −0.005 | −0.002 | −0.006 |
(17) | 0.000 | −0.000 | 0.003 | 0.002 | −0.108 | 0.017 | 0.004 | 0.032 | −0.006 |
(18) | −0.000 | 0.002 | 0.005 | −0.013 | −0.045 | −0.089 | −0.010 | −0.000 | −0.034 |
(19) | 0.000 | 0.011 | 0.068 | −0.009 | 0.135 | 0.459 | −0.008 | −0.024 | −0.014 |
(20) | 0.006 | 0.010 | 0.016 | 0.328 | 0.258 | 0.218 | −0.003 | −0.002 | −0.018 |
(21) | 0.012 | 0.015 | 0.019 | 0.624 | 0.566 | 0.614 | 0.004 | 0.013 | 0.004 |
(22) | 0.003 | 0.005 | 0.010 | 0.001 | 0.003 | 0.005 | −0.002 | −0.009 | −0.034 |
(22)^{a} | 0.003 | 0.005 | 0.009 | 0.001 | 0.005 | 0.004 | −0.006 | −0.010 | −0.050 |
(23) | 0.003 | 0.003 | 0.004 | 0.001 | −0.000 | −0.007 | −0.002 | −0.001 | −0.006 |
(23)^{a} | 0.003 | 0.003 | 0.003 | −0.001 | −0.002 | −0.005 | −0.003 | 0.003 | 0.002 |
Outlier contamination, either in the error term or in the random intercept (scenarios 22 and 23, respectively), did not affect the estimation bias. However, for data generated under scenario (23), the Monte Carlo variance of the estimated intercepts using Gauss–Laguerre quadrature was 51 % (τ=0.5), 39 % (τ=0.75), and 28 % (τ=0.9) smaller than that obtained using normal (Gauss–Hermite) quadrature weights (results not shown).
Estimation bias in scenarios (1–6) remained unchanged when using derivative-free estimation (results not shown).
As expected, tail quantiles had higher estimation bias, in particular for τ=0.01 and τ=0.99, though results (not shown) differed markedly depending on the coefficient being estimated. The relative absolute bias, averaged over τ and the three scenarios, was 1 % for the intercept and 0.6 % for x’s slope but 24 % for the binary covariate’s coefficient.
Coverage rate and mean length of the confidence intervals based on 50 bootstrap replications for the median estimator at nominal 90 % level
Coefficient | Location-shift (γ=0), (5,50) | Location-shift (γ=0), (10,100) | Location-shift (γ=0), (10,200) | Heteroscedastic (γ=0.25), (5,50) | Heteroscedastic (γ=0.25), (10,100) | Heteroscedastic (γ=0.25), (10,200) |
---|---|---|---|---|---|---|
Coverage (%) | | | | | | |
Intercept | 91.6 | 91.2 | 94.4 | 91.0 | 92.0 | 92.4 |
x | 93.0 | 93.2 | 94.2 | 90.8 | 91.0 | 94.2 |
z | 90.6 | 91.0 | 94.4 | 93.6 | 94.6 | 90.2 |
Length | | | | | | |
Intercept | 1.7 | 1.4 | 1.2 | 2.2 | 1.6 | 1.2 |
x | 0.4 | 0.2 | 0.2 | 1.5 | 0.8 | 0.6 |
z | 0.8 | 0.4 | 0.3 | 0.9 | 0.4 | 0.3 |
Relative absolute bias of \(\hat{\theta}_{x}^{(\tau)}\) for τ∈{0.5,0.75,0.9}, number of iterations and time to convergence
Nodes K | Dimension q | Grid size | Relative absolute bias | Iterations | CPU time (seconds) |
---|---|---|---|---|---|
5 | 1 | 5×1 | 0.000 | 37 | 0.01 |
5 | 2 | 25×2 | 0.006 | 50 | 0.05 |
5 | 3 | 125×3 | 0.020 | 56 | 0.31 |
5 | 4 | 625×4 | 0.014 | 61 | 2.36 |
5 | 5 | 3125×5 | 0.014 | 65 | 18.10 |
5 | 6 | 15625×6 | 0.026 | 71 | 126.83 |
17 | 1 | 17×1 | 0.000 | 39 | 0.02 |
17 | 2 | 289×2 | 0.004 | 52 | 0.46 |
17 | 3 | 4913×3 | 0.005 | 58 | 11.77 |
17 | 4 | 83521×4 | 0.023 | 65 | 332.02 |
Substantial reductions in the size of the quadrature grid for a given level of accuracy can be obtained by employing efficient quadrature rules, such as integration on sparse grids (Heiss and Winschel 2008) and nested integration rules for Gaussian weights (Genz and Keister 1996). Adaptive rescaling (Pinheiro and Bates 1995; Pinheiro and Chao 2006) may also improve the estimation when the variance of the random effects is large. These are topics of current research by the first author.
5 Examples
5.1 Treatment of lead-exposed children
In this section, we revisit a placebo-controlled, double-blind, randomized trial of succimer (a chelating agent) in children with blood lead levels (BLL) of 20–44 μg/dL conducted by the Treatment of Lead-Exposed Children (TLC) Trial Group (2000). Lead exposure is known to increase the risk of encephalopathy and, in general, to impair cognitive function. The trial was designed to test the hypothesis that children with moderate BLL who were treated with succimer would score higher than children given placebo on a number of behavioral and cognitive tests. Drug treatment was later found not to improve psychological test scores (Rogan et al. 2001).
The original data consist of repeated measurements of BLL on an initial sample of 780 children (396 in the succimer group and 384 in the placebo group). Here, we analyze a subset of the data for M=100 children and for a shorter time period. Our results, therefore, should not be used to draw general conclusions about the original study. The dataset, available at http://biosun1.harvard.edu/~fitzmaur/ala/tlc.txt, provides BLL obtained at baseline (or week 0), week 1, week 4, and week 6 on children randomly assigned to the treatment (50) or placebo (50) group.
We assumed a general positive-definite matrix for Ψ and approximated the log-likelihood using a Gauss–Hermite quadrature as in (5) with K=7 nodes. For the q=4 random effects this gives a quadrature grid of size 2401×4 (7^{4}=2401 nodes), and the model was fitted at 5 quantiles, τ∈{0.1,0.25,0.5,0.75,0.9}. The objective function was optimized using the gradient search algorithm described in Sect. 3.2. Standard errors were computed using R=30 bootstrap replications.
The intercepts can be interpreted as the τth quantiles, τ∈{0.1,0.25,0.5,0.75,0.9}, of the distribution of BLL in untreated children at the beginning of the study. The estimated treatment effect shows that the BLL distributions in the two groups at baseline differ markedly at lower quantiles and moderately at the centre.
The estimated slopes for spline T_{1} in the placebo group show a fall of BLL over the 6-week time period at approximately uniform rates across the distribution, with levels comparable to the average rate. The negative estimates of T_{2}’s coefficients indicate a slightly steeper decrease of BLL after 4 weeks at rates that are, again, similar across the distribution.
In the treatment group, there is a general fall in BLL at all quantiles, but at faster rates than in the placebo group. The positive coefficients for spline T_{2} reflect the rebound that we observed in Fig. 2. The magnitude of this effect is weaker at higher levels of lead concentration.
The analysis of the so-called Lehmann–Doksum quantile treatment effect (Doksum 1974; Lehmann 1975; Koenker and Xiao 2002) is particularly relevant in placebo-controlled studies such as the one considered here. In the case of independent observations, formal tests of location-shift and location-scale-shift hypotheses can be carried out, for example, by using Khmaladze-type tests (Koenker and Xiao 2002; Koenker 2005). To the best of our knowledge, these are not available for correlated data, although it has been suggested that an extension of these tests to the analysis of cluster-specific distributions may be contemplated (Koenker 2005, p. 281).
Correlation matrix of the random effects for intercept (Int), basis spline T_{1}, treatment variable (Trt) and their interaction term (Trt: T_{1}). Variances are reported in brackets
Effect | Int | T_{1} | Trt | Trt: T_{1} | Int | T_{1} | Trt | Trt: T_{1} |
---|---|---|---|---|---|---|---|---|
τ=0.1 (left), τ=0.25 (right) | | | | | | | | |
Int | (11.05) | | | | (13.40) | | | |
T_{1} | 0.84 | (4.51) | | | 0.86 | (3.19) | | |
Trt | 0.62 | 0.89 | (9.93) | | 0.48 | 0.84 | (4.92) | |
Trt: T_{1} | −0.64 | −0.65 | −0.52 | (1.34) | 0.55 | 0.87 | 0.98 | (4.96) |
τ=0.5 (left), Mean (right) | | | | | | | | |
Int | (17.49) | | | | (16.38) | | | |
T_{1} | 1.00 | (4.03) | | | 1.00 | (2.03) | | |
Trt | −0.59 | −0.61 | (1.00) | | −0.44 | −0.44 | (4.91) | |
Trt: T_{1} | 0.95 | 0.93 | −0.49 | (10.79) | 0.74 | 0.74 | 0.28 | (23.56) |
τ=0.75 (left), τ=0.9 (right) | | | | | | | | |
Int | (19.48) | | | | (31.54) | | | |
T_{1} | 0.72 | (7.28) | | | 0.40 | (10.52) | | |
Trt | −0.64 | −0.39 | (1.30) | | 0.25 | 0.15 | (1.95) | |
Trt: T_{1} | 0.76 | 0.89 | −0.75 | (13.31) | 0.58 | 0.94 | 0.04 | (15.27) |
5.2 A-level Chemistry scores
The dataset consists of A-level Chemistry examination scores in 1997 from 31,022 students in 2410 English schools (Fielding et al. 2003). Additional variables included in the dataset were gender, age in months centered at 222 months (18.5 years), average General Certificate of Secondary Education (GCSE) score centered at its mean (6.3), and school identifier.
The intercepts can be interpreted as the τth quantiles of the distribution of Chemistry scores in 18.5-year-old males with a GCSE score equal to 6.3. The mean and median scores are very close, and the steady increase of the estimated quantiles across τ suggests an approximately uniform distribution.
The regression coefficients for age and sex are approximately constant and negative across quantiles and close to the corresponding mean age (−0.037) and sex (−0.74) effects. That is, older individuals and females perform less satisfactorily than younger peers and males, respectively, regardless of their rank in the Chemistry score distribution. However, at higher quantiles (τ=0.9) and therefore for high performing students, there is some indication that these effects may be weaker.
Prior achievement, as measured by mean GCSE results, has a beneficial effect on the average Chemistry score: for a one-point increase in mean GCSE score, the A-level Chemistry score increases by 2.57 points on average. However, the positive effect is much stronger at lower quantiles (low-performing students) than at higher quantiles (high-performing students), with decreasing magnitude in between.
Finally, the between-school variability decreases from lower to higher quantiles. In contrast, the ICC follows an inverted-U curve. That is, the intra-school correlation is higher at the center of the outcome distribution where, proportionally, the within-school variability is smaller. Differences between schools might therefore play a less important role in explaining the total variability in the performance of below- and above-average students. If confirmed by other analyses, these results would have important implications for national education policy.
5.3 Doubly robust meta-analysis
5.4 Smoothing splines
Nonlinear QR for dependent data has received some attention in recent years. Karlsson (2008) developed a weighted approach to nonlinear QR estimation for repeated measures. Wang (2012) extended GB’s model to account for nonlinear effects in normal-Laplace models.
5.5 Spatial modeling
Spatial modeling refers, loosely, to the case in which geographical information is introduced in the model. This includes the case in which x is a vector of geographical coordinates or, more simply, when the grouping factor is some geographical unit (e.g., postal codes, wards or counties). In the latter case, each random term u of the LQMM would be associated with a small-area effect and the resulting variance-covariance matrix would be interpreted as a spatial correlation matrix at the quantile of interest. The modeling of Ψ, therefore, should take into account the spatial association between areas (e.g., contiguity). See for example Lee and Neocleous (2010) for an application of Bayesian quantile regression (Yu and Moyeed 2001) to environmental epidemiology using the results of Machado and Santos Silva (2005). A Bayesian spatial quantile regression approach using the asymmetric Laplace is proposed also by Lum and Gelfand (2012).
Quantile smoothing of surfaces and related inference is a recent topic (He et al. 1998; He and Portnoy 2000; Koenker and Mizera 2004). An application of triogram smoothing splines to poverty mapping is described by Geraci and Salvati (2007). A Bayesian quantile modeling of ozone concentration surfaces is given by Reich et al. (2010b).
6 Final remarks
The growing interest in the theory and applications of regression methods for quantiles of outcome variables of research interest is to a large extent a consequence of the insight and ease of interpretation that only quantiles can provide. The present paper extends the work by Geraci and Bottai (2007) to the general linear quantile mixed effects regression model; discusses relevant aspects of estimation, testing, and computation in detail; illustrates the use of the linear quantile mixed effects regression in two real data examples; shows some of the results of an extensive simulation study; and indicates possible avenues of future research. The linear quantile mixed effects regression presented in this paper constitutes a practical statistical tool for the evaluation of quantiles with dependent data that complements the widely popular mixed effects regression for the mean and may help to further understanding in many fields of applied research.
For scenario (19), we increased the number of quadrature nodes K from 11 to 17. The relative bias of the coefficient of x decreased from 0.135 to 0.051 (τ=0.75) and from 0.459 to 0.075 (τ=0.9).
All computations were performed on a machine with a 64-bit operating system, 16 GB of RAM, and a quad-core 2.93 GHz processor.
Acknowledgements
The Centre for Paediatric Epidemiology and Biostatistics benefits from funding support from the Medical Research Council in its capacity as the MRC Centre of Epidemiology for Child Health (G0400546). The UCL Institute of Child Health receives a proportion of funding from the Department of Health’s NIHR Biomedical Research Centres funding scheme.