Quantile regression for nonlinear mixed effects models: a likelihood based perspective

Abstract

Longitudinal data are frequently analyzed using normal mixed effects models, where the traditional estimation methods are based on mean regression and therefore yield non-robust parameter estimates under non-normal error distributions. In contrast, quantile regression (QR) is, at least in principle, more robust to outliers/influential observations and to misspecification of the error distribution than the conventional mean regression approach. In this context, this paper develops a likelihood-based approach for estimating QR models with correlated continuous longitudinal data using the asymmetric Laplace distribution. Our approach relies on the stochastic approximation of the EM algorithm (SAEM algorithm) to obtain maximum likelihood estimates of the fixed effects and variance components in nonlinear mixed effects (NLME) models. We evaluate the finite sample performance of the SAEM algorithm and the asymptotic properties of the ML estimates through simulation experiments. Moreover, two real-life datasets are used to illustrate the proposed method using the \(\texttt {qrNLMM}\) package for \(\texttt {R}\).


References

  • Aghamohammadi A, Mohammadi S (2017) Bayesian analysis of penalized quantile regression for longitudinal data. Stat Pap 58(4):1035–1053

  • Allassonnière S, Kuhn E, Trouvé A (2010) Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study. Bernoulli 16(3):641–678

  • Andriyana Y, Gijbels I, Verhasselt A (2016) Quantile regression in varying-coefficients models: non-crossing quantile curves and heteroscedasticity. Stat Pap. https://doi.org/10.1007/s00362-016-0847-7

  • Barndorff-Nielsen OE, Shephard N (2001) Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. J R Stat Soc Ser B 63(2):167–241

  • Booth JG, Hobert JP (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc Ser B 61(1):265–285

  • Davidian M, Giltinan D (1995) Nonlinear models for repeated measurement data. CRC Press, Boca Raton

  • Davidian M, Giltinan D (2003) Nonlinear models for repeated measurement data: an overview and update. J Agric Biol Environ Stat 8(4):387–419

  • Delyon B, Lavielle M, Moulines E (1999) Convergence of a stochastic approximation version of the EM algorithm. Ann Stat 27:94–128

  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38

  • Fu L, Wang Y (2012) Quantile regression for longitudinal data with a working correlation model. Comput Stat Data Anal 56(8):2526–2538

  • Galarza C, Lachos VH, Cabral C, Castro L (2017) Robust quantile regression using a generalized class of skewed distributions. Stat 6:113–130

  • Galvao A (2011) Quantile regression for dynamic panel data with fixed effects. J Econom 164(1):142–157

  • Galvao A, Montes-Rojas GV (2010) Penalized quantile regression for dynamic panel data. J Stat Plan Inference 140(11):3476–3497

  • Geraci M, Bottai M (2007) Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8(1):140–154

  • Geraci M, Bottai M (2014) Linear quantile mixed models. Stat Comput 24(3):461–479

  • Grossman Z, Polis M, Feinberg M, Grossman Z, Levi I, Jankelevich S, Yarchoan R, Boon J, de Wolf F, Lange J, Goudsmit J, Dimitrov D, Paul W (1999) Ongoing HIV dissemination during HAART. Nat Med 5(10):1099–1104

  • Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109

  • Huang Y, Dagne G (2011) A Bayesian approach to joint mixed-effects models with a skew-normal distribution and measurement errors in covariates. Biometrics 67(1):260–269

  • Koenker R (2004) Quantile regression for longitudinal data. J Multivar Anal 91(1):74–89

  • Koenker R (2005) Quantile regression. Cambridge University Press, New York

  • Kozubowski T, Nadarajah S (2010) Multitude of Laplace distributions. Stat Pap 51:127–148

  • Kuhn E, Lavielle M (2004) Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM Probab Stat 8:115–131

  • Kuhn E, Lavielle M (2005) Maximum likelihood estimation in nonlinear mixed effects models. Comput Stat Data Anal 49(4):1020–1038

  • Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Stat Sin 20(1):303–322

  • Lachos VH, Castro LM, Dey DK (2013) Bayesian inference in nonlinear mixed-effects models using normal independent distributions. Comput Stat Data Anal 64:237–252

  • Lavielle M (2014) Mixed effects models for the population approach. Chapman and Hall/CRC, Boca Raton

  • Lipsitz SR, Fitzmaurice GM, Molenberghs G, Zhao LP (1997) Quantile regression methods for longitudinal data with drop-outs: application to CD4 cell counts of patients infected with the human immunodeficiency virus. J R Stat Soc Ser C 46(4):463–476

  • Liu Y, Bottai M (2009) Mixed-effects models for conditional quantiles with longitudinal data. Int J Biostat. https://doi.org/10.2202/1557-4679.1186

  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44(2):226–233

  • Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092

  • Meza C, Osorio F, De la Cruz R (2012) Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Stat Comput 22:121–139

  • Mu Y, He X (2007) Power transformation toward a linear regression quantile. J Am Stat Assoc 102:269–279

  • Perelson AS, Essunger P, Cao Y, Vesanen M, Hurley A, Saksela K, Markowitz M, Ho DD (1997) Decay characteristics of HIV-1-infected compartments during combination therapy. Nature 387(6629):188–191

  • Pinheiro J, Bates D (1995) Approximations to the log-likelihood function in the nonlinear mixed-effects model. J Comput Graph Stat 4:12–35

  • Pinheiro JC, Bates DM (2000) Mixed-effects models in S and S-PLUS. Springer, New York

  • R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Searle SR, Casella G, McCulloch C (1992) Variance components. Wiley, New York

  • Sriram K, Ramamoorthi R, Ghosh P (2013) Posterior consistency of Bayesian quantile regression based on the misspecified asymmetric Laplace distribution. Bayesian Anal 8(2):479–504

  • Wang J (2012) Bayesian quantile regression for parametric nonlinear mixed effects models. Stat Methods Appl 21:279–295

  • Wu L (2002) A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. J Am Stat Assoc 97(460):955–964

  • Wu L (2010) Mixed effects models for complex data. Chapman & Hall/CRC, Boca Raton

  • Yu K, Moyeed R (2001) Bayesian quantile regression. Stat Probab Lett 54(4):437–447

  • Yu K, Zhang J (2005) A three-parameter asymmetric Laplace distribution and its extension. Commun Stat Theory Methods 34(9–10):1867–1879

  • Yuan Y, Yin G (2010) Bayesian quantile regression for longitudinal studies with nonignorable missing data. Biometrics 66(1):105–114


Acknowledgements

We thank the Editor and two anonymous referees, whose constructive comments and suggestions led to an improved presentation of the paper. The research of C. Galarza was supported by Grant 2015/17110-9 from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP-Brazil). L. M. Castro acknowledges Grant Fondecyt 1170258 from the Chilean government. The research of F. Louzada was supported by Grant 305351/2013-3 from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq-Brazil) and by Grant 2013/07375-0 from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP-Brazil).

Author information

Correspondence to Luis M. Castro.

Appendix

1.1 A1: Specification of initial values

It is well known that a smart choice of the initial values for the ML estimates can ensure fast convergence of the algorithm to the global maximum. Ignoring the random effects, i.e., setting \(\mathbf {b}_{i}=\mathbf {0}\), let \(\mathbf{y}_{i}\sim \text {AL}({\varvec{\eta }}({\varvec{\beta }}_{ p},\mathbf {0}),\sigma ,p)\). Then, considering the ML estimates for \({\varvec{\beta }}_{ p}\) and \(\sigma \) as defined in Yu and Zhang (2005) for this model, we follow the steps below for the QR-NLME model implementation (a minimal R sketch of these steps is given after the list):

  1. Compute an initial value \(\widehat{{\varvec{\beta }}}_{p}^{(0)}\) as

     $$\begin{aligned} \widehat{{\varvec{\beta }}}_{p}^{(0)}=\mathop {\mathrm {argmin}}\limits _{{\varvec{\beta }}_{p}\in \mathbb {R}^{k}}\sum _{i=1}^{n}\rho _{p}\bigl (\mathbf{y}_{i}-{\varvec{\eta }}({\varvec{\beta }}_{p},\mathbf {0})\bigr ). \end{aligned}$$

  2. Using \(\widehat{{\varvec{\beta }}}_{p}^{(0)}\) obtained above, compute \(\widehat{\sigma }^{(0)}\) as

     $$\begin{aligned} \widehat{\sigma }^{(0)}=\frac{1}{n}\sum _{i=1}^{n}\rho _{p}\bigl (\mathbf{y}_{i}-{\varvec{\eta }}(\widehat{{\varvec{\beta }}}_{p}^{(0)},\mathbf {0})\bigr ). \end{aligned}$$

  3. Use a \(q\times q\) identity matrix \(\mathbf {I}_{q\times q}\) as the initial value \({\varvec{\varPsi }}^{(0)}\).
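The following is a minimal R sketch of these three steps under simplifying assumptions: the responses are pooled into a single vector y with regressor x, eta is a user-supplied nonlinear mean function, and n_subjects is the number of subjects \(n\). The helper names rho_p and qr_init are illustrative and not part of the \(\texttt {qrNLMM}\) package.

rho_p <- function(r, p) r * (p - (r < 0))  # check (loss) function rho_p(r)

qr_init <- function(y, x, p, beta_start, eta, q, n_subjects) {
  ## Step 1: beta_p^(0) minimizes the check-function loss with b_i = 0
  beta0 <- optim(beta_start, function(b) sum(rho_p(y - eta(b, x), p)))$par
  ## Step 2: sigma^(0) is the summed check loss at beta_p^(0), scaled by 1/n
  sigma0 <- sum(rho_p(y - eta(beta0, x), p)) / n_subjects
  ## Step 3: Psi^(0) is the q x q identity matrix
  list(beta = beta0, sigma = sigma0, Psi = diag(q))
}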

1.2 A2: Computing the conditional expectations

Due to the independence between \(u_{ij}\mid y_{ij},\mathbf {b}_{i}\) and \(u_{ik}\mid y_{ik},\mathbf {b}_{i}\), for all \(j,k=1,2,\ldots ,n_{i}\) with \(j\ne k\), we can write \(\mathbf {u}_{i}\mid \mathbf {y}_{i},\mathbf {b}_{i}=[u_{i1}\mid y_{i1},\mathbf {b}_{i}\;\;u_{i2}\mid y_{i2},\mathbf {b}_{i}\;\;\cdots \;\;u_{in_{i}}\mid y_{in_{i}},\mathbf {b}_{i}]^{\top }\). Using this fact, we can compute the conditional expectations \(\mathcal {E}(\mathbf {u}_{i})\) and \(\mathcal {E}({\varvec{D}}_{i}^{-1})\) as follows:

$$\begin{aligned} \mathcal {E}(\mathbf {u}_{i})= & {} \left[ \mathcal {E}(u_{i1})\;\mathcal {E}(u_{i2})\;\cdots \;\mathcal {E}(u_{in_{i}})\right] ^{\top }, \end{aligned}$$
(12)
$$\begin{aligned} \mathcal {E}({\varvec{D}}_{i}^{{\scriptscriptstyle -1}})= & {} \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c}\mathcal {E}(u_{i1}^{-1}) &{} 0 &{} \ldots &{} 0\\ 0 &{} \mathcal {E}(u_{i2}^{-1}) &{} \ldots &{} 0\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \ldots &{} \mathcal {E}(u_{in_{i}}^{-1}) \end{array}\right] . \end{aligned}$$
(13)

We have \(u_{ij}\mid y_{ij},\mathbf {b}_{i}\sim \text {GIG}(\frac{1}{2},\chi _{ij},\psi )\), where \(\chi _{ij}\) and \(\psi \) are defined in Sect. 3.2. Then, using (2), we compute the moments involved in the equations above as \(\mathcal {E}({u}_{ij})=\frac{\chi _{ij}}{\psi }(1+\frac{1}{\chi _{ij}\psi })\) and \(\mathcal {E}({u}_{ij}^{-1})=\frac{\psi }{\chi _{ij}}\). Thus, for the k-th iteration of the algorithm and for the \(\ell \)-th Monte Carlo realization, we can compute \(\mathcal {E}(\mathbf {u}_{i})^{{\scriptscriptstyle (\ell ,k)}}\) and \(\mathcal {E}[{\varvec{D}}_{i}^{{\scriptscriptstyle -1}}]^{{\scriptscriptstyle (\ell ,k)}}\) using equations (12)-(13) where

$$\begin{aligned} \mathcal {E}({u}_{ij})^{(\ell ,k)}= & {} \frac{2|y_{ij}-\eta _{ij}({\varvec{\beta }}_{p}^{(k)},\mathbf {b}_{i}^{(\ell ,k)})|+4\sigma ^{(k)}}{\tau _{p}^{2}},\\ \mathcal {E}({u}_{ij}^{-1})^{(\ell ,k)}= & {} \frac{\tau _{p}^{2}}{2|y_{ij}-\eta _{ij}({\varvec{\beta }}_{p}^{(k)},\mathbf {b}_{i}^{(\ell ,k)})|}. \end{aligned}$$
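For illustration, these moments can be evaluated elementwise in R as in the sketch below, where res holds the residuals \(y_{ij}-\eta _{ij}({\varvec{\beta }}_{p}^{(k)},\mathbf {b}_{i}^{(\ell ,k)})\) for subject \(i\), and sigma and tau2 stand for \(\sigma ^{(k)}\) and \(\tau _{p}^{2}\); the function name gig_moments is illustrative only.

gig_moments <- function(res, sigma, tau2) {
  E_u    <- (2 * abs(res) + 4 * sigma) / tau2   # E(u_ij | y_ij, b_i)
  E_uinv <- tau2 / (2 * abs(res))               # E(u_ij^{-1} | y_ij, b_i)
  list(E_u    = E_u,                                # vector in Eq. (12)
       E_Dinv = diag(E_uinv, nrow = length(res)))   # diagonal matrix in Eq. (13)
}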

1.3 A3: The empirical information matrix

Using (5), the complete log-likelihood function can be rewritten as

$$\begin{aligned} \ell _{ci}({\varvec{\theta }})= & {} -\frac{3n_{i}}{2}\log \sigma -\frac{1}{2\sigma \tau _{p}^{2}}\zeta _{i}^{\top }{\varvec{D}}_{i}^{-1}\zeta _{i}-\frac{1}{2}\log \bigl |{\varvec{\varPsi }}\bigr |-\frac{1}{2}\mathbf {b}_{i}^{\top }{\varvec{\varPsi }}^{-1}\mathbf {b}_{i}-\frac{1}{\sigma }\mathbf {u}_{i}^{\top }\mathbf {1}_{n_{i}}, \end{aligned}$$

where \(\zeta _{i}=\mathbf {y}_{i}-{\varvec{\eta }}({\varvec{\beta }}_{ p},\mathbf {b}_{i})-\vartheta _{p}\mathbf {u}_{i}\) and \({\varvec{\theta }}=({\varvec{\beta }}_{p}^{\top },\sigma ,{\varvec{\alpha }}^{\top })^{\top }\). Differentiating with respect to \({\varvec{\theta }}\), we obtain the following score functions:

$$\begin{aligned} \frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial {\varvec{\beta }}_{p}}= & {} \frac{\partial {\varvec{\eta }}}{\partial {\varvec{\beta }}_{p}}\frac{\partial \zeta _{i}}{\partial {\varvec{\eta }}}\frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial \zeta _{i}}=\frac{1}{\sigma \tau _{p}^{2}}\mathbf {J}_{i}^{\top }{\varvec{D}}_{i}^{-1}\zeta _{i}, \end{aligned}$$

with \(\mathbf {J}_{i}\) as defined in Sect. 3.2, and

$$\begin{aligned} \frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial \sigma }= & {} -\frac{3n_{i}}{2}\frac{1}{\sigma }+\frac{1}{2\sigma ^{2}\tau _{p}^{2}}\zeta _{i}^{\top }{\varvec{D}}_{i}^{-1}\zeta _{i} + \frac{1}{\sigma ^{2}}\mathbf {u}_{i}^{\top }\mathbf {1}{}_{n_{i}}. \end{aligned}$$

Let \({\varvec{\alpha }}\) be the vector of distinct parameters of \({\varvec{\varPsi }}\), the dispersion matrix of \(\mathbf {b}_i\). Using the trace properties and differentiating the complete log-likelihood function, we have that

$$\begin{aligned} \frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial {\varvec{\varPsi }}}= & {} \frac{\partial }{\partial {\varvec{\varPsi }}}\left[ -\frac{1}{2}\log \bigl |{\varvec{\varPsi }}\bigr |-\frac{1}{2}\text {tr}\{{\varvec{\varPsi }}^{-1}\mathbf {b}_{i}\mathbf {b}_{i}^{\top }\}\right] \\= & {} -\frac{1}{2}{\varvec{\varPsi }}^{-1}+\frac{1}{2}{\varvec{\varPsi }}^{-1}\mathbf {b}_{i}\mathbf {b}_{i}^{\top }{\varvec{\varPsi }}^{-1}\\= & {} \;\;\;\, \frac{1}{2}{\varvec{\varPsi }}^{-1}(\mathbf {b}_{i}\mathbf {b}_{i}^{\top }-{\varvec{\varPsi }}){\varvec{\varPsi }}^{-1}. \end{aligned}$$

Next, taking derivatives with respect to a specific \(\alpha _{j}\) from \({\varvec{\alpha }}\) and applying the chain rule, we have

$$\begin{aligned} \frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial \alpha _{j}}= & {} \text {tr}\left\{ \frac{\partial {\varvec{\varPsi }}}{\partial \alpha _{j}}\frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial {\varvec{\varPsi }}}\right\} \\= & {} \frac{1}{2}\text {tr}\left\{ \frac{\partial {\varvec{\varPsi }}}{\partial \alpha _{j}}{\varvec{\varPsi }}^{-1}(\mathbf {b}_{i}\mathbf {b}_{i}^{\top }-{\varvec{\varPsi }}){\varvec{\varPsi }}^{-1}\right\} , \end{aligned}$$

where, using the fact that \(\text {tr}\{\mathbf{ABCD}\}=(\mathrm {vec}(\mathbf{A}^{\top }))^{\top }(\mathbf{D}^{\top }\otimes \mathbf{B})\,\mathrm {vec}(\mathbf{C})\), this derivative can be rewritten as

$$\begin{aligned} \frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial \alpha _{j}}=\Bigl (\mathrm {vec}\Bigl (\Bigl (\frac{\partial {\varvec{\varPsi }}}{\partial \alpha _{j}}\Bigr )^{\top }\Bigr )\Bigr )^{\top }\frac{1}{2}({\varvec{\varPsi }}^{-1}\otimes {\varvec{\varPsi }}^{-1})\,\mathrm {vec}(\mathbf {b}_{i}\mathbf {b}_{i}^{\top }-{\varvec{\varPsi }}). \end{aligned}$$
(14)

Let \(\mathcal {D}_{q}\) be the elimination matrix (Lavielle 2014), which transforms the vectorized \({\varvec{\varPsi }}\), written \(\mathrm {vec}({\varvec{\varPsi }})\), into its half-vectorized form \(\mathrm {vech}({\varvec{\varPsi }})\), so that \(\mathcal {D}_{q}\mathrm {vec}({\varvec{\varPsi }})=\mathrm {vech}({\varvec{\varPsi }})\). Using the fact that, for all \(j=1,\ldots ,\frac{1}{2}q(q+1)\), the vector \((\mathrm {vec}((\frac{\partial {\varvec{\varPsi }}}{\partial \alpha _{j}})^{\top }))^{\top }\) corresponds to the j-th row of the elimination matrix \(\mathcal {D}_{q}\), we can generalize the derivative in (14) to the vector of parameters \({\varvec{\alpha }}\) as

$$\begin{aligned} \frac{\partial \ell _{ci}({\varvec{\theta }})}{\partial {\varvec{\alpha }}}= & {} \frac{1}{2}\mathcal {D}_{q}({\varvec{\varPsi }}^{-1}\otimes {\varvec{\varPsi }}^{-1})(\mathrm {vec}(\mathbf {b}_{i}\mathbf {b}_{i}^{\top }-{\varvec{\varPsi }})). \end{aligned}$$

Finally, at each iteration, we can compute the empirical information matrix by approximating the score for the observed log-likelihood by the stochastic approximation given in (9).
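To make the score with respect to \({\varvec{\alpha }}\) concrete, the R sketch below builds the elimination matrix \(\mathcal {D}_{q}\) directly and evaluates the expression above for one subject; the names elim_matrix and score_alpha are illustrative and not part of the \(\texttt {qrNLMM}\) package.

elim_matrix <- function(q) {
  ## D_q such that D_q %*% vec(Psi) = vech(Psi) for a q x q matrix Psi
  keep <- which(lower.tri(diag(q), diag = TRUE))  # column-major indices kept by vech
  Dq <- matrix(0, nrow = length(keep), ncol = q * q)
  Dq[cbind(seq_along(keep), keep)] <- 1
  Dq
}

score_alpha <- function(b_i, Psi) {
  ## (1/2) D_q (Psi^{-1} kron Psi^{-1}) vec(b_i b_i^T - Psi)
  Pinv <- solve(Psi)
  0.5 * elim_matrix(ncol(Psi)) %*% (Pinv %x% Pinv) %*%
    as.vector(tcrossprod(b_i) - Psi)
}

The per-subject scores computed in this way can then be combined, via the stochastic approximation in (9), into the empirical information matrix.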

1.4 A4: Figures

Fig. 8

Soybean data: Point estimates (center solid line) and 95% confidence intervals for model parameters after fitting the QR-NLME model. The interpolated curves are spline-smoothed

Fig. 9

ACTG 315 data: Point estimates (center solid line) and 95% confidence intervals for model parameters after fitting the QR-NLME model. The interpolated curves are spline-smoothed

Fig. 10

Graphical summary of the convergence of the fixed effect estimates, the variance components of the random effects, and the nuisance parameters when performing a median regression \((p=0.50)\) for the soybean data. The vertical dashed line marks the beginning of the almost-sure convergence phase, as defined by the cut-point parameter \(c=0.25\)

1.5 A5: Sample output from the \(\texttt {qrNLMM}\) package in \(\texttt {R}\)

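In place of the original output listing, here is a hedged sketch of a typical call for the soybean growth data. It assumes the \(\texttt {QRNLMM()}\) function of the \(\texttt {qrNLMM}\) package and its exprNL convention with fixed[] and random[] tokens; the initial values shown are rough guesses, and argument names and defaults should be checked against the package documentation.

library(qrNLMM)
data(Soybean, package = "nlme")

## Logistic growth curve written with the package's fixed[]/random[] tokens
exprNL <- expression((fixed[1] + random[1]) /
                     (1 + exp(((fixed[2] + random[2]) - x) /
                              (fixed[3] + random[3]))))

## Median regression (p = 0.5) of weight on time, grouped by plot
fit <- QRNLMM(y = Soybean$weight, x = Soybean$Time, groups = Soybean$Plot,
              initial = c(max(Soybean$weight), 50, 10),
              exprNL = exprNL, p = 0.5)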


About this article


Cite this article

Galarza, C.E., Castro, L.M., Louzada, F. et al. Quantile regression for nonlinear mixed effects models: a likelihood based perspective. Stat Papers 61, 1281–1307 (2020). https://doi.org/10.1007/s00362-018-0988-y
