
Parameter estimation of complex mixed models based on meta-model approach

Abstract

Complex biological processes are usually observed over time in a collection of individuals, which yields longitudinal data. The statistical challenge is to better understand the underlying biological mechanisms. A standard statistical approach is the mixed-effects model, in which the regression function is elaborate enough to describe the biological process precisely (for instance, the solution of a multi-dimensional ordinary differential equation or of a partial differential equation). A classical estimation method couples a stochastic version of the EM algorithm with a Markov chain Monte Carlo algorithm. This algorithm requires many evaluations of the regression function, which is prohibitive when the solution must be approximated numerically with a time-consuming solver. In this paper a meta-model relying on a Gaussian process emulator is proposed to approximate the regression function, leading to what is called a mixed meta-model. The uncertainty of the meta-model approximation can be incorporated in the model. A control on the distance between the maximum likelihood estimates of the mixed meta-model and the maximum likelihood estimates of the exact mixed model is guaranteed. Finally, numerical simulations are performed to illustrate the efficiency of this approach.
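For intuition only, the following is a minimal sketch, under toy assumptions, of the meta-modelling idea described above: the expensive regression function \(f(t,\psi)\) is evaluated on a small design \(D\), and a Gaussian process emulator then provides a cheap predictor \(m_D\) together with a pointwise uncertainty \(C_D\). The toy function `expensive_f`, the design size and the kernel parameters are illustrative choices, not the authors' settings.

```python
import numpy as np

def expensive_f(x):
    # Stand-in for a costly solver run; illustrative toy function of (t, psi).
    t, psi = x
    return np.exp(-psi * t) * np.sin(3.0 * t)

def rbf_kernel(A, B, length=0.5, var=1.0):
    # Squared-exponential covariance between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length ** 2)

# Small design D of (t, psi) points where the solver is actually run.
rng = np.random.default_rng(0)
D = rng.uniform([0.0, 0.1], [2.0, 2.0], size=(30, 2))
fD = np.array([expensive_f(x) for x in D])

# Kriging equations: cheap predictor m_D and predictive variance C_D.
K = rbf_kernel(D, D) + 1e-8 * np.eye(len(D))   # small nugget for numerical stability
K_inv = np.linalg.inv(K)

def m_D(x):
    k = rbf_kernel(np.atleast_2d(x), D)        # covariances with the design points
    return (k @ K_inv @ fD)[0]

def C_D(x):
    k = rbf_kernel(np.atleast_2d(x), D)
    kxx = rbf_kernel(np.atleast_2d(x), np.atleast_2d(x))
    return (kxx - k @ K_inv @ k.T)[0, 0]

x_new = np.array([1.0, 0.7])
print(expensive_f(x_new), m_D(x_new), C_D(x_new))
```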



Acknowledgments

Adeline Samson has been supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01). The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 266638.

Author information

Correspondence to Pierre Barbillon.

Appendix

1.1 Proof of Proposition 3

We have

$$\begin{aligned}&|p(\mathbf {y};\theta )-\tilde{p}_D(\mathbf {y};\theta )|\\&\quad \le \int |p(\mathbf {y}|\varvec{\psi };\varvec{\theta })-\tilde{p}_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })|p(\varvec{\psi })d\varvec{\psi }\,. \end{aligned}$$

Therefore, we start by studying \(|p(\mathbf {y}|\varvec{\psi };\varvec{\theta })-\tilde{p}_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })|\):

$$\begin{aligned}&(2\pi \sigma _\varepsilon ^2)^{n_{tot}/2} |p(\mathbf {y}|\varvec{\psi };\varvec{\theta })-\tilde{p}_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })|\\&\quad = \Bigg | \exp \bigg (-\frac{1}{2\sigma _\varepsilon ^2}\sum _{ij}(y_{ij}-f(t_{ij},\psi _i))^2\bigg )\\&\qquad -\exp \bigg (-\frac{1}{2\sigma _\varepsilon ^2}\sum _{ij}(y_{ij}-m_D(t_{ij},\psi _i))^2\bigg )\Bigg |\\&\quad = \ \exp \left( -\frac{1}{2\sigma _\varepsilon ^2}\sum _{ij}(y_{ij}-f(t_{ij},\psi _i))^2\right) \times \\&\qquad \Bigg |1-\exp \bigg (-\frac{1}{2\sigma _\varepsilon ^2}\sum _{ij}\Big ((y_{ij}-m_D(t_{ij},\psi _i))^2\\&\qquad -(y_{ij}-f(t_{ij},\psi _i))^2\Big )\bigg ) \Bigg |\\&\quad \le \Bigg |1-\exp \bigg (-\frac{1}{2\sigma _\varepsilon ^2}\sum _{ij}\Big ( (f(t_{ij},\psi _i)-m_D(t_{ij},\psi _i))\\&\quad (2y_{ij}-f(t_{ij},\psi _i) -m_D(t_{ij},\psi _i))\Big )\bigg )\Bigg |\,. \end{aligned}$$

Under the assumption that the functions \(f\) and \(m_D\) are uniformly bounded on the support of \(\psi \), there exists a constant \(C_y\), uniform in \(\psi \), such that \(|2y_{ij}-f(t,\psi )-m_D(t,\psi )|\le C_y\). Proposition 1 implies that the approximation error of the meta-model, \(|f(t_{ij},\psi _i)-m_D(t_{ij},\psi _i)|\), is controlled by inequality (8):

$$\begin{aligned} |f(t_{ij},\psi _i)-m_D(t_{ij},\psi _i)|\le \Vert f\Vert _{\mathcal {H}_K}G_K(a_D)\,. \end{aligned}$$

Combining these two bounds with the inequality \(|1-e^{-x}|\le C|x|\), which holds on the bounded range of the exponent, there exists a constant \(C_y\) depending only on \(\mathbf {y}\) such that

$$\begin{aligned}&(2\pi \sigma _\varepsilon ^2)^{n_{tot}/2} |p(\mathbf {y}|\varvec{\psi };\varvec{\theta })-\tilde{p}_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })|\\&\quad \le C_y \frac{n_{tot}}{2\sigma _\varepsilon ^2} \Vert f\Vert _{\mathcal {H}_K}G_K(a_D). \end{aligned}$$

Finally

$$\begin{aligned} |p(\mathbf {y};\theta )-\tilde{p}_D(\mathbf {y};\theta )|\le \frac{C_y}{(2\pi \sigma _\varepsilon ^2)^{n_{tot}/2}}\frac{n_{tot}}{2\sigma _\varepsilon ^2} \Vert f\Vert _{\mathcal {H}_K}G_K(a_D)\,. \end{aligned}$$

\(\square \)
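As a numerical sanity check of the mechanism behind this bound (toy numbers and an artificial emulator error, not the paper's setting), the gap between the two conditional Gaussian likelihoods grows at most linearly with the sup-norm error of the meta-model:

```python
import numpy as np

rng = np.random.default_rng(3)
n_tot, sigma2 = 20, 0.5
t = np.linspace(0.1, 2.0, n_tot)
f = np.exp(-0.8 * t)                          # "exact" regression values f(t_ij, psi_i)
y = f + rng.normal(0.0, np.sqrt(sigma2), n_tot)

def cond_lik(means):
    # Gaussian conditional likelihood p(y | psi; theta) with mean vector `means`.
    return np.exp(-0.5 * ((y - means) ** 2).sum() / sigma2) / (2 * np.pi * sigma2) ** (n_tot / 2)

for eps in [1e-3, 1e-2, 1e-1]:
    m_D = f + eps * np.sin(5 * t)             # emulator whose sup-norm error is at most eps
    gap = abs(cond_lik(f) - cond_lik(m_D))
    print(eps, gap, gap / eps)                # gap/eps stays roughly constant: linear control
```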

1.2 Proof of Proposition 4

We study the distance between the two likelihoods \(p_D\) and \(\tilde{p}_D\). As in the proof of Proposition 3, we start by studying \(|p_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })-\tilde{p}_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })|\). Conditionally on \(\varvec{\psi }\), the two distributions are Gaussian with the same mean vector but different covariance matrices; their densities therefore differ most at the common mean, i.e. when \(\sum (y_{ij}-m_D(t_{ij},\psi _i))^2=0\). This yields

$$\begin{aligned}&(2\pi )^{n_{tot}/2} |p_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })-\tilde{p}_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })|\\&\quad \le \left| \frac{1}{\sigma _\varepsilon ^{n_{tot}}}-\frac{1}{\sqrt{|\sigma _\varepsilon ^2I_{n_{tot}}+\mathbf {C}_D|}}\right| \\&\quad =\frac{1}{ \sigma _\varepsilon ^{n_{tot}}}\left| 1-\frac{\sigma _\varepsilon ^{n_{tot}}}{|\sigma _\varepsilon ^2I_{n_{tot}}+\mathbf {C}_D|^{1/2}}\right| \\&\quad \le \frac{1}{ \sigma _\varepsilon ^{n_{tot}}}\left| 1-\frac{\sigma _\varepsilon ^{n_{tot}}}{(\sigma _\varepsilon ^2+\frac{1}{n_{tot}}\sum _{ij} C_D(t_{ij},\psi _i;t_{ij},\psi _i))^{n_{tot}/2}}\right| \\&\quad \le \frac{1}{ \sigma _\varepsilon ^{n_{tot}}}\left| 1-\frac{1}{(1+\frac{1}{\sigma _\varepsilon ^2n_{tot}}\sum _{ij} C_D(t_{ij},\psi _i; t_{ij},\psi _i))^{n_{tot}/2}}\right| \\ \end{aligned}$$

where we use the arithmetic–geometric mean inequality: the determinant, as the product of the eigenvalues, is bounded by \(\big (\mathrm{tr}(\sigma _\varepsilon ^2I_{n_{tot}}+\mathbf {C}_D)/n_{tot}\big )^{n_{tot}}\), and the trace involves the diagonal entries of \(\mathbf {C}_D\), i.e. the variances \(C_D(t_{ij},\psi _i;t_{ij},\psi _i)\). Then, since \(1-(1+x)^{-n_{tot}/2}\le \frac{n_{tot}}{2}x\) for \(x\ge 0\), we obtain that there exists a constant \(C\) such that

$$\begin{aligned}&|p_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })-\tilde{p}_D(\mathbf {y}|\varvec{\psi };\varvec{\theta })|\\&\quad \le C \frac{1}{ \sigma _\varepsilon ^{n_{tot}}}\left| \frac{1}{2\sigma _\varepsilon ^2}\sum _{ij} C_D(t_{ij},\psi _i; t_{ij},\psi _i)\right| \\&\quad \le C (2\pi )^{n_{tot}/2}\frac{n_{tot}}{ \sigma _\varepsilon ^{n_{tot}+2}}G_K(a_D) \end{aligned}$$

where the last inequality holds using Proposition 1. Finally, we obtain

$$\begin{aligned} |p_D(\mathbf {y};\theta )-\tilde{p}_D(\mathbf {y};\theta )|\le C_y \frac{n_{tot}}{ \sigma _\varepsilon ^{n_{tot}+2}}G_K(a_D). \end{aligned}$$

The proof is similar for the distance between the two likelihoods \(\bar{p}_D\) and \(\tilde{p}_D\). \(\square \)
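The determinant–trace step used above can be checked numerically. The sketch below draws an arbitrary positive semi-definite matrix in place of \(\mathbf {C}_D\) (the sizes are assumptions) and verifies the arithmetic–geometric mean inequality on the eigenvalues of \(\sigma _\varepsilon ^2I_{n_{tot}}+\mathbf {C}_D\):

```python
import numpy as np

rng = np.random.default_rng(4)
n_tot, sigma2 = 8, 0.5

# Arbitrary positive semi-definite matrix standing in for the GP covariance C_D.
A = rng.normal(size=(n_tot, n_tot))
C_D = A @ A.T / n_tot

M = sigma2 * np.eye(n_tot) + C_D
lhs = np.linalg.det(M)                               # product of the eigenvalues of M
rhs = (sigma2 + np.trace(C_D) / n_tot) ** n_tot      # (average eigenvalue of M)^n_tot
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```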

1.3 Details of assumptions for Proposition 2

(M1) :

The function \(f\) and the distribution \(p(\psi ; \beta )\) of the individual parameters are such that there exist functions \(g\) and \(\nu \) of \(\theta \) verifying

$$\begin{aligned} \log p(y, \psi ;\theta )= - g(\theta ) + \left\langle S(y,\psi ), \nu (\theta ) \right\rangle \end{aligned}$$

where \(S(y, \psi )\) is a minimal sufficient statistic of the complete model, taking its values in a subset \({\mathcal {S}}\), and \( \left\langle \cdot , \cdot \right\rangle \) denotes the scalar product.

(M2) :

The functions \(g(\theta )\) and \(\nu (\theta )\) are twice continuously differentiable on \(\varTheta \).

(M3) :

The function \(\bar{s}:\varTheta \longrightarrow \mathcal {S}\) defined as \( \bar{s} (\theta ) = \int S(y,\psi ) p(\psi |y;\theta ) d\psi \) is continuously differentiable on \(\varTheta \).

(M4) :

The function \(\ell (\theta ) = \log p(y;\theta )\) is continuously differentiable on \(\varTheta \) and

$$\begin{aligned} \partial _\theta \int p(y,\psi ;\theta ) d\psi = \int \partial _\theta p(y,\psi ;\theta ) d\psi . \end{aligned}$$
(M5) :

Define \(L:\mathcal {S}\times \varTheta \longrightarrow \mathbb {R}\) as \( L(s,\theta ) = -g(\theta )+\langle s,\nu (\theta )\rangle \). There exists a function \(\hat{\theta }:\mathcal {S}\longrightarrow \varTheta \) such that

$$\begin{aligned} \forall \theta \in \varTheta , \quad \forall s \in \mathcal {S}, \quad L(s,\hat{\theta }(s))\ge L(s,\theta ). \end{aligned}$$
(SAEM1) :

The positive decreasing sequence of the stochastic approximation \((\alpha _m)_{m \ge 0}\) is such that \(\sum _{m} \alpha _m = \infty \) and \(\sum _{m}\alpha ^2_m < \infty \); a step-size schedule satisfying this condition appears in the sketch following this list of assumptions.

(SAEM2) :

\(\ell :\varTheta \rightarrow \mathbb {R}\) and \(\hat{\theta }:\mathcal {S}\rightarrow \varTheta \) are d times differentiable, where d is the dimension of \(S(y,\psi )\).

(SAEM3) :

For all \(\theta \in \varTheta \), \(\int || S(y,\psi )||^2\, p(\psi |y;\theta ) d\psi < \infty \) and the function \(\varGamma (\theta )=Cov_\theta (S(y,\psi ))\) is continuous.

(SAEM4) :

S is a bounded function.

(SAEM5’) :

Let \(\Pi _\theta \) denote the transition kernel of the PMCMC algorithm and \(\pi (\psi )=p(\psi |y;\theta )\) its stationary distribution. We assume that \(\Pi _\theta \) is Lipschitz in \(\theta \) and generates an ergodic chain such that, for any starting point \(\varphi \),

$$\begin{aligned} \sum _{m\ge 0} \Vert 1_{\varphi }\Pi _\theta ^m- \pi \Vert _{TV} <\infty \end{aligned}$$

where \(\Vert \cdot \Vert _{TV}\) denotes the total variation norm on probability measures. This property is also called ergodicity of degree 2.
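To make conditions (M1) and (SAEM1) concrete, here is a schematic sketch of the SAEM iteration on a deliberately simple Gaussian toy model (an illustrative assumption, not the paper's model or code): individual parameters are simulated with a Markov kernel targeting \(p(\psi |y;\theta )\), the sufficient statistic is updated by stochastic approximation with a step size satisfying (SAEM1), and \(\theta \) is updated through the M-step map \(\hat{\theta }(s)\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: y_i = psi_i + noise, psi_i ~ N(mu, omega^2); theta = (mu,).
y = rng.normal(1.5, 1.0, size=50)
omega2, sigma2 = 1.0, 0.5

def sample_psi(psi, mu):
    # One Metropolis step targeting p(psi | y; mu) (placeholder MCMC kernel).
    prop = psi + rng.normal(0.0, 0.3, size=psi.shape)
    def logpost(p):
        return -0.5 * ((y - p) ** 2 / sigma2 + (p - mu) ** 2 / omega2)
    accept = np.log(rng.uniform(size=psi.shape)) < logpost(prop) - logpost(psi)
    return np.where(accept, prop, psi)

def step_size(m, m0=20):
    # (SAEM1): alpha_m = 1 during burn-in, then 1/(m - m0); sum diverges, sum of squares converges.
    return 1.0 if m <= m0 else 1.0 / (m - m0)

# (M1): for this toy model S(y, psi) = mean(psi) is sufficient for mu,
# and the M-step map is simply hat_theta(s) = s.
mu, s, psi = 0.0, 0.0, np.zeros_like(y)
for m in range(1, 201):
    psi = sample_psi(psi, mu)                  # S-step (simulation)
    s = s + step_size(m) * (psi.mean() - s)    # SA-step (stochastic approximation)
    mu = s                                     # M-step (maximisation)
print(mu)                                      # converges towards the marginal MLE mean(y)
```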

1.4 Additional results for Section 7

In Tables 1, 2 and 3, the relative bias and the relative root mean square error (RMSE) are displayed for each population parameter. The \(95\,\%\) coverage rates of the confidence intervals on the parameters, built from the stochastic approximation of the Fisher information matrix, are also displayed in these tables. These results are obtained from 100 replications in Table 1 and from 1000 replications in Tables 2 and 3.

Table 1 One compartment simulations: relative bias (\(\,\%\)), relative MSE (\(\,\%\)) and coverage rate (\(\,\%\)) computed over 100 simulations, with the complete meta-, intermediate meta-, the simple meta- and the exact mixed models
Table 2 One compartment simulations: relative bias (\(\,\%\)), relative MSE (\(\,\%\)) and coverage rate (\(\,\%\)) computed over 1000 simulations, with the intermediate meta-, the simple meta- and the exact mixed models
Table 3 Michaelis–Menten pharmacokinetic simulations: relative bias (\(\,\%\)), relative MSE (\(\,\%\)) and coverage rate (\(\,\%\)) computed over 1000 simulations, with the intermediate meta-, the simple meta- and the exact mixed models
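As a minimal illustration of how these summary metrics can be computed (with made-up arrays, not the study's data), `estimates`, `std_errors` and `true_value` below are hypothetical placeholders for the per-replication estimates, their standard errors from the Fisher information matrix, and the true parameter value.

```python
import numpy as np

def summarize(estimates, std_errors, true_value):
    """Relative bias (%), relative RMSE (%) and 95% coverage rate (%) over replications."""
    z = 1.959964  # normal 0.975 quantile, for 95% confidence intervals
    rel_bias = 100.0 * (estimates.mean() - true_value) / true_value
    rel_rmse = 100.0 * np.sqrt(((estimates - true_value) ** 2).mean()) / abs(true_value)
    covered = (estimates - z * std_errors <= true_value) & (true_value <= estimates + z * std_errors)
    return rel_bias, rel_rmse, 100.0 * covered.mean()

# Hypothetical replications for a single population parameter.
rng = np.random.default_rng(2)
est = rng.normal(1.02, 0.05, size=1000)   # estimates over 1000 simulated data sets
se = np.full(1000, 0.05)                  # standard errors from the Fisher information
print(summarize(est, se, true_value=1.0))
```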

Cite this article

Barbillon, P., Barthélémy, C. & Samson, A. Parameter estimation of complex mixed models based on meta-model approach. Stat Comput 27, 1111–1128 (2017). https://doi.org/10.1007/s11222-016-9674-x
