Fitting Nonlinear Ordinary Differential Equation Models with Random Effects and Unknown Initial Conditions Using the Stochastic Approximation Expectation–Maximization (SAEM) Algorithm

Chow, Sy-Miin; Lu, Zhaohua; Sherwood, Andrew; Zhu, Hongtu

doi:10.1007/s11336-014-9431-z


Published: 22 November 2014

Psychometrika, Volume 81, pages 102–134 (2016)

Sy-Miin Chow¹, Zhaohua Lu², Andrew Sherwood³ & Hongtu Zhu²


Abstract

The past decade has evidenced the increased prevalence of irregularly spaced longitudinal data in social sciences. Clearly lacking, however, are modeling tools that allow researchers to fit dynamic models to irregularly spaced data, particularly data that show nonlinearity and heterogeneity in dynamical structures. We consider the issue of fitting multivariate nonlinear differential equation models with random effects and unknown initial conditions to irregularly spaced data. A stochastic approximation expectation–maximization algorithm is proposed and its performance is evaluated using a benchmark nonlinear dynamical systems model, namely, the Van der Pol oscillator equations. The empirical utility of the proposed technique is illustrated using a set of 24-h ambulatory cardiovascular data from 168 men and women. Pertinent methodological challenges and unresolved issues are discussed.


Notes

The local truncation errors of a numerical solver at each time point are equal to $\varvec{c} \Delta ^{g+1}_{i, j}$, where $g$ is the order of the ODE solver and $\varvec{c}$ is a vector of constants that depends on elements such as the differentials of the ODEs (for further details see Press, Teukolsky, Vetterling, & Flannery, 2002; Ralston & Rabinowitz, 2001).
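As a rough numerical illustration of this scaling, the sketch below applies one step of Heun's method (order $g = 2$) to the test equation $dx/dt = -x$, which is an assumed example and not a model from the article. It checks that halving the step size shrinks the one-step error by a factor of about $2^{g+1} = 8$. The function names are hypothetical.

```python
import math


def heun_step(f, t, x, dt):
    """One step of Heun's method (explicit trapezoidal rule, order g = 2)."""
    k1 = f(t, x)
    k2 = f(t + dt, x + dt * k1)
    return x + 0.5 * dt * (k1 + k2)


def one_step_error(dt):
    """Absolute one-step error for dx/dt = -x with x(0) = 1 (exact: exp(-t))."""
    approx = heun_step(lambda t, x: -x, 0.0, 1.0, dt)
    return abs(approx - math.exp(-dt))


# Halving the step should shrink the local error by roughly 2**(g + 1) = 8.
ratio = one_step_error(0.1) / one_step_error(0.05)
print(f"error ratio when halving the step: {ratio:.2f}")
```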
In contrast, in cases involving SDEs, $p(\tilde{\varvec{\mathrm {X}}}|\varvec{b}; \varvec{\theta })$ is not fixed even when $\varvec{b}$ is known and there is considerable increase in estimation complexity.
In the present context, we specify the gain constant to be $\gamma ^{(m)} = a_2/(m^{a_1}+a_2-1), \quad m=1, \ldots , K_1+K_2,$ where the real number $a_1$ and the integer $a_2$ are preassigned. In stage 1, $a_1$ and $a_2$ are selected such that the gain constant assumes some large values to prevent the SAEM algorithm from settling into local minima too quickly. In stage 2, the gain constant is slowly tapered toward zero to allow the algorithm to stabilize toward a final set of estimates (e.g., by setting $a_1 \in (.5, 1]$ to be close to 1, and $a_2$ to be a small integer, say, $a_2=2$). The transition from stage 1 to 2 is governed by another predefined criterion function (for details see Gu & Zhu, 2001; Zhu & Gu, 2007).
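The gain sequence can be sketched in a few lines of Python. The values of $a_1$ and $a_2$ used for the two stages below are illustrative assumptions, not the settings used in the study, and `gain_constant` is a hypothetical helper name.

```python
def gain_constant(m, a1, a2):
    """Return gamma^(m) = a2 / (m**a1 + a2 - 1) for iteration m = 1, 2, ..."""
    if m < 1:
        raise ValueError("iteration index m must be >= 1")
    return a2 / (m ** a1 + a2 - 1)


# Assumed stage-1 setting: small a1 and large a2 keep the gain close to 1.
stage1 = [gain_constant(m, a1=0.1, a2=50) for m in range(1, 6)]

# Assumed stage-2 setting: a1 = 1 and a2 = 2 give gamma^(m) = 2 / (m + 1).
stage2 = [gain_constant(m, a1=1.0, a2=2) for m in range(1, 6)]

print(stage1)
print(stage2)
```

Under these assumed values the stage-1 gains remain above 0.99 for the first five iterations, whereas the stage-2 gains decay as $2/(m+1)$.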
In the present study, we define the stopping rule to be
$$\begin{aligned} K_2&= \inf \Bigg \{m: \tilde{\varvec{s}}^{'(m)}_{\varvec{\mathrm {Y}}} \Big [\tilde{\varvec{\mathrm {I}}}^{(m)}_{\varvec{\mathrm {Y}}}\Big ]^{-1} \tilde{\varvec{s}}^{(m)}_{\varvec{\mathrm {Y}}} + \mathop {\mathrm {tr}}\Big \{ \Big [\tilde{\varvec{\mathrm {I}}}^{(m)}_{\varvec{\mathrm {Y}}}\Big ]^{-1}\hat{\varvec{\mathrm {\Sigma }}} \Big \}/m \le \text { some small constant}\Bigg \}. \end{aligned}$$
(13)
$\hat{\varvec{\mathrm {\Sigma }}}$ denotes an estimate of the covariance matrix of the Monte Carlo error. In practice, we used the sample covariance matrix of $\overline{\varvec{s}}_{\varvec{\mathrm {Z}}}^{(m)}$ as a rough estimate of this covariance matrix.
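A minimal sketch of how the left-hand side of the stopping rule in Eq. (13) might be evaluated is given below. The numerical inputs and the tolerance of $10^{-3}$ are made-up values for illustration, and `stopping_statistic` is a hypothetical function name rather than part of any published implementation.

```python
import numpy as np


def stopping_statistic(score, info, mc_cov, m):
    """Evaluate s' I^{-1} s + tr(I^{-1} Sigma) / m at iteration m.

    score  : averaged score vector, shape (p,)
    info   : information matrix estimate, shape (p, p)
    mc_cov : estimated covariance of the Monte Carlo error, shape (p, p)
    """
    quad = score @ np.linalg.solve(info, score)      # s' I^{-1} s
    trace = np.trace(np.linalg.solve(info, mc_cov))  # tr(I^{-1} Sigma)
    return quad + trace / m


# Made-up inputs for a two-parameter problem.
score = np.array([0.01, -0.02])
info = np.array([[4.0, 0.5], [0.5, 2.0]])
mc_cov = 0.1 * np.eye(2)

stat = stopping_statistic(score, info, mc_cov, m=500)
converged = stat <= 1e-3  # assumed tolerance
print(stat, converged)
```

Solving linear systems with `np.linalg.solve` avoids forming the inverse of the information matrix explicitly.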
This decision was made because there were insufficient repeated measurements to estimate this parameter accurately, especially for individuals with a diurnal cycle that is longer than 24 h, or individuals with less than 24 h worth of measurements.
Because the model fitting procedures were based on the second-order Heun’s method whereas the true data were generated using a fourth-order Runge–Kutta approach, the errors incurred by approximating trajectories from the fourth-order solver with a second-order solver were expected to lead to some bias in the point estimates. Thus, the coverage of the confidence intervals, as assessed, e.g., by the proportion of 95% CIs covering each true population parameter value, can be expected to deviate from the nominal rate of 0.95.
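The size of this solver mismatch can be gauged with a small experiment. The sketch below integrates a Van der Pol oscillator in its textbook form, $\ddot{x} - \mu (1 - x^2)\dot{x} + x = 0$ with $\mu = 1$, using both Heun's method and the classical fourth-order Runge–Kutta method, and records the largest gap between the two end states. The parameterization, step sizes, and initial values are assumptions chosen for illustration and need not match the simulation design of the study.

```python
import numpy as np


def van_der_pol(x, mu=1.0):
    """Textbook Van der Pol vector field for the state x = (position, velocity)."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])


def heun_step(f, x, dt):
    """Second-order Heun step."""
    k1 = f(x)
    k2 = f(x + dt * k1)
    return x + 0.5 * dt * (k1 + k2)


def rk4_step(f, x, dt):
    """Classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate(step, x0, dt, n_steps):
    """Advance the state n_steps times with the given one-step method."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = step(van_der_pol, x, dt)
    return x


def solver_gap(dt, t_end=5.0, x0=(1.0, 0.0)):
    """Largest absolute difference between Heun and RK4 states at t_end."""
    n = int(round(t_end / dt))
    diff = integrate(heun_step, x0, dt, n) - integrate(rk4_step, x0, dt, n)
    return float(np.max(np.abs(diff)))


gap_coarse = solver_gap(0.1)
gap_fine = solver_gap(0.05)
print(gap_coarse, gap_fine)
```

Because the gap is dominated by the second-order error of Heun's method, it should shrink as the step size is reduced, which is one reason denser integration grids reduce this source of bias.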
Table 2 Parameter estimates for the Van der Pol oscillator model with $T = 150$, true initial condition = fixed, fitted initial condition = fixed.

Table 3 Parameter estimates for the Van der Pol oscillator model with $T = 300$, true initial condition = fixed, fitted initial condition = fixed.

Table 4 Parameter estimates for the Van der Pol oscillator model with $T = 150$, true initial condition = fixed, fitted initial condition = random.

Table 5 Parameter estimates for the Van der Pol oscillator model with $T = 300$, true initial condition = fixed, fitted initial condition = random.

Table 6 Parameter estimates for the Van der Pol oscillator model with $T = 150$, true initial condition = random, fitted initial condition = random.

Table 7 Parameter estimates for the Van der Pol oscillator model with $T = 300$, true initial condition = random, fitted initial condition = random.

Table 8 Parameter estimates for the Van der Pol oscillator model with $T = 150$, true initial condition = random, fitted initial condition = fixed.

Table 9 Parameter estimates for the Van der Pol oscillator model with $T = 300$, true initial condition = random, fitted initial condition = fixed.

To ensure the positive definiteness of $\varvec{\mathrm {\Sigma }}_b$, we chose to estimate the lower triangular entries of $\varvec{\mathrm {L}}$ and the diagonal entries of $\varvec{\mathrm {D}}$ in the decomposition $\varvec{\mathrm {\Sigma }}_b = \varvec{\mathrm {L}}\varvec{\mathrm {D}}\varvec{\mathrm {L}}^\mathrm{T}$, with the constraint that the diagonal elements of $\varvec{\mathrm {D}}$ were positive (Anderson, 2003). These constraints might also have affected the accuracy of the point and SE estimates for the initial condition variance–covariance parameters.
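The following sketch shows one way such a parameterization can be coded. It assumes a unit lower triangular $\varvec{\mathrm {L}}$ and enforces positivity of the diagonal of $\varvec{\mathrm {D}}$ by exponentiating unconstrained values. The log transform is an assumption made here for illustration, since the note above states only that the diagonal elements were constrained to be positive, and `sigma_from_ldl` is a hypothetical name.

```python
import numpy as np


def sigma_from_ldl(l_free, log_d):
    """Build Sigma_b = L D L^T from unconstrained parameters.

    l_free : below-diagonal entries of the unit lower triangular L (row by row)
    log_d  : logs of the diagonal entries of D; exp() keeps them positive
    """
    q = len(log_d)
    L = np.eye(q)
    L[np.tril_indices(q, k=-1)] = l_free
    D = np.diag(np.exp(log_d))
    return L @ D @ L.T


# Illustrative 2 x 2 case: L = [[1, 0], [0.5, 1]] and D = diag(1, 2).
sigma_b = sigma_from_ldl(l_free=[0.5], log_d=[0.0, np.log(2.0)])
print(sigma_b)  # [[1.0, 0.5], [0.5, 2.25]]
```

Any real-valued `l_free` and `log_d` yield a symmetric positive definite matrix, so an optimizer can update these parameters without additional constraints.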

References

Ait-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. The Annals of Statistics, 36(2), 906–937.
Anderson, T. W. (2003). An introduction to multivariate statistical analysis (3rd ed.). New York, NY: Wiley.
Arminger, G. (1986). Linear stochastic differential equation models for panel data with unobserved variables. In N. Tuma (Ed.), Sociological methodology (pp. 187–212). San Francisco: Jossey-Bass.
Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. W. Harris (Ed.), Problems in measuring change (pp. 3–20). Madison, WI: University of Wisconsin Press.
Beskos, A., Papaspiliopoulos, O., & Roberts, G. (2009). Monte Carlo maximum likelihood estimation for discretely observed diffusion processes. The Annals of Statistics, 37(1), 223–245.
Beskos, A., Papaspiliopoulos, O., Roberts, G., & Fearnhead, P. (2006). Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 333–382.
Boker, S. M., & Graham, J. (1998). A dynamical systems analysis of adolescent substance abuse. Multivariate Behavioral Research, 33, 479–507.
Boker, S. M., & Nesselroade, J. R. (2002). A method for modeling the intrinsic dynamics of intraindividual variability: Recovering the parameters of simulated oscillators in multi-wave panel data. Multivariate Behavioral Research, 37, 127–160.
Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: Capturing life as it is lived. Annual Review of Psychology, 54, 579–616.
Brown, E. N., & Luithardt, H. (1999). Statistical model building and model criticism for human circadian data. Journal of Biological Rhythms, 14, 609–616.
Brown, E. N., Luithardt, H., & Czeisler, C. A. (2000). A statistical model of the human core-temperature circadian rhythm. American Journal of Physiology, Endocrinology and Metabolism, 279, 669–683.
Browne, M. W., & du Toit, H. C. (1991). Models for learning data. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 47–68). Washington, D.C.: American Psychological Association.
Cao, J., Huang, J. Z., & Wu, H. (2012). Penalized nonlinear least squares estimation of time-varying parameters in ordinary differential equations. Journal of Computational and Graphical Statistics, 21(1), 42–56. doi:10.1198/jcgs.2011.10021.
Carels, R. A., Blumenthal, J. A., & Sherwood, A. (2000). Emotional responsivity during daily life: Relationship to psychosocial functioning and ambulatory blood pressure. International Journal of Psychophysiology, 36, 25–33.
Carlin, B. P., Gelfand, A., & Smith, A. (1992). Hierarchical Bayesian analysis of changepoint problems. Applied Statistics, 41, 389–405.
Chow, S.-M., Ferrer, E., & Nesselroade, J. R. (2007). An unscented Kalman filter approach to the estimation of nonlinear dynamical systems models. Multivariate Behavioral Research, 42(2), 283–321.
Chow, S.-M., Grimm, K. J., Guillaume, F., Dolan, C. V., & McArdle, J. J. (2013). Regime switching bivariate dual change score model. Multivariate Behavioral Research, 48(4), 463–502.
Chow, S.-M., Ho, M.-H. R., Hamaker, E. J., & Dolan, C. V. (2010). Equivalences and differences between structural equation and state-space modeling frameworks. Structural Equation Modeling, 17, 303–332.
Chow, S.-M., & Nesselroade, J. R. (2004). General slowing or decreased inhibition? Mathematical models of age differences in cognitive functioning. Journals of Gerontology Series B—Psychological Sciences & Social Sciences, 59B(3), 101–109.
Chow, S.-M., Tang, N., Yuan, Y., Song, X., & Zhu, H. (2011). Bayesian estimation of semiparametric dynamic latent variable models using the Dirichlet process prior. British Journal of Mathematical and Statistical Psychology, 64(1), 69–106.
Chow, S.-M., & Zhang, G. (2013). Nonlinear regime-switching state-space (RSSS) models. Psychometrika: Application Reviews and Case Studies, 78(4), 740–768.
Cronbach, L. J., & Furby, L. (1970). How should we measure “change”—or should we? Psychological Bulletin, 74(1), 68–80.
Cudeck, R., & Klebe, K. J. (2002). Multiphase mixed-effects models for repeated measures data. Psychological Methods, 7(1), 41–6.
Dembo, A., & Zeitouni, O. (1986). Parameter estimation of partially observed continuous time stochastic processes via the EM algorithm. Stochastic Processes and Their Applications, 23, 91–113.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Diebolt, J., & Celeux, G. (1993). Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions. Communications in Statistics B—Stochastic Models, 9(4), 599–613.
Donnet, S., & Samson, A. (2007). Estimation of parameters in incomplete data models defined by dynamical systems. Journal of Statistical Planning and Inference, 137, 2815–2831.
Du Toit, S. H. C., & Browne, M. W. (2001). The covariance structure of a vector ARMA time series. Structural equation modeling: Present and future (pp. 279–314). Chicago: Scientific Software International.
Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Durbin, J., & Koopman, S. J. (2001). Time series analysis by state space methods. New York, NY: Oxford University Press.
Gates, K. M., & Molenaar, P. C. M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. Neuroimage, 63, 310–319.
Geweke, J., & Tanizaki, H. (2001). Bayesian estimation of state-space models using the Metropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Analysis, 37, 151–170.
Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEEE Proceedings-F, Radar and Signal Processing, 140(2), 107–113.
Gu, M. G., & Zhu, H. T. (2001). Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. Journal of the Royal Statistical Society, Series B, 63, 339–355.
Hairer, M., Stuart, A. M., Voss, J., & Wiberg, P. (2005). Analysis of SPDEs arising in path sampling. Part I: The Gaussian case. Communications in Mathematical Sciences, 3(4), 587–603.
Hale, J. K., & Koçak, H. (1991). Dynamics and bifurcation. New York, NY: Springer.
Harris, C. W. (Ed.). (1963). Problems in measuring change. Madison, WI: University of Wisconsin Press.
Harvey, A. C., & Souza, R. C. (1987). Assessing and modelling the cyclical behaviour of rainfall in northeast Brazil. Journal of Climate and Applied Meteorology, 26, 1317–1322.
Hürzeler, M., & Künsch, H. (1998). Monte Carlo approximations for general state-space models. Journal of Computational and Graphical Statistics, 7, 175–193.
Jones, R. H. (1984). Fitting multivariate models to unequally spaced data. In E. Parzen (Ed.), Time series analysis of irregularly observed data (Vol. 25, pp. 158–188). New York, NY: Springer.
Jones, R. H. (1993). Longitudinal data with serial correlation: A state-space approach. Boca Raton, FL: Chapman & Hall/CRC.
Kaplan, D., & Glass, L. (1995). Understanding nonlinear dynamics. New York, NY: Springer.
Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201–210.
Kincanon, E., & Powel, W. (1995). Chaotic analysis in psychology and psychoanalysis. The Journal of Psychology, 129, 495–505.
Kitagawa, G. (1998). A self-organizing state-space model. Journal of the American Statistical Association, 93(443), 1203–1215.
Klein, A. G., & Muthén, B. O. (2007). Quasi maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42(4), 647–673.
Kuhn, E., & Lavielle, M. (2005). Maximum likelihood estimation in nonlinear mixed effects models. Computational Statistics & Data Analysis, 49, 1020–1038.
Kulikov, G., & Kulikova, M. (2014). Accurate numerical implementation of the continuous-discrete extended Kalman filter. IEEE Transactions on Automatic Control, 59(1), 273–279. doi:10.1109/TAC.2013.2272136.
Lee, S., & Song, X. (2003). Maximum likelihood estimation and model comparison for mixtures of structural equation models with ignorable missing data. Journal of Classification, 20(2), 221–255. doi:10.1007/s00357-003-0013-5.
Li, F., Duncan, T. E., & Acock, A. (2000). Modeling interaction effects in latent growth curve models. Structural Equation Modeling, 7(4), 497–533.
Liang, H., Miao, H., & Wu, H. (2010). Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model. Annals of Applied Statistics, 4(1), 460–483.
Longstaff, M. G., & Heath, R. A. (1999). A nonlinear analysis of the temporal characteristics of handwriting. Human Movement Science, 18, 485–524.
Losardo, D. (2012). An examination of initial condition specification in the structural equation modeling framework. Unpublished doctoral dissertation, University of North Carolina, Chapel Hill, NC.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 190–200.
Marsh, H. W., Wen, Z., & Hau, K.-T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.
Mbalawata, I. S., Särkkä, S., & Haario, H. (2013). Parameter estimation in stochastic differential equations with Markov chain Monte Carlo and non-linear Kalman filtering. Computational Statistics, 28(3), 1195–1223.
McArdle, J. J., & Hamagami, F. (2001). Latent difference score structural models for linear dynamic analysis with incomplete longitudinal data. In L. Collins & A. Sayer (Eds.), New methods for the analysis of change (pp. 139–175). Washington, DC: American Psychological Association.
McArdle, J. J., & Hamagami, F. (2003). Structural equation models for evaluating dynamic concepts within longitudinal twin analyses. Behavior Genetics, 33(2), 137–159. doi:10.1023/A:1022553901851.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107–122.
Miao, H., Xin, X., Perelson, A. S., & Wu, H. (2011). On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Review, 53(1), 3–39.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research and Perspectives, 2, 201–218.
Molenaar, P. C. M., & Newell, K. M. (2003). Direct fit of a theoretical model of phase transition in oscillatory finger motions. British Journal of Mathematical and Statistical Psychology, 56, 199–214. doi:10.1348/000711003770480002.
Ortega, J. (1990). Numerical analysis: A second course. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Oud, J. H. L. (2007). Comparison of four procedures to estimate the damped linear differential oscillator for panel data. In J. Oud & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences. Mahwah, NJ: Lawrence Erlbaum Associates.
Oud, J. H. L., & Jansen, R. A. R. G. (2000). Continuous time state space modeling of panel data by means of SEM. Psychometrika, 65(2), 199–215.
Oud, J. H. L., & Singer, H. (Eds.). (2010). Special issue: Continuous time modeling of panel data, 62 (1).
Pickering, T. G., Shimbo, D., & Haas, D. (2006). Ambulatory blood-pressure monitoring. The New England Journal of Medicine, 354, 2368–2374.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2002). Numerical recipes in C. Cambridge: Cambridge University Press.
R Development Core Team. (2009). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved April, 2014, from http://www.R-project.org (ISBN: 3-900051-07-0).
Ralston, A., & Rabinowitz, P. (2001). A first course in numerical analysis (2nd ed.). Mineola, NY: Dover.
Ramsay, J. O., Hooker, G., Campbell, D., & Cao, J. (2007). Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of the Royal Statistical Society: Series B, 69(5), 741–796.
Raudenbush, S. W., & Liu, X.-F. (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6(4), 387–401.
Särkkä, S. (2013). Bayesian filtering and smoothing. Hillsdale, NJ: Cambridge University Press.
SAS Institute Inc. (2008). SAS 9.2 Help and Documentation (Computer software manual). Cary, NC: SAS Institute Inc.
Sherwood, A., Steffen, P., Blumenthal, J., Kuhn, C., & Hinderliter, A. L. (2002). Nighttime blood pressure dipping: The role of the sympathetic nervous system. American Journal of Hypertension, 15, 111–118.
Sherwood, A., Thurston, R., Steffen, P., Blumenthal, J. A., Waugh, R. A., & Hinderliter, A. L. (2001). Blunted nighttime blood pressure dipping in postmenopausal women. American Journal of Hypertension, 14, 749–754.
Singer, H. (1992). The aliasing-phenomenon in visual terms. Journal of Mathematical Sociology, 14(1), 39–49.
Singer, H. (1995). Analytical score function for irregularly sampled continuous time stochastic processes with control variables and missing values. Econometric Theory, 11, 721–735. doi:10.1017/S0266466600009701.
Singer, H. (2002). Parameter estimation of nonlinear stochastic differential equations: Simulated maximum likelihood vs. extended Kalman filter and Itô-Taylor expansion. Journal of Computational and Graphical Statistics, 11, 972–995.
Singer, H. (2007). Stochastic differential equation models with sampled data. In K. van Montfort, J. H. L. Oud, & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences (pp. 73–106). Mahwah, NJ: Lawrence Erlbaum Associates.
Singer, H. (2010). SEM modeling with singular moment matrices. Part I: ML-estimation of time series. The Journal of Mathematical Sociology, 34(4), 301–320. doi:10.1080/0022250X.2010.532259.
Singer, H. (2012). SEM modeling with singular moment matrices. Part II: ML-estimation of sampled stochastic differential equations. The Journal of Mathematical Sociology, 36(1), 22–43. doi:10.1080/0022250X.2010.532259.
Stone, A. A., & Shiffman, S. (1994). Ecological momentary assessment (EMA) in behavioral medicine. Annals of Behavioral Medicine, 16(3), 199–202.
Strogatz, S. H. (1994). Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Cambridge, MA: Westview.
Stuart, A. M., Voss, J., & Wiberg, P. (2004). Conditional path sampling of SDEs and the Langevin MCMC method. Communications in Mathematical Sciences, 2(4), 685–697.
Tanizaki, H. (1996). Nonlinear filters: Estimation and applications (2nd ed.). Berlin: Springer.
Thatcher, R. W. (1998). A predator–prey model of human cerebral development. In K. M. Newell & P. C. M. Molenaar (Eds.), Applications of nonlinear dynamics to developmental process modeling (pp. 87–128). Mahwah, NJ: Lawrence Erlbaum.
Wen, Z., Marsh, H. W., & Hau, K.-T. (2002). Interaction effects in growth modeling: A full model. Structural Equation Modeling, 9(1), 20–39.
Wu, H. (2005). Statistical methods for HIV dynamic studies in AIDS clinical trials. Statistical Methods in Medical Research, 14, 171–192.
Zhu, H., Gu, M., & Peterson, B. (2007). Maximum likelihood from spatial random effects models via the stochastic approximation expectation maximization algorithm. Statistics and Computing, 17(2), 163–177.
Zhu, H. T., & Zhang, H. P. (2006). Generalized score test of homogeneity for mixed effects models. Annals of Statistics, 34, 1545–1569.


Acknowledgments

Funding for this study was provided by NSF Grant BCS-0826844, NIH Grants RR025747-01, P01CA142538-01, MH086633, EB005149-01, AG033387, and R01GM105004.

Author information

Authors and Affiliations

The Pennsylvania State University, 413 Biobehavioral Health Building, University Park, PA 16802, USA
Sy-Miin Chow
University of North Carolina at Chapel Hill, Chapel Hill, USA
Zhaohua Lu & Hongtu Zhu
Duke University, Durham, USA
Andrew Sherwood


Corresponding author

Correspondence to Sy-Miin Chow.

Appendices

Appendix 1: Score Vector and Information Matrix of the Complete-Data Loglikelihood Function

The elements in $\varvec{s}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})$ and $\varvec{\mathrm {I}}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})$, namely, the score vector and information matrix of the complete-data loglikelihood function, are computed as

$$\begin{aligned} \varvec{s}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})&= \frac{\partial L(\varvec{\mathrm {Z}};\varvec{\theta })}{\partial \varvec{\theta }} = \displaystyle {\sum _{i=1}^{n}} \begin{bmatrix} \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\beta }}\Big )\\ \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }}}\Big )\\ \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \Big )\\ \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{\epsilon }}}\Big )\\ \Big (\frac{\partial L_{i}(\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{b}}}\Big ) \end{bmatrix}, \end{aligned}$$

(17)

$$\begin{aligned} \varvec{\mathrm {I}}_{\varvec{\mathrm {Z}}}(\varvec{\theta };\varvec{\mathrm {Z}})&= -\frac{\partial ^2 L(\varvec{\mathrm {Z}};\varvec{\theta })}{\partial \varvec{\theta } \partial \varvec{\theta }^\mathrm{T}}=- \displaystyle {\sum _{i=1}^{n}} \mathop {\mathrm {Diag}}\begin{bmatrix} \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b};\varvec{\theta }) }{\partial \varvec{\beta } \partial \varvec{\beta }^\mathrm{T}}\\ \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mu }}} \\ \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}\\ \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{\epsilon }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}\\ \frac{\partial ^2 L_{i}(\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{b}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}} \end{bmatrix}, \end{aligned}$$

(18)

where $\mathop {\mathrm {Diag}}(.)$ denotes a block diagonal matrix formed by stacking the appropriate second partial derivative matrices in its diagonal section and zero matrices in its off-diagonal sections.

Using Heun’s method, with $\tilde{\varvec{x}}_i(t_{i,j})$ as defined in Eq. (4) and $\varvec{z}_i(t_{i,j}) = [\varvec{y}_i(t_{i, j})-\varvec{\mu }-\varvec{\mathrm {\Lambda }}\tilde{\varvec{x}}_i(t_{i, j})]$, first-order partial derivative elements of the complete-data loglikelihood are given by

$$\begin{aligned}
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\beta }}&= -\,\frac{1}{2} \Bigg \{ \frac{\partial \varvec{\theta }_{f,i}}{\partial \varvec{\beta }} \frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }_{f,i}} \frac{\partial \varvec{z}_i(t_{i,j})}{\partial \tilde{\varvec{x}}_i(t_{i,j})} \frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i,j})} \Bigg \} \nonumber \\
&= \varvec{\mathrm {H}}^\mathrm{T}_i \frac{\partial \tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}}\varvec{\mathrm {\Lambda }}^\mathrm{T} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \varvec{z}_i(t_{i,j}), \nonumber \\
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }}}&= -\,\frac{1}{2} \Bigg \{\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \frac{\partial \varvec{z}_i(t_{i,j})}{\partial \varvec{\mu }}\frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i,j})} \Bigg \} \nonumber \\
&= \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j}),\nonumber \\
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}}&= -\,\frac{1}{2} \Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \frac{\partial \varvec{z}_i(t_{i,j})}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}\frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i,j})} \Bigg \} \nonumber \\
&= \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \mathop {\mathrm {vec}}\Big [ \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j})\tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}\Big ],\nonumber \\
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\epsilon }}}&= -\,\frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{\partial \varvec{\theta }_{\varvec{\epsilon }}} \Bigg \{\frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \varvec{z}_i(t_{i,j})}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }})} + \frac{\partial \log |\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}|}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }})} \Bigg \} \nonumber \\
&= \frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{\partial \varvec{\theta }_{\varvec{\epsilon }}}\Bigg \{ [\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}] \mathop {\mathrm {vec}}[\varvec{z}_i(t_{i,j}) \varvec{z}_i(t_{i,j})^\mathrm{T}-\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}] \Bigg \}, \nonumber \\
\frac{\partial L_{i}(\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{b}}}&= -\frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{\partial \varvec{\theta }_{\varvec{b}}} \Bigg \{ \frac{\partial \varvec{b}^\mathrm{T}_i \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \varvec{b}_i}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})} + \frac{\partial \log |\varvec{\mathrm {\Sigma }}_{\varvec{b}}|}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})} \Bigg \} \nonumber \\
&= \frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{\partial \varvec{\theta }_{\varvec{b}}} \Bigg \{ [\varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1}] \mathop {\mathrm {vec}}(\varvec{b}_i \varvec{b}_i^\mathrm{T}-\varvec{\mathrm {\Sigma _{\varvec{b}}}}) \Bigg \},
\end{aligned}$$

(19)

where the $\mathop {\mathrm {vec}}(\varvec{\mathrm {W}})$ operator stacks the columns of the $m \times n$ matrix $\varvec{\mathrm {W}}$ into an $mn$-dimensional column vector and $\frac{\partial \tilde{\varvec{x}}_i(t_{i, j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}}$ is dictated by the dynamic model under consideration. Terms such as $\frac{\partial \varvec{\mu } }{\partial \varvec{\theta }_{\varvec{\mu }}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{\partial \varvec{\theta }_{\varvec{\epsilon }}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\eta }}}})}{\partial \varvec{\theta }_{\varvec{\eta }}}$, and $\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{\partial \varvec{\theta }_{\varvec{b}}}$ also depend on the model specification adopted in a particular application. Cases where some elements of $\varvec{\mathrm {\Lambda }}, \varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}, \varvec{\mathrm {\Sigma }}_{\varvec{\eta }}$, and $\varvec{\mathrm {\Sigma }}_{\varvec{b}}$ are fixed at known values can be readily accommodated through appropriate specification of these matrices of partial derivatives.
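As a quick numerical sanity check on the last score expression in Eq. (19), the following sketch compares the closed form $\frac{1}{2}[\varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1} \otimes \varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1}] \mathop {\mathrm {vec}}(\varvec{b}_i \varvec{b}_i^\mathrm{T}-\varvec{\mathrm {\Sigma }}_{\varvec{b}})$ with a central finite-difference gradient. It is only an illustration under stated assumptions: $\varvec{\theta }_{\varvec{b}}$ is taken to be all elements of $\mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})$, so that $\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})/\partial \varvec{\theta }_{\varvec{b}}$ is an identity matrix, and the function names and test values are hypothetical rather than taken from the article.

```python
import numpy as np

def gaussian_loglik(b, sigma):
    """Log-density of N(0, sigma) at b, up to an additive constant."""
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (logdet + b @ np.linalg.solve(sigma, b))

def score_wrt_vec_sigma(b, sigma):
    """Closed-form score: 0.5 * (S^-1 kron S^-1) vec(b b' - S)."""
    sigma_inv = np.linalg.inv(sigma)
    resid = np.outer(b, b) - sigma
    # order="F" stacks columns, matching the vec operator in the text.
    return 0.5 * np.kron(sigma_inv, sigma_inv) @ resid.flatten(order="F")

def numerical_score(b, sigma, h=1e-6):
    """Central finite differences, perturbing one element of sigma at a time."""
    q = sigma.shape[0]
    grad = np.zeros(q * q)
    for k in range(q * q):
        step = np.zeros(q * q)
        step[k] = h
        d = step.reshape((q, q), order="F")
        grad[k] = (gaussian_loglik(b, sigma + d)
                   - gaussian_loglik(b, sigma - d)) / (2.0 * h)
    return grad

# Hypothetical test values: a random positive definite covariance matrix.
rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3))
sigma_b = a @ a.T + 3.0 * np.eye(3)
b_i = rng.normal(size=3)

analytic = score_wrt_vec_sigma(b_i, sigma_b)
numeric = numerical_score(b_i, sigma_b)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
```

When only some elements of $\varvec{\mathrm {\Sigma }}_{\varvec{b}}$ are free, the same check applies after premultiplying by the appropriate matrix of partial derivatives $\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})/\partial \varvec{\theta }_{\varvec{b}}$.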

Second-order partial derivative elements of the complete-data loglikelihood function are computed as

$$\begin{aligned} \frac{\partial ^2 L_{i,j}({\varvec{\mathrm {Y}}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta } \partial {\varvec{\beta }}^\mathrm{T}}&= \Big (\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}^\mathrm{T}\otimes \varvec{\mathrm {I}}_{p_{\beta }} \Big ) \frac{\partial \mathop {\mathrm {vec}}}{\partial {\varvec{\beta }}^\mathrm{T}}\Bigg \{\varvec{\mathrm {H}}^\mathrm{T}_i\frac{\partial \tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}} \Bigg \}\\&\quad +\; \Bigg (\varvec{\mathrm {H}}^\mathrm{T}_i \frac{\partial {\tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}}{\partial \varvec{\theta }_{f,i}}\Bigg ) \frac{\partial \varvec{\mathrm {\Lambda }}^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i, j})^\mathrm{T}}\frac{\partial \varvec{z}_i(t_{i,j})}{\partial \tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T}} \frac{\partial \tilde{\varvec{x}}_i(t_{i, j})}{\partial {\varvec{\theta }}^\mathrm{T}_{f,i}} \frac{\partial \varvec{\theta }_{f,i}}{\partial {\varvec{\beta }}^\mathrm{T}} \\&= \Big (\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}\otimes \varvec{\mathrm {I}}_{p_{\beta }} \Big ) (\varvec{\mathrm {I}}_{n_x} \otimes \varvec{\mathrm {H}}^\mathrm{T}_i)\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta _{f,i}}^\mathrm{T}}\Bigg \{\frac{\partial \tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}{\partial {\varvec{\theta }}_{f,i}} \Bigg \}\varvec{\mathrm {H}}_i\\&\quad -\; \varvec{\mathrm {H}}^\mathrm{T}_i \frac{\partial \tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}}\varvec{\mathrm {\Lambda }}^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}}\varvec{\mathrm {H}}_i,\\ \frac{\partial ^2 
L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mu }}}&=\Big (\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \otimes \varvec{\mathrm {I}}_{p_{\mu }}\Big )\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mu }}}\Bigg \{\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}}\Bigg \} - \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}^\mathrm{T}}, \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}&= \Bigg \{\mathop {\mathrm {vec}}[\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j})\tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}]^\mathrm{T}\otimes \varvec{\mathrm {I}}_{p_{\Lambda }}\Bigg \}\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}\Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \Bigg \} \\&\quad -\,\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} [\tilde{\varvec{x}}_i(t_{i,j})\tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}\otimes \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}] \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}, \end{aligned}$$

$$\begin{aligned} \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\epsilon }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}&= \frac{1}{2}\Big [\mathop {\mathrm {vec}}\Big (\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j})\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \Big )^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{\epsilon }} \Big ] \frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}}} \Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } \Bigg \} \\&\quad -\;\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } [\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j})\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} ]\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}} }\\&\quad -\;\frac{1}{2}[\mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma ^{-1}_{\varvec{\epsilon }}}})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{\epsilon }}]\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}}}\Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } \Bigg \}\\&\quad +\;\frac{1}{2}\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } (\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \otimes \varvec{\mathrm {\Sigma 
_{\varvec{\epsilon }}}}^{-1})\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}} },\\ \end{aligned}$$

$$\begin{aligned} \frac{\partial ^2 L_{i}(\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{b}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}&=\frac{1}{2}\Big [\mathop {\mathrm {vec}}\Big (\varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1}\varvec{b}_i\varvec{b}^\mathrm{T}_i \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \Big )^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{b}} \Big ] \frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}} \Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } \Bigg \} \nonumber \\&\quad -\; \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } \left[ \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1}\varvec{b}_i\varvec{b}^\mathrm{T}_i \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \right] \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\nonumber \\&\quad -\;\frac{1}{2} [\mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma ^{-1}_{\varvec{b}}}})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{b}}]\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } \Bigg \}\nonumber \\&\quad +\; \frac{1}{2}\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } (\varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1})\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \nonumber \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta 
_{\varvec{\mu }}} \partial \varvec{\theta _{\varvec{\mathrm {\Lambda }}}}^\mathrm{T}}&= -\, \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} [\tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{n_y} ] \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}},\nonumber \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }}\partial \varvec{\beta }^\mathrm{T}}&= -\,\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\frac{\partial \varvec{z}_i(t_{i,j})}{\partial \tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T}} \frac{\partial \tilde{\varvec{x}}_i(t_{i, j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}} \frac{\partial \varvec{\theta }_{f,i}}{\partial \varvec{\beta }^\mathrm{T}},\nonumber \\&=-\,\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}}\varvec{\mathrm {H}}_i,\nonumber \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}\partial \varvec{\mathrm {\beta }}^\mathrm{T}}&=\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \Bigg \{ (\varvec{\mathrm {I}}_{n_x}\otimes \varvec{\mathrm {\Sigma }}_{\epsilon }^{-1}) \Big [ -(\tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{n_y})\varvec{\mathrm {\Lambda }}\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}}\varvec{\mathrm {H}}_i \nonumber \\&\quad +\; (\varvec{\mathrm {I}}_{n_x} \otimes \varvec{z}_i(t_{i,j})) 
\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}} \varvec{\mathrm {H}}_i\Big ]\Bigg \}. \end{aligned}$$

(20)

Other second-order derivative elements are equal to null matrices, including $\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mu }}},$ $ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{\mu }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}$ and $\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b};\varvec{\theta })}{\partial \varvec{\theta }_{\varvec{\epsilon }}\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}$.
Under the assumption that the model is correctly specified, the elements in $\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}$ and $\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\beta } \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}$ are close to zero at the MLEs of the modeling parameters. These elements are not shown here and are set to null matrices in the proposed estimation algorithm, which stabilizes the algorithm when initial parameter estimates are far from the MLEs. In contrast, the off-diagonal blocks shown in the last three equations in (20) are non-zero even near the MLEs. Nevertheless, setting all the off-diagonal blocks of the information matrix of the complete-data loglikelihood function to null matrices helps stabilize the algorithm when this information matrix is not positive definite during optimization. In our preliminary simulations, we verified that setting these three matrices to null matrices, rather than using the forms shown in Eq. (20), reduced numerical problems in the optimization process while having negligible effects on the final point and SE estimates, because this matrix is not used directly as the Fisher information matrix to derive the final SE estimates. We thus set all the off-diagonal blocks, including the last three matrices shown in Eq. (20), to null matrices.
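The effect of nulling the off-diagonal blocks can be illustrated with a small sketch. The helper names `block_diagonalize` and `is_positive_definite`, the block sizes, and the $4 \times 4$ example matrix are all hypothetical and chosen only to show how a matrix that is not positive definite can become positive definite once the cross-parameter blocks are set to zero; they are not taken from the authors' implementation.

```python
import numpy as np

def block_diagonalize(info, block_sizes):
    """Keep the diagonal blocks of `info` and set every off-diagonal block to zero."""
    out = np.zeros_like(info)
    start = 0
    for size in block_sizes:
        stop = start + size
        out[start:stop, start:stop] = info[start:stop, start:stop]
        start = stop
    return out

def is_positive_definite(m):
    """A Cholesky factorization succeeds only for positive definite matrices."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False

# Hypothetical information matrix for two parameter blocks of size 2 each.
# The strong cross-block terms give it eigenvalues 5 and -1.
info = np.array([
    [2.0, 0.0, 3.0, 0.0],
    [0.0, 2.0, 0.0, 3.0],
    [3.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 2.0],
])

info_block = block_diagonalize(info, block_sizes=[2, 2])
pd_before = is_positive_definite(info)
pd_after = is_positive_definite(info_block)
```

In this toy case each diagonal block is positive definite on its own, so the block-diagonal matrix can be inverted safely in a Newton-type update even though the full matrix cannot.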

Appendix 2: Sampling From $p(\varvec{b}| \varvec{\mathrm {Y}};\varvec{\theta }^{(m-1)})$

The superscript $(m-1)$ of $\varvec{\theta }$ is temporarily suppressed for notational simplicity. It can be shown that $p(\varvec{b}| \varvec{\mathrm {Y}};\varvec{\theta })=\prod _{i=1}^{n} p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })$, where $p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })$ is non-standard and cannot be sampled directly. Specifically, $p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta }) \propto p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}})p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })$, in which $p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })$ is given by

$$\begin{aligned} p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta }) = \prod _{j=1}^\mathrm{T} p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta }), \end{aligned}$$

(21)

where $p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })$ is a multivariate normal density function with mean $\varvec{\mu }+\varvec{\mathrm {\Lambda }}\tilde{\varvec{x}}_i(t_{i, j})$ and covariance matrix $\varvec{\mathrm {\Sigma }}_{\epsilon }$. As $\varvec{b}_i$ is involved in the nonlinear $f(.)$ in $p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta }), p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })$ is usually non-standard. To sample from $p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })$, we adopt a Metropolis-Hastings (MH) algorithm as follows. At the $m$th iteration with current values in $\varvec{b}_i^{(m)}$, a new candidate $\varvec{b}_i$ is generated from a proposal distribution, chosen to be the normal distribution $\text{ N }(\varvec{b}_i^{(m)},\sigma _{b}^{2} \varvec{\mathrm {\Omega }}_{bi})$, where $\sigma _{b}^{2}$ is a scaling constant, $\varvec{\mathrm {\Omega }}_{bi} = (\varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1}+\sum _{j=1}^\mathrm{T}\varvec{\mathrm {D}}_{bit}^\mathrm{T}\varvec{\mathrm {\Sigma }}_{\epsilon }^{-1} \varvec{\mathrm {D}}_{bit})^{-1}, \varvec{\mathrm {D}}_{bit}=\partial \tilde{\varvec{x}}_i(t_{i, j})/\partial \varvec{b}_{i}^\mathrm{T} |_{\varvec{b}_i=\varvec{b}_i^{*}}$, and $\varvec{b}_i^{*}$ is a fixed value with high $p(\varvec{b}_i^{*}| \varvec{\mathrm {Y}};\varvec{\theta })$. One possibility is to use the mean of $p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}})$ as $\varvec{b}_i^{*}$, which we have found to lead to good performance. The new $\varvec{b}_i$ is accepted with probability

$$\begin{aligned} \text{ min }\left\{ 1,\frac{p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}}) \prod _{j=1}^\mathrm{T} p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })}{p(\varvec{b}^{(m)}_i;\varvec{\theta }_{\varvec{b}}) \prod _{j=1}^\mathrm{T} p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i^{(m)}; \varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })} \right\} , \end{aligned}$$

(22)

The scaling constant, $\sigma _{b}^2$, can be chosen such that the average acceptance rate is approximately 0.4.
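The sampling scheme described above can be sketched for a single subject as follows. This is a minimal illustration, not the authors' implementation: the ODE solution $\tilde{\varvec{x}}_i(t)$ is replaced by a hypothetical closed-form trajectory $\exp \{-(\beta + b_i)t\}$ with a scalar random effect, a single observed variable, and a loading fixed at 1, and all names and numerical values are assumptions made for the example. Because the normal proposal is symmetric, the acceptance ratio reduces to the ratio of target densities in Eq. (22).

```python
import numpy as np

BETA = 0.5  # hypothetical fixed-effect decay rate of the toy model

def trajectory(b, times):
    """Toy stand-in for the ODE solution: x(t) = exp(-(BETA + b) * t)."""
    return np.exp(-(BETA + b) * times)

def log_target(b, y, times, mu, sigma_eps2, sigma_b2):
    """Log of p(b) * prod_j p(y_j | b), up to an additive constant."""
    resid = y - mu - trajectory(b, times)
    return -0.5 * b**2 / sigma_b2 - 0.5 * np.sum(resid**2) / sigma_eps2

def proposal_variance(times, sigma_eps2, sigma_b2, b_star=0.0):
    """Omega = (1/sigma_b2 + sum_j D_j^2 / sigma_eps2)^(-1), with D_j = dx/db at b_star."""
    d = -times * trajectory(b_star, times)
    return 1.0 / (1.0 / sigma_b2 + np.sum(d**2) / sigma_eps2)

def mh_sample(y, times, mu, sigma_eps2, sigma_b2, scale2, n_iter, rng):
    """Random-walk Metropolis-Hastings with proposal N(b_current, scale2 * Omega)."""
    step_sd = np.sqrt(scale2 * proposal_variance(times, sigma_eps2, sigma_b2))
    b = 0.0  # start at the prior mean, which is also used as b_star
    logp = log_target(b, y, times, mu, sigma_eps2, sigma_b2)
    draws = np.empty(n_iter)
    n_accept = 0
    for m in range(n_iter):
        cand = b + step_sd * rng.normal()
        logp_cand = log_target(cand, y, times, mu, sigma_eps2, sigma_b2)
        # Accept with probability min{1, target(cand) / target(current)}.
        if np.log(rng.uniform()) < logp_cand - logp:
            b, logp = cand, logp_cand
            n_accept += 1
        draws[m] = b
    return draws, n_accept / n_iter

# Simulated, irregularly spaced data for one subject (hypothetical values).
rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0.0, 5.0, size=20))
y = 1.0 + trajectory(0.3, times) + rng.normal(scale=0.1, size=times.size)

draws, acc_rate = mh_sample(y, times, mu=1.0, sigma_eps2=0.01,
                            sigma_b2=0.25, scale2=5.0, n_iter=2000, rng=rng)
```

In practice, `scale2` would be adjusted over a few pilot runs, increasing it when `acc_rate` is well above 0.4 and decreasing it when `acc_rate` is well below.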

About this article

Cite this article

Chow, SM., Lu, Z., Sherwood, A. et al. Fitting Nonlinear Ordinary Differential Equation Models with Random Effects and Unknown Initial Conditions Using the Stochastic Approximation Expectation–Maximization (SAEM) Algorithm. Psychometrika 81, 102–134 (2016). https://doi.org/10.1007/s11336-014-9431-z

Received: 13 December 2013
Published: 22 November 2014
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11336-014-9431-z
