Fitting Nonlinear Ordinary Differential Equation Models with Random Effects and Unknown Initial Conditions Using the Stochastic Approximation Expectation–Maximization (SAEM) Algorithm

  • Published in Psychometrika

Abstract

The past decade has seen an increased prevalence of irregularly spaced longitudinal data in the social sciences. Clearly lacking, however, are modeling tools that allow researchers to fit dynamic models to irregularly spaced data, particularly data that show nonlinearity and heterogeneity in dynamical structures. We consider the issue of fitting multivariate nonlinear differential equation models with random effects and unknown initial conditions to irregularly spaced data. A stochastic approximation expectation–maximization algorithm is proposed, and its performance is evaluated using a benchmark nonlinear dynamical systems model, namely, the Van der Pol oscillator equations. The empirical utility of the proposed technique is illustrated using a set of 24-h ambulatory cardiovascular data from 168 men and women. Pertinent methodological challenges and unresolved issues are discussed.

Notes

  1. The local truncation errors of a numerical solver at each time point are equal to \(\varvec{c} \Delta ^{g+1}_{i, j}\), where \(g\) is the order of the ODE solver and \(\varvec{c}\) is a vector of constants that depends on elements such as the differentials of the ODEs (for further details see Press, Teukolsky, Vetterling, & Flannery, 2002; Ralston & Rabinowitz, 2001).

  2. In contrast, in cases involving SDEs, \(p(\tilde{\varvec{\mathrm {X}}}|\varvec{b}; \varvec{\theta })\) is not fixed even when \(\varvec{b}\) is known and there is considerable increase in estimation complexity.

  3. In the present context, we specify the gain constant to be \(\gamma ^{(m)} = a_2/(m^{a_1}+a_2-1), \quad m=1, \ldots , K_1+K_2,\) where the real number \(a_1\) and the integer \(a_2\) are preassigned. In stage 1, \(a_1\) and \(a_2\) are selected so that the gain constant stays large, which prevents the SAEM algorithm from settling into a local optimum too quickly. In stage 2, the gain constant is slowly tapered toward zero to allow the algorithm to stabilize toward a final set of estimates (e.g., by setting \(a_1 \in (.5, 1]\) to be close to 1, and \(a_2\) to be a small integer, say, \(a_2=2\)). The transition from stage 1 to stage 2 is governed by another predefined criterion function (for details see Gu & Zhu, 2001; Zhu, Gu, & Peterson, 2007).

  4. In the present study, we define the stopping rule to be

    $$\begin{aligned} K_2&= \inf \Bigg \{m: \tilde{\varvec{s}}^{'(m)}_{\varvec{\mathrm {Y}}} \Big [\tilde{\varvec{\mathrm {I}}}^{(m)}_{\varvec{\mathrm {Y}}}\Big ]^{-1} \tilde{\varvec{s}}^{(m)}_{\varvec{\mathrm {Y}}} + \mathop {\mathrm {tr}}\Big \{ \Big [\tilde{\varvec{\mathrm {I}}}^{(m)}_{\varvec{\mathrm {Y}}}\Big ]^{-1}\hat{\varvec{\mathrm {\Sigma }}} \Big \}/m \le \text { some small constant}\Bigg \}. \end{aligned}$$
    (13)

    Here \(\hat{\varvec{\mathrm {\Sigma }}}\) denotes an estimate of the covariance matrix of the Monte Carlo error. In practice, we used the sample covariance matrix of \(\overline{\varvec{s}}_{\varvec{\mathrm {Z}}}^{(m)}\) as a rough value for \(\hat{\varvec{\mathrm {\Sigma }}}\).

  5. This decision was made because there were insufficient repeated measurements to estimate this parameter accurately, especially for individuals with a diurnal cycle that is longer than 24 h, or individuals with less than 24 h worth of measurements.

  6. Because the model-fitting procedures were based on the second-order Heun’s method, whereas the data were generated using a fourth-order Runge–Kutta approach, the errors incurred by approximating the trajectories from the fourth-order solver with a second-order solver were expected to lead to some bias in the point estimates. Thus, the coverage performance of the confidence intervals (CIs), as assessed, e.g., by the proportion of 95% CIs covering each true population parameter value, can be expected to deviate from the nominal coverage rate of 0.95.

    Table 2 Parameter estimates for the Van der Pol oscillator model with \(T = 150\), true initial condition = fixed, fitted initial condition = fixed.
    Table 3 Parameter estimates for the Van der Pol oscillator model with \(T = 300\), true initial condition = fixed, fitted initial condition = fixed.
    Table 4 Parameter estimates for the Van der Pol oscillator model with \(T = 150\), true initial condition = fixed, fitted initial condition = random.
    Table 5 Parameter estimates for the Van der Pol oscillator model with \(T = 300\), true initial condition = fixed, fitted initial condition = random.
    Table 6 Parameter estimates for the Van der Pol oscillator model with \(T = 150\), true initial condition = random, fitted initial condition = random.
    Table 7 Parameter estimates for the Van der Pol oscillator model with \(T = 300\), true initial condition = random, fitted initial condition = random.
    Table 8 Parameter estimates for the Van der Pol oscillator model with \(T = 150\), true initial condition = random, fitted initial condition = fixed.
    Table 9 Parameter estimates for the Van der Pol oscillator model with \(T = 300\), true initial condition = random, fitted initial condition = fixed.
  7. To ensure the positive definiteness of \(\varvec{\mathrm {\Sigma }}_b\), we chose to estimate the lower triangular entries of \(\varvec{\mathrm {L}}\) and the diagonal entries of \(\varvec{\mathrm {D}}\) in the decomposition \(\varvec{\mathrm {\Sigma }}_b = \varvec{\mathrm {L}}\varvec{\mathrm {D}}\varvec{\mathrm {L}}^\mathrm{T}\), with the constraint that the diagonal elements of \(\varvec{\mathrm {D}}\) be positive (Anderson, 2003). These constraints might also have affected the accuracy of the point and SE estimates for the initial condition variance–covariance parameters.
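
Two of the computations described in the notes above are small enough to sketch directly: the gain constant of Note 3 and the \(LDL^\mathrm{T}\) parameterization of Note 7. The following Python sketch is illustrative only and is not the authors' implementation. The function names are hypothetical, the numerical values are made up, and \(\varvec{\mathrm {L}}\) is assumed to be unit lower triangular (ones on its diagonal), which is the usual convention but is not stated in the note.

```python
def gain_constant(m, a1, a2):
    """Gain constant gamma^(m) = a2 / (m**a1 + a2 - 1) from Note 3."""
    return a2 / (m ** a1 + a2 - 1)


def sigma_from_ldl(l_entries, d_diag):
    """Build Sigma_b = L D L^T.

    l_entries: strictly lower triangular entries of L, row by row
               (assumption: L has ones on its diagonal).
    d_diag:    diagonal entries of D, required to be positive.
    """
    q = len(d_diag)
    if any(d <= 0 for d in d_diag):
        raise ValueError("diagonal entries of D must be positive")
    if len(l_entries) != q * (q - 1) // 2:
        raise ValueError("wrong number of lower triangular entries")
    # Fill the unit lower triangular matrix L.
    L = [[1.0 if r == c else 0.0 for c in range(q)] for r in range(q)]
    entries = iter(l_entries)
    for r in range(1, q):
        for c in range(r):
            L[r][c] = float(next(entries))
    # Sigma[r][c] = sum_k L[r][k] * D[k] * L[c][k]
    return [[sum(L[r][k] * d_diag[k] * L[c][k] for k in range(q))
             for c in range(q)] for r in range(q)]


# Stage-2-like settings mentioned in Note 3: a1 close to 1, a2 = 2.
gains = [gain_constant(m, a1=1.0, a2=2) for m in range(1, 6)]

# Hypothetical 2 x 2 example: one free entry in L, two positive entries in D.
sigma_b = sigma_from_ldl([0.5], [2.0, 1.0])
```

With these settings the gain starts at 1 and decays toward zero as the iteration index grows, and any choice of free entries in \(\varvec{\mathrm {L}}\) with positive entries in \(\varvec{\mathrm {D}}\) yields a symmetric positive definite matrix, which is the reason for the reparameterization.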

References

  • Ait-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. The Annals of Statistics, 36(2), 906–937.

  • Anderson, T. W. (2003). An introduction to multivariate statistical analysis (3rd ed.). New York, NY: Wiley.

  • Arminger, G. (1986). Linear stochastic differential equation models for panel data with unobserved variables. In N. Tuma (Ed.), Sociological methodology (pp. 187–212). San Francisco: Jossey-Bass.

  • Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. W. Harris (Ed.), Problems in measuring change (pp. 3–20). Madison, WI: University of Wisconsin Press.

  • Beskos, A., Papaspiliopoulos, O., & Roberts, G. (2009). Monte Carlo maximum likelihood estimation for discretely observed diffusion processes. The Annals of Statistics, 37(1), 223–245.

  • Beskos, A., Papaspiliopoulos, O., Roberts, G., & Fearnhead, P. (2006). Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 333–382.

  • Boker, S. M., & Graham, J. (1998). A dynamical systems analysis of adolescent substance abuse. Multivariate Behavioral Research, 33, 479–507.

  • Boker, S. M., & Nesselroade, J. R. (2002). A method for modeling the intrinsic dynamics of intraindividual variability: Recovering the parameters of simulated oscillators in multi-wave panel data. Multivariate Behavioral Research, 37, 127–160.

  • Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: Capturing life as it is lived. Annual Review of Psychology, 54, 579–616.

  • Brown, E. N., & Luithardt, H. (1999). Statistical model building and model criticism for human circadian data. Journal of Biological Rhythms, 14, 609–616.

  • Brown, E. N., Luithardt, H., & Czeisler, C. A. (2000). A statistical model of the human core-temperature circadian rhythm. American Journal of Physiology, Endocrinology and Metabolism, 279, 669–683.

  • Browne, M. W., & du Toit, H. C. (1991). Models for learning data. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 47–68). Washington, D.C.: American Psychological Association.

  • Cao, J., Huang, J. Z., & Wu, H. (2012). Penalized nonlinear least squares estimation of time-varying parameters in ordinary differential equations. Journal of Computational and Graphical Statistics, 21(1), 42–56. doi:10.1198/jcgs.2011.10021.

  • Carels, R. A., Blumenthal, J. A., & Sherwood, A. (2000). Emotional responsivity during daily life: Relationship to psychosocial functioning and ambulatory blood pressure. International Journal of Psychophysiology, 36, 25–33.

  • Carlin, B. P., Gelfand, A., & Smith, A. (1992). Hierarchical Bayesian analysis of changepoint problems. Applied Statistics, 41, 389–405.

  • Chow, S.-M., Ferrer, E., & Nesselroade, J. R. (2007). An unscented Kalman filter approach to the estimation of nonlinear dynamical systems models. Multivariate Behavioral Research, 42(2), 283–321.

  • Chow, S.-M., Grimm, K. J., Guillaume, F., Dolan, C. V., & McArdle, J. J. (2013). Regime switching bivariate dual change score model. Multivariate Behavioral Research, 48(4), 463–502.

  • Chow, S.-M., Ho, M.-H. R., Hamaker, E. J., & Dolan, C. V. (2010). Equivalences and differences between structural equation and state-space modeling frameworks. Structural Equation Modeling, 17, 303–332.

  • Chow, S.-M., & Nesselroade, J. R. (2004). General slowing or decreased inhibition? Mathematical models of age differences in cognitive functioning. Journals of Gerontology Series B—Psychological Sciences & Social Sciences, 59B(3), 101–109.

  • Chow, S.-M., Tang, N., Yuan, Y., Song, X., & Zhu, H. (2011). Bayesian estimation of semiparametric dynamic latent variable models using the Dirichlet process prior. British Journal of Mathematical and Statistical Psychology, 64(1), 69–106.

  • Chow, S.-M., & Zhang, G. (2013). Nonlinear regime-switching state-space (RSSS) models. Psychometrika: Application Reviews and Case Studies, 78(4), 740–768.

  • Cronbach, L. J., & Furby, L. (1970). How should we measure “change”—or should we? Psychological Bulletin, 74(1), 68–80.

  • Cudeck, R., & Klebe, K. J. (2002). Multiphase mixed-effects models for repeated measures data. Psychological Methods, 7(1), 41–6.

  • Dembo, A., & Zeitouni, O. (1986). Parameter estimation of partially observed continuous time stochastic processes via the EM algorithm. Stochastic Processes and Their Applications, 23, 91–113.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

  • Diebolt, J., & Celeux, G. (1993). Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions. Communications in Statistics B—Stochastic Models, 9(4), 599–613.

  • Donnet, S., & Samson, A. (2007). Estimation of parameters in incomplete data models defined by dynamical systems. Journal of Statistical Planning and Inference, 137, 2815–2831.

  • Du Toit, S. H. C., & Browne, M. W. (2001). The covariance structure of a vector ARMA time series. Structural equation modeling: Present and future (pp. 279–314). Chicago: Scientific Software International.

  • Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.

  • Durbin, J., & Koopman, S. J. (2001). Time series analysis by state space methods. New York, NY: Oxford University Press.

  • Gates, K. M., & Molenaar, P. C. M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. Neuroimage, 63, 310–319.

  • Geweke, J., & Tanizaki, H. (2001). Bayesian estimation of state-space models using the Metropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Analysis, 37, 151–170.

  • Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEEE Proceedings-F, Radar and Signal Processing, 140(2), 107–113.

  • Gu, M. G., & Zhu, H. T. (2001). Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. Journal of the Royal Statistical Society, Series B, 63, 339–355.

  • Hairer, M., Stuart, A. M., Voss, J., & Wiberg, P. (2005). Analysis of SPDEs arising in path sampling. Part I: The Gaussian case. Communications in Mathematical Sciences, 3(4), 587–603.

  • Hale, J. K., & Koçak, H. (1991). Dynamics and bifurcation. New York, NY: Springer.

  • Harris, C. W. (Ed.). (1963). Problems in measuring change. Madison, WI: University of Wisconsin Press.

  • Harvey, A. C., & Souza, R. C. (1987). Assessing and modelling the cyclical behaviour of rainfall in northeast Brazil. Journal of Climate and Applied Meteorology, 26, 1317–1322.

  • Hürzeler, M., & Künsch, H. (1998). Monte Carlo approximations for general state-space models. Journal of Computational and Graphical Statistics, 7, 175–193.

  • Jones, R. H. (1984). Fitting multivariate models to unequally spaced data. In E. Parzen (Ed.), Time series analysis of irregularly observed data (Vol. 25, pp. 158–188). New York, NY: Springer.

  • Jones, R. H. (1993). Longitudinal data with serial correlation: A state-space approach. Boca Raton, FL: Chapman & Hall/CRC.

  • Kaplan, D., & Glass, L. (1995). Understanding nonlinear dynamics. New York, NY: Springer.

  • Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201–210.

  • Kincanon, E., & Powel, W. (1995). Chaotic analysis in psychology and psychoanalysis. The Journal of Psychology, 129, 495–505.

  • Kitagawa, G. (1998). A self-organizing state-space model. Journal of the American Statistical Association, 93(443), 1203–1215.

  • Klein, A. G., & Muthén, B. O. (2007). Quasi maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42(4), 647–673.

  • Kuhn, E., & Lavielle, M. (2005). Maximum likelihood estimation in nonlinear mixed effects models. Computational Statistics & Data Analysis, 49, 1020–1038.

  • Kulikov, G., & Kulikova, M. (2014). Accurate numerical implementation of the continuous-discrete extended Kalman filter. IEEE Transactions on Automatic Control, 59(1), 273–279. doi:10.1109/TAC.2013.2272136.

  • Lee, S., & Song, X. (2003). Maximum likelihood estimation and model comparison for mixtures of structural equation models with ignorable missing data. Journal of Classification, 20(2), 221–255. doi:10.1007/s00357-003-0013-5.

  • Li, F., Duncan, T. E., & Acock, A. (2000). Modeling interaction effects in latent growth curve models. Structural Equation Modeling, 7(4), 497–533.

  • Liang, H., Miao, H., & Wu, H. (2010). Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model. Annals of Applied Statistics, 4(1), 460–483.

  • Longstaff, M. G., & Heath, R. A. (1999). A nonlinear analysis of the temporal characteristics of handwriting. Human Movement Science, 18, 485–524.

  • Losardo, D. (2012). An examination of initial condition specification in the structural equation modeling framework. Unpublished doctoral dissertation, University of North Carolina, Chapel Hill, NC.

  • Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 190–200.

  • Marsh, H. W., Wen, Z. L., & Hau, K.-T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.

  • Mbalawata, I. S., Särkkä, S., & Haario, H. (2013). Parameter estimation in stochastic differential equations with Markov chain Monte Carlo and non-linear Kalman filtering. Computational Statistics, 28(3), 1195–1223.

  • McArdle, J. J., & Hamagami, F. (2001). Latent difference score structural models for linear dynamic analysis with incomplete longitudinal data. In L. Collins & A. Sayer (Eds.), New methods for the analysis of change (pp. 139–175). Washington, DC: American Psychological Association.

  • McArdle, J. J., & Hamagami, F. (2003). Structural equation models for evaluating dynamic concepts within longitudinal twin analyses. Behavior Genetics, 33(2), 137–159. doi:10.1023/A:1022553901851.

  • Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107–122.

  • Miao, H., Xin, X., Perelson, A. S., & Wu, H. (2011). On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Review, 53(1), 3–39.

  • Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research and Perspectives, 2, 201–218.

  • Molenaar, P. C. M., & Newell, K. M. (2003). Direct fit of a theoretical model of phase transition in oscillatory finger motions. British Journal of Mathematical and Statistical Psychology, 56, 199–214. doi:10.1348/000711003770480002.

  • Ortega, J. (1990). Numerical analysis: A second course. Philadelphia, PA: Society for Industrial and Academic Press.

  • Oud, J. H. L. (2007). Comparison of four procedures to estimate the damped linear differential oscillator for panel data. In J. Oud & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Oud, J. H. L., & Jansen, R. A. R. G. (2000). Continuous time state space modeling of panel data by means of SEM. Psychometrika, 65(2), 199–215.

  • Oud, J. H. L., & Singer, H. (Eds.). (2010). Special issue: Continuous time modeling of panel data, 62 (1).

  • Pickering, T. G., Shimbo, D., & Haas, D. (2006). Ambulatory blood-pressure monitoring. The New England Journal of Medicine, 354, 2368–2374.

  • Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2002). Numerical recipes in C. Cambridge: Cambridge University Press.

  • R Development Core Team. (2009). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved April, 2014, from http://www.R-project.org (ISBN: 3-900051-07-0).

  • Ralston, A., & Rabinowitz, P. (2001). A first course in numerical analysis (2nd ed.). Mineola, NY: Dover.

  • Ramsay, J. O., Hooker, G., Campbell, D., & Cao, J. (2007). Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of Royal Statistical Society: Series B, 69(5), 741–796.

  • Raudenbush, S. W., & Liu, X.-F. (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6(4), 387–401.

  • Särkkä, S. (2013). Bayesian filtering and smoothing. Hillsdale, NJ: Cambridge University Press.

  • SAS Institute Inc. (2008). SAS 9.2 Help and Documentation (Computer software manual). Cary, NC: SAS Institute Inc.

  • Sherwood, A., Steffen, P., Blumenthal, J., Kuhn, C., & Hinderliter, A. L. (2002). Nighttime blood pressure dipping: The role of the sympathetic nervous system. American Journal of Hypertension, 15, 111–118.

  • Sherwood, A., Thurston, R., Steffen, P., Blumenthal, J. A., Waugh, R. A., & Hinderliter, A. L. (2001). Blunted nighttime blood pressure dipping in postmenopausal women. American Journal of Hypertension, 14, 749–754.

  • Singer, H. (1992). The aliasing-phenomenon in visual terms. Journal of Mathematical Sociology, 14(1), 39–49.

  • Singer, H. (1995). Analytical score function for irregularly sampled continuous time stochastic processes with control variables and missing values. Econometric Theory, 11, 721–735. doi:10.1017/S0266466600009701.

  • Singer, H. (2002). Parameter estimation of nonlinear stochastic differential equations: Simulated maximum likelihood vs. extended Kalman filter and Itô–Taylor expansion. Journal of Computational and Graphical Statistics, 11, 972–995.

  • Singer, H. (2007). Stochastic differential equation models with sampled data. In K. van Montfort, J. H. L. Oud, & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences (pp. 73–106). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Singer, H. (2010). SEM modeling with singular moment matrices. Part I: ML-estimation of time series. The Journal of Mathematical Sociology, 34(4), 301–320. doi:10.1080/0022250X.2010.532259.

  • Singer, H. (2012). SEM modeling with singular moment matrices. Part II: ML-estimation of sampled stochastic differential equations. The Journal of Mathematical Sociology, 36(1), 22–43. doi:10.1080/0022250X.2010.532259.

  • Stone, A. A., & Shiffman, S. (1994). Ecological momentary assessment (EMA) in behavioral medicine. Annals of Behavioral Medicine, 16(3), 199–202.

  • Strogatz, S. H. (1994). Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Cambridge, MA: Westview.

  • Stuart, A. M., Voss, J., & Wiberg, P. (2004). Conditional path sampling of SDEs and the Langevin MCMC method. Communications in Mathematical Sciences, 2(4), 685–697.

  • Tanizaki, H. (1996). Nonlinear filters: Estimation and applications (2nd ed.). Berlin: Springer.

  • Thatcher, R. W. (1998). A predator–prey model of human cerebral development. In K. M. Newell & P. C. M. Molenaar (Eds.), Applications of nonlinear dynamics to developmental process modeling (pp. 87–128). Mahwah, NJ: Lawrence Erlbaum.

  • Wen, Z., Marsh, H. W., & Hau, K.-T. (2002). Interaction effects in growth modeling: A full model. Structural Equation Modeling, 9(1), 20–39.

  • Wu, H. (2005). Statistical methods for HIV dynamic studies in AIDS clinical trials. Statistical Methods in Medical Research, 14, 171–192.

  • Zhu, H., Gu, M., & Peterson, B. (2007). Maximum likelihood from spatial random effects models via the stochastic approximation expectation maximization algorithm. Statistics and Computing Archive, 17(2), 163–177.

  • Zhu, H. T., & Zhang, H. P. (2006). Generalized score test of homogeneity for mixed effects models. Annals of Statistics, 34, 1545–1569.

Acknowledgments

Funding for this study was provided by NSF Grant BCS-0826844, NIH Grants RR025747-01, P01CA142538-01, MH086633, EB005149-01, AG033387, and R01GM105004.

Author information

Corresponding author

Correspondence to Sy-Miin Chow.

Appendices

Appendix 1: Score Vector and Information Matrix of the Complete-Data Loglikelihood Function

The elements in \(\varvec{s}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})\) and \(\varvec{\mathrm {I}}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})\), namely, the score vector and information matrix of the complete-data loglikelihood function, are computed as

$$\begin{aligned} \varvec{s}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})&= \frac{\partial L(\varvec{\mathrm {Z}};\varvec{\theta })}{\partial \varvec{\theta }} = \displaystyle {\sum _{i=1}^{n}} \begin{bmatrix} \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\beta }}\Big )\\ \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }}}\Big )\\ \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \Big )\\ \Big (\sum _{j=1}^\mathrm{T}\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{\epsilon }}}\Big )\\ \Big (\frac{\partial L_{i}(\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{b}}}\Big ) \end{bmatrix}, \end{aligned}$$
(17)
$$\begin{aligned} \varvec{\mathrm {I}}_{\varvec{\mathrm {Z}}}(\varvec{\theta };\varvec{\mathrm {Z}})&= -\frac{\partial ^2 L(\varvec{\mathrm {Z}};\varvec{\theta })}{\partial \varvec{\theta } \partial \varvec{\theta }^\mathrm{T}}=- \displaystyle {\sum _{i=1}^{n}} \mathop {\mathrm {Diag}}\begin{bmatrix} \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b};\varvec{\theta }) }{\partial \varvec{\beta } \partial \varvec{\beta }^\mathrm{T}}\\ \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mu }}} \\ \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}\\ \sum _{j=1}^\mathrm{T}\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}| \varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{\epsilon }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}\\ \frac{\partial ^2 L_{i}(\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{b}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}} \end{bmatrix}, \end{aligned}$$
(18)

where \(\mathop {\mathrm {Diag}}(.)\) denotes a block diagonal matrix formed by stacking the appropriate second partial derivative matrices in its diagonal section and zero matrices in its off-diagonal sections.
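
The block-diagonal structure of Eq. (18) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' code: the helper names `block_diag` and `complete_data_information` are hypothetical, the numbers are made up, and each person's Hessian blocks are assumed to have already been summed over the time points \(j\).

```python
def block_diag(blocks):
    """Place the given matrices along the diagonal, with zeros elsewhere."""
    n_rows = sum(len(b) for b in blocks)
    n_cols = sum(len(b[0]) for b in blocks)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    r0 = c0 = 0
    for b in blocks:
        for r, row in enumerate(b):
            for c, val in enumerate(row):
                out[r0 + r][c0 + c] = float(val)
        r0 += len(b)
        c0 += len(b[0])
    return out


def complete_data_information(hessian_blocks_per_person):
    """I_Z = - sum_i Diag[blocks_i], mirroring the structure of Eq. (18)."""
    total = None
    for blocks in hessian_blocks_per_person:
        mat = block_diag(blocks)
        if total is None:
            total = [[0.0] * len(mat[0]) for _ in mat]
        for r, row in enumerate(mat):
            for c, val in enumerate(row):
                total[r][c] -= val  # minus sign from Eq. (18)
    return total


# Hypothetical second-derivative blocks for two persons (made-up numbers):
# a 2 x 2 block for beta and a 1 x 1 block for theta_b.
person_1 = [[[-2.0, 0.5], [0.5, -1.0]], [[-3.0]]]
person_2 = [[[-1.0, 0.0], [0.0, -2.0]], [[-1.0]]]
info = complete_data_information([person_1, person_2])
```

Because the off-diagonal sections are zero, the blocks can be stored and inverted separately in practice; the dense matrix is assembled here only to make the layout explicit.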

Using Heun’s method, with \(\tilde{\varvec{x}}_i(t_{i,j})\) as defined in Eq. (4) and \(\varvec{z}_i(t_{i,j}) = [\varvec{y}_i(t_{i, j})-\varvec{\mu }-\varvec{\mathrm {\Lambda }}\tilde{\varvec{x}}_i(t_{i, j})]\), first-order partial derivative elements of the complete-data loglikelihood are given by

$$\begin{aligned}
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\beta }}&= -\,\frac{1}{2} \Bigg \{ \frac{\partial \varvec{\theta }_{f,i}}{\partial \varvec{\beta }} \frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }_{f,i}} \frac{\partial \varvec{z}_i(t_{i,j})}{\partial \tilde{\varvec{x}}_i(t_{i,j})} \frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i,j})} \Bigg \} \nonumber \\
&= \varvec{\mathrm {H}}^\mathrm{T}_i \frac{\partial \tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}}\varvec{\mathrm {\Lambda }}^\mathrm{T} \varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1} \varvec{z}_i(t_{i,j}), \nonumber \\
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }}}&= -\,\frac{1}{2} \Bigg \{\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \frac{\partial \varvec{z}_i(t_{i,j})}{\partial \varvec{\mu }}\frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i,j})} \Bigg \} \nonumber \\
&= \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1}\varvec{z}_i(t_{i,j}),\nonumber \\
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}}&= -\,\frac{1}{2} \Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \frac{\partial \varvec{z}_i(t_{i,j})}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}\frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i,j})} \Bigg \} \nonumber \\
&= \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \mathop {\mathrm {vec}}\Big [ \varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1}\varvec{z}_i(t_{i,j})\tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T}\Big ],\nonumber \\
\frac{\partial L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\epsilon }}}&= -\,\frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }})}{\partial \varvec{\theta }_{\varvec{\epsilon }}} \Bigg \{\frac{\partial \varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1} \varvec{z}_i(t_{i,j})}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }})} + \frac{\partial \log |\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}|}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }})} \Bigg \} \nonumber \\
&= \frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }})}{\partial \varvec{\theta }_{\varvec{\epsilon }}}\Bigg \{ [\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1} \otimes \varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}^{-1}] \mathop {\mathrm {vec}}[\varvec{z}_i(t_{i,j}) \varvec{z}_i(t_{i,j})^\mathrm{T}-\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}] \Bigg \}, \nonumber \\
\frac{\partial L_{i}(\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta }_{\varvec{b}}}&= -\,\frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})}{\partial \varvec{\theta }_{\varvec{b}}} \Bigg \{ \frac{\partial \varvec{b}^\mathrm{T}_i \varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1} \varvec{b}_i}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})} + \frac{\partial \log |\varvec{\mathrm {\Sigma }}_{\varvec{b}}|}{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})} \Bigg \} \nonumber \\
&= \frac{1}{2} \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma }}_{\varvec{b}})}{\partial \varvec{\theta }_{\varvec{b}}} \Bigg \{ [\varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1} \otimes \varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1}] \mathop {\mathrm {vec}}(\varvec{b}_i \varvec{b}_i^\mathrm{T}-\varvec{\mathrm {\Sigma }}_{\varvec{b}}) \Bigg \},
\end{aligned}$$
(19)

where the \(\mathop {\mathrm {vec}}(\varvec{\mathrm {W}})\) operator stacks the columns of the \(m \times n\) matrix \(\varvec{\mathrm {W}}\) into an \(mn\)-dimensional column vector and \(\frac{\partial \tilde{\varvec{x}}_i(t_{i, j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}}\) is dictated by the dynamic model under consideration. Terms such as \(\frac{\partial \varvec{\mu } }{\partial \varvec{\theta }_{\varvec{\mu }}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{\partial \varvec{\theta }_{\varvec{\epsilon }}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\eta }}}})}{\partial \varvec{\theta }_{\varvec{\eta }}}\), and \(\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{\partial \varvec{\theta }_{\varvec{b}}}\) also depend on the model specification adopted in a particular application. Cases where some elements of \(\varvec{\mathrm {\Lambda }}, \varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}, \varvec{\mathrm {\Sigma }}_{\varvec{\eta }}\), and \(\varvec{\mathrm {\Sigma }}_{\varvec{b}}\) are fixed at known values can be readily accommodated through appropriate specification of these matrices of partial derivatives.
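As a numerical illustration of the last expression in (19), the following sketch builds the \(\mathop {\mathrm {vec}}\) operator and the matrix \(\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{\partial \varvec{\theta }_{\varvec{b}}}\) for a hypothetical \(2 \times 2\) random-effects covariance matrix and compares the analytic score with a central finite difference. The parameterization, the function names, and the numerical values are assumptions made for illustration only and are not taken from the article. Fixing an element at a known value is handled here by dropping the corresponding row of the derivative matrix.

```python
import numpy as np

def vec(W):
    """Stack the columns of W into a single vector (column-major order)."""
    return np.asarray(W).reshape(-1, order="F")

def sigma_from_theta(theta):
    """Hypothetical 2x2 covariance with free parameters (s11, s21, s22)."""
    s11, s21, s22 = theta
    return np.array([[s11, s21], [s21, s22]])

# d vec(Sigma_b) / d theta_b, laid out as (number of parameters) x (n * n).
# Row k marks the entries of vec(Sigma_b) that move with theta[k].
D = np.array([
    [1.0, 0.0, 0.0, 0.0],  # s11
    [0.0, 1.0, 1.0, 0.0],  # s21 appears in two positions of vec(Sigma_b)
    [0.0, 0.0, 0.0, 1.0],  # s22
])

def loglik_b(theta, b):
    """L_i(b; theta) up to an additive constant."""
    Sigma = sigma_from_theta(theta)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (b @ np.linalg.solve(Sigma, b) + logdet)

def score_b(theta, b, D):
    """Analytic score: 0.5 * D * (Sinv kron Sinv) * vec(b b' - Sigma)."""
    Sigma = sigma_from_theta(theta)
    Sinv = np.linalg.inv(Sigma)
    return 0.5 * D @ (np.kron(Sinv, Sinv) @ vec(np.outer(b, b) - Sigma))

def numeric_grad(f, theta, h=1e-6):
    """Central finite-difference gradient, used only as a check."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        grad[k] = (f(theta + step) - f(theta - step)) / (2.0 * h)
    return grad

theta = np.array([1.5, 0.3, 0.8])   # illustrative values
b = np.array([0.7, -0.4])           # illustrative random-effect draw

analytic = score_b(theta, b, D)
numeric = numeric_grad(lambda th: loglik_b(th, b), theta)
print(np.allclose(analytic, numeric, atol=1e-6))

# If s21 were fixed at a known value, its row is simply removed.
D_fixed = D[[0, 2], :]
score_fixed = score_b(theta, b, D_fixed)
```

The same pattern would apply to \(\varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}\) with \(\varvec{z}_i(t_{i,j})\) in place of \(\varvec{b}_i\).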

Second-order partial derivative elements of the complete-data loglikelihood function are computed as

$$\begin{aligned} \frac{\partial ^2 L_{i,j}({\varvec{\mathrm {Y}}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta } \partial {\varvec{\beta }}^\mathrm{T}}&= \Big (\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}^\mathrm{T}\otimes \varvec{\mathrm {I}}_{p_{\beta }} \Big ) \frac{\partial \mathop {\mathrm {vec}}}{\partial {\varvec{\beta }}^\mathrm{T}}\Bigg \{\varvec{\mathrm {H}}^\mathrm{T}_i\frac{\partial \tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}} \Bigg \}\\&\quad +\; \Bigg (\varvec{\mathrm {H}}^\mathrm{T}_i \frac{\partial {\tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}}{\partial \varvec{\theta }_{f,i}}\Bigg ) \frac{\partial \varvec{\mathrm {\Lambda }}^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \varvec{z}_i(t_{i,j})}{\partial \varvec{z}_i(t_{i, j})^\mathrm{T}}\frac{\partial \varvec{z}_i(t_{i,j})}{\partial \tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T}} \frac{\partial \tilde{\varvec{x}}_i(t_{i, j})}{\partial {\varvec{\theta }}^\mathrm{T}_{f,i}} \frac{\partial \varvec{\theta }_{f,i}}{\partial {\varvec{\beta }}^\mathrm{T}} \\&= \Big (\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}\otimes \varvec{\mathrm {I}}_{p_{\beta }} \Big ) (\varvec{\mathrm {I}}_{n_x} \otimes \varvec{\mathrm {H}}^\mathrm{T}_i)\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta _{f,i}}^\mathrm{T}}\Bigg \{\frac{\partial \tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}{\partial {\varvec{\theta }}_{f,i}} \Bigg \}\varvec{\mathrm {H}}_i\\&\quad -\; \varvec{\mathrm {H}}^\mathrm{T}_i \frac{\partial \tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}}\varvec{\mathrm {\Lambda }}^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}}\varvec{\mathrm {H}}_i,\\ \frac{\partial ^2 
L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mu }}}&=\Big (\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \otimes \varvec{\mathrm {I}}_{p_{\mu }}\Big )\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mu }}}\Bigg \{\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}}\Bigg \} - \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}^\mathrm{T}}, \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}&= \Bigg \{\mathop {\mathrm {vec}}[\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j})\tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}]^\mathrm{T}\otimes \varvec{\mathrm {I}}_{p_{\Lambda }}\Bigg \}\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}\Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \Bigg \} \\&\quad -\,\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} [\tilde{\varvec{x}}_i(t_{i,j})\tilde{\varvec{x}_i}(t_{i,j})^\mathrm{T}\otimes \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}] \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}}, \end{aligned}$$
$$\begin{aligned} \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\epsilon }} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}&= \frac{1}{2}\Big [\mathop {\mathrm {vec}}\Big (\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j})\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \Big )^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{\epsilon }} \Big ] \frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}}} \Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } \Bigg \} \\&\quad -\;\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } [\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{z}_i(t_{i,j})\varvec{z}_i(t_{i,j})^\mathrm{T}\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} ]\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}} }\\&\quad -\;\frac{1}{2}[\mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma ^{-1}_{\varvec{\epsilon }}}})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{\epsilon }}]\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}}}\Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } \Bigg \}\\&\quad +\;\frac{1}{2}\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }_{\varvec{\mathrm {\epsilon }}} } (\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} \otimes \varvec{\mathrm {\Sigma 
_{\varvec{\epsilon }}}}^{-1})\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\epsilon }}} },\\ \end{aligned}$$
$$\begin{aligned} \frac{\partial ^2 L_{i}(\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{b}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}&=\frac{1}{2}\Big [\mathop {\mathrm {vec}}\Big (\varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1}\varvec{b}_i\varvec{b}^\mathrm{T}_i \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \Big )^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{b}} \Big ] \frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}} \Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } \Bigg \} \nonumber \\&\quad -\; \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } \left[ \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1}\varvec{b}_i\varvec{b}^\mathrm{T}_i \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \right] \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\nonumber \\&\quad -\;\frac{1}{2} [\mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma ^{-1}_{\varvec{b}}}})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{p_{b}}]\frac{\partial \mathop {\mathrm {vec}}}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\Bigg \{\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } \Bigg \}\nonumber \\&\quad +\; \frac{1}{2}\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }_{\varvec{b}} } (\varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1} \otimes \varvec{\mathrm {\Sigma _{\varvec{b}}}}^{-1})\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{ \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \nonumber \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta 
_{\varvec{\mu }}} \partial \varvec{\theta _{\varvec{\mathrm {\Lambda }}}}^\mathrm{T}}&= -\, \frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1} [\tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{n_y} ] \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }^\mathrm{T}_{\varvec{\mathrm {\Lambda }}}},\nonumber \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mu }}\partial \varvec{\beta }^\mathrm{T}}&= -\,\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\frac{\partial \varvec{z}_i(t_{i,j})}{\partial \tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T}} \frac{\partial \tilde{\varvec{x}}_i(t_{i, j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}} \frac{\partial \varvec{\theta }_{f,i}}{\partial \varvec{\beta }^\mathrm{T}},\nonumber \\&=-\,\frac{\partial \varvec{\mu }}{\partial \varvec{\theta }_{\varvec{\mu }}} \varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}}^{-1}\varvec{\mathrm {\Lambda }}\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}}\varvec{\mathrm {H}}_i,\nonumber \\ \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}\partial \varvec{\mathrm {\beta }}^\mathrm{T}}&=\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}} \Bigg \{ (\varvec{\mathrm {I}}_{n_x}\otimes \varvec{\mathrm {\Sigma }}_{\epsilon }^{-1}) \Big [ -(\tilde{\varvec{x}}_i(t_{i,j})^\mathrm{T} \otimes \varvec{\mathrm {I}}_{n_y})\varvec{\mathrm {\Lambda }}\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}}\varvec{\mathrm {H}}_i \nonumber \\&\quad +\; (\varvec{\mathrm {I}}_{n_x} \otimes \varvec{z}_i(t_{i,j})) 
\frac{\partial \tilde{\varvec{x}}_i(t_{i,j})}{\partial \varvec{\theta }^\mathrm{T}_{f,i}} \varvec{\mathrm {H}}_i\Big ]\Bigg \}. \end{aligned}$$
(20)

Other second-order derivative elements are equal to null matrices, including \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\theta }_{\varvec{\mu }}}},\) \( \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\theta }_{\varvec{\mathrm {\Lambda }}}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\theta }_{\varvec{\epsilon }}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta _{\varvec{\mu }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta _{\varvec{\mathrm {\Lambda }}}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\) and \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b};\varvec{\theta })}{\partial \varvec{\theta _{\varvec{\mathrm {\epsilon }}}}\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\). 
Under the assumption that the model is correctly specified, the elements in \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta _{\varvec{\mathrm {\Lambda }}}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta _{\varvec{\mu }}} \partial \varvec{\theta _{\varvec{\epsilon }}}^\mathrm{T}}\) and \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\beta } \partial \varvec{\theta _{\varvec{\epsilon }}}^\mathrm{T}}\) are close to zero at the MLEs of the modeling parameters. These elements are not shown here and are set to null matrices in the proposed estimation algorithm, which stabilizes the algorithm when initial parameter estimates are far from the MLEs. In contrast, the off-diagonal elements shown in the last three equations in (20) are non-zero even near the MLEs. Nevertheless, setting all the off-diagonal blocks of the information matrix of the complete-data loglikelihood function to null matrices helps stabilize the algorithm when this information matrix is not positive definite during optimization. In our preliminary simulations, we verified that setting these three matrices to null matrices, rather than using the forms shown in Eq. (20), reduced numerical problems in the optimization process while having negligible effects on the final point and SE estimates, because this matrix is not used directly as the Fisher information matrix to derive the final SE estimates. We thus set all the off-diagonal elements, including the last three matrices shown in Eq. (20), to null matrices.
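The effect of nulling the off-diagonal blocks can be illustrated with a small sketch. The matrix, the block sizes, and the helper names below are hypothetical and chosen only to show how a matrix that is not positive definite can become positive definite once its off-diagonal blocks are replaced by null matrices, provided that each diagonal block is itself positive definite.

```python
import numpy as np

def block_diagonalize(M, block_sizes):
    """Keep the diagonal blocks of M and set every off-diagonal block to zero."""
    M = np.asarray(M, dtype=float)
    out = np.zeros_like(M)
    start = 0
    for size in block_sizes:
        stop = start + size
        out[start:stop, start:stop] = M[start:stop, start:stop]
        start = stop
    return out

def is_positive_definite(M):
    """Check positive definiteness of a symmetric matrix via its eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(M) > 0))

# Illustrative symmetric matrix: a 2x2 block and a 1x1 block, with a large
# cross term that makes the full matrix indefinite.
info = np.array([
    [2.0, 0.5, 3.0],
    [0.5, 2.0, 0.0],
    [3.0, 0.0, 2.0],
])
info_bd = block_diagonalize(info, block_sizes=[2, 1])

print(is_positive_definite(info))     # full matrix
print(is_positive_definite(info_bd))  # block-diagonal version
```

In this toy case the full matrix has a negative determinant, so it cannot be positive definite, whereas the block-diagonal version inherits positive definiteness from its diagonal blocks.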

Appendix 2: Sampling From \(p(\varvec{b}| \varvec{\mathrm {Y}};\varvec{\theta }^{(m-1)})\)

The superscript \((m-1)\) of \(\varvec{\theta }\) is temporarily suppressed for notational simplicity. It can be shown that \(p(\varvec{b}| \varvec{\mathrm {Y}};\varvec{\theta })=\prod _{i=1}^{n} p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\), where \(p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\) is non-standard and cannot be sampled directly. Specifically, \(p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta }) \propto p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}})p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })\), in which \(p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })\) is given by

$$\begin{aligned} p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta }) = \prod _{j=1}^\mathrm{T} p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta }), \end{aligned}$$
(21)

where \(p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })\) is a multivariate normal density function with mean \(\varvec{\mu }+\varvec{\mathrm {\Lambda }}\tilde{\varvec{x}}_i(t_{i, j})\) and covariance matrix \(\varvec{\mathrm {\Sigma }}_{\epsilon }\). As \(\varvec{b}_i\) is involved in the nonlinear \(f(\cdot )\) in \(p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })\), \(p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\) is usually non-standard. To sample from \(p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\), we adopt a Metropolis-Hastings (MH) algorithm as follows. At the \(m\)th iteration with current values in \(\varvec{b}_i^{(m)}\), a new candidate \(\varvec{b}_i\) is generated from a proposal distribution, chosen to be the normal distribution \(\mathrm {N}(\varvec{b}_i^{(m)},\sigma _{b}^{2} \varvec{\mathrm {\Omega }}_{bi})\), where \(\sigma _{b}^{2}\) is a scaling constant, \(\varvec{\mathrm {\Omega }}_{bi} = (\varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1}+\sum _{j=1}^\mathrm{T}\varvec{\mathrm {D}}_{bit}^\mathrm{T}\varvec{\mathrm {\Sigma }}_{\epsilon }^{-1} \varvec{\mathrm {D}}_{bit})^{-1}\), \(\varvec{\mathrm {D}}_{bit}=\partial \tilde{\varvec{x}}_i(t_{i, j})/\partial \varvec{b}_{i}^\mathrm{T} |_{\varvec{b}_i=\varvec{b}_i^{*}}\), and \(\varvec{b}_i^{*}\) is a fixed value with high \(p(\varvec{b}_i^{*}| \varvec{\mathrm {Y}};\varvec{\theta })\). One possibility is to use the mean of \(p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}})\) as \(\varvec{b}_i^{*}\), which we have found to lead to good performance. The new \(\varvec{b}_i\) is accepted with probability

$$\begin{aligned} \min \left\{ 1,\frac{p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}}) \prod _{j=1}^\mathrm{T} p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })}{p(\varvec{b}^{(m)}_i;\varvec{\theta }_{\varvec{b}}) \prod _{j=1}^\mathrm{T} p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i^{(m)}; \varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })} \right\} . \end{aligned}$$
(22)

The scaling constant, \(\sigma _{b}^2\), can be chosen such that the average acceptance rate is approximately 0.4.
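The sampling scheme of this appendix can be sketched as a random-walk MH sampler. The example below is a minimal illustration under strong simplifying assumptions and is not the implementation used in the article: a scalar random effect, a toy closed-form mean function \(\exp (b) t\) standing in for the numerical ODE solution, a fixed scaling constant of 1, and hypothetical names and data throughout. Because the normal proposal is symmetric, the acceptance ratio reduces to the ratio of prior times likelihood, as in Eq. (22).

```python
import numpy as np

def mh_sample(log_target, b0, Omega, scale2, n_iter, rng):
    """Random-walk MH with proposal N(b_current, scale2 * Omega)."""
    b = np.atleast_1d(np.asarray(b0, dtype=float))
    chol = np.linalg.cholesky(scale2 * np.atleast_2d(Omega))
    logp = log_target(b)
    draws = np.empty((n_iter, b.size))
    accepted = 0
    for m in range(n_iter):
        cand = b + chol @ rng.standard_normal(b.size)
        logp_cand = log_target(cand)
        # Accept with probability min{1, target(cand) / target(current)}.
        if np.log(rng.uniform()) < logp_cand - logp:
            b, logp = cand, logp_cand
            accepted += 1
        draws[m] = b
    return draws, accepted / n_iter

# Toy data (assumed): y_j = exp(b) * t_j + noise, with true b = 0.5.
rng = np.random.default_rng(0)
t = np.linspace(0.1, 1.0, 10)
sigma_eps, sigma_b = 0.1, 1.0
y = np.exp(0.5) * t + sigma_eps * rng.standard_normal(t.size)

def log_target(b):
    """Log of prior times likelihood, up to an additive constant."""
    resid = y - np.exp(b[0]) * t
    return -0.5 * np.sum(resid**2) / sigma_eps**2 - 0.5 * b[0]**2 / sigma_b**2

# Proposal covariance evaluated at b* = prior mean (0): the derivative of the
# toy mean function with respect to b plays the role of D_bit.
D = np.exp(0.0) * t
Omega = np.array([[1.0 / (1.0 / sigma_b**2 + np.sum(D**2) / sigma_eps**2)]])

draws, rate = mh_sample(log_target, [0.0], Omega, scale2=1.0, n_iter=2000, rng=rng)
print("acceptance rate:", rate)
print("posterior mean after burn-in:", draws[500:].mean())
```

In practice, `scale2` would be adjusted over pilot runs (larger values lower the acceptance rate, smaller values raise it) until the average acceptance rate is near the target of approximately 0.4.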

Cite this article

Chow, SM., Lu, Z., Sherwood, A. et al. Fitting Nonlinear Ordinary Differential Equation Models with Random Effects and Unknown Initial Conditions Using the Stochastic Approximation Expectation–Maximization (SAEM) Algorithm. Psychometrika 81, 102–134 (2016). https://doi.org/10.1007/s11336-014-9431-z
