Abstract
The past decade has evidenced the increased prevalence of irregularly spaced longitudinal data in social sciences. Clearly lacking, however, are modeling tools that allow researchers to fit dynamic models to irregularly spaced data, particularly data that show nonlinearity and heterogeneity in dynamical structures. We consider the issue of fitting multivariate nonlinear differential equation models with random effects and unknown initial conditions to irregularly spaced data. A stochastic approximation expectation–maximization algorithm is proposed and its performance is evaluated using a benchmark nonlinear dynamical systems model, namely, the Van der Pol oscillator equations. The empirical utility of the proposed technique is illustrated using a set of 24-h ambulatory cardiovascular data from 168 men and women. Pertinent methodological challenges and unresolved issues are discussed.
Notes
The local truncation errors of a numerical solver at each time point are equal to \(\varvec{c} \Delta ^{g+1}_{i, j}\), where \(g\) is the order of the ODE solver and \(\varvec{c}\) is a vector of constants that depends on elements such as the differentials of the ODEs (for further details see Press, Teukolsky, Vetterling, & Flannery, 2002; Ralston & Rabinowitz, 2001).
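The \(\Delta^{g+1}\) scaling can be checked numerically. The following is a minimal sketch, assuming the scalar test equation \(dx/dt = -x\) with \(x(0)=1\) (chosen here only because its exact solution is known; it is not a model from this article) and a single step of the second-order Heun's method. The function names are illustrative.

```python
import math

def heun_step(f, t, x, h):
    """One step of Heun's method (order g = 2) for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h, x + h * k1)
    return x + 0.5 * h * (k1 + k2)

def local_error(h):
    """Error after a single step of size h on dx/dt = -x, x(0) = 1."""
    f = lambda t, x: -x
    return abs(heun_step(f, 0.0, 1.0, h) - math.exp(-h))

def observed_order(h):
    """Estimate the exponent of h in the local error by halving the step."""
    return math.log2(local_error(h) / local_error(h / 2))

print(observed_order(0.1))  # close to g + 1 = 3 for a second-order solver
```

Halving the step reduces the one-step error by a factor of roughly \(2^{g+1} = 8\), which is what the estimated exponent reflects.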
In contrast, in cases involving SDEs, \(p(\tilde{\varvec{\mathrm {X}}}|\varvec{b}; \varvec{\theta })\) is not fixed even when \(\varvec{b}\) is known and there is considerable increase in estimation complexity.
In the present context, we specify the gain constant to be \(\gamma ^{(m)} = a_2/(m^{a_1}+a_2-1), \quad m=1, \ldots , K_1+K_2,\) where the real number \(a_1\) and the integer \(a_2\) are preassigned. In stage 1, \(a_1\) and \(a_2\) are selected such that the gain constant takes relatively large values, which prevents the SAEM algorithm from settling into local minima too quickly. In stage 2, the gain constant is slowly tapered toward zero to allow the algorithm to stabilize toward a final set of estimates (e.g., by setting \(a_1 \in (.5, 1]\) to be close to 1, and \(a_2\) to be a small integer, say, \(a_2=2\)). The transition from stage 1 to 2 is governed by another predefined criterion function (for details see Gu & Zhu, 2001; Zhu, Gu, & Peterson, 2007).
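As a small illustration of the gain sequence defined above, the sketch below evaluates \(\gamma^{(m)}\) for given \(a_1\) and \(a_2\). The function name is illustrative, and the particular values printed are the stage 2 example from the text (\(a_1 = 1\), \(a_2 = 2\)); no stage 1 values are assumed.

```python
def gain_constant(m, a1, a2):
    """Gain constant gamma^(m) = a2 / (m**a1 + a2 - 1) for iteration m >= 1."""
    return a2 / (m ** a1 + a2 - 1)

# With a1 = 1 and a2 = 2 the sequence is 2 / (m + 1): it starts at 1
# and tapers slowly toward zero.
for m in (1, 2, 3, 10, 100):
    print(m, gain_constant(m, a1=1.0, a2=2))
```

For any admissible \(a_1\) and \(a_2\) the first gain equals 1, and a smaller \(a_1\) or larger \(a_2\) keeps the gain large for more iterations.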
In the present study, we define the stopping rule to be
$$\begin{aligned} K_2&= \inf \Bigg \{m: \tilde{\varvec{s}}^{'(m)}_{\varvec{\mathrm {Y}}} \Big [\tilde{\varvec{\mathrm {I}}}^{(m)}_{\varvec{\mathrm {Y}}}\Big ]^{-1} \tilde{\varvec{s}}^{(m)}_{\varvec{\mathrm {Y}}} + \mathop {\mathrm {tr}}\Big \{ \Big [\tilde{\varvec{\mathrm {I}}}^{(m)}_{\varvec{\mathrm {Y}}}\Big ]^{-1}\hat{\varvec{\mathrm {\Sigma }}} \Big \}/m \le \text { some small constant}\Bigg \}. \end{aligned}$$ (13)
Here \(\hat{\varvec{\mathrm {\Sigma }}}\) denotes an estimate of the covariance matrix of Monte Carlo error. In practice, we used the sample covariance matrix of \(\overline{\varvec{s}}_{\varvec{\mathrm {Z}}}^{(m)}\) as a rough estimate of \(\hat{\varvec{\mathrm {\Sigma }}}\).
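The quantity compared against the small constant in the stopping rule can be computed as in the sketch below. It assumes the score estimate, information matrix estimate, and Monte Carlo error covariance are already available as NumPy arrays; the function and argument names are illustrative and the tolerance is left to the user.

```python
import numpy as np

def stopping_statistic(score, info, mc_cov, m):
    """Return s' I^{-1} s + tr(I^{-1} Sigma_hat) / m at iteration m.

    score  : estimated score vector, shape (p,)
    info   : estimated information matrix, shape (p, p)
    mc_cov : estimated covariance of the Monte Carlo error, shape (p, p)
    """
    score = np.asarray(score, dtype=float)
    info = np.asarray(info, dtype=float)
    mc_cov = np.asarray(mc_cov, dtype=float)
    # Solve linear systems instead of forming the inverse explicitly.
    quad_form = float(score @ np.linalg.solve(info, score))
    trace_term = float(np.trace(np.linalg.solve(info, mc_cov))) / m
    return quad_form + trace_term

stat = stopping_statistic([0.1, 0.2], 2.0 * np.eye(2), np.eye(2), m=100)
print(stat)  # the algorithm would stop once this falls below the tolerance
```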
This decision was made because there were insufficient repeated measurements to estimate this parameter accurately, especially for individuals with a diurnal cycle that is longer than 24 h, or individuals with less than 24 h worth of measurements.
Because the model fitting procedures were based on the second-order Heun’s method whereas the true data were generated using a fourth-order Runge–Kutta approach, the errors incurred in approximating the trajectories from the fourth-order solver by means of a second-order solver were expected to lead to some biases in the point estimates. Thus, the coverage performance of the confidence intervals, as assessed, e.g., by the proportion of 95% CIs covering each true population parameter value, can be expected to deviate from the nominal coverage rate of 0.95.
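The size of this solver mismatch can be gauged with a short sketch. It assumes the textbook form of the Van der Pol oscillator, \(d^2x/dt^2 = \mu (1 - x^2)\,dx/dt - x\), with an illustrative \(\mu = 1\), initial state \((1, 0)\), and equally spaced steps; the parameterization and settings of the simulation study itself are not reproduced here.

```python
import numpy as np

def van_der_pol(x, mu=1.0):
    """Textbook Van der Pol oscillator written as a first-order system."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])

def heun_step(f, x, h):
    k1 = f(x)
    k2 = f(x + h * k1)
    return x + 0.5 * h * (k1 + k2)

def rk4_step(f, x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(step, f, x0, h, n_steps):
    """Apply a one-step solver repeatedly and return the whole trajectory."""
    path = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        path.append(step(f, path[-1], h))
    return np.array(path)

def max_solver_gap(h, t_end=5.0, x0=(1.0, 0.0)):
    """Largest absolute difference between Heun and RK4 trajectories."""
    n_steps = int(round(t_end / h))
    heun = integrate(heun_step, van_der_pol, x0, h, n_steps)
    rk4 = integrate(rk4_step, van_der_pol, x0, h, n_steps)
    return float(np.max(np.abs(heun - rk4)))

print(max_solver_gap(0.1), max_solver_gap(0.05))
```

The gap shrinks as the step size decreases, which is consistent with the expectation that the resulting bias depends on how coarsely the data are spaced.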
To ensure the positive definiteness of \(\varvec{\mathrm {\Sigma }}_b\), we chose to estimate the lower triangular entries of \(\varvec{\mathrm {L}}\) and the diagonal entries of \(\varvec{\mathrm {D}}\) in the \(\varvec{\mathrm {\Sigma }}_b = \varvec{\mathrm {L}}\varvec{\mathrm {D}}\varvec{\mathrm {L}}^\mathrm{T}\) decomposition, with the constraint that the diagonal elements of \(\varvec{\mathrm {D}}\) were positive (Anderson, 2003). These constraints might have affected the accuracy of the point and SE estimates for the initial condition variance–covariance parameters as well.
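One way to see why this parameterization guarantees positive definiteness is the sketch below. It assumes \(\varvec{\mathrm {L}}\) is unit lower triangular and enforces positivity of the diagonal of \(\varvec{\mathrm {D}}\) through an exponential transform; the exponential is an assumption made for illustration, since the text only states that the diagonal elements were constrained to be positive.

```python
import numpy as np

def build_sigma_b(l_free, log_d):
    """Assemble Sigma_b = L D L^T from unconstrained parameters.

    l_free : strictly lower-triangular entries of L, in row-major order
    log_d  : logs of the diagonal entries of D (assumed transform), so
             that exp(log_d) > 0 for any real input
    """
    log_d = np.asarray(log_d, dtype=float)
    q = log_d.size
    L = np.eye(q)                         # unit diagonal
    L[np.tril_indices(q, k=-1)] = l_free  # fill entries below the diagonal
    D = np.diag(np.exp(log_d))
    return L @ D @ L.T

sigma_b = build_sigma_b([0.5, -1.0, 2.0], [0.0, -1.0, 1.0])
print(np.linalg.eigvalsh(sigma_b))  # all eigenvalues are positive
```

Because \(\varvec{\mathrm {L}}\) is invertible and \(\varvec{\mathrm {D}}\) has a positive diagonal, the product is positive definite for any real values of the free parameters.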
References
Ait-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. The Annals of Statistics, 36(2), 906–937.
Anderson, T. W. (2003). An introduction to multivariate statistical analysis (3rd ed.). New York, NY: Wiley.
Arminger, G. (1986). Linear stochastic differential equation models for panel data with unobserved variables. In N. Tuma (Ed.), Sociological methodology (pp. 187–212). San Francisco: Jossey-Bass.
Bereiter, C. (1963). Some persisting dilemmas in the measurement of change. In C. W. Harris (Ed.), Problems in measuring change (pp. 3–20). Madison, WI: University of Wisconsin Press.
Beskos, A., Papaspiliopoulos, O., & Roberts, G. (2009). Monte Carlo maximum likelihood estimation for discretely observed diffusion processes. The Annals of Statistics, 37(1), 223–245.
Beskos, A., Papaspiliopoulos, O., Roberts, G., & Fearnhead, P. (2006). Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 333–382.
Boker, S. M., & Graham, J. (1998). A dynamical systems analysis of adolescent substance abuse. Multivariate Behavioral Research, 33, 479–507.
Boker, S. M., & Nesselroade, J. R. (2002). A method for modeling the intrinsic dynamics of intraindividual variability: Recovering the parameters of simulated oscillators in multi-wave panel data. Multivariate Behavioral Research, 37, 127–160.
Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: Capturing life as it is lived. Annual Review of Psychology, 54, 579–616.
Brown, E. N., & Luithardt, H. (1999). Statistical model building and model criticism for human circadian data. Journal of Biological Rhythms, 14, 609–616.
Brown, E. N., Luithardt, H., & Czeisler, C. A. (2000). A statistical model of the human core-temperature circadian rhythm. American Journal of Physiology, Endocrinology and Metabolism, 279, 669–683.
Browne, M. W., & du Toit, H. C. (1991). Models for learning data. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 47–68). Washington, D.C.: American Psychological Association.
Cao, J., Huang, J. Z., & Wu, H. (2012). Penalized nonlinear least squares estimation of time-varying parameters in ordinary differential equations. Journal of Computational and Graphical Statistics, 21(1), 42–56. doi:10.1198/jcgs.2011.10021.
Carels, R. A., Blumenthal, J. A., & Sherwood, A. (2000). Emotional responsivity during daily life: Relationship to psychosocial functioning and ambulatory blood pressure. International Journal of Psychophysiology, 36, 25–33.
Carlin, B. P., Gelfand, A., & Smith, A. (1992). Hierarchical Bayesian analysis of changepoints problems. Applied Statistics, 41, 389–405.
Chow, S.-M., Ferrer, E., & Nesselroade, J. R. (2007). An unscented Kalman filter approach to the estimation of nonlinear dynamical systems models. Multivariate Behavioral Research, 42(2), 283–321.
Chow, S.-M., Grimm, K. J., Guillaume, F., Dolan, C. V., & McArdle, J. J. (2013). Regime switching bivariate dual change score model. Multivariate Behavioral Research, 48(4), 463–502.
Chow, S.-M., Ho, M.-H. R., Hamaker, E. J., & Dolan, C. V. (2010). Equivalences and differences between structural equation and state-space modeling frameworks. Structural Equation Modeling, 17, 303–332.
Chow, S.-M., & Nesselroade, J. R. (2004). General slowing or decreased inhibition? Mathematical models of age differences in cognitive functioning. Journals of Gerontology Series B—Psychological Sciences & Social Sciences, 59B(3), 101–109.
Chow, S.-M., Tang, N., Yuan, Y., Song, X., & Zhu, H. (2011). Bayesian estimation of semiparametric dynamic latent variable models using the Dirichlet process prior. British Journal of Mathematical and Statistical Psychology, 64(1), 69–106.
Chow, S.-M., & Zhang, G. (2013). Nonlinear regime-switching state-space (RSSS) models. Psychometrika: Application Reviews and Case Studies, 78(4), 740–768.
Cronbach, L. J., & Furby, L. (1970). How should we measure “change”—or should we? Psychological Bulletin, 74(1), 68–80.
Cudeck, R., & Klebe, K. J. (2002). Multiphase mixed-effects models for repeated measures data. Psychological Methods, 7(1), 41–6.
Dembo, A., & Zeitouni, O. (1986). Parameter estimation of partially observed continuous time stochastic processes via the EM algorithm. Stochastic Processes and Their Applications, 23, 91–113.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Diebolt, J., & Celeux, G. (1993). Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions. Communications in Statistics B—Stochastic Models, 9(4), 599–613.
Donnet, S., & Samson, A. (2007). Estimation of parameters in incomplete data models defined by dynamical systems. Journal of Statistical Planning and Inference, 137, 2815–2831.
Du Toit, S. H. C., & Browne, M. W. (2001). The covariance structure of a vector ARMA time series. Structural equation modeling: Present and future (pp. 279–314). Chicago: Scientific Software International.
Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Durbin, J., & Koopman, S. J. (2001). Time series analysis by state space methods. New York, NY: Oxford University Press.
Gates, K. M., & Molenaar, P. C. M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. Neuroimage, 63, 310–319.
Geweke, J., & Tanizaki, H. (2001). Bayesian estimation of state-space models using the Metropolis–Hastings algorithm within Gibbs sampling. Computational Statistics & Data Analysis, 37, 151–170.
Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEEE Proceedings-F, Radar and Signal Processing, 140(2), 107–113.
Gu, M. G., & Zhu, H. T. (2001). Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. Journal of the Royal Statistical Society, Series B, 63, 339–355.
Hairer, M., Stuart, A. M., Voss, J., & Wiberg, P. (2005). Analysis of SPDEs arising in path sampling. Part I: The Gaussian case. Communications in Mathematical Sciences, 3(4), 587–603.
Hale, J. K., & Koçak, H. (1991). Dynamics and bifurcation. New York, NY: Springer.
Harris, C. W. (Ed.). (1963). Problems in measuring change. Madison, WI: University of Wisconsin Press.
Harvey, A. C., & Souza, R. C. (1987). Assessing and modelling the cyclical behaviour of rainfall in northeast Brazil. Journal of Climate and Applied Meteorology, 26, 1317–1322.
Hürzeler, M., & Künsch, H. (1998). Monte Carlo approximations for general state-space models. Journal of Computational and Graphical Statistics, 7, 175–193.
Jones, R. H. (1984). Fitting multivariate models to unequally spaced data. In E. Parzen (Ed.), Time series analysis of irregularly observed data (Vol. 25, pp. 158–188). New York, NY: Springer.
Jones, R. H. (1993). Longitudinal data with serial correlation: A state-space approach. Boca Raton, FL: Chapman & Hall/CRC.
Kaplan, D., & Glass, L. (1995). Understanding nonlinear dynamics. New York, NY: Springer.
Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201–210.
Kincanon, E., & Powel, W. (1995). Chaotic analysis in psychology and psychoanalysis. The Journal of Psychology, 129, 495–505.
Kitagawa, G. (1998). A self-organizing state-space model. Journal of the American Statistical Association, 93(443), 1203–1215.
Klein, A. G., & Muthén, B. O. (2007). Quasi maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42(4), 647–673.
Kuhn, E., & Lavielle, M. (2005). Maximum likelihood estimation in nonlinear mixed effects models. Computational Statistics & Data Analysis, 49, 1020–1038.
Kulikov, G., & Kulikova, M. (2014). Accurate numerical implementation of the continuous-discrete extended Kalman filter. IEEE Transactions on Automatic Control, 59(1), 273–279. doi:10.1109/TAC.2013.2272136.
Lee, S., & Song, X. (2003). Maximum likelihood estimation and model comparison for mixtures of structural equation models with ignorable missing data. Journal of Classification, 20(2), 221–255. doi:10.1007/s00357-003-0013-5.
Li, F., Duncan, T. E., & Acock, A. (2000). Modeling interaction effects in latent growth curve models. Structural Equation Modeling, 7(4), 497–533.
Liang, H., Miao, H., & Wu, H. (2010). Estimation of constant and time-varying dynamic parameters of HIV infection in a nonlinear differential equation model. Annals of Applied Statistics, 4(1), 460–483.
Longstaff, M. G., & Heath, R. A. (1999). A nonlinear analysis of the temporal characteristics of handwriting. Human Movement Science, 18, 485–524.
Losardo, D. (2012). An examination of initial condition specification in the structural equation modeling framework. Unpublished doctoral dissertation, University of North Carolina, Chapel Hill, NC.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 190–200.
Marsh, H. W., Wen, Z. L., & Hau, K.-T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.
Mbalawata, I. S., Särkkä, S., & Haario, H. (2013). Parameter estimation in stochastic differential equations with Markov chain Monte Carlo and non-linear Kalman filtering. Computational Statistics, 28(3), 1195–1223.
McArdle, J. J., & Hamagami, F. (2001). Latent difference score structural models for linear dynamic analysis with incomplete longitudinal data. In L. Collins & A. Sayer (Eds.), New methods for the analysis of change (pp. 139–175). Washington, DC: American Psychological Association.
McArdle, J. J., & Hamagami, F. (2003). Structural equation models for evaluating dynamic concepts within longitudinal twin analyses. Behavior Genetics, 33(2), 137–159. doi:10.1023/A:1022553901851.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107–122.
Miao, H., Xin, X., Perelson, A. S., & Wu, H. (2011). On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Review, 53(1), 3–39.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research and Perspectives, 2, 201–218.
Molenaar, P. C. M., & Newell, K. M. (2003). Direct fit of a theoretical model of phase transition in oscillatory finger motions. British Journal of Mathematical and Statistical Psychology, 56, 199–214. doi:10.1348/000711003770480002.
Ortega, J. (1990). Numerical analysis: A second course. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Oud, J. H. L. (2007). Comparison of four procedures to estimate the damped linear differential oscillator for panel data. In J. Oud & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences. Mahwah, NJ: Lawrence Erlbaum Associates.
Oud, J. H. L., & Jansen, R. A. R. G. (2000). Continuous time state space modeling of panel data by means of SEM. Psychometrika, 65(2), 199–215.
Oud, J. H. L., & Singer, H. (Eds.). (2010). Special issue: Continuous time modeling of panel data, 62 (1).
Pickering, T. G., Shimbo, D., & Haas, D. (2006). Ambulatory blood-pressure monitoring. The New England Journal of Medicine, 354, 2368–2374.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2002). Numerical recipes in C. Cambridge: Cambridge University Press.
R Development Core Team. (2009). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved April, 2014, from http://www.R-project.org (ISBN: 3-900051-07-0).
Ralston, A., & Rabinowitz, P. (2001). A first course in numerical analysis (2nd ed.). Mineola, NY: Dover.
Ramsay, J. O., Hooker, G., Campbell, D., & Cao, J. (2007). Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of the Royal Statistical Society: Series B, 69(5), 741–796.
Raudenbush, S. W., & Liu, X.-F. (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6(4), 387–401.
Särkkä, S. (2013). Bayesian filtering and smoothing. Cambridge: Cambridge University Press.
SAS Institute Inc. (2008). SAS 9.2 Help and Documentation (Computer software manual). Cary, NC: SAS Institute Inc.
Sherwood, A., Steffen, P., Blumenthal, J., Kuhn, C., & Hinderliter, A. L. (2002). Nighttime blood pressure dipping: The role of the sympathetic nervous system. American Journal of Hypertension, 15, 111–118.
Sherwood, A., Thurston, R., Steffen, P., Blumenthal, J. A., Waugh, R. A., & Hinderliter, A. L. (2001). Blunted nighttime blood pressure dipping in postmenopausal women. American Journal of Hypertension, 14, 749–754.
Singer, H. (1992). The aliasing-phenomenon in visual terms. Journal of Mathematical Sociology, 14(1), 39–49.
Singer, H. (1995). Analytical score function for irregularly sampled continuous time stochastic processes with control variables and missing values. Econometric Theory, 11, 721–735. doi:10.1017/S0266466600009701.
Singer, H. (2002). Parameter estimation of nonlinear stochastic differential equations: Simulated maximum likelihood vs. extended Kalman filter and Itô-Taylor expansion. Journal of Computational and Graphical Statistics, 11, 972–995.
Singer, H. (2007). Stochastic differential equation models with sampled data. In K. van Montfort, J. H. L. Oud, & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences (pp. 73–106). Mahwah, NJ: Lawrence Erlbaum Associates.
Singer, H. (2010). SEM modeling with singular moment matrices. Part I: ML-estimation of time series. The Journal of Mathematical Sociology, 34(4), 301–320. doi:10.1080/0022250X.2010.532259.
Singer, H. (2012). SEM modeling with singular moment matrices. Part II: ML-estimation of sampled stochastic differential equations. The Journal of Mathematical Sociology, 36(1), 22–43. doi:10.1080/0022250X.2010.532259.
Stone, A. A., & Shiffman, S. (1994). Ecological momentary assessment (EMA) in behavioral medicine. Annals of Behavioral Medicine, 16(3), 199–202.
Strogatz, S. H. (1994). Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Cambridge, MA: Westview.
Stuart, A. M., Voss, J., & Wiberg, P. (2004). Conditional path sampling of SDEs and the Langevin MCMC method. Communications in Mathematical Sciences, 2(4), 685–697.
Tanizaki, H. (1996). Nonlinear filters: Estimation and applications (2nd ed.). Berlin: Springer.
Thatcher, R. W. (1998). A predator–prey model of human cerebral development. In K. M. Newell & P. C. M. Molenaar (Eds.), Applications of nonlinear dynamics to developmental process modeling (pp. 87–128). Mahwah, NJ: Lawrence Erlbaum.
Wen, Z., Marsh, H. W., & Hau, K.-T. (2002). Interaction effects in growth modeling: A full model. Structural Equation Modeling, 9(1), 20–39.
Wu, H. (2005). Statistical methods for HIV dynamic studies in AIDS clinical trials. Statistical Methods in Medical Research, 14, 171–192.
Zhu, H., Gu, M., & Peterson, B. (2007). Maximum likelihood from spatial random effects models via the stochastic approximation expectation maximization algorithm. Statistics and Computing Archive, 17(2), 163–177.
Zhu, H. T., & Zhang, H. P. (2006). Generalized score test of homogeneity for mixed effects models. Annals of Statistics, 34, 1545–1569.
Acknowledgments
Funding for this study was provided by NSF Grant BCS-0826844, NIH Grants RR025747-01, P01CA142538-01, MH086633, EB005149-01, AG033387, and R01GM105004.
Appendices
Appendix 1: Score Vector and Information Matrix of the Complete-Data Loglikelihood Function
The elements in \(\varvec{s}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})\) and \(\varvec{\mathrm {I}}_{\varvec{\mathrm {Z}}}(\varvec{\theta }; \varvec{\mathrm {Z}})\), namely, the score vector and information matrix of the complete-data loglikelihood function, are computed as
where \(\mathop {\mathrm {Diag}}(.)\) denotes a block diagonal matrix formed by stacking the appropriate second partial derivative matrices in its diagonal section and zero matrices in its off-diagonal sections.
Using Heun’s method, with \(\tilde{\varvec{x}}_i(t_{i,j})\) as defined in Eq. (4) and \(\varvec{z}_i(t_{i,j}) = [\varvec{y}_i(t_{i, j})-\varvec{\mu }-\varvec{\mathrm {\Lambda }}\tilde{\varvec{x}}_i(t_{i, j})]\), first-order partial derivative elements of the complete-data loglikelihood are given by
where the \(\mathop {\mathrm {vec}}(\varvec{\mathrm {W}})\) operator stacks the columns of the \(m \times n\) matrix \(\varvec{\mathrm {W}}\) into an \(mn\)-dimensional column vector and \(\frac{\partial \tilde{\varvec{x}}_i(t_{i, j})^\mathrm{T}}{\partial \varvec{\theta }_{f,i}}\) is dictated by the dynamic model under consideration. Terms such as \(\frac{\partial \varvec{\mu } }{\partial \varvec{\theta }_{\varvec{\mu }}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Lambda }})}{\partial \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\epsilon }}}})}{\partial \varvec{\theta }_{\varvec{\epsilon }}}, \frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{\eta }}}})}{\partial \varvec{\theta }_{\varvec{\eta }}}\), and \(\frac{\partial \mathop {\mathrm {vec}}(\varvec{\mathrm {\Sigma _{\varvec{b}}}})}{\partial \varvec{\theta }_{\varvec{b}}}\) also depend on the model specification adopted in a particular application. Cases where some elements of \(\varvec{\mathrm {\Lambda }}, \varvec{\mathrm {\Sigma }}_{\varvec{\epsilon }}, \varvec{\mathrm {\Sigma }}_{\varvec{\eta }}\), and \(\varvec{\mathrm {\Sigma }}_{\varvec{b}}\) are fixed at known values can be readily accommodated through appropriate specification of these matrices of partial derivatives.
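For readers implementing these derivatives, the \(\mathop {\mathrm {vec}}\) operator corresponds to column-major flattening. The snippet below is a minimal NumPy sketch; the function name is illustrative.

```python
import numpy as np

def vec(W):
    """Stack the columns of an m x n matrix into an mn-vector."""
    return np.asarray(W).reshape(-1, order="F")  # "F" = column-major order

W = np.array([[1, 2],
              [3, 4],
              [5, 6]])
print(vec(W))  # columns (1, 3, 5) then (2, 4, 6)
```

NumPy flattens row by row by default, so the explicit `order="F"` matters whenever the derivative matrices are laid out according to the \(\mathop {\mathrm {vec}}\) convention.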
Second-order partial derivative elements of the complete-data loglikelihood function are computed as
Other second-order derivative elements are equal to null matrices, including \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\theta }_{\varvec{\mu }}}},\) \( \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\theta }_{\varvec{\mathrm {\Lambda }}}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{\theta }_{\varvec{\epsilon }}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\beta }\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta _{\varvec{\mu }}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta })}{\partial \varvec{\theta _{\varvec{\mathrm {\Lambda }}}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\) and \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b};\varvec{\theta })}{\partial \varvec{\theta _{\varvec{\mathrm {\epsilon }}}}\partial \varvec{\theta }^\mathrm{T}_{\varvec{b}}}\). 
Under the assumption that the model is correctly specified, the elements in \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta _{\varvec{\mathrm {\Lambda }}}} \partial \varvec{\theta }^\mathrm{T}_{\varvec{\epsilon }}}, \frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta _{\varvec{\mu }}} \partial \varvec{\theta _{\varvec{\epsilon }}}^\mathrm{T}}\) and \(\frac{\partial ^2 L_{i,j}(\varvec{\mathrm {Y}}|\varvec{b}; \varvec{\theta }) }{\partial \varvec{\theta _{\varvec{\beta }}} \partial \varvec{\theta _{\varvec{\epsilon }}}^\mathrm{T}}\) are close to zero at the MLEs of the modeling parameters. These elements are thus set to null matrices in the proposed estimation algorithm to stabilize the algorithm when initial parameter estimates are far from the MLEs and are not shown here. In addition, the off-diagonal elements shown in the last three equations in Eq. (20) are non-zero even near the MLEs. However, setting all the off-diagonal blocks of the information matrix of the complete-data loglikelihood function to null matrices helps stabilize the algorithm in case this information matrix is not positive definite in the optimization process. In our preliminary simulations, we verified that setting these three matrices to null matrices, as opposed to the forms shown in Eq. (20), helped reduce numerical problems in the optimization process while having negligible effects on the final point and SE estimates, because this matrix is not used directly as the Fisher information matrix to derive the final SE estimates. We thus proceeded to set all the off-diagonal elements, including the last three matrices shown in Eq. (20), to null matrices.
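The stabilizing effect of discarding off-diagonal blocks can be illustrated with a toy matrix. The sketch below is only an illustration under assumed inputs: the matrix and block sizes are made up, and the helper name is hypothetical.

```python
import numpy as np

def keep_diagonal_blocks(info, block_sizes):
    """Zero every off-diagonal block of a square matrix.

    block_sizes lists the dimension of each parameter block, in order.
    """
    info = np.asarray(info, dtype=float)
    out = np.zeros_like(info)
    start = 0
    for size in block_sizes:
        stop = start + size
        out[start:stop, start:stop] = info[start:stop, start:stop]
        start = stop
    return out

# Toy symmetric matrix that is indefinite because of large cross-block terms.
info = np.array([[2.0, 1.0, 5.0],
                 [1.0, 2.0, 5.0],
                 [5.0, 5.0, 3.0]])
approx = keep_diagonal_blocks(info, block_sizes=[2, 1])
print(np.linalg.eigvalsh(info).min(), np.linalg.eigvalsh(approx).min())
```

When each diagonal block is itself positive definite, the block-diagonal matrix is positive definite even if the full matrix is not, which is the property exploited during optimization.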
Appendix 2: Sampling From \(p(\varvec{b}| \varvec{\mathrm {Y}};\varvec{\theta }^{(m-1)})\)
The superscript \((m-1)\) of \(\varvec{\theta }\) is temporarily suppressed for notational simplicity. It can be shown that \(p(\varvec{b}| \varvec{\mathrm {Y}};\varvec{\theta })=\prod _{i=1}^{n} p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\), where \(p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\) is non-standard and cannot be sampled directly. Specifically, \(p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta }) \propto p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}})p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })\), in which \(p(\varvec{\mathrm {Y}}_i | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })\) is given by
where \(p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta })\) is a multivariate normal density function with mean \(\varvec{\mu }+\varvec{\mathrm {\Lambda }}\tilde{\varvec{x}}_i(t_{i, j})\) and covariance matrix \(\varvec{\mathrm {\Sigma }}_{\epsilon }\). As \(\varvec{b}_i\) is involved in the nonlinear \(f(.)\) in \(p(\varvec{y}_i(t_{i, j}) | \varvec{b}_i;\varvec{\theta }_{\varvec{\mu }}, \varvec{\theta }_{\varvec{\mathrm {\Lambda }}}, \varvec{\theta }_{\varvec{\epsilon }},\varvec{\beta }), p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\) is usually non-standard. To sample from \(p(\varvec{b}_i| \varvec{\mathrm {Y}};\varvec{\theta })\), we adopt a Metropolis-Hastings (MH) algorithm as follows. At the \(m\)th iteration with current values in \(\varvec{b}_i^{(m)}\), a new candidate \(\varvec{b}_i\) is generated from a proposal distribution, chosen to be the normal distribution \(\text{ N }(\varvec{b}_i^{(m)},\sigma _{b}^{2} \varvec{\mathrm {\Omega }}_{bi})\), where \(\sigma _{b}^{2}\) is a scaling constant, \(\varvec{\mathrm {\Omega }}_{bi} = (\varvec{\mathrm {\Sigma }}_{\varvec{b}}^{-1}+\sum _{j=1}^\mathrm{T}\varvec{\mathrm {D}}_{bit}^\mathrm{T}\varvec{\mathrm {\Sigma }}_{\epsilon }^{-1} \varvec{\mathrm {D}}_{bit})^{-1}, \varvec{\mathrm {D}}_{bit}=\partial \tilde{\varvec{x}}_i(t_{i, j})/\partial \varvec{b}_{i}^\mathrm{T} |_{\varvec{b}_i=\varvec{b}_i^{*}}\), and \(\varvec{b}_i^{*}\) is a fixed value with high \(p(\varvec{b}_i^{*}| \varvec{\mathrm {Y}};\varvec{\theta })\). One possibility is to use the mean of \(p(\varvec{b}_i;\varvec{\theta }_{\varvec{b}})\) as \(\varvec{b}_i^{*}\), which we have found to lead to good performance. The new \(\varvec{b}_i\) is accepted with probability
The scaling constant, \(\sigma _{b}^2\), can be chosen such that the average acceptance rate is approximately 0.4.
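A generic random-walk MH update of this kind can be sketched as follows. The sketch assumes that the proposal is symmetric, so that the acceptance probability takes the standard form \(\min \{1, p(\varvec{b}_i^{\mathrm{cand}}| \varvec{\mathrm {Y}};\varvec{\theta })/p(\varvec{b}_i^{(m)}| \varvec{\mathrm {Y}};\varvec{\theta })\}\). The arguments `omega` and `sigma2` play the roles of \(\varvec{\mathrm {\Omega }}_{bi}\) and \(\sigma _{b}^{2}\), and `log_target` is a user-supplied log density known up to a constant; all names, and the toy standard normal target in the usage lines, are illustrative.

```python
import numpy as np

def random_walk_mh(log_target, b0, omega, sigma2, n_iter, rng):
    """Random-walk Metropolis-Hastings with proposal N(b, sigma2 * omega)."""
    b = np.asarray(b0, dtype=float)
    chol = np.linalg.cholesky(sigma2 * np.asarray(omega, dtype=float))
    log_p = log_target(b)
    draws = np.empty((n_iter, b.size))
    n_accept = 0
    for it in range(n_iter):
        cand = b + chol @ rng.standard_normal(b.size)
        log_p_cand = log_target(cand)
        # Symmetric proposal: accept with probability min(1, p(cand) / p(b)).
        if np.log(rng.uniform()) < log_p_cand - log_p:
            b, log_p = cand, log_p_cand
            n_accept += 1
        draws[it] = b
    return draws, n_accept / n_iter

# Toy usage with a bivariate standard normal target (not the model posterior).
rng = np.random.default_rng(0)
draws, rate = random_walk_mh(lambda b: -0.5 * float(b @ b),
                             np.zeros(2), np.eye(2), 1.0, 2000, rng)
print(rate)  # rerun with a different sigma2 until the rate is near 0.4
```

Increasing `sigma2` lowers the acceptance rate and decreasing it raises the rate, so a few pilot runs are usually enough to reach the target of about 0.4.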
Cite this article
Chow, SM., Lu, Z., Sherwood, A. et al. Fitting Nonlinear Ordinary Differential Equation Models with Random Effects and Unknown Initial Conditions Using the Stochastic Approximation Expectation–Maximization (SAEM) Algorithm. Psychometrika 81, 102–134 (2016). https://doi.org/10.1007/s11336-014-9431-z
DOI: https://doi.org/10.1007/s11336-014-9431-z