Behavior Research Methods, Volume 50, Issue 2, pp. 490–500

Multivariate normal maximum likelihood with both ordinal and continuous variables, and data missing at random

  • Joshua N. Pritikin
  • Timothy R. Brick
  • Michael C. Neale


A novel method for the maximum likelihood estimation of structural equation models (SEM) with both ordinal and continuous indicators is introduced, using a flexible multivariate probit model for the ordinal indicators. A full information approach ensures unbiased estimates for data missing at random. Exceeding the capability of prior methods, the approach can include up to 13 ordinal variables before integration time exceeds 1 s per row. The method relies on the axiom of conditional probability to factor the joint distribution of the continuous and ordinal variables. Because the axiom is symmetric, two similar factorizations are available: the ordinal variables conditional on the continuous ones, or the continuous variables conditional on the ordinal ones. A simulation study provides evidence that the two approaches offer equal accuracy. A further simulation is used to develop a heuristic that automatically selects the more computationally efficient approach. Joint ordinal continuous SEM is implemented in OpenMx, free and open-source software.
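The factorization described above can be illustrated for the simplest joint case: one continuous variable y and one ordinal variable x. A row's likelihood is written as f(y) · P(x | y); the symmetric alternative is P(x) · f(y | x). The Python below is a minimal sketch of the first form only, assuming a bivariate normal latent model in which x is obtained by cutting a latent normal x* at ordered thresholds, with Var(x*) fixed to 1 (a common probit identification choice). All function names and the parameterization are hypothetical illustrations, not the OpenMx interface or the authors' code. Missing values (None) are handled in the full-information way: each row contributes the likelihood of only its observed variables.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via math.erf (erf(+/-inf) = +/-1, so infinite cuts work)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def row_loglik(y, k, mu_y, var_y, mu_x, cov_yx, thresholds):
    """Log-likelihood of one row, factored as log f(y) + log P(x = k | y).

    Hypothetical sketch (bivariate case only, not the OpenMx implementation):
    (y, x*) is bivariate normal with Var(x*) fixed to 1, and the ordinal
    x equals category k (0-based) when cuts[k] < x* <= cuts[k + 1].
    Pass None for a missing y or k; missing values are simply left out.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    ll = 0.0
    if y is not None:
        # Marginal normal density of the observed continuous variable.
        ll += -0.5 * math.log(2.0 * math.pi * var_y) - (y - mu_y) ** 2 / (2.0 * var_y)
        # Conditional distribution of x* given y (normal regression formulas).
        mean = mu_x + cov_yx / var_y * (y - mu_y)
        var = 1.0 - cov_yx ** 2 / var_y
    else:
        # y missing: fall back to the marginal distribution of x*.
        mean, var = mu_x, 1.0
    if k is not None:
        sd = math.sqrt(var)
        # Conditional probability that x* falls in category k's interval.
        p = norm_cdf((cuts[k + 1] - mean) / sd) - norm_cdf((cuts[k] - mean) / sd)
        ll += math.log(p)
    return ll


def total_loglik(rows, **params):
    """Full-information log-likelihood: sum of per-row contributions."""
    return sum(row_loglik(y, k, **params) for y, k in rows)


if __name__ == "__main__":
    params = dict(mu_y=0.0, var_y=1.0, mu_x=0.0, cov_yx=0.5, thresholds=[-0.5, 0.5])
    # (y, x) pairs; None marks a value missing at random.
    rows = [(0.3, 2), (-1.2, None), (None, 0), (1.1, 1)]
    print(total_loglik(rows, **params))
```

With several ordinal variables, P(x | y) becomes a multivariate normal rectangle probability that has to be computed by numerical integration; that integration is the step that grows expensive as ordinal variables are added.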


Keywords: Structural equation modeling; Multivariate probit; Joint ordinal continuous; Continuous latent variables; Maximum likelihood


Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Joshua N. Pritikin (1)
  • Timothy R. Brick (2)
  • Michael C. Neale (1)

  1. Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, Richmond, USA
  2. Department of Human Development and Family Studies, Pennsylvania State University, State College, USA
