Behavior Research Methods, Volume 50, Issue 2, pp 490–500

Multivariate normal maximum likelihood with both ordinal and continuous variables, and data missing at random

  • Joshua N. Pritikin
  • Timothy R. Brick
  • Michael C. Neale

Abstract

A novel method for the maximum likelihood estimation of structural equation models (SEM) with both ordinal and continuous indicators is introduced using a flexible multivariate probit model for the ordinal indicators. A full information approach ensures unbiased estimates for data missing at random. Exceeding the capability of prior methods, up to 13 ordinal variables can be included before integration time increases beyond 1 s per row. The method relies on the axiom of conditional probability to split apart the distribution of continuous and ordinal variables. Due to the symmetry of the axiom, two similar methods are available. A simulation study provides evidence that the two similar approaches offer equal accuracy. A further simulation is used to develop a heuristic to automatically select the most computationally efficient approach. Joint ordinal continuous SEM is implemented in OpenMx, free and open-source software.
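The conditional probability split mentioned in the abstract can be illustrated with a minimal sketch for one continuous and one ordinal variable. This is not the OpenMx implementation. It assumes a standard bivariate normal with correlation `rho` between the continuous variable and the latent response behind the ordinal variable, and the names `joint_row_likelihood`, `thresholds`, and `rho` are hypothetical. The row likelihood is written as the density of the continuous value times the probability of the observed category given that value; a missing ordinal value is simply integrated out, in the spirit of a full information approach.

```python
import math


def norm_pdf(x, mean=0.0, sd=1.0):
    """Univariate normal density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mean=0.0, sd=1.0):
    """Univariate normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def joint_row_likelihood(c, category, thresholds, rho):
    """Likelihood of one data row: continuous value c, ordinal category (0-based).

    Illustrative assumption: c and the latent response underlying the ordinal
    variable are standard bivariate normal with correlation rho.
    """
    if category is None:
        # Ordinal value missing: only the continuous margin contributes.
        return norm_pdf(c)
    # Pad the thresholds so that category k covers (cuts[k], cuts[k + 1]].
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    lower, upper = cuts[category], cuts[category + 1]
    # Latent response given c is normal with mean rho * c, variance 1 - rho^2.
    cond_mean = rho * c
    cond_sd = math.sqrt(1.0 - rho * rho)
    p_category_given_c = (norm_cdf(upper, cond_mean, cond_sd)
                          - norm_cdf(lower, cond_mean, cond_sd))
    # f(c, category) = f(c) * P(category | c)
    return norm_pdf(c) * p_category_given_c
```

Summing `joint_row_likelihood` over all categories for a fixed `c` recovers `norm_pdf(c)`, which is a quick consistency check on the factorization. The symmetric factorization, the probability of the category times the density of the continuous value given that category, would yield the same row likelihood; it is not shown here.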

Keywords

Structural equation modeling; Multivariate probit; Joint ordinal continuous; Continuous latent variables; Maximum likelihood

Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Joshua N. Pritikin (1)
  • Timothy R. Brick (2)
  • Michael C. Neale (1)

  1. Department of Psychiatry and Virginia Institute for Psychiatric and Behavior Genetics, Virginia Commonwealth University, Richmond, USA
  2. Department of Human Development and Family Studies, Pennsylvania State University, State College, USA
