Computational Statistics, Volume 28, Issue 4, pp 1385–1452

Linear latent variable models: the lava-package

  • Klaus Kähler Holst
  • Esben Budtz-Jørgensen
Original Paper


An R package for specifying and estimating linear latent variable models is presented. The philosophy of the implementation is to separate the model specification from the actual data, which leads to a dynamic and easy way of modeling complex hierarchical structures. Several advanced features are implemented, including robust standard errors for clustered correlated data, multigroup analyses, non-linear parameter constraints, inference with incomplete data, maximum likelihood estimation with censored and binary observations, and instrumental variable estimators. In addition, an extensive simulation interface covering a broad range of non-linear generalized structural equation models is described. The model and software are demonstrated on measurements of the serotonin transporter in the human brain.
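The idea of keeping the model specification separate from the data can be illustrated with a small sketch. The following is not the lava API (which is an R package); it is a hypothetical Python illustration in which the names `spec`, `simulate` and `observed`, the variable names, and all coefficient values are assumptions made for this example. The specification only names variables and paths, and the same specification can then be used to simulate data sets of any size.

```python
import numpy as np

# Hypothetical model specification: it describes variables and linear paths
# only and holds no data. This is an illustration, not the lava interface.
spec = {
    "exogenous": ["x"],
    "latent": ["u"],
    # outcome -> {predictor: coefficient used for simulation}
    # Entries are ordered so that predictors are defined before outcomes.
    "paths": {
        "u": {"x": 0.5},
        "y1": {"u": 1.0},
        "y2": {"u": 0.8},
        "y3": {"u": 1.2},
    },
}


def simulate(spec, n, seed=0):
    """Draw n observations from the linear model described by spec."""
    rng = np.random.default_rng(seed)
    values = {name: rng.standard_normal(n) for name in spec["exogenous"]}
    for outcome, parents in spec["paths"].items():
        mean = sum(coef * values[p] for p, coef in parents.items())
        # Each outcome gets independent standard normal residual noise.
        values[outcome] = mean + rng.standard_normal(n)
    return values


def observed(spec, values):
    """Drop latent variables, leaving only what a data set would contain."""
    return {k: v for k, v in values.items() if k not in spec["latent"]}


# The same specification is reused with different sample sizes.
data = observed(spec, simulate(spec, n=1000))
small = observed(spec, simulate(spec, n=50, seed=1))
```

Because the specification carries no data, it can be inspected, modified, or reused before any data set is attached, which is the design choice the abstract describes.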


Latent variable model, Structural equation model, Maximum likelihood, Serotonin, Seasonality, SERT



We thank the referees for helpful comments. This work was supported by The Danish Agency for Science, Technology and Innovation.



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
