An evaluation of methods to handle missing data in the context of latent variable interaction analysis: multiple imputation, maximum likelihood, and random forest algorithm

  • Original Paper
Japanese Journal of Statistics and Data Science

Abstract

The objective of this study was to examine the performance of the two most popular missing data methods, multiple imputation (MI) and maximum likelihood (ML), as well as a newly developed machine learning framework based on the random forest algorithm (MF), under various research conditions. The simulation design included random and non-random missingness (i.e., MCAR, MAR, and MNAR), small samples, and different missing rates. All statistical inferences were investigated using latent variable interaction modeling. Consistent with the missing data literature, the combination of small sample sizes, higher missing rates, and non-ignorable missingness, together with a complicated model structure, adversely affected the accuracy of statistical inferences. Although overparameterization is possible, MI is a good choice when convergence is a concern. If the primary goal of the research is to investigate relationships between variables, as in many studies, ML is attractive. MF performed similarly to MI and ML across all research conditions and outperformed them when estimating the variability of parameter estimates. Other practical issues pertaining to the missing data methods are also discussed.

Notes

  1. Let the complete data be denoted \({Y}_{\mathrm{com}}\), partitioned into an observed part \({Y}_{\mathrm{obs}}\) and a missing part \({Y}_{\mathrm{mis}}\), and let \(R\) be the missingness indicator. Under missing completely at random (MCAR), the probability that an observation is missing depends on neither the observed nor the unobserved measurements, \(P\left(R|{Y}_{\mathrm{com}}\right)=P(R)\), whereas under missing at random (MAR), missingness depends on observed characteristics of the individuals but not on the missing values themselves, \(P\left(R|{Y}_{\mathrm{com}}\right)=P(R|{Y}_{\mathrm{obs}})\) (Enders, 2010; Rubin, 1996; Shin et al., 2017).

  2. Data are missing not at random (MNAR) when the probability of missing data on a variable Y can depend on other variables (i.e., \({Y}_{\mathrm{obs}}\)) as well as on the underlying values of Y itself (i.e., \({Y}_{\mathrm{mis}}\)), so that \(P\left(R|{Y}_{\mathrm{com}}\right)=P(R|{Y}_{\mathrm{obs}}, {Y}_{\mathrm{mis}})\) and does not simplify further.
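
The three mechanisms defined in these notes can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it is a minimal illustration that assumes two correlated variables, a fully observed x and an incomplete y, with a logistic missingness model. The function names (`make_data`, `ampute`, `observed_mean_y`), the slope of 2.0, and the roughly 50% missing rate are all hypothetical choices made for illustration.

```python
import math
import random


def make_data(n, rng):
    """Draw n pairs (x, y); x is always observed, y is correlated with x."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.5 * x + rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ampute(data, mechanism, rng, slope=2.0):
    """Return a copy of data in which y is set to None when R = 1 (missing).

    MCAR: P(R = 1) is a constant, so P(R | Ycom) = P(R).
    MAR:  P(R = 1) depends only on the observed x, so P(R | Ycom) = P(R | Yobs).
    MNAR: P(R = 1) depends on the value of y itself, i.e. on Ymis.
    """
    out = []
    for x, y in data:
        if mechanism == "MCAR":
            p_missing = 0.5
        elif mechanism == "MAR":
            p_missing = logistic(slope * x)
        elif mechanism == "MNAR":
            p_missing = logistic(slope * y)
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        out.append((x, None if rng.random() < p_missing else y))
    return out


def observed_mean_y(data):
    """Mean of y over the cases where y was observed (complete-case mean)."""
    ys = [y for _, y in data if y is not None]
    return sum(ys) / len(ys)


rng = random.Random(2022)
complete = make_data(20000, rng)
full_mean = sum(y for _, y in complete) / len(complete)
results = {m: observed_mean_y(ampute(complete, m, rng))
           for m in ("MCAR", "MAR", "MNAR")}

print("complete-data mean of y:", round(full_mean, 3))
for mechanism, mean_y in results.items():
    print(mechanism, "observed-case mean of y:", round(mean_y, 3))
```

In this setup the observed-case mean of y stays close to the complete-data mean under MCAR and drifts away from it under MAR and MNAR. Under MAR the drift can in principle be corrected using the observed x, which is what MI and ML rely on; under MNAR it cannot be corrected from the observed data alone.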

References

  • Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Sage.

  • Aittokallio, T. (2009). Dealing with missing values in large-scale studies: Microarray data imputation and beyond. Briefings in Bioinformatics, 2(2), 253–264.

  • Ajzen, I. (1987). Attitudes, traits, and actions: Dispositional prediction of behavior in personality and social psychology. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 20, pp. 1–63). Academic Press.

  • Algina, J., & Moulder, B. C. (2001). A note on estimating the Jöreskog-Yang model for latent variable interaction using LISREL 8.3. Structural Equation Modeling, 8, 40–52.

  • Alkasawneh, Pan, & Green. (2007). Multiple imputation for missing data: A cautionary tale. Sociological Methods and Research, 28(3), 301–309.

  • Allison, P. D. (2003). Missing data techniques for structural equation models. Journal of Abnormal Psychology, 112, 545–557.

  • Allison, P. D. (2006). Multiple imputation of categorical variables under the multivariate normal model. In Paper presented at the annual meeting of the American Sociological Association, Montreal Convention Center, Montreal, Quebec, Canada, Aug. 11, 2006.

  • Allison, P. D. (2010). Missing data. In J. D. Wright & P. V. Marsden (Eds.), Handbook of survey research (pp. 631–657). Emerald Group Publishing Ltd.

  • Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49(2), 155–173.

  • Anderson, T. W. (1957). Maximum likelihood estimates for the multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 52, 200–203.

  • Arbuckle, J. (1996). AMOS-Analysis of moment structures. Small Waters Corporation.

  • Arminger, G., & Sobel, M. E. (1990). Pseudo-maximum likelihood estimation of mean and covariance structures with missing data. Journal of the American Statistical Association, 85, 195–203.

  • Asparouhov, T., & Muthén, B. (2010). Bayesian analysis using Mplus: Technical implementation. http://statmodel.com/download/Bayes3.pdf

  • Asparouhov, T., & Muthén, B. (2008). Auxiliary variables predicting missing data. Technical appendix. Muthén & Muthén.

  • Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5–37.

  • Baraldi, A. N., & Enders, C. K. (2013). Missing data methods. In T. D. Little (Ed.), Oxford library of psychology. The Oxford handbook of quantitative methods: Statistical analysis (pp. 635–664). Oxford University Press.

  • Black, A. C., Harel, O., & McCoach, D. B. (2011). Missing data techniques for multilevel data: Implications of model misspecification. Journal of Applied Statistics, 38(9), 1845–1865.

  • Boomsma, A. (1985). Nonconvergence, improper solutions, and starting values in LISREL maximum likelihood estimation. Psychometrika, 50, 229–242.

  • Brand, J. (1999). Development, implementation and evaluation of multiple imputation strategies for the statistical analysis of incomplete data sets. University of Medical Center, Rotterdam.

  • Breiman, L. (2003). Manual for setting up, using, and understanding random forest V4.0. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf

  • Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  • Breiman, L., & Cutler, A. (2002). Manual on setting up, using, and understanding random forests V3.1. Berkeley: University of California, Berkeley. http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf

  • Cham, H., Baraldi, A. N., & Enders, C. K. (2013). Applying maximum likelihood estimation and multiple imputation to moderated regression models with incomplete predictor variables. Multivariate Behavioral Research, 45, 153–154.

  • Chiarella, C., Kang, B., Meyer, G., & Ziogas, A. (2014). Computational methods for derivatives with early exercise features. In K. Schmedders & K. L. Judd (Eds.), Handbook of computational economics (3rd ed., chap. 5). Elsevier.

  • Cho, S. J., & Rabe-Hesketh, S. (2011). Alternating imputation posteriors estimation of models with crossed random effects. Computational Statistics & Data Analysis, 55, 12–25.

  • Coenders, G., Batista-Foguet, J. M., & Saris, W. E. (2008). Simple, efficient and distribution-free approach to interaction effects in complex structural equation models. Quality & Quantity, 42, 369–396.

  • Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analyses for the behavioral sciences. Erlbaum.

  • Collins, L. M., Schafer, J. L., & Kam, C. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4), 330–351.

  • Copas, J. B., & Li, H. G. (1997). Inference for non-random samples (with discussion). Journal of the Royal Statistical Society (Series B), 59, 55–96.

  • Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. Irvington.

  • Croy, C. D., & Novins, D. K. (2005). Methods for addressing missing data in psychiatric and developmental research. Journal of the American Academy of Child and Adolescent Psychiatry, 44, 1230–1240.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society (Series B), 39, 1–38.

  • Díaz-Uriarte, R., & de Andrés, A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1), 3. Retrieved from http://www.biomedcentral.com/1471-2105/7/3

  • Didelez, V. (2002). ML- and semiparametric estimation in logistic models with incomplete covariate data. Statistica Neerlandica, 56, 330–345.

  • Dong, F., & Yin, G. (2017). Maximum likelihood estimation for incomplete multinomial data via the weaver algorithm. Statistics and Computing (published on-line).

  • Dong, Y., & Peng, C.-Y.J. (2013). Principled missing data methods for researchers. SpringerPlus, 2, 222. https://doi.org/10.1186/2193-1801-2-222

  • Doove, L., Van Buuren, S., & Dusseldorp, E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92–104.

  • Dubow, E. F., & Ullman, D. G. (1989). Assessing social support in elementary school children: The survey of children’s social support. Journal of Clinical Child Psychology, 18(1), 52–64.

  • Duncan, S., & Duncan, T. (1994). Modeling incomplete longitudinal substance use data using latent variable growth curve methodology. Multivariate Behavioral Research, 29, 313–338.

  • Edwards, S. L., Berzofsky, M. E., & Biemer, P. P. (2018). Addressing nonresponse for categorical data items using full information maximum likelihood with latent GOLD 5.0. RTI Press Publication No. MR-0038-1809. RTI Press.

  • Enders, C. K. (2001a). A Primer on maximum likelihood algorithms available for use with missing data. Structural Equation Modeling, 8, 128–141.

  • Enders, C. K. (2001b). The impact of nonnormality on full information maximum-likelihood estimation for structural equation models with missing data. Psychological Methods, 6, 352–370.

  • Enders, C. K. (2006). Analyzing structural equation models with missing data. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 313–342). Information Age Publishing.

  • Enders, C. K. (2010). Applied missing data analysis. The Guilford Press.

  • Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457.

  • Finkbeiner, C. (1979). Estimation for the multiple factor model when data are missing. Psychometrika, 44, 409–420.

  • Ganzach, Y. (1997). Misleading interaction and curvilinear terms. Psychological Methods, 2, 235–247.

  • Gelman, A., & Rubin, D. (1992). A single series from the Gibbs sampler provides a false sense of security. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics (pp. 625–631). Oxford University Press.

  • Gold, M. S., & Bentler, P. M. (2000). Treatments of missing data: A Monte Carlo comparison of RBHDI, iterative stochastic regression imputation, and expectation-maximization. Structural Equation Modeling, 7, 319–355.

  • Gold, M. S., Bentler, P. M., & Kim, K. H. (2003). A comparison of maximum-likelihood and asymptotically distribution-free methods of treating incomplete nonnormal data. Structural Equation Modeling, 10(1), 47–79.

  • Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197–218.

  • Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarification of multiple imputation theory. Prevention Science, 8, 206–213.

  • Hallquist, M. N., & Wiley, J. F. (2017). MplusAutomation: an R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling, 25(4), 621–638.

  • Hapfelmeier, A. (2012). Analysis of missing data with random forests (Doctoral dissertation. Ludwig Maximilian University of Munich, Munich, Germany). Retrieved from https://edoc.ub.uni-muenchen.de/15058/

  • Hartley, H. O., & Hocking, R. (1971). The analysis of incomplete data. Biometrics, 27, 783–808.

  • Herzog, W., & Boomsma, A. (2009). Small-sample robust estimators of noncentrality-based and incremental model fit. Structural Equation Modeling, 16(1), 1–27.

  • Ho, P., Silva, M., & Hogg, T. (2001). Multiple imputation and maximum likelihood principal component analysis of incomplete multivariate data from a study of the ageing of port. Chemometrics and Intelligent Laboratory Systems, 55(2), 1–11.

  • Ishioka, T. (2013). Imputation of missing values for unsupervised data using the proximity in random forests. In The fifth international conference on mobile, hybrid, and on-line learning (pp. 30–36). The National Center for University Entrance Examinations.

  • Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. Annals of Applied Statistics, 2, 841–860.

  • Jaccard, J., Turrisi, R., & Wan, C. K. (1990). Interaction effects in multiple regression (Sage university papers series. Quantitative applications in the social sciences; Vol. no. 07-072). Sage Publications.

  • Jackman, S. (2000). Estimation and inference via Bayesian simulation: an introduction to Markov Chain Monte Carlo. American Journal of Political Science, 44(2), 375–404.

  • Jansen, I., Hens, N., Molenberghs, G., Aerts, M., Verbeke, G., & Kenward, M. G. (2006). The nature of sensitivity in monotone missing not at random models. Computational Statistics and Data Analysis, 50, 830–858.

  • Jeon, M., & Rijmen, F. (2014). Recent developments in maximum likelihood estimation of MTMM models for categorical data. Frontiers in Psychology, 5(1), 1–7.

  • Ji, L., Chow, S.-M., Schermerhorn, A. C., Jacobson, N. C., & Cummings, E. M. (2018). Handling missing data in the modeling of intensive longitudinal data. Structural Equation Modeling, 25(5), 715–736.

  • Jöreskog, K. G., & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 57–87). Lawrence Erlbaum Associates.

  • Kang, J., & Shin, T. (2015). The effects of adolescents’ stress on suicidal ideation: Focusing on the moderating and mediating effects of depression and social support. Korean Journal of Youth Studies, 22(5), 27–51.

  • Karasek, R. A. (1979). Job demands, job decision latitude, and mental strain: Implication for job redesign. Administrative Science Quarterly, 24, 285–308.

  • Kelava, A. (2009). Multicollinearity in nonlinear structural equation models. (Doctoral dissertation, Goethe University, Frankfurt, Germany). Retrieved from http://publikationen.ub.uni-frankfurt.de/volltexte/2009/6336/

  • Kelava, A., & Brandt, H. (2009). Estimation of nonlinear latent structural equation models using the extended unconstrained approach. Review of Psychology, 16, 123–131.

  • Kelava, A., Werner, C. S., Schermelleh-Engel, K., Moosbrugger, H., Zapf, D., Ma, Y., Cham, H., Aiken, L. S., & West, S. G. (2011). Advanced nonlinear latent variable modeling: distribution analytic LMS and QML estimators of interaction and quadratic effects. Structural Equation Modeling, 18(3), 465–491.

  • Kenny, D., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201–210.

  • Kenward, M. G., & Carpenter, J. (2007). Multiple imputation: Current perspectives. Statistical Methods in Medical Research, 16, 199–218.

  • King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95(1), 49–69.

  • Klein, A. G., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457–474.

  • Klein, A. G., & Muthén, B. O. (2007). Quasi maximum likelihood estimation of structural equation models with multiple interaction and quadratic effects. Multivariate Behavioral Research, 42, 647–674.

  • Klein, A. G., Schermelleh-Engel, K., Moosbrugger, H., & Kelava, A. (2009). Assessing spurious interaction effects. In T. Teo & M. S. Khine (Eds.), Structural equation modeling in educational research: Concepts and applications (pp. 13–28). Sense.

  • Korean Youth Policy Institute. (2007). Korean children and youth panel survey, Sejong-si.

  • Kovacs, M. (1983). The children's depression inventory: A self-rated depression scale for school-aged youngsters. University of Pittsburgh school of medicine, Department of Psychiatry, Western Psychiatric Institute and Clinic.

  • Kroll, C. N., & Stedinger, J. R. (1996). Estimation of moments and quantiles using censored data. Water Resources Research, 32(4), 1005–1012.

  • Larsen, R. (2011). Missing data imputation versus full information maximum likelihood with second-level dependencies. Structural Equation Modeling, 18, 649–662.

  • Lee, L. E. (1993). Asymptotic distribution of the maximum likelihood estimator for a stochastic frontier function model with a singular information matrix. Econometric Theory, 9, 413–430.

  • Lee, S. Y., & Song, X. Y. (2004). Bayesian model comparison of nonlinear structural equation models with missing continuous and ordinal data. British Journal of Mathematical and Statistical Psychology, 57, 131–150.

  • Liao, S. G., Lin, Y., Kang, D., Chandra, D., Bon, J., Kaminski, N., Sciurba, F. C., & Tseng, G. C. (2014). Missing value imputation in high-dimensional phenomic data: Imputable or not, and how? BMC Bioinformatics, 5(15), 346. https://doi.org/10.1186/s12859-014-0346-6

  • Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.

  • Lin, G.-C., Wen, Z., Marsh, H., & Lin, H.-S. (2010). Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies. Structural Equation Modeling, 17(3), 374–391.

  • Little, R. J. A. (1992). Regression with missing X’s: a review. Journal of the American Statistical Association, 87, 1227–1237.

  • Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data. Wiley.

  • Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13(4), 497–519.

  • Loh, P. L., & Wainwright, M. J. (2011). High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Advances in Neural Information Processing Systems, 24, 2726–2734.

  • Lusch, R. F., & Brown, J. R. (1996). Interdependency, contracting, and relational behavior in marketing channels. Journal of Marketing, 60, 19–38.

  • Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9, 275–300.

  • Marsh, H. W., Wen, Z., & Hau, K. T. (2006). Structural equation models of latent interaction and quadratic effects. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 225–265). Information Age.

  • Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.

  • Moosbrugger, H., Schermelleh-Engel, K., Kelava, A., & Klein, A. G. (2009). Testing multiple nonlinear effects in structural equation modeling: A comparison of alternative estimation approaches. In T. Teo & M. Khine (Eds.), Structural equation modeling in educational research: Concepts and applications (pp. 103–136). Sense Publishers.

  • Moosbrugger, H., Schermelleh-Engel, K., & Klein, A. G. (1997). Methodological problems of estimating latent interaction effects. Methods of Psychological Research Online, 2, 95–111.

  • Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171–189.

  • Muthén, L. K., & Muthén, B. O. (2018). Mplus version 8.2 [Computer software]. Muthén & Muthén.

  • Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., & Ishii, S. (2003). A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19(16), 2088–2096.

  • Pantanowitz, A., & Marwala, T. (2008). Evaluating the impact of missing data imputation through the use of the random forest algorithm. School of Electrical and Information Engineering. University of the Witwatersrand Private Bag x3. Wits. 2050. Republic of South Africa. Retrieved from http://arxiv.org/ftp/arxiv/papers/0812/0812.2412.pdf

  • Peng, C.-Y.J., & Zhu, J. (2008). Comparison of two approaches for handling missing covariates in logistic regression. Educational and Psychological Measurement, 68(1), 58–77.

  • Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: a review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525–556.

  • Pigott, T. D. (2001). A review of methods for missing data. Educational Research and Evaluation, 7(4), 353–383.

  • Raghunathan, T. E. (2004). What do we do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health, 25, 99–117.

  • Rotnitzky, A., Cox, D. R., Bottai, M., & Robins, J. (2000). Likelihood-based inference with singular information matrix. Bernoulli, 6, 243–284.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.

  • Rubin, D. B. (1996). Multiple imputation after 18+ years (with discussion). Journal of the American Statistical Association, 91, 473–489.

  • Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys. Wiley.

  • Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. In ASA proceedings of the business and economic section (pp. 308–313).

  • Savalei, V., & Bentler, P. M. (2005). A statistically justified pairwise ML method for incomplete nonnormal data: A comparison with direct ML and pairwise ADF. Structural Equation Modeling, 12, 183–214.

  • Savalei, V., & Bentler, P. M. (2009). A two-stage approach to missing data: Theory and application to auxiliary variables. Structural Equation Modeling, 16(3), 477–497.

  • Savalei, V., & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 37–62.

  • Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman & Hall.

  • Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.

  • Schafer, J. L., & Graham, J. W. (2002). Missing data: our view of the state of the art. Psychological Methods, 7, 147–177.

  • Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research, 33, 545–571.

  • Schmitt, M. (1990). Konsistenz als Persönlichkeitseigenschaft? Moderatorvariablen in der Persönlichkeits- und Einstellungsforschung [Consistency as a personality trait? Moderator variables in personality and attitude research]. Springer.

  • Schouten, R. M., Lugtig, P., & Vink, G. (2018). Generating missing values for simulation purpose: A multivariate amputation procedure. Journal of Statistical Computation and Simulation, 88(15), 2909–2930.

  • Shah, A. D., Bartlett, J. W., Carpenter, J., Nicholas, O., & Hemingway, H. (2014). Comparison of random forest and parametric imputation models for imputing missing data using MICE: A caliber study. American Journal of Epidemiology, 179(6), 764–774.

  • Shin, T., Davison, M. L., & Long, J. D. (2009). Effects of missing data methods in structural equation modeling with nonnormal longitudinal data. Structural Equation Modeling, 16, 70–98.

  • Shin, T., Davison, M. L., & Long, J. D. (2017). Maximum likelihood versus multiple imputation for missing data in small longitudinal samples with nonnormality. Psychological Methods, 22(3), 426–449.

  • Sinharay, S., Stern, H. S., & Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6, 317–329.

  • Snyder, M., & Tanke, E. D. (1976). Behavior and attitude: some people are more consistent than others. Journal of Personality, 44, 501–517.

  • Stekhoven, D. J. (2016). missForest: Nonparametric missing value imputation using random forest. R package version 1.4.

  • Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118.

  • Tang, F. (2017). Random forest missing data approaches. Open Access Dissertations. Retrieved from https://scholarlyrepository.miami.edu/oa_dissertations/1852

  • Tang, F., & Ishwaran, H. (2017). Random forest missing data algorithms. Statistical Analysis and Data Mining, 10, 363–377.

  • Taylor, L., & Zhou, X. H. (2009). Multiple imputation methods for treatment noncompliance and nonresponse in randomized clinical trials. Biometrics, 65(1), 88–95.

  • Brand, J., van Buuren, S., & Groothuis-Oudshoorn, C. (2003). A toolkit in SAS for the evaluation of multiple imputation methods. Statistica Neerlandica, 57(1), 36–45.

  • van Buuren, S. (2012). Flexible imputation of missing data. Chapman & Hall/CRC.

  • van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, K., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064.

  • von Hippel, P. (2007). Regression with missing y’s: An improved method for analyzing multiple imputed data. Sociological Methodology, 37, 83–117.

  • von Hippel, P. (2016). New confidence intervals and bias comparisons show that maximum likelihood can beat multiple imputation in small samples. Structural Equation Modeling, 23(3), 422–437.

  • Waljee, A. K., Mukherjee, A., Singal, A. G., Zhang, Y., Warren, J., Balis, U., Marrero, J., Zhu, J., & Higgins, P. D. R. (2013). Comparison of imputation methods for missing laboratory data in medicine. British Medical Journal Open, 3, 1–7.

  • Wall, M. M., & Amemiya, Y. (2000). Estimation for polynomial structural equation models. Journal of the American Statistical Association, 95, 929–940.

  • Wilks, S. S. (1932). Moments and distributions of estimates of population parameters from fragmentary samples. Annals of Mathematical Statistics, 3, 163–195.

  • Wothke, W. (2000). Longitudinal and multi-group modeling with missing data. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multiple group data: Practical issues, applied approaches, and specific examples (pp. 219–240). Erlbaum.

  • Yuan, K. H., & Bentler, P. M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical and Statistical Psychology, 51, 289–309.

  • Yuan, K. H., & Bentler, P. M. (2000). Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. Sociological Methodology, 30, 165–200.

  • Yuan, K. H., Fan, Y., & Bentler, P. M. (2012a). ML versus MI for missing data with violation of distribution conditions. Sociological Methods & Research, 41(4), 598–629.

  • Yuan, K. H., Yang-Wallentin, F., & Bentler, P. M. (2012b). ML versus MI for missing data with violation of distribution conditions. Sociological Methods & Research, 41(4), 598–629.

  • Yuan, K. H., Tong, X., & Zhang, Z. (2015). Bias and efficiency for SEM with missing data and auxiliary variables: Two-stage robust method versus two-stage ML. Structural Equation Modeling, 22(2), 178–192.

Author information

Corresponding author

Correspondence to Tacksoo Shin.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shin, T., Long, J.D. & Davison, M.L. An evaluation of methods to handle missing data in the context of latent variable interaction analysis: multiple imputation, maximum likelihood, and random forest algorithm. Jpn J Stat Data Sci 5, 629–659 (2022). https://doi.org/10.1007/s42081-022-00176-w

  • DOI: https://doi.org/10.1007/s42081-022-00176-w
